These startups fight deepfakes by making deepfakes

I was unsure if my parents would notice that the voice on the other end wasn’t mine — or that it was mine, sort of, but it wasn’t me. The voice said hello, asked my dad how he was doing, and asked again when he didn’t respond quickly enough. “What is that, Gaby?” He realized something was wrong almost immediately. I explained I had tried to trick him and it clearly hadn’t worked. “It didn’t,” he said. “It sounded like a robot.”

It wasn’t a perfect experiment. My parents were out of the country, which made for a shoddy connection. They were having lunch with friends, and the voice couldn’t deal with crosstalk or delays in the audio — it tried to fill the silences. And most importantly, the voice sounded human, but it didn’t sound like me.

The voice was generated by the deepfake detection company Reality Defender. The problem of manipulated media isn’t new, but the advent of consumer-grade AI tools has made the creation of fake audio, video, and images essentially frictionless, and a number of companies have sprung up in recent years to combat it. Reality Defender, Pindrop, and GetReal are part of a rapidly growing deepfake detection cottage industry valued at an estimated $5.5 billion as of 2023. These startups use machine learning to identify manipulated media. To fight deepfakes, you have to be able to make them.

The term “deepfake” refers to a specific type of manipulated media that has been generated with “deep” learning, but aside from the way they’re made, there is no one commonality that unites all deepfakes. They have been used for fraud, harassment, and memes. Tools like Grok AI have led to a proliferation of nonconsensual sexual deepfakes, including child sexual abuse material. Scammers have cloned people’s voices, called their relatives, and had the voice say they’re being held for ransom. During the 2024 election, a political strategist and a magician teamed up to create a deepfake of former President Joe Biden, which they used to discourage registered Democrats in New Hampshire from voting in the state’s primary. The head of the Senate Foreign Relations Committee took a Zoom call from someone using AI to pose as a Ukrainian official. At the corporate level, deepfake fraud is now “industrial,” according to one study.

The deepfake detection industry primarily exists to address one of these problems: the issue of corporate fraud.

Reality Defender is effectively training AI to combat AI. The company uses an “inference-based model” to detect deepfakes, CTO Alex Lisle told me. “Our foundational model uses something called a student/teacher paradigm. We take a bunch of real things and say, ‘These are real,’ and then a bunch of fake things and say ‘This is fake.’”
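Lisle's "student/teacher paradigm" is a standard machine-learning idea: a large "teacher" model produces soft real-versus-fake scores, and a smaller, faster "student" model is trained to reproduce them (a technique known as knowledge distillation). Below is a minimal, purely illustrative sketch in plain Python; the single "artifact score" feature and the stubbed-out teacher are invented for the example and are not anything Reality Defender has described using.

```python
import math
import random

random.seed(0)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Hypothetical "teacher": stands in for a large model that scores how
# likely a clip is fake. Here it is just a function of one invented
# feature (say, a spectral-artifact score between 0 and 1).
def teacher_predict(feature):
    return sigmoid(4.0 * (feature - 0.5))

# Toy training set: artifact scores for a mix of real and fake clips.
samples = [random.random() for _ in range(200)]

# "Student": a tiny one-feature logistic model trained to mimic the
# teacher's soft labels by gradient descent on the cross-entropy
# between the student's and teacher's probabilities.
w, b = 0.0, 0.0
learning_rate = 1.0
for _ in range(3000):
    grad_w = grad_b = 0.0
    for x in samples:
        err = sigmoid(w * x + b) - teacher_predict(x)  # dCE/dlogit
        grad_w += err * x
        grad_b += err
    w -= learning_rate * grad_w / len(samples)
    b -= learning_rate * grad_b / len(samples)

# The distilled student now tracks the teacher: a high artifact score
# reads as "fake," a low score as "real."
print(sigmoid(w * 0.9 + b), sigmoid(w * 0.1 + b))
```

The payoff of the student model is speed: it is small enough to run on every incoming call, while the teacher only has to run during training.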

For the fake me, we spent some time fine-tuning the voice: fiddling with the consistency, stability, and tone to make it sound more like the actual me. We could only do so much. There isn’t much publicly available footage of me speaking Spanish — the language I use to communicate with my parents — aside from a single podcast interview from 2021, most of which is unusable because there’s music in the background. But with nine seconds of audio and data scraped from years of posts, we managed to cobble together a somewhat convincing AI agent that was able to carry on a conversation with my parents, albeit an impersonal one. The English model we used on my brother was better, because we had much more training data, but even then it wasn’t convincing enough.

But family is the toughest test.

“They know what your voice sounds like,” Scott Steinhardt, the head of communications at Reality Defender, told me. Steinhardt made the deepfake with my consent and tinkered with it until it more or less sounded like me. It might not fool my family, but it’d probably be good enough for, say, colleagues or corporate entities like banks.


To be effective, these tools have to work quickly, and generative AI is rather slow. The model we used to call my parents sacrificed quality for speed: to get the voice to respond in real time, we had to accept a rougher sound all around. Text-to-speech was far better, but it took longer to generate. When we had the voice read Lucky's monologue from Waiting for Godot, it sounded almost exactly like me.

“As a person, it’s pretty challenging to not be deepfaked,” Nicholas Holland, the chief product officer at Pindrop, told me. “I think that the challenge of ‘How do I protect my personal identity?’ is something that the world hasn’t figured out yet. I think ‘How do my institutions know it’s me?’ is where different institutions are implementing different security layers.”

It’s also a question of resources. I don’t have the funds to hire a deepfake detection company to screen my calls, but my bank does — and my bank has more to lose, in absolute terms if not relative ones. One 2024 survey found that businesses lost an average of $450,000 per deepfake incident, with more than one firm having lost upwards of $1 million in a single fraudulent transaction.

Some of these cases have involved scammers posing as executives, calling their subordinates, and asking them to transfer large sums of money to their accounts. Before I logged in to the call with Holland, I got a pop-up notification on Zoom:

This meeting is being analyzed. Pindrop Security and its third-party providers record the audio and video of your meeting to determine whether you’re a real person and/or the right person. By clicking ‘Agree’ below, you consent to Pindrop’s collection, use and storage of the meeting and audio, your voice and face scans (which may be considered biometric information), and your IP address (to further determine your state, province or country) for the above purposes.

My face, voice, and IP address, they assured me, would be retained for no longer than 90 days.

Holland told me that companies are now being inundated with fake job applicants — ironically, even at Pindrop. “We’re seeing a range of it. We’re seeing where people are actually doing the job, maybe they work in the IT department,” Holland said. “We’ve had customers who have had somebody get hired, but then that person has made referrals. They’ve hired two other people and it turns out to be the same person hired three times using three different voices, three different faces, three different Slack identities.”

Typically, these aren’t entirely AI-generated video personas; they’re people using deepfake technology to change their own features, almost like a digital mask. There used to be a trick for detecting this: asking the person to hold three fingers in front of their face.

“That doesn’t work at all now. The AI models are so good that they can absolutely create hands, you can put hands in front of your face,” Holland said. “It’s basically imperceptible with your eyes now.”

Lisle from Reality Defender told me that as the technology improves, attacks take less effort to mount. Where scammers would once impersonate a single executive, they now target employees at all levels of a company. He told me of a recent attack on a publicly traded company he declined to name, in which the fraudster went to LinkedIn, pulled the name of every current employee, and then scraped TikTok and Facebook to build a “pool of information” and a voiceprint for each person. That information went into an LLM, which built a context window and a map, and then “scattershotted the entire company,” calling employees at all levels.

“In cybersecurity, we talk about these things called ‘trust boundaries,’” Lisle said. “The problem with deepfakes is that there’s always this implicit trust boundary, which is seeing and hearing is believing. We’ve gone the last 40,000-odd years believing our ears and eyesight, but now we can’t. There are all these trust boundaries we’ve never had to think about before that hackers are leveraging in interesting ways.”

For now, this software is only aimed at big companies — they have the need, the high stakes, and the deep pockets to pay for it. But regular people don’t have deepfake detection software, nor will they in the near future. As Holland explains it, the biggest challenge to mass adoption is awareness, since “many consumers aren’t aware of the threat, so they don’t know how to go find a solution — ground zero is with the businesses that serve the consumer.” Pindrop doesn’t have a consumer product yet, but it hasn’t ruled out developing one in the future. The challenge, Holland said, is “making these systems fast, accurate, and trustworthy enough for people to rely on in everyday moments.”

Reality Defender has a different perspective. Steinhardt said a consumer product would create “an uneven and spotty playing field for people.”

“Think of it as antivirus: Whereas this used to be a thing individual people worried about (or, worse, didn’t), now our browsers, email providers, internet providers, and the like are all scanning files before they hit our computer for malware,” Steinhardt said. “This is our approach to deepfake detection.”

My deepfake hadn’t been able to trick my family, but I hadn’t really put it to the test. For years, law enforcement agencies across the country have warned of a deepfake kidnapping scam: A parent will get a call from a very convincing voice begging for help, and then the “kidnapper” will demand a ransom. Even if the voice isn’t entirely convincing, the crying and screaming is. I couldn’t bring myself to do that to my parents, even if it was fake. I briefly considered other scams: I could call my bank, or maybe my health insurance provider, but the idea of locking myself out of my own accounts — or of committing actual, legitimate fraud — made me sour on the experiment. Instead, I called my brother. “Oh, NO,” he said as soon as the voice greeted him. He hadn’t been fooled either.
