OpenAI Unveils Voice Engine: Responsible Voice Cloning Technology
As the world grapples with the rapid spread of deepfakes, OpenAI is taking a measured approach to refining voice cloning technology. The company has introduced Voice Engine, an extension of its existing text-to-speech API, with a focus on responsible deployment.
A Model in Plain Sight
The generative AI model behind Voice Engine has been quietly powering various applications, including the voice and “read aloud” features in ChatGPT and the preset voices in OpenAI’s text-to-speech API. Even Spotify has been utilizing the model since September to dub podcasts for well-known hosts like Lex Fridman in different languages.
Training Data Confidentiality
OpenAI remains tight-lipped about the specifics of the training data used for the Voice Engine model, only revealing that it was trained on a combination of licensed and publicly available data. This secrecy is not uncommon among generative AI vendors, who view training data as a competitive advantage and a potential source of IP-related legal issues.
“We want to make sure that everyone feels good about how it’s being deployed — that we understand the landscape of where this tech is dangerous and we have mitigations in place for that,” Jeff Harris, a member of the product staff at OpenAI, told The Zero Byte in an interview.
Synthesizing Voice without User Data
Interestingly, Voice Engine does not rely on user data for training or fine-tuning. The model generates speech by simultaneously analyzing the provided speech data and the text meant to be read aloud, creating a matching voice without the need for a custom model for each speaker.
Navigating the Landscape of Voice Cloning
While voice cloning technology is not new, with numerous startups and Big Tech companies offering similar products, OpenAI is taking a cautious approach to its release. The company has not yet announced a date for public availability, allowing time to assess and mitigate potential misuse of the technology.
As the AI landscape continues to evolve, OpenAI’s responsible approach to voice cloning technology serves as an example of how companies can balance innovation with ethical considerations.
OpenAI Launches Voice Cloning Tool, Raising Ethical Concerns
OpenAI’s Voice Engine Offers Affordable Speech Synthesis
In a significant development, OpenAI, backed by Microsoft, has introduced Voice Engine, a voice cloning tool that generates high-quality synthetic speech at a fraction of the cost of traditional voice actors. Priced at $15 per million characters, the tool works out to roughly $1 per hour of audio, undercutting the rates of popular competitors like ElevenLabs.
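The per-hour figure can be sanity-checked with back-of-the-envelope arithmetic. The sketch below assumes about 6 characters per English word (including the trailing space) and a narration pace of 150 words per minute. Both are illustrative assumptions rather than figures from OpenAI, and the function names are hypothetical; only the price of $15 per million characters comes from the article.

```python
PRICE_PER_MILLION_CHARS = 15.00  # USD, the price quoted in the article

# Assumptions (not from OpenAI): rough averages for English narration.
CHARS_PER_WORD = 6       # average word length plus one space
WORDS_PER_MINUTE = 150   # typical read-aloud pace


def chars_per_hour(chars_per_word: float = CHARS_PER_WORD,
                   words_per_minute: float = WORDS_PER_MINUTE) -> float:
    """Characters of text consumed by one hour of speech."""
    return chars_per_word * words_per_minute * 60


def hours_per_million_chars(chars_per_word: float = CHARS_PER_WORD,
                            words_per_minute: float = WORDS_PER_MINUTE) -> float:
    """Hours of audio produced from one million characters of text."""
    return 1_000_000 / chars_per_hour(chars_per_word, words_per_minute)


def cost_per_hour(price_per_million_chars: float,
                  chars_per_word: float = CHARS_PER_WORD,
                  words_per_minute: float = WORDS_PER_MINUTE) -> float:
    """Estimated USD cost of one hour of synthesized audio."""
    hourly_chars = chars_per_hour(chars_per_word, words_per_minute)
    return price_per_million_chars * hourly_chars / 1_000_000


if __name__ == "__main__":
    print(f"{hours_per_million_chars():.1f} hours per million characters")
    print(f"${cost_per_hour(PRICE_PER_MILLION_CHARS):.2f} per hour of audio")
```

Under these assumptions, a million characters yields about 18.5 hours of audio at roughly $0.81 per hour, which is consistent with the article's figure of around $1. Slower speech or longer average words would push the estimate higher.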
Implications for the Voice Acting Industry
The advent of affordable AI-generated speech poses a potential threat to the livelihoods of voice actors, whose pay ranges from $12 to $79 per hour, according to ZipRecruiter. As clients increasingly ask voice actors to sign away the rights to their voices so that AI-generated versions can be created, the industry faces the risk of commoditization, particularly in the entry-level market.
Balancing Innovation and Ethical Concerns
Some AI voice platforms are attempting to strike a balance between technological advancement and fair compensation for voice actors. Replica Studios signed a deal with SAG-AFTRA to create and license copies of union members’ voices, while ElevenLabs hosts a marketplace that compensates original voice creators when their synthetic voices are used by others.
OpenAI, however, has not established similar arrangements and only requires users to obtain consent from the individuals whose voices are cloned, make clear disclosures about AI-generated voices, and refrain from using the voices of minors, deceased people, or political figures.
“How this intersects with the voice actor economy is something that we’re watching closely and really curious about,” Harris said. “I think that there’s going to be a lot of opportunity to sort of scale your reach as a voice actor through this kind of technology. But this is all stuff that we’re going to learn as people actually deploy and play with the tech a little bit.”
Addressing the Potential for Misuse
Voice cloning apps have been abused in the past: 4chan users have generated hateful messages in the voices of celebrities, and convincing voice clones have fooled bank authentication systems. There are also concerns that voice cloning could be used to sway elections, as shown by a recent phone campaign that used a deepfaked President Biden to deter voters in New Hampshire.
To mitigate the risk of misuse, OpenAI is initially making Voice Engine available to a small group of developers, prioritizing low-risk and socially beneficial use cases in healthcare, accessibility, and responsible synthetic media. Early adopters include Age of Learning, HeyGen, Livox, Lifespan, and Dimagi, which are using the tool for educational content, storytelling, and assisting individuals with speech impairments.
OpenAI Unveils Voice Engine: Transforming Text to Speech with Unparalleled Realism
OpenAI has introduced Voice Engine, a text-to-speech tool that lets developers generate realistic voice recordings from text input paired with a sample of a speaker's voice.
Harnessing the Power of GPT-4 for Unmatched Vocal Realism
At the heart of Voice Engine lies the formidable GPT-4 language model, which has been meticulously fine-tuned on an extensive dataset of speech recordings. By leveraging the linguistic prowess of GPT-4, Voice Engine can generate audio that closely mimics the nuances, intonations, and idiosyncrasies of human speech, resulting in an unparalleled level of realism.
Safeguarding Against Misuse: Inaudible Watermarks and Red Teaming
To address the potential misuse of Voice Engine, OpenAI has implemented a robust watermarking system. Developed in-house, this technique embeds imperceptible identifiers within the generated recordings, allowing OpenAI to trace the origin of any audio clip created using their system. While the specifics of the watermarking technology remain confidential, OpenAI is exploring the possibility of making it publicly available in the future, albeit with careful consideration of the associated risks.
“If there’s an audio clip out there, it’s really easy for us to look at that clip and determine that it was generated by our system and the developer that actually did that generation,” Harris said. “So far, it isn’t open sourced — we have it internally for now. We’re curious about making it publicly available, but obviously, that comes with added risks in terms of exposure and breaking it.”
Furthermore, OpenAI is engaging its red teaming network, a group of contracted experts, to identify and mitigate potential risks associated with Voice Engine. While some experts argue that AI red teaming alone may not be sufficient, OpenAI remains committed to prioritizing safety in the release and development of this technology.
The Future of Voice Engine: Balancing Innovation and Responsibility
As Voice Engine enters its preview phase, OpenAI is carefully evaluating the reception and feedback from developers and the public. The company is exploring additional security measures, such as requiring users to read randomly generated text to prove their presence and awareness of how their voice is being utilized. This proactive approach underscores OpenAI’s dedication to responsible AI development and deployment.
“What’s going to keep pushing us forward in terms of the actual voice matching technology is really going to depend on what we learn from the pilot, the safety issues that are uncovered and the mitigations that we have in place,” Harris said. “We don’t want people to be confused between artificial voices and actual human voices.”
As Voice Engine continues to evolve, OpenAI remains steadfast in its commitment to striking a delicate balance between pushing the boundaries of AI innovation and ensuring the technology is used ethically and responsibly. With the potential to revolutionize industries ranging from entertainment to education, Voice Engine stands poised to reshape the landscape of human-computer interaction, one realistic voice at a time.