OpenAI has shared early test results of a feature that can read words aloud in a convincing human voice – highlighting a new frontier for artificial intelligence and raising the specter of deepfake risks. The company is sharing early demos and use cases from a small preview of the text-to-speech model, called Voice Engine, which it has given to about 10 developers so far, a spokesperson said. OpenAI has backed off from rolling out the feature more widely, a plan it had described to reporters earlier this month.

An OpenAI spokesperson said the company decided to scale back the release after receiving feedback from stakeholders such as policymakers, industry experts, educators and creatives. The company originally planned to roll out the tool to up to 100 developers through an application process, according to an earlier press briefing.

“We recognize that generating speech that resembles people’s voices poses serious risks that are particularly important in an election year,” the company wrote in a blog post on Friday. “We’re engaging with US and international partners from government, media, entertainment, education, civil society and beyond to ensure we’re incorporating their feedback as we build.”

Other AI technology has already been used to fake voices. In January, a fake but realistic-sounding phone call purporting to be from President Joe Biden encouraged people in New Hampshire not to vote in the primary, an event that fueled fears about AI ahead of a critical global election year.

Unlike previous efforts by OpenAI to generate audio content, Voice Engine can create speech that sounds like specific individuals, complete with their particular rhythm and intonation. All the software needs to recreate a person's voice is 15 seconds of recorded audio of them speaking.

During a demonstration of the tool, Bloomberg listened to a clip of OpenAI CEO Sam Altman briefly explaining the technology in a voice that sounded indistinguishable from his real speech but was entirely AI-generated.

“If you have the right audio setup, it’s basically a human-caliber voice,” said Jeff Harris, product lead at OpenAI. “That’s a pretty impressive technical feat.” Still, Harris said, “There’s obviously a lot of delicacy about safety around being able to really accurately mimic human speech.”

One of OpenAI’s current developer partners, the Norman Prince Neurosciences Institute at the nonprofit Lifespan health system, is using the technology to help patients recover their voices. For example, the tool restored the voice of a young patient who had lost the ability to speak clearly due to a brain tumor, reproducing her speech from an earlier recording she had made for a school project, the company’s blog post said.

The model can also translate the audio it generates into different languages, making it useful for companies in the audio business such as Spotify Technology SA, which is already using the technology in a pilot program to translate podcasts by popular hosts like Lex Fridman. OpenAI is also touting other beneficial uses for the technology, such as creating a wider range of voices for children’s educational content.

In the testing program, OpenAI requires its partners to agree to its usage policies, obtain consent from the original speaker before using their voice, and disclose to listeners that the voices they hear are AI-generated. The company has also embedded an inaudible audio watermark so it can identify whether a piece of audio was created by its tool.

Before deciding whether to roll out the feature more widely, OpenAI said it wanted feedback from outside experts. “It’s important that people around the world understand where this technology is headed, whether or not we end up adopting it widely ourselves,” the company said in the blog post.

OpenAI also wrote that it hopes the preview of its software “motivates the need to strengthen societal resilience” against the challenges posed by more advanced AI technologies. For example, the company called on banks to phase out voice authentication as a security measure for accessing accounts and sensitive information. It also called for public education about fraudulent AI content and for further development of techniques to detect whether audio is real or AI-generated.

© 2024 Bloomberg LP



