How AI will soon be able to generate synthetic voices speaking in any language



AI companies are developing methods to translate and synthesize voices in advertisements, movies, and television.

Why it matters: Advances in text-to-speech could help fix bad movie dubbing, and they come as international content becomes increasingly important to studios and streaming platforms amid the globalization of entertainment.

  • But they raise concerns about audio manipulation, as well as about how a celebrity’s voice could be used after their death.

What’s happening: Foreign-language hits like “Squid Game” and “La Casa de Papel” are attracting record audiences, but subtitles remain a stumbling block for studios trying to tap into a growing international market.

  • More Netflix subscribers watched the dubbed version of “Squid Game” than the subtitled one.
  • With blockbusters absorbing much of the available capacity, smaller producers of foreign-language content are finding it hard to line up enough translators and voice-over actors to meet demand.
  • “We’re still stuck in the mindset of the one-to-many delivery model,” says Ryan Steelberg, co-founder and president of AI company Veritone.

Between the lines: Veritone has developed a product that lets content producers generate and license what Steelberg calls “hyperrealistic” synthetic voices.

  • This means, for example, that a podcast creator could have the script of an ad translated into another language, then generate a synthetic version of their own voice reading the ad in the new language.
  • “This gives you the ability to hyper-customize audio on a much larger scale and at a lower cost,” says Steelberg.

How it works: Text-to-speech technology has been around for decades, but Veritone’s product uses “speech-to-speech” synthesis, which Steelberg calls “voice as a service”.

  • Veritone draws on petabytes of data from media libraries and uses it to train its AI model, creating a synthetic version of the original voice that can be tailored to different moods or emotions or, with translation, can speak a foreign language.
  • “It will no longer be the new voice of another person speaking on behalf of, say, Tom Cruise,” Steelberg said. “It will really be Tom Cruise’s voice speaking another language.”
  • Nvidia has developed technology that would allow AI to modify video or animation so that an actor’s lips and facial expressions match the new language, ending the out-of-sync dubbing familiar from 1970s kung fu movies.
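The speech-to-speech process described above can be thought of as a three-stage pipeline: transcribe the original audio, translate the text, then re-synthesize it in a clone of the original speaker’s voice. The following is a minimal, purely illustrative Python sketch of that flow; every function name and body here is a hypothetical placeholder, not Veritone’s or Nvidia’s actual API, and a real system would call trained speech-recognition, translation, and voice-cloning models at each step.

```python
# Illustrative sketch of a speech-to-speech translation pipeline.
# All function bodies are stand-in stubs: a real system would run
# trained ASR, machine-translation, and voice-cloning models here.

def transcribe(audio: bytes) -> str:
    """Speech recognition: source-language audio -> text (stubbed)."""
    return "welcome to the show"  # placeholder transcript

def translate(text: str, target_lang: str) -> str:
    """Machine translation into the target language (stubbed)."""
    # A tiny lookup table standing in for a real translation model.
    phrases = {("welcome to the show", "es"): "bienvenidos al programa"}
    return phrases.get((text, target_lang), text)

def synthesize(text: str, voice_profile: str) -> bytes:
    """Text-to-speech in a cloned voice trained on the speaker's archive (stubbed)."""
    return f"[{voice_profile} voice] {text}".encode()  # placeholder "audio"

def speech_to_speech(audio: bytes, target_lang: str, voice_profile: str) -> bytes:
    """Full pipeline: transcribe -> translate -> re-synthesize."""
    text = transcribe(audio)
    translated = translate(text, target_lang)
    return synthesize(translated, voice_profile)

dubbed = speech_to_speech(b"...", "es", "host")
print(dubbed.decode())  # [host voice] bienvenidos al programa
```

The point of the sketch is the shape of the pipeline, not the stubs: the original speaker’s voice is preserved as a separate “voice profile” input, which is what lets the same translated text be rendered as, in Steelberg’s phrase, “really Tom Cruise’s voice speaking another language.”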

What’s next: This technology will likely show up in advertisements first, but as it migrates to higher-quality content it will open up both opportunities and pitfalls for celebrities.

  • “In terms of dubbing and post-production, synthetic vocals will become mainstream, and you’ll find them written into talent deals,” says Steelberg.
  • That will be not only to ensure Hollywood stars (and their agents) get a cut of any use of their synthesized voices, but also to prevent those voices from being misused for malicious purposes as the technology becomes more accessible.

What to watch: How the voices and other creative attributes of deceased celebrities could be harnessed by AI.

  • Holograms of dead musicians like Frank Zappa are already fronting “live” shows that have brought in tens of millions of dollars in revenue, while Kenny G recently released a “duet” with the jazz great Stan Getz, who died 30 years ago.
  • Sampled notes from Getz’s recorded catalog were used to generate a new synthetic melody, prompting jazz writer Ted Gioia to call it a “Frankenstein record”.

The bottom line: We should soon get used to hearing celebrities speak almost any language, and those celebrities should get used to combing through their wills.


