How AI will soon be able to generate synthetic voices speaking in any language



AI companies are developing methods to translate and synthesize voices in advertisements, movies, and television.

Why it matters: Advances in text-to-speech could help fix bad movie dubbing, and they come as international content becomes increasingly important to studios and streaming platforms amid the globalization of entertainment.

  • But they also raise concerns about audio tampering, as well as about how a celebrity’s voice could be used after their death.

What is happening: Foreign-language hits like “Squid Game” and “La Casa de Papel” are attracting record audiences, but subtitles remain a stumbling block for studios trying to tap into a growing international market.

  • More Netflix subscribers watched the dubbed version of “Squid Game” than the subtitled version.
  • With blockbusters consuming much of the available bandwidth, smaller producers of foreign-language content are finding it difficult to hire enough translators and voice-over actors to meet demand.
  • “We’re still stuck in the mindset of the one-to-many delivery model,” says Ryan Steelberg, co-founder and president of AI company Veritone.

Between the lines: Veritone has developed a product called MARVEL.ai that allows content producers to generate and license what the company calls “hyperrealistic” synthetic voices.

  • This means, for example, that a podcast creator could have the audio copy for an ad translated into another language, and MARVEL.ai would then generate a synthetic version of their voice reading the ad in the new language.
  • “This gives you the ability to hyper-customize audio on a much larger scale and at a lower cost,” says Steelberg.

How it works: Text-to-speech technology has been around for decades, but Veritone’s product uses “speech-to-speech” synthesis, which Steelberg calls “voice as a service”.

  • Veritone draws on petabytes of data from media libraries and uses it to train its AI, creating a synthetic version of the original voice that can be tailored to different feelings or emotions, or, with translation, made to speak a foreign language.
  • “It will no longer be another person’s voice speaking on behalf of, say, Tom Cruise,” Steelberg said. “It will really be Tom Cruise’s voice speaking another language.”
  • Nvidia has developed technology that would allow AI to modify video or animation so that an actor’s lips and facial expressions match the new language, meaning no more out-of-sync dubbing like in 1970s kung fu movies.
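The speech-to-speech approach described above can be thought of as a three-stage pipeline: transcribe the original audio, translate the transcript, then synthesize the translation in a voice model trained on the original speaker. Below is a minimal conceptual sketch of that flow. All function names and bodies are illustrative stubs invented for this example, not Veritone’s actual API.

```python
# Conceptual sketch of a speech-to-speech dubbing pipeline.
# Every function here is a hypothetical stub for illustration only.

def transcribe(audio_file: str) -> str:
    """Stage 1: speech recognition turns the source audio into text (stub)."""
    return f"transcript of {audio_file}"

def translate(text: str, target_lang: str) -> str:
    """Stage 2: machine translation renders the text in the target language (stub)."""
    return f"[{target_lang}] {text}"

def synthesize(text: str, voice_model: str) -> str:
    """Stage 3: text-to-speech with a voice model trained on the original
    speaker's recordings, so the output keeps their voice (stub)."""
    return f"audio<{voice_model}>: {text}"

def dub(audio_file: str, voice_model: str, target_lang: str) -> str:
    """Chain the three stages: same voice, new language."""
    return synthesize(translate(transcribe(audio_file), target_lang), voice_model)

print(dub("ad_read.wav", "podcaster_voice", "es"))
```

The key design point is the last stage: instead of swapping in a different voice actor, the synthesis step reuses a model of the original speaker, which is what distinguishes speech-to-speech dubbing from conventional dubbing.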

What’s next: This technology will likely be used in advertisements first, but as it migrates to higher-quality content, it will open up opportunities and pitfalls for celebrities.

  • “In terms of dubbing and post-production, synthetic vocals will become mainstream, and you’ll see it become part of talent deals,” says Steelberg.
  • That will be not only to ensure that Hollywood stars (and their agents) get a cut of any use of their synthesized voices, but also to prevent those voices from being misused for malicious purposes as the technology becomes more accessible.

What to watch: How the voices and other creative attributes of deceased celebrities could be harnessed by AI.

  • Holograms of dead musicians like Frank Zappa are already used to put on “live” shows that have generated tens of millions in revenue, while Kenny G recently released a “duet” with the jazz great Stan Getz, who died 30 years ago.
  • Sample notes from Getz’s existing library were used to generate a new synthetic melody, though jazz writer Ted Gioia called it a “Frankenstein record”.

The bottom line: Soon we should get used to hearing celebrities speaking almost any language, and those celebrities should get used to combing through their wills.

