Microsoft AI can now clone human speech with jaw-dropping accuracy

15 July 2024
Despite its impressive capabilities, the tech giant has decided not to share it with the public, citing "potential risks" of misuse

The tool, named VALL-E 2, is a text-to-speech generator capable of mimicking a voice based on just a few seconds of audio. Despite its impressive capabilities, the tech giant has decided not to share it with the public, citing “potential risks” of misuse.

VALL-E 2 is a zero-shot system: it can reproduce the voice of a speaker it never encountered during training, given only a short audio prompt of that person. According to Microsoft, VALL-E 2 is the first zero-shot text-to-speech system to achieve “human parity,” meaning its output matched or exceeded real human recordings on the benchmarks the researchers used. It succeeds the original VALL-E system, which was announced in January 2023.

Developers at Microsoft Research claim that VALL-E 2 can produce “accurate, natural speech in the exact voice of the original speaker, comparable to human performance.” It can synthesize complex sentences as well as short phrases. To achieve this, the tool utilizes two key features: Repetition Aware Sampling and Grouped Code Modeling.

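Repetition Aware Sampling addresses the problem of repeated tokens. In VALL-E 2, tokens are not words but discrete codes produced by a neural audio codec, each representing a short slice of sound. During decoding, a model like this can get stuck emitting the same token over and over, which produces stutters or endless loops. Repetition Aware Sampling checks how often a candidate token has appeared in the recent decoding history and, when it repeats too often, falls back to sampling from the model's full probability distribution. This stabilizes decoding and helps the system's speech sound more natural.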
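To make the idea concrete, here is a minimal Python sketch of repetition-aware sampling, assuming the model outputs a probability for each possible audio codec token at every step. It is an illustration, not Microsoft's code: the function names and the `top_p`, `window`, and `threshold` values are hypothetical. Normally a token is drawn with nucleus (top-p) sampling; if that token already fills too large a share of the recent history, the sketch resamples from the full distribution to break the loop.

```python
import random
from typing import List, Optional, Sequence


def nucleus_sample(probs: Sequence[float], top_p: float, rng: random.Random) -> int:
    """Sample from the smallest set of most-likely tokens whose total probability reaches top_p."""
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    nucleus: List[int] = []
    total = 0.0
    for i in ranked:
        nucleus.append(i)
        total += probs[i]
        if total >= top_p:
            break
    return rng.choices(nucleus, weights=[probs[i] for i in nucleus], k=1)[0]


def repetition_aware_sample(
    probs: Sequence[float],
    history: Sequence[int],
    top_p: float = 0.8,       # hypothetical nucleus threshold
    window: int = 10,         # hypothetical look-back window (in tokens)
    threshold: float = 0.1,   # hypothetical maximum allowed repetition ratio
    rng: Optional[random.Random] = None,
) -> int:
    """Nucleus-sample a token, but resample from the full distribution if it repeats too often."""
    rng = rng or random.Random()
    token = nucleus_sample(probs, top_p, rng)
    recent = list(history)[-window:]
    repetition_ratio = recent.count(token) / window
    if repetition_ratio > threshold:
        # The candidate is too repetitive: draw from the whole distribution instead.
        token = rng.choices(range(len(probs)), weights=probs, k=1)[0]
    return token
```

With these illustrative settings, a candidate token that already appears at least twice in the last ten steps triggers the fallback, so the decoder is nudged away from a sound it keeps repeating.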

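Grouped Code Modeling partitions the codec codes into groups so that the model handles a whole group in a single step. This shortens the sequence the model has to process, which speeds up generation and makes long utterances easier to model.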
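The grouping step itself can be sketched in a few lines of Python, assuming the audio is already encoded as a flat list of non-negative integer codec codes. This only shows how grouping shortens the sequence; the neural network that predicts a group per step is not modeled, and the helper names and the `PAD` value are hypothetical.

```python
from typing import List, Sequence, Tuple

PAD = -1  # hypothetical padding code; real codec codes are assumed non-negative


def group_codes(codes: Sequence[int], group_size: int, pad: int = PAD) -> List[Tuple[int, ...]]:
    """Split a flat code sequence into fixed-size groups, padding the last group if needed."""
    if group_size < 1:
        raise ValueError("group_size must be at least 1")
    padded = list(codes) + [pad] * (-len(codes) % group_size)
    return [tuple(padded[i:i + group_size]) for i in range(0, len(padded), group_size)]


def ungroup_codes(groups: Sequence[Tuple[int, ...]], pad: int = PAD) -> List[int]:
    """Flatten groups back into a code sequence and drop trailing padding."""
    flat = [code for group in groups for code in group]
    while flat and flat[-1] == pad:
        flat.pop()
    return flat
```

For example, `group_codes(list(range(10)), 4)` yields three groups, `(0, 1, 2, 3)`, `(4, 5, 6, 7)` and `(8, 9, -1, -1)`, so a model working group by group would take 3 steps instead of 10. Larger groups mean fewer steps, at the cost of predicting more codes at once.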

The researchers compared VALL-E 2 against audio samples from LibriSpeech and VCTK, two English-language databases. They also used ELLA-V, an evaluation framework for zero-shot text-to-speech synthesis, to assess how well VALL-E 2 handled more complex tasks. According to a June 17 paper summarizing the results, the system ultimately outperformed its competitors “in speech robustness, naturalness, and speaker similarity.”

Microsoft claims VALL-E 2 will remain a research project and will not be released to the public anytime soon. “Currently, we have no plans to incorporate VALL-E 2 into a product or expand access to the public,” the company wrote on its website. “It may carry potential risks in the misuse of the model, such as spoofing voice identification or impersonating a specific speaker.”

The tech behemoth notes that suspected abuse of the tool can be reported using an online portal.

Microsoft’s concerns are well-founded. This year, cybersecurity experts have seen a surge in the use of AI tools by malicious actors, including tools that replicate speech. “Vishing,” a blend of “voice” and “phishing,” is an attack in which scammers pose as friends, family, or other trusted parties over the phone. Voice spoofing could even pose a national security risk. In January, a robocall using President Joe Biden’s voice urged Democrats not to vote in New Hampshire’s primary. The man behind the plot was later indicted on charges of voter suppression and impersonating a candidate.

Microsoft has faced increased scrutiny over its use of AI, particularly regarding antitrust and data privacy. Regulators have raised concerns about the tech giant’s $13 billion partnership with OpenAI and the degree of influence it gives Microsoft over the startup. The company has also faced backlash from its own users.

For instance, Recall, an AI-powered Windows feature that takes screenshots of a device every few seconds, had its release postponed indefinitely last month after a deluge of criticism from consumers, privacy experts, and regulators such as the UK’s Information Commissioner’s Office. In a statement to The U.S. Sun, a company spokesperson said Recall would shift “from a preview experience broadly available for Copilot+ PCs…to a preview available first in the Windows Insider Program.”

Copyright ©2024 TechyMenia. All Rights Reserved.