
Are you being replaced by Text-to-Speech software?

by Paul Strikwerda

Should voice-over artists be afraid of artificial unintelligence?

Will robots take over the role of narrator or do voice-over professionals still have a future?

The man who had lost his voice to thyroid cancer spoke again on The Oprah Winfrey Show. In 2010, the late film critic Roger Ebert gave his Oscar predictions with the help of text-to-speech (TTS) software that spoke whatever he typed.

The first computer-based speech synthesis systems were created in the late 1950s. They’ve come a long way, but a lot of TTS software still sounds rather robotic and unnatural. That’s why Ebert turned to Scottish firm CereProc for help.

CereProc actually uses someone’s audio recordings to create a digital voice that comes very close to the real thing. Usually, CereProc has people come into its studio and record about 15 hours of audio. This is used to re-create the original voice.

In Ebert’s case, they used audio commentary he had made for several DVD documentaries. The quality was poor and the recordings were not as long as they would have liked. Nevertheless, they did the impossible and gave Ebert his voice back.

OUR NEW COMPETITOR?  

TTS software is not only used for people who have lost the ability to speak. It’s used to capture accents and dialects that are on the verge of dying out. People also use it to learn a foreign language. There’s one other application you should be aware of: it could eventually be used to replace you and me! Poland-based Ivona Text-to-Speech advertises:

“Save money spent on voice talent recordings. You do not have to look for recording studios and speakers. You do not waste time concluding agreements and contacting the contractors and it’s accessible 24/7.”

If you want to get an idea of what this software is capable of, go to their website, type in a few words and have a digital voice read them back to you. Rival NeoSpeech, headquartered in California, claims:

“Robotic voices are now history.”

NeoSpeech offers nine different voices that speak US English, Mexican Spanish, Korean, Japanese and Mandarin Chinese for a wide range of hand-held devices, desktop and network/server applications.

POLITICAL VOICES

If it weren’t for a certain former president, Roger Ebert might never have found CereProc. Ebert came across the Bush-o-Matic talking head, a hilarious re-creation of the 43rd president. I must admit: Bush never sounded so articulate! You can make him say things that are intelligent, and even make him wink, squint or blink.

The CereProc engineers pieced the voice of Bush together from his weekly radio addresses. It’s kind of scary, but in a fun way. Just to be fair, they also added a virtual version of President Obama’s voice and the inimitable accent of the former governor of California, Arnold Schwarzenegger.

As you can tell from the audio samples, CereProc is getting close, but they’re not quite there yet. One of the biggest challenges any TTS provider needs to overcome is how to add some emotion to the speech. Most artificial voices still sound a bit flat and get very boring very quickly. And for ordinary mortals, it’s still too expensive to re-create their own voice with the help of this technology.

TIME TO GO?

So, do you think it’s getting time for professional voice-overs to pack their bags and start looking for other work? Yes and no.

First of all, text-to-speech companies all over the world use voice talent to record different languages and accents for different applications. Secondly, if you’re a musician, you might find this technological development very interesting but non-threatening.

As you probably know, every musical instrument under the sun has been sampled, and entire symphony orchestras can come out of a can. Yet, people are still buying real Steinways and there are plenty of musicians who make a very decent living.

Do you think that we’ll ever see the time when Stravinsky’s “Rite of Spring”, as performed on virtual instruments, will win a Grammy? I don’t think so. Will a laboratory ever be able to produce a recording of Bach’s solo cello suites that rivals the depth of Yo-Yo Ma’s interpretation?

You see, there’s still hope for the most subtle, most flexible, most surprising and unique of all instruments: the human voice.  

Here’s the rub: robots have a hard time emoting. They can patiently and dispassionately guide you to the next exit, but they have a hard time expressing even the most basic of feelings such as fear, anger, hurt, guilt and… love.

However, give it a few years, and who knows what the industry will come up with!

Paul Strikwerda ©nethervoice