Speech Synthesis

What Does Speech Synthesis Mean?

Speech synthesis is the artificial simulation of human speech by a computer or other device. The counterpart of speech recognition, speech synthesis is mostly used to translate text into audio, in applications such as voice-enabled services and mobile applications. It is also used in assistive technology to help vision-impaired individuals read text content.

Techopedia Explains Speech Synthesis

Homer Dudley’s VODER, which was based on the vocoder from Bell Laboratories, is considered the first fully functional voice synthesizer. The computer system used for speech synthesis is known as a speech synthesizer or speech computer. The quality of a speech synthesizer is often judged by its similarity to the human voice. Many computer operating systems have included speech synthesizers since the early 1990s. Synthesized speech is often generated by concatenating pieces of recorded speech that are stored in a database.

The initial stage in speech synthesis is pre-processing, which removes ambiguity about how a specific word should be read; this includes handling homographs, words that are spelled the same but pronounced differently. In the next stage, the computer uses phonemes to convert the text into a sequence of sounds. The last stage uses human recordings or basic sound generation techniques to mimic the human voice mechanism and read out the entire text.

One popular branch of speech synthesis is audio-visual, or multimodal, speech synthesis, which uses an animated face tightly synchronized with the synthesized speech. Multimodal speech synthesis also adds features such as non-verbal cues to help communicate the user’s words with more accuracy. Many speech synthesis systems allow users to choose the type of voice, such as a male or female voice.
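The three stages described above (pre-processing, phoneme conversion, and sound generation by concatenation) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the tiny lexicon, the ARPAbet-style phoneme symbols, the rule for the homograph "read", and the sine tones that stand in for a database of recorded speech units. It is not how any particular synthesizer is implemented.

```python
import math
import re

# Toy data: every entry below is a made-up illustration, not a real lexicon.
ABBREVIATIONS = {"dr.": "doctor"}
LEXICON = {
    "doctor": ["D", "AA", "K", "T", "ER"],
    "i": ["AY"],
    "will": ["W", "IH", "L"],
    "have": ["HH", "AE", "V"],
    "read": ["R", "IY", "D"],       # present tense, sounds like "reed"
    "read_past": ["R", "EH", "D"],  # past tense, sounds like "red"
}
PAST_TENSE_CUES = {"have", "has", "had"}

SAMPLE_RATE = 8000   # samples per second
UNIT_SAMPLES = 400   # 50 ms per unit at 8 kHz


def preprocess(text):
    """Stage 1: expand abbreviations, strip punctuation, resolve 'read'."""
    words = [ABBREVIATIONS.get(w, w) for w in text.lower().split()]
    words = [re.sub(r"[^a-z]", "", w) for w in words]
    words = [w for w in words if w]
    resolved = []
    for i, word in enumerate(words):
        # Hypothetical rule: "read" after have/has/had is the past tense.
        if word == "read" and i > 0 and words[i - 1] in PAST_TENSE_CUES:
            word = "read_past"
        resolved.append(word)
    return resolved


def to_phonemes(words):
    """Stage 2: look each word up in the lexicon (unknown words raise KeyError)."""
    phonemes = []
    for word in words:
        phonemes.extend(LEXICON[word])
    return phonemes


def build_unit_database():
    """Stand-in for recorded speech units: one short sine tone per phoneme."""
    all_phonemes = sorted({p for entry in LEXICON.values() for p in entry})
    database = {}
    for index, phoneme in enumerate(all_phonemes):
        freq = 200 + 40 * index  # arbitrary pitch so units are distinguishable
        database[phoneme] = [
            math.sin(2 * math.pi * freq * t / SAMPLE_RATE)
            for t in range(UNIT_SAMPLES)
        ]
    return database


def synthesize(phonemes, database):
    """Stage 3: concatenate the stored unit for each phoneme into one waveform."""
    samples = []
    for phoneme in phonemes:
        samples.extend(database[phoneme])
    return samples


if __name__ == "__main__":
    units = build_unit_database()
    for sentence in ("I will read.", "I have read."):
        words = preprocess(sentence)
        phonemes = to_phonemes(words)
        audio = synthesize(phonemes, units)
        print(sentence, words, phonemes, len(audio))
```

A real concatenative system would store recorded human speech units rather than tones, and would smooth the joins between units; the sketch only shows how the stages hand data to one another.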

Most speech synthesis systems can read text aloud and produce highly intelligible output, though the voice can at times sound dull. Speech synthesis, however, has yet to develop the ability to fully imitate the wide spectrum of human intonations and cadences.