Pleasant voices for the S-Class
On November 23, 2009 by PeterH
Stuttgart – The latest generation of the LINGUATRONIC voice-operated control system is entering series production with the 2009 S-Class. Instead of saying the town and street separately as before, drivers can speak the desired destination as a single command, for example “Stuttgart, Epplestraße”. The system immediately begins to work out the route, only pausing to enquire whether a house number is to be entered as well. In Germany, LINGUATRONIC understands around 80,000 town names and 470,000 street names entered in this way. This new, particularly convenient destination input works in six languages and more than 15 European countries.
A dialogue with LINGUATRONIC is practically a person-to-person affair. Around a dozen female speakers and one male speaker lent their voices to the S-Class. They recorded the individual words, phrases, numerical sequences and names, which the system joins together almost instantly into easily understood information and instructions as the situation requires. The “voices of the S-Class” come from various European countries, where the women, and one man, work for radio and TV stations or dubbing studios.
Scientists spent more than two decades working on the development of a computer-based voice recognition system. In 1996 Mercedes-Benz was the first automobile brand to offer such a system in a car — though initially only to operate the onboard telephone. Voice-operated control has come on in leaps and bounds since then: the times when town and street names had to be spelled out are long gone. When controlling the telephone, audio and navigation system, the latest version of LINGUATRONIC, which Mercedes-Benz offers in various model series, works on the principle of whole-word input.
In the case of the S-Class, Mercedes engineers use the term “one-shot input” to describe the currently most advanced development stage of the system, where the town and street names can be spoken as a direct sequence.
This new procedure currently works for the following languages and countries:
• German: Germany, Austria and Switzerland
• English: Great Britain, Gibraltar and Ireland
• Spanish: Spain
• French: France, Monaco, Belgium and Switzerland
• Italian: Italy, San Marino, Vatican City and Switzerland
• Dutch: Netherlands and Belgium
Voice-operated control is not just about understanding the driver’s wishes, but also about entering a dialogue with him. The system responds in a friendly voice if it has failed to understand something, for example, or if it wants the driver to confirm certain operating commands. While it would be perfectly possible to generate these voices synthetically — i.e. by computer — Mercedes-Benz holds a low opinion of such “lifeless” announcements, preferring a person-to-person dialogue for its voice-operated control system.
Mercedes-Benz and its system partners have contracted professional female speakers and one male speaker who lend their voices to the voice-operated control and navigation systems of Mercedes models. For each language, it takes three days to record the words, phrases, numerical sequences and names written on around 100 manuscript pages as the basis for the route guidance and voice operation dialogue.
The system joins thousands of individual recordings together for the dialogue
During the recording work in the studio, each of the well over 1000 “takes” is individually saved and encoded, so that the computer can rapidly access the relevant command as the situation requires, adding other information to it if necessary. It is therefore important for the speakers to use the same intonation throughout, so that the information sounds immediate and natural when the system formulates its responses from various acoustic fragments, e.g. telling the driver where to turn off, which lane to take and which road to choose.
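The idea of assembling a spoken instruction from individually stored takes can be sketched roughly as follows. This is only an illustration: the clip names, the file names and the `build_prompt` helper are hypothetical and are not taken from the Mercedes-Benz system.

```python
# Hypothetical clip library: each key stands for one recorded "take",
# each value for the audio file that holds it (names are assumptions).
CLIPS = {
    "in": "in.wav",
    "300": "300.wav",
    "metres": "metres.wav",
    "turn_left": "turn_left.wav",
    "turn_right": "turn_right.wav",
}


def build_prompt(distance_m, direction):
    """Return the ordered list of clip files for one turn instruction."""
    keys = ["in", str(distance_m), "metres", f"turn_{direction}"]
    missing = [key for key in keys if key not in CLIPS]
    if missing:
        # Without a recording for every fragment the sentence cannot be joined.
        raise KeyError(f"no recording for: {missing}")
    return [CLIPS[key] for key in keys]


playlist = build_prompt(300, "left")
print(playlist)
```

Because every fragment can end up next to any other, a sketch like this only sounds natural if all clips were recorded with the same intonation, which is the point made above.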
The navigation system “speaks” more than a dozen languages
The specialists at the Mercedes development centre make a fundamental distinction between the voice-operated controls with which the car obeys its driver’s every word, so to speak, and the language information used for route guidance. The navigation system in the S-Class “speaks” more than a dozen languages, which are available in the different national versions of the unit. These include Danish, German, English, Spanish, French, Italian, Dutch, Portuguese, Turkish, Russian, US-English, Japanese and Chinese.
When it comes to interacting with drivers and giving them directions, Mercedes-Benz primarily uses female voices. The only exception is Turkey, where drivers prefer to receive directions from a male voice. The “voices of the S-Class” also work for radio stations, dub films, do voiceovers for advertising spots, read talking books and perform in theatres.
From spelled-out words to direct input
- Latest-generation LINGUATRONIC in the S-Class
- Whole-word destination input improved even further
- Voice recognition through analysis within milliseconds
To ensure that the LINGUATRONIC voice-operated control system obeys the driver’s every word, it was subjected to a highly involved learning process during its development. It was then tested in all the languages, and by Mercedes customers in all language regions.
It is, however, just as important for LINGUATRONIC to understand every driver as it is to understand every word. Every person has his or her own pronunciation, tone and individual speech cadences. To make the dialogue perfect, the Mercedes system offers an “after-training” function: a personal conversation with Ms Libbach or one of her colleagues, during which the driver can individually adapt the voice recognition to the sound of his or her voice and intonation.
Around ten years ago, drivers were only able to operate the onboard telephone with voice commands. Since 2000 LINGUATRONIC has been capable of more, and now controls the car radio and CD-changer as well. Since 2002 the Mercedes-Benz navigation system has also been optionally controllable by the voice recognition system. The first-generation system only required a processor with a memory capacity of 512 kilobytes, but more than ten megabytes are necessary nowadays.
For a long time drivers were obliged to enter the destination by spelling out the town and street names. This changed in 2002, when the E-Class and S-Class made it possible to input around 650 place names in Germany by whole-word voice command. Nowadays LINGUATRONIC not only understands all town and street names when destinations are entered, but also whole words when selecting a radio station or names from the personal telephone directory. The driver only needs to say the destination, whereupon the system searches its electronic memory for the relevant town and street. If there are several similar-sounding names, the display shows a selection.
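How a shortlist of similar names might be produced can be sketched as below. Note the assumption: this compares spellings with Python's standard `difflib` module, whereas the real system compares sounds; the town list and the `candidates` helper are purely illustrative.

```python
import difflib

# Tiny illustrative town list (an assumption, not the real vocabulary).
TOWNS = ["Stuttgart", "Stuttgart-Vaihingen", "Steinfurt", "Hamburg", "Homburg"]


def candidates(heard, names=TOWNS, limit=3, cutoff=0.8):
    """Return up to `limit` names whose spelling is close to `heard`."""
    lowered = {name.lower(): name for name in names}
    hits = difflib.get_close_matches(
        heard.lower(), list(lowered), n=limit, cutoff=cutoff
    )
    # Best match first; more than one entry means the display would
    # offer the driver a selection.
    return [lowered[hit] for hit in hits]


print(candidates("Hamburg"))
```

With this list, "Hamburg" yields both Hamburg and Homburg, so a selection would be shown, while an input with no close match yields an empty list.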
Destination input: the driver says the town and street names directly in sequence
In the current S-Class, which has been on the market since summer 2009, Mercedes engineers have improved the whole-word voice input function even further. They call this new development a “one-shot” function, and it makes voice-operated control even easier and faster. After speaking the command “Enter destination”, the driver says the desired destination as a single command — for example “Stuttgart, Epplestraße”. The system immediately begins to work out the route, only pausing to enquire whether a house number is to be entered as well. There is then a verbal acknowledgement: “Stuttgart, Epplestraße confirmed. Route guidance starting now.”
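The dialogue flow described here can be sketched in a few lines. The sketch assumes the recognised utterance is already available as text in the form "town, street"; the function names `parse_one_shot` and `confirm` are hypothetical, and only the confirmation wording is taken from the example above.

```python
def parse_one_shot(utterance):
    """Split a one-shot destination such as 'Stuttgart, Epplestraße'."""
    town, separator, street = utterance.partition(",")
    town, street = town.strip(), street.strip()
    if not separator or not town or not street:
        raise ValueError("expected '<town>, <street>'")
    return town, street


def confirm(town, street, house_number=None):
    """Build the verbal acknowledgement, with an optional house number."""
    destination = f"{town}, {street}"
    if house_number is not None:
        destination += f" {house_number}"
    return f"{destination} confirmed. Route guidance starting now."


town, street = parse_one_shot("Stuttgart, Epplestraße")
print(confirm(town, street))
```

The house number is modelled as an optional argument because the system only asks for it after the town and street have been understood.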
The largest active vocabulary is to be found in the LINGUATRONIC system of Mercedes models in the US state of California, where whole-word input of around 220,000 street names is possible. In Germany around 80,000 towns and more than 470,000 street names can be input by voice command.
LINGUATRONIC is a major Mercedes-Benz contribution to road safety, as drivers no longer need to take their hands off the wheel to operate the car phone or audio equipment. They are therefore better able to concentrate on the traffic situation.
Mercedes-Benz also uses speech synthesis technology to read out important traffic information affecting the route, or SMS messages.
Voice recognition: LINGUATRONIC “listens” for phonemes
During the brief dialogue between the driver and LINGUATRONIC, the sound signal is digitised, transformed into the frequency domain and finally analysed. Within milliseconds, the computer extracts various characteristics from the speech signal in order to recognise what are known as “phonemes”. To linguists these are the smallest sound components of a language, and they are decisive for understanding the words. The control system is able to recognise words by combining the phonemes and comparing the result with the contents of a phoneme dictionary stored in memory. Each language has its own typical phonemes; LINGUATRONIC uses around 40 for the German language.
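The step of turning a digitised signal into frequency information can be illustrated with a plain discrete Fourier transform. This is a generic textbook sketch, not the analysis LINGUATRONIC actually performs; the sample rate, frame length and test tone are assumptions chosen for the example.

```python
import cmath
import math


def dft_magnitudes(samples):
    """Magnitude of each DFT bin from 0 up to (not including) Nyquist."""
    n = len(samples)
    return [
        abs(sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                for t, x in enumerate(samples)))
        for k in range(n // 2)
    ]


def dominant_frequency(samples, sample_rate):
    """Frequency in Hz of the strongest bin in one frame of samples."""
    magnitudes = dft_magnitudes(samples)
    peak_bin = max(range(len(magnitudes)), key=magnitudes.__getitem__)
    return peak_bin * sample_rate / len(samples)


SAMPLE_RATE = 8000  # samples per second (assumed)
FRAME = 64          # samples per analysis frame (assumed)
tone = [math.sin(2 * math.pi * 1000 * t / SAMPLE_RATE) for t in range(FRAME)]
print(dominant_frequency(tone, SAMPLE_RATE))
```

A real recogniser would derive many such characteristics per frame rather than a single peak, but the principle of moving from samples over time to energy per frequency is the same.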
LINGUATRONIC processes the phonemes as digital codes. The electronics instantly check each sound, join the different phonemes together and also verify the acoustic probability of the word.
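Looking up a sequence of recognised phonemes in a dictionary and checking how plausible the match is might look like the sketch below. The phoneme symbols, the three dictionary entries, the position-by-position score and the 0.7 threshold are all illustrative assumptions; a production recogniser would use a statistical acoustic model instead.

```python
# Hypothetical phoneme dictionary: phoneme sequence -> word.
PHONEME_DICTIONARY = {
    ("r", "a", "d", "i", "o"): "Radio",
    ("t", "e", "l", "e", "f", "o", "n"): "Telefon",
    ("n", "a", "v", "i"): "Navi",
}


def score(heard, entry):
    """Fraction of positions that agree, penalising length mismatch."""
    matches = sum(1 for h, e in zip(heard, entry) if h == e)
    return matches / max(len(heard), len(entry))


def recognise(heard, threshold=0.7):
    """Return (word, score); word is None if no entry is plausible enough."""
    best = max(PHONEME_DICTIONARY, key=lambda entry: score(heard, entry))
    best_score = score(heard, best)
    if best_score < threshold:
        return None, best_score
    return PHONEME_DICTIONARY[best], best_score


print(recognise(("r", "a", "t", "i", "o")))
```

Here a single misheard phoneme still resolves to "Radio" with a score of 0.8, while an input that matches nothing falls below the threshold and is rejected, which is when a system would ask the driver to repeat the command.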
So that even fine nuances in pronunciation are recognised reliably, Mercedes engineers have added a special background noise suppression feature. This enables voice commands to be recognised reliably even at higher speeds. Up to a certain speed, this means that LINGUATRONIC even works when the roof of a cabriolet or roadster model is open.