DCS860A – Fall 2001
Mary L. Manfredi
This paper focuses on the speech products available on the market
today. I review products from the following companies: Lernout &
Hauspie (L&H), Nuance, Voice Pilot, Philips, SpeechWorks, AT&T and IBM.
Speech products fall into two broad categories. Some products can be
purchased and used by a customer directly. Others are used in support of
a complete speech solution: integrators and developers build them into
the speech solutions they provide for businesses. Examples of such
products are speech recognition engines and text-to-speech engines.
SpeechWorks Products 
SpeechWorks is one of the prominent companies today in the arena of speech
technologies. Its products are available in several languages including French,
Spanish, and German.
- OpenSpeech Recognizer 1.0 - This product is a speech
recognition engine optimized for VoiceXML. It uses underlying
technology created by AT&T.
- SpeechWorks 6.5 Second Edition - is a comprehensive software
product for building network-based speech recognition services.
It supports multiple languages including French (Continental
and Canadian), Spanish (Latin American, Anglo and Castilian),
Cantonese (Hong Kong), Mandarin (Taiwan), Dutch, German,
English (US, UK, Australia, Asian and South African), Korean,
Japanese, Portuguese (Brazilian) and Italian.
- Speech2Go – Speech2Go is an advanced speech recognition
engine designed for embedded applications in mobile devices,
handsets and automotive systems. It has the ability to
recognize new names and phrases using its grammar update
tool. This eliminates the need for the user to teach it. It
supports several languages including US, UK and Australian
English, German, French and Italian. Its footprint is relatively small,
between 2 and 8 MB depending on the vocabulary size, and the
vocabulary is limited by available hardware resources (memory
and CPU speed). It is available for the Windows CE operating system.
- Speechify - Speechify™ from SpeechWorks is a next-generation
text-to-speech (TTS) engine that produces more natural-sounding
synthesized speech than earlier engines. It uses AT&T natural
language processing technology to accomplish this.
- ETI – Eloquence – is a text-to-speech system that can be used
with any TTS application. Its low memory requirements make it
well suited to embedded devices such as in-car navigation systems,
next generation mobile phones and hand-held devices. Because
it supports many different languages, it is also appealing
to international companies.
VoiceXML and Standards
VoiceXML is the emerging standard for speech services. It is a markup
language used to describe an interaction between a caller on a
telephone and a server. VoiceXML browsers provide an interface between
a caller on a standard telephone and an application running on the web.
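To make the idea of "describing an interaction" concrete, here is a minimal sketch in Python that generates a one-question VoiceXML dialog. The element names (vxml, form, field, prompt, grammar, filled) are standard VoiceXML, but the function name, the field name and the wording of the prompts are assumptions made up for illustration. They do not come from any of the products reviewed here.

```python
import xml.etree.ElementTree as ET


def build_dialog(field_name, question, choices):
    """Build a one-question VoiceXML dialog and return it as an XML string.

    Hypothetical helper for illustration only.
    """
    vxml = ET.Element("vxml", version="1.0")
    form = ET.SubElement(vxml, "form", id="main")
    field = ET.SubElement(form, "field", name=field_name)
    # What the voice browser speaks to the caller.
    ET.SubElement(field, "prompt").text = question
    # What the recognizer should listen for (simple inline alternatives).
    ET.SubElement(field, "grammar").text = " | ".join(choices)
    # What happens once the caller's answer has been recognized.
    filled = ET.SubElement(field, "filled")
    ET.SubElement(filled, "prompt").text = "Thank you."
    return ET.tostring(vxml, encoding="unicode")


if __name__ == "__main__":
    print(build_dialog("drink", "Would you like coffee or tea?", ["coffee", "tea"]))
```

A voice browser would fetch a document like this from a web server, speak the prompt over the telephone, and pass the caller's recognized answer back to the application.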
- Open Vxi – is a VoiceXML interpreter which allows developers to
add VoiceXML capabilities to their products without developing
that technology themselves.
- SpeechSecure™ - This product uses biometric technology to
verify a caller's identity based on the characteristics of his or her
unique vocal patterns. SpeechSecure provides an extremely
tight level of security for callers who access personal or
sensitive information over the telephone. Commercial
applications with high security requirements, such as
financial services and other self-service sites that access personal
information, will be able to use this service.
Lernout & Hauspie Products 
- L&H Voice Xpress Professional (Version 5) – This voice
recognition package supports dictation into most Windows-based
applications. It also works with Windows 2000 and is one of
the leading packages that works with Microsoft Office. A new
feature reads back your text in a very human sounding voice.
Also available are extra plug-in vocabularies for specific fields
such as Business and Finance, Technology, Leisure and In The
- PowerTranslator Pro Version 7 – This product translates English
to Spanish, German, French, Italian, Portuguese and Japanese.
It can be integrated into the MS Office 2000 products,
preserving the document formatting. This is especially
valuable to a user because the formatting is often what
takes the most time to do. Text, original and translated, can be
displayed side by side.
- Dragon NaturallySpeaking Professional 6.0 – This speech
recognition software is geared for corporate and professional
use. It lets you dictate memos, reports and other
documents, enter data, fill in forms, send e-mail, and work on
the web, all by voice. This software handles multiple specialty
vocabularies and lets you create custom commands to automate
tasks. It is available in American and British English, French,
Italian and Spanish.
- Dragon NaturallySpeaking Legal Suite 6.0 – This suite contains
specialized terminology used in the legal profession. It also
includes Corel WordPerfect Suite 8.
- Dragon NaturallySpeaking Preferred – This edition offers Text-
to-Speech and Dictation Playback which assists with editing.
NaturallyMobile has support for hand-held recording devices. It
is available in American and British English, French, Italian and
- Dragon NaturallySpeaking Medical Suite 6.0 – The distinctive
feature of this edition is the specialized terminology used in the
- Dragon NaturallySpeaking Mobile Recorder Option Kit – contains
the hand-held digital recorder which holds up to 40 minutes of
recorded speech, equating to about 10 pages.
- Dragon NaturallySpeaking Mobile – This is a package containing
Dragon NaturallySpeaking Preferred software, the hand-held
Dragon NaturallyMobile digital recorder and a headset
- DragonDictate – This product allows the operation of the PC to
be totally hands free. One is able to create, edit, format and
move text by voice in most Windows applications including MS
Word, Corel WordPerfect, MS Excel, Netscape Navigator and MS
IE. Activation of menus and dialog boxes is done by using the
words on the screen. It is available in American and British
English, French, Italian and Spanish.
Voice Pilot Products 
- Hear-Say – This product allows you to make your own voice
files and compresses them so they can be easily sent over the
internet. The voice file can be inserted into standard word
processing programs like MS Word and WordPerfect or even
into a cell of a standard spreadsheet program.
- Hear-Look – A product that lets you send pictures along with
your voice file. It also compresses the files well so that they
take less time to send.
- PAL – Allows the control of a computer by voice. It works with
most popular speech engines including IBM, Dragon and L&H.
You can use PAL to keep your appointment book, your to-do list
and make notes on your contacts without touching the
keyboard. You can also synchronize your Palmtop and other
computing tools by voice.
Nuance Products 
Nuance is a speech technology company that offers a suite of voice software
products in the categories of server software, voice browser solutions,
application enablers, and developer tools.
- Nuance 7.0 - is core speech recognition software for voice-
driven applications over the telephone. Some of the features
include wireless and hands free support, dynamic language
detection™ for multi-lingual systems, hot swappable grammars,
and enhanced barge-in.
- Nuance Verifier 3.0 - Nuance’s voice authentication technology,
which uses callers' voiceprints to deliver high security and secure
access at a low cost without the use of passwords and PINs.
Voice Browser Solutions
- Nuance Voyager – is a voice browser that enables a user to surf
the web over the phone. It also takes advantage of Nuance's
integrated speech recognition and voice authentication, allowing
personal information to remain secure.
- Nuance SpeechObjects – SpeechObjects are reusable software
components. Developers use SpeechObjects to build speech
recognition and voice authentication applications.
- Nuance V-Builder – This is a graphical tool which enables
developers to create voice applications.
- Nuance Foundation SpeechObjects – are 25 pre-built speech
- Nuance Grammar Builder – is also a graphical tool which
enables developers to create, view, edit, manage
- Nuance V-Optimizer – a tool for analyzing and tuning deployed
AT&T Products 
The AT&T TTS engine comes in three editions: Server, Server-Lite and Desktop. The
Server edition is targeted at large businesses serving the needs of many users
across an enterprise network. It includes a female and a male U.S. English voice
and supports the creation of unique customized voices. The development
platforms it supports include Linux, Solaris, and Windows XP, NT and 2000. The
Server-Lite configuration is geared towards small businesses, and the Desktop
edition targets individual end-users who want to add TTS capabilities to their
own desktop applications.
Customized Voice Products
The AT&T Labs Natural Voices customized voice products give those who
have the AT&T TTS engine the ability to create made-to-order voices. Two
packages are available for this: AT&T Labs’ Natural Voices fonts and AT&T Labs’
Natural Voices icons. The fonts give businesses a library of voices to use when
adding TTS capabilities to an application. The icons include custom-developed
TTS voices. The voices are developed closely with the customer in one of two
ways. The customer can supply its own voice talent, which AT&T Labs records
and then uses to produce the synthesized voice. Alternatively,
the customer specifies the characteristics of the voice and AT&T Labs finds a
voice talent to match the request.
Philips Products 
- SpeechMagic – is a client/server speech recognition software
package used by developers to create applications.
- Speech SDK – a professional software development kit used to
speech-enable software applications.
- SpeechMike Family – A set of devices that are a combination of
speaker, microphone and mouse.
- Digital Dictation Solution – this solution contains several
different models of digital recorders.
- SpeechPearl – is a product family whose components
are suitable for a variety of telephone applications such as
directory assistance, information and customer service, banking
applications and name dialing.
- SpeechMania – is a natural speech recognition and language
understanding software platform to automate telephone-based
information and transaction services.
- SpeechWave – is a speech recognition solution that integrates
six technologies under a common API (application programming
interface). The technologies include discrete digits, continuous
digit strings, alphanumeric strings, phonetic vocabularies,
speaker-dependent recognition and speaker verification.
Voice Control is the use of embedded speech technologies.
Applications for these technologies include navigation systems, telematic
applications, car features, car equipment, mobile cellular phones,
handheld devices, television, audio and others.
IBM Products 
IBM has a host of voice products broken down into two categories. The
categories are Home and Small Business solutions and Enterprise solutions.
Below is a listing of these solutions.
Home and Small Business Solutions
The ViaVoice family of products for home and small business use provides
the speech recognition software a customer needs for dictation,
internet and command and control features. The ViaVoice vocabularies extend
the base vocabulary with specialized ones, such as medical and legal.
- ViaVoice Pro
- ViaVoice Advanced
- ViaVoice for Mac OS X
- ViaVoice for Mac - Millennium
- ViaVoice for Mac – enhanced
- ViaVoice Millennium Pro
- ViaVoice Vocabularies (Mac)
- ViaVoice Vocabularies (Win)
Enterprise Solutions
As environments become more mobile, conventional interfaces are
becoming less usable. Voice technology will become the primary user interface
for accessing information and conducting transactions in the new environment.
IBM provides middleware and component parts for companies to build their own
voice solutions as well as all-inclusive voice solution packages.
- IBM WebSphere Voice Server – this software encompasses both
a speech recognition and a text-to-speech engine. It enables
developers to develop and deploy voice-enabled e-business
- WebSphere Voice Response – This is a solution that will allow
businesses to answer and screen a large number of calls
- WebSphere Voice Toolkit
- IBM Message Center
- WebSphere Translation Server – A useful tool for companies
dealing internationally. It enables the translation of web pages
into different languages without the need to recreate them
- Mobility Suite – It enables PDA functions to respond to voice
- Mobile Device Edition
- Dictation for Linux
- ViaVoice Text-to-Speech – This gives text-to-speech abilities to mobile
devices such as PDAs and SmartPhones, and to automobiles.
Many of the products mentioned above are used in developing voice biometrics
applications. The best-known commercialized forms of voice biometrics are
speaker verification and speaker identification. Of the two, speaker
identification is the more difficult, because the voice sample taken from
the user must be compared against every voice available in the database.
Speaker verification, on the other hand, takes the user’s voice sample together
with the identity the user claims. The sample is then compared with the stored
sample for that identity to see if they match. The use of voice biometrics is
growing, and it appears that it will continue to be used as a means of
identification for sensitive applications.
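The difference between the two tasks can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the voiceprints are toy feature vectors, the comparison is a plain cosine similarity, and the threshold, speaker names and function names are all hypothetical. Commercial products derive voiceprints from acoustic models and use far more elaborate scoring.

```python
import math


def similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def verify(sample, claimed_id, voiceprints, threshold=0.9):
    """1:1 check: compare the sample only with the claimed speaker's voiceprint."""
    enrolled = voiceprints.get(claimed_id)
    return enrolled is not None and similarity(sample, enrolled) >= threshold


def identify(sample, voiceprints, threshold=0.9):
    """1:N search: compare the sample with every enrolled voiceprint."""
    best_id, best_score = None, threshold
    for speaker_id, enrolled in voiceprints.items():
        score = similarity(sample, enrolled)
        if score >= best_score:
            best_id, best_score = speaker_id, score
    return best_id  # None if no enrolled voice is close enough


if __name__ == "__main__":
    # Toy enrolled voiceprints; real ones come from acoustic analysis.
    voiceprints = {"alice": [0.9, 0.1, 0.3], "bob": [0.2, 0.8, 0.5]}
    sample = [0.88, 0.12, 0.31]
    print(verify(sample, "alice", voiceprints))
    print(verify(sample, "bob", voiceprints))
    print(identify(sample, voiceprints))
```

Verification performs a single comparison however many speakers are enrolled, while identification performs one comparison per enrolled speaker, which is one reason it is the harder problem.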
A 1994 article entitled “Survey of Current Speech Technology”
concluded that the greatest potential lies in the development of systems that
combine recognition and synthesis to support conversational interaction between
humans and computers in complex task domains. Looking at the variety of
products that exist now, we can see that many speech applications
today perform rather complex interactions, with more coming in the near future.
Rudnicky, A., Hauptmann, A., Lee, K., “Survey of Current Speech
Technology”, Communications of the ACM, March 1994, Vol. 37, No. 3.
Markowitz, J., “Voice Biometrics”, Communications of the ACM, September
2000, Vol. 43, No. 9.