
Guest Lecture Speech Processing


CS 378 Natural Language Processing

Speech Processing: Present, Past and Future
Inge M. R. De Bleecker Department of Linguistics
October 14, 2003


Main Types of NLP Applications
• Text processing: information retrieval and search engines, information extraction, text summarization, machine translation, question-answering…


• Speech processing: speech recognition (ASR) over-the-telephone (OTT) and dictation systems (desktop), speaker verification, text-to-speech (TTS)

• Speech industry: history (late 80's to present)
• Practical overview of current applications and their future directions:
– Speech recognition accuracy
– Text-to-speech accuracy
– Usability and design
– Application building tools
• Working in the speech industry


History: Late Eighties
Sentiment: OTT ASR finally ready for commercial applications…


Technology: OTT speaker-independent discrete digits/yes-no apps. Word-based language models. TTS mostly used for numbers, if used at all. Pre-recorded strings much more common.
Applications: simple in structure and functionality
OTT: banking, e.g. ask for account balance. Desktop: first dictation systems, medical applications.

Companies: few. Small research-oriented companies, or research arms of big companies. E.g. Dragon Systems, VPC, VCS, Kurzweil, BBN, AT&T, …

History: Early Nineties
Sentiment: credibility and usability of apps grow. Multilingual developments.
Technology: OTT speaker-independent (SI) continuous digits/yes-no/command word apps. Move to phoneme-based language models.


OTT: still simple, system-directed dialog (vs. user-directed, mixed-initiative).
Desktop: more dictation systems, command and control systems (user-directed).

Companies: more companies pop up. Most grow out of research communities.

History: Mid to Late Nineties
Technology: maturing of technologies used.
Companies:
– overall growth
– dirty politics (L & H)
– mergers and buyouts start (still ongoing today)


History: Late Nineties to Present
Technology: maturing of technology continues
– better recognition accuracy
– unrestricted ASR input (natural speech)
– move to more sophisticated dialog systems (see next slide)
– tool standardization


Applications: Wider use of apps. More attention to usability, dialog design, etc…

Dialog System Architecture






[Figure: dialog system architecture diagram. Only the component label "Output Generation" is recoverable.]

Speech Recognition Accuracy


Present: reasonable accuracy on natural speech. Most systems still use a grammar to help the recognizer. Grammars are written in VoiceXML or a vendor-specific language and are not very sophisticated from a linguistics point of view. Some systems are (theoretically) purely statistical, e.g. Nuance's Accuroute.
Future: need to add more linguistic principles to current statistical methods. Make signal processing more robust, encourage reusability.
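To illustrate how a grammar can help a recognizer, here is a minimal sketch in Python. It assumes the recognizer returns an n-best list of (text, score) hypotheses and picks the best-scoring one that the grammar accepts. The grammar rules, intent names, and the function `pick_in_grammar` are hypothetical and are not taken from VoiceXML or any vendor toolkit.

```python
import re

# Hypothetical grammar for a small banking app: intent name -> accepted pattern.
GRAMMAR = {
    "balance": re.compile(r"^(what is |what's )?my (account )?balance( please)?$"),
    "transfer": re.compile(r"^transfer (?P<amount>\d+) dollars$"),
}


def pick_in_grammar(n_best):
    """Return (intent, text) for the best-scoring hypothesis the grammar accepts.

    n_best is a list of (text, score) pairs, as a recognizer might return
    (an assumption for this sketch). Returns None if nothing is in-grammar.
    """
    for text, _score in sorted(n_best, key=lambda h: h[1], reverse=True):
        candidate = text.lower().strip()
        for intent, pattern in GRAMMAR.items():
            if pattern.match(candidate):
                return intent, text
    return None
```

In this sketch the grammar overrides a higher-scoring but out-of-grammar hypothesis, which is one simple way a grammar can compensate for acoustic errors.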

TTS Accuracy
Present: getting better all the time. During the last few years, additional research in prosody and intonation has paid off: more natural-sounding speech. Also deals with abbreviations, etc. Current TTS can be used to patch up 'real' speech. E.g. AT&T, Scansoft (Speechworks).


Future: probably never a complete substitute for pre-recorded strings.
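The handling of abbreviations mentioned above is usually a text normalization step that runs before synthesis. The Python sketch below shows the idea under simple assumptions: the expansion table, the digit-by-digit reading of numbers, and the function `normalize` are all hypothetical, not the method of any particular TTS engine.

```python
import re

# Hypothetical expansion table; a real system would use a much larger lexicon.
ABBREVIATIONS = {"dr.": "doctor", "st.": "street", "acct.": "account"}
DIGITS = ["zero", "one", "two", "three", "four",
          "five", "six", "seven", "eight", "nine"]


def normalize(text):
    """Expand known abbreviations and read digit strings one digit at a time."""
    words = []
    for token in text.split():
        lower = token.lower()
        if lower in ABBREVIATIONS:
            words.append(ABBREVIATIONS[lower])
        elif re.fullmatch(r"[0-9]+", token):
            words.extend(DIGITS[int(d)] for d in token)
        else:
            words.append(lower)
    return " ".join(words)
```

A table lookup like this cannot resolve ambiguous abbreviations (for example "St." as "street" or "saint"); choosing between readings needs context, which is part of why normalization is harder than it looks.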

Usability – Dialog Design
Present: dialog design (VUI) is becoming more sophisticated through
– use of natural speech input
– mixed-initiative dialogs (more complicated for novice users)
– chatty applications which provide graceful ways of dealing with low accuracy (confirmations, error handling, fall-back to system-directed dialog, …)
– use of persona: e.g. Bell Canada's Emily

Future: continued improvements in dialog design are necessary (e.g. usability studies). Dialog design is easier with current (and future) tools, but… still an art! It is (too) easy to design bad speech applications…
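One common way to deal with low recognition accuracy in a dialog, including the fall-back to system-directed dialog listed above, is to branch on the recognizer's confidence score. The Python sketch below is only an illustration: the threshold values, the action names, and the function `next_action` are assumptions, and real applications tune such values per prompt.

```python
# Assumed values for illustration; in practice these are tuned per application.
ACCEPT_THRESHOLD = 0.80
CONFIRM_THRESHOLD = 0.45
MAX_ERRORS = 2


def next_action(confidence, error_count):
    """Decide how the dialog responds to one recognition result.

    Returns (action, new_error_count), where action is one of
    "accept", "confirm", "reprompt", or "fallback_system_directed".
    """
    if confidence >= ACCEPT_THRESHOLD:
        # High confidence: use the result and reset the error counter.
        return "accept", 0
    if confidence >= CONFIRM_THRESHOLD:
        # Medium confidence: ask the caller to confirm ("Did you say ...?").
        return "confirm", error_count
    # Low confidence counts as an error.
    error_count += 1
    if error_count >= MAX_ERRORS:
        # Too many errors in a row: switch to narrow, system-directed prompts.
        return "fallback_system_directed", error_count
    return "reprompt", error_count
```

For example, two low-confidence results in a row move the caller from an open prompt to a step-by-step, system-directed dialog instead of repeating the same question.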

Usability – Other Issues
– Natural language generation (NLG) is not receiving much attention
– Reasoning components very limited

– NLG needs to adapt to user, conform more to human speech patterns
– multimodal applications
– multilingual systems
– use of e.g. ontologies in reasoning components, …

Application Building Tools


– Standardization: VoiceXML and VoiceXML platforms (alternative: SALT)
– Many platform companies: VoiceGenie, Bevocal, Audium, …
– Also companies developing tools for platforms: Aptera

– World of VoiceXML: comprehensive site on all things VoiceXML
– Free developer's resources: e.g. Bevocal
– Small companies: can have a VoiceXML app hosted by a platform company
– Big companies: in-house platforms (telco-industry grade equipment), quite costly

Future: development of better tools that make it harder to build bad applications!

Speech Apps State-of-the-Art
Conclusion: ASR and TTS are usable in real-world applications right now.


To develop better applications, we need to improve accuracy, usability, etc or… think about some radically different approaches to the current problems! (=> the “age-old” argument)

Working in the Speech Industry


Working for:
– A speech recognition/text-to-speech company: a CS undergraduate can work on software development of tools, deployments. With the addition of some linguistics classes: dialog designer, QA of deployments, …
– A VoiceXML platform company: general software development, …
– A tools company: general software development, …
– A consulting (services) company: dialog design, deployments.
Or… get a Ph.D. in EE and become a speech scientist who develops the next-generation speech recognizer…
