					SPEECH TECHNOLOGY
     An Overview



                Gopala Krishna. A
        (gopalakrishna@students)
SPEECH TECHNOLOGY
 WHAT IS SPEECH TECHNOLOGY
 ABOUT ??

 SPEECH TECHNOLOGY IS ABOUT
 PROCESSING HUMAN SPEECH
      as a SIGNAL
        as a form of LANGUAGE
SPEECH TECHNOLOGY
 Speech Processing By Machine




                    Algorithms: Speech
                    Recognition, synthesis,
                    coding etc.
SPEECH TECHNOLOGY
   WHAT IS INVOLVED IN
   PROCESSING SPEECH ??

   MULTI-DISCIPLINARY FIELD
         •Linguistics
         •Physiology
         •Psychology
         •Signal Processing
         •Acoustics (Physics)
         •Statistics
              Pattern Recognition
         •Communication Theory
         •Computer Science: A.I.
              Heuristics / Machine Learning
             Speech Technology
Applications:
Man-Machine
   •   Smart Talking Machines, devices
   •   Speech enabled web interface

Communication
   •   Speech Coding
   •   Speech Enhancement

Biometrics
   •   Speaker Identification – security applications

Entertainment Technology
   •   Singing Voices
   •   Voice Conversion
   •   Artificial Characters / Avatar
Research Areas:
• Speech Recognition (Speech To Text)
• Speech Synthesis    (Text to Speech)
• Speech Coding       (Compression of speech)
• Speech Enhancement (Voice quality)
• Speaker Recognition (Identity of the speaker)
• Spoken Language Identification (Which language?)
• Language Models      (Modeling of natural text)
• Multimedia (Integration of Audio & Visual modes)
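
To make one of these areas concrete, here is a minimal sketch of a language model, assuming a bigram model with add-one (Laplace) smoothing trained on a toy corpus. The function names, the `<s>` / `</s>` boundary markers and the example sentences are illustrative assumptions, not part of the group's systems; a real ASR language model would use a far larger corpus and a better smoothing scheme.

```python
from collections import Counter


def train_bigram(sentences):
    """Collect context counts, bigram counts and the vocabulary.

    Each sentence is lowercased, split on whitespace and wrapped in the
    hypothetical boundary markers <s> and </s>.
    """
    contexts, bigrams, vocab = Counter(), Counter(), set()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        contexts.update(tokens[:-1])                  # every token that has a successor
        bigrams.update(zip(tokens[:-1], tokens[1:]))  # (previous, next) pairs
        vocab.update(tokens[1:])                      # every token that can be predicted
    return contexts, bigrams, vocab


def bigram_prob(prev, word, contexts, bigrams, vocab):
    """Add-one smoothed estimate of P(word | prev)."""
    return (bigrams[(prev, word)] + 1) / (contexts[prev] + len(vocab))


corpus = ["speech is a signal", "speech is a language"]
contexts, bigrams, vocab = train_bigram(corpus)
p_is = bigram_prob("speech", "is", contexts, bigrams, vocab)
p_signal = bigram_prob("a", "signal", contexts, bigrams, vocab)
```

With this toy corpus, "is" follows "speech" both times, so the model rates it more likely than "signal" after "a", which occurs once out of two. The smoothing keeps unseen word pairs from getting zero probability.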
SPEECH TECHNOLOGY
 WHAT ARE WE DOING CURRENTLY ??

  1. INDIAN LANGUAGE SPEECH SYNTHESIS

    - Hindi and Telugu TTS building
    - Prosody

  2. SPEECH RECOGNITION

      - Large Vocabulary ASR
      - Landmark-based ASR

  3. SPEECH-ENABLED INTERFACES
Text-To-Speech Synthesis (TTS)
 Indian Language TTS Effort
       Hindi, Telugu, Tamil, Kannada, Bengali



 Text Normalization
       Machine learning Techniques



 Speech Segmentation
       Ergodic HMMs, SVMs, etc.
Automatic Speech Recognition
 Large Vocabulary ASR
       “Mimic”ing the Sphinx



 Alternative ASR Techniques
       HMM-ANN Hybrid
       Dynamic Bayesian Networks (DBNs)



 Landmark-based Segmentation
Speech Enabled Interfaces
 Screen Readers
       RAVI (Reading Aid for the Visually Impaired)



 Porting to Low Memory devices
       Talking Tourist Aid Agent (PDA)



 Speech-to-Speech Devices
       Limited Domain Bi-lingual translation
Projects

• Hindi TTS (Sponsored by NOKIA)
• Telugu TTS (Sponsored by Bhrigus Inc.)
• Speech Recognition Systems for Indian Languages (Sponsored by HP
  Labs India)
• Reading Software for the Blind (Sponsored by Ministry of Social
  Justice)
Stream Courses
  Speech Technology: A Practical Introduction
  Topics in Speech Processing
     Building TTS and ASR Systems
  Signal Processing
 Language Modeling
       - Intro. to NLP

       - Language and Statistics

  Machine Learning, Pattern Recognition
SPEECH TECHNOLOGY
 WHO WILL YOU BE WORKING WITH ??
• S. P. Kishore (Ph.D. @ CMU, Scientist @ IIIT)

• Prof. Rajeev Sangal

• Dr. Vasudeva Varma

• Faculty Members of Speech Group at CMU
   • Dr. Alan Black (TTS), Dr. Jim Baker (ASR)…
• Fellow researchers
What you get at the end
Career Opportunities – Of late, many companies and R&D
organizations have been investing in Indian languages, and
specifically in speech systems
 - Microsoft Research (Bangalore), HP Labs India, Yahoo India

Research skills – publications

Interaction and collaboration with faculty members of Speech
Group at Carnegie Mellon

System building skills - You will have developed speech systems
using state-of-the-art techniques

Opportunities for higher education in India and abroad
SPEECH TECHNOLOGY



          QUESTIONS

     [ skishore@cs.cmu.edu ]

				