     Implementation of Speech Recognition System for Bangla
     Supervisor: Prof Mumit Khan

          Conducted by:
     Shammur Absar Chowdhury
            07110071
08/11/10
To those who dedicated their lives
    to the Bangla language in 1952




            Text Preparation
• Domain Chosen
   -Product Price Query System
• Item Selection
• Text/Sentence Preparation
• Example




               Speech Corpus
• Recording

• Using 8 speakers
  → Male – 5
  → Female – 3

• Scripts are provided to the speakers


           Speaker Profile includes
Name
 Age
Gender
 Language dialect




Other Factors Considered

 Environmental condition of recording

 Technical details of device

 Date and time of recording
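
One way to keep the speaker profile and the other recording factors together is a small metadata record per recording. The sketch below is only an illustration: the class names, field names, and example values are hypothetical and are not taken from the project.

```python
from dataclasses import dataclass, asdict


@dataclass
class SpeakerProfile:
    """Speaker details listed on the slide (field names are assumptions)."""
    name: str
    age: int
    gender: str
    dialect: str


@dataclass
class RecordingSession:
    """One recording together with the other factors considered."""
    speaker: SpeakerProfile
    environment: str      # environmental condition of the recording
    device_details: str   # technical details of the recording device
    recorded_at: str      # date and time of recording, e.g. ISO 8601 text


# Placeholder values only; no real speaker data is shown here.
session = RecordingSession(
    speaker=SpeakerProfile(name="speaker_01", age=0, gender="unspecified",
                           dialect="unspecified"),
    environment="placeholder environment",
    device_details="placeholder microphone",
    recorded_at="placeholder timestamp",
)
record = asdict(session)  # nested dict, ready to be saved as JSON or CSV
```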




  Parameters of the Audio Recordings
Sampling rate of the audio: 16 kHz

Bit depth (bits per sample): 16

Channel : mono (single channel)
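
These parameters can be checked automatically before training. The following is a minimal sketch using Python's standard `wave` module, assuming the recordings are stored as uncompressed WAV files; the function name `check_wav_format` is hypothetical.

```python
import wave

# Target format from the slide: 16 kHz, 16 bits per sample, mono.
EXPECTED_RATE_HZ = 16000
EXPECTED_SAMPLE_WIDTH_BYTES = 2  # 16 bits = 2 bytes
EXPECTED_CHANNELS = 1


def check_wav_format(path):
    """Return a list of mismatches between a WAV file and the target format."""
    problems = []
    with wave.open(path, "rb") as wav:
        if wav.getframerate() != EXPECTED_RATE_HZ:
            problems.append(
                f"sampling rate is {wav.getframerate()} Hz, "
                f"expected {EXPECTED_RATE_HZ} Hz")
        if wav.getsampwidth() != EXPECTED_SAMPLE_WIDTH_BYTES:
            problems.append(
                f"sample width is {8 * wav.getsampwidth()} bits, "
                f"expected {8 * EXPECTED_SAMPLE_WIDTH_BYTES} bits")
        if wav.getnchannels() != EXPECTED_CHANNELS:
            problems.append(
                f"file has {wav.getnchannels()} channels, "
                f"expected {EXPECTED_CHANNELS}")
    return problems
```

An empty list means the file matches the target format; otherwise each entry names one parameter that needs converting.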




           Other Required Files
• Pronunciation Dictionary

• Transcription File

• Language Model etc
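
As a rough illustration of how these files fit together: SphinxTrain-style transcription files put each utterance between `<s>` and `</s>` followed by the utterance ID in parentheses, and the pronunciation dictionary lists each word followed by its phones. The sketch below formats such lines and checks that every transcribed word has a dictionary entry. The helper names, words, phones, and utterance IDs are placeholders, not the project's actual data.

```python
def transcription_line(words, utterance_id):
    """Format one utterance as '<s> w1 w2 ... </s> (utterance_id)'."""
    return "<s> " + " ".join(words) + " </s> (" + utterance_id + ")"


def dictionary_line(word, phones):
    """Format one dictionary entry as 'WORD PH1 PH2 ...'."""
    return word + " " + " ".join(phones)


def missing_words(utterances, dictionary):
    """Return the transcribed words that have no dictionary entry, sorted."""
    used = {word for words in utterances.values() for word in words}
    return sorted(used - set(dictionary))


# Placeholder data: WORD_A etc. stand in for real Bangla words and phones.
utterances = {
    "utt_001": ["WORD_A", "WORD_B"],
    "utt_002": ["WORD_B", "WORD_C"],
}
dictionary = {
    "WORD_A": ["PH1", "PH2"],
    "WORD_B": ["PH3"],
}

transcript = [transcription_line(w, uid) for uid, w in utterances.items()]
entries = [dictionary_line(w, p) for w, p in dictionary.items()]
uncovered = missing_words(utterances, dictionary)
```

Here `uncovered` lists `WORD_C`, which would need a dictionary entry before training.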




           Figure 2: Overview of Training Process
           Acoustic model generation
• Uses the model created by SphinxTrain

• Packaged as a JAR file

• Imported into the project




• Using two decoders:

           PocketSphinx
           Sphinx4


• Done using audio inputs from test speakers and
  live tests from the microphone in different
  environments



             Environments
• Class Room
• Lab
• Department
• Open Space
• Closed Room
• Cafeteria
 Using different microphones, with trained and
     untrained speakers

     Performance Of PocketSphinx




           Figure 4: Performance Chart for PocketSphinx

           Average Accuracy




            90.65%
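
The slides do not state how accuracy was computed. A common choice in speech recognition is word accuracy, 100 × (1 − errors / reference words), where errors are the substitutions, deletions, and insertions found by a word-level edit distance. The sketch below assumes that definition; the function names are hypothetical.

```python
def word_edit_distance(reference, hypothesis):
    """Minimum number of word substitutions, deletions, and insertions."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_word in enumerate(reference, start=1):
        current = [i]
        for j, hyp_word in enumerate(hypothesis, start=1):
            cost = 0 if ref_word == hyp_word else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # match or substitution
        previous = current
    return previous[-1]


def word_accuracy(reference_text, hypothesis_text):
    """Word accuracy in percent for one reference/hypothesis pair."""
    reference = reference_text.split()
    hypothesis = hypothesis_text.split()
    if not reference:
        raise ValueError("reference transcript must not be empty")
    errors = word_edit_distance(reference, hypothesis)
    return 100.0 * (1.0 - errors / len(reference))
```

Under this definition many inserted words can push the value below zero, so some tools report the word error rate instead.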
           Performance Of Sphinx-4
                     Input Type: Audio Files




           Figure 5: Performance Chart for Sphinx 4 Audio

           Average Accuracy




            71.38%
           Performance Of Sphinx-4
           Input Type: Live from Microphone




           Figure 6: Performance Chart for Sphinx 4 Live

           Average Accuracy




            86.79%
                 Limitations
• Built using a small data set
• Limited to a fixed domain
• Performance depends on:
   -Speaker
   -Environment
   -Microphone quality and the distance between
    speaker and microphone
   -Pronunciation, stress, etc.



                Future Work
• Work on performance enhancement

• Increase the data size to achieve better accuracy

• Integrate with Interactive Voice Response
  (IVR)

• Try to include speaker adaptation

Future System and Work Done So Far




           Figure 7: Proposed System
                                                 References
•   CMU – Robust Group Tutorial
    http://www.speech.cs.cmu.edu/sphinx/tutorial.html
•   CMU Sphinx – wiki
    http://cmusphinx.sourceforge.net/wiki/
•   Sphinx-4: A speech recognizer written entirely in the Java™ programming language
    http://cmusphinx.sourceforge.net/sphinx4/
•   Sphinx-4 Application Programmer's Guide –
    http://cmusphinx.sourceforge.net/sphinx4/doc/ProgrammersGuide.html
•   Speech Recognition Software
    http://speech.blau.in/
•   ASR Assignment
    http://www.speech.cs.cmu.edu/15-492/homework/hw2/index.html
•   Acoustic Model Creation using SphinxTrain
    http://forum.visionopen.com/viewtopic.php?f=39&t=1130
•   Acoustic Model Creation using SphinxTrain
    http://www.bakuzen.com/?p=16
•   Sphinx-4 Instrumentation
    http://cmusphinx.sourceforge.net/sphinx4/javadoc/edu/cmu/sphinx/instrumentation/doc-files/Instrumentation.html
•   Sphinx-4 Configuration Management
    http://cmusphinx.sourceforge.net/sphinx4/javadoc/edu/cmu/sphinx/util/props/doc-files/ConfigurationManagement.html
•   Hello World Decoder Quick Start Guide
    http://sphinx.subwiki.com/sphinx/index.php/Hello_World_Decoder_QuickStart_Guide
•   baküzen » Blog Archive » Sphinx4
    http://www.bakuzen.com/?p=4
•   How to Use Models from SphinxTrain in Sphinx-4
    http://cmusphinx.sourceforge.net/sphinx4/doc/UsingSphinxTrainModels.html


