Using SONIC to build a speech recognizer



Pellom & Hacioglu, "Sonic: The University of Colorado Continuous Speech Recognizer," Center for Spoken Language Research Technical Report TR-CSLR-2001-01, U. Colorado, 2003





Presented by Yang Shao, CIS788K04 Wi04

Performance on standard tasks









• On a 1.7 GHz Pentium 4

Procedures

• Preparation
– identify the goal;
– decide the recognition unit: phoneme, syllable, word, etc.;
– prepare the corpus: training, development, and test sets;
– label part of the training data (optional);
– etc.

Procedures cont.

$$\hat{W} = \arg\max_{W} \; p(O \mid W)\, P(W)$$

• Training
– Acoustic model training;
– Language model training;
• Adaptation
– Speaker adaptation (VTLN, MLLR, MAP);
– Environment adaptation (mismatch between training and testing conditions);
• Testing
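The decision rule on this slide picks the word sequence W that maximizes p(O|W) P(W). A minimal sketch of that rule over an explicit list of candidates, working in the log domain; the candidate sentences and all scores are invented for illustration, and a real decoder searches a much larger space rather than enumerating hypotheses:

```python
import math

def decode(hypotheses, acoustic_loglik, lm_prob):
    """Return the hypothesis W that maximizes log p(O|W) + log P(W)."""
    def score(words):
        return acoustic_loglik[words] + math.log(lm_prob[words])
    return max(hypotheses, key=score)

# Toy numbers, invented for illustration only.
acoustic_loglik = {
    ("recognize", "speech"): -12.0,
    ("wreck", "a", "nice", "beach"): -11.5,
}
lm_prob = {
    ("recognize", "speech"): 1e-3,
    ("wreck", "a", "nice", "beach"): 1e-6,
}

best = decode(list(acoustic_loglik), acoustic_loglik, lm_prob)
print(" ".join(best))
```

Here the second candidate has the slightly better acoustic score, but the language model prior tips the decision toward the first.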

Acoustic model training

• Feature extraction, then iterative steps of Viterbi state-based alignment and model estimation;
• Outputs a set of decision-tree state-clustered HMMs;
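The alignment step can be pictured as a Viterbi pass that assigns each frame to one state of a left-to-right HMM. The sketch below is a generic textbook version, not Sonic's trainer code: it assumes the per-frame state log-likelihoods are already given, allows only "stay" or "advance by one" transitions, ignores transition probabilities, and needs at least as many frames as states.

```python
def viterbi_align(loglik):
    """Align T frames to S left-to-right states.

    loglik[t][s] is the log-likelihood of frame t under state s.
    The path starts in state 0, ends in state S-1, and at each frame
    either stays in the same state or advances to the next one.
    Returns the best state index for every frame.
    """
    T, S = len(loglik), len(loglik[0])
    NEG = float("-inf")
    score = [[NEG] * S for _ in range(T)]
    back = [[0] * S for _ in range(T)]
    score[0][0] = loglik[0][0]
    for t in range(1, T):
        for s in range(S):
            stay = score[t - 1][s]
            advance = score[t - 1][s - 1] if s > 0 else NEG
            if advance > stay:
                score[t][s], back[t][s] = advance + loglik[t][s], s - 1
            else:
                score[t][s], back[t][s] = stay + loglik[t][s], s
    # Trace back from the final state at the last frame.
    path = [S - 1]
    for t in range(T - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1]

# Toy log-likelihoods for 5 frames and 3 states (invented values).
loglik = [
    [-1, -5, -9],
    [-1, -4, -9],
    [-6, -1, -5],
    [-8, -1, -2],
    [-9, -6, -1],
]
alignment = viterbi_align(loglik)
print(alignment)
```

In training, the frames assigned to each state would then be used to re-estimate that state's parameters before the next alignment pass.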

Feature extraction (PMVDR)

• Perceptual Minimum Variance Distortionless Response cepstral coefficients;

– fea [options] speechfile.raw featurefile.fea









• Dynamic features;
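Dynamic features are usually time derivatives of the static cepstra. The slide does not say which formula Sonic uses, so the sketch below assumes the common regression-based delta with a window of two frames on each side and edge frames repeated at the boundaries:

```python
def delta(frames, window=2):
    """Regression-based delta coefficients for a list of feature vectors.

    d_t = sum_{n=1..N} n * (c_{t+n} - c_{t-n}) / (2 * sum_{n=1..N} n^2)
    Frames beyond either end are replaced by the nearest edge frame.
    """
    T = len(frames)
    denom = 2 * sum(n * n for n in range(1, window + 1))
    out = []
    for t in range(T):
        vec = []
        for k in range(len(frames[t])):
            num = 0.0
            for n in range(1, window + 1):
                ahead = frames[min(t + n, T - 1)][k]
                behind = frames[max(t - n, 0)][k]
                num += n * (ahead - behind)
            vec.append(num / denom)
        out.append(vec)
    return out

# One-dimensional static feature rising by 1 per frame (toy data).
static = [[0.0], [1.0], [2.0], [3.0], [4.0]]
deltas = delta(static)
print(deltas)
```

Applying the same function to the deltas would give second-order (acceleration) features.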

Language model I

• Finite state grammar in terms of a regular expression;
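To make the idea concrete, a small grammar can be written as a regular expression over words and used to test whether a word sequence is allowed. This uses Python's `re` module as an analogy only; Sonic's own grammar syntax is not shown on the slide, and the dialing grammar here is hypothetical.

```python
import re

# Hypothetical grammar: "call", then one or more digits, then optional "please".
DIGIT = r"(zero|one|two|three|four|five|six|seven|eight|nine)"
GRAMMAR = re.compile(r"call( " + DIGIT + r")+( please)?")

def accepts(sentence):
    """True if the whole word sequence is generated by the grammar."""
    return GRAMMAR.fullmatch(sentence) is not None

print(accepts("call five five one two please"))
print(accepts("please call five"))
```

A recognizer constrained by such a grammar only considers word sequences that the expression accepts, which shrinks the search space compared with an N-gram model.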

Language model II

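• Language model:
– $P(W) = P(w_1, w_2, \ldots, w_m)$ gives the probability of a given word sequence;
– expanded (by the chain rule) as
$$P(W) = \prod_{i=1}^{m} P(w_i \mid w_1, \ldots, w_{i-1})$$
– N-gram approximation:
$$P(w_i \mid w_1, \ldots, w_{i-1}) \approx P(w_i \mid w_{i-N+1}, \ldots, w_{i-1})$$
– Calculated as the relative frequency of counts $C(\cdot)$ in the training text:
$$P(w_i \mid w_{i-N+1}, \ldots, w_{i-1}) = \frac{C(w_{i-N+1}, \ldots, w_i)}{C(w_{i-N+1}, \ldots, w_{i-1})}$$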

• Bigram example: P(Mary loves that person) = P(Mary|<s>) P(loves|Mary) P(that|loves) P(person|that), where <s> marks the start of the sentence.
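The bigram example can be reproduced with a few lines of code. This sketch assumes plain relative-frequency estimates with no smoothing or back-off, uses `<s>` as an assumed sentence-start symbol, and trains on a four-sentence toy corpus invented for illustration:

```python
from collections import Counter

def train_bigram(sentences):
    """Return p(cur, prev) estimated as count(prev cur) / count(prev)."""
    history, pairs = Counter(), Counter()
    for sentence in sentences:
        words = ["<s>"] + sentence.split()
        for prev, cur in zip(words, words[1:]):
            history[prev] += 1
            pairs[(prev, cur)] += 1

    def p(cur, prev):
        if history[prev] == 0:
            return 0.0
        return pairs[(prev, cur)] / history[prev]

    return p

def sentence_prob(p, sentence):
    """Multiply the bigram probabilities along the sentence."""
    words = ["<s>"] + sentence.split()
    prob = 1.0
    for prev, cur in zip(words, words[1:]):
        prob *= p(cur, prev)
    return prob

corpus = [
    "Mary loves that person",
    "Mary loves that dog",
    "John loves Mary",
    "Mary sees that person",
]
p = train_bigram(corpus)
prob = sentence_prob(p, "Mary loves that person")
print(prob)
```

Any sentence containing an unseen bigram gets probability zero under this estimate, which is why practical language models add smoothing.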

Recognition overview

• Speech-enabled applications can be built by calling functions within the Sonic API.
– Sonic_batch -c config.txt [-l]

Configuration file

• It is a text file that has a set of parameters followed by arguments to establish the basic settings of the recognizer.
– location of the acoustic model files;
– location of the language model file;
– location of the pronunciation lexicon;
– recognizer settings such as search beams, pruning settings, etc.;
– (optional) a pointer to a control file containing a list of audio files to process.
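A file of "parameter followed by arguments" lines is straightforward to read programmatically. The sketch below assumes one whitespace-separated setting per line; every parameter name and path in the example is hypothetical and is not taken from Sonic's documentation.

```python
def parse_config(text):
    """Parse 'parameter argument ...' lines into a dict of argument lists."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        name, *args = line.split()
        settings[name] = args
    return settings

# All parameter names and paths below are hypothetical placeholders.
example = """\
acoustic_model_dir  models/am
language_model      models/lm.txt
lexicon             models/lexicon.txt
search_beam         200
control_file        lists/test.ctl
"""
config = parse_config(example)
print(config["search_beam"])
```

For the real keywords and their arguments, consult the Sonic technical report cited on the title slide.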

Components

• Audio file format:
– 16-bit linear PCM format (raw);
– sampling rate is configurable (8 kHz default);
• Phoneme configuration file format
– supports the 55-phoneme symbol set adopted by the CMU Sphinx-II speech recognizer.
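Raw 16-bit linear PCM has no header, so the reader must know the sample format in advance. A minimal sketch of decoding such data, assuming little-endian byte order and a single channel (neither is stated on the slide) and using the 8 kHz default to compute duration:

```python
import struct

def read_raw_pcm(data):
    """Decode headerless 16-bit signed linear PCM bytes into integers."""
    count = len(data) // 2
    # "<" = little-endian (an assumption), "h" = signed 16-bit sample.
    return list(struct.unpack("<%dh" % count, data[: count * 2]))

def duration_seconds(samples, sample_rate=8000):
    """Length of the audio in seconds at the given sampling rate."""
    return len(samples) / sample_rate

# Build a few samples in memory instead of reading a file.
raw = struct.pack("<4h", 0, 1000, -1000, 32767)
samples = read_raw_pcm(raw)
print(samples)
```

For a file on disk, the same function can be applied to the bytes returned by `open(path, "rb").read()`.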

Components cont.

• LM format
– supports up to 4-gram language models
• Pronunciation lexicon format
• Acoustic model format
– uses binary files produced by the trainer function;

– .-, ex. AA.1-l;

Discussion

• Unlike HTK, the trainer code estimates models for one base phone at a time. Potential problem?


