Enhanced Speech Recognition Using ADAG SVM Approach
International Journal of Emerging Trends & Technology in Computer Science (IJETTCS) Web Site: www.ijettcs.org Email: editor@ijettcs.org, editorijettcs@gmail.com Volume 1, Issue 4, November – December 2012, ISSN 2278-6856, Impact Factor of IJETTCS for year 2012: 2.524


International Journal of Emerging Trends & Technology in Computer Science (IJETTCS)
Web Site: www.ijettcs.org Email: editor@ijettcs.org, editorijettcs@gmail.com
Volume 1, Issue 4, November – December 2012 ISSN 2278-6856
Enhanced Speech Recognition Using ADAG
SVM Approach
Mr. Rajkumar S. Bhosale (1), Ms. Archana R. Panhalkar (2), Mr. V. S. Phad (3), Prof. N. S. Chaudhari (4)
(1, 2, 3) Parikrama GOI, Department of Computer Engineering, Parikrama College of Engineering, Kashti-41470, M.S. India
(4) Indian Institute of Technology, Indore (M.P.) 452 017
Abstract: Speech is a unique human characteristic used as a tool to communicate and express ideas. Automatic speech recognition (ASR) finds application in electronic devices that are too small to allow data entry via commonly used input devices such as keyboards. Personal digital assistants (PDAs) and cellular phones are examples in which ASR plays an important role. The main objective of this work is to recognize words spoken by a single speaker using a multi-class support vector machine (SVM); a multi-class classifier is needed because there are several classes. Linear predictive coefficient (LPC) features extracted from the input data are used for training, and the SVM takes less time to find the support vectors. At test time, the Adaptive Directed Acyclic Graph SVM (ADAGSVM) classifies the input. Results are also obtained online by interfacing with the machine commands of an operating system, with improved results. For training and testing we constructed sample data sets of the Marathi digits zero to nine (Shunya to Nau), the English letters A to Z, and machine commands such as login and shutdown.
Keywords: Dynamic time alignment SVM, Linear Predictive Coefficient (LPC), Support vectors (SVs), and Adaptive Directed Acyclic Graph SVM.
1. INTRODUCTION
Speech recognition is the conversion of an acoustic waveform into a written equivalent of the message information. The nature of the speech recognition problem depends heavily on the constraints placed on the speaker, the speaking situation, and the message context. The applications of speech recognition systems are many and varied, e.g. a voice-operated typewriter, voice communication with computers, and a command line interface with a machine. In listening tests conducted on a large-vocabulary task, recognition accuracy by humans was found to be an order of magnitude higher than that of machines. Although these tests included data with varied signal qualities, human recognition performance was found to be consistent over a diverse set of conditions.
The task here is to create learning machines that can classify a given spoken digit into 10 classes ('zero' /Shunya/ to 'nine' /Nau/). We use 10 digits in the Marathi language and 10 machine commands for offline training and testing. Online, a few machine commands are used for training as well as for testing.
We tried to solve the problem of time alignment of speech data when an SVM is used as the classifier, and to reduce the training time of the SVM while keeping good recognition performance. A time alignment algorithm is used to equate utterances of the same word that differ in duration in the online setting [10-14]. The structure of the system is shown in Fig. 1.
Figure 1 Structure of speech recognition system.
This paper is divided into three parts: (i) Preprocessing, (ii) Training, and (iii) Testing.
i) Preprocessing: The preprocessing part consists of two operations, namely end point detection and linear predictive coding. End point detection is used to separate the speech signal from the non-speech signal. Linear predictive coding (LPC) is then applied for feature extraction.
ii) Training: The training part of the SVM involves a number of classes, so the multi-class problem is solved by the one-against-one approach. The radial basis kernel is used to map the data from the input space to a higher
dimensional feature space. Dynamic time alignment is used for variable-length vectors.
iii) Testing: The Adaptive Directed Acyclic Graph SVM (ADAGSVM) algorithm is used to test each word. The Max Wins algorithm is used to identify the test word.
2. PRE-PROCESSING TECHNIQUES
2.1 Speech Separation by End Point Detection
The Sound Forge software is used for end point detection. The speech samples are recorded in mono with a bit depth of 16 and a sample rate of 44100 Hz. Sound Forge is used to remove the non-speech part from the wave. The end points can also be found from the average noise and the average amplitude of the recorded wave, as shown in Fig. 2.
Figure 2 End point detection by Sound Forge.
After end point detection, the signal is passed to the LPC front-end processor for analysis of the speech signal.
2.2 LPC Model for Feature Extraction
An LPC order of 14 and a window size of 256 are used to compute the LPC coefficients [5]. The basic idea behind the LPC model is that a given speech sample at time $n$, $s(n)$, can be approximated as a linear combination of the past $p$ speech samples, given as
$$s(n) \approx a_1 s(n-1) + a_2 s(n-2) + \dots + a_p s(n-p)$$
LPC processing consists of six operations: pre-emphasis, frame blocking, windowing, autocorrelation analysis, LPC analysis, and conversion to cepstral coefficients.
3. DYNAMIC TIME ALIGNMENT SVM
Dynamic time alignment is used in the SVM because the SVM expects each sample to be a vector of fixed length; for variable-length vectors a time alignment tool is needed. The dynamic time alignment tool is built into the kernel function, giving the so-called dynamic time alignment SVM (DTAKSVM).
i. Given two sequences of vectors X and Y, first choose the shorter of the two. Without loss of generality, denote this sequence by X.
ii. Consider the first feature vector $x_1$ of the chosen sequence X, and compute the best local match of this vector in sequence Y over a window of predetermined size, denoted by $p$ units. Initially the window starts from $y_1$. Let the best match be denoted by $y_k$, where $1 \le k \le p$.
iii. Select the next feature vector $x_2$ and repeat the above procedure, except that the match window now starts from index $k$ instead of 1.
Repeat the above procedure until two new sequences X and Y of the same length have been obtained.
Figure 3 Nonlinear time alignment algorithm.
4. TRAINING USING SVM
4.1 Support Vector Classification
This section explains the classification mechanism of SVMs [14] for three cases (linearly separable, linearly non-separable, and non-linear) through a two-class pattern recognition problem. Support vector machines were originally designed for binary classification problems.
4.2 Linearly Separable Case
The general two-class classification problem can be stated as follows [2]; it is illustrated in Figure 4.
i. Given a data set $D$ of $N$ samples: $D = \{(x_1, y_1), (x_2, y_2), \dots, (x_N, y_N)\}$. Each sample is composed of a training example $x_i$ of length
$M$, with elements $x_i = (x_1, x_2, \dots, x_M)$, and a target value $y_i \in \{-1, 1\}$.
ii. The goal is to find a classifier with decision function $f(x)$ such that $f(x_i) = y_i$, $\forall (x_i, y_i) \in D$.
Figure 4 shows the hyperplane that separates the positive points from the negative points. This can be formulated as follows: suppose that all the training data satisfy the following constraints:
$$x_i \cdot w + b \ge +1 \quad \text{for } y_i = +1 \tag{4.1}$$
$$x_i \cdot w + b \le -1 \quad \text{for } y_i = -1 \tag{4.2}$$
Figure 4 Linear separating hyperplanes for the separable case.
4.3 Linearly Non-Separable Case
The solution to this problem is identical to the separable case except for a modification of the bounds of the Lagrange multipliers. The parameter C introduces additional capacity control within the classifier. In some circumstances C can be directly related to a regularization parameter.
4.4 Non-Linear Case
In the non-linear case, the input data are mapped to some higher-dimensional feature space in which the data are linearly separable. Different kernel functions are used for this mapping.
Thus, a mapping from input space to feature space can be achieved via a substitution of the inner product:
$$x_i \cdot x_j \rightarrow \phi(x_i) \cdot \phi(x_j) \tag{4.3}$$
Calculating each $\phi$ explicitly is not needed. Instead, we can use a functional representation $K(x_i, x_j)$ that computes the inner product in feature space:
$$K(x_i, x_j) = \phi(x_i) \cdot \phi(x_j) \tag{4.4}$$
This functional representation is called the kernel. The role of the kernel is shown in Figure 5.
With the kernel substituted, the optimization problem becomes
$$\max_{\alpha} W(\alpha) = \max_{\alpha} \left[ -\frac{1}{2} \sum_{i=1}^{l} \sum_{j=1}^{l} \alpha_i \alpha_j y_i y_j K(x_i, x_j) + \sum_{i=1}^{l} \alpha_i \right] \tag{4.5}$$
where $\alpha_i$ are the Lagrange multipliers and $l$ is the number of training samples.
Not all functions can be used as kernels; a feasible kernel must satisfy certain conditions. The exponential radial basis function is used in the implementation:
$$K(x_i, x_j) = \exp\left( -\frac{\lVert x_i - x_j \rVert}{2\sigma^2} \right) \tag{4.6}$$
$\sigma$ is used as c in the implementation. The term $\sigma$ needs to be defined by the user; its value is chosen by trial and error.
5. TESTING USING SVM
The testing part involves a number of data sets, i.e. multiple classes. SVMs are binary classifiers, so a technique is needed to extend the method to handle multiple classes. The technique used here to handle the multi-class problem is the one-against-one classifier [2] [11]. Methods for solving the multi-class problem with SVMs typically treat it as a combination of two-class decision functions, e.g. one-against-one and one-against-rest.
Figure 5 The role of the kernel
5.1 Adaptive Directed Acyclic Graph SVM (ADAGSVM)
ADAGSVM stands for Adaptive Directed Acyclic Graph SVM. It is a multi-class method used to alleviate the problem of the DDAG structure [2] [4] [18]. An Adaptive DAG (ADAG) is a DAG with a reversed triangular structure. This approach provides accuracy comparable
to that of Max Wins, which is probably the most accurate method currently available for multi-class SVMs, and requires little computation. It employs exactly the same training phase as the One vs. One classifier, i.e. it creates $K(K-1)/2$ binary SVMs, one for each pair of classes. However, it distinguishes itself in the testing phase: the nodes are arranged in a reversed triangle with $\lceil K/2 \rceil$ nodes (rounded up) at the top, $\lceil K/2^2 \rceil$ nodes in the second layer, and so on until the lowest layer, which consists of a single final node. It has $K-1$ internal nodes.
Given a test example x, starting at the top level, the binary decision function is evaluated. The node is then exited via the outgoing edge with a message of the preferred class. In each round, the number of candidate classes is reduced by half. Based on the preferred classes from its parent nodes, the binary function of the next-level node is chosen.
The reduction process continues until the final node at the lowest level is reached. For the testing of class char (4), the ADAG structure is shown in Figure 6.
Figure 6 Structure of an Adaptive DAG classifier for 10 class problem.
6. RESULTS
The SVM classifier is used for the recognition of 10 Marathi digits and 10 machine commands in training and testing. There are 200 data points, from 20 replications of each digit and command by the same speaker. Results are obtained for different values of c and a constant value of d equal to 10. The following results are obtained, in percent, for the recognition of 100 data points. The same speaker is used for the training and testing data sets.
For training, the 10 Marathi digits "Shunya to Nau" and 10 machine commands (on, off ... login) are used, with 20 replications of each. Table 6.1 shows the percentage efficiency for 100 data points in offline testing, calculated as follows:
$$e_i = \frac{\text{number of samples recognized}}{\text{total number of samples}} \times 100$$
$$\text{TotalEfficiency} = \frac{\sum_{i=1}^{n} e_i}{n}$$
where $e_i$ is the efficiency of each word and $n$ is the total number of words. The speaker is the same in the training and testing data sets. With LPC features, 100 input words are used for training and the same 100 words for offline testing. Online (Table 6.2), a few machine commands are used for training and one machine command is used for testing.
Table: 6.1
Performance of SVMs classifier using LPC features for 100 points (overall efficiency in %)
Input word                    C = 0.5    C = 0.7    C = 0.9
Marathi data set (offline)    94         95         97
Command data set (offline)    93         94         96.5
Results for machine commands are shown in percent, with each command applied 10 times in testing.
Table: 6.2
Performance of SVMs classifier using LPC features for few machine commands (overall efficiency in %)
Input word                    C = 0.9
Command data set (online)     96.00
We have tried to reduce the training time of the SVM while keeping good recognition performance. The proposed method (DTAKSVM) required less training time for this recognition performance.
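The ADAG testing procedure of Section 5.1 and the efficiency measure used for the tables above can be illustrated with a minimal sketch. It is not the authors' implementation. The function names (`adag_predict`, `word_efficiency`, `total_efficiency`) are hypothetical, the `decide` callable stands in for the trained pairwise SVM decision functions, and the pairing order within a round (first candidate against last, second against second-last) is an assumption, since the text does not specify it.

```python
from typing import Callable, Hashable, List, Sequence


def adag_predict(classes: Sequence[Hashable],
                 decide: Callable[[Hashable, Hashable], Hashable]) -> Hashable:
    """Reduce the candidate classes round by round until one remains.

    `decide(a, b)` is a placeholder for the binary SVM trained on the
    pair (a, b); it must return the preferred class for the test example.
    """
    candidates: List[Hashable] = list(classes)
    if not candidates:
        raise ValueError("at least one class is required")
    while len(candidates) > 1:
        survivors: List[Hashable] = []
        lo, hi = 0, len(candidates) - 1
        # Assumed pairing: first vs. last, second vs. second-last, ...
        while lo < hi:
            survivors.append(decide(candidates[lo], candidates[hi]))
            lo += 1
            hi -= 1
        if lo == hi:
            # Odd number of candidates: the middle one advances unchallenged.
            survivors.append(candidates[lo])
        candidates = survivors
    return candidates[0]


def word_efficiency(recognized: int, total: int) -> float:
    """Efficiency of one word in percent: recognized / total * 100."""
    return 100.0 * recognized / total


def total_efficiency(per_word: Sequence[float]) -> float:
    """Mean of the per-word efficiencies."""
    return sum(per_word) / len(per_word)


if __name__ == "__main__":
    # Toy stand-in for the pairwise SVMs: prefer the class closer to 4.
    def toy_decide(a: int, b: int) -> int:
        return a if abs(a - 4) <= abs(b - 4) else b

    print(adag_predict(range(10), toy_decide))
    print(total_efficiency([word_efficiency(19, 20), word_efficiency(20, 20)]))
```

Each round roughly halves the candidate list, so a K-class problem needs only K-1 binary evaluations per test example, although K(K-1)/2 pairwise classifiers are trained.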
7. CONCLUSION
A support vector machine classifier using the Adaptive Directed Acyclic Graph SVM algorithm is used for offline speech recognition of 10 Marathi digits and 10 machine commands. It also uses a few machine commands for training and one command for testing online. It is found to be a promising classifier for speech recognition. Because of its short training and testing times, it can be used for real-time applications. LPC features are used for the speech recognition task, together with the DTAK algorithm and the RBF kernel function. The system is speaker dependent. The testing part of speech recognition uses the ADAG SVM. The ADAG gives better results in fewer iterations for the given number of classes. It may be used for authentication, as it is speaker dependent. The running time of this scheme is directly proportional to the number of classes in the input data set. This scheme can be extended to continuous speech recognition.
REFERENCES
[1] Boonserm Kijsirikul and Nitiwut Ussivakul, "Multiclass Support Vector Machine Using Adaptive Directed Acyclic Graph", IEEE 2002, pp 980-985.
[2] Xin Dong, Wu Zhaohui and Pan Yunhe, "A New Multi-Class Support Vector Machines", IEEE 2001, pp 1673-1676.
[3] Chen Junil, Jiao Licheng, "Classification Mechanism of Support Vector Machines", IEEE 2000.
[4] K.P. Bennett, J.A. Blue, "A Support Vector Machine Approach to Decision Trees", IEEE 1998, pp 2396-2401.
[5] L. Rabiner and B. Juang, "Fundamentals of Speech Recognition", Prentice Hall, 1993.
[6] C. J. C. Burges, "A Tutorial on Support Vector Machines for Pattern Recognition", Data Mining and Knowledge Discovery, 2(2), 1998.
[7] V.N. Vapnik, "The Nature of Statistical Learning Theory", Springer Verlag, New York 1995.
[8] V.N. Vapnik, "Statistical Learning Theory", Springer Verlag, New York 1998.
[9] Hiroshi Shimodaira, et al, "Support Vector Machine with Dynamic Time-Alignment Kernel for Speech Recognition", Eurospeech 2001, Scandinavia.
[10] Shantanu Chakrabartty and Yunbin Deng, "Dynamic Time Alignment in Support Vector Machines for Recognition Systems".
[11] Kijsirikul and N. Ussivakul, "Multiclass support vector machines using adaptive directed acyclic graph", In Proceedings of International Joint Conference on Neural Networks (IJCNN 2002), pp 980-985, 2002.
[12] Nello Cristianini and John Shawe-Taylor, "An Introduction to Support Vector Machines and Other Kernel-based Learning Methods", Cambridge University Press 2000.
[13] T. Joachims, "Text categorization with support vector machines: learning with many relevant features", In Proc. 10th European Conference on Machine Learning ECML-98, pp 137-142, 1998.
[14] S. Gunn, "Support vector machines for classification and regression", Technical Report, University of Southampton Image Speech and Intelligent Systems Group, 1997.
AUTHORS
(1) Bhosale R.S. received the B.E. and M.E. degrees in Computer Science and Engineering from SRTMU, Nanded in 2001 and 2007, respectively. During 2001-2007, he stayed in Signal Processing and Computer Networking Research Laboratory.
(2) Panhalkar A. R. received the B.E. and M.E. degrees in Computer Science and Engineering from SRTMU, Nanded in 2003 and 2008, respectively. During 2003-2008, she stayed in Signal Processing and Image Processing Research Laboratory.
(3) Phad V.S. received the B.E. and M.E. degrees in Computer Science and Engineering from SRTMU, Nanded in 2003 and 2008, respectively. During 2003-2008, he stayed in Signal Processing and Image Processing Research Laboratory.