									    International Journal of Emerging Trends & Technology in Computer Science (IJETTCS)
       Web Site: www.ijettcs.org Email: editor@ijettcs.org, editorijettcs@gmail.com
Volume 1, Issue 4, November – December 2012                                    ISSN 2278-6856

         Enhanced Speech Recognition Using ADAG
                     SVM Approach
                 Mr. Rajkumar S. Bhosale1, Ms. Archana R. Panhalkar2, Mr. V. S. Phad3, Prof. N. S.

                                       Parikrama GOI, Department of Computer Engineering
                                  Parikrama College of Engineering, Kashti-41470, M.S. India
                                                    Indian Institute of Technology
                                                       Indore (M.P.) 452 017

Abstract: Speech is a unique human characteristic used as a tool to communicate and express ideas. Automatic speech recognition (ASR) finds application in electronic devices that are too small to allow data entry via commonly used input devices such as keyboards; Personal Digital Assistants (PDAs) and cellular phones are examples in which ASR plays an important role. The main objective is to recognize words spoken by the same speaker using a multi-class support vector machine. The multi-class classifier is used for a number of classes. Linear predictive coefficient (LPC) features extracted from the input data are used for training, and the Support Vector Machine (SVM) takes less time to find the support vectors. For testing on similar input data, the Adaptive Directed Acyclic Graph SVM (ADAGSVM) classifies the input. Online results are obtained by interfacing with the machine commands of an operating system. For training and testing we have constructed sample datasets of the Marathi digits zero to nine (Shunya to Nau), an English letter dataset A to Z, and machine commands such as login.

Keywords: Dynamic Time Alignment SVM, Linear Predictive Coefficient (LPC), Support Vectors (SVs), Adaptive Directed Acyclic Graph SVM.

1. INTRODUCTION
Speech recognition is the conversion of an acoustic waveform into a written equivalent of the message information. The nature of the speech recognition problem depends heavily on the constraints placed on the speaker, the speaking situation and the message context. The applications of speech recognition systems are many and varied, e.g. a voice-operated typewriter, voice communication with computers, and a command-line interface with a machine. In listening tests conducted on a large-vocabulary task, recognition accuracy by humans was found to be an order of magnitude higher than that of machines. Though these tests included data with varied signal qualities, human recognition performance was found to be consistent over a diverse set of conditions.

The task here is to create learning machines that can classify a given spoken digit into 10 classes ('zero' //Shunya// to 'nine' //nau//). We use the 10 digits of the Marathi language and 10 machine commands for offline training and testing. Online, a few machine commands are used for training as well as for testing.

[Figure 1: Structure of the speech recognition system.]

We tried to solve the problem of time alignment of speech data when using an SVM as a classifier, and to reduce the training time of the SVM while retaining good recognition performance. A time alignment algorithm is used online for equating utterances of the same word spoken over different durations [10-14]. The overall structure is shown in Fig. 1. This paper is organized into three parts: (i) Preprocessing, (ii) Training, (iii) Testing.
i) Preprocessing: The preprocessing part consists of two operations, namely end point detection and linear predictive coding. End point detection is used to separate the speech signal from the non-speech signal. Then linear predictive coding (LPC) is applied for feature extraction.
ii) Training: The training part of the SVM involves a number of classes, and the resulting multiclass problem is solved by the one-against-one approach. The radial basis kernel is used to map the data from the input space to a higher


dimension feature space. Dynamic time alignment is used for variable-length vectors.
iii) Testing: The Adaptive Directed Acyclic Graph SVM (ADAGSVM) algorithm is used for testing each word. The Max Wins algorithm is used to identify the test word.

2.1 Speech Separation by End Point Detection
The "Sound Forge" software is used for end point detection. The speech samples are recorded with a bit depth of 16, a mono channel and a sample rate of 44100 Hz. Sound Forge is used to remove the non-speech part from the wave. The end points are also found from the average noise and average amplitude of the recorded wave, as shown in Fig. 2. After end point detection, the speech is passed to the LPC front-end processor for analysis of the speech signal.

[Figure 2: End point detection on Sound Forge samples.]

2.2 LPC Model for Feature Extraction
An LPC order of 14 and a window size of 256 are used for finding the LPC coefficients [5]. The basic idea behind the LPC model is that a given speech sample at time n, s(n), can be approximated as a linear combination of the past p speech samples, given as

s(n) ≈ a1 s(n−1) + a2 s(n−2) + … + ap s(n−p)

The LPC front end comprises six operations: preemphasis, frame blocking, windowing, autocorrelation, LPC analysis, and conversion to cepstral coefficients.

3. DYNAMIC TIME ALIGNMENT
In the SVM, each sample must be a vector of fixed length, so a time alignment tool is used for variable-length vectors. The dynamic time alignment tool is built into the kernel function, giving the so-called dynamic time alignment SVM (DTAKSVM). The procedure, illustrated in Fig. 3, is:
i. Given two sequences of vectors X and Y, first choose the shorter of the two. Without loss of generality, denote this sequence by X.
ii. Considering the first feature vector x1 of the chosen sequence X, compute the best local match of this vector in the sequence Y over a window of predetermined size, denoted by p units. Initially the window starts from y1. Let the best match be denoted by yk, where 1 ≤ k ≤ p.
iii. Select the next feature vector x2 and repeat the above procedure, except that the match window now starts from index k instead of 1.
Repeat the above procedure until we have obtained two sequences X and Y of the same length.

[Figure 3: Nonlinear time alignment algorithm.]

4. TRAINING USING SVM
4.1 Support Vector Classification
Here we explain in detail the classification mechanism of SVMs [14] in three cases (linearly separable, linearly non-separable and non-linear) through a two-class pattern recognition problem. Support vector machines were originally designed for the binary classification problem.

4.2 Linearly Separable Case
The general two-class classification problem can be stated as follows [2]; it is illustrated in Figure 4.
i. Given a data set D of N samples (x1, y1), (x2, y2), …, (xN, yN). Each sample is composed of a training example xi of length M, with elements xi = (x1, x2, …, xM), and a target value yi ∈ {−1, +1}.
ii. The goal is to find a classifier with decision function f(x) such that f(xi) = yi for all (xi, yi) ∈ D.

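The nonlinear time-alignment procedure of Section 3, which produces the fixed-length training examples assumed in the data set above, can be sketched as follows. This is a hypothetical Python illustration, not the paper's implementation: the paper does not specify the local-match criterion, so Euclidean distance is assumed, and the window size p is a free parameter.

```python
import numpy as np

def align(X, Y, p=3):
    """Greedy nonlinear time alignment of two vector sequences.

    For each frame x_i of the shorter sequence X, the best local match in Y
    is searched over a window of p frames, starting where the previous match
    left off, so both returned sequences have the same length.
    """
    X, Y = np.asarray(X, float), np.asarray(Y, float)
    if len(X) > len(Y):            # w.l.o.g. let X be the shorter sequence
        X, Y = Y, X
    k = 0                          # start index of the match window in Y
    matches = []
    for x in X:
        window = Y[k:k + p]        # candidate frames of Y
        # best local match = minimum Euclidean distance within the window
        j = k + int(np.argmin(np.linalg.norm(window - x, axis=1)))
        matches.append(j)
        k = j                      # next window starts from the match index
    return X, Y[matches]           # two equal-length sequences

A, B = align([[0.0], [1.0], [2.0]],
             [[0.1], [0.9], [1.1], [2.2], [2.0]])
assert len(A) == len(B) == 3
```

Whether the next window should start at index k or k + 1 is not stated in the paper; the sketch reuses k, which keeps the alignment monotonic in either reading.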

                                    xi  x1 , x 2 ,  x M ,                max         max 1 l l
          M, with elements                                        and a       W          y y K x  x 
                          y   1,1                                                  2 i1 j 1 i j i j i j
          target value i          .
  ii.     The goal is to find a classifier with decision
                                                                             i
                       f x  ,                  such               that      i 1                                    (4.5)
              f  xi   y i ,  x , y i  D                               It is not all functions that can be used as kernels. Feasible
                                                                           kernel must satisfy the some conditions like as
Figure 4 shows the hyperplane, which separates the
                                                                           Exponential radial basis function implementation.
positive from the negative point. This can be formulated
as follows: suppose that all the training data satisfy the                                       xi  x j   2   2

following constraints:                                                     K ( xi , x j )  e
                                                                            Is used as c in implementation.
 xi  w  b  1                  y i  1
                          for                           (4.1)              The term  is needs to be defined by user. Its value
 xi  w  b  1                  y i  1                                 used by try and error method.
                          for                            (4.2)

                                                                           5. TESTING USING SVM
                                                                           The testing part consists of number of data sets i.e.
                                                                           multiple classes. But the SVMs are binary classifiers so a
                                                                           technique is used to extend the method to handle multiple
                                                                           classes. To handle the multi-class problem used technique
                                                                           is called one to one classifier [2] [11]. Methods for
                                                                           solving the multi-class problem of SVMs are typically to
                                                                           consider the problem as a combination of two class
                                                                           decision functions, e.g. one-against-one and one-against-
Figure 4 Linear separating hyperplanes for the separable

4.2 Linearly Non-Separable Case
The solution to this problem is identical to the separable
case except for modification of the bounds of the
Lagrange multipliers. The parameter C introduces
additional capacity control within the classifier. In some
circumstances C can be directly related to a regularization
4.3     Non-Linear Case
By using non-linear case to map the input data to some
higher dimensional feature space, where the data is
linearly separable. For mapping the data different kernel
functions are uses.
Thus, a mapping from input space to feature space can be
achieved via a substitution of the inner product with:

xi  x    j       x i    x            j                 (4.3)

calculating each  explicitly is not needed. Instead, we                                 Figure 5 The role of the kernel
can use a functional representation K(xi, xj) that                         5.1 Adaptive Directed Acyclic Graph S V M
computes the inner product in feature space.                               (ADAGSVM)
K xi  x j   xi   x j                                            ADAGSVM stands for Adaptive Directed Acyclic Graph
                                                (4.4)                      SVM. It is multi-class method used to alleviate the
The functional representation is called kernel.                            problem of the DDAG structure [2] [4] [18]. An Adaptive
The role of Kernel is shown in Figure 5.                                   DAG (ADAG) is a DAG with reversed triangular
The optimization problem of above equation becomes,                        structure. This approach provides accuracy comparable
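As a concrete illustration of equations (4.4) and (4.5), the sketch below (hypothetical Python, not the paper's implementation) builds the RBF Gram matrix on which the dual objective W(α) is evaluated; σ is assumed to be user-chosen by trial and error, as the text notes.

```python
import numpy as np

def rbf_kernel(X, sigma=0.9):
    """Gram matrix K[i, j] = exp(-||x_i - x_j||^2 / (2 sigma^2))."""
    X = np.asarray(X, float)
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq / (2.0 * sigma ** 2))

def dual_objective(alpha, y, K):
    """W(alpha) of eq. (4.5): sum_i a_i - 1/2 sum_ij a_i a_j y_i y_j K_ij."""
    ay = alpha * y
    return alpha.sum() - 0.5 * ay @ K @ ay

# Toy two-class problem (XOR labels, not linearly separable in input space)
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
y = np.array([-1.0, 1.0, 1.0, -1.0])
K = rbf_kernel(X)
# A feasible kernel gives a symmetric Gram matrix with K(x, x) = 1
assert np.allclose(K, K.T) and np.allclose(np.diag(K), 1.0)
```

Maximizing W(α) over this Gram matrix (subject to the usual box and equality constraints) is what a full SVM solver would do; only the objective evaluation is shown here.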


Max Wins is probably the most accurate method currently available for multi-class SVMs, yet the ADAG requires low computation. It employs exactly the same training phase as the one-against-one classifier, i.e. it creates K(K−1)/2 binary SVMs, one for each pair of classes. However, it distinguishes itself in the testing phase: the nodes are arranged in a reversed triangle with ⌈K/2⌉ nodes (rounded up) at the top, ⌈K/4⌉ nodes in the second layer, and so on until the lowest layer of a single final node. It has K−1 internal nodes in total.

Given a test example x, starting at the top level, the binary decision function of each node is evaluated. The node is then exited via the outgoing edge carrying the preferred class. Each round reduces the number of candidate classes by half. Based on the preferred classes from its parent nodes, the binary function of the next-level node is chosen. The reduction process continues until the final node at the lowest level is reached. For the testing of class char(4), for example, the ADAG structure is shown in Figure 6.

[Figure 6: Structure of an Adaptive DAG classifier for a 10-class problem.]

6. RESULTS
The SVM classifier is used for recognition of 10 Marathi digits and 10 machine commands in training and testing. There are 200 data points, obtained from 20 replications of each digit and command by the same speaker. Results are obtained for different values of c, with the value of d held constant at 10. The following results are given as recognition percentages over 100 data points. The same speaker is used for the training and testing datasets.

For training, the 10 Marathi digits "Shunya to Nau" and 10 machine commands (on, off, … login) with 20 replications of each are used. Table 6.1 shows the percentage efficiency for the 100-sample dataset in offline testing; it is calculated as follows:

Efficiency of each word:  ei = (number of samples recognized / total number of samples) × 100

Total efficiency:  E = (Σ(i=1..n) ei) / n

where n is the total number of words. The speaker for the training and testing data sets is the same. The LPC features are used with 100 input words for training and the same 100 words for testing offline. Online (Table 6.2), a few machine commands are used for training and one machine command for testing.

Table 6.1: Performance of the SVM classifier using LPC features for 100 points (overall efficiency in %)

Input Word    C = 0.5    C = 0.7    C = 0.9
Data set      94         95         97
Data set      93         94         96.5

Results for the machine commands are shown as percentages, with each command applied 10 times in testing.

Table 6.2: Performance of the SVM classifier using LPC features for a few machine commands

Input Word                     C = 0.9 (overall efficiency in %)
Command data set (online)      96.00

We have tried to reduce the training time of the SVM while retaining good recognition performance. The proposed method (DTAKSVM) required less training time for recognition.
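The ADAG testing phase described above (pairwise binary SVMs, with the candidate set roughly halved each round until one class remains) can be sketched as an elimination tournament. This is a simplified hypothetical illustration: the pairwise decision rule below is a stand-in, since the trained SVMs themselves are not given in the paper, and the adaptive re-pairing of the full ADAG is reduced to neighbour pairing.

```python
def adag_predict(classes, decide):
    """ADAG-style elimination tournament over K candidate classes.

    `decide(a, b)` plays the role of the binary SVM trained for the pair
    (a, b) and returns the preferred class. Each round pairs up the
    remaining candidates and keeps only the preferred ones, so the
    candidate set shrinks until a single class remains. Exactly K - 1
    binary evaluations are made in total, since each one removes one
    candidate.
    """
    candidates = list(classes)
    evaluations = 0
    while len(candidates) > 1:
        nxt = []
        # ceil(len/2) nodes in this layer: pair neighbours, odd one passes
        for i in range(0, len(candidates) - 1, 2):
            nxt.append(decide(candidates[i], candidates[i + 1]))
            evaluations += 1
        if len(candidates) % 2:
            nxt.append(candidates[-1])
        candidates = nxt
    return candidates[0], evaluations

# Stand-in decision rule: every pairwise classifier prefers class 4,
# mimicking the "testing of class char(4)" example of Figure 6.
winner, n_eval = adag_predict(range(10),
                              lambda a, b: 4 if 4 in (a, b) else min(a, b))
assert winner == 4 and n_eval == 9   # K - 1 = 9 binary evaluations
```

This is why the ADAG needs only K − 1 evaluations per test example, compared with the K(K−1)/2 evaluations of a plain Max Wins vote over all pairs.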

7. CONCLUSION
A Support Vector Machine classifier using the Adaptive Directed Acyclic Graph SVM algorithm is used for offline speech recognition of 10 Marathi digits and 10 machine commands. It also uses a few machine commands for training and one command for testing online. It is found to be a promising classifier for speech recognition. Due to its low training and testing times, it can be used for real-time applications. LPC features are used for the speech recognition task, together with the DTAK algorithm and the RBF kernel function. The system is speaker dependent. The testing part of speech recognition uses the ADAG SVM, which gives better results in fewer iterations for a given number of classes. Since it is speaker dependent, it may also be used for authentication. The running time of this scheme is directly proportional to the number of classes in the input dataset. This scheme can be extended to continuous speech recognition.

REFERENCES
[1] Boonserm Kijsirikul and Nitiwut Ussivakul, "Multiclass Support Vector Machine Using Adaptive Directed Acyclic Graph", IEEE 2002, pp 980-985.
[2] Xin Dong, Wu Zhaohui and Pan Yunhe, "A New Multi-Class Support Vector Machines", IEEE 2001, pp 1673-1676.
[3] Chen Junli, Jiao Licheng, "Classification Mechanism of Support Vector Machines", IEEE 2000.
[4] K. P. Bennett, J. A. Blue, "A Support Vector Machine Approach to Decision Trees", IEEE 1998, pp 2396-2401.
[5] L. Rabiner and B. Juang, Fundamentals of Speech Recognition, Prentice Hall, 1993.
[6] C. J. C. Burges, "A Tutorial on Support Vector Machines for Pattern Recognition", Knowledge Discovery and Data Mining, 2(2), 1998.
[7] V. N. Vapnik, The Nature of Statistical Learning Theory, Springer Verlag, New York, 1995.
[8] V. N. Vapnik, Statistical Learning Theory, Springer Verlag, New York, 1998.
[9] Hiroshi Shimodaira, et al., "Support Vector Machine with Dynamic Time-Alignment Kernel for Speech Recognition", Eurospeech 2001, Scandinavia.
[10] Shantanu Chakrabartty and Yunbin Deng, "Dynamic Time Alignment in Support Vector Machines for Recognition Systems".
[11] B. Kijsirikul and N. Ussivakul, "Multiclass support vector machines using adaptive directed acyclic graph", in Proceedings of the International Joint Conference on Neural Networks (IJCNN 2002), pp 980-985, 2002.
[12] Nello Cristianini and John Shawe-Taylor, An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods, Cambridge University Press, 2000.
[13] T. Joachims, "Text categorization with support vector machines: learning with many relevant features", in Proc. 10th European Conference on Machine Learning (ECML-98), pp 137-142, 1998.
[14] S. Gunn, "Support vector machines for classification and regression", Technical Report, University of Southampton, Image, Speech and Intelligent Systems Group, 1997.

AUTHORS
1 Bhosale R. S. received the B.E. and M.E. degrees in Computer Science and Engineering from SRTMU, Nanded in 2001 and 2007, respectively. During 2001-2007, he stayed in the Signal Processing and Computer Networking Research Laboratory.
2 Panhalkar A. R. received the B.E. and M.E. degrees in Computer Science and Engineering from SRTMU, Nanded in 2003 and 2008, respectively. During 2003-2008, she stayed in the Signal Processing and Image Processing Research Laboratory.
3 Phad V. S. received the B.E. and M.E. degrees in Computer Science and Engineering from SRTMU, Nanded in 2003 and 2008, respectively. During 2003-2008, he stayed in the Signal Processing and Image Processing Research Laboratory.
