


									                                                                                                                              ISSN: 2277 – 9043
                                                     International Journal of Advanced Research in Computer Science and Electronics Engineering
                                                                                                                   Volume 1, Issue 2, April 2012

         Self Organizing Markov Map for Speech and
                     Gesture Recognition
                           Ms. Nutan D Sonwane, Prof. S. A. Chhabria, Dr.R.V.Dharaskar

Abstract: Gesture and speech based human-computer interaction is attracting attention across areas such as pattern recognition and computer vision. This kind of research finds many applications in multimodal HCI, robotics control, and sign language recognition. This paper presents a head and hand gesture as well as speech recognition system for human-computer interaction (HCI). This kind of vision-based system demonstrates the capability of a computer to understand and respond to hand and head gestures, and also to speech in the form of sentences. The recognition system consists of two main modules: 1. Gesture recognition and 2. Speech recognition. Gesture recognition consists of the phases i. image capturing, ii. feature extraction of the gesture, and iii. gesture modeling (direction, position, generalized). Speech recognition consists of the phases i. taking voice signals, ii. spectral coding, iii. unit matching (BMU), iv. lexical decoding, and v. syntactic and semantic analysis. Compared with many existing algorithms for gesture and speech recognition, SOM provides flexibility and robustness against noisy environments. The detection of gestures is based on discrete predesignated symbol sets, which are manually labeled during the training phase. The gesture-speech correlation is modelled by examining the co-occurring speech and gesture patterns. This correlation can be used to fuse gesture and speech modalities for edutainment applications (e.g. video games, 3-D animations) where natural gestures of talking avatars are animated from speech. A speech-driven gesture animation example has been implemented for demonstration.

Keywords: Gesture recognition, human computer interaction, speech recognition, self organizing map and Markov model

                     I INTRODUCTION

This paper presents a head and hand gesture as well as speech recognition system for human-computer interaction (HCI). This kind of vision-based system demonstrates the capability of a computer to understand and respond to hand and head gestures and to speech in the form of sentences. The recognition system consists of four modules, namely 1. Manual Module, 2. Head Tracker, 3. Hand Recognition, and 4. Voice Recognition, which handle various symbolic gesture commands and voice commands. Gesture recognition consists of the phases i. image capturing, ii. feature extraction of the gesture, and iii. gesture modeling (direction, position, generalized). Speech recognition consists of the phases i. taking voice signals, ii. spectral coding, iii. best matching unit (BMU) matching, iv. lexical decoding, and v. syntactic and semantic analysis. Compared with many existing algorithms for gesture and speech recognition, SOMM (Self Organizing Markov Map) provides flexibility and robustness against noisy environments. The approach involves the combination of a Self Organizing Map (SOM) and a Markov Model. Its most effective application is the development of robust and friendly interfaces for human-machine interaction, since gesture and speech are a natural and powerful way of communication.

The Principal Component Analysis (PCA) approach describes a method for gesture recognition. It is a classical feature extraction technique widely used in the field of pattern recognition and computer vision [1], and a classical statistical technique for analyzing the covariance structure of multivariate data. Gesture recognition using the PCA algorithm involves two phases: • Training Phase • Recognition Phase. The Self-Growing and Self-Organized Neural Gas (SGONG) network [2] is an unsupervised neural classifier. It achieves clustering of the input data so that the distance between data items within the same class (intra-cluster variance) is small and the distance between data items stemming from different classes (inter-cluster variance) is large. The final number of classes is determined by the SGONG during the learning process. [3] describes a self-organizing map (SOM) method for speech recognition. A modular system based on the hidden Markov model [4] describes a layered method based on Hidden Markov Models (HMM).

The SOMM architecture for gesture recognition fuses separate component models, all of which are based on the hand trajectory. The approach involves a combination of Self Organizing Maps and Markov Models [5] for gesture trajectory classification, using the trajectory of the hand segment and the direction of motion during a gesture. This classification scheme is based on the transformation of a gesture representation from a series of coordinates and movements to a symbolic form, and on building probabilistic models from these transformed representations. Automatic speech recognition [6] is a process by which a machine identifies speech. The machine takes a human utterance as input and returns a string of words, phrases, or continuous speech in the form of text as output. Gesture and speech are a natural and powerful way of communication [2][3][4][6].
Figure 1: Symbolic Hand Gesture
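
The transformation of a gesture from a series of coordinates into a symbolic form, as outlined in the Introduction, can be illustrated with a small sketch. It is not the authors' implementation: the eight direction bins, the function names `direction_symbols` and `transition_matrix`, and the sample trajectory are all assumptions made for illustration. Each hand displacement is quantized into a direction symbol, and a first-order Markov transition table is then estimated from the symbol sequence.

```python
import math

def direction_symbols(points, n_bins=8):
    """Quantize consecutive (x, y) displacements into n_bins direction symbols.

    Symbol 0 points along +x; symbols increase with the angle returned by atan2.
    The number of bins is an assumption, not a value taken from the paper.
    """
    width = 2 * math.pi / n_bins
    symbols = []
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        dx, dy = x1 - x0, y1 - y0
        if dx == 0 and dy == 0:
            continue  # no movement, so no direction to encode
        angle = math.atan2(dy, dx) % (2 * math.pi)
        symbols.append(int((angle + width / 2) // width) % n_bins)
    return symbols

def transition_matrix(symbols, n_symbols=8):
    """Estimate first-order Markov transition probabilities from a symbol sequence."""
    counts = [[0] * n_symbols for _ in range(n_symbols)]
    for a, b in zip(symbols, symbols[1:]):
        counts[a][b] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        # Rows for symbols that never occur as a predecessor stay all zero.
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix

# Hypothetical hand trajectory: right, right, up, up, left
trajectory = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2), (1, 2)]
symbols = direction_symbols(trajectory)
matrix = transition_matrix(symbols)
print(symbols)
print(matrix[0])
```

In a full SOMM-style system the symbols would more plausibly come from the best matching units of a trained map rather than from fixed angular bins; the fixed bins are used here only to keep the example short.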

                                             All Rights Reserved © 2012 IJARCSEE

               II   SELF ORGANIZING MAP

A self-organizing map or self-organizing feature map [3] is a type of artificial neural network that is trained using unsupervised learning to produce a low-dimensional (typically two-dimensional), discretized representation of the input space of the training samples, called a map. Self-organizing maps differ from other artificial neural networks in that they use a neighborhood function to preserve the topological properties of the input space. Training builds the map using input examples. It is a competitive process, also called vector quantization. Mapping automatically classifies a new input vector. A self-organizing map consists of components called nodes or neurons. The SOM has three stages:

   1) Initialization 2) Get best matching unit 3) Scale neighbors

Step I: Initialization

Initialize the weight vector map. Each weight vector is given random values for its data. Before training, initial values are given to the prototype vectors. The SOM is very robust with respect to the initialization, but a properly chosen initialization allows the algorithm to converge faster to a good solution. Typically one of the three following initialization procedures is used:

1. Random initialization, where the weight vectors are initialized with small random values.
2. Sample initialization, where the weight vectors are initialized with random samples drawn from the input data.
3. Linear initialization, where the weight vectors are initialized in an orderly fashion along the linear subspace spanned by the two principal eigenvectors of the input data set. The eigenvectors can be calculated using the Gram-Schmidt procedure.

In SOM Toolbox, random and linear initializations have been implemented. Random initialization is done by taking random values from the d-dimensional cube defined by the minimum and maximum values of the variables. Linear initialization is done by selecting a mesh of points from the d-dimensional min-max cube of the training data. The axes of the mesh are the eigenvectors corresponding to the m greatest eigenvalues of the training data.

Step II: Get best matching unit

Go through all the weight vectors and calculate the distance from each weight to the chosen sample vector. The weight with the shortest distance is the winner. If more than one weight has the same shortest distance, the winning weight is chosen randomly among them. The most common distance measure is the Euclidean distance. Distances are calculated and compared over the entire map, and the weight with the shortest distance to the sample vector is the winner, the best matching unit (BMU). The square root is not computed in the program, as a speed optimization: comparing squared distances selects the same winner.

Step III: Scale neighbors

1) Determining neighbors
There are two parts to scaling the neighboring weights: determining which weights are considered neighbors, and how much each weight can become more like the sample vector. The neighbors of a winning weight can be determined using a number of different methods. Some use concentric squares, others hexagons.

2) Learning
Learning in the self-organizing map causes different parts of the network to respond similarly to certain input patterns. The second part of scaling the neighbors is the learning function. The winning weight is rewarded by becoming more like the sample vector. The neighbors also become more like the sample vector. An attribute of this learning process is that the farther a neighbor is from the winning weight, the less it learns. The rate at which the amount a weight can learn decreases can also be set. Here a Gaussian function is used. This function returns a value t between 0 and 1, and each neighbor is then changed using the parametric equation new weight = (1 − t) × old weight + t × sample. In the first iteration, the best matching unit gets a t of 1 from its learning function, so its weight comes out of this process with exactly the same values as the randomly selected sample.

               III   HIDDEN MARKOV MODEL

A hidden Markov model (HMM) is a statistical Markov model [4] in which the system being modeled is assumed to be a Markov process with unobserved (hidden) states. An HMM can be considered the simplest dynamic Bayesian network. In a regular Markov model, the state is directly visible to the observer, and therefore the state transition probabilities are the only parameters. In a hidden Markov model, the state is not directly visible, but the output, which depends on the state, is visible. Each state has a probability distribution over the possible output tokens. Therefore the sequence of tokens generated by an HMM gives some information about the sequence of states.

The parameters of a hidden Markov model are of two types: 1. transition probabilities and 2. emission probabilities (also known as output probabilities). The transition probabilities control the way the hidden state at time t is chosen given the hidden state at time t − 1. The hidden state is assumed to take one of N possible values, modeled as a categorical distribution. This means that for each of the N possible states that a hidden variable at time t can be in, there is a transition probability from this state to each of the N possible states of the hidden variable at time t + 1, for a total of N² transition probabilities. (Note, however, that the set of transition probabilities for transitions from any given state must sum to 1, meaning that any one transition probability can be determined once the others are known, leaving a total of N(N − 1) transition parameters.)
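
The transition and emission probabilities described for the hidden Markov model can be written down as small row-stochastic tables. The sketch below is only an illustration under assumed values: the two states, the two output tokens, and every probability in `start`, `trans`, and `emit` are hypothetical and not taken from the paper. It checks that each row sums to 1, counts the N(N − 1) free transition parameters, and decodes the most likely hidden state sequence with the Viterbi algorithm.

```python
import math

# Hypothetical 2-state, 2-token model; all numbers are illustrative assumptions.
start = [0.6, 0.4]            # P(state at t = 0)
trans = [[0.7, 0.3],          # trans[i][j] = P(state j at t + 1 | state i at t)
         [0.4, 0.6]]
emit = [[0.9, 0.1],           # emit[i][k] = P(token k | state i)
        [0.2, 0.8]]

def is_row_stochastic(matrix, tol=1e-9):
    """Every row of a transition or emission table must sum to 1."""
    return all(abs(sum(row) - 1.0) < tol for row in matrix)

def free_transition_parameters(n_states):
    """N rows with N - 1 free entries each, since each row sums to 1."""
    return n_states * (n_states - 1)

def viterbi(obs, start, trans, emit):
    """Return the most likely hidden state path for a list of token indices.

    Works in log space; assumes all probabilities used are strictly positive.
    """
    n = len(start)
    score = [math.log(start[s]) + math.log(emit[s][obs[0]]) for s in range(n)]
    back = []  # back[t][j] = best predecessor of state j at step t + 1
    for o in obs[1:]:
        prev = score
        pointers, score = [], []
        for j in range(n):
            best = max(range(n), key=lambda i: prev[i] + math.log(trans[i][j]))
            pointers.append(best)
            score.append(prev[best] + math.log(trans[best][j]) + math.log(emit[j][o]))
        back.append(pointers)
    # Backtrack from the best final state.
    path = [max(range(n), key=lambda s: score[s])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path

observations = [0, 0, 1, 1]   # token indices
path = viterbi(observations, start, trans, emit)
print(path)
```

Working in log space avoids numerical underflow on long observation sequences, at the cost of requiring nonzero probabilities in this simplified version.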


Hidden Markov models can model complex Markov processes where the states emit the observations according to some probability distribution. One example is the Gaussian distribution: in such a hidden Markov model, the output of each state is represented by a Gaussian distribution. HMMs use various techniques to solve problems, such as 1) the forward and backward algorithms, 2) the Viterbi algorithm and posterior algorithm, and 3) the Baum-Welch algorithm.

                        III. ALGORITHM

Figure 2: Self Organizing Map

Kohonen Algorithm:
Step 1. Randomize the map's nodes' weight vectors
Step 2. Grab an input vector
Step 3. Traverse each node in the map
 i) Use the Euclidean distance formula to find the similarity between the input vector and the map's node's weight vector
 ii) Track the node that produces the smallest distance (this node is the best matching unit, BMU)
Step 4. Update the nodes in the neighborhood of the BMU by pulling them closer to the input vector
Step 5. Increase t and repeat from step 2

The Markov model includes various algorithms. The Viterbi algorithm is used for finding the most likely sequence of hidden states, called the Viterbi path. The Baum-Welch algorithm is used for finding the set of state transition and output probabilities of a sequence.

Step 1. The (potentially) occupied state at time t is called qt
Step 2. A state can be referred to by its index, e.g. qt = j
Step 3. One event corresponds to one state

At each time t, the occupied state outputs ("emits") its corresponding observation. The Markov model is a generator of events. Each event is discrete and has a single output. In a typical finite-state machine, actions occur at transitions, but in most Markov models, actions occur at each state. In a speech recognition system, training takes as input a large number of speech utterances along with their transcriptions into phonemes, and outputs the speech models for the phonemes.

               IV HARDWARE COMPONENT AS WHEELCHAIR ROBOT

A wheelchair robot moves according to the commands given to it through various kinds of symbolic gesture and voice commands. The system takes symbolic gesture commands as input to the hardware, and the robot moves accordingly. The wheelchair robot is made up of various hardware components:

   a) Microcontroller
i. 2K bytes of Flash
ii. 128 bytes of RAM
iii. 15 I/O lines
iv. Two 16-bit timer/counters
v. A five vector two-level interrupt architecture
vi. A full duplex serial port
vii. A precision analog comparator
viii. On-chip oscillator and clock circuitry

   b) Other devices:
DC motor, TX-RX antenna, USB to serial connector

Figure 3: Wheelchair Robot

Figure 4: Internal circuit of Robot
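
The five steps of the Kohonen algorithm listed under III. ALGORITHM can be sketched in a few lines. This is a minimal illustration, not the system's actual code: the 4 × 4 grid, the linear decay of the learning rate and neighborhood radius, the fixed random seed, and the names `best_matching_unit` and `train_som` are all assumptions. Ties for the best matching unit are resolved by taking the first node found, a simplification of the random tie-break described in Step II.

```python
import math
import random

def squared_distance(a, b):
    """Squared Euclidean distance; the square root is skipped for speed."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def best_matching_unit(weights, sample):
    """Return (row, col) of the node whose weight vector is closest to sample."""
    best, best_dist = None, float("inf")
    for r, row in enumerate(weights):
        for c, w in enumerate(row):
            d = squared_distance(w, sample)
            if d < best_dist:  # first node wins ties (simplification)
                best, best_dist = (r, c), d
    return best

def train_som(samples, rows=4, cols=4, iterations=200, seed=0):
    """Train a small rectangular SOM; decay schedules are illustrative choices."""
    rng = random.Random(seed)
    dim = len(samples[0])
    # Step 1: randomize the weight vector of every node.
    weights = [[[rng.random() for _ in range(dim)] for _ in range(cols)]
               for _ in range(rows)]
    radius0 = max(rows, cols) / 2.0
    for t in range(iterations):
        sample = rng.choice(samples)                    # Step 2: grab an input vector
        br, bc = best_matching_unit(weights, sample)    # Step 3: find the BMU
        frac = t / iterations
        radius = radius0 * (1.0 - frac) + 0.5 * frac    # neighborhood shrinks over time
        rate = 1.0 - frac                               # learning rate decays from 1
        # Step 4: pull the BMU and its neighbors toward the sample.
        for r in range(rows):
            for c in range(cols):
                grid_d2 = (r - br) ** 2 + (c - bc) ** 2
                theta = rate * math.exp(-grid_d2 / (2.0 * radius ** 2))
                w = weights[r][c]
                for k in range(dim):
                    w[k] = (1.0 - theta) * w[k] + theta * sample[k]
        # Step 5: the loop increases t and repeats from Step 2.
    return weights

# Hypothetical two-dimensional feature vectors scaled to [0, 1]
samples = [[0.1, 0.1], [0.9, 0.9], [0.1, 0.9]]
som = train_som(samples)
print(best_matching_unit(som, [0.1, 0.1]))
```

With this schedule the Gaussian factor is exactly 1 for the best matching unit in the first iteration, so its weight becomes identical to the selected sample, in line with the learning step described for the SOM.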


                          V MODULES

i. Manual mode ii. Head gesture iii. Hand gesture iv. Voice recognition. All of these modules are included in one system, and the system moves according to the order of the commands given in the form of gesture and speech.

Speech and Gesture Recognition:

Figure 5: Speech and Gesture Recognition

I. Manual mode

Figure 6: Manual mode

II. Head Gesture III. Hand Gesture

Figure 7: Head Gesture / Hand Gesture

Figure 8: Speech Recognition

These are the outputs of the individual modules, each of which performs work according to the command.

                          CONCLUSION

The proposed system includes both approaches, speech as well as gesture recognition. The system takes input in the form of a speech signal and of gestures as hand and head coordinates. The system also uses a wheelchair as the hardware device for interaction with the system.

                          REFERENCES

[1] Soloman Raju Kota, J.L Reheja, Ashutosh Gupta, Archna Rathi, Shashikant Sharma, "Principal component analysis for Gesture recognition using SystemC," 2009 International Conference in Advance Technology in Communication and Computing, 2009 IEEE.

[2] Yean Choon Ham, Yu Shi, "Developing a Smart Camera for Gesture Recognition in HCI Applications," The 13th IEEE International Symposium on Consumer Electronics (ISCE2009), 978-1-4244-2976-9/09/$25.00 ©2009.

[3] E. Stergiopoulou and N. Papamarkos, "A New Technique For Hand Gesture Recognition," 1-4244-0481-9/06/ © 2006 IEEE.

[4] Anjali Kalra, Sarbjeet Singh, Sukhvinder Singh, "Speech Recognition," International Journal of Computer Science and Network Security,

[5] George Caridakis, Kostas Karpouzis, Athanasios Drosopoulos, Stefanos Kollias, "SOMM: Self organizing Markov map for gesture recognition," Pattern Recognition Letters 31, 2010.

[6] Wu Song-Lin, Cui Rong-Yi, "Human Behavior Recognition Based on Sitting Postures," 2010 International Symposium on Computer, Communication, Control and Automation, 978-1-4244-5567-6/10/ © 2010.

[7] Jagdish Lal Raheja, Radhey Shyam, "Real Time Robotic Hand Control Using Hand Gesture," 978-0-7695-3977-5/10 © 2010 IEEE.

[8] Chetan A. Burande, Raju M. Tugnayat, Nitin K. Choudhary, "Advanced Recognition Techniques for Human Computer Interaction," 978-1-4244-5586-7/10, 2010 IEEE.

[9] Shuai Jin, Guang-ming Lu, Jian-xun Luo, Wei-dong Chen, Xiao-xiang Zheng, "SOM-based Hand Gesture Recognition for Virtual Interactions," in IEEE International Symposium on Virtual Reality Innovation, 2011.

[10] G.R.S Murthy, R.S Jadon, "Hand gesture recognition using neural network," in 2nd International Advance Computing Conference, 2010.

[11] M. Ajallooeian, A. Borji, B. N. Araabi, M. Nili Ahmadabadi, H. Moradi, "Fast Hand Gesture Recognition based on Saliency Maps: An Application to Interactive Robotic Marionette Playing," The 18th IEEE International Symposium on Robot and Human Interactive Communication, Toyama, Japan, Sept. 27-Oct. 2, 2009, 978-1-4244-5081-7/09/ ©2009 IEEE.

[12] Wei-Hua Andrew Wang, Chun-Liang Tung, "Dynamic Hand Gesture Recognition Using Hierarchical Dynamic Bayesian Networks Through Low-Level Image Processing," Proceedings of the Seventh International Conference on Machine Learning and Cybernetics, Kunming, 12-15 July 2008, 978-1-4244-2096-4/08 ©2008 IEEE.

[13] Sridhar P. Arjunan, Dinesh K. Kumar (School of Electrical and Computer Engineering), "Recognition of facial movements and hand gestures using surface Electromyogram (sEMG) for HCI based applications," 0-7695-3067-2/07 © 2007 IEEE.

[14] T. Nakano, T. Morie, M. Nagata, and A. Iwata, "A Cellular-Automaton-Type Image Extraction Algorithm and Its Implementation Using an FPGA," 0-7803-7690-0/02/$17.00 ©2002 IEEE.


First Author:
Ms. Nutan D. Sonwane
IV sem MTech[CSE],
G.H.Raisoni College of Engineering,Nagpur,
R.T.M.N.U, Nagpur

Second Author :
Prof. S.A. Chhabria
HOD[IT] Department,
G.H.Raisoni College of engineering,Nagpur
R.T.M.N.U, Nagpur

Third Author :
Dr. R.V.Dharaskar
Director of Matoshri Pratishthan's Group of Institutions
MPGI Integrated campus, Nanded India
S.R.T.M Nanded University
