ISSN: 2277 – 9043
International Journal of Advanced Research in Computer Science and Electronics Engineering
Volume 1, Issue 2, April 2012

Self Organizing Markov Map for Speech and Gesture Recognition

Ms. Nutan D Sonwane, Prof. S. A. Chhabria, Dr. R. V. Dharaskar

Abstract: Gesture- and speech-based human-computer interaction is attracting attention across areas such as pattern recognition and computer vision. This research finds many applications in multimodal HCI, robot control, and sign language recognition. This paper presents a head and hand gesture and speech recognition system for human-computer interaction (HCI). Such a vision-based system shows that a computer can understand and respond to hand and head gestures as well as to speech in the form of sentences. The recognition system consists of two main modules: 1. gesture recognition and 2. speech recognition. Gesture recognition consists of the phases i. image capturing, ii. feature extraction of the gesture, iii. gesture modeling (direction, position, generalized). Speech recognition consists of the phases i. taking voice signals, ii. spectral coding, iii. unit matching (BMU), iv. lexical decoding, v. syntactic and semantic analysis. Compared with many existing algorithms for gesture and speech recognition, SOM provides flexibility and robustness in noisy environments. The detection of gestures is based on discrete predefined symbol sets, which are manually labeled during the training phase. The gesture-speech correlation is modelled by examining the co-occurring speech and gesture patterns. This correlation can be used to fuse gesture and speech modalities for edutainment applications (e.g. video games, 3-D animations) where natural gestures of talking avatars are animated from speech. A speech-driven gesture animation example has been implemented for demonstration.

Keywords: Gesture recognition, human computer interaction, speech recognition, self organizing map and Markov model

I INTRODUCTION

This paper presents a head and hand gesture and speech recognition system for human-computer interaction (HCI). Such a vision-based system shows that a computer can understand and respond to hand and head gestures and to speech in the form of sentences. The recognition system consists of four modules: 1. manual module, 2. head tracker, 3. hand recognition, 4. voice recognition. Together they handle various symbolic gesture commands and voice commands. Gesture recognition consists of the phases i. image capturing, ii. feature extraction of the gesture, iii. gesture modeling (direction, position, generalized). Speech recognition consists of the phases i. taking voice signals, ii. spectral coding, iii. best matching unit (BMU), iv. lexical decoding, v. syntactic and semantic analysis.

The approach combines a self-organizing map (SOM) with a Markov model. Its most effective application is the development of robust and friendly interfaces for human-machine interaction, since gesture and speech are a natural and powerful way of communication. The principal component analysis (PCA) approach describes a method for gesture recognition; it is a classical feature extraction technique widely used in pattern recognition and computer vision [1]. Gesture recognition using the PCA algorithm involves two phases: a training phase and a recognition phase. Support vector machines are another classifier used for this task. PCA itself is a classical statistical technique for analyzing the covariance structure of multivariate data. The Self-Growing and Self-Organized Neural Gas (SGONG) network [2] is an unsupervised neural classifier. It clusters the input data so that the distance between data items within the same class (intra-cluster variance) is small and the distance between data items from different classes (inter-cluster variance) is large. The final number of classes is determined by the SGONG during the learning process. [3] describes a self-organizing map (SOM) method for speech recognition. The modular system in [4] describes a layered method based on the hidden Markov model (HMM). The SOMM architecture for gesture recognition fuses separate component models, all of which are based on the hand trajectory. The approach combines self-organizing maps and Markov models [5] for gesture trajectory classification, using the trajectory of the hand segment and the direction of motion during a gesture. This classification scheme transforms a gesture representation from a series of coordinates and movements into a symbolic form and builds probabilistic models on these transformed representations. Automatic speech recognition [6] is a process by which a machine identifies speech. The machine takes a human utterance as input and returns a string of words, phrases, or continuous speech in the form of text as output. Gesture and speech are a natural and powerful way of communication [2][3][4][6].
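The transformation from a series of coordinates to a symbolic form described above can be illustrated with a small sketch. It is not the SOMM method of [5], which derives its symbols from self-organizing maps. As a simpler stand-in it assumes a fixed codebook of eight direction bins; the function name `direction_symbols` and the bin count are hypothetical choices made only for illustration.

```python
import math


def direction_symbols(points, n_bins=8):
    """Turn a hand trajectory into a list of direction symbols.

    points is a list of (x, y) positions. Each movement between two
    consecutive points is mapped to a bin index in 0..n_bins-1, where
    bin 0 is centred on the positive x axis and bins advance
    counter-clockwise in mathematical (y-up) coordinates.
    """
    width = 2 * math.pi / n_bins
    symbols = []
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        dx, dy = x1 - x0, y1 - y0
        if dx == 0 and dy == 0:
            continue  # the hand did not move, so there is no direction
        angle = math.atan2(dy, dx) % (2 * math.pi)
        # Shift by half a bin so that each bin is centred on its direction.
        symbols.append(int((angle + width / 2) // width) % n_bins)
    return symbols


if __name__ == "__main__":
    # A square traced right, up, left, down.
    square = [(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)]
    print(direction_symbols(square))  # [0, 2, 4, 6]
```

A symbol sequence of this kind is the sort of discrete input on which a probabilistic model such as a Markov model can then be trained. With image coordinates, where y grows downward, the up and down bins would be swapped.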
Compared with many existing algorithms for gesture and speech recognition, SOMM (Self Organizing Markov Map) provides flexibility and robustness in noisy environments.

Figure 1: Symbolic Hand Gesture

All Rights Reserved © 2012 IJARCSEE

II SELF ORGANIZING MAP

A self-organizing map, or self-organizing feature map [3], is a type of artificial neural network that is trained using unsupervised learning to produce a low-dimensional (typically two-dimensional), discretized representation of the input space of the training samples, called a map. Self-organizing maps differ from other artificial neural networks in that they use a neighborhood function to preserve the topological properties of the input space. Training builds the map using input examples; it is a competitive process, also called vector quantization. Mapping automatically classifies a new input vector. A self-organizing map consists of components called nodes or neurons. The SOM procedure has three stages: 1) initialization, 2) getting the best matching unit, 3) scaling the neighbors.

Step I: Initialization

Initialize the weight vector map: each weight vector receives random values for its data. Before training, initial values are given to the prototype vectors. The SOM is very robust with respect to the initialization, but a properly chosen initialization allows the algorithm to converge faster to a good solution. Typically one of the following three initialization procedures is used:

1. Random initialization, where the weight vectors are initialized with small random values.
2. Sample initialization, where the weight vectors are initialized with random samples drawn from the input data set.
3. Linear initialization, where the weight vectors are initialized in an orderly fashion along the linear subspace spanned by the two principal eigenvectors of the input data set. The eigenvectors can be calculated using the Gram-Schmidt procedure.

In SOM Toolbox, random and linear initializations have been implemented. Random initialization takes random values from the d-dimensional cube defined by the minimum and maximum values of the variables. Linear initialization selects a mesh of points from the d-dimensional min-max cube of the training data. The axes of the mesh are the eigenvectors corresponding to the m greatest eigenvalues of the training data.

Step II: Get best matching unit

Go through all the weight vectors and calculate the distance from each weight to the chosen sample vector. The weight with the shortest distance is the winner. If more than one weight has the same shortest distance, the winning weight is chosen randomly among them. The most common method is to use the Euclidean distance. The distances are calculated and compared over the entire map, and the weight with the shortest distance to the sample vector is the winner, the best matching unit (BMU). The square root is not computed in the program, for speed optimization.

Step III: Scale neighbors

1) Determining neighbors

There are two parts to scaling the neighboring weights: determining which weights are considered neighbors, and determining how much each weight can become more like the sample vector. The neighbors of a winning weight can be determined using a number of different methods. Some use concentric squares, others hexagons.

2) Learning

Learning in the self-organizing map causes different parts of the network to respond similarly to certain input patterns. The second part of scaling the neighbors is the learning function. The winning weight is rewarded by becoming more like the sample vector. The neighbors also become more like the sample vector. An attribute of this learning process is that the farther a neighbor is from the winning vector, the less it learns. The amount a weight can learn decreases over time, and the rate of this decrease can also be set. A Gaussian function is used here. It returns a value t between 0 and 1, and each neighbor is then changed using the parametric equation. In the first iteration, the best matching unit gets a t of 1 from its learning function, so its weight comes out of this process with exactly the same values as the randomly selected sample.

III HIDDEN MARKOV MODEL

A hidden Markov model (HMM) is a statistical Markov model [4] in which the system being modeled is assumed to be a Markov process with unobserved (hidden) states. An HMM can be considered the simplest dynamic Bayesian network. In a regular Markov model, the state is directly visible to the observer, and therefore the state transition probabilities are the only parameters. In a hidden Markov model, the state is not directly visible, but the output, which depends on the state, is visible. Each state has a probability distribution over the possible output tokens. Therefore the sequence of tokens generated by an HMM gives some information about the sequence of states. The parameters of a hidden Markov model are of two types: 1. transition probabilities and 2. emission probabilities (also known as output probabilities). The transition probabilities control the way the hidden state at time t is chosen given the hidden state at time t − 1. The hidden state space is assumed to consist of one of N possible values, modeled as a categorical distribution. This means that for each of the N possible states that a hidden variable at time t can be in, there is a transition probability from this state to each of the N possible states of the hidden variable at time t + 1, for a total of N² transition probabilities. (Note, however, that the set of transition probabilities for transitions from any given state must sum to 1, meaning that any one transition probability can be determined once the others are known, leaving a total of N(N − 1) transition parameters.)

Hidden Markov models can model complex Markov processes where the states emit the observations according to some probability distribution. One example is the Gaussian distribution; in such a hidden Markov model the output of each state is represented by a Gaussian distribution. HMMs are used with several techniques: 1) the forward and backward algorithms, 2) the Viterbi algorithm and the posterior algorithm, 3) the Baum-Welch algorithm.
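As an illustration of the Viterbi algorithm named above, the following sketch decodes the most likely hidden state sequence for a small discrete HMM. The two-state model and all of its probabilities are hypothetical values chosen for the example; they are not parameters of the system described in this paper.

```python
def viterbi(start, trans, emit, observations):
    """Return (best_path, probability) for a discrete HMM.

    start[i]    : probability of starting in state i
    trans[i][j] : probability of moving from state i to state j
    emit[i][k]  : probability that state i emits symbol k
    """
    n_states = len(start)
    # The transitions out of any given state must sum to 1.
    for row in trans:
        assert abs(sum(row) - 1.0) < 1e-9

    # delta[i] is the probability of the best path that ends in state i.
    delta = [start[i] * emit[i][observations[0]] for i in range(n_states)]
    backpointers = []

    for symbol in observations[1:]:
        new_delta, pointers = [], []
        for j in range(n_states):
            # Best predecessor of state j at this time step.
            best_i = max(range(n_states), key=lambda i: delta[i] * trans[i][j])
            pointers.append(best_i)
            new_delta.append(delta[best_i] * trans[best_i][j] * emit[j][symbol])
        delta = new_delta
        backpointers.append(pointers)

    # Trace the stored predecessors back from the best final state.
    last = max(range(n_states), key=lambda i: delta[i])
    path = [last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, delta[last]


if __name__ == "__main__":
    start = [0.6, 0.4]                           # hypothetical values
    trans = [[0.7, 0.3], [0.4, 0.6]]             # N = 2, so N(N - 1) = 2 free parameters
    emit = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]    # three output symbols
    path, prob = viterbi(start, trans, emit, [0, 1, 2])
    print(path, round(prob, 5))  # [0, 0, 1] 0.01512
```

For long observation sequences the products become very small, so a practical version would usually work with log probabilities instead.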
III. ALGORITHM

Figure 2: Self Organizing Map

Kohonen algorithm:

Step 1. Randomize the weight vectors of the map's nodes.
Step 2. Grab an input vector.
Step 3. Traverse each node in the map:
   i) Use the Euclidean distance formula to find the similarity between the input vector and the node's weight vector.
   ii) Track the node that produces the smallest distance (this node is the best matching unit, BMU).
Step 4. Update the nodes in the neighborhood of the BMU by pulling them closer to the input vector.
Step 5. Increase t and repeat from Step 2.

The Markov model includes several algorithms. The Viterbi algorithm is used to find the most likely sequence of hidden states, called the Viterbi path. The Baum-Welch algorithm is used to find the set of state transition and output probabilities of a sequence.

Step 1. The (potentially) occupied state at time t is called q_t.
Step 2. A state can be referred to by its index, e.g. q_t = j.
Step 3. One event corresponds to one state: at each time t, the occupied state outputs ("emits") its corresponding event.

A Markov model is a generator of events. Each event is discrete and has a single output. In a typical finite-state machine, actions occur at transitions, but in most Markov models, actions occur at each state. In a speech recognition system, training takes as input a large number of speech utterances along with their transcriptions into phonemes, and outputs the speech models for the phonemes.

IV HARDWARE COMPONENT AS WHEELCHAIR ROBOT

A wheelchair robot moves according to the commands given to it through various symbolic gesture and voice commands. The system passes the symbolic gesture commands as input to the hardware, which moves accordingly. The wheelchair robot is made up of the following hardware components:

a) Microcontroller
   i. 2K bytes of Flash
   ii. 128 bytes of RAM
   iii. 15 I/O lines
   iv. Two 16-bit timer/counters
   v. A five-vector, two-level interrupt architecture
   vi. A full-duplex serial port
   vii. A precision analog comparator
   viii. On-chip oscillator and clock circuitry
b) Other devices: DC motor, TX-RX antenna, USB-to-serial connector, battery

Figure 3: Wheelchair Robot

Figure 4: Internal circuit of Robot
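The Kohonen steps listed in Section III can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used for the system described here: the linearly decaying learning rate, the shrinking radius, the Gaussian neighborhood on a rectangular grid, and all function names are assumptions made for the example.

```python
import math
import random


def squared_distance(a, b):
    # Squared Euclidean distance. The square root is skipped because it
    # does not change which node is closest.
    return sum((x - y) ** 2 for x, y in zip(a, b))


def find_bmu(weights, sample):
    # weights maps a grid position (row, col) to that node's weight vector.
    # Ties go to the first node found here rather than to a random one.
    return min(weights, key=lambda pos: squared_distance(weights[pos], sample))


def update(weights, sample, bmu, learning_rate, radius):
    # Pull every node toward the sample. The Gaussian factor makes nodes
    # that lie farther from the BMU on the grid learn less.
    for pos in list(weights):
        grid_d2 = (pos[0] - bmu[0]) ** 2 + (pos[1] - bmu[1]) ** 2
        t = learning_rate * math.exp(-grid_d2 / (2.0 * radius ** 2))
        weights[pos] = [w + t * (s - w) for w, s in zip(weights[pos], sample)]


def train(samples, rows, cols, iterations, seed=0):
    rng = random.Random(seed)
    dim = len(samples[0])
    # Step 1: random initialization of the weight vectors.
    weights = {(r, c): [rng.random() for _ in range(dim)]
               for r in range(rows) for c in range(cols)}
    for step in range(iterations):
        remaining = 1.0 - step / iterations          # assumed linear decay
        radius = max(0.5, remaining * max(rows, cols) / 2.0)
        sample = rng.choice(samples)                 # Step 2
        bmu = find_bmu(weights, sample)              # Step 3
        update(weights, sample, bmu, remaining, radius)  # Step 4
    return weights                                   # Step 5 is the loop


if __name__ == "__main__":
    data = [[0.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
    som = train(data, rows=2, cols=2, iterations=100)
    for position in sorted(som):
        print(position, [round(v, 3) for v in som[position]])
```

In the first iteration the learning rate is 1 and the Gaussian factor at the BMU is 1, so the BMU takes on the values of the selected sample, which matches the behavior described in Step III of Section II.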
V MODULES

The system comprises four modules: i. manual mode, ii. head gesture, iii. hand gesture, iv. voice recognition. All of these modules are included in one system, which moves according to the order of the commands given in the form of gestures and speech.

Speech and Gesture Recognition:

Figure 5: Speech and Gesture Recognition

I. Manual mode

Figure 6: Manual mode

II. Head Gesture, III. Hand Gesture

Figure 7: Head Gesture / Hand Gesture

Figure 8: Speech Recognition

These figures show the output of each module, which performs its work according to the command.

CONCLUSION

The proposed system includes both approaches, speech recognition as well as gesture recognition. The system takes input in the form of speech signals and of gestures given as hand and head coordinates. The system also uses a wheelchair as the hardware device for interaction with the system.

REFERENCES

[1] Soloman Raju Kota, J.L Reheja, Ashutosh Gupta, Archna Rathi, Shashikant Sharma, "Principal component analysis for Gesture recognition using systemC," 2009 international conferences in advance technology in communication and computing, 2009 IEEE.
[2] Yean Choon Ham, Yu Shi, "Developing a Smart Camera for Gesture Recognition in HCI Applications," The 13th IEEE International Symposium on Consumer Electronics (ISCE2009), 978-1-4244-2976-9/09/$25.00 ©2009 IEEE.
[3] E. Stergiopoulou and N. Papamarkos, "A New Technique For Hand Gesture Recognition," 1-4244-0481-9/06/ © 2006 IEEE.
[4] Anjali Kalra, Sarbjeet Singh, Sukhvinder Singh, "Speech Recognition," International Journal of Computer Science and Network Security, VOL.10, 2010.
[5] George Caridakis, Kostas Karpouzis, Athanasios Drosopoulos, Stefanos Kollias, "SOMM: Self organizing Markov map for gesture recognition," Pattern Recognition Letters 31, 2010.
[6] WU Song-Lin, CUI Rong-Yi, "Human Behavior Recognition Based on Sitting Postures," 2010 International Symposium on Computer, Communication, Control and Automation, 978-1-4244-5567-6/10/ © 2010 IEEE.
[7] Jagdish Lal Raheja, Radhey Shyam, "Real Time Robotic Hand Control Using Hand Gesture," 978-0-7695-3977-5/10 © 2010 IEEE.
[8] Mr. Chetan A. Burande, Prof. Raju M. Tugnayat, Prof. Dr. Nitin K. Choudhary, "Advanced Recognition Techniques for Human Computer Interaction," 978-1-4244-5586-7/10, 2010 IEEE.
[9] Shuai Jin, Guang-ming Lu, Jian-xun Luo, Wei-dong Chen, Xiao-xiang Zheng, "SOM-based Hand Gesture Recognition for Virtual Interactions," in IEEE International Symposium on Virtual Reality Innovation 2011.
[10] G.R.S Murthy, R.S Jadon, "Hand gesture recognition using neural network," in 2nd International Advance Computing Conference 2010.
[11] M. Ajallooeian, A. Borji, B. N. Araabi, M. Nili Ahmadabadi, H. Moradi, "Fast Hand Gesture Recognition based on Saliency Maps: An Application to Interactive Robotic Marionette Playing," The 18th IEEE International Symposium on Robot and Human Interactive Communication, Toyama, Japan, Sept. 27-Oct. 2, 2009, 978-1-4244-5081-7/09/ ©2009 IEEE.
[12] Wei-Hua Andrew Wang, Chun-Liang Tung, "Dynamic Hand Gesture Recognition Using Hierarchical Dynamic Bayesian Networks Through Low-Level Image Processing," Proceedings of the Seventh International Conference on Machine Learning and Cybernetics, Kunming, 12-15 July 2008, 978-1-4244-2096-4/08 ©2008 IEEE.
[13] Sridhar P. Arjunan, Dinesh K. Kumar, School of Electrical and Computer Engineering, "Recognition of facial movements and hand gestures using surface Electromyogram (sEMG) for HCI based applications," 0-7695-3067-2/07 © 2007 IEEE.
[14] T. Nakano, T. Morie, M. Nagata, and A. Iwata, "A Cellular-Automaton-Type Image Extraction Algorithm and Its Implementation Using An FPGA," 0-7803-7690-0/02/$17.00 ©2002 IEEE.

First Author: Ms. Nutan D. Sonwane, IV sem MTech [CSE], G.H. Raisoni College of Engineering, Nagpur, R.T.M.N.U, Nagpur

Second Author: Prof. S.A. Chhabria, HOD [IT] Department, G.H. Raisoni College of Engineering, Nagpur, R.T.M.N.U, Nagpur

Third Author: Dr. R.V. Dharaskar, Director of Matoshri Pratishthan's Group of Institutions (MPGI) Integrated Campus, Nanded, India, S.R.T.M Nanded University