SOM Winner-Searching Algorithm Based on Subset of Most Discriminate Codebooks

Tarek El.Tobely*, Yuichiro Yoshiki**, Ryuichi Tsuda**, Naoyuki Tsuruta**, and Makoto Amamiya*

*Department of Intelligent Systems, Graduate School of Information Science and Electrical Engineering, Kyushu University, 6-1, Kasuga-Koen, Kasuga, Fukuoka 816-8580 JAPAN
**Department of Electronics Engineering, Graduate School of Electronics Engineering, Fukuoka University, 8-19-1, Nanakuma, Jonan, Fukuoka, 814-0180 JAPAN

Abstract: - In the winner-searching algorithm of Self-Organized Map (SOM) networks, the distance between the input and the codebook of every feature map neuron is measured, and the neuron with the minimum distance is taken as the winner. In large applications, however, this algorithm takes a very long time to find the winner, and sometimes it also fails to identify the winner category correctly. In this paper, a new winner-searching algorithm for SOM networks is proposed for gesture recognition applications. Such applications require the recognition time of one image to fall within the normal video camera rate (30 frames/second) and require high recognition accuracy, especially for complex image scenes under different lighting conditions. The competition in the new algorithm depends on a subset of the most discriminate weights of the network codebooks. In addition, to allow the recognition of complex images under different lighting conditions, the competition is applied only to those pixels whose values fall within the skin gray levels. The experimental results showed that the proposed algorithm could recognize the gestures in complex image scenes where the normal SOM algorithm failed. Moreover, the recognition time of one image was 92% lower than with the normal SOM winner-searching algorithm, which allowed real-time recognition at the normal video camera rate.
Key-Words: - Neural Network, Self-Organized Map, Competitive Learning, Winner Searching, Computer-Human Interaction, and Gesture Recognition.

1 Introduction

The self-organizing map (SOM) is a neural network algorithm that has been used for a variety of applications, mostly for engineering problems but also for data analysis [1-3]. SOM transforms an input of arbitrary dimension into a one- or two-dimensional discrete map subject to a topological (neighborhood preserving) constraint. The essential advantage of this network lies in the clustering produced by its algorithm, which reduces the input space into representative features using a self-organizing process. Hence the underlying structure of the input space is kept, while the dimensionality of the space is reduced. The SOM feature map is constructed using the Kohonen algorithm [4], where a set of vectors is input repeatedly to a map consisting of units. Associated with each unit is a codebook vector, initially consisting of random values. The unit with the highest response to the input is allowed to learn, as well as some units in its neighborhood. The neighborhood decreases in size during the training period. Learning is done by adjusting the weights of the units by a small amount so that they resemble the input vectors more closely.

The computations required to find the winner in SOM depend mainly on the network size. As the size increases, the network takes longer to find the winner. Because of that, SOM cannot be used effectively in dynamic recognition applications, which require a short recognition time. If the number of feature map neurons is m and the size of the input is n, then the number of steps required to find the winner is approximately m * n.

In this paper, it is proposed to use SOM in an image-based gesture recognition application for the task of quantizing the continuous gesture scene into discrete posture states. To accomplish this, some modifications to the SOM winner-searching algorithm are required to fulfill three requirements. First, to have real-time recognition, the recognition time of one image should be reduced into the range of the normal video rate (30 frames/second). Second, the normal SOM winner-searching algorithm has no guarantee of delivering the correct winner for input images with a complex background. Finally, the winner-searching algorithm should be insensitive to the lighting conditions of the input images. Achieving these requirements is the main motivation for modifying the winner-searching algorithm of SOM.

In our proposed algorithm, a subset of the codebook of the feature map neurons is selected according to off-line statistical computations. After that, the input image is tested to find the range of skin gray levels of the human hand. Finally, the competition between the pre-selected subset of each neuron's codebook and the corresponding pixels in the input image is applied only if the gray level of the input pixel lies in the range of skin gray levels defined in advance; otherwise the pixel is ignored in the competition.

The experiment was applied to recognize six gesture images of the Japanese Jan-Ken-Pon game. The proposed algorithm showed excellent performance compared to the normal SOM winner-searching algorithm, which could not recognize complex image scenes. In addition, the recognition time of one image was reduced into the range of normal video rates, which shows the feasibility of real-time recognition of input images taken directly from a video camera. Our experimental results are far better than the results in [5,6], which are concerned only with reducing the winner-searching time and pay no attention to the recognition accuracy.

In the next section, the requirements for selecting the competition subset are presented. In section 3, the use of skin gray levels to find the object pixels is explained. Then, our proposed SOM winner-searching algorithm is shown in section 4. In section 5, the image-based gesture recognition system is discussed. Then, in section 6, the experimental results for recognizing dynamic hand gestures of the Japanese Jan-Ken-Pon game are shown. Finally, the conclusion and discussion are reported in section 7.

2 Subset Selection

As mentioned before, the competition in our proposed searching algorithm depends on a subset {S} of the codebook of all feature map neurons. The selection of this subset is a crucial issue for the recognition time and accuracy: the subset size directly affects the recognition time, and the subset elements affect the recognition accuracy.

2.1 Subset Size

The question now arises of how large a sample must be selected. Selecting a larger sample than is required to achieve the desired results wastes recognition time. In our case, the number of pixels in the input image (the population) is statistically large, and the number of pixels in the subset S is calculated as the size of a sample that could estimate the mean of the input image pixels.

The calculation of the sample size depends on the standard deviation of the image pixels, the required sample confidence coefficient, and the sample estimation interval [7]:

$$V = Z \times \frac{\sigma}{\sqrt{K}} \tag{1}$$

where V represents the required estimation interval of the selected sample, Z is the standard normal value corresponding to the required confidence coefficient, σ is the standard deviation of the population, and K is the required sample size. When equation 1 is solved for K, it gives:

$$K = \frac{Z^2 \sigma^2}{V^2} \tag{2}$$

If sampling without replacement from a finite population of size n is required, equation 1 becomes:

$$V = Z \times \frac{\sigma}{\sqrt{K}} \sqrt{\frac{n - K}{n - 1}} \tag{3}$$

which, when solved for K, gives:

$$K = \frac{n Z^2 \sigma^2}{V^2 (n - 1) + Z^2 \sigma^2} \tag{4}$$

Now, the Pixel Usage Ratio (PUR) represents the ratio of the subset size K to the total number of image pixels n and can be calculated as:

$$PUR = \frac{K}{n} \tag{5}$$

For a normally distributed population, the best choices for the estimation interval and the confidence coefficient are 5 and 0.95, respectively. Looking up this value of the confidence coefficient in the table of normal curve areas yields Z = 1.96.

The above equations are also valid if the sample is selected from a non-normal population, since the central limit theorem states that for large samples the distribution of the sample mean is approximately normal regardless of how the parent population is distributed. In this case, it is recommended to decrease the estimation interval to 2, which increases the sample size for the same standard deviation and confidence coefficient.

2.2 Subset Elements

The selection of the elements in the subset {S} is very important for the recognition accuracy. In [8], this subset was selected randomly. The results showed a very short recognition time; nevertheless, the random subset is not reliable for dynamic recognition. It may happen that pixels at positions common to several poses in the network feature map are selected, which causes incorrect winner selection. Therefore, it is required to select this subset in advance. The selection process can be done from the input image point of view or from the feature map point of view. In the former, the selection must be applied on-line to determine the object pixels in the input image, which adds overhead to the recognition time. In the latter, the selection is applied off-line based on a statistical analysis of the network feature map, so there is no overhead. Although the former technique is more robust, the latter is preferred because the expected overhead of on-line subset selection cannot support dynamic image recognition.

The question now is how we can know that a certain weight is more important for the competition than another. Simply, the most discriminate weights have more significance for the competition. These are the weights with high standard deviations. Therefore, it is proposed to calculate the standard deviation of the weights in the same codebook position over all feature map neurons. Then the elements in S are selected from the weights with maximum standard deviation.

Now, suppose that the SOM feature map is constructed using the traditional Kohonen algorithm. To determine which pixels will be used in the competition, we can apply the following off-line computations.

First, for each weight position i, calculate the average over all codebooks:

$$A_i = \frac{1}{m} \sum_{k=1}^{m} \mu_{ik} \tag{6}$$

where µ_ik is the value of weight i in neuron k and m is the number of feature map neurons.

Second, the standard deviation of each weight is calculated:

$$sd_i = \sqrt{\frac{1}{m} \sum_{k=1}^{m} (\mu_{ik} - A_i)^2} \tag{7}$$

Third, the ordered set OS is maintained, which contains ordered pairs of the standard deviations in descending order and their original positions (i) in the codebook:

$$OS = \{\, (sd_j, i) \mid sd_j > sd_{j+1};\ 1 \le j \le m - 1 \,\} \tag{8}$$

Finally, the subset S is an ordered set containing the original position index (i) of the highly deviated weights in descending order. This means that the first element in S points to the weight with maximum deviation in the feature map, and the standard deviation of the weight pointed to by element number (l) is greater than the standard deviation of the weight pointed to by element number (l+1):

$$S = \{\, i \mid (sd_j, i) \in OS;\ 1 \le i \le m \,\} \tag{9}$$

Now, the set S contains the position indices of the highly deviated pixels. For the competition, the first K elements in {S} are selected.

3 Skin Gray Level

Using a subset of the feature map codebook ensures the reduction of the recognition time. In addition, selecting this subset from the most discriminate codebooks increases the recognition accuracy. However, there is no guarantee that the algorithm can recognize complex image scenes. In this case, it is required to apply the competition based on object pixels only; in other words, it is required to filter the object pixels out from the whole image. In hand gesture recognition applications this task seems to be simple, since the image object is always the human hand, which has a fixed skin gray level range for every person. So, it is proposed to check the range of the skin gray level of the hand and then apply the competition based only on those pixels that lie in this range. This idea is quite simple and has three main advantages. First, it adds no overhead to the competition time. Second, it can easily be integrated with the SOM competition equation. Third, it allows the recognition of images taken under different lighting conditions, where the skin gray level range changes when the lighting conditions change. Nevertheless, testing the skin gray level is not a difficult task and can be implemented either automatically or manually. The experimental results will show how much this idea increased the recognition accuracy.

4 Winner Searching Algorithm

The new competition algorithm for SOM is proposed for the recognition phase only, while the normal Kohonen algorithm is used to create the feature map during the learning phase. The new algorithm pursues our main target of reducing the recognition time of one image into the range of the normal video camera rate. The recognition accuracy is also considered in this algorithm, since recognizing images with a complex background, with multiple objects, and taken under different lighting conditions is a crucial issue for image recognition applications.

Our proposed winner-searching algorithm is dubbed the Most Discriminate Codebook Competition Algorithm, or MDCCA for short. Before starting the recognition by MDCCA, the following off-line computations must be applied:

- Divide the network feature map into contiguous sets of equal-size clusters.
- From each cluster, select one neuron, usually in its center, as the cluster representative.
- From equations (1-5), find the value of K.
- From equations (6-9), find the elements in {S}.
- Test for the range of the Skin Gray Level (SGL) of the input object.

The first step is applicable to SOM because it maps similar inputs to contiguous locations on the network feature map. Therefore, it is possible to divide the feature map into subsets of contiguous clusters, and the neurons with similar codebooks will belong to the same cluster.

After that, the following on-line steps of the MDCCA algorithm are proposed:

1. Apply the competition between the cluster representative neurons with the following rules:
   - A subset of size K is selected from the input image pixels and the corresponding codebooks in the feature map.
   - Use the elements in {S} as position indices to select the subset from the image pixels and the corresponding codebooks in the feature map.
   - If the gray level of the selected pixel in the input image does not belong to SGL, exclude this pixel from the competition.
2. The cluster of the selected winner is considered as the cluster candidate.
3. Again, apply the competition between all the neurons in the cluster candidate based on the same rules as in the first step.
4. The winner selected from this competition is called the winner candidate.
5. Finally, apply the competition between the set of neurons neighboring the winner candidate based on the same rules as in the first step.
6. The selected winner is considered as the final winner.

The competition in the first, third, and fifth steps uses the same subset of the codebook. The set of competing neurons changes from step to step. In the first step, the competition between the cluster representative neurons is applied. Then, the cluster of the selected winner is considered as the cluster candidate. In the third step, the cluster candidate neurons compete. The winner from this competition is considered as the winner candidate. Finally, in the fifth step, the competition between the set of neurons neighboring the winner candidate is applied. The main purpose of this step is to find the correct winner, especially in the cases when the gesture images transit from one posture position to another. In this case, the winner may lie in the boundary area between the two clusters mapping these postures. The size of the neighborhood range should be selected as the number of neurons in the boundary between any two clusters. This size of course changes from application to application. However, in image recognition applications, the codebook of each neuron shows the exact image it codes. Therefore, it is very simple to find the required size of the neighborhood range just by displaying the codebooks of all the neurons as images.

In the off-line computations of MDCCA, the feature map is divided into equal-size clusters. The selection of the cluster size is crucial for the recognition time. The recognition time of one image can be formulated as a function of the cluster size as:

$$T(c) = PUR \times \left(\frac{m}{c} + c\right) + NR + a \tag{10}$$

where c is the cluster size, NR is the neighborhood range, and a is a constant.

To find the value of c for minimum recognition time, differentiate the above equation with respect to c, considering that PUR and NR are constants, and set the result equal to zero, that is, $PUR \times (1 - m/c^2) = 0$, which yields:

$$c = \sqrt{m} \tag{11}$$

Therefore, for minimum recognition time, the cluster size should be selected as the square root of the number of feature map neurons.

5 Image Based Gesture Recognition

Hand gesture is one of the most expressive means of communication for physically impaired people. However, due to the complex dynamic nature of gestures, most research has focused on static gestures [9]. If real-time recognition is considered, the recognition complexity increases. In this work, an image-based gesture recognition system is used to recognize different hand gestures using a SOM network, where each gesture is treated as a set of consecutive postures. These postures are used in constructing the feature map of the SOM network.

Fig. 1. Gesture recognition system: discrete posture recognition by SOM network, then gesture definition by pattern matching algorithm.

Using SOM in gesture recognition has several merits. First, the posture image templates are created automatically by the Kohonen competition algorithm. Second, the network decides by itself the size of each posture cluster. Third, the neuron codebooks change smoothly and gradually in the boundary between any two clusters. This resembles the natural change of the hand movement from one posture to another. Finally, the competition equation of SOM is simple, which allows alleviating the common problems associated with gesture recognition applications without any overhead. Fig. 1 shows a complete gesture system divided into two stages. The first converts the input image sequence into discrete posture states, and the second applies a pattern matching algorithm to each discrete posture sequence to give the gesture meaning of the input images. The first stage can be implemented using a SOM network, where the network can be constructed to recognize the discrete postures of all gestures.

6 Experimental Results

The MDCCA algorithm was applied to recognize the hand gestures of the Jan-Ken-Pon game. The game includes three postures for three hand positions. These postures are called PAA, CHOKI, and GUU, as shown in figure 2, respectively. To construct the feature map, different images of each posture are used. The learning images are collected from different persons under the same lighting condition with 120*160 pixels and 256 gray levels. The feature map of the constructed network divides into three groups, one for each posture. The codebooks of the neurons in each group are coded very similarly to the images of its posture.

Fig. 2: An example of the learning data showing the three posture images of the Jan-Ken-Pon game (PAA, CHOKI, GUU).

The experiments were run on an Alpha 21164A / 600 MHz processor, with the gcc compiler without optimization. The input images are given as a sequence of gestures. When the input images are simple (without any background) and given under the same lighting conditions as the training images, the normal winner-searching algorithm of SOM could recognize the postures correctly. However, the recognition time of one image was 0.124 second, which cannot support dynamic gesture recognition for input images taken directly from a video camera. For complex images, the normal winner-searching algorithm failed to find the correct winner and always recognized any input image as PAA. Note that the number of object pixels of the PAA posture is greater than that of the other postures.

Recognition with MDCCA first requires applying the off-line computations to find the cluster representatives, K, S, and SGL. In our case, SGL was selected empirically. Fig. 3 shows a complex input image in PAA position before and after filtering the object with the SGL condition. The white (black) pixels are the pixels with gray level less (greater) than the range of SGL in the original image. The SGL used in this image was (50-130).

Fig. 3: Complex input image in PAA position before and after filtering the object with the SGL condition.

To find PUR, the standard deviations of the input images must be tested. As in equations 2 and 4, the subset size increases with the standard deviation of the population. The standard deviation of the most complicated images in our experiment was 50 with normal distribution. The sample S is selected as a simple random sample without replacement. From equation 4, using V=4 and Z=1.96, the sample size K should be equal to 582. Therefore, for 100% accuracy, the PUR should equal 0.03, which means that only 3% of the input image pixels are required. This in fact coincides with the experimental results.

The images in figure 4 show an example of more complex images taken under different lighting conditions.

Fig. 4: Example of complex input images taken under different lighting conditions.

The recognition time of one image by MDCCA was equal to 0.01 second. This value is 92% less than the recognition time of the normal winner-searching algorithm and allows recognition at the normal video camera rate.

7 Conclusions and recommendations

In this paper, the MDCCA algorithm is proposed as a new winner-searching algorithm for Self-Organized Map networks. The competition in MDCCA depends mainly on a subset of the most discriminate codebooks. MDCCA is most applicable to applications that require a large input size, such as image recognition. Here, it is proposed to apply it to hand gesture recognition. MDCCA has several merits over the traditional SOM winner-searching algorithm. First, its recognition time is 92% less than that of the traditional algorithm. Second, it reduced the recognition time of one image into the range of normal video rates, which allows on-line image recognition. Third, it allows the recognition of complex image scenes. Fourth, it allows the recognition of images taken under different lighting conditions. Finally, its recognition time is not directly proportional to the network size as in the normal algorithm: the size of the subset selected from the input image depends mainly on its standard deviation, and the number of neurons used in the competition depends on the square root of the number of feature map neurons (equation 11), not on all the neurons as in the traditional algorithm.

The implementation of MDCCA to recognize dynamic hand gestures of the Japanese Jan-Ken-Pon game is given. The algorithm showed excellent performance from the time and accuracy points of view. Also, its recognition time was reduced to 0.01 second, which allows the recognition of more than 30 frame images per second.

For gesture recognition, MDCCA has an important defect: it cannot recognize the gesture correctly if the object position is shifted from its original position in the training images. Overcoming this defect is left as our future research.

References:

[1] T. Kohonen, E. Oja, O. Simula, A. Visa, and J. Kangas, Engineering applications of the self-organizing map, Proceedings of the IEEE, 1996, 84:1358-1384.
[2] B. Back, K. Sere, and H. Vanharanta, Data mining accounting numbers using self-organizing maps, In Proceedings of STeP'96, Finnish Artificial Intelligence Conference, Vaasa, Finland, 1996, pp 35-47.
[3] S. Garavaglia, A self-organizing map applied to macro and micro analysis of data with dummy variables, In Proceedings of WCNN'93, World Congress on Neural Networks, Hillsdale, NJ., 1993, pp 362-368.
[4] T. Kohonen, Self-organized formation of topologically correct feature maps, Biological Cybernetics, 1982.
[5] S. Kaski, Dimensionality reduction by random mapping: fast similarity computation for clustering, Proceedings of IJCNN'98, International Joint Conference on Neural Networks, Anchorage, Alaska, 1998, Vol. 1, pp. 413-418.
[6] S. Kaski, Fast Winner Selection for SOM-based Monitoring and Retrieval of High Dimension Data, In Proceedings of ICANN'99, Ninth International Conference on Artificial Neural Networks, Edinburgh, UK, 1999.
[7] H. Frank and S. Althoen, Statistics, Concepts and Applications, Cambridge University Press, 1994.
[8] T. El.Tobely, Y. Yoshiki, R. Tsuda, N. Tsuruta, and M. Amamiya, Randomized-self organizing maps and its applications, In the proceedings of the international conference of soft computing, Iizuka, Japan, 2000, pp 207-214.
[9] W. Freeman and M. Roth, Orientation histograms for hand gesture recognition, International workshop on automatic face- and gesture-recognition, IEEE Computer Society, Zurich, Switzerland, June 1995.
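
As an illustration of the subset-size calculation in Section 2.1, the following minimal Python sketch evaluates equations (2), (4), and (5) with the values reported in Section 6 (standard deviation 50, V = 4, Z = 1.96, images of 120*160 pixels). The function names `sample_size` and `pixel_usage_ratio` are hypothetical and are not taken from the original implementation.

```python
from typing import Optional


def sample_size(sigma: float, z: float, v: float, n: Optional[int] = None) -> float:
    """Required sample size: equation (2), or equation (4) when a finite population n is given."""
    z2s2 = (z * sigma) ** 2
    if n is None:
        # Equation (2): sampling from an effectively infinite population.
        return z2s2 / v ** 2
    # Equation (4): sampling without replacement from a population of n pixels.
    return n * z2s2 / (v ** 2 * (n - 1) + z2s2)


def pixel_usage_ratio(k: float, n: int) -> float:
    """Equation (5): fraction of image pixels that take part in the competition."""
    return k / n


n_pixels = 120 * 160  # image size used in Section 6
k = sample_size(sigma=50.0, z=1.96, v=4.0, n=n_pixels)
pur = pixel_usage_ratio(k, n_pixels)
print(round(k), round(pur, 2))  # 582 0.03
```

With these inputs the finite-population formula gives a subset of about 582 pixels, a PUR of roughly 0.03, in line with the figures quoted in Section 6.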
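
The off-line subset selection of Section 2.2 and the skin-gray-level rule of Section 3 can be sketched as follows. This is only an illustrative sketch under stated assumptions: codebooks are plain lists of gray levels, the distance is the squared Euclidean distance (the paper does not state the distance measure), and the names `discriminative_subset` and `find_winner` are hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple


def discriminative_subset(codebooks: Sequence[Sequence[float]], k: int) -> List[int]:
    """Positions of the k weights with the largest standard deviation over all neurons."""
    m = len(codebooks)
    pairs = []
    for i in range(len(codebooks[0])):
        mean_i = sum(cb[i] for cb in codebooks) / m                  # equation (6)
        var_i = sum((cb[i] - mean_i) ** 2 for cb in codebooks) / m
        pairs.append((math.sqrt(var_i), i))                          # equation (7)
    # Ordered set OS (equation 8): descending deviation, ties broken by position.
    pairs.sort(key=lambda p: (-p[0], p[1]))
    return [i for _, i in pairs[:k]]                                 # first K elements of S


def find_winner(image: Sequence[float], codebooks: Sequence[Sequence[float]],
                subset: Sequence[int], sgl: Tuple[float, float],
                candidates: Optional[Sequence[int]] = None) -> int:
    """Winner among the candidate neurons, using only subset pixels inside the SGL range."""
    lo, hi = sgl
    used = [i for i in subset if lo <= image[i] <= hi]  # skin-gray-level filter
    if candidates is None:
        candidates = range(len(codebooks))
    # Assumed distance: squared Euclidean over the surviving positions.
    return min(candidates,
               key=lambda j: sum((image[i] - codebooks[j][i]) ** 2 for i in used))


# Toy feature map: 3 neurons, 6 weights each; only positions 1, 3, and 4 vary.
codebooks = [
    [10, 100, 60, 200, 90, 20],
    [10, 120, 60, 190, 60, 20],
    [10, 80, 60, 210, 120, 20],
]
subset = discriminative_subset(codebooks, k=3)
image = [250, 118, 255, 0, 62, 240]  # pixel 3 lies outside the SGL range (50-130)
winner = find_winner(image, codebooks, subset, sgl=(50, 130))
print(subset, winner)  # [4, 1, 3] 1
```

In this toy example position 3 is selected off-line but dropped on-line because its input pixel falls outside the SGL range, so the winner is decided by positions 4 and 1 alone.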
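
The on-line steps of MDCCA in Section 4 can be sketched in the same spirit. The sketch below makes several simplifying assumptions that are not in the paper: the feature map is one-dimensional, clusters are consecutive index ranges of size round(sqrt(m)) following equation (11), the neighborhood of a neuron is its index plus or minus `nr`, and the distance is squared Euclidean. The names `masked_distance` and `mdcca_winner` are hypothetical.

```python
import math
from typing import Iterable, Sequence, Tuple


def masked_distance(image: Sequence[float], codebook: Sequence[float],
                    subset: Sequence[int], sgl: Tuple[float, float]) -> float:
    """Assumed squared Euclidean distance over subset pixels that lie in the SGL range."""
    lo, hi = sgl
    return float(sum((image[i] - codebook[i]) ** 2
                     for i in subset if lo <= image[i] <= hi))


def mdcca_winner(image: Sequence[float], codebooks: Sequence[Sequence[float]],
                 subset: Sequence[int], sgl: Tuple[float, float], nr: int) -> int:
    """Three-stage search: cluster representatives, candidate cluster, neighborhood."""
    m = len(codebooks)
    c = max(1, round(math.sqrt(m)))  # cluster size from equation (11)

    def best(cands: Iterable[int]) -> int:
        return min(cands, key=lambda j: masked_distance(image, codebooks[j], subset, sgl))

    # Step 1: competition between the representatives (center neuron of each cluster).
    reps = [min(start + c // 2, m - 1) for start in range(0, m, c)]
    rep_winner = best(reps)
    # Steps 2-4: competition inside the cluster of the winning representative.
    start = (rep_winner // c) * c
    candidate = best(range(start, min(start + c, m)))
    # Steps 5-6: competition among the neighbors of the winner candidate.
    return best(range(max(0, candidate - nr), min(m, candidate + nr + 1)))


# Toy 1-D map with 9 neurons whose two weights change gradually along the map.
codebooks = [[50 + 10 * j, 130 - 10 * j] for j in range(9)]
image = [98, 82]
winner = mdcca_winner(image, codebooks, subset=[0, 1], sgl=(50, 130), nr=1)
exhaustive = min(range(9),
                 key=lambda j: masked_distance(image, codebooks[j], [0, 1], (50, 130)))
print(winner, exhaustive)  # 5 5
```

On this small, smoothly ordered map the staged search returns the same neuron as an exhaustive search. On a real two-dimensional feature map the cluster layout and the neighborhood range would have to be chosen as described in Section 4.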
