SOM Winner-Searching Algorithm Based on Subset of Most Discriminate Codebooks
           Tarek El.Tobely*, Yuichiro Yoshiki**, Ryuichi Tsuda**, Naoyuki Tsuruta**,
                                       and Makoto Amamiya*
      *Department of Intelligent Systems, Graduate School of Information Science and Electrical
                                         Kyushu University
                           6-1, Kasuga-Koen, Kasuga, Fukuoka 816-8580
        **Department of Electronics Engineering, Graduate School of Electronics Engineering
                                         Fukuoka University
                            8-19-1, Nanakuma, Jonan, Fukuoka, 814-0180

Abstract: - In the winner-searching algorithm of Self-Organizing Map (SOM) networks, the distance between the input and the codebook of every feature map neuron is measured, and the neuron with the minimum distance is taken as the winner. For large applications, however, this algorithm takes a very long time to find the winner, and it sometimes fails to identify the winner category correctly. In this paper, a new winner-searching algorithm for SOM networks is proposed for gesture recognition applications. Such applications require the recognition time of one image to be reduced to the normal video camera rate (30 frames/second) and the recognition accuracy to be increased, especially for complex image scenes with different lighting conditions. The competition in the new algorithm depends on a subset of the most discriminate weights of the network codebooks. In addition, to allow the recognition of complex images under different lighting conditions, the competition is applied only to those pixels whose gray levels fall in the skin range. The experimental results show that the proposed algorithm recognized the gestures in complex image scenes where the normal SOM algorithm failed. Moreover, the recognition time of one image was 92% lower than with the normal SOM winner-searching algorithm, which allows real-time recognition at the normal video camera rate.

Key-Words: - Neural Network, Self-Organizing Map, Competitive Learning, Winner Searching, Computer-Human Interaction, Gesture Recognition.

1 Introduction
The self-organizing map (SOM) is a neural network algorithm that has been used for a variety of applications, mostly for engineering problems but also for data analysis [1-3]. SOM transforms an input of arbitrary dimension into a one- or two-dimensional discrete map subject to a topological (neighborhood preserving) constraint. The essential advantage of this network lies in the clustering produced by its algorithm, which reduces the input space into representative features using a self-organizing process. Hence the underlying structure of the input space is kept, while the dimensionality of the space is reduced. The SOM feature map is constructed using the Kohonen algorithm [4], where a set of vectors is input repeatedly to a map consisting of units. Associated with each unit is a codebook vector, initially consisting of random values. The unit with the highest response to the input is allowed to learn, as well as some units in its neighborhood. The neighborhood decreases in size during the training period. Learning is done by adjusting the weights of the units by a small amount so that they resemble the input vectors more closely.
The computations required to find the winner in SOM depend mainly on the network size. As the size increases, the network takes longer to find the winner. Because of that, SOM cannot be used effectively in dynamic recognition applications, which require a short recognition time. If the number of feature map neurons is m and the size of the input is n, then the number of steps required to find the winner is approximately m * n.
In this paper, it is proposed to use SOM in an image-based gesture recognition application for the task of quantizing the continuous gesture scene into discrete posture states. To accomplish this, some modifications to the SOM winner-searching algorithm are needed to fulfill three requirements. First, to have real-time recognition, the recognition time of
one image should be reduced to the normal video rate (30 frames/second). Second, the normal SOM winner-searching algorithm has no guarantee of delivering the correct winner for input images with complex backgrounds. Finally, the winner-searching algorithm should be insensitive to the lighting conditions of the input images. Achieving these requirements is the main motivation for modifying the winner-searching algorithm of SOM.
In our proposed algorithm, a subset of the codebook of the feature map neurons is selected according to off-line statistical computations. After that, the input image is tested to find the range of skin gray levels of the human hand. Finally, the competition between the pre-selected subset of each neuron's codebook and the corresponding pixels in the input image is applied only if the gray level of the input pixel lies in the range of skin gray levels defined in advance; otherwise the pixel is ignored in the competition.
The experiment was applied to recognize six gesture images of the Japanese Jan-Ken-Pon game. The proposed algorithm showed excellent performance compared to the normal SOM winner-searching algorithm, which could not recognize complex image scenes. In addition, the recognition time of one image was reduced to the range of normal video rates, which shows the feasibility of real-time recognition of input images taken directly from a video camera. Our experimental results are far better than those in [5,6], which are concerned only with reducing the winner-searching time and pay no attention to the recognition accuracy.
In the next section, the requirements for selecting the competition subset are presented. In section 3, the use of skin gray levels to find the object pixels is explained. Our proposed SOM winner-searching algorithm is then shown in section 4. In section 5, the image-based gesture recognition system is discussed. In section 6, the experimental results of recognizing dynamic hand gestures of the Japanese Jan-Ken-Pon game are shown. Finally, the conclusion and discussion are reported in section 7.

2 Subset Selection
As mentioned before, the competition in our proposed searching algorithm depends on a subset {S} of the codebook of all feature map neurons. The selection of this subset is a crucial issue for the recognition time and accuracy: the subset size directly affects the recognition time, and the subset elements affect the recognition accuracy.

2.1 Subset Size
The question that arises now is how large a sample must be selected. Selecting a larger sample than is needed to achieve the desired results wastes recognition time. In our case, the number of pixels in the input image (the population) is statistically large, and the number of pixels in the subset S is calculated as the sample size that can estimate the mean of the input image pixels.
The calculation of the sample size depends on the standard deviation of the image pixels, the required sample confidence coefficient, and the sample estimation interval [7].

$$V = Z \times \frac{\sigma}{\sqrt{K}} \qquad (1)$$

where V represents the required estimation interval of the selected sample, Z is the normal distribution curve area for the required confidence coefficient, σ is the standard deviation of the population, and K is the required sample size. When equation 1 is solved for K, it gives:

$$K = \frac{Z^2 \sigma^2}{V^2} \qquad (2)$$

If sampling without replacement from a finite population (n) is required, equation 1 becomes:

$$V = Z \times \frac{\sigma}{\sqrt{K}} \sqrt{\frac{n - K}{n - 1}} \qquad (3)$$

When solved for K, this gives:

$$K = \frac{n Z^2 \sigma^2}{V^2 (n - 1) + Z^2 \sigma^2} \qquad (4)$$

Now, the Pixel Usage Ratio (PUR) represents the ratio of the subset size K to the total number of image pixels n and can be calculated as:

$$PUR = \frac{K}{n} \qquad (5)$$

For a normally distributed population, the best choices for the estimation interval and the confidence coefficient are 5 and 0.95, respectively. Looking up this value of the confidence coefficient in the table of normal curve areas yields Z = 1.96.
The above equations are also valid if the sample is selected from a non-normal population, since the central limit theorem states that for large samples the distribution of the sample mean is approximately normal regardless of how the parent population is distributed. In this case, it is recommended to decrease the estimation interval to 2; this increases the sample size for the same standard deviation and confidence coefficient.
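The sample-size formulas of Section 2.1 can be checked numerically. The following Python sketch is only an illustration: the function names `sample_size` and `pixel_usage_ratio` are hypothetical, and the example values (σ = 50, Z = 1.96, V = 4, a 120*160 image) are the ones reported later in the experimental section.

```python
def sample_size(sigma, z, v, n=None):
    """Sample size K for estimating a mean.

    With n=None this is the infinite-population form (equation 2);
    with a finite population of n pixels, sampled without
    replacement, it is the corrected form (equation 4).
    """
    zs2 = (z * sigma) ** 2
    if n is None:
        return zs2 / v ** 2
    return n * zs2 / (v ** 2 * (n - 1) + zs2)


def pixel_usage_ratio(k, n):
    """PUR: fraction of the n image pixels used in the competition (equation 5)."""
    return k / n


n_pixels = 120 * 160                          # image size used in the experiments
k = round(sample_size(50, 1.96, 4, n=n_pixels))
pur = pixel_usage_ratio(k, n_pixels)
print(k, round(pur, 3))
```

With these inputs the finite-population formula gives a subset of roughly 3% of the image pixels, which is the order of magnitude the paper relies on for its speed-up.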
2.2 Subset Elements
The selection of the elements in the subset {S} is very important for the recognition accuracy. In [8], this subset was selected randomly. The results showed a very short recognition time; nevertheless, a random subset is not reliable for dynamic recognition. Some pixels that have common pose positions across the network feature map may be selected, which causes incorrect winner selection. Therefore, the subset must be selected in advance. The selection can be made from the input image point of view or from the feature map point of view. In the former, the selection must be applied on-line to determine the object pixels in the input image, which adds overhead to the recognition time. In the latter, the selection is applied off-line, based on a statistical analysis of the network feature map, so there is no overhead. Although the former technique is more robust, the latter is preferred because the expected overhead of on-line subset selection cannot support dynamic image recognition. The question now is how to know that a certain weight is more important for the competition than another. Simply, the most discriminate weights have more significance for the competition. These are the weights with high standard deviations. Therefore, it is proposed to calculate the standard deviation of the weights in the same codebook position over all feature map neurons. The elements of S are then selected from the weights with maximum standard deviation.
Now, suppose that the SOM feature map is constructed using the traditional Kohonen algorithm. To determine which pixels will be used in the competition, the following off-line computations are applied:
First, for each weight i, the average over all codebooks is calculated.

$$A_i = \frac{1}{m} \sum_{k=1}^{m} \mu_{ik} \qquad (6)$$

where $\mu_{ik}$ is the value of weight i in neuron k and m is the number of feature map neurons.

Second, the standard deviation of each weight is calculated.

$$sd_i = \sqrt{\frac{1}{m} \sum_{k=1}^{m} (\mu_{ik} - A_i)^2} \qquad (7)$$

Third, the ordered set OS is maintained, which contains ordered pairs of the standard deviations in descending order and their original positions (i) in the codebook.

$$OS = \{ (sd_j, i) \mid sd_j > sd_{j+1};\ 1 \le j \le m - 1 \} \qquad (8)$$

Finally, the subset S is an ordered set containing the original position index (i) of the highly deviated weights in descending order. This means that the first element in S points to the weight with maximum deviation in the feature map, and the standard deviation of the weight pointed to by element number (l) is greater than the standard deviation of the weight pointed to by element number (l+1).

$$S = \{ i \mid (sd_j, i) \in OS;\ 1 \le i \le m \} \qquad (9)$$

Now the set S contains the position indices of the highly deviated pixels. For the competition, the first K elements of {S} are selected.

3 Skin Gray Level
Using a subset of the feature map codebook ensures the reduction of the recognition time. In addition, selecting this subset from the most discriminate codebooks increases the recognition accuracy. However, there is no guarantee that the algorithm can recognize complex image scenes. For such scenes, the competition must be based on the object pixels only; in other words, the object pixels must be filtered out from the whole image. In hand gesture recognition applications this task is simple, since the image object is always the human hand, which has a fixed skin gray level range for every person. So, it is proposed to check the range of the skin gray level of the hand and then apply the competition only to those pixels that lie in this range. This idea is quite simple and has three main advantages. First, it adds no overhead to the competition time. Second, it integrates easily with the SOM competition equation. Third, it allows the recognition of images taken under different lighting conditions, since the skin gray level range changes when the lighting conditions change. Testing the skin gray level is not a difficult task and can be implemented either automatically or manually. The experimental results will show how much this idea increased the recognition accuracy.

4 Winner Searching Algorithm
The new competition algorithm for SOM is proposed for the recognition phase only, while the normal Kohonen algorithm is used to create the feature map during the learning phase. The new algorithm pursues our main target of reducing the recognition time of one image to the rate of a
normal video camera. The recognition accuracy is also considered in this algorithm, since recognizing images with complex backgrounds, multiple objects, and different lighting conditions is a crucial issue for image recognition applications.

Our proposed winner-searching algorithm is dubbed the Most Discriminate Codebook Competition Algorithm, or MDCCA for short. Before starting the recognition by MDCCA, the following off-line computations must be applied:

• Divide the network feature map into contiguous clusters of equal size.
• From each cluster, select one neuron, usually the one in its center, as the cluster representative.
• From equations (1-5), find the value of K.
• From equations (6-9), find the elements in {S}.
• Test for the range of the Skin Gray Level (SGL) of the input object.

The first step is applicable to SOM because it maps similar inputs to contiguous locations on the network feature map. Therefore, it is possible to divide the feature map into contiguous clusters such that neurons with similar codebooks belong to the same cluster.
After that, the on-line steps of the MDCCA algorithm run as follows:

1. Apply the competition between the cluster representative neurons with the following rules:
        • A subset of size K is selected from the input image pixels and the corresponding codebooks in the feature map.
        • Use the elements in {S} as position indices to select the subset from the image pixels and the corresponding codebooks in the feature map.
        • If the gray level of a selected pixel in the input image does not belong to SGL, exclude this pixel from the competition.
2. The cluster of the selected winner is considered as the cluster candidate.
3. Again, apply the competition between all the neurons in the cluster candidate based on the same rules as the first step.
4. The winner selected from this competition is called the winner candidate.
5. Finally, apply the competition between the set of neurons neighboring the winner candidate based on the same rules as the first step.
6. The selected winner is considered as the final SOM winner.

The competition in the first, third, and fifth steps uses the same subset of the codebook. The set of competing neurons changes from step to step. In the first step, the competition is applied between the cluster representative neurons, and the cluster of the selected winner is considered as the cluster candidate. In the third step, the neurons of the cluster candidate compete, and the winner of this competition is considered as the winner candidate. Finally, in the fifth step, the competition is applied between the neurons neighboring the winner candidate. The main purpose of this step is to find the correct winner, especially when the gesture images are in transition from one posture position to another. In this case, the winner may lie in the boundary area between the two clusters mapping these postures. The size of the neighborhood range should be selected as the number of neurons in the boundary between any two clusters. This size of course changes from application to application. However, in image recognition applications, the codebook of each neuron shows the exact image it codes. Therefore, it is very simple to find the required size of the neighborhood range just by displaying the codebooks of all the neurons as images.
In the off-line computations of MDCCA, the feature map is divided into clusters of equal size. The selection of the cluster size is crucial for the recognition time. The recognition time of one image can be formulated as a function of the cluster size as:

$$T(c) = PUR \times \left( \frac{m}{c} + c \right) + NR + a \qquad (10)$$

where c is the cluster size, NR is the neighborhood range, and a is a constant.

To find the value of c that minimizes the recognition time, differentiate the above equation with respect to c, treating PUR and NR as constants. Setting the derivative to zero yields:

$$c = \sqrt{m} \qquad (11)$$

Therefore, for minimum recognition time, the cluster size should be selected as the square root of the number of feature map neurons.

5 Image Based Gesture Recognition

Hand gestures are one of the most expressive means of communication for physically impaired people. However, due to the complex dynamic nature of gestures, most research has focused on static gestures [9]. If real-time
recognition is considered, the recognition complexity increases. In this work, an image-based gesture recognition system is used to recognize different hand gestures using a SOM network, where each gesture is treated as a sequence of consecutive postures. These postures are used in constructing the feature map of the SOM network.

Fig. 1. Gesture recognition system: discrete posture recognition by SOM network, then gesture definition by pattern matching algorithm.

Using SOM in gesture recognition has several merits. First, the posture image templates are created automatically by the Kohonen competition algorithm. Second, the network decides by itself the size of each posture cluster. Third, the neuron codebooks change smoothly and gradually in the boundary between any two clusters, which resembles the natural movement of the hand from one posture to another. Finally, the competition equation of SOM is so simple that the common problems associated with gesture recognition applications can be alleviated without any overhead. Fig. 1 shows a complete gesture system divided into two stages. The first converts the input image sequence into discrete posture states, and the second applies a pattern matching algorithm to each discrete posture sequence to give the gesture meaning of the input images. The first stage can be implemented using a SOM network, where the network can be constructed to recognize the discrete postures of all gestures.

6 Experimental Results

The MDCCA algorithm was applied to recognize the hand gestures of the Jan-Ken-Pon game. The game includes three postures for three hand positions. These postures are called PAA, CHOKI, and GUU, as shown in figure 2. To construct the feature map, different images of each posture are used. The learning images were collected from different persons under the same lighting condition, with 120*160 pixels and 256 gray levels. The feature map of the constructed network divides into three groups, one for each posture. The codebooks of the neurons in each group closely resemble the images of its posture.

Fig. 2: An example of the learning data showing the three posture images of the Jan-Ken-Pon game (PAA, CHOKI, GUU).

The experiments were run on an Alpha 21164A / 600 MHz processor, with the gcc compiler without optimization. The input images are given as a sequence of gestures. When the input images are simple (without any background) and taken under the same lighting conditions as the training images, the normal winner-searching algorithm of SOM could recognize the postures correctly. However, the recognition time of one image was 0.124 second, which cannot support dynamic gesture recognition for input images taken directly from a video camera. For complex images, the normal winner-searching algorithm failed to find the correct winner and always recognized any input image as PAA. Note that the number of object pixels of the PAA posture is greater than that of the other postures.
Recognition with MDCCA first requires the off-line computations to find the cluster representatives, K, S, and SGL. In our case, SGL was selected empirically. Fig. 3 shows a complex input image in PAA position before and after filtering the object with the SGL condition. The white (black) pixels are the pixels with gray level less (greater) than the range of SGL in the original image. The SGL used in this image was (50-130).

Fig. 3: Complex input image in PAA position before and after filtering the object with the SGL condition.

To find PUR, the standard deviations of the input images must be tested. As equations 2 and 4 show, the subset size grows with the standard deviation of the population. The standard deviation of the most complicated images in our experiment was 50, with normal distribution. The sample S is selected as a simple random sample without replacement. From equation 4, using V=4 and
Z=1.96, the sample size K should be equal to 582. Therefore, for 100% accuracy, the PUR should equal 0.03, which means that only 3% of the input image pixels are required. This in fact coincides with the experimental results.
The images in figure 4 show an example of more complex images taken under different lighting conditions.

Fig. 4: Example of complex input images taken under different lighting conditions.

The recognition time of one image by MDCCA was equal to 0.01 second. This value is 92% less than the recognition time of the normal winner-searching algorithm and allows recognition at the normal video camera rate.

7 Conclusions and recommendations
In this paper, the MDCCA algorithm is proposed as a new winner-searching algorithm for Self-Organizing Map networks. The competition in MDCCA depends mainly on a subset of the most discriminate codebooks. MDCCA is best suited to applications that require a large input size, such as image recognition. Here, it is applied to hand gesture recognition. MDCCA has several merits over the traditional SOM winner-searching algorithm. First, its recognition time is 92% less than that of the traditional algorithm. Second, it reduces the recognition time of one image to the range of normal video rates, which allows on-line image recognition. Third, it allows the recognition of complex image scenes. Fourth, it allows the recognition of images taken under different lighting conditions. Finally, its recognition time is not directly proportional to the network size as in the normal algorithm: the size of the subset selected from the input image depends mainly on its standard deviation, and the number of neurons used in the competition depends on the square root of the number of feature map neurons (equation 11), not on all the neurons as in the traditional algorithm.
The implementation of MDCCA to recognize dynamic hand gestures of the Japanese Jan-Ken-Pon game is given. The algorithm showed excellent performance in terms of both time and accuracy. Also, its recognition time was reduced to 0.01 second, which allows the recognition of more than 30 frames per second.
For gesture recognition, MDCCA has an important defect: it cannot recognize the gesture correctly if the object is shifted from its original position in the training images. Overcoming this defect is left for our future research.

References:
[1] T. Kohonen, E. Oja, O. Simula, A. Visa, and J. Kangas, Engineering applications of the self-organizing map, Proceedings of the IEEE, 1996.
[2] B. Back, K. Sere, and H. Vanharanta, Data mining accounting numbers using self-organizing maps, In Proceedings of STeP'96, Finnish Artificial Intelligence Conference, Vaasa, Finland, 1996, pp. 35-47.
[3] S. Garavaglia, A self-organizing map applied to macro and micro analysis of data with dummy variables, In Proceedings of WCNN'93, World Congress on Neural Networks, Hillsdale, NJ, 1993, pp. 362-368.
[4] T. Kohonen, Self-organized formation of topologically correct feature maps, Biological Cybernetics, 1982.
[5] S. Kaski, Dimensionality reduction by random mapping: fast similarity computation for clustering, In Proceedings of IJCNN'98, International Joint Conference on Neural Networks, Anchorage, Alaska, 1998, Vol. 1, pp. 413-418.
[6] S. Kaski, Fast Winner Selection for SOM-based Monitoring and Retrieval of High Dimension Data, In Proceedings of ICANN'99, Ninth International Conference on Artificial Neural Networks, Edinburgh, UK, 1999.
[7] H. Frank and S. Althoen, Statistics: Concepts and Applications, Cambridge University Press, 1994.
[8] T. El.Tobely, Y. Yoshiki, R. Tsuda, N. Tsuruta, and M. Amamiya, Randomized self-organizing maps and its applications, In Proceedings of the International Conference on Soft Computing, Iizuka, Japan, 2000, pp. 207-214.
[9] W. Freeman and M. Roth, Orientation histograms for hand gesture recognition, International Workshop on Automatic Face- and Gesture-Recognition, IEEE Computer Society, Zurich, Switzerland, June 1995.
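As a closing illustration, the off-line ranking of Section 2.2 and the on-line three-stage competition of Section 4 might be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: all function names are hypothetical, the squared Euclidean distance is assumed because the paper does not state its competition equation, images and codebooks are flat lists of gray levels, and `neighbours` is a caller-supplied function returning the neuron indices around the winner candidate (including the candidate itself).

```python
import math


def rank_discriminative_positions(codebooks):
    """Off-line: order weight positions by their standard deviation
    across all neurons, largest first (Section 2.2)."""
    m, n = len(codebooks), len(codebooks[0])
    sds = []
    for i in range(n):
        column = [cb[i] for cb in codebooks]
        mean = sum(column) / m
        sds.append(math.sqrt(sum((w - mean) ** 2 for w in column) / m))
    return sorted(range(n), key=lambda i: sds[i], reverse=True)


def masked_distance(image, codebook, positions, sgl):
    """Squared distance (assumed metric) over the selected positions,
    skipping pixels whose gray level lies outside the skin range."""
    lo, hi = sgl
    return sum((image[i] - codebook[i]) ** 2
               for i in positions if lo <= image[i] <= hi)


def best(image, codebooks, candidates, positions, sgl):
    """Neuron index among candidates with the smallest masked distance."""
    return min(candidates,
               key=lambda j: masked_distance(image, codebooks[j], positions, sgl))


def mdcca_winner(image, codebooks, clusters, representatives,
                 positions, sgl, neighbours):
    """On-line steps 1-6 of Section 4."""
    rep = best(image, codebooks, representatives, positions, sgl)    # step 1
    cluster = clusters[representatives.index(rep)]                   # step 2
    candidate = best(image, codebooks, cluster, positions, sgl)      # steps 3-4
    return best(image, codebooks, neighbours(candidate), positions, sgl)  # 5-6


# Toy feature map: 4 neurons, 4 weights each, split into 2 clusters.
codebooks = [[100, 10, 60, 0], [100, 12, 70, 0],
             [100, 90, 120, 0], [100, 92, 110, 0]]
clusters = [[0, 1], [2, 3]]
representatives = [0, 2]
ranking = rank_discriminative_positions(codebooks)
positions = ranking[:2]                      # first K elements of S, K = 2
image = [100, 91, 112, 200]
winner = mdcca_winner(image, codebooks, clusters, representatives,
                      positions, (50, 130), lambda j: [2, 3])
print(ranking, winner)
```

In the toy data, positions 0 and 3 are identical in every codebook, so they rank last and never enter the competition; only the two most discriminate positions are compared, which is the source of the speed-up the paper reports.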