Expectation-Maximization (EM) Algorithm

Chien-Yu Chen
Graduate School of Biotechnology and Bioinformatics, Yuan Ze University
(Based on lecture notes of Dr. Yen-Jen Oyang)
Probability-Based Clustering

•  The foundation of the probability-based clustering approach is a so-called
   finite mixture model.
•  A mixture is a set of k probability distributions, each of which governs the
   distribution of attribute values within one cluster.




A 2-Cluster Example of the Finite
Mixture Model
•  In this example, it is assumed that there are two clusters, and the attribute
   value distributions in both clusters are normal distributions.

   [Figure: two normal distributions, N(μ1, σ1²) and N(μ2, σ2²)]
Assume that we have the following 51 samples, which are believed to be drawn
from two normal distributions:

 51   62   64   48   39   51
 43   47   51   64   62   48
 62   52   52   51   64   64
 64   64   62   63   52   42
 45   51   49   43   63   48
 42   65   48   65   64   41
 46   48   62   66   48
 45   49   43   65   64
 45   46   40   46   48

Operation of the EM Algorithm

•  The EM algorithm is used to estimate the parameters of the finite mixture
   model.
•  Let {s1, s2, …, sn} denote the set of samples.
•  In this example, we need to figure out the following 5 parameters:
   μ1, σ1, μ2, σ2, and P(C1).



•  For a general 1-dimensional case with k clusters, we need to figure out a
   total of 2k + (k-1) parameters.
•  The EM algorithm begins with an initial guess of the parameter values.




•  For our example, we assume that P(C1) = 0.5. Accordingly, we divide the
   dataset into two subsets:
   – {51(*2.5), 52(*3), 62(*5), 63(*2), 64(*8), 65(*3), 66(*1)};
   – {51(*2.5), 49(*2), 48(*7), 47(*1), 46(*3), 45(*3), 43(*3), 42(*2), 41(*1),
     40(*1), 39(*1)};
   – μ1 = 47.63;
   – σ1 = 3.79;
   – μ2 = 58.53;
   – σ2 = 5.57.
•  Then, the probabilities that sample si belongs to these two clusters are
   computed as follows:

$$\Pr(C_1 \mid x_i) = \frac{\Pr(x_i \mid C_1)\,\Pr(C_1)}{\Pr(x_i)} = \frac{\Pr(C_1)}{\Pr(x_i)\,\sqrt{2\pi}\,\sigma_1}\, e^{-\frac{(x_i-\mu_1)^2}{2\sigma_1^2}}$$

$$\Pr(C_2 \mid x_i) = \frac{\Pr(x_i \mid C_2)\,\Pr(C_2)}{\Pr(x_i)} = \frac{\Pr(C_2)}{\Pr(x_i)\,\sqrt{2\pi}\,\sigma_2}\, e^{-\frac{(x_i-\mu_2)^2}{2\sigma_2^2}}$$

$$p_i = \frac{\Pr(C_1 \mid x_i)}{\Pr(C_1 \mid x_i) + \Pr(C_2 \mid x_i)},$$

   where xi is the attribute value of sample si.
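This E-step can be sketched in a few lines of Python (an illustrative sketch added here, not part of the original notes; the function names are made up). Since Prob(xi) cancels when forming pi, only the numerators Prob(xi | Ck)·Prob(Ck) are needed:

```python
import math

def normal_density(x, mu, sigma):
    # N(mu, sigma^2) density evaluated at x
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)

def responsibility(x, mu1, sigma1, mu2, sigma2, p_c1):
    # p_i = Prob(C1 | x_i) / (Prob(C1 | x_i) + Prob(C2 | x_i));
    # the common factor 1 / Prob(x_i) cancels in the ratio.
    w1 = p_c1 * normal_density(x, mu1, sigma1)
    w2 = (1 - p_c1) * normal_density(x, mu2, sigma2)
    return w1 / (w1 + w2)
```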
•  For example, for an object with value 52, we have

$$\Pr(C_1 \mid 52) = \frac{0.5}{\Pr(x{=}52)\,\sqrt{2\pi}\,(3.79)}\, e^{-\frac{(52-47.63)^2}{2(3.79)^2}} = \frac{0.5}{\Pr(x{=}52)\,\sqrt{2\pi}} \times 0.1357$$

$$\Pr(C_2 \mid 52) = \frac{0.5}{\Pr(x{=}52)\,\sqrt{2\pi}\,(5.57)}\, e^{-\frac{(52-58.53)^2}{2(5.57)^2}} = \frac{0.5}{\Pr(x{=}52)\,\sqrt{2\pi}} \times 0.0903$$

$$p_i = \frac{\Pr(C_1 \mid 52)}{\Pr(C_1 \mid 52) + \Pr(C_2 \mid 52)} = \frac{0.1357}{0.1357 + 0.0903} = 0.600.$$
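As a quick check (an added sketch, not from the original notes), the three numbers above can be reproduced as follows; since P(C1) = P(C2) = 0.5, the factors 0.5, √(2π), and Prob(x = 52) all cancel in the ratio:

```python
import math

d1 = math.exp(-(52 - 47.63) ** 2 / (2 * 3.79 ** 2)) / 3.79  # ~0.1357
d2 = math.exp(-(52 - 58.53) ** 2 / (2 * 5.57 ** 2)) / 5.57  # ~0.0903
p = d1 / (d1 + d2)                                          # ~0.600
print(round(d1, 4), round(d2, 4), round(p, 3))
```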
The new estimated values of the parameters are computed as follows:

$$\mu_1 = \frac{\sum_{i=1}^{n} p_i\, x_i}{\sum_{i=1}^{n} p_i}; \qquad \mu_2 = \frac{\sum_{i=1}^{n} (1-p_i)\, x_i}{\sum_{i=1}^{n} (1-p_i)}$$

$$\sigma_1^2 = \frac{\sum_{i=1}^{n} p_i\, (x_i-\mu_1)^2}{\sum_{i=1}^{n} p_i}; \qquad \sigma_2^2 = \frac{\sum_{i=1}^{n} (1-p_i)\, (x_i-\mu_2)^2}{\sum_{i=1}^{n} (1-p_i)}$$

$$P(C_1) = \frac{\sum_{i=1}^{n} p_i}{n}$$
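These update formulas translate directly into code. The following is a minimal illustrative sketch (function and variable names are made up, not from the notes), where xs holds the sample values and ps holds the responsibilities pi from the E-step:

```python
def m_step(xs, ps):
    n = len(xs)
    w1 = sum(ps)   # total weight assigned to cluster C1
    w2 = n - w1    # total weight assigned to cluster C2
    mu1 = sum(p * x for p, x in zip(ps, xs)) / w1
    mu2 = sum((1 - p) * x for p, x in zip(ps, xs)) / w2
    var1 = sum(p * (x - mu1) ** 2 for p, x in zip(ps, xs)) / w1
    var2 = sum((1 - p) * (x - mu2) ** 2 for p, x in zip(ps, xs)) / w2
    p_c1 = w1 / n
    return mu1, var1 ** 0.5, mu2, var2 ** 0.5, p_c1
```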
•  The process is repeated until the clustering results converge.
•  Generally, we attempt to maximize the following likelihood function:

$$\prod_i \Big[ P(C_1)\,P(x_i \mid C_1) + P(C_2)\,P(x_i \mid C_2) \Big].$$




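Putting the two steps together, the following self-contained sketch (added for illustration; it is not part of the original notes) runs the procedure on the 51 samples, starting from the initial guess given earlier and stopping when the log of this likelihood stops improving:

```python
import math

samples = [51, 62, 64, 48, 39, 51, 43, 47, 51, 64, 62, 48,
           62, 52, 52, 51, 64, 64, 64, 64, 62, 63, 52, 42,
           45, 51, 49, 43, 63, 48, 42, 65, 48, 65, 64, 41,
           46, 48, 62, 66, 48, 45, 49, 43, 65, 64,
           45, 46, 40, 46, 48]

def normal_density(x, mu, sigma):
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)

# initial guess taken from the earlier slides
mu1, s1, mu2, s2, pc1 = 47.63, 3.79, 58.53, 5.57, 0.5
prev_ll = -math.inf
while True:
    # E-step: responsibilities p_i = Prob(C1 | x_i), plus the log-likelihood
    ps, ll = [], 0.0
    for x in samples:
        w1 = pc1 * normal_density(x, mu1, s1)
        w2 = (1 - pc1) * normal_density(x, mu2, s2)
        ps.append(w1 / (w1 + w2))
        ll += math.log(w1 + w2)
    if ll - prev_ll < 1e-9:   # EM never decreases the likelihood, so this terminates
        break
    prev_ll = ll
    # M-step: re-estimate the five parameters
    n, w = len(samples), sum(ps)
    mu1 = sum(p * x for p, x in zip(ps, samples)) / w
    mu2 = sum((1 - p) * x for p, x in zip(ps, samples)) / (n - w)
    s1 = math.sqrt(sum(p * (x - mu1) ** 2 for p, x in zip(ps, samples)) / w)
    s2 = math.sqrt(sum((1 - p) * (x - mu2) ** 2 for p, x in zip(ps, samples)) / (n - w))
    pc1 = w / n

print(mu1, s1, mu2, s2, pc1)
```

Each sample can then be assigned to C1 or C2 by thresholding its final pi at 0.5, as described on the following slide.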
•  Once we have figured out the approximate parameter values, we assign
   sample si to C1 if

$$p_i = \frac{\Pr(C_1 \mid x_i)}{\Pr(C_1 \mid x_i) + \Pr(C_2 \mid x_i)} > 0.5,$$

   where

$$\Pr(C_1 \mid x_i) = \frac{\Pr(x_i \mid C_1)\,\Pr(C_1)}{\Pr(x_i)} = \frac{\Pr(C_1)}{\Pr(x_i)\,\sqrt{2\pi}\,\sigma_1}\, e^{-\frac{(x_i-\mu_1)^2}{2\sigma_1^2}}$$

$$\Pr(C_2 \mid x_i) = \frac{\Pr(x_i \mid C_2)\,\Pr(C_2)}{\Pr(x_i)} = \frac{\Pr(C_2)}{\Pr(x_i)\,\sqrt{2\pi}\,\sigma_2}\, e^{-\frac{(x_i-\mu_2)^2}{2\sigma_2^2}}.$$

•  Otherwise, si is assigned to C2.

The Finite Mixture Model for
Multiple Attributes
•  The finite mixture model described above can be easily generalized to handle
   multiple independent attributes.
•  For example, in a case with two independent attributes, the distribution
   function of cluster j is of the form

$$f_j(x, y) = \frac{1}{2\pi\,\sigma_{xj}\,\sigma_{yj}}\, e^{-\frac{(x-\mu_{xj})^2}{2\sigma_{xj}^2}}\, e^{-\frac{(y-\mu_{yj})^2}{2\sigma_{yj}^2}}$$
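A direct transcription of this density (an added illustrative sketch; the parameter names simply mirror the formula above):

```python
import math

def f_j(x, y, mu_xj, sigma_xj, mu_yj, sigma_yj):
    # product of two independent 1-D normal densities
    gx = math.exp(-(x - mu_xj) ** 2 / (2 * sigma_xj ** 2))
    gy = math.exp(-(y - mu_yj) ** 2 / (2 * sigma_yj ** 2))
    return gx * gy / (2 * math.pi * sigma_xj * sigma_yj)
```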
•  Assume that there are 3 clusters in a 2-dimensional data set. Then, we have
   14 parameters to be determined: μx1, μy1, σx1, σy1, μx2, μy2, σx2, σy2,
   μx3, μy3, σx3, σy3, P(C1), and P(C2).
•  The probability that sample si belongs to Cj is:

$$\Pr(C_j \mid (x_i, y_i)) = \frac{\Pr((x_i, y_i) \mid C_j)\,\Pr(C_j)}{\Pr((x_i, y_i))} = \frac{\Pr(C_j)}{\Pr((x_i, y_i))\,2\pi\,\sigma_{xj}\,\sigma_{yj}}\, e^{-\frac{(x_i-\mu_{xj})^2}{2\sigma_{xj}^2}}\, e^{-\frac{(y_i-\mu_{yj})^2}{2\sigma_{yj}^2}}$$

$$p_{ji} = \frac{\Pr(C_j \mid (x_i, y_i))}{\Pr(C_1 \mid (x_i, y_i)) + \Pr(C_2 \mid (x_i, y_i)) + \Pr(C_3 \mid (x_i, y_i))}$$
•  The new estimated values of the parameters are computed as follows:

$$\mu_{xj} = \frac{\sum_{i=1}^{n} p_{ji}\, x_i}{\sum_{i=1}^{n} p_{ji}}; \qquad \mu_{yj} = \frac{\sum_{i=1}^{n} p_{ji}\, y_i}{\sum_{i=1}^{n} p_{ji}}$$

$$\sigma_{xj}^2 = \frac{\sum_{i=1}^{n} p_{ji}\, (x_i - \mu_{xj})^2}{\sum_{i=1}^{n} p_{ji}}; \qquad \sigma_{yj}^2 = \frac{\sum_{i=1}^{n} p_{ji}\, (y_i - \mu_{yj})^2}{\sum_{i=1}^{n} p_{ji}}$$

$$P(C_j) = \frac{\sum_{i=1}^{n} p_{ji}}{n}$$
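As with the one-dimensional case, one full iteration can be sketched as follows (an added illustrative sketch, not part of the notes; `points` is a list of (x, y) pairs, `params[j]` holds (μxj, σxj, μyj, σyj), and `priors[j]` holds P(Cj)):

```python
import math

def density(x, y, mu_x, s_x, mu_y, s_y):
    gx = math.exp(-(x - mu_x) ** 2 / (2 * s_x ** 2))
    gy = math.exp(-(y - mu_y) ** 2 / (2 * s_y ** 2))
    return gx * gy / (2 * math.pi * s_x * s_y)

def em_iteration(points, params, priors):
    n, k = len(points), len(params)
    # E-step: p[j][i] = Prob(C_j | (x_i, y_i)); Prob((x_i, y_i)) cancels in the ratio
    p = [[0.0] * n for _ in range(k)]
    for i, (x, y) in enumerate(points):
        weights = [priors[j] * density(x, y, *params[j]) for j in range(k)]
        total = sum(weights)
        for j in range(k):
            p[j][i] = weights[j] / total
    # M-step: weighted means, standard deviations, and priors per cluster
    new_params, new_priors = [], []
    for j in range(k):
        w = sum(p[j])
        mu_x = sum(p[j][i] * points[i][0] for i in range(n)) / w
        mu_y = sum(p[j][i] * points[i][1] for i in range(n)) / w
        s_x = math.sqrt(sum(p[j][i] * (points[i][0] - mu_x) ** 2 for i in range(n)) / w)
        s_y = math.sqrt(sum(p[j][i] * (points[i][1] - mu_y) ** 2 for i in range(n)) / w)
        new_params.append((mu_x, s_x, mu_y, s_y))
        new_priors.append(w / n)
    return new_params, new_priors
```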
Limitation of the Finite Mixture Model and the EM Algorithm

•  The finite mixture model and the EM algorithm generally assume that the
   attributes are independent.
•  Approaches have been proposed for handling correlated attributes. However,
   these approaches are subject to further limitations.
•  Furthermore, the EM algorithm may converge only to a local optimum.
Generalization of the Finite Mixture Model and the EM Algorithm

•  The finite mixture model and the EM algorithm can be generalized to handle
   other types of probability distributions.
•  For example, if we want to partition the objects into k clusters based on m
   independent nominal attributes, then we can apply the EM algorithm to figure
   out the parameters required to describe the distribution.
•  In this case, the total number of parameters is equal to
   $(k-1) + k\sum_{i=1}^{m} |A_i|$, where $|A_i|$ is the number of possible
   values of attribute Ai.
•  The term $k\sum_{i=1}^{m} |A_i|$ is due to the need to compute

$$\Pr(C_j \mid (v_1, v_2, \ldots, v_m)) = \frac{\Pr((v_1, v_2, \ldots, v_m) \mid C_j)\,\Pr(C_j)}{\Pr((v_1, v_2, \ldots, v_m))} = \frac{\Pr(C_j)\prod_{i=1}^{m} \Pr(v_i \mid C_j)}{\Pr((v_1, v_2, \ldots, v_m))}.$$

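For a concrete (hypothetical) check of this count: with k = 3 clusters and four nominal attributes taking 3, 2, 2, and 2 values, the formula gives 29 parameters:

```python
k = 3
attribute_sizes = [3, 2, 2, 2]   # |A_1|, ..., |A_m|
num_parameters = (k - 1) + k * sum(attribute_sizes)
print(num_parameters)            # (3 - 1) + 3 * 9 = 29
```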
•  If two attributes are correlated, then we can merge these two attributes to
   form a composite attribute with |Ai| × |Aj| possible values.




An Example

•  Assume that we want to partition 100 samples of a particular species of
   insects into 3 clusters according to 4 attributes:
   –   Color (Ac): milk, light brown, or dark brown;
   –   Head shape (Ah): spherical or triangular;
   –   Body length (Al): long or short;
   –   Weight (Aw): heavy or light.



•  If we determine that body length and weight are correlated, then we create a
   composite attribute As: (length, weight) with 4 possible values: (L, H),
   (L, L), (S, H), and (S, L).
•  With the EM algorithm, we can figure out the values of the parameters in the
   following table, in addition to P(C1), P(C2), and P(C3):
            Color     Head shape   (Body length, Weight)

      C1    P(M|C1)   P(S|C1)      P((L,H)|C1), P((S,H)|C1)
            P(L|C1)   P(T|C1)      P((L,L)|C1), P((S,L)|C1)
            P(D|C1)

      C2    P(M|C2)   P(S|C2)      P((L,H)|C2), P((S,H)|C2)
            P(L|C2)   P(T|C2)      P((L,L)|C2), P((S,L)|C2)
            P(D|C2)

      C3    P(M|C3)   P(S|C3)      P((L,H)|C3), P((S,H)|C3)
            P(L|C3)   P(T|C3)      P((L,L)|C3), P((S,L)|C3)
            P(D|C3)
•  We invoke the EM algorithm with an initial guess of these parameter values.
•  For each sample si = (v1, v2, v3), we compute the following probabilities:

$$\Pr(C_j \mid (v_1, v_2, v_3)) = \frac{\Pr((v_1, v_2, v_3) \mid C_j)\,\Pr(C_j)}{\Pr((v_1, v_2, v_3))} = \frac{\Pr(v_1 \mid C_j)\,\Pr(v_2 \mid C_j)\,\Pr(v_3 \mid C_j)\,\Pr(C_j)}{\Pr((v_1, v_2, v_3))}$$

$$p_{ji} = \frac{\Pr(C_j \mid (v_1, v_2, v_3))}{\Pr(C_1 \mid (v_1, v_2, v_3)) + \Pr(C_2 \mid (v_1, v_2, v_3)) + \Pr(C_3 \mid (v_1, v_2, v_3))}$$




•  The new estimated values of the parameters are computed as follows:

$$P(M \mid C_j) = \frac{\sum_{i=1}^{n} \big(\mathrm{color}(s_i) = M\big)\, p_{ji}}{\sum_{i=1}^{n} p_{ji}}$$

$$P(C_j) = \frac{\sum_{i=1}^{n} p_{ji}}{n}$$

   where the indicator (color(si) = M) equals 1 if sample si has color M and 0
   otherwise; the parameters for the other attribute values are updated in the
   same way.


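For completeness, one EM iteration for this nominal-attribute example can be sketched as follows (an added illustrative sketch; the dictionary layout and names are made up). Here `samples` is a list of tuples such as ('M', 'S', ('L', 'H')), `priors[j]` approximates P(Cj), and `cond[j][a][v]` approximates Prob(v | Cj) for attribute position a:

```python
import math

def em_iteration(samples, priors, cond):
    n, k = len(samples), len(priors)
    # E-step: p[j][i] = Prob(C_j | s_i); Prob(s_i) cancels in the ratio
    p = [[0.0] * n for _ in range(k)]
    for i, values in enumerate(samples):
        weights = [priors[j] * math.prod(cond[j][a][v] for a, v in enumerate(values))
                   for j in range(k)]
        total = sum(weights)
        for j in range(k):
            p[j][i] = weights[j] / total
    # M-step: e.g. P(M | C_j) = (sum of p_ji over samples whose color is M)
    # divided by the total weight of C_j; P(C_j) = average responsibility
    new_priors, new_cond = [], []
    for j in range(k):
        w = sum(p[j])
        new_priors.append(w / n)
        tables = []
        for a in range(len(samples[0])):
            counts = {}
            for i, values in enumerate(samples):
                counts[values[a]] = counts.get(values[a], 0.0) + p[j][i]
            tables.append({v: c / w for v, c in counts.items()})
        new_cond.append(tables)
    return new_priors, new_cond
```

Iterating this function from an initial guess of the table entries, until the parameter values stop changing, mirrors the procedure described on the preceding slides.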

								