# Expectation-Maximization (EM) Algorithm

Chien-Yu Chen, Bioinformatics, Yuan Ze University

Lecture notes of Dr. Yen-Jen Oyang

## Probability-Based Clustering

- The foundation of the probability-based clustering approach is the so-called finite mixture model.
- A mixture is a set of k probability distributions, each of which governs the distribution of attribute values within one cluster.

## A 2-Cluster Example of the Finite Mixture Model

- In this example, it is assumed that there are two clusters, and that the attribute value distributions in both clusters are normal distributions:

$$N(\mu_1, \sigma_1^2) \qquad N(\mu_2, \sigma_2^2)$$
Assume that we have the following 51 samples and it
is believed that these samples are taken from two
normal distributions:

 51   62   64   48   39   51
 43   47   51   64   62   48
 62   52   52   51   64   64
 64   64   62   63   52   42
 45   51   49   43   63   48
 42   65   48   65   64   41
 46   48   62   66   48
 45   49   43   65   64
 45   46   40   46   48

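To make the later computations concrete, the sample values can be collected in a small array. This is a minimal sketch assuming Python with numpy; the variable name `samples` is illustrative and not from the original notes.

```python
import numpy as np

# The 51 one-dimensional sample values listed above, read row by row.
samples = np.array([
    51, 62, 64, 48, 39, 51,
    43, 47, 51, 64, 62, 48,
    62, 52, 52, 51, 64, 64,
    64, 64, 62, 63, 52, 42,
    45, 51, 49, 43, 63, 48,
    42, 65, 48, 65, 64, 41,
    46, 48, 62, 66, 48,
    45, 49, 43, 65, 64,
    45, 46, 40, 46, 48,
], dtype=float)

assert samples.size == 51
```
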
## Operation of the EM Algorithm

- The EM algorithm is used to estimate the parameters of the finite mixture model.
- Let {s1, s2, ..., sn} denote the set of samples.
- In this example, we need to estimate the following 5 parameters: $\mu_1$, $\sigma_1$, $\mu_2$, $\sigma_2$, and $P(C_1)$.
- For a general 1-dimensional case with k clusters, we need to estimate a total of 2k + (k - 1) parameters.
- The EM algorithm begins with an initial guess of the parameter values.

- For our example, we assume that P(C1) = 0.5. Accordingly, we divide the dataset into two subsets:
  - {51 (×2.5), 52 (×3), 62 (×5), 63 (×2), 64 (×8), 65 (×3), 66 (×1)};
  - {51 (×2.5), 49 (×2), 48 (×7), 47 (×1), 46 (×3), 45 (×3), 43 (×3), 42 (×2), 41 (×1), 40 (×1), 39 (×1)}.
- The corresponding initial parameter estimates are:
  - $\mu_1 = 47.63$;
  - $\sigma_1 = 3.79$;
  - $\mu_2 = 58.53$;
  - $\sigma_2 = 5.57$.
Then, the probabilities that sample $s_i$ belongs to these two clusters are computed as follows:

$$\mathrm{Prob}(C_1 \mid x_i) = \frac{\mathrm{Prob}(x_i \mid C_1)\,\mathrm{Prob}(C_1)}{\mathrm{Prob}(x_i)} = \frac{\mathrm{Prob}(C_1)}{\mathrm{Prob}(x_i)} \cdot \frac{1}{\sqrt{2\pi}\,\sigma_1}\, e^{-\frac{(x_i - \mu_1)^2}{2\sigma_1^2}}$$

$$\mathrm{Prob}(C_2 \mid x_i) = \frac{\mathrm{Prob}(x_i \mid C_2)\,\mathrm{Prob}(C_2)}{\mathrm{Prob}(x_i)} = \frac{\mathrm{Prob}(C_2)}{\mathrm{Prob}(x_i)} \cdot \frac{1}{\sqrt{2\pi}\,\sigma_2}\, e^{-\frac{(x_i - \mu_2)^2}{2\sigma_2^2}}$$

$$p_i = \frac{\mathrm{Prob}(C_1 \mid x_i)}{\mathrm{Prob}(C_1 \mid x_i) + \mathrm{Prob}(C_2 \mid x_i)},$$

where $x_i$ is the attribute value of sample $s_i$.

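A minimal sketch of this E-step in Python (numpy assumed): the unknown factor $\mathrm{Prob}(x_i)$ cancels when the two posteriors are normalized, so it never has to be computed explicitly. The function names are illustrative, not from the original notes.

```python
import numpy as np

def normal_pdf(x, mu, sigma):
    """Density of the normal distribution N(mu, sigma^2) evaluated at x."""
    return np.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2)) / (np.sqrt(2.0 * np.pi) * sigma)

def e_step(x, mu1, sigma1, mu2, sigma2, p_c1):
    """Return p_i = Prob(C1 | x_i) for every sample via Bayes' rule.

    Prob(x_i) cancels in the normalization, so only the numerators
    Prob(C_k) * Prob(x_i | C_k) are needed."""
    w1 = p_c1 * normal_pdf(x, mu1, sigma1)
    w2 = (1.0 - p_c1) * normal_pdf(x, mu2, sigma2)
    return w1 / (w1 + w2)
```
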
For example, for an object with value 52, we have

$$\mathrm{Prob}(C_1 \mid 52) = \frac{0.5}{\mathrm{Prob}(x = 52)} \cdot \frac{1}{\sqrt{2\pi}\,(3.79)}\, e^{-\frac{(52 - 47.63)^2}{2(3.79)^2}} = \frac{0.5}{\sqrt{2\pi}\,\mathrm{Prob}(x = 52)} \times 0.1357$$

$$\mathrm{Prob}(C_2 \mid 52) = \frac{0.5}{\mathrm{Prob}(x = 52)} \cdot \frac{1}{\sqrt{2\pi}\,(5.57)}\, e^{-\frac{(52 - 58.53)^2}{2(5.57)^2}} = \frac{0.5}{\sqrt{2\pi}\,\mathrm{Prob}(x = 52)} \times 0.0903$$

$$p_i = \frac{\mathrm{Prob}(C_1 \mid x = 52)}{\mathrm{Prob}(C_1 \mid x = 52) + \mathrm{Prob}(C_2 \mid x = 52)} = \frac{0.1357}{0.1357 + 0.0903} \approx 0.600$$
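Using the `e_step` sketch above with the slide's initial guess reproduces this worked example (numbers rounded):

```python
# Worked example: a sample with value 52 under the initial parameter guess.
p_52 = e_step(np.array([52.0]), mu1=47.63, sigma1=3.79,
              mu2=58.53, sigma2=5.57, p_c1=0.5)
print(p_52)  # approximately [0.60], as on the slide
```
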
The new estimated values of the parameters are computed as follows:

$$\mu_1 = \frac{\sum_{i=1}^{n} p_i x_i}{\sum_{i=1}^{n} p_i}; \qquad \mu_2 = \frac{\sum_{i=1}^{n} (1 - p_i)\, x_i}{\sum_{i=1}^{n} (1 - p_i)}$$

$$\sigma_1^2 = \frac{\sum_{i=1}^{n} p_i (x_i - \mu_1)^2}{\sum_{i=1}^{n} p_i}; \qquad \sigma_2^2 = \frac{\sum_{i=1}^{n} (1 - p_i)(x_i - \mu_2)^2}{\sum_{i=1}^{n} (1 - p_i)}$$

$$P(C_1) = \frac{\sum_{i=1}^{n} p_i}{n}$$
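These update rules translate directly into code. A sketch of the M-step under the same assumptions (numpy; names illustrative):

```python
import numpy as np

def m_step(x, p):
    """Re-estimate the five parameters from the membership weights.

    p[i] is Prob(C1 | x_i); cluster 2 receives the complementary weight 1 - p[i]."""
    w1, w2 = p, 1.0 - p
    mu1 = np.sum(w1 * x) / np.sum(w1)
    mu2 = np.sum(w2 * x) / np.sum(w2)
    sigma1 = np.sqrt(np.sum(w1 * (x - mu1) ** 2) / np.sum(w1))
    sigma2 = np.sqrt(np.sum(w2 * (x - mu2) ** 2) / np.sum(w2))
    p_c1 = np.sum(p) / len(x)
    return mu1, sigma1, mu2, sigma2, p_c1
```
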
- The process is repeated until the clustering results converge.
- Generally, we attempt to maximize the following likelihood function:

$$\prod_{i} \bigl[ P(C_1)\, P(x_i \mid C_1) + P(C_2)\, P(x_i \mid C_2) \bigr].$$
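Putting the two steps together, the iteration can be stopped when this likelihood (monitored on the log scale for numerical stability) no longer improves. A sketch that reuses the `normal_pdf`, `e_step`, and `m_step` functions from the earlier snippets:

```python
import numpy as np

def em_two_normals(x, mu1, sigma1, mu2, sigma2, p_c1, max_iter=100, tol=1e-6):
    """Alternate E- and M-steps until the log-likelihood stops improving."""
    prev_ll = -np.inf
    for _ in range(max_iter):
        p = e_step(x, mu1, sigma1, mu2, sigma2, p_c1)      # E-step
        mu1, sigma1, mu2, sigma2, p_c1 = m_step(x, p)      # M-step
        # log of  prod_i [ P(C1) P(x_i|C1) + P(C2) P(x_i|C2) ]
        mix = (p_c1 * normal_pdf(x, mu1, sigma1)
               + (1.0 - p_c1) * normal_pdf(x, mu2, sigma2))
        ll = np.sum(np.log(mix))
        if ll - prev_ll < tol:
            break
        prev_ll = ll
    return mu1, sigma1, mu2, sigma2, p_c1
```
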
- Once we have estimated the approximate parameter values, we assign sample $s_i$ to $C_1$ if

$$p_i = \frac{\mathrm{Prob}(C_1 \mid x_i)}{\mathrm{Prob}(C_1 \mid x_i) + \mathrm{Prob}(C_2 \mid x_i)} \geq 0.5,$$

where

$$\mathrm{Prob}(C_1 \mid x_i) = \frac{\mathrm{Prob}(x_i \mid C_1)\,\mathrm{Prob}(C_1)}{\mathrm{Prob}(x_i)} = \frac{\mathrm{Prob}(C_1)}{\mathrm{Prob}(x_i)} \cdot \frac{1}{\sqrt{2\pi}\,\sigma_1}\, e^{-\frac{(x_i - \mu_1)^2}{2\sigma_1^2}},$$

$$\mathrm{Prob}(C_2 \mid x_i) = \frac{\mathrm{Prob}(x_i \mid C_2)\,\mathrm{Prob}(C_2)}{\mathrm{Prob}(x_i)} = \frac{\mathrm{Prob}(C_2)}{\mathrm{Prob}(x_i)} \cdot \frac{1}{\sqrt{2\pi}\,\sigma_2}\, e^{-\frac{(x_i - \mu_2)^2}{2\sigma_2^2}}.$$

- Otherwise, $s_i$ is assigned to $C_2$.

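With the fitted parameters, the final hard assignment is just a threshold on the posterior. A short usage sketch that chains the earlier `samples`, `em_two_normals`, and `e_step` sketches together:

```python
# Fit the mixture starting from the slide's initial guess, then assign each sample.
mu1, sigma1, mu2, sigma2, p_c1 = em_two_normals(samples, 47.63, 3.79, 58.53, 5.57, 0.5)
p = e_step(samples, mu1, sigma1, mu2, sigma2, p_c1)
cluster = np.where(p >= 0.5, 1, 2)   # s_i goes to C1 when p_i >= 0.5, otherwise to C2
```
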
## The Finite Mixture Model for Multiple Attributes

- The finite mixture model described above can easily be generalized to handle multiple independent attributes.
- For example, in a case with two independent attributes, the distribution function of cluster j takes the form:

$$f_j(x, y) = \frac{1}{2\pi\, \sigma_{xj} \sigma_{yj}}\, e^{-\frac{(x - \mu_{xj})^2}{2\sigma_{xj}^2}}\, e^{-\frac{(y - \mu_{yj})^2}{2\sigma_{yj}^2}}$$
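A sketch of this per-cluster density for two independent attributes, written as a product of two one-dimensional normal densities (numpy assumed; function names illustrative):

```python
import numpy as np

def normal_pdf(v, mu, sigma):
    """Density of N(mu, sigma^2) evaluated at v."""
    return np.exp(-((v - mu) ** 2) / (2.0 * sigma ** 2)) / (np.sqrt(2.0 * np.pi) * sigma)

def f_j(x, y, mu_xj, sigma_xj, mu_yj, sigma_yj):
    """Cluster-j density for two independent attributes: a product of 1-D normals."""
    return normal_pdf(x, mu_xj, sigma_xj) * normal_pdf(y, mu_yj, sigma_yj)
```
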
- Assume that there are 3 clusters in a 2-dimensional data set. Then we have 14 parameters to be determined: $\mu_{x1}, \mu_{y1}, \sigma_{x1}, \sigma_{y1}, \mu_{x2}, \mu_{y2}, \sigma_{x2}, \sigma_{y2}, \mu_{x3}, \mu_{y3}, \sigma_{x3}, \sigma_{y3}, P(C_1)$, and $P(C_2)$.
- The probability that sample $s_i$ belongs to $C_j$ is:

$$\mathrm{Prob}(C_j \mid (x_i, y_i)) = \frac{\mathrm{Prob}((x_i, y_i) \mid C_j)\,\mathrm{Prob}(C_j)}{\mathrm{Prob}((x_i, y_i))} = \frac{\mathrm{Prob}(C_j)}{\mathrm{Prob}((x_i, y_i))} \cdot \frac{1}{2\pi\,\sigma_{xj}\sigma_{yj}}\, e^{-\frac{(x_i - \mu_{xj})^2}{2\sigma_{xj}^2}}\, e^{-\frac{(y_i - \mu_{yj})^2}{2\sigma_{yj}^2}}$$

$$p_{ji} = \frac{\mathrm{Prob}(C_j \mid (x_i, y_i))}{\mathrm{Prob}(C_1 \mid (x_i, y_i)) + \mathrm{Prob}(C_2 \mid (x_i, y_i)) + \mathrm{Prob}(C_3 \mid (x_i, y_i))}$$
The new estimated values of the parameters are computed as follows:

$$\mu_{xj} = \frac{\sum_{i=1}^{n} p_{ji} x_i}{\sum_{i=1}^{n} p_{ji}}; \qquad \mu_{yj} = \frac{\sum_{i=1}^{n} p_{ji} y_i}{\sum_{i=1}^{n} p_{ji}}$$

$$\sigma_{xj}^2 = \frac{\sum_{i=1}^{n} p_{ji} (x_i - \mu_{xj})^2}{\sum_{i=1}^{n} p_{ji}}; \qquad \sigma_{yj}^2 = \frac{\sum_{i=1}^{n} p_{ji} (y_i - \mu_{yj})^2}{\sum_{i=1}^{n} p_{ji}}$$

$$P(C_j) = \frac{\sum_{i=1}^{n} p_{ji}}{n}$$
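One combined E- and M-pass for the two-attribute, k-cluster case; a sketch under the same independence assumption (numpy; `f_j` from the snippet above, array names illustrative):

```python
import numpy as np

def em_pass_2d(x, y, mus_x, sigmas_x, mus_y, sigmas_y, priors):
    """One EM pass for k clusters over two independent attributes.

    All parameter arrays have length k; Prob((x_i, y_i)) cancels in the
    normalization of the E-step."""
    k, n = len(priors), len(x)
    # E-step: p[j, i] = Prob(C_j | (x_i, y_i)).
    w = np.array([priors[j] * f_j(x, y, mus_x[j], sigmas_x[j], mus_y[j], sigmas_y[j])
                  for j in range(k)])
    p = w / w.sum(axis=0)
    # M-step: weighted means, variances, and cluster priors.
    totals = p.sum(axis=1)
    mus_x = (p @ x) / totals
    mus_y = (p @ y) / totals
    sigmas_x = np.sqrt(np.array([np.sum(p[j] * (x - mus_x[j]) ** 2) for j in range(k)]) / totals)
    sigmas_y = np.sqrt(np.array([np.sum(p[j] * (y - mus_y[j]) ** 2) for j in range(k)]) / totals)
    priors = totals / n
    return mus_x, sigmas_x, mus_y, sigmas_y, priors
```
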
## Limitations of the Finite Mixture Model and the EM Algorithm

- The finite mixture model and the EM algorithm generally assume that the attributes are independent.
- Approaches have been proposed for handling correlated attributes; however, these approaches are subject to further limitations.
- Furthermore, the EM algorithm may converge to a local optimum.
## Generalization of the Finite Mixture Model and the EM Algorithm

- The finite mixture model and the EM algorithm can be generalized to handle other types of probability distributions.
- For example, if we want to partition the objects into k clusters based on m independent nominal attributes, then we can apply the EM algorithm to estimate the parameters required to describe the distribution.
- In this case, the total number of parameters is equal to

$$(k - 1) + k \sum_{i=1}^{m} |A_i|,$$

where $|A_i|$ is the number of possible values of attribute $A_i$.

- The term $k \sum_{i=1}^{m} |A_i|$ is due to the need to compute

$$\mathrm{Prob}(C_j \mid (v_1, v_2, \ldots, v_m)) = \frac{\mathrm{Prob}((v_1, v_2, \ldots, v_m) \mid C_j)\,\mathrm{Prob}(C_j)}{\mathrm{Prob}((v_1, v_2, \ldots, v_m))} = \frac{\mathrm{Prob}(C_j) \prod_{i=1}^{m} \mathrm{Prob}(v_i \mid C_j)}{\mathrm{Prob}((v_1, v_2, \ldots, v_m))}.$$
18
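The parameter count on this slide can be written down directly; a small sketch (the function name is illustrative):

```python
def num_em_parameters(k, attribute_sizes):
    """Total parameter count for k clusters and independent nominal attributes
    with |A_1|, ..., |A_m| possible values: (k - 1) + k * sum |A_i|."""
    return (k - 1) + k * sum(attribute_sizes)

# e.g. the insect example below: k = 3 clusters, attribute sizes 3, 2, 2, 2,
# which gives (3 - 1) + 3 * 9 = 29 parameters by this count.
print(num_em_parameters(3, [3, 2, 2, 2]))
```
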
- If two attributes are correlated, then we can merge these two attributes to form a composite attribute with $|A_i| \cdot |A_j|$ possible values.
## An Example

- Assume that we want to partition 100 samples of a particular species of insects into 3 clusters according to 4 attributes:
  - Color (Ac): milk, light brown, or dark brown;
  - Head shape (Ah): spherical or triangular;
  - Body length (Al): long or short;
  - Weight (Aw): heavy or light.
- If we determine that body length and weight are correlated, then we create a composite attribute As: (length, weight) with 4 possible values: (L, H), (L, L), (S, H), and (S, L).
- We can estimate the values of the parameters in the following table with the EM algorithm, in addition to P(C1), P(C2), and P(C3):

|      | Color | Head shape | (Body length, Weight) |
|------|-------|------------|-----------------------|
| C1 | P(M\|C1), P(L\|C1), P(D\|C1) | P(S\|C1), P(T\|C1) | P((L,H)\|C1), P((S,H)\|C1), P((L,L)\|C1), P((S,L)\|C1) |
| C2 | P(M\|C2), P(L\|C2), P(D\|C2) | P(S\|C2), P(T\|C2) | P((L,H)\|C2), P((S,H)\|C2), P((L,L)\|C2), P((S,L)\|C2) |
| C3 | P(M\|C3), P(L\|C3), P(D\|C3) | P(S\|C3), P(T\|C3) | P((L,H)\|C3), P((S,H)\|C3), P((L,L)\|C3), P((S,L)\|C3) |

- We invoke the EM algorithm with an initial guess of these parameter values.
- For each sample $s_i = (v_1, v_2, v_3)$, we compute the following probabilities:

$$\mathrm{Prob}(C_j \mid (v_1, v_2, v_3)) = \frac{\mathrm{Prob}((v_1, v_2, v_3) \mid C_j)\,\mathrm{Prob}(C_j)}{\mathrm{Prob}((v_1, v_2, v_3))} = \frac{\mathrm{Prob}(v_1 \mid C_j)\,\mathrm{Prob}(v_2 \mid C_j)\,\mathrm{Prob}(v_3 \mid C_j)\,\mathrm{Prob}(C_j)}{\mathrm{Prob}((v_1, v_2, v_3))}$$

$$p_{ji} = \frac{\mathrm{Prob}(C_j \mid (v_1, v_2, v_3))}{\mathrm{Prob}(C_1 \mid (v_1, v_2, v_3)) + \mathrm{Prob}(C_2 \mid (v_1, v_2, v_3)) + \mathrm{Prob}(C_3 \mid (v_1, v_2, v_3))}$$
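A sketch of this posterior computation for nominal attributes. The layout of `cond_probs` (one dictionary of value probabilities per attribute per cluster) is an assumption made for illustration, not from the notes; the common factor Prob(v1, v2, v3) again cancels in the normalization.

```python
import numpy as np

def nominal_posteriors(sample, cond_probs, priors):
    """Posterior Prob(C_j | sample) for a tuple of independent nominal values.

    cond_probs[j][i][v] is Prob(attribute i has value v | C_j);
    priors[j] is P(C_j)."""
    w = np.array([priors[j] * np.prod([cond_probs[j][i][v] for i, v in enumerate(sample)])
                  for j in range(len(priors))])
    return w / w.sum()
```
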
The new estimated values of the parameters are computed as follows:

$$P(M \mid C_j) = \frac{\sum_{i=1}^{n} \mathbb{1}\bigl(\mathrm{color}(s_i) = M\bigr)\, p_{ji}}{\sum_{i=1}^{n} p_{ji}}$$

$$P(C_j) = \frac{\sum_{i=1}^{n} p_{ji}}{n}$$
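These updates are weighted counting: each conditional probability becomes the p_ji-weighted fraction of samples that take the corresponding value. A sketch (numpy; names illustrative):

```python
import numpy as np

def update_value_probability(values, p_j, target):
    """Re-estimate P(target | C_j) for one nominal attribute:
    the p_ji-weighted fraction of samples whose value equals `target`."""
    indicator = (np.asarray(values) == target).astype(float)
    return np.sum(indicator * p_j) / np.sum(p_j)

def update_prior(p_j, n):
    """Re-estimate P(C_j) as the average membership weight."""
    return np.sum(p_j) / n

# e.g. P(M | C_1) from a list of colors and the weights p_1i for cluster 1:
# p_m_given_c1 = update_value_probability(colors, p_1, "M")
```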