Expectation-Maximization

            Markoviana Reading Group
                      Fatih Gelgi, ASU, 2005




 Outline
   What is EM?
   Intuitive Explanation
        Example: Gaussian Mixture
   Algorithm
   Generalized EM
   Discussion
   Applications
        HMM – Baum-Welch
        K-means

What is EM?
    Two main applications:
           Data has missing values, due to problems with or
            limitations of the observation process.

           Optimizing the likelihood function is extremely hard,
            but the likelihood function can be simplified by assuming
            the existence of and values for additional missing or
            hidden parameters.
            *  arg max L ( | U )  arg max p (U | )
                                              

                                                                 M                     
                                                                    j p j ui |  j 
                         N                                   N
             arg max     pu   i   |    arg max          j 1
                                                            i 1 
                                                                                        
                        i 1                                                          

Key Idea…
   The observed data U is generated by some
    distribution and is called the incomplete data.

   Assume that a complete data set Z = (U, J) exists,
    where J is the missing or hidden data.

   Maximize the posterior probability of the
    parameters θ given the data U, marginalizing over
    J:

              \theta^* = \arg\max_\theta \sum_J P(\theta, J \mid U)



Intuitive Explanation of EM
   Alternate between estimating the unknowns θ and
    the hidden variables J.

   In each iteration, instead of finding the best J ∈ 𝒥,
    compute a distribution over the space 𝒥.

   EM is a lower-bound maximization process
    (Minka, 1998).

        E-step: construct a local lower-bound to the posterior
         distribution.

        M-step: optimize the bound.

Intuitive Explanation of EM
   Lower-bound approximation method

        Sometimes provides faster convergence
        than gradient descent and Newton's method.

Example:
Mixture Components




Example (cont’d):
True Likelihood of Parameters




Example (cont’d):
Iterations of EM




    Lower-bound Maximization
 Posterior probability → logarithm of the joint distribution:

       \theta^* = \arg\max_\theta \sum_{J \in \mathcal{J}^n} P(\theta, J \mid U)
                = \arg\max_\theta \log P(U, \theta)
                = \arg\max_\theta \log \sum_{J \in \mathcal{J}^n} P(U, J, \theta)     ← difficult to maximize directly!

     Idea: start with a guess θt, construct an easily
      computed lower bound B(θ; θt) on the function
      log P(θ | U), and maximize that bound instead.

Lower-bound Maximization (cont.)
   Construct a tractable lower bound B(θ; θt)
    that contains a sum of logarithms, where ft(J) is
    an arbitrary probability distribution over the
    hidden data J.

   By Jensen's inequality, the bound below holds.
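
    In the standard form of the cited references (Dellaert 2002; Minka 1998) — a sketch,
    the slide's own rendering may differ slightly:

       \log P(U, \theta) = \log \sum_J P(U, J, \theta)
                         = \log \sum_J f_t(J)\, \frac{P(U, J, \theta)}{f_t(J)}
                         \;\ge\; \sum_J f_t(J) \log \frac{P(U, J, \theta)}{f_t(J)}
                         \;=\; B(\theta; \theta^t)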



Optimal Bound
   B(θ; θt) touches the objective function
    log P(U, θ) at θt.

   Maximize B(θt; θt) with respect to ft(J).

   Introduce a Lagrange multiplier λ to enforce
    the constraint Σ_J ft(J) = 1, as sketched below.
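
    A sketch of the resulting constrained problem, written in the standard form of the
    cited references (assumed, not recovered verbatim from the slide):

       \max_{f_t}\; \sum_J f_t(J) \log \frac{P(U, J, \theta^t)}{f_t(J)}
       \quad \text{s.t.} \quad \sum_J f_t(J) = 1,
       \qquad
       \Lambda = \sum_J f_t(J) \log \frac{P(U, J, \theta^t)}{f_t(J)}
                 + \lambda \Big( 1 - \sum_J f_t(J) \Big)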



Optimal Bound (cont.)
   Take the derivative of Λ with respect to ft(J)
    and set it to zero.

   The bound is maximized at the distribution
    sketched below.
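
    The standard result (a sketch following the cited references; the exact steps on the
    slide are assumptions here):

       \frac{\partial \Lambda}{\partial f_t(J)}
         = \log P(U, J, \theta^t) - \log f_t(J) - 1 - \lambda = 0
       \quad\Longrightarrow\quad
       f_t(J) = \frac{P(U, J, \theta^t)}{\sum_{J'} P(U, J', \theta^t)} = P(J \mid U, \theta^t)

    That is, the optimal ft(J) is the posterior over the hidden data given the current
    parameter guess θt.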




Maximizing the Bound
   Rewrite B(θ; θt) in terms of expectations over
    ft(J); the resulting expression and the final
    update are sketched below.
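
    A sketch in the standard notation of the cited references (the slide's own equations
    are assumed, not quoted):

       B(\theta; \theta^t)
         = \sum_J f_t(J) \log P(U, J, \theta) - \sum_J f_t(J) \log f_t(J)
         = Q(\theta; \theta^t) + \text{(terms independent of } \theta\text{)},

       \text{where} \quad Q(\theta; \theta^t) = E_{J \mid U, \theta^t}\!\left[ \log P(U, J, \theta) \right].

       \text{Finally,} \quad \theta^{t+1} = \arg\max_\theta B(\theta; \theta^t)
                                         = \arg\max_\theta Q(\theta; \theta^t).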

EM Algorithm
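
 In summary, each iteration performs: E-step, compute ft(J) = P(J | U, θt); M-step, set
 θt+1 = argmax_θ Q(θ; θt). Below is a minimal illustrative Python sketch of this loop for
 a 1-D Gaussian mixture; the 1-D model, the names, and the initialization are assumptions
 made for illustration, not the slides' own code.

import numpy as np

def em_gmm_1d(u, M=2, n_iter=50, seed=0):
    # EM for a 1-D Gaussian mixture: an illustrative sketch (no convergence test).
    u = np.asarray(u, dtype=float)
    rng = np.random.default_rng(seed)
    N = len(u)
    alpha = np.full(M, 1.0 / M)                  # mixture weights alpha_j
    mu = rng.choice(u, size=M, replace=False)    # component means (rough guess theta_0)
    var = np.full(M, np.var(u) + 1e-6)           # component variances

    for _ in range(n_iter):
        # E-step: responsibilities f_t(j | u_i) = P(j | u_i, theta_t)
        dens = alpha / np.sqrt(2 * np.pi * var) * np.exp(
            -0.5 * (u[:, None] - mu) ** 2 / var)        # shape (N, M)
        resp = dens / dens.sum(axis=1, keepdims=True)

        # M-step: maximize the expected complete-data log-likelihood Q(theta; theta_t)
        Nj = resp.sum(axis=0)
        alpha = Nj / N
        mu = (resp * u[:, None]).sum(axis=0) / Nj
        var = (resp * (u[:, None] - mu) ** 2).sum(axis=0) / Nj + 1e-9

    return alpha, mu, var

# Example usage on synthetic data drawn from two Gaussians.
rng = np.random.default_rng(1)
u = np.concatenate([rng.normal(-2.0, 1.0, 300), rng.normal(3.0, 0.5, 200)])
print(em_gmm_1d(u, M=2))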




   EM converges to a local maximum of
    log P(U, θ), and hence to a local maximum of log P(θ | U).


A Relation to the Log-Posterior
   An alternative is to compute the expected
    log-posterior, E_{J | U, θt}[ log P(θ | U, J) ],
    which differs from Q(θ; θt) only by terms that do
    not depend on θ and therefore yields the same
    maximization with respect to θ.



Generalized EM
   Assume ln p(X | θ) and the bound B are
    differentiable in θ. Then EM converges to a point
    θ* where

                 \frac{\partial}{\partial \theta} \ln p(X \mid \theta) = 0,

    i.e. a stationary point of the likelihood.

   GEM: instead of setting θt+1 = argmax_θ B(θ; θt),
    just find a θt+1 such that
    B(θt+1; θt) > B(θt; θt).

   GEM is also guaranteed to converge.

HMM – Baum-Welch Revisited
 Estimate the parameters (a, b, π) so that the expected number of
 correct individual states is maximized.

    γt(i): the probability of being in state Si at
     time t, given the observation sequence and the
     current model.

    ξt(i,j): the probability of being in state Si at
     time t and in state Sj at time t+1, given the
     observation sequence and the current model.
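
    In the usual forward–backward notation (α forward, β backward variables, λ the model;
    this is the standard form, assumed rather than recovered from the slide):

       \gamma_t(i) = \frac{\alpha_t(i)\,\beta_t(i)}{\sum_{j=1}^{N} \alpha_t(j)\,\beta_t(j)},
       \qquad
       \xi_t(i,j) = \frac{\alpha_t(i)\, a_{ij}\, b_j(o_{t+1})\, \beta_{t+1}(j)}{P(O \mid \lambda)}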




Baum-Welch: E-step
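
 A minimal NumPy sketch of how these E-step quantities can be computed with the
 forward–backward procedure; the function name, array layout, and the absence of
 numerical rescaling are illustrative assumptions, not the slides' code.

import numpy as np

def baum_welch_e_step(A, B, pi, obs):
    # A[i, j]: transition a_ij; B[i, k]: emission b_i(k); pi[i]: initial prob.
    # obs: integer-coded observation sequence. Sketch only: no rescaling for underflow.
    T, N = len(obs), len(pi)

    # Forward pass: alpha[t, i] = P(o_1..o_t, q_t = S_i | model)
    alpha = np.zeros((T, N))
    alpha[0] = pi * B[:, obs[0]]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]

    # Backward pass: beta[t, i] = P(o_{t+1}..o_T | q_t = S_i, model)
    beta = np.zeros((T, N))
    beta[-1] = 1.0
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])

    likelihood = alpha[-1].sum()              # P(O | model)

    # gamma[t, i] = P(q_t = S_i | O, model)
    gamma = alpha * beta / likelihood

    # xi[t, i, j] = P(q_t = S_i, q_{t+1} = S_j | O, model)
    xi = (alpha[:-1, :, None] * A[None, :, :]
          * B[:, obs[1:]].T[:, None, :] * beta[1:, None, :]) / likelihood

    return gamma, xi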




Baum-Welch: M-step
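
 The M-step re-estimates the model from γ and ξ. A sketch in the usual notation, which
 may differ from the slide's own symbols:

       \hat{\pi}_i = \gamma_1(i), \qquad
       \hat{a}_{ij} = \frac{\sum_{t=1}^{T-1} \xi_t(i,j)}{\sum_{t=1}^{T-1} \gamma_t(i)}, \qquad
       \hat{b}_j(k) = \frac{\sum_{t:\, o_t = v_k} \gamma_t(j)}{\sum_{t=1}^{T} \gamma_t(j)}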




K-Means
   Problem: Given data X and the number of
    clusters K, find clusters.
   Clustering is based on centroids:

             \mu(c) = \frac{1}{|c|} \sum_{x \in c} x

   A point belongs to the cluster with the closest
    centroid.
   Hidden variables: the cluster assignments; the
    centroids play the role of the parameters θ.

K-Means (cont.)
 Starting with initial centroids θ0:
    E-step: split the data into K clusters
     according to distances to the centroids
     (i.e., calculate the distribution ft(J)).

    M-step: update the centroids
     (i.e., calculate θt+1).
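
 A minimal Python sketch of this EM-style alternation for an (N, D) data array X; the
 function name, initialization, and stopping test are illustrative assumptions, not the
 slides' code.

import numpy as np

def kmeans(X, K, n_iter=100, seed=0):
    # K-means as alternation of hard assignment (E-step) and centroid update (M-step).
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=K, replace=False)]   # theta_0

    for _ in range(n_iter):
        # E-step: assign every point to its nearest centroid (hard f_t(J)).
        d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        assign = d.argmin(axis=1)

        # M-step: recompute each non-empty cluster's centroid (theta_{t+1}).
        new_centroids = np.array([
            X[assign == k].mean(axis=0) if np.any(assign == k) else centroids[k]
            for k in range(K)
        ])
        if np.allclose(new_centroids, centroids):
            break                                              # converged
        centroids = new_centroids

    return centroids, assign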

K-Means Example (K=2)
             Pick seeds
             Reassign clusters
             Compute centroids
             Reassign clusters
             Compute centroids
             Reassign clusters
             Converged!

Discussion
   Is EM a Primal-Dual algorithm?




References:
   A.P. Dempster, N.M. Laird, and D.B. Rubin, "Maximum Likelihood from
    Incomplete Data via the EM Algorithm", Journal of the Royal Statistical
    Society, Series B (Methodological), Vol. 39, No. 1 (1977), pp. 1-38.
   F. Dellaert, "The Expectation Maximization Algorithm", Tech. Rep.
    GIT-GVU-02-20, 2002.
   T. Minka, "Expectation-Maximization as Lower Bound Maximization", 1998.
   Y. Chang and M. Kölsch, Presentation: Expectation Maximization, UCSB, 2002.
   K. Andersson, Presentation: Model Optimization Using the EM Algorithm,
    COSC 7373, 2001.


        Thanks!



