Least-Mean-Square Training of Cluster-Weighted Modeling
National Taiwan University
Department of Computer Science and Information Engineering
Outline

• Introduction of CWM
• Least-Mean-Square Training of CWM
• Experiments
• Summary
• Future work
• Q&A
Cluster-Weighted Modeling (CWM)

• CWM is a supervised learning model based on the joint
  probability density estimation of a set of input and
  output (target) data.
• The joint density is expanded into clusters, each of which
  describes a local subspace well. Each local Gaussian expert
  can have its own local function
  (constant, linear, or quadratic).
• The global (nonlinear) model is constructed by combining
  all the local models.
• The resulting model has transparent local structures
  and meaningful parameters.
Architecture
• (Figure: CWM architecture diagram.)
Prediction calculation
• Conditional forecast:
  the expected output given the input,

  $\langle y \mid x \rangle = \frac{\sum_{m=1}^{M} f(x, \beta_m)\, p(x \mid c_m)\, p(c_m)}{\sum_{m=1}^{M} p(x \mid c_m)\, p(c_m)}$

• Conditional error (output uncertainty):
  the expected output covariance given the input,

  $\langle \sigma_y^2 \mid x \rangle = \frac{\sum_{m=1}^{M} \big[\sigma_m^2 + f(x, \beta_m)^2\big]\, p(x \mid c_m)\, p(c_m)}{\sum_{m=1}^{M} p(x \mid c_m)\, p(c_m)} - \langle y \mid x \rangle^2$
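
A minimal sketch of this forecast in Python, assuming Gaussian input-domain clusters; the names (cwm_predict, local_fns) are illustrative, not the authors' code:

```python
import numpy as np
from scipy.stats import multivariate_normal

def cwm_predict(x, priors, means, covs, local_fns):
    """Conditional forecast <y | x> of a cluster-weighted model.
    priors[m]         : p(c_m)
    means[m], covs[m] : Gaussian input-domain model p(x | c_m)
    local_fns[m]      : local expert f(x, beta_m)"""
    # Unnormalized cluster weights p(x | c_m) p(c_m)
    w = np.array([p * multivariate_normal.pdf(x, mu, S)
                  for p, mu, S in zip(priors, means, covs)])
    w /= w.sum()                          # normalize over clusters
    # Blend the local experts with the normalized weights
    return sum(wm * f(x) for wm, f in zip(w, local_fns))
```

Each entry of local_fns would be the constant, linear, or quadratic expert fitted for that cluster.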
Training (EM Algorithm)
• Objective function: the log-likelihood

  $L = \sum_{n=1}^{N} \log p(y_n, x_n), \quad p(y, x) = \sum_{m=1}^{M} p(y \mid x, c_m)\, p(x \mid c_m)\, p(c_m)$

• Initialize cluster means with k-means and variances with the
  maximal range of each dimension. Initialize the priors
  $p(c_m) = 1/M$, where M is the predetermined number of clusters.
• E-step: Evaluate the posterior probability

  $p(c_m \mid y_n, x_n) = \frac{p(y_n \mid x_n, c_m)\, p(x_n \mid c_m)\, p(c_m)}{\sum_{l=1}^{M} p(y_n \mid x_n, c_l)\, p(x_n \mid c_l)\, p(c_l)}$

• M-step:
  Update the cluster means

  $\mu_m = \frac{\sum_{n} x_n\, p(c_m \mid y_n, x_n)}{\sum_{n} p(c_m \mid y_n, x_n)}$

  Update the prior probability

  $p(c_m) = \frac{1}{N} \sum_{n=1}^{N} p(c_m \mid y_n, x_n)$
M-step (cont.)
• Define the cluster-weighted expectation

  $\langle \theta \rangle_m = \frac{\sum_{n} \theta(x_n, y_n)\, p(c_m \mid y_n, x_n)}{\sum_{n} p(c_m \mid y_n, x_n)}$

• Update the cluster-weighted covariance matrices

  $\Sigma_m = \langle (x - \mu_m)(x - \mu_m)^{\mathsf T} \rangle_m$

• Update the cluster parameters $\beta_m$, which maximize the
  data likelihood; for experts expanded in basis functions,
  $f(x, \beta_m) = \sum_{j} \beta_{m,j} f_j(x)$,

  $\beta_m = B_m^{-1} a_m, \quad \text{where } [B_m]_{ij} = \langle f_i(x)\, f_j(x) \rangle_m, \; [a_m]_j = \langle y\, f_j(x) \rangle_m$

• Update the output covariance matrices

  $\Sigma_{y,m} = \langle\, [y - f(x, \beta_m)]\,[y - f(x, \beta_m)]^{\mathsf T} \rangle_m$
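
The full E-step/M-step update can be sketched compactly; the following assumes scalar outputs and local linear experts, and every name is an assumption rather than the authors' implementation:

```python
import numpy as np
from scipy.stats import multivariate_normal

def em_step(X, Y, priors, means, covs, betas, out_vars):
    """One EM iteration for CWM with local linear experts
    f(x, beta_m) = beta_m @ [x; 1].  X: (N, D), Y: (N,)."""
    N, D = X.shape
    M = len(priors)
    Xb = np.hstack([X, np.ones((N, 1))])     # design matrix with bias term
    # E-step: responsibilities p(c_m | y_n, x_n)
    R = np.empty((N, M))
    for m in range(M):
        px = multivariate_normal.pdf(X, means[m], covs[m])   # p(x | c_m)
        res = Y - Xb @ betas[m]
        py = np.exp(-res**2 / (2 * out_vars[m])) / np.sqrt(2 * np.pi * out_vars[m])
        R[:, m] = priors[m] * px * py
    R /= R.sum(axis=1, keepdims=True)
    # M-step: update every cluster from its weighted statistics
    for m in range(M):
        r = R[:, m]
        priors[m] = r.mean()                 # new prior p(c_m)
        means[m] = r @ X / r.sum()           # cluster-weighted mean <x>_m
        d = X - means[m]
        covs[m] = (r[:, None] * d).T @ d / r.sum()
        # weighted least squares for the local expert parameters
        betas[m] = np.linalg.solve(Xb.T @ (r[:, None] * Xb), Xb.T @ (r * Y))
        out_vars[m] = r @ (Y - Xb @ betas[m])**2 / r.sum()
    return priors, means, covs, betas, out_vars
```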
Least-Mean-Square Training of CWM
• Train CWM's model parameters from a least-squares
  perspective.
• Minimize the squared-error function, starting from CWM's
  training result, to find another solution with better
  accuracy.
• Find another solution when CWM is trapped in a local
  minimum.
• Apply supervised selection of cluster centers instead of
  an unsupervised method.
LMS Learning Algorithm
• The instantaneous error produced by sample n is

  $E_n = \tfrac{1}{2} e_n^2 = \tfrac{1}{2}\big(d_n - \hat{y}(x_n)\big)^2$, with $d_n$ the desired output.

• The prediction formula is the conditional forecast

  $\hat{y}(x) = \frac{\sum_{m} f(x, \beta_m)\, p(x \mid c_m)\, p(c_m)}{\sum_{m} p(x \mid c_m)\, p(c_m)}$

• A softmax function constrains the prior probabilities to lie
  between 0 and 1 and to sum to 1:

  $p(c_m) = \frac{\exp(\gamma_m)}{\sum_{l} \exp(\gamma_l)}$
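
A possible softmax helper, as a minimal sketch (gamma is an assumed name for the unconstrained prior parameters):

```python
import numpy as np

def softmax(gamma):
    """Map unconstrained parameters gamma_m to priors p(c_m) that
    lie in (0, 1) and sum to 1."""
    z = np.exp(gamma - gamma.max())   # shift by max for numerical stability
    return z / z.sum()
```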
LMS Learning Algorithm (cont.)
• The gradients of $E_n$ with respect to the model parameters
  (expert weights, cluster means, and softmax prior parameters)
  follow from the chain rule; see the sketch after the
  algorithm below.
LMS CWM Learning Algorithm
• Initialization: Initialize the model parameters using
  CWM's (EM) training result. Initialize the learning rate.
• Iterate until convergence:
   For n = 1 : N
     Estimate the error $e_n$
     Estimate the gradients
     Update the parameters by gradient descent
   End
• EM and LMS training can also be alternated: run an E-step
  and M-step, then a sweep of LMS updates, and repeat.
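
A sketch of the per-sample update loop under the same assumptions as before; for brevity only the expert weights are updated, although the gradients on the previous slide also cover the cluster means and the softmax prior parameters:

```python
import numpy as np
from scipy.stats import multivariate_normal

def lms_refine(X, Y, priors, means, covs, betas, lr=0.01, epochs=50):
    """Per-sample LMS refinement started from the EM-trained
    parameters; only the expert weights beta_m are updated here."""
    for _ in range(epochs):
        for x, d in zip(X, Y):
            xb = np.append(x, 1.0)
            # normalized cluster weights g_m(x) ∝ p(x | c_m) p(c_m)
            g = np.array([p * multivariate_normal.pdf(x, mu, S)
                          for p, mu, S in zip(priors, means, covs)])
            g /= g.sum()
            y_hat = sum(gm * (b @ xb) for gm, b in zip(g, betas))
            e = d - y_hat                    # instantaneous error e_n
            for m in range(len(betas)):
                # dE_n/dbeta_m = -e_n g_m(x) [x; 1]; descent step
                betas[m] = betas[m] + lr * e * g[m] * xb
    return betas
```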
Simple Demo

•   cwm1d
•   cwmprdemo
•   cwm2d
•   lms1d
Experiments

• A simple sine function.
• LMS-CWM gives a better interpolation result.
Mackey-Glass Chaotic Time Series
Prediction
• 1000 data points: the first 500 points are used as the
  training set and the last 500 points as the test set.
• Single-step prediction
• Input: [s(t),s(t-6),s(t-12),s(t-18)]
• Output: s(t+85)
• Local linear model
• Number of clusters: 30
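
The setup above can be reproduced with a short embedding helper; make_dataset is a hypothetical name and s is assumed to hold the 1000-point series:

```python
import numpy as np

def make_dataset(s, lags=(0, 6, 12, 18), horizon=85):
    """Embed the series: inputs [s(t), s(t-6), s(t-12), s(t-18)],
    target s(t+85), as specified on this slide."""
    start, stop = max(lags), len(s) - horizon
    X = np.array([[s[t - l] for l in lags] for t in range(start, stop)])
    y = np.array([s[t + horizon] for t in range(start, stop)])
    return X, y

# Split as in the slides: first 500 points for training, last 500 for
# testing (the embedding shrinks the number of usable pairs per half).
# X_tr, y_tr = make_dataset(s[:500]); X_te, y_te = make_dataset(s[500:])
```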
Results (1)
• (Figure: prediction results for CWM and LMS-CWM.)
Results (2)
• Learning curves of CWM and LMS-CWM (figure).

      MSE            CWM         LMS-CWM
      Test set       0.0008027   0.0004480
      Training set   0.0006568   0.0004293
Local Minima
• The initial locations of four clusters, and the resulting
  centers' locations after each training session of CWM and
  LMS-CWM (figures).
Summary
• An LMS learning method for CWM is presented.
• LMS training may lose the benefits of data density
  estimation and of characterizing the data.
• It provides an alternative training option.
• Parameters can be trained by EM and LMS alternately.
• This combines the advantages of both EM and LMS learning.
• LMS-CWM learning can be viewed as a refinement of CWM if
  prediction accuracy is the main concern.
Future work

• Regularization.
• Comparison between different models (from theoretical and
  performance points of view).
Q&A



      Thank You!

								