Lec4_PCA

Shared by: ajizai
Posted: 12/4/2011
Principal Component Regression Analysis





• Pseudo Inverse

• “Heisenberg Uncertainty” for Data Mining

• Explicit Principal Components

• Implicit Principal Components

• NIPALS Algorithm for Eigenvalues and Eigenvectors

• Scripts

- PCA transformation of data

- Pharma-plots

- PCA training and testing

- Bootstrap PCA

- NIPALS and other PCA algorithms

• Examples

• Feature selection

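Classical Regression Analysis

$$X_{nm}\,\vec{w}_m = \vec{y}_n$$

$$X^T_{mn} X_{nm}\,\vec{w}_m = X^T_{mn}\,\vec{y}_n$$

$$\left(X^T_{mn} X_{nm}\right)\vec{w}_m = X^T_{mn}\,\vec{y}_n$$

$$\left(X^T_{mn} X_{nm}\right)^{-1}\left(X^T_{mn} X_{nm}\right)\vec{w}_m = \left(X^T_{mn} X_{nm}\right)^{-1} X^T_{mn}\,\vec{y}_n$$

$$\vec{w}_m = \left(X^T_{mn} X_{nm}\right)^{-1} X^T_{mn}\,\vec{y}_n$$

$\left(X^T_{mn} X_{nm}\right)^{-1} X^T_{mn}$: pseudo inverse (Penrose inverse)

Least-Squares Optimization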
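The least-squares solution w = (X^T X)^{-1} X^T y can be checked numerically. The following is a minimal sketch using NumPy; the function name `least_squares_weights` and the toy data are illustrative assumptions, not part of the original slides.

```python
import numpy as np

def least_squares_weights(X, y):
    """Solve the normal equations (X^T X) w = X^T y for w."""
    # Solving the linear system avoids forming the inverse explicitly.
    return np.linalg.solve(X.T @ X, X.T @ y)

# Hypothetical toy data: 5 samples, 2 features, exact linear response.
X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0], [5.0, 5.0]])
w_true = np.array([0.5, -1.5])
y = X @ w_true

w = least_squares_weights(X, y)
# The Moore-Penrose pseudo inverse should give the same weights.
w_pinv = np.linalg.pinv(X) @ y
```

For a full-column-rank X both routes should agree; the difference only matters once X^T X becomes ill-conditioned.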

The Machine Learning Paradox

$$X_{nm}\,\vec{w}_m = \vec{y}_n$$

$$\vec{w}_m = \left(X^T_{mn} X_{nm}\right)^{-1} X^T_{mn}\,\vec{y}_n$$

• If data can be learned from, they must have redundancy

• If there is redundancy, $(X^T X)^{-1}$ is ill-conditioned

- similar data patterns

- closely correlated descriptive features
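The effect of closely correlated features on the conditioning of X^T X can be illustrated with a small experiment. This is a sketch on synthetic data (the sample size, noise level, and variable names are assumptions chosen for illustration).

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2_indep = rng.normal(size=n)             # unrelated second feature
x2_corr = x1 + 1e-4 * rng.normal(size=n)  # almost a copy of x1

X_indep = np.column_stack([x1, x2_indep])
X_corr = np.column_stack([x1, x2_corr])

# Condition number of X^T X: ratio of its largest to smallest eigenvalue.
cond_indep = np.linalg.cond(X_indep.T @ X_indep)
cond_corr = np.linalg.cond(X_corr.T @ X_corr)
print(cond_indep, cond_corr)
```

The nearly duplicated feature should drive the condition number up by many orders of magnitude, which is the situation the slide describes.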

Beyond Regression

$$\vec{w}_m = \left(X^T_{mn} X_{nm}\right)^{-1} X^T_{mn}\,\vec{y}_n$$



• Paul Werbos motivated beyond regression in 1972

• In addition, there are related statistical “duals” (PCA, PLS, SVM)

• Principal component analysis:

$$X_{nm} = T_{nm} B_{mm}$$

$B_{mm}$: eigenvectors

$T_{nm}$: loading factors

$$X_{nm} = T_{nm} B_{mm}, \qquad X_{nm} \approx T_{nh} B_{hm}$$

$$T_{nm} = X_{nm} B^T_{mm}, \qquad T_{nh} = X_{nm} B^T_{mh}, \qquad h = \text{\# principal components}$$

• Trick: eliminate poor conditioning by using h PCs (largest $\lambda$)

$$\vec{w}_m = B^T_{mh}\left(B_{hm} X^T_{mn} X_{nm} B^T_{mh}\right)^{-1} B_{hm} X^T_{mn}\,\vec{y}_n$$

• Now the matrix to invert is small and well-conditioned

• Generally include about 2 to 6 PCs

• A Better PCA Regression is PLS (Please Listen to Svante Wold)

• A Better PLS is nonlinear PNLS
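The trick of keeping only h principal components can be sketched as follows. The eigenvectors are obtained here with `numpy.linalg.eigh` rather than NIPALS, and the function name `pcr_weights` and the synthetic data are assumptions made for illustration.

```python
import numpy as np

def pcr_weights(X, y, h):
    """w = B^T (B X^T X B^T)^{-1} B X^T y, with B (h x m) holding the
    h leading eigenvectors of X^T X as rows."""
    eigvals, eigvecs = np.linalg.eigh(X.T @ X)
    order = np.argsort(eigvals)[::-1][:h]   # indices of the h largest eigenvalues
    B = eigvecs[:, order].T                 # shape (h, m)
    small = B @ X.T @ X @ B.T               # the h x h matrix that gets inverted
    w = B.T @ np.linalg.solve(small, B @ X.T @ y)
    return w, np.linalg.cond(small)

rng = np.random.default_rng(1)
x1 = rng.normal(size=100)
# Second feature is almost a copy of the first, so X^T X is ill-conditioned.
X = np.column_stack([x1, x1 + 1e-4 * rng.normal(size=100), rng.normal(size=100)])
y = X[:, 0] + X[:, 2]

w, cond_small = pcr_weights(X, y, h=2)
cond_full = np.linalg.cond(X.T @ X)
```

In this sketch the full 3 x 3 matrix is close to singular, while the 2 x 2 matrix built from the two leading components should stay well-conditioned and still reproduce the response closely.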

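Explicit PCA Regression

• We had

$$X_{nm}\,\vec{w}_m = \vec{y}_n$$

$$\vec{w}_m = \left(X^T_{mn} X_{nm}\right)^{-1} X^T_{mn}\,\vec{y}_n$$

• Assume we derive PCA features for X according to

$$X_{nm} \approx T_{nh} B_{hm}$$

$$T_{nh} = X_{nm} B^T_{mh}, \qquad h = \text{\# principal components}$$

• We now have

$$\hat{\vec{y}}_n = T_{nh}\,\vec{w}_h \approx \vec{y}_n$$

$$\vec{w}_h = \left(T^T_{hn} T_{nh}\right)^{-1} T^T_{hn}\,\vec{y}_n$$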

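Explicit PCA Regression on training/test set

• We have for training set:

$$\hat{\vec{y}}^{\,train}_n = T^{train}_{nh}\,\vec{w}_h \approx \vec{y}^{\,train}_n$$

$$\hat{\vec{y}}^{\,train}_n = T^{train}_{nh}\left(T^{train\,T}_{hn} T^{train}_{nh}\right)^{-1} T^{train\,T}_{hn}\,\vec{y}^{\,train}_n$$

• And for the test set:

$$\hat{\vec{y}}^{\,test}_k = T^{test}_{kh}\,\vec{w}^{\,train}_h \approx \vec{y}^{\,test}_k$$

$$\hat{\vec{y}}^{\,test}_k = T^{test}_{kh}\left(T^{train\,T}_{hn} T^{train}_{nh}\right)^{-1} T^{train\,T}_{hn}\,\vec{y}^{\,train}_n = X^{test}_{km} B^T_{mh}\left(T^{train\,T}_{hn} T^{train}_{nh}\right)^{-1} T^{train\,T}_{hn}\,\vec{y}^{\,train}_n$$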
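The training and test formulas for explicit PCA regression can be sketched in a few lines. The helper names `fit_pcr` and `predict_pcr` are hypothetical, the eigenvectors come from `numpy.linalg.eigh` instead of NIPALS, and the data are synthetic and noise-free (exactly two latent factors) so that h = 2 components suffice.

```python
import numpy as np

def fit_pcr(X_train, y_train, h):
    """Return B (h x m) and w_h fitted on the training scores T = X B^T."""
    eigvals, eigvecs = np.linalg.eigh(X_train.T @ X_train)
    B = eigvecs[:, np.argsort(eigvals)[::-1][:h]].T
    T_train = X_train @ B.T
    w_h = np.linalg.solve(T_train.T @ T_train, T_train.T @ y_train)
    return B, w_h

def predict_pcr(X_test, B, w_h):
    """Project test data with the training eigenvectors, then apply w_h."""
    return (X_test @ B.T) @ w_h

rng = np.random.default_rng(2)
Z = rng.normal(size=(120, 2))            # two hidden factors (assumed)
X = Z @ rng.normal(size=(2, 4))          # four observed features of rank 2
y = Z @ np.array([1.0, -2.0])            # response depends on the factors only

X_train, X_test = X[:100], X[100:]
y_train, y_test = y[:100], y[100:]

B, w_h = fit_pcr(X_train, y_train, h=2)
y_hat = predict_pcr(X_test, B, w_h)
```

Note that the test scores are computed with the eigenvectors B from the training set; nothing is re-estimated on the test data.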

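Implicit PCA Regression

$$X_{nm}\,\vec{w}_m = \vec{y}_n$$

$$\vec{w}_m = \left(X^T_{mn} X_{nm}\right)^{-1} X^T_{mn}\,\vec{y}_n$$

$$X_{nm} \approx T_{nh} B_{hm}$$

$$T_{nh} = X_{nm} B^T_{mh}, \qquad h = \text{\# principal components}$$

Substituting $X^T_{mn} \approx B^T_{mh} T^T_{hn}$ and $B_{hm} X^T_{mn} X_{nm} B^T_{mh} = T^T_{hn} T_{nh}$ into the h-component solution gives:

$$\vec{w}_m = B^T_{mh}\left(T^T_{hn} T_{nh}\right)^{-1} B_{hm} B^T_{mh} T^T_{hn}\,\vec{y}_n$$

$$= B^T_{mh}\left(T^T_{hn} T_{nh}\right)^{-1} I_{hh}\, T^T_{hn}\,\vec{y}_n$$

$$= B^T_{mh}\left(T^T_{hn} T_{nh}\right)^{-1} T^T_{hn}\,\vec{y}_n$$

How to apply? Calculate T and B with NIPALS algorithm

Determine $\vec{w}_m$, and apply to data matrix

$$\hat{\vec{y}}_n = X_{nm}\,\vec{w}_m \approx \vec{y}_n$$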
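The implicit form folds the eigenvectors into a single weight vector on the original features, so predictions need only X. A small sketch on random data (with B taken from `numpy.linalg.eigh`, an assumption; the slides use NIPALS) shows that it gives the same predictions as the explicit form.

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(50, 5))
y = rng.normal(size=50)
h = 3

eigvals, eigvecs = np.linalg.eigh(X.T @ X)
B = eigvecs[:, np.argsort(eigvals)[::-1][:h]].T   # (h, m) eigenvectors as rows
T = X @ B.T                                       # (n, h) scores

w_h = np.linalg.solve(T.T @ T, T.T @ y)  # explicit: weights on the scores
w_m = B.T @ w_h                          # implicit: weights on the original features

y_explicit = T @ w_h
y_implicit = X @ w_m
```

Both prediction vectors should coincide, since X w_m = X B^T w_h = T w_h.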

Algorithm

$$X_{nm} \approx T_{nh} B_{hm}$$

$$T_{nh} = X_{nm} B^T_{mh}, \qquad h = \text{\# principal components}$$

• The B matrix is a matrix of eigenvectors of the correlation matrix C

• If the features are zero centered (and scaled to unit variance) we have:

$$C_{mm} = \frac{1}{n-1}\, X^T_{mn} X_{nm}$$

• We only consider the h eigenvectors corresponding to largest eigenvalues

• The eigenvalues are the variances

• Eigenvectors are normalized to 1 and solutions of:

$$C_{mm}\,\vec{w}_m = \lambda\,\vec{w}_m \qquad \text{s.t. } \|\vec{w}\| = 1$$

$$C_{mm} B^T_{mm} = B^T_{mm} \Lambda$$

• Use NIPALS algorithm to build up B and T
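The statements about C, its eigenvectors, and the eigenvalues as variances can be verified directly. This sketch uses a full eigendecomposition from `numpy.linalg.eigh` as a reference (not NIPALS) on synthetic zero-centered data; the feature scales are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.normal(size=(200, 3)) * np.array([3.0, 1.0, 0.2])
X = X - X.mean(axis=0)                    # zero-center each feature

n = X.shape[0]
C = X.T @ X / (n - 1)

eigvals, eigvecs = np.linalg.eigh(C)      # eigenvalues in ascending order
order = np.argsort(eigvals)[::-1]         # reorder: largest first
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

h = 2
B = eigvecs[:, :h].T                      # (h, m), unit-length rows
T = X @ B.T                               # (n, h) scores
score_var = T.var(axis=0, ddof=1)         # sample variance of each score column
```

The variance of each score column should match the corresponding eigenvalue.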

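NIPALS Algorithm: Part 2

$$X_{nm} \approx T_{nh} B_{hm}$$

$$T_{nh} = X_{nm} B^T_{mh}, \qquad h = \text{\# principal components}$$

1. Estimate $\vec{t}$, e.g. $\vec{t}^{\,est}_n$ = a column of $X_{nm}$

2. $\vec{b}_m = X^T_{mn}\,\vec{t}^{\,est}_n$

3. $\vec{b}_m = \dfrac{\vec{b}_m}{\|\vec{t}^{\,est}_n\|^2} = \dfrac{\vec{b}_m}{\vec{t}^{\,est\,T}_n\,\vec{t}^{\,est}_n}$

4. $\vec{t}_n = X_{nm}\,\vec{b}_m$

5. $\vec{t}_n = \dfrac{\vec{t}_n}{\vec{b}^T_m \vec{b}_m}$

6. Go to step 1 (using $\vec{t}_n$ as the new estimate) until convergence with $\vec{t}_n \approx \vec{t}^{\,est}_n$

7. $\lambda = \dfrac{\left(\vec{t}^T_n \vec{t}_n\right)\left(\vec{b}^T_m \vec{b}_m\right)}{n-1}$

8. Deflate X according to $X_{nm} = X_{nm} - \vec{t}_n \vec{b}^T_m$

9. Go to step 1 and repeat for h PCs

10. Put t's in $T_{nh}$ and put b's in $B^T_{mh}$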
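The iteration above can be sketched in NumPy. This is an illustrative variant, not necessarily the exact scheme used by the Analyze code: it assumes each b is rescaled to unit length inside the loop (so dividing t by b^T b becomes a no-op), starts from the column of X with the largest sum of squares, and uses a relative tolerance chosen for the example. The function name `nipals` and its parameters are hypothetical.

```python
import numpy as np

def nipals(X, h, tol=1e-12, max_iter=1000):
    """Return scores T (n x h), eigenvectors B (h x m, rows) and the
    eigenvalues of X^T X / (n - 1) for the h leading components."""
    X = np.array(X, dtype=float)          # copy, because X is deflated below
    n, m = X.shape
    T = np.zeros((n, h))
    B = np.zeros((h, m))
    lam = np.zeros(h)
    for k in range(h):
        # Initial estimate of t: the column with the largest sum of squares.
        t = X[:, np.argmax((X ** 2).sum(axis=0))].copy()
        for _ in range(max_iter):
            b = X.T @ t / (t @ t)         # regress the columns of X on t
            b /= np.linalg.norm(b)        # assumption: keep b at unit length
            t_new = X @ b                 # new scores
            converged = np.linalg.norm(t_new - t) <= tol * np.linalg.norm(t_new)
            t = t_new
            if converged:
                break
        T[:, k], B[k], lam[k] = t, b, (t @ t) / (n - 1)
        X -= np.outer(t, b)               # deflate before the next component
    return T, B, lam

rng = np.random.default_rng(5)
X = rng.normal(size=(100, 4)) * np.array([4.0, 2.0, 1.0, 0.5])
X -= X.mean(axis=0)                       # NIPALS assumes zero-centered features
T, B, lam = nipals(X, h=2)
```

Because only the h leading components are computed, this kind of iteration can be cheaper than a full eigendecomposition when m is large and h is small.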

PRACTICAL TIPS FOR PCA





• NIPALS algorithm assumes the features are zero centered

• It is standard practice to do a Mahalanobis scaling of the data

$$x_i^{scaled} = \frac{x_i - \bar{x}}{\sigma_x}$$

• PCA regression does not consider the response data

• The t’s are called the scores

• Use 3-10 PCA’s

• I usually use 4 PCA’s

• It is common practice to drop 4-sigma outlier features (if there are many features)
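The scaling described in these tips (subtract the feature mean, divide by the feature standard deviation) can be sketched as below. The function name `mahalanobis_scale` is hypothetical, and the use of the sample standard deviation (`ddof=1`) is an assumption; the slides do not say which convention the Analyze code uses.

```python
import numpy as np

def mahalanobis_scale(X, ddof=1):
    """Center each feature (column) and divide by its standard deviation."""
    mean = X.mean(axis=0)
    std = X.std(axis=0, ddof=ddof)   # assumption: sample standard deviation
    return (X - mean) / std, mean, std

# Hypothetical training data: 4 samples, 2 features on different scales.
X_train = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 60.0], [4.0, 30.0]])
X_scaled, mean, std = mahalanobis_scale(X_train)

# New data should be scaled with the training mean and standard deviation.
X_test_scaled = (np.array([[2.5, 25.0]]) - mean) / std
```

Reusing the training statistics for test data keeps both sets in the same coordinate system before the PCA projection.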

PCA with Analyze







• Several options: option #17 for training and #18 for testing (the weight vectors after training are in file bbmatrixx.txt)

• The file num_eg.txt contains a number equal to # PCAs

• Option -17 is the NIPALS algorithm and generally faster than 17

• Analyze has options for calculating T’s, B’s and λ’s

- option #36 transforms a data matrix to its PCAs

- option #36 also saves eigenvalues and eigenvectors of $X^T X$

• Analyze also has an option for bootstrap PCA (-33)

StripMiner Scripts





• last lecture: iris_pca.bat (make PCAs and visualize)

• iris.bat (split up data in training and validation set and predict)

• iris_boot.bat (bootstrap prediction)

REM PCA REGRESSION MODEL FOR IRIS DATA
REM GENERATE IRIS DATA (5)
analyze iris.txt 3301
REM ELIMINATE COMMAS
analyze iris.txt 100
REM MAHALANOBIS SCALE
analyze iris.txt.txt 3
REM GENERATE # PCAs (5)
analyze num_eg.txt 105
REM SPLIT DATA (100 2)
analyze iris.txt.txt.txt 20
copy cmatrix.txt a.pat
copy dmatrix.txt a.tes
REM MAKE PCA REGRESSION MODEL
analyze a.pat 17
analyze a.tes 18
pause
REM VISUALIZE RESULTS
analyze resultss.xxx 4
copy results.ttt results.xxx
analyze resultss.ttt 4
analyze results.ttt 3313
pause

Bootstrap Prediction (iris_boo.bat)





• Make different models for training set

• Predict Test set on average model

REM PCA BOOTSTRAP REGRESSION MODEL FOR IRIS DATA
REM GENERATE IRIS DATA (5)
analyze iris.txt 3301
REM ELIMINATE COMMAS
analyze iris.txt 100
REM MAHALANOBIS SCALE
analyze iris.txt.txt 3
REM GENERATE # PCAs (5)
analyze num_eg.txt 105
REM SPLIT DATA (100 2)
analyze iris.txt.txt.txt 20
copy cmatrix.txt a.pat
copy dmatrix.txt a.tes
REM MAKE PCA BOOTSTRAP REGRESSION MODEL (7 100)
analyze a.pat 33
REM MAKE PREDICTIONS
analyze a.tes 18
pause
REM VISUALIZE RESULTS
analyze resultss.xxx 4
copy results.ttt results.xxx
analyze resultss.ttt 4
analyze results.ttt 3313
pause
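The idea behind bootstrap prediction (make different models on resampled training sets, then predict the test set with the average model) can be sketched in NumPy. How option 33 of Analyze implements this is not documented in the slides, so the resampling scheme, the number of models, and the names `pcr_weights` and `bootstrap_pcr` below are all assumptions.

```python
import numpy as np

def pcr_weights(X, y, h):
    """PCA regression weights on the original features, using h components."""
    eigvals, eigvecs = np.linalg.eigh(X.T @ X)
    B = eigvecs[:, np.argsort(eigvals)[::-1][:h]].T
    T = X @ B.T
    return B.T @ np.linalg.solve(T.T @ T, T.T @ y)

def bootstrap_pcr(X_train, y_train, h, n_models=100, seed=0):
    """Average the PCR weight vectors fitted on bootstrap resamples."""
    rng = np.random.default_rng(seed)
    n = X_train.shape[0]
    weights = []
    for _ in range(n_models):
        idx = rng.integers(0, n, size=n)   # draw n rows with replacement
        weights.append(pcr_weights(X_train[idx], y_train[idx], h))
    return np.mean(weights, axis=0)

rng = np.random.default_rng(6)
X = rng.normal(size=(150, 4))
y = X @ np.array([1.0, 0.0, -1.0, 2.0]) + 0.1 * rng.normal(size=150)

w_avg = bootstrap_pcr(X[:100], y[:100], h=4)
y_hat = X[100:] @ w_avg                    # predict the held-out rows
```

Because the model is linear, averaging the weight vectors gives the same predictions as averaging the predictions of the individual models.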

Neural Network Interpretation of PCA

PCA in DATA SPACE

[Figure: network diagram with inputs $x_1 \dots x_i \dots x_M$, two hidden layers of summation (Σ) nodes, and output $\hat{y}_i$]

• First layer: weights $B^T_{MH}$ correspond to the H eigenvectors corresponding to the largest eigenvalues of $X^T X$

• Second layer: weights $T^T_{HN}$ correspond to the scores or PCAs for the entire training set; this layer gives a similarity score with each datapoint

• Output: weights correspond to the dependent variable $\vec{y}_N$ for the entire training data; kind of a nearest neighbor weighted prediction score

• $\left(T^T_{HN} T_{NH}\right)^{-1}$ means that the similarity score with each data point will be weighed (i.e., effectively incorporating Mahalanobis scaling in data space)

$$\vec{b}_M = B^T_{MH}\left(T^T_{HN} T_{NH}\right)^{-1} T^T_{HN}\,\vec{y}_N$$

$$\hat{y}_i = \vec{x}^{\,T}_i B^T_{MH}\left(T^T_{HN} T_{NH}\right)^{-1} T^T_{HN}\,\vec{y}_N$$
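The layer-by-layer reading of the prediction formula can be traced numerically. This sketch evaluates the network one layer at a time on random data and compares the result with the single weight vector applied to the input; the variable names (`scores`, `similarity`) and the data are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(7)
X = rng.normal(size=(30, 4))     # N = 30 training points, M = 4 features
y = rng.normal(size=30)
h = 2

eigvals, eigvecs = np.linalg.eigh(X.T @ X)
B = eigvecs[:, np.argsort(eigvals)[::-1][:h]].T   # (H, M) eigenvectors as rows
T = X @ B.T                                       # (N, H) training scores

x_new = rng.normal(size=4)

# Layer 1: project the input onto the H eigenvectors.
scores = B @ x_new
# Layer 2: compare with the scores of every training point, weighted by
# (T^T T)^{-1}; this yields one similarity value per training point.
similarity = T @ np.linalg.solve(T.T @ T, scores)
# Output: similarity-weighted sum of the training responses.
y_hat = similarity @ y

# The same prediction from the single weight vector on the original features.
w_m = B.T @ np.linalg.solve(T.T @ T, T.T @ y)
```

Both routes should give the same number, which is the sense in which the PCA regression model can be read as a similarity-weighted prediction over the training data.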


