Factor Analysis
Statistical Learning Theory
Fall 2005
Outline
General Motivation
Definition/Derivation
The Graphical Model
Implications/Interpretations
Maximum Likelihood Estimation
Motivation
Application (using the EM Algorithm)
Motivation
Know: Discrete Mixture Models (ch.10)
Application: HMM
Want: Continuous Mixture Models
Application: ??
Definition: Factor Analysis
We consider here density estimation, but Factor Analysis can
be extended to regression and classification problems.
Consider a “high-d” data vector V in R^n such that V lies
“near” a lower-dimensional manifold M. Then the
factor analysis model is a product of the following
assumptions:
1. A point in M is generated according to a PDF.
2. V is then generated conditionally according to another (simple)
PDF, centered on that point in M.
3. M is a linear subspace of R^n.
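The three assumptions above can be sketched as a sampling procedure. This is a minimal illustration, not the course's code: the names `mu`, `Lam`, and `psi_diag`, the standard-normal latent coordinates, and the diagonal Gaussian noise are all assumptions taken from the usual linear-Gaussian form of the model.

```python
import random

def sample_factor_model(mu, Lam, psi_diag, rng):
    """Draw one (x, y) pair from a linear-Gaussian factor model.

    mu:       length-q vector, a point that the subspace M passes through
    Lam:      q x p matrix (list of rows); its columns span M (assumed name)
    psi_diag: length-q noise variances (diagonal noise is an assumption)
    """
    q, p = len(Lam), len(Lam[0])
    # Assumption 1: a point in M is generated from a PDF, here via
    # latent coordinates x ~ N(0, I).
    x = [rng.gauss(0.0, 1.0) for _ in range(p)]
    # Assumption 3: M is linear, so the point is mu + Lam x.
    on_subspace = [mu[i] + sum(Lam[i][j] * x[j] for j in range(p))
                   for i in range(q)]
    # Assumption 2: the observation is drawn from a simple PDF centered
    # on that point.
    y = [on_subspace[i] + rng.gauss(0.0, psi_diag[i] ** 0.5)
         for i in range(q)]
    return x, y

rng = random.Random(0)
mu = [1.0, -2.0, 0.5]
Lam = [[2.0], [1.0], [-1.0]]          # observed dimension 3, latent dimension 1
psi_diag = [0.01, 0.01, 0.01]
samples = [sample_factor_model(mu, Lam, psi_diag, rng) for _ in range(5000)]
ys = [y for _, y in samples]
sample_mean = [sum(y[i] for y in ys) / len(ys) for i in range(3)]
print(sample_mean)                    # should be close to mu
```

With small noise variances the sampled points cluster tightly around the line through `mu` in the direction of the single column of `Lam`.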
Another Definition…
Factor analysis is a statistical technique that originated in
psychometrics. It is used in the social sciences and in
marketing, product management, operations research, and
other applied sciences that deal with large quantities of data.
The objective is to explain most of the variability among
a number of observable random variables in terms of a
smaller number of unobservable random variables called
factors. The observable random variables are modeled as
linear combinations of the factors, plus "error" terms.
~[Wikipedia]
The Graphical Model
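[Figure: directed graphical model with a latent node X of dimension p and an observed node Y of dimension q, with an edge from X to Y.]
NOTE: p < q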
Derivation
We assume:
Now we need:
and
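As a sketch, the standard factor analysis assumptions can be written as follows. The symbols are assumptions, since the surrounding text does not name them: \Lambda is a q x p loading matrix, \mu a mean vector in R^q, \Psi a diagonal q x q noise covariance, and W the noise term.

```latex
\begin{aligned}
X &\sim \mathcal{N}(0, I_p), \\
Y \mid X = x &\sim \mathcal{N}(\mu + \Lambda x,\ \Psi), \\
\text{equivalently}\quad Y &= \mu + \Lambda X + W,
\qquad W \sim \mathcal{N}(0, \Psi),\quad W \text{ independent of } X .
\end{aligned}
```

Under these assumptions, what remains to be found in order to describe Y unconditionally is its mean and covariance, E(Y) and Var(Y).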
Derivation cont’d…
Identities:
These imply:
Derivation cont’d…
Let
Then
Result #1: The Joint Distribution
So now we can say that the joint is a Gaussian
distribution with:
So that
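For reference, here is a sketch of the joint distribution in the standard linear-Gaussian formulation. It assumes the notation Y = \mu + \Lambda X + W with X ~ N(0, I_p) and W ~ N(0, \Psi) independent of X; these symbols are not named in the surrounding text.

```latex
\begin{aligned}
\mathrm{E}(Y) &= \mu + \Lambda\,\mathrm{E}(X) + \mathrm{E}(W) = \mu, \\
\mathrm{Var}(Y) &= \Lambda\,\mathrm{Var}(X)\,\Lambda^{T} + \mathrm{Var}(W)
               = \Lambda\Lambda^{T} + \Psi, \\
\mathrm{Cov}(X, Y) &= \mathrm{E}\!\left[X(\Lambda X + W)^{T}\right] = \Lambda^{T}, \\[4pt]
\begin{pmatrix} X \\ Y \end{pmatrix}
&\sim \mathcal{N}\!\left(
\begin{pmatrix} 0 \\ \mu \end{pmatrix},\
\begin{pmatrix} I_p & \Lambda^{T} \\ \Lambda & \Lambda\Lambda^{T} + \Psi \end{pmatrix}
\right).
\end{aligned}
```

In particular, the marginal of Y is Gaussian with mean \mu and covariance \Lambda\Lambda^{T} + \Psi.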
Calculating the Conditional…
The results of chapter 13’s discussion of the marginalization
and conditioning of the multivariate Gaussian yield:
(see equations 13.26 and 13.27 in [Jordan])
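As a sketch of this step, the general Gaussian conditioning formulas can be applied to a joint with blocks \Sigma_{XX} = I_p, \Sigma_{XY} = \Lambda^{T}, \Sigma_{YY} = \Lambda\Lambda^{T} + \Psi and means (0, \mu). The symbols \Lambda, \Psi, and \mu are assumed notation for the loading matrix, noise covariance, and mean of Y.

```latex
\begin{aligned}
\mathrm{E}(X \mid y) &= \mu_X + \Sigma_{XY}\Sigma_{YY}^{-1}(y - \mu_Y)
  = \Lambda^{T}\left(\Lambda\Lambda^{T} + \Psi\right)^{-1}(y - \mu), \\
\mathrm{Var}(X \mid y) &= \Sigma_{XX} - \Sigma_{XY}\Sigma_{YY}^{-1}\Sigma_{YX}
  = I_p - \Lambda^{T}\left(\Lambda\Lambda^{T} + \Psi\right)^{-1}\Lambda .
\end{aligned}
```

Note that the conditional covariance does not depend on the observed value y.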
Implementation Issues
The derived expressions require the inversion of a
q×q matrix.
Jordan claims that the following forms are
equivalent:
Note that these only require the inversion of a p×p
matrix! (recall that p < q)
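The equivalence can be checked numerically. The sketch below assumes the two forms in question are the standard ones: the q×q form E(X|y) = Λ^T (ΛΛ^T + Ψ)^-1 (y − µ), Var(X|y) = I − Λ^T (ΛΛ^T + Ψ)^-1 Λ, and the p×p form Var(X|y) = (I + Λ^T Ψ^-1 Λ)^-1, E(X|y) = Var(X|y) Λ^T Ψ^-1 (y − µ). The matrices `Lam` and `psi` are made-up example values, Ψ is assumed diagonal, and the small helper functions are written out only to keep the example free of dependencies.

```python
def transpose(A):
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    cols = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def sub(A, B):
    return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def inverse(A):
    """Gauss-Jordan inverse with partial pivoting (adequate for small matrices)."""
    n = len(A)
    M = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(A)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        d = M[c][c]
        M[c] = [v / d for v in M[c]]
        for r in range(n):
            if r != c:
                f = M[r][c]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[c])]
    return [row[n:] for row in M]

def max_abs_diff(A, B):
    return max(abs(a - b) for ra, rb in zip(A, B) for a, b in zip(ra, rb))

# Example values (hypothetical): q = 4 observed dimensions, p = 2 factors.
Lam = [[1.0, 0.0], [0.5, 1.0], [-1.0, 2.0], [0.0, 0.5]]
psi = [0.5, 1.0, 2.0, 0.25]            # diagonal of Psi
q, p = len(Lam), len(Lam[0])
Lt = transpose(Lam)
Psi = [[psi[i] if i == j else 0.0 for j in range(q)] for i in range(q)]

# Form 1: inverts the q x q matrix (Lam Lam^T + Psi).
gain_q = matmul(Lt, inverse(add(matmul(Lam, Lt), Psi)))   # multiplies (y - mu)
var_q = sub(identity(p), matmul(gain_q, Lam))

# Form 2: inverts only the p x p matrix (I + Lam^T Psi^-1 Lam);
# Psi^-1 is trivial because Psi is diagonal.
Lt_Psi_inv = [[Lt[i][j] / psi[j] for j in range(q)] for i in range(p)]
var_p = inverse(add(identity(p), matmul(Lt_Psi_inv, Lam)))
gain_p = matmul(var_p, Lt_Psi_inv)

gain_gap = max_abs_diff(gain_q, gain_p)
var_gap = max_abs_diff(var_q, var_p)
print(gain_gap, var_gap)               # both should be at rounding-error level
```

Since Ψ is diagonal, the only non-trivial inverse in the second form is p×p, which is the source of the savings when p is much smaller than q.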
Interpretations…
Our discussion of factor analysis so far can
be seen as a discussion of an update
process.
Before the data Y is observed, X has a Gaussian
distribution centered at the origin of the
lower-dimensional subspace M.
Observing Y=y, in a sense, updates the
distribution of X as given by our derivation of
E(X|y) and Var(X|y).
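The update can be illustrated for a single factor (p = 1), where every quantity is a scalar. This is a sketch under the standard model with diagonal noise covariance; the function name `posterior_p1` and the numbers are hypothetical.

```python
def posterior_p1(y, mu, lam, psi):
    """E(X|y) and Var(X|y) for one latent factor with diagonal noise.

    y, mu: observed vector and its mean (length q)
    lam:   the single column of the loading matrix (length q)
    psi:   noise variances (length q)
    """
    # 1 + Lam^T Psi^-1 Lam, a scalar when p = 1
    precision = 1.0 + sum(l * l / s for l, s in zip(lam, psi))
    var = 1.0 / precision
    mean = var * sum(l * (yi - mi) / s
                     for l, yi, mi, s in zip(lam, y, mu, psi))
    return mean, var

mu = [0.0, 0.0, 0.0]
lam = [1.0, 1.0, 1.0]
psi = [1.0, 1.0, 1.0]

prior_mean, prior_var = 0.0, 1.0            # before observing Y: X ~ N(0, 1)
post_mean, post_var = posterior_p1([2.0, 2.0, 2.0], mu, lam, psi)
print(post_mean, post_var)                  # 1.5 0.25
```

The observation pulls the mean of X away from the origin, from 0 to 1.5, and shrinks its variance from 1 to 0.25. The variance reduction is the same for any y.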
Geometric Interpretation
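[Figure: an observation Y=y plotted in a three-dimensional space with axes y1, y2, y3 (labeled "Rp=3" in the original), together with the subspace M through µ.]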
(see Ch. 14 p.7)