Principal Component Analysis and Multidimensional Scaling
Kamal Nasrollahi, Assistant Professor
Computer Vision and Media Technology Lab
Aalborg University
kn@create.aau.dk
But first a review...
http://www.flickr.com/photos/phopix/2402472389/
But first a review...
http://www.flickr.com/photos/7568142@N06/4837454543/
But first a review...
What a computer sees!
Do you see a bee?
But first a review...
How do we know that it is a bee?
Using a sensor (eye in this case)
We gather some data of the bee (pattern)
We extract some features from the
gathered data
The extracted features are represented in a
way that some classifier can use them to
finally recognize the pattern.
But first a review...
What exactly are features?
Colour, texture, shape, etc.
Animal with 4 legs.
Bee.
Flying bee.
These vary a lot!
Some imply some sort of ‘recognition’
e.g. How do I know the bee is flying?
But first a review...
Broad types of features
Low-level
Colour, texture
Middle-level
Object with head and two wings
Bee
High-level
Bee is flying.
Bee is sitting.
But first a review...
Low-level features
Objective
Directly reflect specific image and video
features
Colour
Texture
Shape
Etc
But first a review...
Middle-level features
Some degree of subjectivity
They are typically one solution of a problem
with multiple solutions.
Segmentation
Optical Flow
Identification
Etc.
But first a review...
High-level features
Semantic Interpretation
The bee is flying!
The bee is sitting on a flower!
But first a review...
The semantic gap
From the last reference
But first a review...
Features & Decisions
From the last reference
But first a review...
Classifiers
How do I map my M inputs to my N outputs?
Mathematical tools:
Distance-based classifiers
Rule-based classifiers
Neural Networks
Support Vector Machines
From the last reference
But first a review...
Classifiers
Statistical pattern recognition
based on statistical characterizations of patterns,
assuming that the patterns are generated by a
probabilistic system.
Syntactical (or structural) pattern recognition
based on the structural interrelationships of
features.
From the last reference
But first a review...
Knowledge representation
A representation is a set of syntactic and semantic conventions
that make it possible to describe something.
Syntax
The syntax of a representation specifies the
symbols that can be used, and the way these
symbols can be combined into words.
Semantics
The semantics of a representation specify the
coding of its meaning, as well as the way words
can be combined into meaningful phrases.
From the last reference
But first a review...
Knowledge representation
How do I mathematically represent
knowledge?
Various factors:
• Features
• Grammar and Languages
• Rules
– Fuzzy Logic
• Semantic networks
• etc
From the last reference
But first a review...
Features
Features are not a pure representation
They are fundamental blocks of more
complex representations.
Typically:
Numeric representation of something.
From the last reference
Features
Good features:
• Objects from the same class have similar feature values.
• Objects from different classes have different values.
[Figure: “good” features vs. “bad” features]
Principal Component Analysis
Face Recognition
Problems:
The dimensionality of each face vector will be very large
(262,144 for a 512×512 image!)
Raw gray levels are sensitive to noise and lighting
conditions.
Solution:
Reduce the dimensionality of the face space by finding principal
components (eigenvectors) that span the face space
Only the few most significant eigenvectors are needed to
represent a face, thus reducing the dimensionality
Eigenvectors and Eigenvalues
An eigenvector $x$ of a matrix $A$ is a special vector with
the following property:
$A x = \lambda x$, where $\lambda$ is called the eigenvalue.
To find the eigenvalues of a matrix $A$, first find the roots of:
$\det(A - \lambda I) = 0$
Then solve the following linear system for each eigenvalue
to find the corresponding eigenvector:
$(A - \lambda I)\, x = 0$
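Eigenvectors and Eigenvalues
$A = \begin{bmatrix} -1 & 2 & 0 \\ 0 & 3 & 4 \\ 0 & 0 & 7 \end{bmatrix}$
Eigenvalues: $\lambda_1 = 7,\ \lambda_2 = 3,\ \lambda_3 = -1$
Eigenvectors: $x_1 = \begin{bmatrix} 1 \\ 4 \\ 4 \end{bmatrix},\ x_2 = \begin{bmatrix} 1 \\ 2 \\ 0 \end{bmatrix},\ x_3 = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}$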
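The listed pairs can be checked numerically. This is a minimal sketch, assuming NumPy and assuming the matrix entries and signs implied by the derivation on the following slides (A has -1 in its top-left corner, and the third eigenvalue is -1).

```python
import numpy as np

# Matrix from the worked example (signs as implied by the derivation).
A = np.array([[-1.0, 2.0, 0.0],
              [0.0, 3.0, 4.0],
              [0.0, 0.0, 7.0]])

# Eigenvalue / eigenvector pairs as listed on the slide.
pairs = [
    (7.0, np.array([1.0, 4.0, 4.0])),
    (3.0, np.array([1.0, 2.0, 0.0])),
    (-1.0, np.array([1.0, 0.0, 0.0])),
]

# Each pair must satisfy A x = lambda x, so every residual should be ~0.
residuals = [np.linalg.norm(A @ x - lam * x) for lam, x in pairs]

# Cross-check against NumPy's general eigensolver. The solver does not
# guarantee any particular order, so the values are sorted here.
numeric_eigenvalues = np.sort(np.linalg.eigvals(A).real)
```

Eigenvectors are only defined up to a scale factor, so a solver may return normalized versions of the same directions.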
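Eigenvectors and Eigenvalues
$\det(A - \lambda I) = 0$
$\det\left( \begin{bmatrix} -1 & 2 & 0 \\ 0 & 3 & 4 \\ 0 & 0 & 7 \end{bmatrix} - \lambda \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \right) = 0$
$\det \begin{bmatrix} -1-\lambda & 2 & 0 \\ 0 & 3-\lambda & 4 \\ 0 & 0 & 7-\lambda \end{bmatrix} = 0$
$(-1-\lambda)\big((3-\lambda)(7-\lambda) - 0\big) = 0$
$(-1-\lambda)(3-\lambda)(7-\lambda) = 0$
$\lambda = -1,\ \lambda = 3,\ \lambda = 7$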
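The same roots can be obtained from the characteristic polynomial in code. A small sketch, assuming NumPy and the matrix A implied by this derivation; note that `np.poly` works with det(λI - A), which differs from det(A - λI) only by a sign for a 3×3 matrix and therefore has the same roots.

```python
import numpy as np

A = np.array([[-1.0, 2.0, 0.0],
              [0.0, 3.0, 4.0],
              [0.0, 0.0, 7.0]])

# For a square matrix, np.poly returns the coefficients of the
# characteristic polynomial det(lambda*I - A), highest power first.
char_poly = np.poly(A)

# The eigenvalues are the roots of that polynomial.
eigenvalues = np.sort(np.roots(char_poly).real)
```

Expanding (λ + 1)(λ - 3)(λ - 7) by hand gives λ³ - 9λ² + 11λ + 21, which is what the coefficients should match.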
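Eigenvectors and Eigenvalues
For $\lambda = -1$: $(A - \lambda I)\, x = 0$
$\left( \begin{bmatrix} -1 & 2 & 0 \\ 0 & 3 & 4 \\ 0 & 0 & 7 \end{bmatrix} - (-1) \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \right) \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}$
$\begin{bmatrix} 0 & 2 & 0 \\ 0 & 4 & 4 \\ 0 & 0 & 8 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix} \;\Rightarrow\; \begin{cases} 0 + 2x_2 + 0 = 0 \\ 0 + 4x_2 + 4x_3 = 0 \\ 0 + 0 + 8x_3 = 0 \end{cases}$
$x_1 = 1,\ x_2 = 0,\ x_3 = 0$ (the component $x_1$ is free; 1 is chosen)
The eigenvector for $\lambda = -1$ is $x = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}$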
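Solving the homogeneous system for one eigenvalue amounts to finding the null space of A - λI. The sketch below does this with an SVD, which is one possible numerical route and not the hand method shown on the slide; NumPy and the matrix A from the worked example are assumed.

```python
import numpy as np

A = np.array([[-1.0, 2.0, 0.0],
              [0.0, 3.0, 4.0],
              [0.0, 0.0, 7.0]])
lam = -1.0

# (A - lam*I) x = 0 means x lies in the null space of B = A - lam*I.
B = A - lam * np.eye(3)

# np.linalg.svd returns singular values in descending order, so the right
# singular vector for the (near-)zero singular value is the last row of Vt.
_, singular_values, Vt = np.linalg.svd(B)
x = Vt[-1]

# Eigenvectors are defined only up to scale; rescale so the first
# component equals 1, matching the choice made by hand.
x = x / x[0]
```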
Principal Component Analysis
(PCA)
− Problems arise when performing recognition in a high-dimensional
space (curse of dimensionality).
− Significant improvements can be achieved by first mapping the data
into a lower-dimensional sub-space.
− The goal of PCA is to reduce the dimensionality of the data while
retaining as much as possible of the variation present in the original
dataset.
Principal Component Analysis
(PCA)
− PCA allows us to compute a linear transformation that maps data
from a high dimensional space to a lower dimensional sub-space.
Principal Component Analysis
(PCA)
− Approximate vectors by finding a basis in an appropriate lower
dimensional space.
(1) Higher-dimensional space representation:
(2) Lower-dimensional space representation:
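The two formulas did not survive on this slide. In commonly used notation they would read as follows; the symbols are assumptions, with $v_1, \dots, v_N$ a basis of the original N-dimensional space and $u_1, \dots, u_K$ a basis of the K-dimensional sub-space.

```latex
% (1) Higher-dimensional space representation
x = a_1 v_1 + a_2 v_2 + \cdots + a_N v_N

% (2) Lower-dimensional space representation
\hat{x} = b_1 u_1 + b_2 u_2 + \cdots + b_K u_K, \qquad K < N
```

If K = N, the approximation $\hat{x}$ equals $x$.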
Principal Component Analysis
(PCA)
− Dimensionality reduction implies information loss!
− We want to preserve as much information as possible, that is, to minimize
the error between each original vector and its lower-dimensional approximation.
• How do we determine the best lower-dimensional sub-space?
Principal Component Analysis
(PCA)
− Suppose x1, x2, ..., xM are N x 1 vectors
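The individual steps of the method were not preserved on these slides. The sketch below follows the standard PCA procedure under stated assumptions: samples are stored as rows, the covariance is normalized by 1/M, and the function name `pca` and the toy data are hypothetical.

```python
import numpy as np

def pca(X, K):
    """X: (M, N) array, one sample per row.
    Returns the sample mean, the K largest eigenvalues and their eigenvectors."""
    M = X.shape[0]
    mean = X.mean(axis=0)                 # step 1: sample mean
    Phi = X - mean                        # step 2: subtract the mean
    C = (Phi.T @ Phi) / M                 # step 3: N x N covariance matrix
    # step 4: eigen-decomposition; eigh is for symmetric matrices and
    # returns eigenvalues in ascending order
    eigenvalues, eigenvectors = np.linalg.eigh(C)
    order = np.argsort(eigenvalues)[::-1]  # largest eigenvalue first
    eigenvalues = eigenvalues[order]
    eigenvectors = eigenvectors[:, order]
    # step 5: keep only the K leading components
    return mean, eigenvalues[:K], eigenvectors[:, :K]

rng = np.random.default_rng(0)
# Hypothetical toy data: M = 200 samples in N = 3 dimensions,
# stretched mostly along the first axis.
X = rng.normal(size=(200, 3)) * np.array([5.0, 1.0, 0.2])
mean, lambdas, U = pca(X, K=2)
```

The columns of `U` are orthonormal, and the first one should point almost exactly along the first axis because that is where the toy data varies the most.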
Principal Component Analysis
(PCA)
• Linear transformation implied by PCA
− The linear transformation $\mathbb{R}^N \to \mathbb{R}^K$ that performs the
dimensionality reduction is:
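The formula itself is missing from the extracted slide. A commonly used form is given here as an assumption, with $\bar{x}$ the sample mean and $u_1, \dots, u_K$ the eigenvectors of the covariance matrix that belong to the K largest eigenvalues:

```latex
\begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_K \end{bmatrix}
=
\begin{bmatrix} u_1^T \\ u_2^T \\ \vdots \\ u_K^T \end{bmatrix}
(x - \bar{x})
= U^T (x - \bar{x})
```

Here $U$ is the N×K matrix whose columns are $u_1, \dots, u_K$.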
Principal Component Analysis
(PCA)
• Geometric interpretation
− PCA projects the data along the directions where the data varies the
most.
− These directions are determined by the eigenvectors of the
covariance matrix corresponding to the largest eigenvalues.
− The magnitude of the eigenvalues corresponds to the variance of
the data along the eigenvector directions.
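The last point can be verified on toy data: after projecting the centred data onto each eigenvector, the variance of the projection equals the corresponding eigenvalue. A minimal sketch, assuming NumPy, 1/M normalization of the covariance, and hypothetical random data.

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical correlated 2-D data.
X = rng.normal(size=(500, 2)) @ np.array([[3.0, 0.0], [1.0, 0.5]])

Phi = X - X.mean(axis=0)
C = (Phi.T @ Phi) / len(X)            # covariance with 1/M normalization
eigenvalues, eigenvectors = np.linalg.eigh(C)

# Project the centred data onto every eigenvector and measure the
# variance of each projected coordinate (np.var also uses 1/M by default).
projected = Phi @ eigenvectors
variances = projected.var(axis=0)
```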
Principal Component Analysis
(PCA)
• How to choose the principal components?
− To choose K, use the following criterion:
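The criterion was not preserved in the extracted text. A widely used one, assumed here, keeps the smallest K for which the K largest eigenvalues account for at least a chosen fraction of the total variance. The function name `choose_k` and the default threshold of 0.9 are illustrative.

```python
import numpy as np

def choose_k(eigenvalues, threshold=0.9):
    """Smallest K whose leading eigenvalues hold at least
    `threshold` of the total variance."""
    lam = np.sort(np.asarray(eigenvalues, dtype=float))[::-1]
    ratio = np.cumsum(lam) / lam.sum()
    # Index of the first cumulative ratio that reaches the threshold;
    # +1 converts the index into a count of components.
    k = int(np.searchsorted(ratio, threshold)) + 1
    # Guard against rounding pushing K past the number of eigenvalues.
    return min(k, len(lam))

# Hypothetical eigenvalues: cumulative ratios are 0.4, 0.7, 0.9, 1.0.
k_85 = choose_k([4.0, 3.0, 2.0, 1.0], threshold=0.85)
k_50 = choose_k([4.0, 3.0, 2.0, 1.0], threshold=0.5)
```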
Principal Component Analysis
(PCA)
• What is the error due to dimensionality reduction?
− We saw above that an original vector x can be reconstructed using
its principal components:
− It can be shown that the low-dimensional basis based on principal
components minimizes the reconstruction error:
− It can be shown that the error is equal to:
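The formulas on this slide were lost in extraction. The sketch below illustrates the usual statement numerically: with a covariance normalized by 1/M, the mean squared reconstruction error over the data equals the sum of the discarded eigenvalues. The constant in front (some texts use a factor of 1/2) depends on how the error is defined, so treat the exact form as an assumption; the data is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
# Hypothetical data: M = 300 samples, N = 4 dimensions with different spreads.
X = rng.normal(size=(300, 4)) * np.array([4.0, 2.0, 1.0, 0.5])
K = 2

mean = X.mean(axis=0)
Phi = X - mean
C = (Phi.T @ Phi) / len(X)
eigenvalues, eigenvectors = np.linalg.eigh(C)
order = np.argsort(eigenvalues)[::-1]           # largest eigenvalue first
eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]
U = eigenvectors[:, :K]                          # K principal components

# Project onto the K components, then reconstruct: x_hat = mean + U b.
B = Phi @ U
X_hat = mean + B @ U.T

# Mean squared reconstruction error versus the discarded eigenvalues.
mse = np.mean(np.sum((X - X_hat) ** 2, axis=1))
discarded = eigenvalues[K:].sum()
```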
Principal Component Analysis
(PCA)
• Standardization
− The principal components are dependent on the units used to
measure the original variables as well as on the range of values
they assume.
− We should always standardize the data prior to using PCA.
− A common standardization method is to transform all the data to
have zero mean and unit standard deviation:
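The standardization described above is the z-score, (x - μ) / σ, applied to each variable separately. A minimal sketch, assuming NumPy; the function name `standardize` and the example values are hypothetical.

```python
import numpy as np

def standardize(X):
    """Give every column of X zero mean and unit standard deviation."""
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)
    # Guard against constant columns (sigma == 0): they stay at zero
    # after centring instead of causing a division by zero.
    sigma = np.where(sigma == 0, 1.0, sigma)
    return (X - mu) / sigma

# Hypothetical example: two variables measured on very different scales.
X = np.array([[1.70, 65000.0],
              [1.80, 72000.0],
              [1.60, 58000.0],
              [1.75, 80000.0]])
Z = standardize(X)
```

Without this step the second column would dominate the covariance matrix purely because of its units.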
Principal Component Analysis
(PCA)
• PCA and classification
− PCA is not always an optimal dimensionality-reduction procedure for
classification purposes:
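The illustration for this slide is missing. The sketch below builds one such case on hypothetical data: two classes share a large spread along x and differ only by a small shift along y. The first principal component follows x, so projecting onto it keeps the variance but loses the class separation.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 500
# Two hypothetical classes: wide shared spread along x,
# small but clean separation along y.
class_a = rng.normal(loc=[0.0, -0.5], scale=[5.0, 0.1], size=(n, 2))
class_b = rng.normal(loc=[0.0, 0.5], scale=[5.0, 0.1], size=(n, 2))
X = np.vstack([class_a, class_b])

Phi = X - X.mean(axis=0)
eigenvalues, eigenvectors = np.linalg.eigh(Phi.T @ Phi / len(X))
pc1 = eigenvectors[:, -1]   # eigh sorts ascending: last column has the largest variance
pc2 = eigenvectors[:, 0]

def separation(direction):
    """Distance between projected class means, in units of the
    pooled within-class standard deviation."""
    a, b = class_a @ direction, class_b @ direction
    return abs(a.mean() - b.mean()) / np.sqrt(0.5 * (a.var() + b.var()))

sep_pc1 = separation(pc1)   # direction PCA would keep
sep_pc2 = separation(pc2)   # direction PCA would discard
```

In this construction the discarded direction is the one that separates the classes, which is why class-aware methods can be preferable when the goal is classification.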
Face Recognition Using PCA
− The distance $e_r$ is called the distance within the face space (difs).
− Comment: we can use the common Euclidean distance to compute
$e_r$; however, it has been reported that the Mahalanobis distance
performs better:
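The formula was not preserved on the slide. In PCA coordinates the covariance is diagonal with the eigenvalues on the diagonal, so a common form of the Mahalanobis distance divides each squared coefficient difference by the corresponding eigenvalue. The sketch below assumes that form; the function names and the example coefficient vectors are hypothetical.

```python
import numpy as np

def euclidean_difs(omega, omega_k):
    """Plain Euclidean distance between two coefficient vectors."""
    diff = np.asarray(omega, dtype=float) - np.asarray(omega_k, dtype=float)
    return float(np.sqrt(np.sum(diff ** 2)))

def mahalanobis_difs(omega, omega_k, eigenvalues):
    """Mahalanobis distance in PCA coordinates: each squared difference
    is weighted by 1 / lambda_i (assumed form, see lead-in)."""
    diff = np.asarray(omega, dtype=float) - np.asarray(omega_k, dtype=float)
    lam = np.asarray(eigenvalues, dtype=float)
    return float(np.sqrt(np.sum(diff ** 2 / lam)))

# Hypothetical projection coefficients of a probe face and a stored face,
# with the eigenvalues of the two retained components.
omega, omega_k = [2.0, 1.0], [0.0, 0.0]
eigenvalues = [4.0, 1.0]
d_euclid = euclidean_difs(omega, omega_k)
d_mahal = mahalanobis_difs(omega, omega_k, eigenvalues)
```

The weighting makes a difference of 2 along a high-variance component count the same as a difference of 1 along a component with a quarter of that variance.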
Multidimensional Scaling
Multidimensional scaling (MDS)
is a set of related statistical techniques often used in
information visualization for exploring similarities or
dissimilarities in data.
An MDS algorithm starts with a matrix of item–item
similarities, then assigns a location to each item in
N-dimensional space, where N is specified a priori.
For sufficiently small N, the resulting locations may
be displayed in a graph or 3D visualisation.
Multidimensional Scaling
MDS algorithms fall into one of the following
groups, depending on the meaning of the input
matrix:
Classical
Takes an input matrix giving dissimilarities between pairs of items and
outputs a coordinate matrix whose configuration minimizes a loss function
called strain.
Metric
A superset of classical MDS that generalizes the optimization procedure to a
variety of loss functions and input matrices of known distances with weights
and so on. A useful loss function in this context is called stress, which is
often minimized using a procedure called stress majorization.
Non-Metric
In contrast to metric MDS, non-metric MDS both finds a non-parametric
monotonic relationship between the dissimilarities in the item-item matrix
and the Euclidean distance between items, and the location of each item in
the low-dimensional space. The relationship is typically found using isotonic
regression.
Generalized
An extension of metric multidimensional scaling, in which the target space is
an arbitrary smooth non-Euclidean space. When the dissimilarities
are distances on a surface and the target space is another surface, GMDS
finds the minimum-distortion embedding of one surface into
another.
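Of the four groups, classical MDS has a closed-form solution. The sketch below uses the standard double-centering construction, assuming NumPy and an input matrix of Euclidean distances; the function name `classical_mds` and the triangle example are hypothetical.

```python
import numpy as np

def classical_mds(D, p):
    """Classical MDS: D is an (n, n) matrix of pairwise distances,
    p is the number of output dimensions."""
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    # Double-centre the squared distances to obtain a Gram matrix B.
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (D ** 2) @ J
    # Coordinates come from the p largest eigenvalues of B.
    eigenvalues, eigenvectors = np.linalg.eigh(B)
    order = np.argsort(eigenvalues)[::-1][:p]
    # Clip tiny negative eigenvalues caused by rounding.
    lam = np.clip(eigenvalues[order], 0.0, None)
    return eigenvectors[:, order] * np.sqrt(lam)

# Hypothetical example: corners of a 3-4-5 right triangle.
points = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 4.0]])
D = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
Y = classical_mds(D, p=2)
D_rec = np.linalg.norm(Y[:, None, :] - Y[None, :, :], axis=-1)
```

The recovered coordinates may be rotated or reflected relative to the original points, but the pairwise distances are reproduced.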
Multidimensional Scaling
From a non-technical point of view,
the purpose of multidimensional scaling (MDS) is to
provide a visual representation of the pattern of
proximities (i.e., similarities or distances) among a
set of objects.
For example, given a matrix of perceived similarities
between various brands of air fresheners, MDS plots
the brands on a map such that those brands that are
perceived to be very similar to each other are placed
near each other on the map, and those brands that
are perceived to be very different from each other
are placed far away from each other on the map.
Multidimensional Scaling
Exercise!
Multidimensional Scaling
From a slightly more technical point of view,
what MDS does is find a set of vectors in p-dimensional space
such that
the matrix of Euclidean distances among them
corresponds as closely as possible to some function of the input
matrix
according to a criterion function called stress.
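The slide does not give a formula for stress. One common normalization, assumed here, is the square root of the summed squared differences between input and embedded distances, divided by the summed squared embedded distances. The function of the input matrix is taken to be the identity, and the function name `stress` is illustrative.

```python
import numpy as np

def stress(D, X):
    """Normalized stress between input distances D (n x n) and the
    Euclidean distances among the rows of X (n x p)."""
    D = np.asarray(D, dtype=float)
    X = np.asarray(X, dtype=float)
    Dhat = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    upper = np.triu_indices(len(D), k=1)     # count every pair once
    num = np.sum((D[upper] - Dhat[upper]) ** 2)
    den = np.sum(Dhat[upper] ** 2)
    return float(np.sqrt(num / den))

# Hypothetical example: three items that are all at distance 1 from each
# other, placed on the corners of a right triangle.
D = np.array([[0.0, 1.0, 1.0],
              [1.0, 0.0, 1.0],
              [1.0, 1.0, 0.0]])
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
s = stress(D, X)
```

Two of the three embedded distances match the input exactly; only the hypotenuse is off, which gives a small but non-zero stress.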
Multidimensional Scaling
A simplified view of the algorithm is as follows:
1. Assign points to arbitrary coordinates in p-dimensional space.
2. Compute Euclidean distances among all pairs of points, to form
the Dhat matrix.
3. Compare the Dhat matrix with the input D matrix by evaluating
the stress function. The smaller the value, the greater the
correspondence between the two.
4. Adjust the coordinates of each point in the direction that most
reduces stress.
5. Repeat steps 2 through 4 until stress won't get any lower.
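The loop above can be sketched as plain gradient descent. This is one possible realization under stated assumptions, not the specific procedure of any MDS package: stress is taken as the unnormalized sum of squared differences, the step size `lr` is fixed, and all function names are hypothetical.

```python
import numpy as np

def pairwise_distances(X):
    """Euclidean distances among all rows of X (the Dhat matrix)."""
    return np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)

def raw_stress(D, X):
    """Sum of squared differences between D and Dhat over all pairs i < j."""
    return 0.5 * np.sum((pairwise_distances(X) - D) ** 2)

def iterative_mds(D, p=2, max_steps=2000, lr=0.01, tol=1e-10, seed=0):
    D = np.asarray(D, dtype=float)
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(len(D), p))          # step 1: arbitrary coordinates
    history = [raw_stress(D, X)]
    for _ in range(max_steps):
        Dhat = pairwise_distances(X)          # step 2: embedded distances
        safe = np.where(Dhat == 0.0, 1.0, Dhat)   # avoid division by zero
        coeff = (Dhat - D) / safe
        np.fill_diagonal(coeff, 0.0)
        # Gradient of raw stress with respect to each point:
        # 2 * sum_j coeff_ij * (x_i - x_j)
        grad = 2.0 * (coeff.sum(axis=1)[:, None] * X - coeff @ X)
        X_new = X - lr * grad                 # step 4: move downhill
        new_stress = raw_stress(D, X_new)     # step 3: evaluate stress
        if history[-1] - new_stress < tol:    # step 5: stop when no longer improving
            break
        X = X_new
        history.append(new_stress)
    return X, history

# Hypothetical input: distances among the corners of a 3-4-5 triangle.
points = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 4.0]])
D = pairwise_distances(points)
X, history = iterative_mds(D)
```

Because a step is only accepted when it lowers the stress, `history` is decreasing by construction; the final configuration can still be a local minimum, which is why practical implementations often restart from several initial configurations.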
Slides are from:
• Karl Ricanek, Biometrics with Topics in Face Recognition, University of North
Carolina, Wilmington.
• Mubarak Shah, Computer Vision, University of Central Florida, Orlando.
• Joshua I. Cohen, Face Recognition.
• Scarlet Schwiderski-Grosche, State of the Art in Biometrics, Royal Holloway
University of London.
• Yu Su, Shiguang Shan, Xilin Chen, Wen Gao, Patch-based Gabor Fisher
Classifier for Face Recognition, Institute of Computing Technology, CAS, China.
• George Bebis, Biometrics, Face Recognition Using Dimensionality Reduction PCA
and LDA, University of Nevada.
• http://www.analytictech.com/borgatti/mds.htm
• http://www.math.hmc.edu/calculus/tutorials/eigenstuff/eigenstuff.pdf
• Gonzalez & Woods, 3rd Ed, Chapter 12.
• Andrew Moore, Statistical Data Mining Tutorials,
http://www.autonlab.org/tutorials/
• Miguel Tavares Coimbra, VCS 2008 – 12, Pattern Recognition, Portugal.