# Principal Component Analysis and Multidimensional Scaling

Posted 6/19/2012 · English · 58 pages

Principal Component Analysis
and
Multidimensional Scaling

Kamal Nasrollahi, Assistant Professor
Computer Vision and Media Technology Lab
Aalborg University
kn@create.aau.dk
But first a review...

http://www.flickr.com/photos/phopix/2402472389/
But first a review...

http://www.flickr.com/photos/7568142@N06/4837454543/
But first a review...

What a computer sees!
Do you see a bee?
But first a review...

How do we know that it is a bee?
Using a sensor (the eye in this case)
We gather some data about the bee (the pattern)
We extract some features from the gathered data
The extracted features are represented in a way that some
classifier can use them to finally recognize the pattern.
But first a review...

What exactly are features?
Colour, texture, shape, etc.
Animal with 4 legs.
Bee.
Flying bee.
These vary a lot!
Some imply some sort of ‘recognition’
e.g. How do I know the bee is flying?
But first a review...

Low-level
Colour, texture
Middle-level
Object with head and two wings
Bee
High-level
Bee is flying.
Bee is sitting.
But first a review...

Low-level features
Objective
Directly reflect specific image and video
features
Colour
Texture
Shape
Etc
But first a review...

Middle-level features
Some degree of subjectivity
They are typically one solution of a problem
with multiple solutions.
Segmentation
Optical Flow
Identification
Etc.
But first a review...

High-level features
Semantic Interpretation
The bee is flying!
The bee is sitting on a flower!
But first a review...

The semantic gap

From the last reference
But first a review...

Features & Decisions

From the last reference
But first a review...

Classifiers
How do I map my M inputs to my N outputs?
Mathematical tools:
Distance-based classifiers
Rule-based classifiers
Neural Networks
Support Vector Machines

From the last reference
But first a review...

Classifiers
Statistical pattern recognition
based on statistical characterizations of patterns,
assuming that the patterns are generated by a
probabilistic system.
Syntactical (or structural) pattern recognition
based on the structural interrelationships of
features.

From the last reference
But first a review...

Knowledge representation
A representation is a set of syntactic and semantic conventions
that make it possible to describe something.
Syntax
The syntax of a representation specifies the
symbols that can be used, and the way these
symbols can be combined into words.
Semantics
The semantics of a representation specify the
coding of its meaning, as well as the way words
can be combined into meaningful phrases.
From the last reference
But first a review...

Knowledge representation
How do I mathematically represent
knowledge?
Various factors:
• Features
• Grammar and Languages
• Rules
– Fuzzy Logic
• Semantic networks
• etc

From the last reference
But first a review...

Features
Features are not a pure representation.
They are the fundamental building blocks of more
complex representations.
Typically:
Numeric representation of something.

From the last reference
Features
Good features:
• Objects from the same class have similar feature values.
• Objects from different classes have different values.

Principal Component Analysis

Face Recognition
Problems:
Dimensionality of each face vector will be very large
(262,144 for a 512×512 image!)
Raw gray levels are sensitive to noise, and lighting
conditions.
Solution:
Reduce dimensionality of face space by finding principal
components (eigenvectors) to span the face space
Only a few of the most significant eigenvectors are needed to
represent a face, thus reducing the dimensionality
Eigenvectors and Eigenvalues

The eigenvector x of a matrix A is a special vector with
the following property:

    A x = λ x        (λ is called the eigenvalue)

To find the eigenvalues of a matrix A, first find the roots of:

    det(A − λI) = 0

Then, for each eigenvalue λ, solve the following linear system
to find the corresponding eigenvector:

    (A − λI) x = 0
Eigenvectors and Eigenvalues

        | 1  2  0 |
    A = | 0  3  4 |
        | 0  0  7 |

    Eigenvalues:   λ1 = 7,   λ2 = 3,   λ3 = 1

    Eigenvectors:  x1 = (1, 3, 3)ᵀ,   x2 = (1, 1, 0)ᵀ,   x3 = (1, 0, 0)ᵀ
Eigenvectors and Eigenvalues

    det(A − λI) = 0

         | 1  2  0 |       | 1  0  0 |
    det( | 0  3  4 |  − λ  | 0  1  0 | ) = 0
         | 0  0  7 |       | 0  0  1 |

         | 1−λ    2     0  |
    det( |  0    3−λ    4  | ) = 0
         |  0     0    7−λ |

    (1 − λ)((3 − λ)(7 − λ) − 0) = 0
    (1 − λ)(3 − λ)(7 − λ) = 0
    λ = 1,   λ = 3,   λ = 7
Eigenvectors and Eigenvalues

    λ = 1:        (A − λI) x = 0

       | 1  2  0 |     | 1  0  0 |     | x1 |     | 0 |
    (  | 0  3  4 |  −  | 0  1  0 |  )  | x2 |  =  | 0 |
       | 0  0  7 |     | 0  0  1 |     | x3 |     | 0 |

    | 0  2  0 |  | x1 |     | 0 |            2·x2        = 0
    | 0  2  4 |  | x2 |  =  | 0 |            2·x2 + 4·x3 = 0
    | 0  0  6 |  | x3 |     | 0 |                   6·x3 = 0

    ⇒  x2 = 0,  x3 = 0,  x1 is free (take x1 = 1)

           | 1 |
    x1  =  | 0 |
           | 0 |
 
Principal Component Analysis
(PCA)
− Problems arise when performing recognition in a high-dimensional
space (curse of dimensionality).
− Significant improvements can be achieved by first mapping the data
into a lower-dimensional sub-space.

− The goal of PCA is to reduce the dimensionality of the data while
retaining as much as possible of the variation present in the original
dataset.
Principal Component Analysis
(PCA)
− PCA allows us to compute a linear transformation that maps data
from a high dimensional space to a lower dimensional sub-space.
Principal Component Analysis
(PCA)
− Approximate vectors by finding a basis in an appropriate lower
dimensional space.
(1) Higher-dimensional space representation:

    x = a1 v1 + a2 v2 + ... + aN vN        (v1, ..., vN : a basis of the N-dimensional space)

(2) Lower-dimensional space representation:

    x̂ = b1 u1 + b2 u2 + ... + bK uK        (u1, ..., uK : a basis of the K-dimensional sub-space, K < N)
Principal Component Analysis
(PCA)
− Dimensionality reduction implies information loss !!
− Want to preserve as much information as possible, that is:

• How to determine the best lower dimensional sub-space?
Principal Component Analysis
(PCA)
− Suppose x1, x2, ..., xM are N × 1 vectors
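The derivation that followed on these slides is the standard PCA recipe (mean, centring, covariance, eigen-decomposition, projection); a minimal numpy sketch of those steps on toy data, with variable names of my own:

```python
import numpy as np

# Toy data standing in for the M sample vectors.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))            # M = 100 samples, N = 5 dimensions

# Step 1: mean vector, then centre the data.
mean = X.mean(axis=0)
Xc = X - mean

# Step 2: covariance matrix of the centred samples.
C = np.cov(Xc, rowvar=False)

# Step 3: eigen-decomposition (eigh, since C is symmetric).
eigenvalues, eigenvectors = np.linalg.eigh(C)

# Step 4: sort by decreasing eigenvalue, keep the top K eigenvectors.
order = np.argsort(eigenvalues)[::-1]
eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]
K = 2
U = eigenvectors[:, :K]

# Step 5: project each centred sample: y = U^T (x - mean).
Y = Xc @ U
print(Y.shape)        # prints (100, 2)
```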
Principal Component Analysis
(PCA)
• Linear transformation implied by PCA
− The linear transformation R^N → R^K that performs the
dimensionality reduction is:

    y = Uᵀ (x − x̄)        (the columns of U are the top K eigenvectors)
Principal Component Analysis
(PCA)
• Geometric interpretation
− PCA projects the data along the directions where the data varies the
most.
− These directions are determined by the eigenvectors of the
covariance matrix corresponding to the largest eigenvalues.
− The magnitude of the eigenvalues corresponds to the variance of
the data along the eigenvector directions.
Principal Component Analysis
(PCA)
• How to choose the principal components?

− To choose K, use the following criterion:

    (λ1 + λ2 + ... + λK) / (λ1 + λ2 + ... + λN)  >  threshold   (e.g. 0.9)
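Assuming the usual criterion (keep the smallest K whose cumulative share of the total eigenvalue sum exceeds a chosen threshold, e.g. 0.9), a sketch with a made-up spectrum:

```python
import numpy as np

# Hypothetical eigenvalue spectrum, already sorted in decreasing order.
eigenvalues = np.array([4.0, 2.5, 1.0, 0.3, 0.2])

threshold = 0.9                                       # e.g. keep 90% of the variance
ratio = np.cumsum(eigenvalues) / eigenvalues.sum()    # cumulative variance ratio
K = int(np.searchsorted(ratio, threshold) + 1)        # smallest K with ratio >= threshold
print(K)        # prints 3
```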
Principal Component Analysis
(PCA)
• What is the error due to dimensionality reduction?
− We saw above that an original vector x can be reconstructed using
its principal components.
− It can be shown that the low-dimensional basis based on principal
components minimizes the reconstruction error ||x − x̂||.
− It can be shown that this error equals the sum of the eigenvalues of
the discarded components, λK+1 + λK+2 + ... + λN, so discarding only
the smallest-eigenvalue directions keeps the error small.
Principal Component Analysis
(PCA)
• Standardization
− The principal components are dependent on the units used to
measure the original variables as well as on the range of values
they assume.
− We should always standardize the data prior to using PCA.
− A common standardization method is to transform all the data to
have zero mean and unit standard deviation:

    x' = (x − μ) / σ
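A sketch of that standardization in numpy, on toy data:

```python
import numpy as np

# Toy data with non-zero mean and non-unit spread (values are arbitrary).
rng = np.random.default_rng(1)
X = rng.normal(loc=10.0, scale=5.0, size=(200, 3))

# Standardize every variable to zero mean and unit standard deviation.
Z = (X - X.mean(axis=0)) / X.std(axis=0)

assert np.allclose(Z.mean(axis=0), 0.0)
assert np.allclose(Z.std(axis=0), 1.0)
```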
Principal Component Analysis
(PCA)
• PCA and classification
− PCA is not always an optimal dimensionality-reduction procedure for
classification purposes:
Face Recognition Using PCA

− The distance er is called the distance within the face space (difs).
− Comment: we can use the common Euclidean distance to compute
er; however, it has been reported that the Mahalanobis distance
performs better:
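A sketch of that weighted distance, under the common convention that each squared coefficient difference is divided by the corresponding eigenvalue (the function name is my own):

```python
import numpy as np

def mahalanobis_difs(y1, y2, eigenvalues):
    """Distance between two vectors of PCA coefficients, with each squared
    difference weighted by 1/lambda_i (the variance of that component)."""
    d = np.asarray(y1) - np.asarray(y2)
    return float(np.sqrt(np.sum(d ** 2 / np.asarray(eigenvalues))))

# With unit eigenvalues this reduces to the plain Euclidean distance:
print(mahalanobis_difs([1.0, 2.0], [0.0, 0.0], [1.0, 1.0]))   # prints 2.23606797749979
```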
Multidimensional Scaling

Multidimensional scaling (MDS)
is a set of related statistical techniques often used in
information visualization for exploring similarities or
dissimilarities in data.
An MDS algorithm starts with a matrix of item–item similarities
then assigns a location to each item in N-dimensional
space, where N is specified a priori.
For sufficiently small N, the resulting locations may
be displayed in a graph or 3D visualisation.
Multidimensional Scaling

MDS algorithms fall into one of the following
groups, depending on the meaning of the input
matrix:
Classical
Takes an input matrix giving dissimilarities between pairs of items and
outputs a coordinate matrix whose configuration minimizes a loss function
called strain.
Metric
A superset of classical MDS that generalizes the optimization procedure to a
variety of loss functions and to input matrices of known distances with
weights and so on. A useful loss function in this context is called stress,
which is often minimized using a procedure called stress majorization
(e.g. the SMACOF algorithm).
Non-Metric
Generalized
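Classical MDS, as described above, can be sketched in a few lines of numpy via the standard double-centring construction (the function name is my own; production code would use a library implementation):

```python
import numpy as np

def classical_mds(D, p=2):
    """Classical (Torgerson) MDS: recover p-dimensional coordinates from a
    matrix of pairwise Euclidean distances via double centring."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n        # centring matrix
    B = -0.5 * J @ (D ** 2) @ J                # double-centred squared distances
    w, V = np.linalg.eigh(B)                   # eigenvalues in ascending order
    idx = np.argsort(w)[::-1][:p]              # keep the p largest
    return V[:, idx] * np.sqrt(np.maximum(w[idx], 0.0))

# Toy check: distances computed from known 2-D points are reproduced.
pts = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 1.0]])
D = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
X = classical_mds(D, p=2)
D_rec = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
assert np.allclose(D, D_rec)     # configuration recovered up to rotation/reflection
```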
Multidimensional Scaling

MDS algorithms fall into one of the following
groups, depending on the meaning of the input
matrix:
Classical
Metric
Non-Metric
In contrast to metric MDS, non-metric MDS finds both a non-parametric
monotonic relationship between the dissimilarities in the item-item matrix
and the Euclidean distances between items, and the location of each item
in the low-dimensional space. The relationship is typically found using
isotonic regression.
Generalized
Multidimensional Scaling

MDS algorithms fall into one of the following
groups, depending on the meaning of the input
matrix:
Classical
Metric
Non-Metric
Generalized
An extension of metric multidimensional scaling in which the target space
is an arbitrary smooth non-Euclidean space. When the dissimilarities are
distances on a surface and the target space is another surface, GMDS
allows finding the minimum-distortion embedding of one surface into
another.
Multidimensional Scaling

From a non-technical point of view,
the purpose of multidimensional scaling (MDS) is to
provide a visual representation of the pattern of
proximities (i.e., similarities or distances) among a
set of objects.
For example, given a matrix of perceived similarities
between various brands of air fresheners, MDS plots
the brands on a map such that those brands that are
perceived to be very similar to each other are placed
near each other on the map, and those brands that
are perceived to be very different from each other
are placed far away from each other on the map.
Multidimensional Scaling

Exercise!
Multidimensional Scaling

From a slightly more technical point of view,
what MDS does is find a set of vectors in p-dimensional space
such that
the matrix of Euclidean distances among them
corresponds as closely as possible to some function of the input
matrix
according to a criterion function called stress.
Multidimensional Scaling

A simplified view of the algorithm is as follows:
Assign points to arbitrary coordinates in p-dimensional space.
Compute Euclidean distances among all pairs of points, to form
the Dhat matrix.
Compare the Dhat matrix with the input D matrix by evaluating
the stress function. The smaller the value, the greater the
correspondence between the two.
Adjust the coordinates of each point in the direction that most
reduces stress.
Repeat steps 2 through 4 until stress stops decreasing.
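The steps above can be sketched as plain gradient descent on the raw stress; this is a toy version (real implementations usually use majorization, e.g. SMACOF), and all names are my own:

```python
import numpy as np

def mds_gradient_descent(D, p=2, steps=500, lr=0.01, seed=0):
    """The algorithm above, with the adjustment step done by naive gradient
    descent on the raw stress  sum_{i<j} (d_ij - D_ij)^2."""
    rng = np.random.default_rng(seed)
    n = D.shape[0]
    X = rng.normal(size=(n, p))                  # 1. arbitrary starting coordinates
    for _ in range(steps):
        diff = X[:, None, :] - X[None, :, :]
        d = np.linalg.norm(diff, axis=-1)        # 2. current pairwise distances (Dhat)
        np.fill_diagonal(d, 1.0)                 #    avoid dividing by zero below
        grad = 2.0 * (((d - D) / d)[:, :, None] * diff).sum(axis=1)
        X -= lr * grad                           # 3-4. move against the stress gradient
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    stress = float(np.sum((d - D)[np.triu_indices(n, 1)] ** 2))
    return X, stress

# Toy run: target distances taken from the four corners of a unit square.
pts = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
D = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
X, stress = mds_gradient_descent(D)
```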
Slides are from:
•   Karl Ricanek, Biometrics with Topics in Face Recognition, University of North Carolina, Wilmington.
•   Mubarak Shah, Computer Vision, University of Central Florida, Orlando.
•   Joshua I. Cohen, Face Recognition.
•   Scarlet Schwiderski-Grosche, State of the Art in Biometrics, Royal Holloway, University of London.
•   Yu Su, Shiguang Shan, Xilin Chen, Wen Gao, Patch-based Gabor Fisher Classifier for Face Recognition, Institute of Computing Technology, CAS, China.
•   George Bebis, Biometrics: Face Recognition Using Dimensionality Reduction (PCA).
•   http://www.analytictech.com/borgatti/mds.htm
•   http://www.math.hmc.edu/calculus/tutorials/eigenstuff/eigenstuff.pdf
•   Gonzalez & Woods, Digital Image Processing, 3rd Ed., Chapter 12.
•   Andrew Moore, Statistical Data Mining Tutorials, http://www.autonlab.org/tutorials/
•   Miguel Tavares Coimbra, VCS 2008 – 12, Pattern Recognition, Portugal.
