Principal Component Analysis and Multidimensional Scaling

Posted: 6/19/2012




 Kamal Nasrollahi, Assistant Professor
Computer Vision and Media Technology Lab
           Aalborg University
            kn@create.aau.dk
But first a review...

[Image: http://www.flickr.com/photos/phopix/2402472389/]

[Image: http://www.flickr.com/photos/7568142@N06/4837454543/]

[Image]
What a computer sees!
Do you see a bee?

How do we know that it is a bee?
  Using a sensor (the eye, in this case),
    we gather some data about the bee (the pattern).
  We extract some features from the
   gathered data.
  The extracted features are represented in a
   way that a classifier can use them to
   finally recognize the pattern.

What exactly are features?
  Colour, texture, shape, etc.
  Animal with 4 legs.
  Bee.
  Flying bee.
These vary a lot!
Some imply some sort of ‘recognition’,
  e.g. how do I know the bee is flying?

Broad types of features
  Low-level
    Colour, texture
  Middle-level
    Object with head and two wings
    Bee
  High-level
    Bee is flying.
    Bee is sitting.

Low-level features
  Objective
  Directly reflect specific image and video
   features
    Colour
    Texture
    Shape
    Etc.

Middle-level features
  Some degree of subjectivity
  They are typically one solution to a problem
   that has multiple solutions.
    Segmentation
    Optical Flow
    Identification
    Etc.

High-level features
  Semantic Interpretation
    The bee is flying!
    The bee is sitting on a flower!

The semantic gap

[Figure omitted. From the last reference.]

Features & Decisions

[Figure omitted. From the last reference.]

Classifiers
  How do I map my M inputs to my N outputs?
  Mathematical tools:
     Distance-based classifiers
     Rule-based classifiers
     Neural Networks
     Support Vector Machines



                                       From the last reference
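As a minimal illustration of a distance-based classifier, the sketch below uses the nearest-centroid rule: each input is assigned to the class whose mean feature vector (centroid) is closest in Euclidean distance. It assumes NumPy; the function names, class labels, and toy data are hypothetical and not taken from the referenced slides.

```python
import numpy as np

def fit_centroids(X, y):
    """Return the class labels and one mean feature vector (centroid) per class."""
    labels = np.unique(y)
    centroids = np.array([X[y == c].mean(axis=0) for c in labels])
    return labels, centroids

def predict(X, labels, centroids):
    """Assign each row of X to the class whose centroid is nearest (Euclidean)."""
    # Distances from every sample to every centroid: shape (n_samples, n_classes)
    d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return labels[np.argmin(d, axis=1)]

# Hypothetical 2-D feature vectors for two classes
X_train = np.array([[0.0, 0.0], [0.2, 0.1], [5.0, 5.0], [5.1, 4.9]])
y_train = np.array(["bee", "bee", "wasp", "wasp"])
labels, centroids = fit_centroids(X_train, y_train)
print(predict(np.array([[0.1, 0.3], [4.8, 5.2]]), labels, centroids))  # → ['bee' 'wasp']
```

This rule works well exactly when the features are "good" in the sense of the slides that follow: samples of one class cluster together and the clusters are far apart.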

Classifiers
  Statistical pattern recognition
     based on statistical characterizations of patterns,
      assuming that the patterns are generated by a
      probabilistic system.
  Syntactical (or structural) pattern recognition
     based on the structural interrelationships of
      features.



                                                      From the last reference

Knowledge representation
  A representation is a set of syntactic and semantic conventions
  that make it possible to describe something.
  Syntax
     The syntax of a representation specifies the
      symbols that can be used, and the way these
      symbols can be combined into words.
  Semantics
     The semantics of a representation specifies the
      coding of its meaning, as well as the way words
      can be combined into meaningful phrases.
                                                           From the last reference

Knowledge representation
  How do I mathematically represent
   knowledge?
    Various factors:
       • Features
       • Grammar and Languages
       • Rules
           – Fuzzy Logic
       • Semantic networks
       • etc

                                       From the last reference

Features
  Features are not a pure representation
  They are the fundamental building blocks of more
   complex representations.
  Typically:
    Numeric representation of something.




                                            From the last reference
Features
Good features:
                 • Objects from the same class have similar feature values.
                 • Objects from different classes have different values.

[Figure: “Good” features vs. “Bad” features]
Principal Component Analysis

Face Recognition
  Problems:
    The dimensionality of each face vector is very large
      (262,144 for a 512 x 512 image!)
    Raw gray levels are sensitive to noise and lighting
      conditions.
  Solution:
     Reduce the dimensionality of the face space by finding the principal
      components (eigenvectors) that span the face space.
     Only a few of the most significant eigenvectors can be used to
      represent a face, thus reducing the dimensionality.
Eigenvectors and Eigenvalues

 An eigenvector x of a square matrix A is a special nonzero vector with
 the following property:

$$A\mathbf{x} = \lambda\mathbf{x}$$
 where $\lambda$ is called the eigenvalue.

 To find the eigenvalues of a matrix A, first find the roots of:

$$\det(A - \lambda I) = 0$$

 Then solve the following linear system for each eigenvalue to find the
 corresponding eigenvector:

$$(A - \lambda I)\mathbf{x} = \mathbf{0}$$
Example:

$$A = \begin{bmatrix} 1 & 2 & 0 \\ 0 & 3 & 4 \\ 0 & 0 & 7 \end{bmatrix}$$

Eigenvalues (A is upper triangular, so its eigenvalues are its diagonal entries):
$$\lambda_1 = 7,\quad \lambda_2 = 3,\quad \lambda_3 = 1$$

Eigenvectors:
$$\mathbf{x}_1 = \begin{bmatrix} 1 \\ 3 \\ 3 \end{bmatrix},\quad \mathbf{x}_2 = \begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix},\quad \mathbf{x}_3 = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}$$
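The eigenpairs of this A can be checked numerically. This sketch, assuming NumPy, tests $A\mathbf{x} = \lambda\mathbf{x}$ for the vectors obtained by solving $(A - \lambda I)\mathbf{x} = \mathbf{0}$ by hand, then compares them with `np.linalg.eig`, which returns unit-length eigenvectors (rescaled here so the first entry is 1).

```python
import numpy as np

A = np.array([[1.0, 2.0, 0.0],
              [0.0, 3.0, 4.0],
              [0.0, 0.0, 7.0]])

# Hand-derived eigenpairs for this A (each vector scaled so its first entry is 1)
hand = {7.0: [1.0, 3.0, 3.0], 3.0: [1.0, 1.0, 0.0], 1.0: [1.0, 0.0, 0.0]}
for lam, x in hand.items():
    x = np.array(x)
    # An eigenpair must satisfy A x = lambda x
    print(lam, A @ x, lam * x, np.allclose(A @ x, lam * x))

# NumPy's general eigensolver: eigenvectors are the unit-length columns of eigvecs
eigvals, eigvecs = np.linalg.eig(A)
for lam, v in zip(eigvals, eigvecs.T):
    print(np.round(lam, 6), np.round(v / v[0], 6))  # rescale so the first entry is 1
```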

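$$\det(A - \lambda I) = 0$$
$$\det\left(\begin{bmatrix} 1 & 2 & 0 \\ 0 & 3 & 4 \\ 0 & 0 & 7 \end{bmatrix} - \lambda \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}\right) = 0$$
$$\det\begin{bmatrix} 1-\lambda & 2 & 0 \\ 0 & 3-\lambda & 4 \\ 0 & 0 & 7-\lambda \end{bmatrix} = 0$$
$$(1-\lambda)\big((3-\lambda)(7-\lambda) - 0\big) = 0$$
$$(1-\lambda)(3-\lambda)(7-\lambda) = 0$$
$$\lambda = 1,\quad \lambda = 3,\quad \lambda = 7$$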
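The characteristic polynomial can also be obtained numerically. A sketch assuming NumPy: `np.poly` applied to a square matrix returns the coefficients of $\det(\lambda I - A)$, which for a 3 x 3 matrix equals $-\det(A - \lambda I)$ and therefore has the same roots; `np.roots` then recovers them.

```python
import numpy as np

A = np.array([[1.0, 2.0, 0.0],
              [0.0, 3.0, 4.0],
              [0.0, 0.0, 7.0]])

# Coefficients of det(lambda*I - A), highest power first. Expanding
# (lambda - 1)(lambda - 3)(lambda - 7) gives lambda^3 - 11 lambda^2 + 31 lambda - 21.
coeffs = np.poly(A)
roots = np.sort(np.roots(coeffs).real)   # roots of the characteristic polynomial
print(np.round(coeffs, 6))
print(np.round(roots, 6))
```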
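
For $\lambda_3 = 1$, solve $(A - \lambda_3 I)\mathbf{x} = \mathbf{0}$:
$$\left(\begin{bmatrix} 1 & 2 & 0 \\ 0 & 3 & 4 \\ 0 & 0 & 7 \end{bmatrix} - \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}\right)\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}$$
$$\begin{bmatrix} 0 & 2 & 0 \\ 0 & 2 & 4 \\ 0 & 0 & 6 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix} \quad\Longrightarrow\quad \begin{aligned} 0 + 2x_2 + 0 &= 0 \\ 0 + 2x_2 + 4x_3 &= 0 \\ 0 + 0 + 6x_3 &= 0 \end{aligned}$$
Hence $x_2 = 0$ and $x_3 = 0$, while $x_1$ is free; choosing $x_1 = 1$ gives
$$\mathbf{x}_3 = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}$$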
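Solving $(A - \lambda I)\mathbf{x} = \mathbf{0}$ amounts to finding the null space of $A - \lambda I$. One common numerical route, sketched here assuming NumPy, is the singular value decomposition: when the null space is one-dimensional, it is spanned by the right singular vector belonging to the smallest singular value. The helper name `eigenvector_for` is hypothetical.

```python
import numpy as np

A = np.array([[1.0, 2.0, 0.0],
              [0.0, 3.0, 4.0],
              [0.0, 0.0, 7.0]])

def eigenvector_for(A, lam):
    """Return a nonzero solution of (A - lam*I) x = 0, scaled so its
    largest-magnitude entry is 1. Assumes the null space is one-dimensional."""
    M = A - lam * np.eye(A.shape[0])
    _, s, vt = np.linalg.svd(M)            # singular values come in descending order
    if not np.isclose(s[-1], 0.0):
        raise ValueError(f"{lam} is not an eigenvalue of A")
    x = vt[-1]                              # right singular vector of the smallest singular value
    return x / x[np.argmax(np.abs(x))]

for lam in (1.0, 3.0, 7.0):
    print(lam, np.round(eigenvector_for(A, lam), 6))
```

Scaling by the largest entry (rather than the first) avoids dividing by zero for eigenvectors whose first component vanishes; any nonzero multiple is an equally valid eigenvector.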
Principal Component Analysis
(PCA)
 − Problems arise when performing recognition in a high-dimensional
   space (curse of dimensionality).
 − Significant improvements can be achieved by first mapping the data
   into a lower-dimensional sub-space.




 − The goal of PCA is to reduce the dimensionality of the data while
   retaining as much as possible of the variation present in the original
   dataset.
 − PCA allows us to compute a linear transformation that maps data
   from a high dimensional space to a lower dimensional sub-space.
 − Approximate vectors by finding a basis in an appropriate lower
   dimensional space.
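       (1) Higher-dimensional space representation:
           $x = a_1 v_1 + a_2 v_2 + \dots + a_N v_N$,
           where $v_1, v_2, \dots, v_N$ is a basis of the $N$-dimensional space.
       (2) Lower-dimensional space representation:
           $\hat{x} = b_1 u_1 + b_2 u_2 + \dots + b_K u_K$,
           where $u_1, u_2, \dots, u_K$ is a basis of the $K$-dimensional sub-space ($K < N$).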
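   − Dimensionality reduction implies information loss!
   − We want to preserve as much information as possible, that is, to
     minimize the reconstruction error $\|x - \hat{x}\|$, where $\hat{x}$ is the
     lower-dimensional approximation of $x$.

• How do we determine the best lower-dimensional sub-space?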
 − Suppose x1, x2, ..., xM are N x 1 vectors
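• Linear transformation implied by PCA
  − The linear transformation $\mathbb{R}^N \to \mathbb{R}^K$ that performs the
    dimensionality reduction is:
    $[b_1\ b_2\ \cdots\ b_K]^T = U^T (x - \bar{x})$,
    where $\bar{x}$ is the sample mean and $U = [u_1\ u_2\ \cdots\ u_K]$ is the $N \times K$
    matrix whose columns are the $K$ leading unit eigenvectors of the covariance matrix.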
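For $M$ vectors $x_1, \dots, x_M$ of size $N \times 1$, the standard PCA procedure is: subtract the mean, form the covariance matrix, take its eigenvectors sorted by decreasing eigenvalue, keep the first $K$, and project. The sketch below assumes NumPy, stores the vectors as the columns of an N x M array, and normalizes the covariance by $1/M$; `pca_fit` and `pca_project` are hypothetical names.

```python
import numpy as np

def pca_fit(X, K):
    """PCA on the columns of X (shape N x M, one sample per column).

    Returns the mean (N x 1), the K leading unit eigenvectors of the
    covariance matrix as the columns of U (N x K), and all eigenvalues
    in decreasing order.
    """
    M = X.shape[1]
    mean = X.mean(axis=1, keepdims=True)          # sample mean
    Phi = X - mean                                # mean-subtracted data
    C = (Phi @ Phi.T) / M                         # N x N covariance matrix
    eigvals, eigvecs = np.linalg.eigh(C)          # eigh: C is symmetric
    order = np.argsort(eigvals)[::-1]             # eigh returns ascending order
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    return mean, eigvecs[:, :K], eigvals

def pca_project(x, mean, U):
    """Map R^N -> R^K: b = U^T (x - mean). Works on one column or many."""
    return U.T @ (x - mean)

# Synthetic data: 200 points in R^3 lying close to the direction (1, 2, 0)
rng = np.random.default_rng(0)
t = rng.normal(size=200)
X = np.vstack([t, 2 * t, 0.1 * rng.normal(size=200)])   # N = 3, M = 200
mean, U, eigvals = pca_fit(X, K=1)
b = pca_project(X, mean, U)                               # shape (1, 200)
print(np.round(eigvals, 3), np.round(U.ravel(), 3))
```

`eigh` is used rather than `eig` because a covariance matrix is symmetric, so its eigenvalues are real and its eigenvectors orthonormal.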
• Geometric interpretation
 − PCA projects the data along the directions where the data varies the
   most.
 − These directions are determined by the eigenvectors of the
   covariance matrix corresponding to the largest eigenvalues.
 − The magnitude of the eigenvalues corresponds to the variance of
   the data along the eigenvector directions.
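The claim that each eigenvalue equals the variance of the data along its eigenvector can be checked directly. A sketch assuming NumPy, with synthetic 2-D data (the covariance used to generate it is an arbitrary choice): the projection onto each eigenvector has variance equal to its eigenvalue, and no random unit direction exceeds the largest eigenvalue.

```python
import numpy as np

rng = np.random.default_rng(1)
# Synthetic correlated 2-D data; columns are samples
X = rng.multivariate_normal([0.0, 0.0], [[3.0, 1.2], [1.2, 1.0]], size=5000).T
Phi = X - X.mean(axis=1, keepdims=True)
C = Phi @ Phi.T / X.shape[1]                   # covariance, 1/M normalization
eigvals, eigvecs = np.linalg.eigh(C)           # ascending eigenvalues, unit eigenvectors

for lam, u in zip(eigvals, eigvecs.T):
    proj = u @ Phi                             # coordinate of each sample along u
    print(round(lam, 4), round(proj.var(), 4)) # the two numbers agree

# No random unit direction has more variance than the top eigenvector
w = rng.normal(size=(2, 100))
w /= np.linalg.norm(w, axis=0)
print(np.var(w.T @ Phi, axis=1).max() <= eigvals.max() + 1e-9)
```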
• How to choose the principal components?

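   − To choose K, use the following criterion:
     $\dfrac{\sum_{i=1}^{K} \lambda_i}{\sum_{i=1}^{N} \lambda_i} > T$,
     where $\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_N$ are the eigenvalues of the covariance
     matrix and $T$ is a threshold (e.g., 0.9 or 0.95).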
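A sketch of the variance-ratio criterion, assuming NumPy and eigenvalues already sorted in decreasing order; the function name and the example eigenvalues are hypothetical.

```python
import numpy as np

def choose_k(eigvals, threshold=0.9):
    """Smallest K such that the K largest eigenvalues account for more than
    `threshold` of their total. `eigvals` must be sorted in decreasing order."""
    ratio = np.cumsum(eigvals) / np.sum(eigvals)
    hits = np.flatnonzero(ratio > threshold)
    return int(hits[0]) + 1 if hits.size else len(eigvals)

eigvals = np.array([5.0, 2.5, 1.5, 0.6, 0.4])   # hypothetical eigenvalues, total 10
print(choose_k(eigvals, 0.95))                   # → 4
```

Here the cumulative ratios are 0.5, 0.75, 0.9, 0.96 and 1.0, so the first ratio above 0.95 is reached with four components.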
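• What is the error due to dimensionality reduction?
   − An original vector x can be reconstructed from its first K principal
     components:
     $\hat{x} = \bar{x} + \sum_{i=1}^{K} b_i u_i$, with $b_i = u_i^T (x - \bar{x})$,
     where $\bar{x}$ is the sample mean and $u_1, \dots, u_N$ are the unit eigenvectors of
     the covariance matrix, sorted by decreasing eigenvalue $\lambda_i$.
   − It can be shown that the low-dimensional basis based on principal
     components minimizes the reconstruction error $\|x - \hat{x}\|$, averaged over the data.
   − It can be shown that, with the covariance matrix normalized by $1/M$, the mean
     squared reconstruction error over the $M$ training vectors is equal to the sum of
     the discarded eigenvalues:
     $e = \sum_{i=K+1}^{N} \lambda_i$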
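A numerical check of the error statement, assuming NumPy, synthetic data, and a covariance matrix normalized by $1/M$: the mean squared reconstruction error over the training vectors matches the sum of the discarded eigenvalues.

```python
import numpy as np

rng = np.random.default_rng(2)
N, M, K = 5, 1000, 2
# Synthetic data with different spreads per coordinate; columns are samples
X = rng.normal(size=(N, M)) * np.array([[3.0], [2.0], [1.0], [0.5], [0.2]])

mean = X.mean(axis=1, keepdims=True)
Phi = X - mean
C = Phi @ Phi.T / M
eigvals, U = np.linalg.eigh(C)
eigvals, U = eigvals[::-1], U[:, ::-1]        # decreasing order

B = U[:, :K].T @ Phi                          # coefficients b_i = u_i^T (x - mean)
X_hat = mean + U[:, :K] @ B                   # reconstruction from K components
mse = np.mean(np.sum((X - X_hat) ** 2, axis=0))

# Mean squared reconstruction error vs. sum of the discarded eigenvalues
print(round(mse, 6), round(eigvals[K:].sum(), 6))
```

The identity holds exactly (up to rounding) because the residual of each centred vector lies in the span of the discarded eigenvectors, and the average squared length of those components is the trace of $C$ restricted to that span.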
• Standardization
   − The principal components are dependent on the units used to
     measure the original variables as well as on the range of values
     they assume.
   − We should always standardize the data prior to using PCA.
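   − A common standardization method is to transform all the data to
     have zero mean and unit standard deviation:
     $x_i' = \dfrac{x_i - \mu_i}{\sigma_i}$,
     where $\mu_i$ and $\sigma_i$ are the mean and standard deviation of the $i$-th variable.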
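A sketch of why units matter, assuming NumPy and hypothetical height (metres) and weight (grams) measurements: on the raw data the weight variance dwarfs the height variance, so the leading principal component points almost entirely along the weight axis; after standardization both variables contribute equally (for two positively correlated standardized variables the leading direction is $(1, 1)/\sqrt{2}$).

```python
import numpy as np

# Hypothetical measurements: rows are variables (height in m, weight in g), columns are people
X = np.array([[1.60, 1.75, 1.82, 1.68, 1.90],
              [55000.0, 72000.0, 80000.0, 61000.0, 95000.0]])

mu = X.mean(axis=1, keepdims=True)
sigma = X.std(axis=1, keepdims=True)
Z = (X - mu) / sigma                    # zero mean, unit standard deviation per variable

def top_direction(data):
    """Leading eigenvector of the covariance matrix of the rows of `data`."""
    Phi = data - data.mean(axis=1, keepdims=True)
    w, V = np.linalg.eigh(Phi @ Phi.T / data.shape[1])
    return V[:, np.argmax(w)]

print(np.round(top_direction(X), 6))    # dominated by the gram-scale variable
print(np.round(top_direction(Z), 6))    # both variables contribute equally
```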
• PCA and classification
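    − PCA is not always an optimal dimensionality-reduction procedure for
      classification purposes: it keeps the directions of largest variance,
      which are not necessarily the directions that best separate the classes.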
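A small synthetic example of this failure mode, assuming NumPy (the data and the midpoint-threshold rule are illustrative choices, not from the slides): two classes share a large spread along one axis and differ only along a low-variance axis, so keeping only the first principal component discards the information that separates them.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 500
# Both classes spread widely along x; they differ only along the low-variance y axis
x = rng.normal(0.0, 10.0, size=2 * n)
y = np.concatenate([rng.normal(-1.0, 0.3, n), rng.normal(1.0, 0.3, n)])
labels = np.repeat([0, 1], n)
X = np.vstack([x, y])                        # 2 x 2n, columns are samples

Phi = X - X.mean(axis=1, keepdims=True)
w, V = np.linalg.eigh(Phi @ Phi.T / X.shape[1])
pc1, pc2 = V[:, np.argmax(w)], V[:, np.argmin(w)]

def threshold_accuracy(proj, labels):
    """Accuracy of a rule that thresholds at the midpoint between the two class means."""
    m0, m1 = proj[labels == 0].mean(), proj[labels == 1].mean()
    pred = (proj > (m0 + m1) / 2).astype(int)
    if m0 > m1:                              # flip if class 0 lies on the high side
        pred = 1 - pred
    return (pred == labels).mean()

print("keep PC1 only:", threshold_accuracy(pc1 @ Phi, labels))   # close to chance
print("keep PC2 only:", threshold_accuracy(pc2 @ Phi, labels))   # near perfect
```

A supervised projection such as Fisher's linear discriminant (LDA, mentioned in the reference list) is designed for exactly this situation.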
Face Recognition Using PCA

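 − The distance $e_r$ is called the distance within face space (difs).
 − Comment: we can use the common Euclidean distance to compute
   $e_r$; however, it has been reported that the Mahalanobis distance
   performs better. In the eigenspace, where the projection coefficients are
   uncorrelated with variances $\lambda_i$, its square is
   $\sum_{i=1}^{K} \frac{1}{\lambda_i} (w_i - w_i^{k})^2$,
   where $w_i$ are the coefficients of the unknown face and $w_i^{k}$ those of the $k$-th known face.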
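A minimal eigenfaces sketch, assuming NumPy, with random vectors standing in for flattened face images; `eigenfaces` and `recognize` are hypothetical names. Because the number of pixels N is much larger than the number of training faces M, it eigen-decomposes the small M x M matrix $A^T A$ instead of the N x N covariance, a standard trick: if $A^T A v = \mu v$, then $A v$ is an eigenvector of $A A^T$ with the same eigenvalue. Recognition picks the known face with the smallest Euclidean or Mahalanobis distance in face space.

```python
import numpy as np

def eigenfaces(train, K):
    """train: N x M array, one flattened face image per column (N >> M)."""
    M = train.shape[1]
    mean = train.mean(axis=1, keepdims=True)
    A = train - mean
    mu, V = np.linalg.eigh(A.T @ A)               # small M x M problem, ascending order
    order = np.argsort(mu)[::-1][:K]              # K largest eigenvalues
    U = A @ V[:, order]                           # map back to N-dimensional eigenvectors
    U /= np.linalg.norm(U, axis=0)                # unit-length eigenfaces (N x K)
    lam = mu[order] / M                           # eigenvalues of C = A A^T / M
    return mean, U, lam

def recognize(face, mean, U, lam, gallery_W, mahalanobis=False):
    """Return (index of the closest known face, its distance e_r) in face space."""
    w = U.T @ (face - mean).ravel()               # K projection coefficients
    diff = gallery_W - w[:, None]                 # K x (number of known faces)
    if mahalanobis:
        d = np.sum(diff ** 2 / lam[:, None], axis=0)   # squared Mahalanobis distance
    else:
        d = np.sqrt(np.sum(diff ** 2, axis=0))          # Euclidean distance
    k = int(np.argmin(d))
    return k, d[k]

rng = np.random.default_rng(4)
N, M, K = 4096, 10, 5                             # e.g. 64 x 64 pixels, 10 known faces
train = rng.normal(size=(N, M))                   # random stand-ins for face images
mean, U, lam = eigenfaces(train, K)
gallery_W = U.T @ (train - mean)                  # coefficients of the known faces
probe = train[:, [3]] + 0.05 * rng.normal(size=(N, 1))   # noisy copy of face 3
print(recognize(probe, mean, U, lam, gallery_W))
print(recognize(probe, mean, U, lam, gallery_W, mahalanobis=True))
```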
Multidimensional Scaling

Multidimensional scaling (MDS)
  is a set of related statistical techniques often used in
   information visualization for exploring similarities or
   dissimilarities in data.
An MDS algorithm starts with a matrix of item–item similarities,
  then assigns a location to each item in N-dimensional
   space, where N is specified a priori.
  For sufficiently small N, the resulting locations may
   be displayed in a graph or 3D visualisation.
Multidimensional Scaling

MDS algorithms fall into one of the following
 groups, depending on the meaning of the input
 matrix:
  Classical
      takes an input matrix giving dissimilarities between pairs of items and
      outputs a coordinate matrix whose configuration minimizes a loss function
      called strain
  Metric
     A superset of classical MDS that generalizes the optimization procedure to a
      variety of loss functions and input matrices of known distances with weights
      and so on. A useful loss function in this context is called stress, which is
      often minimized using a procedure called stress majorization.
  Non-Metric
     In contrast to metric MDS, non-metric MDS both finds a non-parametric
      monotonic relationship between the dissimilarities in the item-item matrix
      and the Euclidean distance between items, and the location of each item in
      the low-dimensional space. The relationship is typically found using isotonic
      regression.
  Generalized
     An extension of metric multidimensional scaling in which the target space is
      an arbitrary smooth non-Euclidean space. When the dissimilarities
      are distances on a surface and the target space is another surface, generalized
      MDS (GMDS) makes it possible to find the minimum-distortion embedding of one
      surface into another.
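Classical MDS has a closed-form solution: double-centre the squared dissimilarities and take the leading eigenvectors. This is the standard Torgerson construction, sketched here with NumPy; the function name and the rectangle example are hypothetical. For Euclidean input distances it reproduces the original configuration up to rotation, reflection and translation.

```python
import numpy as np

def classical_mds(D, p=2):
    """Classical (Torgerson) MDS: n x n dissimilarities D -> n x p coordinates."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n          # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                  # double-centred squared dissimilarities
    w, V = np.linalg.eigh(B)
    idx = np.argsort(w)[::-1][:p]                # p largest eigenvalues
    w_p = np.clip(w[idx], 0, None)               # negative eigenvalues (non-Euclidean input) -> 0
    return V[:, idx] * np.sqrt(w_p)

# Distances between the four corners of a 3 x 4 rectangle
pts = np.array([[0.0, 0.0], [3.0, 0.0], [3.0, 4.0], [0.0, 4.0]])
D = np.linalg.norm(pts[:, None] - pts[None, :], axis=2)
Y = classical_mds(D, p=2)
D_hat = np.linalg.norm(Y[:, None] - Y[None, :], axis=2)
print(np.allclose(D, D_hat))   # → True
```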

From a non-technical point of view,
  the purpose of multidimensional scaling (MDS) is to
   provide a visual representation of the pattern of
   proximities (i.e., similarities or distances) among a
   set of objects.
  For example, given a matrix of perceived similarities
   between various brands of air fresheners, MDS plots
   the brands on a map such that those brands that are
   perceived to be very similar to each other are placed
   near each other on the map, and those brands that
   are perceived to be very different from each other
   are placed far away from each other on the map.



           Exercise!

 From a slightly more technical point of view,
   what MDS does is find a set of vectors in p-dimensional space
    such that the matrix of Euclidean distances among them
    corresponds as closely as possible to some function of the input
    matrix, according to a criterion function called stress.
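Several definitions of stress are in use. Two common ones are sketched below, assuming NumPy and using the input dissimilarities directly as targets (the metric case): raw stress, the sum over pairs of squared differences between embedded and input distances, and Kruskal's normalized Stress-1. The function names are hypothetical.

```python
import numpy as np

def pairwise_distances(X):
    """Euclidean distance matrix between the rows of X."""
    return np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)

def raw_stress(X, D):
    """Sum over pairs i < j of (embedded distance - input dissimilarity)^2."""
    iu = np.triu_indices(D.shape[0], k=1)
    return np.sum((pairwise_distances(X)[iu] - D[iu]) ** 2)

def kruskal_stress1(X, D):
    """Kruskal's Stress-1: sqrt(raw stress / sum of squared embedded distances)."""
    iu = np.triu_indices(D.shape[0], k=1)
    return np.sqrt(raw_stress(X, D) / np.sum(pairwise_distances(X)[iu] ** 2))

D = np.array([[0.0, 1.0, 2.0],
              [1.0, 0.0, 1.0],
              [2.0, 1.0, 0.0]])
X_good = np.array([[0.0], [1.0], [2.0]])   # reproduces D exactly
X_bad = np.array([[0.0], [2.0], [1.0]])    # two pairs are off by 1
print(raw_stress(X_good, D), raw_stress(X_bad, D))  # → 0.0 2.0
```

Stress-1 is scale-free, which makes values comparable across data sets; raw stress is simpler to differentiate and is what iterative optimizers usually work with.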

 A simplified view of the algorithm is as follows:
   1. Assign points to arbitrary coordinates in p-dimensional space.
   2. Compute Euclidean distances among all pairs of points, to form
      the Dhat matrix.
   3. Compare the Dhat matrix with the input D matrix by evaluating
      the stress function. The smaller the value, the greater the
      correspondence between the two.
   4. Adjust the coordinates of each point in the direction that most
      reduces stress.
   5. Repeat steps 2 through 4 until stress won't get any lower.
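The steps can be sketched as plain gradient descent on raw stress, assuming NumPy. This is only one way to move points "in the direction that most reduces stress"; practical implementations often use stress majorization (SMACOF) instead. The step size, tolerance, iteration limit, and function name are hypothetical choices.

```python
import numpy as np

def mds_gradient_descent(D, p=2, lr=0.01, tol=1e-9, max_iter=5000, seed=0):
    """Metric MDS by gradient descent on raw stress (sum over pairs i < j).

    Returns the final coordinates (n x p) and their stress.
    """
    rng = np.random.default_rng(seed)
    n = D.shape[0]
    X = rng.normal(size=(n, p))                         # 1. arbitrary start
    prev = np.inf
    for _ in range(max_iter):
        diff = X[:, None, :] - X[None, :, :]            # x_i - x_j, shape n x n x p
        D_hat = np.linalg.norm(diff, axis=2)            # 2. current distances (Dhat)
        stress = np.sum(np.triu(D_hat - D, k=1) ** 2)   # 3. compare Dhat with D
        if prev - stress < tol:                         # 5. stop when stress stops falling
            break
        prev = stress
        # d stress / d x_i = sum_j 2 (Dhat_ij - D_ij) (x_i - x_j) / Dhat_ij
        safe = np.where(D_hat > 0, D_hat, 1.0)          # avoid dividing by zero
        coef = np.where(D_hat > 0, 2.0 * (D_hat - D) / safe, 0.0)
        grad = np.sum(coef[:, :, None] * diff, axis=1)
        X = X - lr * grad                               # 4. move against the gradient
    D_hat = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    return X, np.sum(np.triu(D_hat - D, k=1) ** 2)

# Target dissimilarities: distances between the corners of a 3 x 4 rectangle
pts = np.array([[0.0, 0.0], [3.0, 0.0], [3.0, 4.0], [0.0, 4.0]])
D = np.linalg.norm(pts[:, None] - pts[None, :], axis=2)
X, stress = mds_gradient_descent(D)
print(round(float(stress), 6))
```

Because stress is not convex in the coordinates, the result can depend on the random start; running from several seeds and keeping the lowest stress is a common safeguard.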
Slides are from:
•   Karl Ricanek, Biometrics with Topics in Face Recognition, University of North
    Carolina, Wilmington.
•   Mubarak Shah, Computer Vision, University of Central Florida, Orlando.
•   Joshua I. Cohen, Face Recognition.
•   Scarlet Schwiderski-Grosche, State of the Art in Biometrics, Royal Holloway,
    University of London.
•   Yu Su, Shiguang Shan, Xilin Chen, Wen Gao, Patch-based Gabor Fisher
    Classifier for Face Recognition, Institute of Computing Technology, CAS, China.
•   George Bebis, Biometrics, Face Recognition Using Dimensionality Reduction PCA
    and LDA, University of Nevada.
•   http://www.analytictech.com/borgatti/mds.htm
•   http://www.math.hmc.edu/calculus/tutorials/eigenstuff/eigenstuff.pdf
•   Gonzalez & Woods, 3rd Ed., Chapter 12.
•   Andrew Moore, Statistical Data Mining Tutorials,
    http://www.autonlab.org/tutorials/
•   Miguel Tavares Coimbra, VCS 2008 – 12, Pattern Recognition, Portugal.

						