					    Face Recognition:
    A Literature Survey

W. Zhao, R. Chellappa, P.J. Phillips,
         and A. Rosenfeld

          Presented By:
          Shane Brennan

       Early Methods of Recognition
●   Early methods treated recognition as a problem of
    2D pattern recognition.
●   Methods included distance-measuring algorithms.
    These determined the distances between
    important features and compared these distances
    to the distances on known faces.
●   These methods were fairly inaccurate and performed
    poorly with variations in orientation and size.
●   They did, however, cope well with variations in
    intensity.
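The distance-measuring idea above can be sketched in a few lines. This is a minimal illustration, not any specific historical system: it assumes each face is given as a list of (x, y) landmark coordinates in a fixed order (e.g. eyes, nose tip, mouth), and the function names are hypothetical. The distances are deliberately left unnormalized, which is one reason such methods break down when face size or orientation changes.

```python
import numpy as np

def feature_distance_vector(landmarks):
    """All pairwise Euclidean distances between landmark points.

    `landmarks` is an (n, 2) sequence of (x, y) feature locations in a fixed
    order (an assumption). Distances are NOT normalized, so the vector changes
    with face size and rotation."""
    pts = np.asarray(landmarks, dtype=float)
    i, j = np.triu_indices(len(pts), k=1)   # every unordered pair of points
    return np.linalg.norm(pts[i] - pts[j], axis=1)

def identify(probe_landmarks, gallery):
    """Return the gallery name whose distance vector is closest to the probe's."""
    probe = feature_distance_vector(probe_landmarks)
    best_name, best_score = None, np.inf
    for name, landmarks in gallery.items():
        score = np.linalg.norm(probe - feature_distance_vector(landmarks))
        if score < best_score:
            best_name, best_score = name, score
    return best_name
```

Because only distances are compared, translating a face leaves its vector unchanged, but scaling it does not.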
         More Modern Approaches
●   Among appearance-based methods, eigenfaces
    and fisherfaces have proved effective in
    experiments involving large databases.
●   Feature-based graph matching approaches have
    been successful as well, and are less sensitive to
    variations in illumination and viewpoint, as well
    as inaccuracy in face localization.
●   Feature extraction techniques in graph matching
    approaches are currently inadequate.
●   Example: an eye cannot be detected if the eyelid
    is closed.
    Lessons That Have Been Learned
●   The upper half of the face aids recognition more
    than the bottom half. Bottom lighting may actually
    make it more difficult to recognize a face.
●   The nose is not as significant as the eyes, ears,
    and mouth in recognition, although in profile
    views a distinctive nose can help greatly in
    recognition.
●   Low-frequency components play a dominant role
    and are enough to identify the gender of the face,
    although higher-frequency bands are necessary
    for recognition.
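To make "low-frequency" and "high-frequency" components concrete, the sketch below splits a grayscale image into two bands with an ideal circular low-pass filter in the 2D Fourier domain. The filter shape and the cutoff (a radius in cycles per image) are illustrative assumptions, not the filters used in the psychophysical studies the survey summarizes.

```python
import numpy as np

def split_frequency_bands(image, cutoff):
    """Split a 2D grayscale image into low- and high-frequency parts.

    Frequencies with radius <= `cutoff` (cycles per image) go to the low band;
    the high band is the remainder, so low + high == image."""
    img = np.asarray(image, dtype=float)
    rows, cols = img.shape
    F = np.fft.fft2(img)
    fy = np.fft.fftfreq(rows) * rows      # vertical frequency, cycles per image
    fx = np.fft.fftfreq(cols) * cols      # horizontal frequency, cycles per image
    radius = np.sqrt(fy[:, None] ** 2 + fx[None, :] ** 2)
    # The mask is symmetric in frequency, so the inverse FFT is real up to rounding.
    low = np.real(np.fft.ifft2(F * (radius <= cutoff)))
    high = img - low
    return low, high
```

With a cutoff of 0 only the DC term survives, so the low band is just the mean brightness; raising the cutoff keeps progressively more coarse facial structure.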
       Three Aspects of Recognition

●   Face detection: Locating the faces in an image or
    video sequence.
●   Feature extraction: Finding the location of eyes,
    nose, mouth, etc.
●   Face recognition: Identifying the face(s) in the
    input image or video.
●   Face detection and feature extraction may be
    performed simultaneously.
                  Face Detection

●   Considered successful if the presence and rough
    location of a face are correctly identified.
●   Two statistics are important: true positives
    (correct detections) and false positives (incorrect
    detections).
●   Multi-view based methods do much better than
    invariant-feature methods at handling head rotation.
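Since a detection counts as correct when the presence and rough location of a face are right, true and false positives can be tallied by matching detected face centres to ground-truth centres. The sketch below assumes centres are (x, y) points and uses a greedy nearest-match rule with a hypothetical distance tolerance `max_center_dist`; real evaluations often use box overlap instead.

```python
import math

def count_detections(detections, ground_truth, max_center_dist):
    """Return (true_positives, false_positives).

    A detection is a true positive if it lies within `max_center_dist` of a
    ground-truth face centre that has not already been matched (greedy,
    closest first); every other detection is a false positive."""
    unmatched = list(ground_truth)
    tp = fp = 0
    for dx, dy in detections:
        best = None  # (index into unmatched, distance)
        for k, (gx, gy) in enumerate(unmatched):
            d = math.hypot(dx - gx, dy - gy)
            if d <= max_center_dist and (best is None or d < best[1]):
                best = (k, d)
        if best is None:
            fp += 1
        else:
            tp += 1
            unmatched.pop(best[0])  # each face can be detected only once
    return tp, fp
```

The detection rate is then `tp / len(ground_truth)`, reported alongside the raw false-positive count.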
         Face Detection, continued

●   By treating detection as a two-class problem
    (face vs. non-face), false positives can be reduced
    while maintaining a high true-positive rate. This is
    done by retraining the system on the false positives
    it produces (bootstrapping).
●   Appearance-based methods have achieved the
    best results in face detection, compared to
    feature-based and template-matching methods.
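The bootstrapping step can be sketched with any two-class classifier. Here a plain logistic regression stands in for the detector (an assumption; the surveyed systems used other classifiers), and `nonface_pool` is a hypothetical set of non-face patches, flattened to feature vectors. After each training round, the non-faces wrongly accepted as faces are added to the negative training set.

```python
import numpy as np

def train_logistic(X, y, epochs=500, lr=0.1):
    """Batch gradient-descent logistic regression; returns weights and bias."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))   # predicted P(face)
        grad = p - y
        w -= lr * X.T @ grad / len(y)
        b -= lr * grad.mean()
    return w, b

def bootstrap_detector(faces, nonfaces_init, nonface_pool, rounds=3, threshold=0.5):
    """Face / non-face training with bootstrapping (rounds must be >= 1).

    After each round, pool examples scored above `threshold` (false
    positives) are appended to the negatives and the model is retrained."""
    faces = np.asarray(faces, dtype=float)
    negatives = np.asarray(nonfaces_init, dtype=float)
    pool = np.asarray(nonface_pool, dtype=float)
    for _ in range(rounds):
        X = np.vstack([faces, negatives])
        y = np.concatenate([np.ones(len(faces)), np.zeros(len(negatives))])
        w, b = train_logistic(X, y)
        scores = 1.0 / (1.0 + np.exp(-(pool @ w + b)))
        false_pos = pool[scores > threshold]
        if len(false_pos) == 0:      # no remaining false positives: stop
            break
        negatives = np.vstack([negatives, false_pos])
    return w, b, negatives
```

The point of the loop is that the negatives the classifier finds hardest are exactly the ones it gets more of, which is how false positives fall without sacrificing true positives.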
              Feature Extraction

●   Is the most important part of face recognition.
    Even holistic methods need accurate location of
    features for normalization.
●   Some methods use feature restoration, which fills
    in occluded parts of the face using symmetry.
●   Three basic approaches: Edge detection methods,
    feature-template methods, and structural
    matching methods that take into consideration
    geometric constraints on features.
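Feature restoration by symmetry can be illustrated as mirroring the face about its vertical mid-line and copying mirrored pixels into the occluded region. This minimal sketch assumes a frontal, horizontally centred face stored as a 2D array and a boolean occlusion mask; both the assumption and the function name are illustrative.

```python
import numpy as np

def restore_by_symmetry(face, occluded_mask):
    """Fill occluded pixels from their mirror image across the vertical mid-line.

    Assumes column c mirrors column (width - 1 - c). Pixels whose mirror
    pixel is also occluded are left unchanged."""
    face = np.asarray(face, dtype=float)
    mask = np.asarray(occluded_mask, dtype=bool)
    mirrored = face[:, ::-1]
    mirrored_mask = mask[:, ::-1]
    fillable = mask & ~mirrored_mask      # occluded here, visible on the other side
    restored = face.copy()
    restored[fillable] = mirrored[fillable]
    return restored
```

This only works when the occlusion is one-sided; a region blocked on both sides of the face (for example sunglasses) cannot be recovered this way.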
       Feature Extraction, continued
●   Early methods used template approaches that
    focused on individual features.
●   These methods fail when those important features
    are occluded or obscured.
●   More recent methods use structural matching
    methods like Active Shape Modeling (ASM).
    These methods are more robust in terms of
    handling variations in image intensity and feature
    shape.
             Active Shape Modeling (ASM)
●   Create a model of the features that you wish to
    find. This model is defined by a series of “model
    points” as well as the connections between points.
●   Overlay the model onto the image. Examine the
    region around each model point to find the best
    match in the image for that point. Move the
    model point to that image point and update the
    model.
●   The matching is usually done using image edges.
●   Repeat this process for several iterations until
    convergence (model points do not move far).
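The search loop above can be sketched as follows. The slides only say "update the model"; this sketch assumes the commonly used ASM formulation in which the updated shape is constrained by a PCA shape model x = x_mean + P b, with each shape parameter clamped to ±3√λ. Pose (translation, scale, rotation) alignment is omitted for brevity, and `suggest_points` is a hypothetical callback standing in for the edge-based local matching.

```python
import numpy as np

def asm_search(initial_points, mean_shape, P, eigvals, suggest_points,
               max_iter=50, tol=0.5):
    """Simplified ASM search (no pose alignment).

    initial_points : (n, 2) starting model points on the image
    mean_shape     : (n, 2) mean training shape
    P              : (2n, t) orthonormal shape eigenvectors (columns)
    eigvals        : (t,) corresponding eigenvalues
    suggest_points : callable, (n, 2) points -> (n, 2) best local image matches
    """
    x_mean = np.asarray(mean_shape, float).ravel()
    limit = 3.0 * np.sqrt(np.asarray(eigvals, float))  # plausible range per mode
    points = np.asarray(initial_points, float)
    for _ in range(max_iter):
        target = np.asarray(suggest_points(points), float).ravel()
        b = P.T @ (target - x_mean)          # project onto the shape modes
        b = np.clip(b, -limit, limit)        # keep the shape plausible
        new_points = (x_mean + P @ b).reshape(-1, 2)
        moved = np.max(np.linalg.norm(new_points - points, axis=1))
        points = new_points
        if moved < tol:                      # converged: points barely move
            break
    return points
```

Projecting onto the shape modes discards image matches that would produce an implausible face shape, which is what makes structural matching robust when a single feature is matched badly.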
Examples of ASM Implementations: A successful match (top),
           and a semi-successful match (bottom).
                ASM, continued
●   Suppose you take k samples on either side of a
    model point. Together with the point itself, this
    gives a vector of 2k+1 samples. Call this vector $g_i$.
●   Normalize the sample by dividing by the sum of
    absolute element values: $g_i \leftarrow g_i / \sum_j |g_{ij}|$
●   Repeat this for each training image to obtain a set
    of normalized samples $\{g_i\}$ for each model point.
●   Assume the set of normalized samples is
    distributed as a multivariate Gaussian, and find
    its mean $g_{\text{mean}}$ and covariance $S_g$.
●   Repeat this process for each model point.
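Building the per-point profile model can be sketched as below. Assumptions: the profile is sampled with nearest-pixel lookup along a given normal direction, and raw intensities are used (the usual ASM formulation samples the intensity derivative along the profile instead). All function names are hypothetical.

```python
import numpy as np

def sample_profile(image, point, normal, k):
    """Sample 2k+1 intensities at point + t*normal for t = -k..k (nearest pixel).

    `point` is (x, y); `normal` is the direction of the profile."""
    image = np.asarray(image, float)
    normal = np.asarray(normal, float) / np.linalg.norm(normal)
    samples = []
    for t in range(-k, k + 1):
        x, y = np.asarray(point, float) + t * normal
        r = int(np.clip(round(y), 0, image.shape[0] - 1))
        c = int(np.clip(round(x), 0, image.shape[1] - 1))
        samples.append(image[r, c])
    return np.array(samples)

def normalize_profile(g):
    """Divide a profile by the sum of its absolute element values."""
    g = np.asarray(g, float)
    s = np.sum(np.abs(g))
    return g / s if s > 0 else g

def build_profile_model(profiles):
    """Mean and covariance of the normalized profiles for ONE model point.

    `profiles` has shape (num_training_images, 2k+1)."""
    G = np.array([normalize_profile(g) for g in profiles])
    g_mean = G.mean(axis=0)
    S_g = np.cov(G, rowvar=False)
    return g_mean, S_g
```

Calling `build_profile_model` once per model point, each time on the profiles collected from every training image, yields the set of Gaussians used during search.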
                ASM, continued
●   The quality of fit (measure of accuracy) of a new
    sample $g_s$ to the model is given by:
    $f(g_s) = (g_s - g_{\text{mean}})^T S_g^{-1} (g_s - g_{\text{mean}})$
●   This is the (squared) Mahalanobis distance.
    Minimizing $f(g_s)$ is equivalent to maximizing the
    probability that $g_s$ comes from the distribution.
●   This iterative process can be sped up by using a
    multi-resolution (coarse-to-fine) search.
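The fit measure f(g_s) can be used to choose where each model point should move. A common approach, assumed here, is to sample a longer profile than the 2k+1 training length, slide a (2k+1)-sample window along it, normalize each window the same way as in training, and keep the offset with the smallest Mahalanobis distance. `np.linalg.solve` is used instead of forming S_g^{-1} explicitly; with few training images S_g may be singular and would need regularizing (for example by adding a small multiple of the identity).

```python
import numpy as np

def mahalanobis_fit(g_s, g_mean, S_g):
    """f(g_s) = (g_s - g_mean)^T S_g^{-1} (g_s - g_mean)."""
    d = np.asarray(g_s, float) - g_mean
    return float(d @ np.linalg.solve(S_g, d))

def best_offset(long_profile, g_mean, S_g, k):
    """Return (offset, f) of the best (2k+1)-sample window along an odd-length
    profile; offset is measured from the profile centre."""
    long_profile = np.asarray(long_profile, float)
    m = (len(long_profile) - 1) // 2          # samples on each side of centre
    best, best_f = 0, np.inf
    for off in range(-(m - k), m - k + 1):
        c = m + off
        window = long_profile[c - k:c + k + 1]
        s = np.sum(np.abs(window))
        g = window / s if s > 0 else window   # same normalization as training
        f = mahalanobis_fit(g, g_mean, S_g)
        if f < best_f:
            best, best_f = off, f
    return best, best_f
```

In a coarse-to-fine scheme the same search would run first on a downsampled image, where each offset covers more pixels, and then be refined at full resolution.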
              Facial Recognition

●   One successful approach to facial recognition has
    been the use of eigenfaces.
●   This involves projecting an input image into a
    lower-dimension “facespace” and then computing
    the distance between the projected input image
    and known faces.
●   More detail on eigenfaces will be provided in my
    next presentation.
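As a preview, the projection into "facespace" and the distance comparison can be sketched with PCA computed by SVD. This assumes faces are already aligned, cropped to the same size, and flattened into rows of a matrix; nearest-neighbour Euclidean matching is one common choice, not the only one.

```python
import numpy as np

def train_eigenfaces(train_images, num_components):
    """train_images: (N, D) array, one flattened face per row.
    Returns the mean face and the top `num_components` eigenfaces (as rows)."""
    X = np.asarray(train_images, float)
    mean_face = X.mean(axis=0)
    A = X - mean_face
    # Rows of Vt are the principal directions (eigenvectors of the covariance).
    _, _, Vt = np.linalg.svd(A, full_matrices=False)
    return mean_face, Vt[:num_components]

def project(image, mean_face, eigenfaces):
    """Coordinates of a flattened image in the lower-dimensional facespace."""
    return eigenfaces @ (np.asarray(image, float) - mean_face)

def recognize(image, mean_face, eigenfaces, gallery_coords, labels):
    """Label of the known face nearest (Euclidean) to the input in facespace."""
    z = project(image, mean_face, eigenfaces)
    dists = np.linalg.norm(gallery_coords - z, axis=1)
    return labels[int(np.argmin(dists))]
```

Using the SVD of the centred data avoids forming the D×D covariance matrix, which matters because D (the number of pixels) is usually much larger than the number of training images.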
       Linear Discriminant Analysis
●   Face systems using Linear Discriminant Analysis
    (LDA) have also been successful.
●   Training of LDA systems is carried out via scatter
    matrix analysis.
●   So for an M-class problem, the within- and
    between-class scatter matrices Sw and Sb are
    computed as follows:
    $S_w = \sum_{i=1}^{M} \Pr(w_i)\, C_i$

    $S_b = \sum_{i=1}^{M} \Pr(w_i)\, (m_i - m_0)(m_i - m_0)^T$

●   Where Pr(wi) is the prior class probability, and is
    typically assigned the value 1/M.
    Linear Discriminant Analysis, cont.
●   Ci is the average scatter matrix (conditional
    covariance matrix) and is defined as:
    $C_i = E[(x(w) - m_i)(x(w) - m_i)^T \mid w = w_i]$
●   Sw shows the average scatter (Ci) of the sample
    vectors x of different classes wi around their
    respective means (mi).
●   Sb shows the scatter of the conditional mean
    vectors (mi) around the overall mean vector (m0).

●   A measure for quantifying discriminatory power
    is: $\Gamma(T) = |T^T S_b T| / |T^T S_w T|$
●   The projection matrix W which optimizes this
    function can be found by solving the eigenvalue
    problem: $S_b W = S_w W \Lambda_W$
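The scatter matrices and the eigenvalue problem above can be sketched directly in NumPy. This follows the slide's definitions with equal priors Pr(w_i) = 1/M, so the overall mean m0 is the average of the class means. The generalized problem S_b W = S_w W Λ_W is solved by whitening with the Cholesky factor of S_w, which assumes S_w is positive definite. With raw face images S_w is usually singular (fewer images than pixels), which is one reason PCA is often applied before LDA.

```python
import numpy as np

def lda_projection(X, y, num_components):
    """Fisher LDA with equal priors. X: (N, D) samples, y: (N,) labels.
    Returns W (D, num_components) and its eigenvalues, largest first."""
    X = np.asarray(X, float)
    y = np.asarray(y)
    classes = np.unique(y)
    M = len(classes)
    D = X.shape[1]
    class_means = np.array([X[y == c].mean(axis=0) for c in classes])
    m0 = class_means.mean(axis=0)            # overall mean under Pr(w_i) = 1/M
    Sw = np.zeros((D, D))
    Sb = np.zeros((D, D))
    for c, mi in zip(classes, class_means):
        Xi = X[y == c] - mi
        Ci = Xi.T @ Xi / len(Xi)             # conditional covariance of class c
        Sw += Ci / M
        d = (mi - m0)[:, None]
        Sb += (d @ d.T) / M
    # Sb w = lambda Sw w; with Sw = L L^T and w = L^{-T} v this becomes
    # (L^{-1} Sb L^{-T}) v = lambda v, a symmetric eigenproblem.
    L = np.linalg.cholesky(Sw)
    Linv = np.linalg.inv(L)
    evals, V = np.linalg.eigh(Linv @ Sb @ Linv.T)
    order = np.argsort(evals)[::-1][:num_components]
    W = Linv.T @ V[:, order]
    return W, evals[order]
```

At most M - 1 eigenvalues are non-zero, since S_b is built from M mean differences that sum to zero.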
    Linear Discriminant Analysis, cont.
●   Classification is performed by projecting the
    input x into a subspace via a projection/basis
    matrix Proj (built from W on the previous slide,
    Proj = W^T).
                     Z = Proj * x
●   By comparing the projection coefficient vector Z
    of the input to all pre-stored projection vectors
    of known, labeled classes, you can identify
    and label the input vector.
●   The vector comparison varies in different
    systems. PCA algorithms tend to use either the
    angle or the Euclidean distance.
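The projection and comparison step can be sketched as follows, with both comparison rules mentioned above. The function name is hypothetical; `proj` has one basis vector per row (for the LDA case, Proj = W^T), and `stored_Z` holds one pre-stored projection vector per known class.

```python
import numpy as np

def classify_projection(x, proj, stored_Z, labels, metric="euclidean"):
    """Compute Z = Proj @ x and return the label of the closest stored vector.

    metric: 'euclidean' (distance) or 'angle' (angle between vectors)."""
    z = np.asarray(proj, float) @ np.asarray(x, float)
    stored_Z = np.asarray(stored_Z, float)
    if metric == "euclidean":
        scores = np.linalg.norm(stored_Z - z, axis=1)
    elif metric == "angle":
        cos = stored_Z @ z / (np.linalg.norm(stored_Z, axis=1) * np.linalg.norm(z))
        scores = np.arccos(np.clip(cos, -1.0, 1.0))   # smaller angle = closer
    else:
        raise ValueError("metric must be 'euclidean' or 'angle'")
    return labels[int(np.argmin(scores))]
```

The two rules can disagree: the angle ignores the length of Z (roughly, overall contrast), while the Euclidean distance does not. For `x = [5, 5]` with stored vectors `[1, 0]` and `[10, 10]`, the Euclidean rule picks the first and the angle rule picks the second.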
                      PDBNN
●   A fully automatic face detection and
    recognition system based on Probabilistic
    Decision-Based Neural Networks (PDBNN) has been
    proposed.
●   It consists of three modules: A face detector, eye
    localizer, and face recognizer.
●   The PDBNN does not use the lower face. This
    excludes the influence of facial expressions
    (smiling, frowning, etc.).
              PDBNN, continued

●   Breaks the input into two features at a resolution
    of 14x10 pixels.
●   The features are normalized intensity and edges.
●   Each feature is fed into a separate PDBNN and
    the final recognition result is the combination of
    the output of each PDBNN.
●   Advantages of this implementation are that it
    converges quickly and is easily implemented on
    distributed computing platforms.
A key feature is that each individual to be recognized has a subnet in the PDBNN devoted to them.
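The one-subnet-per-person structure and the combination of the intensity and edge networks can be sketched as below. This is a strong simplification: each subnet here is a single diagonal Gaussian fitted to that person's feature vectors, whereas the actual PDBNN subnets are trained with a decision-based learning rule. Summing log-likelihoods across the two feature networks is also an assumed combination rule, and all names are hypothetical.

```python
import numpy as np

class GaussianSubnet:
    """One subnet per person: a diagonal-Gaussian log-likelihood (simplified)."""

    def __init__(self, samples, var_floor=1e-3):
        X = np.asarray(samples, float)
        self.mean = X.mean(axis=0)
        self.var = X.var(axis=0) + var_floor   # floor avoids zero variance

    def log_likelihood(self, x):
        d = np.asarray(x, float) - self.mean
        return float(-0.5 * np.sum(d * d / self.var + np.log(2 * np.pi * self.var)))

def train_subnets(features_by_person):
    """features_by_person: {name: (n_samples, dim) array} -> {name: subnet}."""
    return {name: GaussianSubnet(X) for name, X in features_by_person.items()}

def recognize(intensity_vec, edge_vec, intensity_nets, edge_nets):
    """Combine both feature networks by summing each person's log-likelihoods."""
    scores = {name: intensity_nets[name].log_likelihood(intensity_vec)
              + edge_nets[name].log_likelihood(edge_vec)
              for name in intensity_nets}
    return max(scores, key=scores.get)
```

Because each person has an independent subnet, adding a new person means training one more subnet rather than retraining the whole network, which fits the distributed-implementation advantage noted above.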

                      EBGM
●   The most successful feature-based structural
    matching approach has been the use of Elastic
    Bunch Graph Matching (EBGM) systems.
●   Local features are represented by (Gabor) wavelet
    coefficients for different rotations and scales.
●   The set of wavelet coefficients computed at one
    point is referred to as a “jet.”
●   EBGM is based on the Dynamic Link Architecture
    (DLA).
●   DLAs use synaptic plasticity to form sets of
    neurons grouped into structured graphs in a
    neural network.
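A jet can be sketched as the responses of a bank of complex Gabor filters (5 scales × 8 orientations) at one pixel. The kernel parameterization below, with k_ν = 2^{-(ν+2)/2}π, φ_μ = μπ/8 and σ = 2π, is the one usually quoted for EBGM, but treat it and the truncated 33×33 kernel window as assumptions. Jets are compared through their coefficient magnitudes with a normalized dot product.

```python
import numpy as np

def gabor_kernel(nu, mu, half_size, sigma=2 * np.pi):
    """Complex, DC-free Gabor kernel for scale index nu and orientation index mu."""
    k = 2.0 ** (-(nu + 2) / 2.0) * np.pi
    phi = mu * np.pi / 8.0
    kx, ky = k * np.cos(phi), k * np.sin(phi)
    ys, xs = np.mgrid[-half_size:half_size + 1, -half_size:half_size + 1]
    envelope = (k ** 2 / sigma ** 2) * np.exp(-k ** 2 * (xs ** 2 + ys ** 2) / (2 * sigma ** 2))
    wave = np.exp(1j * (kx * xs + ky * ys)) - np.exp(-sigma ** 2 / 2)  # remove DC
    return envelope * wave

def jet_at(image, row, col, half_size=16, num_scales=5, num_orient=8):
    """Return the complex Gabor coefficients (a 'jet') at pixel (row, col)."""
    img = np.pad(np.asarray(image, float), half_size, mode="edge")
    r, c = row + half_size, col + half_size
    patch = img[r - half_size:r + half_size + 1, c - half_size:c + half_size + 1]
    jet = []
    for nu in range(num_scales):
        for mu in range(num_orient):
            kern = gabor_kernel(nu, mu, half_size)
            # Convolution at one point: sum over offsets u of I(x - u) * psi(u).
            jet.append(np.sum(patch[::-1, ::-1] * kern))
    return np.array(jet)

def jet_similarity(j1, j2):
    """Magnitude-based similarity: normalized dot product of amplitudes."""
    a1, a2 = np.abs(j1), np.abs(j2)
    return float(a1 @ a2 / (np.linalg.norm(a1) * np.linalg.norm(a2)))
```

Normalizing the similarity makes it insensitive to a global change in image contrast, since scaling the image scales every coefficient equally.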
               EBGM, continued
●   Basic mechanisms are Tij, the connection between
    two neurons (i and j), and Jij, a dynamic variable.
●   These J-variables are the synaptic weights for
    signal transmission among neurons.
●   The T-parameters act as constraints on the
    J-variables. Small changes in T over time from
    synaptic plasticity will cause the J-variables to
    change as well.
●   A new image is recognized by transforming the
    image into a grid of jets and comparing this grid
    to those of known images.
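Comparing a new image's grid of jets with the grids of known images can be sketched as averaging jet similarity over corresponding nodes and picking the best-scoring known face. This assumes the grids already have the same node layout; the elastic part of the matching (moving nodes to maximize similarity while penalizing distortion) is left out, and the names are hypothetical.

```python
import numpy as np

def jet_similarity(j1, j2):
    """Normalized dot product of jet amplitudes."""
    a1, a2 = np.abs(j1), np.abs(j2)
    return float(a1 @ a2 / (np.linalg.norm(a1) * np.linalg.norm(a2)))

def graph_similarity(grid_a, grid_b):
    """Average jet similarity over corresponding nodes.

    grid_*: (num_nodes, jet_length) arrays of jets at the same node positions."""
    return float(np.mean([jet_similarity(a, b) for a, b in zip(grid_a, grid_b)]))

def identify(probe_grid, known_grids):
    """Name of the stored grid most similar to the probe grid."""
    return max(known_grids, key=lambda name: graph_similarity(probe_grid, known_grids[name]))
```

For example, `identify(probe, {"a": grid_a, "b": grid_b})` returns whichever of `"a"` or `"b"` has the higher average node similarity to `probe`.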
               EBGM, continued
●   This basic DLA architecture is extended to
    EBGM by attaching a set of jets to each grid
    node, instead of just one jet.
●   Each jet in the set is derived from a different
    stored (known) face image.
●   This EBGM method has been applied to:
    Face detection and extraction, pose estimation,
    gender classification, sketch-image-based
    recognition, and general object recognition.
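The bunch-graph extension can be sketched by storing, at each node, one jet per known face and letting each node use whichever stored jet matches the new image best. This mirrors the description above under the same simplifications as a fixed node layout with no elastic distortion term; returning the winning face index per node is an illustrative addition.

```python
import numpy as np

def jet_similarity(j1, j2):
    """Normalized dot product of jet amplitudes."""
    a1, a2 = np.abs(j1), np.abs(j2)
    return float(a1 @ a2 / (np.linalg.norm(a1) * np.linalg.norm(a2)))

def bunch_graph_similarity(probe_grid, bunch_graph):
    """probe_grid: (num_nodes, L) jets from the new image.
    bunch_graph: (num_nodes, num_faces, L), one jet per stored face at each node.

    Returns the average best-node similarity and, for every node, the index of
    the stored face whose jet matched best."""
    scores, sources = [], []
    for probe_jet, bunch in zip(probe_grid, bunch_graph):
        sims = [jet_similarity(probe_jet, j) for j in bunch]
        k = int(np.argmax(sims))   # best-matching face for this node
        scores.append(sims[k])
        sources.append(k)
    return float(np.mean(scores)), sources
```

Because each node can draw on a different stored face (say, the eyes of one person and the mouth of another), a bunch graph can fit faces that are not in the gallery, which is what makes it useful for detection, pose estimation and gender classification.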
On the left is an image graph. The graph is positioned over the input image.
At each node, the local jet around the corresponding image point is computed
and stored. This pattern of jets is used to represent the pattern classes. A new
image is recognized by transforming it into a grid of jets and comparing it to
known models.

EBGM (represented by the image on the right) works the same way, but at each
node is a set of jets, each derived from a different face image. Pose variation is
handled by determining the pose of the face using prior class information. The
jet transformations under variations in pose are then learned.
          Results and Conclusions
●   The Subspace LDA system, EBGM system, and
    probabilistic eigenface system are judged to be
    the top three methods of face recognition based
    on the accuracy of the results.
●   Each method has different levels of performance
    on different subsets of images.
●   When the number of training samples per class is
    large, LDA performs best. When only one or two
    samples are available per face class, PCA
    (eigenface) is a better choice.
           Other Interesting Results
●   It has been demonstrated that the image size can
    be very small and recognition methods will still
    perform well.
    For LDA system: 12x11 pixels
    For PDBNN: 12x11 pixels
    For human perception: 24x18 pixels
●   It is interesting to note that the algorithms can
    recognize faces at a lower resolution than the
    human brain can.
