Face Recognition
Face Recognition:
A Literature Survey
By:
W. Zhao, R. Chellappa, P.J. Phillips,
and A. Rosenfeld
Presented By:
Shane Brennan
5/02/2005
Early Methods of Recognition
● Early methods treated recognition as a problem of
2D pattern recognition.
● Methods included distance-measuring algorithms.
These determined the distances between
important features and compared these distances
to the distances on known faces.
● These methods were fairly inaccurate and performed
poorly with variations in orientation and size.
● They did well with variations in intensity.
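The distance-measuring idea above can be sketched in a few lines. This is only an illustration: the landmark names, the coordinates, and the choice of comparing all pairwise distances are assumptions, not details from the survey.

```python
import math
from itertools import combinations

def distance_signature(landmarks):
    """Pairwise distances between named landmarks, in a fixed (sorted) order."""
    names = sorted(landmarks)
    return [math.dist(landmarks[a], landmarks[b])
            for a, b in combinations(names, 2)]

def match(probe, gallery):
    """Return the known identity whose distance signature is closest to the probe's."""
    sig = distance_signature(probe)
    return min(gallery,
               key=lambda name: math.dist(sig, distance_signature(gallery[name])))

# Hypothetical (x, y) landmark positions in pixels.
gallery = {
    "alice": {"left_eye": (30, 40), "right_eye": (70, 40),
              "nose": (50, 60), "mouth": (50, 80)},
    "bob":   {"left_eye": (28, 42), "right_eye": (76, 42),
              "nose": (52, 70), "mouth": (52, 95)},
}
probe = {"left_eye": (31, 41), "right_eye": (70, 40),
         "nose": (50, 61), "mouth": (51, 80)}
best = match(probe, gallery)
```

Because the raw distances change whenever the face is scaled or rotated out of plane, a sketch like this shows why such methods are sensitive to size and orientation.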
More Modern Approaches
● Among appearance-based methods, eigenfaces
and fisherfaces have proven effective in
experiments involving large databases.
● Feature-based graph matching approaches have
been successful as well, and are less sensitive to
variations in illumination and viewpoint, as well
as inaccuracy in face localization.
● Feature extraction techniques in graph matching
approaches are currently inadequate.
● Example: cannot detect an eye if the eyelid is
closed.
Lessons That Have Been Learned
● The upper half of the face aids more in
recognition than the bottom half. Bottom lighting
may actually make it more difficult to recognize a
face.
● The nose is not as significant as the eyes, ears,
and mouth in recognition, although in profile
views a distinctive nose can help greatly.
● Low-frequency components play a dominant role
and are enough to identify the gender of the face,
although higher-frequency bands are necessary
for recognition.
Three Aspects of Recognition
● Face detection: Locating the faces in an image or
video sequence.
● Feature extraction: Finding the location of eyes,
nose, mouth, etc.
● Face recognition: Identifying the face(s) in the
input image or video.
● Face detection and feature extraction may be
performed simultaneously.
Face Detection
● Detection is considered successful if the presence
and rough location of a face are correctly identified.
● Two statistics are important: True positives
(correct detection), and false positives (incorrect
detection).
● Multi-view based methods do much better than
invariant feature methods when head rotation is
large.
Face Detection, continued
● By treating detection as a two-class problem
(face vs. non-face), false positives can be reduced
while maintaining a high true positive rate. This
can be done by retraining the system on a large
number of its own false positives (bootstrapping).
● Appearance-based methods have achieved the
best results in face detection, compared to
feature-based and template-matching methods.
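A toy sketch of the bootstrapping loop described above. The "detector" here is a deliberately trivial one-dimensional threshold, and all scores are made-up numbers; real systems retrain a full face/non-face classifier, so treat every name below as hypothetical.

```python
from statistics import mean

def train_threshold(face_scores, nonface_scores):
    """Toy two-class detector: threshold halfway between the class means."""
    return (mean(face_scores) + mean(nonface_scores)) / 2

def bootstrap(face_scores, nonface_scores, nonface_pool, max_rounds=5):
    """Retrain repeatedly, adding the detector's own false positives as negatives."""
    negatives = list(nonface_scores)
    history = []  # number of false positives found in each round
    threshold = train_threshold(face_scores, negatives)
    for _ in range(max_rounds):
        # Any non-face window scoring above the threshold is a false positive.
        false_positives = [s for s in nonface_pool if s > threshold]
        history.append(len(false_positives))
        if not false_positives:
            break
        negatives.extend(false_positives)
        threshold = train_threshold(face_scores, negatives)
    return threshold, history

faces = [0.8, 0.9, 1.0]          # made-up scores for true face windows
nonfaces = [0.0, 0.1]            # initial non-face training examples
pool = [0.2, 0.5, 0.55, 0.3]     # extra non-face windows to scan
threshold, history = bootstrap(faces, nonfaces, pool)
```

In this example the first round finds two false positives; after retraining with them as negatives, the second round finds none while all face scores still lie above the threshold.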
Feature Extraction
● Is the most important part of face recognition.
Even holistic methods need accurate location of
features for normalization.
● Some methods use feature restoration, which fills
in occluded parts of the face using symmetry.
● Three basic approaches: Edge detection methods,
feature-template methods, and structural
matching methods that take into consideration
geometric constraints on features.
Feature Extraction, continued
● Early methods used template approaches that
focused on individual features.
● These methods fail when those important features
are occluded or obscured.
● More recent methods use structural matching
methods like Active Shape Modeling (ASM).
These methods are more robust in terms of
handling variations in image intensity and feature
shape.
ASM
● Create a model of the features that you wish to
find. This model is defined by a series of “model
points” as well as the connection between points.
● Overlay the model onto the image. Examine the
region around each model point to find the best
match in the image that fits that point. Move the
model point to that image point and update the
model.
● The matching is usually done using image edges.
● Repeat this process for several iterations until
convergence (model points do not move far).
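The search loop above can be sketched as follows, assuming an edge-strength image is already available. This is a simplified illustration: a full ASM would also pull the moved points back onto a learned shape model after each pass, and that step is only marked by a comment here. The function name, search radius, and tolerance are assumptions.

```python
import numpy as np

def asm_search(edge_strength, points, radius=2, max_iters=20, tol=0.5):
    """Move each model point to the strongest edge nearby until points stop moving."""
    pts = np.asarray(points, dtype=int).copy()
    h, w = edge_strength.shape
    for _ in range(max_iters):
        largest_move = 0.0
        for i, (r, c) in enumerate(pts):
            # Examine the region around the current model point.
            r0, r1 = max(r - radius, 0), min(r + radius + 1, h)
            c0, c1 = max(c - radius, 0), min(c + radius + 1, w)
            window = edge_strength[r0:r1, c0:c1]
            dr, dc = np.unravel_index(np.argmax(window), window.shape)
            new = np.array([r0 + dr, c0 + dc])
            largest_move = max(largest_move, float(np.linalg.norm(new - pts[i])))
            pts[i] = new
        # A full ASM would update the model here (shape constraint); omitted.
        if largest_move < tol:
            break  # convergence: model points no longer move far
    return pts

# Synthetic edge map with two peaks, standing in for real image edges.
rows, cols = np.mgrid[0:30, 0:40]
edge = np.maximum(-((rows - 10) ** 2 + (cols - 12) ** 2),
                  -((rows - 20) ** 2 + (cols - 25) ** 2))
fitted = asm_search(edge, [(7, 9), (23, 22)])
```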
Examples of ASM Implementations: A successful match (top),
and a semi-successful match (bottom).
ASM, continued
● Suppose you take k samples on either side of a
model point. This provides a vector of 2k+1
sample points. Call this vector $g_i$.
● Normalize the sample by dividing by the sum of
absolute element values: $g_i \to g_i / \sum_j |g_{ij}|$
● Repeat this for each training image to obtain a set
of normalized samples $\{g_i\}$ for each model point.
● Assume the set of normalized samples is
distributed as a multivariate Gaussian, and find
the mean $g_{\mathrm{mean}}$ and covariance $S_g$.
● Repeat this process for each model point.
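A minimal sketch of the training step for one model point, assuming the samples are taken from a 1-D intensity signal through that point (the slides do not say how the samples are oriented, so `sample_profile` is a hypothetical helper).

```python
import numpy as np

def sample_profile(signal, center, k):
    """Take k samples on either side of a model point: 2k+1 values in total."""
    return np.asarray(signal[center - k:center + k + 1], dtype=float)

def normalize_profile(g):
    """Divide by the sum of absolute element values."""
    g = np.asarray(g, dtype=float)
    return g / np.sum(np.abs(g))

def fit_profile_model(profiles):
    """Mean and covariance of the normalized samples (Gaussian assumption)."""
    G = np.array([normalize_profile(g) for g in profiles])
    g_mean = G.mean(axis=0)
    S_g = np.cov(G, rowvar=False)
    return g_mean, S_g

# One made-up signal per training image, sampled with k = 1 around index 2.
training_signals = [[5, 1, 2, 3, 9], [4, 2, 2, 4, 8], [6, 0, -1, 3, 7], [5, 1, 0, 1, 9]]
profiles = [sample_profile(s, center=2, k=1) for s in training_signals]
g_mean, S_g = fit_profile_model(profiles)
```

The same fit would be run once per model point, giving each point its own mean and covariance.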
ASM, continued
● The quality of fit (measure of accuracy) of a new
sample $g_s$ to the model is given by:
$f(g_s) = (g_s - g_{\mathrm{mean}})^T S_g^{-1} (g_s - g_{\mathrm{mean}})$
● This is the squared Mahalanobis distance. Minimizing
$f(g_s)$ is equivalent to maximizing the probability
that $g_s$ comes from the distribution.
● This iterative process can be sped up by the use
of multi-resolution (coarse to fine) feature
matching.
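The fit measure above can be evaluated without forming the inverse explicitly. The sketch below assumes the covariance is invertible; in practice it may be near-singular, and using a pseudo-inverse or a small regularization term would be a further assumption. The numbers are made up for illustration.

```python
import numpy as np

def fit_quality(g_s, g_mean, S_g):
    """f(g_s) = (g_s - g_mean)^T S_g^{-1} (g_s - g_mean)."""
    d = np.asarray(g_s, dtype=float) - np.asarray(g_mean, dtype=float)
    return float(d @ np.linalg.solve(S_g, d))

def best_candidate(candidates, g_mean, S_g):
    """Index of the candidate sample with the smallest f, i.e. the best fit."""
    return int(np.argmin([fit_quality(g, g_mean, S_g) for g in candidates]))

g_mean = np.array([0.0, 0.0])
S_g = np.array([[4.0, 0.0],
                [0.0, 1.0]])
f = fit_quality([2.0, 1.0], g_mean, S_g)      # 2^2/4 + 1^2/1
candidates = [[2.0, 1.0], [0.5, 0.5], [3.0, 0.0]]
best = best_candidate(candidates, g_mean, S_g)
```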
Facial Recognition
● One successful approach to facial recognition has
been the use of eigenfaces.
● This involves projecting an input image into a
lower-dimension “facespace” and then computing
the distance between the projected input image
and known faces.
● More detail on eigenfaces will be provided in my
next presentation.
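As a rough preview, the projection-and-distance idea can be sketched with an SVD-based PCA. This is one common way to build a face space, not necessarily the exact procedure the survey describes; images are assumed to be flattened into vectors, and the tiny 4-pixel "images" are made up.

```python
import numpy as np

def build_face_space(train_images, n_components):
    """Return the mean face and the top principal directions ("eigenfaces")."""
    X = np.asarray(train_images, dtype=float)
    mean_face = X.mean(axis=0)
    # Rows of Vt are orthonormal directions, ordered by decreasing variance.
    _, _, Vt = np.linalg.svd(X - mean_face, full_matrices=False)
    return mean_face, Vt[:n_components]

def project(image, mean_face, eigenfaces):
    """Coordinates of an image in the lower-dimensional face space."""
    return eigenfaces @ (np.asarray(image, dtype=float) - mean_face)

def recognize(image, mean_face, eigenfaces, known_coords, labels):
    """Label of the known face nearest to the projected input (Euclidean)."""
    z = project(image, mean_face, eigenfaces)
    return labels[int(np.argmin(np.linalg.norm(known_coords - z, axis=1)))]

train = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0]]
labels = ["a", "b", "c"]
mean_face, eigenfaces = build_face_space(train, n_components=2)
known = np.array([project(x, mean_face, eigenfaces) for x in train])
who = recognize([0.9, 0.1, 0.0, 0.0], mean_face, eigenfaces, known, labels)
```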
Linear Discriminant Analysis
● Face systems using Linear Discriminant Analysis
(LDA) have also been successful.
● Training of LDA systems is carried out via scatter
matrix analysis.
● For an M-class problem, the within- and
between-class scatter matrices $S_w$ and $S_b$ are
computed as follows:
$S_w = \sum_{i=1}^{M} \Pr(w_i) \, C_i$
$S_b = \sum_{i=1}^{M} \Pr(w_i) \, (m_i - m_0)(m_i - m_0)^T$
● Here $\Pr(w_i)$ is the prior class probability, and is
typically assigned the value 1/M.
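A small sketch of the two scatter matrices with equal priors 1/M. It assumes each class covariance is estimated as the biased sample covariance of that class and that samples have at least two features; the four 2-D points are made up.

```python
import numpy as np

def scatter_matrices(X, y):
    """Within-class (S_w) and between-class (S_b) scatter with priors 1/M."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    classes = np.unique(y)
    prior = 1.0 / len(classes)
    means = {c: X[y == c].mean(axis=0) for c in classes}
    m0 = sum(prior * means[c] for c in classes)   # overall mean vector
    d = X.shape[1]
    S_w = np.zeros((d, d))
    S_b = np.zeros((d, d))
    for c in classes:
        C_i = np.cov(X[y == c], rowvar=False, bias=True)  # class covariance
        diff = (means[c] - m0)[:, None]
        S_w += prior * C_i
        S_b += prior * (diff @ diff.T)
    return S_w, S_b

X = [[0, 0], [2, 0], [0, 4], [2, 4]]
y = [0, 0, 1, 1]
S_w, S_b = scatter_matrices(X, y)
```

Here the classes differ only along the second axis, so all between-class scatter lands in the lower-right entry of `S_b`.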
Linear Discriminant Analysis, cont.
● $C_i$ is the average scatter matrix (conditional
covariance matrix) and is defined as:
$C_i = E[(x(w) - m_i)(x(w) - m_i)^T \mid w = w_i]$
● $S_w$ shows the average scatter ($C_i$) of the sample
vectors x of different classes $w_i$ around their
respective means ($m_i$).
● $S_b$ shows the scatter of the conditional mean
vectors ($m_i$) around the overall mean vector ($m_0$).
● A measure for quantifying discriminatory power
is: $\Gamma(T) = |T^T S_b T| / |T^T S_w T|$
● The projection matrix W which optimizes this
function can be found by solving the generalized
eigenvalue problem: $S_b W = S_w W \Lambda_W$
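One way to solve this eigenvalue problem numerically, assuming the within-class scatter matrix is invertible so the problem reduces to an ordinary eigenproblem. With face images the pixel dimension often exceeds the number of samples, in which case this assumption fails and the data is usually reduced first. The matrices below are made up.

```python
import numpy as np

def lda_projection(S_w, S_b, n_components):
    """Solve S_b W = S_w W Lambda and keep the leading eigenvectors."""
    # With S_w invertible: (S_w^{-1} S_b) W = W Lambda.
    eigvals, eigvecs = np.linalg.eig(np.linalg.solve(S_w, S_b))
    order = np.argsort(eigvals.real)[::-1][:n_components]
    return eigvecs.real[:, order], eigvals.real[order]

S_w = np.array([[2.0, 0.0],
                [0.0, 1.0]])
S_b = np.array([[0.0, 0.0],
                [0.0, 4.0]])
W, lam = lda_projection(S_w, S_b, n_components=1)
```

The columns of `W` are the projection directions, ordered by decreasing eigenvalue.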
Linear Discriminant Analysis, cont.
● The basic idea of this algorithm is that
classification is performed by projecting the input
x into a subspace via a projection/basis matrix Proj
(W from the previous slide):
Z = Proj * x
● By comparing the projection coefficient vector Z
of the input to all pre-stored projection vectors
of known and labeled classes, you can identify
and label the input vector.
● The vector comparison varies in different
systems. PCA algorithms tend to use either the
angle or the Euclidean distance.
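The projection and comparison steps can be sketched as below, with both the Euclidean distance and the angle as interchangeable comparison functions. The projection matrix, stored vectors, and labels are made-up values for illustration.

```python
import numpy as np

def euclidean(a, b):
    """Euclidean distance between two projection vectors."""
    return float(np.linalg.norm(a - b))

def angle(a, b):
    """Angle (radians) between two projection vectors."""
    cos = (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))
    return float(np.arccos(np.clip(cos, -1.0, 1.0)))

def classify(x, proj, stored, labels, metric=euclidean):
    """Project x (Z = Proj * x) and return the label of the closest stored vector."""
    z = proj @ np.asarray(x, dtype=float)
    return labels[int(np.argmin([metric(z, s) for s in stored]))]

proj = np.array([[1.0, 0.0, 0.0],
                 [0.0, 1.0, 0.0]])      # keeps the first two coordinates
stored = np.array([[1.0, 0.0],
                   [0.0, 1.0]])         # pre-stored projections of known classes
labels = ["w1", "w2"]
x = [0.9, 0.2, 5.0]
by_distance = classify(x, proj, stored, labels, metric=euclidean)
by_angle = classify(x, proj, stored, labels, metric=angle)
```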
PDBNN
● A fully automatic face detection and
recognition system based on Probabilistic
Decision-Based Neural Networks (PDBNN) has
been proposed.
● It consists of three modules: A face detector, eye
localizer, and face recognizer.
● The PDBNN does not use the lower face. This
excludes the influence of facial expressions
(smiling, frowning, etc).
PDBNN, continued
● Breaks the input into two features at a resolution
of 14x10 pixels.
● The features are normalized intensity and edges.
● Each feature is fed into a separate PDBNN and
the final recognition result is the combination of
the output of each PDBNN.
● Advantages of this implementation are that it
converges quickly and is easily implemented on
distributed computing platforms.
A key feature is that each individual to be recognized has a subnet in the
PDBNN devoted to them.
EBGM
● The most successful feature-based structural
matching approach has been the use of Elastic
Bunch Graph Matching (EBGM) systems.
● Local features are represented by wavelet
coefficients for different rotations and scales.
● Wavelet bases are referred to as “jets.”
● Based on Dynamic Link Architecture (DLA).
● DLAs use synaptic plasticity to form sets of
neurons grouped into structured graphs in a
neural network.
EBGM, continued
● Basic mechanisms are Tij, the connection between
two neurons (i and j), and Jij, a dynamic variable.
● These J-variables are the synaptic weights for
signal transmission among neurons.
● The T-parameters act as constraints to the J-
variables. Small changes in T over time from
synaptic plasticity will cause the J-variables to
change as well.
● A new image is recognized by transforming the
image into a grid of jets and comparing this grid
to those of known images.
EBGM, continued
● This basic DLA architecture is extended to
EBGM by attaching a set of jets to each grid
node, instead of just one jet.
● Each jet in the set is derived from a different
stored (known) face image.
● This EBGM method has been applied to:
Face detection and extraction, pose estimation,
gender classification, sketch-image-based
recognition, and general object recognition.
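A simplified sketch of comparing an image graph against a bunch graph. It assumes each jet is a vector of wavelet magnitudes and that jets are compared with a normalized dot product; both are illustrative assumptions. The elastic part of the matching (moving nodes and penalizing grid distortion) is left out entirely.

```python
import numpy as np

def jet_similarity(j1, j2):
    """Normalized dot product of two jets (vectors of wavelet magnitudes)."""
    return float(j1 @ j2 / (np.linalg.norm(j1) * np.linalg.norm(j2)))

def bunch_graph_similarity(image_jets, bunch_graph):
    """Average over nodes of the best match between the image jet and that node's set of jets."""
    per_node = [max(jet_similarity(jet, stored) for stored in bunch)
                for jet, bunch in zip(image_jets, bunch_graph)]
    return sum(per_node) / len(per_node)

# Two nodes; each node's bunch holds jets from two different stored faces.
bunch_graph = [np.array([[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]),
               np.array([[0.0, 1.0, 0.0], [1.0, 1.0, 0.0]])]
exact = bunch_graph_similarity(np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]), bunch_graph)
partial = bunch_graph_similarity(np.array([[0.0, 0.0, 1.0], [1.0, 0.0, 0.0]]), bunch_graph)
```

Because each node takes the best match from its bunch independently, different nodes can be explained by jets from different stored faces.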
On the left is an image graph. The graph is positioned over the input image.
At each node, the local jet around the corresponding image point is computed
and stored. This pattern of jets is used to represent the pattern classes. A new
image is recognized by transforming it into a grid of jets and comparing it to
known models.
EBGM (represented by the image on the right) works the same way, but at each
node is a set of jets, each derived from a different face image. Pose variation is
handled by determining the pose of the face using prior class information. The
jet transformations under variations in pose are then learned.
Results and Conclusions
● The Subspace LDA system, EBGM system, and
probabilistic eigenface system are judged to be
the top three methods of face recognition based
on the accuracy of the results.
● Each method has different levels of performance
on different subsets of images.
● When the number of training samples per class is
large, LDA performs best. When only one or two
samples are available per face class, PCA
(eigenface) is a better choice.
Other Interesting Results
● It has been demonstrated that the image size can
be very small and recognition methods will still
perform well.
For LDA system: 12x11 pixels
For PDBNN: 14x10 pixels
For human perception: 24x18 pixels
● It is interesting to note that the algorithms can
recognize faces at a lower resolution than the
human brain can.