Survey paper:
Face Detection and Face Recognition
By Hyun Hoi James Kim

1. Introduction

Face recognition is a biometric method that identifies individuals by the features of the face. Research in
this area has been conducted for more than 30 years; as a result, face recognition technology is now well
advanced. Many commercial applications of face recognition are also available, such as criminal
identification, security systems, and image and film processing.
    Given a sequence of images captured by a camera, the goal is to find the best match for a given image.
Using a pre-stored image database, the face recognition system should be able to identify or verify one or
more persons in the scene. Before face recognition is performed, the system should determine whether or not
there is a face in a given image or video (a sequence of images). This process is called face detection. Once a
face is detected, the face region should be isolated from the scene for face recognition. Face detection and
face extraction are often performed simultaneously. The overall process is depicted in Fig. 1.

 Input image -> Face Detection -> Feature Extraction -> Face Recognition -> Identification or Verification

                                          Fig. 1. Process of face recognition.

   Face recognition can be performed on both still images and video, and video-based recognition has its
origin in still-image face recognition. Approaches to face recognition for still images can be categorized into
three main groups: the holistic approach, the feature-based approach, and the hybrid approach [1].

   Holistic Approach
   In the holistic approach, the whole face region is taken as input data to the face recognition system.
   Examples of holistic methods are eigenfaces (the most widely used method for face recognition),
   probabilistic eigenfaces, fisherfaces, support vector machines, nearest feature lines (NFL), and
   independent component analysis approaches. They are all based on principal component analysis (PCA)
   techniques, which reduce a dataset to a lower dimension while retaining its characteristics.

   Feature-based Approach
   In feature-based approaches, local features of the face such as the nose and eyes are segmented and then
   used as input data for a structural classifier. Pure geometry, dynamic link architecture, and hidden Markov
   model methods belong to this category.

   Hybrid Approach
   The idea of this approach comes from how the human vision system perceives both local features and the
   whole face. Modular eigenfaces, hybrid local feature, shape-normalized, and component-based methods
   belong to the hybrid approach.

    In this paper, one or two representative papers for each approach are discussed in terms of the
background, implementation, evaluation, and known issues of each method. Three methods from the holistic
approach are discussed: (a) the Eigenfaces Method [3] in section 2; (b) the Nearest Feature Line Method [2] in
section 3; and (c) the Fisherfaces Method [6] in section 6. Two other methods are also covered: one from the
feature-based approach, (d) the Hidden Markov Model Method [4] in section 4, and one from the hybrid
approach, (e) the Component-based and Morphable Models Method [5] in section 5. Face detection is
discussed as a subset of face recognition.

2. Eigenfaces Method

2.1 Background
Eigenfaces are a set of standardized face components based on statistical analysis of various face images.
Mathematically speaking, eigenfaces are a set of eigenvectors derived from the covariance matrix of the
high-dimensional vectors that represent possible human faces. Any human face can be represented by a linear
combination of eigenface images. For example, one person’s face can be represented by some portion of an
eigenface of one type, some other portion of an eigenface of another type, and so on. In Pentland’s paper [3],
motivated by principal component analysis (PCA), the authors propose this method, in which the principal
components of a face are extracted, encoded, and compared with a database. PCA techniques are also known
as Karhunen-Loeve methods, which choose a dimensionality-reducing linear projection that maximizes the
scatter of all projected samples.
    The paper, though it was written more than 10 years ago, is regarded as a standard work in the face
recognition research area; therefore, in a variety of other papers, authors compare the results of their own
approaches with those of Pentland’s paper.

2.2 Calculating Eigenfaces
The eigenfaces approach is based on principal component analysis (PCA), which finds the vectors that best
represent the distribution of face images in image space. These vectors define the subspace of face images
(face space). An image is treated as a point (or vector) in a high-dimensional vector space: an N by N image is
described by a vector of length N^2, and each eigenvector, also of length N^2, is a linear combination of the
original face images. For example, a typical image of size 256 by 256 describes a vector of dimension 65,536,
or a point in 65,536-dimensional space. An ensemble of images is mapped to a collection of points in this
huge space. The average of the training set of face images is calculated first, and then the difference between
each face image and the average is computed. From these difference vectors one can build the covariance
matrix, from which the eigenvectors are taken. The eigenvalues associated with the eigenvectors make it easy
to rank the eigenvectors according to their usefulness in characterizing the differences amongst the images.
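
The calculation described above can be sketched in Python with NumPy. This is a minimal illustration, not the
implementation from [3]: the function name compute_eigenfaces and its parameters are assumptions made for
this example. As a further assumption, it takes the eigenvectors of the small M by M matrix of inner products
between difference vectors (M is the number of training images) and maps them back to image space, a common
shortcut that avoids building the full N^2 by N^2 covariance matrix.

```python
import numpy as np

def compute_eigenfaces(faces, num_components):
    """faces: (M, D) array with one flattened training image per row.

    Returns the average face, the top eigenfaces as rows of a (k, D) array,
    and the variance captured by each eigenface (illustrative helper).
    """
    num_images = faces.shape[0]
    if num_components > num_images - 1:
        raise ValueError("at most M - 1 eigenfaces carry information")

    mean_face = faces.mean(axis=0)
    diffs = faces - mean_face                  # difference vectors, (M, D)

    # Eigen-decompose the small M x M matrix instead of the D x D covariance.
    small = diffs @ diffs.T
    eigvals, eigvecs = np.linalg.eigh(small)   # eigenvalues in ascending order

    # Rank eigenvectors by eigenvalue, largest first, and keep the top k.
    order = np.argsort(eigvals)[::-1][:num_components]

    # Map each small eigenvector back to image space and normalise it.
    eigenfaces = eigvecs[:, order].T @ diffs   # (k, D)
    eigenfaces /= np.linalg.norm(eigenfaces, axis=1, keepdims=True)
    return mean_face, eigenfaces, eigvals[order] / num_images
```

The returned eigenfaces are orthonormal, so projecting an image onto them is a plain matrix product.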

2.3 Using Eigenfaces to Classify a Face Image and Detect Faces
A new face image is projected onto face space by subtracting the average face described in section 2.2 from
the image and multiplying the difference by each eigenvector. The result of this operation is the weighted
contribution of each eigenface in representing the input face image, treating the eigenfaces as a basis set for
face images. The Euclidean distance from each face class determines the class that best matches the input
image.
    Through eigenfaces, the system can detect the presence of a face as well. A face image does not change
radically when projected onto face space, while the projection of a non-face image looks quite different;
therefore, it is easy to distinguish between face images and non-face images. Using this basic idea, the image
is projected onto face space and then the Euclidean distance is calculated between the mean-adjusted input
image and its projection onto face space. The distance is used as a measure of “faceness”, so computing it at
every location in the image gives a “face map”, where low values indicate that there is a face.
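
The projection, classification, and faceness steps can be sketched as follows. The function names and the
dictionary of per-class weight vectors are assumptions made for this illustration, and the eigenfaces are
assumed to be orthonormal rows of a matrix; this is not the code used in [3].

```python
import numpy as np

def project(image, mean_face, eigenfaces):
    """Weight of each eigenface in representing one flattened image."""
    return eigenfaces @ (image - mean_face)

def classify(image, mean_face, eigenfaces, class_weights):
    """Return the label whose weight vector is nearest in Euclidean distance.

    class_weights: dict mapping a label to that face class's weight vector.
    """
    weights = project(image, mean_face, eigenfaces)
    return min(class_weights,
               key=lambda label: np.linalg.norm(weights - class_weights[label]))

def distance_from_face_space(image, mean_face, eigenfaces):
    """Faceness measure: distance between the mean-adjusted image and its
    projection onto face space (low values suggest a face)."""
    diff = image - mean_face
    reconstruction = eigenfaces.T @ (eigenfaces @ diff)
    return np.linalg.norm(diff - reconstruction)
```

Evaluating distance_from_face_space on a window slid across a larger image would give the face map described
above.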

2.4 Evaluation and Issues
For the experiment, sixteen subjects were digitized at three head orientations, three head sizes or scales, and
three lighting conditions. A six-level Gaussian pyramid was constructed for each image to reduce it from
512x512 pixels to 16x16 pixels. Various groups of sixteen images were selected and 2500 images were
classified. The system achieved 96% correct classification over lighting variation, 85% over head orientation,
and 64% over head size.
    Since the eigenfaces method applies PCA directly to the whole image, it does not discard image
information by processing only certain points, which generally gives more accurate recognition results. But
these techniques are sensitive to variation in position and scale. Some serious issues are the effects of
background, head size, and orientation. Eigenface analysis, as described above, does not distinguish the face
from the background. In many cases, a significant part of the image consists of background, which leads to
incorrect classification because the background carries no information needed for recognition. Another issue
arises from differing head sizes in the input image, because neighborhood pixel correlation is lost when the
head size changes. The performance over size change dropped to 64% correct classification, which suggests
that there is a need for a multi-scale approach where each face class includes images of the individual at
several different sizes. Note that variation in lighting can still be a problem if the light source is positioned in
certain directions. This problem is addressed in Belhumeur’s paper [6] (see section 6).

3. Nearest Feature Line Method
3.1 Background
In Li’s paper [2], the authors present the nearest feature line (NFL) method for face recognition, a holistic
matching method that addresses problems in the eigenfaces approach. It assumes that at least two prototype
feature points are available in feature space for each class (person). A line passing through two prototype
feature points of the same class forms a feature line (FL) that generalizes the two feature points, as in Fig. 2.
A feature line approximates the variants of the two prototypes (images) that would be captured under
different conditions, such as a different head gaze direction or different illumination. An input image is then
identified with a class according to the distance between the feature point of the given image and the FLs of
the prototype images. Compared to the standard eigenface method [3], NFL has lower recognition error rates.

3.2 The Nearest Feature Line Method
A facial image is represented as a point in the feature space, which is an eigenspace. The line through two
feature points x1 and x2 of the same class, denoted x1x2, is called a feature line of that class. The query,
which is the input image x, is projected onto an FL, and the FL distance between x and x1x2 is defined as
d(x, x1x2) = |x - p|, where p is the projection point and | . | is some norm. The projection point can be found
by setting up the vector line equation p = x1 + µ(x2 - x1) with parameter µ. Depending on the value of µ, the
point p lies to the left of x1 (µ < 0), to the right of x2 (µ > 1), or between x1 and x2 (0 ≤ µ ≤ 1) on the line.
The further µ falls below 0 or above 1, the further p is from x1 or x2.

                               Fig. 2. Generalizing two prototype feature points x1 and x2 [2].

    The classification of the input image is done as follows. Let x_i^c and x_j^c be two distinct prototype
points of class c in feature space. The distance between the image point x and each FL x_i^c x_j^c is
calculated for each class c and each pair i ≠ j. This generates N distances, which are sorted in ascending
order, each associated with its class and its two prototypes. The first rank gives the NFL classification,
consisting of the best matched class c* and the two best matched prototypes i* and j* of that class.
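
The feature line distance and the NFL classification rule can be sketched as below. This is a minimal
illustration under stated assumptions: the Euclidean norm is used for | . |, the function names and the
dictionary layout of the prototypes are made up for this example, and the two prototypes of a pair are
assumed to be distinct points.

```python
import numpy as np
from itertools import combinations

def feature_line_distance(x, x1, x2):
    """Distance from query x to the line through x1 and x2, plus mu.

    p = x1 + mu * (x2 - x1) is the projection of x onto the feature line.
    """
    direction = x2 - x1
    mu = np.dot(x - x1, direction) / np.dot(direction, direction)
    p = x1 + mu * direction
    return np.linalg.norm(x - p), mu

def nfl_classify(x, prototypes):
    """prototypes: dict mapping a class label to a list of feature vectors
    (at least two per class). Returns (distance, label, i, j) for the
    nearest feature line over all classes and all prototype pairs."""
    best = None
    for label, points in prototypes.items():
        for (i, a), (j, b) in combinations(enumerate(points), 2):
            dist, _ = feature_line_distance(x, a, b)
            if best is None or dist < best[0]:
                best = (dist, label, i, j)
    return best
```

Keeping only the running minimum is equivalent to sorting all distances and taking the first rank.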

3.3 Comparison with Standard Eigenspace Method
In [2], a compound dataset of 1079 face images from 137 people is used. The dataset is drawn from four
well-known databases (the Cambridge database, the Bern database, the Yale database, and the Harvard
database) and another database that the authors created. These databases were constructed under different
lighting conditions, facial expressions, light source directions, and viewpoints.
    Two test schemes with different dataset sizes are used to compare the error rates of the generic eigenface
approach and the NFL approach. In both schemes, the images of the training set are not used in the query set,
to avoid any bias. The error rate of the NFL method turns out to be about 40% to 60% of that of the standard
eigenface method, which means the NFL method performs better in recognition under variation in
viewpoint, illumination, and expression. The results imply that by adopting the NFL method, the issues in
the conventional eigenfaces method can be addressed thanks to the feature lines’ ability to expand the
representational capacity of the feature points, which represent images of the same class under different
variations.

4. Hidden Markov Models Method
4.1 Background
The hidden Markov model (HMM) is another promising method that works well for images with variation in
lighting, facial expression, and orientation. HMMs are a set of statistical models used to characterize the
properties of signals. They perform very well in speech recognition and character recognition, where the data
is 1-dimensional. The system being modeled is assumed to be a Markov process with unknown parameters,
and the goal is to find the hidden parameters from the observable parameters. Each state in an HMM has a
probability distribution over the possible outputs, whereas each state in a regular Markov model is directly
observable.
    In Nefian’s paper [4], the authors use the HMM approach for face recognition based on the extraction of
2-dimensional discrete cosine transform (DCT) feature vectors. The authors take advantage of the
compression property of the DCT for feature extraction. An image is divided into blocks, each a subimage
associated with an observation vector. More details about the HMM method are provided in the following
sections.

4.2 Face Image Markov
An HMM consists of an unobservable Markov chain with a finite number of states, the observation symbol
probability matrix B, a state transition probability matrix A, an initial state distribution π, and a set of
probability density functions (PDFs). An HMM is defined as the triplet λ = (A, B, π).
    For frontal human face images, the important facial components appear in top-to-bottom order: hair,
forehead, eyes, nose, mouth, and chin. This still holds when the image rotates slightly in the image plane.
Each of these facial regions is assigned to one state in a 1-D continuous HMM. The transition probabilities
a_ij and the structure of the face model are illustrated in Fig. 3 [4].
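
A top-to-bottom face model of this kind can be sketched as a left-to-right transition matrix in which each
state either repeats or moves on to the next facial region. The sketch below is an illustration only: the
function name, the state list taken from the regions named above, and the initial self-transition value
stay_prob are assumptions, not values from [4].

```python
import numpy as np

# Facial regions in top-to-bottom order, one HMM state per region.
STATES = ["hair", "forehead", "eyes", "nose", "mouth", "chin"]

def left_to_right_hmm(num_states, stay_prob=0.5):
    """Initial A and pi for a left-to-right HMM (hypothetical helper).

    A[i, i] is the probability of staying in state i, A[i, i + 1] the
    probability of moving to the next state; all other moves are forbidden.
    """
    A = np.zeros((num_states, num_states))
    for i in range(num_states - 1):
        A[i, i] = stay_prob
        A[i, i + 1] = 1.0 - stay_prob
    A[-1, -1] = 1.0                 # the last state can only repeat
    pi = np.zeros(num_states)
    pi[0] = 1.0                     # every sequence starts at the top state
    return A, pi

A, pi = left_to_right_hmm(len(STATES))
```

Zero entries in A stay zero under re-estimation, so the top-to-bottom ordering is preserved during training.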

             Fig. 3. HMM for face recognition.
             Fig. 4. Block extraction from image.

4.3 Feature Extraction and Training
Each face image of width W and height H is divided into overlapping blocks of height L and the same width
W. The block extraction is shown in Fig. 4 [4]. The amount of overlap P has a significant effect on the
recognition rate, since it lets features be captured independently of their vertical position. The magnitude of
L is also important. A small L gives the observation vector too little information to discriminate. On the
other hand, a large L increases the chance of cutting across features. Therefore, it is important to find a good
value for L.
    Once blocks are extracted from the image, a set of DCT coefficients is calculated for each block. When a
block is transformed with the DCT, the most important coefficients, those with low frequencies, are
clustered in a small area of the DCT domain. The authors of [4] use a 12x3 window to pick out these
coefficients, which carry most of the signal energy. In this way, the size of the observation vector is reduced
significantly, which makes the system very efficient while still retaining a good recognition rate.
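
The block extraction and DCT feature step can be sketched as follows. This is an illustration under stated
assumptions, not the code from [4]: the function names are made up, the DCT is written out as an orthonormal
DCT-II matrix so that only NumPy is needed, and the 12x3 window is assumed to mean 12 coefficients
horizontally by 3 vertically, taken from the low-frequency corner.

```python
import numpy as np

def extract_blocks(image, L, P):
    """Slide a block of height L down the image; consecutive blocks
    overlap by P rows (requires P < L). Each block spans the full width."""
    height = image.shape[0]
    step = L - P
    return [image[top:top + L, :] for top in range(0, height - L + 1, step)]

def dct_matrix(n):
    """Orthonormal DCT-II basis as an (n, n) matrix."""
    k = np.arange(n)[:, None]       # frequency index
    i = np.arange(n)[None, :]       # sample index
    C = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    C[0, :] = np.sqrt(1.0 / n)
    return C

def dct_features(block, rows=3, cols=12):
    """2-D DCT of one block, keeping the low-frequency rows x cols window
    as the observation vector."""
    coeffs = dct_matrix(block.shape[0]) @ block @ dct_matrix(block.shape[1]).T
    return coeffs[:rows, :cols].ravel()
```

With these defaults each block of W x L pixels is reduced to an observation vector of 36 values.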
    In the training phase, the image is segmented from top to bottom so that each segment corresponds to a
state, and the initial observation probability matrix B is obtained from the observation vectors associated
with each state. Once B is obtained, the initial values of A and π are set according to the left-to-right
structure of the face model. Next, the model parameters are re-estimated with the expectation-maximization
(EM) procedure until convergence.

4.4 Recognition and Evaluation
A face image is recognized as the person whose HMM gives the maximum probability of the observation
sequence. For the experiment in the paper, 400 images of 40 individuals, with 10 face images per individual,
are used. The image database contains face images with different expressions, hair styles, eyewear, and head
orientations. The system achieves 84% correct classification with L = 10 and P = 9, while the eigenfaces
approach achieves 73% correct classification on the same dataset. On this evidence, the HMM approach
performs somewhat better than the eigenfaces method on images with variations.
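
The recognition rule can be sketched by scoring the observation sequence against each person's model and
picking the best one. The sketch below makes several assumptions that are not taken from [4]: the likelihood is
computed with the forward algorithm in the log domain, each state emits observations from a Gaussian with
diagonal covariance, and the function names and the layout of the models dictionary are made up for this
example.

```python
import numpy as np

def log_gaussian(obs, means, variances):
    """Log density of every observation under every state's diagonal
    Gaussian. obs: (T, d); means, variances: (S, d). Returns (T, S)."""
    diff = obs[:, None, :] - means[None, :, :]
    return -0.5 * np.sum(diff ** 2 / variances
                         + np.log(2 * np.pi * variances), axis=2)

def log_likelihood(obs, A, pi, means, variances):
    """Forward algorithm: log P(observation sequence | model)."""
    with np.errstate(divide="ignore"):      # log(0) = -inf marks forbidden moves
        log_A, log_pi = np.log(A), np.log(pi)
    log_b = log_gaussian(obs, means, variances)
    alpha = log_pi + log_b[0]
    for t in range(1, len(obs)):
        # Sum over previous states i of alpha[i] * A[i, j], in log space.
        alpha = np.logaddexp.reduce(alpha[:, None] + log_A, axis=0) + log_b[t]
    return np.logaddexp.reduce(alpha)

def recognize(obs, models):
    """models: dict mapping a person's name to (A, pi, means, variances).
    Returns the name whose model gives the highest likelihood."""
    return max(models, key=lambda name: log_likelihood(obs, *models[name]))
```

Working in the log domain avoids numerical underflow when the observation sequence is long.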

5. Hybrid Method of Component-based and Morphable Models
5.1 Background
This approach is motivated by promising results from two recent advances: the component-based method for
face detection and face recognition, and morphable models. In Huang’s paper [5], the authors propose the
method as one that is invariant to pose, illumination, and facial expression. The main idea of the
component-based approach is that the face image is decomposed into a set of facial components, such as the
mouth, nose, and eyes, which are interconnected by a flexible geometrical model. In addition, the authors
generate a 3-D synthetic face model of each person from three 2-D input face images. The 3-D models are
then rendered under various pose and illumination conditions to build synthetic images, which are used to
train a component-based face recognition system.
    The advantage of the pure component-based approach is that changes in head pose mainly cause changes
in the positions of the facial components, which can be accounted for by the flexibility of the geometric
model. Thus, faces with various head poses can be handled relatively easily. However, this approach requires
a large number of training images captured from different viewpoints under different lighting conditions.
Combined with the morphable models approach, the need for a large training dataset can be overcome by
generating synthetic images that differ in head pose and illumination conditions. In [5], only 3 images are
required to generate the 3-D face model of a person.

5.2 Generating the 3D Face Model
To generate a 3D synthetic face model of each person, three training images of that person are needed: a
frontal, a half-profile, and a profile view. The main idea is that once a sufficiently large database of 3D
models has been created, any arbitrary face can be generated automatically by morphing the 3D models. An
initial database of 3D models is built by recording individual faces, and then 3D correspondences between
the head models are established using an optical flow computation technique. In addition, an
analysis-by-synthesis loop is used to make sure that the rendered 3D image is as close as possible to the input
image.

5.3 Component-based Face Detection and Recognition
Given an input image, the component-based face detection system detects the face and extracts the facial
components of the image, which are later used for recognition. In Huang’s experiment [5], 14 components
are automatically extracted from the synthetic 3D models, and each of the 14 component classifiers is trained
on the set of extracted face components as well as on a set of non-face patterns. The maximum outputs of the
component classifiers become inputs to a combination classifier, a linear support vector machine (SVM), for
face detection.
    In the recognition phase, the output of the detection system is fed into a component-based face
recognizer. Synthetic faces of size 58x58 are generated for 6 subjects by rendering the 3D face models, where
the faces are rotated by specified angles and rendered with different light models (ambient and direct light
sources). Only 9 of the 14 components are used for recognition, because the components overlap to a large
extent. In addition, a histogram-equalized inner face region is added to improve the recognition rate. To
determine the identity of a person, the authors compare the normalized outputs of the SVM classifiers, that
is, the distances to the hyperplanes in the feature space; the identity associated with the face classifier with
the highest normalized output is taken to be the face recognized.

5.4 Comparison with Global Approach (holistic approach)
The dataset used in this paper consists of 200 images of 6 different individuals taken under different pose and
illumination conditions. The component-based face recognition system is compared with a global face
recognition system. Both systems are trained on the same set of images. The component-based system
achieves 90% correct recognition, which is about 50% above the global approach. The better performance can
be explained by two reasons. First, the components of a face vary less than the whole face pattern under pose
changes; therefore, pose changes have less effect on the recognition rate in the component-based approach.
Second, the component-based method uses only the face part as input features for the classifier, whereas the
background becomes a distracting factor in the global system.
    This promising method can deal with problems caused by illumination and pose variation in the face
image. However, further study of this approach is required, because the system still needs to be verified on a
large database.

6. Fisherfaces Method

6.1 Background
Belhumeur et al. [6] propose the fisherfaces method, which uses PCA and Fisher’s linear discriminant
analysis to produce a subspace projection matrix, much as the eigenspace method does. However, this method
can solve one of the main issues that arise in Pentland’s eigenfaces method [3]. One of the main drawbacks of
the eigenface approach is that the scatter being maximized can also be due to within-class scatter, which can
increase the recognition error rate if there is significant variation in pose or lighting conditions among images
of the same face. Considering that the changes due to different illumination and head pose are almost always
greater than the variation due to differences in face identity, a robust recognition system should be able to
handle the problem. Since the fisherfaces approach takes advantage of within-class information, minimizing
the variation within each class while maximizing class separation, the problem of variation among images of
the same face, such as different lighting conditions, can be overcome.
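
The idea of combining PCA with Fisher's linear discriminant can be sketched as below. This is a minimal
illustration under stated assumptions, not the implementation from [6]: the function name and parameters are
made up, PCA first reduces N images of c classes to N - c dimensions so that the within-class scatter matrix
can be inverted, and at most c - 1 discriminant directions are kept.

```python
import numpy as np

def fisherfaces(X, y, num_components=None):
    """X: (N, D) flattened images; y: (N,) integer class labels.

    Returns the mean image and a (D, k) projection matrix, k <= c - 1.
    """
    classes = np.unique(y)
    N, c = len(X), len(classes)
    mean = X.mean(axis=0)

    # Step 1: PCA down to N - c dimensions.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    W_pca = Vt[:N - c].T                       # (D, N - c)
    Z = (X - mean) @ W_pca                     # centred data in PCA space

    # Step 2: within-class (S_w) and between-class (S_b) scatter.
    d = Z.shape[1]
    S_w, S_b = np.zeros((d, d)), np.zeros((d, d))
    for k in classes:
        Zk = Z[y == k]
        mk = Zk.mean(axis=0)
        S_w += (Zk - mk).T @ (Zk - mk)
        S_b += len(Zk) * np.outer(mk, mk)      # overall mean of Z is zero

    # Step 3: directions that maximise between-class scatter relative to
    # within-class scatter, i.e. leading eigenvectors of inv(S_w) @ S_b.
    eigvals, eigvecs = np.linalg.eig(np.linalg.solve(S_w, S_b))
    order = np.argsort(eigvals.real)[::-1][:num_components or c - 1]
    W_lda = eigvecs[:, order].real
    return mean, W_pca @ W_lda
```

A new image x would then be projected as (x - mean) @ W and compared with the projected training images.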

6.2 Comparison with Eigenfaces Method
In [6], three other methods (correlation, eigenfaces, and linear subspace methods) are compared with the
fisherface method on various image datasets; here, only the comparison with the eigenfaces method on 3
subsets that vary in lighting is discussed.
    The images in the 5 subsets are captured with different locations of the light source. Example images
from subsets 1, 2, and 3 are given in Fig. 6.
                           Fig. 6. Example images with different lighting from 3 data subsets [6].

The results of the fisherface method are promising. The error rate of the eigenfaces method is 0%, 31%, and
47% for subsets 1, 2, and 3 respectively, while the error rate of the fisherfaces method is 0%, 0%, and 4.6%
respectively. Not only the error rate but also the computation time is lower for the fisherfaces method. This
result implies that using within-class information is beneficial in that it can handle variation among images
of the same face. However, the fisherfaces method may still share some of the problems of the eigenfaces
method, because the experiment was done with images containing a simple background, so the effects of
background were not taken into account. Also, no claim is made about the relative performance of the
fisherface and eigenface methods on larger databases, since only a database of 176 images is used in this
experiment.

7. Conclusion

In this survey paper, five representative works on face recognition are presented and discussed. Since
Pentland’s paper was introduced, many other approaches have been developed to deal with two important
issues in practical face recognition systems: the illumination problem and the pose problem. The
performance of 4 methods (NFL, HMM, component-based with synthetic 3D models, and fisherfaces) is
compared with that of the eigenface method in terms of handling these two main problems.
    In terms of performance, current face recognition technologies are still far from human perception and
have to deal with some open issues, such as changes in the face due to aging. However, recent intensive
research activity in this area suggests that face recognition technologies will advance to the point where they
are available to make our lives easier and more convenient.


1. Zhao, W., Chellappa, R., Phillips, P. J., Rosenfeld, A., 2003, Face recognition: A literature survey, ACM
   Computing Surveys (CSUR), V. 35, Issue 4, pp. 399-458.

2. Li, S. Z., Lu, J., 1999, Face recognition using the nearest feature line method, IEEE Trans. Neural Netw.,
   V. 10, pp. 439-443.

3. Turk, M. A., Pentland, A. P., 1991, Eigenfaces for recognition, J. Cognitive Neurosci., V. 3, no. 1, pp. 71-86.

4. Nefian, A. V. et al., 1998, Hidden Markov models for face recognition, Proceedings of the International
   Conference on Acoustics, Speech and Signal Processing, pp. 2721-2724.

5. Huang, J., Heisele, B., Blanz, V., 2003, Component-based face recognition with 3D morphable models,
   Proceedings of International Conference on Audio and Video-based Person Authentication, V. 5, pp. 27-34.

6. Belhumeur, P., Hespanha, J., Kriegman, D., 1997, Eigenfaces vs. fisherfaces: recognition using class
   specific linear projection, IEEE Trans. on PAMI, V. 19, pp. 711-720.
