Survey paper: Face Detection and Face Recognition

By Hyun Hoi James Kim

1. Introduction

Face recognition is a biometric method that identifies individuals by the features of their faces. Research in this area has been conducted for more than 30 years; as a result, face recognition technology is now well advanced. Many commercial applications of face recognition are available, such as criminal identification, security systems, and image and film processing. Given a sequence of images captured by a camera, the goal is to find the best match for a given image. Using a pre-stored image database, the face recognition system should be able to identify or verify one or more persons in the scene. Before face recognition is performed, the system should determine whether or not there is a face in a given image or video (a sequence of images). This process is called face detection. Once a face is detected, the face region should be isolated from the scene for face recognition. Face detection and face extraction are often performed simultaneously. The overall process is depicted in Fig. 1.

[Fig. 1. Process of face recognition: input image, face detection, feature extraction, face recognition, identification or verification.]

Face recognition can be performed on both still images and video; video-based recognition has its origin in still-image face recognition. Approaches to face recognition for still images can be categorized into three main groups: the holistic approach, the feature-based approach, and the hybrid approach.

Holistic Approach
In the holistic approach, the whole face region is taken as input data to the face recognition system. Examples of holistic methods are eigenfaces (the most widely used method for face recognition), probabilistic eigenfaces, fisherfaces, support vector machines, nearest feature lines (NFL), and independent-component analysis approaches.
They are all based on principal component analysis (PCA) techniques, which can be used to reduce a dataset to a lower dimension while retaining its characteristics.

Feature-based Approach
In feature-based approaches, local features of the face such as the nose and eyes are segmented and then used as input data for a structural classifier. Pure geometry, dynamic link architecture, and hidden Markov model methods belong to this category.

Hybrid Approach
The idea of this approach comes from the way the human vision system perceives both local features and the whole face. Modular eigenfaces, hybrid local feature, shape-normalized, and component-based methods belong to the hybrid approach.

In this paper, one or two representative papers for each approach are discussed in terms of background, implementation, evaluation, and known issues. Three methods from the holistic approach are discussed: (a) the Eigenfaces Method [3] in section 2; (b) the Nearest Feature Line Method [2] in section 3; (c) the Fisherfaces Method [6] in section 6. Two other methods are also covered: one from the feature-based approach, (d) the Hidden Markov Model Method [4] in section 4, and one from the hybrid approach, (e) the Component-based and Morphable Models Method [5] in section 5. Face detection is discussed as a subset of face recognition.

2. Eigenfaces Method

2.1 Background

Eigenfaces are a set of standardized face components derived from statistical analysis of many face images. Mathematically speaking, eigenfaces are a set of eigenvectors derived from the covariance matrix of the high-dimensional vectors that represent possible human faces. Any human face can be represented by a linear combination of eigenface images. For example, one person's face can be represented by some portion of one eigenface, some other portion of another eigenface, and so on.
In Turk and Pentland's paper [3], motivated by principal component analysis (PCA), the authors propose this method, in which the principal components of a face are extracted, encoded, and compared with a database. PCA techniques, also known as Karhunen-Loeve methods, choose a dimensionality-reducing linear projection that maximizes the scatter of all projected samples. The paper, though it was written more than 10 years ago, is regarded as a standard work in face recognition research; therefore, in a variety of other papers, authors compare the results of their own approaches with those of Pentland's paper.

2.2 Calculating Eigenfaces

The eigenfaces approach uses principal component analysis (PCA) to find the vectors that best represent the distribution of face images in image space. These vectors define the subspace of face images (face space). An image is treated as a point (or vector) in a high-dimensional vector space; each vector is of length N^2, describes an N by N image, and is a linear combination of the original face images. For example, a typical image of size 256 by 256 describes a vector of dimension 65,536, or a point in 65,536-dimensional space. An ensemble of images is mapped to a collection of points in this huge space. The average of the training set of face images is calculated first, and then the difference between each face image and the average is computed. From these difference vectors, one can build the covariance matrix, from which the eigenvectors are taken. The eigenvalues associated with the eigenvectors make it easy to rank the eigenvectors according to their usefulness in characterizing the differences among the images.

2.3 Using Eigenfaces to Classify a Face Image and Detect Faces

A new face image is projected onto face space simply by subtracting the average face mentioned in section 2.2 from the image and multiplying the difference by each eigenvector.
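The steps of sections 2.2 and 2.3 can be sketched in a few lines of Python. This is a minimal illustration rather than the implementation from Turk and Pentland's paper: it assumes NumPy, builds the full covariance matrix directly (practical only for very small images), and the function names compute_eigenfaces and project are hypothetical.

```python
import numpy as np

def compute_eigenfaces(images, num_components):
    """images: array of shape (M, D), one flattened face image per row."""
    images = np.asarray(images, dtype=float)
    mean_face = images.mean(axis=0)               # average of the training set
    diffs = images - mean_face                    # difference of each face from the average
    covariance = diffs.T @ diffs / len(images)    # D x D covariance matrix
    eigvals, eigvecs = np.linalg.eigh(covariance) # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:num_components]  # rank by eigenvalue, largest first
    eigenfaces = eigvecs[:, order].T              # one eigenface per row, shape (k, D)
    return mean_face, eigenfaces, eigvals[order]

def project(image, mean_face, eigenfaces):
    """Weights of each eigenface for one flattened image."""
    return eigenfaces @ (np.asarray(image, dtype=float) - mean_face)

# Toy usage: six random "images" of 4 x 4 pixels (assumed data, not real faces).
rng = np.random.default_rng(0)
faces = rng.random((6, 16))
mean_face, eigenfaces, eigvals = compute_eigenfaces(faces, 5)
weights = project(faces[0], mean_face, eigenfaces)
```

With M training images the difference vectors span at most M - 1 dimensions, so keeping M - 1 eigenfaces is enough to reconstruct any training image as the average face plus a weighted sum of eigenfaces.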
The result of this operation is the weighted contribution of each eigenface in representing the input face image, treating the eigenfaces as a basis set for face images. The Euclidean distance from each face class determines the class that best matches the input image.

Through eigenfaces, the system can detect the presence of a face as well. A face image projected onto face space does not change radically, while a non-face image will look quite different; therefore, it is easy to distinguish between face images and non-face images. Using this basic idea, an image is projected onto face space and the Euclidean distance is calculated between the mean-adjusted input image and its projection onto face space. The distance is used as a measure of "faceness", so the result of calculating the distance at every image location is a "face map", where low values indicate the presence of a face.

2.4 Evaluation and Issues

For the experiment, sixteen subjects were digitized at three head orientations, three head sizes or scales, and three lighting conditions. A six-level Gaussian pyramid was constructed for each image, reducing the resolution from 512x512 pixels to 16x16 pixels. Various groups of sixteen images were selected and 2500 images were classified. The system achieved 96% correct classification over lighting variation, 85% over head orientation, and 64% over head size.

Since the eigenfaces method applies PCA directly, it does not discard image information by processing only certain points, and it generally provides more accurate recognition results. However, these techniques are sensitive to variation in position and scale. Some serious issues are the effects of background, head size, and orientation. Eigenface analysis, as described above, does not distinguish the face from the background. In many cases, a significant part of the image consists of background, which leads to incorrect classification because the background carries no information needed for detection.
Another issue is the varying head size of input images, because the correlation between neighboring pixels is lost when head size changes. Performance over size change decreased to 64% correct, which suggests a need for a multi-scale approach in which each face class includes images of the individual at several different sizes. Note that lighting variation can also still be a problem if the light source is positioned in certain directions. This problem is addressed in Belhumeur's paper [6] (see section 6).

3. Nearest Feature Line Method

3.1 Background

In Li's paper [2], the authors present the nearest feature line (NFL) method for face recognition, a holistic matching method designed to deal with problems in the eigenfaces approach. To construct a feature line in feature space, it is assumed that at least two prototype feature points are available for each class (person). A line passing through the two prototype feature points forms a feature line (FL) that generalizes the two feature points, as in Fig. 2. A feature line represents an approximation of the variation between two prototypes (images) captured under different conditions, such as different head gaze directions or different illumination. An input image is then identified with a class according to the distance between the feature point of the given image and the FLs of the prototype images. Compared to the standard eigenface method [3], NFL has lower recognition error rates.

3.2 The Nearest Feature Line Method

A facial image is represented as a point in the feature space, which is an eigenspace. The line through two points x1 and x2 of the same class, denoted x1x2, is called a feature line of that class. The query, which is the input image x, is projected onto an FL at the point p, and the FL distance between x and x1x2 is defined as d(x, x1x2) = |x - p|, where | . | is some norm. The projection point can be found by setting up a vector line equation with parameter µ.
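The feature line distance and the nearest-line decision can be sketched as follows. This is a minimal illustration under stated assumptions: the norm is taken to be Euclidean, the line is parametrized as p = x1 + µ(x2 - x1) so that µ follows from an orthogonal projection, and the names feature_line_distance and nfl_classify are hypothetical.

```python
import math
from itertools import combinations

def feature_line_distance(x, x1, x2):
    """Return (distance, mu) from query x to the line through x1 and x2."""
    direction = [b - a for a, b in zip(x1, x2)]       # x2 - x1
    offset = [q - a for q, a in zip(x, x1)]           # x - x1
    denom = sum(d * d for d in direction)
    if denom == 0:
        raise ValueError("x1 and x2 must be distinct prototype points")
    mu = sum(o * d for o, d in zip(offset, direction)) / denom
    p = [a + mu * d for a, d in zip(x1, direction)]   # projection point on the line
    distance = math.sqrt(sum((q - c) ** 2 for q, c in zip(x, p)))
    return distance, mu

def nfl_classify(x, prototypes):
    """prototypes: dict mapping class label -> list of feature points.
    Returns (distance, label, i, j) for the nearest feature line."""
    best = None
    for label, points in prototypes.items():
        for (i, xi), (j, xj) in combinations(enumerate(points), 2):
            distance, _ = feature_line_distance(x, xi, xj)
            if best is None or distance < best[0]:
                best = (distance, label, i, j)
    return best

# Toy usage with two classes in a 2-D feature space (assumed data).
prototypes = {"a": [(0.0, 0.0), (2.0, 0.0)], "b": [(0.0, 5.0), (2.0, 5.0)]}
result = nfl_classify((1.0, 1.0), prototypes)  # (1.0, 'a', 0, 1)
```

Because the distance is measured to the whole line rather than to the two prototypes, a query that lies beyond x2 on the same line still has distance zero, which is how a feature line extrapolates beyond the stored images.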
With the line written as p = x1 + µ(x2 - x1), the position of p depends on the value of µ: p lies to the left of x1 when µ < 0, between x1 and x2 when 0 ≤ µ ≤ 1, and to the right of x2 when µ > 1. The further µ lies outside the interval [0, 1], the further p is from x1 or x2.

[Fig. 2. Generalizing two prototype feature points x1 and x2.]

The classification of the input image is done as follows. Let x_i^c and x_j^c be two distinct prototype points of class c in feature space. The distance between the image point x and each FL x_i^c x_j^c is calculated for each class c and each pair i ≠ j. This generates a number N of distances, which are sorted in ascending order, each associated with a class and a pair of prototypes. The first rank gives the NFL classification, consisting of the best matched class c* and the two best matched prototypes i* and j* of that class.

3.3 Comparison with Standard Eigenspace Method

In Li's paper [2], a compound dataset of 1079 face images from 137 people is used. The dataset is drawn from four well-known databases (the Cambridge database, the Bern database, the Yale database, and the Harvard database) and another database that the authors created. These databases were constructed under different lighting conditions, facial expressions, light source directions, and viewpoints. Two test schemes with different dataset sizes are used to compare the error rates of the generic eigenface approach and the NFL approach. In both schemes, the images of the training set are not used in the query set, to avoid bias. The error rate of the NFL method turns out to be about 40% to 60% of that of the standard eigenface method, which means the NFL method performs better in recognition under variation in viewpoint, illumination, and expression. The results imply that by adopting the NFL method, the issues in the conventional eigenfaces method can be addressed, thanks to the feature lines' ability to expand the representational capacity of the feature points, which represent images of the same class under different variations.

4. Hidden Markov Models Method

4.1 Background

The hidden Markov model (HMM) is another promising method that works well for images with variation in lighting, facial expression, and orientation. HMMs are a set of statistical models used to characterize the properties of signals. They perform very well in speech recognition and character recognition, where the data is one-dimensional. The system being modeled is assumed to be a Markov process with unknown parameters, and the goal is to determine the hidden parameters from the observable parameters. In an HMM each state is hidden and has a probability distribution over the possible outputs, whereas in a regular Markov model each state is directly observable.

In Nefian's paper [4], the authors use the HMM approach for face recognition based on the extraction of 2-dimensional discrete cosine transform (DCT) feature vectors. The authors take advantage of the compression property of the DCT for feature extraction. An image is divided into blocks, each a subimage associated with an observation vector. More details about the HMM method are provided in the following sections.

4.2 Face Image Markov Model

An HMM consists of an unobservable Markov chain with a finite number of states, a state transition probability matrix A, an observation symbol probability matrix B, an initial state distribution π, and a set of probability density functions (PDFs). An HMM is defined as the triplet λ = (A, B, π). For frontal human face images, the important facial components appear in top-to-bottom order: hair, forehead, eyes, nose, mouth, and chin. This still holds when the image is rotated slightly in the image plane. Each of the facial regions is assigned to one state in a 1-D continuous HMM. The transition probabilities a_ij and the structure of the face model are illustrated in Fig. 3.

[Fig. 3. HMM for face recognition.]

[Fig. 4. Block extraction from image.]

4.3 Feature Extraction and Training

Each face image of width W and height H is divided into overlapping blocks of height L and the same width W.
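The top-to-bottom block division can be sketched as below. This is a minimal illustration, not the code of [4]: the image is assumed to be a list of H pixel rows, consecutive blocks are assumed to share P rows (the overlap), and the name extract_blocks is hypothetical.

```python
def extract_blocks(image, block_height, overlap):
    """Split an image (list of H rows, each a list of W pixels) into
    overlapping horizontal strips of `block_height` rows.
    Consecutive strips share `overlap` rows."""
    if not 0 <= overlap < block_height:
        raise ValueError("overlap must satisfy 0 <= overlap < block_height")
    step = block_height - overlap          # vertical shift between blocks, L - P
    height = len(image)
    return [image[top:top + block_height]
            for top in range(0, height - block_height + 1, step)]

# Toy usage: an 8-row, 3-column image with L = 4 and P = 2.
image = [[row * 10 + col for col in range(3)] for row in range(8)]
blocks = extract_blocks(image, block_height=4, overlap=2)
print(len(blocks))  # 3
```

In this sketch an image of height H yields (H - L) // (L - P) + 1 blocks, so a large overlap P produces a long observation sequence, one observation vector per block.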
The block extraction is shown in Fig. 4. The amount of overlap P has a significant effect on the recognition rate, since with a large overlap features are captured independently of their vertical position. The magnitude of L is also important. A small L assigns too little discriminating information to each observation vector. On the other hand, a large L increases the chance of cutting across features. Therefore, it is important to find a good value for L. Once blocks are extracted from the image, a set of DCT coefficients is calculated for each block. When a block is transformed with the DCT, the most significant coefficients, those at low frequencies, are concentrated in a small area of the DCT domain. The authors of [4] use a 12x3 window to pick out this significant signal energy. In this way, the size of the observation vector is reduced significantly, which makes the system very efficient while still retaining a good recognition rate.

In the training phase, the image is segmented from top to bottom so that each segment corresponds to a state, and the initial observation probability matrix B is obtained from the observation vectors associated with each state. Once B is obtained, the initial values of A and π are set according to the left-to-right structure of the face model. Next, the model parameters are re-estimated by the expectation-maximization (EM) procedure until convergence.

4.4 Recognition and Evaluation

A face image is recognized as the person whose Markov model gives the maximum probability of the observation sequence. For the experiment in the paper, 400 images of 40 individuals, with 10 face images per individual, are used. The image database contains face images with different expressions, hair styles, eyewear, and head orientations. The system achieves 84% correct classification with L = 10 and P = 9, while the eigenfaces approach achieves 73% correct classification on the same dataset.
Given these results, the HMM approach performs somewhat better than the eigenfaces method on images with variations.

5. Hybrid Method of Component-based and Morphable Models

5.1 Background

This approach is motivated by promising results from two recent advances: component-based methods for face detection and face recognition, and morphable models. In Huang's paper [5], the authors propose the method as one that is invariant to pose, illumination, and facial expression. The main idea of the component-based approach is that the face image is decomposed into a set of facial components such as the mouth, nose, and eyes, which are interconnected by a flexible geometrical model. In addition, the authors generate a 3D synthetic face model of each person from three 2D input face images. The 3D models are then rendered under various pose and illumination conditions to build synthetic images, which are used to train a component-based face recognition system.

The advantage of the pure component-based approach is that changes in head pose mainly cause changes in the positions of the facial components, which can be accounted for by the flexibility of the geometric model. Thus, images with various head poses can be handled relatively easily. However, this approach requires a large number of training images captured from different viewpoints under different lighting conditions. When it is combined with the morphable models approach, the need for a large training dataset can be overcome by generating synthetic images that differ in head pose and illumination conditions. In [5], only 3 images are required to generate the 3D face model of a person.

5.2 Generation of 3D Face Model

To generate the 3D synthetic face model of each person, three training images of each person in the training database are needed: frontal, half-profile, and profile views.
The main idea of having a 3D model of each face is that once a sufficiently large database of 3D synthetic models is created, any arbitrary face can be generated automatically by a morphing technique applied to the 3D models. An initial database of 3D models is built by recording individual faces, and 3D correspondences between head models are then established using an optic flow computation technique. A synthesis loop technique is also used to make sure that the rendered 3D image is as close as possible to the input image.

5.3 Component-based Face Detection and Recognition

Given an input image, the component-based face detection system detects the face and extracts the facial components of the image, which are later used for recognition. In Huang's experiment [5], 14 components are automatically extracted from the synthetic 3D models, and each of the 14 component classifiers is trained on the set of extracted face components as well as on a set of non-face patterns. The maximum outputs of the component classifiers become the inputs to a combination classifier, a linear support vector machine (SVM), for face detection. After the detection phase, the output of the detection system is fed into a component-based face recognizer. Synthetic faces of size 58x58 are generated for 6 subjects by rendering the 3D face models, with the faces rotated by specified angles and rendered with different light models (ambient and direct light sources). Only 9 of the 14 components are used for detection, because the components overlap to a large extent. In addition, a histogram-equalized inner face region was added to improve the recognition rate. To determine the identity of each person, the authors compare the normalized outputs of the SVM classifiers, that is, the distances to the hyperplanes in the feature space, and the identity associated with the face classifier with the highest normalized output is taken to be the recognized face.
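The final identity decision can be sketched as follows. This is an illustrative sketch, not the system of [5]: it assumes one already-trained linear SVM per person, each given as a hypothetical (weights, bias) pair, and takes the normalized output to be the signed distance of the feature vector to that classifier's hyperplane.

```python
import math

def normalized_output(weights, bias, features):
    """Signed distance of `features` to the hyperplane w.x + b = 0."""
    norm = math.sqrt(sum(w * w for w in weights))
    return (sum(w * f for w, f in zip(weights, features)) + bias) / norm

def identify(classifiers, features):
    """classifiers: dict mapping person -> (weights, bias).
    Returns the person with the highest normalized output and all scores."""
    scores = {person: normalized_output(w, b, features)
              for person, (w, b) in classifiers.items()}
    return max(scores, key=scores.get), scores

# Toy usage with two made-up classifiers in a 2-D feature space.
classifiers = {"alice": ([3.0, 4.0], -5.0), "bob": ([0.0, 1.0], 0.0)}
person, scores = identify(classifiers, [1.0, 2.0])  # person == "bob"
```

Dividing by the norm of the weight vector puts the outputs of different classifiers on a common scale, which is what makes comparing them across persons meaningful.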
5.4 Comparison with Global Approach (Holistic Approach)

The dataset used in Huang's paper [5] consists of 200 images of 6 different individuals taken under different pose and illumination conditions. The component-based face recognition system is compared with a global face recognition system. Both systems are trained on the same set of images. The component-based system achieves a recognition rate of 90%, which is about 50% above the global approach. The better performance can be explained by two reasons. First, the components of a face vary less than the whole face pattern under pose changes; therefore, pose changes have less effect on the recognition rate in the component-based approach. Second, the component-based method uses only the face components as input features for the classifier, so the background is left out, whereas the background becomes a distracting factor in the global system. This promising method can deal with problems caused by illumination and pose variation in the face image. However, further study of this approach is required, because the system still needs to be verified on a large database.

6. Fisherfaces Method

6.1 Background

Belhumeur et al. [6] propose the fisherfaces method, which uses PCA and Fisher's linear discriminant analysis to produce a subspace projection matrix very similar to that of the eigenspace method. However, this method can solve one of the main issues that arise in Pentland's eigenfaces method [3]. One of the main drawbacks of the eigenface approach is that the scatter being maximized can also be due to within-class scatter, which can increase the recognition error rate if there is significant variation in pose or lighting among images of the same face. Considering that changes due to different illumination and head pose are almost always greater than the variation due to differences in face identity, a robust recognition system should be able to handle this problem.
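The difference between total scatter and within-class scatter can be made concrete with a small two-class sketch. It is a simplified illustration, not the fisherfaces procedure of [6]: it assumes NumPy, uses toy 2-D data in which the first axis stands in for lighting variation and the second for identity, and the function names are hypothetical.

```python
import numpy as np

def scatter_matrices(samples, labels):
    """Return (within, between, total) scatter matrices."""
    samples = np.asarray(samples, dtype=float)
    labels = np.asarray(labels)
    overall_mean = samples.mean(axis=0)
    dim = samples.shape[1]
    within = np.zeros((dim, dim))
    between = np.zeros((dim, dim))
    for label in np.unique(labels):
        group = samples[labels == label]
        class_mean = group.mean(axis=0)
        centered = group - class_mean
        within += centered.T @ centered                  # scatter around the class mean
        shift = (class_mean - overall_mean)[:, None]
        between += len(group) * (shift @ shift.T)        # scatter of the class means
    total_centered = samples - overall_mean
    total = total_centered.T @ total_centered
    return within, between, total

def principal_direction(total):
    """Direction of maximum total scatter (what PCA picks)."""
    eigvals, eigvecs = np.linalg.eigh(total)
    return eigvecs[:, -1]

def fisher_direction(samples, labels, within):
    """Two-class Fisher direction: within^-1 (mean_b - mean_a), normalized."""
    samples = np.asarray(samples, dtype=float)
    labels = np.asarray(labels)
    classes = np.unique(labels)
    mean_a = samples[labels == classes[0]].mean(axis=0)
    mean_b = samples[labels == classes[1]].mean(axis=0)
    w = np.linalg.solve(within, mean_b - mean_a)
    return w / np.linalg.norm(w)

# Toy data: large spread along axis 0 inside each class, small class offset along axis 1.
samples = [[-4, 0.1], [4, -0.1], [-2, -0.1], [2, 0.1],
           [-4, 1.1], [4, 0.9], [-2, 0.9], [2, 1.1]]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
within, between, total = scatter_matrices(samples, labels)
pca_dir = principal_direction(total)
fisher_dir = fisher_direction(samples, labels, within)
```

On this data the total scatter equals the sum of the within-class and between-class scatter, the principal direction follows the within-class (lighting-like) axis, and the Fisher direction follows the axis that separates the two classes.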
Since the fisherfaces approach takes advantage of within-class information, minimizing variation within each class while maximizing class separation, the problem of variation among images of the same face, such as different lighting conditions, can be overcome.

6.2 Comparison with Eigenfaces Method

In [6], three other methods (correlation, eigenfaces, and linear subspace methods) are originally compared with the fisherface method on various image datasets; here, only the comparison with the eigenfaces method, tested on 3 subsets varying in lighting, is discussed. The images in the 5 subsets were captured with different light source locations. Example images from subsets 1, 2, and 3 are given in Fig. 6.

[Fig. 6. Example images with different lighting from 3 data subsets [6].]

The results of the fisherface method are promising. The error rate of the eigenfaces method is 0%, 31%, and 47% for subsets 1, 2, and 3 respectively, while the error rate of the fisherfaces method is 0%, 0%, and 4.6% respectively. Not only the error rate but also the computation time is lower for the fisherfaces method. This result implies that using within-class information is beneficial in that it can handle variations among images of the same face. However, the fisherfaces method may still share some of the problems of the eigenfaces method, because the experiment was done with images containing a simple background and so did not take background effects into account. Also, no claim is made about the relative performance of the fisherface and eigenface methods on larger databases, since only a database of 176 images is used in this experiment.

7. Conclusion

In this survey paper, five representative works on face recognition are presented and discussed. Since Pentland's paper was introduced, many other approaches have been developed to deal with two important issues in practical face recognition systems: the illumination problem and the pose problem.
The performance of 4 methods (NFL, HMM, component-based with synthetic 3D models, and fisherfaces) is compared with that of the eigenface method in terms of handling these two main problems. In terms of performance, current face recognition technologies are still far from human perception and have to deal with some open issues, such as changes in the face due to aging. However, recent intensive research activity in this area suggests that face recognition technology will continue to advance and become available to make our lives easier and more convenient.

REFERENCES

1. Zhao, W., Chellappa, R., Phillips, P. J., Rosenfeld, A., 2003, Face recognition: A literature survey, ACM Computing Surveys (CSUR), V. 35, Issue 4, pp. 399-458.
2. Li, S. Z., Lu, J., 1999, Face recognition using the nearest feature line method, IEEE Trans. Neural Netw., V. 10, pp. 439-443.
3. Turk, M. A., Pentland, A. P., 1991, Eigenfaces for recognition, J. Cognitive Neurosci., V. 3, no. 1, pp. 71-86.
4. Nefian, A. V. et al., 1998, Hidden Markov models for face recognition, In Proceedings, International Conference on Acoustics, Speech and Signal Processing, pp. 2721-2724.
5. Huang, J., Heisele, B., Blanz, V., 2003, Component-based face recognition with 3D morphable models, Proceedings of International Conference on Audio and Video-based Person Authentication, V. 5, pp. 27-34.
6. Belhumeur, P., Hespanha, J., Kriegman, D., 1997, Eigenfaces vs. fisherfaces: recognition using class specific linear projection, IEEE Trans. on PAMI, V. 19, pp. 711-720.