Human Face Detection and Recognition by Improved Gray Matching Based on Geometric Invariant Features
Rong-Qin Luo (駱榮欽) and Wei-Zhi Luo (駱威志)
Department of Computer Science and Information Engineering, National Taipei University of Technology, Taiwan, R.O.C.
Tel: (02)27712171 ext. 2239, Fax: (02)27317120, e-mail: firstname.lastname@example.org

Abstract
To increase the convenience of "access control", the distinctness of features between human faces is useful. In this paper, we propose an approach that finds invariant geometric face features across different facial expressions. We utilize the homogeneous property and the color information of the skin to segment the human face from a complex environment, and we have developed a face recognition system. The system consists of two stages: a learning stage and a recognition stage. In the learning stage, we extract the geometric features of the brows and mouth of each detected person into a database. In the recognition stage, we recognize a person against the database using improved gray matching. Several experimental results showing the feasibility of the proposed approach are also included.

Keywords: Facial expression, Geometric features, Gray matching.

1. Introduction

In many present security systems, people often use some key, such as cards or passwords, for personal identification. It would be more convenient if automatic face recognition could be used successfully in these systems. Up to now, face recognition systems have been used successfully in several applications such as criminal identification, video surveillance, intelligent toys, and so on. Due to its uniqueness, the face is an important and effective characteristic for recognizing a person. However, face recognition is difficult, because most human faces contain very similar feature sets: for example, each person has two eyes, one nose, one mouth, and a pair of ears, roughly arranged in the same way. Therefore, distinguishing one face from another is a challenging task in computer vision and pattern recognition.

In this study, the face recognition process contains two steps: face location and face identification. Several related studies are discussed in the following.

(1) Face location. For a fully automatic face recognition system, locating the face position in a given image is the first important step. Haiyuan Wu, Qian Chen, and Masahiko Yachida describe a method to detect faces in color images based on fuzzy theory; they use two fuzzy models to describe skin color and hair color, respectively. Hui Peng, Changshui Zhang, and Zhaoqi Bian proposed a novel hybrid neural method for human eye detection. M. F. Augusteijn and T. L. Skufca identified faces through texture-based features, using a second-order statistics method to represent the texture of hair and skin.

(2) Face recognition. Matthew A. Turk and Alex P. Pentland present eigenfaces for the detection and identification of human faces. Y. Zhu, L. C. De Silva, and C. C. Ko use moments invariant under shifting, scaling, and rotation. Besides these, many face recognition approaches use only the frontal-view image, with techniques such as feature point extraction, template matching, K-L expansion, algebraic features, deformable template models, and isodensity lines.
2. Face Detection

To reduce the influence of brightness on face detection, we use the HSI (Hue, Saturation, Intensity) color space in the system. We segment the image into blocks of size 10*10. If 70% of the pixels in a block satisfy the following two conditions, the block value is set to 1 (true) to denote that the block is part of the face.

Condition 1: Homogeneous Property

The hue value of human skin is usually homogeneous, so we first extract the face region from the captured image by this property. In Fig. 1, pixel p1 in the image, together with its eight neighbors, will be set true if the following conditions are satisfied:

  p9 p2 p3
  p8 p1 p4
  p7 p6 p5

Figure 1. Labeled neighboring pixels of pixel p1

1. min_H <= H(p1) <= max_H

where H(p1) is the hue value of p1. According to the experimental statistics described in detail below, we define the average hue of the eight neighbors as

  avg_H = (1/8) * sum_{i=2..9} w_i * H(p_i),  w_i = 1, i = 2..9

  min_H = avg_H - 2*pi*0.05        (1)
  max_H = avg_H + 2*pi*0.05        (2)

2. min_S <= S(p1) <= max_S

where S(p1) is the saturation value of p1. Similarly, we define:

  avg_S = (1/8) * sum_{i=2..9} w_i * S(p_i),  w_i = 1, i = 2..9

  min_S = avg_S - 0.15             (3)
  max_S = avg_S + 0.15             (4)

Condition 2: Color Information of Human Skin

The distributions of skin color information, hue and saturation (HS), usually differ considerably from those of other objects in the environment. So if the HS of a pixel p is in the color range of skin, p will be set as face. The color range is learned from the following experiments. Before the experiments, we first define hue_i and saturation_i of each block. The hue range from 0 to 2*pi is segmented into 100 parts, and the saturation range from 0 to 1 is also segmented into 100 parts:

  hue_i = (1/3) * sum_{i=1..3} (2*pi/100) * x_i        (5)

  saturation_i = (1/3) * sum_{i=1..3} (1/100) * x_i    (6)

where x_1 ~ x_3 are the hue or saturation bins having the largest, second-largest, and third-largest accumulated pixel counts in the block, respectively.

In the experiments, we use one video camera and five fluorescent lamps; the power of each fluorescent lamp is 18 watts. Illumination from one to five lamps can be accumulated, so we can estimate hue_i and saturation_i under five different illuminations. The results of the experiment consist of two parts: the same person under different illuminations, and different people under the same illumination. In the first part, the mean values of hue and saturation are 4.017 and 0.19, respectively, and the variances of hue and saturation are both smaller than 0.001. In the second part, the mean values of hue and saturation for different people are 4.464 and 0.222, respectively, and the variances of hue and saturation are again both smaller than 0.001. Therefore, from these two statistics, we obtain the means of hue and saturation of skin color as 4.4641 and 0.21357, respectively. Then the color ranges of skin are set according to Eqs. (1) through (4).
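The block-wise skin test described in this section can be sketched in code. This is a minimal illustration, not the authors' implementation: it assumes the image is already converted to HSI, and the threshold constants are illustrative values centred on the measured skin means above (hue ~4.4641 rad, saturation ~0.21357); the function names `is_skin` and `skin_blocks` are hypothetical.

```python
import math

# Illustrative skin ranges centred on the measured means (hue ~4.4641 rad,
# saturation ~0.21357); the +/- widths mirror the spirit of Eqs. (1)-(4).
HUE_LO, HUE_HI = 4.4641 - 2 * math.pi * 0.05, 4.4641 + 2 * math.pi * 0.05
SAT_LO, SAT_HI = 0.21357 - 0.15, 0.21357 + 0.15

def is_skin(h, s):
    """Condition 2: pixel with hue h (radians, [0, 2*pi]) and saturation s
    ([0, 1]) falls inside the learned skin color range."""
    return HUE_LO <= h <= HUE_HI and SAT_LO <= s <= SAT_HI

def skin_blocks(hue, sat, block=10, ratio=0.7):
    """Mark each block x block tile as face (1) when at least `ratio` of its
    pixels are skin-colored.  hue and sat are equally sized 2D lists."""
    rows, cols = len(hue), len(hue[0])
    out = []
    for bi in range(rows // block):
        row = []
        for bj in range(cols // block):
            hits = sum(
                is_skin(hue[bi * block + y][bj * block + x],
                        sat[bi * block + y][bj * block + x])
                for y in range(block) for x in range(block)
            )
            row.append(1 if hits >= ratio * block * block else 0)
        out.append(row)
    return out
```

A 10x20 image whose left half is skin-colored and right half is not would yield the block map [[1, 0]].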
(a) (b)
Figure 2. Face detecting result

3. Mouth and Brows Location

After face detection, the detected binary image, as in the right image of Fig. 2(b), is recorded into a 2D array B. The white pixels of the image are the skin region, and each white pixel of B is set to 1; the black pixels are the non-skin region and are set to 0. Because the shape of a human face is approximately an ellipse, we can find an ellipse to match it, called the best-fit ellipse. We can then use its parameters, the major axis b, the minor axis a, and the rotation angle theta, to locate the mouth and brows. Before computing the ellipse parameters, we perform a preprocessing step that turns black pixels lying inside the skin region, but having more than four white neighboring pixels, into white. To obtain the parameters of the best-fit ellipse, we define the (k+l)-order moment as Eq. (7):

  M_{k,l} = sum_{m=0}^{W-1} sum_{n=0}^{H-1} m^k * n^l * B(m,n)        (7)

where W and H are the width and height of the 2D image array B (indexed by B(m,n)) of the face region, respectively. Then we get the center point (x_, y_) of the ellipse, taken as the face center location, according to Eqs. (8) and (9), as shown in Fig. 3:

  x_ = M_{1,0} / N        (8)
  y_ = M_{0,1} / N        (9)

Here N is the number of skin pixels of B. The orientation theta can be computed by Eq. (11) using the central moments u_{i,j} of the face region:

  u_{i,j} = (1/N) * sum_{m=0}^{W-1} sum_{n=0}^{H-1} (m - x_)^i * (n - y_)^j * B(m,n)        (10)

  theta = (1/2) * arctan[ 2*u_{1,1} / (u_{2,0} - u_{0,2}) ]        (11)

Using the computed theta, the lengths of the major and minor axes of the best-fit ellipse can be determined by evaluating the moment of inertia I, where the greatest inertia moment I_max and the least inertia moment I_min can both be obtained by Eqs. (12) and (13):

  I_max = sum_{m=0}^{W-1} sum_{n=0}^{H-1} [ (m - x_)*cos(theta) + (n - y_)*sin(theta) ]^2 * B(m,n)        (12)

  I_min = sum_{m=0}^{W-1} sum_{n=0}^{H-1} [ (m - x_)*sin(theta) - (n - y_)*cos(theta) ]^2 * B(m,n)        (13)

Then we can derive the major axis b and the minor axis a by Eqs. (14) and (15), respectively:

  b = (4/pi)^(1/4) * [ (I_max)^3 / I_min ]^(1/8)        (14)
  a = (4/pi)^(1/4) * [ (I_min)^3 / I_max ]^(1/8)        (15)

Locating the mouth. From experimental observation, the mouth region belongs to the non-skin region, and the face center point lies between the mouth and brows. So we can search for the mouth position downward from (x_, y_). The mouth searching area is selected as follows: the width of the searching area is from x_ - a to x_ + a, and the height of the searching area is from y_ to y_ + b, as shown in Fig. 3, where a and b are the lengths of the ellipse minor and major axes. Then we perform vertical projection on the mouth searching area; the maximum peak of the projection decides the vertical position of the mouth. According to the mouth location, we compute the upper and lower boundary positions of the exact mouth range, and the distance between the two boundaries is treated as the mouth vertical range (MVR). In this study, the upper boundary of the mouth range is selected at (mouth location - 0.2*a) and the lower boundary at (mouth location + 0.2*a).

Figure 3. The mouth searching area and position

After locating the mouth, we can also find its left and right boundaries, at the positions where the horizontal projection, taken from x_ to x_ + a and from x_ to x_ - a respectively, becomes smaller than 0.1*(MVR).

Locating the left and right brows. From experimental observation, the brow regions also lie in the non-skin region. The left brow searching area is selected as follows: the width of the searching area is from x_ to x_ - a, and the height of the searching area is from y_ to y_ - b, as shown in Fig. 4. Then we perform vertical projection on the left brow searching area; the maximum peak of the projection decides the vertical position of the left brow. Based on the left brow location, as in mouth locating, we compute the upper and lower boundaries of the left brow as the brow vertical range (BVR). In this study, the upper boundary position is at (left brow location - 0.16*a) and the lower boundary at (left brow location + 0.16*a).

Figure 4. The left brow searching area and position

After locating the left brow, we can find its left and right boundaries: the position where the horizontal projection from x_ to x_ - a becomes smaller than 0.1*(BVR) or larger than 0.7*(BVR) is the left boundary position, and, searching from the left boundary, the position where the projection becomes smaller than 0.1*(BVR) is the right boundary. Similarly, all the boundaries of the right brow can be computed, as shown in Fig. 5.

Figure 5. The right brow searching area and position
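The best-fit-ellipse computation of Eqs. (7) through (15) can be sketched as follows. This is a minimal illustration under the assumption that the face mask is a small binary 2D list; the function name `best_fit_ellipse` is hypothetical, and `atan2` is used so the orientation is well-defined even when u_{2,0} = u_{0,2}.

```python
import math

def best_fit_ellipse(B):
    """Centre (xb, yb), orientation theta, and axis lengths (major, minor)
    of the best-fit ellipse of a binary mask B (list of rows, values 0/1).
    Follows the raw-moment / central-moment / inertia derivation of
    Eqs. (7)-(15); B[n][m] corresponds to the paper's B(m, n)."""
    H, W = len(B), len(B[0])
    pix = [(m, n) for n in range(H) for m in range(W) if B[n][m]]
    N = len(pix)
    # Centre from first-order moments, Eqs. (8)-(9).
    xb = sum(m for m, n in pix) / N
    yb = sum(n for m, n in pix) / N
    # Central moments, Eq. (10).
    u11 = sum((m - xb) * (n - yb) for m, n in pix) / N
    u20 = sum((m - xb) ** 2 for m, n in pix) / N
    u02 = sum((n - yb) ** 2 for m, n in pix) / N
    # Orientation, Eq. (11).
    theta = 0.5 * math.atan2(2 * u11, u20 - u02)
    # Greatest and least inertia moments, Eqs. (12)-(13).
    imax = sum(((m - xb) * math.cos(theta) + (n - yb) * math.sin(theta)) ** 2
               for m, n in pix)
    imin = sum(((m - xb) * math.sin(theta) - (n - yb) * math.cos(theta)) ** 2
               for m, n in pix)
    # Axis lengths, Eqs. (14)-(15).
    major = (4 / math.pi) ** 0.25 * (imax ** 3 / imin) ** 0.125
    minor = (4 / math.pi) ** 0.25 * (imin ** 3 / imax) ** 0.125
    return xb, yb, theta, major, minor
```

For a wide horizontal rectangle of ones, the recovered orientation is 0 and the major axis exceeds the minor axis, as expected.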
4. Geometric Features Extraction
To identify a person, the geometric features of the face are important characteristics. In this study, we adopt eleven geometric invariant features.

The first geometric invariant feature is the face contour, defined as follows:

  Contour(i) = (segment(i) - AverageLength) / AverageLength

  AverageLength = (1/36) * sum_{i=0}^{35} segment(i)

where the face contour is described by 36 sections, called Contour(i), and segment(i) is the distance from the mouth center position to the contour edge, scanned every 5 degrees over 180 degrees. Due to the variance of contour size, we use (segment(i) - AverageLength)/AverageLength to normalize Contour(i) into an invariant. We then use these 36 values to describe the contour shape, as shown in Fig. 6.

Figure 6. The described face contour

The second geometric invariant feature is the global density of the whole brow, defined as:

  D_brow = N_brow / P_brow

where N_brow is the number of brow pixels and P_brow is the number of pixels of the whole detected area. The value of D_brow is thus normalized to the range 0 to 1. Because the density of the whole brow is not uniformly distributed, the local density of the brow is selected as the third geometric invariant feature, defined as:

  D_brow(i) = N_brow_i / P_brow_i

The whole detected area is divided into ten sub-regions to represent the density distribution of the brow, where N_brow_i stands for the number of true brow pixels in area(i), and P_brow_i is the number of all pixels in area(i).

The fourth geometric invariant feature is the ratio of the width to the height of the brow, defined as:

  W_brow / H_brow

where W_brow is the width of the detected brow and H_brow is the distance of (BVR).

The fifth geometric feature is the local height of the brow (LHB), defined as:

  LHB(i) = (Height(i) - Height(ref)) / Height(ref),  i = 1..10

  Height(ref) = (1/10) * sum_{i=1}^{10} Height(i)

In this study, the brow area is divided into ten small areas from the lower boundary to the upper boundary, as shown in Fig. 7, where Height(i) denotes the height of each small area. To avoid the influence of brow size variation, we take (Height(i) - Height(ref))/Height(ref) as the invariant feature values.

Figure 7. Local density of brow

The sixth geometric invariant feature is the normalized average height of the brow (AHB), defined as:

  AHB = (1/10) * sum_{i=1}^{10} LHB(i)

The seventh geometric invariant feature is the highest point of the brow (HB), defined as the maximum of LHB(i). The eighth geometric invariant feature is the curve grade of the brow (CGB), defined as:

  CGB(i) = Height(i) - Height(i-1),  i = 2..10

and we set CGB(1) = 0 as the initial curve grade.

The ninth geometric invariant feature is the density ratio of the left to right (DRLR) boundary regions of the brow, defined as:

  DRLR = N_brow(left) / N_brow(right)

where N_brow(left) and N_brow(right) are the brow pixel counts of the left and right boundary regions.

The tenth geometric invariant feature is the distance between brow and mouth (DBBM), defined as:

  DBBM = (distance between the mouth and the brow left boundary positions) / (length of the brow)

The eleventh geometric invariant feature is the ratio of brow length to mouth length (RBM).

A relational database is employed in this study to store these feature values during the training stage. To reduce the perspective effect of the face, the features are further divided into two parts, consisting of the left half and the right half of the face. From the experimental results, when the detected person turns the head to the left or right at a small tilt, we can only find the features of the right or left part of the face; in this case, we use only the right or left half features for matching.

5. Face Recognition

After extracting the geometric invariant features, we can find the face candidates of a recognized person by the improved gray matching model. First, according to the relationships among the features, we classify the extracted 11 features into five categories c_i, i = 1 to 5, where

  c1: 1st geometric feature vector
  c2: 3rd geometric feature vector
  c3: 5th geometric feature vector
  c4: 8th geometric feature vector
  c5: other geometric feature vector (including the 2nd, 4th, 6th, 7th, and 9th~11th features)

The face feature values of one person are defined as:

  x_i = (c1, c2, c3, c4, c5) = (x_i(1), x_i(2), ..., x_i(k)) in X,  i = 1..m, k = 1..n

where
  x_1: the recognized person vector;
  x_2 ~ x_m: the trained person vectors in the database;
  n: the number of one person's feature values.

Before face recognition, the face feature values must be normalized by ||c_i|| = 1, where ||c_i|| is the norm of c_i. Then we convert the feature values into gray relational coefficients by the following equation:

  r(x_1(k), x_i(k)) = (Δ_min + Δ_max) / (Δ_1i(k) + Δ_max)

where Δ_1i(k) = |x_1(k) - x_i(k)|, Δ_min = min_i min_k |x_1(k) - x_i(k)|, Δ_max = max_i max_k |x_1(k) - x_i(k)|, and 1 >= r(x_1(k), x_i(k)) >= 0.

After converting the feature values, we adopt the gray model GM(1,N) for recognition. Eq. (16) is called the GM(1,N) model, where the parameter "1" of GM(1,N) denotes a first-order differential equation and the second parameter "N" denotes N variables. In the model, we set the elements x_1(k) of the recognized person vector x_1 to one, and the normalized x_1 to x_m are the trained person vectors. The GM(1,N) model is illustrated as follows:

  x_1^(0)(k) + a * z_1^(1)(k) = sum_{i=2}^{m} b_i * x_i^(1)(k)        (16)

where

  x_i^(1)(n) = sum_{k=1}^{n} x_i^(0)(k),
  z_1^(1)(k) = 0.5 * x_1^(1)(k) + 0.5 * x_1^(1)(k-1)

z_1^(1)(k) is the generating number and a is the generating coefficient in Eq. (16). Then Eq. (16) can be written in the following array form:

  Y_N = B * â,  â = [a, b_2, ..., b_m]^T

  Y_N = [x_1^(0)(2), x_1^(0)(3), ..., x_1^(0)(n)]^T

  B = | -z_1^(1)(2)  x_2^(1)(2) ... x_m^(1)(2) |
      | -z_1^(1)(3)  x_2^(1)(3) ... x_m^(1)(3) |
      |     ...          ...           ...     |
      | -z_1^(1)(n)  x_2^(1)(n) ... x_m^(1)(n) |

Hence we can solve the above equation via the linear least-squares method:

  â = (B^T * B)^(-1) * B^T * Y_N        (17)

We obtain b_2 to b_m as the similarity degrees by Eq. (17). After describing the gray model, we set a different weight for each feature category in GM(1,N) according to the following experimental results, and modify Eq. (16) to Eq. (18):

  x_1^(0)(k) + a * z_1^(1)(k) = sum_{i=2}^{m} b_i * x_i'^(1)(k)        (18)

where x_i'^(1)(k) = w_cp * x_i^(1)(k), and w_cp is the weight of the feature category c_p, p = 1..5; w_c1 to w_c5 are the weights of c1 to c5, respectively.

In the experiment, 10 recognized people, (a) to (j), with 20 front-view faces of each person, are used for recognition by the gray model. First, we use only c1 for recognition. If person_i is the recognized person, where i is in (a) to (j), and the database stores the trained persons (a) to (j), then, performing face recognition by the GM(1,N) model, the trained person_i in the database gets one order according to the ranking of similarity degrees. The order is between one and ten, and order one denotes the most similar person. Because there are 20 face images of each recognized person_i, the trained person_i gets 20 orders. For the same reason, every other trained person also gets 20 orders, so all trained people together get 20*10 orders. We then average each set of orders as the mean order of c1; the mean orders of c2 to c5 can be obtained in the same way. Eqs. (19) and (20) give the mean orders of the five categories, and Table 1 shows the mean orders of c1 to c5:

  Order_cp = sum_{i=1}^{M} sum_{j=1}^{N'} order_cp(i, j),  p = 1..5        (19)

  MeanOrder_cp = Order_cp / (M * N')        (20)

where order_cp(i, j) is the similarity order of recognized face image j matched against trained person i in the database when considering only c_p, M is the number of trained persons, and N' is the number of recognized face images. A lower MeanOrder means better recognition. From Table 1, the MeanOrder values satisfy c1 < c2 < c4 < c3 < c5, so we set the weights such that w_c1 >= w_c2 >= w_c4 >= w_c3 >= w_c5.

  Category   c1      c2     c3     c4     c5
  MeanOrder  2.065   3.04   3.37   3.3    3.565

Table 1. MeanOrder of c1 to c5

Later, we again use the original 20 images for recognition using different weight combinations of c1 to c5. As shown in Table 2, the combination (65%, 12%, 10%, 10%, 3%) gets the smallest MeanOrder.

  Weight Combination      MeanOrder
  (90,2.5,2.5,2.5,2.5)    2.075
  (85,4,4,4,3)            1.825
  (80,6,5,5,4)            2.05
  (75,8,6,6,5)            1.93
  (70,10,8,8,4)           1.645
  (65,12,10,10,3)         1.485
  (60,14,12,12,2)         1.53
  (55,16,14,14,1)         1.79
  (50,18,16,16,0)         1.725
  (48,20,16,16,0)         1.835
  (46,22,16,16,0)         1.635
  (44,24,15,15,2)         1.71
  (42,26,14,14,4)         1.925
  (40,28,13,13,6)         1.8
  (38,30,12,12,8)         1.84

Table 2. The MeanOrder of weight combinations

6. Experiment Results

To verify the recognition approach, we have built a face database containing ten persons. Each recognized person has a total of 70 face images for recognition, including 30 front-view faces, 20 faces from other view directions, and 20 faces with different facial expressions. Each image is of size 640*480. As shown in Table 3, we take the three highest candidates to calculate the recognition rate with the weight combination (65,12,10,10,3) by the gray model. For the 1st, 2nd, and 3rd candidates, the recognition rates are 92.14%, 94.07%, and 97.12%, respectively.

  Candidate  (a)     (b)     (c)     (d)     (e)     (f)     (g)     (h)     (i)     (j)     mean
  The 1st    93.40%  89.50%  92.30%  90.22%  93.70%  94.20%  94.30%  90.11%  91.32%  92.33%  92.14%
  The 2nd    95.37%  91.23%  93.36%  93.77%  94.37%  95.43%  95.89%  93.37%  94.45%  93.43%  94.07%
  The 3rd    97.89%  95.76%  96.46%  96.23%  97.85%  97.37%  98.78%  96.11%  98.34%  96.45%  97.12%

Table 3. Recognition rate

Figures 8 and 9 show example results of the face recognition.

Figure 8. First result of face recognition
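As a concrete illustration of the Section 5 matching pipeline, the gray-relational conversion and the GM(1,N) least-squares estimation can be sketched as follows. This is a minimal numeric sketch, not the authors' implementation: the function names (`ago`, `grey_coeff`, `gm1n_similarity`) and the toy series in the usage note are hypothetical, and a small hand-rolled Gauss-Jordan solver stands in for a linear-algebra library.

```python
def grey_coeff(x1, xi_list):
    """Gray relational coefficients r(x1(k), xi(k)) = (dmin + dmax) / (d1i(k) + dmax)."""
    deltas = [[abs(a - b) for a, b in zip(x1, xi)] for xi in xi_list]
    dmin = min(min(row) for row in deltas)
    dmax = max(max(row) for row in deltas)
    return [[(dmin + dmax) / (d + dmax) for d in row] for row in deltas]

def ago(series):
    """1-AGO accumulation: x^(1)(n) = sum of x^(0)(1..n)."""
    out, s = [], 0.0
    for v in series:
        s += v
        out.append(s)
    return out

def solve(A, y):
    """Solve A x = y for a small square system by Gauss-Jordan elimination."""
    n = len(A)
    M = [row[:] + [y[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(n):
            if r != col and M[col][col] != 0:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

def gm1n_similarity(x1, trained):
    """Estimate [a, b2..bm] of Eq. (16) via the normal equations of Eq. (17).

    x1: recognised-person series x1^(0); trained: list of trained series.
    Returns the b_i coefficients, used as similarity degrees."""
    x1_1 = ago(x1)
    z1 = [0.5 * x1_1[k] + 0.5 * x1_1[k - 1] for k in range(1, len(x1))]
    cols = [ago(t)[1:] for t in trained]
    # B rows: [-z1^(1)(k), x2^(1)(k), ..., xm^(1)(k)];  Y: x1^(0)(k), k = 2..n
    Bm = [[-z1[k]] + [c[k] for c in cols] for k in range(len(z1))]
    Y = x1[1:]
    p = len(Bm[0])
    BtB = [[sum(Bm[r][i] * Bm[r][j] for r in range(len(Bm))) for j in range(p)]
           for i in range(p)]
    BtY = [sum(Bm[r][i] * Y[r] for r in range(len(Bm))) for i in range(p)]
    return solve(BtB, BtY)[1:]  # drop a, keep b2..bm
```

For example, `grey_coeff([1.0, 2.0], [[1.0, 2.0], [3.0, 0.0]])` gives the exact-match series coefficient 1.0 everywhere and 0.5 for the distant series.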
Figure 9. Second result of face recognition

7. Conclusions

The system has been successfully implemented for detecting and recognizing human faces in a complex background. First, both the homogeneous property and the skin color information are used to detect the face region, and the more precise positions of the mouth and brows are then located with the ellipse model, from which the brow and mouth features can be extracted. Finally, we recognize the extracted face by improved gray matching, and the weight combination (65,12,10,10,3) of the five categories gives the smallest MeanOrder. From the face recognition results, the correct mean recognition rates for the 1st, 2nd, and 3rd candidates are 92.14%, 94.07%, and 97.12%, respectively. The experimental results show that the proposed system works with acceptable performance for practical applications.

References

[1] Haiyuan Wu, Qian Chen, and Masahiko Yachida, "Face Detection From Color Images Using a Fuzzy Pattern Matching Method," IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 21, No. 6, June 1999.
[2] Hui Peng, Changshui Zhang, and Zhaoqi Bian, "Human Eyes Detection Using Hybrid Neural Method," Proc. of ICSP'98, pp. 1088-1091.
[3] M. F. Augusteijn and T. L. Skufca, "Identification of Human Faces through Texture-Based Feature Recognition and Neural Network Technology," Proceedings of the 1993 IEEE International Conference on Neural Networks, Vol. 1, pp. 392-398, 1993.
[4] Matthew A. Turk and Alex P. Pentland, "Face Recognition Using Eigenfaces," IEEE Computer Vision and Pattern Recognition, pp. 586-591, 1991.
[5] Y. Zhu, L. C. De Silva, and C. C. Ko, "Using Moment Invariants and HMM in Facial Expression Recognition," IEEE Image Analysis and Interpretation, pp. 305-309, 2000.
[6] Hung-Xin Zhao, Yao-Hong Tsai, and Yea-Shuan Huang, "Real Time Video Surveillance System," CVGIP Biometrics, 2001.
[7] "A Text-Driven Face Animation System," Master's thesis, Graduate Institute of Electrical Engineering, National Taiwan University, June 1998.
[8] 陳玲慧 and 王明欽, "A Study on Human Face Recognition," Master's thesis, Institute of Computer and Information Science, National Chiao Tung University, June 1995.
[9] 貝蘇章 and 柯智偉, "Human Face Detection."