					           Research Report

 Face Recognition for Security Purposes
Using Modular PCA and Neural Networks

              Richard Young

                  2010
                                                Abstract

Recent advances in technology and growing commercial demand have increased the need for a facial
recognition system, for security purposes, that is robust to in-plane movement of the head and to
changes in lighting conditions. A method for face recognition is proposed and tested which makes use of both
Principal Component Analysis (PCA) and neural networks. The proposed method involves extracting
important features of the face, subjecting them to PCA to reduce the dimensions, and then classifying the
facial features using neural networks. The results of the algorithm are collected and compared to other
algorithms that have been tested on the same dataset, namely a pure neural network approach and the
well-known eigenfaces method. I determine that my proposed method is capable of a better recognition
rate than both of the other tested methods, and is able to train significantly faster than the pure neural
network method. Methods of reducing the false recognition rate (FRR) and false acceptance rate (FAR)
by using ensemble networks are investigated. This document gives a step-by-step guide to how the
research was conducted, and a discussion and comparison of the results.




Declaration

I, Richard Young, hereby declare the contents of this research report to be my own work. This research
report is submitted for the degree of Bachelor of Science with Honours at the University of the
Witwatersrand. This work has not been submitted to any other university or for any other degree.




Acknowledgements

I would like to thank my supervisors, Hima Vadapalli and Clint Van Alten for their input and support,
as well as helping me to deepen my knowledge of the topic. I would also like to thank Angelo Kyrilov,
Michael Mitchley and Sigrid Ewert for passing on their knowledge of document writing.




Contents

Abstract

Declaration

Acknowledgements

1   Introduction

2   Background and Related Work
    2.1 Introduction
    2.2 The Facial Recognition Problem
        2.2.1 Motivation for Face Recognition
        2.2.2 Recognition and Verification
        2.2.3 Partially Occluded Faces
    2.3 Image Pre-processing
        2.3.1 Lighting Normalisation
        2.3.2 Image Resolution and Noise
    2.4 Face Detection and Facial Feature Detection
        2.4.1 PCA
        2.4.2 Template and Geometric Matching
        2.4.3 Thresholding
    2.5 Recognition Using PCA
        2.5.1 Eigenfaces
        2.5.2 Excluding the First Several Principal Components
        2.5.3 Modular PCA and Eigenfeatures
    2.6 Recognition Using Neural Networks
        2.6.1 Self Organising Maps and Convolutional Networks
        2.6.2 Ensemble Networks
        2.6.3 Multi-Layer Perceptron Networks and Auto-Association Networks
        2.6.4 Output Classification
    2.7 Conclusion

3   Research Methodology
    3.1 Introduction
    3.2 Research Question
    3.3 Research Hypothesis
    3.4 Image Pre-Processing
        3.4.1 Motivation
        3.4.2 Methodology
    3.5 Face Detection
        3.5.1 Motivation
        3.5.2 Methodology
    3.6 Feature Extraction
        3.6.1 Motivation
        3.6.2 Methodology
    3.7 PCA on Each Feature
        3.7.1 Motivation
        3.7.2 Methodology
    3.8 Classification of Faces Using Neural Networks
        3.8.1 Motivation
        3.8.2 Methodology
    3.9 Implementation and Testing
        3.9.1 Eigenfaces (PCA)
        3.9.2 Single Neural Network
        3.9.3 Modular PCA with Neural Networks for Classification
        3.9.4 Testing the Hypothesis
    3.10 Neural Network Structures and Future Work
    3.11 Conclusion

4   Results
    4.1 Introduction
    4.2 Recognition Rate
    4.3 Improving the Recognition and False Acceptance Rates
    4.4 False Recognition Rate and False Acceptance Rate
    4.5 Timing
    4.6 Alternate Neural Network Structures and Future Work
    4.7 Conclusion

5   Conclusion

References

Chapter 1

Introduction

Face recognition is a complex problem in computer science. In recent years advances in technology
have allowed us to greatly improve our algorithms and infrastructure for face recognition. While there
are other successful biometric identification systems available today, none of them has the freedom and
extensibility of face recognition [Zhao et al. 2003]. Face recognition is still in its infancy,
and is not yet robust or efficient enough to be successfully implemented in industry. Many novel
(and often quite complex) approaches to face recognition have become available recently, such as
the work by Rama and Tarres [2005]. However, many of the older techniques have not exhausted their
potential, specifically the work by Turk and Pentland [1991].
In the context of face recognition for security purposes, the system encounters its own specific
problems. Some of these problems are related to the input data, in that we will not always have (or want
to have) the cooperation of the individual. This means an individual should be recognisable without having
to hold their face in a particular way, or adopt a particular facial expression. Lighting conditions may
vary, not only in ambient light, but also in the direction of the light, which may cause shadows in different
parts of the face. Individuals being recognised may change their appearance over longer periods of time,
through changes in hair style and skin tone, and even natural ageing of the face. The algorithm should be
created in such a way that it is not easily fooled, and should try to minimise unauthorised access, as such
access defeats the purpose of the system. The system also needs to perform recognition relatively quickly.
A complex algorithm may give almost 100% accuracy, but it may also take hours to train and to perform the
recognition.
The security application that this project is intended for is modelled on existing biometric security
systems such as fingerprint scanners. The system will use input from a camera, and will not require any
participation or interaction from the user. A simple scenario where this system could be applied is
a personal computer that uses face recognition to determine when its owner is sitting in
front of it, and locks the screen when the owner is not present.
This paper suggests a method of solving the face recognition problem by combining existing methods,
and provides results based on the AT&T face database. The method uses principal component analysis
(PCA) for representation of data and dimension reduction, and then utilises neural networks to
classify the faces and associate them with specific people. Thresholding methods are
applied to the output of the neural network to reject unauthorised persons. Various other
steps need to be included in the algorithm, such as normalising the lighting in the image, and removing
unwanted data in the original image (such as the background) that may distract the algorithm. Some
additional changes for improving the efficiency of this method, and the results of these changes, are also
discussed.
Chapter 2 serves as an introduction to the face recognition problem and explains the steps
involved in successfully solving the problem. It also gives a motivation as to why research into face
recognition is viable and necessary. We take a look at some existing methods for face recognition. In
Chapter 4 I will compare the results of my proposed methods to some of the results of the other methods
that are discussed in Chapter 2.
Chapter 3 looks at the method of research, and the details of the implementation of my
face recognition system. Each step is described and motivated, and I also show how it is implemented
specifically in this project, and motivate any changes from the original proposed method. I also discuss
the research questions and the resulting research hypothesis, and how they are tested in
Chapter 4. The AT&T face database that the algorithm will be tested on is discussed, and I
motivate why I have chosen that particular database for testing.
The findings of my research are presented and discussed in Chapter 4. I evaluate whether this
system could be implemented effectively for everyday use, and compare it to other commonly known
methods for face recognition. I look at the recognition rate and false acceptance rate of each method
and determine that my proposed method is able to perform recognition at least as well as a pure neural
network method, and is able to train far quicker. My proposed method also outperforms the eigenfaces
approach suggested by Turk and Pentland [1991] in all areas except training and running time.
I also look at the possibility of implementing an ensemble of neural networks for recognition, which
would keep a high recognition rate when the database contains many people. I look at theoretical results,
and at what changes would need to be made to my proposed method and to the pure neural network method
before they can be implemented as part of an ensemble recognition system.




Chapter 2

Background and Related Work

2.1     Introduction

The background to the face recognition problem covers many areas. This chapter discusses the more
popular and well-documented areas, along with some of their methods and implementations.
In particular, details are given about the methods that are required to understand the research
method in this paper. This chapter not only discusses existing methods for face recognition, but also
looks at some of the results and conclusions obtained by other researchers when implementing these
methods, and variations on them. Section 2.2 gives an overview of what the face recognition problem
actually involves, some motivation for research in the area, as well as some common problems that occur
in face recognition when put into the context of security purposes.


2.2     The Facial Recognition Problem

The basic principle behind the face recognition problem is to use a computer algorithm to identify a
person via visual input. Research into facial recognition stems from humans' ability to recognise faces
very well. The success of a solution to the face recognition problem depends on the solution to
the following two problems: representation and matching [Zhang et al. 1997]. A grey-scale image of a
face is represented on the computer as a two dimensional array of integers, or grey-level values (normally
between 0 and 255, representing black and white respectively). Often it is more convenient to represent a
picture as a one dimensional array, in which case the rows of the matrix are placed end-to-end so that
each row follows on from the row above it. This representation would not make any sense to a human
viewer, but the computer can process it just as easily as the two
dimensional array.
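
The row-by-row flattening described above can be illustrated with a minimal sketch. NumPy is an assumption here (the report does not name a library), and the 3x4 image values are invented purely for illustration.

```python
import numpy as np

# Hypothetical 3x4 grey-scale image; 0 is black and 255 is white.
image = np.array([[0, 50, 100, 150],
                  [10, 60, 110, 160],
                  [20, 70, 120, 170]], dtype=np.uint8)

# Row-major flattening: each row follows on from the row above it.
vector = image.reshape(-1)

# The pixel at (row, col) ends up at index row * width + col.
height, width = image.shape
row, col = 1, 2
flat_index = row * width + col

# The two dimensional form can be recovered exactly, so nothing is lost.
restored = vector.reshape(height, width)
```

Because the mapping between the two forms is a fixed index calculation, an algorithm can work on whichever layout is more convenient.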


2.2.1   Motivation for Face Recognition

According to Zhao et al. [2003], face recognition has a wide range of commercial and law enforcement
applications, such as CCTV control and human-computer interaction. New technology has become
available after 30 years of research in the field, and while current systems have achieved a high level of
maturity and success, they still lack robustness when used in real-life applications.
Biometric personal identification systems already exist, such as fingerprint and iris scanners. However,
these methods rely on the cooperation of the person being identified, whereas face recognition can
be achieved without any interaction from the subject, or even without the subject's knowledge [Zhao et
al. 2003]. This can be useful when searching for criminals in CCTV footage, as well as identifying
unauthorised people in areas they are not supposed to be.

Face recognition systems are often criticised for being easily deceived (for example by using static
images of an authorised person); however, this can be averted by using face thermograms or infrared cameras
to ensure the authenticity of the person being tested [Bryliuk and Starovoitov 2002]. These methods can
easily be integrated into, or used alongside, algorithms based on visual input, such as the methods
discussed in this paper.


2.2.2   Recognition and Verification

Two main problems exist in face recognition [Zhao et al. 2003]. The first is being able to recognise a
particular person as being one of a set of subjects stored on the system, and the second is determining
whether the person actually exists in the system or not. The second problem arises because we are not
able to determine with 100% accuracy the identity of any particular person in the database, and it is
a much more challenging problem to solve. The closest matching person in the database is normally
selected. An unauthorised person who does not exist in the database may still resemble one or more
of the subjects in the database to a certain extent. It is therefore necessary to implement some sort of
thresholding algorithm to reject subjects whose closest match in the database falls below a
certain level of similarity.
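
As a rough sketch of the rejection idea, the following picks the closest stored subject and then refuses the match if it is not close enough. It is illustrative only: the function name `identify`, the use of Euclidean distance in place of a similarity percentage, the two-element feature vectors and the threshold value are all assumptions, not the method used later in this report.

```python
import numpy as np

def identify(probe, gallery, threshold):
    """Return the label of the closest gallery entry, or None when even
    the best match is too far away (the person is treated as unknown)."""
    best_label, best_distance = None, float("inf")
    for label, features in gallery.items():
        distance = float(np.linalg.norm(probe - features))
        if distance < best_distance:
            best_label, best_distance = label, distance
    # Rejection step: the closest match may still be a poor match.
    if best_distance > threshold:
        return None
    return best_label

# Hypothetical feature vectors for two enrolled subjects.
gallery = {"alice": np.array([0.0, 0.0]), "bob": np.array([10.0, 0.0])}
known = identify(np.array([1.0, 0.5]), gallery, threshold=3.0)
unknown = identify(np.array([5.0, 40.0]), gallery, threshold=3.0)
```

Choosing the threshold is a trade-off: a stricter value rejects more impostors but also rejects more genuine subjects.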
Humans are particularly good at recognising faces. Scientists are not yet sure whether humans recognise
faces holistically, or whether each facial feature is recognised individually [Zhao et al. 2003]. Study in
this area can help us design algorithms that mimic the way humans recognise faces. Hair, face outline,
mouth and eyes have been shown to be important for humans to recognise and remember faces. Studies
have shown that the nose plays a very minor role in remembering frontal images of faces [Zhao et al.
2003]. For long term recognition systems, however, features such as hair are not very useful, as hairstyles
and hair colour may change over a period of a few weeks. Face outline can be difficult to obtain as it will
likely include background information, and the outline easily changes when there are slight variations in
the orientation of the head.
Zhao et al. [2003] break the face recognition problem down into three main subtasks:

   1. Detection and normalisation of the face in the image

   2. Feature extraction, and further normalisation

   3. Identification and verification


2.2.3   Partially Occluded Faces

A further problem that is often encountered in face recognition is the addition of new objects such as
glasses, hats or facial hair, which may partially occlude sections of the face [Rama and Tarres 2005].
This makes it difficult to approach face recognition holistically (that is, recognising the
face as a whole), and many approaches try to solve this problem by segmenting the face into separate
parts. This way, if one or more parts are obscured by an object or have poor lighting, the algorithm
will still be able to recognise the person by using the other parts of the face. Lophoscopic PCA, which
is described by Rama and Tarres [2005], makes use of this approach by performing a commonly used
recognition algorithm on segments of the face.


2.3     Image Pre-processing

Often images are not in the correct form to be plugged straight into a recognition algorithm. The image
may contain more information than one single face, and the lighting conditions in the test image may
not be the same as in the sample data for training the algorithm. This can greatly affect the effectiveness
or the recognition rate of the algorithm. Therefore, to obtain the best possible results, it is necessary
to pre-process an image, to normalise lighting and remove noise, before inserting it into a recognition
algorithm.


2.3.1   Lighting Normalisation

Gross and Brajovic [2003] say that besides variation in pose, illumination is the next most significant
factor that affects a face’s appearance. The 3D nature of the human face causes certain parts of the face
to be illuminated at different intensities. This can often cause problems for human recognition of faces,
but is a much more significant problem for computer recognition of faces. Zhao et al. [2003] discuss
the importance of lighting in human recognition of faces. Faces that are illuminated from the top are
significantly easier to recognise than faces illuminated from the bottom. One can use various methods to
attempt to normalise the image in such a way as to reduce the differences in lighting in the image.
A simple histogram normalisation step prior to face recognition can make a system more robust to
illumination and contrast variance in the input data [King and Xu 1997]. The concept behind histogram
normalisation is to ensure that each value of the dynamic range appears an approximately equal number of
times. Figure 2.1 is an example of histogram normalisation applied over the dynamic range of an image.
We can see that the histogram normalisation has improved detail in the areas of low contrast. Normalising
the lighting in an image can be biologically motivated, because the iris of the human eye expands and
contracts to normalise the amount of light entering the eye [King and Xu 1997].




                             Figure 2.1: Histogram Equalisation of an Image
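
The kind of equalisation shown in Figure 2.1 can be sketched in a few lines. This is a standard cumulative-histogram formulation written with NumPy, offered as an illustration rather than as the implementation used to produce the figure; the function name `equalise_histogram` and the synthetic low-contrast test image are assumptions.

```python
import numpy as np

def equalise_histogram(image):
    """Histogram equalisation for an 8-bit grey-scale image."""
    # Count how often each of the 256 grey levels occurs.
    histogram = np.bincount(image.ravel(), minlength=256)
    # Cumulative distribution of grey levels, scaled to [0, 1].
    cdf = np.cumsum(histogram) / image.size
    # Map every grey level through the scaled cumulative distribution.
    lookup = np.round(255 * cdf).astype(np.uint8)
    return lookup[image]

# Synthetic low-contrast image: all values squeezed into 40..89.
rng = np.random.default_rng(0)
dark = rng.integers(40, 90, size=(64, 64)).astype(np.uint8)
equalised = equalise_histogram(dark)
```

After the mapping, the grey levels of the synthetic image are spread over most of the 0 to 255 range instead of a narrow band, which is the effect that brings out detail in low-contrast areas.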

Even state-of-the-art commercial face recognition algorithms struggle with varying lighting conditions
[Gross and Brajovic 2003]. This gives us good reason to minimise the variation in lighting before passing
an image to the algorithm. Many existing algorithms deal with this problem by trying to create a 3D model
of the face from a series of pictures, or by using a set of training images, each in different lighting
conditions.
Gross and Brajovic [2003] have devised an illumination normalisation algorithm based partly on human
vision that significantly improves recognition rates in standard algorithms. The algorithm is called the
reflectance perception model and it deals with local contrast levels rather than the overall brightness of
the image.
The LogAbout method proposed by Liu et al. [2001] is implemented by applying a high pass filter
followed by a logarithmic transform of the image. The filter used in the original paper was as follows:

                                                -1   -1    -1
                                                -1   9     -1
                                                -1   -1    -1




Following that, the illumination normalised image g can be found by:

\[ g(x, y) = a + \frac{\ln(f(x, y) + 1)}{b \ln c} \]

where f(x, y) is the original image after the edge detection filter above has been applied, and a, b and c
are constants that affect the shape of the logarithmic transform, and can be adjusted to suit the data it is
being applied to.
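
A minimal sketch of the two LogAbout steps (filter, then logarithmic transform) follows. The 3x3 kernel is the one quoted above; everything else is an assumption made for illustration: the function names, the edge-replicated border handling, the clipping of negative filter responses to zero so that the logarithm is defined, and the default constants a = 0, b = 1, c = e, which are placeholders rather than values from Liu et al. [2001].

```python
import numpy as np

# The 3x3 filter quoted in the text.
KERNEL = np.array([[-1, -1, -1],
                   [-1, 9, -1],
                   [-1, -1, -1]], dtype=float)

def filter3x3(image, kernel):
    """Apply a 3x3 kernel, replicating edge pixels at the border.
    The kernel is symmetric, so correlation equals convolution."""
    padded = np.pad(image.astype(float), 1, mode="edge")
    height, width = image.shape
    out = np.zeros((height, width))
    for dy in range(3):
        for dx in range(3):
            out += kernel[dy, dx] * padded[dy:dy + height, dx:dx + width]
    return out

def log_about(image, a=0.0, b=1.0, c=np.e):
    """g = a + ln(f + 1) / (b ln c), with f the filtered image."""
    f = filter3x3(image, KERNEL)
    # Assumption: clip negative responses so the logarithm is defined.
    f = np.clip(f, 0.0, None)
    return a + np.log(f + 1.0) / (b * np.log(c))

# A flat image passes through the filter unchanged (the kernel sums to 1).
flat = np.full((5, 5), 100, dtype=np.uint8)
normalised = log_about(flat)
```

With the placeholder constants the transform reduces to ln(f + 1), which compresses bright regions far more than dark ones; in practice a, b and c would be tuned to the data.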
An example of what the LogAbout algorithm does can be seen in Figure 2.2. We can see that the dark shadow
on the left side of the face is no longer as prominent, and the smaller features of the face with low contrast
have been exaggerated. This process, when applied to an image before PCA recognition, outperforms
the use of a homomorphic filter and histogram normalisation, and comes close to the performance of
wavelets [Mendona et al. 2007]. The LogAbout method is also less computationally complex than
wavelets. Table 2.1 gives an overview of the performance increases due to lighting normalisation when
using PCA for recognition on the Yale face database. The results in Table 2.1 were obtained by Mendona
et al. [2007].

                                  Method                Recognition Rate
                                  No Normalisation      77%
                                  Homomorphic filter    79%
                                  LogAbout              91%
                                  Wavelets              93%

                           Table 2.1: Performance increases after normalisation




                      Figure 2.2: LogAbout Illumination Normalisation Algorithm


2.3.2   Image Resolution and Noise

A face in an image will probably need to be normalised to a specific size to allow a recognition algorithm
to act on it. However, if the original image (before resizing) is too small, or of poor quality, then the image
being inserted into the recognition algorithm may not contain enough data to be accurately recognised.
One needs to ensure that the resolution of the image is high enough to allow any important features to
be distinguished. However, if the resolution is too high, the algorithm may take too long to process the
image, or the algorithm may end up using unimportant details in a face to recognise it.
If the input image contains noise, and especially if the noise is not consistent in all images, then the
noise needs to be removed to prevent it interfering with the recognition of faces. Periodic noise can be
removed by converting images to the Fourier domain, removing the selected points of the spectrum that
correspond to the noise, and then transforming the image back to the spatial (x-y) domain. Most other
noise types can be removed or lessened by applying various smoothing filters to the image.
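
The Fourier-domain removal of periodic noise can be sketched as follows. The example is synthetic and makes several assumptions: NumPy's FFT routines are used, the function name `remove_periodic_noise` is hypothetical, the noise is a pure sinusoid with a whole number of cycles across the image, and the positions of the noise peaks are supplied by hand rather than detected automatically.

```python
import numpy as np

def remove_periodic_noise(image, peaks, radius=1):
    """Zero small neighbourhoods around the given (row, col) positions
    in the centred spectrum, then transform back to the spatial domain."""
    spectrum = np.fft.fftshift(np.fft.fft2(image))
    for row, col in peaks:
        spectrum[max(row - radius, 0):row + radius + 1,
                 max(col - radius, 0):col + radius + 1] = 0
    return np.real(np.fft.ifft2(np.fft.ifftshift(spectrum)))

# Flat grey image corrupted by vertical stripes (8 cycles across the width).
size = 64
x = np.arange(size)
clean = np.full((size, size), 128.0)
noisy = clean + 20.0 * np.sin(2 * np.pi * 8 * x / size)

# After fftshift the zero frequency sits at (size // 2, size // 2); the
# stripes appear as two peaks 8 bins to its left and right.
centre = size // 2
restored = remove_periodic_noise(noisy, [(centre, centre - 8), (centre, centre + 8)])
```

On real photographs the noise peaks are less sharply localised, so the neighbourhood radius and the peak positions would need to be chosen by inspecting the spectrum.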

2.4     Face Detection and Facial Feature Detection

Even in holistic approaches, such as Turk and Pentland [1991], the system needs to know the exact
locations of key features such as the eyes, nose and mouth [Zhao et al. 2003]. Turk and Pentland [1991]
admit that the background in images (behind the face) can cause significant problems when doing recognition
with their method. The scale and size of the face is also an important aspect which can affect a system’s
recognition rate. It is necessary then to detect and normalise a face in an image, and attempt to block
out as much of the background as possible. The two main categories of algorithms used to locate facial
features are grey-level template matching, and geometric relationships between facial features [Shih and
Chuang 2004].


2.4.1   PCA

Turk and Pentland [1991] developed a Principal Component Analysis approach not only to recognise
faces (see Section 2.5.1), but also to track and detect faces in an image or sequence of
images. By constructing a face-space from pictures that are known to be faces, sections of a larger image
can be compared to the face-space, and if the distance between the section of the image and the
face-space is smaller than some threshold, we can conclude that that area of the larger picture contains a face.
PCA is currently one of the most promising areas in face recognition and detection [Gottumukkal and
Asari 2004]. While it has some drawbacks, it is fast and can produce excellent results assuming the
training data is in the correct form.


2.4.2   Template and Geometric Matching

This approach to face and feature detection involves using sliding windows, which scan through an
image. The window contains a template of what the face should look like, and eigenfaces or neural
networks can be used to determine if a particular part of an image matches closely enough to the template
to be considered a face [Shih and Chuang 2004]. Shih and Chuang [2004] also state, however, that this
method is most likely to fail on faces with glasses or facial hair if the training data does not take
this into account.
The geometric structure of the face can be used to find particular features as well [Shih and Chuang 2004].
An algorithm for frontal view face detection could, for example, use the fact that the eyes and the mouth
form an isosceles triangle. We can also use the fact that in most faces, the vertical distance between the
eyes and the mouth is proportional to the distance between the centres of the two eyes. These methods,
however, assume that features such as the eyes and mouth are easily distinguished in the image, and
are not occluded by glasses or hair. They could also give many false positives, especially if the
image contains a lot of information other than a face.
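
The isosceles-triangle property can be turned into a simple plausibility check. The sketch below is only an illustration of that idea, not the algorithm of Shih and Chuang [2004]: the function name is_plausible_face, the (x, y) tuple representation of feature positions and the 15% tolerance are assumptions chosen for the example.

```python
import math


def is_plausible_face(left_eye, right_eye, mouth, tolerance=0.15):
    """Return True if the eye-to-mouth distances are nearly equal.

    Equal sides from each eye to the mouth make the eye-eye-mouth triangle
    isosceles. `tolerance` is the allowed relative difference (hypothetical value).
    """
    d_left = math.dist(left_eye, mouth)
    d_right = math.dist(right_eye, mouth)
    longer = max(d_left, d_right)
    if longer == 0:
        return False  # all three points coincide, so there is no triangle
    return abs(d_left - d_right) / longer <= tolerance


# Mouth centred below the eyes: the two sides are equal.
frontal = is_plausible_face((30, 40), (70, 40), (50, 80))
# Mouth far off to one side: the sides differ by well over 15%.
skewed = is_plausible_face((30, 40), (70, 40), (90, 80))
```

A check like this only filters candidate feature triples; as noted above, it still depends on the eyes and mouth having been located correctly in the first place.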


2.4.3   Thresholding

The thresholding method proposed by Shih and Chuang [2004] is divided into two parts: high thresholding,
which obtains the head boundary, and low thresholding, which obtains the outline of the facial features. Various
edge detection filters are applied to the image to find the outlines of the head and the features. The strong
edges are assumed to be the head outline, and the weak edges inside the head are assumed to be the facial
feature outlines. The image is then converted to a binary image. Sometimes parts of the face boundary
cannot easily be seen, and a face-boundary repairing technique needs to be used. This technique takes
advantage of the elliptical shape of the head, and can fill in any areas that are missing.




2.5        Recognition Using PCA

2.5.1       Eigenfaces

The first really successful implementation of face recognition was done by Turk and Pentland [1991],
using eigenpictures or eigenfaces [Zhao et al. 2003]. This method uses principal component analysis
(PCA) to create a face-space. Each face is represented by its weights on a subset of the eigenvectors
(the eigenfaces) of the training data. The recognition
works by projecting faces onto a feature space, which is created from sample data [Turk and Pentland
1991]. Each feature distinguishes one face from another.
Eigenfaces are known to work well when the lighting difference in the images is small [Zhang et al.
1997].
Representing a face by its projection onto the eigenvectors compresses the image, and therefore reduces the
complexity of the problem. The eigenface approach is optimal in terms of its complexity [Zhang et al. 1997].
Below is an outline of the PCA approach¹ to recognition as described by Turk and Pentland [1991]:
Represent an input image as an $n^2 \times 1$ vector rather than an $n \times n$ matrix, as in Figure 2.3:




                                                        Figure 2.3:

Let $\Gamma$ be an $n^2 \times 1$ vector representing an $n \times n$ face.
Step 1: Obtain training faces $I_1, I_2, \ldots, I_M$
Step 2: Represent each $I_i$ as $\Gamma_i$
Step 3: Determine the average face $\Psi$ where:

\[ \Psi = \frac{1}{M} \sum_{i=1}^{M} \Gamma_i \]

Step 4: Subtract the average face from each $\Gamma_i$:

\[ \Phi_i = \Gamma_i - \Psi \]

Step 5: Determine the matrix $A$ such that each column of $A$ is a face:

\[ A = [\Phi_1, \Phi_2, \ldots, \Phi_M] \]
   1  These steps have been adapted from course notes available at: www.cse.unr.edu/~bebis/MathMethods/PCA/case_study_pca1.pdf




Step 6: Find the $n^2 \times n^2$ covariance matrix $C$ of $A$ and calculate its eigenvectors $u_i$:

\[ C = A A^T \]

Note however that this matrix is very large, and one can reduce the problem’s complexity by using the
matrix $A^T A$ (which is $M \times M$) and computing its eigenvectors $v_i$.
The two matrices have the same non-zero eigenvalues, and their eigenvectors are related as follows:

\[ u_i = A v_i \]

It should be noted that the $M$ eigenvalues of $A^T A$ are the $M$ largest eigenvalues of $A A^T$.
Therefore the $M$ best eigenvectors of $A A^T$ can be calculated by $u_i = A v_i$, normalised so that $\|u_i\| = 1$.
Keep only the $k$ of these $M$ eigenvectors with the largest eigenvalues, where $k$ can be calculated
theoretically or experimentally.
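
The relation between the two eigenproblems can be checked in a few lines. The short derivation below is a standard linear algebra argument rather than something quoted from the notes cited above; $\mu_i$ is a symbol introduced here for the eigenvalue belonging to $v_i$, and $v_i$ is assumed to have unit length.

```latex
\begin{align*}
A^T A \, v_i &= \mu_i v_i \\
\Rightarrow \quad A A^T (A v_i) &= \mu_i (A v_i) \\
\|A v_i\|^2 = v_i^T A^T A \, v_i &= \mu_i
\quad \Rightarrow \quad
u_i = \frac{A v_i}{\sqrt{\mu_i}} \ \text{satisfies} \ \|u_i\| = 1 \quad (\mu_i > 0)
\end{align*}
```

Each eigenvector of the small $M \times M$ matrix therefore yields an eigenvector of $A A^T$ with the same eigenvalue, at the cost of one matrix-vector product and a rescaling.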
Step 7: Each $\Phi_i$ can now be represented as a linear combination of the $k$ eigenvectors:

\[ \Phi_i = \sum_{j=1}^{k} w_j u_j, \qquad (w_j = u_j^T \Phi_i) \]


Recognition is done by projecting the ($n^2 \times 1$) input image onto the eigenspace calculated in Step 7
above, i.e. it is represented as a linear combination of the eigenvectors.
We then find the minimum of the distances to all of the faces stored in the database, and the closest
matching one is recognised. It should be noted that if the distance is greater than some threshold t, then
the person is classified as unrecognised. t must be determined experimentally.
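
The steps above can be sketched in plain Python as follows. This is an illustrative sketch under stated assumptions, not the implementation used in this research: the function names (dot, top_eigenpairs, train_eigenfaces, recognise), the use of power iteration with deflation to obtain the eigenvectors of the small M × M matrix, and the four-pixel toy "faces" are all choices made for the example. A real implementation would use a numerical library for the eigendecomposition.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def top_eigenpairs(sym, k, iterations=500):
    """Largest k eigenpairs of a symmetric matrix (power iteration with deflation)."""
    size = len(sym)
    work = [row[:] for row in sym]
    pairs = []
    for _ in range(k):
        v = [1.0 / (i + 1) for i in range(size)]  # arbitrary non-zero start vector
        for _ in range(iterations):
            w = [dot(row, v) for row in work]
            norm = math.sqrt(dot(w, w))
            if norm < 1e-12:
                break
            v = [x / norm for x in w]
        eigenvalue = dot(v, [dot(row, v) for row in work])
        pairs.append((eigenvalue, v))
        for i in range(size):  # deflate: remove the component just found
            for j in range(size):
                work[i][j] -= eigenvalue * v[i] * v[j]
    return pairs


def train_eigenfaces(faces, k):
    """faces: list of M flattened images. Returns (mean face, eigenfaces, weights)."""
    m, n = len(faces), len(faces[0])
    mean = [sum(face[p] for face in faces) / m for p in range(n)]        # Step 3
    phi = [[face[p] - mean[p] for p in range(n)] for face in faces]      # Step 4
    gram = [[dot(a, b) for b in phi] for a in phi]                       # A^T A (M x M)
    eigenfaces = []
    for eigenvalue, v in top_eigenpairs(gram, k):                        # Step 6
        if eigenvalue <= 1e-9:
            continue  # no variance left in this direction
        u = [sum(v[i] * phi[i][p] for i in range(m)) for p in range(n)]  # u = A v
        norm = math.sqrt(dot(u, u))
        eigenfaces.append([x / norm for x in u])                         # ||u|| = 1
    weights = [[dot(u, f) for u in eigenfaces] for f in phi]             # Step 7
    return mean, eigenfaces, weights


def recognise(face, mean, eigenfaces, weights, labels, t):
    """Nearest stored face in eigenspace, or None if the distance exceeds t."""
    phi = [x - mu for x, mu in zip(face, mean)]
    w = [dot(u, phi) for u in eigenfaces]
    distances = [math.dist(w, known) for known in weights]
    best = min(range(len(distances)), key=distances.__getitem__)
    return labels[best] if distances[best] <= t else None


# Toy data: two "people", each with two 4-pixel images (hypothetical values).
faces = [[10, 10, 0, 0], [12, 9, 1, 0], [0, 0, 10, 10], [1, 0, 9, 12]]
labels = ["a", "a", "b", "b"]
mean, eigenfaces, weights = train_eigenfaces(faces, k=2)
match = recognise([11, 10, 0, 1], mean, eigenfaces, weights, labels, t=5.0)
```

The threshold passed as t plays the role described above: an input whose nearest stored face is further away than t is returned as None, i.e. unrecognised.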


2.5.2   Excluding the First Several Principal Components

According to Deng et al. [2005] the recognition rate, when using PCA, can be improved by excluding
the first one to nine principal components. This is because in most cases, the first (and largest) principal
component relates to the changes in lighting in the images. The next several components will be related to
movements inside the image. These characteristics differ depending on the application of the PCA as well
as other factors such as image preprocessing. Removing too many components could result in insufficient
information to do the recognition correctly. It therefore needs to be determined experimentally how many
principal components should be thrown away in order to maximise the recognition rate.
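
In practice, discarding leading components amounts to slicing each projection vector. The helper below is a small sketch of that step; the name select_components and its skip and keep parameters are hypothetical, and the weights are assumed to be ordered from the largest eigenvalue to the smallest.

```python
def select_components(weights, skip=1, keep=None):
    """Drop the first `skip` projection weights and keep at most `keep` of the rest.

    `weights` is assumed to be ordered by decreasing eigenvalue.
    """
    end = None if keep is None else skip + keep
    return weights[skip:end]


projection = [9.0, 4.0, 2.5, 1.0, 0.3]
without_first = select_components(projection, skip=1, keep=3)
```

Running the recognition experiment for several values of skip is then a direct way to find the number of excluded components that maximises the recognition rate.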


2.5.3   Modular PCA and Eigenfeatures

Zhao et al. [2003] say that combining the use of eigenfaces, eigeneyes, eigenmouths, etc. has improved
results over using just eigenfaces. While currently these improvements are theoretically quite small,
Zhao et al. [2003] believe that combining holistic and local features is very important and deserves more
attention.
Gottumukkal and Asari [2004] present a method of modular PCA whereby the original image is divided
into a number of smaller images. Each sub-image is treated separately in the PCA process. The results
showed that there was a large improvement over conventional PCA when there were large changes in
illumination and facial expression; however, it did not make any difference when there were changes in
pose or face orientation. The modular approach allows us to ignore (or partially disregard) parts of the
face which may be abnormal or vastly different from our training data.
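
The division of an image into sub-images can be sketched as below. This only illustrates the partitioning step, under the assumption of a regular grid of equally sized blocks; the function name split_into_blocks and the list-of-rows image representation are hypothetical and are not taken from Gottumukkal and Asari [2004].

```python
def split_into_blocks(image, rows, cols):
    """Split an image (list of pixel rows) into rows x cols equally sized sub-images.

    Blocks are returned in row-major order (left to right, top to bottom).
    """
    height, width = len(image), len(image[0])
    if height % rows or width % cols:
        raise ValueError("image size must be divisible by the grid size")
    block_h, block_w = height // rows, width // cols
    blocks = []
    for r in range(rows):
        for c in range(cols):
            block = [image[y][c * block_w:(c + 1) * block_w]
                     for y in range(r * block_h, (r + 1) * block_h)]
            blocks.append(block)
    return blocks


# A 4 x 4 image with pixel values 0..15, split into four 2 x 2 sub-images.
image = [[r * 4 + c for c in range(4)] for r in range(4)]
blocks = split_into_blocks(image, rows=2, cols=2)
```

Each block would then go through the PCA process separately, which is what allows an abnormal region of the face to be down-weighted or ignored.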



A slightly different approach is presented by Rama and Tarres [2005]. This approach eliminates certain
local features (such as the eyes or mouth) in the training set, and uses the rest of the face for recognition.
This method works by duplicating each training image six times and, in each copy, blacking out
a different part of the face. This technique is particularly useful when new objects (such as glasses)
appear on the face, or when the face shows varied facial expressions. This technique has
been shown to improve the results of PCA recognition; however, its computational complexity is six times that
of normal PCA.
Localised PCA gives a smaller error since it pays more attention to local structures in the face instead
of looking at the face as a whole [King and Xu 1997]. This method requires that we already know the
location of the facial features we wish to use in our PCA. We can use the methods for face and feature
detection suggested in Section 2.4 on page 7.


2.6     Recognition Using Neural Networks

A Neural Network is a computational decision-maker that is modelled on how the human brain works.
Feed forward networks consist of an input layer, an output layer, and optionally one or more hidden
layers in between. Inputs in the form of numbers are taken from the input layer, and subjected to various
weights and thresholds at each node and propagated through until the output layer is reached. A network
is trained by giving it an input and an expected output. The training process adjusts the weights and
thresholds to train the network to give the desired output. The most common training process is known
as backpropagation.
Zhao et al. [2003] describe a fully automatic neural network approach to face recognition. The network
consists of three modules: a face detector, an eye localiser and a face recogniser. It is often necessary to
combine two or more networks together, feeding the output of one network into the other networks.
Two network structures are discussed by Zhao et al. [2003]. All-classes-in-one-networks (ACON) are
similar to conventional multi-layer perceptrons (MLP) in that the entire network is used to remember
all subjects. The second network structure is one-class-in-one-network (OCON), where each subject has
their own network dedicated to recognising them. The OCON has advantages over the ACON structure
in that it requires fewer hidden nodes in each sub-network, and therefore converges faster during training.
Each network is independent of the other networks, which makes the system easier to extend.


2.6.1   Self Organising Maps and Convolutional Networks

Lawrence et al. [1996] propose the use of Self Organising Maps (SOM) to reduce the dimension of the
input image, as well as to create invariance to minor changes in the face. A SOM is trained on input
patterns by an unsupervised learning process. Once the input has been reduced with a SOM, it can be
fed into a feed–forward neural network to do classification.
Convolutional networks have successfully been used in character recognition [Lawrence et al. 1996].
The network consists of different layers which contain one or more planes. The input data needs to be
normalised and centred. Each plane receives data from a small section of the plane in the previous layer.
Each plane can be considered a feature map. Lawrence et al. [1996] reduce the data dimension by using
SOM, and then do classification using convolutional neural networks.


2.6.2   Ensemble Networks

An ensemble of neural networks can be used to reduce overall training time. One smaller network is
created for each person inside the database. The network is trained to recognise whether a given face
belongs to its person or not. The outputs of these networks can be fed into another network (known as an
aggregate network) [Bryliuk and Starovoitov 2002]. The network training time is reduced because
fewer hidden nodes overall need to be trained.
An ensemble of voting networks can also be created [Bryliuk and Starovoitov 2002]. A number of
networks are set up to recognise faces, and if a network is sure about a particular person, it votes for
them. The votes can be added up, and if a person has enough votes they are authorised. If a network is
unsure, it can abstain from voting. Bryliuk and Starovoitov [2002] found this method to be more robust
than a single network; however, it has a high computational complexity, especially if the database of
known faces is large.
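
The voting scheme can be sketched as a simple tally. The code below is an illustration only and does not reproduce the method of Bryliuk and Starovoitov [2002]: the function name ensemble_vote, the use of None to represent an abstention and the min_votes parameter are assumptions made for the example.

```python
from collections import Counter


def ensemble_vote(votes, min_votes):
    """Combine the votes of several networks.

    `votes` holds one entry per network: a person's label, or None if that
    network abstained. The most-voted person is authorised only if they
    received at least `min_votes` votes; otherwise None is returned.
    Ties are broken in favour of the label that was voted for first.
    """
    counts = Counter(v for v in votes if v is not None)
    if not counts:
        return None  # every network abstained
    person, n = counts.most_common(1)[0]
    return person if n >= min_votes else None


decision = ensemble_vote(["alice", "alice", None, "bob"], min_votes=2)
```

Raising min_votes makes the ensemble stricter, which would be expected to trade a lower false acceptance rate for a higher false rejection rate.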


2.6.3   Multi-Layer Perceptron Networks and Auto-Association Networks

In theory, a single neural network trained with back-propagation may be used to directly recognise face
images [Zhang et al. 1997]. The actual recognition could be done on fairly large images, however, the
training process has a very high complexity, and can take an unreasonable amount of time when our
training dataset is large, and when the dimension of the pictures is large. A solution to this is to separate
the problem into two networks. The first network, known as an auto-association network, reduces the
dimension of our input, and the second classification network does the recognition.
The auto-association network works by having $n$ inputs, $n$ outputs and a single hidden layer with $p$
nodes, where $p \ll n$ [Zhang et al. 1997]. The network output is the best possible approximation of the
original input. This way, the outputs of the $p$ nodes in the hidden layer constitute a unique description of
the original image. This compressed vector is then fed into the classification network. However, under
optimal circumstances the feature vector produced from the hidden layer is the same as the basis
produced in the eigenfaces approach [Zhang et al. 1997].
An example of a multi-layer perceptron network is shown in Figure 2.4. The inputs to each node in the
input layer are the pixel values, and the output of the network is related to a specific person.




                 Figure 2.4: An example of a multi-layer NN used for face recognition




2.6.4   Output Classification

In most neural networks used for face recognition, each output of the network would be assigned to a
specific person in the database. When a new face is run on the network, the output with the highest value
is considered to be the identified person. If the highest output is below some threshold t then the person
is considered to be unauthorised.
Most systems are trained only on positive samples, and therefore perform poorly on faces
that have not been seen before. These systems rely on the assumption that unseen people will not be in
any way similar to the ones already present in the database [Bryliuk and Starovoitov 2002]. A method
is proposed by Bryliuk and Starovoitov [2002] for classification of faces that improves performance in
rejecting previously unseen people. The algorithm, labelled sqr, calculates the root mean square deviation
of the real neural network output from the ideal output.

\[ d = \sqrt{\sum_{i=1}^{n} \left( O_i + \begin{cases} +1, & i \neq \max \\ -1, & i = \max \end{cases} \right)^2} \]


If d is greater than some threshold t, then the person is unauthorised. The usable range of t is usually [0, 2],
where for t > 2 there can be no false rejections.
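
A minimal sketch of this deviation is given below. It assumes an ideal output of +1 for the winning node and −1 for every other node, and it reads the formula as the square root of the summed squared deviations with no division by n; both are interpretations rather than details confirmed by the source. The function name sqr_deviation is hypothetical, and the comparison with the threshold t is deliberately left to the caller.

```python
import math


def sqr_deviation(outputs):
    """Return (index of the winning output, deviation d from the ideal output).

    The ideal output is assumed to be +1 at the winning node and -1 elsewhere.
    """
    best = max(range(len(outputs)), key=outputs.__getitem__)
    total = 0.0
    for i, o in enumerate(outputs):
        ideal = 1.0 if i == best else -1.0
        total += (o - ideal) ** 2
    return best, math.sqrt(total)


# A confident network output: node 1 is high, the others are close to -1.
winner, d = sqr_deviation([-0.9, 0.8, -1.0])
```

Unlike taking only the highest output, this measure also penalises outputs that are high at several nodes at once, which is typical of faces the network has not been trained on.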


2.7     Conclusion

In this chapter I have reviewed some of the more common methods in face recognition, as well as
looked at some adaptations to common algorithms to improve performance. I have also looked at some
implementations of these various methods, and seen the results they have produced. I have given a
good motivation to do research into face recognition, and identified some problem areas I am likely to
encounter when applying face recognition to security systems. The recognition involves more than one
step. Often image pre-processing and face locating are required before one can implement the actual
algorithms. Some possible methods have been discussed, and compared in terms of their results, and
their computational complexity.
In Chapter 3 on the next page I discuss the methodology of my research. This includes identified research
questions, research hypothesis and research method. The method for research is broken down into logical
steps, and each step is motivated. I will also discuss how the testing of the algorithm will be done, and
how this relates to testing the hypothesis.




Chapter 3

Research Methodology

3.1       Introduction

Following on from an explanation of the background and how other researchers have tackled the face
recognition problem, this chapter gives an overview of my method of research. Each step is explained
and a solution to the step is chosen. Motivation is given for why a particular method has been chosen.
The fine details of how each method works are not explicitly laid out. A brief explanation of the method
is given. Some of the finer details are provided in Chapter 2 on page 3. I also talk about testing the
viability of alternate neural network structures, particularly an ensemble of neural networks.
First however, let us discuss the research questions that have been identified, and the research hypothesis
that was formulated from them.


3.2       Research Question

   1. Can we design a face recognition system using Neural Networks and Modular PCA that is robust
      in terms of varying lighting conditions, lateral offset of the face in the picture, and changes such
      as the addition of spectacles and changes in hair style?

   2. Can we analyse the false acceptance rate (FAR) and false rejection rate (FRR) of the system and
      investigate methods of reducing these errors?


3.3       Research Hypothesis

We can design a face recognition system using Modular PCA and Neural Networks that will give
better results¹ than other well-known systems tested on the same dataset.
  1  Where “results” are determined in terms of face recognition for security purposes.


3.4       Image Pre-Processing

3.4.1      Motivation

Almost every face recognition algorithm can benefit from image pre-processing. Techniques such
as lighting normalisation and noise reduction help to make the set of all images more equal in terms of
external lighting factors. If the images are normalised in terms of lighting and noise, then the algorithm
can focus on the differences between the actual faces rather than lighting or noise variations. In the
dataset that will be used in this project there is very little noise, and any noise that does exist is evenly
present in all pictures. Therefore we only expect to have to normalise the lighting in the image.


3.4.2   Methodology

It may be beneficial to perform a histogram normalisation on the image. The histogram normalisation
changes the picture such that each element in the available spectrum is used equally. An even better
normalisation technique to use is the LogAbout algorithm proposed by Liu et al. [2001]. The method
involves using a high pass edge detection filter followed by a logarithmic transform. This method has
been shown to improve the performance of recognition algorithms [Mendona et al. 2007], and performs better
than the other normalisation methods tested by Liu et al. [2001], except wavelets. The wavelet illumination
normalisation is complex to implement, has a high computational complexity and offers only a small
performance gain over the LogAbout method.
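
The histogram normalisation mentioned at the start of this section can be sketched as a standard histogram equalisation. The code below is a generic textbook version and not the LogAbout algorithm; the function name equalise_histogram, the flat list of integer grey levels and the default of 256 levels are assumptions made for the example.

```python
def equalise_histogram(pixels, levels=256):
    """Histogram-equalise a flat list of integer grey levels in [0, levels - 1]."""
    histogram = [0] * levels
    for p in pixels:
        histogram[p] += 1

    # Cumulative distribution of grey levels.
    cdf, running = [], 0
    for count in histogram:
        running += count
        cdf.append(running)

    cdf_min = next(c for c in cdf if c > 0)  # CDF value at the darkest level present
    total = len(pixels)
    if total == cdf_min:
        return list(pixels)  # image has a single grey level, nothing to spread
    scale = (levels - 1) / (total - cdf_min)
    return [round((cdf[p] - cdf_min) * scale) for p in pixels]


# A low-contrast strip of pixels is stretched to cover the full range.
equalised = equalise_histogram([100, 100, 101, 102])
```

After equalisation the darkest level present maps to 0 and the brightest to 255, so images taken under different overall brightness become more directly comparable.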


3.5     Face Detection

3.5.1   Motivation

Methods like PCA work very effectively assuming the algorithm knows the exact location of the face in
the picture, or the face is centred in the picture [Zhao et al. 2003]. Even a neural network based approach
would perform better if it knew where the face was in the picture. Therefore it is necessary to develop
a system of locating the face in the picture, so that either the whole face or elements of the face can be
easily extracted from the image. Even in the data used in this project, where the faces are approximately
centred, there are still slight lateral or sideways movements of the head which can make extraction of
small features, such as the eyes, difficult.


3.5.2   Methodology

Accurate face detection in images is in itself a problem as complex as recognition of faces. An efficient
library called fdlib has been developed by Kienzle et al. [2005] using support vector machines to locate
the position of faces in images. This library is free to use for research purposes, and compared to other
available libraries, performs faster and is more efficient. Once we have the exact location of the face in
the image, we can centre and normalise the size of the image around the face. The library is being used
instead of developing our own algorithm, to ensure that our detection is efficient enough to not interfere
with the recognition part of the algorithm.


3.6     Feature Extraction

3.6.1   Motivation

Papers such as Rama and Tarres [2005] and Gottumukkal and Asari [2004] use systems of recognition
whereby the face is divided into sections before the recognition is done on each section. By dividing the
face into sections, we can account for the possibility that one or two parts of the face are significantly
different to how they should be. This could be because of the addition or subtraction of glasses, facial
expressions, or even directional lighting on the face. Even if some parts of the face are obscured or
vastly different, the algorithm should still be able to match up the other parts of the face that remain
fairly similar to the training data.


3.6.2   Methodology

The face features can either be extracted manually by cutting out sections of the image, or by using PCA
or similar methods to find eigenvector based features. Because the exact location of the face is
found by the face detection library, we can safely assume that specific face features are going to appear in
specific parts of the image, and we do not need to use further algorithms to find specific features. We can
consequently extract these features, and do any further lighting or size normalisation on them that may
be required. These operations are elementary to perform in Matlab. Each facial feature, once extracted,
will be treated as a separate image.


3.7     PCA on Each Feature

3.7.1   Motivation

PCA or eigenfaces is known to work effectively on entire faces assuming the faces are normalised in
terms of lighting, and are centred in the image with little or no background behind the face [Zhang et al.
1997]. This same principle can be abstracted to smaller parts of the face. Each part of the face can be
done separately, and on ideal data for a particular person, all the parts of their face will match up to all
the parts that are stored in the database. This process can also be done by other algorithms such as neural
networks, however PCA is known to have a lower complexity, and is theoretically able to give results
that are as good as other algorithms.


3.7.2   Methodology

The method of performing principal component analysis is outlined in Section 2.5.1 on page 8. The idea
behind it is to reduce the dimensionality of the problem. Performing PCA on each individual feature
will happen in the same way, except each person will be represented by their projection onto the vector
spaces of each of the individual facial features. This means we will have to store more data for each
person; however, the dimension of each feature image is far smaller than the dimension of the entire face.


3.8     Classification of Faces Using Neural Networks

3.8.1   Motivation

The normal approach to classification, after PCA is performed, is to use a minimum distance algorithm
(or a related algorithm) to find the closest matching person. However when dealing with the output from
multiple images from each of the facial features, it is advantageous to use a more advanced classifier
such as a neural network. A neural network, or an ensemble of neural networks, should be able to find the
subtle differences between different people’s facial features, as well as determine what weighting each
facial feature should have in the classification process.


3.8.2   Methodology

A multi-layer perceptron network will be created to read in the projections of the features of the input
image. Because the projections have much lower dimension than the original image, we will have fewer
inputs into the network, and it should be able to train and run fairly quickly. The output layer of the
network will contain one node for each person stored in the dataset. The network output will be subjected
to the sqr method for classification, which is discussed in Section 2.6.4 on page 12. The threshold value t
used in the sqr method will have to be determined experimentally. Once the algorithm has been created,
further structures of neural networks can be tested, such as ensemble networks. Such changes may make
the system viable for adding subjects to the database.
The implementation of the neural network will be done using the Matlab Neural Network Toolbox. The
toolbox allows complete customisation when creating and training the network, and allows the user to
save and load an already trained network. Many tutorials exist on using this toolbox, and the Matlab help
file contains extensive documentation.


3.9    Implementation and Testing

The algorithm will be tested on the AT&T face database (formerly known as the ORL database). A few
subjects from the database can be seen in Figure 3.1 on the next page. This database contains 40 subjects,
each with 10 different pictures. The pictures contain slight variations in face angle, facial expression and
lighting. The database also has some side movement of the head within the image, i.e. the faces are not
necessarily centred in the image. The database also contains at least one individual that has images with
and without glasses. Various tests will be conducted, where in each test a different set of pictures will be
used for training and for testing. Both learned faces and unseen faces will be used as test images. The
recognition rate, false rejection rate and false acceptance rate will be calculated on each test, and averaged over all the tests.
The successful recognition rate is defined as follows:


\[ \text{Recognition rate} = \frac{\text{Number of correctly recognised images}}{\text{Total number of images tested}} \]

The false rejection rate (FRR) and false acceptance rate (FAR) can similarly be calculated:

\[ \text{FRR} = \frac{\text{Number of authorised images that were rejected}}{\text{Total number of images tested}} \]

\[ \text{FAR} = \frac{\text{Number of unauthorised images that were accepted}}{\text{Total number of images tested}} \]
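
The three rates can be computed directly from a list of test outcomes. The sketch below follows the definitions above, with every rate divided by the total number of images tested; the function name security_rates and the (true label, predicted label) tuple format, with None standing for "not in the database" or "rejected", are assumptions made for the example.

```python
def security_rates(results):
    """Compute (recognition rate, FRR, FAR) from a list of test outcomes.

    Each outcome is a (truth, prediction) pair. `truth` is the person's label,
    or None for an unauthorised (unseen) person. `prediction` is the label the
    system returned, or None if it rejected the image.
    """
    total = len(results)
    correct = sum(1 for truth, pred in results if truth is not None and pred == truth)
    false_rejections = sum(1 for truth, pred in results if truth is not None and pred is None)
    false_acceptances = sum(1 for truth, pred in results if truth is None and pred is not None)
    return correct / total, false_rejections / total, false_acceptances / total


outcomes = [
    ("a", "a"),    # correctly recognised
    ("b", "b"),    # correctly recognised
    ("a", None),   # authorised image rejected
    (None, "b"),   # unauthorised image accepted
    (None, None),  # unauthorised image correctly rejected
]
recognition_rate, frr, far = security_rates(outcomes)
```

Note that an authorised image identified as the wrong person counts towards none of the three numerators in this sketch; whether such cases should be reported separately is a design decision for the evaluation.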



I will implement my proposed method, as well as recognition by a single neural network, and recognition
using the eigenfaces method proposed by Turk and Pentland [1991]. This will allow me to compare the
results of each of the methods on the same database, and under the same conditions. Specifically, I will
need to execute all three methods in the same environment in order to compare running times and training
times of the algorithms.
On each run of each method, I will record:

   • The recognition rate of the algorithm without any thresholding.

   • The recognition rate of the algorithm after thresholding.

   • The false rejection rate after thresholding.

   • The false acceptance rate after thresholding.

   • The training time and running time.

               Figure 3.1: An example of some of the images in the AT&T face database


Each system will be run ten times for each number of people in the database, and an average of the values
listed above will be calculated.
In the following sections I will explain the details of my implementation of each of the 3 methods.


3.9.1   Eigenfaces (PCA)

This recognition method will be implemented according to the steps in Section 2.5.1 on page 8. Each
person entered into the database will have eight pictures used for training, and two pictures used for
testing. The system will be tested with various numbers of people in the database, ranging from 1-20. I
will therefore be able to monitor the performance of the system as I increase the number of people in the
database.
The people and pictures used for training and testing will be selected randomly at each run. Ten people,
not used in the training, will be randomly selected and run on the recognition system in order to determine
the false acceptance rate of the system.
The input to the system will be the face which has been extracted from the main image by the fdlib library
described in Section 3.5.2. A histogram equalisation will also be performed to normalise
lighting.
A Euclidean minimum-distance algorithm will be used to determine the identity of an input image, and
an optimal threshold value will be determined to maximise the recognition rate and minimise the false
acceptance rate. If the Euclidean distance is below the threshold, then the input image will be counted as
recognised, otherwise it will be rejected.


3.9.2   Single Neural Network

A network similar to the one described in Section 2.6.3 on page 11, will be created in Matlab. Again I
will test the system using 1-20 people in the database. Eight images from each person will be used for
training, and two images will be used for testing. The images will be selected at random on each run,
and the people used in the training and testing of the system will be selected at random at each run.
Fifteen people are selected for negative training material for the system, and the remaining five which
are not used in the training are used as unseen tests to determine the false acceptance rate of the system.
A histogram equalisation is performed on the image before inserting into the network, to normalise
lighting. The input to the neural network is a single 80 × 80 pixel image of the face that has been
extracted by the fdlib library. The network therefore has 6400 inputs, and has a single hidden layer with
$2\sqrt{\text{num inputs}} = 2\sqrt{6400} = 160$ neurons in it. The number of neurons in the output layer is equal to the number
of people being recognised by the system.
The output values of the network are in the range [−1, 1]. A suitable threshold value will be determined,
and if the output of the system is above the threshold, the input will be counted as recognised.
The network is implemented as a pattern recognition network in Matlab, and is trained using the Scaled
Conjugate Gradient method. The network is trained until the mean square error is below $1 \times 10^{-9}$.
On certain runs the network is not able to be trained to the required performance function value. After
500 epochs of training the network is checked against all the data it was trained on, and if it has not been
correctly trained, the weights of the network are reset, and the network is retrained.


3.9.3   Modular PCA with Neural Networks for Classification

Again the system will be tested with 1-20 people in the database, and all people and images used for
training and testing are selected at random at the beginning of each run.
An 80 × 80 pixel face will be extracted from the input image using the fdlib library. The image will
then be subjected to the LogAbout algorithm explained in Section 2.3.1 on page 5, in order to normalise
the lighting and prevent lighting changes from interfering with the recognition. Following that, three
sections of the image will be extracted to form three new images: the left eye, the right eye and
the mouth. These facial features are extracted in such a way as to minimise the amount of hair or background
appearing in the picture.
The principal components of each of these images are then found according to Section 2.5.1 on page 8.
The projections of each image onto these principal components, minus the first component, as suggested
by Deng et al. [2005], are then used as training data for the neural network. The size of these vectors, and
thus the number of inputs to the network, is equal to the number of images used in training. However
it may not be necessary to use all of the values, in order to do recognition correctly. We can choose the
largest M of these values, where M is determined experimentally.
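The selection of inputs can be sketched as below. This Python fragment is illustrative only. It assumes that the projection coefficients are ordered by decreasing eigenvalue and that "the largest M" refers to the M components with the largest eigenvalues after the first has been discarded. The name `select_features` is hypothetical.

```python
from typing import List, Sequence


def select_features(projection: Sequence[float], m: int) -> List[float]:
    """Drop the first principal component and keep the next `m` coefficients.

    Assumes `projection` is ordered by decreasing eigenvalue, so that
    index 0 is the component to be discarded.
    """
    if m < 0:
        raise ValueError("m must be non-negative")
    return list(projection[1:1 + m])
```

For example, `select_features([9.0, 4.0, 3.0, 2.0, 1.0], 3)` keeps the second, third and fourth coefficients.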
Again the network will have a single hidden layer with $2\sqrt{\text{num inputs}}$ neurons. The number of outputs
to the network is equal to the number of people in the database, and the output will be fed through the
sqr function described in Section 2.6.4 on page 12 in an attempt to improve the false acceptance rate.
The network is implemented as a pattern recognition network in Matlab, and is trained using the Scaled
Conjugate Gradient method. The network is trained until the mean square error is below $1 \times 10^{-9}$.
On certain runs the network is not able to be trained to the required performance function value. After
training the network is checked against all the data it was trained on, and if it has not been correctly
trained, the weights of the network are reset, and the network is retrained.


3.9.4   Testing the Hypothesis

One can test my research hypothesis by determining whether the method of modular PCA and neural
networks performs better than the other methods outlined above. Performance will be judged against
the requirements of a security system, i.e. a low false acceptance rate and a reasonable recognition
rate, as well as relatively short training and running times.


3.10     Neural Network Structures and Future Work

Once the algorithm has been constructed, we can examine different types of neural network structures
in an attempt to reduce our false recognition rate and our false acceptance rate. The main testing will be
done on ensemble neural networks discussed in Section 2.6.2 on page 10. The idea of an ensemble of
neural networks is that each network is associated with one person stored in the database. The network
is trained to recognise only one particular person. This structure allows the system to be extended more
easily, and aids in adding and removing people from the system. Once I have the results I can determine
if this method would be statistically viable. By examining the recognition rate and the false acceptance
rate of the systems which have been trained on only one person, we can calculate the false acceptance
rate and recognition rate of a system consisting of an ensemble of these neural networks.


3.11     Conclusion

This chapter defines the research questions and the research hypothesis. It also provides a step–by–step
guide to how the research will be conducted and implemented. Each step is motivated and an explanation
of why it is important is given. The specific method that will be performed in each step is explained. I
also give details of how each method will be implemented, and how I will test the research hypothesis.
I also look at possible future work and how we can determine if an ensemble neural network would be
viable based on the results of other neural network structures.




Chapter 4

Results

4.1    Introduction

In this chapter I discuss the results and findings of my research method that is discussed in the previous
chapter. I display the “pure” recognition rate of the three different systems I am comparing, and the
recognition rate and false acceptance rate after applying thresholding techniques, which are necessary
for security systems. I will show how my proposed method for recognition combines the accuracy of
neural networks with the speed of PCA. I also discuss the false acceptance rates of each method, as well
as the training and running time of each method, and with these results in mind, I discuss the efficiency
of the three methods in terms of security purposes, and outline some of their strengths, and where they
fall short.
I also show theoretical results for the creation of an ensemble of neural networks, and discuss what needs
to be changed on my current networks in order to improve the performance of an ensemble recognition
system, which can be implemented as part of future work. I also discuss why it is necessary to consider
ensemble networks in order to maintain a high recognition rate when many people are added to the
database.
The Excel files with each run of the results can be found at
http://www.cs.wits.ac.za/~youngr/files/results/
and the Matlab files used to run the results can be found at
http://www.cs.wits.ac.za/~youngr/files/code/.


4.2    Recognition Rate

Table 4.1 compares the “pure” recognition rates of the three methods. We can see that the neural network
method outperforms the other two methods, and also has the smallest standard deviation, meaning it is
more consistent in its results. My proposed method has the second highest rate, at 96.09%. The PCA method
has the lowest rate of the three; however, all three methods perform the task of recognition relatively well.

                    Table 4.1: Comparison of Recognition Rates Without Thresholding

      (%)                       PCA      Std dev    NN       Std dev    Mod PCA + NN    Std dev
      Recognition rate          91.63    6.82       97.99    3.04       96.09           3.75
      False recognition rate    8.37     –          2.01     –          3.91            –




Figure 4.1 shows how the “pure” recognition rates change as we add more people to the system. All three
methods of recognition have a slight drop in performance as more people are added. The neural network
method has an advantage over the other two methods, as the other two methods lose a lot of information
when the principal components are created, and adding more people means losing more information.
The main downfall of the neural network method is the amount of time it requires for training, but this is
discussed in more detail in Section 4.5 on page 24.
Assuming the recognition rates drop linearly, as they appear to, all three systems would easily be able to
distinguish between 80–100 people before the recognition rate becomes too low to be useful.

[Figure 4.1: Recognition rates before thresholding. Plot of recognition rate (%) against number of people (1 to 20) for the PCA, NN and Mod PCA + NN methods.]


4.3    Improving the Recognition and False Acceptance Rates

One of the research questions is to investigate methods of reducing the false acceptance rate and
maximising the recognition rate. From Chapter 2 on page 3 we can identify two specific methods that have
been used in other research papers to improve the accuracy of the system. These methods are image
preprocessing, specifically the LogAbout method, and the sqr function for the output of the neural
network.
Table 4.2 on the next page shows the results of the system proposed by this paper (Modular PCA with
neural networks), with and without these added features, in order to determine if they do in fact improve
the accuracy of the system. The recognition rate is determined after the implementation of thresholding.




          Table 4.2: Methods of Improving the Recognition Rate and False Acceptance Rate

             (%)                                Recognition rate    False acceptance rate
             Unmodified method                  76                  5.49
             Adding LogAbout preprocessing      79                  3.99
             Adding sqr function                84                  3.67



One can see that the system does benefit substantially from these added features, and adding other
methods may further improve the results. These results were obtained by running the system with five people
in the database, and the values were taken as an average of 10 runs. The last value in the table correlates
reasonably with the values displayed in Table 4.3.


4.4    False Recognition Rate and False Acceptance Rate

In order for a security face recognition system to function, it needs to reject people who are not
authorised by the system. This is done by means of a thresholding mechanism. If the recognition value
of the person who is recognised is below a predetermined threshold, then they are rejected. However, this
can adversely affect the recognition rate of the system. A person may be recognised correctly, but if
their recognition value is below the threshold, then the system cannot guarantee their authenticity, and
they must be rejected.
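The effect of the threshold on both rates can be illustrated with a short Python sketch. It is not the Matlab code used to produce the results. The function name, the data layout and the example scores are assumptions made purely for illustration.

```python
from typing import Sequence, Tuple


def threshold_rates(
    genuine: Sequence[Tuple[float, bool]],
    impostor_scores: Sequence[float],
    threshold: float,
) -> Tuple[float, float]:
    """Return (recognition rate, false acceptance rate) as percentages.

    `genuine` holds one (recognition value, correctly identified) pair per
    test image of an authorised person; `impostor_scores` holds the
    recognition value for each test image of an unauthorised person.
    """
    if not genuine or not impostor_scores:
        raise ValueError("both test sets must be non-empty")
    # An authorised person counts only if identified correctly AND above the threshold.
    recognised = sum(1 for score, correct in genuine if correct and score > threshold)
    # An unauthorised person is falsely accepted if their value clears the threshold.
    accepted = sum(1 for score in impostor_scores if score > threshold)
    return (100.0 * recognised / len(genuine),
            100.0 * accepted / len(impostor_scores))
```

Raising the threshold lowers the false acceptance rate but also rejects correctly identified people whose recognition value is low, which is the trade-off described above.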
In Table 4.3 the values of the three systems (mentioned in Section 3.9 on page 16) are compared. The
values are obtained by taking an average of 10 runs for each number of people in the system, where
the number of people ranges from 1 to 20. On each run, the training data, and testing data are selected
randomly.

                     Table 4.3: False Recognition Rate and False Acceptance Rate


 (%)                                   PCA     Std dev    NN       Std dev    Mod PCA + NN        Std dev
 Recognition rate above threshold      79.90     11.31    77.96      11.27    80.69                  9.40
 False recognition rate                20.1          –    22.04          –    19.31                     –
 False acceptance rate                 31.33     12.88    2.34        2.99    3.82                   3.21



The data in Table 4.3 shows that after thresholding, my proposed method for recognition obtains the best
recognition rate. I believe this is due to the segmentation of the face, which excludes volatile features
such as the nose and hair, that change appearance easily depending on the orientation of the head. The
false acceptance rate of the neural network system is slightly lower than my system. The PCA method,
while obtaining a relatively good recognition rate, has a very high false acceptance rate of 31.33%.
This value is far too high to be used in a security system, as it means that roughly one third of people
who are unauthorised will be accepted by the system. While even the 2.34% and 3.82% of the neural
network system and my method respectively are too large for extremely high security systems where
many unauthorised persons may try their luck, there are many applications where they are low enough
to be used successfully.
My system and the neural network system perform relatively equally. They can both be optimised more
in terms of training and structure of the neural network. However my system is able to train much faster
than the neural network system, due to the smaller dimension of the input data. This is discussed in more
depth in Section 4.5 on page 24.


The graph in Figure 4.2 shows how the recognition rate after thresholding changes as we add more
people to the database. Again we see a slight decrease in all methods, however the neural network
method seems to decrease the least as we add more people.
Figure 4.3 on the following page shows the change in the false acceptance rate of all three systems as
the number of people being recognised increases. The PCA method is far higher than the others and
has a very high deviation from the mean, making it very unpredictable. Interestingly, the false acceptance
rates of the neural network method and my method remain fairly flat: they do not increase much as we
add more people, although one might expect them to. This suggests that we can retain a fairly high
standard of security even on large systems with hundreds of people.

[Figure 4.2: Recognition rate after thresholding for the three methods. Plot of rate (%) against number of people (1 to 20) for the PCA, NN and Mod PCA + NN methods.]




[Figure 4.3: False acceptance rate for the three methods. Plot of rate (%) against number of people (1 to 20) for the PCA, NN and Mod PCA + NN methods.]


4.5    Timing

In this section we look at the time each method takes to run, as well as how long each method takes to
train. The timing results were obtained under a controlled environment. The computer used to obtain the
results was an Intel Core 2 Duo T5270 1.40GHz, with 2GB of memory, running Ubuntu Linux 10.10,
and Matlab 7.11.0 (R2010b).

                                  Table 4.4: Comparison of Training and Running Time

             (seconds)                   PCA    Std dev        NN         Std dev    Mod PCA + NN            Std dev
             Average training time       8.60      0.09        128.71       63.27    50.63                     14.83
             Average running time        0.11     0.002        0.16         0.002    0.20                      0.003



In terms of training time, PCA is by far the quickest method. My method has an average training time
of 50.63 seconds with 1-20 people in the database. The neural network method has the longest average
training time of 128.71 seconds, more than twice as long as my method. By looking at the graph in
Figure 4.4 on the next page we can see that the neural network training has a very high complexity,
and can be very unpredictable in the amount of time it takes to train. My method is not as fast as
PCA, however it has a lower complexity than the neural network method, and a much smaller standard
deviation, making it easier to predict how long it will take to train.




[Figure 4.4: Average training times. Plot of time (seconds) against number of people (1 to 20) for the PCA, NN and Mod PCA + NN methods.]


From Table 4.4 on the preceding page we can see that the average running times (which were taken as
an average of 10 runs for each number of people in the database) are all relatively small. With all of them
much less than one second, they are all able to do recognition in real time. Looking at Figure 4.5 on the next page,
we can see that the neural network method’s running time remains constant. The other two methods, that
both make use of PCA, increase linearly. My method has a higher running time due to it running PCA
and a neural network, but it is higher than the PCA method by only a constant amount of time. Even
with the linear increase in running time, my system would be able to hold more than 100 people before
the running time to do recognition increases to more than one second.




[Figure 4.5: Average running times. Plot of time (seconds) against number of people (1 to 20) for the PCA, NN and Mod PCA + NN methods.]


4.6    Alternate Neural Network Structures and Future Work

We can investigate using other neural network structures, in particular what is known as an ensemble
of neural networks. This system could be implemented and optimised as a part of future work. Neural
network based methods generally have a lower false acceptance rate and a higher recognition rate for
smaller numbers of people stored in the system. The latter is shown to be true in my data in Figure 4.2
on page 23. We can use this to our advantage by using a series of neural networks (one for each person)
as opposed to just one single network that degrades in performance as more and more people are added.
Using the values shown in Table 4.3 on page 22, we can calculate statistically if an ensemble of neural
networks could be viable. It should be noted however that these results were calculated on networks that
were not optimised to recognise only a single person.

                    Table 4.5: Theoretical results of alternate neural network structures

                                         1 person    5 people    10 people    15 people    20 people
      Modular PCA with neural network:
      Recognition rate (%)               85          85          85           85           85
      False acceptance rate (%)          1.5         7.28        14.03        20.28        26.09
      Training time (s)                  18.43       92.15       184.3        276.45       368.6
      Running time (s)                   0.167       0.835       1.67         2.505        3.34
      Neural network method:
      Recognition rate (%)               85          85          85           85           85
      False acceptance rate (%)          0.96        2.96        5.84         8.63         11.34
      Training time (s)                  45.23       226.15      452.3        678.45       904.6
      Running time (s)                   0.156       0.78        1.56         2.34         3.12



In Table 4.5 on the preceding page, the recognition rate remains the same for any number of ensemble
networks. This is because only one network can correctly recognise a person. If any other network
recognises a person other than its own, it is counted as a false recognition. The false acceptance rate
is calculated as follows. Assume that a “random” input has a probability γ of being falsely accepted by
any one network, equal to that network's false acceptance rate, and that the input is run through all n
neural networks in the system. Treating the networks as independent, the input is rejected only if every
network rejects it, so the false acceptance rate of the entire system containing n networks is:
                                          $\mathrm{FAR}_n = 1 - (1 - \gamma)^n$
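As a worked check of this formula, the following Python sketch evaluates it for the single-network false acceptance rate of 1.5% listed for the modular PCA method in Table 4.5. The function name `ensemble_far` is illustrative, and independence between the networks is assumed.

```python
def ensemble_far(gamma: float, n: int) -> float:
    """False acceptance rate of an ensemble of n networks.

    `gamma` is the false acceptance probability of a single network; the
    networks are assumed to accept or reject independently of each other.
    """
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must be a probability")
    if n < 0:
        raise ValueError("n must be non-negative")
    # The input is rejected only if all n networks reject it.
    return 1.0 - (1.0 - gamma) ** n


if __name__ == "__main__":
    for n in (1, 5, 10, 15, 20):
        print(n, round(100 * ensemble_far(0.015, n), 2))
```

Rounded to two decimal places, these values agree with the false acceptance rates in the modular PCA rows of Table 4.5 (1.5, 7.28, 14.03, 20.28 and 26.09).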
The running and training times are simply multiplied by the number of networks, as each network needs
to be trained and run in succession. The networks for recognising a single person do not need to be
as complex as the networks used in these methods, and we can expect an implemented and optimised
version to have significantly smaller training and running times. Neural networks can also be trained and
run in parallel. We can therefore make use of multi-core architectures to speed up the process.
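The timing estimates in Table 4.5, and the possible gain from parallel execution, can be sketched as follows. The sequential case follows the multiplication described above. The parallel case is an idealised assumption (equal time per network and no scheduling overhead) and is not a measured result. The name `ensemble_time` and the `workers` parameter are illustrative.

```python
import math


def ensemble_time(per_network_seconds: float, n_networks: int, workers: int = 1) -> float:
    """Estimated time to train or run an ensemble of networks.

    With one worker the networks are processed in succession. With several
    workers the networks are assumed to be processed in equal-sized batches
    with no overhead.
    """
    if n_networks < 0 or workers < 1:
        raise ValueError("n_networks must be >= 0 and workers must be >= 1")
    batches = math.ceil(n_networks / workers)
    return batches * per_network_seconds
```

For the modular PCA method, `ensemble_time(18.43, 5)` gives the 92.15 seconds of training time shown for five people. Under the idealised model, 20 networks spread over 4 workers would take the same time.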
A further optimisation that could be looked at is to apply more negative training data. More negative
training data gives the network a more general idea of what not to accept, therefore decreasing the false
acceptance rate.
In an ensemble of networks, each network does not necessarily have to recognise only one person. If
each network was to recognise five people, we would only need four networks to recognise 20 different
people.
What we gain from ensemble networks is that our recognition rate does not decrease as we add more
people to the system. However we suffer on training times, and on false acceptance rates. If we could
optimise our network to have an almost zero false acceptance rate for recognising a single person, then
an ensemble of networks may be more beneficial for recognising large numbers of people.
If we were interested only in the “pure” recognition rate, as displayed in Table 4.1 on page 20, then
we could use ensemble networks, as there is no false acceptance rate, and we would retain our high
recognition rate, even after adding many people to the database.
In conclusion, these two methods, in their current state, are not well suited to creating an ensemble
network system. However, it would not be hard to tweak the networks and optimise them for recognition of a
single person, which would make them suitable for a security-based ensemble neural network recognition
system.


4.7    Conclusion

This chapter has displayed the results of my own proposed method of modular PCA with neural networks,
and compared it to the results of both eigenfaces and standard neural networks for recognition. My
proposed method performs about as well as the neural network system overall, and outperforms it
in recognition rate after thresholding and in training time. Each method’s strengths and weaknesses are
discussed, and possible improvements, such as increasing the negative training data, have been suggested.
I have shown my research hypothesis (in Section 3.3 on page 13) to be correct, by showing that we
can create a system using modular PCA with neural networks that is able to perform relatively well for
security purposes, when compared to other well known methods.
I also discuss the possibility of implementing an ensemble of neural networks for recognition, in an
attempt to improve the recognition rate when large numbers of people are inserted into the database.
Theoretical results based on my existing results are calculated, however the actual implementation and
optimisation of the system is left as future work.



Chapter 5

Conclusion

In conclusion we can see that while there have been many attempts at improving current face recognition
systems, there is still a long way to go until we obtain a system that is robust and efficient enough to be
used for security purposes. Ongoing research focuses on improving only one aspect of the recognition,
and very little effort has been made to combine methods to produce an all–round efficient system. A
further problem is that most systems are never tested on real–world data, and very few systems deal with
false acceptance rates.
A security based face recognition system will encounter its own distinct problems. Firstly, we can expect
a security system to be used for an extended period of time, and over this time, the appearance of a
person’s face is likely to change. These changes include new hairstyles, addition or removal of facial
hair and glasses, or even natural ageing. Apart from the changes in appearance of the face, other factors
such as lighting and angle may affect the system. The ambient lighting on the face cannot be assumed to
always be constant. Natural light will change its intensity and direction depending on the time of day or
year, causing shadows to fall on different parts of the face.
For security purposes, we would rather have our system be strict in the selection process of
who is and is not authorised. It is better to occasionally deny access to an authorised person (slightly
inconveniencing them), than to allow access to an unauthorised person, which defeats the purpose of the
entire system. However if the system prevents access to authorised persons too frequently, the system
becomes an inconvenience, and it is more beneficial to use another form of security control. The system
also needs to be able to do recognition in real–time. It is not acceptable for the system to take hours or
even minutes to do recognition on a face, even when the database contains hundreds of individuals. The
system should be easily extendible, allowing for individuals to easily be added and removed.
Security purposes are not limited to determining access to an area or particular system. Face recognition
can be implemented on security cameras to search for wanted individuals, or even to ensure the correct
people are in the correct place at the correct time. Such recognition becomes even more difficult, as it
requires the tracking of the face in the video stream, and the face may not always be at the correct angle.
This particular application is not explored much in this research, however, parts of this research can be
applied to it.
This document gives a good background into the area of face recognition and analyses the faults and
successes of previously done methods. We motivate why it is necessary to conduct research into face
recognition, and describe some common problems encountered in face recognition. We propose and
test a method of face recognition for security purposes. Each subtask in the method is explained and
motivated. A research hypothesis has been formed from a set of research questions, and the method of
research is conducted in such a way as to test the hypothesis.
The results of this document show that my proposed system is capable of a relatively high recognition
rate, and has a low enough false acceptance rate to be used in most security systems, especially systems
that will not encounter a large number of unauthorised persons. The data I used is not as strict as is often
required by security systems, and I believe that eliminating “problem data sets”, such as people with
highly angled heads, would improve my system even further.
There are many possible optimisations to the system that have not been discussed in this paper. These
optimisations deal mostly with improving the time taken, and the structure and training of the neural
network. This paper shows that we can combine existing methods of recognition to create new systems
that have improved results.




References

[Bryliuk and Starovoitov 2002] Dmitry Bryliuk and Valery Starovoitov. Access Control By Face Recog-
      nition Using Neural Networks. In The 2nd International Conference on Artificial Intelligence,
      pages 428–436, 16–20 September 2002.

[Deng et al. 2005] H. B. Deng, L. W. Jin, L. X. Zhen, and J. C. Huang. A New Facial Expression
     Recognition Method Based on Local Gabor Filter Bank and PCA plus LDA. International Journal
     of Information Technology, 11(11):86–96, 2005.

[Gottumukkal and Asari 2004] Rajkiran Gottumukkal and Vijayan K. Asari. An Improved Face
      Recognition Technique Based on Modular PCA Approach. Pattern Recognition Letters,
      25(4):429–436, 2004.

[Gross and Brajovic 2003] R. Gross and V. Brajovic. An Image Preprocessing Algorithm for Illumi-
      nation Invariant Face Recognition. In 4th International Conference on Audio and Video Based
      Biometric Person Authentication, pages 10–18, June 2003.

[Kienzle et al. 2005] W. Kienzle, G. Bakir, M. Franz, and B. Schölkopf. Face Detection - Efficient and
      Rank Deficient. Advances in Neural Information Processing Systems, 17:673–680, 2005.

[King and Xu 1997] I. King and L. Xu. Localized Principal Component Analysis Learning for Face
      Feature Extraction and Recognition. In Proceedings to the Workshop on 3D Computer Vision,
      pages 124–128, 1997.

[Lawrence et al. 1996] Steve Lawrence, C. Lee Giles, Ah Chung Tsoi, and Andrew D. Back. Face
     Recognition: A Hybrid Neural Network Approach. Technical report, University of Maryland,
     August 1996.

[Liu et al. 2001] H. Liu, W. Gao, J. Miao, D. Zhao, G. Deng, and J. Li. Illumination Compensation and
      Feedback of Illumination Feature in Face Detection. In Proceedings of the IEEE International
      Conference on Info-tech and Info-net, volume 23, pages 444–449, 2001.

[Mendona et al. 2007] M. Mendonça, J. Denipote, R. Fernandes, and M. Paiva. Illumination
     Normalization Methods for Face Recognition. In Proceedings of 20th Brazilian Symposium on
     Computer Graphics and Image Processing, 2007.

[Rama and Tarres 2005] A. Rama and F. Tarres. Lophoscopic PCA: A Novel Method For Face
     Recognition. Technical report, Departament Teoria del Senyal i Comunicacions de la Universitat
     Politècnica de Catalunya (UPC), 2005.

[Shih and Chuang 2004] Frank Y. Shih and Chao-Fa Chuang. Automatic extraction of head and face
      boundaries and facial features. Information Sciences, 158:117–130, 2004.

[Turk and Pentland 1991] M. Turk and A. Pentland. Eigenfaces for Recognition. Journal of Cognitive
      Neuroscience, 3(1):71–86, 1991.



[Zhang et al. 1997] Jun Zhang, Yong Yan, and Martin Lades. Face Recognition: Eigenface, Elastic
     Matching, and Neural Nets. Proceedings of the IEEE, 85(9):1423–1435, September 1997.

[Zhao et al. 2003] W. Zhao, R. Chellappa, P. J. Phillips, and A. Rosenfeld. Face Recognition: A Literature
     Survey. ACM Computing Surveys, 35(4):399–458, December 2003.



