					 A Real-Time Face Recognition System
    Using Custom VLSI Hardware


V.Raviteja                      G.Ravichandra
raviteja470@yahoo.co.in         ravichandra_gurram@yahoo.com

A real-time face recognition system can be implemented on an IBM compatible personal computer with a video camera, image digitizer, and custom VLSI image correlator chip. With a single frontal facial image under semi-controlled lighting conditions, the system performs (i) image preprocessing and template extraction, (ii) template correlation with a database of 173 images, and (iii) postprocessing of correlation results to identify the user. System performance issues including image preprocessing, face recognition algorithm, software development, and VLSI hardware implementation are addressed. In particular, the parallel, fully pipelined VLSI image correlator is able to perform 340 Mop/second and achieve a speedup of 20 over optimized assembly code on an 80486/66DX2. The complete system is able to identify a user from a database of 173 images of 34 persons in approximately 2 to 3 seconds. While the recognition performance of the system is difficult to quantify simply, the system achieves a very conservative 88% recognition rate using cross-validation on the moderately varied database.

Humans are able to recognize faces effortlessly under all kinds of adverse conditions, but this simple task has been difficult for computer systems even under fairly constrained conditions. Successful face recognition entails the ability to identify the same person under different circumstances while distinguishing between individuals. Variations in scale, position, illumination, orientation, and facial expression make it difficult to distinguish the intrinsic differences between two different faces while ignoring differences caused by the environment. Even when acceptable recognition has been accomplished with a computer, the actual implementation has typically required long run times on high performance workstations or the use of expensive supercomputers. The goal of this work is to develop an efficient, real-time face recognition system that would be able to recognize a person in a matter of a few seconds.

Face recognition has been the focus of computer vision researchers for many years. There are two basic approaches to face recognition, (i) parameter-based and (ii) template-based. In parameter-based recognition, the facial image is analyzed and reduced to a small number of parameters describing important facial features such as the eye shape, nose location, and cheek bone curvature. These few extracted facial parameters are subsequently compared to a database of known faces. Parameter-based recognition schemes attempt to develop an efficient representation of salient features of an individual. While the database search and comparison for parameter-based recognition may not be computationally intensive, the image processing required to extract the appropriate parameters is quite computationally expensive and requires careful selection of facial parameters which will unambiguously describe an individual’s face.

The applications for a face recognition system range from simple security to intelligent user interfaces. While physical keys and secret passwords are the most common and conventional methods for identification of individuals, they impose an obvious burden on users and are susceptible to fraud. In contrast, biometrics systems attempt to identify persons by utilizing inherent physical features of humans such as fingerprints, retinal patterns, and vocal characteristics. Effective biometrics identification systems should be easy to use and less susceptible to fraud. In particular, facial features are an obvious and effective biometrics of individuals, and the ability to recognize individuals from their faces is an integral part of human society. While any computer (or human) face recognition system has obvious limitations such as identical twins or masks, face recognition could be used in combination with other biometrics or security systems to provide a much higher level of security surpassing that of any individual system. However, the primary advantage of face recognition is likely to be its non-invasive nature and socially acceptable method for identifying individuals, especially when compared with fingerprint analysis or retinal scanning.

FACE RECOGNITION TASK:

[Figure 1: Overall Processing Data Flow]

The face recognition system was based in large part on a template-based face recognition
algorithm described by Brunelli and Poggio [2]. The actual recognition process can be broken down into three distinct phases: (i) image preprocessing and template extraction and normalization, (ii) template correlation with the image database, and (iii) postprocessing of correlation scores to identify the user with high confidence. From a single frontal facial image under semi-controlled lighting conditions and a limited number of facial expressions, the system can robustly identify a user from an image database of 173 images of 34 persons. While the recognition performance of the system is difficult to quantify simply, the system achieves a very conservative 88% recognition rate using cross-validation on the moderately varied database.

IMAGE PREPROCESSING:

Image preprocessing entails transforming a 512x480 grey-level image into four intensity normalized templates corresponding to the eyes, nose, mouth, and the entire face (excluding hair, ears, etc.) of the user. The regions of the image corresponding to the templates are located by finding the user’s eyes and normalizing the image scale based on the eye positions and inter-ocular distance.

EYE LOCATION:

Locating eyes in a visually complex image in real-time is a formidable task. The goal of the real-time face recognition system is to operate in such a manner as to minimally constrain the user’s position within the image. This requires the ability to find the eyes at varying scales over a range of locations in the image. Since the accuracy of the eye location affects the extraction of the templates, and thus the correlation and recognition, the location process must be precise. The location process is divided into two parts - rough location and refinement. The rough location phase quickly scans the image and generates a list of candidate eye locations. The rough eye location algorithm is based on the observation
that an eye is distinguished by the presence of a large dark blob, the iris, surrounded by smaller light blobs on each side, the whites. However, under certain lighting conditions, highlights within the eyes need to be removed and can also be used as additional cues for eye location. When coupled with sufficient high-level constraints on the relative positions of the blobs and an acceptable measure of the "blobbiness", this simple system performs remarkably well. The refinement stage then looks more closely at these areas to determine more exactly the best fit for an eye, given inter-ocular constraints. The refinement process not only assigns a more exact location to each of the candidate eyes, but also assigns a radius to the iris (see Figure 3). This allows more selective pruning by imposing the restriction that the two eyes be of similar size. In addition, the inter-ocular spacing is constrained to a distance proportional to the eye size.

NORMALIZATION:

Once the eyes are located, subsampled templates of the face, eyes, nose, and mouth are extracted (see Figure 4). The inter-ocular distance is taken as a scaling factor, and the inter-ocular axis is normalized to be horizontal. The four regions of the image are determined by fixed ratios and offsets relative to the eyes. Skewless affine transformations are used to scale and rotate the four areas of the image into the four templates. When multiple image pixels correspond to a single template pixel, averaging is employed. The template sizes are fixed but tailored to the size of the region from which they are extracted. The face template is 68×68, the eye template is 68×34, and the nose and mouth templates are each 34×34. The template size governs the accuracy and speed of the database search. Choosing templates that are too small results in a loss of information. Choosing templates that are too large makes the extraction and correlation processes run slowly. In addition, the registration and alignment errors between the templates become more severe with larger template sizes.

Once the templates have been extracted, they must be normalized for variations in lighting to ensure accurate correlation between the templates. If the image intensity is used directly, a dark image of one person could match better with a dark image of a different person than with a light image of the same person. Since the lighting conditions prevailing at the time of the image database creation may be different from those at the time of recognition, insensitivity to lighting conditions is crucial. Two types of template intensity normalization are employed: local normalization and global normalization. Local normalization entails dividing the pixel intensity at a given point by the average intensity in a surrounding neighborhood. This is roughly equivalent to spatially high pass filtering the template data and removes intensity gradients caused by non-uniform lighting. Global normalization consists of determining the mean and standard deviation of the template and normalizing the pixel values to compensate for low variance due to dim lighting or image saturation.

TEMPLATE CORRELATION WITH IMAGE DATABASE:

After the facial image of the user has been preprocessed to obtain the normalized templates, the templates are compared to those in an image database of known persons. Templates are compared to those in the database by a robust correlation process to compensate for possible registration errors. In particular, the template is compared to database images over a range of 25 different alignments corresponding to spatial shifts between +2 and -2 pixels in both the horizontal and vertical directions. While absolute-difference correlation is more efficient than multiplication-based correlation, it is still a time consuming process. Each set of four templates consists of roughly 10,000 pixels. Thus each template comparison over the 25 different alignments requires approximately 250,000 absolute value and sum operations. An Intel 80486/66DX2 running optimized assembly code can only perform roughly 5 million integer absolute
value and sum operations per second including data movement and other overhead. This would seem to limit the database search rate to 20 template sets per second, severely constraining the size of the database possible for real-time operation. To relax this limit, a faster search at less than full resolution is run first. Its results are not accurate enough to generate a definitive answer, but can be used to narrow the individual’s identity to ten candidates in a fraction of the time that a full-resolution search requires. The top ten candidates are then compared at full resolution to the unknown individual to yield the final result.

POSTPROCESSING OF CORRELATION SCORES:

The correlation of the normalized extracted templates from the target image with the database templates generates a list of the top ten candidates and their correlation scores. The task of the postprocessing stage is to interpret the corresponding correlation scores and determine if they indicate a match with someone previously stored in the image database. Typically this is not a clear-cut decision; therefore decisions have an associated measure of confidence. The goal is to recognize as many images as possible while missing and mistakenly recognizing as few images as possible. An image is recognized if the system correctly identifies it as corresponding to someone who is in the database. An image is missed if the user is in the database and the system fails to identify him or her. Finally, an image is mistakenly recognized if the system claims that the user corresponds to a person in the database, and the user is actually a different person in the database or is not represented in the database. Postprocessing attempts to maximize the recognition rate while minimizing the missed and mistaken recognition rates by interpreting the raw correlation scores with an intelligent and robust decision making process.

The 15 correlation scores and pseudo-scores for each of the ten candidates must then be interpreted to determine which, if any, of the candidates match the input image.
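
The local and global intensity normalization and the shifted absolute-difference correlation described in the NORMALIZATION and TEMPLATE CORRELATION sections can be sketched in a few lines of Python. This is an illustrative sketch only, not the system's 80486 assembly or VLSI implementation. The function names, the neighborhood radius of 2, the use of floating point rather than 8-bit integers, and the choice to average the absolute differences over the overlapping region of the two templates are all assumptions made for the example.

```python
def local_normalize(template, radius=2):
    """Divide each pixel by the mean of its neighborhood (radius is an assumed value)."""
    height, width = len(template), len(template[0])
    result = []
    for y in range(height):
        rows = range(max(0, y - radius), min(height, y + radius + 1))
        out_row = []
        for x in range(width):
            cols = range(max(0, x - radius), min(width, x + radius + 1))
            mean = sum(template[j][i] for j in rows for i in cols) / (len(rows) * len(cols))
            out_row.append(template[y][x] / mean if mean else 0.0)
        result.append(out_row)
    return result


def global_normalize(template):
    """Rescale the template to zero mean and unit standard deviation."""
    pixels = [p for row in template for p in row]
    mean = sum(pixels) / len(pixels)
    variance = sum((p - mean) ** 2 for p in pixels) / len(pixels)
    std = variance ** 0.5 or 1.0  # a flat template has zero deviation; avoid dividing by 0
    return [[(p - mean) / std for p in row] for row in template]


def shifted_abs_diff(a, b, dx, dy):
    """Mean absolute difference between a[y][x] and b[y + dy][x + dx] over the overlap."""
    height, width = len(a), len(a[0])
    total, count = 0.0, 0
    for y in range(max(0, -dy), min(height, height - dy)):
        for x in range(max(0, -dx), min(width, width - dx)):
            total += abs(a[y][x] - b[y + dy][x + dx])
            count += 1
    return total / count


def best_alignment_score(a, b, max_shift=2):
    """Smallest difference over all (2 * max_shift + 1) ** 2 = 25 alignments."""
    shifts = range(-max_shift, max_shift + 1)
    return min(shifted_abs_diff(a, b, dx, dy) for dy in shifts for dx in shifts)
```

In this sketch a lower value means a closer match, and searching all 25 alignments lets a template that is misregistered by a pixel or two still score well against the correct database entry.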

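The three outcomes defined in the postprocessing discussion (recognized, missed, and mistakenly recognized) can be tallied mechanically once the true identity and the system's claim are known for each trial. The following is a minimal sketch in Python, not the system's actual decision logic. The names `classify_outcome` and `tally`, the use of `None` for "no match reported", and the fourth `correct_rejection` label (an unknown user whom the system declines to identify, a case the text does not name) are assumptions made for illustration.

```python
from collections import Counter


def classify_outcome(true_id, claimed_id, database_ids):
    """Label one trial.

    true_id:      identity of the person in front of the camera
    claimed_id:   identity reported by the system, or None if it reports no match
    database_ids: set of identities enrolled in the database
    """
    enrolled = true_id in database_ids
    if claimed_id is None:
        # No claim: a miss if the user was enrolled, otherwise a correct rejection
        # (the latter label is an assumption, not a term from the text).
        return "missed" if enrolled else "correct_rejection"
    if enrolled and claimed_id == true_id:
        return "recognized"
    # The system named someone, but it was the wrong person or an unknown user.
    return "mistakenly_recognized"


def tally(trials, database_ids):
    """Count outcomes over an iterable of (true_id, claimed_id) pairs."""
    return Counter(classify_outcome(t, c, database_ids) for t, c in trials)
```

Counting outcomes this way makes the trade-off explicit: raising the bar for a positive match converts some mistaken recognitions into misses, and lowering it does the reverse.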
The system hardware consists of an IBM PC 80486/DX2, a commercial frame grabber, video camera, and custom VLSI hardware (see Figure 6). The goal of the hardware system architecture is to extract the highest performance from those components.

Software implementation of the face recognition system described above on an IBM PC will be limited by a computational bottleneck associated with the image database correlation. Benchmarks on an Intel 80486/66DX2 system (see Table I) reveal that real-time performance in software alone would not be possible with a moderately sized database of 500 images. Thus, in order to achieve real-time performance, a special purpose VLSI image correlator was implemented and integrated into the system as a coprocessor board on the ISA bus.

The image preprocessing and template extraction are performed by the 80486, the template correlation with the database is accelerated by using the VLSI image correlator, and postprocessing is subsequently performed by the 80486. The 80486 provides a flexible platform for general computation while the VLSI image correlator is fully optimized for a single operation, template correlation with the image database. The database correlation task is to compute the correlation of one template set against the entire database. The user’s templates remain constant throughout the entire operation while the database templates vary as each known individual is considered in succession. Thus, the user’s templates can be cached using local SRAM on the image coprocessor board to optimize the usage of the 8 MByte/sec ISA bus bandwidth (see Figure 7). Furthermore, since the image template data are only 8 bits wide, two templates can be transferred in parallel to take full advantage of the 16 bit data bus.

Thus, the VLSI correlator chip is designed with two independent image correlators such that two database entries can be correlated simultaneously over all 25 possible alignments. In this way, the correlation time per 4KByte template is reduced to 0.9 ms/template, which increases the possible throughput of the VLSI image coprocessor system to about 1000 templates/sec. Thus, a moderately sized database of 500 persons (a few thousand images) can be completely correlated in a few seconds.

The actual VLSI chip contained two image correlators and was fabricated on a 6.8mm × 6.8mm die in a standard double metal, 2µm CMOS process through MOSIS (see Figure 10). The MAGIC layout editor was used to realize the fully custom design of the 60,000-transistor chip.

SYSTEM PERFORMANCE:

The real-time face recognition system user-interface is menu-driven and user-friendly. There are many additional features that were incorporated for rapid debugging, building of image databases, and development of more advanced recognition techniques. In all, the system software represents a large portion of the research effort and is implemented with approximately 40,000 lines of C and 80x86 assembly code. A typical screen capture of the real-time face recognition system is shown in Figure 11. The system initially locates the eyes of the user as shown by concentric circles overlaid on the original image. Subsequently, four small templates are extracted and
compared to the database. The pseudo-scores of the top five candidates are shown at the bottom of the figure. The highlighted numbers indicate scores that exceed the threshold for a positive match. The darkened numbers indicate scores that exceed the threshold for a negative match. All match scores are normalized and offset such that the rejection threshold is 0 and the acceptance threshold is 100. Timing and memory requirements are shown in the text overlay below the extracted templates.

The speed of the system is measured from when the image is presented to when the user is notified of identification. During this time the system must digitize the video image through the frame grabber, locate the eyes, extract and normalize the templates, search the database via correlation, and interpret the correlation scores. The preprocessing and template extraction phase is performed using only the frame grabber and 80486/66DX2 in approximately 1.8 seconds and is independent of the database size. A typical timing breakdown for preprocessing and template extraction is shown in Table II.

The template correlation is performed by the VLSI image correlator and depends on the size of the database. Typical database correlation time was approximately 0.3 seconds for a database of 173 images. Postprocessing is performed by the 80486 but is computationally quite simple and does not represent a significant portion of computing time.

The recognition performance of the system is highly dependent on the database of known persons and the testing set. Cross-validation is a common technique for measuring recognition performance. The system was able to achieve an 88% recognition rate, a 93% correct matching with the top candidate, and a 97% correct matching with the top 3 candidates under cross-validation with a moderately varied database of 173 images of 34 persons.

A user who is not recognized can turn his or her head or move slightly so as to be recognized more readily on the next trial a few seconds later. Hence it is more important that the system does not mistakenly recognize a user as someone that they are not, than to miss the person and claim that they are not in the database. During actual usage, the system can sometimes require more than one trial, but recognition rarely takes more than three or four trials. Additionally, mistaken recognition is also quite rare. As the recognition and rejection thresholds are adjustable, the trade-off between missing and mistakenly recognizing can be controlled to suit a particular application.

A real-time face recognition system can be developed by making effective use of the computing power available from an IBM PC 80486 and by implementing a special purpose VLSI image correlator. The complete system requires 2 to 3 seconds to analyze and recognize a user after being presented with a reasonable frontal facial image. This level of performance was achieved through careful system design of both software and hardware. Issues ranging from algorithm development to software and hardware implementation, including custom digital VLSI design, were addressed in the design of this system. This approach of extremely focussed system software and hardware co-design can also be effectively applied to a wide range of high performance computing applications.

Robert J. Baron, "Mechanisms of human facial recognition"
Google.com
Wikipedia.com
