					                                                 (IJCSIS) International Journal of Computer Science and Information Security,
                                                 Vol. 9, No. 3, March 2011




FACIAL TRACKING USING RADIAL BASIS FUNCTION

P. Mayilvahanan, Research Scholar, Dept. of MCA, Vel's University, Pallavaram, Chennai, India
Dr. S. Purushothaman, Principal, Sun College of Engineering & Technology, Kanyakumari – 629902, Tamil Nadu, India, Email: dr.s.purushothaman@gmail.com
Dr. A. Jothi, Dean, School of Computing Sciences, Vel's University, Pallavaram, Chennai, India



ABSTRACT-- This paper implements facial tracking using a Radial basis function (RBF) neural network. There is no unique method that claims perfect facial tracking in video transfer. The local features of a frame are segmented. A ratio is found based on a criterion, and the output of the RBF is used for transferring the necessary information of the frame from one system to another. A decision approach with a threshold is used to detect whether there is any change in the local object of successive frames. The accuracy of the result depends upon the number of centers. The performance of the algorithm in reconstructing the tracked object is about 96.5%, similar to the performance of the back propagation algorithm (BPA), with reduced time and comparable quality of reconstruction.

Index Terms- Radial basis function (RBF); Back-propagation algorithm (BPA); Watershed algorithm; Motion estimation.

1. INTRODUCTION

In specific applications like video-conferencing and news telecasts, most of the image area is covered by a human face. Low bit-rate video transmission can be achieved by using 3D head models. Tracking algorithms are available to track the head in a video sequence, but no fully automatic system is available for extracting a head model from video. If a three-dimensional head model can be extracted from the first frame (or first few frames) of a video sequence, it becomes possible to build extremely low bit-rate video coding systems for communicating head-and-shoulder scenes. Head models can also be used for synthesizing views and facial expressions and for animating virtual characters.

Shi and Tomasi [1] put forward the criterion of "good features" based on texture and used it in affine feature tracking. Parry et al. [2] introduced a region-based (formed by segmentation) tracking method, mainly updating the template by projecting it around the detected positions of the target and considering its overlap with the segmented local object. Their tracking results show good performance when the camera moves towards the object. Yan Tong et al. [3] developed a general framework for region tracking which includes models for image changes due to motion, illumination and partial occlusion. They used a cascaded parametric motion model and a small set of basis images to account for shading changes, solved in a robust estimation framework in order to handle small partial occlusions. Gleicher [4] introduced difference decomposition to solve the registration problem in tracking, where the difference is a linear combination of a set of basis vectors. Sclaroff and Isidoro [5] used this idea for template registration in region-based non-rigid tracking, where the non-rigid deformation is represented in terms of eigenvectors of a finite element method; photometric variation is considered, and a modified Delaunay refinement algorithm is used to construct a consistent triangular mesh for the region of the tracked object.

Nguyen and Worring [6] introduced a contour tracking method incorporating static segmentation by the watershed algorithm. Their method utilizes several kinds of edge maps, from motion (optic flow), intensity (watershed) and prediction (contour warping), to update the object contour, and was claimed to yield accurate and robust results. The idea of "active blobs" [5] addresses non-rigid deformation, where Delaunay triangulation from computer graphics is used to generate a mesh of the object region.



                                                          134                               http://sites.google.com/site/ijcsis/
                                                                                            ISSN 1947-5500




In general, the following procedure is adopted for a face tracking algorithm [7]:

a. Wait for a face (or faces) to appear in the frame.
b. Enter initialization mode (wait for the face to persist for a predefined amount of time, to avoid paying attention to people who just happen to pass by).
c. Enter tracking mode, choosing the closest face.
d. Track the face until it leaves the frame. To avoid losing track of the face due to minor head movements, leave tracking mode only when the tracked face disappears for a predefined amount of time.
e. Go to a.

This paper proposes a region-based method for motion estimation during object tracking. Tracking is performed by means of motion segmentation. The proposed method fully utilizes information of temporal motion and spatial luminance. Computation of the dominant motion of the tracked object is done by a robust iterative weighted least squares (IWLS) method. Static segmentation is incorporated to modify this prediction: the warping error of each watershed segment and its rate of overlap with the warped template are utilized to help classify watershed segments near the object border.

The following procedure is used to implement the RBF for facial tracking:
• Read frame 1
• Take a portion of frame 1 (eye/nose/lip etc.)
• Apply watershed segmentation
• Find the mean of the segmented image
• Train/test using the RBF
• During testing, get the output of the RBF
• Accordingly display the image in system 2

II MATERIALS AND METHODS

A. Materials

An ANN with a supervised algorithm is used for computing the affine transformation taking place in the current frame with respect to the previous frame. A significant change in the output of the neural network indicates a change in position of the object in the current frame. To detect the change in position of the object, the network has to be trained in advance under supervised mode.

B. Methods

The Radial basis function (RBF) network is a supervised artificial neural network method [8]. The concept of a distance measure is used to associate the input and output pattern values, eq (1). Radial basis functions are capable of producing approximations to an unknown function 'f' from a set of input data abscissae. The approximation is produced by passing an input point through a set of basis functions, each of which contains one of the RBF centers, multiplying the result of each function by a coefficient and then summing them linearly. For each function 'f', the approximation is essentially stored in the coefficients and centers of the RBF. These parameters are in no way unique, since for each function 'f' approximated, many combinations of parameter values exist. RBFs have the following mathematical representation:

    F(x) = Σ_{i=0}^{N−1} c_i Φ(||x − R_i||)                  (1)

where R is a vector containing the centers of the RBF, and Φ is the basis function or activation function of the network.

The implementation of the RBF for facial tracking is as follows:

Step 1: Apply the Radial Basis Function.
    No. of inputs = width of the facial parameter in number of pixels
    No. of patterns = No. of frames under implementation
    No. of centres = No. of patterns
    Calculate the RBF as
        RBF = exp(−X)
    Calculate
        G = RBF
        A = G^T * G
    Calculate
        B = A^−1
    Calculate
        E = B * G^T
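The matrix computations of Step 1, together with the final-weight step F = E * D, can be sketched in Python/NumPy. This is an illustration only, not the paper's Matlab 7 code: the standard Gaussian basis of eq. (1) is used in place of the shorthand RBF = exp(−X), and all function and variable names (train_rbf, sigma, etc.) are assumptions.

```python
import numpy as np

# Hedged sketch of the RBF training in Steps 1-2. The training patterns
# themselves serve as centres (No. of centres = No. of patterns), and
# the final weights follow from the pseudo-inverse E = (G^T G)^-1 G^T.

def train_rbf(X, D, sigma=1.0):
    """X: (patterns, inputs) mean values of the segmented frames.
    D: (patterns, outputs) desired outputs. Returns centres and weights."""
    centres = X.copy()                      # No. of centres = No. of patterns
    dists = np.linalg.norm(X[:, None, :] - centres[None, :, :], axis=2)
    G = np.exp(-(dists ** 2) / (2.0 * sigma ** 2))   # basis-function outputs
    A = G.T @ G                             # A = G^T * G
    E = np.linalg.inv(A) @ G.T              # B = A^-1, E = B * G^T
    F = E @ D                               # Step 2: final weights F = E * D
    return centres, F

def rbf_predict(Xq, centres, F, sigma=1.0):
    """Pass query patterns through the basis functions and apply F."""
    dists = np.linalg.norm(Xq[:, None, :] - centres[None, :, :], axis=2)
    G = np.exp(-(dists ** 2) / (2.0 * sigma ** 2))
    return G @ F
```

In practice, when the number of centres equals the number of patterns, G is square and np.linalg.pinv(G) is numerically safer than forming (G^T G)^−1 explicitly.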








Step 2: Calculate the final weight.
        F = E * D

Step 3: Store the final weights in a file.
    The final updated weights are saved for testing the video transfer.

III SCHEMATIC DIAGRAM OF FACIAL TRACKING

[Figure 1: (a) Training, (b) Testing]

Figure 1 Procedure for facial tracking

Figure 1a shows the training procedure for the RBF. Frames are extracted from the video. Watershed segmentation is applied on the local and global objects in the face. The mean values of the segmented image are used for training the RBF. The final weights are stored in a file. Figure 1b shows the procedure for updating the frames in the receiving system; it uses the final weights obtained during training.

Here, f_x, f_y and f_t are the partial derivatives of the brightness function with respect to x, y and t; the robust function is chosen as in Nesi and Magnolfi [8]; and σ is the scale parameter. There are two different ways to find the motion parameters robustly: one is gradient-based, like the SOR method in [3]; the other is least-squares-based, such as the IWLS method. The algorithm begins by constructing a Gaussian pyramid (three levels are set up). When the estimated parameters are interpolated into the next level, they are used to warp (realized by bilinear interpolation) the last frame to the current frame. In the current level, only the changes are estimated in the iterative update scheme.

For static segmentation, the watershed algorithm of mathematical morphology is a powerful method [9]. Early watershed algorithms were developed to process digital elevation models and are based on local neighborhood operations on square grids. Some approaches use "immersion simulations" to identify watershed segments by flooding the image with water starting at intensity minima [9]. Improved gradient methods have been devised to overcome plateaus and square pixel grids [10]. The former method is used here. A severe drawback of the watershed algorithm is over-segmentation. Normally, watershed merging is performed along with watershed generation. Here, over-segmentation is welcome, so the merging process is omitted during tracking, which saves some computational cost. Figure 2 shows the procedure for watershed segmentation.

IV TEMPLATE WARPING AND REGION ANALYSIS

Once the motion parameters have been computed, the object template is warped from the last frame to the current frame. The warped template is then used to determine which watershed segments enter the template, according to the following measure: given that the number of pixels of segment R_i belonging to the warped template is C_Pi, and the number of all pixels in R_i is C_i, a ratio r_i is computed, as given in eq (2):

    r_i = C_Pi / C_i                        (2)
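The overlap measure of eq. (2) can be sketched as follows. This is a minimal illustration under assumed array layouts; the names labels and warped_template are not from the paper.

```python
import numpy as np

# Hedged sketch of eq. (2): for each watershed segment R_i,
# r_i = C_Pi / C_i, where C_Pi counts the pixels of R_i covered by the
# warped template and C_i counts all pixels of R_i.

def overlap_ratios(labels, warped_template):
    """labels: (H, W) integer watershed segment labels.
    warped_template: (H, W) boolean mask of the object template warped
    into the current frame. Returns {segment_label: r_i}."""
    ratios = {}
    for seg in np.unique(labels):
        seg_mask = labels == seg
        c_i = seg_mask.sum()                          # all pixels in R_i
        c_pi = (seg_mask & warped_template).sum()     # pixels also in template
        ratios[int(seg)] = float(c_pi) / float(c_i)   # r_i = C_Pi / C_i
    return ratios
```

Segments with r_i above the upper threshold r_0 join the template outright, those below r_1 (0.4 in the paper) are discarded, and intermediate ones are decided by the warping-error test of eq. (3).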







Based on this measure, the classification of each sub-region is decided by the following cases:

1) When r_i > r_0, classify R_i as part of the final object template.

2) When r_0 ≥ r_i ≥ r_1 (here r_1 = 0.4), another measure, the mean absolute error (MAE) of the difference between the warped frame and the current frame, is taken into account, eq (3):

    M_i = Σ_{x∈R_i} | f(x, t+1) − f_w(x, t) | / C_i          (3)

where f_w(x, t) is the warped image of f(x, t) using the estimated dominant motion parameters. If the warping error M_i of R_i is small enough (less than a given threshold, for instance, 10), R_i is still regarded as part of the updated template; otherwise, R_i is excluded from the object region.

3) When r_i < r_1, R_i is not included in the updated template.

Figure 2 Flow chart for watershed algorithm

When people make facial expression movements, especially when behaving emotionally (mainly the six universal facial expressions: disgust, sadness, happiness, fear, anger and surprise), head motion accompanies them in most cases. The procedure is therefore divided into two steps: 1) head tracking is realized first, and the estimated motion is used to stabilize the face region; 2) the local motion of each facial feature is estimated relative to the stabilized face.

Human face motion is complex, with rigid and non-rigid movements; hence, following the idea in [3], a modified affine model is adopted to describe the local motion of facial features (mouth, eyes and eyebrows), and a planar projective transform models the head motion. The IWLS method is used to estimate these motion parameters.

V. EXPERIMENT RESULTS

The project is implemented using Matlab 7. The time taken for processing each frame is on average 1.4 seconds, including segmentation and processing with the neural network. The topology of the RBF is: width of the FAP x number of patterns x 1. The total number of frames in the video is 91. The peak signal-to-noise ratio (PSNR) of all the reconstructed faces in the receiving system is shown in Figure 3. For the experiment, only 8 frames (Figure 4) are considered, which show significant changes in the lip movements.

[Figure 3: peak signal-to-noise ratio (approximately 31–39 dB) versus frame numbers updated (0–80)]

Figure 3 Peak signal-to-noise ratio

VI. CONCLUSION

In this paper, an RBF-based approach is proposed for motion estimation during facial tracking. The lip movements are mainly focused on in this work. Template warping by watershed segmentation, with an ANN for quick decision on frame updating, is implemented. Applications of this method in facial expression tracking can be extended to other parts of the face.

REFERENCES

[1] J. Shi and C. Tomasi, "Good features to track," Proc. IEEE Conf. Computer Vision and Pattern Recognition, 1994.
[2] Parry et al., "Region template correlation for FLIR target tracking," British Machine Vision Conference, 1996.







[3] Y. Tong, Y. Wang, Z. Zhu and Q. Ji, "Robust facial feature tracking under varying face pose and facial expression," Pattern Recognition, 40 (2007) 3195–3208.
[4] M. Gleicher, "Projective registration with difference decomposition," IEEE CVPR'97, pp. 331–337, 1997.
[5] S. Sclaroff and J. Isidoro, "Active blobs," ICCV'98.
[6] Nguyen and M. Worring, "Multi-feature object tracking using a model-free approach," IEEE CVPR, pp. 145–150, 2000.
[7] Anton Podolsky and Valery Frolov, "Face tracking," www.cs.bgu.ac.il/~orlovm/teaching/saya/.../saya-tracking-report.pdf
[8] P. Nesi and R. Magnolfi, "Tracking and synthesizing facial motions with dynamic contours," Real-Time Imaging, 2, 67–79, 1996.
[9] L. Vincent and P. Soille, "Watersheds in digital spaces: an efficient algorithm based on immersion simulations," IEEE T-PAMI, 13(6): 583–589, 1991.
[10] J. Gauch, "Image segmentation and analysis via multi-scale gradient watershed hierarchies," IEEE T-IP, 8(1): 69–79, 1999.

[Figure 4: frames 5, 16, 24, 37, 46, 57, 70 and 91 of the test video]

Figure 4 Experimental results relating to lip movements
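For completeness, the PSNR measure plotted in Figure 3 can be computed per reconstructed frame as below. This is a minimal sketch assuming 8-bit grayscale frames (peak value 255); the function name is illustrative, not from the paper.

```python
import numpy as np

# Hedged sketch of the peak signal-to-noise ratio between an original
# frame and its reconstruction in the receiving system.

def psnr(original, reconstructed):
    """Peak signal-to-noise ratio in dB between two uint8 frames."""
    diff = original.astype(np.float64) - reconstructed.astype(np.float64)
    mse = np.mean(diff ** 2)                # mean squared error per pixel
    if mse == 0.0:
        return float("inf")                 # identical frames
    return 10.0 * np.log10(255.0 ** 2 / mse)
```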



