Motion Segmentation from Clustering of Sparse Point Features Using Spatially Constrained Mixture Models

Shrinivas Pundlik

Committee members:
   Dr. Stan Birchfield (chair)
   Dr. Adam Hoover
   Dr. Ian Walker
   Dr. Damon Woodard
Motion Segmentation
Gestalt insight: grouping forms the basis of human perception.

Gestalt laws: factors (cues) that affect the grouping process:
   similarity, proximity, common motion (common fate), continuity

Motion segmentation: segmenting images based on common motion;
points moving together are grouped together.

Typically, motion segmentation uses common motion + proximity.
Applications of Motion Segmentation
   object detection
      pedestrian detection [Viola et al., 2003]
   tracking
      vehicle tracking [Kanhere et al., 2005]
   robotics
   surveillance
   image and video compression
   scene reconstruction
   video manipulation / editing [Criminisi et al., 2006]
      video matting
      video annotation
      motion magnification
Previous Work

By approach:
   Motion layer estimation: Wang and Adelson 1994; Ayer and Sawhney 1995;
      Jojic and Frey 2001; Willis et al. 2003; Xiao and Shah 2005
   Multi-body factorization: Costeira and Kanade 1995; Ke and Kanade 2002;
      Vidal and Sastry 2003; Yan and Pollefeys 2006; Gruber and Weiss 2006
   Object-level grouping: Sivic et al. 2004; Kanhere et al. 2005
   Miscellaneous: Black and Fleet 1998; Birchfield 1999; Levine and Weiss 2006

By algorithm:
   Expectation maximization: Jojic and Frey 2001; Smith et al. 2004;
      Kokkinos and Maragos 2004
   Graph cuts: Willis et al. 2003; Xiao and Shah 2005; Criminisi et al. 2006
   Normalized cuts: Shi and Malik 1998
   Belief propagation: Kumar et al. 2005
   Variational methods: Cremers and Soatto 2005; Brox et al. 2005

By nature of data:
   Sparse features: Sivic et al. 2004; Kanhere et al. 2005; Rothganger et al. 2004
   Dense motion: Cremers and Soatto 2005; Brox et al. 2005
   Motion + image cues: Xiao and Shah 2005; Kumar et al. 2005; Criminisi et al. 2006
Challenges: Short Term
[Example scene with numbered regions: 1. statue, 2. wall, 3. trees, 4. grass, 5. biker, 6. pedestrian]

   computation of motion in the scene
      • influence of the neighboring motion
   number of objects / regions in the scene
   initialization of motion parameters
   description of complex motions (e.g., articulated human motion)
Challenges: Long Term
[Feature speed vs. time: with batch processing, a single threshold over one time window separates fast and medium motion; with incremental processing, the threshold is applied over successive time windows, also capturing slow and crawling motion]

   batch processing vs. incremental processing
      • updating the reference frame
   maintaining existing groups
      • growing existing regions
      • splitting
   adding new groups (new objects); deleting invisible groups
Objectives
[System diagram: feature tracking (motion computation) feeds two-frame clustering and long-term maintenance of groups; a mixture-model framework alternates group assignment and parameter estimation over the observed data, using translation and affine motion models as well as complex models (articulated human motion models)]

• motion segmentation using sparse point features
• automatically determine the number of groups
• handling dynamic sequences
• real-time performance
• handling complex motions
Overview of the Topics
• Feature Tracking: tracking sparse point features for computation of
  image motion, and its extension to joint feature tracking.
   - S. T. Birchfield and S. J. Pundlik, "Joint Tracking of Features and
     Edges", CVPR, 2008.

• Motion Segmentation: clustering point features in videos based on
  their motion and spatial connectivity.
   - S. J. Pundlik and S. T. Birchfield, "Motion Segmentation at Any
     Speed", BMVC, 2006.
   - S. J. Pundlik and S. T. Birchfield, "Real Time Motion Segmentation
     of Sparse Feature Points at Any Speed", IEEE Trans. on Systems,
     Man, and Cybernetics, 2008.

• Articulated Human Motion Models: learning human walking motion from
  various poses and view angles for segmentation and pose estimation
  (special handling of a complex motion model).

• Iris Segmentation: texture- and intensity-based segmentation of
  non-ideal iris images.
   - S. J. Pundlik, D. L. Woodard and S. T. Birchfield, "Non-Ideal Iris
     Segmentation Using Graph Cuts", CVPR Workshop on Biometrics, 2008.
Point Features
[Input image, its gradients, and the detected point features: the features capture the information content of the image]

Popular features:
 Harris corner feature [Harris & Stephens 1988, Schmid et al. 2000]
 Shi-Tomasi feature [Shi & Tomasi 1994]
 Forstner corner feature [Forstner 1994]
 Scale invariant feature transform (SIFT) [Lowe 2000]
 Gradient Location and Orientation Histogram (GLOH) [Mikolajczyk and Schmid 2005]
 Features from accelerated segment test (FAST) [Rosten and Drummond 2005]
 Speeded up robust features (SURF) [Bay et al. 2006]
 DAISY [Tola et al. 2008]
Utility of Point Features
Advantages:
 highly repeatable and extensible (work for a variety of images)
 efficient to compute (real-time implementations available)
 local methods for processing (tracking through multiple frames)

Tracking multiple point features = sparse optical flow:
sparse point feature tracks yield the image motion.
Tracking Point Features: Lucas-Kanade
Assume constant brightness: $I(x+u, y+v, t+1) = I(x, y, t)$ for pixel
displacement $(u, v)$. Linearizing gives the optic flow constraint
equation

   $I_x u + I_y v + I_t = 0$

where $I_x, I_y$ are the image spatial derivatives and $I_t$ is the
image temporal derivative.

Estimate the pixel displacement $\mathbf{u} = (u, v)^T$ by minimizing

   $E_{LK} = \sum_{\mathbf{x} \in W} K_w * (I_x u + I_y v + I_t)^2$

where $K_w$ is a convolution kernel over the window $W$.

Differentiating with respect to $u$ and $v$ and setting the derivatives
to zero leads to a linear system $Z\mathbf{u} = \mathbf{e}$, with the
gradient covariance matrix

   $Z = \sum_W K_w * \begin{bmatrix} I_x^2 & I_x I_y \\ I_x I_y & I_y^2 \end{bmatrix}$,
   $\mathbf{e} = -\sum_W K_w * \begin{bmatrix} I_x I_t \\ I_y I_t \end{bmatrix}$

Iterate using the Newton-Raphson method.
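
To make the two-frame motion computation concrete, here is a minimal
sketch of one Lucas-Kanade step for a single feature window (Python/
NumPy). It is an illustration, not the thesis implementation; the window
size and the conditioning test are assumed choices.

import numpy as np

def lucas_kanade_step(I0, I1, x, y, half_win=7):
    """Estimate the displacement (u, v) of a window centered at (x, y),
    given two consecutive grayscale float images I0 and I1."""
    win0 = I0[y-half_win:y+half_win+1, x-half_win:x+half_win+1]
    win1 = I1[y-half_win:y+half_win+1, x-half_win:x+half_win+1]
    Iy, Ix = np.gradient(win0)          # spatial derivatives
    It = win1 - win0                    # temporal derivative
    # Gradient covariance matrix Z and right-hand side e
    Z = np.array([[np.sum(Ix*Ix), np.sum(Ix*Iy)],
                  [np.sum(Ix*Iy), np.sum(Iy*Iy)]])
    e = -np.array([np.sum(Ix*It), np.sum(Iy*It)])
    if np.linalg.cond(Z) > 1e6:         # reject ill-conditioned windows
        return None
    return np.linalg.solve(Z, e)        # one Newton-Raphson step: (u, v)

In practice this step is iterated, shifting the window by the current
estimate of (u, v) until convergence.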
Detection of Point Features
Gradient covariance matrix, computed from the image gradients with a
convolution kernel $K_w$:

   $Z = \sum_W K_w * \begin{bmatrix} I_x^2 & I_x I_y \\ I_x I_y & I_y^2 \end{bmatrix}$

Good feature: $\min(e_1, e_2) >$ threshold, where $e_1, e_2$ are the
eigenvalues of $Z$.

[Three intensity surfaces over (x, y):
 1. no feature: low intensity variation, two small eigenvalues
    (emax = 5.15, emin = 3.13)
 2. edge feature: unidirectional intensity variation, a small and a
    large eigenvalue (emax = 1026.9, emin = 29.9)
 3. good feature: bidirectional intensity variation, two large
    eigenvalues (emax = 1672.44, emin = 932.4)]
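
A hedged sketch of this criterion: compute the smaller eigenvalue of Z
at every pixel and threshold it (Shi-Tomasi style). The box kernel
standing in for the convolution kernel and the window size are
assumptions.

import numpy as np
from scipy.ndimage import uniform_filter

def min_eigenvalue_map(I, win=7):
    """Smaller eigenvalue of the gradient covariance matrix Z per pixel."""
    Iy, Ix = np.gradient(I.astype(float))
    Zxx = uniform_filter(Ix * Ix, win)   # window-averaged entries of Z
    Zxy = uniform_filter(Ix * Iy, win)
    Zyy = uniform_filter(Iy * Iy, win)
    # closed-form smaller eigenvalue of a symmetric 2x2 matrix
    half_trace = (Zxx + Zyy) / 2.0
    delta = np.sqrt(((Zxx - Zyy) / 2.0) ** 2 + Zxy ** 2)
    return half_trace - delta

# good features: pixels where min_eigenvalue_map(I) exceeds a threshold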
Dense Optical Flow: Horn-Schunck
Horn-Schunck: find global displacement functions u(x,y) and v(x,y) by
minimizing

   $E_{HS} = \iint (I_x u + I_y v + I_t)^2 + \alpha \left( \|\nabla u\|^2 + \|\nabla v\|^2 \right) dx\,dy$

where the first term is the data term (optical flow constraint), the
second is the smoothness term, and $\alpha$ is a regularization
parameter.

Solve using the Euler-Lagrange equations:

   $I_x (I_x u + I_y v + I_t) - \alpha \nabla^2 u = 0$
   $I_y (I_x u + I_y v + I_t) - \alpha \nabla^2 v = 0$

Approximating the Laplacian as $\nabla^2 u \approx \kappa(\bar{u} - u)$,
where $\bar{u}$ is the average displacement in the neighborhood and
$\kappa$ a constant, leads to a sparse system.
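
A minimal sketch of the resulting iteration, assuming the derivative
images Ix, Iy, It are given; here alpha plays the role of the squared
regularization weight, and the 3x3 averaging window that approximates
the neighborhood average is an illustrative choice.

import numpy as np
from scipy.ndimage import uniform_filter

def horn_schunck(Ix, Iy, It, alpha=100.0, n_iters=100):
    u = np.zeros_like(Ix)
    v = np.zeros_like(Ix)
    for _ in range(n_iters):
        u_bar = uniform_filter(u, 3)    # neighborhood averages
        v_bar = uniform_filter(v, 3)    # (Laplacian approximation)
        # closed-form update from the Euler-Lagrange equations
        num = Ix * u_bar + Iy * v_bar + It
        den = alpha + Ix ** 2 + Iy ** 2
        u = u_bar - Ix * num / den
        v = v_bar - Iy * num / den
    return u, v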
Need for a Joint Approach
   Lucas-Kanade (1981): local method (local smoothing); pixel
    displacement constant within a small neighborhood; robust under
    noise; produces sparse optical flow.

   Horn-Schunck (1981): global method (global smoothing); pixel
    displacement a smooth function over the image domain; sensitive to
    noise; produces dense optical flow.

   Use global smoothing to improve feature tracking:
    Joint Feature Tracking.

   Use local smoothing to improve dense optical flow:
    Combined Local-Global approach (Bruhn et al., 2004).
Joint Lucas-Kanade (JLK)
Joint Lucas-Kanade energy functional over the N feature points:

   $E_{JLK} = \sum_{i=1}^{N} \left[ \sum_{\mathbf{x} \in W_i} K_w * (I_x u_i + I_y v_i + I_t)^2 + \lambda \left( (u_i - \hat{u}_i)^2 + (v_i - \hat{v}_i)^2 \right) \right]$

where the first term is the data term (optical flow constraint), the
second is the smoothness term (regularization), and $(\hat{u}_i, \hat{v}_i)$
are the expected values of the displacement of feature i, predicted
from its neighbors.

Differentiating $E_{JLK}$ w.r.t. $(u, v)$ gives a 2N x 2N system whose
(2i-1)th and (2i)th rows couple the 2x2 Lucas-Kanade system of feature
i with its expected displacement:

   $(Z_i + \lambda I)\,\mathbf{u}_i = \mathbf{e}_i + \lambda \hat{\mathbf{u}}_i$

The sparse system is solved using Jacobi iterations.
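
One way to picture the Jacobi iterations: sweep over the features,
solving each feature's regularized 2x2 system against the expected
displacement predicted from its neighbors. This is a hedged sketch; the
per-feature matrices Z, right-hand sides e, the neighbor lists, and the
uniform neighbor weighting are assumed inputs, not the paper's exact
scheme.

import numpy as np

def jlk_jacobi_sweep(Z, e, U, neighbors, lam=1.0):
    """Z: (N,2,2) gradient covariance matrices; e: (N,2) right-hand
    sides; U: (N,2) current displacements; neighbors: adjacency lists."""
    U_new = np.empty_like(U)
    for i in range(len(U)):
        u_hat = U[neighbors[i]].mean(axis=0)   # expected displacement
        A = Z[i] + lam * np.eye(2)             # data + smoothness terms
        b = e[i] + lam * u_hat
        U_new[i] = np.linalg.solve(A, b)
    return U_new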
Results of JLK
[Feature tracking results of JLK on sequences with repetitive texture
and with low texture]
Overview of the Topics (repeated; next topic: Motion Segmentation)
Mixture Models Basics
Setup: 3 bins (components); a sample is drawn from one of them.

   P(Red | sample) ∝ P(sample | Red) P(Red)

   posterior probability of drawing a Red sample
      = likelihood of the sample being Red
        (measurement: how Red is the drawn sample?)
      × prior probability of the Red bin (how big is the Red bin?)

Probability of drawing a sample from a mixture of three bins:

P(sample) = P(sample|Red)P(Red) + P(sample|Green)P(Green) + P(sample|Blue)P(Blue)

Mixture model: likelihoods and priors for all the components.
Challenge: the only available information is the drawn sample!
Mixture Model Example: GMM
[Histogram of grayscale values fitted with four Gaussian components,
θ1 = {μ1, σ1} through θ4 = {μ4, σ4}]

Gaussian density for the jth component (the ith pixel $x_i$ conditioned
on the parameters of the jth Gaussian density):

   $p(x_i \mid \theta_j) = \frac{1}{\sqrt{2\pi\sigma_j^2}} \exp\!\left(-\frac{(x_i - \mu_j)^2}{2\sigma_j^2}\right)$

Parameters of a Gaussian density, θ: mean (μ) and variance (σ²).
Learning Mixture Models
Mixture model defined as:

   $p(x_i \mid \Theta) = \sum_{j=1}^{K} \pi_j \, p(x_i \mid \theta_j)$

where K is the number of components (known), $x_i$ is an observed data
point (known), $\pi_j$ are the mixing weights (unknown), and
$p(x_i \mid \theta_j)$ is the jth component density with parameters
$\theta_j$ (unknown).

Learning mixture models (parameter estimation): estimate the mixing
weights and the component density parameters.

Circular nature of the problem: parameter estimation requires the class
association (segmentation), and the segmentation requires the parameters.
Expectation Maximization
EM: an iterative two-step algorithm for parameter estimation.

      E step: find the expectation of the likelihood function
      (segmentation / label assignment).

      M step: maximize the likelihood function
      (parameter estimation based on the segmentation).

      Convergence: when the likelihood cannot be maximized further
      (when estimates do not change between successive iterations).

1.   Initialize:
       a. number of components K
       b. component density parameters θ for all components
       c. mixing weights π
       d. convergence criterion
2.   Repeat until convergence:
      E STEP
       a. for all N data points:
               i. compute the likelihood from the component density
              ii. estimate the membership weights w
      M STEP
       b. estimate the mixing weights
       c. estimate the component density parameters
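
A minimal sketch of these steps for a one-dimensional Gaussian mixture;
the initialization, K, and the tolerance are illustrative choices.

import numpy as np

def em_gmm_1d(x, K=3, n_iters=100, tol=1e-6):
    n = len(x)
    mu = np.random.choice(x, K)                 # 1b. density parameters
    var = np.full(K, x.var())
    pi = np.full(K, 1.0 / K)                    # 1c. mixing weights
    prev_ll = -np.inf
    for _ in range(n_iters):                    # 2. repeat until convergence
        # E step: likelihoods and membership weights w[i, j]
        lik = (pi / np.sqrt(2 * np.pi * var) *
               np.exp(-(x[:, None] - mu) ** 2 / (2 * var)))
        w = lik / lik.sum(axis=1, keepdims=True)
        # M step: re-estimate parameters from the soft assignment
        Nj = w.sum(axis=0)
        pi = Nj / n
        mu = (w * x[:, None]).sum(axis=0) / Nj
        var = (w * (x[:, None] - mu) ** 2).sum(axis=0) / Nj
        ll = np.log(lik.sum(axis=1)).sum()      # convergence check
        if ll - prev_ll < tol:
            break
        prev_ll = ll
    return pi, mu, var

# example: pi, mu, var = em_gmm_1d(np.asarray(gray_values, float))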
Various Mixture Models
Data term: how closely the data follow the models.
Smoothness term: spatial interaction of the data elements.

   Finite Mixture Model (FMM): one prior for each component (mixing
    weights); learned with the EM algorithm.

   Spatially Variant Finite Mixture Model (ML-SVFMM) [1]: a prior
    distribution for each data element (label probabilities); EM
    algorithm.

   Spatially Variant Finite Mixture Model (MAP-SVFMM) [1]: neighbors
    mostly have similar labels (loose constraint); EM algorithm.

   Spatially Constrained Finite Mixture Model (SCFMM): enforces spatial
    connectivity of labels; learned with a greedy EM algorithm.

1.     S. Sanjay-Gopal and T. Hebert, "Bayesian Pixel Classification Using Spatially Variant Finite Mixtures and Generalized EM
       Algorithm", IEEE Trans. on Image Processing, 1998.
Greedy-EM (Iterative Region Growing)
[Region growing over a 4-connected grid from three different start
locations]

Properties of greedy EM:
 enforces spatial connectivity of labels (SCFMM)
 automatically determines the number of groups
 local initialization of parameters
 primary user-defined parameters:
     • inclusion criterion
     • minimum number of elements in a group
Grouping Point Features
Between two frames:
   Repeat until all features have been considered:
      Randomly select a seed feature.
      Fit a motion model to its neighbors.
      Repeat until the group does not change:
         Discard all features except the one nearest the centroid.
         Grow the group by recursively including neighboring features
          with similar motion.
         Update the motion model.

[Grouping features from a single seed point: the group is regrown from
the feature nearest the centroid at each iteration]
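
A hedged sketch of growing a single group from a seed, using a
translational motion model (the mean displacement of the group). The
neighbor graph, the similarity tolerance, and the minimum group size
are assumed parameters, and the restart-from-the-centroid step of the
full algorithm is folded into the outer loop for brevity.

import numpy as np

def grow_group(seed, flow, neighbors, tol=1.0, min_size=5):
    """flow: (N,2) per-feature displacements between the two frames;
    neighbors: adjacency lists of spatially close features."""
    model = flow[neighbors[seed] + [seed]].mean(axis=0)  # initial fit
    group, prev = {seed}, None
    while group != prev:                 # repeat until group is stable
        prev = set(group)
        frontier = list(group)
        while frontier:                  # recursively include neighbors
            f = frontier.pop()
            for nb in neighbors[f]:
                if nb not in group and np.linalg.norm(flow[nb] - model) < tol:
                    group.add(nb)
                    frontier.append(nb)
        model = flow[list(group)].mean(axis=0)   # update the motion model
    return group if len(group) >= min_size else set()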
Grouping Consistent Features
   input: point features tracked between two frames
   output: groups of point features

   for N seed points (e.g., seed 1, seed 2, seed 3):
      group point features
      gather the sets of features that are always grouped together
       (consistent feature groups)
Grouping Consistent Features
Consistency check: features that are always grouped together, no matter
the seed point.

[Example with four features a, b, c, d grouped from different seed
points: each run yields a binary co-membership matrix with a 1 wherever
two features fall in the same group; summing these matrices over the
runs counts how often each pair was grouped together, and pairs whose
count equals the number of runs are consistently grouped.]

In practice, we use 7 seed points.
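
A sketch of the accumulation over the runs; the per-run label arrays
are assumed inputs (features left ungrouped in a run should carry
unique labels so that they never match).

import numpy as np

def consistency_matrix(runs):
    """runs: list of 1-D group-label arrays, one per seeded grouping run."""
    n = len(runs[0])
    C = np.zeros((n, n), dtype=int)
    for labels in runs:
        C += labels[:, None] == labels[None, :]  # binary co-membership
    return C == len(runs)    # True where grouped together in every run

Connected components of the resulting boolean relation give the
consistent feature groups.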
Consistent Features: Multiple Groups
[Feature groups obtained over various iterations, and the consistent
feature groups extracted from them]
Maintaining Groups Over Time
Features are tracked from frame k to frame k + n; along the way some
features are lost and new features are added. If a group fails a χ²
test, consistent groups are found again among its features: either the
features are regrouped as before, or multiple groups are found (new
objects).
Experimental Results
[Segmentation results on the mobile-calendar, freethrow, statue, robots,
and car-map sequences]
Videos
[Videos of the mobile-calendar sequence and the statue sequence]
Results Over Time
The algorithm dynamically determines the number of feature groups.

[Number of feature groups over time for the freethrow, mobile-calendar,
statue, car-map, robots, and vehicles sequences]
Comparison with Other Approaches

   Algorithm                          Run time (sec/frame)   Max. number of groups
   Xiao and Shah (PAMI, 2005)               520                        4
   Kumar et al. (ICCV, 2005)                500                        6
   Smith et al. (PAMI, 2004)                180                        3
   Rothganger et al. (CVPR, 2004)            30                        3
   Jojic and Frey (CVPR, 2001)                1                        3
   Cremers and Soatto (IJCV, 2005)           40                        4
   Our algorithm (TSMC, 2008)                 0.16                     8
Effect of Joint Feature Tracking
[Input sequence; feature displacements from standard Lucas-Kanade vs.
joint Lucas-Kanade]
Overview of the Topics (repeated; next topic: Articulated Human Motion Models)
Articulated Motion Models
Purpose of human motion analysis:
 pedestrian detection / surveillance
 action recognition
 pose estimation

Traditional approaches use:
 appearance
 frame differencing

Theme: sparse motion alone captures a wealth of information.

Objectives:
 learn articulated human motion models
 motion only, no appearance
 viewpoint- and scale-invariant detection
 varying lighting conditions (daytime and nighttime sequences)
 detection in the presence of camera and background motion
 pose estimation
Use of Motion Capture Data
Top-down approach: train high-level descriptors (appearance- or
motion-based) that describe articulated motion at a global level for
detection.

Bottom-up approach: learn the motion of individual joints (hands, feet,
body center) from the training data and aggregate the information to
detect human motion.

[3D motion capture (mocap) data; displacement of the limbs w.r.t. the
body center]
Approach Overview
Training:
[3D motion capture points rendered from various angular viewpoints and
walking poses]
Motion Descriptor
[Gaussian weight maps for the various means and orientations that
constitute the motion descriptor; spatial arrangement of the descriptor
bins w.r.t. the body center]

[Bin values of the motion descriptor describing human subjects from
various viewpoints (views) and pose configurations (poses); confusion
matrix for the 64 training descriptors]
Segmentation Results
[View-invariant segmentation of articulated motion using the motion
descriptor: right profile, left profile, angular, and front views]

[Segmentation of articulated motion in a challenging sequence involving
camera and background motion]
Pose Estimation Results
[Pose estimation results for the right-profile view, angular view,
front view, and a nighttime sequence]
Videos of Detection and Pose Estimation
Overview of the Topics (repeated; next topic: Iris Segmentation)
Iris Image Segmentation
Non-ideal iris image segmentation using texture and intensity, labeling
four regions of the eye: eyelash, iris, pupil, and background.

[From the input image, two cues are computed. Gradient magnitude: higher
magnitude marks textured (eyelash) regions, lower magnitude un-textured
(non-eyelash) regions. Point features: a higher density of point
features likewise indicates texture, while lower density corresponds to
the iris, pupil, and background.]

Coarse texture computation. Ideas:
• local intensity variations (computed from the gradient magnitude and
  point features) can be used for a texture representation that
  separates eyelash from non-eyelash regions
• possible segments based on image intensity: iris, pupil, and background
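
A hedged sketch of such a coarse texture cue, combining local gradient
energy with local point-feature density; the window size and the
multiplicative combination are illustrative assumptions, not the thesis
algorithm.

import numpy as np
from scipy.ndimage import uniform_filter

def coarse_texture(I, feature_mask, win=15):
    """feature_mask: binary image with 1 at detected point features."""
    Iy, Ix = np.gradient(I.astype(float))
    grad_energy = uniform_filter(np.hypot(Ix, Iy), win)   # local gradient magnitude
    feat_density = uniform_filter(feature_mask.astype(float), win)
    # high values ~ textured (eyelash) regions, low values ~ smooth regions
    return grad_energy * feat_density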
Iris Segmentation and Recognition
Iris segmentation pipeline:
   input iris image → remove specular reflections → preprocessed input
   → iris segmentation → iris refinement → iris mask and iris ellipse

Iris recognition:
 unwrap and normalize the iris mask
 generate an iris signature from the iris mask (using the texture in the iris)
 compare iris signatures using the Hamming distance (sketched below)
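
A minimal sketch of the signature comparison, assuming binary signature
arrays with validity masks; the decision threshold mentioned in the
comment is the commonly cited iris-code value, not necessarily the one
used in this work.

import numpy as np

def hamming_distance(sig_a, sig_b, valid_a, valid_b):
    """Fraction of disagreeing bits over pixels valid in both masks."""
    valid = valid_a & valid_b
    if valid.sum() == 0:
        return 1.0               # nothing comparable: treat as no match
    return np.count_nonzero((sig_a ^ sig_b) & valid) / valid.sum()

# two irises match if hamming_distance(...) falls below a threshold
# (about 0.32 is commonly cited for iris codes)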
Image Segmentation Results
[Input image, segmentation into pupil, iris, eyelashes, and background,
and the resulting iris mask]
Iris Recognition
Iris recognition using our segmentation algorithm:

   West Virginia Non-Ideal Database: 1868 images; 467 classes, 4 images/class
   West Virginia Off-Axis Database:   584 images; 146 classes, 4 images/class
Conclusions and Future Work
• Motion segmentation based on sparse feature clustering
   - spatially constrained mixture model and greedy EM algorithm
   - automatically determines the number of groups
   - real-time performance
   - ability to handle long, dynamic sequences and an arbitrary number of feature groups

• Joint feature tracking
   - incorporation of neighboring feature motion
   - improved performance in areas of low texture or repetitive texture

• Detection of articulated motion
   - motion-based approach for learning high-level human motion models
   - segments and tracks human motion in varying pose, scale, and lighting conditions
   - view-invariant pose estimation

• Iris segmentation
   - graph-cuts-based dense segmentation using texture and intensity
   - combines appearance and eye geometry
   - handles non-ideal iris images with occlusion, illumination changes, and eye rotation

• Future work
   - integration of motion segmentation, joint feature tracking, and articulated motion segmentation
   - dense segmentation from the sparse feature groups
   - handling non-rigid motions, non-textured regions, and occlusions
   - combining sparse feature groups, discontinuities, and image contours for a novel representation of video
Questions?

				