Computer Vision – Lecture 18

Perceptual and Sensory Augmented Computing




Motion and Optical Flow
22.01.2009
Computer Vision WS 08/09

Bastian Leibe
RWTH Aachen
http://www.umic.rwth-aachen.de/multimedia
leibe@umic.rwth-aachen.de

Many slides adapted from K. Grauman, S. Seitz, R. Szeliski, M. Pollefeys, S. Lazebnik

Course Outline
• Image Processing Basics
• Segmentation & Grouping
• Object Recognition
• Local Features & Matching
• Object Categorization
• 3D Reconstruction
• Motion and Tracking
     Motion and Optical Flow
     Tracking with Linear Dynamic Models
     Articulated Tracking
• Repetition

Recap: Structure from Motion

[Figure: n 3D points Xj observed by cameras P1, P2, P3, with image
projections x1j, x2j, x3j]

• Given: m images of n fixed 3D points
      xij = Pi Xj ,   i = 1, …, m,  j = 1, …, n
• Problem: estimate the m projection matrices Pi and the n 3D points Xj
  from the mn correspondences xij

Slide credit: Svetlana Lazebnik

Recap: Structure from Motion Ambiguity
• If we scale the entire scene by some factor k and, at the same time,
  scale the camera matrices by the factor 1/k, the projections of the
  scene points in the image remain exactly the same.
• More generally: if we transform the scene using a transformation Q and
  apply the inverse transformation to the camera matrices, then the
  images do not change:

$$
x = P X = \left(P Q^{-1}\right)\left(Q X\right)
$$

Slide credit: Svetlana Lazebnik

Recap: Hierarchy of 3D Transformations

  Projective (15 dof):  $\begin{bmatrix} A & t \\ v^T & v \end{bmatrix}$   Preserves intersection and tangency

  Affine (12 dof):      $\begin{bmatrix} A & t \\ 0^T & 1 \end{bmatrix}$   Preserves parallelism, volume ratios

  Similarity (7 dof):   $\begin{bmatrix} sR & t \\ 0^T & 1 \end{bmatrix}$  Preserves angles, ratios of lengths

  Euclidean (6 dof):    $\begin{bmatrix} R & t \\ 0^T & 1 \end{bmatrix}$   Preserves angles, lengths

• With no constraints on the camera calibration matrix or on the scene,
  we get a projective reconstruction.
• Need additional information to upgrade the reconstruction to affine,
  similarity, or Euclidean.

Slide credit: Svetlana Lazebnik

Recap: Affine Structure from Motion
• Let's create a 2m × n data (measurement) matrix:

$$
D = \begin{bmatrix}
\hat{x}_{11} & \hat{x}_{12} & \cdots & \hat{x}_{1n} \\
\hat{x}_{21} & \hat{x}_{22} & \cdots & \hat{x}_{2n} \\
\vdots & \vdots & & \vdots \\
\hat{x}_{m1} & \hat{x}_{m2} & \cdots & \hat{x}_{mn}
\end{bmatrix}
= \begin{bmatrix} A_1 \\ A_2 \\ \vdots \\ A_m \end{bmatrix}
\begin{bmatrix} X_1 & X_2 & \cdots & X_n \end{bmatrix}
$$

  Cameras (2m × 3), points (3 × n)

• The measurement matrix D = MS must have rank 3!

  C. Tomasi and T. Kanade. Shape and motion from image streams under
  orthography: A factorization method. IJCV, 9(2):137-154, November 1992.

Slide credit: Svetlana Lazebnik

Recap: Affine Factorization
• Obtaining a factorization from SVD:

[Figure: SVD of the measurement matrix, truncated to the three largest
singular values]

  This decomposition minimizes |D − MS|²

Slide credit: Martial Hebert

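The rank-3 factorization via SVD can be sketched in a few lines of NumPy. This is an illustrative helper (the function name and setup are made up, not from the lecture code); it keeps the three largest singular values and splits them between the camera and point factors:

```python
import numpy as np

def affine_factorization(D):
    """Factor a (centered) 2m x n measurement matrix D into cameras M
    (2m x 3) and structure S (3 x n) via a rank-3 truncated SVD.
    The result is unique only up to an affine ambiguity (MQ, Q^-1 S)."""
    U, w, Vt = np.linalg.svd(D, full_matrices=False)
    U3, w3, Vt3 = U[:, :3], w[:3], Vt[:3, :]   # rank-3 truncation
    M = U3 * np.sqrt(w3)                        # cameras, 2m x 3
    S = np.sqrt(w3)[:, None] * Vt3              # points, 3 x n
    return M, S

# Synthetic check: D built from random cameras and points has rank 3,
# so the truncated SVD reproduces it exactly.
rng = np.random.default_rng(0)
M_true = rng.normal(size=(6, 3))    # m = 3 cameras, two rows each
S_true = rng.normal(size=(3, 10))   # n = 10 points
D = M_true @ S_true
M, S = affine_factorization(D)
print(np.allclose(M @ S, D))  # True
```

Splitting the singular values symmetrically (sqrt on each side) is one common convention; any split M = U3 W3^a, S = W3^(1-a) Vt3 gives the same product.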
Recap: Projective Factorization

$$
D = \begin{bmatrix}
z_{11} x_{11} & z_{12} x_{12} & \cdots & z_{1n} x_{1n} \\
z_{21} x_{21} & z_{22} x_{22} & \cdots & z_{2n} x_{2n} \\
\vdots & \vdots & & \vdots \\
z_{m1} x_{m1} & z_{m2} x_{m2} & \cdots & z_{mn} x_{mn}
\end{bmatrix}
= \begin{bmatrix} P_1 \\ P_2 \\ \vdots \\ P_m \end{bmatrix}
\begin{bmatrix} X_1 & X_2 & \cdots & X_n \end{bmatrix}
$$

  Cameras (3m × 4), points (4 × n)

  D = MS has rank 4

• If we knew the depths z, we could factorize D to estimate M and S.
• If we knew M and S, we could solve for z.
• Solution: iterative approach (alternate between the two steps above).

Slide credit: Svetlana Lazebnik

Recap: Sequential Projective SfM
• Initialize motion from two images using the fundamental matrix
• Initialize structure
• For each additional view:
     Determine the projection matrix of the new camera using all the
      known 3D points that are visible in its image – calibration
     Refine and extend the structure: compute new 3D points, and
      re-optimize existing points that are also seen by this camera –
      triangulation
• Refine structure and motion: bundle adjustment

Slide credit: Svetlana Lazebnik

Recap: Bundle Adjustment
• Non-linear method for refining structure and motion
• Minimizes the squared reprojection error:

$$
E(P, X) = \sum_{i=1}^{m} \sum_{j=1}^{n} D\left(x_{ij},\, P_i X_j\right)^2
$$

[Figure: point Xj reprojected into cameras P1, P2, P3; residuals between
the predicted projections Pi Xj and the measurements x1j, x2j, x3j]

Slide credit: Svetlana Lazebnik

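The objective E(P, X) is easy to write down concretely. The sketch below (function and variable names are illustrative, not from the lecture) evaluates the sum of squared reprojection errors; a bundle adjuster would minimize this over all cameras and points with a non-linear least-squares solver:

```python
import numpy as np

def reprojection_error(cameras, points, observations):
    """Sum of squared reprojection errors E(P, X).
    cameras:      list of m projection matrices P_i, each 3 x 4
    points:       4 x n array of homogeneous 3D points X_j
    observations: m x n x 2 array of measured image points x_ij"""
    E = 0.0
    for i, P in enumerate(cameras):
        proj = P @ points                 # 3 x n homogeneous projections
        proj = proj[:2] / proj[2]         # perspective divide -> 2 x n
        diff = proj.T - observations[i]   # n x 2 residuals for camera i
        E += np.sum(diff ** 2)
    return E

# With noise-free observations generated from the true cameras and
# points, the error is zero; real data gives E > 0, which bundle
# adjustment then minimizes over both P and X.
rng = np.random.default_rng(1)
cams = [np.hstack([np.eye(3), rng.normal(size=(3, 1))]) for _ in range(2)]
X = np.vstack([rng.normal(size=(3, 5)) + 5.0, np.ones((1, 5))])  # Z > 0
obs = np.stack([(P @ X)[:2] / (P @ X)[2] for P in cams]).transpose(0, 2, 1)
print(reprojection_error(cams, X, obs))  # 0.0
```

In practice the minimization exploits the sparse block structure of the problem (each residual involves only one camera and one point), which is what makes bundle adjustment over many views tractable.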
Topics of This Lecture
• Introduction to Motion
     Applications, uses
• Motion Field
     Derivation
• Optical Flow
     Brightness constancy constraint
     Aperture problem
     Lucas-Kanade flow
     Iterative refinement
     Global parametric motion
     Coarse-to-fine estimation
     Motion segmentation
• KLT Feature Tracking

Video
• A video is a sequence of frames captured over time
• Now our image data is a function of space (x, y) and time (t)

Slide credit: Svetlana Lazebnik

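The space-time view can be made concrete with a small array sketch (shapes and names here are illustrative): a video is just a 3D array, and fixing t gives an image while fixing (x, y) gives one pixel's brightness over time.

```python
import numpy as np

# A video as a function I(x, y, t): a 3D array indexed by (t, y, x).
video = np.zeros((10, 48, 64), dtype=np.uint8)  # 10 frames of 48 x 64
video[3, 10, 20] = 255                          # brighten one pixel in frame 3

frame3 = video[3]                   # a single image: slice at fixed t
pixel_over_time = video[:, 10, 20]  # one pixel's brightness across time

print(frame3.shape, pixel_over_time.shape)  # (48, 64) (10,)
```

Both motion-field estimation and optical flow operate on exactly this volume: they relate intensity changes along the t axis to displacements in the (x, y) plane.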
Applications of Segmentation to Video
• Background subtraction
     A static camera is observing a scene.
     Goal: separate the static background from the moving foreground.
     How to come up with a background frame estimate without access to
      an "empty" scene?

Slide credit: Svetlana Lazebnik, Kristen Grauman

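One standard answer to the "no empty scene" question is a per-pixel temporal median: if each pixel shows the background in most frames, the median over time recovers it. A minimal sketch, with made-up helper names (this is one common recipe, not necessarily the lecture's):

```python
import numpy as np

def estimate_background(frames):
    """Background estimate as the per-pixel temporal median over frames."""
    return np.median(np.stack(frames), axis=0)

def foreground_mask(frame, background, threshold=30):
    """Mark pixels that differ from the background by more than threshold."""
    return np.abs(frame.astype(float) - background) > threshold

# Toy example: constant background 100; a bright 'object' (255) occupies
# a different pixel in each frame, so the median recovers 100 everywhere.
frames = [np.full((4, 4), 100.0) for _ in range(5)]
for t, f in enumerate(frames):
    f[0, t % 4] = 255.0
bg = estimate_background(frames)
print(np.allclose(bg, 100.0))  # True
```

The median is more robust than the mean here: a few foreground frames per pixel shift the mean but leave the median untouched.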
Applications of Segmentation to Video
• Background subtraction
• Shot boundary detection
     Commercial video is usually composed of shots, or sequences showing
      the same objects or scene.
     Goal: segment the video into shots for summarization and browsing
      (each shot can be represented by a single keyframe in a user
      interface).
     Difference from background subtraction: the camera is not
      necessarily stationary.

Slide credit: Svetlana Lazebnik

Applications of Segmentation to Video
• Background subtraction
• Shot boundary detection
     For each frame, compute the distance between the current frame and
      the previous one:
       – Pixel-by-pixel differences
       – Differences of color histograms
       – Block comparison
     If the distance is greater than some threshold, classify the frame
      as a shot boundary.

Slide credit: Svetlana Lazebnik

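The histogram variant of this recipe can be sketched directly (function name, bin count, and threshold are illustrative choices, not prescribed by the slide):

```python
import numpy as np

def shot_boundaries(frames, bins=16, threshold=0.5):
    """Detect shot boundaries from gray-level histogram differences.
    Returns the indices t at which a new shot starts."""
    hists = []
    for f in frames:
        h, _ = np.histogram(f, bins=bins, range=(0, 256))
        hists.append(h / h.sum())  # normalize so frame size doesn't matter
    boundaries = []
    for t in range(1, len(frames)):
        # L1 distance between consecutive normalized histograms, in [0, 2]
        d = np.abs(hists[t] - hists[t - 1]).sum()
        if d > threshold:
            boundaries.append(t)
    return boundaries

# Toy video: 3 dark frames, then 3 bright frames -> one boundary at t = 3.
dark = [np.full((8, 8), 20) for _ in range(3)]
bright = [np.full((8, 8), 200) for _ in range(3)]
print(shot_boundaries(dark + bright))  # [3]
```

Unlike pixel-wise differencing, the histogram distance is insensitive to camera motion within a shot, which is exactly why the slide lists it as an option for moving cameras.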
Applications of Segmentation to Video
• Background subtraction
• Shot boundary detection
• Motion segmentation
     Segment the video into multiple coherently moving objects

Slide credit: Svetlana Lazebnik

Motion and Perceptual Organization
• Sometimes, motion is the only cue

Slide credit: Svetlana Lazebnik

Motion and Perceptual Organization
• Sometimes, motion is the foremost cue

Slide credit: Kristen Grauman

Motion and Perceptual Organization
• Even "impoverished" motion data can evoke a strong percept

Slide credit: Svetlana Lazebnik

Uses of Motion
• Estimating 3D structure
     Directly from optic flow
     Indirectly, by creating correspondences for SfM
• Segmenting objects based on motion cues
• Learning dynamical models
• Recognizing events and activities
• Improving video quality (motion stabilization)

Slide adapted from Svetlana Lazebnik

Motion Estimation Techniques
• Direct methods
     Directly recover image motion at each pixel from spatio-temporal
      image brightness variations
     Dense motion fields, but sensitive to appearance variations
     Suitable for video and when image motion is small
• Feature-based methods
     Extract visual features (corners, textured areas) and track them
      over multiple frames
     Sparse motion fields, but more robust tracking
     Suitable when image motion is large (10s of pixels)

Slide credit: Steve Seitz

Motion Field
• The motion field is the projection of the 3D scene motion into the
  image

Slide credit: Svetlana Lazebnik

Motion Field and Parallax
• P(t) is a moving 3D point
• Velocity of the scene point: V = dP/dt
• p(t) = (x(t), y(t)) is the projection of P in the image
• Apparent velocity v in the image: given by the components
  vx = dx/dt and vy = dy/dt
• These components are known as the motion field of the image

[Figure: point P(t) moving with velocity V to P(t+dt); its projection
p(t) moving with velocity v in the image]

Slide credit: Svetlana Lazebnik

Motion Field and Parallax

Perspective projection: p = f P / Z, with scene velocity V = (Vx, Vy, Vz)

To find the image velocity v, differentiate p with respect to t, using
the quotient rule D(f/g) = (g f′ − g′ f) / g²:

$$
v = f \, \frac{Z V - V_z P}{Z^2}
$$

$$
v_x = \frac{f V_x - V_z x}{Z}, \qquad v_y = \frac{f V_y - V_z y}{Z}
$$

• Image motion is a function of both the 3D motion (V) and the depth of
  the 3D point (Z).

Slide credit: Svetlana Lazebnik

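The two component formulas translate directly into code. A minimal sketch (names and the numeric example are made up for illustration) that also demonstrates motion parallax, i.e. that nearer points move faster in the image:

```python
import numpy as np

def motion_field(p, Z, V, f=1.0):
    """Image velocity (vx, vy) at image position p = (x, y) for a point
    at depth Z moving with 3D velocity V = (Vx, Vy, Vz), focal length f:
        vx = (f Vx - Vz x) / Z,   vy = (f Vy - Vz y) / Z"""
    x, y = p
    Vx, Vy, Vz = V
    return np.array([(f * Vx - Vz * x) / Z,
                     (f * Vy - Vz * y) / Z])

# Pure lateral translation (Vz = 0): image velocities are all parallel,
# with magnitude inversely proportional to depth (motion parallax).
near = motion_field((0.2, 0.1), Z=1.0, V=(1.0, 0.0, 0.0))
far  = motion_field((0.2, 0.1), Z=4.0, V=(1.0, 0.0, 0.0))
print(near[0], far[0])  # 1.0 0.25
```

The 4x depth difference produces exactly a 4x speed difference in the image, which is the dependence on Z highlighted in the bullet above.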
                                              Motion Field and Parallax
                                              • Pure translation: V is constant everywhere
                                                         vx = (f Vx − Vz x) / Z,     vy = (f Vy − Vz y) / Z

                                                    ⇒    v = (1/Z) (v0 − Vz p),      v0 = (f Vx, f Vy)
                                             Slide credit: Svetlana Lazebnik
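The pure-translation motion field above is easy to check numerically. A minimal numpy sketch (not from the lecture; function name and scalar-depth model are illustrative assumptions):

```python
import numpy as np

def motion_field_translation(p, Z, V, f=1.0):
    """Image velocity v = (1/Z) * (v0 - Vz * p) for pure translation V = (Vx, Vy, Vz).

    p is the image point (x, y), Z the depth of the 3D point, f the focal
    length. v0 = (f*Vx, f*Vy) is the vanishing point of the translation.
    """
    Vx, Vy, Vz = V
    v0 = np.array([f * Vx, f * Vy])
    return (v0 - Vz * np.asarray(p, dtype=float)) / Z

# Forward translation (Vz != 0) with v0 at the origin: the flow is radial
# about the vanishing point, and its length halves when the depth Z doubles.
v_near = motion_field_translation(p=(2.0, 1.0), Z=4.0, V=(0.0, 0.0, 1.0))
v_far  = motion_field_translation(p=(2.0, 1.0), Z=8.0, V=(0.0, 0.0, 1.0))
```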
                                              Motion Field and Parallax
                                                    • Pure translation: V is constant everywhere

                                                         v = (1/Z) (v0 − Vz p),      v0 = (f Vx, f Vy)

                                                    • If Vz is nonzero:
                                                             Every motion vector points toward (or away from) v0,
                                                              the vanishing point of the translation direction.
                                             Slide credit: Svetlana Lazebnik
                                                    • If Vz is zero:
                                                             Motion is parallel to the image plane; all the motion vectors
                                                              are parallel.
                                              • The length of the motion vectors is inversely
                                                proportional to the depth Z.
                                             Slide credit: Svetlana Lazebnik
                                             Topics of This Lecture
                                             • Introduction to Motion
                                                   Applications, uses

                                             • Motion Field
                                                   Derivation

                                             • Optical Flow
                                                   Brightness constancy constraint
                                                   Aperture problem
                                                   Lucas-Kanade flow
                                                   Iterative refinement
                                                   Global parametric motion
                                                   Coarse-to-fine estimation
                                                   Motion segmentation

                                             • KLT Feature Tracking
                                              Optical Flow
                                              • Definition: optical flow is the apparent motion of
                                                brightness patterns in the image.
                                              • Ideally, optical flow would be the same as the motion
                                                field.
                                              • Have to be careful: apparent motion can be caused by
                                                lighting changes without any actual motion.
                                                       Think of a uniform rotating sphere under fixed lighting vs. a
                                                        stationary sphere under moving illumination.




                                             Slide credit: Svetlana Lazebnik
                                               Apparent Motion ≠ Motion Field
                                                                                              Figure from Horn book
                                             Slide credit: Kristen Grauman
                                              Estimating Optical Flow
                                                                       I(x,y,t–1)              I(x,y,t)
                                              • Given two subsequent frames, estimate the apparent
                                                   motion field u(x,y) and v(x,y) between them.
                                              • Key assumptions
                                                       Brightness constancy: projection of the same point looks the
                                                        same in every frame.
                                                       Small motion: points do not move very far.
                                                       Spatial coherence: points move like their neighbors.

                                             Slide credit: Svetlana Lazebnik
                                              The Brightness Constancy Constraint
                                                              I(x, y, t–1)                          I(x, y, t)

                                               • Brightness Constancy Equation:
                                                         I(x, y, t–1) = I(x + u(x, y), y + v(x, y), t)

                                               • Linearizing the right-hand side using a Taylor expansion:
                                                         I(x, y, t–1) ≈ I(x, y, t) + Ix · u(x, y) + Iy · v(x, y)

                                               • Hence,  Ix · u + Iy · v + It ≈ 0
                                             Slide credit: Svetlana Lazebnik
                                              The Brightness Constancy Constraint
                                                                     Ix · u + Iy · v + It = 0
                                              • How many equations and unknowns per pixel?
                                                        One equation, two unknowns

                                              • Intuitively, what does this constraint mean?
                                                                        ∇I · (u, v) + It = 0
                                              • The component of the flow perpendicular to the
                                                   gradient (i.e., parallel to the edge) is unknown
                                                             If (u, v) satisfies the equation,
                                                              so does (u + u′, v + v′) whenever ∇I · (u′, v′) = 0.
                                                             (Figure: the constraint line in (u, v) space; the gradient
                                                              is perpendicular to the edge direction.)
                                             Slide credit: Svetlana Lazebnik
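The under-determinacy above can be verified directly: only the flow component along the gradient ("normal flow") is fixed; anything along the edge can be added freely. A small numeric sketch (values are made up for illustration):

```python
import numpy as np

# Brightness constancy at one pixel: Ix*u + Iy*v + It = 0.
Ix, Iy, It = 3.0, 4.0, -10.0
g = np.array([Ix, Iy])

# One particular solution: the flow component along the gradient direction.
normal_flow = -It * g / g.dot(g)

# Any vector orthogonal to the gradient (i.e., along the edge) leaves the
# constraint satisfied -- this is the aperture problem.
edge_dir = np.array([-Iy, Ix])
candidate = normal_flow + 2.5 * edge_dir

residual = g.dot(candidate) + It   # still satisfies the constraint
```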
                                              The Aperture Problem
                                                                                          Perceived motion
                                             Slide credit: Svetlana Lazebnik
                                              The Aperture Problem
                                                                                          Actual motion
                                             Slide credit: Svetlana Lazebnik
                                              The Barber Pole Illusion
                                                             http://en.wikipedia.org/wiki/Barberpole_illusion
                                             Slide credit: Svetlana Lazebnik
                                              Solving the Aperture Problem
                                              • How to get more equations for a pixel?
                                               • Spatial coherence constraint: pretend the pixel's
                                                   neighbors have the same (u,v)
                                                       If we use a 5x5 window, that gives us 25 equations per pixel
                                               B. Lucas and T. Kanade. An iterative image registration technique with an application to
                                               stereo vision. In Proceedings of the International Joint Conference on Artificial
                                                Intelligence, pp. 674–679, 1981.
                                             Slide credit: Svetlana Lazebnik
                                              Solving the Aperture Problem
                                               • Least squares problem: each pixel pi in the window gives one
                                                 constraint  Ix(pi)·u + Iy(pi)·v = −It(pi),  i.e.  A d = b  with
                                                 A = [Ix Iy], d = (u, v)ᵀ, b = −It.

                                               • Minimum least squares solution given by the normal equations
                                                 AᵀA d = Aᵀb:

                                                       [ Σ IxIx   Σ IxIy ] [u]       [ Σ IxIt ]
                                                       [ Σ IxIy   Σ IyIy ] [v]  = − [ Σ IyIt ]
                                                    (The summations are over all pixels in the K x K window)
                                             Slide adapted from Svetlana Lazebnik
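Written out, the Lucas-Kanade step is a tiny least-squares solve. A minimal numpy sketch (not from the lecture; the derivative arrays over the window are assumed to be given):

```python
import numpy as np

def lucas_kanade_window(Ix, Iy, It):
    """Least-squares flow (u, v) for one window.

    Ix, Iy, It are arrays of spatial/temporal derivatives over a KxK
    window. Solves (A^T A) d = A^T b with A = [Ix Iy] and b = -It.
    Raises if A^T A is singular (aperture problem / no texture).
    """
    Ix, Iy, It = (np.ravel(a).astype(float) for a in (Ix, Iy, It))
    A = np.stack([Ix, Iy], axis=1)   # one row [Ix(pi), Iy(pi)] per pixel
    b = -It
    return np.linalg.solve(A.T @ A, A.T @ b)
```

With gradients in at least two directions, the 2x2 system is invertible and the true flow is recovered exactly for synthetic data.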
                                              Conditions for Solvability
                                               • Optimal (u, v) satisfies the Lucas-Kanade equation AᵀA d = Aᵀb
                                               • When is this solvable?
                                                         AᵀA should be invertible.
                                                         The entries of AᵀA should not be too small (noise).
                                                         AᵀA should be well-conditioned.
                                             Slide credit: Svetlana Lazebnik
                                              Eigenvectors of ATA
                                               • Haven't we seen an equation like this before?
                                              • Recall the Harris corner detector: M = ATA is the second
                                                moment matrix.
                                              • The eigenvectors and eigenvalues of M relate to edge
                                                direction and magnitude.
                                                       The eigenvector associated with the larger eigenvalue points in
                                                        the direction of fastest intensity change.
                                                       The other eigenvector is orthogonal to it.




                                             Slide credit: Svetlana Lazebnik
                                              Interpreting the Eigenvalues
                                               • Classification of image points using the eigenvalues of the
                                                 second moment matrix:

                                                 (Diagram in the (λ1, λ2) plane:)
                                                      “Edge”:        λ2 ≫ λ1
                                                      “Corner”:      λ1 and λ2 are large, λ1 ~ λ2
                                                      “Flat” region: λ1 and λ2 are small
                                                      “Edge”:        λ1 ≫ λ2

                                              Slide credit: Kristen Grauman
                                              Edge
                                                                                 – Gradients very large or very small
                                                                                 – Large λ1, small λ2
                                             Slide credit: Svetlana Lazebnik
                                              Low-Texture Region
                                                                                 – Gradients have small magnitude
                                                                                 – Small λ1, small λ2
                                             Slide credit: Svetlana Lazebnik
                                              High-Texture Region
                                                                                 – Gradients are different, large magnitude
                                                                                 – Large λ1, large λ2
                                             Slide credit: Svetlana Lazebnik
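The three cases above can be sketched as a tiny classifier on the eigenvalues of the second moment matrix (a minimal illustration, not from the lecture; the threshold value is an arbitrary assumption):

```python
import numpy as np

def classify_region(Ix, Iy, thresh=1.0):
    """Classify a window via the eigenvalues of M = [[SIxIx, SIxIy], [SIxIy, SIyIy]].

    thresh is an illustrative cutoff separating "large" from "small"
    eigenvalues; in practice it depends on image scale and noise.
    """
    M = np.array([[np.sum(Ix * Ix), np.sum(Ix * Iy)],
                  [np.sum(Ix * Iy), np.sum(Iy * Iy)]])
    l1, l2 = sorted(np.linalg.eigvalsh(M), reverse=True)  # l1 >= l2
    if l2 > thresh:        # both eigenvalues large: corner / high texture
        return "corner"
    if l1 > thresh:        # one large, one small: edge (aperture problem)
        return "edge"
    return "flat"          # both small: low-texture region
```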
                                              Per-Pixel Estimation Procedure

                                               • Let  M = Σ (∇I)(∇I)ᵀ   and   b = [ Σ IxIt ]
                                                                                    [ Σ IyIt ]
                                               • Algorithm: At each pixel, compute U by solving M U = −b
                                              • M is singular if all gradient vectors point in the same
                                                   direction
                                                        E.g., along an edge
                                                       Trivially singular if the summation is over a single pixel
                                                        or if there is no texture
                                                       I.e., only normal flow is available (aperture problem)

                                              • Corners and textured areas are OK

                                             Slide credit: Steve Seitz
                                              Iterative Refinement
                                              • Estimate velocity at each pixel using one iteration of
                                                Lucas and Kanade estimation.
                                              • Warp one image toward the other using the estimated
                                                flow field.
                                                       (Easier said than done)
                                              • Refine estimate by repeating the process.
                                             Slide credit: Steve Seitz
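The estimate-warp-refine loop can be sketched in 1-D (a toy illustration, not the lecture's implementation): linearize brightness constancy around the current displacement, solve for an update, warp, and repeat.

```python
import numpy as np

def estimate_shift_1d(I0, I1, n_iters=10):
    """Iteratively estimate a global 1-D displacement d with I1(x) = I0(x - d).

    Each iteration warps I1 back by the current d (linear interpolation),
    then applies one Lucas-Kanade style update from the residual.
    """
    x = np.arange(len(I0), dtype=float)
    d = 0.0
    for _ in range(n_iters):
        warped = np.interp(x + d, x, I1)      # warp I1 toward I0
        Ix = np.gradient(warped)              # spatial derivative
        denom = np.sum(Ix * Ix)
        if denom < 1e-12:                     # no texture: give up
            break
        d -= np.sum(Ix * (warped - I0)) / denom
    return d
```

For a sub-pixel shift that is small relative to the signal's structure, a few iterations converge; large shifts need the coarse-to-fine scheme discussed later.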
                                              Optical Flow: Iterative Refinement
                                                (Figure: 1-D illustration — starting from an initial guess x0, estimate
                                                 the displacement, update the estimate, and repeat.)

                                                   (using d for displacement here instead of u)


                                             Slide credit: Steve Seitz
                                              Optic Flow: Iterative Refinement
                                               • Some implementation issues:
                                                         Warping is not easy (ensure that errors in warping are smaller
                                                          than the estimate refinement).
                                                         Warp one image, take derivatives of the other, so you don't need
                                                          to re-compute the gradient after each iteration.
                                                         Often useful to low-pass filter the images before motion
                                                          estimation (for better derivative estimation, and better linear
                                                          approximations to image intensity).
                                             Slide credit: Steve Seitz
                                              Global Parametric Motion Models
                                                         Translation         Affine              Perspective   3D rotation
                                                         2 unknowns          6 unknowns          8 unknowns    3 unknowns
                                             Slide credit: Steve Seitz
                                              Affine Motion
                                                    u(x, y) = a1 + a2·x + a3·y
                                                    v(x, y) = a4 + a5·x + a6·y
                                              • Substituting into the brightness
                                                   constancy equation:
                                                                                 Ix · u + Iy · v + It = 0
                                             Slide credit: Svetlana Lazebnik
                                              Affine Motion
                                                    u(x, y) = a1 + a2·x + a3·y
                                                    v(x, y) = a4 + a5·x + a6·y
                                              • Substituting into the brightness
                                                   constancy equation:
                                                     Ix·(a1 + a2·x + a3·y) + Iy·(a4 + a5·x + a6·y) + It = 0
                                              • Each pixel provides 1 linear constraint in 6 unknowns.
                                               • Least squares minimization:

                                                  Err(a) = Σ [ Ix·(a1 + a2·x + a3·y) + Iy·(a4 + a5·x + a6·y) + It ]²
                                             Slide credit: Svetlana Lazebnik
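Since each pixel contributes one linear constraint in the six parameters, the affine case is a plain linear least-squares problem. A minimal numpy sketch (not from the lecture; derivative arrays and coordinates are assumed given per pixel):

```python
import numpy as np

def affine_flow(x, y, Ix, Iy, It):
    """Least-squares affine motion parameters a1..a6.

    Each pixel gives one equation
        Ix*(a1 + a2*x + a3*y) + Iy*(a4 + a5*x + a6*y) + It = 0,
    stacked into A a = -It and solved by linear least squares
    (no robustness or weighting -- a bare sketch).
    """
    A = np.stack([Ix, Ix * x, Ix * y, Iy, Iy * x, Iy * y], axis=1)
    a, *_ = np.linalg.lstsq(A, -It, rcond=None)
    return a
```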
                                              Problem Cases in Lucas-Kanade
                                              • The motion is large (larger than a pixel)
                                                       Iterative refinement, coarse-to-fine estimation
                                              • A point does not move like its neighbors
                                                       Motion segmentation

                                              • Brightness constancy does not hold
                                                       Do exhaustive neighborhood search with normalized correlation.
                                             Slide credit: Svetlana Lazebnik
                                              Dealing with Large Motions
                                             Slide credit: Svetlana Lazebnik
                                              Temporal Aliasing
                                              • Temporal aliasing causes ambiguities in optical flow
                                                because images can have many pixels with the same
                                                intensity.
                                               • I.e., how do we know which 'correspondence' is
                                                 correct?
                                                                                               actual shift
                                                                                                          estimated shift

                                                              Nearest match is                   Nearest match is
                                                            correct (no aliasing)               incorrect (aliasing)

                                              • To overcome aliasing: coarse-to-fine estimation.
                                             Slide credit: Steve Seitz
                                              Idea: Reduce the Resolution!




                                             Slide credit: Svetlana Lazebnik
                                              Coarse-to-fine Optical Flow Estimation




                                       [Figure: Gaussian pyramids of image 1 and image 2. A shift of u=10 pixels at full resolution becomes u=5, u=2.5, and u=1.25 pixels at successively coarser levels.]
                                             Slide credit: Steve Seitz
                                              Coarse-to-fine Optical Flow Estimation




                                       [Figure: run iterative L-K at the coarsest level of the Gaussian pyramids of images 1 and 2, then repeatedly warp & upsample and run iterative L-K at the next finer level.]
                                             Slide credit: Steve Seitz
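The warp-and-upsample loop can be sketched in a few lines. This is a deliberately simplified, hypothetical version: one global translation per level instead of per-pixel flow, 2x2 block averaging instead of a proper Gaussian pyramid, and integer warping via np.roll instead of bilinear interpolation. The synthetic blob pair is invented for the demo.

```python
import numpy as np

def lk_shift(I1, I2):
    # One global translation (dx, dy) from brightness constancy:
    # Ix*dx + Iy*dy = -It, solved by least squares over all pixels.
    Iy, Ix = np.gradient(I1)
    A = np.stack([Ix.ravel(), Iy.ravel()], axis=1)
    d, *_ = np.linalg.lstsq(A, -(I2 - I1).ravel(), rcond=None)
    return d                                   # (dx, dy)

def downsample(I):
    # Crude pyramid level: average 2x2 blocks (a real implementation
    # would smooth with a Gaussian filter first).
    return 0.25 * (I[::2, ::2] + I[1::2, ::2] + I[::2, 1::2] + I[1::2, 1::2])

def coarse_to_fine(I1, I2, levels=3):
    p1, p2 = [I1], [I2]
    for _ in range(levels - 1):                # build the image pyramids
        p1.append(downsample(p1[-1]))
        p2.append(downsample(p2[-1]))
    d = np.zeros(2)
    for lvl in range(levels - 1, -1, -1):      # coarsest -> finest
        d = np.round(2.0 * d)                  # upsample the running estimate
        I2w = np.roll(p2[lvl], (-int(d[1]), -int(d[0])), axis=(0, 1))  # warp
        d = d + lk_shift(p1[lvl], I2w)         # run L-K on the residual
    return d

# Synthetic pair: a Gaussian blob shifted 6 pixels to the right -- too
# large for single-scale gradient-based L-K, recovered by the pyramid.
yy, xx = np.mgrid[0:64, 0:64].astype(float)
I1 = np.exp(-((xx - 32) ** 2 + (yy - 32) ** 2) / 200.0)
I2 = np.exp(-((xx - 38) ** 2 + (yy - 32) ** 2) / 200.0)
d = coarse_to_fine(I1, I2)
```

At the coarsest level the 6-pixel shift is only 1.5 pixels, small enough for the linearized brightness-constancy equation; each finer level then only has to correct a sub-pixel residual.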
                                              Motion Segmentation
                                              • How do we represent the motion in this scene?




                                                   J. Wang and E. Adelson. Layered Representation for Motion Analysis. CVPR 1993.
                                             Slide credit: Svetlana Lazebnik
                                              Layered Motion
                                     • Break the image sequence into “layers”, each of which has a coherent motion




                                                   J. Wang and E. Adelson. Layered Representation for Motion Analysis. CVPR 1993.
                                             Slide credit: Svetlana Lazebnik
                                              What Are Layers?
                                              • Each layer is defined by an alpha mask and an affine
                                                   motion model




                                                   J. Wang and E. Adelson. Layered Representation for Motion Analysis. CVPR 1993.
                                             Slide credit: Svetlana Lazebnik
                                              Motion Segmentation with an Affine Model
                                         u(x, y) = a1 + a2 x + a3 y
                                         v(x, y) = a4 + a5 x + a6 y




                                        [Figure: local flow estimates]




                                                   J. Wang and E. Adelson. Layered Representation for Motion Analysis. CVPR 1993.
                                             Slide credit: Svetlana Lazebnik
                                              Motion Segmentation with an Affine Model
                                         u(x, y) = a1 + a2 x + a3 y     (equation of a plane; parameters a1, a2, a3 found by least squares)
                                         v(x, y) = a4 + a5 x + a6 y     (equation of a plane; parameters a4, a5, a6 found by least squares)




                                                   J. Wang and E. Adelson. Layered Representation for Motion Analysis. CVPR 1993.
                                             Slide credit: Svetlana Lazebnik
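The least-squares fit itself is a minimal sketch: stack one row [1, x, y] per sample point and solve two small linear systems. The sample coordinates and "true" parameters below are made up for the demo (noiseless, so the fit is exact).

```python
import numpy as np

# Fit u(x,y) = a1 + a2*x + a3*y and v(x,y) = a4 + a5*x + a6*y to local
# flow estimates by least squares. Data here is synthetic.
rng = np.random.default_rng(0)
xy = rng.uniform(0, 100, size=(50, 2))              # sample pixel coordinates
a_true = np.array([2.0, 0.10, -0.05])               # u-parameters (a1, a2, a3)
b_true = np.array([-1.0, 0.02, 0.03])               # v-parameters (a4, a5, a6)

A = np.column_stack([np.ones(len(xy)), xy])         # one row [1, x, y] per point
u = A @ a_true                                      # "observed" local flow u
v = A @ b_true                                      # "observed" local flow v

a_est, *_ = np.linalg.lstsq(A, u, rcond=None)       # least-squares plane fit
b_est, *_ = np.linalg.lstsq(A, v, rcond=None)
```

With noisy local flow estimates, the same two solves give the best-fitting affine motion in the least-squares sense.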
                                              Motion Segmentation with an Affine Model
                                         u(x, y) = a1 + a2 x + a3 y     (equation of a plane; parameters a1, a2, a3 found by least squares)
                                         v(x, y) = a4 + a5 x + a6 y     (equation of a plane; parameters a4, a5, a6 found by least squares)




                                       [Figure: 1D example. The true flow u(x,y) is piecewise constant ("foreground" and "background" levels); local flow estimates scatter around it; line fitting yields the segmented estimate, with unexplained points at the occlusion.]
                                                                                             B. Leibe
                                             Slide credit: Svetlana Lazebnik
                                              How Do We Estimate the Layers?
                                              • Compute local flow in a coarse-to-fine fashion.
                                              • Obtain a set of initial affine motion hypotheses.




                                                         Divide the image into blocks and estimate affine motion
                                                          parameters in each block by least squares.
                                                          –    Eliminate hypotheses with high residual error
                                                         Perform k-means clustering on affine motion parameters.
                                                          –    Merge clusters that are close and retain the largest clusters to obtain a smaller set of hypotheses to describe all the motions in the scene.
                                              • Iterate until convergence:
                                                         Assign each pixel to best hypothesis.
                                                          –    Pixels with high residual error remain unassigned.
                                                         Perform region filtering to enforce spatial constraints.
                                                         Re-estimate affine motions in each region.
                                                   J. Wang and E. Adelson. Layered Representation for Motion Analysis. CVPR 1993.
                                             Slide credit: Svetlana Lazebnik
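The per-pixel assignment step of this iteration can be sketched as follows. The two affine hypotheses and the piecewise "observed" flow field are invented for illustration: the left half of the frame follows hypothesis 0, the right half hypothesis 1, and each pixel is labeled with the hypothesis of lowest residual.

```python
import numpy as np

H, W = 20, 40
yy, xx = np.mgrid[0:H, 0:W].astype(float)

def affine_flow(a, xx, yy):
    # a = (a1..a6): u = a1 + a2*x + a3*y, v = a4 + a5*x + a6*y
    return a[0] + a[1] * xx + a[2] * yy, a[3] + a[4] * xx + a[5] * yy

hyps = [np.array([1.0, 0.00, 0.0, 0.0, 0.0, 0.0]),   # translation: u = 1
        np.array([0.0, 0.05, 0.0, 0.5, 0.0, 0.0])]   # u grows with x, v = 0.5

# "Observed" local flow: hypothesis 0 on the left, hypothesis 1 on the right
left = xx < W // 2
u0, v0 = affine_flow(hyps[0], xx, yy)
u1, v1 = affine_flow(hyps[1], xx, yy)
u_obs, v_obs = np.where(left, u0, u1), np.where(left, v0, v1)

# Residual of every pixel against every hypothesis, then argmin label
res = np.stack([(u_obs - affine_flow(a, xx, yy)[0]) ** 2 +
                (v_obs - affine_flow(a, xx, yy)[1]) ** 2 for a in hyps])
labels = np.argmin(res, axis=0)
```

In the full algorithm, pixels whose best residual is still high stay unassigned, region filtering cleans up the label map, and the affine parameters are re-estimated per region.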
                                              Example Result




                                                   J. Wang and E. Adelson. Layered Representation for Motion Analysis. CVPR 1993.
                                             Slide credit: Svetlana Lazebnik
                                             Topics of This Lecture
                                             • Introduction to Motion
                                                   Applications, uses

                                             • Motion Field




                                                   Derivation

                                             • Optical Flow
                                                   Brightness constancy constraint
                                                   Aperture problem




                                                   Lucas-Kanade flow
                                                   Iterative refinement
                                                   Global parametric motion
                                                   Coarse-to-fine estimation
                                                   Motion segmentation

                                             • KLT Feature Tracking
                                              Feature Tracking
                                              • So far, we have only considered optical flow estimation
                                                in a pair of images.
                                     • If we have more than two images, we can compute the optical flow from each frame to the next.
                                              • Given a point in the first image, we can in principle
                                                reconstruct its path by simply “following the arrows”.




                                             Slide credit: Svetlana Lazebnik
                                              Tracking Challenges
                                              • Ambiguity of optical flow
                                                       Find good features to track
                                              • Large motions




                                                       Discrete search instead of Lucas-Kanade
                                              • Changes in shape, orientation, color
                                                       Allow some matching flexibility
                                              • Occlusions, disocclusions




                                                       Need mechanism for deleting, adding new features
                                              • Drift – errors may accumulate over time
                                                       Need to know when to terminate a track




                                             Slide credit: Svetlana Lazebnik
                                              Handling Large Displacements
                                              • Define a small area around a pixel as the template.
                                     • Match the template against each pixel within a search area in the next image – just like stereo matching!
                                              • Use a match measure such as SSD or correlation.
                                              • After finding the best discrete location, can use Lucas-
                                                Kanade to get sub-pixel estimate.




                                             Slide credit: Svetlana Lazebnik
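The discrete search can be sketched directly: compare the template against every integer offset in a ±radius window using SSD. The frames, patch location, and search radius below are synthetic, chosen only to make the demo self-checking.

```python
import numpy as np

def ssd_search(patch, image, topleft, radius):
    # Match `patch` against every integer offset within +/- radius of
    # `topleft` in the next frame; return the offset with the lowest SSD.
    h, w = patch.shape
    ty, tx = topleft
    best, best_d = np.inf, None
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            cand = image[ty + dy:ty + dy + h, tx + dx:tx + dx + w]
            ssd = float(np.sum((cand - patch) ** 2))
            if ssd < best:
                best, best_d = ssd, (dx, dy)
    return best_d

# Synthetic frames: frame 2 is frame 1 shifted by (dx, dy) = (-3, 4)
rng = np.random.default_rng(1)
frame1 = rng.random((60, 60))
frame2 = np.roll(frame1, (4, -3), axis=(0, 1))
patch = frame1[25:34, 25:34]
d = ssd_search(patch, frame2, (25, 25), radius=5)
```

After the best discrete location is found, a Lucas-Kanade step can refine it to a sub-pixel estimate, as the slide notes.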
                                              Tracking Over Many Frames
                                              • Select features in first frame
                                              • For each frame:




                                                       Update positions of tracked features
                                                          – Discrete search or Lucas-Kanade
                                                       Terminate inconsistent tracks
                                                          – Compute similarity with corresponding feature in the previous
                                                            frame or in the first frame where it's visible
                                                       Start new tracks if needed




                                                          – Typically every ~10 frames, new features are added to “refill the
                                                            ranks”.




                                             Slide credit: Svetlana Lazebnik
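The bookkeeping above can be sketched as a toy control-flow skeleton. Both the per-frame position update (here a known, noiseless shift) and the consistency test (here a random termination) are stubs invented for the demo; a real tracker would run discrete search or Lucas-Kanade and compare patch similarity.

```python
import numpy as np

def run_tracker(n_frames, n_features=5, refill_every=3, drop_prob=0.3, seed=0):
    rng = np.random.default_rng(seed)
    shift = np.array([1.0, 0.5])                  # stub "true" motion per frame
    tracks = [[rng.uniform(0, 100, 2)] for _ in range(n_features)]  # active
    finished = []
    for f in range(1, n_frames):
        survivors = []
        for t in tracks:
            if rng.random() < drop_prob:          # stub consistency test:
                finished.append(t)                # terminate inconsistent track
            else:                                 # stub update step (would be
                survivors.append(t + [t[-1] + shift])  # discrete search / L-K)
        tracks = survivors
        if f % refill_every == 0:                 # periodically start new
            while len(tracks) < n_features:       # tracks to "refill the ranks"
                tracks.append([rng.uniform(0, 100, 2)])
    return tracks, finished

active, finished = run_tracker(n_frames=10)
```

The point is the life cycle: tracks are updated every frame, terminated when inconsistent, and replenished every few frames so the feature count stays roughly constant.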
                                              Shi-Tomasi Feature Tracker
                                              • Find good features using eigenvalues of second-
                                                     moment matrix
                                                          Key idea: “good” features to track are the ones that can be tracked reliably.
                                              • From frame to frame, track with Lucas-Kanade and a
                                                     pure translation model.
                                                         More robust for small displacements, can be estimated from
                                                          smaller neighborhoods.




                                              • Check consistency of tracks by affine registration to
                                                     the first observed instance of the feature.
                                                         Affine model is more accurate for larger displacements.
                                                         Comparing to the first frame helps to minimize drift.


                                                                J. Shi and C. Tomasi. Good Features to Track. CVPR 1994.
                                             Slide credit: Svetlana Lazebnik
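The eigenvalue criterion can be illustrated on a tiny synthetic image (a bright square on a dark background, with an arbitrary window size): the corner of the square gets a large minimum eigenvalue, while an edge and a flat region get essentially zero.

```python
import numpy as np

def min_eigenvalue(I, x, y, half=3):
    # Smaller eigenvalue of the second-moment matrix over a window:
    # M = sum [Ix^2, Ix*Iy; Ix*Iy, Iy^2]
    Iy, Ix = np.gradient(I)
    win = np.s_[y - half:y + half + 1, x - half:x + half + 1]
    gx, gy = Ix[win].ravel(), Iy[win].ravel()
    M = np.array([[gx @ gx, gx @ gy],
                  [gx @ gy, gy @ gy]])
    return np.linalg.eigvalsh(M)[0]          # eigenvalues in ascending order

I = np.zeros((40, 40))
I[10:30, 10:30] = 1.0                        # bright square on dark background

corner = min_eigenvalue(I, 10, 10)           # corner: both eigenvalues large
edge   = min_eigenvalue(I, 10, 20)           # vertical edge: min eigenvalue ~ 0
flat   = min_eigenvalue(I, 35, 35)           # flat region: both eigenvalues ~ 0
```

Thresholding this minimum eigenvalue is exactly the Shi-Tomasi "good features" selection: only windows with gradient energy in two independent directions can be tracked reliably.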
                                              Tracking Example




                                                                J. Shi and C. Tomasi. Good Features to Track. CVPR 1994.
                                             Slide credit: Svetlana Lazebnik
                                             Real-Time GPU Implementations
                                             • This basic feature tracking framework (Lucas-Kanade +
                                               Shi-Tomasi) is commonly referred to as “KLT tracking”.
                                                    Used as preprocessing step for many applications (recall the boujou demo yesterday)
                                                   Lends itself to easy parallelization

                                             • Very fast GPU implementations available
                                                    C. Zach, D. Gallup, J.-M. Frahm, Fast Gain-Adaptive KLT tracking on the GPU. In CVGPU'08 Workshop, Anchorage, USA, 2008
                                                   216 fps with automatic gain adaptation
                                                   260 fps without gain adaptation

                                                         http://www.cs.unc.edu/~ssinha/Research/GPU_KLT/
                                                           http://cs.unc.edu/~cmzach/opensource.html
                                              Example Use of Optical Flow: Motion Paint
                                              • Use optical flow to track brush strokes, in order to
                                                   animate them to follow underlying scene motion.




                                                                              What Dreams May Come


                                                                        http://www.fxguide.com/article333.html
                                             Slide credit: Kristen Grauman
                                              Motion vs. Stereo: Similarities
                                              • Both involve solving
                                                        Correspondence: disparities, motion vectors
                                                        Reconstruction




                                             Slide credit: Kristen Grauman
                                              Motion vs. Stereo: Differences
                                              • Motion:
                                                       Uses velocity: consecutive frames must be close to get good
                                                        approximate time derivative.




                                                       3D movement between camera and scene not necessarily single
                                                        3D rigid transformation.

                                              • Whereas with stereo:
                                                       Could have any disparity value.




                                                        View pair separated by a single 3D transformation.




                                             Slide credit: Kristen Grauman
                                              Summary
                                              • Motion field: 3D motions projected to 2D images;
                                                dependency on depth.
                                              • Solving for motion with




                                                       Sparse feature matches
                                                       Dense optical flow
                                              • Optical flow




                                                       Brightness constancy assumption
                                                       Aperture problem
                                                       Solution with spatial coherence assumption
                                                        Extensions to segmentation into motion layers



                                             Slide credit: Kristen Grauman
                                             References and Further Reading
                                             • Here is the original paper by Lucas & Kanade
                                                    B. Lucas and T. Kanade. An iterative image registration technique with an application to stereo vision. In Proc. IJCAI, pp. 674–679, 1981.


                                             • And the original paper by Shi & Tomasi
                                                   J. Shi and C. Tomasi. Good Features to Track. CVPR 1994.




                                             • Read the story how optical flow was used for special
                                               effects in a number of recent movies
                                                   http://www.fxguide.com/article333.html



