Object Recognition Using Scale-Invariant Features
					                      SIFT
• Guest Lecture by Jiwon Kim
• http://www.cs.washington.edu/homes/jwkim/
SIFT Features and Its Applications
Autostitch Demo
                Autostitch
• Fully automatic panorama generation
  – Input: set of images
  – Output: panorama(s)
• Uses SIFT (Scale-Invariant Feature
  Transform) to find/align images
1. Solve for homography
2. Find connected sets of images
  3. Solve for camera parameters

• New images initialised with rotation, focal
  length of best matching image
     4. Blending the panorama

• Burt & Adelson 1983
  – Blend frequency bands over range ∝ λ
 2-band Blending
Low frequency (λ > 2 pixels)
High frequency (λ < 2 pixels)
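A minimal 1D sketch of the two-band idea, under stated assumptions: a box filter stands in for the low-pass step, and the `seam`, `ramp` and `radius` parameters are hypothetical names chosen for illustration. None of this is taken from the Autostitch code. Low frequencies are cross-faded over a wide ramp, while high frequencies switch sharply at the seam.

```python
def box_blur(signal, radius):
    """Low-pass filter: mean over a window that shrinks at the borders."""
    n = len(signal)
    out = []
    for i in range(n):
        lo, hi = max(0, i - radius), min(n, i + radius + 1)
        out.append(sum(signal[lo:hi]) / (hi - lo))
    return out


def two_band_blend(a, b, seam, ramp, radius=2):
    """Blend two overlapping 1D signals of equal length.

    `a` dominates left of `seam`, `b` dominates right of it.
    """
    low_a, low_b = box_blur(a, radius), box_blur(b, radius)
    high_a = [x - l for x, l in zip(a, low_a)]
    high_b = [x - l for x, l in zip(b, low_b)]
    out = []
    for i in range(len(a)):
        # Weight of `a` in the low band: 1 on the left, 0 on the right,
        # falling linearly across 2 * ramp samples centred on the seam.
        w_low = min(1.0, max(0.0, 0.5 - (i - seam) / (2.0 * ramp)))
        # Weight of `a` in the high band: a hard switch at the seam.
        w_high = 1.0 if i < seam else 0.0
        out.append(w_low * low_a[i] + (1.0 - w_low) * low_b[i]
                   + w_high * high_a[i] + (1.0 - w_high) * high_b[i])
    return out
```

Blending the bands separately lets exposure differences fade gradually without ghosting fine detail, which is the contrast the linear vs. 2-band comparison slides illustrate.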
Linear Blending
2-band Blending
                 So, what is SIFT?
•   Scale-Invariant Feature Transform
•   David Lowe at UBC
•   Scale/rotation invariant
•   Currently best known feature descriptor
•   Many real-world applications
    –   Object recognition
    –   Panorama stitching
    –   Robot localization
    –   Video indexing
    –   …
Example: object recognition
              SIFT properties
• Locality: features are local, so robust to
  occlusion and clutter
• Distinctiveness: individual features can be
  matched to a large database of objects
• Quantity: many features can be generated for
  even small objects
• Efficiency: close to real-time performance
       SIFT algorithm overview
1. Feature detection
  – Detect points that can be repeatably
    selected under location/scale change
2. Feature description
  – Assign orientation to detected feature points
  – Construct a descriptor for image patch
    around each feature point
3. Feature matching
             1. Feature detection
•   Detect points stable under location/scale
    change
    –   Build continuous space (x, y, scale)
    –   Approximated by multi-scale Difference-of-
        Gaussian pyramid
    –   Select maxima/minima in (x, y, scale)
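The detection step above can be sketched in one dimension, so that the search space is (x, scale) instead of (x, y, scale). This is a simplification for illustration only: every level is blurred directly from the input rather than built as an octave pyramid with downsampling, and the parameter names and default values (`sigma0`, `levels`, `contrast`) are assumptions, not values from the slides.

```python
import math


def gaussian_kernel(sigma):
    """Normalised Gaussian weights, truncated at 3 sigma."""
    radius = int(math.ceil(3 * sigma))
    weights = [math.exp(-(i * i) / (2.0 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def blur(signal, sigma):
    """Convolve a 1D signal with a Gaussian, clamping at the borders."""
    kernel = gaussian_kernel(sigma)
    radius = len(kernel) // 2
    n = len(signal)
    out = []
    for i in range(n):
        acc = 0.0
        for j, w in enumerate(kernel):
            k = min(n - 1, max(0, i + j - radius))
            acc += w * signal[k]
        out.append(acc)
    return out


def dog_extrema(signal, sigma0=1.0, levels=8, contrast=0.01):
    """Return (x, scale_index) of local extrema in a 1D DoG stack."""
    k = math.sqrt(2.0)
    blurred = [blur(signal, sigma0 * k ** i) for i in range(levels)]
    # Each DoG level is the difference of two adjacent blur levels.
    dog = [[hi - lo for hi, lo in zip(blurred[i + 1], blurred[i])]
           for i in range(levels - 1)]
    found = []
    for s in range(1, len(dog) - 1):
        for x in range(1, len(signal) - 1):
            v = dog[s][x]
            if abs(v) < contrast:
                continue
            # Compare against the 8 neighbours in position and scale.
            neighbours = [dog[s + ds][x + dx]
                          for ds in (-1, 0, 1) for dx in (-1, 0, 1)
                          if (ds, dx) != (0, 0)]
            if v > max(neighbours) or v < min(neighbours):
                found.append((x, s))
    return found
```

In 2D the same comparison runs over 26 neighbours (8 in the same level, 9 in each adjacent level).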
•   Localize extrema by fitting a quadratic

     1) Sub-pixel/sub-scale interpolation using Taylor
        expansion



     2) Take derivative and set to zero
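For reference, the two steps as formulated in Lowe (2004), listed in the references. Here D is the Difference-of-Gaussian function with its derivatives evaluated at the sample point, and x = (x, y, σ)ᵀ is the offset from that point:

```latex
% 1) Second-order Taylor expansion around the sample point
D(\mathbf{x}) = D
  + \frac{\partial D}{\partial \mathbf{x}}^{T} \mathbf{x}
  + \frac{1}{2}\, \mathbf{x}^{T}
    \frac{\partial^{2} D}{\partial \mathbf{x}^{2}}\, \mathbf{x}

% 2) Setting the derivative to zero gives the extremum offset
\hat{\mathbf{x}} = -\left( \frac{\partial^{2} D}{\partial \mathbf{x}^{2}} \right)^{-1}
  \frac{\partial D}{\partial \mathbf{x}}

% Value at the interpolated extremum, used for the contrast check
D(\hat{\mathbf{x}}) = D + \frac{1}{2}\,
  \frac{\partial D}{\partial \mathbf{x}}^{T} \hat{\mathbf{x}}
```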
•   Discard low-contrast/edge points
     1) Low contrast: discard keypoints with |D(x̂)| < threshold
     2) Edge points: high contrast in one direction, low
        in the other → compute principal curvatures from
        eigenvalues of 2x2 Hessian matrix, and limit ratio
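A small sketch of the edge check. It follows the trace/determinant form given in Lowe (2004), which bounds the curvature ratio without computing the eigenvalues explicitly. The default `r = 10` comes from that paper rather than from these slides, and the function names are illustrative.

```python
def hessian_2x2(dog, x, y):
    """Finite-difference second derivatives of a 2D DoG image at (x, y)."""
    dxx = dog[y][x + 1] - 2.0 * dog[y][x] + dog[y][x - 1]
    dyy = dog[y + 1][x] - 2.0 * dog[y][x] + dog[y - 1][x]
    dxy = (dog[y + 1][x + 1] - dog[y + 1][x - 1]
           - dog[y - 1][x + 1] + dog[y - 1][x - 1]) / 4.0
    return dxx, dyy, dxy


def passes_edge_test(dxx, dyy, dxy, r=10.0):
    """Keep a keypoint only if its principal-curvature ratio is below r."""
    trace = dxx + dyy
    det = dxx * dyy - dxy * dxy
    if det <= 0:
        # Curvatures have opposite signs (or one is zero): reject.
        return False
    # trace^2 / det equals (r + 1)^2 / r when the eigenvalue ratio is r.
    return trace * trace / det < (r + 1.0) ** 2 / r
```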
• Example
                               (a) 233x189 image
                               (b) 832 DOG extrema
                               (c) 729 left after peak
                                   value threshold
                               (d) 536 left after testing
                                   ratio of principal
                                   curvatures
           2. Feature description
•   Assign orientation to keypoints
    –   Create histogram of local
        gradient directions
        computed at selected
        scale
    –   Assign canonical
        orientation at peak of
        smoothed histogram




        (Histogram axis: gradient direction, 0 to 2π)
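A minimal sketch of orientation assignment, assuming the patch has already been sampled at the keypoint's scale. The 36-bin histogram follows Lowe (2004). The Gaussian weighting window and the parabolic peak interpolation from that paper are left out, and a simple 1-2-1 filter stands in for the smoothing.

```python
import math


def dominant_orientation(patch, bins=36):
    """Return the peak gradient direction of a patch, in radians [0, 2*pi).

    `patch` is a 2D list of grey values; y grows downwards, so angles are
    measured in image coordinates.
    """
    hist = [0.0] * bins
    for y in range(1, len(patch) - 1):
        for x in range(1, len(patch[0]) - 1):
            dx = patch[y][x + 1] - patch[y][x - 1]
            dy = patch[y + 1][x] - patch[y - 1][x]
            angle = math.atan2(dy, dx) % (2.0 * math.pi)
            # Each pixel votes with its gradient magnitude.
            hist[int(angle / (2.0 * math.pi) * bins) % bins] += math.hypot(dx, dy)
    # Smooth the circular histogram with a 1-2-1 filter.
    smooth = [(hist[i - 1] + 2.0 * hist[i] + hist[(i + 1) % bins]) / 4.0
              for i in range(bins)]
    peak = max(range(bins), key=lambda i: smooth[i])
    # Report the centre of the winning bin.
    return (peak + 0.5) * 2.0 * math.pi / bins
```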
•   Construct SIFT descriptor
    –   Create array of orientation histograms
    –   8 orientations x 4x4 histogram array = 128
        dimensions
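To make the 8 x 4x4 = 128 layout concrete, here is a simplified sketch. It assumes a 16x16 window of precomputed gradient magnitudes and angles, with angles already rotated relative to the canonical orientation. It leaves out the Gaussian weighting, the trilinear interpolation between bins and the clamping step of the full descriptor, and the function name is hypothetical.

```python
import math


def sift_like_descriptor(magnitude, angle):
    """Build a 128-D vector from 16x16 arrays of gradient magnitude and angle."""
    descriptor = [0.0] * (4 * 4 * 8)
    for y in range(16):
        for x in range(16):
            cell = (y // 4) * 4 + (x // 4)           # which of the 4x4 cells
            a = angle[y][x] % (2.0 * math.pi)
            o = int(a / (2.0 * math.pi) * 8) % 8     # which of 8 orientations
            descriptor[cell * 8 + o] += magnitude[y][x]
    # Normalise to unit length to reduce the effect of contrast changes.
    norm = math.sqrt(sum(v * v for v in descriptor))
    if norm == 0.0:
        return descriptor
    return [v / norm for v in descriptor]
```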
•   Advantage over simple correlation
    –   Gradients less sensitive to illumination change
    –   Gradients may shift: robust to deformation,
        viewpoint change
    Performance: stability to noise
• Match features after random change in image scale &
  orientation, with differing levels of image noise
• Find nearest neighbor in database of 30,000 features
    Performance: stability to affine change
• Match features after random change in image scale &
  orientation, with 2% image noise, and affine distortion
• Find nearest neighbor in database of 30,000 features
      Performance: distinctiveness
• Vary size of database of features, with 30 degree affine
  change, 2% image noise
• Measure % correct for single nearest neighbor match
           3. Feature matching
• For each feature in A, find nearest neighbor in B

             A                       B
•   Nearest neighbor search too slow for large
    database of 128-dimensional data
•   Approximate nearest neighbor search:
    –   Best-bin-first [Beis et al. 97]: modification to k-d tree
        algorithm
    –   Use heap data structure to identify bins in order by
        their distance from query point
•   Result: Can give speedup by factor of 1000
    while finding nearest neighbor (of interest)
    95% of the time
•       Reject false matches
    –     Compare distance of nearest neighbor to second nearest
          neighbor
    –     Common features aren’t distinctive, so their matches are unreliable
    –     Distance-ratio threshold of 0.8 provides excellent separation
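The rejection rule can be sketched as follows. A brute-force search is used here for clarity, not the best-bin-first search mentioned earlier, and the function name is illustrative.

```python
import math


def match_features(desc_a, desc_b, ratio=0.8):
    """Return (index_in_a, index_in_b) pairs that pass the distance-ratio test."""
    matches = []
    for i, da in enumerate(desc_a):
        # Distances to every descriptor in B, smallest first.
        dists = sorted((math.dist(da, db), j) for j, db in enumerate(desc_b))
        if len(dists) < 2:
            continue
        (d1, j1), (d2, _) = dists[0], dists[1]
        # Accept only if the best match is clearly closer than the runner-up.
        if d1 < ratio * d2:
            matches.append((i, j1))
    return matches
```

A feature whose two closest candidates are about equally far away is ambiguous, so it is dropped even when its nearest distance is small.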
• Now, given feature matches…
  – Find an object in the scene
  – Solve for homography (panorama)
  –…
• Example: 3D object recognition
•   3D object recognition
    –   Assume affine transform: clusters of size >=3
    –   Looking for 3 matches out of 3000 that agree on
        same object and pose: too many outliers for
        RANSAC or LMS
    –   Use Hough Transform
        •   Each match votes for a hypothesis for object ID/pose
        •   Voting for multiple bins & large bin size allow for error due
            to similarity approximation
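A rough sketch of the voting step, under several assumptions. Each match is taken to arrive as an already-computed pose hypothesis (translation, log2 scale change, rotation), and the location bin width `loc_bin` is a hypothetical parameter. The 30-degree orientation bins and factor-of-2 scale bins follow Lowe (2004). For brevity each match votes for one bin only, whereas the slide describes voting for multiple neighbouring bins.

```python
import math
from collections import defaultdict


def hough_pose_clusters(matches, loc_bin=32.0, min_votes=3):
    """Group matches that agree on object ID and approximate pose.

    Each match is (object_id, dx, dy, log2_scale, rotation_radians).
    Returns lists of match indices, one list per bin with enough votes.
    """
    bins = defaultdict(list)
    for index, (obj, dx, dy, log2_scale, rot) in enumerate(matches):
        key = (
            obj,
            math.floor(dx / loc_bin),                          # coarse location
            math.floor(dy / loc_bin),
            round(log2_scale),                                 # factor-of-2 scale
            math.floor((rot % (2 * math.pi)) / (math.pi / 6)),  # 30 degrees
        )
        bins[key].append(index)
    return [members for members in bins.values() if len(members) >= min_votes]
```

Clusters of at least three matches are then passed on to the affine pose solution.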
•   3D object recognition: solve for pose
    –   Affine transform of [x,y] to [u,v]:




    –   Rewrite to solve for transform parameters:
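A sketch of both steps, following the formulation in Lowe (2004): the affine model is u = m1·x + m2·y + tx and v = m3·x + m4·y + ty, and stacking one such pair of equations per match gives a linear system in the six unknowns. The code below solves it by least squares through the normal equations. Because the u and v equations decouple, it solves two 3x3 systems that share the same matrix. Function names are illustrative.

```python
def solve_3x3(m, rhs):
    """Solve a 3x3 linear system by Gauss-Jordan elimination with pivoting."""
    a = [row[:] + [r] for row, r in zip(m, rhs)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("degenerate configuration (collinear points?)")
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(3):
            if r != col:
                f = a[r][col] / a[col][col]
                a[r] = [vr - f * vc for vr, vc in zip(a[r], a[col])]
    return [a[i][3] / a[i][i] for i in range(3)]


def fit_affine(model_pts, image_pts):
    """Least-squares affine fit mapping model (x, y) to image (u, v).

    Returns (m1, m2, m3, m4, tx, ty). Needs at least 3 non-collinear matches.
    """
    ata = [[0.0] * 3 for _ in range(3)]
    atu = [0.0] * 3
    atv = [0.0] * 3
    for (x, y), (u, v) in zip(model_pts, image_pts):
        row = (x, y, 1.0)
        for i in range(3):
            for j in range(3):
                ata[i][j] += row[i] * row[j]
            atu[i] += row[i] * u
            atv[i] += row[i] * v
    m1, m2, tx = solve_3x3(ata, atu)
    m3, m4, ty = solve_3x3(ata, atv)
    return m1, m2, m3, m4, tx, ty
```

With exactly three matches the fit is exact. Extra matches over-determine the system, which is what makes outliers detectable in the verification step.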
•    3D object recognition: verify model
    1) Discard outliers for pose solution in previous step
    2) Perform top-down check for additional features
    3) Evaluate probability that match is correct
       a) Use Bayesian model, with probability that features would
          arise by chance if object was not present
       b) Takes account of object size in image, textured regions,
          model feature count in database, accuracy of fit [Lowe 01]
            Planar recognition
• Training images
• Reliably recognized
  at a rotation of 60°
  away from the
  camera
• Affine fit
  approximates
  perspective
  projection
• Only 3 points are
  needed for
  recognition
         3D object recognition
• Training images

           • Only 3 keys are
             needed for
             recognition, so extra
             keys provide
             robustness
           • Affine model is no
             longer as accurate
Recognition under occlusion
Illumination invariance
             Applications of SIFT
•   Object recognition
•   Panoramic image stitching
•   Robot localization
•   Video indexing
•   …

• The Office of the Past
    – Document tracking and recognition
Location recognition
Robot Localization
Map continuously built over time
Locations of map features in 3D
Sony Aibo
SIFT usage:
• Recognize charging station
• Communicate with visual cards
• Teach object recognition
       The Office of the Past
• Paper everywhere
     Unify physical and
     electronic desktops
(Diagram labels: Video camera, Desktop)
• Recognize video of paper on physical desktop
  – Tracking
  – Recognition
  – Linking
• Applications
  – Find lost documents
  – Browse remote desktop
  – Find electronic version
  – History-based queries
Example input video
Demo – Remote desktop
System overview
(Diagram: Video camera, Computer, User, Desk)
(Diagram: Video of desk and Images from PDF feed Track & recognize, which builds the Internal representation: a Scene Graph of Desk at T and Desk at T+1. A query such as "Where is my W-2?" is run against it to produce an Answer.)
               Assumptions
• Document
  – Corresponding electronic copy exists
  – No duplicates of same document
• Motion
  – 3 event types: move/entry/exit
  – One document at a time
  – Only topmost document can move
             Non-assumptions
• Desk need not be initially empty
• Stacks may overlap
         Algorithm overview
1. Input frames
2. Event detection: before/after frames
3. Event interpretation: “A document moved from (x1,y1) to (x2,y2)”
4. Document recognition: File1.pdf, File2.pdf, File3.pdf
5. Scene graph update: Desk, Desk
(Diagram label: SIFT, shown alongside the interpretation and recognition stages)
Document tracking example
(Figure sequence: before and after frames)
Motion: (x,y,θ)
       Document Recognition
• Match against PDF image database
(Figure: row of database page images, … File1.pdf File2.pdf File3.pdf File4.pdf File5.pdf File6.pdf …)
• Performance analysis
  – Tested 20 pages against database of 162 pages
  – ~200x300 pixels per document for reliable match
(Plot: Recognition Rate vs. Document Resolution, with 0.9 marked on the rate axis and 300 on the resolution axis)
                   Results
• Input video
  – ~40 minutes
  – 1024x768 @ 15 fps
  – 22 documents, 49 events
• Running time
  – Video processed offline
  – No optimization
  – A few hours for entire video
Demo – Paper tracking
Photo sorting example
Demo – Photo sorting
                Future work
• Enhance realism
  – Handle more realistic desktops
  – Real-time performance
• More applications
  – Support other document tasks
     • E.g., attach reminder, cluster documents
  – Beyond documents
     • Other 3D desktop objects, books/CDs
                 Summary
• SIFT is:
  – Scale/rotation invariant local feature
  – Highly distinctive
  – Robust to occlusion, illumination change, 3D
    viewpoint change
  – Efficient (close to real-time performance)
  – Suitable for many useful applications
                      References
• Distinctive image features from scale-invariant keypoints
   – David G. Lowe, International Journal of Computer Vision, 60, 2
     (2004), pp. 91-110
• Recognising panoramas
   – Matthew Brown and David G. Lowe, International Conference on
     Computer Vision (ICCV 2003), Nice, France (October 2003), pp.
     1218-25.
• Video-Based Document Tracking: Unifying Your Physical
  and Electronic Desktops
   – Jiwon Kim, Steven M. Seitz and Maneesh Agrawala, ACM
      Symposium on User Interface Software and Technology (UIST
      2004), pp. 99-107.

				