Docstoc

A Sparse Texture Representation Using Affine-Invariant Regions

Document Sample
A Sparse Texture Representation Using Affine-Invariant Regions Powered By Docstoc
					Learning Local Affine Representations
 for Texture and Object Recognition

                   Svetlana Lazebnik
Beckman Institute, University of Illinois at Urbana-Champaign
      (joint work with Cordelia Schmid, Jean Ponce)
                       Overview
• Goal:
  – Recognition of 3D textured surfaces, object classes
• Our contribution:
  – Texture and object representations based on
    local affine regions
• Advantages of proposed approach:
  – Distinctive, repeatable primitives
  – Robustness to clutter and occlusion
  – Ability to approximate 3D geometric transformations
                        The Scope
1. Recognition of single-texture images (CVPR 2003)




2. Recognition of individual texture regions in multi-texture
   images (ICCV 2003)




3. Recognition of object classes (BMVC 2004, work in progress)
1. Recognition of Single-Texture Images
        Affine Region Detectors
                   Harris detector (H)   Laplacian detector (L)




Mikolajczyk & Schmid (2002), Gårding & Lindeberg (1996)
          Affine Rectification Process
Patch 1                                           Patch 2




            Rectified patches (rotational ambiguity)
     Rotation-Invariant Descriptors 1:
               Spin Images
• Based on range spin images (Johnson & Hebert 1998)
• Two-dimensional histogram:
     distance from center × intensity value
 Rotation-Invariant Descriptors 2: RIFT
• Based on SIFT (Lowe 1999)
• Two-dimensional histogram:
      distance from center × gradient orientation
• Gradient orientation is measured w.r.t. to the direction
  pointing from the center of the patch
             Signatures and EMD
• Signatures
     S = {(m , w ), … , (m , w )}
             1   1         k   k

         mi — cluster center
         wi — relative weight

• Earth Mover’s Distance (Rubner et al. 1998)
  – Computed from ground distances d(mi, m'j)
  – Can compare signatures of different sizes
  – Insensitive to the number of clusters
         Database: Textured Surfaces




25 textures, 40 sample images each (640x480)
                   Evaluation
• Channels: HS, HR, LS, LR
  – Combined through addition of EMD matrices
• Classification results
  – 10 training images per class, rates averaged over
    200 random training subsets
                Comparative Evaluation
                      Our method                Varma & Zisserman
                                                (2003)

Spatial selection     Harris and Laplacian      None (every pixel
                      detectors                 location is used)

Neighborhood shape    Affine adaptation         None (support of
selection                                       descriptors is fixed)

Descriptors           Spin images, RIFT         Raw pixel values


Textons               Separate set of textons   Universal texton
                      for each image            dictionary

Representing/comparing Signatures/EMD           Histograms/
texton distributions                            chi-squared distance
              Results of Evaluation:
Classification rate vs. number of training samples

    (H+L)(S+R)              VZ-Joint              VZ-MRF




• Conclusion: an intrinsically invariant representation is
  necessary to deal with intra-class variations when they are
  not adequately represented in the training set
                    Summary
• A sparse texture representation based on local
  affine regions
• Two novel descriptors (spin images, RIFT)
• Successful recognition in the presence of viewpoint
  changes, non-rigidity, non-homogeneity
• A flexible approach to invariance
 2. Recognition of Individual Regions in
         Multi-Texture Images
• A two-layer architecture:
  – Local appearance + neighborhood relations
• Learning:
  – Represent the local appearance of each texture class
    using a mixture-of-Gaussians model
  – Compute co-occurrence statistics of sub-class labels over
    affinely adapted neighborhoods
• Recognition:
  – Obtain initial class membership probabilities from the
    generative model
  – Use relaxation to refine these probabilities
          Two Learning Scenarios
• Fully supervised: every region in the training image
  is labeled with its texture class




                           brick


• Weakly supervised: each training image is labeled
  with the classes occurring in it




                    brick, marble, carpet
   Neighborhood Statistics


                          Estimate:
                          • probability p(c,c')
                          • correlation r(c,c')




Neighborhood definition
    Relaxation (Rosenfeld et al. 1976)
• Iterative process:
  – Initialized with posterior probabilities p(c|xi) obtained from
    the generative model
  – For each region i and each sub-class label c, update the
    probability pi(c) based on neighbor probabilities pj(c') and
    correlations r(c,c')


• Shortcomings:
  – No formal guarantee of convergence
  – After the initialization, the updates to the probability values
    do not depend on the image data
       Experiment 1: 3D Textured Surfaces
                                       Single-texture images
   T1 (brick)      T2 (carpet)       T3 (chair)     T4 (floor 1)     T5 (floor 2)    T6 (marble)      T7 (wood)




                                       Multi-texture images



10 single-texture training images per class, 13 two-texture training images, 45 multi-texture test images
Effect of Relaxation on Labeling
                    Original image




     Top: before relaxation, bottom: after relaxation
                                Retrieval
                      (single-texture training images)


T1 (brick)                  T2 (carpet)                 T3 (chair)               T4 (floor 1)




             T5 (floor 2)                 T6 (marble)                T7 (wood)
Successful Segmentation Examples
Unsuccessful Segmentation Examples
               Experiment 2: Animals




   cheetah, background   zebra, background   giraffe, background


• No manual segmentation
• Training data: 10 sample images per class
• Test data: 20 samples per class + 20 negative
  images
Cheetah Results
Zebra Results
Giraffe Results
                   Summary
• A two-level representation (local appearance +
  neighborhood relations)
• Weakly supervised learning of texture models

                 Future Work
• Design an improved representation using a random
  field framework, e.g., conditional random fields
  (Lafferty 2001, Kumar & Hebert 2003)
• Develop a procedure for weakly supervised
  learning of random field parameters
• Apply method to recognition of natural
  texture categories
    3. Recognition of Object Classes




The approach:
• Represent objects using multiple composite
  semi-local affine parts
  – More expressive than individual regions
  – Not globally rigid
• Correspondence search is key to learning and
  detection
              Correspondence Search
• Basic operation: a two-image matching procedure for finding
  collections of affine regions that can be mapped onto each
  other using a single affine transformation



                                  A



• Implementation: greedy search based on geometric and
  photometric consistency constraints
   – Returns multiple correspondence hypotheses
   – Automatically determines number of regions in correspondence
   – Works on unsegmented, cluttered images (weakly supervised learning)
Matching: 3D Objects
Matching: 3D Objects




     closeup           closeup
Matching: Faces




           spurious match ???
Finding Symmetries
Finding Repeated Patterns and
         Symmetries
Learning Object Models for Recognition
• Match multiple pairs of training images to produce a
  set of candidate parts
• Use additional validation images to evaluate
  repeatability of parts and individual regions
• Retain a fixed number of parts having the best
  repeatability score
        Recognition Experiment: Butterflies
    Admiral   Swallowtail   Machaon   Monarch 1   Monarch 2   Peacock   Zebra




•     16 training images (8 pairs) per class
•     10 validation images per class
•     437 test images
•     619 images total
Butterfly Parts
                            Recognition
• Top 10 parts per class used for recognition
                                total number of regions detected
• Relative repeatability score:          total part size
• Classification results:




               Total part size (smallest/largest)
Classification Rate vs.
   Number of Parts
         Detection Results (ROC Curves)




Circles: reference relative repeatability rates. Red square: ROC equal error rate (in parentheses)
Successful Detection Examples
             Training images




   Test images (blue: occluded regions)




    All ellipses found in the test images
Unsuccessful Detection Examples
               Training images




    Test images (blue: occluded regions)




      All ellipses found in the test image
                       Summary
• Semi-local affine parts for describing structure
  of 3D objects
• Finding a part vocabulary:
   – Correspondence search between pairs of images
   – Validation
• Additional application:
   – Finding symmetry and repetition

                     Future Work
• Find a better affine region detector
• Represent, learn inter-part relations
• Evaluation: CalTech database, harder classes, etc.
Egret                Birds                      Puffin
        Snowy Owl   Mandarin Duck   Wood Duck
Birds: Candidate Parts
      Mandarin Duck




         Puffin
Objects without Characteristic Texture




                   (LeCun’04)
                 Summary of Talk
1. Recognition of single-texture images
  •   Distribution of local appearance descriptors

2. Recognition of individual regions in
   multi-texture images
  •   Local appearance + loose statistical neighborhood
      relations

3. Recognition of object categories
  •   Local appearance + strong geometric relations


For more information:
   http://www-cvr.ai.uiuc.edu/ponce_grp
                 Issues, Extensions
• Weakly supervised learning
    – Evaluation methods?
    – Learning from contaminated data?
•   Probabilistic vs. geometric approaches to invariance
•   EM vs. direct correspondence search
•   Training set size
•   Background modeling
•   Strengthening the representation
    – Heterogeneous local features
    – Automatic feature selection
    – Inter-part relations

				
DOCUMENT INFO
Categories:
Tags:
Stats:
views:9
posted:9/13/2011
language:English
pages:49