					       Pascal VOC
Classification challenges


               Lear Group
             INRIA - France




01/02/2010    Pascal Workshop Challenge   1
Kernel-Based Classification of Visual
    Objects using a Sparse Image
          Representation



                      Jianguo Zhang, Cordelia Schmid

                INRIA Rhone-Alpes, 665, avenue de l'Europe
                          38330 Montbonnot, FRANCE
              Email: {jianguo.zhang, cordelia.schmid}@inrialpes.fr




Sparse Representation

   • Scale-invariant regions: robustness against geometric
     transformations
   • A sparse representation: saliency, compactness

[Figure: two panels, "Without spatial selection" and "With spatial selection".]




Outline




Pascal VOC: image signatures #1

[Figure: standard 'bag of words' representation. Keypoint descriptions
from each image are quantized against a codebook into histograms, and
the histograms are compared with the χ² distance.]
Pascal VOC: image signatures #2

[Figure: keypoint descriptions from each image are quantized against
Codebook #1 … Codebook #n, giving signatures (center coordinates +
histograms), and the signatures are compared with the EMD distance.]

Technical details

   • Extraction of a sparse set of descriptors from scale-invariant interest
     regions: Harris-Laplace and Laplacian.
   • SIFT is used as the region descriptor, resulting in 128-dimensional
     description vectors. Note that the version of SIFT used here is not
     rotation invariant.
   • Vocabulary construction with k-means: we cluster the descriptors of
     each class separately and then concatenate the resulting clusters.
   • Here we extract 250 clusters per class with the k-means algorithm.
     With 4 classes, the concatenation results in 1000 clusters.
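
The vocabulary construction above can be sketched as follows. This is only a minimal illustration, not the authors' implementation: the `kmeans` and `build_vocabulary` helpers, the iteration count, the random initialization and the toy random "descriptors" are all assumptions made for the example, and the toy run uses 5 clusters per class instead of 250 to keep it fast.

```python
import numpy as np

def kmeans(points, k, n_iter=20, seed=0):
    """Plain Lloyd's k-means; returns the (k, d) array of cluster centers."""
    rng = np.random.default_rng(seed)
    # Initialize the centers with k distinct points drawn from the data.
    centers = points[rng.choice(len(points), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        # Squared Euclidean distance from every point to every center.
        d2 = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for j in range(k):
            members = points[labels == j]
            if len(members) > 0:  # keep the old center if a cluster empties
                centers[j] = members.mean(axis=0)
    return centers

def build_vocabulary(descriptors_by_class, k_per_class=250):
    """Cluster each class separately, then stack the centers into one vocabulary."""
    per_class = [kmeans(d, k_per_class) for d in descriptors_by_class.values()]
    return np.vstack(per_class)

# Toy data standing in for real 128-dimensional SIFT descriptors (random values).
rng = np.random.default_rng(0)
classes = ["bikes", "cars", "motorbikes", "people"]
toy_descriptors = {name: rng.random((60, 128)) for name in classes}

vocabulary = build_vocabulary(toy_descriptors, k_per_class=5)
print(vocabulary.shape)  # 4 classes x 5 clusters each, 128 dimensions
```

With `k_per_class=250` and 4 classes the same call would return a 1000-word vocabulary. The all-pairs distance computation is memory-hungry and would need batching for realistic descriptor counts.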




Distance measure

   • For each image we compute a frequency histogram over our set of
     visual words.
   • We compare these histograms with the χ² distance:

     \chi^2(h_1, h_2) = \sum_i \frac{(h_1(i) - h_2(i))^2}{h_1(i) + h_2(i)}

     where h_1 and h_2 are the vocabulary histograms of two different images.
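
As a rough sketch of this step, the snippet below builds a visual-word histogram and evaluates the χ² distance on two small hand-made histograms. The function names are hypothetical, and two details are assumptions not stated on the slide: histograms are normalized to sum to 1, and bins that are empty in both histograms are skipped to avoid dividing by zero.

```python
import numpy as np

def bow_histogram(descriptors, vocabulary):
    """Frequency histogram of nearest visual words, normalized to sum to 1."""
    d2 = ((descriptors[:, None, :] - vocabulary[None, :, :]) ** 2).sum(axis=2)
    words = d2.argmin(axis=1)  # index of the closest visual word per descriptor
    counts = np.bincount(words, minlength=len(vocabulary)).astype(float)
    return counts / counts.sum()

def chi2_distance(h1, h2):
    """Sum over bins of (h1 - h2)^2 / (h1 + h2), skipping bins empty in both."""
    h1 = np.asarray(h1, dtype=float)
    h2 = np.asarray(h2, dtype=float)
    num = (h1 - h2) ** 2
    den = h1 + h2
    mask = den > 0
    return float((num[mask] / den[mask]).sum())

h1 = np.array([0.5, 0.5, 0.0, 0.0])
h2 = np.array([0.5, 0.0, 0.5, 0.0])
print(chi2_distance(h1, h1))  # 0.0
print(chi2_distance(h1, h2))  # 1.0
```

In the second call, the two bins that differ each contribute 0.25 / 0.5 = 0.5, the shared bin contributes 0, and the bin that is empty in both is skipped.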




Using EMD kernel

   • Cluster the descriptors of each image into 40 cluster centers to
     form a signature for that image.

   • Compute the earth mover's distance (EMD) between the signatures of
     two images.

   • Kernelization and classification proceed in the same way as for the
     χ² distance.
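
One way to compute the EMD between two signatures is to solve the underlying transportation problem as a linear program. The sketch below does this with `scipy.optimize.linprog`. It is an illustration under stated assumptions, not the solver used by the authors: the `emd` helper is hypothetical, the ground distance is taken to be Euclidean, and the weights of each signature are normalized to sum to 1 so that both signatures carry the same total mass.

```python
import numpy as np
from scipy.optimize import linprog

def emd(centers1, weights1, centers2, weights2):
    """Earth mover's distance between two signatures with equal total mass."""
    w1 = np.asarray(weights1, dtype=float)
    w2 = np.asarray(weights2, dtype=float)
    w1, w2 = w1 / w1.sum(), w2 / w2.sum()  # assumption: unit total mass
    m, n = len(w1), len(w2)
    c1 = np.asarray(centers1, dtype=float)
    c2 = np.asarray(centers2, dtype=float)
    # Ground distance (assumed Euclidean) between every pair of centers.
    cost = np.sqrt(((c1[:, None, :] - c2[None, :, :]) ** 2).sum(axis=2))
    # The flow f[i, j] is flattened row-major: cluster i ships out w1[i] ...
    a_rows = np.kron(np.eye(m), np.ones((1, n)))
    # ... and cluster j receives w2[j].
    a_cols = np.kron(np.ones((1, m)), np.eye(n))
    res = linprog(cost.ravel(),
                  A_eq=np.vstack([a_rows, a_cols]),
                  b_eq=np.concatenate([w1, w2]),
                  bounds=(0, None), method="highs")
    # The total flow is 1, so the optimal cost is the EMD itself.
    return float(res.fun)

# Toy signatures: all mass at the origin versus all mass at (3, 4).
print(emd([[0.0, 0.0]], [1.0], [[3.0, 4.0]], [1.0]))
```

In the toy call the whole mass has to travel a Euclidean distance of 5. For real signatures, `centers1` would hold the 40 cluster centers of an image and `weights1` the corresponding cluster sizes.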




Classification / histograms

   • We use Support Vector Machines (SVM) for classification.
     Our kernel is a Gaussian kernel based on the χ² distance:

     K(I_1, I_2) = \exp\left(-\frac{1}{A} \chi^2(h_1, h_2)\right)

   • The parameter A is obtained by 5-fold cross-validation on the training
     images. The distance between images is computed separately for each
     detector/descriptor pair. Results are combined by adding the distances
     and estimating A for the combination. Here we combine Harris-Laplace /
     SIFT and Laplacian / SIFT.
   • For each of the 4 classes we train a binary classifier that separates
     the class from the others. The output of the SVM is normalized to
     [0, 1] and used as a confidence measure.
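
The kernel and the channel combination can be sketched as below. The helper names, the toy histograms and the fixed value `A=1.0` are assumptions for illustration. On the slide, A is selected by 5-fold cross-validation, which is not shown here, and the SVM training and the normalization of its output to [0, 1] are also left out.

```python
import numpy as np

def pairwise_chi2(H):
    """Matrix of chi-square distances between all rows of the histogram array H."""
    H = np.asarray(H, dtype=float)
    num = (H[:, None, :] - H[None, :, :]) ** 2
    den = H[:, None, :] + H[None, :, :]
    # Bins that are empty in both histograms contribute zero.
    terms = np.divide(num, den, out=np.zeros_like(num), where=den > 0)
    return terms.sum(axis=2)

def combined_kernel(channels, A):
    """Add the per-channel distance matrices, then apply K = exp(-D / A)."""
    D = sum(pairwise_chi2(H) for H in channels)
    return np.exp(-D / A)

# Toy histograms for 6 images and a 10-word vocabulary (random values).
rng = np.random.default_rng(0)
H_harris = rng.dirichlet(np.ones(10), size=6)     # stands in for Harris-Laplace / SIFT
H_laplacian = rng.dirichlet(np.ones(10), size=6)  # stands in for Laplacian / SIFT

K = combined_kernel([H_harris, H_laplacian], A=1.0)
print(K.shape)
```

A matrix like `K` could then be passed to an SVM implementation that accepts precomputed kernels, for example scikit-learn's `SVC(kernel="precomputed")`, with one binary classifier trained per class.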



Results

   • For the training images and test set 1 (1373 images in total), the
     average number of points detected per image is 796 for Harris-Laplace
     and 2465 for the Laplacian.
   • The minimum number of points detected for an image is 15
     (Harris-Laplace) and 71 (Laplacian).




ROC Curve: test1

[Figure: ROC curves of the χ² kernel and the EMD kernel on test set 1. Panels: Cars, Bikes.]

ROC Curve: test1

[Figure: ROC curves of the χ² kernel and the EMD kernel on test set 1. Panels: Motorbikes, People.]

ROC Curve: test2

[Figure: ROC curves of the χ² kernel and the EMD kernel on test set 2. Panels: Bikes, Cars.]

ROC Curve: test2

[Figure: ROC curves of the χ² kernel and the EMD kernel on test set 2. Panels: Motorbikes, People.]

Difficult images in test set 1

[Figure: example images. Panels: Difficult bikes, Difficult cars.]

Difficult images in test set 1

[Figure: example images. Panels: Difficult motorbikes, Difficult people.]

Difficult images in test set 2

[Figure: example images. Panels: Difficult bikes, Difficult cars.]

Difficult images in test set 2




Other results

   • We tried several combinations of detectors and descriptors.
   • We denote
       • the Harris detector with different levels of invariance as HS, HSR and HA;
       • the Laplacian detector with different levels of invariance as LS, LSR and LA.
     Note that HA and LA are both rotation invariant by construction.
   • The combination of detectors and descriptors is denoted by
     (detector+detector)(descriptor+descriptor), e.g.,
     (HS+LS)(SIFT+SPIN) denotes the combination of the HS and LS
     detectors, each described with the SIFT and SPIN descriptors.




Other results




Conclusions

   • The framework using a sparse image representation with kernels
     works well for image categorization.
   • Under the current experimental settings, the χ² kernel works slightly
     better than the EMD kernel.
   • The parameter k is not critical.
   • The Laplacian detector works better.
   • With EMD, comparable recognition performance can also be achieved
     without building the 'global' vocabulary. This could be useful
     in the case of large data sets.
   • Positive / negative examples in test2: only background should be
     taken as negative; several objects can appear in the same test
     image, which is not the case for the training set.

