ICCV & CVPR paper reading


Supervised Translation-Invariant Sparse Coding
[Jianchao Yang, Kai Yu, Thomas Huang]




Presenter: Cui Zhen (崔振)
2010.9.17
Outline

•Author information
•Paper information
•Problem addressed
•Proposed method
•Experiments
•Conclusion
Jianchao Yang
   Image Formation & Processing Group (IFP), University
    of Illinois at Urbana-Champaign (UIUC)

   Ph.D. candidate (06-present, ECE, UIUC); Ph.D.
    adviser: Prof. Thomas S. Huang
   B.Eng. (02-06, EEIS, USTC)

   Email: jyang29 @ifp.uiuc.edu

   Publications (first author)
     CVPR: 4 papers (2 oral)
     TIP: 2 papers
     ECCV10: 1 paper
     ICIP: 1 paper

   Homepage: http://www.ifp.illinois.edu/~jyang29/
Kai Yu
   Machine learning researcher and head of the Media
    Analytics Department at NEC Laboratories America, Inc.

   Ph.D. in Computer Science, University of Munich, Germany,
    January 2001 – July 2004.
   B.Sc. and M.Sc., Nanjing University.

   Research interests
     Areas: machine learning, data mining, information
       retrieval, computer vision
   Publications: CVPR (4), ECCV (4+), ICML (8+), NIPS (10+), …

   http://www.dbs.informatik.uni-muenchen.de/~yu_k/
Thomas Huang

   Beckman Institute: Image Formation and Processing
    and Artificial Intelligence groups.
   William L. Everitt Distinguished Professor in the University of
    Illinois Department of Electrical and Computer Engineering
    and the Coordinated Science Lab (CSL).

   Sc.D. from MIT in 1963.

   Research interests: computer vision, image compression and
    enhancement, pattern recognition, and multimodal
    signal processing.

   http://www.beckman.illinois.edu/directory/t-huang1
Paper information

   Venue
     CVPR 2010 (oral)

   Related paper
     Yang et al. Linear spatial pyramid matching using
      sparse coding for image classification. CVPR’09.
Abstract
   In this paper, we propose a novel supervised hierarchical
    sparse coding model based on local image descriptors for
    classification tasks. The supervised dictionary training is
    performed via back-projection, by minimizing the training error
    of classifying the image level features, which are extracted by
    max pooling over the sparse codes within a spatial pyramid.
    Such a max pooling procedure across multiple spatial scales
    offers the model translation-invariant properties, similar to the
    Convolutional Neural Network (CNN). Experiments show that
    our supervised dictionary improves the performance of the
    proposed model significantly over the unsupervised dictionary,
    leading to state-of-the-art performance on diverse image
    databases. Furthermore, our supervised model targets
    learning linear features, implying its great potential in handling
    large scale datasets in real applications.
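Abstract: key points

   For classification tasks, a novel supervised hierarchical
    sparse coding model based on local image descriptors is proposed.
   The supervised dictionary is trained via back-projection by
    minimizing the classification error on image-level features,
    which are obtained by max pooling the sparse codes over a
    spatial pyramid. Max pooling across multiple spatial scales
    gives the model translation-invariant properties, as in a
    CNN (Convolutional Neural Network).
   Experiments show that, compared with the unsupervised dictionary,
    the supervised dictionary clearly improves the model's performance
    and achieves the best results on several image databases.
   In addition, the supervised model targets learning linear
    features, which implies great potential for handling
    large-scale datasets in real applications.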
Problem addressed

   Image classification
     To find a generic feature representation

     Interested in linear prediction model
Sparse Coding for Image Classification

Sparse coding                        Unsupervised            Supervised

Sparse coding on holistic image      D. Bradley et al. ’08   D. Bradley et al. ’08
- Linear model assumption            J. Wright et al. ’09    J. Mairal et al. ’08
- Sensitive to image misalignment    A. Wagner et al. ’09    Q. Zhang. CVPR10
- Limited applications               etc.                    etc.

Sparse coding on local descriptors   R. Raina et al. ’07     ?
- Breaks linear model assumption     J. Yang et al. ’09
  for the image space                J. Yang et al. ’10
- Robust to image misalignment       etc.
- Applicable to generic image
  classification
Proposed method
   Framework
   Background
   Proposed model
   Optimization
Framework

   Descriptor extraction  ->  bag of coordinated local descriptors
   Nonlinear coding       ->  high-dimensional sparse codes
   Feature pooling        ->  image representation
   Classification         ->  "It must be a cool cat!"

   (Pipeline from Yang. CVPR09)

J. Yang et al. Linear spatial pyramid matching using sparse coding for image classification. CVPR’09.
Existing methods

   Histogram-based SPM feature
     Step 1: local descriptor extraction

     Step 2: vector quantization (e.g., k-means)

     Step 3: hierarchical average pooling

     Step 4: nonlinear SVM

   The framework of ScSPM (CVPR09)
     Step 1: local descriptor extraction

     Step 2: sparse coding (unsupervised dictionary)

     Step 3: hierarchical max pooling

     Step 4: linear SVM
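
The difference between the two pipelines in Steps 2 and 3 can be illustrated with a minimal NumPy sketch. It assumes descriptors are stored as columns of X and dictionary atoms as columns of B; the function names are hypothetical, and the max-pooling input here is a one-hot code rather than a real sparse code, purely to keep the example short:

```python
import numpy as np

def vq_codes(X, B):
    """Hard vector quantization: one-hot code of the nearest atom per descriptor.

    X: (n, m) descriptors as columns, B: (n, k) codebook atoms as columns.
    Returns Z of shape (k, m).
    """
    # Squared Euclidean distance between every atom and every descriptor.
    d2 = ((B[:, :, None] - X[:, None, :]) ** 2).sum(axis=0)  # (k, m)
    Z = np.zeros_like(d2)
    Z[d2.argmin(axis=0), np.arange(X.shape[1])] = 1.0
    return Z

def average_pool(Z):
    """Histogram-style pooling: mean response of each atom over all descriptors."""
    return Z.mean(axis=1)

def max_pool(Z):
    """Max pooling: largest absolute response of each atom over all descriptors."""
    return np.abs(Z).max(axis=1)
```

With average pooling the feature is a normalized histogram of atom usage; with max pooling it records only the strongest response of each atom, regardless of how many descriptors triggered it.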
Background (1)
   Sparse coding

                    X (n×m) = (X1, X2, …, Xm)
                    B (n×k): dictionary
                    Z (k×m): sparse coefficients

   Max pooling
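
The formulas for these two operations did not survive on this slide. A minimal NumPy sketch, assuming the common formulation min_Z 0.5*||X - BZ||_F^2 + lam*||Z||_1 for sparse coding and an absolute-value maximum over the columns of Z for max pooling. The ISTA solver, the function names and the parameter lam are illustrative choices, not taken from the paper:

```python
import numpy as np

def soft_threshold(V, t):
    """Element-wise soft thresholding, the proximal operator of t * ||.||_1."""
    return np.sign(V) * np.maximum(np.abs(V) - t, 0.0)

def sparse_code(X, B, lam=0.1, n_iter=200):
    """Approximately solve min_Z 0.5*||X - B Z||_F^2 + lam*||Z||_1 with ISTA.

    X: (n, m) descriptors, B: (n, k) dictionary. Returns Z of shape (k, m).
    """
    L = np.linalg.norm(B, 2) ** 2          # Lipschitz constant of the smooth part
    Z = np.zeros((B.shape[1], X.shape[1]))
    for _ in range(n_iter):
        grad = B.T @ (B @ Z - X)           # gradient of the least-squares term
        Z = soft_threshold(Z - grad / L, lam / L)
    return Z

def max_pool(Z):
    """Pool the (k, m) sparse codes of one image into a k-dimensional feature."""
    return np.abs(Z).max(axis=1)
```

For an orthonormal dictionary the solution reduces to soft thresholding of B^T X, which makes the sketch easy to check by hand.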
Background (2)

          S: scales (levels)
          U: concatenation

   Hierarchical pooling
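
A minimal sketch of hierarchical max pooling over a spatial pyramid, assuming each scale s splits the image into a g x g grid (here g = 1, 2, 4), codes are max pooled inside each cell, and the per-cell features are concatenated. The grid sizes, the normalized coordinates xy and the function name are assumptions for illustration:

```python
import numpy as np

def pyramid_max_pool(Z, xy, levels=(1, 2, 4)):
    """Max pool sparse codes within each cell of a spatial pyramid.

    Z:  (k, m) sparse codes, one column per local descriptor.
    xy: (m, 2) descriptor coordinates, normalized to [0, 1].
    Returns a vector of length k * sum(g * g for g in levels).
    """
    k = Z.shape[0]
    A = np.abs(Z)
    feats = []
    for g in levels:
        # Cell index of each descriptor on a g x g grid.
        cx = np.minimum((xy[:, 0] * g).astype(int), g - 1)
        cy = np.minimum((xy[:, 1] * g).astype(int), g - 1)
        cell = cy * g + cx
        for c in range(g * g):
            mask = cell == c
            # Empty cells contribute an all-zero feature.
            feats.append(A[:, mask].max(axis=1) if mask.any() else np.zeros(k))
    return np.concatenate(feats)
```

The coarsest level (g = 1) is unchanged when descriptors move anywhere in the image, while finer levels tolerate shifts only within a cell; concatenating the levels trades off invariance against spatial detail.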
Model (1)

             X_k: the k-th image

                                  Objective function
                          + SVM

    Multi-level max pooling
Model (2): objective function

Supervised




     Optimization over B: back propagation!
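
The objective formula on this slide did not survive extraction. As a hedged reconstruction, using the X_k and B of the earlier slides plus symbols that are assumptions here (labels y_k, classifier weights w, loss \ell, pooling operator, weights \lambda and \gamma), a supervised objective of this kind is typically written as:

```latex
\min_{B,\,w}\; \sum_{k} \ell\big(y_k,\; w^{\top}\beta_k\big) + \lambda \lVert w \rVert_2^2,
\qquad
\beta_k = \operatorname{pool}\big(\hat{Z}_k\big),
\qquad
\hat{Z}_k = \arg\min_{Z}\; \tfrac{1}{2}\lVert X_k - BZ \rVert_F^2 + \gamma \lVert Z \rVert_1 .
```

The pooled feature \beta_k depends on B only through the sparse codes \hat{Z}_k, which is why the gradient with respect to B has to be propagated back through the pooling and coding steps.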
Optimization (1)

         Squared hinge loss function
         Linear prediction model
         No analytical link
         Only cares about the pooled maximum values
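
A minimal sketch of the squared hinge loss with a linear prediction model, together with the gradients that back-propagation would need. It assumes a binary label y in {-1, +1}, a weight vector w and a pooled image feature beta; these names are illustrative and the regularization term is omitted:

```python
import numpy as np

def squared_hinge(w, beta, y):
    """Squared hinge loss max(0, 1 - y * w^T beta)^2 for a label y in {-1, +1}."""
    margin = 1.0 - y * (w @ beta)
    return max(0.0, margin) ** 2

def squared_hinge_grads(w, beta, y):
    """Gradients of the loss with respect to w and with respect to beta."""
    margin = 1.0 - y * (w @ beta)
    if margin <= 0:
        # Correctly classified with enough margin: no gradient.
        return np.zeros_like(w), np.zeros_like(beta)
    g = -2.0 * margin * y
    return g * beta, g * w
```

The gradient with respect to beta is the quantity passed further back, through max pooling, to the sparse codes and finally to the dictionary.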
Optimization (2)

   Solution: use implicit differentiation

   Setting the gradients at zero coefficients to be zero,
    a lot of computations can be saved!

D. M. Bradley et al. Differentiable sparse coding. NIPS 2008.
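
A minimal sketch of how implicit differentiation yields the gradient of a loss with respect to the dictionary. It assumes the coding objective 0.5*||x - Bz||^2 + lam*||z||_1 and that the active set (nonzero coefficients) and their signs do not change under a small perturbation of B, so that on the active set B_a^T (B_a z_a - x) + lam * sign(z_a) = 0 can be differentiated. The function names are hypothetical and the formula is a standard derivation, not copied from the paper:

```python
import numpy as np

def active_set_solution(x, B, active, signs, lam):
    """Closed-form lasso solution restricted to a known active set and sign pattern."""
    Ba = B[:, active]
    return np.linalg.solve(Ba.T @ Ba, Ba.T @ x - lam * signs)

def grad_wrt_dictionary(x, B, z, g):
    """Gradient of a scalar loss with respect to B.

    x: (n,) descriptor, B: (n, k) dictionary, z: (k,) sparse code of x,
    g: (k,) gradient of the loss with respect to z.
    Columns of B with zero coefficient receive a zero gradient.
    """
    active = np.flatnonzero(z)
    grad = np.zeros_like(B)
    if active.size == 0:
        return grad
    Ba, za = B[:, active], z[active]
    v = np.linalg.solve(Ba.T @ Ba, g[active])   # (B_a^T B_a)^{-1} g_a
    r = x - Ba @ za                             # reconstruction residual
    grad[:, active] = np.outer(r, v) - np.outer(Ba @ v, za)
    return grad
```

Only the active columns enter the linear solve, so the cost depends on the number of nonzero coefficients rather than on the dictionary size, which matches the remark about saved computation.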
Training convergence

   Initialization is important: B is first trained in an
    unsupervised manner.
   Convergence
Example dictionary

   Example dictionary: CMU PIE
   [Figure: learned dictionaries; panels: Unsupervised, Supervised]
Experiment

   Classification tasks
     Face recognition: CMU PIE, and CMU Multi-PIE

     Handwritten digit recognition: MNIST

     Gender recognition: FRGC 2.0

   Image local descriptors: raw image patches
   Prediction model: one-vs-all linear SVM with squared hinge loss
    function.
   Optimization: stochastic gradient descent, typically converging
    within 10 iterations.
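
A minimal sketch of the prediction model alone: a one-vs-all linear SVM with squared hinge loss trained by gradient descent on fixed image-level features. For brevity it uses full-batch rather than stochastic updates and does not update the dictionary; the hyperparameters lam, lr and n_epochs and the function names are illustrative, not the settings of the paper:

```python
import numpy as np

def train_one_vs_all(F, labels, n_classes, lam=1e-3, lr=0.1, n_epochs=200):
    """Train one linear classifier per class with squared hinge loss.

    F: (N, d) image-level features, labels: (N,) integer class labels.
    Returns W of shape (n_classes, d).
    """
    N, d = F.shape
    Y = -np.ones((N, n_classes))
    Y[np.arange(N), labels] = 1.0            # +1 for the true class, -1 otherwise
    W = np.zeros((n_classes, d))
    for _ in range(n_epochs):
        margins = np.maximum(0.0, 1.0 - Y * (F @ W.T))       # (N, n_classes)
        grad = -2.0 * (margins * Y).T @ F / N + 2.0 * lam * W
        W -= lr * grad
    return W

def predict(W, F):
    """Assign each feature vector to the class with the highest score."""
    return (F @ W.T).argmax(axis=1)
```

Because the squared hinge loss is differentiable, the same gradient machinery can be reused when the loss is propagated back to the dictionary.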
Experiment

   Parameter settings




        Learning rate:
Experiment –Face Recognition (1)

   CMU PIE:
     41368 images of 68 people, each under 13 poses, 43
      different illumination conditions, and 4 different
      expressions.
     A subset of five near-frontal views is used, including
      all expressions and illuminations.
Experiment –Face Recognition (1)

   USC: unsupervised sparse coding model.
   SSC: supervised sparse coding model.
   Improvements: the improvement of SSC over USC.

             Classification error (%) on CMU PIE
Experiment –Face Recognition (2)

   CMU Multi-PIE:
     contains 337 subjects across simultaneous variations
      in pose, expression and illumination.
     A subset of near-frontal face images is used for
      training and testing.
Experiment –Face Recognition (2)

                 Face recognition error(%) on Multi-PIE




   [SR] A. Wagner et al. Towards a practical face recognition system:
    robust registration and illumination by sparse representation. CVPR’09.
Experiment – Handwritten Digit Recognition

   MNIST: consists of 70,000 handwritten digits, aligned to
    the center. 60,000 of them are used for training and
    the remaining 10,000 for testing.
Experiment – Gender Recognition

   FRGC 2.0
     contains 14714 face images of 568 individuals
      under various lighting conditions and backgrounds.
     11700 face images of 451 individuals are used for
      training, and the remaining 3014 images of 114
      persons are used for testing.
Experiment – Gender Recognition
Conclusion
   A supervised translation-invariant sparse coding model for image
    classification
     A generic image representation.
     The max pooling feature is translation-invariant.
     Sparse coding on local descriptors is promising compared to
       sparse coding on holistic image.
     Supervised sparse coding improves the performance significantly.
   Next steps
     Connections with hierarchical models in deep belief networks
       should be investigated.
     More theoretical analysis of pooling functions is needed.
     Deep hierarchical models based on sparse coding should be
       studied.
References

   Jianchao Yang, Kai Yu, Thomas Huang. Supervised Translation-Invariant
    Sparse Coding. CVPR10.
   J. Yang et al. Translation-Invariant Sparse Coding. CVPR10 (talk).
   J. Yang et al. Linear spatial pyramid matching using sparse coding
    for image classification. CVPR’09.

								