Template for Small Animal Faces

Learning mixed image templates for object recognition

Zhangzhang Si(1), Haifeng Gong(1,2), Ying Nian Wu(1), Song-Chun Zhu(1,2)
1. Statistics Department, UCLA; 2. Lotus Hill Institute for Computer Vision
Project page with code & data:

Motivation

One goal
Learning generative image templates for a wide range of image categories (or manifolds) composed of geometric structures and local textures.

Two challenges:
- The residual terms ε for texture prototypes h (histogram residuals) and for geometric primitives B (image residuals) are not directly comparable;
- Computing the normalizing constant of the model.

Explanation (right figure): Quantizing the image space and the histogram feature space yields a primitive dictionary {B} and a texture dictionary {h}, respectively. These two dictionaries compete to explain observed image patches. A mixed template of a hedgehog, T = {B1, h2, B3, h4, …}, is composed of sketches and histogram prototypes that explain local image patches at different locations.

A local image patch I of a hedgehog can be explained in two ways. The first is by a geometric primitive, I = cB + ε, where B is the primitive, c is its coefficient and ε is the residual image. The second is by a texture prototype, H(I) = h + ε, where H(·) is a histogram statistic and ε is the histogram residual.
(Figure: "Center" (template) of the hedgehog manifold, scale/center normalized.)

Implementation

Feature design
Both sketch and texture features are defined on a common Gabor dictionary.

Sketch variable: let B_{x,y,o,s} be the Gabor element located at (x, y) with orientation o and scale s. The sketch variable (feature response) is defined as a local maximum of the transformed Gabor responses over a small neighborhood of B_{x,y,o,s}. Here s(·) is a sigmoid-like transformation.

Texture variable: from a local region Λ, we compute an orientation histogram from the Gabor responses, using the local average of the responses over Λ. The texture variable (response) compares this histogram with a prototype histogram h, which is estimated from examples.

Statistical modeling
The marginal statistical model for a sketch variable is a simple log-linear one:
$$p(r_j) = \frac{q(r_j)\,\exp(\lambda_j r_j)}{z(\lambda_j)}$$
The information gain of a sketch primitive B located at region Λj favors a large mean response.
The marginal statistical model for a texture variable is Gaussian-like. The information gain of a histogram prototype h located at region Λj favors a small variance.

System parameters
All Gabor filters are 17 × 17 pixels, with 16 orientations. For the Gabor filters we use the same parameters as in (Wu et al. 07) [1]. Multi-scale Gabor filtering is implemented with an image pyramid. The radius for local maximization is 6 pixels. The radius for local averaging (orientation histogram) is 11, 21 or 41 pixels.

Work flow
Example images, or image pyramids (100 × 100, 150 × 150, 200 × 200)
→ Gabor response maps
→ sketch branch: local maximization → sketch response maps → average over examples → mean response map → select sketch variables with a large mean
→ texture branch: local average and normalization over orientations → orientation histogram maps → average over examples → variance map of histograms → select texture variables with a small variance
→ mixed template
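The statistical-modeling step for a single sketch variable can be illustrated with the sketch below. It makes several assumptions:
- the sigmoid-transformed responses have already been computed;
- responses on background natural images serve as Monte Carlo samples from q(rj);
- a grid search over λ ≥ 0 stands in for the "simple line search".

The function names fit_log_linear and information_gain_bits are hypothetical and are not taken from the project code. The Gaussian-like texture model is not covered here.

```python
import numpy as np


def fit_log_linear(r_pos, r_bg, lambdas=np.linspace(0.0, 20.0, 2001)):
    """Fit p(r) = q(r) * exp(lam * r) / z(lam) for one feature by a 1-D line search.

    r_pos: responses r_j(I_i) of the feature on the positive examples.
    r_bg:  responses of the same feature on background (natural) images,
           used as samples from q(r_j) -- an assumption of this sketch.
    Returns (lam, log_z) whose model mean E_p[r] is closest to the empirical
    mean of r_pos on the grid. Only lam >= 0 is searched (sketches favor a large mean).
    """
    r_pos = np.asarray(r_pos, dtype=float)
    r_bg = np.asarray(r_bg, dtype=float)
    target = r_pos.mean()
    best_err, best_lam, best_log_z = np.inf, 0.0, 0.0
    for lam in lambdas:
        shift = lam * r_bg.max()                    # largest exponent, subtracted for stability
        w = np.exp(lam * r_bg - shift)              # unnormalized tilting weights
        log_z = np.log(w.mean()) + shift            # log z(lam) = log E_q[exp(lam * r)]
        model_mean = np.sum(w * r_bg) / np.sum(w)   # E_p[r] under the tilted model
        err = abs(model_mean - target)
        if err < best_err:
            best_err, best_lam, best_log_z = err, float(lam), float(log_z)
    return best_lam, best_log_z


def information_gain_bits(r_pos, lam, log_z):
    """Sample estimate of KL(p || q), in bits.

    Under the log-linear model, log p(r)/q(r) = lam * r - log z(lam),
    so the gain is the average of that log-ratio over the positives.
    """
    r_pos = np.asarray(r_pos, dtype=float)
    return float(np.mean(lam * r_pos - log_z) / np.log(2.0))
```

Consider background responses spread uniformly over [0, 1] (np.linspace(0, 1, 1001)) and positives all equal to 0.8. The fitted λ then lands a little below 5, and the gain is just under one bit. Positives whose mean equals the background mean give λ = 0 and zero gain. Because E_p[r] increases monotonically in λ (its derivative is Var_p(r)), a bisection would work as well as the grid.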

Formulation

Matching/Inference
Left figure: Each marginal matching score rj indicates the similarity between the image patch under inspection and the sub-template tj. The sub-template is either an image primitive or an orientation histogram, and is subject to local deformation. The total matching score is a linear combination of the marginal ones:
$$r_j = \mathrm{match}(I_{\Lambda_j}, t_j) = \begin{cases} r^{(\mathrm{sk})}, & \text{if } t_j \in \{B\} \\ r^{(\mathrm{tex})}, & \text{if } t_j \in \{h\} \end{cases}$$
(See "Implementation".)
Bottom figure: matching a hedgehog mixed template onto examples. Black stroke: sketch. Red blob: texture.

Learning
Left figure: the best template is the one that scores highest on the i.i.d. example images {I1, ..., In}. Since the score is a log-likelihood (ratio), choosing this template is equivalent to maximum likelihood.

Notation:
- {I1, ..., In}: positive examples
- p(I): distribution learned from the positive examples
- q(I): distribution of negative examples (random natural images)

We learn a statistical model p(I) through a series of model updates: q(I) = p0(I) → p1(I) → ... → pk(I). Under the (generalized) maximum-entropy principle, the pk(I) that matches the empirical means of {r1, ..., rk} has the form:
$$p_k(I) = \frac{1}{Z(\lambda_1, \ldots, \lambda_k)}\, q(I)\, \exp\Big(\sum_{j=1}^{k} \lambda_j r_j(I)\Big)$$
With {r1, ..., rk} de-correlated, the normalizing constant factorizes into per-feature constants zj. MLE (variable selection and parameter estimation) for pk(I) therefore simplifies to MLE on each marginal distribution {p(rj)}. A simple line search finds the best λj and zj. Variable selection ranks the feature responses (variables) by information gain.

Information gain
Let rj be either a sketch response or a texture response. Its gain is measured in bits:
$$\mathrm{gain}_j = \mathrm{KL}\big(p(r_j)\,\|\,q(r_j)\big) \approx \frac{1}{n}\sum_{i=1}^{n} \log_2 \frac{p(r_j(I_i))}{q(r_j(I_i))} \qquad (1)$$
This information-theoretic criterion puts sketch and texture features on a common scale, enabling a comparison of apples to oranges.

Adaptive textural background for sketches
To better decouple sketch and texture features, we do not use the convenient choice p_{j-1}(r_j) = q(r_j). Instead we use:
$$p_{j-1}(r_j) = q_{\text{local texture}}(r_j) = \frac{q(r_j)\,\exp(-\lambda r_j)}{z(\lambda)}$$
for some λ, which is estimated from each example image. Because q_local texture(rj) adapts to each example, it can be called an adaptive q.

Feature selection: sketch vs. texture competition
Each figure plots the information gain of the top 40 features, ranked in descending order. Black/white bars: information gains of the selected sketch features. Red bars: gains of the texture features. For low-complexity image categories such as head/shoulder, sketch features dominate the information gain. As objects contain more clutter, texture features begin to contribute more. See the feature competition for hedgehog, pizza, and the water patches cropped from a pond image.

Selected References
[1] Y. N. Wu, Z. Si, C. Fleming and S. C. Zhu, Deformable templates as active basis, ICCV'07.

Result and Evaluation

Mixed templates
Mixed templates for hedgehog, bear head, cat head, clock, pizza, lion head, pig head and tiger head, each illustrated along with three example images. Black stroke: sketch. Red blob: orientation histogram.

AUC vs. method and training size (panels: head/shoulder, hedgehog, cat head, pig head)
Combining sketch and texture features improves binary classification against 600 random negative images. In each plot, the area under the ROC curve (AUC) is averaged over cross-validation runs and plotted against the number of positive training examples. The dotted lines indicate 95% Student-t confidence bounds.

Data set
Over 100 object/texture categories: 60 object categories and 41 texture categories from the Caltech101, CUReT and LHI datasets. The data set is made moderately difficult by object categories that are easy to confuse, e.g., 18 kinds of animal faces, and by some similar texture categories.

We evaluate the learned templates in one-vs-all classification. For each category, we randomly select 15 examples as training positives and use the rest (at most 50) for testing, about 4200 test images in total. Images are converted to grayscale and resized to a specified image area while preserving the original aspect ratio. We use a universal threshold of 0.1 on information gain as the stopping criterion for feature selection. On average, about 200 features are selected per category (i.e. per template).

Overview of average precision
Box plot of average precisions (AP, the area under the precision-recall curve) over the 100+ object and texture categories. Each box shows the max/min, the 25%/75% percentiles and the median of the average precisions. The mixed template performs observably better than the individual sketch or texture templates.

Image categories across a wide spectrum of visual complexity (low complexity → high complexity)
Top: 10 object/texture categories ranked by perceived complexity, namely human head/shoulder, pistol, laptop, dog head, mouse head, hedgehog, pizza, and three texture categories. Bottom: average precisions (AP) of the object categories, in the same order as the top plot, for sketch-only, texture-only and combined templates. Combining sketch and texture features benefits the "mid-complexity" categories the most.
