# Template for Small Animal Faces


```
Learning mixed image templates for object recognition
Project page with code & data: http://www.stat.ucla.edu/~zzsi/mixed_template.html
Zhangzhang Si (1), Haifeng Gong (1,2), Ying Nian Wu (1), Song-Chun Zhu (1,2)
(1) Statistics Department, UCLA; (2) Lotus Hill Institute for Computer Vision

Motivation

One goal
Learning generative image templates for a wide range of image categories (or manifolds)
composed of geometric structures and local textures.

Two challenges:
- The residual terms (ε) are not comparable for h and B;
- Calculating the normalizing constant.

Explanation (right figure): Quantization in the image space and histogram feature space
provides a primitive dictionary {B} and a texture dictionary {h} respectively, which compete
to explain observed image patches. A mixed template of hedgehog T = {B1, h2, B3, h4, …} is
composed of sketches and histogram prototypes explaining local image patches at different
locations.

A local image patch of hedgehog can be explained by a geometric primitive, i.e. I = cB + ε,
where B is a geometric primitive and ε is the residual image; or be explained by a texture
prototype, i.e. H(I) = h + ε, where H is some histogram statistics and ε is the histogram
residual.

[Figure: "Center" (template) of the hedgehog manifold (scale/center normalized).]

Implementation

Feature design
Both sketch and texture features are defined on a common Gabor dictionary.

Sketch variable: let B_{x,y,o,s} be the Gabor element located at (x,y), with orientation o
and scale s. The sketch variable (feature response) is specified as a local maximum:
    [equation not preserved]
where s() is a sigmoid-like transformation.

Texture variable: from a local region we compute an orientation histogram from Gabor
responses, using a local average of the responses (the defining equations are not preserved).
The texture variable (response) is:
    [equation not preserved]
The prototype histogram h is to be estimated from examples.

Statistical modeling
The marginal statistical model for the sketch variable is a simple log-linear one:
    [equation not preserved]
Information gain of sketch primitive B located at region Λ_j (favor large mean):
    [equation not preserved]
The marginal statistical model for the texture variable is Gaussian-like:
    [equation not preserved]
Information gain of histogram prototype h located at region Λ_j (favor small variance):
    [equation not preserved]

System parameters
Gabor filters are all 17 x 17 pixels. Number of orientations is 16. For Gabor filters we use
the same parameters as in (Wu et al. 07) [1]. Multi-scale Gabor is implemented by image
pyramid. The radius for local maximization is 6 pixels. The radius for local average
(orientation histogram) can be 11, 21 and 41.

Work flow
Example images or image pyramids (100 x 100, 150 x 150, 200 x 200) -> Gabor response maps.
Sketch branch:  local maximization -> sketch response maps.
Texture branch: local average and normalization over orientations -> orientation histogram maps.
Averaging over examples gives a mean response map (sketch) and a variance map of histograms
(texture). Select sketch variables with large mean and texture variables with small variance
-> mixed template.

Formulation

Left figure: Each marginal matching score r_j indicates the similarity between the image patch
under inspection and sub-template t_j. Each sub-template is either an image primitive, subject
to local deformation, or a histogram prototype. The total matching score is a linear
combination of the marginal ones.

    r_j = \mathrm{match}(I_{\Lambda_j}, t_j) =
        \begin{cases} r^{(sk)},  & \text{if } t_j \in B \\
                      r^{(tex)}, & \text{if } t_j \in h \end{cases}
    (See "Implementation")

Bottom figure: matching a hedgehog mixed template onto examples.
Black stroke: sketch. Red blob: texture.

Left figure: the best template is the one that scores highest on i.i.d. example images
{I1…In}. With the score being log-likelihood (ratio), it is equivalent to maximum likelihood.

{I1,..., In}: positive examples
p(I): distribution learned from positive examples
q(I): distribution of negative examples (random natural images)

We learn a statistical model p(I) by a series of model updating:
    q(I) = p_0(I) -> p_1(I) -> ... -> p_k(I).
Under the (generalized) max-entropy principle, the p_k(I) that matches the empirical mean of
{r1...rk} has the form:
    [equation not preserved]
with {r1,...,rk} de-correlated:
    [equation not preserved]
MLE (variable selection and parameter estimation) for p_k(I) is then simplified to MLE on each
marginal distribution {p(r_j)}. It takes a simple line search to find the best λ_j and z_j.
Variable selection is based on ranking the feature responses (variables) by information gain.

Information gain
With r_j being either the sketch response or the texture response, its gain is evaluated in
number of bits:

    \mathrm{gain}_j = \mathrm{KL}(p(r_j) \,\|\, q(r_j))
                    \approx \frac{1}{n} \sum_{i=1}^{n} \log \frac{p(r_j(I_i))}{q(r_j(I_i))}    (1)

This information-theoretic criterion enables comparison of apples to oranges.

Feature selection: sketch vs. texture competition
Each figure plots the information gain of the top 40 features ranked in descending order.
Black/white bars: information gains of selected sketch features; red bars: gains of texture
features. For low complexity image categories such as head/shoulder, sketch features dominate
the information gain. As clutter inside objects increases, texture features begin to
contribute more: see the feature competition for hedgehog, pizza and the water patches cropped
from a pond image.

Adaptive textural background for sketches
To better decouple sketch and texture features, instead of the convenient
    [equation not preserved]
we use

    p_{j-1}(r_j) = q_{\text{local texture}}(r_j) = q(r_j) \exp(-\lambda r_j) / z(\lambda)

for some λ (to be estimated from an example image). q_{local texture}(r_j) is adaptive per
example and can be called adaptive q.

Result and Evaluation

Mixed templates are illustrated along with three example images per category.
Black stroke: sketch. Red blob: orientation histogram.

Improvement on binary classification (vs. 600 random negative images) due to the combination
of sketch and texture features. In each plot, the area under the ROC curve (AUC) is averaged
over cross-validation runs and plotted against the number of positive training examples. The
dotted lines indicate 95% student-t confidence bounds.

Data set
60 object categories and 41 texture categories from the Caltech101, CUReT and LHI datasets. It
is made moderately difficult by object categories that are easy to confuse, e.g., 18 kinds of
animal faces and some similar texture categories.

We evaluate the learned templates in one-vs-all classifications. For each category we randomly
select 15 examples as training positives, and the rest (at most 50) are used for testing
(~4200 images used for testing). Images are transformed to grayscale and resized to have a
specified image area while preserving the original aspect ratio. We use a universal threshold
of 0.1 on information gain as the stopping criterion of feature selection. On average about
200 features are selected per category (i.e. per template).

Overview of average precision (AP) over 100+ object / texture categories
Box plot of average precisions (the area under the precision-recall curve). Each box shows
max/min, 25% / 75% percentiles and the median of average precisions on 100+ object and texture
categories. The mixed template performs observably better than the individual sketch or
texture templates.

Image categories across a wide spectrum of visual complexity (low complexity to high complexity)
Top: 10 object/texture categories ranked by perceived complexity, namely: human head/shoulder,
pistol, laptop, dog head, mouse head, hedgehog, pizza and three texture categories.
Bottom: average precisions (AP) of object categories (ordered as in the top plot), for
sketch-only, texture-only and combined templates. Combination of sketch and texture features
benefits the "mid-complexity" categories the most.

Selected References
[1] Y. N. Wu, Z. Si, C. Fleming and S. C. Zhu, Deformable templates as active basis, ICCV'07.

```
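
The line search for λ_j and z_j, the information gain of Eq. (1), and the threshold-based stopping rule can be illustrated with a small sketch. This is not the authors' code (that is on the project page), and it rests on stated assumptions: the marginal model is taken to be an exponential tilting of the reference distribution, p(r) = q(r) exp(λ r) / z(λ), which mirrors the form of the adaptive background formula but is not given explicitly in the text above; q is discretized on a small response grid; bisection stands in for the line search; and all function and feature names (`tilted_model`, `fit_lambda`, `information_gain_bits`, `select_features`, `sketch_a`, and so on) are hypothetical.

```python
import math


def tilted_model(q_probs, r_grid, lam):
    """Return (p, z) with p(r) = q(r) * exp(lam * r) / z on a discrete response grid.

    Assumed model form; see the paragraph above.
    """
    weights = [q * math.exp(lam * r) for q, r in zip(q_probs, r_grid)]
    z = sum(weights)
    return [w / z for w in weights], z


def fit_lambda(q_probs, r_grid, target_mean, lo=-50.0, hi=50.0, iters=100):
    """Bisection search for lam so that the model mean matches target_mean.

    The model mean increases monotonically with lam, so bisection suffices.
    target_mean must lie strictly inside the range of r_grid.
    """
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        p, _ = tilted_model(q_probs, r_grid, mid)
        mean = sum(pi * ri for pi, ri in zip(p, r_grid))
        if mean < target_mean:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def information_gain_bits(lam, z, responses):
    """Eq. (1): average of log2 p(r_i)/q(r_i) over the positive examples.

    For the tilted model, log p(r)/q(r) = lam * r - log z.
    """
    n = len(responses)
    return sum(lam * r - math.log(z) for r in responses) / (n * math.log(2))


def select_features(gains, threshold=0.1):
    """Rank features by gain (descending) and keep those at or above the threshold."""
    ranked = sorted(gains.items(), key=lambda kv: kv[1], reverse=True)
    return [name for name, gain in ranked if gain >= threshold]


# Toy example: a binary response with a uniform reference distribution q.
r_grid, q_probs = [0.0, 1.0], [0.5, 0.5]
responses = [1.0, 1.0, 1.0, 1.0, 0.0]  # responses on five positive examples
lam = fit_lambda(q_probs, r_grid, sum(responses) / len(responses))
_, z = tilted_model(q_probs, r_grid, lam)
gain = information_gain_bits(lam, z, responses)

# Hypothetical candidates: the fitted feature plus two made-up gains for illustration.
gains = {"sketch_a": gain, "texture_b": 0.45, "sketch_c": 0.03}
selected = select_features(gains)
```

Because the gain is measured in bits for both feature types, sketch and texture candidates can share one ranked list, which is what lets them compete directly. The 0.1 default matches the universal threshold quoted in the evaluation setup; the toy gains for `texture_b` and `sketch_c` are illustrative numbers, not results.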