					      Perceptual Scale Space
        and its Applications
Yizhou Wang     Siavosh Bahrami        Song-Chun Zhu
  Department of Computer Science and Statistics, UCLA

                     Presented by:
                   Shane Brennan

                    3 / 15 / 2006
                          The Goal
●   Wish to find the changes in edge-maps at different scales

●   Use this knowledge to generate edge-maps that have no
    “flicker” between frames

●   This is useful in object tracking over multiple scales where edge
    features will change at different resolutions. A stable edge-map
    would allow much easier tracking across scales as the structure
    of objects would remain more constant

●   By intuition, a solution that meets these goals will be based on
    a pyramid structure, finding the differences in images between
    scales. Therefore the pyramid that is built will be similar to a
    Laplacian pyramid in some ways
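For reference, the Laplacian pyramid mentioned above stores the difference between each pyramid level and a blurred, upsampled copy of the next coarser level. The following is a minimal NumPy sketch of that idea, not the authors' code; the 5-tap binomial kernel, nearest-neighbour upsampling, and all function names are assumptions chosen for illustration.

```python
import numpy as np

def blur(img):
    """Separable 5-tap binomial blur (a Gaussian approximation), edge-padded."""
    k = np.array([1.0, 4.0, 6.0, 4.0, 1.0]) / 16.0
    padded = np.pad(img, 2, mode="edge")
    # Filter along columns of each row first, then along rows.
    rows = sum(k[i] * padded[:, i:i + img.shape[1]] for i in range(5))
    return sum(k[i] * rows[i:i + img.shape[0], :] for i in range(5))

def downsample(img):
    """Blur, then keep every second pixel in each direction."""
    return blur(img)[::2, ::2]

def upsample(img, shape):
    """Nearest-neighbour upsampling cropped to `shape`, followed by a blur."""
    up = np.repeat(np.repeat(img, 2, axis=0), 2, axis=1)
    return blur(up[:shape[0], :shape[1]])

def laplacian_pyramid(img, levels):
    """Return (bands, coarsest): band images from fine to coarse,
    plus the coarsest Gaussian level."""
    bands = []
    current = img.astype(float)
    for _ in range(levels):
        smaller = downsample(current)
        # Each band holds the detail lost when moving to the coarser level.
        bands.append(current - upsample(smaller, current.shape))
        current = smaller
    return bands, current

def reconstruct(bands, coarsest):
    """Invert the pyramid by adding the bands back, coarse to fine."""
    current = coarsest
    for band in reversed(bands):
        current = upsample(current, band.shape) + band
    return current
```

Because each band is defined as the exact residual of the upsampling step, adding the bands back from coarse to fine recovers the input image.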
              The Basic Idea of the
           Primitive Sketch Approach
●   Split an image into a structural part, and a textural part, represent the two
    parts with appropriate models

●   From an image's Gaussian pyramid build a corresponding “sketch pyramid”
    of the structural components

●   The sketch pyramid has a corresponding grammar, or “rulebook” which
    defines how image primitives warp to form the next level of the pyramid

●   This is a generative pyramid. Given one level of the sketch pyramid and the
    rulebook, one can create the next level of the pyramid

●   Take the non-sketchable parts (texture) and represent them with a Markov random
    field (MRF), i.e. texture synthesis

●   This is another generative model: an image in the Gaussian pyramid can be
    created by combining the corresponding sketch image with a “dictionary” of
    image patches, as well as the non-sketchable image
                   Image Primitives
●   Image primitives are composed of a center node, with two or
    more “anchor points” connected to the central node
●   Multiple primitives can be connected into a graph by aligning
    center points and anchor points
●   Some examples of image primitives:

●   The use of these image primitives and a sketch pyramid to
    represent an image is referred to as the “Perceptual Scale
    Space”
              Three Issues for the
             Perceptual Scale Space
●   Inferring the sketch pyramid so that graphs over scales are optimally
    matched and have consistent correspondence. The authors adopt a
    Bayesian framework and use MCMC reversible jumps to
    compute the optimal representation upwards and downwards through the
    pyramid to ensure consistency

●   Studying the criterion and mechanisms for the transitions in the
    context of model selection with maximum posterior probability

●   And...
                         The Third Issue
●   Studying three categories
    of perceptual transitions in
    the sketch graph
     –   Graph grammars for the
         graph topological changes

     –   Sharpening of image
         primitives without structural
         changes

     –   Catastrophic changes from
         texture to structures with
         explosive births of image
         primitives
A Reference Legend
Some Properties of Primal Sketches
•   Images are broken into two parts, sketchable and non-sketchable

•   The structural part assumes an occlusion model where

•   The non-sketchable (texture) area is clustered into about 1–5 homogeneous stochastic texture regions

•   And finally, as mentioned, Ik is a generative model where

•   So given Ik, Sk is inferred by maximizing
    a posterior probability
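In generic Bayesian terms, the inference in the last bullet can be written as follows. The notation is assumed here for illustration rather than taken from the slides: p(I_k | S_k) stands for the generative image model and p(S_k) for the prior on sketches.

```latex
S_k^{*} \;=\; \arg\max_{S_k}\, p(S_k \mid I_k)
        \;=\; \arg\max_{S_k}\, p(I_k \mid S_k)\, p(S_k)
```

The second equality holds because p(I_k) does not depend on S_k.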
                                 About Sk
●   The key component in Sk is the sketch graph Gk = (Vk, Ek), where Vk is the
    set of selected image primitives and Ek is the set of connections between adjacent
    primitives whose anchor points are aligned

●   The graph follows an inhomogeneous Gibbs model enforcing a few properties
    such as smoothness, continuity, and canonical junctions

●   The authors claim that this sketch representation holds two advantages over
    pyramid representations with linear additive models (such as wavelets):
     –   The number of sketches used to reconstruct an image is much smaller due to
         hyper-sparsity of the dictionary learned from images
     –   The sketch graph topology captures properties of human perception in contrast
         to a wavelet representation (this will be seen later in the presentation)

●   Consequently, it is more meaningful to use the sketch graph to study the
    perceptual transitions due to scale change
                       Graph Grammars
• Due to intrinsic uncertainty in the posterior probability the sketch pyramid S will be
  inconsistent if each level is computed independently using the posterior probability

   Because of this, the graphs at each level may not have good correspondence and this
   may cause a “flickering” effect when viewing the sketches from coarse to fine
                       Fixing the Flicker
• The flicker effect is remedied in the sketch representation by enforcing steady and
  monotonic graph transitions over the sketch pyramid. This is realized through the use
  of “graph grammars”

• The use of graph grammars turns the sketch graph into a generative model, where a
  sequence of m(k) production rules, which form the rulebook Rk, is used to generate the
  next sketch graph in the image pyramid. Each rule in the rulebook has its own
  symbol, so the rulebook is defined as the ordered sequence of its rules:

   and the next level in the sketch pyramid can be generated as such:

• Note that the order of the rules does matter as they form a path in the space of sketch
  graphs from Sk to Sk+1
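One way to write the rulebook and the generation step is shown below. The rule symbol γ_{k,i} is an assumption made for this sketch; only R_k, m(k), S_k and S_{k+1} come from the text above. The arrow notation requires the amsmath package.

```latex
R_k = \left(\gamma_{k,1},\ \gamma_{k,2},\ \ldots,\ \gamma_{k,m(k)}\right),
\qquad
S_k \xrightarrow{\;R_k\;} S_{k+1}
```

The tuple notation reflects the point that the rules are applied in order.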
                           About Grammar Rules
•   Each rule is applied to a subgraph of Gk. The subgraph is denoted gk,i; it has a
    neighborhood and is replaced by a new subgraph
•   Some examples of grammar rules:
       - Null operation (no change)

       - birth of a node

       - death of a node

       - birth of a junction

       - death of a junction

       - extend a node

       - shrink a node

       - split a ridge terminator into a pair of step-edges with a set of corners

       - combine a pair of step-edges and a set of corners into a ridge terminator

       - split a ridge into a pair of step edges

       - merge a pair of step edges into a ridge

       - split a cross into several L-junctions

       - catastrophic birth of a large number of nodes

       - catastrophic death of a large number of nodes
                  More About Grammar
•   Each rule is associated with a probability depending on its attributes

• Therefore, we have a probability for the transition from Sk to Sk+1

• The rule probabilities used by the authors were obtained by maximum likelihood
  estimation. Graph transitions were hand labeled in 50 images from the Corel database
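A simplified sketch of these two bullets follows. It assumes that a rule's probability depends only on its type and that the rules in a rulebook are independent, so the transition probability is a product; the slides say the probability depends on the rule's attributes, so this is a deliberate simplification. The function names and the toy labels are hypothetical.

```python
import math
from collections import Counter

def estimate_rule_probs(labeled_rules):
    """Maximum-likelihood estimate: relative frequency of each rule type
    among the hand-labeled transitions."""
    counts = Counter(labeled_rules)
    total = sum(counts.values())
    return {rule: n / total for rule, n in counts.items()}

def log_transition_prob(rule_sequence, probs):
    """Log-probability of a rulebook, assuming independent rules
    (an assumption of this sketch, not a claim about the paper)."""
    return sum(math.log(probs[rule]) for rule in rule_sequence)

# Toy labels standing in for hand-labeled graph transitions.
labels = ["null", "null", "birth_node", "split_ridge"]
probs = estimate_rule_probs(labels)
score = log_transition_prob(["null", "birth_node"], probs)
```

With counts as the only statistic, the maximum-likelihood estimate of a categorical distribution is the relative frequency, which is all `estimate_rule_probs` computes.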
            Types of Graph Transitions
●   Sharpening of image primitives without structural changes. Image primitives are only replaced, moving from a
    blurred dictionary Δk to a dictionary Δk+1. This could be used for image enhancement and

●   Graph grammars for mild changes in graph topology where each expansion in the pyramid
    reveals more details. Crucial for formulating a robust super-resolution framework that moves
    beyond image sharpening and on to hallucinating generic topological structures
        Graph Transitions, continued
●   Catastrophic changes from textures to structures, with explosive births of image primitives
             Edge Changes Over Scales

[Figure: scanning a row of an image. Panels: the edge differences over scales; the sketch graph.
Note that the graph does not change much once the ridges are expanded to pairs of step-edges.]
                    Grammar Summary
• Goal is to infer the sketch pyramid together with the optimal path of transitions by
  maximizing a Bayesian posterior probability
                   Sketch Transition as
                   Model Comparison
• Wish to understand which structure should appear at which scale, so should study
  the criterion and mechanisms for transitions

• Suppose Sk is the optimal sketch from Ik, computed from levels 0 to k. At level k+1,
  Ik+1 has increased resolution due to the addition of the Laplacian band image. Consider
  a set of new structures introduced at this level. We compare the posterior
  probabilities of Sk alone and of Sk augmented with the new structures

• The first (likelihood) term of the log ratio should be positive for a good choice of new structures because an augmented
  generative model will fit the image better. The prior term should be
  negative to penalize complex models. Therefore, the new structures are accepted only when the gain in fit outweighs the complexity penalty.
  Thus a new feature (image primitive) is introduced at level k+1 if and only if:
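The two terms discussed above come from the standard Bayes decomposition of a log posterior ratio. Here the symbol δS for the new structures is an assumption made for illustration, and the slides' exact acceptance threshold may differ from the plain sign test that follows.

```latex
\log\frac{p(S_k, \delta S \mid I_{k+1})}{p(S_k \mid I_{k+1})}
= \underbrace{\log\frac{p(I_{k+1} \mid S_k, \delta S)}{p(I_{k+1} \mid S_k)}}_{\text{likelihood gain}}
+ \underbrace{\log\frac{p(S_k, \delta S)}{p(S_k)}}_{\text{prior (complexity) term}}
```

Under a plain maximum-posterior comparison, the augmented sketch is preferred when this sum is greater than zero.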
              Sketch Pursuit Algorithm
• A greedy method of finding the image primitives which best describe an image
  region, in other words, how to construct a sketch graph
    – Given the current B, α, and F, β
        • Compute the log-likelihood increase ΔB for a proposed primitive b*

        • Compute the log-likelihood increase ΔF for a proposed filter F*

        • If ΔF > ΔB and ΔF > ε then add F* to F and update β

        • If ΔB > ΔF and ΔB > ε then add b* to B and update α

    – Stop if ΔB < ε and ΔF < ε, otherwise iterate again

    – Legend:
        • B is the set of image primitives
        • α is the set of coefficients for the image primitives
        • F is the set of filters used to represent the texture in an image region
        • β is the potential of the filter response
        • b* is a proposed new image primitive
        • F* is a proposed new filter to better represent the texture
        • ε is an ending condition threshold
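The control flow of the greedy loop above can be sketched as follows. This shows only the selection logic: the `propose_primitive` and `propose_filter` callables and the toy `make_proposer` helper are hypothetical stand-ins for the real likelihood computations, and the updates of α and β are left out.

```python
def sketch_pursuit(propose_primitive, propose_filter, eps, max_iter=1000):
    """Greedy selection loop. Each `propose_*` callable (hypothetical) takes
    the current (B, F) and returns (candidate, log_likelihood_gain)."""
    B, F = [], []
    for _ in range(max_iter):
        b_star, delta_b = propose_primitive(B, F)
        f_star, delta_f = propose_filter(B, F)
        if delta_b < eps and delta_f < eps:
            break                      # neither candidate helps enough
        if delta_f > delta_b and delta_f > eps:
            F.append(f_star)           # a full version would also update beta
        elif delta_b > eps:
            B.append(b_star)           # ties go to the primitive (a choice of this sketch)
        else:
            break                      # a gain equal to eps counts as no progress
    return B, F

def make_proposer(gains, kind):
    """Toy proposer: hands out candidates with fixed gains, largest first."""
    remaining = sorted(gains, reverse=True)
    def propose(B, F):
        used = len(B) if kind == "B" else len(F)
        if used < len(remaining):
            return (kind, used), remaining[used]
        return None, 0.0
    return propose

B, F = sketch_pursuit(make_proposer([5.0, 2.0, 0.1], "B"),
                      make_proposer([3.0, 0.5], "F"),
                      eps=1.0)
```

In this toy run the loop picks a primitive, then a filter, then a second primitive, and stops once both remaining gains fall below ε.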
      Upwards-Downwards Inference
• Sketch graphs created with the original sketch-pursuit algorithm are not consistent
  across scales, so there is flickering. We wish to remove this flickering by forcing
  consistency over scales using MCMC reversible jumps to track and edit the sketch
  graphs upwards and downwards iteratively across scales

• Each pair of reversible jumps (birth/death of a node, birth/death of a junction, etc.) is
  selected probabilistically. These steps simulate a Markov chain with invariant
                         A Comparison
●   (a) is the original image, (b) is the initial sketches computed independently at
    each level, (c) is the improved sketches across scales using reversible
    jumps, and (d) is the reconstructed image from the sketch graph and the
    non-sketchable textures
             Applications – Multi-scale
                  Object Tracking
●   Most tracking algorithms assume certain object structures exist in an object,
    but these structures may only exist in a narrow range of scales. When the
    object motion occurs over a wide range of scales, significant structural changes
    occur in the graph representation

●   With the sketch space representation, the structural changes between scales
    can be accounted for and corrected:

This tracking result was obtained by manually labeling the car sketches in the first frame; tracking
is then performed by estimating the scale change of the foreground. Tracking assumes that the
background is still, i.e. the camera is at a fixed position
              Applications – Adaptive
                  Image Display
●   Have a small screen (128 x 128 pixels) but wish to show a high resolution
    image (2048 x 2048 pixels)

●   Normal interfaces show a low resolution version, letting the user zoom in on
    various regions to see more detail. This is tedious and inconvenient for very
    large images

●   Instead, present the user with a “tour” of the image that summarizes its
    informational content in as few frames as possible, where each frame is at a
    different location and resolution

●   Accomplish this by associating each subregion of the image with a scale
    such that any further zooming would not expand the sketch graph, in other
    words no perceptual gain could be had by further zooming

●   Adopt a quad-tree representation with the root node being the top-level of
    the sketch pyramid. A quad-tree node is split when perceptual information can
    be gained at a finer scale
                        Information Gain
• A key to these quad-tree decompositions is the “information gain” obtained when
  splitting a node. In the perceptual pyramid, a node v at level k corresponds to a
  subgraph Sk(v) of the sketch, and its children at the next level correspond to Sk+1(v+)

• The information gain for this split can be measured by:

• We can then expand a node in a sequential order until either an information gain
  threshold or a maximum “depth” is reached
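The expansion procedure in the last bullet can be sketched as a best-first traversal. The `gain` callable is a hypothetical placeholder for the information-gain measure, and the node encoding and the toy gain used below are assumptions for illustration.

```python
import heapq

def expand_quadtree(root, gain, min_gain, max_depth):
    """Split the node with the largest gain first, until the gain drops
    below `min_gain` or `max_depth` is reached.
    A node is (depth, x, y): the cell at column x, row y of a 2**depth grid.
    `gain(node)` is a caller-supplied (hypothetical) information-gain measure."""
    leaves = []
    heap = [(-gain(root), root)]       # max-heap via negated gains
    while heap:
        neg_gain, node = heapq.heappop(heap)
        depth, x, y = node
        if -neg_gain < min_gain or depth >= max_depth:
            leaves.append(node)        # not worth splitting, or too deep
            continue
        for dx in (0, 1):
            for dy in (0, 1):
                child = (depth + 1, 2 * x + dx, 2 * y + dy)
                heapq.heappush(heap, (-gain(child), child))
    return sorted(leaves)

# Toy gain: all perceptual detail sits in the top-left corner of the image.
def toy_gain(node):
    _, x, y = node
    return 1.0 if x == 0 and y == 0 else 0.0

leaves = expand_quadtree((0, 0, 0), toy_gain, min_gain=0.5, max_depth=2)
```

With this toy gain only the top-left branch is refined, so the display tour would spend its finer-scale frames on that corner.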
Adaptive Image Display, continued
              Comparison to
           Laplacian Quad-Trees

                         Scale Space Quad Trees

Laplacian Quad Trees – Note that the trees aren’t always focused on regions
  of interest to the human eye, as opposed to the scale space quad trees
                         References
●   P.J. Green, “Reversible Jump Markov Chain Monte Carlo Computation and
    Bayesian Model Determination”, Biometrika, vol.82, 711-732, 1995.

●   C.E. Guo, S.C. Zhu, and Y.N. Wu “A Mathematical Theory of Primal Sketch
    and Sketchability,” ICCV, 2003.

              Thank You For Listening
