# scalespace by mudoc123

```
Perceptual Scale Space and its Applications
Yizhou Wang, Siavosh Bahrami, Song-Chun Zhu
Department of Computer Science and Statistics, UCLA

Presented by:
Shane Brennan

3 / 15 / 2006
The Goal
●   Wish to find the changes in edge-maps at different scales

●   Use this knowledge to generate edge-maps that have no
“flicker” between frames

●   This is useful in object tracking over multiple scales where edge
features will change at different resolutions. A stable edge-map
would allow much easier tracking across scales as the structure
of objects would remain more constant

●   By intuition, a solution that meets these goals will be based on
a pyramid structure that captures the differences between images at
adjacent scales. The pyramid that is built will therefore resemble a
Laplacian pyramid in some ways
The Basic Idea of the Primal Sketch Approach
●   Split an image into a structural part and a textural part, and represent each
part with an appropriate model

●   From an image's Gaussian pyramid build a corresponding “sketch pyramid”
of the structural components

●   The sketch pyramid has a corresponding grammar, or “rulebook”, which
defines how image primitives warp to form the next level of the pyramid

●   This is a generative pyramid. Given one level of the sketch pyramid and the
rulebook, one can create the next level of the pyramid

●   Take the non-sketchable parts (texture) and represent them with an MRF
(Markov random field), i.e. texture synthesis

●   This is another generative model: an image in the Gaussian pyramid can be
created by combining the corresponding sketch image with a “dictionary” of
image patches, as well as the non-sketchable image
Image Primitives
●   Image primitives are composed of a center node, with two or
more “anchor points” connected to the central node
●   Multiple primitives can be connected into a graph by aligning
center points and anchor points
●   Some examples of image primitives: [figure not preserved]

●   The use of these image primitives and a sketch pyramid to
represent an image is referred to as the “Perceptual Scale
Space”
Three Issues for the Perceptual Scale Space
●   Inferring the sketch pyramid so that graphs over scales are optimally
matched and have consistent correspondence. The authors adopt a
Bayesian framework and use reversible-jump MCMC to compute the
optimal representation upwards and downwards through the pyramid
to ensure consistency

●   Studying the criterion and mechanisms for the transitions in the
context of model selection with maximum posterior probability

●   And...
The Third Issue
●   Studying three categories of perceptual transitions in the sketch graph rulebooks
–   Graph grammars for the graph topological changes
–   Sharpening of image primitives without structural changes
–   Catastrophic changes from texture to structures with explosive births of image
primitives
A Reference Legend
Some Properties of Primal Sketches
•   Images are broken into two parts, sketchable and non-sketchable

•   The structural part assumes an occlusion model [equation missing in source]

•   The non-sketchable (texture) area is clustered into about 1 – 5 homogeneous stochastic texture
areas [equation missing in source]

•   And finally, as mentioned, Ik is described by a generative model [equation missing in source]

•   So given Ik, Sk is inferred by maximizing
a posterior probability
●   The key component in Sk is the sketch graph Gk = (Vk, Ek), where Vk is the
set of selected image primitives and Ek is the set of connections between
adjacent primitives whose anchor points are aligned

●   The graph follows an inhomogeneous Gibbs model enforcing a few properties
such as smoothness, continuity, and canonical junctions

●   The authors claim that this sketch representation holds two advantages over
pyramid representations with linear additive models (such as wavelets):
–   The number of sketches needed to reconstruct an image is much smaller, due to
the hyper-sparsity of the dictionary learned from images
–   The sketch graph topology captures properties of human perception in contrast
to a wavelet representation (this will be seen later in the presentation)

●   Consequently, it is more meaningful to use the sketch graph to study the
perceptual transitions due to scale change
Graph Grammars
• Due to intrinsic uncertainty in the posterior probability, the sketch pyramid S will be
inconsistent if each level is computed independently by maximizing the posterior
probability [equation missing in source]

• Because of this, the graphs at each level may not have good correspondence, which
may cause a “flickering” effect when viewing the sketches from coarse to fine
Fixing the Flicker
• The flicker effect is remedied in the sketch representation by enforcing steady and
monotonic graph transitions over the sketch pyramid. This is realized through the use
of “graph grammars”

• The use of graph grammars turns the sketch graph into a generative model, where a
sequence of m(k) production rules, which form the rulebook Rk, is used to generate the
next sketch graph in the image pyramid. The rulebook is the ordered sequence of these
m(k) rules [equation missing in source], and the next level in the sketch pyramid is
generated by applying Rk to Sk [equation missing in source]

• Note that the order of the rules does matter, as they form a path in the space of sketch
graphs from Sk to Sk+1
•   Each rule is applied to a subgraph of Gk. The subgraph is denoted gk,i; it has a
neighborhood [symbol missing in source] and is replaced by a new subgraph
•   Some examples of grammar rules:
- Null operation (no change)
- Birth of a node
- Death of a node
- Birth of a junction
- Death of a junction
- Extend a node
- Shrink a node
- Split a ridge terminator into a pair of step-edges with a set of corners
- Combine a pair of step-edges and a set of corners into a ridge terminator
- Split a ridge into a pair of step-edges
- Merge a pair of step-edges into a ridge
- Split a cross into several L-junctions
- Catastrophic birth of a large number of nodes
- Catastrophic death of a large number of nodes
•   Each rule is associated with a probability depending on its attributes

• Therefore, we have a probability for the transition from Sk to Sk+1

• The rule probabilities used by the authors were obtained by maximum likelihood
estimation. Graph transitions were hand-labeled in 50 images from the Corel database
Types of Graph Transitions
●   Sharpening of image primitives without structural changes. Image primitives from a
blurred dictionary Δk are only replaced by primitives from a dictionary Δk+1. This
could be used for image enhancement and super-resolution

●   Graph grammars for mild changes in graph topology where each expansion in the pyramid
reveals more details. Crucial for formulating a robust super-resolution framework that moves
beyond image sharpening and on to hallucinating generic topological structures
Graph Transitions, continued
●   Catastrophic changes from textures to structures, with explosive births of image primitives
Edge Changes Over Scales

[Figure: scanning a row of an image]
[Figure: the edge differences over scales]
[Figure: the sketch graph. Notice that the graph does not change much once the ridges
are expanded to pairs of step-edges]
Grammar Summary
• Goal is to infer the sketch pyramid together with the optimal path of transitions by
maximizing a Bayesian posterior probability
Sketch Transition as Model Comparison
• We wish to understand which structure should appear at which scale, so we study
the criterion and mechanisms for transitions

• Suppose Sk is the optimal sketch from Ik, computed from levels 0 to k. At level k+1,
Ik+1 has increased resolution due to the addition of the Laplacian band image, which may
introduce new structures. We therefore compare the posterior probability of Sk alone
with that of Sk augmented by the new structures, by taking the ratio of the two

• In log form this ratio splits into a likelihood term and a prior term. The likelihood term
should be positive for a good choice of new structures, because an augmented
generative model will fit the image better. The prior term should be
negative, to penalize complex models. The new structures are therefore accepted only
if the two terms sum to a positive value. Thus a new feature (image primitive) is
introduced at level k+1 if and only if its gain in fit outweighs its complexity penalty
Sketch Pursuit Algorithm
• A greedy method of finding the image primitives which best describe an image
region, in other words, how to construct a sketch graph
– Given the current B, α, and F, β
• Compute ΔB, the log-likelihood increase for a proposed primitive b*

• Compute ΔF, the log-likelihood increase for a proposed filter F*

• If ΔF > ΔB and ΔF > ε then add F* to F and update β

• If ΔB > ΔF and ΔB > ε then add b* to B and update α

– Stop if ΔB < ε and ΔF < ε, otherwise iterate again
– Notation:
• B is the set of image primitives
• α is the set of coefficients for the image primitives
• F is the set of filters used to represent the texture in an image region
• β is the potential of the filter response
• b* is a proposed new image primitive
• F* is a proposed new filter to better represent the texture
• ε is an ending condition threshold
Upwards-Downwards Inference
• Sketch graphs created with the original sketch-pursuit algorithm are not consistent
across scales, so there is flickering. We wish to remove this flickering by forcing
consistency over scales using MCMC reversible jumps to track and edit the sketch
graphs upwards and downwards iteratively across scales

• Each pair of reversible jumps (birth/death of a node, birth/death of a junction, etc.) is
selected probabilistically. These steps simulate a Markov chain with invariant
probability [equation missing in source]
A Comparison
●   (a) is the original image, (b) is the initial sketches computed independently at
each level, (c) is the improved sketches across scales using reversible
jumps, and (d) is the reconstructed image from the sketch graph and the
non-sketchable textures
Applications – Multi-scale Object Tracking
●   Most tracking algorithms assume certain structures exist in an object,
but these structures may only exist in a narrow range of scales. When the
object motion spans a wide range of scales, significant structural changes
occur in the graph representation

●   With the sketch space representation, the structural changes between scales
can be accounted for and corrected:

This tracking result was obtained by manually labeling the car sketches in the first frame;
tracking is then performed by estimating the scale change of the foreground. Tracking assumes
the background is still, i.e. the camera is at a fixed position
Image Display
●   Have a small screen (128 x 128 pixels) but wish to show a high resolution
image (2048 x 2048 pixels)

●   Normal interfaces show a low resolution version, letting the user zoom in on
various regions to see more detail. This is tedious and inconvenient for very
large images

●   Instead, present the user with a “tour” of the image that summarizes its
informational content in as few frames as possible, where each frame is at a
different location and resolution

●   Accomplish this by associating each subregion of the image with a scale
such that any further zooming would not expand the sketch graph, in other
words no perceptual gain could be had by further zooming

●   Adopt a quad-tree representation with the root node being the top level of
the sketch pyramid. A quad-tree node is split when perceptual information can
be gained at a finer scale
Information Gain
• A key to these quad-tree decompositions is the “information gain” obtained when
splitting a node. In the perceptual pyramid, a node v at level k corresponds to a
subgraph Sk(v) of the sketch, and its children at the next level correspond to Sk+1(v+)

• The information gain for this split can be measured by [equation missing in source]

• We can then expand nodes one at a time until either an information gain
threshold or a maximum “depth” is reached
Comparison to Laplacian Quad Trees
●   Note that the Laplacian quad trees aren’t always focused on regions
of interest to the human eye, as opposed to the scale space quad trees
References
●   P.J. Green, “Reversible Jump Markov Chain Monte Carlo Computation and
Bayesian Model Determination,” Biometrika, vol. 82, pp. 711-732, 1995.

●   C.E. Guo, S.C. Zhu, and Y.N. Wu, “Towards a Mathematical Theory of Primal Sketch
and Sketchability,” ICCV, 2003.

Thank You For Listening

```
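The Sketch Pursuit Algorithm slide describes a greedy loop that, at each step, adds whichever of a proposed primitive b* or a proposed filter F* gives the larger log-likelihood increase, and stops once neither increase exceeds ε. The following is a minimal sketch of that control flow only. The function name `greedy_pursuit`, the callables `gain_b` and `gain_f`, and the fixed toy gains are all assumptions made for illustration; the slides do not give the actual likelihood model, and the updates of α and β are omitted here. Ties between ΔB and ΔF are not covered by the slides, so this sketch arbitrarily prefers the primitive.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical signature: gain(candidate, current_B, current_F) -> log-likelihood increase.
Gain = Callable[[str, List[str], List[str]], float]


def greedy_pursuit(
    primitives: Sequence[str],
    filters: Sequence[str],
    gain_b: Gain,
    gain_f: Gain,
    eps: float,
    max_iters: int = 1000,
) -> Tuple[List[str], List[str]]:
    """Greedily grow the primitive set B and the filter set F."""
    B: List[str] = []
    F: List[str] = []
    prims = list(primitives)
    filts = list(filters)
    for _ in range(max_iters):
        # Propose the best remaining primitive b* and filter F*.
        b_star = max(prims, key=lambda b: gain_b(b, B, F), default=None)
        f_star = max(filts, key=lambda f: gain_f(f, B, F), default=None)
        delta_b = gain_b(b_star, B, F) if b_star is not None else float("-inf")
        delta_f = gain_f(f_star, B, F) if f_star is not None else float("-inf")
        # Stop when neither proposal improves the fit by more than eps.
        if max(delta_b, delta_f) <= eps:
            break
        if delta_f > delta_b:
            F.append(f_star)   # the real algorithm would also update beta here
            filts.remove(f_star)
        else:
            B.append(b_star)   # the real algorithm would also update alpha here
            prims.remove(b_star)
    return B, F


# Toy gains (assumed, constant per candidate) just to exercise the loop.
toy_b = {"edge": 5.0, "ridge": 2.0, "corner": 0.5}
toy_f = {"gabor": 3.0, "dog": 0.1}
B, F = greedy_pursuit(
    list(toy_b), list(toy_f),
    lambda b, B_, F_: toy_b[b],
    lambda f, B_, F_: toy_f[f],
    eps=1.0,
)
print(B, F)  # ['edge', 'ridge'] ['gabor']
```

In a real setting the gains would depend on the current B and F, since adding one primitive changes how much the next one can still explain; the constant toy gains here ignore that interaction.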
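The Image Display and Information Gain slides describe splitting a quad-tree node whenever a finer scale would add perceptual information, stopping at a gain threshold or a maximum depth. The sketch below illustrates that expansion under stated assumptions: the slides' information-gain formula is not preserved, so `gain` is a hypothetical user-supplied function, and expanding the highest-gain node first is one possible reading of "sequential order". The names `expand_quadtree` and `toy_gain` are illustrative, not from the paper.

```python
import heapq
from typing import Callable, List, Tuple

# A node is (x, y, size, depth): top-left corner, side length, tree depth.
Node = Tuple[int, int, int, int]


def expand_quadtree(
    root_size: int,
    gain: Callable[[int, int, int, int], float],
    min_gain: float,
    max_depth: int,
) -> List[Node]:
    """Return the leaves of a quad-tree grown by splitting high-gain nodes."""
    leaves: List[Node] = []
    # Max-heap via negated gain, so the most informative node is expanded first.
    heap = [(-gain(0, 0, root_size, 0), 0, 0, root_size, 0)]
    while heap:
        neg_g, x, y, size, depth = heapq.heappop(heap)
        # Keep the node as a leaf if splitting is not worthwhile or not allowed.
        if -neg_g < min_gain or depth >= max_depth or size <= 1:
            leaves.append((x, y, size, depth))
            continue
        half = size // 2
        for dx in (0, half):
            for dy in (0, half):
                cx, cy = x + dx, y + dy
                heapq.heappush(
                    heap, (-gain(cx, cy, half, depth + 1), cx, cy, half, depth + 1)
                )
    return sorted(leaves)


def toy_gain(x: int, y: int, size: int, depth: int) -> float:
    # Assumed stand-in: all fine detail sits in the top-left corner of the image.
    return 1.0 if (x == 0 and y == 0) else 0.0


leaves = expand_quadtree(root_size=8, gain=toy_gain, min_gain=0.5, max_depth=2)
print(len(leaves))  # 7
```

With this toy gain only the top-left region keeps being refined, which mirrors the slides' point that the tree should spend its depth where zooming still reveals new structure.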