POSTED ON: 6/26/2011
Perceptual Scale Space and its Applications
Yizhou Wang, Siavosh Bahrami, Song-Chun Zhu
Department of Computer Science and Statistics, UCLA
Presented by: Shane Brennan
3/15/2006

The Goal
● Find the changes in edge-maps at different scales
● Use this knowledge to generate edge-maps that have no "flicker" between frames
● This is useful for object tracking over multiple scales, where edge features change at different resolutions. A stable edge-map would make tracking across scales much easier, since the structure of objects would remain more constant
● Intuitively, a solution that meets these goals will be based on a pyramid structure that captures the differences between images at successive scales. The resulting pyramid is therefore similar in some ways to a Laplacian pyramid

The Basic Idea of the Primal Sketch Approach
● Split an image into a structural part and a textural part, and represent each part with an appropriate model
● From the image's Gaussian pyramid, build a corresponding "sketch pyramid" of the structural components
● The sketch pyramid has a corresponding grammar, or "rulebook", which defines how image primitives warp to form the next level of the pyramid
● This makes it a generative pyramid: given one level of the sketch pyramid and the rulebook, one can create the next level of the pyramid
● Take the non-sketchable parts (texture) and represent them with a Markov random field (MRF), i.e., texture synthesis
● This is another generative model: an image in the Gaussian pyramid can be created by combining the corresponding sketch image with a "dictionary" of image patches, together with the non-sketchable image

Image Primitives
● An image primitive is composed of a center node with two or more "anchor points" connected to it
● Multiple primitives can be connected into a graph by aligning their center points and anchor points
● [Figure: some examples of image primitives]
● Representing an image with these image primitives and a sketch pyramid is referred to as the "Perceptual Scale Space"

Three Issues for the Perceptual Scale Space
● Inferring the sketch pyramid so that the graphs over scales are optimally matched and have consistent correspondence. The authors adopt a Bayesian framework and use reversible-jump MCMC to compute the optimal representation upward and downward through the pyramid, ensuring consistency
● Studying the criterion and mechanisms for the transitions in the context of model selection with maximum posterior probability
● And...
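To make the primitive-and-anchor structure concrete, here is a minimal Python sketch of a sketch graph. It assumes each primitive is given by pixel coordinates for its center and anchor points, and that two primitives count as connected when one anchor point of each coincides within a small tolerance; the names `Primitive`, `SketchGraph`, and `tol` are hypothetical and not taken from the authors' implementation.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Primitive:
    """Hypothetical image primitive: a center node plus two or more anchor points."""
    name: str
    center: tuple[float, float]
    anchors: list[tuple[float, float]]

    def __post_init__(self) -> None:
        if len(self.anchors) < 2:
            raise ValueError("a primitive needs at least two anchor points")


@dataclass
class SketchGraph:
    """G = (V, E): V holds primitives, E links primitives whose anchor points align."""
    vertices: list[Primitive] = field(default_factory=list)
    edges: set[tuple[int, int]] = field(default_factory=set)
    tol: float = 1.0  # assumed alignment tolerance, in pixels

    def aligned(self, a: Primitive, b: Primitive) -> bool:
        # Adjacent if any anchor of `a` lies within `tol` of any anchor of `b`.
        return any(math.dist(p, q) <= self.tol for p in a.anchors for q in b.anchors)

    def add(self, prim: Primitive) -> int:
        """Insert a primitive and connect it to every existing primitive it aligns with."""
        idx = len(self.vertices)
        for j, other in enumerate(self.vertices):
            if self.aligned(prim, other):
                self.edges.add((j, idx))
        self.vertices.append(prim)
        return idx


if __name__ == "__main__":
    g = SketchGraph()
    g.add(Primitive("edge1", (0, 0), [(-2, 0), (2, 0)]))
    g.add(Primitive("edge2", (4, 0), [(2, 0), (6, 0)]))  # shares anchor (2, 0)
    g.add(Primitive("blob", (20, 20), [(19, 20), (21, 20)]))
    print(g.edges)  # {(0, 1)}
```

Checking every new primitive against every existing one is quadratic; a real system would index anchor points spatially, but the brute-force loop keeps the alignment rule easy to read.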
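The reversible jumps mentioned above come in paired moves (for example, birth and death of a node) that let a Markov chain change the number of primitives while keeping the target posterior invariant. The toy sketch below shows only that mechanism: the state is a set of active candidate nodes, `log_target` stands in for the log posterior, and the Metropolis-Hastings acceptance includes the ratio of reverse to forward proposal probabilities. The candidate set, the 50/50 birth/death split, and all names are assumptions for illustration; real sketch-graph moves also edit geometry and attributes.

```python
import math
import random
from typing import Callable, FrozenSet, List

State = FrozenSet[str]


def birth_death_chain(candidates: List[str],
                      log_target: Callable[[State], float],
                      n_steps: int,
                      seed: int = 0) -> List[State]:
    """Metropolis-Hastings over subsets of `candidates` using paired birth/death jumps."""
    if not candidates:
        raise ValueError("need at least one candidate node")
    rng = random.Random(seed)
    n_total = len(candidates)

    def p_birth(n: int) -> float:
        # Probability of proposing a birth when n nodes are active.
        if n == 0:
            return 1.0
        if n == n_total:
            return 0.0
        return 0.5

    state: State = frozenset()
    samples: List[State] = []
    for _ in range(n_steps):
        n = len(state)
        if rng.random() < p_birth(n):
            # Birth: activate one inactive candidate, chosen uniformly.
            c = rng.choice([x for x in candidates if x not in state])
            new = state | {c}
            forward = p_birth(n) / (n_total - n)
            reverse = (1.0 - p_birth(n + 1)) / (n + 1)
        else:
            # Death: deactivate one active node, chosen uniformly.
            c = rng.choice(sorted(state))
            new = state - {c}
            forward = (1.0 - p_birth(n)) / n
            reverse = p_birth(n - 1) / (n_total - n + 1)
        # Acceptance: target ratio times the reverse/forward proposal ratio.
        log_alpha = log_target(new) - log_target(state) + math.log(reverse / forward)
        if log_alpha >= 0 or rng.random() < math.exp(log_alpha):
            state = new
        samples.append(state)
    return samples
```

With two independent candidates weighted 3 and 1 (so the log target is the sum of the log weights of the active nodes), the long-run fraction of samples containing each candidate should approach 3/4 and 1/2 respectively.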
The Third Issue
● Studying three categories of perceptual transitions in the sketch graph rulebooks:
  – Graph grammars for topological changes in the graph
  – Sharpening of image primitives without structural changes
  – Catastrophic changes from texture to structure, with explosive births of image primitives

A Reference Legend
[Figure]

Some Properties of Primal Sketches
• Images are broken into two parts, sketchable and non-sketchable
• The structural (sketchable) part assumes an occlusion model
• The non-sketchable (texture) area is clustered into about 1 to 5 homogeneous stochastic texture areas
• Finally, as mentioned, I_k is produced by a generative model from its sketch representation S_k
• So, given I_k, S_k is inferred by maximizing the posterior probability:
  $S_k^* = \arg\max_{S_k} p(S_k \mid I_k) = \arg\max_{S_k} p(I_k \mid S_k)\, p(S_k)$

About S_k
● The key component of S_k is the sketch graph G_k = (V_k, E_k), where V_k is the set of selected image primitives and E_k is the set of connections between adjacent primitives whose anchor points are aligned
● The graph follows an inhomogeneous Gibbs model that enforces properties such as smoothness, continuity, and canonical junctions
● The authors claim that this sketch representation has two advantages over pyramid representations built on linear additive models (such as wavelets):
  – Far fewer sketches are needed to reconstruct an image, owing to the hyper-sparsity of the dictionary learned from images
  – The sketch graph topology captures properties of human perception, in contrast to a wavelet representation (this will be seen later in the presentation)
● Consequently, the sketch graph is a more meaningful tool for studying the perceptual transitions caused by scale change

Graph Grammars
• Due to intrinsic uncertainty in the posterior probability, the sketch pyramid S will be inconsistent if each level is computed independently by maximizing its own posterior probability. As a result, the graphs at different levels may not correspond well, which can cause a "flickering" effect when viewing the sketches from coarse to fine

Fixing the Flicker
• The flicker effect is remedied by enforcing steady, monotonic graph transitions over the sketch pyramid. This is realized through "graph grammars"
• Graph grammars turn the sketch graph into a generative model: a sequence of m(k) production rules, which together form the rulebook R_k, is used to generate the next sketch graph in the pyramid. Denoting the i-th rule by γ_{k,i}, the rulebook is
  $R_k = (\gamma_{k,1}, \gamma_{k,2}, \ldots, \gamma_{k,m(k)})$
  and the next level of the sketch pyramid is generated by applying the rules in sequence:
  $S_k \xrightarrow{\gamma_{k,1}} \cdots \xrightarrow{\gamma_{k,m(k)}} S_{k+1}$
• Note that the order of the rules matters, as they form a path in the space of sketch graphs from S_k to S_{k+1}

About Grammar Rules
• Each rule is applied to a subgraph of G_k, denoted g_{k,i}, which has a neighborhood; the rule replaces g_{k,i} with a new subgraph
• Some examples of grammar rules:
  – Null operation (no change)
  – Birth of a node
  – Death of a node
  – Birth of a junction
  – Death of a junction
  – Extend a node
  – Shrink a node
  – Split a ridge terminator into a pair of step-edges with a set of corners
  – Combine a pair of step-edges and a set of corners into a ridge terminator
  – Split a ridge into a pair of step-edges
  – Merge a pair of step-edges into a ridge
  – Split a cross into several L-junctions
  – Catastrophic birth of a large number of nodes
  – Catastrophic death of a large number of nodes

More About Grammar
• Each rule is associated with a probability that depends on its attributes
• Therefore, the transition from S_k to S_{k+1} also has a probability
• The authors obtained the rule probabilities by maximum likelihood estimation, using graph transitions hand-labeled in 50 images from the Corel database

Types of Graph Transitions
● Sharpening of image primitives without structural changes: image primitives are only replaced, from a blurred dictionary Δ_k to a sharper dictionary Δ_{k+1}. This could be used for image enhancement and super-resolution
● Graph grammars for mild changes in graph topology, where each expansion of the pyramid reveals more detail. This is crucial for a robust super-resolution framework that moves beyond image sharpening to hallucinating generic topological structures

Graph Transitions, continued
● Catastrophic changes from texture to structure, with explosive births of image primitives

Edge Changes Over Scales
[Figure panels: scanning a row of an image; the edge differences over scales; the sketch graph]
Notice that the sketch graph does not change much once the ridges are expanded into pairs of step-edges

Grammar Summary
• The goal is to infer the sketch pyramid, together with the optimal path of transitions, by maximizing a Bayesian posterior probability

Sketch Transition as Model Comparison
• We wish to understand which structures should appear at which scale, so we study the criterion and mechanisms for transitions
• Suppose S_k is the optimal sketch for I_k, computed from levels 0 to k. At level k+1, I_{k+1} has increased resolution due to the addition of a Laplacian band image. Let ΔS denote the new structures introduced. We then compare the posterior probabilities of S_k and (S_k, ΔS) through their log ratio:
  $\log \frac{p(S_k, \Delta S \mid I_{k+1})}{p(S_k \mid I_{k+1})} = \log \frac{p(I_{k+1} \mid S_k, \Delta S)}{p(I_{k+1} \mid S_k)} + \log \frac{p(S_k, \Delta S)}{p(S_k)}$
• The first (likelihood) term should be positive for a good choice of ΔS, because an augmented generative model fits the image better. The second (prior) term should be negative, to penalize complex models. Therefore, ΔS is accepted if the log ratio is positive, i.e., if the posterior ratio exceeds 1
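The acceptance test amounts to a few lines of arithmetic. In this sketch, the scores are log-probabilities supplied by the caller, and `complexity_prior`, which charges a fixed log-probability per primitive, is a hypothetical stand-in for the paper's prior; only the structure (likelihood gain plus prior change, accepted when positive) follows the slide.

```python
from dataclasses import dataclass


@dataclass
class SketchScore:
    """Log-likelihood and log-prior of one candidate sketch at level k+1."""
    log_likelihood: float  # log p(I_{k+1} | sketch)
    log_prior: float       # log p(sketch)


def log_posterior_ratio(base: SketchScore, augmented: SketchScore) -> float:
    """log p(S_k, dS | I_{k+1}) - log p(S_k | I_{k+1}); the evidence p(I_{k+1}) cancels."""
    likelihood_gain = augmented.log_likelihood - base.log_likelihood  # usually > 0
    prior_change = augmented.log_prior - base.log_prior                # usually < 0
    return likelihood_gain + prior_change


def accept_new_structures(base: SketchScore, augmented: SketchScore) -> bool:
    """Accept the new structures dS only if they raise the posterior probability."""
    return log_posterior_ratio(base, augmented) > 0.0


def complexity_prior(num_primitives: int, penalty: float = 5.0) -> float:
    """Hypothetical prior: every primitive costs `penalty` nats of log-probability."""
    return -penalty * num_primitives


if __name__ == "__main__":
    base = SketchScore(-120.0, complexity_prior(10))
    print(accept_new_structures(base, SketchScore(-104.0, complexity_prior(12))))  # True
    print(accept_new_structures(base, SketchScore(-116.0, complexity_prior(12))))  # False
```

In the example, adding two primitives to a 10-primitive sketch raises the log-likelihood from -120 to -104; the gain of 16 outweighs the prior penalty of 10, so the new structures are accepted. A gain of only 4 (log-likelihood -116) is not enough.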
Thus a new feature (image primitive) is introduced at level k+1 if and only if the likelihood gain from the new structures ΔS outweighs the prior penalty for adding them:
  $\log p(I_{k+1} \mid S_k, \Delta S) - \log p(I_{k+1} \mid S_k) > \log p(S_k) - \log p(S_k, \Delta S)$

Sketch Pursuit Algorithm
• A greedy method for finding the image primitives that best describe an image region; in other words, a way to construct a sketch graph
  – Given the current B, α and F, β:
    • Compute the log-likelihood increase ΔB for a proposed primitive b*
    • Compute the log-likelihood increase ΔF for a proposed filter F*
    • If ΔF > ΔB and ΔF > ε, add F* to F and update β
    • If ΔB > ΔF and ΔB > ε, add b* to B and update α
  – Stop if ΔB < ε and ΔF < ε; otherwise, iterate again
Where:
  B is the set of image primitives
  α is the set of coefficients for the image primitives
  F is the set of filters used to represent the texture in an image region
  β is the set of potentials for the filter responses
  b* is a proposed new image primitive
  F* is a proposed new filter that better represents the texture
  ε is the stopping threshold

Upwards-Downwards Inference
• Sketch graphs created with the original sketch pursuit algorithm are not consistent across scales, so they flicker. We remove this flicker by enforcing consistency over scales, using reversible-jump MCMC to track and edit the sketch graphs upward and downward across scales iteratively
• Each pair of reversible jumps (birth/death of a node, birth/death of a junction, etc.) is selected probabilistically. These steps simulate a Markov chain whose invariant probability is the Bayesian posterior of the sketch pyramid

A Comparison
● [Figure] (a) the original image; (b) the initial sketches, computed independently at each level; (c) the improved sketches across scales, using reversible jumps; (d) the image reconstructed from the sketch graph and the non-sketchable textures

Applications – Multi-scale Object Tracking
● Most tracking algorithms assume that certain structures exist in an object, but these structures may only exist in a narrow range of scales. When object motion spans a wide range of scales, significant structural changes occur in the graph representation
● With the sketch space representation, the structural changes between scales can be accounted for and corrected:
[Figure: tracking result] This tracking result was obtained by manually labeling the car sketches in the first frame; tracking was then performed by estimating the scale change of the foreground. Tracking assumes the background is still, i.e., the camera is at a fixed position

Applications – Adaptive Image Display
● We have a small screen (128 x 128 pixels) but wish to show a high-resolution image (2048 x 2048 pixels)
● Normal interfaces show a low-resolution version and let the user zoom in on various regions to see more detail. This is tedious and inconvenient for very large images
● Instead, present the user with a "tour" of the image that summarizes its informational content in as few frames as possible, where each frame is at a different location and resolution
● Accomplish this by associating each subregion of the image with a scale such that further zooming would not expand the sketch graph; in other words, no perceptual gain could be had by zooming further
● Adopt a quad-tree representation whose root node is the top level of the sketch pyramid. A quad-tree node is split when perceptual information can be gained at a finer scale

Information Gain
• The key to these quad-tree decompositions is the "information gain" obtained when splitting a node. In the perceptual pyramid, a node v at level k corresponds to a subgraph S_k(v) of the sketch, and its children at the next level correspond to S_{k+1}(v+)
• The information gain of this split is measured between S_k(v) and S_{k+1}(v+)
• Nodes can then be expanded in sequential order until either an information gain threshold or a maximum "depth" is reached

Adaptive Image Display, continued
[Figure]

Comparison to Laplacian Quad-Trees
[Figure panels: scale space quad-trees; Laplacian quad-trees]
– Note that the Laplacian quad-trees are not always focused on regions of interest to the human eye, unlike the scale space quad-trees
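The sequential quad-tree expansion from the Information Gain slide can be sketched as a best-first search: always split the frontier node with the largest information gain, and stop splitting a node once its gain drops below the threshold or it reaches the maximum depth. The `info_gain` callable is an assumption, since the formula is not given here; the caller must supply a score for replacing S_k(v) by its children's subgraphs. `Node` and `expand_quadtree` are hypothetical names.

```python
import heapq
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Node:
    """A quad-tree node: a square region (x, y, size) at pyramid level `level`."""
    x: int
    y: int
    size: int
    level: int

    def children(self) -> List["Node"]:
        half = self.size // 2
        return [Node(self.x + dx, self.y + dy, half, self.level + 1)
                for dy in (0, half) for dx in (0, half)]


def expand_quadtree(root: Node,
                    info_gain: Callable[[Node], float],
                    threshold: float,
                    max_depth: int) -> Tuple[List[Node], List[Node]]:
    """Best-first quad-tree expansion driven by a caller-supplied information gain.

    Returns (splits, leaves): the nodes that were split, in the order they were
    split, and the leaves of the final decomposition.
    """
    splits: List[Node] = []
    leaves: List[Node] = []
    counter = 0  # tie-breaker so the heap never has to compare Node objects
    heap = [(-info_gain(root), counter, root)]
    while heap:
        neg_gain, _, node = heapq.heappop(heap)
        if -neg_gain < threshold or node.level >= max_depth or node.size < 2:
            leaves.append(node)  # no worthwhile perceptual gain at a finer scale
            continue
        splits.append(node)
        for child in node.children():
            counter += 1
            heapq.heappush(heap, (-info_gain(child), counter, child))
    return splits, leaves


if __name__ == "__main__":
    def toy_gain(v: Node) -> float:
        # Hypothetical gain: detail only around pixel (10, 10), shrinking with depth.
        inside = v.x <= 10 < v.x + v.size and v.y <= 10 < v.y + v.size
        return 10.0 / (v.level + 1) if inside else 0.0

    splits, leaves = expand_quadtree(Node(0, 0, 64, 0), toy_gain, threshold=1.0, max_depth=3)
    print(len(splits), len(leaves))  # 3 10
```

The split order is a natural candidate for the sequence of frames in the display "tour", since the most informative regions are refined first; the leaves always tile the root region exactly.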
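The greedy loop from the Sketch Pursuit Algorithm slide can be sketched as follows. The gain functions are hypothetical caller-supplied callables that return the log-likelihood increase of adding a candidate given the current selection, so refitting α and β is assumed to happen inside them. One deliberate deviation: when ΔB equals ΔF and both exceed ε the slide's rules add nothing, so this sketch adds the primitive in that case to guarantee progress, and it stops once neither gain exceeds ε.

```python
from typing import Any, Callable, List, Sequence, Tuple

# Hypothetical: gain(candidate, current_selection) -> log-likelihood increase.
Gain = Callable[[Any, List[Any]], float]


def _best(pool: List[Any], chosen: List[Any], gain: Gain) -> Tuple[Any, float]:
    """Return the pool candidate with the largest gain, or (None, -inf) if the pool is empty."""
    scored = [(c, gain(c, chosen)) for c in pool]
    return max(scored, key=lambda t: t[1], default=(None, float("-inf")))


def sketch_pursuit(primitives: Sequence[Any], filters: Sequence[Any],
                   gain_b: Gain, gain_f: Gain, eps: float,
                   max_iters: int = 1000) -> Tuple[List[Any], List[Any]]:
    """Greedily add image primitives (B) or texture filters (F) while the gain exceeds eps."""
    B: List[Any] = []  # selected image primitives
    F: List[Any] = []  # selected texture filters
    pool_b, pool_f = list(primitives), list(filters)
    for _ in range(max_iters):
        b_star, d_b = _best(pool_b, B, gain_b)  # proposal b* and its increase dB
        f_star, d_f = _best(pool_f, F, gain_f)  # proposal F* and its increase dF
        if max(d_b, d_f) <= eps:
            break  # neither proposal raises the log-likelihood by more than eps
        if d_f > d_b:
            F.append(f_star)  # beta refit assumed to happen inside gain_f
            pool_f.remove(f_star)
        else:
            B.append(b_star)  # alpha refit assumed to happen inside gain_b
            pool_b.remove(b_star)
    return B, F


if __name__ == "__main__":
    base_b = {"ridge": 8.0, "edge": 6.0, "corner": 1.0}
    base_f = {"gabor": 7.0, "dog": 2.0}
    print(sketch_pursuit(list(base_b), list(base_f),
                         lambda b, chosen: base_b[b] / (1 + len(chosen)),
                         lambda f, chosen: base_f[f] / (1 + len(chosen)),
                         eps=2.5))  # (['ridge', 'edge'], ['gabor'])
```

In the toy usage, each candidate's gain shrinks as more items of its kind are selected (a hypothetical diminishing-returns model), so the loop alternates between primitives and filters until both best gains fall to ε or below.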