Flexible Templates
- considerations of a hypothetical critic
Joachim M. Buhmann, Institute for Computational Science
IMA Visual Learning & Recognition Workshop 21-26 May 2006
Goal of Vision / Scene Understanding
[Figure: scene with labeled regions: gravel / grass, road, bushes, cheetah, sand]
Form equivalence classes, e.g. cheetah vs. road patches
Learn discriminative information in images conditioned on task, segmentation/categorization
23 May 2006
Joachim M. Buhmann / Institute for Computational Science
How Complicated is Object Recognition?
Recognition by Key Features and Spatial Constellation Models Reasoning
[Figures: examples dated 1973, 1988, and 2003]
What are good flexible/adaptive templates?
Object variations or deformations can be captured, e.g., facial expression, object invariant articulation, perspective distortions …
Better performance than rigid template matching!
Dense or sparse?
Disadvantages of flexible templates?
Deformations / local metrics are not conditioned on a task!
Warping differs from categorization or superresolution.
Tradeoff between distinctiveness and redundancy.
We need a new notion of context / task sensitive information.
Learn what helps you to solve your problem!
Shannon’s concept of information is devised for compression and communication.
What if we face another task like scene interpretation?
Learning problems with flexible templates
Adaptive templates can be interpreted as nonparametric models of deformation!
How should we regularize adaptive templates with their large number of degrees of freedom?
How should we address the model selection problem?
http://research.microsoft.com/~hoppe/bspline.pdf
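The regularization question can be made concrete with a toy 1D deformable template. The sketch below is not from the talk: it assumes a squared-error data term, integer per-sample shifts, and a first-order smoothness penalty weighted by a hypothetical parameter `lam`, and it minimizes the resulting energy exactly by dynamic programming.

```python
def match_deformable(template, signal, max_shift=2, lam=1.0):
    """Fit a 1D template to a signal with per-sample integer shifts.

    Minimizes  sum_i (template[i] - signal[i + d_i])**2
             + lam * sum_i (d_i - d_{i-1})**2
    over shifts d_i in [-max_shift, max_shift].
    Returns (energy, shifts). All names here are illustrative.
    """
    shifts = list(range(-max_shift, max_shift + 1))
    inf = float("inf")

    def data_cost(i, d):
        j = i + d
        if j < 0 or j >= len(signal):
            return inf  # shifted sample falls outside the signal
        return (template[i] - signal[j]) ** 2

    # cost[k]: best energy for samples 0..i when sample i uses shifts[k]
    cost = [data_cost(0, d) for d in shifts]
    back = []  # back[i-1][k]: best predecessor state for sample i, state k
    for i in range(1, len(template)):
        new_cost, pointers = [], []
        for d in shifts:
            def step(k, d=d):
                return cost[k] + lam * (d - shifts[k]) ** 2
            best_k = min(range(len(shifts)), key=step)
            new_cost.append(step(best_k) + data_cost(i, d))
            pointers.append(best_k)
        cost = new_cost
        back.append(pointers)

    # Trace the optimal shift sequence back from the last sample.
    k = min(range(len(shifts)), key=lambda idx: cost[idx])
    energy = cost[k]
    path = [shifts[k]]
    for pointers in reversed(back):
        k = pointers[k]
        path.append(shifts[k])
    path.reverse()
    return energy, path
```

With template `[0, 0, 1, 0, 0]` and signal `[0, 0, 0, 1, 0]`, a small penalty (`lam=0.1`) lets the template follow the shifted peak, while a large one (`lam=10.0`) forces the rigid alignment and accepts the mismatch. Choosing `lam` is one instance of the model selection problem raised above.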
Space-Time Tradeoff in Representations
Representing entities by a set of features amounts to a “space”-like representation (invest neurons).
Flexible templates represent entities by relations between features (graphs); matching has to be calculated by a dynamics (“time”-like data format).
Selection of the “best” representation should be controlled by robustness arguments.
Objectives of Composition Systems (S. Geman)
Small vocabulary of generic parts (feature sharing)
Category specific compositions of parts based on relations
Learning using only images with a category label
Coupling of compositions in a global shape model localization
parts composition
Compositionality
Simple, widely reusable parts & relations between them
Compositions
Object Recognition with graphical Models (Björn Ommer, JB ECCV’06)
[Figure: graphical model with the labels salient image regions, localized histograms, code book vectors, compositions, object position, image category; nodes u1, u2]
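The figure labels suggest that localized histograms are quantized against code book vectors before compositions are formed. A minimal sketch of such a quantization step follows, assuming nearest-neighbor assignment under squared Euclidean distance and a normalized count histogram. Both choices and all function names are illustrative; the slide does not specify them.

```python
def nearest_codeword(feature, codebook):
    """Index of the code book vector closest to `feature` (squared Euclidean)."""
    def sqdist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(codebook)), key=lambda k: sqdist(feature, codebook[k]))


def codeword_histogram(features, codebook):
    """Normalized histogram of code word assignments for a set of features."""
    counts = [0] * len(codebook)
    for f in features:
        counts[nearest_codeword(f, codebook)] += 1
    total = sum(counts)
    # An empty feature set yields an all-zero histogram.
    return [c / total for c in counts] if total else counts


# Usage: four 2D features quantized against a two-entry code book.
codebook = [[0.0, 0.0], [1.0, 1.0]]
features = [[0.1, 0.0], [0.9, 1.0], [1.0, 0.8], [1.0, 1.0]]
histogram = codeword_histogram(features, codebook)
```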
Information Flow for Image Interpretation
[Figure: information flow between features, relations, compositions, categories of compositions, the form model, and the image category; example labels: exhaust, wheel]
bottom-up: data driven
top-down: model driven
Strategies of Learning
Parametric approach (statistical learning)
few data -> make your model class simple (PAC learnable, low VC-dim)
return a single learned solution (ERM)
Model averaging (Bayesian learning, MaxEnt)
few data -> keep complex hypothesis class and identify a set of solutions which are compatible with data
return a “fingerprint” of the solution set, e.g. average solution (delayed decision making)
empirical risk approximation
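The contrast between the two strategies can be illustrated on a toy hypothesis class of 1D threshold classifiers. This is a loose sketch, not the formal definition of empirical risk approximation: it assumes 0/1 loss, a finite list of candidate thresholds, and a hypothetical tolerance `gamma` that decides which hypotheses count as compatible with the data.

```python
def empirical_risk(threshold, xs, ys):
    """Fraction of points misclassified by the rule  x >= threshold -> class 1."""
    errors = sum((x >= threshold) != bool(y) for x, y in zip(xs, ys))
    return errors / len(xs)


def erm(thresholds, xs, ys):
    """Single hypothesis with minimal empirical risk (first one on ties)."""
    return min(thresholds, key=lambda t: empirical_risk(t, xs, ys))


def compatible_set(thresholds, xs, ys, gamma):
    """All hypotheses whose empirical risk is within gamma of the minimum."""
    risks = {t: empirical_risk(t, xs, ys) for t in thresholds}
    best = min(risks.values())
    return [t for t in thresholds if risks[t] <= best + gamma]


def averaged_prediction(x, hypotheses):
    """Fraction of hypotheses in the set that vote for class 1 at x."""
    return sum(x >= t for t in hypotheses) / len(hypotheses)


# Usage: four training points leave a wide gap between the classes.
xs, ys = [0.0, 1.0, 4.0, 5.0], [0, 0, 1, 1]
thresholds = [0.5, 1.5, 2.5, 3.5, 4.5]
single = erm(thresholds, xs, ys)
solution_set = compatible_set(thresholds, xs, ys, gamma=0.0)
vote = averaged_prediction(3.0, solution_set)
```

ERM commits to one threshold inside the gap, although the data cannot distinguish it from its neighbors. The averaged prediction keeps that ambiguity visible as a fractional vote for points that fall in the gap.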
Aggregated (Averaged) Segmentations
[Figure: aggregated segmentations, panels labeled "algorithm" and "test persons"]
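The slide does not say how the segmentations were aggregated. One simple option, assumed here purely for illustration, is to average boundary maps: segment labels are arbitrary and differ between runs or test persons, but the boundaries they induce can be compared pixel by pixel.

```python
def boundary_map(labels):
    """1 where a pixel's label differs from its right or lower neighbor, else 0."""
    h, w = len(labels), len(labels[0])
    out = [[0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            right = c + 1 < w and labels[r][c] != labels[r][c + 1]
            down = r + 1 < h and labels[r][c] != labels[r + 1][c]
            out[r][c] = int(right or down)
    return out


def aggregate_segmentations(segmentations):
    """Pixelwise mean of the boundary maps of several label images."""
    maps = [boundary_map(s) for s in segmentations]
    n = len(maps)
    h, w = len(maps[0]), len(maps[0][0])
    return [[sum(m[r][c] for m in maps) / n for c in range(w)] for r in range(h)]


# Usage: the first two label images describe the same partition with
# different label names; the third places the boundary one column earlier.
seg_a = [[0, 0, 1], [0, 0, 1]]
seg_b = [[2, 2, 9], [2, 2, 9]]
seg_c = [[0, 1, 1], [0, 1, 1]]
consensus = aggregate_segmentations([seg_a, seg_b, seg_c])
```

Each entry of `consensus` is the fraction of segmentations that place a boundary at that pixel, so the result does not depend on how individual segmentations name their segments.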
Flexible Templates: A Summary
Adaptivity of flexible templates is required to yield good generalization in vision tasks!
Careful tuning of FTs is needed to avoid overfitting.
Task dependent theory of information is required.
Discriminative signals for one task (expression analysis) are distracter signals for another task (identification).
Ceterum censeo: We have to investigate the relation between statistical and computational complexity!