Introduction to Post-Graduate Studies in Computer Science T-0.7050
Theoretical research approach
Jorma Laaksonen
February 25, 2008
1. Overview
• What is the theoretical research method?
• Knowledge advances in theoretical research
• Where do theories come from, and where are they needed?
• Objects of theoretical research
• Properties of good theories
• Building blocks of theories
• Validation and evaluation
• Scope of a theory
• Fields of theoretical research in computer science
• Case study
• Discussion
2. What is the theoretical research method?
• Something opposed to practical?
  – theories generally have practical applications
  – in engineering, a theory’s value lies in its applications
• Something opposed to empirical?
  – theories should be empirically verifiable or falsifiable
2.1 Possible research models (Adrion, 1993)
• Four different research models can be identified:
  – the scientific method
  – the engineering method
  – the empirical method
  – the analytical method
• The analytical method is the most “purely theoretical” one
• Theoretical research also has an important role in the scientific method
2.2 Possible research models – 2
• The scientific method
  – observe the world
  – propose a model or theory of behavior
  – measure and analyze
  – validate hypotheses of the model or theory
  – if possible, repeat
• The engineering method
  – observe existing solutions
  – propose better solutions
  – build or develop, measure and analyze
  – repeat until no further improvements are possible
2.3 Possible research models – 3
• The empirical method
  – propose a model
  – develop statistical or other methods
  – apply to case studies
  – measure and analyze
  – validate the model
  – repeat
• The analytical method
  – propose a formal theory or set of axioms
  – develop a theory
  – derive results and, if possible, compare them with empirical observations
2.4 Possible research models (Totland, 1997)
• Another set of four different research models:
  – Logical, theoretical: assumptions (theory and logic) → new theory (logically sound conclusions)
  – Quantitative, experimental: hypothesis → experiments → theory (falsified or verified)
  – Qualitative, observational: empirical study → description or theory
  – Participatory action: empirical study → theory → empirical study
3. Knowledge advances in theoretical research
• Feasibility: how a previously unsolved problem can be solved
• Novelty: how a previously solved problem can be solved with a new and promising technique
• Improvement: how a previously solved problem can be solved in a better way than before
4. Where theories come from
• Theories usually begin as intuitions
• Sir Isaac Newton and his famous apple
• The discovery of patterns gives rise to theories
  – causal patterns
  – co-occurrence patterns
  – if A then (very probably) B
  – patterns can be empirical or logical
5. What theories are needed for
• A theory formalizes all prior observations in a compact form
• Theories aim at predicting future behavior
• Theories tell what happens to the output when the input is changed
• Every theory has its own scope or limits of applicability
6. Objects of theoretical research
• Theories and hypotheses or theorems
  – theory of non-linear separability of pattern classes
• Models
  – feed-forward perceptron neural network with output nonlinearity
• Computational methods and algorithms
  – error back-propagation algorithm
• Proofs
  – convergence limits of regularized error back-propagation learning
7. Properties of good theories (Schick & Vaughn, 2002)
• Testability or falsifiability
• Simplicity, “Occam’s Razor”
• Wide scope of applicability
• Fruitfulness in predictions
• Conservatism, “fit” with existing knowledge
8. Building blocks of theories
• Axioms and assumptions
• Hypotheses
• Derivation or proof
• Verification, validation, evaluation
[Figure: diagram relating axioms, logics, mathematics, hypothesis, verification, natural data, simplifying assumptions, and validation/evaluation]
8.1 Theory
• A theory is a mathematical or logical explanation, or a testable model
• One theory can bind together multiple hypotheses
• A theory is a systematic and formalized expression of all previous observations
• Scientific theories have to be predictive, logical and testable
8.2 Axioms
• An axiom or postulate is a proposition that is not proved or demonstrated
• Axioms are considered self-evident and serve as a starting point
• Mathematical axioms
• Logical axioms
8.3 Assumptions
• Assumptions normally simplify the setting with natural data
• One doesn’t try to solve the whole problem, but a substantial part of it
• Assumptions make the problem mathematically approachable
• If one is lucky, the world is close enough to the assumptions
• Assumptions should be made explicit
• Still, many assumptions are implicit or silent
8.4 Typical mathematical simplifications
• Euclidean distance spaces
• Gaussian distributions
• linear input–output dependencies
• neglecting higher-order terms in Taylor series expansions
• continuous functions and their derivatives
• memoryless symbol sources
• independence of observations
• independent and identically distributed (i.i.d.) random variables
• additive noise
• equal or ad hoc prior probabilities
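As a small illustration of two of these simplifications working together, the sketch below assumes the observations are i.i.d. draws from a single 1-D Gaussian. Independence lets the joint likelihood factor into a product of per-sample densities, so the log-likelihood becomes a plain sum. The function names and the toy data are hypothetical, chosen only for this example.

```python
import math

def gaussian_mle(samples):
    """Maximum-likelihood mean and variance of a 1-D Gaussian.

    Assumes the samples are i.i.d.; the ML variance divides by n, not n - 1.
    """
    n = len(samples)
    mean = sum(samples) / n
    var = sum((x - mean) ** 2 for x in samples) / n
    return mean, var

def gaussian_log_likelihood(samples, mean, var):
    """Log-likelihood of the samples under N(mean, var).

    Because of the independence assumption the joint density is a product,
    so its logarithm is a sum of per-sample log-densities.
    """
    return sum(
        -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)
        for x in samples
    )

# Hypothetical toy observations.
data = [1.0, 2.0, 3.0, 4.0]
mu, sigma2 = gaussian_mle(data)
print(mu, sigma2)  # 2.5 1.25
print(gaussian_log_likelihood(data, mu, sigma2))
```

If the observations were in fact correlated, the sum above would no longer be the true log-likelihood, which is one way a silent assumption can distort results.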
8.5 Hypotheses
• A suggested explanation of the studied phenomenon
• A suggested explanation for a correlation between multiple phenomena
• Based on previous observations or on extensions of scientific theories
• Almost always intuitive at first, then refined by rigorous justification
• Hypotheses produce predictions that can be verified
• Failed predictions indicate incorrect hypotheses
• Any hypothesis must be falsifiable!
8.6 Derivation or proof
• Building a hypothesis and verifying it is an iterative process
• If the desired outcome cannot be derived, one has to:
  – think harder
  – search for additional existing scientific theories
  – make more simplifying assumptions
• Correctness of the hypothesis doesn’t ensure the value of the theory
9. Validation
• A theory can be correct, but false assumptions lead to failure
• Validation tests the correctness of the simplifying assumptions
• Controlled experiments
• Publicly available data, benchmarks
• Try to separate processing stages:
  – validate each separately
  – some stages can be taken from the literature, with no need for validation
• Try to analyze sources of suboptimality:
  – overly unrealistic simplifying assumptions
  – weaknesses of the proposed computational method
  – problems in data acquisition
  – the difficult nature of the problem
9.1 Practical validation hints
• Separate development/training and testing data sets
• Avoid over-fitting model parameters to the training data
  – regularization
  – early stopping
  – averaging
• Cross-validation: split the training data and iterate over the splits
• Try to avoid proofs of concept with toy data
• Try to avoid “simulated data” that fits the simplified problem
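The cross-validation hint can be sketched as follows, assuming scalar inputs and a hypothetical 1-nearest-neighbour toy classifier (`nearest_neighbour_predict` is illustrative, not taken from the slides). Folds are contiguous and unshuffled to keep the code short.

```python
def k_fold_splits(n, k):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation.

    Indices 0..n-1 are split into k contiguous folds whose sizes differ by
    at most one; each fold serves exactly once as the held-out test set.
    """
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n))
        yield train_idx, test_idx
        start += size

def nearest_neighbour_predict(train_x, train_y, x):
    """Hypothetical toy classifier: 1-NN on scalar inputs."""
    best = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[best]

def cross_validated_error(xs, ys, k):
    """Average test error rate over the k folds."""
    errors = []
    for train_idx, test_idx in k_fold_splits(len(xs), k):
        tx = [xs[i] for i in train_idx]
        ty = [ys[i] for i in train_idx]
        wrong = sum(
            nearest_neighbour_predict(tx, ty, xs[i]) != ys[i] for i in test_idx
        )
        errors.append(wrong / len(test_idx))
    return sum(errors) / len(errors)

# Hypothetical sorted toy data with two classes.
xs = [0.0, 0.1, 0.2, 0.3, 1.0, 1.1, 1.2, 1.3]
ys = [0, 0, 0, 0, 1, 1, 1, 1]
print(cross_validated_error(xs, ys, k=4))  # 0.0
print(cross_validated_error(xs, ys, k=2))  # 1.0
```

The `k=2` result shows why the data would normally be shuffled (or stratified by class) before splitting: with sorted data, each test fold contains only the class that is missing from its training folds.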
9.2 Evaluation
• Evaluation goes one step further than validation
• Evaluation should take a stand on the feasibility, novelty and improvement viewpoints of knowledge advance
• Performance as an accuracy–speed trade-off
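To make the accuracy–speed trade-off concrete, the sketch below times two hypothetical classifiers on the same toy test set: exhaustive 1-NN, whose cost per query grows with the training set size, and a nearest-class-mean rule, which needs only one distance per class. The data and both classifiers are illustrative assumptions, and the measured times depend on the machine.

```python
import time

def evaluate(predict, test_x, test_y):
    """Return (accuracy, seconds) of a classifier on a labelled test set."""
    start = time.perf_counter()
    predictions = [predict(x) for x in test_x]
    elapsed = time.perf_counter() - start
    correct = sum(p == y for p, y in zip(predictions, test_y))
    return correct / len(test_y), elapsed

# Hypothetical 1-D training data: class 0 below 0.5, class 1 from 0.5 up.
train_x = [i / 100 for i in range(100)]
train_y = [0 if x < 0.5 else 1 for x in train_x]

def one_nn(x):
    """Exhaustive 1-NN: scans all training samples for every query."""
    best = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[best]

# Nearest class mean: one stored mean per class, one distance per class.
class_means = {
    c: sum(xi for xi, yi in zip(train_x, train_y) if yi == c) / train_y.count(c)
    for c in set(train_y)
}

def nearest_mean(x):
    return min(class_means, key=lambda c: abs(class_means[c] - x))

test_x, test_y = [0.1, 0.3, 0.7, 0.9], [0, 0, 1, 1]
for name, clf in [("1-NN", one_nn), ("nearest mean", nearest_mean)]:
    acc, secs = evaluate(clf, test_x, test_y)
    print(f"{name}: accuracy={acc:.2f}, time={secs * 1e6:.1f} microseconds")
```

On this easy toy problem both rules are equally accurate, so the comparison reduces to speed; on harder data the two numbers would have to be weighed against each other.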
10. Scope of a theory
• Every theory is valid only as long as the underlying assumptions hold
• A wider scope brings wider applicability and greater fame
• Aim at the widest possible scope!
• Try to separate the theory and its practical applications
• Try to think where else the same theory could be applied
• State that the current application is just one among many
• Try to formulate theories as widely as possible
• Remember: Newton didn’t write a theorem on apples
11. Fields of theoretical research in computer science
• Theoretical computer science
  – computational problems: decision problems, function problems
  – computer languages, compilers
  – distributed, parallel computation
  – verification
  – cryptography
• Computer graphics
  – 3D rendering and registration problems
  – shading problems
• Machine learning
  – pattern recognition
  – clustering
  – computer vision
  – data analysis, data mining
  – natural language processing
  – neural networks
• Signal and image processing
• Computational biology, neuroscience, physics, . . .
• etc.
12. Case study
• Subspace classifiers in recognition of handwritten digits, 1997
• Aimed at developing new statistical pattern classification theory
• Based on existing theories on:
  – linear projection subspace classifiers (Watanabe, 1967)
  – Learning Subspace Classifier (Kohonen, 1978)
  – Averaged Learning Subspace Classifier (Oja, 1983)
  – k-Nearest Neighbor Classifiers (Fix & Hodges, 1951)
  – Learning Vector Quantization (Kohonen, 1988)
[Figure: stages of a pattern recognition system, including data collection, registration, preprocessing, segmentation, normalization, feature extraction, classification and postprocessing]
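A minimal sketch of the basic linear-projection subspace classifier that the case study builds on (often called CLAFIC, after Watanabe's work) might look as follows. It assumes NumPy, represents each class by the `dim` leading eigenvectors of its autocorrelation matrix, and assigns a sample to the class with the largest squared projection length; with orthonormal bases this is equivalent to choosing the smallest projection residual. It is neither the learning (LSC) nor the averaged learning (ALSC) variant, nor the method developed in the case study, and the function names and toy data are hypothetical.

```python
import numpy as np

def fit_subspaces(X, y, dim):
    """Fit one linear subspace per class.

    Each basis holds the `dim` leading eigenvectors of the class
    autocorrelation matrix R = X_c^T X_c / n_c (no mean removal).
    """
    bases = {}
    for c in np.unique(y):
        Xc = X[y == c]
        R = Xc.T @ Xc / len(Xc)
        _, eigvecs = np.linalg.eigh(R)             # eigenvalues ascending
        bases[int(c)] = eigvecs[:, ::-1][:, :dim]  # columns of the largest `dim`
    return bases

def classify(bases, x):
    """Pick the class whose subspace captures the most energy of x,
    i.e. the largest squared projection length ||U_c^T x||^2."""
    return max(bases, key=lambda c: float(np.sum((bases[c].T @ x) ** 2)))

# Hypothetical 2-D toy data: class 0 lies near the first axis,
# class 1 near the second axis.
X = np.array([[1.0, 0.1], [2.0, -0.1], [-1.5, 0.05],
              [0.1, 1.0], [-0.1, 2.0], [0.05, -1.5]])
y = np.array([0, 0, 0, 1, 1, 1])
bases = fit_subspaces(X, y, dim=1)
print(classify(bases, np.array([3.0, 0.2])))   # 0
print(classify(bases, np.array([0.2, -3.0])))  # 1
```

The sign of each eigenvector is arbitrary, which is harmless here because only squared projection lengths are compared.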
12.1 Case study – 2
• Simplifying assumptions were made:
  – linear projections
  – Euclidean distance spaces
  – Gaussian distributions
[Figure: scatter plots of handwritten digit samples from classes 0, 1, 6 and 7]
12.2 Case study – 3
• More simplifying assumptions were made:
  – more Gaussian distributions
  – independence of random variables (not so true)
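Whether an independence assumption is “not so true” can be probed cheaply by looking at correlations between features. The sketch below computes the sample Pearson correlation coefficient; the feature columns are hypothetical. Independence implies zero (population) correlation, but not conversely: `f3` below is a deterministic function of `f1` and yet uncorrelated with it, so a near-zero coefficient alone does not validate the assumption.

```python
import math

def pearson_correlation(a, b):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

# Hypothetical feature columns.
f1 = [1.0, 2.0, 3.0, 4.0, 5.0]
f2 = [2.1, 3.9, 6.2, 8.1, 9.8]  # roughly 2 * f1: strongly dependent, correlated
f3 = [4.0, 2.0, 0.0, 2.0, 4.0]  # 2 * |f1 - 3|: dependent, but uncorrelated

print(pearson_correlation(f1, f2))  # close to 1
print(pearson_correlation(f1, f3))  # zero up to rounding error
```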
12.3 Case study – 4
• Still more simplifying assumptions were made:
  – Gamma distribution of projection residuals
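If the residuals are non-negative quantities, such as squared residual lengths, a gamma distribution can be fitted to them quite simply. The sketch below uses the method of moments, which is an assumption made here for brevity; maximum likelihood would need an iterative solver. One common motivation, not necessarily the one used in the case study: if the m residual components were i.i.d. zero-mean Gaussians with variance σ², the squared residual length would be σ² times a chi-squared variable with m degrees of freedom, which is a gamma distribution with shape m/2 and scale 2σ². The function name and data are hypothetical.

```python
def fit_gamma_moments(samples):
    """Method-of-moments fit of a gamma distribution, returning (k, theta).

    A gamma variable with shape k and scale theta has mean k*theta and
    variance k*theta**2, which gives k = mean**2 / var and theta = var / mean.
    """
    n = len(samples)
    mean = sum(samples) / n
    var = sum((s - mean) ** 2 for s in samples) / n  # biased (1/n) variance
    return mean ** 2 / var, var / mean

# Hypothetical squared residual lengths.
residuals = [1.0, 2.0, 3.0, 4.0]
k, theta = fit_gamma_moments(residuals)
print(k, theta)  # 5.0 0.5
```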
12.4 Case study – 5
• Final simplifying assumptions:
  – least-squares linear projection as a feature extraction technique
  – local linear interpolation of shapes
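The least-squares criterion behind a linear projection can be written out as below, assuming an orthonormal basis W of size d × p (so that WᵀW = I) and samples x_1, ..., x_n; these symbols are introduced here for illustration and are not taken from the case study.

```latex
\begin{aligned}
\lVert \mathbf{x} - W W^\top \mathbf{x} \rVert^2
  &= \lVert \mathbf{x} \rVert^2 - 2\,\mathbf{x}^\top W W^\top \mathbf{x}
     + \mathbf{x}^\top W (W^\top W) W^\top \mathbf{x}
   = \lVert \mathbf{x} \rVert^2 - \lVert W^\top \mathbf{x} \rVert^2, \\
\sum_{i=1}^{n} \lVert \mathbf{x}_i - W W^\top \mathbf{x}_i \rVert^2
  &= \sum_{i=1}^{n} \lVert \mathbf{x}_i \rVert^2
     - \operatorname{tr}\!\left( W^\top S W \right),
  \qquad S = \sum_{i=1}^{n} \mathbf{x}_i \mathbf{x}_i^\top .
\end{aligned}
```

Since the first term does not depend on W, minimizing the squared reconstruction error amounts to maximizing tr(WᵀSW) under WᵀW = I, and a standard result is that the p leading eigenvectors of S achieve this. For mean-centred samples S is the scatter matrix and the projection is principal component analysis (the Karhunen–Loève transform), which is one standard reading of “least-squares linear projection”.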