# Pattern Classification 1: Introduction

Arne Leijon, KTH Sound and Image Processing

Course page: www.ee.kth.se/sip/courses/F2E5414/

April 2, 2007

## A Pattern-Classification System

[Block diagram: Input signal → Transducer → Feature Extractor → Classifier → Output decision]

## Human Pattern Classification – the Demon Model

[Figure-only slide]

## Source Categories

[Block diagram: a Signal Source contains Sub-sources #1 … #Ns, one of which is selected by a Random State Switch S; noise enters at the sub-sources, at the switch, and at the Transducer that produces the observed signal]

## Optimal Classification Mechanism

[Block diagram of the Classifier: x → Feature Extraction → Discriminant Functions g1(x), …, gNd(x) → select the index of the largest discriminant as the output decision]

## First Classifier Design – Minimum Error Probability

- Source Categories: C1, …, CNs, with known P(Ci) for all i
- Observation Space: x, with known density p(x|Ci) for any x and all i
- Objective: given an observed x, guess the Source Category with minimum error probability.

## Observation Space: Distributions and Decision Regions

3 Source Categories; 2-dim Feature Vector; 3 Possible Decisions

[Scatter plot: the feature plane (x1, x2) partitioned into three decision regions, labelled d=1, d=2, d=3]

## Conditional Probability and Bayes Rule

Joint Probability ("Product Rule"):

    P(A ∩ B) = P(B|A) P(A) = P(A|B) P(B)

Conditional probability (Bayes rule):

    P(A|B) = P(B|A) P(A) / P(B)

"Marginal" probability ("Sum Rule"):

    P(B) = Σ_A P(A ∩ B) = Σ_A P(B|A) P(A)
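The three rules can be checked numerically; the probability values below are invented for illustration:

```python
# Numerical check of the product, sum, and Bayes rules
# for a toy experiment with a binary source event A.
# All probability values are made-up illustrations.

P_A = {0: 0.7, 1: 0.3}              # prior P(A)
P_B_given_A = {0: 0.2, 1: 0.9}      # P(B | A), for the event B

# Sum rule: P(B) = sum_A P(B|A) P(A)
P_B = sum(P_B_given_A[a] * P_A[a] for a in P_A)

# Bayes rule: P(A|B) = P(B|A) P(A) / P(B)
P_A_given_B = {a: P_B_given_A[a] * P_A[a] / P_B for a in P_A}

print(P_B)                          # marginal probability of B
print(P_A_given_B)                  # posteriors, which sum to 1
```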

## First Classifier Design: Solution

Known: one observed point x; p(x|Ci) and P(Ci), for all i.

Calculate the a posteriori Source Probabilities, for all i:

    P(Ci|x) = p(x|Ci) P(Ci) / p(x) = p(x|Ci) P(Ci) / Σ_j p(x|Cj) P(Cj)

Optimal Classifier: choose the Category with the greatest posterior probability.

Equivalent Decision Rules, for minimum error probability:

    d(x) = argmax_i p(x|Ci) P(Ci)
         = argmax_i ( log p(x|Ci) + log P(Ci) )
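The minimum-error decision rule can be sketched directly in Python; the 1-dim Gaussian class densities and priors below are invented for illustration:

```python
import math

# Hypothetical 1-dim Gaussian class-conditional densities p(x|Ci)
# and prior probabilities P(Ci); all parameter values are invented.
classes = [
    {"prior": 0.5, "mu": -1.0, "sigma": 1.0},   # C1
    {"prior": 0.3, "mu": 2.0, "sigma": 0.5},    # C2
    {"prior": 0.2, "mu": 0.0, "sigma": 2.0},    # C3
]

def log_gauss(x, mu, sigma):
    """log N(x; mu, sigma^2)"""
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)

def decide(x):
    """d(x) = argmax_i (log p(x|Ci) + log P(Ci)); returns a 1-based index."""
    scores = [log_gauss(x, c["mu"], c["sigma"]) + math.log(c["prior"])
              for c in classes]
    return 1 + max(range(len(scores)), key=scores.__getitem__)

print(decide(2.1))   # → 2 (x lies near C2's mean)
```

Working in the log domain, as on the slide, avoids numerical underflow when densities are small.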

## Second Classifier Design – General Objective: Minimum Expected Loss

- Source Categories: C1, …, CNs, with known P(Ck) for all k
- Possible Decisions: d(x) ∈ {1, …, Nd}; Nd = Ns
- Loss: Lkj = cost, if Source = k and Decision = j
- Optimal Classifier: choose d for Minimum Expected Loss:

      d(x) = argmin_j Σ_k Lkj P(Ck|x)
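A minimal sketch of the minimum-expected-loss rule; the loss matrix Lkj and the posterior vector are invented for illustration:

```python
# Minimum-expected-loss decision: d(x) = argmin_j sum_k L[k][j] * P(Ck|x).
# The loss matrix and the posteriors are invented for illustration.
L = [
    [0.0, 1.0, 2.0],   # losses when the true source is C1
    [1.0, 0.0, 1.0],   # ... C2
    [5.0, 1.0, 0.0],   # ... C3 (very costly to confuse with decision 1)
]
posterior = [0.5, 0.3, 0.2]   # P(Ck|x) for some observed x

def decide_min_loss(posterior, L):
    """Return the 1-based decision index with smallest expected loss."""
    n_decisions = len(L[0])
    expected = [sum(L[k][j] * posterior[k] for k in range(len(posterior)))
                for j in range(n_decisions)]
    return 1 + min(range(n_decisions), key=expected.__getitem__)

print(decide_min_loss(posterior, L))   # → 2
```

Note that the asymmetric losses pick decision 2 even though C1 has the largest posterior; with a 0-1 loss the rule reduces to the minimum-error rule of the previous slide.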

## Real Classifier Design – Supervised Training

- We do NOT know p(x|Ck), maybe not even P(Ck).
- We have an observed Training Data Set:

      x = (x1, …, xN): observations
      t = (t1, …, tN): "target" categories

- How to use the training data?
  - Parametric Methods: create a model from the data, then the classifier.
  - Non-parametric Methods: use the data "directly" in the classifier.

## Parametric Model Training

Assume a Class of density functions, p(x|Ck, θ). Within the Class, a particular density is specified by the parameter vector θ.

Example, 1-dim Gaussian observations:

    p(x|Ck, µk, σk) = (1 / (σk √(2π))) · exp( −(x − µk)² / (2σk²) )

Use all training data x = (x1, …, xN) to estimate θ. Still two ways to do it…

## Parametric Model Training – Maximum Likelihood (ML)

- Assume θ has some fixed, but unknown, value.
- Calculate the probability of all training data x = (x1, …, xN):

      p(x|θ) = ∏_{n=1}^{N} p(xn|θ), for any θ

- Find the best-fitting single parameter estimate:

      θ̂ML = argmax_θ p(x|θ)

- Design the Classifier using p(x|Ck, θ̂ML).
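For the 1-dim Gaussian model above, θ̂ML has a well-known closed form: the sample mean and the (1/N) sample standard deviation. A minimal sketch, with invented data:

```python
import math

def ml_gaussian(data):
    """ML estimates (mu, sigma) for 1-dim Gaussian data.
    For the Gaussian, argmax_theta p(x|theta) has a closed form:
    the sample mean and the biased (1/N) sample standard deviation."""
    n = len(data)
    mu = sum(data) / n
    var = sum((x - mu) ** 2 for x in data) / n   # note 1/N, not 1/(N-1)
    return mu, math.sqrt(var)

mu_hat, sigma_hat = ml_gaussian([1.0, 2.0, 3.0, 4.0])
print(mu_hat, sigma_hat)   # → 2.5 and √1.25 ≈ 1.118
```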

## Parametric Model Training – Bayesian Learning

- Assume θ is a random vector, with Prior density p(θ|α0), with hyperparameters α0.
- Posterior density function, given the training data X = (x1, …, xN):

      p(θ|X) ∝ p(X|θ) p(θ|α0)

- Density of a future observation x, and any θ, given the training data:

      p(x, θ|Ck, X) = p(x|Ck, θ) p(θ|X)

- Design the Classifier using the marginal density across all θ:

      p(x|Ck, X) = ∫ p(x, θ|Ck, X) dθ
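The marginal integral need not be evaluated exactly; for a 1-dim θ it can be approximated on a grid. A sketch for a Gaussian likelihood with unknown mean θ and known σ, with a Gaussian prior (all numeric values invented):

```python
import math

def gauss(x, mu, sigma):
    """Gaussian density N(x; mu, sigma^2)."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def predictive_density(x, data, sigma=1.0, mu0=0.0, sigma0=10.0, grid_n=2001):
    """Grid approximation of p(x|X) = ∫ p(x|theta) p(theta|X) dtheta,
    for Gaussian observations with unknown mean theta and known sigma,
    and a Gaussian prior with hyperparameters (mu0, sigma0)."""
    lo, hi = -20.0, 20.0
    step = (hi - lo) / (grid_n - 1)
    thetas = [lo + i * step for i in range(grid_n)]
    # unnormalized posterior p(theta|X) ∝ p(X|theta) p(theta|alpha0)
    post = [math.prod(gauss(xn, th, sigma) for xn in data) * gauss(th, mu0, sigma0)
            for th in thetas]
    norm = sum(post) * step
    post = [p / norm for p in post]
    # marginalize theta out for the future observation x
    return sum(gauss(x, th, sigma) * w for th, w in zip(thetas, post)) * step

p = predictive_density(1.0, [0.8, 1.2, 1.0])
print(p)
```

The predictive density is wider than any single p(x|θ̂), because the remaining uncertainty about θ is averaged in rather than ignored.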

## Bayesian Learning – Example (problem 2.38)

Assume any single observation x is Gaussian, with known σ:

    p(x|µ) ∝ exp( −(x − µ)² / (2σ²) )

Assume the Prior density for µ is Gaussian, with hyperparameters µ0, σ0:

    p(µ|µ0, σ0) ∝ exp( −(µ − µ0)² / (2σ0²) )

Posterior density for µ, with training data X = (x1, …, xN):

    p(µ|X) ∝ exp(−(x1 − µ)²/(2σ²)) ··· exp(−(xN − µ)²/(2σ²)) · exp(−(µ − µ0)²/(2σ0²))
           ∝ exp( −(µ − µN)² / (2σN²) ), still a Gaussian density
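The slide leaves µN and σN implicit; they follow from the standard conjugate-Gaussian update. A sketch, with invented data values:

```python
import math

def gaussian_mean_posterior(data, sigma, mu0, sigma0):
    """Posterior p(µ|X) = N(µ; µN, σN²) for Gaussian data with known σ
    and Gaussian prior N(µ0, σ0²). Standard conjugate update:
      1/σN² = N/σ² + 1/σ0²,   µN = σN² (Σ xn/σ² + µ0/σ0²)."""
    n = len(data)
    prec_n = n / sigma ** 2 + 1 / sigma0 ** 2
    var_n = 1 / prec_n
    mu_n = var_n * (sum(data) / sigma ** 2 + mu0 / sigma0 ** 2)
    return mu_n, math.sqrt(var_n)

mu_n, sigma_n = gaussian_mean_posterior([0.8, 1.2, 1.0], sigma=1.0, mu0=0.0, sigma0=10.0)
print(mu_n, sigma_n)   # µN is pulled toward the data mean; σN shrinks as N grows
```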

## Bayesian Learning – Example (problem L2.1)

Binary single observation x = 0 or 1, with probability mass function:

    p(x|µ) = µ^x (1 − µ)^(1−x)

Assume the Prior density for µ is Beta, with hyperparameters a0, b0:

    p(µ|a0, b0) ∝ µ^(a0−1) (1 − µ)^(b0−1)

Posterior density for µ, with training data X = (x1, …, xN):

    p(µ|X) ∝ µ^(x1) (1 − µ)^(1−x1) ··· µ^(xN) (1 − µ)^(1−xN) · µ^(a0−1) (1 − µ)^(b0−1)
           ∝ µ^(aN−1) (1 − µ)^(bN−1), still a Beta density,

with aN = a0 + Σn xn and bN = b0 + N − Σn xn.
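The posterior hyperparameters follow by counting ones and zeros; a minimal sketch with invented data:

```python
def beta_bernoulli_update(data, a0, b0):
    """Posterior Beta(aN, bN) after binary observations:
    aN = a0 + (number of ones), bN = b0 + (number of zeros)."""
    ones = sum(data)
    return a0 + ones, b0 + len(data) - ones

aN, bN = beta_bernoulli_update([1, 0, 1, 1, 0], a0=1, b0=1)
print(aN, bN)           # → 4 3
print(aN / (aN + bN))   # posterior mean E[µ|X] = 4/7 ≈ 0.571
```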

## Why Bayesian Learning?

- Using the single ML estimate θ̂ML is sub-optimal.
- Bayesian learning avoids over-fitting.
- The trained model "knows" how well it was trained.
- It allows comparing/validating model structures, and model sizes.

## Classifier Structure – Generative or Discriminative?

[Scatter plot: the same (x1, x2) feature plane with decision regions d=1, d=2, d=3 as before]

## Classifier Structure – Generative

- Assume a Class of generative density functions p(x|Ck, θ), defined by parameters θ.
- Train a generative model instance for each source category.
- Design discriminant functions gj(x) from the trained models.
- Use the decision function d(x) = argmax_j gj(x).
- Example: GMM or HMM for speaker recognition or speech recognition.

## Classifier Structure – Discriminative

- Assume a Class of discriminant functions gj(x|θ), defined by parameters θ.
- Train the discriminant functions so that gj ≈ 1 for j = the desired decision, and otherwise gj ≈ 0.
- Use the decision function d(x) = argmax_j gj(x).
- Example: Neural Network, Support Vector Machine, Relevance Vector Machine.
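As a toy illustration of the discriminative route (not from the slides): a single logistic discriminant g(x|w, b), trained by gradient descent on invented 1-dim data so that g ≈ 1 on one class and g ≈ 0 on the other:

```python
import math

# Toy discriminative training: one logistic discriminant g(x|w, b)
# pushed toward 1 for class-1 samples and toward 0 for class-0 samples.
# The 1-dim data and the learning rate are invented for illustration.
data = [(-2.0, 0), (-1.5, 0), (-1.0, 0), (1.0, 1), (1.5, 1), (2.0, 1)]

def g(x, w, b):
    """Logistic discriminant, g ∈ (0, 1)."""
    return 1.0 / (1.0 + math.exp(-(w * x + b)))

w, b = 0.0, 0.0
lr = 0.5
for _ in range(2000):   # batch gradient descent on the cross-entropy loss
    dw = db = 0.0
    for x, t in data:
        err = g(x, w, b) - t
        dw += err * x
        db += err
    w -= lr * dw / len(data)
    b -= lr * db / len(data)

decide = lambda x: 1 if g(x, w, b) >= 0.5 else 0
print(decide(-1.2), decide(1.2))   # → 0 1
```

No density p(x|Ck) is modeled here; only the decision boundary is learned, which is the essential contrast with the generative approach on the previous slide.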

## Course Format – All info on the web page

- Part 1: Theory, April–June, 8 ECTS credits
  - Solve 80% of the problems, incl. the mandatory ones.
  - No exam.
- Part 2: Project, May–Sept(?), +4 ECTS credits
  - Propose an individual classification project, or accept a given proposal.
  - Implement and present the solution.
  - MatLab Tools (Gauss, GMM, HMM, …) on the Project page.
  - Example 1: implement a discriminative method (ANN, SVM, …) to compare with the given generative GMM approach.
  - Example 2: implement Bayesian versions of the MatLab tools.
