# Bayesian Decision Theory (Classification)

Lecturer: 虞台文
## Contents

- Introduction
- Generalized Bayesian Decision Rule
- Discriminant Functions
- The Normal Distribution
- Discriminant Functions for the Normal Populations
- Minimax Criterion
- Neyman-Pearson Criterion
## Introduction

What is Bayesian decision theory? It is a mathematical foundation for decision making: a probabilistic approach to making decisions (e.g., classification) so as to minimize the risk (cost).
### Preliminaries and Notations

- $\omega_i \in \{\omega_1, \omega_2, \ldots, \omega_c\}$: a state of nature
- $P(\omega_i)$: prior probability
- $\mathbf{x}$: feature vector
- $p(\mathbf{x} \mid \omega_i)$: class-conditional density
- $P(\omega_i \mid \mathbf{x})$: posterior probability

### Bayes Rule

$$P(\omega_i \mid \mathbf{x}) = \frac{p(\mathbf{x} \mid \omega_i)\, P(\omega_i)}{p(\mathbf{x})}, \qquad p(\mathbf{x}) = \sum_{j=1}^{c} p(\mathbf{x} \mid \omega_j)\, P(\omega_j)$$
### Decision

$$D(\mathbf{x}) = \arg\max_i P(\omega_i \mid \mathbf{x})$$

The evidence $p(\mathbf{x})$ is the same for every class, so it is unimportant in making the decision:

- Decide $\omega_i$ if $P(\omega_i \mid \mathbf{x}) > P(\omega_j \mid \mathbf{x})$ for all $j \neq i$.
- Equivalently, decide $\omega_i$ if $p(\mathbf{x} \mid \omega_i)\, P(\omega_i) > p(\mathbf{x} \mid \omega_j)\, P(\omega_j)$ for all $j \neq i$.

Special cases:

1. Equal priors, $P(\omega_1) = P(\omega_2) = \cdots = P(\omega_c)$: decide by the largest likelihood $p(\mathbf{x} \mid \omega_i)$.
2. Equal likelihoods, $p(\mathbf{x} \mid \omega_1) = p(\mathbf{x} \mid \omega_2) = \cdots = p(\mathbf{x} \mid \omega_c)$: decide by the largest prior $P(\omega_i)$.
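The decision rule above can be sketched in a few lines. A minimal numerical example, assuming three hypothetical classes with made-up priors and class-conditional likelihoods evaluated at one observed $\mathbf{x}$:

```python
import numpy as np

# Hypothetical priors P(w_i) and class-conditional likelihoods p(x | w_i)
# evaluated at a single observed feature vector x (illustration values).
priors = np.array([0.5, 0.3, 0.2])
likelihoods = np.array([0.10, 0.40, 0.35])

# p(x) is common to every class, so argmax over p(x|w_i)P(w_i) gives the
# same decision as argmax over the posteriors P(w_i|x).
joint = likelihoods * priors          # p(x | w_i) P(w_i)
posteriors = joint / joint.sum()      # P(w_i | x) via the Bayes rule
decision = int(np.argmax(posteriors))

print(decision)                       # -> 1 (largest posterior)
print(int(np.argmax(joint)) == decision)   # p(x) is irrelevant: True
```

Note how normalizing by $p(\mathbf{x})$ changes the posterior values but never the winner.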

### Two Categories

- Decide $\omega_1$ if $P(\omega_1 \mid \mathbf{x}) > P(\omega_2 \mid \mathbf{x})$; otherwise decide $\omega_2$.
- Equivalently, decide $\omega_1$ if $p(\mathbf{x} \mid \omega_1)\, P(\omega_1) > p(\mathbf{x} \mid \omega_2)\, P(\omega_2)$; otherwise decide $\omega_2$.

Special cases:

1. $P(\omega_1) = P(\omega_2)$: decide $\omega_1$ if $p(\mathbf{x} \mid \omega_1) > p(\mathbf{x} \mid \omega_2)$; otherwise decide $\omega_2$.
2. $p(\mathbf{x} \mid \omega_1) = p(\mathbf{x} \mid \omega_2)$: decide $\omega_1$ if $P(\omega_1) > P(\omega_2)$; otherwise decide $\omega_2$.

### Example

[Figure: decision regions $\mathcal{R}_1$ and $\mathcal{R}_2$ for equal priors $P(\omega_1) = P(\omega_2)$, and for unequal priors $P(\omega_1) = 2/3$, $P(\omega_2) = 1/3$.]
### Classification Error

$$P(\text{error}) = \int p(\text{error}, \mathbf{x})\, d\mathbf{x} = \int P(\text{error} \mid \mathbf{x})\, p(\mathbf{x})\, d\mathbf{x}$$

Consider two categories, deciding $\omega_1$ if $P(\omega_1 \mid \mathbf{x}) > P(\omega_2 \mid \mathbf{x})$ and $\omega_2$ otherwise. Then

$$P(\text{error} \mid \mathbf{x}) = \begin{cases} P(\omega_2 \mid \mathbf{x}) & \text{if we decide } \omega_1 \\ P(\omega_1 \mid \mathbf{x}) & \text{if we decide } \omega_2 \end{cases} \;=\; \min\left[P(\omega_1 \mid \mathbf{x}),\, P(\omega_2 \mid \mathbf{x})\right]$$
## Generalized Bayesian Decision Rule

### The Generalization

- $\Omega = \{\omega_1, \omega_2, \ldots, \omega_c\}$: a set of $c$ states of nature
- $A = \{\alpha_1, \alpha_2, \ldots, \alpha_a\}$: a set of $a$ possible actions
- $\lambda_{ij} = \lambda(\alpha_i \mid \omega_j)$: the loss incurred for taking action $\alpha_i$ when the true state of nature is $\omega_j$ (a loss can be zero)

We want to minimize the expected loss in making a decision.
### Conditional Risk

$$R(\alpha_i \mid \mathbf{x}) = \sum_{j=1}^{c} \lambda(\alpha_i \mid \omega_j)\, P(\omega_j \mid \mathbf{x}) = \sum_{j=1}^{c} \lambda_{ij}\, P(\omega_j \mid \mathbf{x})$$

Given $\mathbf{x}$, this is the expected loss (risk) associated with taking action $\alpha_i$.

### 0/1 Loss Function

$$\lambda(\alpha_i \mid \omega_j) = \begin{cases} 0 & \text{if } \alpha_i \text{ is the correct decision associated with } \omega_j \\ 1 & \text{otherwise} \end{cases}$$

Under the 0/1 loss, the conditional risk equals the error probability:

$$R(\alpha_i \mid \mathbf{x}) = P(\text{error} \mid \mathbf{x})$$
### Decision

Bayes decision rule:

$$\alpha(\mathbf{x}) = \arg\min_i R(\alpha_i \mid \mathbf{x})$$

### Overall Risk

$$R = \int R(\alpha(\mathbf{x}) \mid \mathbf{x})\, p(\mathbf{x})\, d\mathbf{x}$$

The decision function $\alpha(\mathbf{x})$ given by the Bayes decision rule is the optimal one for minimizing the overall risk; its resulting overall risk is called the Bayes risk.
### Two-Category Classification

States of nature $\Omega = \{\omega_1, \omega_2\}$, actions $A = \{\alpha_1, \alpha_2\}$, and loss matrix

| Loss | $\omega_1$ | $\omega_2$ |
|---|---|---|
| $\alpha_1$ | $\lambda_{11}$ | $\lambda_{12}$ |
| $\alpha_2$ | $\lambda_{21}$ | $\lambda_{22}$ |

$$R(\alpha_1 \mid \mathbf{x}) = \lambda_{11} P(\omega_1 \mid \mathbf{x}) + \lambda_{12} P(\omega_2 \mid \mathbf{x})$$
$$R(\alpha_2 \mid \mathbf{x}) = \lambda_{21} P(\omega_1 \mid \mathbf{x}) + \lambda_{22} P(\omega_2 \mid \mathbf{x})$$

Perform $\alpha_1$ if $R(\alpha_2 \mid \mathbf{x}) > R(\alpha_1 \mid \mathbf{x})$; otherwise perform $\alpha_2$. That is, perform $\alpha_1$ if

$$\lambda_{21} P(\omega_1 \mid \mathbf{x}) + \lambda_{22} P(\omega_2 \mid \mathbf{x}) > \lambda_{11} P(\omega_1 \mid \mathbf{x}) + \lambda_{12} P(\omega_2 \mid \mathbf{x}),$$

i.e.,

$$(\lambda_{21} - \lambda_{11})\, P(\omega_1 \mid \mathbf{x}) > (\lambda_{12} - \lambda_{22})\, P(\omega_2 \mid \mathbf{x}),$$

where both factors $(\lambda_{21} - \lambda_{11})$ and $(\lambda_{12} - \lambda_{22})$ are ordinarily positive. The posterior probabilities are scaled before comparison, and the common factor $p(\mathbf{x})$ in $P(\omega_i \mid \mathbf{x}) = p(\mathbf{x} \mid \omega_i) P(\omega_i) / p(\mathbf{x})$ is irrelevant.
Substituting the Bayes rule and rearranging gives

$$(\lambda_{21} - \lambda_{11})\, p(\mathbf{x} \mid \omega_1)\, P(\omega_1) > (\lambda_{12} - \lambda_{22})\, p(\mathbf{x} \mid \omega_2)\, P(\omega_2).$$

This result will be recalled later.

### Likelihood Ratio

Perform $\alpha_1$ if

$$\frac{p(\mathbf{x} \mid \omega_1)}{p(\mathbf{x} \mid \omega_2)} > \frac{(\lambda_{12} - \lambda_{22})\, P(\omega_2)}{(\lambda_{21} - \lambda_{11})\, P(\omega_1)}.$$

The left side is the likelihood ratio; the right side is a threshold.
## Discriminant Functions

How do we define discriminant functions?

### The Multicategory Classification

The functions $g_i(\mathbf{x})$, $i = 1, \ldots, c$, are called the discriminant functions. The classifier computes $g_1(\mathbf{x}), \ldots, g_c(\mathbf{x})$ and takes the action (e.g., classification): assign $\mathbf{x}$ to $\omega_i$ if $g_i(\mathbf{x}) > g_j(\mathbf{x})$ for all $j \neq i$.

If $f(\cdot)$ is a monotonically increasing function, then the $f(g_i(\cdot))$'s are also discriminant functions.

### Simple Discriminant Functions

Minimum-risk case:

$$g_i(\mathbf{x}) = -R(\alpha_i \mid \mathbf{x})$$

Minimum-error-rate case (all equivalent):

$$g_i(\mathbf{x}) = P(\omega_i \mid \mathbf{x}), \qquad g_i(\mathbf{x}) = p(\mathbf{x} \mid \omega_i)\, P(\omega_i), \qquad g_i(\mathbf{x}) = \ln p(\mathbf{x} \mid \omega_i) + \ln P(\omega_i)$$

### Decision Regions

$$\mathcal{R}_i = \{\mathbf{x} \mid g_i(\mathbf{x}) > g_j(\mathbf{x}) \;\; \forall j \neq i\}$$

In a two-category example, the decision regions are separated by decision boundaries.
## The Normal Distribution

### Basics of Probability

Discrete random variable $X$ (assume integer-valued):

- Probability mass function (pmf): $p(x) = P(X = x)$
- Cumulative distribution function (cdf): $F(x) = P(X \le x) = \sum_{t=-\infty}^{x} p(t)$

Continuous random variable $X$:

- Probability density function (pdf): $p(x)$ or $f(x)$ (not a probability)
- Cumulative distribution function (cdf): $F(x) = P(X \le x) = \int_{-\infty}^{x} p(t)\, dt$

### Expectations

Let $g$ be a function of random variable $X$:

$$E[g(X)] = \begin{cases} \displaystyle\sum_{x=-\infty}^{\infty} g(x)\, p(x) & X \text{ is discrete} \\[2ex] \displaystyle\int_{-\infty}^{\infty} g(x)\, p(x)\, dx & X \text{ is continuous} \end{cases}$$

- The $k$th moment: $E[X^k]$
- The 1st moment: $\mu_X = E[X]$
- The $k$th central moment: $E[(X - \mu_X)^k]$

Fact: $\mathrm{Var}[X] = E[X^2] - (E[X])^2$.

### Important Expectations

Mean:

$$\mu_X = E[X] = \begin{cases} \displaystyle\sum_{x=-\infty}^{\infty} x\, p(x) & X \text{ is discrete} \\[2ex] \displaystyle\int_{-\infty}^{\infty} x\, p(x)\, dx & X \text{ is continuous} \end{cases}$$

Variance:

$$\sigma_X^2 = \mathrm{Var}[X] = E[(X - \mu_X)^2] = \begin{cases} \displaystyle\sum_{x=-\infty}^{\infty} (x - \mu_X)^2\, p(x) & X \text{ is discrete} \\[2ex] \displaystyle\int_{-\infty}^{\infty} (x - \mu_X)^2\, p(x)\, dx & X \text{ is continuous} \end{cases}$$
### Entropy

The entropy measures the fundamental uncertainty in the value of points selected randomly from a distribution:

$$H[X] = \begin{cases} -\displaystyle\sum_{x=-\infty}^{\infty} p(x) \ln p(x) & X \text{ is discrete} \\[2ex] -\displaystyle\int_{-\infty}^{\infty} p(x) \ln p(x)\, dx & X \text{ is continuous} \end{cases}$$

Two properties make the normal distribution important:

1. It maximizes the entropy among distributions with a given mean and variance.
2. The central limit theorem: sums of many independent random effects tend toward a normal distribution.
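A quick numerical check of the discrete entropy, assuming natural logarithm (entropy in nats):

```python
import numpy as np

# Entropy (in nats) of a discrete distribution: H = -sum p ln p.
def entropy(p):
    p = np.asarray(p, dtype=float)
    p = p[p > 0]                  # the term 0 ln 0 is taken as 0
    return float(-(p * np.log(p)).sum())

uniform = [0.25, 0.25, 0.25, 0.25]
peaked  = [0.97, 0.01, 0.01, 0.01]

print(entropy(uniform))           # ln 4 ~ 1.3863, the maximum for 4 outcomes
print(entropy(peaked) < entropy(uniform))   # a peaked pmf is less uncertain: True
```

The uniform distribution attains the maximum $\ln c$ over $c$ outcomes, consistent with entropy as a measure of uncertainty.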
### Univariate Gaussian Distribution

$X \sim N(\mu, \sigma^2)$:

$$p(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[-\frac{(x-\mu)^2}{2\sigma^2}\right]$$

with $E[X] = \mu$ and $\mathrm{Var}[X] = \sigma^2$.

[Figure: the bell-shaped density $p(x)$ centered at $\mu$, with the $\pm\sigma$, $\pm 2\sigma$, $\pm 3\sigma$ intervals marked.]
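As a sanity check of the formula, a sketch that integrates the density numerically and recovers its moments (the values of $\mu$ and $\sigma$ are arbitrary):

```python
import numpy as np

# Univariate Gaussian density N(mu, sigma^2).
def gauss_pdf(x, mu, sigma):
    return np.exp(-(x - mu)**2 / (2 * sigma**2)) / (np.sqrt(2 * np.pi) * sigma)

mu, sigma = 2.0, 1.5
x = np.linspace(mu - 8 * sigma, mu + 8 * sigma, 100001)
dx = x[1] - x[0]
p = gauss_pdf(x, mu, sigma)

print(round(p.sum() * dx, 4))                  # integrates to ~1.0
print(round((x * p).sum() * dx, 4))            # E[X]   ~ mu      = 2.0
print(round(((x - mu)**2 * p).sum() * dx, 4))  # Var[X] ~ sigma^2 = 2.25
```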
X  ( X 1 , X 2 ,, X d )       T

Random Vectors

X:  R
A d-dimensional                         d
random vector

Vector Mean:   μ  E[ X]  ( 1 ,  2 , ,  d )   T

Covariance Matrix:               12  12      1d 
                     
 21  2 2
  2d 
Σ  E[( X  μ)( X  μ) ]  T

              
                   2 
 d 1  d 2
               d  
( x )2
1       
p( x)       e       2 2
2 

Multivariate Gaussian Distribution

X~N(μ,Σ)                     A d-dimensional random vector

1                   1         1      
p ( x)                            exp  (x  μ) Σ (x  μ)
T

(2 ) d / 2 | Σ |1/ 2        2                 

E[X] =μ
E[(X-μ) (X-μ)T] =Σ
Properties of N(μ,Σ)
X~N(μ,Σ)      A d-dimensional random vector

Let Y=ATX, where A is a d × k matrix.

Y~N(ATμ, ATΣA)
Properties of N(μ,Σ)
X~N(μ,Σ)      A d-dimensional random vector

Let Y=ATX, where A is a d × k matrix.

Y~N(ATμ, ATΣA)
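A sketch that evaluates the multivariate density directly from the formula; the mean and covariance below are arbitrary illustration values:

```python
import numpy as np

# Multivariate Gaussian density evaluated via the formula above.
def mvn_pdf(x, mu, Sigma):
    d = len(mu)
    diff = x - mu
    quad = diff @ np.linalg.inv(Sigma) @ diff   # (x-mu)^T Sigma^{-1} (x-mu)
    norm = (2 * np.pi)**(d / 2) * np.sqrt(np.linalg.det(Sigma))
    return np.exp(-0.5 * quad) / norm

mu = np.array([0.0, 0.0])
Sigma = np.array([[2.0, 0.6],
                  [0.6, 1.0]])

# At x = mu the exponent vanishes, so the peak density is
# 1 / ((2 pi)^{d/2} |Sigma|^{1/2}).
peak = mvn_pdf(mu, mu, Sigma)
print(np.isclose(peak, 1.0 / (2 * np.pi * np.sqrt(np.linalg.det(Sigma)))))  # True
```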
### On Parameters of $N(\boldsymbol{\mu}, \boldsymbol{\Sigma})$

$$\mu_i = E[X_i], \qquad \boldsymbol{\Sigma} = E[(\mathbf{X} - \boldsymbol{\mu})(\mathbf{X} - \boldsymbol{\mu})^T] = [\sigma_{ij}]_{d \times d}$$

$$\sigma_{ij} = E[(X_i - \mu_i)(X_j - \mu_j)] = \mathrm{Cov}(X_i, X_j); \qquad X_i \perp X_j \;\Rightarrow\; \sigma_{ij} = 0$$

$$\sigma_{ii} = \sigma_i^2 = E[(X_i - \mu_i)^2] = \mathrm{Var}(X_i)$$

### More on the Covariance Matrix

$\boldsymbol{\Sigma}$ is symmetric and positive semidefinite, so it has the eigendecomposition

$$\boldsymbol{\Sigma} = \boldsymbol{\Phi} \boldsymbol{\Lambda} \boldsymbol{\Phi}^T = \boldsymbol{\Phi} \boldsymbol{\Lambda}^{1/2} \boldsymbol{\Lambda}^{1/2} \boldsymbol{\Phi}^T = (\boldsymbol{\Phi} \boldsymbol{\Lambda}^{1/2})(\boldsymbol{\Phi} \boldsymbol{\Lambda}^{1/2})^T$$

where $\boldsymbol{\Phi}$ is an orthonormal matrix whose columns are eigenvectors of $\boldsymbol{\Sigma}$, and $\boldsymbol{\Lambda}$ is the diagonal matrix of eigenvalues.
### Whitening Transform

With $\mathbf{X} \sim N(\boldsymbol{\mu}, \boldsymbol{\Sigma})$ and $\mathbf{Y} = \mathbf{A}^T \mathbf{X} \sim N(\mathbf{A}^T \boldsymbol{\mu}, \mathbf{A}^T \boldsymbol{\Sigma} \mathbf{A})$, let $\mathbf{A}_w = \boldsymbol{\Phi} \boldsymbol{\Lambda}^{-1/2}$. Then

$$\mathbf{A}_w^T \boldsymbol{\Sigma} \mathbf{A}_w = (\boldsymbol{\Phi} \boldsymbol{\Lambda}^{-1/2})^T (\boldsymbol{\Phi} \boldsymbol{\Lambda}^{1/2})(\boldsymbol{\Phi} \boldsymbol{\Lambda}^{1/2})^T (\boldsymbol{\Phi} \boldsymbol{\Lambda}^{-1/2}) = \mathbf{I}$$

so

$$\mathbf{A}_w^T \mathbf{X} \sim N(\mathbf{A}_w^T \boldsymbol{\mu},\; \mathbf{I}).$$

The whitening transform is a linear transform (projection) that maps an arbitrary Gaussian to one with identity covariance.
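The construction above can be verified numerically. A sketch with an arbitrary example covariance:

```python
import numpy as np

# Whitening transform A_w = Phi Lambda^{-1/2}, built from the
# eigendecomposition Sigma = Phi Lambda Phi^T.
Sigma = np.array([[4.0, 1.2],
                  [1.2, 1.0]])

eigvals, Phi = np.linalg.eigh(Sigma)    # Lambda (as a vector) and Phi
A_w = Phi @ np.diag(eigvals**-0.5)      # A_w = Phi Lambda^{-1/2}

# A_w^T Sigma A_w should be the identity matrix.
whitened = A_w.T @ Sigma @ A_w
print(np.allclose(whitened, np.eye(2)))   # True
```

`np.linalg.eigh` is the right tool here because $\boldsymbol{\Sigma}$ is symmetric; it returns real eigenvalues and orthonormal eigenvectors.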
### Mahalanobis Distance

For $\mathbf{X} \sim N(\boldsymbol{\mu}, \boldsymbol{\Sigma})$, the density

$$p(\mathbf{x}) = \frac{1}{(2\pi)^{d/2}\, |\boldsymbol{\Sigma}|^{1/2}} \exp\left[-\frac{1}{2} (\mathbf{x} - \boldsymbol{\mu})^T \boldsymbol{\Sigma}^{-1} (\mathbf{x} - \boldsymbol{\mu})\right]$$

has a constant normalizing factor; its value depends on $\mathbf{x}$ only through the squared Mahalanobis distance

$$r^2 = (\mathbf{x} - \boldsymbol{\mu})^T \boldsymbol{\Sigma}^{-1} (\mathbf{x} - \boldsymbol{\mu}).$$

Surfaces of constant density are therefore surfaces of constant $r^2$.
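A small sketch showing how the Mahalanobis distance rescales directions by their variance (the covariance values are illustrative):

```python
import numpy as np

# Squared Mahalanobis distance r^2 = (x-mu)^T Sigma^{-1} (x-mu).
def mahalanobis_sq(x, mu, Sigma):
    diff = x - mu
    return float(diff @ np.linalg.inv(Sigma) @ diff)

mu = np.array([0.0, 0.0])
Sigma = np.array([[4.0, 0.0],
                  [0.0, 1.0]])

# Two points at equal Euclidean distance from mu but different Mahalanobis
# distance: variance is larger along the first axis, so displacement along
# it counts for less.
r2_a = mahalanobis_sq(np.array([2.0, 0.0]), mu, Sigma)   # 4/4 = 1
r2_b = mahalanobis_sq(np.array([0.0, 2.0]), mu, Sigma)   # 4/1 = 4
print(r2_a, r2_b)
```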
## Discriminant Functions for the Normal Populations

### Minimum-Error-Rate Classification

Use $g_i(\mathbf{x}) = \ln p(\mathbf{x} \mid \omega_i) + \ln P(\omega_i)$ with $\mathbf{X}_i \sim N(\boldsymbol{\mu}_i, \boldsymbol{\Sigma}_i)$:

$$p(\mathbf{x} \mid \omega_i) = \frac{1}{(2\pi)^{d/2}\, |\boldsymbol{\Sigma}_i|^{1/2}} \exp\left[-\frac{1}{2} (\mathbf{x} - \boldsymbol{\mu}_i)^T \boldsymbol{\Sigma}_i^{-1} (\mathbf{x} - \boldsymbol{\mu}_i)\right]$$

gives

$$g_i(\mathbf{x}) = -\frac{1}{2} (\mathbf{x} - \boldsymbol{\mu}_i)^T \boldsymbol{\Sigma}_i^{-1} (\mathbf{x} - \boldsymbol{\mu}_i) - \frac{d}{2} \ln 2\pi - \frac{1}{2} \ln |\boldsymbol{\Sigma}_i| + \ln P(\omega_i)$$
Three cases:

- **Case 1:** $\boldsymbol{\Sigma}_i = \sigma^2 \mathbf{I}$. Classes are centered at different means; their feature components are pairwise independent and have the same variance.
- **Case 2:** $\boldsymbol{\Sigma}_i = \boldsymbol{\Sigma}$. Classes are centered at different means but have the same covariance.
- **Case 3:** $\boldsymbol{\Sigma}_i \neq \boldsymbol{\Sigma}_j$. Arbitrary covariances.
1    1
Σ            I
Case 1. i =                                              
i         2
2I

1
g i ( x)             || x  μ i ||2  ln P(i )
2 2
1
  2 (xT x  2μT x  μT μ i )  ln P(i )
2
i     i

irrelevant
1        1 T                       
g i ( x)  2 μ x   
T
μ i μ i  ln P(i )
            2 2
i
                           
irrelevant

1                          d       1
gi (x)   (x  μ i ) Σi (x  μ i )  ln 2  ln | Σi |  ln P(i )
T 1

2                          2       2
That is,

$$g_i(\mathbf{x}) = \mathbf{w}_i^T\mathbf{x} + w_{i0}, \qquad \mathbf{w}_i = \frac{1}{\sigma^2}\boldsymbol{\mu}_i, \qquad w_{i0} = -\frac{1}{2\sigma^2}\boldsymbol{\mu}_i^T\boldsymbol{\mu}_i + \ln P(\omega_i)$$

The boundary between $\omega_i$ and $\omega_j$ is $g_i(\mathbf{x}) = g_j(\mathbf{x})$, i.e. $(\mathbf{w}_i - \mathbf{w}_j)^T\mathbf{x} = w_{j0} - w_{i0}$:

$$(\boldsymbol{\mu}_i^T - \boldsymbol{\mu}_j^T)\,\mathbf{x} - \frac{1}{2}\left(\boldsymbol{\mu}_i^T\boldsymbol{\mu}_i - \boldsymbol{\mu}_j^T\boldsymbol{\mu}_j\right) = -\sigma^2 \ln\frac{P(\omega_i)}{P(\omega_j)}$$

The decision boundary is a hyperplane perpendicular to the line between the means, located somewhere along that line.

Equivalently, the boundary between $\omega_i$ and $\omega_j$ can be written as $\mathbf{w}^T(\mathbf{x} - \mathbf{x}_0) = 0$ with

$$\mathbf{w} = \boldsymbol{\mu}_i - \boldsymbol{\mu}_j, \qquad \mathbf{x}_0 = \frac{1}{2}(\boldsymbol{\mu}_i + \boldsymbol{\mu}_j) - \frac{\sigma^2}{\|\boldsymbol{\mu}_i - \boldsymbol{\mu}_j\|^2}\,\ln\frac{P(\omega_i)}{P(\omega_j)}\,(\boldsymbol{\mu}_i - \boldsymbol{\mu}_j)$$

The first term of $\mathbf{x}_0$ is the midpoint between the means; the second term vanishes when $P(\omega_i) = P(\omega_j)$.

When $P(\omega_1) = P(\omega_2)$, the boundary passes through the midpoint $\mathbf{x}_0 = \frac{1}{2}(\boldsymbol{\mu}_1 + \boldsymbol{\mu}_2)$, and the rule reduces to a minimum-distance classifier (template matching): assign $\mathbf{x}$ to the class with the nearest mean.

[Figures/Demo: Case 1 decision boundaries for equal and unequal priors in one, two, and three dimensions.]
### Case 2: $\boldsymbol{\Sigma}_i = \boldsymbol{\Sigma}$

Dropping the class-independent terms,

$$g_i(\mathbf{x}) = -\frac{1}{2}(\mathbf{x} - \boldsymbol{\mu}_i)^T \boldsymbol{\Sigma}^{-1} (\mathbf{x} - \boldsymbol{\mu}_i) + \ln P(\omega_i)$$

The quadratic term is the squared Mahalanobis distance to $\boldsymbol{\mu}_i$; the prior term is irrelevant if $P(\omega_i) = P(\omega_j)$ for all $i, j$. Expanding,

$$g_i(\mathbf{x}) = -\frac{1}{2}\left(\mathbf{x}^T\boldsymbol{\Sigma}^{-1}\mathbf{x} - 2\boldsymbol{\mu}_i^T\boldsymbol{\Sigma}^{-1}\mathbf{x} + \boldsymbol{\mu}_i^T\boldsymbol{\Sigma}^{-1}\boldsymbol{\mu}_i\right) + \ln P(\omega_i)$$

and the class-independent term $\mathbf{x}^T\boldsymbol{\Sigma}^{-1}\mathbf{x}$ can be dropped, leaving the linear discriminant

$$g_i(\mathbf{x}) = \mathbf{w}_i^T\mathbf{x} + w_{i0}, \qquad \mathbf{w}_i = \boldsymbol{\Sigma}^{-1}\boldsymbol{\mu}_i, \qquad w_{i0} = -\frac{1}{2}\boldsymbol{\mu}_i^T\boldsymbol{\Sigma}^{-1}\boldsymbol{\mu}_i + \ln P(\omega_i)$$

The boundary $g_i(\mathbf{x}) = g_j(\mathbf{x})$ is again a hyperplane $\mathbf{w}^T(\mathbf{x} - \mathbf{x}_0) = 0$, now with

$$\mathbf{w} = \boldsymbol{\Sigma}^{-1}(\boldsymbol{\mu}_i - \boldsymbol{\mu}_j), \qquad \mathbf{x}_0 = \frac{1}{2}(\boldsymbol{\mu}_i + \boldsymbol{\mu}_j) - \frac{\ln[P(\omega_i)/P(\omega_j)]}{(\boldsymbol{\mu}_i - \boldsymbol{\mu}_j)^T \boldsymbol{\Sigma}^{-1} (\boldsymbol{\mu}_i - \boldsymbol{\mu}_j)}\,(\boldsymbol{\mu}_i - \boldsymbol{\mu}_j)$$

In general this hyperplane is not perpendicular to the line between the means.

[Demo: Case 2 decision boundaries.]
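The Case 2 linear discriminants can be sketched directly; all parameter values below are made up for illustration:

```python
import numpy as np

# Case 2 (shared covariance): linear discriminant
# g_i(x) = w_i^T x + w_i0 with w_i = Sigma^{-1} mu_i and
# w_i0 = -1/2 mu_i^T Sigma^{-1} mu_i + ln P(w_i).
Sigma = np.array([[1.0, 0.3],
                  [0.3, 2.0]])
Sinv = np.linalg.inv(Sigma)
mus = [np.array([0.0, 0.0]), np.array([3.0, 1.0])]
priors = [0.5, 0.5]

def g(i, x):
    w = Sinv @ mus[i]
    w0 = -0.5 * mus[i] @ Sinv @ mus[i] + np.log(priors[i])
    return w @ x + w0

x = np.array([0.5, 0.2])
decision = 0 if g(0, x) > g(1, x) else 1
print(decision)   # -> 0, x is near the first mean

# With equal priors this matches the smaller Mahalanobis distance.
r2 = [float((x - m) @ Sinv @ (x - m)) for m in mus]
print(int(np.argmin(r2)) == decision)   # True
```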

### Case 3: $\boldsymbol{\Sigma}_i \neq \boldsymbol{\Sigma}_j$ (arbitrary)

Only the $-\frac{d}{2}\ln 2\pi$ term can be dropped:

$$g_i(\mathbf{x}) = -\frac{1}{2}(\mathbf{x} - \boldsymbol{\mu}_i)^T \boldsymbol{\Sigma}_i^{-1} (\mathbf{x} - \boldsymbol{\mu}_i) - \frac{1}{2}\ln|\boldsymbol{\Sigma}_i| + \ln P(\omega_i)$$

This is quadratic in $\mathbf{x}$:

$$g_i(\mathbf{x}) = \mathbf{x}^T\mathbf{W}_i\mathbf{x} + \mathbf{w}_i^T\mathbf{x} + w_{i0}$$

with

$$\mathbf{W}_i = -\frac{1}{2}\boldsymbol{\Sigma}_i^{-1}, \qquad \mathbf{w}_i = \boldsymbol{\Sigma}_i^{-1}\boldsymbol{\mu}_i, \qquad w_{i0} = -\frac{1}{2}\boldsymbol{\mu}_i^T\boldsymbol{\Sigma}_i^{-1}\boldsymbol{\mu}_i - \frac{1}{2}\ln|\boldsymbol{\Sigma}_i| + \ln P(\omega_i)$$

The quadratic term $\mathbf{x}^T\mathbf{W}_i\mathbf{x}$, absent in Cases 1 and 2, makes the decision boundaries general quadric surfaces: hyperplanes, hyperspheres, hyperellipsoids, or hyperhyperboloids.
Non-simply connected decision regions can arise even in one dimension for Gaussians with unequal variances.

[Demo: Case 3 decision regions; multi-category classification examples.]
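The Case 3 quadratic discriminant can be sketched as follows; the means, covariances, and priors are made-up illustration values:

```python
import numpy as np

# Case 3 (arbitrary covariances): quadratic discriminant
# g_i(x) = x^T W_i x + w_i^T x + w_i0 with W_i = -1/2 Sigma_i^{-1},
# w_i = Sigma_i^{-1} mu_i, and
# w_i0 = -1/2 mu_i^T Sigma_i^{-1} mu_i - 1/2 ln|Sigma_i| + ln P(w_i).
params = [
    (np.array([0.0, 0.0]), np.array([[1.0, 0.0], [0.0, 1.0]]), 0.5),
    (np.array([2.0, 2.0]), np.array([[3.0, 0.5], [0.5, 2.0]]), 0.5),
]

def g(x, mu, Sigma, prior):
    Sinv = np.linalg.inv(Sigma)
    W, w = -0.5 * Sinv, Sinv @ mu
    w0 = -0.5 * mu @ Sinv @ mu - 0.5 * np.log(np.linalg.det(Sigma)) + np.log(prior)
    return x @ W @ x + w @ x + w0

def classify(x):
    return int(np.argmax([g(x, *p) for p in params]))

print(classify(np.array([0.1, -0.2])))   # near the first mean  -> 0
print(classify(np.array([2.2, 1.9])))    # near the second mean -> 1
```

Because each class keeps its own $\ln|\boldsymbol{\Sigma}_i|$ and quadratic term, the boundary between the two classes is a quadric curve rather than a line.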
## Minimax Criterion

Recall the Bayes decision rule for two-category classification: decide $\omega_1$ if the likelihood ratio exceeds a threshold,

$$\frac{p(\mathbf{x} \mid \omega_1)}{p(\mathbf{x} \mid \omega_2)} > \frac{(\lambda_{12} - \lambda_{22})\, P(\omega_2)}{(\lambda_{21} - \lambda_{11})\, P(\omega_1)}.$$

The minimax criterion deals with the case where the prior probabilities are unknown.

### Basic Concept of Minimax

Consider the worst-case prior probabilities (the maximum loss) and then pick the decision rule that minimizes the overall risk under them: minimize the maximum possible overall risk.
R(1 | x)  11P(1 | x)  12 P(2 | x)
R( 2 | x)  21P(1 | x)  22 P(2 | x)

Overall Risk
R   R( (x) | x) p(x)dx

  R(1 | x) p(x)dx   R( 2 | x) p(x)dx
R1                        R2

R   [11P(1 | x)  12 P(2 | x)] p(x)dx 
R1


R2
[21P(1 | x)  22 P(2 | x)] p(x)dx
p(x | i ) P(i )
P(i | x) 
p ( x)

Overall Risk
R   [11P(1 ) p(x | 1 )  12 P(2 ) p(x | 2 )]dx 
R1

R2
[21P(1 ) p(x | 1 )  22 P(2 ) p(x | 2 )]dx

R   [11P(1 | x)  12 P(2 | x)] p(x)dx 
R1

R2
[21P(1 | x)  22 P(2 | x)] p(x)dx
P(2 )  1  P(1 )
Overall Risk
R   [11P(1 ) p(x | 1 )  12 P(2 ) p(x | 2 )]dx 
R1

 R2
[21P(1 ) p(x | 1 )  22 P(2 ) p(x | 2 )]dx
R   {11P(1 ) p(x | 1 )  12 [1  P(1 )] p(x | 2 )}dx 
R1

R2
{21P(1 ) p(x | 1 )  22 [1  P(1 )] p(x | 2 )}dx

R  12  p(x | 2 )dx  22  p(x | 2 ) dx
R1                   R2

 11 P(1 )  p(x | 1 )dx  12 P(1 )  p(x | 2 )dx
R1                         R1

 21 P(1 )  p (x | 1 )dx  22 P(1 )  p (x | 2 )dx
R2                         R2

R1
p(x | i )dx   p(x | i )dx  1
R2

Collecting terms,

$$R[P(\omega_1)] = \lambda_{22} + (\lambda_{12} - \lambda_{22})\int_{\mathcal{R}_1} p(\mathbf{x}\mid\omega_2)\, d\mathbf{x} \;+\; P(\omega_1)\left[(\lambda_{11} - \lambda_{22}) + (\lambda_{21} - \lambda_{11})\int_{\mathcal{R}_2} p(\mathbf{x}\mid\omega_1)\, d\mathbf{x} - (\lambda_{12} - \lambda_{22})\int_{\mathcal{R}_1} p(\mathbf{x}\mid\omega_2)\, d\mathbf{x}\right]$$

For a fixed decision boundary this has the form $R = aP(\omega_1) + b$: both the constant term and the slope depend on the setting of the decision boundary, and the line gives the overall risk for each particular $P(\omega_1)$.

For the minimax solution, choose the boundary so that the bracketed slope equals zero. The overall risk is then independent of the value of $P(\omega_1)$, and equals the minimax risk $R_{mm}$.
### Minimax Risk

$$R_{mm} = \lambda_{22} + (\lambda_{12} - \lambda_{22})\int_{\mathcal{R}_1} p(\mathbf{x}\mid\omega_2)\, d\mathbf{x} = \lambda_{11} + (\lambda_{21} - \lambda_{11})\int_{\mathcal{R}_2} p(\mathbf{x}\mid\omega_1)\, d\mathbf{x}$$
### Error Probability (0/1 Loss)

With the 0/1 loss function ($\lambda_{11} = \lambda_{22} = 0$, $\lambda_{12} = \lambda_{21} = 1$), the overall risk becomes the error probability:

$$P_{\text{error}}[P(\omega_1)] = \int_{\mathcal{R}_1} p(\mathbf{x}\mid\omega_2)\, d\mathbf{x} + P(\omega_1)\left[\int_{\mathcal{R}_2} p(\mathbf{x}\mid\omega_1)\, d\mathbf{x} - \int_{\mathcal{R}_1} p(\mathbf{x}\mid\omega_2)\, d\mathbf{x}\right]$$

### Minimax Error Probability

Setting the bracket to zero gives

$$P_{mm}(\text{error}) = \int_{\mathcal{R}_1} p(\mathbf{x}\mid\omega_2)\, d\mathbf{x} = \int_{\mathcal{R}_2} p(\mathbf{x}\mid\omega_1)\, d\mathbf{x},$$

i.e., the boundary is placed where the two error probabilities $P(1\mid 2)$ and $P(2\mid 1)$ are equal.

[Figure: class-conditional densities for $\omega_1$ and $\omega_2$ with the minimax boundary equalizing the two shaded error areas over $\mathcal{R}_1$ and $\mathcal{R}_2$.]
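The equal-error condition can be solved numerically. A sketch under assumed conditions: hypothetical unit-variance Gaussian class-conditionals at $-1$ and $+2$, with the 1-D boundary $x_b$ found by bisection:

```python
# Minimax boundary for the 0/1-loss two-category case: place the 1-D
# boundary x_b so that int_{R1} p(x|w2) dx = int_{R2} p(x|w1) dx.
# Class-conditionals are hypothetical unit-variance Gaussians.
from math import erf, sqrt

def Phi(z):                      # standard normal cdf
    return 0.5 * (1 + erf(z / sqrt(2)))

mu1, mu2 = -1.0, 2.0

def err1(xb):                    # P(2|1): decide w2 when w1 is true (R2 = x > xb)
    return 1 - Phi(xb - mu1)

def err2(xb):                    # P(1|2): decide w1 when w2 is true (R1 = x < xb)
    return Phi(xb - mu2)

# Bisection on err2(xb) - err1(xb), which increases with xb.
lo, hi = mu1, mu2
for _ in range(100):
    mid = 0.5 * (lo + hi)
    if err2(mid) > err1(mid):
        hi = mid
    else:
        lo = mid
xb = 0.5 * (lo + hi)

print(round(xb, 6))                            # -> 0.5, the midpoint, by symmetry
print(round(err1(xb), 6), round(err2(xb), 6))  # equal errors -> minimax
```

With equal variances the equal-error boundary lands at the midpoint of the means; with unequal variances it would not.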
## Neyman-Pearson Criterion

Recall once more the Bayes decision rule: decide $\omega_1$ if

$$\frac{p(\mathbf{x} \mid \omega_1)}{p(\mathbf{x} \mid \omega_2)} > \frac{(\lambda_{12} - \lambda_{22})\, P(\omega_2)}{(\lambda_{21} - \lambda_{11})\, P(\omega_1)}.$$

The Neyman-Pearson criterion deals with the case where both the loss functions and the prior probabilities are unknown.
### Signal Detection Theory

Signal detection theory evolved from the development of communications and radar equipment in the first half of the twentieth century. It migrated to psychology, initially as part of sensation and perception, in the 1950s and 60s as an attempt to understand features of human behavior when detecting very faint stimuli that were not being explained by traditional theories of thresholds.

### The Situation of Interest

A person is faced with a stimulus (signal) that is very faint or confusing, and must decide whether the signal is there or not. What makes this situation confusing and difficult is the presence of other activity that resembles the signal; call this activity noise.
### Example

Noise is present both in the environment and in the sensory system of the observer. The observer reacts to the momentary total activation of the sensory system, which fluctuates from moment to moment, as well as responding to environmental stimuli, which may include a signal.

### Example

A radiologist examines a CT scan, looking for evidence of a tumor. It is a hard job, because there is always some uncertainty. There are four possible outcomes:

- **Hit**: tumor present and doctor says "yes"
- **Miss**: tumor present and doctor says "no"
- **False alarm**: tumor absent and doctor says "yes"
- **Correct rejection**: tumor absent and doctor says "no"

A miss and a false alarm are the two types of error. Signal detection theory was developed to help us understand how a continuous and ambiguous signal can lead to a binary yes/no decision.

### The Four Cases

| Decision | Signal absent ($\omega_1$) | Signal present ($\omega_2$) |
|---|---|---|
| No ($\alpha_1$) | Correct rejection, $P(1\mid 1)$ | Miss, $P(1\mid 2)$ |
| Yes ($\alpha_2$) | False alarm, $P(2\mid 1)$ | Hit, $P(2\mid 2)$ |

### Decision Making

Model the noise-only and noise-plus-signal activations as two overlapping distributions. The discriminability is

$$d' = \frac{|\mu_2 - \mu_1|}{\sigma},$$

and the criterion (decision bias), based on expectancy, sets the dividing line between responding "no" ($\alpha_1$) and "yes" ($\alpha_2$). The hit rate is $P(2\mid 2)$ and the false-alarm rate is $P(2\mid 1)$.
### ROC Curve

The receiver operating characteristic (ROC) curve plots the hit rate $P_H = P(2\mid 2)$ against the false-alarm rate $P_{FA} = P(2\mid 1)$ as the criterion varies.

### Neyman-Pearson Criterion

Maximize $P_H$ subject to $P_{FA} \le a$.
### Likelihood Ratio Test

$$\phi(\mathbf{x}) = \begin{cases} 0, & \dfrac{p(\mathbf{x}\mid\omega_1)}{p(\mathbf{x}\mid\omega_2)} > T \\[2ex] 1, & \dfrac{p(\mathbf{x}\mid\omega_1)}{p(\mathbf{x}\mid\omega_2)} < T \end{cases}$$

so that $\mathcal{R}_1 = \{\mathbf{x} \mid p(\mathbf{x}\mid\omega_1) > T\, p(\mathbf{x}\mid\omega_2)\}$ and $\mathcal{R}_2 = \{\mathbf{x} \mid p(\mathbf{x}\mid\omega_1) < T\, p(\mathbf{x}\mid\omega_2)\}$, where $T$ is a threshold that meets the $P_{FA}$ constraint ($\le a$). How do we determine $T$? Note that

$$P_{FA} = \int_{\mathcal{R}_2} p(\mathbf{x}\mid\omega_1)\, d\mathbf{x} = \int \phi(\mathbf{x})\, p(\mathbf{x}\mid\omega_1)\, d\mathbf{x} = E[\phi(\mathbf{X}) \mid \omega_1]$$

$$P_{H} = \int_{\mathcal{R}_2} p(\mathbf{x}\mid\omega_2)\, d\mathbf{x} = \int \phi(\mathbf{x})\, p(\mathbf{x}\mid\omega_2)\, d\mathbf{x} = E[\phi(\mathbf{X}) \mid \omega_2]$$
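Choosing $T$ can be sketched for a simple assumed model: noise $x \sim N(0,1)$ under $\omega_1$ and signal $x \sim N(d', 1)$ under $\omega_2$, where the likelihood ratio $p(x\mid\omega_1)/p(x\mid\omega_2)$ is monotone decreasing in $x$, so the test "decide $\omega_2$ when $x > c$" is the likelihood ratio test and $c$ can be set directly from the $P_{FA}$ constraint:

```python
# Neyman-Pearson test for two hypothetical unit-variance Gaussians:
# noise x ~ N(0,1) under w1, signal x ~ N(d,1) under w2.
from math import erf, sqrt

def Phi(z):                          # standard normal cdf
    return 0.5 * (1 + erf(z / sqrt(2)))

def Phi_inv(p, lo=-10.0, hi=10.0):   # inverse cdf by bisection
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if Phi(mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

d = 2.0          # discriminability d'
a = 0.05         # allowed false-alarm rate

c = Phi_inv(1 - a)          # P_FA = P(x > c | w1) = 1 - Phi(c) = a
P_FA = 1 - Phi(c)
P_H = 1 - Phi(c - d)        # P_H = P(x > c | w2)

print(round(P_FA, 4))       # -> 0.05, by construction
print(round(P_H, 4))        # the best achievable hit rate at this P_FA
```

Raising the allowed false-alarm rate $a$ lowers $c$ and raises $P_H$: this traces out exactly the ROC curve above.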
### Neyman-Pearson Lemma

Consider the likelihood ratio test above, with $T$ chosen so that $P_{FA}(\phi) = a$. There is no decision rule $\phi'$ such that $P_{FA}(\phi') \le a$ and $P_H(\phi') > P_H(\phi)$.

**Proof.** Let $\phi'$ be any decision rule with $P_{FA}(\phi') = E[\phi'(\mathbf{X}) \mid \omega_1] \le a$. For every $\mathbf{x}$,

$$[\phi(\mathbf{x}) - \phi'(\mathbf{x})]\,[T\, p(\mathbf{x}\mid\omega_2) - p(\mathbf{x}\mid\omega_1)] \ge 0,$$

since where $T\, p(\mathbf{x}\mid\omega_2) > p(\mathbf{x}\mid\omega_1)$ we have $\phi(\mathbf{x}) = 1 \ge \phi'(\mathbf{x})$, and where $T\, p(\mathbf{x}\mid\omega_2) < p(\mathbf{x}\mid\omega_1)$ we have $\phi(\mathbf{x}) = 0 \le \phi'(\mathbf{x})$. Integrating,

$$0 \le \int [\phi(\mathbf{x}) - \phi'(\mathbf{x})]\,[T\, p(\mathbf{x}\mid\omega_2) - p(\mathbf{x}\mid\omega_1)]\, d\mathbf{x} = T\,[P_H(\phi) - P_H(\phi')] - [P_{FA}(\phi) - P_{FA}(\phi')].$$

Since $P_{FA}(\phi) = a \ge P_{FA}(\phi')$, the second bracket is nonnegative, hence $P_H(\phi) \ge P_H(\phi')$. $\blacksquare$