Tutorial 6

Shared by: techmaster
posted:
10/29/2008



• Bias and variance of estimators

• The score and Fisher information

• Cramer-Rao inequality









236607 Visual Recognition Tutorial 1

Estimators and their Properties

• Let $\{p(x \mid \theta)\}$, $\theta \in \Theta$, be a parametric set of distributions. Given a sample $D = x^{(n)} = x_1, \ldots, x_n$ drawn i.i.d. from one of the distributions in the set, we would like to estimate its parameter $\theta$ (thus identifying the distribution).

• An estimator for $\theta$ w.r.t. $D$ is any function $T(D) \in \Theta$. Notice that an estimator is a random variable.

• How do we measure the quality of an estimator?

• Consistency: An estimator $T$ for $\theta$ is consistent if it converges to $\theta$ in probability:

$$T(x^{(n)}) \xrightarrow{\;p\;} \theta \quad \text{as } n \to \infty$$

This is a (desirable) asymptotic property that motivates us to acquire large samples. But we should emphasize that we are also interested in measures for finite (and small!) sample sizes.
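
Consistency can be illustrated with a small simulation. The sketch below is only an illustration under assumed settings: it takes $T$ to be the sample mean of $N(\theta, 1)$ data and estimates $P(|T(x^{(n)}) - \theta| > \epsilon)$ for growing $n$. The function name `exceed_fraction` and the values of `theta`, `eps` and `trials` are hypothetical choices, not part of the tutorial.

```python
import random

def exceed_fraction(n, theta=2.0, eps=0.1, trials=500, seed=0):
    """Estimate P(|T(x^(n)) - theta| > eps), with T the sample mean of N(theta, 1) data."""
    rng = random.Random(seed)
    misses = 0
    for _ in range(trials):
        # One simulated sample of size n and its estimate T(x^(n)).
        estimate = sum(rng.gauss(theta, 1.0) for _ in range(n)) / n
        if abs(estimate - theta) > eps:
            misses += 1
    return misses / trials

# The fraction of samples whose estimate misses theta by more than eps
# should shrink towards zero as n grows.
miss_rates = [exceed_fraction(n) for n in (10, 100, 1000)]
print(miss_rates)
```

For a consistent estimator the printed fractions should decrease towards zero as $n$ increases.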




Estimators and their Properties

• Bias: Define the bias of an estimator $\hat\theta$ to be $b(\hat\theta) = E_\theta[\hat\theta] - \theta$. Here, the expectation is w.r.t. the distribution $p(x \mid \theta)$. The estimator is unbiased if its bias is zero: $b(\hat\theta) = 0$.

• Example: the estimators $x_1$ and $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$, for the mean of a normal distribution, are both unbiased. The estimator $\frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2$ for its variance is biased, whereas the estimator $\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$ is unbiased.
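
The bias of the two variance estimators can be seen empirically. The following is a minimal sketch, assuming zero-mean normal data with $\sigma = 2$ and samples of size $n = 5$; the function name and all parameter values are illustrative assumptions. It averages each estimator over many simulated samples.

```python
import random

def average_variance_estimates(n=5, sigma=2.0, trials=20000, seed=1):
    """Average the 1/n and 1/(n-1) variance estimators over many samples."""
    rng = random.Random(seed)
    sum_biased = 0.0
    sum_unbiased = 0.0
    for _ in range(trials):
        xs = [rng.gauss(0.0, sigma) for _ in range(n)]
        xbar = sum(xs) / n
        squares = sum((x - xbar) ** 2 for x in xs)
        sum_biased += squares / n          # divides by n
        sum_unbiased += squares / (n - 1)  # divides by n - 1
    return sum_biased / trials, sum_unbiased / trials

avg_biased, avg_unbiased = average_variance_estimates()
print(avg_biased, avg_unbiased)
```

With the true variance $\sigma^2 = 4$, the $1/(n-1)$ average should land near 4, while the $1/n$ average should land near $\frac{n-1}{n}\sigma^2 = 3.2$.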









• Variance: another important property of an estimator is its variance $\mathrm{var}_{p(x \mid \theta)}(\hat\theta)$. We would like to find estimators with minimum bias and variance.

• Which is more important, bias or variance?


Risky Estimators

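• Employ our decision-theoretic framework to measure the quality of estimators.

• Abbreviate $\hat\theta = T(x^{(n)})$ and consider the square error loss function $\lambda(\hat\theta, \theta) = (\hat\theta - \theta)^2$.

• The conditional risk associated with $\hat\theta$ when $\theta$ is the true parameter:

$$R(\hat\theta \mid \theta) = E_\theta\left[(\hat\theta - \theta)^2\right] = \int (\hat\theta - \theta)^2\, p(x^{(n)} \mid \theta)\, dx^{(n)}$$

• Claim: $R(\hat\theta \mid \theta) = \mathrm{var}(\hat\theta) + b(\hat\theta)^2$, i.e. variance plus squared bias, where $b(\hat\theta) = E\hat\theta - \theta$.

• Proof:

$$\begin{aligned}
E(\hat\theta - \theta)^2 &= E(\hat\theta - E\hat\theta + E\hat\theta - \theta)^2 \\
&= E(\hat\theta - E\hat\theta)^2 + 2E\left[(\hat\theta - E\hat\theta)(E\hat\theta - \theta)\right] + (E\hat\theta - \theta)^2 \\
&= E(\hat\theta - E\hat\theta)^2 + (E\hat\theta - \theta)^2
\end{aligned}$$

The cross term vanishes because $E\hat\theta - \theta$ is a constant and $E[\hat\theta - E\hat\theta] = 0$.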



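
The decomposition of the risk into variance and squared bias can be checked numerically. The sketch below assumes a deliberately biased estimator, a shrunken sample mean `shrink * xbar` of $N(\theta, 1)$ data; the estimator, the function name and all parameter values are hypothetical and chosen only for illustration.

```python
import random

def risk_decomposition(theta=1.0, n=10, shrink=0.8, trials=5000, seed=2):
    """Return empirical (risk, variance, bias) of the estimator shrink * sample mean."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(trials):
        xbar = sum(rng.gauss(theta, 1.0) for _ in range(n)) / n
        estimates.append(shrink * xbar)  # biased on purpose
    mean_estimate = sum(estimates) / trials
    risk = sum((t - theta) ** 2 for t in estimates) / trials
    variance = sum((t - mean_estimate) ** 2 for t in estimates) / trials
    bias = mean_estimate - theta
    return risk, variance, bias

risk, variance, bias = risk_decomposition()
print(risk, variance + bias ** 2)
```

The two printed numbers agree up to floating-point rounding, because the same algebraic identity holds for empirical averages.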

Bias vs. Variance

• So, for a given level of conditional risk, there is a tradeoff between bias and variance.

• This tradeoff is among the most important facts in pattern recognition and machine learning.

• Classical approach: Consider only unbiased estimators and try to find those with minimum possible variance.

• This approach is not always fruitful:

– Unbiasedness only means that the average of the estimator (w.r.t. $p(x \mid \theta)$) is $\theta$. It doesn’t mean it will be near $\theta$ for a particular sample (if the variance is large).

– In general, an unbiased estimator is not guaranteed to exist.




The Score

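• The score $v$ of the family $p(x \mid \theta)$ is the random variable

$$v = \frac{\partial}{\partial\theta} \ln p(x \mid \theta) = \frac{\partial p(x \mid \theta) / \partial\theta}{p(x \mid \theta)}$$

It measures the “sensitivity” of $p(x \mid \theta)$ as a function of the parameter $\theta$.

• Claim: $E[v] = 0$

• Proof:

$$E[v] = \int \frac{\partial p(x \mid \theta) / \partial\theta}{p(x \mid \theta)}\, p(x \mid \theta)\, dx = \int \frac{\partial}{\partial\theta} p(x \mid \theta)\, dx = \frac{\partial}{\partial\theta} \int p(x \mid \theta)\, dx = \frac{\partial}{\partial\theta} 1 = 0$$

• Corollary: $\mathrm{var}[v] = E\left[(v - E[v])^2\right] = E[v^2]$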


The Score - Example

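• Consider the normal distribution $N(\theta, 1)$:

$$p(x \mid \theta) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{1}{2}(x - \theta)^2\right)$$

$$\ln p(x \mid \theta) = -\frac{1}{2}\ln(2\pi) - \frac{1}{2}(x - \theta)^2$$

$$v = \frac{\partial}{\partial\theta} \ln p(x \mid \theta) = x - \theta$$

• Clearly, $E[v] = E[x - \theta] = E[x] - \theta = 0$

• and $\mathrm{var}(v) = E[v^2] = E[(x - \theta)^2] = \sigma^2 = 1$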
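
The two moments of the score in this example can be checked by simulation. This is a minimal sketch, assuming $\theta = 0.5$ and 50,000 draws; the function name and parameter values are illustrative assumptions.

```python
import random

def score_moments(theta=0.5, samples=50000, seed=3):
    """Empirical mean and variance of the score v = x - theta for x ~ N(theta, 1)."""
    rng = random.Random(seed)
    scores = [rng.gauss(theta, 1.0) - theta for _ in range(samples)]
    mean = sum(scores) / samples
    var = sum((s - mean) ** 2 for s in scores) / samples
    return mean, var

score_mean, score_var = score_moments()
print(score_mean, score_var)
```

The empirical mean should be close to 0 and the empirical variance close to 1.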








The Score - Vector Form

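• In case where $\theta = (\theta_1, \ldots, \theta_k)$ is a vector, the score $v$ is the vector whose $i$-th component is

$$v_i = \frac{\partial}{\partial\theta_i} \ln p(x \mid \theta)$$

• Example:

$$p(x \mid \mu, \sigma) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{1}{2\sigma^2}(x - \mu)^2\right)$$

$$\ln p(x \mid \mu, \sigma) = -\frac{1}{2}\ln(2\pi) - \ln\sigma - \frac{1}{2\sigma^2}(x - \mu)^2$$

$$\frac{\partial}{\partial\mu} \ln p(x \mid \mu, \sigma) = \frac{x - \mu}{\sigma^2}$$

$$\frac{\partial}{\partial\sigma} \ln p(x \mid \mu, \sigma) = -\frac{1}{\sigma} + \frac{(x - \mu)^2}{\sigma^3}$$

$$v = \left(\frac{x - \mu}{\sigma^2},\; -\frac{1}{\sigma} + \frac{(x - \mu)^2}{\sigma^3}\right)$$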

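
The analytic score vector of the normal example can be cross-checked against central finite differences of the log-density. The sketch below is illustrative only: the function names, the evaluation point and the step size `h` are assumptions made for this check.

```python
import math

def log_density(x, mu, sigma):
    """ln p(x | mu, sigma) for the normal distribution."""
    return -0.5 * math.log(2 * math.pi) - math.log(sigma) - (x - mu) ** 2 / (2 * sigma ** 2)

def score(x, mu, sigma):
    """Analytic score vector (d/dmu, d/dsigma) of the log-density."""
    return ((x - mu) / sigma ** 2, -1.0 / sigma + (x - mu) ** 2 / sigma ** 3)

def numeric_score(x, mu, sigma, h=1e-6):
    """Central finite-difference approximation of the same two derivatives."""
    d_mu = (log_density(x, mu + h, sigma) - log_density(x, mu - h, sigma)) / (2 * h)
    d_sigma = (log_density(x, mu, sigma + h) - log_density(x, mu, sigma - h)) / (2 * h)
    return d_mu, d_sigma

analytic = score(1.3, 0.4, 2.0)
numeric = numeric_score(1.3, 0.4, 2.0)
print(analytic, numeric)
```

Agreement of the two tuples to several decimal places is a quick sanity check on the derivatives.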

Fisher Information

• Fisher information: Designed to provide a measure of how much information the parametric probability law $p(x \mid \theta)$ carries about the parameter $\theta$.

• An adequate definition of such information should possess the following properties:

– The larger the sensitivity of $p(x \mid \theta)$ to changes in $\theta$, the larger should be the information.

– The information should be additive: The information carried by the combined law $p(x_1, x_2 \mid \theta)$ should be the sum of those carried by $p(x_1 \mid \theta)$ and $p(x_2 \mid \theta)$.

– The information should be insensitive to the sign of the change in $\theta$ and preferably positive.

– The information should be a deterministic quantity; it should not depend on the specific random observation.


Fisher Information

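• Definition (scalar form): Fisher information (about $\theta$) is the variance of the score:

$$J(\theta) = E_\theta\left[\left(\frac{\partial}{\partial\theta} \ln p(x \mid \theta)\right)^2\right]$$

• Example: consider a random variable $x \sim N(\mu, \sigma^2)$:

$$\ln p(x \mid \mu, \sigma) = -\frac{1}{2}\ln(2\pi) - \ln\sigma - \frac{1}{2\sigma^2}(x - \mu)^2$$

$$v = \frac{\partial}{\partial\mu} \ln p(x \mid \mu, \sigma) = \frac{x - \mu}{\sigma^2}$$

$$J(\mu) = E[v^2] = E\left[\left(\frac{x - \mu}{\sigma^2}\right)^2\right] = \frac{1}{\sigma^4} E\left[(x - \mu)^2\right] = \frac{\sigma^2}{\sigma^4} = \frac{1}{\sigma^2}$$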




Fisher Information - Cntd.

• Whenever $\theta = (\theta_1, \ldots, \theta_k)$ is a vector, Fisher information is the matrix $J(\theta) = [J_{i,j}(\theta)]$ where

$$J_{i,j}(\theta) = \mathrm{cov}_\theta\left(\frac{\partial}{\partial\theta_i} \log p(x \mid \theta),\; \frac{\partial}{\partial\theta_j} \log p(x \mid \theta)\right)$$

• Reminder: $\mathrm{cov}(X, Y) = E\left[(X - E[X])(Y - E[Y])\right]$

• Remark: the Fisher information is only defined whenever the distributions $p(x \mid \theta)$ satisfy some regularity conditions. (For example, they should be differentiable w.r.t. $\theta_i$, and all the distributions in the parametric family must have the same support set.)




Fisher Information - Cntd.

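• Claim: Let $x^{(n)} = x_1, \ldots, x_n$ be i.i.d. random variables $\sim p(x \mid \theta)$. The score of $p(x^{(n)} \mid \theta)$ is the sum of the individual scores.

• Proof:

$$v(x^{(n)}) = \frac{\partial}{\partial\theta} \ln p(x^{(n)} \mid \theta) = \frac{\partial}{\partial\theta} \ln \prod_i p(x_i \mid \theta) = \sum_i \frac{\partial}{\partial\theta} \ln p(x_i \mid \theta) = \sum_i v(x_i)$$

• Example: If $x^{(n)} = x_1, \ldots, x_n$ are i.i.d. $\sim N(\mu, \sigma^2)$, the score (w.r.t. $\mu$) is

$$\frac{\partial}{\partial\mu} \ln p(x^{(n)} \mid \mu, \sigma) = \sum_{i=1}^{n} \frac{x_i - \mu}{\sigma^2}$$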




Fisher Information - Cntd.

• Based on $n$ i.i.d. samples, the Fisher information about $\theta$ is

$$J_n(\theta) = E\left[\left(\frac{\partial}{\partial\theta} \ln p(x^{(n)} \mid \theta)\right)^2\right] = E\left[v^2(x^{(n)})\right] = E\left[\left(\sum_{i=1}^{n} v(x_i)\right)^2\right] = \sum_{i=1}^{n} E\left[v^2(x_i)\right] = nJ(\theta)$$

The cross terms $E[v(x_i)v(x_j)]$, $i \neq j$, vanish because the samples are independent and $E[v] = 0$.

• Thus, the Fisher information is additive w.r.t. i.i.d. random variables.

• Example: Suppose $x^{(n)} = x_1, \ldots, x_n$ are i.i.d. $\sim N(\mu, \sigma^2)$. From the previous example we know that the Fisher information about the parameter $\mu$ based on one sample is $J(\mu) = 1/\sigma^2$. Therefore, based on the entire sample, $J_n(\mu) = n/\sigma^2$.
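
Additivity can be checked with a Monte Carlo estimate of $E[v^2(x^{(n)})]$ for normal data. The sketch below is a rough illustration, assuming $\mu = 1$, $\sigma = 2$ and sample sizes 1 and 8; the function name and every parameter value are hypothetical.

```python
import random

def fisher_information_mc(n, mu=1.0, sigma=2.0, trials=20000, seed=4):
    """Monte Carlo estimate of J_n(mu) = E[v(x^(n))^2] for i.i.d. N(mu, sigma^2) samples."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        # Score of the whole sample: the sum of the individual scores.
        v = sum((rng.gauss(mu, sigma) - mu) / sigma ** 2 for _ in range(n))
        total += v * v
    return total / trials

j1 = fisher_information_mc(1)  # should be near 1 / sigma^2 = 0.25
j8 = fisher_information_mc(8)  # should be near 8 / sigma^2 = 2.0
print(j1, j8)
```

Up to sampling noise, the estimate for $n = 8$ should be about eight times the single-sample estimate.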






The Cramer-Rao Inequality

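• Theorem: Let $\hat\theta$ be an unbiased estimator for $\theta$. Then

$$\mathrm{var}(\hat\theta) \geq \frac{1}{J(\theta)}$$

• Proof: Using $E[v] = 0$ we have:

$$E\left[(v - Ev)(\hat\theta - E\hat\theta)\right] = E\left[v(\hat\theta - E\hat\theta)\right] = E[v\hat\theta] - E\hat\theta\, Ev = E[v\hat\theta]$$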






The Cramer-Rao Inequality - Cntd.

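• Now

$$E[v\hat\theta] = \int \frac{\partial p(x \mid \theta) / \partial\theta}{p(x \mid \theta)}\, \hat\theta\, p(x \mid \theta)\, dx = \int \frac{\partial}{\partial\theta} p(x \mid \theta)\, \hat\theta\, dx = \frac{\partial}{\partial\theta} \int p(x \mid \theta)\, \hat\theta\, dx = \frac{\partial}{\partial\theta} E[\hat\theta] = \frac{\partial\theta}{\partial\theta} = 1$$

The derivative can be moved outside the integral because $\hat\theta$ depends only on the sample, and $E[\hat\theta] = \theta$ because $\hat\theta$ is unbiased.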




The Cramer-Rao Inequality - Cntd.

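• So, $E\left[(v - Ev)(\hat\theta - E\hat\theta)\right] = E[v\hat\theta] = 1$

• By the Cauchy-Schwarz inequality

$$1 = \left(E\left[(v - Ev)(\hat\theta - E\hat\theta)\right]\right)^2 \leq E\left[(v - Ev)^2\right] E\left[(\hat\theta - E\hat\theta)^2\right] = E[v^2]\, \mathrm{var}(\hat\theta) = J(\theta)\, \mathrm{var}(\hat\theta)$$

• Therefore,

$$\mathrm{var}(\hat\theta) \geq \frac{1}{J(\theta)}$$

• For a biased estimator we have:

$$\mathrm{var}(\hat\theta) \geq \frac{\left(1 + \frac{\partial}{\partial\theta}(E\hat\theta - \theta)\right)^2}{J(\theta)}$$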
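
The biased-case bound follows from the same argument as the unbiased proof. A short derivation sketch, writing $b(\theta) = E\hat\theta - \theta$ for the bias as a function of $\theta$ (a shorthand introduced here for convenience) and assuming the same regularity conditions:

```latex
\begin{aligned}
E[v\hat\theta] &= \frac{\partial}{\partial\theta} E[\hat\theta]
  = \frac{\partial}{\partial\theta}\bigl(\theta + b(\theta)\bigr)
  = 1 + b'(\theta), \\
\bigl(1 + b'(\theta)\bigr)^2
  &= \Bigl(E\bigl[(v - Ev)(\hat\theta - E\hat\theta)\bigr]\Bigr)^2
  \le E[v^2]\,\mathrm{var}(\hat\theta)
  = J(\theta)\,\mathrm{var}(\hat\theta), \\
\mathrm{var}(\hat\theta) &\ge \frac{\bigl(1 + b'(\theta)\bigr)^2}{J(\theta)}.
\end{aligned}
```

Setting $b(\theta) = 0$ recovers the unbiased bound $1/J(\theta)$.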






The Cramer-Rao General Case

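• The Cramer-Rao inequality also holds in a general (vector) form: The error covariance matrix for $\hat\theta$ is bounded as follows:

$$C = E\left[(\hat\theta - \theta)(\hat\theta - \theta)^t\right] \geq J^{-1}(\theta)$$

Here the inequality means that $C - J^{-1}(\theta)$ is positive semidefinite.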










The Cramer-Rao Inequality - Cntd.

• Example: Let $x^{(n)} = x_1, \ldots, x_n$ be i.i.d. $\sim N(\mu, \sigma^2)$. From the previous example, $J_n(\mu) = n/\sigma^2$.

• Now let $\hat\mu(x^{(n)}) = \frac{1}{n}\sum_{i=1}^{n} x_i$ be an (unbiased) estimator for $\mu$.

$$\mathrm{var}(\hat\mu) = E(\hat\mu - E\hat\mu)^2 = E(\hat\mu - \mu)^2 = E\hat\mu^2 - 2\mu E\hat\mu + \mu^2 = E\hat\mu^2 - \mu^2$$

$$E\hat\mu^2 = \frac{1}{n^2} E\left(\sum_{i=1}^{n} x_i\right)^2 = \frac{1}{n^2}\left(n^2\mu^2 + n\sigma^2\right) = \mu^2 + \sigma^2/n$$

• So $\mathrm{var}(\hat\mu) = \sigma^2/n$ matches the Cramer-Rao lower bound.
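
A small simulation can illustrate that the sample mean attains the bound. For contrast, the sketch also tracks the sample median, which is likewise unbiased for the mean of a normal distribution but has a larger variance; the median comparison, the function names and the values of `N`, `SIGMA` and `trials` are assumptions added for illustration and are not part of the tutorial.

```python
import random
import statistics

N = 20       # sample size (illustrative)
SIGMA = 3.0  # true standard deviation (illustrative)

def variance(values):
    """Population variance of a list of numbers."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

def estimator_variances(mu=1.0, trials=20000, seed=5):
    """Empirical variances of the sample mean and the sample median."""
    rng = random.Random(seed)
    means, medians = [], []
    for _ in range(trials):
        xs = [rng.gauss(mu, SIGMA) for _ in range(N)]
        means.append(sum(xs) / N)
        medians.append(statistics.median(xs))
    return variance(means), variance(medians)

var_mean, var_median = estimator_variances()
cr_bound = SIGMA ** 2 / N  # 1 / J_n(mu) = sigma^2 / n
print(var_mean, var_median, cr_bound)
```

The variance of the sample mean should sit close to the bound, while the variance of the median should be visibly above it.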

• Def: An unbiased estimator whose covariance meets the Cramer-Rao lower bound is called efficient.


Efficiency

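• Theorem (Efficiency): The unbiased estimator $\hat\theta$ is efficient, that is,

$$E\hat\theta = \theta, \qquad C = E\left[(\hat\theta - \theta)(\hat\theta - \theta)^t\right] = J^{-1}(\theta)$$

iff

$$J(\theta)(\hat\theta - \theta) = v$$

• Proof (If): If $J(\theta)(\hat\theta - \theta) = v$ then

$$E\left[J(\theta)(\hat\theta - \theta)(\hat\theta - \theta)^t J^t(\theta)\right] = J(\theta)\, C\, J^t(\theta) = E[v v^t] = J(\theta)$$

meaning $C = J^{-1}(\theta)$, since $J(\theta)$ is symmetric.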




Efficiency

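• Only if: Recall the cross covariance between $v$ and $(\hat\theta - \theta)$:

$$\left(E\left[v(\hat\theta - \theta)^t\right]\right)^2 = I$$

The Cauchy-Schwarz inequality for random variables says

$$I = \left(E\left[v(\hat\theta - \theta)^t\right]\right)^2 \leq E[v v^t]\, E\left[(\hat\theta - \theta)(\hat\theta - \theta)^t\right] = JC = I$$

For an efficient estimator $C = J^{-1}$, so the inequality holds with equality, which requires $(\hat\theta - \theta)$ to be proportional to $v$:

$$(\hat\theta - \theta) = \alpha v; \qquad C = \alpha^2 J; \qquad \alpha = J^{-1}$$

thus

$$J(\theta)(\hat\theta - \theta) = v$$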










Cramer-Rao Inequality and ML - Cntd.



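• Theorem: Suppose there exists an efficient estimator $\hat\theta$ for all $\theta$. Then the ML estimator $\hat\theta_{ML}$ is $\hat\theta$.

• Proof: By assumption $\mathrm{var}(\hat\theta) = \frac{1}{J(\theta)}$. By the previous claim, $J(\theta)(\hat\theta - \theta) = v$, or

$$\frac{\partial}{\partial\theta} \log p(x \mid \theta) = J(\theta)(\hat\theta - \theta) \quad \text{for all } \theta$$

This holds at $\theta = \hat\theta_{ML}$, and since this is a maximum point the left side is zero, so $\hat\theta = \hat\theta_{ML}$.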
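
For the normal mean this relationship can be verified directly. The sketch below assumes i.i.d. $N(\mu, \sigma^2)$ data with known $\sigma$, where the sample mean is the efficient estimator; the data, the seed and the function names are hypothetical. It checks that the total score equals $J_n(\mu)(\bar{x} - \mu)$ at several values of $\mu$ and that it vanishes at $\mu = \bar{x}$, which is therefore the ML estimate.

```python
import random

rng = random.Random(6)
sigma, n = 1.5, 12
xs = [rng.gauss(0.7, sigma) for _ in range(n)]
xbar = sum(xs) / n  # efficient estimator of mu

def total_score(mu):
    """Score of the whole sample: d/dmu of the log-likelihood."""
    return sum((x - mu) / sigma ** 2 for x in xs)

def efficient_form(mu):
    """J_n(mu) * (estimator - mu), with J_n(mu) = n / sigma^2."""
    return (n / sigma ** 2) * (xbar - mu)

for mu in (-1.0, 0.0, 0.7, 2.0):
    print(mu, total_score(mu), efficient_form(mu))

# The score is zero exactly where mu equals the efficient estimator.
print(total_score(xbar))
```

Both columns agree for every $\mu$, and the score at $\bar{x}$ is zero up to rounding.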




