Tutorial 6

Shared by: techmaster
Posted: 10/29/2008
236607 Visual Recognition Tutorial

• Bias and variance of estimators
• The score and Fisher information
• Cramer-Rao inequality

Estimators and their Properties
• Let $\{p(x \mid \theta)\}$, $\theta \in \Theta$, be a parametric set of distributions. Given a sample $D = x^{(n)} = x_1, \dots, x_n$ drawn i.i.d. from one of the distributions in the set, we would like to estimate its parameter (thus identifying the distribution).
• An estimator for $\theta$ w.r.t. $D$ is any function $T(D) \in \Theta$. Notice that an estimator is a random variable.
• How do we measure the quality of an estimator?
• Consistency: An estimator $T$ for $\theta$ is consistent if
$$T(x^{(n)}) \xrightarrow{\;p\;} \theta \quad \text{as } n \to \infty .$$
This is a (desirable) asymptotic property that motivates us to acquire large samples. But we should emphasize that we are also interested in measures for finite (and small!) sample sizes.

Estimators and their Properties
• Bias: Define the bias of an estimator $\hat\theta$ to be
$$b(\hat\theta) = E_\theta[\hat\theta] - \theta .$$
Here, the expectation is w.r.t. the distribution $p(x \mid \theta)$. The estimator is unbiased if its bias is zero, $b(\hat\theta) = 0$.
• Example: the estimators $x_1$ and $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$, for the mean of a normal distribution, are both unbiased. The estimator $\frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2$ for its variance is biased, whereas the estimator $\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$ is unbiased.
• Variance: another important property of an estimator is its variance $\mathrm{var}_{p(x \mid \theta)}(\hat\theta)$. We would like to find estimators with minimum bias and variance.
• Which is more important, bias or variance?

Risky Estimators
• Employ our decision-theoretic framework to measure the quality of estimators.
• Abbreviate $\hat\theta = T(x^{(n)})$ and consider the square error loss function
$$\lambda(\hat\theta, \theta) = (\hat\theta - \theta)^2 .$$
• The conditional risk associated with $\hat\theta$ when $\theta$ is the true parameter is
$$R(\hat\theta \mid \theta) = E_\theta\big[(\hat\theta - \theta)^2\big] = \int (\hat\theta - \theta)^2\, p(x^{(n)} \mid \theta)\, dx^{(n)} .$$
• Claim: $R(\hat\theta \mid \theta) = \mathrm{var}(\hat\theta) + b^2(\hat\theta)$, i.e. variance + bias$^2$.
• Proof:
$$\begin{aligned}
E\big[(\hat\theta - \theta)^2\big] &= E\big[(\hat\theta - E\hat\theta + E\hat\theta - \theta)^2\big] \\
&= E\big[(\hat\theta - E\hat\theta)^2\big] + 2\,E\big[\hat\theta - E\hat\theta\big]\,(E\hat\theta - \theta) + (E\hat\theta - \theta)^2 \\
&= E\big[(\hat\theta - E\hat\theta)^2\big] + (E\hat\theta - \theta)^2 = \text{variance} + \text{bias}^2 .
\end{aligned}$$
The cross term vanishes because $E[\hat\theta - E\hat\theta] = 0$.

Bias vs. Variance
• So, for a given level of conditional risk, there is a tradeoff between bias and variance.
• This tradeoff is among the most important facts in pattern recognition and machine learning.
• Classical approach: Consider only unbiased estimators and try to find those with minimum possible variance.
• This approach is not always fruitful:
– Unbiasedness only means that the average of the estimator (w.r.t. $p(x \mid \theta)$) is $\theta$. It doesn't mean the estimate will be near $\theta$ for a particular sample (if the variance is large).
– In general, an unbiased estimate is not guaranteed to exist.

The Score
• The score $v$ of the family $p(x \mid \theta)$ is the random variable
$$v = \frac{\partial}{\partial\theta} \ln p(x \mid \theta) = \frac{\partial p(x \mid \theta)/\partial\theta}{p(x \mid \theta)} .$$
It measures the "sensitivity" of $p(x \mid \theta)$ as a function of the parameter $\theta$.
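Returning to the claim that the conditional risk splits into variance plus squared bias: the identity can be checked numerically on the two variance estimators from the bias example. The following is a minimal sketch, not part of the original slides; the function names, the sample size n = 5, the true variance 4 and the number of trials are all illustrative assumptions.

```python
import random
import statistics


def variance_estimates(sample):
    """Return (biased, unbiased) variance estimates: divide by n and by n - 1."""
    n = len(sample)
    mean = sum(sample) / n
    ss = sum((x - mean) ** 2 for x in sample)
    return ss / n, ss / (n - 1)


def risk_decomposition(estimates, theta):
    """Empirical (risk, variance, bias) of a list of estimates of theta."""
    risk = statistics.fmean([(t - theta) ** 2 for t in estimates])
    var = statistics.pvariance(estimates)
    bias = statistics.fmean(estimates) - theta
    return risk, var, bias


def simulate(n=5, sigma2=4.0, trials=20000, seed=0):
    """Monte Carlo over many samples of size n drawn from N(0, sigma2)."""
    rng = random.Random(seed)
    sigma = sigma2 ** 0.5
    biased, unbiased = [], []
    for _ in range(trials):
        sample = [rng.gauss(0.0, sigma) for _ in range(n)]
        b, u = variance_estimates(sample)
        biased.append(b)
        unbiased.append(u)
    return risk_decomposition(biased, sigma2), risk_decomposition(unbiased, sigma2)


if __name__ == "__main__":
    for name, (risk, var, bias) in zip(("1/n", "1/(n-1)"), simulate()):
        print(f"{name:8s} risk={risk:.3f}  var={var:.3f}  bias={bias:.3f}  "
              f"var+bias^2={var + bias ** 2:.3f}")
```

Under these assumptions the biased 1/n estimator should come out with the smaller risk, which illustrates why restricting attention to unbiased estimators is not always the best choice.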
• Claim: $E[v] = 0$
• Proof:
$$E[v] = \int \frac{\partial p(x \mid \theta)/\partial\theta}{p(x \mid \theta)}\, p(x \mid \theta)\, dx = \int \frac{\partial}{\partial\theta} p(x \mid \theta)\, dx = \frac{\partial}{\partial\theta} \int p(x \mid \theta)\, dx = \frac{\partial}{\partial\theta} 1 = 0 .$$
• Corollary: $\mathrm{var}[v] = E\big[(v - E[v])^2\big] = E[v^2]$.

The Score - Example
• Consider the normal distribution $N(\mu, 1)$:
$$p(x \mid \mu) = \frac{1}{\sqrt{2\pi}} \exp\!\left(-\tfrac{1}{2}(x - \mu)^2\right), \qquad \ln p(x \mid \mu) = -\tfrac{1}{2}\ln(2\pi) - \tfrac{1}{2}(x - \mu)^2 .$$
• Clearly,
$$v = \frac{\partial}{\partial\mu} \ln p(x \mid \mu) = x - \mu ,$$
• and
$$E[v] = E[x - \mu] = E[x] - \mu = 0, \qquad \mathrm{var}(v) = E[v^2] = E[(x - \mu)^2] = \sigma^2 = 1 .$$

The Score - Vector Form
• In case where $\theta = (\theta_1, \dots, \theta_k)$ is a vector, the score $v$ is the vector whose $i$th component is
$$v_i = \frac{\partial}{\partial\theta_i} \ln p(x \mid \theta) .$$
• Example:
$$p(x \mid \mu, \sigma) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left(-\frac{(x - \mu)^2}{2\sigma^2}\right)$$
$$\ln p(x \mid \mu, \sigma) = -\tfrac{1}{2}\ln(2\pi) - \ln\sigma - \frac{(x - \mu)^2}{2\sigma^2}$$
$$\frac{\partial}{\partial\mu} \ln p(x \mid \mu, \sigma) = \frac{x - \mu}{\sigma^2}, \qquad \frac{\partial}{\partial\sigma} \ln p(x \mid \mu, \sigma) = -\frac{1}{\sigma} + \frac{(x - \mu)^2}{\sigma^3}$$
$$v = \left(\frac{x - \mu}{\sigma^2},\; -\frac{1}{\sigma} + \frac{(x - \mu)^2}{\sigma^3}\right)$$

Fisher Information
• Fisher information: Designed to provide a measure of how much information the parametric probability law $p(x \mid \theta)$ carries about the parameter $\theta$.
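The score formulas for the normal family with parameters $(\mu, \sigma)$, and the claim $E[v] = 0$, can be sanity-checked numerically. The sketch below is an illustration added here, not part of the original slides: it compares the analytic score with a central finite difference of the log density and averages the score over simulated draws. The function names, step size, seed and parameter values are assumptions.

```python
import math
import random


def log_density(x, mu, sigma):
    """ln p(x | mu, sigma) for the normal distribution."""
    return (-0.5 * math.log(2 * math.pi) - math.log(sigma)
            - (x - mu) ** 2 / (2 * sigma ** 2))


def score(x, mu, sigma):
    """Analytic score vector (d/d mu, d/d sigma) of the log density."""
    return ((x - mu) / sigma ** 2,
            -1.0 / sigma + (x - mu) ** 2 / sigma ** 3)


def numeric_score(x, mu, sigma, h=1e-6):
    """Central finite-difference approximation of the same two derivatives."""
    d_mu = (log_density(x, mu + h, sigma) - log_density(x, mu - h, sigma)) / (2 * h)
    d_sigma = (log_density(x, mu, sigma + h) - log_density(x, mu, sigma - h)) / (2 * h)
    return d_mu, d_sigma


def mean_score(mu, sigma, trials=100000, seed=1):
    """Monte Carlo average of the score under x ~ N(mu, sigma^2)."""
    rng = random.Random(seed)
    s_mu = s_sigma = 0.0
    for _ in range(trials):
        v_mu, v_sigma = score(rng.gauss(mu, sigma), mu, sigma)
        s_mu += v_mu
        s_sigma += v_sigma
    return s_mu / trials, s_sigma / trials


if __name__ == "__main__":
    print("analytic:", score(1.3, 0.5, 2.0))
    print("numeric: ", numeric_score(1.3, 0.5, 2.0))
    print("mean score (should be near 0):", mean_score(0.5, 2.0))
```

Both components of the averaged score should be close to zero, up to Monte Carlo noise.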
• An adequate definition of such information should possess the following properties:
– The larger the sensitivity of $p(x \mid \theta)$ to changes in $\theta$, the larger should be the information.
– The information should be additive: the information carried by the combined law $p(x_1, x_2 \mid \theta)$ should be the sum of those carried by $p(x_1 \mid \theta)$ and $p(x_2 \mid \theta)$.
– The information should be insensitive to the sign of the change in $\theta$, and preferably positive.
– The information should be a deterministic quantity; it should not depend on the specific random observation.

Fisher Information
• Definition (scalar form): Fisher information (about $\theta$) is the variance of the score:
$$J(\theta) = E\left[\left(\frac{\partial}{\partial\theta} \ln p(x \mid \theta)\right)^2\right] .$$
• Example: consider a random variable $x \sim N(\mu, \sigma^2)$:
$$\ln p(x \mid \mu, \sigma) = -\tfrac{1}{2}\ln(2\pi) - \ln\sigma - \frac{(x - \mu)^2}{2\sigma^2}$$
$$v = \frac{\partial}{\partial\mu} \ln p(x \mid \mu, \sigma) = \frac{x - \mu}{\sigma^2}$$
$$J(\mu) = E[v^2] = E\left[\left(\frac{x - \mu}{\sigma^2}\right)^2\right] = \frac{1}{\sigma^4} E\big[(x - \mu)^2\big] = \frac{\sigma^2}{\sigma^4} = \frac{1}{\sigma^2} .$$

Fisher Information - Cntd.
• Whenever $\theta = (\theta_1, \dots, \theta_k)$ is a vector, Fisher information is the matrix $J(\theta) = [J_{i,j}(\theta)]$ where
$$J_{i,j}(\theta) = \mathrm{cov}\left(\frac{\partial}{\partial\theta_i} \log p(x \mid \theta),\; \frac{\partial}{\partial\theta_j} \log p(x \mid \theta)\right) .$$
• Reminder: $\mathrm{cov}[X, Y] = E\big[(X - E[X])(Y - E[Y])\big]$.
• Remark: the Fisher information is only defined whenever the distributions $p(x \mid \theta)$ satisfy some regularity conditions. (For example, they should be differentiable w.r.t. $\theta_i$, and all the distributions in the parametric family must have the same support set.)

Fisher Information - Cntd.
• Claim: Let $x^{(n)} = x_1, \dots, x_n$ be i.i.d. random variables $\sim p(x \mid \theta)$. The score of $p(x^{(n)} \mid \theta)$ is the sum of the individual scores.
• Proof:
$$v(x^{(n)}) = \frac{\partial}{\partial\theta} \ln p(x^{(n)} \mid \theta) = \frac{\partial}{\partial\theta} \ln \prod_i p(x_i \mid \theta) = \sum_i \frac{\partial}{\partial\theta} \ln p(x_i \mid \theta) = \sum_i v(x_i) .$$
• Example: If $x^{(n)} = x_1, \dots, x_n$ are i.i.d.
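$\sim N(\mu, \sigma^2)$, the score with respect to $\mu$ is
$$\frac{\partial}{\partial\mu} \ln p(x^{(n)} \mid \mu, \sigma) = \sum_{i=1}^{n} \frac{x_i - \mu}{\sigma^2} .$$

Fisher Information - Cntd.
• Based on $n$ i.i.d. samples, the Fisher information about $\theta$ is
$$J_n(\theta) = E\left[\left(\frac{\partial}{\partial\theta} \ln p(x^{(n)} \mid \theta)\right)^2\right] = E\big[v^2(x^{(n)})\big] = E\left[\left(\sum_{i=1}^{n} v(x_i)\right)^2\right] = \sum_{i=1}^{n} E\big[v^2(x_i)\big] = nJ(\theta) .$$
The cross terms $E[v(x_i)v(x_j)]$, $i \ne j$, vanish because the individual scores are independent and have zero mean.
• Thus, the Fisher information is additive w.r.t. i.i.d. random variables.
• Example: Suppose $x^{(n)} = x_1, \dots, x_n$ are i.i.d. $\sim N(\mu, \sigma^2)$. From the previous example we know that the Fisher information about the parameter $\mu$ based on one sample is $J(\mu) = 1/\sigma^2$. Therefore, based on the entire sample, $J_n(\mu) = n/\sigma^2$.

The Cramer-Rao Inequality
• Theorem: Let $\hat\theta$ be an unbiased estimator for $\theta$. Then
$$\mathrm{var}(\hat\theta) \ge \frac{1}{J(\theta)} .$$
• Proof: Using $E[v] = 0$ we have:
$$E\big[(v - Ev)(\hat\theta - E\hat\theta)\big] = E\big[v(\hat\theta - E\hat\theta)\big] = E[v\hat\theta] - E\hat\theta\, E[v] = E[v\hat\theta] .$$

The Cramer-Rao Inequality - Cntd.
• Now
$$E[v\hat\theta] = \int \frac{\partial p(x \mid \theta)/\partial\theta}{p(x \mid \theta)}\, \hat\theta\, p(x \mid \theta)\, dx = \int \frac{\partial}{\partial\theta} p(x \mid \theta)\, \hat\theta\, dx = \frac{\partial}{\partial\theta} \int p(x \mid \theta)\, \hat\theta\, dx = \frac{\partial}{\partial\theta} E[\hat\theta] = \frac{\partial\theta}{\partial\theta} = 1 .$$
The estimator $\hat\theta$ depends only on the sample, not on $\theta$, so it passes through the derivative; the last steps use unbiasedness, $E[\hat\theta] = \theta$.

The Cramer-Rao Inequality - Cntd.
• So,
$$E\big[(v - Ev)(\hat\theta - E\hat\theta)\big] = E[v\hat\theta] = 1 .$$
• By the Cauchy-Schwarz inequality,
$$1 = \Big(E\big[(v - Ev)(\hat\theta - E\hat\theta)\big]\Big)^2 \le E\big[(v - Ev)^2\big]\, E\big[(\hat\theta - E\hat\theta)^2\big] = E[v^2]\, \mathrm{var}(\hat\theta) = J(\theta)\, \mathrm{var}(\hat\theta) .$$
• Therefore,
$$\mathrm{var}(\hat\theta) \ge \frac{1}{J(\theta)} .$$
• For a biased estimator we have:
$$\mathrm{var}(\hat\theta) \ge \frac{\left(\frac{\partial}{\partial\theta} E[\hat\theta]\right)^2}{J(\theta)} .$$

The Cramer-Rao General Case
• The Cramer-Rao inequality is also true in general form: the error covariance matrix for $\hat{\boldsymbol\theta}$ is bounded as follows:
$$C = E\big[(\hat{\boldsymbol\theta} - \boldsymbol\theta)(\hat{\boldsymbol\theta} - \boldsymbol\theta)^t\big] \ge J^{-1}(\boldsymbol\theta) ,$$
where the inequality means that $C - J^{-1}(\boldsymbol\theta)$ is positive semidefinite.

The Cramer-Rao Inequality - Cntd.
• Example: Let $x^{(n)} = x_1, \dots, x_n$ be i.i.d. $\sim N(\theta, \sigma^2)$.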
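From the previous example, $J_n(\theta) = n/\sigma^2$.
• Now let $\hat\theta(x^{(n)}) = \frac{1}{n}\sum_{i=1}^{n} x_i$ be an (unbiased) estimator for $\theta$. Then
$$\begin{aligned}
\mathrm{var}(\hat\theta) &= E\big[(\hat\theta - E\hat\theta)^2\big] = E[\hat\theta^2] - 2\theta\, E[\hat\theta] + \theta^2 = E[\hat\theta^2] - \theta^2 \\
&= \frac{1}{n^2} E\left[\left(\sum_{i=1}^{n} x_i\right)^2\right] - \theta^2 = \frac{1}{n^2}\big(n\sigma^2 + n^2\theta^2\big) - \theta^2 = \frac{\sigma^2}{n} .
\end{aligned}$$
• So $\mathrm{var}(\hat\theta) = \sigma^2/n$ matches the Cramer-Rao lower bound.
• Def: An unbiased estimator whose covariance meets the Cramer-Rao lower bound is called efficient.

Efficiency
• Theorem (Efficiency): The unbiased estimator $\hat{\boldsymbol\theta}$ is efficient, that is,
$$E\hat{\boldsymbol\theta} = \boldsymbol\theta, \qquad C = E\big[(\hat{\boldsymbol\theta} - \boldsymbol\theta)(\hat{\boldsymbol\theta} - \boldsymbol\theta)^t\big] = J^{-1}(\boldsymbol\theta) ,$$
iff
$$J(\boldsymbol\theta)(\hat{\boldsymbol\theta} - \boldsymbol\theta) = \boldsymbol\nu .$$
• Proof (If): If $J(\boldsymbol\theta)(\hat{\boldsymbol\theta} - \boldsymbol\theta) = \boldsymbol\nu$ then
$$E\big[J(\boldsymbol\theta)(\hat{\boldsymbol\theta} - \boldsymbol\theta)(\hat{\boldsymbol\theta} - \boldsymbol\theta)^t J^t(\boldsymbol\theta)\big] = J(\boldsymbol\theta)\, C\, J^t(\boldsymbol\theta) = E[\boldsymbol\nu\boldsymbol\nu^t] = J(\boldsymbol\theta) ,$$
meaning $C = J^{-1}(\boldsymbol\theta)$.

Efficiency
• Only if: Recall the cross covariance between $\boldsymbol\nu$ and $(\hat{\boldsymbol\theta} - \boldsymbol\theta)$:
$$E\big[\boldsymbol\nu(\hat{\boldsymbol\theta} - \boldsymbol\theta)^t\big] = I .$$
The Cauchy-Schwarz inequality for random variables says
$$I = \Big(E\big[\boldsymbol\nu(\hat{\boldsymbol\theta} - \boldsymbol\theta)^t\big]\Big)^2 \le E[\boldsymbol\nu\boldsymbol\nu^t]\, E\big[(\hat{\boldsymbol\theta} - \boldsymbol\theta)(\hat{\boldsymbol\theta} - \boldsymbol\theta)^t\big] = JC .$$
For an efficient estimator $JC = I$, so equality holds, which requires $(\hat{\boldsymbol\theta} - \boldsymbol\theta) = \alpha\boldsymbol\nu$ for some constant $\alpha$. Then $C = \alpha^2 J$, and $C = J^{-1}$ gives $\alpha = J^{-1}$; thus
$$J(\boldsymbol\theta)(\hat{\boldsymbol\theta} - \boldsymbol\theta) = \boldsymbol\nu .$$

Cramer-Rao Inequality and ML - Cntd.
• Theorem: Suppose there exists an efficient estimator $\hat\theta$ for all $\theta$. Then the ML estimator $\hat\theta_{ML}$ is $\hat\theta$.
• Proof: By assumption,
$$\mathrm{var}(\hat\theta) = \frac{1}{J(\theta)} .$$
By the previous claim, $v = J(\theta)(\hat\theta - \theta)$, or
$$\frac{\partial}{\partial\theta} \log p(x \mid \theta) = J(\theta)(\hat\theta - \theta) \quad \text{for all } \theta .$$
This holds at $\theta = \hat\theta_{ML}$, and since this is a maximum point the left side is zero, so $\hat\theta = \hat\theta_{ML}$.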
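The efficiency of the sample mean in the normal example can be illustrated by simulation. The following is a minimal sketch under stated assumptions, not part of the original slides: it estimates the variance of the sample mean over many samples and compares it with the bound $1/J_n(\theta) = \sigma^2/n$. The sample median is included as a second unbiased estimator purely for contrast; it is not discussed in the slides. The parameter values, trial count and function names are illustrative.

```python
import math
import random
import statistics


def cramer_rao_bound(sigma2, n):
    """1 / J_n(theta) for n i.i.d. N(theta, sigma2) samples, with J_n = n / sigma2."""
    return sigma2 / n


def estimator_variances(theta=1.0, sigma2=4.0, n=25, trials=20000, seed=2):
    """Empirical variances of the sample mean and sample median across many samples."""
    rng = random.Random(seed)
    sigma = math.sqrt(sigma2)
    means, medians = [], []
    for _ in range(trials):
        sample = [rng.gauss(theta, sigma) for _ in range(n)]
        means.append(statistics.fmean(sample))
        medians.append(statistics.median(sample))
    return statistics.pvariance(means), statistics.pvariance(medians)


if __name__ == "__main__":
    bound = cramer_rao_bound(4.0, 25)
    var_mean, var_median = estimator_variances()
    print(f"Cramer-Rao bound:   {bound:.4f}")
    print(f"var(sample mean):   {var_mean:.4f}")
    print(f"var(sample median): {var_median:.4f}")
```

Up to Monte Carlo noise, the variance of the sample mean should sit at the bound, while the median, although also unbiased for a symmetric distribution, should show a visibly larger variance.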
