DATA ANALYSIS by suchenfz



Module Code: CA660

        Lecture Block 5
 ESTIMATION – Rationale, Summary & Other Features
• Estimator validity – how good?
• Basic statistical properties (variance, bias, distributional
  etc.)
• Bias = E(θ̂) − θ, where θ̂ is the point estimate, θ the true
  parameter. Bias can be positive, negative or zero.
  Permits calculation of other properties, e.g. MSE = E[(θ̂ − θ)²],
  where this quantity and the variance of the estimator are the
  same if the estimator is unbiased.
  Obtained by both analytical and “bootstrap” methods.
  For a discrete variable:      Bias = Σ_j θ̂_j f(x_j) − θ
  Similarly, for continuous variables:      Bias = ∫ θ̂ f(x) dx − θ
  or, for b bootstrap replications:      Bias_B = (1/b) Σ_{i=1..b} θ̂_i − θ
                                                                  2
Estimation Rationale etc. - contd.
• For any estimator θ̂, even an unbiased one, there is a
  difference between the estimator and the true parameter =
  sampling error
  Hence the need for probability statements around θ̂:
        P{T1 ≤ θ̂ ≤ T2} = γ
  with C.I. for the estimator = (T1, T2), similarly to before, and γ
  the confidence coefficient. If the estimator is unbiased, then,
  in other words, γ is the probability that the true parameter
  falls into the interval.

• In general, confidence intervals can be determined using
  parametric and non-parametric approaches, where
  parametric construction needs a pivotal quantity =
  variable which is a function of parameter and data, but
  whose distribution does not depend on the parameter.
                                                                    3
   HYPOTHESIS TESTING - Rationale
• Starting Point of scientific research
   e.g. No Genetic Linkage between the genetic markers
  and genes when we design a linkage mapping experiment
  H0 : r = 0.5 (No Linkage) (2-locus linkage experiment)
  H1 : r ≠ 0.5 (two loci linked, with specified R.F. = 0.2, say)

• Critical Region
  Given the cumulative probability distribution fn. of a test
  statistic, F(x) say, the critical region for the hypothesis
  test is the region of rejection in the distribution, i.e. the
  area under the probability curve where the observed test
  statistic value is unlikely to be observed if H0 true. α =
  significance level
        [1 − F(x)] ≤ α
                                                                  4
HT: Critical Regions and Symmetry

• For a symmetric 2-tailed hypothesis test:
        [1 − F(x)] ≤ α/2   or   1 − α/2 ≤ F(x)
  distinction = uni- or bi-directional alternative hypotheses

• Non-symmetric, 2-tailed:
        [1 − F(x)] ≤ a ,   1 − b ≤ F(x)
  with 0 ≤ a ≤ α, 0 ≤ b ≤ α, a + b = α

• For a = 0 or b = 0, this reduces to the 1-tailed case
                                                                 5
HT-Critical Values and Significance
• Cut-off values for Rejection and Acceptance regions =
  Critical Values, so hypothesis test can be interpreted as
  comparison between critical values and observed
  hypothesis test statistic, i.e.
                     x  x one  tailed
                    x  xU
                           two  tailed,U upper, L lower
                    xL  x

• Significance Level : p-value is the probability of
  observing a sample outcome if H0 true
                                       p  value  1  F ( x)
                                                           ˆ
      ˆ
   F (x) is cum. prob. that expected value less than observed
  test statistic for data under H0. For p-value less than or
  equal to , H0 rejected at significance level  and below.
                                                                6
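The rejection rule above is easy to check numerically. A minimal Python sketch, not from the slides, taking the standard normal as the test-statistic distribution and an assumed observed value u = 1.96:

```python
from math import erf, sqrt

def phi(z):
    # Standard normal CDF via the error function.
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

# One-tailed p-value for an observed standardized test statistic u:
# p = 1 - F(u) under H0.
u_obs = 1.96
p_value = 1.0 - phi(u_obs)
print(round(p_value, 4))  # 0.025, so H0 rejected at alpha = 0.05, one-tailed
```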
Related issues in Hypothesis Testing -
         POWER of the TEST
• Probability of False Positive and False Negative errors
  e.g. false positive if linkage between two genes is
  declared, when they are really independent

                         Hypothesis Test Result
        Fact             Accept H0            Reject H0

        H0 True          1 − α                False positive
                                              = Type I error = α

        H0 False         False negative       Power of the Test
                         = Type II error = β  = 1 − β

• Power of the Test or Statistical Power = probability of
  rejecting H0 when correct to do so. (Related strictly to the
  alternative hypothesis and β)                                 7
Example on Type II Error and Power

• Suppose we have a variable with known population S.D. σ =
  3.6. From the population, a r.s. of size n = 100 is used to test, at
  α = 0.05,
        H0 : μ = 17.5    H1 : μ ≠ 17.5
• Critical values of x̄ for a 2-sided test are:
        x̄_i = μ0 ± U_{α/2} (σ/√n) = μ0 ± 1.96 (σ/√n)
  for α = 0.05, where for x̄_i, i = upper or lower, and μ0 = mean
  under H0
• So substituting our values gives:
  x̄_U = 17.50 + 1.96(0.36) = 18.21 ;  x̄_L = 17.50 − 1.96(0.36) = 16.79

• But, if H0 is false, μ is not 17.5 but some other value, e.g.
  16.5 say??
                                                                              8
Example contd.
• Want the new distribution with mean μ = 16.5, i.e. the new
  distribution is shifted w.r.t. the old.
• Thus the probability of the Type II error - failing to reject a
  false H0 - is the area under the curve in the new distribution
  which overlaps the non-rejection region specified under H0
• So, this is
        β = P{ (16.79 − 16.5)/0.36 ≤ U ≤ (18.21 − 16.5)/0.36 }
          = P{ 0.81 ≤ U ≤ 4.75 }
          = 1 − 0.7910 = 0.209
• Thus, the probability of taking the appropriate action
  (rejecting H0 when this is false) is 0.791 = Power
                                                                  9
      Shifting the distributions

  [Figure: two sampling distributions of x̄. Under H0, f(x̄ | μ0) is centred
  at 17.5, with rejection regions of α/2 in each tail beyond the critical
  values 16.79 and 18.21. Under H1, f(x̄ | μ1) is the same curve shifted to
  centre 16.5; its overlap with the non-rejection region (16.79, 18.21)
  is β.]
                                                                            10
Example contd. - Power under alternative μ
                 for given α

  Possible values of μ          β          1 − β
  under H1, H0 false
        16.0                  0.0143      0.9857
        16.5                  0.2090      0.7910
        17.0                  0.7190      0.2810
        18.0                  0.7190      0.2810
        18.5                  0.2090      0.7910
        19.0                  0.0143      0.9857

• Balancing α and β: β tends to be large c.f. α unless the
  original hypothesis is way off. So a decision based on a
  rejected H0 is more conclusive than one based on H0 not
  rejected, as the probability of being wrong is larger in the latter case.
                                                                 11
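The β / power table above can be reproduced directly from the standard normal CDF. A short std-lib-only Python check (not from the slides); exact values differ from the table in the third decimal because the slide rounds the standardized limits first:

```python
from math import erf, sqrt

def phi(z):
    # Standard normal CDF via the error function.
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

se = 3.6 / sqrt(100)      # sigma / sqrt(n) = 0.36
lo, hi = 16.79, 18.21     # non-rejection region fixed under H0: mu = 17.5

# Type II error beta = P{lo <= x-bar <= hi}, computed under each true mean.
table = {}
for mu in (16.0, 16.5, 17.0, 18.0, 18.5, 19.0):
    beta = phi((hi - mu) / se) - phi((lo - mu) / se)
    table[mu] = (round(beta, 3), round(1.0 - beta, 3))
    print(mu, table[mu])
```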
     SAMPLE SIZE DETERMINATION
• Example: Suppose we wanted to design a genetic mapping
  experiment; usually mating design (or population mapping
  type, so conventional experimental design - ANOVA),
  genetic marker type and sample size are considered.

   Questions might include:
   What is the statistical power to detect linkage for certain
  progeny size?
   What is the precision of estimated R.F. when sample size
  is N?

• Sample size needed for specific Statistical Power
• Sample size needed for specific Confidence Interval
                                                                 12
Sample size - calculation based on C.I.
For some parameter θ for which the normal approximation approach is
  valid, the C.I. is
        θ̂ ± U_{1−α/2} σ_θ̂

U = standardized normal deviate (S.N.D.), and the range from
  lower to upper limits is
        d_LU = 3.92 σ_θ̂ = 2 (1.96) σ_θ̂

d_LU is just a precision measurement for the estimator.
Given a true parameter θ,
        σ²_θ̂ = σ²/n   or, in Information terms,   σ²_θ̂ = 1/(n I(θ))
So manipulation gives:
        n = (3.92 σ / d_LU)² = (2U)² σ² / d²_LU   or   n = (2U)² / (d²_LU I(θ))
                                                                13
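The rearrangement above gives a one-line sample-size calculator. A Python sketch, not from the slides: σ is taken from the earlier example and the target interval width d_LU = 1.0 is an assumed value for illustration:

```python
from math import ceil

# Sample size for a target C.I. width (normal approximation):
# n = (2U)^2 * sigma^2 / d_LU^2, with U = 1.96 for a 95% interval.
U = 1.96
sigma = 3.6      # population S.D. from the earlier example
d_LU = 1.0       # desired total width of the 95% C.I. (assumed target)

n = ceil((2 * U) ** 2 * sigma ** 2 / d_LU ** 2)
print(n)  # 200
```

Halving the target width quadruples the required n, since d_LU enters squared.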
   Sample size - calculation based on Power
             (firstly, what affects power?)
• Suppose α = 0.05, σ = 3.5, n = 100, testing H0: μ0 = 25 when the true
  μ = 24; assume H1: μ1 < 25
  - One-tailed test (U = 1.645): critical sample mean under H0 is
        x̄_L = μ0 − 1.645 (σ/√n) = 25 − 1.645 (0.35) = 24.42
  Under H1 (true μ = 24):
        U = (24.42 − 24) / 0.35 = 1.21
  so Power = P{U ≤ 1.21} = 0.50 + 0.39 = 0.89
Note: A two-sided test at α = 0.05 gives critical values
  x̄_L = 24.31, x̄_U = 25.69 under H0; equivalently U_L = +0.89, U_U = +4.82 for H1
  (i.e. substitute the x̄ limits in the eqn. & then recalculate for the new μ = μ1 = 24)
So, P{do not reject H0: μ = 25 when the true mean μ = 24} = 0.1867 = β (Type II)
     Thus, Power = 1 − 0.1867 = 0.8133                                    14
Sample Size and Power contd.
• Suppose n = 25, other values the same, 1-tailed now:
        x̄_L = 23.85, U_L = −0.22,   Power = 0.4129
• Suppose α = 0.01 (n = 100 again): critical values, 2-tailed, are
  x̄_L = 24.10, x̄_U = 25.90, with, equivalently, U_L = +0.29, U_U = +5.43
So, P{do not reject H0: μ = 25 when the true mean μ = 24} = 0.3859 = β
        Power = 0.6141 (smaller α, hence lower power than 0.8133 above)

FACTORS: α, σ, n and type of test (1- or 2-sided), true
  parameter value. In general,
        n = σ² (U_α + U_β)² / (μ0 − μ1)²

where subscripts 0 and 1 refer to null and alternative, and the U_α
  value may refer to the 1-sided or the 2-sided value                 15
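The last formula can be applied directly. A Python sketch, not from the slides, using the example's σ and means with an assumed one-sided α = 0.05 (U_α = 1.645) and an assumed target power of 0.80 (U_β ≈ 0.84):

```python
from math import ceil

# n = sigma^2 (U_alpha + U_beta)^2 / (mu0 - mu1)^2
sigma, mu0, mu1 = 3.5, 25.0, 24.0
U_alpha = 1.645   # one-sided alpha = 0.05
U_beta = 0.84     # target power 0.80 (assumed), so U_beta = z_{0.80}

n = ceil(sigma ** 2 * (U_alpha + U_beta) ** 2 / (mu0 - mu1) ** 2)
print(n)  # 76
```

Note how n grows with σ² and with the demanded power, and shrinks with the square of the effect size (μ0 − μ1).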
        ‘Other’ ways to estimate/test
      NON-PARAMETRICS/DISTN FREE
• Standard pdfs do not apply to the data, sampling distributions or
  test statistics – uncertain due to small or unreliable data sets,
  non-independence etc. Parameter estimation - not the key issue.

• Example / Empirical basis. Weaker assumptions. Less
  ‘information’ e.g. median used. Simple hypothesis testing as
  opposed to estimation. Power and efficiency are issues.
  Counts - nominal, ordinal (natural non-parametric data types).
• Non-parametric Hypothesis Tests - (parallels to the
  parametric case).
  e.g. H.T. of locus orders requires complex test statistic
  distributions, so need to construct an empirical pdf. Usually, the
  distribution under the null hypothesis is built up using re-sampling
  techniques, e.g. permutation tests, bootstrap, jackknife.
                                                                16
          LIKELIHOOD - DEFINITIONS

• Suppose X can take a set of values x1, x2,… with
        L(θ) = P{X = x | θ}
  where θ is a vector of parameters affecting the observed x’s
• e.g. θ = (μ, σ²). So we can say something about P{X} if we
  know, say, X ~ N(μ, σ²)
• But this is not usually the case, i.e. we observe the x’s, knowing nothing of θ
• Assuming the x’s are a random sample of size n from a known
  distribution, then the

  likelihood is    L(θ) = L(x1, x2,…, xn | θ) = Π_{i=1..n} L(xi | θ)

• Finding the most likely θ for the given data is equivalent to
  maximising the likelihood function (where the M.L.E. is θ̂)
                                                                               17
LIKELIHOOD – SCORE and INFO. CONTENT
• The Log-likelihood is a support function [S(θ)] evaluated at a
  point, θ′ say
• The support function for any other point, say θ′′, can also be obtained –
  basis for computational iterations for the MLE, e.g. Newton-Raphson
• SCORE = first derivative of the support function w.r.t. the parameter:
        d[S(θ)]/dθ   or, numerically/discretely,   [S(θ + Δθ) − S(θ)] / Δθ
• INFORMATION CONTENT evaluated at (i) an arbitrary point =
  Observed Info.; (ii) the support function maximum = Expected Info.

        I(θ) = E[ (∂ log L(θ | x) / ∂θ)² ] = −E[ ∂² log L(θ | x) / ∂θ² ]
                                                                                   18
Example - Binomial variable
(e.g. use of Score, Expected Info. Content to determine type of
mapping population and sample size for genomics experiments)

Likelihood function
        L(θ) = L(n, p) = P{X = x | n, p} = (n choose x) θ^x (1 − θ)^(n−x)

Log-likelihood
        Log{L(θ)} = Log(n choose x) + x Log θ + (n − x) Log(1 − θ)

Assume n constant, so the first term is invariant w.r.t. p; this gives S(θ) at point p:
        Log[L(p)] = x Log p + (n − x) Log(1 − p)

Maximising w.r.t. p, i.e. setting the derivative of S w.r.t. θ to 0:
        SCORE = x/p̂ − (n − x)/(1 − p̂) = 0   so the M.L.E. is   p̂ = x/n
How does it work, why bother?
                                                                        19
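One answer to "how does it work": the score really does vanish at p̂ = x/n, and the support function peaks there. A small Python check, with x = 7 successes in n = 20 trials as an assumed example:

```python
from math import comb, log

# Binomial support S(p) = log C(n, x) + x log p + (n - x) log(1 - p)
# and its score x/p - (n - x)/(1 - p); the score is zero at p-hat = x/n.
n, x = 20, 7

def support(p):
    return log(comb(n, x)) + x * log(p) + (n - x) * log(1 - p)

def score(p):
    return x / p - (n - x) / (1 - p)

p_hat = x / n                      # M.L.E. from the slide: 7/20 = 0.35
print(abs(score(p_hat)) < 1e-9)    # True: score vanishes at the maximum

# The support function is indeed largest at p-hat on a fine grid:
grid = [i / 1000 for i in range(1, 1000)]
best = max(grid, key=support)
print(best)  # 0.35
```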
Numerical Examples:

See some data sets and test examples:

Simple:
http://warnercnr.colostate.edu/class_info/fw663/BinomialLikelihood.PDF

Context:
http://statgen.iop.kcl.ac.uk/bgim/mle/sslike_1.html All sections
   useful, but especially examples, sections 1-3 and 6

Also, e.g. for R
http://www.montana.edu/rotella/502/binom_like.pdf


                                                                   20
      Bayesian Estimation- in context

• Parametric Estimation - in the “classical approach”, f(x, θ) for a
  r.v. X of density f(x), with θ the unknown parameter →
  dependency of the distribution on the parameter to be estimated.
• Bayesian Estimation - θ is a random variable, so we can
  consider the density as conditional and write f(x | θ).
  Given a random sample X1, X2,… Xn, the sample random
  variables are jointly distributed with the parameter r.v. Θ. So the joint
  pdf is
        f_{X1, X2,…, Xn, Θ}(x1, x2,…, xn, θ)
• Objective - to form an estimator that gives a value of θ,
  dependent on observations of the sample random variables.
  Thus the conditional density of Θ given X1, X2,… Xn also plays a
  role. This is the posterior density.
                                                                     21
Bayes - contd.
• Posterior Density:   f(θ | x1, x2,…, xn)

• Relationship between prior and posterior:

   f(θ | x1, x2,…, xn) = π(θ) Π_{k=1..n} f(xk | θ)  /  ∫ [ π(θ) Π_{k=1..n} f(xk | θ) ] dθ

where π(θ) = prior density of θ

• Value: Close to the MLE for large n, or for small n if the sample
  values are compatible with the prior distribution. Also has a
  strong sample basis, and is simpler to calculate than the M.L.E.
                                                                             22
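A minimal sketch of the prior-to-posterior relationship above, assuming a Beta prior for the binomial parameter of the earlier example. This is a conjugate choice (not from the slides), so the normalizing integral is available in closed form, and the pseudo-counts a, b are illustrative:

```python
# Conjugate sketch (assumption): prior Beta(a, b) times binomial likelihood
# p^x (1-p)^(n-x) gives posterior Beta(a + x, b + n - x), so the
# denominator integral never needs numerical evaluation.
a, b = 2.0, 2.0          # prior pseudo-counts (assumed)
n, x = 20, 7             # data: 7 successes in 20 trials

a_post, b_post = a + x, b + (n - x)
posterior_mean = a_post / (a_post + b_post)
mle = x / n

print(posterior_mean, mle)  # 0.375 0.35
# The posterior mean sits between the prior mean (0.5) and the MLE (0.35),
# and approaches the MLE as n grows - the "close to MLE for large n" point.
```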
  Estimator Comparison in brief.
• Classical: uses objective probabilities, intuitive estimators,
  additional assumptions for sampling distributions: good
  properties for some estimators.
• Moment: less calculation, less efficient. Not common in genomic
  analysis despite analytical solutions & low bias, because of poorer
  asymptotic properties; even simple solutions may not be unique.
• Bayesian: subjective prior knowledge plus sample info.; close to the
  MLE under certain conditions - see earlier.
• LSE: if assumptions met, β’s unbiased + variances obtained,
  {(XᵀX)⁻¹}. Few assumptions for response variable distributions,
  just expectations and variance-covariance structure. (Unlike MLE,
  where we need to specify the joint prob. distribution of the variables).
  Requires additional assumptions for sampling distns. Close to MLE if
  assumptions met. Computation easier.
        β̂ ~ N(β, I(β)⁻¹),   I = (1/σ²) XᵀX
                                                                23
     Addendum: MGF – estimation use
• Mostly for functions of Random variables.
• Normal: reproductive property (as for the Poisson and χ²);
  actually has
        M_X(t) = exp(μt + σ²t²/2)
• Importantly, for X1, X2,…, Xn independent r.v.’s with normal
  distributions, with means μ1, μ2,…, μn and variances
  σ1², σ2²,…, σn²,

the r.v.   Y = a1X1 + a2X2 + …… + anXn   is ~ N with
        μ_Y = a1μ1 + a2μ2 + …… + anμn
        σ²_Y = a1²σ1² + a2²σ2² + …… + an²σn²
                                                                 24
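The reproductive-property formulas above can be checked numerically. A Python sketch with assumed coefficients and parameters, comparing the closed-form mean and variance of Y against a simulation:

```python
import random

# Linear combination of independent normals (slide formulas):
# mu_Y = sum a_i mu_i,  var_Y = sum a_i^2 sigma_i^2.
a = [1.0, -2.0, 0.5]       # assumed coefficients
mu = [3.0, 1.0, 4.0]       # assumed means
var = [1.0, 0.25, 4.0]     # assumed variances

mu_Y = sum(ai * mi for ai, mi in zip(a, mu))
var_Y = sum(ai ** 2 * vi for ai, vi in zip(a, var))
print(mu_Y, var_Y)  # 3.0 3.0

# Quick simulation check of the same quantities.
random.seed(0)
draws = [sum(ai * random.gauss(mi, vi ** 0.5)
             for ai, mi, vi in zip(a, mu, var)) for _ in range(100000)]
m = sum(draws) / len(draws)
v = sum((d - m) ** 2 for d in draws) / len(draws)
print(round(m, 1), round(v, 1))  # close to 3.0 3.0
```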
