Ch11: Comparing 2 Samples

11.1: INTRO:
This chapter deals with analyzing continuous measurements.
Later, some experimental design ideas will be introduced.
Chapter 13 will be devoted to qualitative data analysis.
11.2: Comparing Two Independent Samples
In a medical study, one sample X of subjects may be assigned to a control (placebo) treatment and another sample Y to a particular treatment.
This section deals with independent samples; later sections deal with dependent & paired samples.

$X_1, \dots, X_n \overset{iid}{\sim} F$: observations from the control group
$Y_1, \dots, Y_m \overset{iid}{\sim} G$: observations from the treatment group

X and Y are two independent random samples.
INFERENCE about: the difference of means or other location parameters of F and G.
11.2.1: Methods Based on Normal Distributions
Assumptions:
$X_1, \dots, X_n \sim N(\mu_X, \sigma^2)$
$Y_1, \dots, Y_m \sim N(\mu_Y, \sigma^2)$
X and Y are two independent random samples.

The difference of means $\mu_X - \mu_Y$ has natural estimate $\bar{X} - \bar{Y}$, with
$$\bar{X} - \bar{Y} \sim N\left(\mu_X - \mu_Y,\; \sigma^2\left(\frac{1}{n} + \frac{1}{m}\right)\right)$$

$\sigma^2$ known: $(\bar{X} - \bar{Y}) \pm z(\alpha/2)\,\sigma\sqrt{\frac{1}{n} + \frac{1}{m}}$ is a $100(1-\alpha)\%$ CI for $\mu_X - \mu_Y$.

$\sigma^2$ unknown: estimated by the pooled sample variance
$$s_p^2 = \frac{(n-1)S_X^2 + (m-1)S_Y^2}{m+n-2}$$
11.2.1: (cont'd)
Theorem A:
Let $X_1, \dots, X_n \sim N(\mu_X, \sigma^2)$ and $Y_1, \dots, Y_m \sim N(\mu_Y, \sigma^2)$
be two independent random samples.
The statistic
$$t = \frac{(\bar{X} - \bar{Y}) - (\mu_X - \mu_Y)}{s_p\sqrt{\frac{1}{n} + \frac{1}{m}}}$$
follows a $t_{df}$, a t-distribution with $df = m + n - 2$ degrees of freedom.

Corollary A: Under the assumptions of Theorem A,
a $100(1-\alpha)\%$ CI for $\mu_X - \mu_Y$ is
$$(\bar{X} - \bar{Y}) \pm t_{m+n-2}(\alpha/2)\; s_p \sqrt{\frac{1}{n} + \frac{1}{m}}$$
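To make Corollary A concrete, here is a minimal Python sketch of the pooled-variance CI; the data arrays x and y are made up purely for illustration:

```python
import numpy as np
from scipy import stats

def pooled_t_ci(x, y, alpha=0.05):
    """100(1 - alpha)% CI for mu_X - mu_Y under the equal-variance normal model."""
    n, m = len(x), len(y)
    # Pooled sample variance: s_p^2 = ((n-1)S_X^2 + (m-1)S_Y^2) / (m+n-2)
    sp2 = ((n - 1) * np.var(x, ddof=1) + (m - 1) * np.var(y, ddof=1)) / (m + n - 2)
    se = np.sqrt(sp2 * (1.0 / n + 1.0 / m))
    tcrit = stats.t.ppf(1 - alpha / 2, df=m + n - 2)   # t_{m+n-2}(alpha/2)
    diff = np.mean(x) - np.mean(y)
    return diff - tcrit * se, diff + tcrit * se

# Hypothetical data, for illustration only
x = np.array([5.1, 4.8, 5.6, 5.0, 4.9])
y = np.array([4.2, 4.5, 4.1, 4.8])
print(pooled_t_ci(x, y))   # 95% CI for mu_X - mu_Y
```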
11.2.1: (cont'd)
Test Procedures for Normal Populations:
Null Hypothesis:
$$H_0: \mu_X = \mu_Y \quad \text{or} \quad \mu_X - \mu_Y = 0$$
Test Statistic:
$$t = \frac{(\bar{X} - \bar{Y}) - 0}{s_p\sqrt{\frac{1}{n} + \frac{1}{m}}}$$
There are 3 common alternative hypotheses, 2 of which are one-sided
($H_A: \mu_X > \mu_Y$ or $H_A: \mu_X < \mu_Y$)
and one of which is two-sided ($H_A: \mu_X \neq \mu_Y$).
Revisit my handouts about CI and HT for references
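For reference, the two-sided version of this test is also available directly in scipy; a small sketch (the data are hypothetical):

```python
import numpy as np
from scipy import stats

# Hypothetical control (x) and treatment (y) measurements
x = np.array([5.1, 4.8, 5.6, 5.0, 4.9])
y = np.array([4.2, 4.5, 4.1, 4.8])

# Pooled two-sample t-test of H0: mu_X = mu_Y against the two-sided
# alternative; equal_var=True matches s_p and df = m + n - 2 above
t_stat, p_value = stats.ttest_ind(x, y, equal_var=True)
print(t_stat, p_value)   # reject H0 at level alpha if p_value < alpha
```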
11.2.2: Power Calculation
The power of the 2-sample t-test depends on:
1. $\Delta = \mu_X - \mu_Y$ (real difference)
   – The larger $\Delta$, the greater the power
2. $\alpha$ (level of significance)
   – The larger $\alpha$, the more powerful the test
3. $\sigma$ (population standard deviation)
   – The smaller $\sigma$, the larger the power
4. n and m (sample sizes)
   – The larger n and m, the greater the power
11.2.2: Power Calculation (cont'd)
Assume that n = m (same sample size) are large enough to test at level $\alpha$, $H_0: \mu_X = \mu_Y$ vs $H_A: \mu_X \neq \mu_Y$, with test statistic based on
$$Z = \frac{\bar{X} - \bar{Y}}{\sigma\sqrt{2/n}},$$
where $\alpha$, $\sigma$, $n$ are given.
The rejection region (RR) of such a test is:
$$|Z| \geq z(\alpha/2) \iff |\bar{X} - \bar{Y}| \geq z(\alpha/2)\,\sigma\sqrt{2/n}$$

The power of a test is the probability of rejecting the null hypothesis when it is false. That is, the power against $\Delta' = \mu_X - \mu_Y$ is:
$$\pi(\Delta') = P(RR \mid \Delta') = P\left(|\bar{X} - \bar{Y}| \geq z(\alpha/2)\,\sigma\sqrt{2/n}\right)$$
$$= P\left(\bar{X} - \bar{Y} \geq z(\alpha/2)\,\sigma\sqrt{2/n}\right) + P\left(\bar{X} - \bar{Y} \leq -z(\alpha/2)\,\sigma\sqrt{2/n}\right)$$
$$= P\left(\frac{\bar{X} - \bar{Y} - \Delta'}{\sigma\sqrt{2/n}} \geq z(\alpha/2) - \frac{\Delta'\sqrt{n}}{\sigma\sqrt{2}}\right) + P\left(\frac{\bar{X} - \bar{Y} - \Delta'}{\sigma\sqrt{2/n}} \leq -z(\alpha/2) - \frac{\Delta'\sqrt{n}}{\sigma\sqrt{2}}\right)$$
$$= 1 - \Phi\left(z(\alpha/2) - \frac{\Delta'\sqrt{n}}{\sigma\sqrt{2}}\right) + \Phi\left(-z(\alpha/2) - \frac{\Delta'\sqrt{n}}{\sigma\sqrt{2}}\right)$$
Application: What n Is Needed?
As the difference $\Delta'$ moves away from zero, one of the terms
$$\Phi\left(z(\alpha/2) - \frac{\Delta'\sqrt{n}}{\sigma\sqrt{2}}\right) \quad \text{or} \quad \Phi\left(-z(\alpha/2) - \frac{\Delta'\sqrt{n}}{\sigma\sqrt{2}}\right)$$
will be negligible with respect to the other.
Problem: what n is needed to detect a difference of $\Delta' = 1$ with probability 0.9 when $\sigma = 5$ and $\alpha = 0.05$?
Solution:
$$1 - \Phi\left(1.96 - \frac{\sqrt{n}}{5\sqrt{2}}\right) \approx 0.9 \implies \Phi\left(1.96 - \frac{\sqrt{n}}{5\sqrt{2}}\right) \approx 0.1$$
$$1.96 - \frac{\sqrt{n}}{5\sqrt{2}} = \Phi^{-1}(0.1) = -1.28 \implies n \approx 525$$
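Solving the dominant-term equation for n in code (a sketch; it uses exact normal quantiles rather than the rounded 1.96 and 1.28 above):

```python
import numpy as np
from scipy.stats import norm

def n_per_group(delta, sigma, power=0.90, alpha=0.05):
    """Solve 1 - Phi(z(a/2) - delta*sqrt(n)/(sigma*sqrt(2))) = power for n,
    neglecting the other (negligible) tail term."""
    z_a = norm.ppf(1 - alpha / 2)   # ~1.96
    z_b = norm.ppf(power)           # ~1.28 for power 0.90
    return int(np.ceil(2 * (sigma * (z_a + z_b) / delta) ** 2))

print(n_per_group(delta=1, sigma=5))   # 526 (the rounded quantiles above give ~525)
```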
11.2.3: The Mann-Whitney Test
(a nonparametric method)
Also known as the Wilcoxon RST (Rank Sum Test).
Assume that m + n experimental units are to be assigned (at random) to a treatment group and a control group: n units are randomly chosen and assigned to the control group, and the remaining m units are assigned to the treatment group.
We are interested in testing the null hypothesis that the treatment has NO EFFECT.
If the null is true, then any difference in the outcomes under the 2 conditions is due to the randomization alone (i.e., solely by chance).
The Mann-Whitney Test: (cont'd)
The MW-test statistic is calculated as follows:
1. Group all m + n observations together and rank them in order of increasing size (assume no ties).
2. Calculate the sum of the ranks of those observations that came from the ctrl group.
3. Reject the null if the sum is too small or too large.
Example (ranks shown in parentheses):

Treatment | Control
1 (1)     | 6 (4)
3 (2)     | 4 (3)

R = 3 + 4 = 7 (ctrl) and R = 1 + 2 = 3 (trt)
The Mann-Whitney Test: (cont'd)
Question: Does this discrepancy provide convincing evidence of a systematic difference between trt & ctrl, or could it be due to chance alone?
Answer: null hypothesis ⟹ trt had no effect.
Under the null, every assignment (total: 4! = 24) of ranks to observations is equally likely.
In particular, each of the
$$\binom{4}{2} = \frac{4!}{2!(4-2)!} = 6$$
assignments of ranks to the ctrl group (shown below) is equally likely:

Rank | {1,2} | {1,3} | {1,4} | {2,3} | {2,4} | {3,4}
R    |   3   |   4   |   5   |   5   |   6   |   7
The Mann-Whitney Test: (cont'd)
The null distribution of R is:

r      |  3  |  4  |  5  |  6  |  7  | Sum
P(R=r) | 1/6 | 1/6 | 2/6 | 1/6 | 1/6 |  1

From this table, $P(R \geq 7) = \frac{1}{6}$; that is to say, this discrepancy would occur one time out of 6 by chance.
Similar computations can be carried out for any sample sizes m and n, and can even be extended to testing
$$H_0: F = G, \quad \text{where CTRL } X_1, \dots, X_n \sim F \text{ and TRT } Y_1, \dots, Y_m \sim G$$

                 Read page 404 (textbook).
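The toy example above can be reproduced numerically; a sketch using scipy (rank-sum by hand, then the built-in test):

```python
import numpy as np
from scipy import stats

trt = np.array([1, 3])    # treatment observations from the example
ctrl = np.array([6, 4])   # control observations from the example

# Rank the pooled sample and sum the control ranks
ranks = stats.rankdata(np.concatenate([trt, ctrl]))
print(ranks[len(trt):].sum())   # 7.0, matching R = 3 + 4 above

# scipy's Mann-Whitney test, with the exact null distribution
U, p = stats.mannwhitneyu(ctrl, trt, alternative='two-sided', method='exact')
print(U, p)
```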
The Mann-Whitney Test: Another Approach
Suppose that the X's are sampled from F and the Y's are sampled from G. The Mann-Whitney test can be derived from a different point of view than what was seen earlier.
We would like to estimate the probability $\pi = P(X < Y)$ that an observation from F is smaller than an independent observation from G, which serves as a measure of the treatment effect; here X and Y are independently distributed with distribution functions F and G.
An estimate of $\pi$ can be obtained by comparing all n values of X to all m values of Y and calculating the proportion $\hat{\pi}$ of the comparisons for which X is less than Y.
The Mann-Whitney Test: Another Approach (cont'd)
That is:
$$\hat{\pi} = \frac{1}{mn} U_Y, \quad \text{where } U_Y = \sum_{i=1}^{n} \sum_{j=1}^{m} Z_{ij}$$
and
$$Z_{ij} = \begin{cases} 1, & \text{if } X_i < Y_j \\ 0, & \text{otherwise} \end{cases}$$
Theorem A: Under the null $H_0: F = G$,
$$E(U_Y) = \frac{mn}{2} \quad \text{and} \quad Var(U_Y) = \frac{mn(m+n+1)}{12}$$
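A direct computation of $U_Y$ and $\hat{\pi}$ (the sample values are hypothetical):

```python
import numpy as np

def u_statistic(x, y):
    """U_Y = number of pairs (i, j) with X_i < Y_j."""
    x, y = np.asarray(x), np.asarray(y)
    return int(np.sum(x[:, None] < y[None, :]))   # all n*m comparisons

x = [1.2, 3.4, 2.2]        # hypothetical sample from F (ctrl)
y = [2.9, 4.1, 3.3, 5.0]   # hypothetical sample from G (trt)
n, m = len(x), len(y)

U = u_statistic(x, y)
print(U, U / (m * n))      # U_Y = 10 and pi_hat = 10/12

# Null moments from Theorem A
print(m * n / 2, m * n * (m + n + 1) / 12)   # E(U_Y) = 6.0, Var(U_Y) = 8.0
```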
11.3: Comparing Paired Samples
Paired Design vs Unpaired Design:
CASE 1: Paired Design
$(X_i, Y_i)$ are pairs with $i = 1, \dots, n$.
The X's have mean $\mu_X$ and variance $\sigma_X^2$.
The Y's have mean $\mu_Y$ and variance $\sigma_Y^2$.
Assume different pairs are independently distributed.
Assume $cov(X_i, Y_i) = \sigma_{XY} = \rho\,\sigma_X \sigma_Y$, where $\rho$ = correlation between pair members.
The differences $D_i = X_i - Y_i$ are independent, and $\bar{D} = \bar{X} - \bar{Y}$.
$$E(D_i) = \mu_X - \mu_Y \quad \text{and} \quad Var(D_i) = \sigma_X^2 + \sigma_Y^2 - 2\rho\,\sigma_X \sigma_Y$$
$$E(\bar{D}) = \mu_X - \mu_Y \quad \text{and} \quad Var(\bar{D}) = \frac{1}{n}\left(\sigma_X^2 + \sigma_Y^2 - 2\rho\,\sigma_X \sigma_Y\right)$$
11.3: (cont'd)
Unpaired Design:
CASE 2: Unpaired Design
If the 2 samples X's and Y's are independent, then $\rho = 0$.
Then $\mu_X - \mu_Y$ will be estimated by $\bar{X} - \bar{Y}$, with (for n observations per group)
$$E(\bar{X} - \bar{Y}) = \mu_X - \mu_Y \quad \text{and} \quad Var(\bar{X} - \bar{Y}) = \frac{1}{n}\left(\sigma_X^2 + \sigma_Y^2\right)$$
Thus,
$$\frac{1}{n}\left(\sigma_X^2 + \sigma_Y^2 - 2\rho\,\sigma_X \sigma_Y\right) \leq \frac{1}{n}\left(\sigma_X^2 + \sigma_Y^2\right) \quad \text{for } \rho > 0$$
In this circumstance, PAIRING is the more effective DESIGN.
11.3: (cont'd)
What if $\sigma_X = \sigma_Y = \sigma$?
Then
$$Var_{unpaired}(\bar{X} - \bar{Y}) = \frac{2\sigma^2}{n} \quad \text{and} \quad Var_{paired}(\bar{D}) = \frac{2\sigma^2(1 - \rho)}{n}$$
Thus, the relative efficiency is:
$$\frac{Var_{paired}(\bar{D})}{Var_{unpaired}(\bar{X} - \bar{Y})} = 1 - \rho$$
That is, if $\rho = \frac{1}{2}$, a Paired Design with n pairs will be as precise as
an Unpaired Design with 2n subjects per treatment.
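A quick Monte Carlo check of the $1 - \rho$ result (a sketch; the settings n = 50, ρ = 0.5 are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
n, rho, sigma, reps = 50, 0.5, 1.0, 20000

# Simulate correlated pairs (X_i, Y_i) with common sd sigma and correlation rho
cov = sigma**2 * np.array([[1.0, rho], [rho, 1.0]])
xy = rng.multivariate_normal([0.0, 0.0], cov, size=(reps, n))
dbar = (xy[:, :, 0] - xy[:, :, 1]).mean(axis=1)   # paired estimator, one per replicate

# Ratio of paired to unpaired variance should be close to 1 - rho = 0.5
print(dbar.var() / (2 * sigma**2 / n))
```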
Pros & Cons
Paired vs Independent Samples:
Here are 2 competing sampling schemes:
Paired Samples: n pairs (2n measurements)
Independent Samples: 2n observations (m = n)
They both give the common form:
$$\bar{D} \pm \widehat{SE} \cdot t_{df}\left(\frac{\alpha}{2}\right), \quad \text{where } \bar{D} = \bar{X} - \bar{Y}$$
But the SE estimates and the df for t are different:

               | Independent Samples                    | Paired Samples
$\widehat{SE}$ | $s_p \sqrt{\frac{1}{n} + \frac{1}{n}}$ | $s_D / \sqrt{n}$
$t_{df}$       | $2n - 2 = 2(n - 1)$                    | $n - 1$
Pros & Cons
Paired vs Independent Samples:
For the same SE estimate, a loss of DF (degrees of freedom) gives a larger critical value for the t-test
(example: $n = 10 \implies t_{0.05}(9) = 1.833 > t_{0.05}(18) = 1.734$).
A loss of DF for the t-test produces:
• C.I.: larger confidence intervals
• H.T.: loss of power to detect real differences in the population means.
Such loss of DF for Paired Samples is compensated by a smaller variance $Var(\bar{X} - \bar{Y})$ of Paired Samples with respect to Independent Samples.
11.3.1: Parametric Methods Based on the Normal Distribution for Paired Data
Assume that $D_i = X_i - Y_i \sim N(\mu_D, \sigma_D^2)$,
where $\mu_D = \mu_X - \mu_Y = E(D_i)$ and $\sigma_D^2 = Var(D_i)$.
Inferences will be based on:
$$t = \frac{\bar{D} - \mu_D}{s_{\bar{D}}}, \quad \text{where } s_{\bar{D}} = \frac{s_D}{\sqrt{n}} \quad \left(\text{because } \sigma_D^2 \text{ is unknown in general}\right)$$
t follows a t-distribution with $n - 1$ degrees of freedom.
A $100(1-\alpha)\%$ CI for $\mu_D$ is: $\bar{D} \pm s_{\bar{D}}\, t_{n-1}\left(\frac{\alpha}{2}\right)$
Testing $H_0: \mu_D = 0$ (no treatment effect) vs $H_A: \mu_D \neq 0$ at level $\alpha$ has the rejection region
$$\left|\frac{\bar{D}}{s_{\bar{D}}}\right| = |t| \geq t_{n-1}\left(\frac{\alpha}{2}\right)$$
11.3.2: Nonparametric Method for Paired Data: Signed Rank Test (SRT)
The Wilcoxon SRT is computed as follows:
1. Rank the absolute values of the differences (assume no ties), with $R_i$ = rank of $|D_i|$ for $i = 1, \dots, n$.
2. To get the signed ranks, just restore the signs of the $D_i$ to the ranks.
3. Calculate $W_+$, the sum of those ranks that have positive (+) signs.
Example: let the $D_i$ be −2, 4, 3, 2, −1, 5.
Ordered by $|D_i|$: −1 (rank 1), −2 (rank 2), +2 (rank 3), +3 (rank 4), +4 (rank 5), +5 (rank 6); 4 positive observations.
$$W_+ = 2.5 \;\left(\text{midrank } \tfrac{2+3}{2} \text{ for the tie between } |-2| \text{ and } |+2|\right) + 4 + 5 + 6 = 17.5$$
Wilcoxon SRT (cont'd):
Theorem A: Under the null hypothesis that the $D_i$ are independent and symmetrically distributed about zero,
$$E(W_+) = \frac{n(n+1)}{4} \quad \text{and} \quad Var(W_+) = \frac{n(n+1)(2n+1)}{24}$$
Proof:
$$W_+ = \sum_{k=1}^{n} k\, I_k, \quad \text{where } I_k = \begin{cases} 1, & \text{if the } k\text{th largest } |D_i| \text{ has } D_i > 0 \\ 0, & \text{otherwise} \end{cases}$$
Under $H_0$, $I_k \sim Bernoulli\left(\frac{1}{2}\right)$ independently, so
$$E(I_k) = \frac{1}{2} \quad \text{and} \quad Var(I_k) = \frac{1}{4}$$
Hence $E(W_+) = \frac{1}{2}\sum_{k=1}^{n} k = \frac{n(n+1)}{4}$ and $Var(W_+) = \frac{1}{4}\sum_{k=1}^{n} k^2 = \frac{n(n+1)(2n+1)}{24}$. The result follows.
11.4: Experimental Design

Some basic principles of DOE (Design of Experiments) are introduced here.

Experimental Design can be viewed as a sequence of linked studies carried out under some set of conditions.

Read case studies 11.4.1 thru 11.4.8.

				
DOCUMENT INFO
Shared By:
Categories:
Tags:
Stats:
views:4
posted:2/2/2013
language:Unknown
pages:24