Inference for Proportions by cJ74v9

									 Inference for Proportions

Inference for a Population Proportion
    Comparing Two Proportions
                              Inference
•       Remember, in most situations much about the ‘population’
        is not known, i.e., we often don’t know:
    –     the population standard deviation ($\sigma$) or
    –     the population parameter ($p$)
•       But we can continue to work with samples and use the
        values observed there to make estimates (inferences)
        about the overall population
    –     The sample proportion $\hat{p}$ estimates the population proportion $p$
    –     The standard deviation of $\hat{p}$ is
          $\sigma_{\hat{p}} = \sqrt{\dfrac{p(1-p)}{n}}$
          as long as the population is at least 10x the sample size,
          and $\hat{p}$ is approximately normal when $np \ge 10$ and $n(1-p) \ge 10$
                              Inference
•   If we standardize the values (around z), and rearrange
    the equation we get:
                $z = \dfrac{\hat{p} - p}{\sqrt{\dfrac{p(1-p)}{n}}}$
•   In very large samples, $\hat{p}$ is approximately $p$…
    so, the Standard Error becomes:
                $SE = \sqrt{\dfrac{\hat{p}(1-\hat{p})}{n}}$
    for a confidence interval for $p$ of:
                estimate $\pm\ z^{*} \cdot SE_{estimate}$
                $\hat{p} \pm z^{*}\sqrt{\dfrac{\hat{p}(1-\hat{p})}{n}}$
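The confidence-interval formula can be sketched in a few lines of Python. This is only an illustration: the function name `proportion_ci` and the data (48 heads in 100 flips) are assumptions made up for the example, not results from the class activity.

```python
import math

def proportion_ci(successes, n, z_star=1.960):
    """Large-sample confidence interval for a population proportion.

    z_star = 1.960 is the critical value for 95% confidence.
    """
    p_hat = successes / n                      # sample proportion
    se = math.sqrt(p_hat * (1 - p_hat) / n)    # standard error of p-hat
    margin = z_star * se                       # margin of error
    return p_hat - margin, p_hat + margin

# Hypothetical data: 48 heads in 100 coin flips
low, high = proportion_ci(48, 100)
print(f"95% CI for p: ({low:.3f}, {high:.3f})")
```

The interval is only trustworthy when the conditions for inference hold (SRS, large population, enough successes and failures).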
Sampling Distribution
    Conditions for Inference about a Proportion
•       Again, for Inferences about a Proportion, the following
        conditions must be met (assumed):
    –     The data are a simple random sample (SRS)
    –     The population is at least 10x as large as the sample
    –     For a test of Ho: $p = p_0$, the sample size n is large enough
          that $np_0 \ge 10$ and $n(1-p_0) \ge 10$
                       Inference Examples
•       In our activity yesterday, we conducted 4 separate but
        similar simulations…
    –     2 sets of random head/tail coin simulations
                  http://www.random.org/coins
                  http://shazam.econ.ubc.ca/flip/

    –     Red/black card simulation
                  http://www.random.org/playing-cards
    –     Penny-fall simulation
•       What were your sample proportion values?
        compared to the expected values?
•       What did you notice about the Standard Errors?
        and Inference confidence intervals?
             Inference – Calculating Sample Size
 •       As before, you can also calculate the (needed)
         sample size from proportion information…
         Rearranging from before, the margin of error can be
         written as: $m = z^{*}\sqrt{\dfrac{p(1-p)}{n}}$

 •       For example…
     –       Using 1 of our head/tail simulations:
             $\hat{p} = 0.48$, $z^{*} = 1.960$ for a 95% CI, and a desired margin of
             error ($m$) of 0.05, the needed sample size ($n$) is:

             $0.05 = 1.960\sqrt{\dfrac{0.48 \times 0.52}{n}}$
             $0.02551 = \sqrt{\dfrac{0.2496}{n}}$
             Rearranging, $n = \dfrac{0.2496}{0.02551^{2}} \approx 383.5$

             Rounding up, a sample size of 384 is needed for a margin of error of 0.05
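The same rearrangement, $n = (z^{*}/m)^2\, p(1-p)$, can be checked with a short Python sketch. The function name `sample_size_for_margin` is an assumption for illustration; the inputs are the values from the example above.

```python
import math

def sample_size_for_margin(p_guess, margin, z_star=1.960):
    """Smallest n whose margin of error is at most `margin`.

    Solves m = z* sqrt(p(1-p)/n) for n and rounds up,
    because a sample size must be a whole number.
    """
    n_exact = (z_star / margin) ** 2 * p_guess * (1 - p_guess)
    return math.ceil(n_exact)

# Values from the head/tail example: p-hat = 0.48, m = 0.05, 95% confidence
n_needed = sample_size_for_margin(0.48, 0.05)
print(n_needed)
```

Always round up rather than to the nearest integer; rounding down would give a margin of error slightly larger than requested.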
               Comparing Two Proportions
•       Remembering our example of 2 means…
•       When you have a two-sample problem, you need to
        take into account the two sets of sample proportions
        and sample sizes, to give you a resulting sampling
        distribution…
•       But first, we must assume that the two samples are:
    –     Both randomly selected
    –     Both large enough that the sample proportions are approximately normally distributed
    –     Independent from each other        (of course, right?!)
           Comparing Two Proportions
•   For a 2-sample problem, we have:
       Population   Population     Sample     Sample
                    Proportion     Size       Proportion
           1        $p_1$          $n_1$      $\hat{p}_1$
           2        $p_2$          $n_2$      $\hat{p}_2$

•   Recalling how to work with means and variances
    of 2 independent variables…
                $\mu_{X-Y} = \mu_X - \mu_Y$
                $\sigma^2_{X-Y} = \sigma^2_X + \sigma^2_Y$

•   When we look at the sampling distribution of $\hat{p}_1 - \hat{p}_2$,
    the mean is
                $\mu_{\hat{p}_1 - \hat{p}_2} = \mu_{\hat{p}_1} - \mu_{\hat{p}_2} = p_1 - p_2$
    and the variance is
                $\sigma^2_{\hat{p}_1 - \hat{p}_2} = \sigma^2_{\hat{p}_1} + \sigma^2_{\hat{p}_2} = \dfrac{p_1(1-p_1)}{n_1} + \dfrac{p_2(1-p_2)}{n_2}$
    Confidence Intervals for Two Proportions
•    The standard deviation of $\hat{p}_1 - \hat{p}_2$ is the square root of
     the variance…
                $\sigma_{\hat{p}_1 - \hat{p}_2} = \sqrt{\dfrac{p_1(1-p_1)}{n_1} + \dfrac{p_2(1-p_2)}{n_2}}$
     (note: you may add the variances, NOT the standard
     deviations!)
•    To obtain a confidence interval, the sample proportions
     can replace the population proportions, resulting in a
     Standard Error of:
                $SE = \sqrt{\dfrac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \dfrac{\hat{p}_2(1-\hat{p}_2)}{n_2}}$
     the confidence interval has the form:
                estimate $\pm\ z^{*} \cdot SE_{estimate}$, i.e. $(\hat{p}_1 - \hat{p}_2) \pm z^{*} \cdot SE$
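As a rough sketch, the two-proportion interval can be computed as below. The function name `two_proportion_ci` is an assumption for illustration, and the counts are borrowed from the Gemfibrozil example in the significance-test slides of this deck (56 of 2051 vs. 84 of 2030).

```python
import math

def two_proportion_ci(x1, n1, x2, n2, z_star=1.960):
    """Large-sample confidence interval for p1 - p2.

    Uses the unpooled standard error: the two variances are added,
    then the square root is taken.
    """
    p1_hat = x1 / n1
    p2_hat = x2 / n2
    diff = p1_hat - p2_hat
    se = math.sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
    margin = z_star * se
    return diff, se, diff - margin, diff + margin

# Counts from the Gemfibrozil example (drug group vs. placebo group)
diff, se, low, high = two_proportion_ci(56, 2051, 84, 2030)
print(f"difference = {diff:.4f}, SE = {se:.4f}")
print(f"95% CI for p1 - p2: ({low:.4f}, {high:.4f})")
```

An interval that lies entirely below zero suggests that p1 is smaller than p2 at this confidence level.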
Sampling Distribution
                       12.1 Quiz on Friday…
•       Possible Topics:
    –        Conditions needed for an inference problem
    –        Inference for a population proportion
         •      Confidence intervals (z-scores)
         •      Standard errors
    –        Calculating sample sizes
    –        Inference for a two-sample proportion
                      Significance Tests
•       An observed difference between 2 sample proportions
        can reflect a difference in the populations OR
        it may just be due to chance variation in random
        sampling
•       Significance tests can help determine whether an observed
        difference is larger than chance variation alone would explain
    –     Significance tests for p1 – p2
    –     Ho: p1 = p2…
          The null hypothesis says there is no difference between the 2
          population proportions
    Inference – Remembering our Formulas
•   If we standardize the values (around z), and rearrange
    the equation we get:
                $z = \dfrac{\hat{p} - p}{\sqrt{\dfrac{p(1-p)}{n}}}$
•   In very large samples, $\hat{p}$ is approximately $p$…
    so, the Standard Error becomes:
                $SE = \sqrt{\dfrac{\hat{p}(1-\hat{p})}{n}}$
    for a confidence interval for $p$ of:
                estimate $\pm\ z^{*} \cdot SE_{estimate}$
                $\hat{p} \pm z^{*}\sqrt{\dfrac{\hat{p}(1-\hat{p})}{n}}$
Significance Test example (from book, p.707)
•       High levels of cholesterol in the blood are associated
        with higher risk of heart attacks.
    –        Will using a drug to lower blood cholesterol reduce heart
             attacks?
    –        Middle-aged men were assigned at random to 1 of 2
             treatments
         •      2051 took the drug Gemfibrozil
         •      A control group of 2030 took a placebo
         •      During the next 5 years, 56 men in the Gemfibrozil group and 84 in
                the placebo group had heart attacks.
             Significance Test example (cont’d)
•       Significance test for cholesterol vs. heart attacks…
         •      56 of 2051 who took the drug Gemfibrozil had heart attacks
         •      84 of a control group (placebo) of 2030 had heart attacks.
    –        Define variables:
         •      p1 – proportion of the Gemfibrozil group who
                suffer heart attacks:  $\hat{p}_1 = \dfrac{56}{2051} = 0.0273$
         •      p2 – proportion of the placebo group who
                suffer heart attacks:  $\hat{p}_2 = \dfrac{84}{2030} = 0.0414$
         •      Ho: p1 = p2     Ha: p1 < p2
    –        Pool the sample proportions:
                $\hat{p} = \dfrac{56 + 84}{2051 + 2030} = \dfrac{140}{4081} = 0.0343$
    –        Check the conditions…
                $n_1\hat{p} = 2051 \times 0.0343 = 70.3$
                $n_2(1-\hat{p}) = 2030 \times 0.9657 = 1960.4$
             both are > 5 (as are $n_1(1-\hat{p})$ and $n_2\hat{p}$), therefore we can use the 2-sample z-procedure
       Significance Test example (cont’d)
•   To test the hypothesis Ho: p1 = p2,
    the z-statistic formula is:
                $z = \dfrac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1-\hat{p})\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$

•   In terms of a variable Z having the
    standard normal distribution,
    the p-value for a test of Ho against…
    Ha: p1 > p2 is $P(Z \ge z)$
    Ha: p1 < p2 is $P(Z \le z)$
    Ha: p1 ≠ p2 is $2P(Z \ge |z|)$
       Significance Test example (cont’d)
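•   Continuing… for the 2-sample distribution, the z-formula is:
                $z = \dfrac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1-\hat{p})\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$
                $= \dfrac{0.0273 - 0.0414}{\sqrt{0.0343 \times 0.9657\left(\dfrac{1}{2051} + \dfrac{1}{2030}\right)}}$
                $= \dfrac{-0.0141}{0.005698} = -2.47$
•   The p-value for a test of
    Ho against Ha: p1 < p2 is $P(Z \le -2.47)$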
      Significance Test example (cont’d)
•   From the z-table, z = -2.47
    corresponds to a p-value of 0.0068

•   Interpreting… With a very low p-value, and p < $\alpha$
    (i.e., below an $\alpha$ of 0.05 or 0.01), we have enough
    evidence to reject Ho (that p1 = p2)
•   In other words, we have evidence to believe the drug
    Gemfibrozil reduced the rate of heart attacks.
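The whole test can be reproduced with a short Python sketch. The function names are assumptions for illustration, and the standard normal CDF is computed from `math.erf` rather than read from a z-table, so the p-value may differ from the table value in the last digit.

```python
import math

def normal_cdf(z):
    """P(Z <= z) for a standard normal variable Z."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def two_proportion_z_test(x1, n1, x2, n2):
    """Pooled two-sample z statistic for Ho: p1 = p2,
    with the one-sided p-value for Ha: p1 < p2."""
    p1_hat = x1 / n1
    p2_hat = x2 / n2
    p_pooled = (x1 + x2) / (n1 + n2)           # pooled sample proportion
    se = math.sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))
    z = (p1_hat - p2_hat) / se
    p_value = normal_cdf(z)                    # lower tail, for Ha: p1 < p2
    return z, p_value

# Gemfibrozil group: 56 of 2051; placebo group: 84 of 2030
z, p_value = two_proportion_z_test(56, 2051, 84, 2030)
print(f"z = {z:.2f}, p-value = {p_value:.4f}")
```

For Ha: p1 > p2 the upper tail `1 - normal_cdf(z)` would be used instead, and for a two-sided alternative `2 * (1 - normal_cdf(abs(z)))`.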
               Chapter Review problems…
•       As part of the chapter 12 review,
        the homework is from p. 719:
    –     Problems 12.35 through 12.42
    –     Be prepared to discuss these Tuesday…

•       Reminder: Chapter 11 and 12 test on Wednesday!
        Inference Procedures (Chapters 10-12)
•       At this point, we have worked with a number of
        distributions and types of problems…
        how do we sift through it all ?!
•       There are a few ways to look at these…
    –     Mean ($\mu$) vs. proportions ($p$)
    –     Known vs. unknown standard deviations
    –     1-sample, 2-samples, matched pairs
    –     z-distributions, t-distributions
•       See page 719 in your text for a good summary chart…
Inference Procedures (the ‘big picture’)

								