Stat 2013



             Chapter 18
Inference about a Population Proportion
 The two types of data — reminder
 Quantitative
     Something that can be counted or measured and then
      added, subtracted, averaged, etc., across individuals in
      the population.
     Example: How tall you are, your age, your blood
      cholesterol level

 Categorical
     Something that falls into one of several categories. What
      can be counted is the proportion of individuals in each
      category.
     Example: Your blood type (A, B, AB, O), your hair color,
      your family health history for genetic diseases, whether
      you will develop lung cancer
The sample proportion p-hat
We now study categorical data and draw inference on
the proportion, or percentage, of the population with a
specific characteristic.

If we call a given categorical characteristic in the
population “success,” then the sample proportion of
successes, p-hat, is:

$$\hat{p} = \frac{\text{count of successes in the sample}}{\text{count of observations in the sample}}$$
                             
Sampling distribution of p-hat
The sampling distribution of p-hat is never exactly
normal. But as the sample size increases, the
sampling distribution of p-hat becomes approximately
normal, with mean p and standard deviation $\sqrt{p(1 - p)/n}$.
Example
Suppose that 35% of all nurses in America are willing to switch
hospitals if offered a higher salary. If a recruiter randomly contacts
100 nurses, what is the probability that over 40% of the sample would
be willing to switch hospitals if offered a higher salary?

Mean of the sampling distribution = 0.35
Standard deviation of the sampling distribution = $\sqrt{(0.35)(0.65)/100} \approx 0.0477$

We want P(p-hat > 0.40).

$$z = \frac{0.40 - 0.35}{\sqrt{\frac{(0.35)(0.65)}{100}}} = \frac{0.05}{0.0477} \approx 1.05$$

Looking up 1.05 in our z-table gives an upper-tail area of 0.1469, so
about 14.7% of the samples of nurses would have over 40% willing to
switch hospitals.
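
The nurse calculation above can be checked with a few lines of Python. This is only a sketch: the function name `prob_sample_proportion_exceeds` is made up for illustration, and it relies on the normal approximation described in this chapter rather than an exact binomial calculation.

```python
from math import sqrt
from statistics import NormalDist


def prob_sample_proportion_exceeds(p, n, threshold):
    """Normal-approximation estimate of P(p-hat > threshold).

    Hypothetical helper: p is the population proportion, n the sample size.
    """
    sd = sqrt(p * (1 - p) / n)       # standard deviation of p-hat
    z = (threshold - p) / sd         # z-score of the threshold
    return 1 - NormalDist().cdf(z)   # area in the upper tail


prob = prob_sample_proportion_exceeds(p=0.35, n=100, threshold=0.40)
print(f"P(p-hat > 0.40) is roughly {prob:.3f}")
```

The result should land close to the 0.1469 read from the table; any small gap comes from the table lookup rounding z to 1.05.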
Conditions for Inference
In order to use the sampling distribution for
proportions, a few conditions must be met…
 The data must be from an SRS of the
population of interest
 The population is at least 10 times as large as the
sample.
 The sample size n is large enough to ensure that
the distribution of z is close to the Normal model.
One-proportion z-interval
 When the conditions are met, we can find the
 confidence interval for the population proportion p.

                     F pq I
                       
              p  z *G J
              
                     Hn K
 Use this method when the number of successes
 and failures in the sample are both at least 15.
Example
Vitamin D, whether ingested as a dietary supplement or produced naturally
when sunlight falls upon the skin, is essential for strong, healthy bones.
The bone disease rickets was largely eliminated in England during the
1950s, but now there is concern that a generation of children more likely to
watch TV or play computer games than spend time outdoors is at
increased risk. A recent study of 2700 children randomly selected from all
parts of England found 20% of them deficient in vitamin D. Find a 98%
confidence interval for Vitamin D deficiency.

                           (.20)(.80) 
             0.20  2.326 
                                      
                              2700   
             0.20  0.0179          We are 98% confident that
                                            the true proportion of
             (0.1821, 0.2179)               children in England who are
                                            Vitamin D deficient is
                                            between 18.2% and 21.8%.
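
As a cross-check on the Vitamin D interval, here is a minimal Python sketch of the one-proportion z-interval. The function name `one_proportion_z_interval` is an illustrative choice, and z* is computed from the standard normal distribution instead of being read from the table.

```python
from math import sqrt
from statistics import NormalDist


def one_proportion_z_interval(p_hat, n, confidence):
    """Large-sample confidence interval for a proportion (hypothetical helper)."""
    # z* leaves (1 - confidence) / 2 in each tail of the standard normal
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z_star * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


low, high = one_proportion_z_interval(p_hat=0.20, n=2700, confidence=0.98)
print(f"98% CI: ({low:.4f}, {high:.4f})")
```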
Because we have to use an estimate of p to compute the margin of
error, confidence intervals for a population proportion are not very
accurate.

$$m = z^* \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$$

Specifically, we tend to be incorrect more often than the confidence
level would indicate. But the shortfall is not a fixed amount, because
it depends on p.

Use with caution!
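
One way to see this effect is to compute how often the large-sample interval actually contains p, by summing binomial probabilities over every possible count of successes. The sketch below assumes a deliberately small sample (n = 20, chosen only to make the effect visible) and a few arbitrary values of p. The name `wald_coverage` is hypothetical. Actual coverage moves up and down with n and p, and can sit at or above the nominal level for some combinations.

```python
from math import comb, sqrt
from statistics import NormalDist


def wald_coverage(n, p, confidence=0.95):
    """Exact probability that the large-sample interval contains p."""
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    covered = 0.0
    for x in range(n + 1):                      # every possible success count
        p_hat = x / n
        margin = z_star * sqrt(p_hat * (1 - p_hat) / n)
        if p_hat - margin <= p <= p_hat + margin:
            # add the binomial probability of observing exactly x successes
            covered += comb(n, x) * p**x * (1 - p) ** (n - x)
    return covered


for p in (0.05, 0.1, 0.2):
    print(f"n=20, p={p}: actual coverage {wald_coverage(20, p):.3f} (nominal 0.95)")
```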
“Plus four” confidence interval for p
 A simple adjustment produces more accurate confidence
 intervals. We act as if we had four additional
 observations, two being successes and two being
 failures. Thus, the new sample size is n + 4 and the
 count of successes is X + 2.
The “plus four” estimate of p is:

$$\tilde{p} = \frac{\text{count of successes} + 2}{\text{count of all observations} + 4}$$

And an approximate level C confidence interval is:

$$\text{CI}: \tilde{p} \pm m, \quad \text{with} \quad m = z^* \, SE = z^* \sqrt{\frac{\tilde{p}(1 - \tilde{p})}{n + 4}}$$
 Use this method when C is at least 90% and sample size is at least 10.
Example
Wildlife biologists inspect 153 deer taken by hunters and find 32 of
them carrying ticks that test positive for Lyme disease. Use the
“plus 4” method to create a 90% confidence interval for the
percentage of deer that may carry such ticks.
First, find the value of the “plus four” estimate, p-tilde:

$$\tilde{p} = \frac{32 + 2}{153 + 4} = \frac{34}{157} \approx 0.2166$$

Now, calculate the “plus four” interval:

$$0.2166 \pm 1.645 \sqrt{\frac{(0.2166)(0.7834)}{157}}$$
$$0.2166 \pm 0.0541$$
$$(0.1625, 0.2707)$$

With 90% confidence, between 16.25% and 27.07% of deer may carry
Lyme-positive ticks.
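
The deer example can be reproduced with a short Python sketch of the “plus four” method. The function name `plus_four_interval` is an assumption made for illustration; the arithmetic follows the formulas given for this method.

```python
from math import sqrt
from statistics import NormalDist


def plus_four_interval(successes, n, confidence):
    """'Plus four' interval: add 2 successes and 2 failures, then proceed as usual."""
    p_tilde = (successes + 2) / (n + 4)
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z_star * sqrt(p_tilde * (1 - p_tilde) / (n + 4))
    return p_tilde, p_tilde - margin, p_tilde + margin


p_tilde, low, high = plus_four_interval(successes=32, n=153, confidence=0.90)
print(f"p-tilde = {p_tilde:.4f}, 90% CI: ({low:.4f}, {high:.4f})")
```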
Margin of Error and Sample Size
In planning a study, we may want to choose a sample
size that will allow us to estimate the parameter within
a given margin of error (MoE).

The MoE in the large-sample confidence interval is:

$$\text{MoE} = z^* \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$$

We will need to guess what p-hat is. You can either
use a guess based on a pilot study (or past
information), or use p-hat = 0.50, which is the most
conservative estimate because it gives the largest
margin of error for a given n.
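
Solving the margin-of-error formula for n makes the planning step explicit. In the sketch below, $p^*$ is a label introduced here for the guessed value of p-hat (it is not notation from the slides):

$$\text{MoE} = z^* \sqrt{\frac{p^*(1 - p^*)}{n}} \quad \Longrightarrow \quad n = \left(\frac{z^*}{\text{MoE}}\right)^2 p^*(1 - p^*)$$

Since n must be a whole number, round the result up so the margin of error is no larger than the target.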
Example
It’s believed that as many as 25% of adults over 50 never
graduated from high school. We wish to see if this
percentage is the same among the 25 to 30 age group.
How many of this younger age group must we survey in order
to estimate the proportion of non-grads to within 6% with
90% confidence?

MoE = 6%
Since we don’t have data about this age group, use 0.50.

$$0.06 = 1.645 \sqrt{\frac{(0.5)(0.5)}{n}}$$
$$0.0365 = \sqrt{\frac{0.25}{n}}$$
$$0.00133 = \frac{0.25}{n}$$
$$n = 187.92$$

We should sample at least 188 people in the 25 to 30 age group.
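
The same sample-size calculation can be sketched in Python. The helper name `required_sample_size` and its default guess of 0.5 are illustrative assumptions; the result is rounded up because a fractional person cannot be surveyed.

```python
from math import ceil
from statistics import NormalDist


def required_sample_size(margin, confidence, p_guess=0.5):
    """Smallest n whose large-sample margin of error is at most `margin`."""
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    n_exact = (z_star / margin) ** 2 * p_guess * (1 - p_guess)
    return ceil(n_exact)  # always round up


print(required_sample_size(margin=0.06, confidence=0.90))                # conservative guess
print(required_sample_size(margin=0.06, confidence=0.90, p_guess=0.25))  # guess from older adults
```

Using the 25% figure from the older group as the guess gives a smaller n, which shows why 0.50 is called the conservative choice.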
The One-Sample Proportions test
Just like before, the steps of the hypothesis test are:
1) State the hypotheses
2) Calculate the test statistic (z-score!)
3) Find the p-value (From the calculator) OR
   the critical value (From the table)
4) State the conclusion (reject or retain) in the
   context of your situation
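
For step 2, the test statistic for a one-sample proportion test is usually written as below. The symbol $p_0$ is introduced here as a label for the proportion stated in the null hypothesis:

$$z = \frac{\hat{p} - p_0}{\sqrt{\frac{p_0(1 - p_0)}{n}}}$$

Note that the standard deviation uses $p_0$, not p-hat, because the calculation assumes the null hypothesis is true.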
Example
 Some doctors suspect that young mothers have fewer multiple births. In
 2001, a national vital statistics report indicated that about 3% of all births
 produced twins. Data from a large city hospital found only 7 sets of twins
 were born to 469 teenage girls. Does that suggest that mothers under age
 20 may be less likely to have twins? Test an appropriate hypothesis and
 state your conclusion. Use alpha=0.05
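 Step 1: State the hypotheses
      Ho: p = 0.03 (young mothers are no different from the national report)
      Ha: p < 0.03 (we suspect young mothers have a lower twin rate than the national figure)

 Step 2: Calculate the test statistic

 The sample proportion is p-hat = 7/469 ≈ 0.015.

$$z = \frac{0.015 - 0.03}{\sqrt{\frac{(0.03)(0.97)}{469}}} \approx \frac{-0.015}{0.008} \approx -1.90$$

 Carrying more digits (p-hat = 0.01493, standard deviation = 0.00788) gives
 z ≈ -1.91, which does not change the conclusion.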
 Step 3: Find the p-value (calculator) OR the critical value (table).




 Note: when using the calculator, x is the number of successes and must
 be a whole number.
Upper tail probability P  0.25    0.20    0.15    0.10    0.05    0.025   0.02    0.01    0.005   0.0025  0.001   0.0005
z*                        0.674   0.841   1.036   1.282   1.645   1.960   2.054   2.326   2.576   2.807   3.091   3.291
Confidence level C        50%     60%     70%     80%     90%     95%     96%     98%     99%     99.5%   99.8%   99.9%

     Since this is a one-tailed test with alpha = 0.05, we use the 0.05
     column. Because it is a lower-tailed test, the critical value is
     -1.645, and any test statistic below this value is in the rejection region.
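
Critical values like these can also be computed directly instead of being read from the table. The sketch below uses a made-up helper name, `critical_value`, and checks only a handful of columns. Computed values may differ from a printed table in the last digit because of rounding.

```python
from statistics import NormalDist


def critical_value(upper_tail_prob):
    """z* with the given area to its right under the standard normal curve."""
    return NormalDist().inv_cdf(1 - upper_tail_prob)


for tail in (0.05, 0.025, 0.01, 0.005):
    print(f"upper tail {tail}: z* = {critical_value(tail):.3f}")

# For a lower-tailed test, the critical value is the negative of z*.
print(f"lower-tailed critical value at alpha = 0.05: {-critical_value(0.05):.3f}")
```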
 Step 4: State the conclusion (reject or retain) in the context of your
 situation

 P-value method: The p-value of 0.028 is below our alpha of 0.05,
 so we reject the null hypothesis and conclude that there is evidence that
 mothers under age 20 may be less likely to have twins.

 Critical value method: Our test statistic of -1.90 is beyond the
 critical value of -1.645, so it falls in the rejection region. We
 therefore reject the null hypothesis and conclude that there is evidence
 that mothers under age 20 may be less likely to have twins.
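
The whole twins example can be run end to end with a short Python sketch. The function name `one_proportion_z_test_lower` is hypothetical, and it handles only the lower-tailed alternative used here. Because it carries full precision, it reports z near -1.91 instead of the rounded -1.90, with a p-value that still rounds to 0.028.

```python
from math import sqrt
from statistics import NormalDist


def one_proportion_z_test_lower(successes, n, p0, alpha=0.05):
    """Lower-tailed one-sample z-test for a proportion (Ha: p < p0)."""
    p_hat = successes / n
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)  # standard deviation uses p0
    p_value = NormalDist().cdf(z)               # area to the left of z
    critical = NormalDist().inv_cdf(alpha)      # negative cutoff for the lower tail
    return z, p_value, critical, p_value < alpha


z, p_value, critical, reject = one_proportion_z_test_lower(7, 469, 0.03)
print(f"z = {z:.2f}, p-value = {p_value:.3f}, critical value = {critical:.3f}")
print("Reject Ho" if reject else "Retain Ho")
```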

				