# Stat 2013


```

Chapter 18
The two types of data: a reminder
• Quantitative
   Something that can be counted or measured and then
added, subtracted, averaged, etc., across individuals in
the population.
   Example: cholesterol level

• Categorical
   Something that falls into one of several categories. What
can be counted is the proportion of individuals in each
category.
   Example: Your blood type (A, B, AB, O), your hair color,
your family health history for genetic diseases, whether
you will develop lung cancer
The sample proportion p-hat
We now study categorical data and draw inference on
the proportion, or percentage, of the population with a
specific characteristic.

If we call a given categorical characteristic in the
population “success,” then the sample proportion of
successes, p-hat, is:

p-hat = (count of successes in the sample) / (count of observations in the sample)

Sampling distribution of p-hat
The sampling distribution of p-hat is never exactly
normal. But as the sample size increases, the
sampling distribution of p-hat becomes approximately
normal, with mean p and standard deviation sqrt(p(1 − p)/n).
Example
Suppose that 35% of all nurses in America are willing to switch
hospitals if offered a higher salary. If a recruiter randomly contacts
100 nurses, what is the probability that over 40% of the sample would
be willing to switch hospitals if offered a higher salary?

Mean of the sampling distribution = 0.35
Standard deviation of the sampling distribution = sqrt((0.35)(0.65)/100) ≈ 0.0477

We want P(p-hat > 0.40).
Z-score: z = (0.40 − 0.35) / sqrt((0.35)(0.65)/100) = 0.05 / 0.0477 ≈ 1.05

Looking up 1.05 in our z-table gives an upper-tail probability of 0.1469,
so about 14.7% of the samples of nurses would have over 40% willing to
switch hospitals.
Conditions for Inference
In order to use the sampling distribution for
proportions, a few conditions must be met:
• The data must be from an SRS of the
population of interest.
• The population is at least 10 times as large as the
sample.
• The sample size n is large enough to ensure that
the distribution of z is close to the Normal model.
One-proportion z-interval
When the conditions are met, we can find the
confidence interval for the population proportion p.

F pq I

p  z *G J

Hn K
Use this method when the number of successes
and failures in the sample are both at least 15.
Example
Vitamin D, whether ingested as a dietary supplement or produced naturally
when sunlight falls upon the skin, is essential for strong, healthy bones.
The bone disease rickets was largely eliminated in England during the
1950s, but now there is concern that a generation of children more likely to
watch TV or play computer games than spend time outdoors is at
increased risk. A recent study of 2700 children randomly selected from all
parts of England found 20% of them deficient in vitamin D. Find a 98%
confidence interval for the proportion of children who are vitamin D deficient.

0.20 ± 2.326 × sqrt((0.20)(0.80)/2700)
0.20 ± 0.0179
(0.1821, 0.2179)

We are 98% confident that the true proportion of children in England
who are vitamin D deficient is between 18.2% and 21.8%.
Because we have to use an estimate of p to compute the margin of
error, confidence intervals for a population proportion are not very
accurate.
m = z* × sqrt(p-hat(1 − p-hat)/n)


Specifically, we tend to be
incorrect more often than
the confidence level would
indicate. But there is no
systematic amount
(because it depends on p).

Use with caution!
“Plus four” confidence interval for p
A simple adjustment produces more accurate confidence
intervals. We act as if we had four additional
observations, two being successes and two being
failures. Thus, the new sample size is n + 4 and the
count of successes is X + 2.
The “plus four” estimate of p is:

p-tilde = (count of successes + 2) / (count of all observations + 4)

And an approximate level C confidence interval is:

CI: p-tilde ± m, with m = z* × SE = z* × sqrt(p-tilde(1 − p-tilde)/(n + 4))
Use this method when C is at least 90% and sample size is at least 10.
Example
Wildlife biologists inspect 153 deer taken by hunters and find 32 of
them carrying ticks that test positive for Lyme disease. Use the
“plus 4” method to create a 90% confidence interval for the
percentage of deer that may carry such ticks.
First, find the value of the “plus four” estimate p-tilde:

p-tilde = (32 + 2) / (153 + 4) = 34/157 = 0.2166

Now, calculate the “plus four” interval:

0.2166 ± 1.645 × sqrt((0.2166)(0.7834)/157)
0.2166 ± 0.0541
(0.1625, 0.2707)

With 90% confidence, between 16.25% and 27.07% of deer may
carry Lyme-positive ticks.
Margin of Error and Sample Size
In planning a study, we may want to choose a sample
size that will allow us to estimate the parameter within
a given margin of error (MoE).

The MoE in the large-sample confidence interval is:

MoE = z* × sqrt(p-hat(1 − p-hat)/n)

We will need to guess what p-hat is…. You can either
use a guess based on a pilot study (or past
information), or use p-hat=0.50, which is the most
conservative estimate.
Example
It’s believed that as many as 25% of adults over 50 never graduated from
high school. We wish to see if this percentage is the same among the
25 to 30 age group. How many of this younger age group must we survey in
order to estimate the proportion of non-grads to within 6% with 90%
confidence?

MoE = 6%. Since we don’t have data about this age group, use p-hat = 0.50.

0.06 = 1.645 × sqrt((0.5)(0.5)/n)
0.0365 = sqrt(0.25/n)
0.00133 = 0.25/n
n ≈ 187.92

We should sample at least 188 people in the 25 to 30 age group.
The One-Sample Proportions test
Just like before, the steps of the hypothesis test are:
1) State the hypotheses
2) Calculate the test statistic (z-score!)
3) Find the p-value (from the calculator) OR
the critical value (from the table)
4) State the conclusion (reject or retain) in the
context of your situation
Example
Some doctors suspect that young mothers have fewer multiple births. In
2001, a national vital statistics report indicated that about 3% of all births
produced twins. Data from a large city hospital found only 7 sets of twins
were born to 469 teenage girls. Does that suggest that mothers under age
20 may be less likely to have twins? Test an appropriate hypothesis and
state your conclusion. Use alpha=0.05
Step 1: State the hypotheses
Ho: p=0.03 (No different from the national report)
Ha: p<0.03 (We suspect young mothers have twins less often than the national rate)

Step 2: Calculate the test statistic

p-hat = 7/469 ≈ 0.015

z = (0.015 − 0.03) / sqrt((0.03)(0.97)/469) = −0.015 / 0.008 ≈ −1.90
Step 3: Find the p-value (calculator) OR the critical value (table).

Note… the x indicates the number of successes and must be a whole
number!!!!
Using the critical value table with alpha = 0.05:
Upper-tail probability P   0.25   0.20   0.15   0.10   0.05   0.025  0.02   0.01   0.005  0.0025  0.001  0.0005
z*                         0.674  0.841  1.036  1.282  1.645  1.960  2.054  2.326  2.576  2.807   3.091  3.291
Confidence level C         50%    60%    70%    80%    90%    95%    96%    98%    99%    99.5%   99.8%  99.9%

Since this was a 1-tailed test, we use the 0.05 column. Because the test
is lower-tailed, the critical value is -1.645, so any test statistic below
this value is in the rejection region.
Step 4: State the conclusion (reject or retain) in the context of your
situation

P-value method: The p-value of 0.028 is below our alpha of 0.05,
so we reject the null hypothesis and conclude that there is evidence that
mothers under age 20 may be less likely to have twins.

Critical value method: Our test statistic was -1.90, which is beyond the
critical value of -1.645, so it falls in the rejection region. We
therefore reject the null hypothesis and conclude that there is evidence
that mothers under age 20 may be less likely to have twins.

```
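The worked examples above can be checked with a short script. What follows is a minimal sketch rather than part of the original slides: the function names are hypothetical, and it assumes Python 3.8+ so that `statistics.NormalDist` can stand in for the z-table. Because it uses exact critical values and unrounded intermediate results, its answers may differ from the slide values in the last digit or two (for example, the twins test statistic comes out nearer -1.91 than -1.90).

```python
from math import ceil, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard Normal model: mean 0, sd 1


def z_star(confidence):
    """Critical value z* that leaves (1 - confidence)/2 in each tail."""
    return STD_NORMAL.inv_cdf(1 - (1 - confidence) / 2)


def prob_p_hat_above(p, n, cutoff):
    """P(p-hat > cutoff) using the Normal approximation for p-hat."""
    sd = sqrt(p * (1 - p) / n)
    return 1 - STD_NORMAL.cdf((cutoff - p) / sd)


def z_interval(p_hat, n, confidence):
    """Large-sample one-proportion z-interval: p-hat +/- z* * SE."""
    m = z_star(confidence) * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - m, p_hat + m


def plus_four_interval(successes, n, confidence):
    """'Plus four' interval: add 2 successes and 2 failures first."""
    p_tilde = (successes + 2) / (n + 4)
    m = z_star(confidence) * sqrt(p_tilde * (1 - p_tilde) / (n + 4))
    return p_tilde - m, p_tilde + m


def sample_size(moe, confidence, p_guess=0.5):
    """Smallest n whose margin of error is at most moe (rounds up)."""
    return ceil(z_star(confidence) ** 2 * p_guess * (1 - p_guess) / moe ** 2)


def lower_tail_z_test(successes, n, p0):
    """One-proportion z-test of Ho: p = p0 against Ha: p < p0."""
    p_hat = successes / n
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
    return z, STD_NORMAL.cdf(z)  # (test statistic, lower-tail p-value)


if __name__ == "__main__":
    print("Nurses, P(p-hat > 0.40):", prob_p_hat_above(0.35, 100, 0.40))
    print("Vitamin D, 98% interval:", z_interval(0.20, 2700, 0.98))
    print("Deer, 90% plus-four interval:", plus_four_interval(32, 153, 0.90))
    print("Non-grads, required n:", sample_size(0.06, 0.90))
    print("Twins, (z, p-value):", lower_tail_z_test(7, 469, 0.03))
```

Rounding the sample size up with `ceil` mirrors the slide's step from 187.92 to 188: rounding down would give a margin of error slightly larger than the 6% target.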