									 Everything about
Single Proportions


        STA 570 001-002
        Spring 2008
Inference for Proportions

   All inference for proportions is based on
    drawing a random sample of size n from a
    large population (remember n>30 and we
    assume we are sampling less than 5% of the
    population).
   We estimate the population proportion p with
    the sample proportion phat. While phat is
    usually not exactly equal to p, it varies in a
    close range around p defined by the
    sampling distribution
    phat ~ N(p, sqrt(p(1-p)/n))
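
As a rough illustration of the sampling distribution above, the sketch below builds N(p, sqrt(p(1-p)/n)) with Python's standard library and reports the range that holds the central 95% of phat values. The values p=0.4 and n=100 and the helper name sampling_distribution are assumptions chosen for the example, not part of the course notes.

```python
from math import sqrt
from statistics import NormalDist


def sampling_distribution(p, n):
    """Normal approximation for phat: N(p, sqrt(p(1-p)/n))."""
    return NormalDist(mu=p, sigma=sqrt(p * (1 - p) / n))


# Hypothetical values: population proportion 0.4, sample size 100.
dist = sampling_distribution(0.4, 100)

# Range containing the central 95% of phat values under this approximation.
low, high = dist.inv_cdf(0.025), dist.inv_cdf(0.975)

print(f"standard deviation of phat: {dist.stdev:.4f}")
print(f"central 95% range for phat: ({low:.3f}, {high:.3f})")
```

Increasing n shrinks the standard deviation, so phat varies in a tighter range around p.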
Things we’ve learned about
proportions

   We now know how to
   1) Conduct a hypothesis test
   2) Compute power for a hypothesis test
   3) Compute the p-value for a hypothesis test
   4) Determine the sample size for a
    hypothesis test
   5) Construct a confidence interval
   6) Determine a sample size in advance for a
    confidence interval
Conducting a hypothesis test

   We have a null hypothesis that specifies a single
    value for p, H0 : p=p0.
   We are testing against an alternative H1 : p<p0,
    H1 : p>p0, or H1 : p≠p0.
   The null distribution is N(p0, sqrt(p0(1-p0)/n))
   The cutoff depends on the alternative. For “<”,
    reject for phat below the α percentile of the null
    distribution. For “>”, reject for phat above the 1-α
    percentile of the null distribution. For “≠”, reject
    for phat outside the α/2 and 1-(α/2) percentiles
    of the null distribution.
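
The cutoff rules above can be sketched in code as follows. This is a minimal illustration, assuming the normal null distribution N(p0, sqrt(p0(1-p0)/n)); the function names and the example numbers (p0=0.5, n=100, α=0.05) are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def rejection_cutoffs(p0, n, alpha, alternative):
    """Return (lower, upper) cutoffs for phat; None means no cutoff on that side."""
    null = NormalDist(p0, sqrt(p0 * (1 - p0) / n))
    if alternative == "<":
        return null.inv_cdf(alpha), None
    if alternative == ">":
        return None, null.inv_cdf(1 - alpha)
    if alternative == "!=":
        return null.inv_cdf(alpha / 2), null.inv_cdf(1 - alpha / 2)
    raise ValueError("alternative must be '<', '>' or '!='")


def reject(phat, p0, n, alpha, alternative):
    """True when phat falls in the rejection region."""
    lower, upper = rejection_cutoffs(p0, n, alpha, alternative)
    below = lower is not None and phat < lower
    above = upper is not None and phat > upper
    return below or above


# Hypothetical example: H0: p = 0.5 against H1: p > 0.5, n = 100, alpha = 0.05.
print(rejection_cutoffs(0.5, 100, 0.05, ">"))
print(reject(0.60, 0.5, 100, 0.05, ">"))
```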
Computing power for a hypothesis
test

   Power is the probability of rejecting the null
    hypothesis when H1 is true (the right decision).
   For composite alternatives, we talk about computing
    power at a point in the alternative (points near the
    null have power near α, points far from the null have
    power near 1).
   So you are given a point p1 in the alternative.
   Compute the alternative distribution
    N(p1, sqrt(p1(1-p1)/n))
   Find the area where you reject H0 (using the null
    distribution as before)
   Compute the probability the alternative distribution
    places in the rejection region. That is the power.
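
The power steps above might be coded like this. It is a sketch under the normal approximation used in these notes; the function name power and the example values (p0=0.5, p1=0.6, n=100, α=0.05) are assumptions for illustration.

```python
from math import sqrt
from statistics import NormalDist


def power(p0, p1, n, alpha, alternative):
    """Probability the alternative distribution places in the rejection region."""
    null = NormalDist(p0, sqrt(p0 * (1 - p0) / n))
    alt = NormalDist(p1, sqrt(p1 * (1 - p1) / n))
    if alternative == "<":
        # Reject for phat below the alpha percentile of the null distribution.
        return alt.cdf(null.inv_cdf(alpha))
    if alternative == ">":
        # Reject for phat above the 1-alpha percentile of the null distribution.
        return 1 - alt.cdf(null.inv_cdf(1 - alpha))
    if alternative == "!=":
        # Reject in either tail; add the two tail probabilities.
        lower = null.inv_cdf(alpha / 2)
        upper = null.inv_cdf(1 - alpha / 2)
        return alt.cdf(lower) + (1 - alt.cdf(upper))
    raise ValueError("alternative must be '<', '>' or '!='")


# Hypothetical example: H0: p = 0.5 against H1: p > 0.5, evaluated at p1 = 0.6.
print(round(power(0.5, 0.6, 100, 0.05, ">"), 4))
```

Calling power with p1 equal to p0 returns α, which matches the remark that points near the null have power near α.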
P-values

   The p-value is defined as the transition point
    between α values where you would reject H0
    and α values where you would not reject H0.
   If you have a p-value, remember you reject
    H0 whenever the p-value is smaller than α.
    You do not reject when the p-value is bigger
    than α.
   Computing the p-value depends on the
    alternative AND requires the data.
Computing p-values

   For a “<” alternative, the p-value is the
    probability the null distribution places below
    phat.
   For a “>” alternative, the p-value is probability
    the null distribution places above phat.
   For a “≠” alternative, it depends on the value
    of phat. For phat below p0, the p-value is
    twice the probability the null distribution
    places below phat. For phat above p0, the
    p-value is twice the probability the null
    distribution places above phat.
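
The three p-value rules above can be written out as a short sketch. The function name p_value and the example data (phat=0.58, p0=0.5, n=100) are hypothetical and only illustrate the calculation.

```python
from math import sqrt
from statistics import NormalDist


def p_value(phat, p0, n, alternative):
    """P-value for a single-proportion test using the normal null distribution."""
    null = NormalDist(p0, sqrt(p0 * (1 - p0) / n))
    below = null.cdf(phat)   # probability the null places below phat
    above = 1 - below        # probability the null places above phat
    if alternative == "<":
        return below
    if alternative == ">":
        return above
    if alternative == "!=":
        # Twice the tail on the side of p0 where phat fell.
        return 2 * below if phat < p0 else 2 * above
    raise ValueError("alternative must be '<', '>' or '!='")


# Hypothetical data: phat = 0.58 from n = 100, testing H0: p = 0.5.
for alt in ("<", ">", "!="):
    print(alt, round(p_value(0.58, 0.5, 100, alt), 4))
```

With α=0.05 these hypothetical data would not lead to rejection under the “>” or “≠” alternatives, since both p-values exceed α.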
Determining sample sizes for
hypothesis tests

   A sample size calculation is premised on
    finding a minimum sample size for achieving
    a desired α while simultaneously achieving a
    desired power (POW) at a particular point p1.
   Remember that power changes as p varies in the
    alternative hypothesis, so you have to pick a
    value of p1 that you consider meaningful.
   How to compute this sample size depends
    on the alternative.
Determining sample sizes for
hypothesis tests

   For a “<” alternative, you need to equate the α
    percentile of the null distribution with the POW
    percentile of the alternative distribution.
   For a “>” alternative, you need to equate the 1-α
    percentile of the null distribution with the 1-POW
    percentile of the alternative distribution.
   For a “≠” alternative with p1<p0, you need to equate
    the α/2 percentile of the null distribution to the POW
    percentile of the alternative distribution.
   For a “≠” alternative with p1>p0, you need to equate
    the 1-(α/2) percentile of the null distribution to the
    1-POW percentile of the alternative distribution.
Determining sample sizes for
hypothesis tests

   Let s0=sqrt(p0(1-p0)) and s1=sqrt(p1(1-p1))
   Let z0 and z1 be the Z values corresponding
    to the appropriate percentiles from the null
    and alternative distribution.
   The minimum required sample size, for all
    cases, is
     n = ((z1 s1 - z0 s0) / (p0 - p1))^2
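
A possible implementation of this sample size calculation is sketched below. It assumes the formula n = ((z1 s1 - z0 s0) / (p0 - p1))^2 with z0 and z1 chosen by the percentile rules for each alternative, and it rounds up to the next whole number. The function name and the example values (p0=0.5, p1=0.6, α=0.05, POW=0.8) are hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist


def hypothesis_test_sample_size(p0, p1, alpha, pow_target, alternative):
    """Minimum n giving level alpha and power pow_target at the point p1."""
    z = NormalDist().inv_cdf  # standard normal percentile function
    if alternative == "!=":
        # Two-sided: use alpha/2 on the side of p0 where p1 lies.
        alpha = alpha / 2
        alternative = "<" if p1 < p0 else ">"
    if alternative == "<":
        if not p1 < p0:
            raise ValueError("p1 must be below p0 for a '<' alternative")
        z0, z1 = z(alpha), z(pow_target)
    elif alternative == ">":
        if not p1 > p0:
            raise ValueError("p1 must be above p0 for a '>' alternative")
        z0, z1 = z(1 - alpha), z(1 - pow_target)
    else:
        raise ValueError("alternative must be '<', '>' or '!='")
    s0 = sqrt(p0 * (1 - p0))
    s1 = sqrt(p1 * (1 - p1))
    n = ((z1 * s1 - z0 * s0) / (p0 - p1)) ** 2
    return ceil(n)  # round up so both targets are met


# Hypothetical example: H0: p = 0.5 against H1: p > 0.5, power 0.8 at p1 = 0.6.
print(hypothesis_test_sample_size(0.5, 0.6, 0.05, 0.8, ">"))
```

The formula follows from setting p0 + z0 s0/sqrt(n) equal to p1 + z1 s1/sqrt(n) and solving for n.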
Confidence Intervals

   A confidence interval takes the observed value
    of phat and computes a range of possible values
    for the population proportion p.
   The “confidence level” of this interval refers to
    the probability the procedure will produce an
    interval containing the population proportion.
    Here α is 1 minus the confidence level (e.g. 99%
    confidence results in α=0.01)
   Let z* be the Z-score corresponding to the
    1-(α/2) percentile. Note that the interval
    between Z=-z* and Z=+z* contains probability
    equal to the confidence level.
Confidence Intervals

   A confidence interval for p is
    phat ± z* sqrt(phat(1-phat)/n)
   Note we have estimated the standard
    deviation using phat. This does not cause
    problems for the sample sizes we are
    considering (n>30).
   Note the length of this confidence interval is
    2 z* sqrt(phat(1-phat)/n)
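
The interval phat ± z* sqrt(phat(1-phat)/n) can be computed as in the sketch below. The function name confidence_interval and the example data (phat=0.58, n=100, 95% confidence) are assumptions made for illustration.

```python
from math import sqrt
from statistics import NormalDist


def confidence_interval(phat, n, confidence=0.95):
    """Return (low, high) for the interval phat +/- z* sqrt(phat(1-phat)/n)."""
    alpha = 1 - confidence
    z_star = NormalDist().inv_cdf(1 - alpha / 2)  # 1-(alpha/2) percentile
    margin = z_star * sqrt(phat * (1 - phat) / n)
    return phat - margin, phat + margin


# Hypothetical data: phat = 0.58 from a sample of n = 100.
low, high = confidence_interval(0.58, 100, confidence=0.95)
print(f"95% interval: ({low:.4f}, {high:.4f}), length {high - low:.4f}")
```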
Computing sample sizes for
confidence intervals

   Recall the length of the confidence interval is
     2 z* sqrt(phat(1-phat)/n)
   To make sure this is less than a prespecified
    length L, you need n to be at least

     n ≥ 4 phat(1-phat) (z*)^2 / L^2
   If you have a guess of phat in advance, use
    it. Otherwise, guess phat=0.5 to protect
    yourself against all phats simultaneously.
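
The sample size rule for a confidence interval can be sketched as follows, assuming n must be at least 4 phat(1-phat) (z*)^2 / L^2 and rounding up to a whole number. The function name and the example length L=0.1 are hypothetical.

```python
from math import ceil
from statistics import NormalDist


def confidence_interval_sample_size(length, confidence=0.95, phat_guess=0.5):
    """Smallest n keeping the interval length at or below `length`."""
    alpha = 1 - confidence
    z_star = NormalDist().inv_cdf(1 - alpha / 2)
    n = 4 * phat_guess * (1 - phat_guess) * z_star ** 2 / length ** 2
    return ceil(n)


# Hypothetical target: a 95% interval no longer than L = 0.1.
print(confidence_interval_sample_size(0.1))                  # conservative guess phat = 0.5
print(confidence_interval_sample_size(0.1, phat_guess=0.2))  # using an advance guess of 0.2
```

The guess phat=0.5 gives the largest n because phat(1-phat) is maximized at 0.5, which is why it protects against every possible phat.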
Common threads

   We make repeated use of the sampling
    distribution (through the null distribution, the
    alternative distribution, and the form of the
    confidence interval).
   Different situations have different sampling
    distributions, but the way we use them will
    remain the same. We will still want particular
    percentiles of the null and alternative
    distributions, for example.

								