									 Everything about
Single Proportions


        STA 570 401-402
        Spring 2010
Things we’ve learned about
hypothesis testing

   1) Planning a hypothesis test
    –   For fixed α and power at a fixed alternative value, we
        can determine the required sample size n.
   2) Planning or Analysis phase of a hypothesis
    test.
    –   For fixed n and α, we can compute the cutoff for the
        hypothesis test (where to reject).
    –   For fixed n and α, we can compute the power for a
        fixed alternative value.
   3) Analysis phase of a hypothesis test (e.g. we
    have data).
    –   Decide whether or not to reject H0.
    –   Compute the p-value of the test.
Things we’ve learned about
confidence intervals

   1) Prior to collecting data
    –   We can determine the required sample size to
        achieve a particular confidence interval length.
   2) After collecting data
    –   We can construct a confidence interval from the
        data.
Information common to all
hypothesis tests we have studied
or will study

   The null hypothesis states a parameter is equal
    to a specific value, while the alternative can
    state that the parameter is any of “<”, “>”, or “≠”
    that specific value.
   The cutoff of the hypothesis test is based on the
    null distribution, which is the sampling
    distribution when H0 is true. The cutoff is in the
    direction/s of the alternative hypothesis.
   Cutoff/s
    –   α percentile for a “<” alternative (reject below)
    –   1-α percentile for a “>” alternative (reject above)
    –   both the α/2 and 1-(α/2) percentiles for a “≠”
        alternative (reject to the extremes)
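The cutoff rules above can be sketched in Python. This is only an illustration: the function name cutoffs and the use of the standard library's NormalDist as the null distribution are assumptions, not part of the course material.

```python
from statistics import NormalDist


def cutoffs(null, alpha, alternative):
    """Return the rejection cutoff(s) taken from the null distribution.

    null        -- a NormalDist standing in for the null distribution (assumed normal)
    alpha       -- significance level, e.g. 0.05
    alternative -- "<", ">", or "!=" (the last one meaning "not equal")
    """
    if alternative == "<":
        return (null.inv_cdf(alpha),)          # reject below this value
    if alternative == ">":
        return (null.inv_cdf(1 - alpha),)      # reject above this value
    if alternative == "!=":
        # reject below the first value or above the second
        return (null.inv_cdf(alpha / 2), null.inv_cdf(1 - alpha / 2))
    raise ValueError("alternative must be '<', '>' or '!='")


# Example with a standard normal null distribution.
standard_normal = NormalDist()
print(cutoffs(standard_normal, 0.05, "<"))
print(cutoffs(standard_normal, 0.05, "!="))
```

For a standard normal null and α=0.05 these are the familiar values near -1.645 and ±1.96.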
More common information

   Finding the power of the test requires the
    alternative distribution, the sampling
    distribution at a fixed point in the alternative
    hypothesis.
   The power of the test is the probability of
    rejecting the null hypothesis when the
    alternative hypothesis is true (the right
    answer, thus we want a high power).
More common information

   To compute a sample size in advance of the
    experiment, we want to find a sample size
    which simultaneously achieves a given α and
    power.
   We need both the null and alternative
    distributions in terms of an unknown n.
   We then solve an equation equating
    percentiles of the null and alternative
    distributions, based on the direction of the
    alternative.
Which percentiles to equate?

   To find the required sample size in
    advance...
    –   “<” alternative – equate the α percentile of the null to
        the POW percentile of the alternative, where POW is the
        desired power.
    –   “>” alternative – equate the 1-α percentile of the null to
        the 1-POW percentile of the alternative.
    –   “≠” alternative with alternative value less than null
        value – equate the α/2 percentile of the null to the POW
        percentile of the alternative.
    –   “≠” alternative with alternative value greater than
        null value – equate the 1-(α/2) percentile of the null to
        the 1-POW percentile of the alternative (the test rejects
        above the cutoff, so probability POW must lie above it).
Common information about p-
values

   p-values are computed from the data.
   For all α > p-value, reject the null hypothesis
    H0. For all α < p-value, do not reject H0.
   Way to remember – small p-values typically
    result in rejecting H0.
   To compute a p-value, you need to compute
    a probability involving the null distribution, in
    the direction of the alternative hypothesis.
Computing p-values

   To compute a p-value, compute the following
    probabilities under the null distribution.
    –   “<” alternative – compute the probability below the
        data.
    –   “>” alternative – compute the probability above
        the data.
    –   “≠” alternative, data below the null value –
        compute the probability below the data AND
        double the result.
    –   “≠” alternative, data above the null value –
        compute the probability above the data AND
        double the result.
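The four cases above can be written as one small function. This is a sketch under the assumption that the null distribution is normal and is passed in as a NormalDist; the function name p_value and the example numbers are hypothetical.

```python
from statistics import NormalDist


def p_value(null, observed, alternative):
    """p-value of the observed statistic under a normal null distribution.

    null        -- NormalDist for the null distribution (its mean is the null value)
    observed    -- the value computed from the data
    alternative -- "<", ">", or "!=" (the last one meaning "not equal")
    """
    below = null.cdf(observed)   # probability below the data
    above = 1 - below            # probability above the data
    if alternative == "<":
        return below
    if alternative == ">":
        return above
    if alternative == "!=":
        # double the tail on the side of the null value where the data fell
        tail = below if observed < null.mean else above
        return 2 * tail
    raise ValueError("alternative must be '<', '>' or '!='")


# Hypothetical numbers: null distribution N(0.5, 0.05), observed value 0.4.
null = NormalDist(0.5, 0.05)
print(round(p_value(null, 0.4, "<"), 4))
print(round(p_value(null, 0.4, "!="), 4))
```

Here the data sit two standard deviations below the null value, so the one-sided p-value is about 0.023 and the two-sided one about 0.046.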
Common information about
confidence intervals

   For STA570 (there are more general situations
    we do not consider) a confidence interval is
    centered at the best point estimate available.
   The width of the confidence interval is
    determined by computing the width needed to
    contain the middle 1-α of the sampling
    distribution.
   If that width itself depends on the parameter
    (e.g. p(1-p) involves the parameter p), then
    estimate the parameter in determining the width.
   To determine a sample size in advance, set up
    an equation relating the length of the interval to
    n, then solve for n.
Inference for Proportions

   All inference for proportions is based on
    drawing a random sample of size n from a
    large population (remember n>30 and we
    assume we are sampling less than 5% of the
    population).
   We estimate the population proportion p with
    the sample proportion phat. While phat is
    usually not exactly equal to p, it varies in a
    close range around p defined by the
    sampling distribution
    phat ~ N(p, sqrt(p(1-p)/n))
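To see what "a close range around p" means in numbers, the sampling distribution can be built directly. This is a sketch only: the helper name phat_distribution and the values p=0.3, n=100 are made up for illustration, and the second argument of N(·,·) is read as a standard deviation.

```python
from math import sqrt
from statistics import NormalDist


def phat_distribution(p, n):
    """Approximate sampling distribution of phat: N(p, sqrt(p(1-p)/n))."""
    return NormalDist(p, sqrt(p * (1 - p) / n))


# Hypothetical population proportion and sample size.
dist = phat_distribution(0.3, 100)

# Range containing the middle 95% of sample proportions.
low, high = dist.inv_cdf(0.025), dist.inv_cdf(0.975)
print(round(dist.stdev, 4), round(low, 3), round(high, 3))
```

With these numbers the standard deviation of phat is about 0.046, so roughly 95% of samples give a phat between about 0.21 and 0.39.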
Conducting a hypothesis test

   We have a null hypothesis that specifies a
    single value for p, H0 : p=p0.
   We are testing against an alternative H1 :
    p<p0, H1 : p>p0, or H1 : p≠p0.
   The null distribution is N(p0, sqrt(p0(1-p0)/n))
   Where useful, the alternative distribution is
    N(p1, sqrt(p1(1-p1)/n)) for a fixed p1.
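With the null and alternative distributions in hand, the power at a fixed p1 follows by finding the cutoff under the null and then asking how much of the alternative distribution falls in the rejection region. The sketch below assumes the normal approximations above; the function names and the example values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def proportion_dist(p, n):
    """Normal approximation N(p, sqrt(p(1-p)/n)) for the sample proportion."""
    return NormalDist(p, sqrt(p * (1 - p) / n))


def power(p0, p1, n, alpha, alternative):
    """Probability of rejecting H0: p = p0 when the true proportion is p1."""
    null = proportion_dist(p0, n)   # null distribution
    alt = proportion_dist(p1, n)    # alternative distribution at p1
    if alternative == "<":
        return alt.cdf(null.inv_cdf(alpha))
    if alternative == ">":
        return 1 - alt.cdf(null.inv_cdf(1 - alpha))
    if alternative == "!=":
        lower = null.inv_cdf(alpha / 2)
        upper = null.inv_cdf(1 - alpha / 2)
        # reject in either tail, so add both rejection probabilities
        return alt.cdf(lower) + (1 - alt.cdf(upper))
    raise ValueError("alternative must be '<', '>' or '!='")


# Hypothetical test: H0: p = 0.5 against H1: p < 0.5, n = 100, alpha = 0.05,
# with power evaluated at p1 = 0.4.
print(round(power(0.5, 0.4, 100, 0.05, "<"), 3))
```

A quick sanity check on the logic: setting p1 equal to p0 returns α, because the alternative distribution then coincides with the null distribution.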
Determining sample sizes for
hypothesis tests

   Let s0=sqrt(p0(1-p0)) and s1=sqrt(p1(1-p1))
   Let z0 and z1 be the Z values corresponding
    to the appropriate percentiles from the null
    and alternative distribution.
   The minimum required sample size, for all
    cases, is
        \[ n \ge \left( \frac{z_1 s_1 - z_0 s_0}{p_0 - p_1} \right)^2 \]
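The formula comes from setting the null cutoff p0 + z0 s0/sqrt(n) equal to the alternative percentile p1 + z1 s1/sqrt(n) and solving for n. The sketch below applies it; the function name required_n and the example values are hypothetical. For alternatives above the null value it pairs the upper null percentile with the 1-POW percentile of the alternative, since the test then rejects above the cutoff.

```python
from math import ceil, sqrt
from statistics import NormalDist


def required_n(p0, p1, alpha, pow_, alternative):
    """Minimum n giving level alpha and power pow_ at the alternative value p1."""
    z = NormalDist()                      # standard normal, used for Z values
    s0 = sqrt(p0 * (1 - p0))
    s1 = sqrt(p1 * (1 - p1))
    if alternative == "<":
        z0, z1 = z.inv_cdf(alpha), z.inv_cdf(pow_)
    elif alternative == ">":
        z0, z1 = z.inv_cdf(1 - alpha), z.inv_cdf(1 - pow_)
    elif alternative == "!=":
        if p1 < p0:
            z0, z1 = z.inv_cdf(alpha / 2), z.inv_cdf(pow_)
        else:
            z0, z1 = z.inv_cdf(1 - alpha / 2), z.inv_cdf(1 - pow_)
    else:
        raise ValueError("alternative must be '<', '>' or '!='")
    n = ((z1 * s1 - z0 * s0) / (p0 - p1)) ** 2
    return ceil(n)                        # round up to a whole observation


# Hypothetical plan: H0: p = 0.5 against H1: p < 0.5, alpha = 0.05,
# power 0.80 at p1 = 0.4.
print(required_n(0.5, 0.4, 0.05, 0.80, "<"))
```

Rounding up with ceil keeps the achieved power at or above the target under the normal approximation.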
Confidence Intervals

   A confidence interval takes the observed value
    of phat and computes a range of possible values
    for the population proportion p.
   The “confidence level” of this interval refers to
    the probability the procedure will produce an
    interval containing the population proportion.
    Here α is 1 minus the confidence level (e.g. 99%
    confidence results in α=0.01)
   Let z* be the Z-score corresponding to the
    1-(α/2) percentile. Note the values
    corresponding to Z=±z* contain probability
    equaling the confidence level.
Confidence Intervals

   A confidence interval for p is
    phat ± z* sqrt(phat(1-phat)/n)
   Note we have estimated the standard
    deviation using phat. This does not cause
    problems for the sample sizes we are
    considering (n>30).
   Note the length of this confidence interval is
    2 z* sqrt(phat(1-phat)/n)
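The interval above is a short computation. The following is a minimal sketch, assuming the data arrive as a count of successes out of n; the function name proportion_ci and the example counts are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def proportion_ci(successes, n, confidence=0.95):
    """Confidence interval phat ± z* sqrt(phat(1-phat)/n) for a proportion."""
    phat = successes / n
    alpha = 1 - confidence
    z_star = NormalDist().inv_cdf(1 - alpha / 2)     # 1-(α/2) percentile of Z
    half_width = z_star * sqrt(phat * (1 - phat) / n)
    return phat - half_width, phat + half_width


# Hypothetical data: 40 successes in a sample of 100.
low, high = proportion_ci(40, 100, confidence=0.95)
print(round(low, 3), round(high, 3))
```

The length of the returned interval, high - low, is exactly 2 z* sqrt(phat(1-phat)/n).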
Computing sample sizes for
confidence intervals

   Recall the length of the confidence interval is
     2 z* sqrt(phat(1-phat)/n)
   To make sure this is less than a prespecified
    length L, you need n to be at least

        \[ n \ge \frac{4\,\hat{p}\,(1 - \hat{p})\,(z^*)^2}{L^2} \]
   If you have a guess of phat in advance, use
    it. Otherwise, use phat=0.5, which maximizes phat(1-phat)
    and so gives an n large enough for every possible phat.
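The sample size bound can be sketched as follows. The function name ci_sample_size and the example length are hypothetical; note that L is the full length of the interval, not the half-width.

```python
from math import ceil
from statistics import NormalDist


def ci_sample_size(length, confidence=0.95, phat_guess=0.5):
    """Smallest n with 2 z* sqrt(phat(1-phat)/n) <= length.

    phat_guess defaults to 0.5, the value that maximizes phat(1-phat).
    """
    alpha = 1 - confidence
    z_star = NormalDist().inv_cdf(1 - alpha / 2)
    n = 4 * phat_guess * (1 - phat_guess) * z_star ** 2 / length ** 2
    return ceil(n)                       # round up to a whole observation


# Hypothetical target: a 95% interval no longer than 0.10 in total.
print(ci_sample_size(0.10))
print(ci_sample_size(0.10, phat_guess=0.2))
```

Supplying a guess away from 0.5 lowers the required n, which is why the conservative default is only needed when no prior guess is available.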
Common threads

   We make repeated use of the sampling
    distribution (through the null distribution, the
    alternative distribution, and the form of the
    confidence interval).
   Different situations have different sampling
    distributions, but the way we use them
    remains the same. We will still want particular
    percentiles of the null and alternative
    distributions, for example.

								