9 Master by cJaxKih


Presentation 8.1
                  Example
• A survey of 436 workers showed that 192 of
  them said that it was seriously unethical to
  monitor employee e-mail. When 121 senior-
  level bosses were surveyed, 40 said that it was
  seriously unethical to monitor employee e-mail.
• Let $p_W$ and $p_B$ be the population proportions of
  workers and bosses who feel it's unethical to
  monitor e-mail.
We might want to obtain a CI for $p_W - p_B$.
We would first need an estimate of this
difference. It should seem reasonable that an
estimate be
$$\hat{p}_W - \hat{p}_B = 192/436 - 40/121 \approx 0.1097$$
The standard error of $\hat{p}_W - \hat{p}_B$ is estimated by
$$\mathrm{s.e.}(\hat{p}_W - \hat{p}_B) = \sqrt{\frac{\hat{p}_W(1-\hat{p}_W)}{n_W} + \frac{\hat{p}_B(1-\hat{p}_B)}{n_B}}$$

and just thinking intuitively, this means a CI for
$p_W - p_B$ is
$$\hat{p}_W - \hat{p}_B \pm z^{*}\,\mathrm{s.e.}(\hat{p}_W - \hat{p}_B)$$
• To compute a CI for $p_W - p_B$ we need $\hat{p}_W$ and $\hat{p}_B$,
  which are 192/436 = 0.4403 and
  40/121 = 0.3305 respectively. This gives a
  standard error of
$$\sqrt{\frac{0.4403(1-0.4403)}{436} + \frac{0.3305(1-0.3305)}{121}} \approx 0.0489$$
• Now, if we want to obtain an 80% CI for
  $p_W - p_B$ we have
$$0.1097 \pm 1.282(0.0489) = (0.047, 0.172)$$
• Suppose we want to test the claim that a
  larger percentage of workers feel that it's
  unethical to monitor e-mail. That is,
$$H_1: p_W > p_B, \quad \text{or equivalently} \quad H_1: p_W - p_B > 0$$
Again, it should seem intuitive that the
test statistic will be of the form
$$\frac{\hat{p}_W - \hat{p}_B}{\sqrt{\frac{p_W(1-p_W)}{n_W} + \frac{p_B(1-p_B)}{n_B}}}$$
but under $H_0$, $p_W$ and $p_B$ are equal. So, in the
denominator, we can simply replace both with a common value $p$:
$$\frac{\hat{p}_W - \hat{p}_B}{\sqrt{\frac{p(1-p)}{n_W} + \frac{p(1-p)}{n_B}}}$$

An estimate for p is the pooled proportion (192+40)/(436+121)
= 0.4165. This gives the test statistic as
2.1656.
Similar to the one sample tests, we can make a
decision by
• comparing the test statistic to the critical value.
  If α = 0.05, then the critical value is 1.645. Since
  TS > CV, reject H0.
• or we can compare the p-value to α. The p-
  value is found as P(Z > 2.1656) =0.015. Since
  this value is less than α, we reject H0.
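The worker/boss calculation above can be sketched in a few lines of Python. This is only a minimal sketch, not the course's official method: the function names `two_prop_ci` and `two_prop_z` are made up for illustration, and the z multiplier is taken from the standard library's `statistics.NormalDist` instead of a printed table, so the last digit may differ slightly from the hand-rounded slide values.

```python
from math import sqrt
from statistics import NormalDist

def two_prop_ci(x1, n1, x2, n2, conf=0.80):
    """CI for p1 - p2 using the unpooled standard error (hypothetical helper)."""
    p1, p2 = x1 / n1, x2 / n2
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    z_star = NormalDist().inv_cdf(1 - (1 - conf) / 2)  # two-sided multiplier
    diff = p1 - p2
    return diff - z_star * se, diff + z_star * se

def two_prop_z(x1, n1, x2, n2):
    """Pooled z statistic and upper-tail p-value for H1: p1 > p2."""
    p1, p2 = x1 / n1, x2 / n2
    p_pool = (x1 + x2) / (n1 + n2)  # common proportion assumed under H0
    se0 = sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se0
    return z, 1 - NormalDist().cdf(z)

# Workers: 192 of 436. Bosses: 40 of 121.
lo, hi = two_prop_ci(192, 436, 40, 121, conf=0.80)
z, p_value = two_prop_z(192, 436, 40, 121)
print(f"80% CI: ({lo:.3f}, {hi:.3f})")
print(f"z = {z:.2f}, p-value = {p_value:.3f}")
```

Because the code carries full precision instead of the rounded 0.1097, the z statistic comes out near 2.17 rather than exactly 2.1656; the conclusion is the same.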
                  Another example
A major court case on the health effects of drinking contaminated
water took place in the town of Woburn, Massachusetts. A town well
was contaminated with industrial chemicals. During the period when
the well was open, there were 16 birth defects out of 414 births. When this
particular well was shut off and water was supplied from other
wells, 3 birth defects out of 228 births were reported. The plaintiffs suing the
firm responsible for contaminating the well claim that the rate of birth
defects was higher when the contaminated well was in use. Denote the
contaminated well as 'C' and the other uncontaminated wells as 'U', and
let p be the proportion of birth defects. What exactly are the plaintiffs
wanting to test?
• Obtain a 98% confidence interval for the
  difference in the rate of birth defects for when
  the well was on compared to when it was shut
  off.
• What is the test statistic?
• What's the critical value if we use α=0.01?
• What's the conclusion? Should the plaintiffs be
  favored here?
Confidence Interval for p

Reasonable Range of Values for
True Population Proportion p
     Confidence Interval for p
• The goal is to take a sample and be able
  to make intelligent guesses about the true
  value of the proportion p in the population.
• A valuable tool is the confidence interval:
  the range of values for p in the population
  that could reasonably have produced the
  sample p-hat we observed.
                 CI Formula
• A confidence interval for the population p is given
  by:
$$\hat{p} \pm Z^{*}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$
                CI Formula
• A 95 percent confidence interval for the
  population p is given by:
$$\hat{p} \pm 1.96\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$
                  Example
• Suppose we cure p-hat = .9 of n=1000
  heartworm infected dogs. What is the
  reasonable range for the cure rate p of our new
  treatment? Do a 95% CI for p.
$$.9 \pm 1.96\sqrt{\frac{.9(.1)}{1000}}$$
$$.9 \pm 1.96(.009487)$$
$$.9 \pm .0186$$
$$(.8814, .9186)$$
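The heartworm interval can be checked with a short Python sketch. The helper name `prop_ci` is hypothetical, and the multiplier comes from `statistics.NormalDist` (about 1.95996) instead of the rounded 1.96, so treat this as an illustration under those assumptions.

```python
from math import sqrt
from statistics import NormalDist

def prop_ci(p_hat, n, conf=0.95):
    """Large-sample CI for a proportion: p_hat +/- z* sqrt(p_hat(1-p_hat)/n)."""
    z_star = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    m = z_star * sqrt(p_hat * (1 - p_hat) / n)  # margin of error
    return p_hat - m, p_hat + m

lo, hi = prop_ci(0.9, 1000)
print(f"95% CI for the cure rate: ({lo:.4f}, {hi:.4f})")
```

The result should land at roughly (.881, .919), in line with the hand calculation.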
                Example
• Reasonable range for p (.88, .92) is same
  range argued in previous section on
  sampling distributions for p-hat.
• The only reasonable values for p are those
  that could produce p-hats only a couple of
  standard deviations removed from the
  truth.
     Reeses Pieces Example
• What is the proportion of orange candies,
  p?
• To study this unknown, but very important
  value p, we will construct confidence
  intervals for p from samples of candies.
• Each bag represents a random sample of
  size n from the population of these
  candies.
• From each bag your group should: find n,
      Reeses Pieces Example
• On whiteboard place your information in tabular
  form:

          Group   N   P-hat   CI
          1
          2
          3
          4
          5
          6
     Reeses Pieces Example
• A histogram of p-hat values should result
  in a representation of the sampling
  distribution of p-hat.
• The center of this histogram should be p.
  What do you think p is?
     Reeses Pieces Example
• From the CI's, what do you think the true p
  is?
• Is an evenly distributed color distribution
  p=1/3, a reasonable hypothesis based on
  our data? Why or why not?
• Pay attention to the written conclusion I
  provide on the board !
  Vietnam Veterans Divorce Rate
• N=2101 veterans interviewed found p-hat=777/2101
  = .3698 had been divorced at least once.
• What is reasonable range of values for true divorce
  proportion p?

$$.3698 \pm 1.96(.01053)$$
$$.3698 \pm .02064$$
$$(.349, .390)$$
       Vietnam Vets Divorces
• Do you think true divorce proportion is
  greater than .5?
• Ans: No. The reasonable range of values
  for the true p is (.349, .390). This range is
  entirely below p=.5, so we have strong
  evidence that the true divorce proportion is
  BELOW .5 not above it.
      Vietnam Vets Divorces
• Do you think the true divorce proportion
  could be .37?
• Ans: Yes, a proportion like .37 is a
  reasonable value for the true p according
  to our range of reasonable values, so the
  truth could reasonably be .37.
         Domestic Violence
• For those women who had experienced
  some abuse before age 18, the sample
  proportion that had experienced some
  abuse in the past 12 months was p-hat =
  236/569 = .4147
• CI for p: (.374, .455).
• Suppose the true proportion currently
  abused among those not abused before age 18
  was .11.
• Is there evidence the true population
  proportion in our study is greater than .11?
Ask Marilyn – Let's Make a Deal
• In 1990 a reader wrote to Marilyn vos
  Savant (highest documented IQ) and
  asked whether a player should switch
  doors when playing Let's Make a Deal.
• There are 3 doors, two with goats and one
  with a car. You pick a door. The host,
  Monty Hall shows you a door you have not
  picked and there is a goat behind it. You
  are then asked if you wish to switch doors.
  Should you switch?
            Let's Make a Deal
• Marilyn said yes, you should switch doors.
• There was a storm of angry letters from bad
  colleges with bad statistics professors.
• “you are the goat”, “take my intro class”, “it is
  clearly 50-50 with no advantage to switching”.
• The next week stats professors from elite
  universities like Harvard, Stanford, UMM
  wrote in and said that Marilyn was correct,
  but her reasoning was wrong.
          Let's Make a Deal
• Let's play the game on the computer
  simulation, be sure to play the strategy of
  switching doors after a goat is shown to
  you. Keep track of how many times you
  win divided by the number of plays.
  Compute p-hat.
• Who is right? Marilyn or the bad
  professors?
• Do a 95% CI for p, the proportion of
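If the classroom simulator is not handy, the switching strategy can be simulated with a short Python sketch. Everything here is an assumption made for illustration: the function names, the seed, and the choice of 10,000 plays are not from the slides, and the host is assumed to always open a goat door that the player did not pick.

```python
import random
from math import sqrt

def play_switch(rng):
    """Play one game with the always-switch strategy; return True on a win."""
    doors = [0, 1, 2]
    car = rng.choice(doors)
    pick = rng.choice(doors)
    # Host opens a goat door that is not the player's pick.
    opened = rng.choice([d for d in doors if d != pick and d != car])
    # Player switches to the one remaining closed door.
    final = next(d for d in doors if d != pick and d != opened)
    return final == car

def simulate(n_plays, seed=0):
    """Return p-hat for switching and a 95% CI for the true win proportion."""
    rng = random.Random(seed)
    wins = sum(play_switch(rng) for _ in range(n_plays))
    p_hat = wins / n_plays
    m = 1.96 * sqrt(p_hat * (1 - p_hat) / n_plays)
    return p_hat, (p_hat - m, p_hat + m)

p_hat, ci = simulate(10_000)
print(f"p-hat = {p_hat:.3f}, 95% CI = ({ci[0]:.3f}, {ci[1]:.3f})")
```

With this many plays the interval should sit well above .5, which is the comparison that settles the Marilyn question.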
        Level of Confidence
• A CI for p includes a statement of a
  confidence level, usually 95%.
• You should know how to compute
  confidence intervals for any level of
  confidence, but particularly for 80%, 90%,
  95%, 98%, 99%.
• The formula is the same for each, but the
  Z multiplier changes.
               Z Multiplier
• For any confidence level, the Z multiplier is
  obtained by drawing a standard normal
  curve and then placing symmetric
  boundaries around the mean zero.
• For a 95% interval these boundaries
  should contain 95% of the observations
  within these bounds. That means there is
  2.5% of the observations outside these
  bounds in each tail to add to the remaining
  5%.
Finding Z*
              Z-Multiplier
• This means that the upper boundary is at
  the 97.5 percentile, and the lower
  boundary is at the 2.5 percentile.
• Use your normal table and look up in the
  middle for .975 (97.5%), go to the edges to
  observe that the z-value corresponding to
  this point is 1.96. That is why we have
  used 1.96 for the 95% CI multiplier.
         Other Z-Multipliers
• You should be able to verify that the
  correct multipliers for the 80%, 90%, 98%, and 99%
  confidence levels are 1.28, 1.645, 2.33, and 2.58.
• Do you know how these were obtained?
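One way to check the table lookup is to ask software for the same percentile. The sketch below uses Python's standard-library `statistics.NormalDist`; the helper name `z_multiplier` is made up for illustration.

```python
from statistics import NormalDist

def z_multiplier(conf):
    """Z multiplier for a two-sided interval: the (1 - alpha/2) percentile."""
    return NormalDist().inv_cdf(1 - (1 - conf) / 2)

for conf in (0.80, 0.90, 0.95, 0.98, 0.99):
    print(f"{conf:.0%}: z* = {z_multiplier(conf):.3f}")
```

For 95% confidence the upper boundary is the 97.5 percentile, so the call reduces to `inv_cdf(0.975)`, which is about 1.96.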
What Does 95% Confidence Mean
           Anyway?
• A 95% CI means that the method used to
  construct the interval will produce intervals
  containing the true p in about 95% of the
  intervals constructed.
• This means that if the 95% CI method was
  used in 100 samples, we should expect
  that about 95 of the intervals will contain
  the true p, and about 5 intervals should
  miss the true p.
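The "about 95 out of 100" idea can be tried out by simulation. The sketch below is only an illustration: the true p of 0.4, the sample size of 200, the number of repeated samples, and the seed are all arbitrary assumptions, and `coverage` is a hypothetical helper name.

```python
import random
from math import sqrt

def coverage(p_true, n, n_samples, seed=1):
    """Fraction of 95% CIs (one per simulated sample) that contain p_true."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        # Draw one sample of size n and count the successes.
        x = sum(rng.random() < p_true for _ in range(n))
        p_hat = x / n
        m = 1.96 * sqrt(p_hat * (1 - p_hat) / n)
        if p_hat - m <= p_true <= p_hat + m:
            hits += 1
    return hits / n_samples

rate = coverage(p_true=0.4, n=200, n_samples=2000)
print(f"Share of intervals containing the true p: {rate:.3f}")
```

The printed share should be close to, though not exactly, .95. In a real study we only get one of these intervals and never learn whether it was a hit or a miss.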
      Diagram of Confidence
[Figure: many confidence intervals drawn around the true p. About 95% of intervals contain the true p; about 5% miss it.]
              CI Meaning
• We never know if our CI has contained the
  true p or not, but we know the method we
  used has the property that it catches the
  truth 90% of the time (for a 90% CI), so it
  probably has done well in our study, or at
  least is not far from the truth.
              Butterfly Net
• A confidence interval is like a butterfly net
  for catching the true p within its
  boundaries.
• Take a swing at the butterfly (p) with your
  net (CI), you have a known reliability of
  catching the butterfly (p), say 90%, but
  you will never know if your net caught the
  butterfly or not, just that it is typically a
  good method for catching butterflies, and
  so it was probably good for you too!
        Percent Confidence
• The percent confidence refers to the
  reliability of the CI method to produce
  intervals that contain the true p.
• Why not do a 100% confidence interval?
  Then we would be completely sure that
  the interval has contained the true p.
             100 % CI for p
• A 100% CI for p is (0, 1); this interval is
  sure to contain the true p.
• However, this is not very useful. This
  illustrates the trade-off between
  percent confidence and the usefulness of the
  interval to simplify the world.
• We usually choose 90, 95, or 99 percent
  confidence levels.
                 CI Cautions !
• Don't suggest that the parameter varies: "There is
  a 95% chance the true proportion is between .37
  and .42." YUCK!! It sounds like the true proportion
  is wandering around like an intoxicated (blank) fan.
  (Fill in your most hated sports team in the blank).
  The true p is fixed, not random.

• Don't claim that other samples will agree with
  yours: "95% of samples will have proportions
  supporting proposal X between .37 and .42."
  NOPE!! This range is not about sample proportions
  as this statement implies.
      CI Cautions ! (Continued)
• Don't be certain about the parameter: "The cure
  rate is between 37 and 42 percent." UGG!! This
  makes it seem like the true p could never be
  outside this range. We are not sure of this, just
  sorta-kinda-sure.
• Don't forget: It's the parameter (not the statistic):
  Never, ever say that we are 95% sure the
  sample proportion is between .37 and .42. DUH!
  There is NO uncertainty in this, it HAS to be
  true.
• Don't claim to know too much.
• Do take responsibility (for the uncertainty).
     CI Cautions ! (Continued)
• Don't claim to know too much: "I'm 95%
  confident that between 37 and 42 percent of
  people in the universe are lunkheads." Well,
  your population really wasn't the whole universe,
  just Podunk State U.
• Do take responsibility (for the uncertainty): You
  are the one who is uncertain, not the parameter
  p. You must accept that only 95% of CI's will
  contain the true value of p.
          Usefulness of CI's
• There is a trade-off between reliability
  (confidence) and the width of the interval.
• Increasing confidence means the interval
  width becomes greater (wider). By
  increasing the sample size, n, the interval
  becomes narrower.
• How big should the sample size be to get
  useful, precise information about the
  population p?
CI Behavior
            Margin of Error
• The margin of error (m) of a confidence
  interval is the plus and minus part of the
  confidence interval, m=Z se(p-hat)
• P-hat +/- Z se(p-hat)
• P-hat +/- m
• A confidence interval that has a margin of
  error of plus or minus 3 percentage points
  means that the margin of error m=.03.
            Margin of Error
• From the formula m=Z se (p-hat), you can
  see that the margin of error depends on
  the confidence level (Z multiplier) and
  through the sample size n inside the
  expression for se(p-hat).
• A common problem in statistics is to figure
  out what sample size will be needed to
  obtain the desired accuracy (margin of
  error m).
         Sample Size Formula
• The sample size n needed to get the desired margin of
  error m is given by
$$n = \left(\frac{Z^{*}}{m}\right)^{2} p^{*}(1-p^{*})$$
              Sample Size
• The margin of error desired m, is usually
  provided in the problem. The value Z* is
  determined by the level of confidence that
  is desired. If no level is given, just assume
  95% confidence.
• The p* value is a bit of a chicken and egg
  problem. P* is your best guess about the
  value of the true p.
             Sample Size
• Mmmm, let's see, we are trying to do a
  study to estimate p, but we need to know p
  (p*) to compute the needed sample size.
  This seems impossible!
• Quit whining and do the best you can.
  Give the best or most current state of
  knowledge about p as p*. Usually there is
  some information about what p might be.
  If you know absolutely nothing, then use
  p*=.5.
                  Why use p*=.5?
• Here is a graph of p*(1-p*) for values of p*:
[Figure: p*(1-p*) plotted against p* from 0 to 1; the curve peaks at .25 when p* = .5.]
            Why use p*=.5
• The graph shows that p*(1-p*) will be
  largest when p*=.5. This means the
  sample size will be largest when p*=.5.
  This means that the sample size will be at
  least as big as actually needed.
• This is called being conservative because
  you are using more data than would
  actually be needed to achieve the margin
  of error desired.
      Sample Size Example
• NBA Games: I had a basketball viewing
  orgy at my house. I watched n=30 NBA
  games from my big blue chair, drank
  beverages of God, ate lots of popcorn. I
  found that X=18 games were won by the
  home team. This means p-hat = 18/30 =
  .6.
• What is a 95% CI for true home court win
  proportion p?
NBA Games Example
$$.6 \pm 1.96\sqrt{\frac{.6(.4)}{30}}$$
$$.6 \pm .1753$$
$$(.4247, .7753)$$
       NBA Games Example
• Plausible range of values for true home
  court winning proportion was (.42, .78).
  This is not very helpful, I knew this even
  before the first popcorn kernel popped.
• Why was the procedure not more helpful?
• Problem was the margin of error. It was
  huge ! It was about m=.17, .18. The
  sample size was too small to make our
  inference more precise. We need a bigger
  sample size. How big?
          NBA Sample Size
• Suppose we wish to obtain a margin of
  error of m=.02 in a 95% CI for p. What
  sample size is needed?
• n=(1.96/.02)^2 .6(1-.6) = 2304.96
• Round up to n=2305 games. Oh Joy!
  What a fiesta !
• Note that our best knowledge was the
  small study done at my house, there p-hat
  =.6 so it is our best knowledge of the true
  p, so p*=.6.
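The NBA sample-size calculation can be sketched in Python as follows. The function name `sample_size` and its defaults (p* = .5, Z* = 1.96 when nothing else is known) are illustrative assumptions. The small `round` before rounding up is my own guard against floating-point fuzz, not part of the formula.

```python
from math import ceil

def sample_size(m, p_star=0.5, z_star=1.96):
    """Sample size for margin of error m: n = (Z*/m)^2 p*(1 - p*), rounded up."""
    n_raw = (z_star / m) ** 2 * p_star * (1 - p_star)
    # Round to 6 places first so float noise (e.g. 2401.0000000000005)
    # does not push the result up by one.
    return ceil(round(n_raw, 6))

print(sample_size(0.02, p_star=0.6))  # NBA example: m = .02, p* = .6
print(sample_size(0.02))              # conservative choice p* = .5
```

The conservative call with p* = .5 asks for somewhat more games than the p* = .6 call, which is exactly the "at least as big as actually needed" idea.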
      Vietnam Vets Example
• If you go back a few slides you will find
  that in the Vietnam Vets divorce rate
  example, the margin of error was about
  .02. Notice this is a small value for m, and
  it was obtained because the sample size
  was huge for that problem. Sample size
  was over 2000 subjects!
Relationship between m and n
[Figure: margin of error m (vertical axis) plotted against sample size n (horizontal axis).]
         Graph Computation
•   When p*=.5, m=.05, n=385
•   When m=.03, n=1068
•   When m=.02, n=2401
•   etc
 Relationship between m and n
• Notice that as the sample size increases
  initially, there is a big drop in the margin of
  error. It drops substantially early on.
• However, for larger sample sizes there is
  almost no additional reduction in margin of
  error for increasing the sample size.
• Most big surveys are below 2000 – 3000
  subjects. Do you see why?
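A quick way to see the diminishing returns is to tabulate m for a few sample sizes. This sketch assumes p* = .5 and 95% confidence; the helper name `margin_of_error` and the particular sample sizes are made up for illustration.

```python
from math import sqrt

def margin_of_error(n, p_star=0.5, z_star=1.96):
    """Margin of error m = Z* sqrt(p*(1 - p*)/n)."""
    return z_star * sqrt(p_star * (1 - p_star) / n)

for n in (100, 400, 1600, 6400):
    print(f"n = {n:5d}  m = {margin_of_error(n):.4f}")
```

Since n sits under a square root, quadrupling the sample size only halves the margin of error, so each further gain in precision costs far more subjects than the last one.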
Poor, Ignorant Phil !
       Right Eye Dominance
• Hold a piece of paper with small hole in
  middle out in front of you with both hands.
  Focus on an object across the room to be
  visible in the hole with both eyes open.
• Now shut one eye, if the object is still
  visible, the open eye is the dominant eye.
• Do a 95% CI for the proportion of the
  population that is right eye dominant, p.
A Recent Poll (Gallup)
                 Poll Details
• Certainly, one of the challenges for the winner of
  this year's election will be to bring a divided
  nation together again.
Survey Methods
• These results are based on telephone interviews
  with a randomly selected national sample of
  1,013 adults, aged 18 and older, conducted Oct.
  14-16. For results based on this sample, one can
  say with 95% confidence that the maximum error
  attributable to sampling and other random effects
  is ±3 percentage points. In addition to sampling
  error, question wording and practical difficulties
  in conducting surveys can introduce error or bias
  into the findings of public opinion polls.
Hypothesis Tests for p

Decisions About the True
Population Proportion p
       Hypothesis Test for p
• You have seen previously the method for
  producing a confidence interval or
  reasonable range for parameter p.
• Hypothesis tests can also be performed
  with one sample proportion to learn about
  the population proportion of interest.
    Hypothesis Test Formula
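$$H_0: p = p_0$$
$$H_a: p > p_0, \quad p < p_0, \quad \text{or} \quad p \ne p_0$$
$$Z = \frac{\hat{p} - p_0}{\sqrt{\frac{p_0(1-p_0)}{n}}}$$
$$\text{P-value} = P(Z > Z_{obs}), \quad P(Z < Z_{obs}), \quad \text{or} \quad 2P(Z > |Z_{obs}|)$$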
      Ask Marilyn Example

$$H_0: p = .5$$
$$H_a: p > .5$$
$$Z = \frac{\hat{p} - p_0}{\sqrt{\frac{p_0(1-p_0)}{n}}} = \frac{.689 - .5}{\sqrt{\frac{.5(1-.5)}{45}}} = 2.54$$
$$\text{P-value} = P(Z > 2.54) = .0055$$
         Ask Marilyn Example
• Data is unlikely under Ho, data is inconsistent
  with Ho.
• We have evidence to doubt Ho.
• We have evidence to support Ha.
• We have evidence the proportion of wins by
  switching doors, p, is greater than .5. We have
  evidence that Marilyn is right, we should switch
  doors.
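The Ask Marilyn test can be reproduced with a short Python sketch. The function name `one_prop_z` and its `alternative` argument are hypothetical conveniences, and the normal probabilities come from `statistics.NormalDist` instead of a printed table, so the last digit of the p-value may differ from the slide.

```python
from math import sqrt
from statistics import NormalDist

def one_prop_z(p_hat, n, p0, alternative="greater"):
    """Z statistic and p-value for H0: p = p0 (one-sample test for p)."""
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)  # null value p0 in the s.e.
    upper = 1 - NormalDist().cdf(z)
    if alternative == "greater":
        p_value = upper
    elif alternative == "less":
        p_value = 1 - upper
    else:  # anything else is treated as two-sided
        p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Ask Marilyn: p-hat = .689 wins from n = 45 plays, Ha: p > .5
z, p_value = one_prop_z(0.689, 45, 0.5)
print(f"Z = {z:.2f}, p-value = {p_value:.4f}")
```

A two-sided test works the same way by passing any other value for `alternative`, for example `"two-sided"`.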
     Reeses Pieces Example
• What is the proportion of orange candies,
  p?
• I believe our data were something like p-
  hat=.52 for n=60 candies. Do appropriate
  hypothesis test.
       Right Eye Dominance
• Hold a piece of paper with small hole in
  middle out in front of you with both hands.
  Focus on an object across the room to be
  visible in the hole with both eyes open.
• Now shut one eye, if the object is still
  visible, the open eye is the dominant eye.
• Do a hypothesis test that the proportion of
  the population that is right eye dominant, p
  is not equal to .5.
          Spinning Pennies
• We wish to test the hypothesis that the
  proportion of spins that will turn heads is
  different than .5.
• Some students perform an experiment and
  find that 17 heads were obtained from 40
  spins. This means p-hat=17/40 = .425.
               Spinning Pennies
$$H_0: p = .5$$
$$H_a: p \ne .5$$
$$Z = \frac{\hat{p} - p_0}{\sqrt{\frac{p_0(1-p_0)}{n}}} = \frac{.425 - .5}{\sqrt{\frac{.5(1-.5)}{40}}} = -.95$$
$$\text{P-value} = 2P(Z < -.95) = 2P(Z > .95) = 2(.171) = .342$$
    Spinning Pennies Conclusion
•   The data is consistent with Ho.
•   There is no evidence to doubt the Ho.
•   There is no evidence to support the Ha.
•   There is no evidence to suggest the
    proportion of spins that are heads is
    anything other than .5.
          Spinning Pennies
• Let's do the experiment ourselves.
Inference for Two Population
Proportions
Tests and CI's about two
population proportions.
            Data Situation
• We now have two populations, and we
  wish to compare the proportions of these
  populations.
• Population 1 Data: n_1 and p-hat_1.
• Population 2 Data: n_2 and p-hat_2.
     Data Situation
Data:
$$\text{Sample 1: } n_1, \quad \hat{p}_1 = \frac{X_1}{n_1}$$
$$\text{Sample 2: } n_2, \quad \hat{p}_2 = \frac{X_2}{n_2}$$
     Hypothesis Test Formula
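$$H_0: p_1 - p_2 = 0$$
$$H_a: p_1 - p_2 > 0, \quad p_1 - p_2 < 0, \quad \text{or} \quad p_1 - p_2 \ne 0$$
$$Z = \frac{\hat{p}_1 - \hat{p}_2 - 0}{\sqrt{\hat{p}(1-\hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}, \quad \text{where} \quad \hat{p} = \frac{X_1 + X_2}{n_1 + n_2}$$
$$\text{P-value} = P(Z > Z_{obs}), \quad P(Z < Z_{obs}), \quad \text{or} \quad 2P(Z > |Z_{obs}|)$$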
     Hypothesis Test Formula
• Notice the p-hat with no subscript in the
  denominator of the Z statistic. This is
  called the pooled proportion.
• Under the Ho we hypothesize that both
  populations have the same proportion, so
  the natural thing to do is use all the data to
  estimate the common proportion. Simply
  add all events and divide by the total
  sample size.
       Red Dye #2 Example
• Two groups of lab animals were studied. One
  group of 44 animals was given a typical animal diet.
  Four developed tumors.
  Thus, p-hat = 4/44 = .091.
• In a group given red dye #2, there were
  14 animals developing tumors out of 44.
  Thus p-hat = 14/44 = .318.
         Red Dye Hypothesis Test
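$$H_0: p_R - p_C = 0$$
$$H_a: p_R - p_C > 0$$
$$\hat{p} = \frac{X_R + X_C}{n_R + n_C} = \frac{14 + 4}{44 + 44} = .205$$
$$Z = \frac{\hat{p}_R - \hat{p}_C - 0}{\sqrt{\hat{p}(1-\hat{p})\left(\frac{1}{n_R} + \frac{1}{n_C}\right)}} = \frac{.318 - .091}{\sqrt{.205(1-.205)\left(\frac{1}{44} + \frac{1}{44}\right)}} = 2.64$$
$$\text{P-value} = P(Z > 2.64) = .0041$$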
      Red Dye #2 Conclusion
• The data are unusual if the Ho is true.
  The data are inconsistent with the Ho.
• There is evidence to doubt the Ho.
• There is evidence to support the Ha.
• There is evidence that p_r > p_c, and this
  means there is evidence the red dye #2
  group has a higher proportion of animals
  with cancerous tumors than the control
  diet. This is evidence that RD#2 is a
  carcinogen.
   Red Dye #2 Historical Note
• All red color food disappeared for a while.
  No Jello, no red M&M's, no Hawaiian
  Punch, etc. Poor young Jon.
• Eventually another red dye was approved
  for sale. Jon's favorite mass-produced
  junk items returned.
       Saracco Study (Italy)
• Study of heterosexual couples where one
  member of the couple was HIV infected.
• First group used condoms regularly, 171
  couples. Of these 3 subsequently became
  infected. P-hat = 3/171=.0175
• Second group did not use condoms
  regularly. There were 55 such couples,
  and 8 subsequently became infected, p-
  hat = 8/55 = .14545.
           Saracco Hypothesis Test
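$$H_0: p_R - p_N = 0$$
$$H_a: p_R - p_N < 0$$
$$\hat{p} = \frac{X_R + X_N}{n_R + n_N} = \frac{3 + 8}{171 + 55} = .04867$$
$$Z = \frac{\hat{p}_R - \hat{p}_N - 0}{\sqrt{\hat{p}(1-\hat{p})\left(\frac{1}{n_R} + \frac{1}{n_N}\right)}} = \frac{.0175 - .14545}{\sqrt{.04867(1-.04867)\left(\frac{1}{171} + \frac{1}{55}\right)}} = -3.84$$
$$\text{P-value} = P(Z < -3.84) < .0002$$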
        Saracco Conclusion
• The data are unusual under Ho, so data
  are inconsistent with Ho.
• There is evidence to doubt the Ho.
• There is evidence to support the Ha.
• There is evidence that p_r<p_n, this
  means evidence that HIV infection
  proportion is less in group that used
  condoms regularly.
      Saracco Historical Note
• This was the study that prompted world
  health officials to proclaim that regular
  condom use was “effective” in preventing
  HIV infection.
• This does not mean that using condoms is
  risk-free, all it means is that the infection
  proportion was statistically less than not
  using them.
    Confidence Interval Formula

$$\hat{p}_1 - \hat{p}_2 \pm Z^{*}\sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}$$
        Confidence Intervals
• The crucial value used to evaluate these
  intervals is zero. If all values are above
  zero, it implies that proportion p_1 is
  greater than p_2.
• If the interval is all negative, there is
  evidence p_1<p_2.
• If the interval contains zero, it means no
  difference is a plausible/reasonable
  statement, and thus no evidence to say
  that the proportions differ.
           Woburn Mass CI
• In Woburn, Massachusetts there were
  public wells that provided the city's water
  supply.
• When the questionable water was being
  consumed there were 16 adverse birth
  outcomes out of 414 births. P-hat =
  16/414=.039.
• When the water was not being consumed,
  there were 3 adverse birth outcomes out
       Woburn Confidence Interval
$$\hat{p}_y - \hat{p}_n \pm Z^{*}\sqrt{\frac{\hat{p}_y(1-\hat{p}_y)}{n_y} + \frac{\hat{p}_n(1-\hat{p}_n)}{n_n}}$$
$$.039 - .013 \pm 1.96\sqrt{\frac{.039(1-.039)}{414} + \frac{.013(1-.013)}{228}}$$
$$.026 \pm 1.96(.012)$$
$$.026 \pm .024$$
$$(.002, .05)$$
    Woburn Water Conclusion
• The plausible range of values for p_y - p_n is
  (.002, .05).
• The entire plausible range is positive.
• This means there is evidence that p_y >
  p_n, and that the proportion of adverse
  birth events with the water on is greater
  than when the water was not used. There
  is evidence the water is responsible for an
  increase in adverse birth events in
  Woburn.
        Woburn Water Note
• Entertainment note: the Hollywood film A
  Civil Action, starring John Travolta and
  Robert Duvall, is based on this problem
  situation.
• I believe the parents shown in the video
  clip are part of the plot of the movie.
         Propranolol Study
• Potential usefulness of propranolol for
  recent heart attack victims. Population
  proportion p_c = proportion of deaths within two
  years under usual care (control). Population proportion p_p =
  proportion of deaths within two years with propranolol.
Propranolol Confidence Interval
$$\hat{p}_c - \hat{p}_p \pm Z^{*}\sqrt{\frac{\hat{p}_c(1-\hat{p}_c)}{n_c} + \frac{\hat{p}_p(1-\hat{p}_p)}{n_p}}$$
$$.0954 - .0704 \pm 1.645\sqrt{\frac{.0954(1-.0954)}{1919} + \frac{.0704(1-.0704)}{1918}}$$
$$.025 \pm 1.645(.00889)$$
$$.025 \pm .0146$$
$$(.0104, .0396)$$
      Propranolol Conclusion
• Note this was a 90% CI. The plausible range of
  values for p_c – p_p is (.01, .04).
• This range includes only positive values.
• This implies p_c > p_p, and that there is a
  higher death proportion under usual care,
  and that two year death rates are reduced
  when using propranolol.
• Is this a big deal?
Personal Ads Data
   Personal Ads Comparisons
• Compute four confidence intervals – one
  for each of the four attributes.
• Compute 95% CI's for p_male – p_female.
• Write complete conclusion for each
  interval.
Large-sample Confidence
 Interval for a Population
        Proportion
•A confidence interval for a
population characteristic is an
interval of plausible values for the
characteristic. It is constructed so
that, with a chosen degree of
confidence, the value of the
characteristic will be captured
inside the interval.
     Confidence Level
•The confidence level associated
with a confidence interval estimate
is the success rate of the method
used to construct the interval.
                    Recall
     For the sampling distribution of $\hat{p}$,
$$\mu_{\hat{p}} = p, \qquad \sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}}$$
     and for large* n the sampling distribution of $\hat{p}$ is
     approximately normal.
Specifically, when n is large*, the statistic
$\hat{p}$ has a sampling distribution that is
approximately normal with mean p and
standard deviation $\sqrt{p(1-p)/n}$.
* $np \ge 10$ and $n(1-p) \ge 10$
         Some considerations




Approximately 95% of all large samples will
result in a value of $\hat{p}$ that is within
$1.96\,\sigma_{\hat{p}} = 1.96\sqrt{p(1-p)/n}$ of the true population
proportion p.
          Some considerations
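      Equivalently, this means that for 95% of
      all possible samples, p will be in the
      interval
$$\hat{p} - 1.96\sqrt{\frac{p(1-p)}{n}} \quad \text{to} \quad \hat{p} + 1.96\sqrt{\frac{p(1-p)}{n}}$$
Since p is unknown and n is large, we estimate
$$\sqrt{\frac{p(1-p)}{n}} \quad \text{with} \quad \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$
This interval can be used as long as
$n\hat{p} \ge 10$ and $n(1-\hat{p}) \ge 10$.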
The 95% Confidence Interval
    When n is large, a 95% confidence
    interval for p is
$$\left(\hat{p} - 1.96\sqrt{\frac{\hat{p}(1-\hat{p})}{n}},\; \hat{p} + 1.96\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}\right)$$
  The endpoints of the interval are often
  abbreviated by
$$\hat{p} \pm 1.96\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$
  where - gives the lower endpoint and + the
  upper endpoint.
                 Example
•For a project, a student randomly sampled
182 other students at a large university to
determine if the majority of students were in
favor of a proposal to build a field house.
He found that 75 were in favor of the
proposal.

•Let p = the true proportion of students that
favor the proposal.
         Example - continued
\[ \hat{p} = \frac{75}{182} = 0.4121 \]
   Since $n\hat{p} = 182(0.4121) = 75 > 10$ and
   $n(1-\hat{p}) = 182(0.5879) = 107 > 10$, we can use
   the formula given on the previous slide to
   find a 95% confidence interval for $p$.

\[ \hat{p} \pm 1.96\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = 0.4121 \pm 1.96\sqrt{\frac{0.4121(0.5879)}{182}} = 0.4121 \pm 0.07151 \]

     The 95% confidence interval for $p$ is
              (0.341, 0.484).
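
The field house calculation can be reproduced with a few lines of Python. This is a minimal sketch: the function name `proportion_ci` is an assumption made for illustration, and the size check simply mirrors the large-sample condition used in the example.

```python
import math

def proportion_ci(successes, n, z=1.96):
    """Large-sample confidence interval for a population proportion
    (hypothetical helper, not part of the slides)."""
    p_hat = successes / n
    # Large-sample condition: n*p_hat >= 10 and n*(1 - p_hat) >= 10,
    # i.e. at least 10 successes and at least 10 failures
    if successes < 10 or n - successes < 10:
        raise ValueError("sample too small for the normal approximation")
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

# Field house example: 75 of 182 students in favor
low, high = proportion_ci(75, 182)
print(f"({low:.3f}, {high:.3f})")  # (0.341, 0.484)
```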
 The General Confidence
        Interval
The general formula for a confidence
interval for a population proportion $p$
when
  1. $\hat{p}$ is the sample proportion from a
     random sample, and
  2. the sample size n is large
     ($n\hat{p} \ge 10$ and $n(1-\hat{p}) \ge 10$)
is given by
\[ \hat{p} \pm (z\ \text{critical value})\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \]
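     Finding a z Critical Value
     •Finding a z critical value for a 98%
     confidence interval.

[Figure: normal curve with the z critical value 2.33 marked]

A central area of 0.98 leaves 0.01 in each tail.
Looking up the cumulative area of 0.9900 in the
body of the table we find z = 2.33.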
Some Common Critical
      Values
    Confidence level    z critical value
          80%                1.28
          90%                1.645
          95%                1.96
          98%                2.33
          99%                2.58
          99.8%              3.09
          99.9%              3.29
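
The table values can also be computed rather than looked up. The sketch below uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later); the helper name `z_critical` is an assumption made for illustration.

```python
from statistics import NormalDist

def z_critical(confidence):
    """z critical value that captures a central area equal to
    `confidence` under the standard normal curve (hypothetical helper)."""
    # A central area of `confidence` leaves (1 - confidence) / 2 in each
    # tail, so the cumulative area to look up is 1 - (1 - confidence) / 2.
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)

for level in (0.80, 0.90, 0.95, 0.98, 0.99, 0.998, 0.999):
    print(f"{level:.1%}  {z_critical(level):.3f}")
```

To three decimals the 98% and 99% values are 2.326 and 2.576, which the table rounds to 2.33 and 2.58.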
                Terminology

    The standard error of a statistic is the
    estimated standard deviation of the statistic.


For sample proportions, the standard deviation is
\[ \sqrt{\frac{p(1-p)}{n}} \]

This means that the standard error of the sample
proportion is
\[ \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \]
                    Terminology


        The bound on error of estimation, B,
        associated with a 95% confidence interval is
        (1.96)·(standard error of the statistic).



The bound on error of estimation, B, associated
with a confidence interval is
   (z critical value)·(standard error of the statistic).
                   Sample Size
       The sample size required to estimate a
       population proportion $p$ to within an amount
       $B$ with 95% confidence is
\[ n = p(1-p)\left(\frac{1.96}{B}\right)^2 \]
 The value of p may be estimated by prior
 information. If no prior information is available,
 use p = 0.5 in the formula to obtain a
 conservatively large value for n.

Generally one rounds the result up to the nearest integer.
 Sample Size Calculation Example
•A TV executive would like to find a 95%
confidence interval estimate within 0.03 for
the proportion of all households that watch
NYPD Blue regularly. How large a sample
is needed if a prior estimate for p was 0.15?
We have B = 0.03 and the prior estimate of p = 0.15.
\[ n = p(1-p)\left(\frac{1.96}{B}\right)^2 = (0.15)(0.85)\left(\frac{1.96}{0.03}\right)^2 = 544.2 \]
   A sample of 545 or more would be needed.
           Sample Size Calculation
              Example revisited
     •Suppose a TV executive would like
     to find a 95% confidence interval
     estimate within 0.03 for the
     proportion of all households that
     watch NYPD Blue regularly. How
     large a sample is needed if we have
     no reasonable prior estimate for p?
We have B = 0.03 and should use p = 0.5 in
the formula.
\[ n = p(1-p)\left(\frac{1.96}{B}\right)^2 = (0.5)(0.5)\left(\frac{1.96}{0.03}\right)^2 = 1067.1 \]
The required sample size is now 1068.
Notice, a reasonable ball park estimate for p
can lower the needed sample size.
                   Another Example
           •A college professor wants to
           estimate the proportion of students
           at a large university who favor
           building a field house with a 99%
           confidence interval accurate to 0.02.
           If one of his students performed a
           preliminary study and estimated p to
           be 0.412, how large a sample
           should he take?
   We have B = 0.02, a prior estimate p = 0.412 and we
   should use the z critical value 2.58 (for a 99%
   confidence interval).
\[ n = p(1-p)\left(\frac{2.58}{B}\right)^2 = (0.412)(0.588)\left(\frac{2.58}{0.02}\right)^2 = 4031.4 \]
      The required sample size is 4032.
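
The three sample size calculations above follow the same recipe, which can be sketched as a small function. The name `required_sample_size` and its defaults are assumptions made for illustration; the default p = 0.5 reflects the conservative choice described earlier.

```python
import math

def required_sample_size(bound, p=0.5, z=1.96):
    """Sample size needed to estimate a proportion to within `bound`
    (hypothetical helper). p = 0.5 is the conservative default."""
    n = p * (1 - p) * (z / bound) ** 2
    return math.ceil(n)  # always round up to the next whole unit

print(required_sample_size(0.03, p=0.15))           # 545
print(required_sample_size(0.03))                   # 1068
print(required_sample_size(0.02, p=0.412, z=2.58))  # 4032
```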
         Large Sample Hypothesis
                 Test for a Single
                      Proportion
         To test the hypothesis
         H0: p = hypothesized proportion,
         compute the z statistic
\[ z = \frac{\hat{p} - \text{hypothesized value}}{\sqrt{\dfrac{\text{hypothesized value}\,(1-\text{hypothesized value})}{n}}} \]
In terms of a standard normal random variable z, the
approximate P-value for this test depends on the
alternate hypothesis and is given for each of the
possible alternate hypotheses on the next 3 slides.
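                        Hypothesis Test
               Large Sample Test of Population
                        Proportion

Alternate hypothesis Ha: p > hypothesized value

\[ \text{P-value} = P\left( z > \frac{\hat{p} - \text{hypothesized value}}{\sqrt{\dfrac{\text{hypothesized value}\,(1-\text{hypothesized value})}{n}}} \right) \]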
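                        Hypothesis Test
                       Large Sample Test of
                       Population Proportion

Alternate hypothesis Ha: p < hypothesized value

\[ \text{P-value} = P\left( z < \frac{\hat{p} - \text{hypothesized value}}{\sqrt{\dfrac{\text{hypothesized value}\,(1-\text{hypothesized value})}{n}}} \right) \]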
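                        Hypothesis Test
              Large Sample Test of Population
                       Proportion

Alternate hypothesis Ha: p ≠ hypothesized value

\[ \text{P-value} = 2P\left( z > \left| \frac{\hat{p} - \text{hypothesized value}}{\sqrt{\dfrac{\text{hypothesized value}\,(1-\text{hypothesized value})}{n}}} \right| \right) \]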
Hypothesis Test Example
     Large-Sample Test for a
      Population Proportion
•An insurance company states that
the proportion of its claims that are
settled within 30 days is 0.9. A
consumer group thinks that the
company drags its feet and takes
longer to settle claims. To check
these hypotheses, a simple
random sample of 200 of the
company's claims was obtained
and it was found that 160 of the
claims were settled within 30 days.
               Example 2
          Single Proportion
                continued
p = proportion of the company’s claims that are
     settled within 30 days
H0: p = 0.9
HA: p < 0.9
 The sample proportion is $\hat{p} = \frac{160}{200} = 0.8$
\[ z = \frac{0.8 - 0.9}{\sqrt{\dfrac{(0.9)(1-0.9)}{200}}} = \frac{0.8 - 0.9}{\sqrt{\dfrac{0.9(0.1)}{200}}} = -4.71 \]

     P-value $= P(z < -4.71) \approx 0$
            Example 2
      Single Proportion
             continued
The probability of getting a result as strongly or
more strongly in favor of the consumer group's
claim (the alternate hypothesis Ha) if the
company’s claim (H0) was true is essentially 0.
Clearly, this gives strong evidence in support of
the alternate hypothesis (against the null
hypothesis).
            Example 2
      Single Proportion
             continued
We would say that we have strong support for
the claim that the proportion of the insurance
company’s claims that are settled within 30 days
is less than 0.9.
Some people would state that we have shown
that the true proportion of the insurance
company’s claims that are settled within 30 days
is statistically significantly less than 0.9.
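
The test in Example 2 can be reproduced in Python. This is a sketch under stated assumptions: the function name `one_proportion_z_test` and the labels `"greater"`, `"less"` and `"two-sided"` are illustrative choices, and `statistics.NormalDist` (Python 3.8 or later) stands in for the z table.

```python
import math
from statistics import NormalDist

def one_proportion_z_test(successes, n, p0, alternative="two-sided"):
    """Large-sample z test of H0: p = p0 (hypothetical helper).
    Returns the z statistic and the approximate P-value."""
    p_hat = successes / n
    # The standard deviation uses the hypothesized value, not p_hat
    z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)
    cdf = NormalDist().cdf
    if alternative == "greater":      # Ha: p > p0, upper tail
        p_value = 1 - cdf(z)
    elif alternative == "less":       # Ha: p < p0, lower tail
        p_value = cdf(z)
    elif alternative == "two-sided":  # Ha: p != p0, both tails
        p_value = 2 * (1 - cdf(abs(z)))
    else:
        raise ValueError("alternative must be 'greater', 'less' or 'two-sided'")
    return z, p_value

# Insurance example: 160 of 200 claims settled within 30 days, H0: p = 0.9
z, p_value = one_proportion_z_test(160, 200, 0.9, alternative="less")
print(f"z = {z:.2f}, P-value = {p_value:.1e}")
```

The P-value comes out far below any conventional significance level, in line with the "essentially 0" reported above.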
                Hypothesis Test
                Example Single
                  Proportion
•A county judge has agreed that he will give
up his county judgeship and run for a state
judgeship unless there is evidence at the
0.10 level that more than 25% of his party is
in opposition. An SRS of 800 party members
included 217 who opposed him. Please
advise this judge.
Hypothesis Test Example
   Single Proportion
       continued
p = proportion of his party that is in opposition
H0: p = 0.25
HA: p > 0.25
$\alpha$ = 0.10
Note: hypothesized value = 0.25

\[ n = 800, \quad \hat{p} = \frac{217}{800} = 0.27125 \]
\[ z = \frac{0.27125 - 0.25}{\sqrt{\dfrac{0.25(0.75)}{800}}} = 1.39 \]
 Hypothesis Test Example
Single Proportion continued
    P-value $= P(z > 1.39) = 1 - 0.9177 = 0.0823$


    •At a level of significance of 0.10,
    there is sufficient evidence to
    support the claim that the true
    percentage of the party members
    that oppose him is more than 25%.

    •Under these circumstances, I
       Large-Sample Inferences
     Difference of Two Population (Treatment)
                    Proportions
     Some notation:

                              Sample    Population proportion    Sample proportion
                              size      of successes             of successes
Population or treatment 1     $n_1$     $p_1$                    $\hat{p}_1$
Population or treatment 2     $n_2$     $p_2$                    $\hat{p}_2$
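Properties: Sampling Distribution
             of $\hat{p}_1 - \hat{p}_2$
If two random samples are selected
independently of one another, the following
properties hold:
1. $\mu_{\hat{p}_1 - \hat{p}_2} = p_1 - p_2$

2. $\sigma^2_{\hat{p}_1 - \hat{p}_2} = \sigma^2_{\hat{p}_1} + \sigma^2_{\hat{p}_2} = \frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}$ and
   $\sigma_{\hat{p}_1 - \hat{p}_2} = \sqrt{\frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}}$

3. If both $n_1$ and $n_2$ are large [$n_1 p_1 \ge 10$,
   $n_1(1-p_1) \ge 10$, $n_2 p_2 \ge 10$, $n_2(1-p_2) \ge 10$],
   then $\hat{p}_1$ and $\hat{p}_2$ each have a sampling
   distribution that is approximately normal,
   and so does their difference $\hat{p}_1 - \hat{p}_2$.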
  Large-Sample z Tests for p1
          – p2 = 0
The combined estimate of the common
population proportion is
\[ \hat{p}_c = \frac{n_1\hat{p}_1 + n_2\hat{p}_2}{n_1 + n_2} = \frac{\text{total number of successes in two samples}}{\text{total sample size}} \]
          Large-Sample z Tests for p1
                  – p2 = 0
         Null hypothesis: H0: p1 – p2 = 0

         Test statistic:
\[ z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\dfrac{\hat{p}_c(1-\hat{p}_c)}{n_1} + \dfrac{\hat{p}_c(1-\hat{p}_c)}{n_2}}} \]
Assumptions:
 1. The samples are independently chosen random
    samples OR treatments are assigned at random to
    individuals or objects (or vice versa).
 2. Both sample sizes are large:
     $n_1\hat{p}_1 \ge 10$, $n_1(1-\hat{p}_1) \ge 10$, $n_2\hat{p}_2 \ge 10$, $n_2(1-\hat{p}_2) \ge 10$
Large-Sample z Tests for p1
        – p2 = 0
Alternate hypothesis and finding the P-value:
1. Ha: p1 - p2 > 0
       P-value = Area under the z curve to the
           right of the calculated z
2. Ha: p1 - p2 < 0
       P-value = Area under the z curve to the
           left of the calculated z
3. Ha: p1 - p2 ≠ 0
       i. 2•(area to the right of z) if z is positive
       ii. 2•(area to the left of z) if z is negative
            Example - Student Retention
           A group of college students were asked what they
           thought was the “issue of the day”. Without a pause the
           class almost to a person said “student retention”. The
           class then went out and obtained a random sample
           (questionable) and asked the question, “Do you plan
           on returning next year?”
           The responses along with the gender of the person
           responding are summarized in the following table.
                                  Response
                Gender      Yes     No     Maybe
                Male        211     45      19
                Female      141     32       9
Test to see if the proportion of students planning on returning is
the same for both genders at the 0.05 level of significance.
Example - Student Retention
p1 = true proportion of males who plan on returning
p2 = true proportion of females who plan on returning
n1 = number of males surveyed
n2 = number of females surveyed
$\hat{p}_1 = x_1/n_1$ = sample proportion of males who plan on
                             returning
$\hat{p}_2 = x_2/n_2$ = sample proportion of females who plan on
                             returning

Null hypothesis: H0: p1 – p2 = 0
Alternate hypothesis: Ha: p1 – p2 ≠ 0
      Example - Student Retention
      Significance level: $\alpha$ = 0.05

      Test statistic:
\[ z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\dfrac{\hat{p}_c(1-\hat{p}_c)}{n_1} + \dfrac{\hat{p}_c(1-\hat{p}_c)}{n_2}}} \]

Assumptions: The two samples are independently
chosen random samples. Furthermore, the sample sizes
are large enough since
        $n_1\hat{p}_1 = 211 \ge 10$, $n_1(1-\hat{p}_1) = 64 \ge 10$
        $n_2\hat{p}_2 = 141 \ge 10$, $n_2(1-\hat{p}_2) = 41 \ge 10$
Example - Student Retention
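Calculations:
\[ \hat{p}_c = \frac{n_1\hat{p}_1 + n_2\hat{p}_2}{n_1 + n_2} = \frac{211 + 141}{275 + 182} = \frac{352}{457} = 0.7702 \]
\[ z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\dfrac{\hat{p}_c(1-\hat{p}_c)}{275} + \dfrac{\hat{p}_c(1-\hat{p}_c)}{182}}}
     = \frac{0.76727 - 0.77473}{\sqrt{\dfrac{0.77024(1-0.77024)}{275} + \dfrac{0.77024(1-0.77024)}{182}}}
     = \frac{-0.0074525}{0.040198} = -0.19 \]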
       Example - Student Retention
        P-value:
        The P-value for this test is 2 times the area
        under the z curve to the left of the computed
        z = -0.19.
        P-value = 2(0.4247) = 0.8494

Conclusion:
Since P-value = 0.849 > 0.05 = $\alpha$, the hypothesis H0 is
not rejected at significance level 0.05.
There is no evidence that the return rate is different for
males and females.
                Example
A consumer agency spokesman stated that he
thought that the proportion of households having
a washing machine was higher for suburban
households than for urban households. To test to
see if that statement was correct at the 0.05 level
of significance, a reporter randomly selected a
number of households in both suburban and
urban environments and obtained the following
data.
               Number      Number having        Proportion having
               surveyed    washing machines     washing machines
   Suburban      300            243                  0.810
   Urban         250            181                  0.724
                  Example
p1 = proportion of suburban households having
      washing machines
p2 = proportion of urban households having
      washing machines
p1 - p2 is the difference between the proportions
       of suburban households and urban
       households that have washing machines.
H0: p1 - p2 = 0
Ha: p1 - p2 > 0
                         Example
       Significance level: $\alpha$ = 0.05
       Test statistic:
\[ z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\dfrac{\hat{p}_c(1-\hat{p}_c)}{n_1} + \dfrac{\hat{p}_c(1-\hat{p}_c)}{n_2}}} \]

Assumptions: The two samples are independently
chosen random samples. Furthermore, the sample sizes
are large enough since
        $n_1\hat{p}_1 = 243 \ge 10$, $n_1(1-\hat{p}_1) = 57 \ge 10$
        $n_2\hat{p}_2 = 181 \ge 10$, $n_2(1-\hat{p}_2) = 69 \ge 10$
                   Example
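Calculations:

\[ \hat{p}_c = \frac{n_1\hat{p}_1 + n_2\hat{p}_2}{n_1 + n_2} = \frac{243 + 181}{300 + 250} = \frac{424}{550} = 0.7709 \]

\[ z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\dfrac{\hat{p}_c(1-\hat{p}_c)}{n_1} + \dfrac{\hat{p}_c(1-\hat{p}_c)}{n_2}}}
     = \frac{0.810 - 0.724}{\sqrt{0.7709(1-0.7709)\left(\dfrac{1}{300} + \dfrac{1}{250}\right)}} = 2.390 \]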
                        Example
       P-value:
       The P-value for this test is the area under the z
       curve to the right of the computed z = 2.39.
       The P-value = 1 - 0.9916 = 0.0084
Conclusion:
Since P-value = 0.0084 < 0.05 = $\alpha$, the hypothesis H0 is
rejected at significance level 0.05. There is sufficient
evidence at the 0.05 level of significance that the
proportion of suburban households that have washers is
greater than the proportion of urban households that have
washers.
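
Both two-sample tests above (student retention and washing machines) use the same pooled statistic, which can be sketched as follows. The function name `two_proportion_z_test` and the alternative labels are assumptions made for illustration; `statistics.NormalDist` (Python 3.8 or later) replaces the z table.

```python
import math
from statistics import NormalDist

def two_proportion_z_test(x1, n1, x2, n2, alternative="two-sided"):
    """Large-sample z test of H0: p1 - p2 = 0 using the combined
    (pooled) estimate of the common proportion (hypothetical helper)."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    p_c = (x1 + x2) / (n1 + n2)  # total successes / total sample size
    se = math.sqrt(p_c * (1 - p_c) / n1 + p_c * (1 - p_c) / n2)
    z = (p1_hat - p2_hat) / se
    cdf = NormalDist().cdf
    if alternative == "greater":      # Ha: p1 - p2 > 0
        p_value = 1 - cdf(z)
    elif alternative == "less":       # Ha: p1 - p2 < 0
        p_value = cdf(z)
    elif alternative == "two-sided":  # Ha: p1 - p2 != 0
        p_value = 2 * (1 - cdf(abs(z)))
    else:
        raise ValueError("alternative must be 'greater', 'less' or 'two-sided'")
    return z, p_value

# Student retention: 211 of 275 males, 141 of 182 females, two-sided
z_ret, p_ret = two_proportion_z_test(211, 275, 141, 182)
print(f"retention: z = {z_ret:.2f}")  # retention: z = -0.19

# Washing machines: 243 of 300 suburban, 181 of 250 urban, upper-tailed
z_wash, p_wash = two_proportion_z_test(243, 300, 181, 250, alternative="greater")
print(f"washers:   z = {z_wash:.2f}")  # washers:   z = 2.39
```

Because the code keeps z unrounded, its P-values can differ slightly in the third decimal from the table-based values reported above, which round z to two decimals first.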
 Large-Sample Confidence
     Interval for p1 – p2
When
1. The samples are independently selected random
   samples OR treatments that were assigned at
   random to individuals or objects (or vice versa), and
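2. Both sample sizes are large:
   $n_1\hat{p}_1 \ge 10$, $n_1(1-\hat{p}_1) \ge 10$, $n_2\hat{p}_2 \ge 10$, $n_2(1-\hat{p}_2) \ge 10$
A large-sample confidence interval for p1 – p2 is
\[ (\hat{p}_1 - \hat{p}_2) \pm (z\ \text{critical value})\sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}} \]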
                         Example
      A student assignment called for the students to survey
      both male and female students (independently and
      randomly chosen) to compare the proportions that
      approve of the College’s new drug and alcohol policy.
      A student randomly selected 200 male
      students and 100 female students and obtained the
      data summarized below.
                 Number      Number that    Proportion
                 surveyed    approve        that approve
        Female     100          43            0.430
        Male       200          61            0.305
Use this data to obtain a 90% confidence interval estimate
for the difference of the proportions of female and male
students that approve of the new policy.
                            Example
          For a 90% confidence interval the z value to use is
          1.645. This value is obtained from the bottom row of
          the table of t critical values (Table III).
          We use $\hat{p}_1$ to be the females' sample approval
          proportion and $\hat{p}_2$ as the males' sample approval
          proportion.
\[ (0.430 - 0.305) \pm 1.645\sqrt{\frac{0.430(1-0.430)}{100} + \frac{0.305(1-0.305)}{200}} \]
             $= 0.125 \pm 0.097$      or    (0.028, 0.222)
   Based on the observed sample, we believe that the
   proportion of females that approve of the policy exceeds the
   proportion of males that approve of the policy by
   somewhere between 0.028 and 0.222.
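
The interval above can be checked with a short function. This is a minimal sketch: the name `two_proportion_ci` is an assumption made for illustration. Unlike the hypothesis test, the interval uses each sample's own proportion in the standard error rather than a combined estimate.

```python
import math

def two_proportion_ci(x1, n1, x2, n2, z=1.645):
    """Large-sample confidence interval for p1 - p2 (hypothetical helper).
    The default z = 1.645 corresponds to 90% confidence."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    # Unpooled standard error: each sample contributes its own variance
    se = math.sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
    diff = p1_hat - p2_hat
    return diff - z * se, diff + z * se

# Policy approval: 43 of 100 females, 61 of 200 males
low, high = two_proportion_ci(43, 100, 61, 200)
print(f"({low:.3f}, {high:.3f})")  # (0.028, 0.222)
```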

								