; Confidence Intervals and Power
Inference, Confidence Intervals, Effect Sizes, and Power

[Cartoon: "The Vote", Maya versus Amy. Evidence of hacking into email accounts to fix the results. Only six people voted. In some countries you are only allowed to vote after a handful of rich people have paid lobbyists and created adverts that insult rodentia intellect.]
                        Today’s Aims

• What is inference?
• What are confidence intervals?
   – How to make and report confidence intervals.
   – A glance at bootstrapping (more in a couple of weeks)
• Touch upon hypothesis testing (more next week)
• Effect sizes (and this will continue)
• What is power?
   – How to calculate and report power.
Interval versus Point Reporting
                 Inference: Point Estimates

• When we calculate the mean of a sample, we use that as an
  estimate of the population mean μ.
• The Plug-in principle.
   – Requires we believe the sample is representative of the population.
   – Requires that the sample statistic is an unbiased (or at least
      good) estimate of the population parameter.
• Some estimates are biased. The sample range underestimates the
  population range.
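The bias of the sample range can be seen in a small simulation; a sketch in R, using a made-up uniform population (the population and sample sizes here are illustrative assumptions):

```r
# The plug-in principle: use the sample statistic for the population parameter.
# The sample mean is unbiased, but the sample range is biased downwards: a
# sample's range can only be as large as the population's, and is usually smaller.
set.seed(42)
pop <- runif(100000, min = 0, max = 10)   # made-up population, range about 10
ranges <- replicate(1000, {
  samp <- sample(pop, 20)                 # samples of n = 20
  diff(range(samp))                       # sample range = max - min
})
mean(ranges)   # well below the population range of about 10
```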
               100,000 more Iraqi dead post invasion:
                       Roberts et al. (2004)

     – Timing??? (Thursday before US election)
     – Recommendations
        The Geneva convention says an occupying force has responsibilities. A US
        general says "we don't do body counts". The authors argue it can be
        done (they did it in 4 weeks with 7 people) and is necessary under the
        Geneva convention.
                            Cluster sample

• Travelling is important to minimize! (GPS)
• 33 clusters of 30 houses. Choosing the nearest 30 houses in each cluster is
  probably not good.
• Their power analysis seems to assume a non-clustered sample.

[Figure: maps by Area and Household. The labels are small, but the point is there is more red and dark blue after the invasion.]
                          Violent Deaths up

• But should be viewed in light of many methodological limitations.
   – Authors discuss these
• The 100,000 estimate is 98,000 with a 95% CI from 8,000 to 194,000 (without
  Falluja; with Falluja the lower bound of the confidence interval is lower).
• This band includes most other estimates.

• Ethical problems?
                    What does 8,000 to 194,000 mean?

 It does NOT mean that
      – there is a 95% probability the number of deaths is between those values.
 It means that
      – if you repeated the survey a billion times, making lots of
         assumptions, 95% of the time the true value would be in that range.
 The philosophers say we can be 95% "confident"
      – whatever that means in this context???

US and most English-speaking places (the UK changed in 1974): a billion = 1,000,000,000
Others (most of Europe, South America, Cuba, Mexico, etc.): a billion = 1,000,000,000,000
               Constructing Confidence Intervals

• Population mean μ (pronounced mu).
• Estimate it with the sample mean (x̄): the plug-in principle.
• But there is sampling error. We estimate a region which will usually include
  the population mean.

    CI 95% = x̄ ± t0.05 × se                        (lots of assumptions)

• Need to know df. df = n - 1 for this test. Here 94 - 1 = 93.
• t0.05 is usually about 2, but you need to look it up in the t table.
             Example: Newton's (1998) Hostility Data

• The mean on arrival for the 94 prisoners was 28.3 with a standard
  deviation of 8.0.
• df = n-1 so 93, or about 90 for the t table

    CI 95% = x̄ ± t0.05 × s/√n = x̄ ± t0.05 × se

    CI 95% = 28.3 ± 1.99 × (8.0/√94) = 28.3 ± 1.99 × 0.825
           = 28.3 ± 1.6, i.e., 26.7 to 29.9

[Figure: t distribution with df = 93 and 2.5% in each tail.]
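The arithmetic can be checked in R from the slide's summary statistics:

```r
n  <- 94
m  <- 28.3          # mean hostility on arrival
s  <- 8.0           # standard deviation
se <- s / sqrt(n)   # standard error, about 0.825
tcrit <- qt(0.975, df = n - 1)   # t with 2.5% in each tail, about 1.99
c(lower = m - tcrit * se, upper = m + tcrit * se)   # about 26.7 to 29.9
```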
   What does having 95% CI of 28.3 ± 1.6 mean?

We expect that about 95% of the time when a confidence interval is
made that the population mean (μ) will be within the interval created.
This allows us to be fairly "confident" that the confidence interval we
calculate contains the population mean.

It is not that there is a 95% probability that μ is within the interval.
This is a tricky concept.

Confidence intervals are a fundamental tool for the frequentist
statistician. In the long run, you should be right (i.e., μ within the
interval) about 95% of the time.

(This is a tricky concept and will be revisited)
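The long-run reading can be demonstrated by simulation; a sketch with a made-up normal population (μ, σ, and n here are arbitrary choices):

```r
# Draw many samples, build a 95% CI from each, and count how often
# the true population mean lands inside. It should be close to 95%.
set.seed(1)
mu <- 50; sigma <- 10; n <- 25
covered <- replicate(10000, {
  x  <- rnorm(n, mu, sigma)
  se <- sd(x) / sqrt(n)
  tcrit <- qt(0.975, n - 1)
  (mean(x) - tcrit * se < mu) & (mu < mean(x) + tcrit * se)
})
mean(covered)   # close to 0.95
```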
Plotting the precision of the estimate (confidence intervals) and the spread of
the distribution (standard deviations).
Both are in units of the original variable (here in years).

[Figure: error bars for years in prison (0 to 30), showing the 95% CI (x̄ ± t0.05 se) alongside x̄ ± sd.]
                  Examining the Difference between
                   Two Means for the Same Person

    CI 95% = (x̄1 - x̄2) ± t0.05 × sd_diff/√n = x̄_diff ± t0.05 × sd_diff/√n

• Difference in means ± t0.05 times the standard error.
• The standard error of the difference uses an estimate of the standard
  deviation of the difference.
• Assumptions include that the difference is normally distributed (not the
  individual scores; for most tests the assumptions are about the residuals).
                          Last week's journal question

• Just calculate a variable for the difference, and perform the calculations as
  you did before.
     Brewed Awakenings: http://mybrewedawakening.com/
Data from 10 people's coffee preferences.

       FRESHi   INSTANTi   DIFFi   DIFFi - x̄   (DIFFi - x̄)²
          5        3          2        1            1
          4        3          1        0            0
          6        5          1        0            0
          3        4         -1       -2            4
          4        4          0       -1            1
          5        3          2        1            1
          6        3          3        2            4
          3        3          0       -1            1
          5        3          2        1            1
          4        4          0       -1            1
Sum      45       35         10        0           14
Mean    4.5      3.5        1.0        0        sd = 1.25

(the DIFF variable is the one assumed normal)

    CI 95% = 1.0 ± 2.26 × (1.25/√10) = 1.0 ± 0.89, i.e., 0.11 to 1.89

Does this allow us to say anything else?

Width of 95% CI

[Figure: width of the 95% CI plotted against sample size (0 to 100), for four curves. At n = 20: sd = 10 gives width 9.33; sd = 5 gives 4.66; sd = 2 gives 1.87; sd = 1 gives 0.93.]
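The plotted widths follow from the CI formula (width = 2 × t0.05 × sd/√n); a sketch that gives values very close to those shown:

```r
# Width of a 95% CI as a function of sample size and standard deviation
ci_width <- function(n, sd) 2 * qt(0.975, n - 1) * sd / sqrt(n)
ci_width(20, 10)   # about 9.36, close to the 9.33 shown
sapply(c(10, 5, 2, 1), function(s) ci_width(20, s))
```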
        Confidence intervals for differences between groups

    CI 95% = (x̄1 - x̄2) ± t0.05 × se of the difference

    pooled var = ((n1 - 1) var1 + (n2 - 1) var2) / ((n1 - 1) + (n2 - 1))
    (one of several choices)

    CI 95% = (x̄1 - x̄2) ± t0.05 × √(pooled var × (1/n1 + 1/n2))

So, more of a pain to calculate.
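In R the pooled-variance work is done for you; a sketch with made-up scores for two hypothetical independent groups:

```r
# Hypothetical data for two independent groups (illustrative only)
g1 <- c(22, 25, 27, 30, 24, 26, 28, 23)
g2 <- c(20, 21, 24, 19, 23, 22, 25, 20)
# var.equal = TRUE uses the pooled-variance formula above
t.test(g1, g2, var.equal = TRUE)$conf.int
```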
                      How big is an effect?

• APA and all other science organizations stress the importance of
  saying how large an effect is (when one is found).
• Difference in two means. Raw value. Useful.
• Correlation. Standardized. Also useful.
• Difference in means divided by some measure of spread.
  Standardized. Also useful.
• In the coffee example, the standard deviation of the liking ratings or of
  the differences could be used.

• Lots of effect size measures for different situations. Many can be
  transformed into a correlation-like measure, so many people like these.
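For the coffee example, one standardized effect size divides the mean difference by the standard deviation of the differences (one of several possible choices of spread):

```r
diffs <- c(2, 1, 1, -1, 0, 2, 3, 0, 2, 0)   # FRESH - INSTANT from the table
mean(diffs) / sd(diffs)   # about 0.8, "large" by Cohen's guidelines
```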
Switzerland → Italy (730 km)

They arrived (60.7 ± 6.9 (stat.) ± 7.4 (sys.)) ns faster than light!
(v - c)/c = (2.48 ± 0.28 (stat.) ± 0.30 (sys.)) × 10⁻⁵

(c = 186,282 miles per s)
        Calculating confidence intervals

In SPSS it is in Explore, and often available as an option elsewhere.
Similar in R (or as a function), though often you just get the standard
error (the reason why is discussed soon).

Lots of procedures print the confidence intervals or have
printing them as an option.

There used to be a single useful page that did lots. See
http://faculty.vassar.edu/lowry/VassarStats.htm and
http://www.stat.tamu.edu/~jhardin/applets/ for several pages.
Maybe you can write R functions to do these?
How To Number 1
                      Two Approaches

• Mathematics (which is what is built into SPSS)
• Computation - the bootstrap
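A minimal percentile bootstrap for the mean coffee-preference difference, using only base R (a sketch; bootstrapping gets a fuller treatment in a couple of weeks):

```r
# Resample the differences with replacement many times and take the
# middle 95% of the resampled means as an interval.
set.seed(2)
diffs <- c(2, 1, 1, -1, 0, 2, 3, 0, 2, 0)
boot_means <- replicate(10000, mean(sample(diffs, replace = TRUE)))
quantile(boot_means, c(0.025, 0.975))   # a bootstrap 95% CI for the mean difference
```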
                 Hypothesis Testing: The quest for p

• If p < .05 we are happy.
• Not a good philosophy of science, but how a lot of psychology (and other
  disciplines) has been done.
        “The almost universal reliance on merely refuting the null
        hypothesis is a terrible mistake, is basically unsound, poor
        scientific strategy, and one of the worst things that ever
        happened in the history of psychology” (Meehl, 1978, p. 817).

• If H0 is true, 5% of the time we would reject it. This is called a Type 1 error.
• H0 is always false, so it is not really clear what the point of it is (more next week).
                           Power: 1 - β

                               State of the World
        Decision              H0 true          H0 false
        don't reject H0       correct          Type 2 error
        reject H0             Type 1 error     correct (power)
Probability of making a Type 2 error is conditional on the effect being a
certain size. Denoted β.

1- β is power. Convention to aim for is 80%.

Need to know the size of the effect that you want to detect. Most use past
research (often recommended) or Cohen's guidelines. This is wrong: it should
be the minimum effect worth detecting.
                    A few ways to do it

Simulation. Set up a model of the smallest effect you want to detect,
and use sample().
General stats programs. SPSS/PASW has an add-on (and syntax);
R has a few functions.
Cohen's tables. Discussed later.
G*Power (or other specialist programs).
How to:
G*Power (Erdfelder, Faul, & Buchner, 1996, and later versions)

 Lots of software out there.

 In R, power.t.test, fpower,
  power.prop.test, etc.

People have written SPSS syntax for power, but it is not easy to use.
                       Cohen’s Tables

• For a t test, a medium-sized effect is:  d = (μ1 - μ2)/σ = 0.50
• Small is 0.20 and large is 0.80.
• If the minimum difference worth detecting is 0.50, you need
  128 people in your sample to give you an 80% probability of
  detecting this difference (p < .05).
• For a small effect size you need 784.
• Many surveys have shown that often the power is too low!
Medium: 64 people in each group. 128 total.
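The tabled numbers can be checked with power.t.test in R:

```r
# Medium effect (d = 0.5): about 64 per group, so 128 in total
power.t.test(delta = 0.5, sd = 1, sig.level = 0.05, power = 0.80)$n
# Small effect (d = 0.2): roughly 394 per group, close to the tabled 784 total
power.t.test(delta = 0.2, sd = 1, sig.level = 0.05, power = 0.80)$n
```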
                 Is it really that easy?

                        Yes and no

The computations are a little tricky but looking up in the
tables is easy.

Understanding what to do, and getting adequate sample sizes, is
sometimes difficult.
       Thom Baguley’s (2004) critique: Positives

• Avoiding low power
• Avoiding excessive power
• Efficient planning
       Thom Baguley's (2004) critique: Negatives

• Used retrospectively because SPSS prints something called
  power (“fundamentally flawed”).
• Standardization and automation
• Ignoring things other than n which affect power
• Treating the effect size as the expected effect size, not the
  minimum worth detecting
• Should we be rejecting interval hypotheses rather than
  point hypotheses?
Journal: Why Dilbert was doomed to fail, assuming all Ratbert is claiming is
that he is different from chance.

• Confidence intervals give you all the information in a p value,
  plus more.
• Still, an odd thing.

• Power is one thing to take into account when deciding on the
  sample size.
• Do not blindly use Cohen’s conventions.
                    Last Week's Journal

• Take one of your peers' research statements. Generate a
  causal hypothesis of interest and an associative
  hypothesis of interest.
• Create a variable that is the average (i.e., the mean) of
  two normally distributed variables. Is the average of two
  normally distributed variables itself normally distributed?
     The sum of Normal variables is normal
     The amalgamation is not (in general)

• In one sentence answer the following: Why do we
  calculate the mean value for some attribute for our sample?
• Find out how many participants you need if you want to be able to detect
  an r of .05 with an 80% chance, with alpha = .05, and write down the
  number. How about for r = .55?
   – Some do with Cohen's tables, some with
      G*Power. Talk with peers.
• Play with the G*Power plots.
• Why was Dilbert, in the first frame, bound to fail?

• Suppose you have 25 variables distributed Chi-square with three degrees
  of freedom and 200 people. var1 <- rchisq(200,3) makes one variable.
  Look at hist(var1). Is it skewed? Add up 25 of these variables. Is this sum
  skewed? Look at hist(var1 + ... + var25).

   (there is a reason we are doing this, and the code is in two pages ... try
   first without looking, then look)
            Stop Dan
   They don't want any more
statistics now. They want to go.

(but there is a hint for journal on the next slide)
# Here it is for 25
library(e1071)        # This is for the skewness function
par(mfrow = c(1, 2))  # This makes 2 graphs on 1 screen
x <- rchisq(200, 3)
hist(x)               # clearly skewed
skewness(x)
shapiro.test(x)       # This tests normality
                      # Shapiro is from FIU
for (i in 2:25) x <- x + rchisq(200, 3)
hist(x)               # the sum looks much more normal
skewness(x)
shapiro.test(x)

# Try with the sum of 100 variables
