An introduction to sample size and power calculations

Bhaswati Ganguli,
Department of Statistics,
University of Calcutta.

1st December, 2009.
An example
 Sample 1: 99 64 91 115 101

 Sample 2: 119 116 97 126 114

 True difference in population means is 5

   Two Sample t-test
t = 2.1294, df = 8, p-value = 0.06586
alternative hypothesis: true difference in means is
                        not equal to 0
95 percent confidence interval: [-1.7, 43.2]
Now let's repeat this experiment 100 times

In 92 out of 100 repetitions, we conclude that there is no difference between the means.

Power for comparison of 2 means:
   mu1   = 110
   mu2   = 115
   sd1   = 20
   sd2   = 20
   n1    = 5
   n2    = 5
   alpha = 0.05
   power = 0.059
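The repeated experiment and the power figure above can be checked with a short Python sketch (standard library only). It assumes normally distributed outcomes with the settings shown (mu1 = 110, mu2 = 115, common SD 20, n = 5 per group); the t critical value for 8 df is hard-coded because the standard library has no t quantile function. The normal-approximation formula reproduces the quoted power of 0.059; the simulated pooled t-test counts both rejection tails and uses the t distribution, so its rejection rate should come out roughly in the 6 to 7% range rather than exactly 0.059.

```python
import math
import random
from statistics import NormalDist

# Settings from the "Power for comparison of 2 means" output above.
MU1, MU2, SD, N = 110, 115, 20, 5
T_CRIT_DF8 = 2.306004  # two-sided 5% critical value of t with 2N - 2 = 8 df

def normal_approx_power(delta, sd, n, alpha=0.05):
    """Normal approximation to the power of a two-sided, two-sample test
    with n per group; the (small) opposite rejection tail is ignored."""
    z = NormalDist()
    se = sd * math.sqrt(2 / n)
    return z.cdf(abs(delta) / se - z.inv_cdf(1 - alpha / 2))

def sample_var(values):
    """Unbiased sample variance."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)

def simulated_rejection_rate(reps=20000, seed=1):
    """Repeat the experiment `reps` times and return the fraction of
    pooled-variance t-tests that reject H0: mu1 = mu2 at the 5% level."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        x = [rng.gauss(MU1, SD) for _ in range(N)]
        y = [rng.gauss(MU2, SD) for _ in range(N)]
        pooled_var = (sample_var(x) + sample_var(y)) / 2  # equal n, so a simple average
        t = (sum(y) / N - sum(x) / N) / math.sqrt(pooled_var * 2 / N)
        if abs(t) > T_CRIT_DF8:
            rejections += 1
    return rejections / reps

print(round(normal_approx_power(MU2 - MU1, SD, N), 3))  # 0.059
print(simulated_rejection_rate())  # most repetitions fail to reject H0
```

Setting reps=100 mimics the 100 repetitions on the slide; with so few repetitions the number of non-significant results varies noticeably from run to run.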
Another example
 Sample 1: 1 1 0 1 1 0 0 1 0 0 1 0 1 0 1

 Sample 2: 1 1 1 1 1 1 1 1 0 1 1 1 1 1 0

 True population Odds Ratio (OR) = 1.5

 The 95% confidence interval for the OR is [0.06, 3.8]
Repeat the experiment 100 times

In 71 out of 100 repetitions, we conclude that the population OR is 1.
Some prerequisites
 Parameter

 Test of hypothesis

 Power

 “..much confusion may arise when a word in common
  use is also given a technical meaning. Statistics abounds
  in such terms, including normal, random, variance,
  significant, etc.”
   Altman & Bland, BMJ 1999;318:1667
  (19 June).

 Variable: information recorded about a sample of individuals.

 Parameter: does not relate to actual measurements or
  attributes but to a quantity defining a theoretical model.
 Figure: histogram (green) of serum albumin measurements in 481 white men, with the normal density that fits the data most closely (red).
Test of hypothesis
 A rule for deciding, based on the observed sample,
  whether the population parameter assumes a certain
  specified value.
Tests of hypothesis
   H0: The mean serum albumin among white males aged over 30 is 40.
   Ha: The mean serum albumin among white males aged over 30 is 48.

   H0: The proportion of low birth weight babies in rural India is 20%.
   Ha: The proportion of low birth weight babies in rural India is 40%.

   H0: The OR for osteoporosis among women as compared to men is 1.0.
   Ha: The OR for osteoporosis among women as compared to men is 2.0.
Parameter = a single proportion
 Health workers wish to determine whether the rate of
  neonatal tetanus is decreasing.

 What sample size is necessary to test the null hypothesis
  that the population proportion is 0.15 at the 0.05 level if
  it is desired to have a 90% probability of detecting a
  decrease to a rate of 100 per thousand if that were the
  true proportion?
 P(test correctly detects the decrease | true proportion = 0.10,
  Type I error = 0.05) = 0.90

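 $$n = \frac{\left[1.645\sqrt{0.15(0.85)} + 1.282\sqrt{0.10(0.90)}\right]^2}{(0.15 - 0.10)^2} = 377.90,$$
  where 1.645 is the standard normal quantile for a one-sided 5% test and 1.282 the quantile for 90% power.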

 Hence we see that a total sample size of 378 live births
  would be necessary.
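The calculation above follows the usual normal-approximation formula for a one-sided test of a single proportion, n = [z(1−α)√(P0(1−P0)) + z(1−β)√(Pa(1−Pa))]² / (P0 − Pa)². A minimal Python sketch is below; the function name is illustrative and not taken from any package. It uses exact normal quantiles rather than the rounded 1.645 and 1.282, so it gives about 377.75 instead of 377.90; both round up to 378.

```python
import math
from statistics import NormalDist

def n_single_proportion(p0, pa, alpha=0.05, power=0.90, one_sided=True):
    """Sample size to test H0: p = p0 with the given power when the true
    proportion is pa (normal approximation, no continuity correction)."""
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha) if one_sided else z(1 - alpha / 2)
    z_beta = z(power)
    numerator = z_alpha * math.sqrt(p0 * (1 - p0)) + z_beta * math.sqrt(pa * (1 - pa))
    return (numerator / (p0 - pa)) ** 2

n = n_single_proportion(0.15, 0.10)
print(round(n, 1), math.ceil(n))  # about 377.8, rounded up to 378
```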
For more details:
 References: Dixon and Massey (1983),
             Lemeshow et al. (1990),
             Fleiss (1981),
             Lachin (1981).
 Books containing sample size tables are available, e.g.
              Machin and Campbell (1987);
              Machin et al. (1997);
              Lemeshow et al. (1990).
 Commercial and public domain software is available.
                    R documentation for binom.confint (package binom)

   Nine methods are allowed for constructing the confidence interval(s):

   Exact - Pearson-Klopper method.

   Asymptotic - using the Central Limit Theorem.

   agresti-coull - Agresti-Coull method.

   Wilson - Wilson method.

   prop.test - equivalent to prop.test(x = x, n = n, conf.level = conf.level)$

   Bayes - see binom.bayes.

   Logit - see binom.logit.

   Cloglog - see binom.cloglog.

   Probit - see binom.probit.

   Profile - see binom.profile.
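To make two of these methods concrete, here is a standard-library Python sketch of the "asymptotic" (Wald) and Wilson intervals for x successes out of n. It is an independent illustration of the formulas, not the binom package's code; the example uses the 8 successes out of 15 in Sample 1 of the odds-ratio example.

```python
import math
from statistics import NormalDist

def wald_interval(x, n, conf=0.95):
    """Asymptotic (Wald) interval: p_hat +/- z * sqrt(p_hat(1 - p_hat)/n)."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    p = x / n
    half = z * math.sqrt(p * (1 - p) / n)
    return p - half, p + half

def wilson_interval(x, n, conf=0.95):
    """Wilson score interval, obtained by inverting the score test."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    p = x / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half

print(wald_interval(8, 15))    # about (0.281, 0.786)
print(wilson_interval(8, 15))  # about (0.301, 0.752)
```

The Wald interval collapses to a single point when x = 0 or x = n, which is one reason the alternatives in the list are offered.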
Parameter = Relative Risk
 Two competing therapies for a particular cancer are to
  be evaluated in a multi-center clinical trial. Patients
  are randomized to either treatment A or B and are
  followed for recurrence of disease for five years following

 How many patients should be studied in each of the two
  arms of the trial in order to have 90% power to reject
  H0: RR = 1 in favor of the alternative RR = 0.5, if the
  test is to be performed at the two-sided α = 0.05 level
  and it is assumed that the probability of recurrence in
  the placebo group = 0.35?
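One common way to answer this question is the normal-approximation formula for comparing two independent proportions, here with p1 = 0.35 and p2 = 0.5 × 0.35 = 0.175 under the alternative. The slides do not say which formula they intend, so treat this Python sketch (no continuity correction) as one reasonable choice; adding a continuity correction increases the answer.

```python
import math
from statistics import NormalDist

def n_per_group_two_proportions(p1, p2, alpha=0.05, power=0.90):
    """Per-group n for comparing two independent proportions with a
    two-sided test (normal approximation, no continuity correction)."""
    z = NormalDist().inv_cdf
    z_a, z_b = z(1 - alpha / 2), z(power)
    p_bar = (p1 + p2) / 2
    term_h0 = z_a * math.sqrt(2 * p_bar * (1 - p_bar))        # variance under H0
    term_h1 = z_b * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))  # variance under the alternative
    return ((term_h0 + term_h1) / (p1 - p2)) ** 2

p_control = 0.35
p_treated = 0.5 * p_control  # RR = 0.5 under the alternative
n = n_per_group_two_proportions(p_control, p_treated)
print(math.ceil(n))  # 131 per arm
```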
Parameter = Odds Ratio
 The efficacy of BCG vaccine in preventing childhood
  tuberculosis is in doubt and a study is designed to
  compare the immunization coverage rates in a group of
  tuberculosis cases compared to a group of controls.

 Available information indicates that roughly 30% of the
  controls are not vaccinated, and we wish to have an
  80% chance of detecting whether the odds ratio
  is significantly different from 1 at the 5% level.
  If an odds ratio of 2 would be considered an important
  difference between the two groups, how large a sample
  should be included in each study group?
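Under the same two-proportion approximation, an odds ratio of 2 with 30% of controls unvaccinated implies that about 46% of cases are unvaccinated (odds 2 × 0.3/0.7). The Python sketch below converts the odds ratio into that proportion and then computes the per-group size for a two-sided 5% test with 80% power. Variants of the formula, for example ones that use only the control proportion in the null-hypothesis term, give somewhat smaller answers.

```python
import math
from statistics import NormalDist

def exposure_in_cases(p_controls, odds_ratio):
    """Proportion exposed among cases implied by the control proportion and the OR."""
    odds = odds_ratio * p_controls / (1 - p_controls)
    return odds / (1 + odds)

def n_per_group_two_proportions(p1, p2, alpha=0.05, power=0.80):
    """Per-group n for a two-sided comparison of two proportions (normal approximation)."""
    z = NormalDist().inv_cdf
    p_bar = (p1 + p2) / 2
    term_h0 = z(1 - alpha / 2) * math.sqrt(2 * p_bar * (1 - p_bar))
    term_h1 = z(power) * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    return ((term_h0 + term_h1) / (p1 - p2)) ** 2

p0 = 0.30                         # proportion unvaccinated among controls
p1 = exposure_in_cases(p0, 2.0)   # proportion unvaccinated among cases
print(round(p1, 4), math.ceil(n_per_group_two_proportions(p1, p0)))  # 0.4615 141
```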
Additional Considerations
   References: Dixon and Massey (1983), Lemeshow et al.
    (1990), Fleiss (1981), Lachin (1981).

   Books containing sample size tables are available, e.g. Machin
    and Campbell (1987); Machin et al. (1997); Lemeshow et al. (1990).

   Commercial and public domain software is available for sample
    size calculation.

   Fine print:
       May be based on the normal approximation or on Fisher's exact test
       May require variance stabilisation
       May require continuity corrections for values near 0 or 1 (or for
        small sample sizes)
       For a fixed total size, power will tend to be higher if sample sizes
        are equal
       Sample size calculations for the difference between two correlated
        proportions are based on the McNemar test.
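For the paired case, one common normal-approximation formula for McNemar's test needs only the probabilities of the two kinds of discordant pair, p10 and p01. In this Python sketch the values p10 = 0.20 and p01 = 0.10 are hypothetical, chosen purely for illustration.

```python
import math
from statistics import NormalDist

def n_pairs_mcnemar(p10, p01, alpha=0.05, power=0.90):
    """Number of pairs for a two-sided McNemar test (normal approximation).
    p10, p01: probabilities of the two kinds of discordant pair."""
    z = NormalDist().inv_cdf
    psi = p10 + p01     # total discordant probability
    delta = p10 - p01   # difference between the two correlated proportions
    num = z(1 - alpha / 2) * math.sqrt(psi) + z(power) * math.sqrt(psi - delta**2)
    return (num / delta) ** 2

print(math.ceil(n_pairs_mcnemar(0.20, 0.10)))  # 312 pairs (hypothetical inputs)
```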
Parameter = Difference in mean values
  A two-group, randomized trial is planned in elderly females
  after hip fracture.

  The outcome variable will be change in hematocrit level during
  the study.

     The sample sizes in the two groups will be equal.
     A two-sided t test at the 5% level.
     Pilot data suggest that the standard deviation of the change
      will be about 2.0%.
     It would be of interest to detect a difference of 2.2% in the
      changes observed in placebo and treated groups.

  What sample size in each group would be required to achieve a
  power of 90%?
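With σ = 2.0 and a difference of 2.2, the standardized effect is 1.1. A Python sketch of the usual normal-approximation formula, n = 2(z(1−α/2) + z(1−β))²(σ/Δ)² per group, is below, together with a commonly used small-sample adjustment that adds z(1−α/2)²/4 per group to approximate the t test. Software that iterates on the t distribution may differ by a subject or so.

```python
import math
from statistics import NormalDist

def n_per_group_means(delta, sd, alpha=0.05, power=0.90, t_correction=False):
    """Per-group n for a two-sided two-sample comparison of means
    (normal approximation, optional small-sample adjustment)."""
    z = NormalDist().inv_cdf
    z_a, z_b = z(1 - alpha / 2), z(power)
    n = 2 * (z_a + z_b) ** 2 * (sd / delta) ** 2
    if t_correction:
        n += z_a ** 2 / 4  # rough allowance for using t rather than z
    return n

print(math.ceil(n_per_group_means(2.2, 2.0)))                     # 18
print(math.ceil(n_per_group_means(2.2, 2.0, t_correction=True)))  # 19
```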
 Unequal variances: when the standard deviations in the
  two groups are markedly unequal, the usual t test with a
  pooled variance is no longer the appropriate test.

 Transformations:
   e.g. square root, log, Box-Cox
   Use if there is a pattern to the inequality
     (e.g. if groups with higher means have higher SDs)

 If transformation does not solve the problem, it is
  possible that comparison of means is not the most
  appropriate method.

 If it is, a two-sample t-test appropriate for the Behrens-Fisher
  problem (such as Welch's t-test) may be used.
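When the SDs differ, the normal approximation for equal group sizes simply replaces 2σ² with σ1² + σ2². In the Python sketch below, the second SD of 3.0 is hypothetical; with equal SDs the formula reduces to the equal-variance calculation for the hematocrit example.

```python
import math
from statistics import NormalDist

def n_per_group_unequal_sd(delta, sd1, sd2, alpha=0.05, power=0.90):
    """Per-group n (equal allocation) for a two-sided comparison of means
    when the two groups have different SDs (normal approximation)."""
    z = NormalDist().inv_cdf
    return (z(1 - alpha / 2) + z(power)) ** 2 * (sd1**2 + sd2**2) / delta**2

print(math.ceil(n_per_group_unequal_sd(2.2, 2.0, 2.0)))  # 18, same as the equal-SD case
print(math.ceil(n_per_group_unequal_sd(2.2, 2.0, 3.0)))  # 29 (hypothetical sd2 = 3.0)
```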
 If non-normality is an issue:
       Plan a large study
       Consider transformations
       Use a non-parametric procedure instead, such as the
        two-sample Mann-Whitney/Wilcoxon rank-sum test.
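A common rule of thumb for planning a Mann-Whitney/Wilcoxon analysis is to divide the t-test sample size by the asymptotic relative efficiency (ARE) of the rank test: 3/π ≈ 0.955 when the data really are normal, and never less than 0.864 for continuous distributions. The Python sketch below applies both figures to a hypothetical t-test requirement of 18 per group.

```python
import math

ARE_NORMAL = 3 / math.pi  # efficiency of the rank test vs the t test for normal data (about 0.955)
ARE_MIN = 0.864           # lower bound on that efficiency for continuous distributions

def n_rank_test(n_t_test, are):
    """Inflate a t-test sample size by 1/ARE, rounding up."""
    return math.ceil(n_t_test / are)

n_t = 18  # hypothetical t-test requirement per group
print(n_rank_test(n_t, ARE_NORMAL), n_rank_test(n_t, ARE_MIN))  # 19 21
```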
Logistic regression with a single continuous risk factor

   About 30% of patients with blocked arteries followed for a
  year will have renewed blockage (“restenosis”).
  A study is to be planned to assess the effect of serum
  cholesterol on the likelihood of restenosis.

     Based on the prior results from a screening trial, mean
      serum cholesterol in middle-aged males is about 210 mg/dL.
     One standard deviation above the mean is approximately
      250 mg/dL.
     In the screening study, the OR for the six-year death rate
      for these two cholesterol levels was about 1.5. The study
      should be large enough to detect an effect of serum
      cholesterol on arterial restenosis of a size similar to that
      seen for death rate.
Logistic regression with a single continuous covariate

  We plan to conduct the test of the predictive effect of
  cholesterol level on the probability of restenosis using a
  5% two-sided test and want to have 90% power to
  detect an odds ratio of 1.5 for values of cholesterol of
  250 mg/dL versus 210 mg/dL.

  We set the effect size δ = |μ1 − μ2|/σ = ln(1.5) = 0.405, the log
  odds ratio for a one-standard-deviation (40 mg/dL) increase in cholesterol.

  The ratio of sample sizes expected to be in the no-restenosis
  versus the restenosis groups is r = 0.7/0.3 = 2.333.
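With a normally distributed covariate, this set-up leads to a simple approximation for the total sample size, N = (z(1−α/2) + z(1−β))² / [P(1−P)β²], where β = ln(OR per SD) and P is the overall probability of restenosis. It is algebraically the same as the two-sample calculation with δ = 0.405 and r = 0.7/0.3. The Python sketch below is a normal-approximation illustration only; software based on the t distribution may give a slightly larger total.

```python
import math
from statistics import NormalDist

def logistic_continuous_total_n(odds_ratio_per_sd, p_event, alpha=0.05, power=0.90):
    """Total N to detect a given OR per SD of a normal covariate in a
    logistic regression (normal approximation, two-sided test)."""
    z = NormalDist().inv_cdf
    beta = math.log(odds_ratio_per_sd)  # log OR per SD = standardized effect size
    return (z(1 - alpha / 2) + z(power)) ** 2 / (p_event * (1 - p_event) * beta ** 2)

n = logistic_continuous_total_n(1.5, 0.30)
print(round(math.log(1.5), 3), math.ceil(n))  # 0.405 305
```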
Variance Inflation Factor

    Adjusting sample size for multiple risk factors
    and confounders

 Precise sample size calculations require precise
  quantitative information about the
  interdependence structure between the covariates.
 We can, however, use a “variance inflation
  factor” to adjust the sample size obtained for the
  single-covariate case.
Variance Inflation Factor
 If two other covariates, with a squared multiple
  correlation of 0.15 with cholesterol, are to be entered into
  the logistic regression:

 Multiply the sample size obtained for a single covariate
  by the variance inflation factor 1/(1 − 0.15) = 1.18, which
  increases the required sample size to 365.
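The adjustment is a single multiplication. The slides do not state the single-covariate figure; working backwards, 365 × (1 − 0.15) ≈ 310, so the Python sketch below uses 310 as an inferred input, not a number quoted in the source.

```python
import math

def vif(r_squared):
    """Variance inflation factor for a covariate whose squared multiple
    correlation with the other covariates is r_squared."""
    return 1 / (1 - r_squared)

def inflate_for_covariates(n_single, r_squared):
    """Sample size after adjusting a single-covariate calculation, rounded up."""
    return math.ceil(n_single * vif(r_squared))

print(round(vif(0.15), 2))                # 1.18
print(inflate_for_covariates(310, 0.15))  # 365 (310 is an inferred input)
```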
    The design effect
 In practice we often use more complex survey designs, such as cluster sampling.

    New sample size = sample size under simple random sampling (SRS) × design effect

    Design effect = 1 + d(n − 1),
       where d = intraclass correlation for the statistic in question
             n = average cluster size
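The design-effect adjustment is easy to script. In the Python sketch below, the cluster size of 30 and intraclass correlation of 0.02 are hypothetical values, applied to the 378 live births from the neonatal tetanus example.

```python
import math

def design_effect(icc, avg_cluster_size):
    """Design effect = 1 + d(n - 1) for intraclass correlation d and average cluster size n."""
    return 1 + icc * (avg_cluster_size - 1)

def cluster_sample_size(n_srs, icc, avg_cluster_size):
    """Sample size under simple random sampling multiplied by the design effect, rounded up."""
    return math.ceil(n_srs * design_effect(icc, avg_cluster_size))

# Hypothetical inputs: clusters of 30 births, intraclass correlation 0.02.
print(round(design_effect(0.02, 30), 2), cluster_sample_size(378, 0.02, 30))  # 1.58 598
```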
    Measurement error and sample size

Freedman, L.S., Schatzkin, A. and Wax, Y. (1990). American Journal of Epidemiology, 132, 1185-1195.

Dietary measurement error has two consequences relevant to epidemiologic studies: first,
a proportion of subjects are misclassified into the wrong groups, and second, the
distribution of reported intakes is wider than the distribution of true intakes. While the
first effect has been dealt with by several other authors, the second effect has not
received as much attention. Using a simple errors-in-measurement model, the authors
investigate the implications of measurement error for the distribution of fat intake. They
then show how the inference of a narrower distribution of true intakes affects the
calculation of sample size for a cohort study. The authors give an example of the
calculation for a cohort study investigating dietary fat and colorectal cancer. This shows
that measurement error has a profound effect on sample size, requiring a six- to
eightfold increase over the number required in the absence of error if the
correlation coefficient between reported and true intakes is 0.65. Reliable detection of a
relative risk of 1.36 between a true intake of greater than 47.5% calories from fat and less
than 25% calories from fat would require approximately one million subjects.

Resource: Sample size calculator at
Resources in R

 Available from

   pwr: power and sample size calculations following Cohen (1988).
   asypow: power calculations using asymptotic likelihood ratio methods
   bayescount: Bayesian power calculations for count data
    using MCMC
   normalp: package for exponential power distributions
   pamm: power analysis for random effects in mixed models
   binomSamSize: confidence intervals and sample size determination
    for a binomial proportion under simple random sampling and pooled sampling
   pairwiseCI: confidence intervals for two-sample comparisons
   MBESS: sample size calculations for behavioural models obtained by
    setting the width of the confidence intervals
   epiR, epicalc, powerSurvEpi: sample size calculations for a variety of
    epidemiological designs
   survey: analysis of complex surveys
   Hmisc, TeachingDemos: sample size calculation and visual tools to
    illustrate associated concepts
Genetic power calculators
 Purcell S, Cherny SS, Sham PC. (2003) Genetic Power
  Calculator: design of linkage and association genetic
  mapping studies of complex traits. Bioinformatics,

 Sample size calculator at
Practical Issues
 For complex study designs or statistical methods, there
  may be no easily applied formulae or software.
   Use simplifications of the design
   Simulation

 Investigate whether the sample size is adequate for
   evaluation of secondary outcomes
   analyses of pre-defined subsets.

 Sample size values obtained from software will need to
  be inflated to allow for dropout or loss to follow up.

 All power calculations should be accompanied by
  sensitivity analysis.
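Both of the last two points can be scripted together. The Python sketch below reruns the normal-approximation calculation for the hematocrit example (difference 2.2) over a range of guesses for the SD, then inflates each result by n/(1 − dropout); the 10% dropout rate and the SD grid are hypothetical.

```python
import math
from statistics import NormalDist

def n_per_group_means(delta, sd, alpha=0.05, power=0.90):
    """Normal-approximation sample size per group for a two-sided two-sample test."""
    z = NormalDist().inv_cdf
    return 2 * (z(1 - alpha / 2) + z(power)) ** 2 * (sd / delta) ** 2

def inflate_for_dropout(n, dropout_rate):
    """Number to recruit so that about n remain after the given dropout rate."""
    return math.ceil(n / (1 - dropout_rate))

# Sensitivity analysis: how the requirement moves with the assumed SD.
for sd in (1.5, 2.0, 2.5, 3.0):
    n = math.ceil(n_per_group_means(2.2, sd))
    print(f"SD {sd}: {n} per group analysed, {inflate_for_dropout(n, 0.10)} recruited")
```

Because the required n scales with σ², a modest error in the pilot SD can change the answer substantially, which is why the sensitivity analysis matters.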
     Prospective vs retrospective analysis
 Prospective power analysis is exploratory in nature.
 Retrospective analysis: after the study, we may be concerned
  that the statistical power of the test was low.

 Question: Should additional information (particularly the
  observed effect size and variance) be used to retrospectively
  calculate the power of the test?

Thomas, L. (1997) Retrospective power analysis. Conservation Biology, 11, 276–280.

       Different methods may lead to different conclusions.
       It is unfortunate that this kind of power analysis is readily available
        in statistical software packages.
       Retrospective analyses are no substitute for the proper
        planning of research.
Why perform/report formal sample size calculations?
 Small sample size
      Does not imply bias
      Will manifest itself as large confidence intervals and lack of

 Sample size calculations are important
      Help to ensure adequate precision
      Require the primary endpoint to be specified in advance
      Safeguards against changing outcomes and claiming “significant”

 An alert for potential problems.
      Did the trial encounter recruitment difficulties?
      Did the trial stop early?
      Was a formal statistical stopping rule used?
