# Statistical Bootstrapping


```					Statistical Bootstrapping

Peter D. Christenson
Biostatistician

January 20, 2005

Outline

•   Example of bootstrap in a previous paper
•   Basics: Precision of estimates
•   Practical difficulties
•   Bootstrap concept
•   Particular use in our example
•   Other uses

Good bootstrap reference with tutorial pdf* and software:
www.insightful.com/hesterberg/bootstrap

*Source for most figures here.

Paper Using Bootstrapping

Outcome: Labor progression was estimated by the
duration of labor for each cm of cervical dilation using
serial vaginal exams.
Predictor: Classification as overweight, obese or normal
weight.
There are many possible confounding factors, e.g., fetal size.

Paper: Major Results

The p-value is based on the precision of the estimated
durations of 6.20 and 7.52 hours.
p<0.01 ↔ the 99% CI for the difference, centered at
7.52 - 6.20 = 1.32 hours, lies entirely above 0 (so it is
unlikely that there is no difference).

Paper: From statistical methods

Survival analysis methods were needed to estimate
the median duration from, say, 4 to 5 cm dilation, since
the exact times were not known.
Nevertheless, the authors did not just use the p-values from
the software output, but used “bootstrapping”. Why?

Basics: Precision vs. Normal Range

• A random sample of 100 women shows a 4-10
cm dilation duration mean±SD of 7.0±1.25 hours.
• Normal (95%) range is ~ 7±2(1.25) = 4.5 to 9.5
hours.
• With no other info about a patient, we predict
her duration will be between 4.5 and 9.5 hours,
based on SD = variation among individuals.
• But how precisely did the study estimate
the mean duration of all women? We only have
one mean, but want 7.0±2(SD of means).

Estimating Precision from Theory
To get the SD of a mean, conceptually take many samples:

[Figure: many samples drawn from “all” pregnant women, each giving a
mean (X-bar); the mean and SD of these X-bars are then computed.]

Of course, we don’t have the luxury of more than 1 sample.
From math theory, SD of a mean of N is SD/√N = SEM:
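For the sample described earlier (N = 100, SD = 1.25 hours), this works out to:
    SEM = 1.25/√100 = 0.125 hours
    7.0 ± 2(0.125) = 6.75 to 7.25 hours
So the mean for all women is pinned down far more tightly than the
4.5 to 9.5 hour normal range for an individual woman.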

[Figure: “All” pregnant women]

Extensions of the Theory
The theory has been extended beyond means (SEM) to SEs
for more complex measures, such as predictions from
regression:
[Figure: regression plot with blue and red bands.]
Blue bands are “normal ranges” and red bands are CIs, showing
precision. But the relation is not just a factor of √N, as with
means.

Difficulties
For most situations, standard errors (SE) have
been developed based on theory. They may not be
accurate in some circumstances:
• The sample may not be the simple random sample
that standard SE formulas require.
• There may be non-sampling sources of variation,
e.g., using estimated results from one analysis in
further analyses.
• The formulas based on theory may require
approximations or assumptions that are known
not to hold.

The Bootstrap Standard Error
Obtain a single sample of size N. Then:
1. Draw thousands of samples of size N, with
replacement, from the original sample. These are
called “bootstrap samples” or “resamples”.
2. Calculate the quantity of interest, the
bootstrap estimate, for each resample.
3. Find the bootstrap distribution of these
estimates, and in particular their SD, which
is the bootstrap SE.
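# The three steps above could be sketched in Python roughly as follows.
# This is only an illustration, not the software referenced in these
# slides. The name bootstrap_se, the default of 2000 resamples, the fixed
# seed and the data values are all assumptions made for the example.
import random
import statistics

def bootstrap_se(sample, statistic, n_resamples=2000, seed=0):
    """Return the bootstrap SE of statistic(sample)."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(sample)
    estimates = []
    for _ in range(n_resamples):
        # Step 1: draw a resample of size N, with replacement.
        resample = rng.choices(sample, k=n)
        # Step 2: compute the bootstrap estimate M* for this resample.
        estimates.append(statistic(resample))
    # Step 3: the SD of the bootstrap estimates is the bootstrap SE.
    return statistics.stdev(estimates)

# Hypothetical durations in hours, invented purely for illustration.
durations = [5.1, 6.3, 6.8, 7.0, 7.2, 7.9, 8.4, 9.6]
se_mean = bootstrap_se(durations, statistics.mean)
# The same code handles a statistic with no simple SE formula.
se_median = bootstrap_se(durations, statistics.median)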
[Figure: M is the estimate from the original sample; the M*s are
the bootstrap estimates of M.]

The Bootstrap SE: Concept
Consider a sample of N=6 with 3 bootstrap samples:

Mean±SD of original sample = 4.46±7.54.
SEM = 7.54/√6 = 3.08
Bootstrap SEM is SD(4.13,4.64,1.74) = 1.55
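Worked out from the three bootstrap means just given:
    mean of bootstrap means = (4.13 + 4.64 + 1.74)/3 = 3.50
    SD = √{[(4.13-3.50)² + (4.64-3.50)² + (1.74-3.50)²]/(3-1)}
       = √(4.79/2) = 1.55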
Here, the bootstrap SE is a poor estimate since only 3 resamples
were drawn. Typically, thousands are used.

Back to Labor Progression Paper

Why was bootstrapping used in the paper?
The design used a stratified random sample, not
a simple random sample:

There are SE formulas for some quantities from
studies that over-sample some groups, as was done
here, but perhaps not for these adjusted medians.

• Typically fewer assumptions.

• Very general and reliable: can use the same
software code for many estimation problems
that have different formulas.

classical methods.

• Can model the entire estimation process, not
just sampling error. See next 2 slides.

Labor Progression Paper:

Recall that durations were adjusted for several
covariates, including oxytocin use and fetal
size.

The oversampling of heavier women was
accounted for with bootstrapping so that these

A single set of covariates was used for all
bootstrap samples.

Labor Progression Paper:
Recall that durations were adjusted for several
covariates, including oxytocin use and fetal
size.

The entire process of selecting covariates could
have been performed separately with each
bootstrap sample.

This could better incorporate the uncertainty of
choosing which factors need to be used for

Conclusions
• Bootstrapping can avoid the requirement of
unnecessary assumptions.
• It is not needed in most applications.
• It is needed for studies without simple random
sampling, unless software for other sampling
designs is used.
• For our paper here, it probably had a small
impact, but could have been used to gain