# Sampling and Sample Size Computation

```
Sampling and Sample Size Computation

Grace H. Encelan-Brizuela, MD, MSPH
Department of Preventive and Community Medicine
University of the East
RAMON MAGSAYSAY MEMORIAL MEDICAL CENTER, Inc.
Objectives of the Lecture
At the end of this lecture, students should be able to:
- Define research populations
- Enumerate reasons for sampling
- Discuss ways of designing inclusion and exclusion criteria
- Discuss types of sampling methods
- Compute sample size
Introduction
   Challenge to every research protocol:
It must specify a sample of subjects that:
- can be studied at an acceptable cost in time and money
- is large enough to control random error in generalizing the study
  findings to the population
- is representative enough to control systematic error in these inferences
Introduction
   Basic terms and concepts…

   Population – complete set of people with
specified set of characteristics

   Sample – subset of the population, selected
so as to be representative of the larger
population
Introduction
   Basic terms and concepts…

   Target population – the large set of patients
throughout the world to which the results will
be generalized. Defined by clinical and
demographic characteristics.

   Accessible population – the subset of the
target population that is available for the
study. Defined by geographic and temporal
characteristics.
Reasons for sampling
1.   Samples can be studied more quickly than
populations
2.   A study of a sample is less expensive than
studying an entire population
3.   A study of an entire population is impossible in
most situations
4.   Sample results are often more accurate than
results based on a population
5.   If samples are properly selected, probability
methods can be used to estimate the error in
the resulting statistics
6.   Samples can be selected to reduce
heterogeneity
RESEARCH QUESTION (Truth in the Universe) and STUDY PLAN (Truth in the Study)

STEP # 1: Target Populations
   Specify clinical and demographic characteristics
   CRITERIA: well suited to the research question

STEP # 2: Accessible Population
   Specify temporal and geographic characteristics
   CRITERIA: representative of target populations and easy to study

STEP # 3: Intended Sample
   Design an approach to selecting the sample
   CRITERIA: representative of accessible population and easy to do

Steps 1 and 2: SPECIFICATION
Step 3: SAMPLING
Specification
   Establishing Inclusion Criteria

Inclusion criteria – define the main
characteristics of the target and accessible
populations
Designing Inclusion Criteria

Inclusion criteria: specifying the characteristics that define populations
that are relevant to the research question and efficient for study.
Example: a 5-year trial of calcium supplementation for preventing
osteoporosis might specify that the subjects be:

Population   Considerations                Examples
Target       Demographic characteristics   White females age 45 – 50
             Clinical characteristics      In good general health: no known
                                           life-threatening disease; not taking
                                           long-term corticosteroids
Specification
   Establishing Exclusion Criteria

Exclusion criteria – indicate subsets of
individuals who meet the eligibility criteria,
but are likely to interfere with the quality
of the data or the interpretation of the
findings
Designing Exclusion Criteria

Exclusion criteria: specifying subsets of the population that will not be
studied. Example: a 5-year trial of calcium supplementation for preventing
osteoporosis might exclude subjects who:

Considerations (reason for exclusion)          Examples
A high likelihood of being lost to follow-up   Plan to move out of state
An inability to provide good data              Are disoriented or have language barriers
Ethical barriers                               Are kidney stone formers
The subject’s refusal to participate           Are unwilling to accept the possibility of
                                               random allocation to the placebo group
Designing Inclusion Criteria

Inclusion criteria: specifying the characteristics that define populations
that are relevant to the research question and efficient for study.
Example: a 5-year trial of calcium supplementation for preventing
osteoporosis might specify that the subjects be:

Population   Considerations                Examples
Target       Demographic characteristics   White females age 45 – 50
             Clinical characteristics      In good general health: no known
                                           life-threatening disease; not taking
                                           long-term corticosteroids
Accessible   Geographic characteristics    Patients attending the medical clinic
                                           at the investigator’s hospital
             Temporal characteristics      Between Jan 1 and Dec 31, 2006
SAMPLING
Probability Sampling
- uses a random process to guarantee that
each unit of the population has a specified
chance of selection
Probability Sampling
   Simple Random sampling
- every subject has an equal probability of
being selected for the study.
- recommended way is to use a table of
random numbers or a computer generated
list of random numbers
Probability Sampling
   Simple Random sampling
- process of enumerating every unit of the
accessible population, and then selecting
the sample at random
- what are needed:
accurate listing of the population
mechanism to find and enroll those
who are chosen
Probability Sampling
   Systematic sampling
- involves selecting by a periodic process;
starting point is chosen at random
Example: draw a sample of 200 from a population of 3400
Procedure: Number all units 1 to 3400; divide the population size by the
number to be sampled to get the sampling interval k (3400/200 = 17).
Select a random starting number between 1 and 17, then select every
17th unit thereafter.
Probability Sampling
   Systematic sampling
NOTE: should not be used when a cyclic
repetition is inherent in the sampling
frame.
e.g. not appropriate for selecting months
of the year in a study of the frequency of
different types of accidents, because some
accidents occur most often at certain
times of the year
Probability Sampling
   Stratified Random sampling
- involves dividing the population into
subgroups according to characteristics and
taking a random sample from each of
these “strata”
Probability Sampling
   Stratified Random sampling
- characteristics used to stratify should be
related to the measurement of interest
- in Medicine, commonly used strata
include: age, gender, severity of disease
Probability Sampling
   Cluster sampling
- process of taking a random sample of
natural groupings of individuals in the
population; very useful when the
population is widely dispersed and it is
impractical or costly to list and sample
from all of its elements
Probability Sampling
   Cluster sampling
- clusters are commonly based on
geographic areas or districts, so this
approach is used more often in
epidemiologic research than in clinical
research
SAMPLING
Nonprobability Sampling
- sampling method in which the probability
that a subject is selected is unknown
Nonprobability Sampling
   Consecutive Sampling
- involves taking every patient who meets
the selection criteria over a specified time
interval or number of patients; it amounts
to taking the complete accessible
population over the duration of the study
Nonprobability Sampling
   Convenience Sampling
- process of taking those members of the
accessible population who are easily
available.
Nonprobability Sampling
   Judgemental Sampling
- involves handpicking from the accessible
population those individuals judged most
appropriate for the study
Sample Size
Computation
Introduction
Factors that affect the number of subjects required for a
study:
1.  Whether alpha level chosen is the usual (p value 0.05)
or smaller
2.  Whether beta error is considered in addition to alpha
error
3.  Whether the desired difference between means or
proportions to be detected is fairly small or extremely
small
4.  Whether the research design involves paired or
unpaired data
5.  Whether a large or small variance is anticipated in the
data set
Introduction
Review of basic concepts and terms
- Effect size
- Alpha level or Significance level – probability that a positive finding
  is due to chance alone.
- Power – the probability that the effect will be detected
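Recall…
t = d / (sd / √N)
Where: d is the mean difference that was observed, sd is the standard
deviation of the differences (so sd / √N is the standard error of that
mean difference), and N is the sample size
   To solve for N, the formula has to be rearranged, with zα taking the
place of t and s the place of sd. The formula becomes
N = (zα)^2 * (s)^2 / (d)^2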
Derivation of the Basic
Sample Size Formula
Formula for the Calculation of Sample Size for studies commonly pursued in
Medical Research
- Studies using the paired t test (e.g. before and after studies) and
  considering alpha (Type I) error only
N = (zα)^2 * (s)^2 / (d)^2
Derivation of the Basic
Sample Size Formula
Formula for the Calculation of Sample Size for studies commonly pursued in
Medical Research
- Studies using the Student’s t test (e.g. one experimental group and one
  control group) and considering alpha (Type I) error only
N = (zα)^2 * 2 * (s)^2 / (d)^2     (N is the number of subjects per group)
Derivation of the Basic
Sample Size Formula
Formula for the Calculation of Sample Size for studies commonly pursued in
Medical Research
- Studies using the Student’s t test and considering alpha (Type I) and
  beta (Type II) errors
N = (zα + zβ)^2 * 2 * (s)^2 / (d)^2     (N is the number of subjects per group)
Derivation of the Basic
Sample Size Formula
Formula for the Calculation of Sample Size for studies commonly pursued in
Medical Research
- Studies using a test of differences in proportions and considering
  alpha (Type I) and beta (Type II) errors
N = (zα + zβ)^2 * 2 * p(1 - p) / (d)^2     (N is the number of subjects per group)
Derivation of the
Basic Sample Size Formula
Study Characteristics    Assumptions made by Investigator

Type of Study            Before and after study of an anti-HPN
                         (antihypertensive) drug
Data sets                Pre-treatment and post-treatment observations
                         in the same group of subjects
Variable                 Systolic blood pressure
Standard deviation (s)   15 mm Hg
Variance (s^2)           225 (mm Hg)^2
Data for alpha (zα)      p = 0.05; therefore, 95% confidence desired
                         (two-tailed test); zα = 1.96
Difference to be         10 mm Hg or larger difference between pre- and
detected (d)             post-treatment blood pressure values
Derivation of the
Basic Sample Size Formula
N = (zα)^2 * (s)^2 / (d)^2
  = (1.96)^2 * (15)^2 / (10)^2
  = (3.84) * (225) / (100)
  = 864 / 100
  = 8.64, rounded up to 9 subjects total
Derivation of the
Basic Sample Size Formula
Study Characteristics    Assumptions made by Investigator

Type of Study            RCT of an anti-HPN (antihypertensive) drug
Data sets                Observations in one experimental group and one
                         control group
Variable                 Systolic blood pressure
Standard deviation (s)   15 mm Hg
Variance (s^2)           225 (mm Hg)^2
Data for alpha (zα)      p = 0.05; therefore, 95% confidence desired
                         (two-tailed test); zα = 1.96
Difference to be         10 mm Hg or larger difference between mean blood
detected (d)             pressure values of the experimental group and
                         control group
Derivation of the
Basic Sample Size Formula
N = (zα)^2 * 2 * (s)^2 / (d)^2
  = (1.96)^2 * 2 * (15)^2 / (10)^2
  = (3.84) * 2 * (225) / (100)
  = 1728 / 100
  = 17.28, rounded up to 18 subjects per group
18 subjects per group * 2 groups = 36 subjects
Derivation of the
Basic Sample Size Formula
Study Characteristics    Assumptions made by Investigator

Type of Study            RCT of an anti-HPN (antihypertensive) drug
Data sets                Observations in one experimental group and one
                         control group
Variable                 Systolic blood pressure
Standard deviation (s)   15 mm Hg
Variance (s^2)           225 (mm Hg)^2
Data for alpha (zα)      p = 0.05; therefore, 95% confidence desired
                         (two-tailed test); zα = 1.96
Data for beta (zβ)       20% beta error; therefore, 80% power desired
                         (one-tailed test); zβ = 0.84
Difference to be         10 mm Hg or larger difference between mean blood
detected (d)             pressure values of the experimental group and
                         control group
Derivation of the
Basic Sample Size Formula
N = (zα + zβ)^2 * 2 * (s)^2 / (d)^2
  = (1.96 + 0.84)^2 * 2 * (15)^2 / (10)^2
  = (7.84) * 2 * (225) / (100)
  = 3528 / 100
  = 35.28, rounded up to 36 subjects per group
36 subjects per group * 2 groups = 72 subjects
Derivation of the
Basic Sample Size Formula
Study Characteristics    Assumptions made by Investigator

Type of Study            RCT of a drug to reduce the 5-year mortality in
                         patients with a particular form of cancer
Data sets                Observations in one experimental group and one
                         control group
Variable                 Success = 5-year survival after treatment;
                         Failure = death within 5 years of treatment
Variance, p(1 - p)       p = 0.55; therefore, (1 - p) = 0.45
Data for alpha (zα)      p = 0.05; therefore, 95% confidence desired
                         (two-tailed test); zα = 1.96
Data for beta (zβ)       20% beta error; therefore, 80% power desired
                         (one-tailed test); zβ = 0.84
Difference to be         0.1 or larger difference between the success
detected (d)             (survival) proportion of the experimental group
                         and that of the control group
Derivation of the
Basic Sample Size Formula
N = (zα + zβ)^2 * 2 * p(1 - p) / (d)^2
  = (1.96 + 0.84)^2 * 2 * (0.55)(0.45) / (0.1)^2
  = (7.84) * 2 * (0.2475) / 0.01
  = 3.88 / 0.01
  = 388 subjects per group
388 subjects per group * 2 groups = 776 subjects
Bigger does not always
mean better (in terms of
sample size…at least)
```
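The four sample size formulas in the slides can be checked numerically. The following is a minimal Python sketch, not part of the original lecture: the function names (`n_paired`, `n_two_groups`, `n_two_proportions`) are hypothetical, and the z values (1.96 and 0.84) are taken directly from the worked examples rather than computed from a normal distribution. Each function returns the unrounded N so that the rounding step stays visible.

```python
import math


def n_paired(z_alpha: float, s: float, d: float) -> float:
    """Paired t test, alpha error only: total N = z_alpha^2 * s^2 / d^2."""
    return (z_alpha ** 2) * (s ** 2) / (d ** 2)


def n_two_groups(z_alpha: float, s: float, d: float, z_beta: float = 0.0) -> float:
    """Student's t test: N per group = (z_alpha + z_beta)^2 * 2 * s^2 / d^2.

    With the default z_beta = 0.0 this reduces to the alpha-only formula.
    """
    return ((z_alpha + z_beta) ** 2) * 2 * (s ** 2) / (d ** 2)


def n_two_proportions(z_alpha: float, z_beta: float, p: float, d: float) -> float:
    """Difference in proportions: N per group = (z_alpha + z_beta)^2 * 2 * p(1 - p) / d^2."""
    return ((z_alpha + z_beta) ** 2) * 2 * p * (1 - p) / (d ** 2)


# Inputs copied from the worked examples (blood pressure: s = 15, d = 10).
Z_ALPHA = 1.96  # two-tailed, alpha = 0.05
Z_BETA = 0.84   # one-tailed, beta = 0.20 (80% power)

# Always round up: a fraction of a subject still requires a whole subject.
paired = math.ceil(n_paired(Z_ALPHA, s=15, d=10))
alpha_only = math.ceil(n_two_groups(Z_ALPHA, s=15, d=10))
alpha_beta = math.ceil(n_two_groups(Z_ALPHA, s=15, d=10, z_beta=Z_BETA))
proportions = math.ceil(n_two_proportions(Z_ALPHA, Z_BETA, p=0.55, d=0.1))

print("paired, alpha only:", paired, "subjects total")         # 9
print("two groups, alpha only:", alpha_only, "per group")      # 18
print("two groups, alpha and beta:", alpha_beta, "per group")  # 36
print("two proportions:", proportions, "per group")            # 389
```

The first three results match the slides. For the proportions example the unrounded value is about 388.08, so rounding up gives 389 per group; the slide arrives at 388 because it rounds the intermediate product to 3.88 before dividing.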