Sampling and Sample Size Computation

 Grace H. Encelan-Brizuela, MD, MSPH
     Department of Preventive and Community Medicine
                   University of the East
   RAMON MAGSAYSAY MEMORIAL MEDICAL CENTER, Inc.
     Objectives of the Lecture
At the end of this lecture, students should
  be able to:
 Define research populations

 Enumerate reasons for sampling

 Discuss ways of designing inclusion and
  exclusion criteria
 Discuss types of sampling methods

 Compute the required sample size
Introduction
   Challenge to every research protocol:
    It must specify a sample of subjects that:
     can be studied at an acceptable cost in time
      and money
     is large enough to control random error in
      generalizing the study findings to the
      population
     is representative enough to control systematic
      error in these inferences
Introduction
   Basic terms and concepts…

       Population – the complete set of people with
        a specified set of characteristics

       Sample – subset of the population, selected
        so as to be representative of the larger
        population
Introduction
   Basic terms and concepts…

       Target population – the large set of patients
        throughout the world to which the results will
        be generalized. Defined by clinical and
        demographic characteristics.

       Accessible population – the subset of the
        target population that is available for the
        study. Defined by geographic and temporal
        characteristics.
Reasons for sampling
1.   Samples can be studied more quickly than
     populations
2.   A study of a sample is less expensive than
     studying an entire population
3.   A study of an entire population is impossible in
     most situations
4.   Sample results are often more accurate than
     results based on a population, because more
     care can go into each measurement
5.   If samples are properly selected, probability
     methods can be used to estimate the error in
     the resulting statistics
6.   Samples can be selected to reduce
     heterogeneity
RESEARCH QUESTION (Truth in the Universe)  →  STUDY PLAN (Truth in the Study)

STEP # 1: Target Populations
   Specify clinical and demographic characteristics
   CRITERIA: Well suited to the research question

STEP # 2: Accessible Population
   Specify temporal and geographic characteristics
   CRITERIA: Representative of target populations and easy to study

STEP # 3: Intended Sample
   Design an approach to selecting the sample
   CRITERIA: Representative of accessible population and easy to do

SPECIFICATION (Steps 1 and 2)  →  SAMPLING (Step 3)
Specification
   Establishing Inclusion Criteria

    Inclusion criteria – define the main
    characteristics of the target and accessible
    populations
  Designing Inclusion Criteria
Inclusion criteria: specifying the characteristics that define populations
that are relevant to the research question and efficient for study.
Example: a 5-year trial of calcium supplementation for preventing
osteoporosis might specify that the subjects be:

Population   Considerations                Examples
Target       Demographic characteristics   White females aged 45–50
             Clinical characteristics      In good general health: no known
                                           life-threatening disease; not taking
                                           long-term corticosteroids
Specification
   Establishing Exclusion Criteria

    Exclusion criteria – indicate subsets of
    individuals who meet the inclusion criteria,
    but are likely to interfere with the quality
    of the data or the interpretation of the
    findings
  Designing Exclusion Criteria
Exclusion criteria: specifying subsets of the population that will not be
studied.
Example: a 5-year trial of calcium supplementation for preventing
osteoporosis might exclude subjects as follows:

Considerations (excluded because of)          Examples (subjects who are)
A high likelihood of being lost to follow-up  Planning to move out of state
An inability to provide good data             Disoriented or having language
                                              barriers
Ethical barriers                              Kidney stone formers
The subject’s refusal to participate          Unwilling to accept possibility of
                                              random allocation to placebo group
  Designing Inclusion Criteria
Inclusion criteria: specifying the characteristics that define populations
that are relevant to the research question and efficient for study.
Example: a 5-year trial of calcium supplementation for preventing
osteoporosis might specify that the subjects be:

Population   Considerations                Examples
Target       Demographic characteristics   White females aged 45–50
             Clinical characteristics      In good general health: no known
                                           life-threatening disease; not taking
                                           long-term corticosteroids
Accessible   Geographic characteristics    Patients attending the medical clinic
                                           at the investigator’s hospital
             Temporal characteristics      Between Jan 1 and Dec 31, 2006
              SAMPLING
Probability Sampling
  - uses a random process to guarantee that
  each unit of the population has a specified
  chance of selection
Probability Sampling
   Simple Random sampling
    - every subject has an equal probability of
    being selected for the study.
    - recommended way is to use a table of
    random numbers or a computer generated
    list of random numbers
Probability Sampling
   Simple Random sampling
    - process of enumerating every unit of the
    accessible population, and then selecting
    the sample at random
    - requirements:
        an accurate listing of the population
        a mechanism to find and enroll those
             who are chosen
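
The enumerate-then-draw process described above can be sketched in Python. This is only an illustration: the patient ID list, the sample size of 200, and the function name `simple_random_sample` are assumptions made for the example, not part of the lecture.

```python
import random

def simple_random_sample(sampling_frame, n, seed=None):
    """Draw n units without replacement; every unit has the same chance."""
    rng = random.Random(seed)  # the seed is optional, used only for reproducibility
    return rng.sample(sampling_frame, n)

# Hypothetical accessible population: patient IDs 1 to 3400
frame = list(range(1, 3401))
chosen = simple_random_sample(frame, 200, seed=1)
print(len(chosen))  # number of subjects drawn
```

A computer-generated draw like this plays the same role as a table of random numbers; it still depends on the listing of the population being accurate.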
Probability Sampling
   Systematic sampling
    - involves selecting by a periodic process;
    starting point is chosen at random
    Example: draw a sample of 200 from a population of
                     3400
    Procedure: Number all units 1 to 3400; divide the
    population size by the number to be sampled
    (3400/200 = 17) to get the sampling interval k.
    Select a random number between 1 and 17 as the
    starting point. Then select every 17th subject
    thereafter.
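
The procedure can be sketched in Python as follows. The function name `systematic_sample` and the seed are assumptions for illustration; the numbers (3400 units, sample of 200, interval 17) come from the example above.

```python
import random

def systematic_sample(units, n, seed=None):
    """Select every k-th unit after a random start, with k = len(units) // n."""
    k = len(units) // n            # sampling interval (3400 // 200 = 17)
    rng = random.Random(seed)
    start = rng.randint(1, k)      # random starting point between 1 and k, inclusive
    # Positions are 1-based to match the numbering of units from 1 to 3400.
    positions = range(start, start + k * n, k)
    return [units[pos - 1] for pos in positions]

units = list(range(1, 3401))
sample = systematic_sample(units, 200, seed=7)
print(len(sample), sample[1] - sample[0])  # 200 17
```

Only the starting point is random; every later selection is fixed by the interval.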
Probability Sampling
   Systematic sampling
    NOTE: should not be used when a cyclic
    repetition is inherent in the sampling
    frame.
    e.g. not appropriate for selecting months
    of the year in a study of the frequency of
    different types of accidents, because some
    accidents occur most often at certain
    times of the year
Probability Sampling
   Stratified Random sampling
    - involves dividing the population into
    subgroups according to characteristics and
    taking a random sample from each of
    these “strata”
Probability Sampling
   Stratified Random sampling
    - characteristics used to stratify should be
    related to the measurement of interest
    - in Medicine, commonly used strata
    include: age, gender, severity of disease
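
A minimal Python sketch of stratified random sampling is shown below. The subjects, the two age strata, and the choice of drawing an equal number from each stratum are all assumptions made for illustration; the lecture does not specify how many to draw per stratum.

```python
import random
from collections import defaultdict

def stratified_sample(subjects, stratum_of, n_per_stratum, seed=None):
    """Group subjects into strata, then draw a random sample within each stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for subject in subjects:
        strata[stratum_of(subject)].append(subject)
    sample = {}
    for name, members in strata.items():
        # Never ask for more subjects than the stratum contains.
        sample[name] = rng.sample(members, min(n_per_stratum, len(members)))
    return sample

# Hypothetical subjects as (id, age group) pairs
subjects = [(i, "under 50" if i % 3 else "50 and over") for i in range(1, 301)]
by_age = stratified_sample(subjects, lambda s: s[1], 20, seed=3)
print({name: len(members) for name, members in by_age.items()})
```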
Probability Sampling
   Cluster sampling
    - process of taking a random sample of
    natural groupings of individuals in the
    population; very useful when the
    population is widely dispersed and it is
    impractical or costly to list and sample
    from all of its elements
Probability Sampling
   Cluster sampling
    - clusters are commonly based on
    geographic areas or districts, so this
    approach is used more often in
    epidemiologic research than in clinical
    research
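
As a rough Python sketch, one-stage cluster sampling draws whole clusters at random and then takes everyone in them. The district names, the residents, and the function name `cluster_sample` are hypothetical.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly pick whole clusters, then include every member of each one."""
    rng = random.Random(seed)
    chosen_names = rng.sample(sorted(clusters), n_clusters)
    return {name: clusters[name] for name in chosen_names}

# Hypothetical population: 10 districts with 5 residents each
clusters = {
    f"District {i}": [f"D{i}-resident{j}" for j in range(1, 6)]
    for i in range(1, 11)
}
picked = cluster_sample(clusters, 3, seed=5)
print(sorted(picked))
```

Only the list of clusters has to be enumerated, not every individual in the population, which is what makes the method practical for widely dispersed populations.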
             SAMPLING
Nonprobability Sampling
 - sampling method in which the probability
 that a subject is selected is unknown
Nonprobability Sampling
   Consecutive Sampling
    - involves taking every patient who meets
    the selection criteria over a specified time
    interval or number of patients; it amounts
    to taking the complete accessible
    population over the duration of the study
Nonprobability Sampling
   Convenience Sampling
    - process of taking those members of the
    accessible population who are easily
    available.
Nonprobability Sampling
   Judgemental Sampling
    - involves handpicking from the accessible
    population those individuals judged most
    appropriate for the study
Sample Size
Computation
Introduction
Factors that affect the number of subjects required for a
    study:
1.  Whether the alpha level chosen is the usual one (α = 0.05)
    or smaller
2.  Whether beta error is considered in addition to alpha
    error
3.  Whether the desired difference between means or
    proportions to be detected is fairly small or extremely
    small
4.  Whether the research design involves paired or
    unpaired data
5.  Whether a large or small variance is anticipated in the
    data set
Introduction
Review of basic concepts and terms
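 Effect size – the size of the difference to be
  detected (d in the formulas of this lecture)
 Alpha level or Significance level –
  probability of a positive finding arising by
  chance alone when no true effect exists
  (Type I error).
 Power – the probability that the effect will
  be detected if it truly exists (1 − beta)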
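
The z-values used in the worked examples of this lecture (1.96 for a two-tailed alpha of 0.05, 0.84 for 80% power) can be recovered from the standard normal distribution. The sketch below uses Python's standard-library `statistics.NormalDist`; the helper names `z_alpha` and `z_beta` are assumptions for illustration.

```python
from statistics import NormalDist

def z_alpha(alpha, two_tailed=True):
    """Critical z-value for a given alpha level."""
    tail = alpha / 2 if two_tailed else alpha
    return NormalDist().inv_cdf(1 - tail)

def z_beta(power):
    """One-tailed z-value for a given power (1 - beta)."""
    return NormalDist().inv_cdf(power)

print(round(z_alpha(0.05), 2))  # 1.96
print(round(z_beta(0.80), 2))   # 0.84
```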
Recall…
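$$t = \frac{d}{s_d / \sqrt{N}}$$
Where: d is the mean difference that was
 observed, s_d is the standard deviation of the
 differences (so s_d/√N is the standard error of that
 mean difference), and N is the sample size
   To solve for N, the formula is rearranged, with t
    replaced by the critical value z_α and s_d written
    as s. The formula becomes
$$N = \frac{(z_\alpha)^2 \cdot s^2}{d^2}$$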
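
The rearrangement can be written out step by step. This is a sketch under two assumptions: t is replaced by the critical value z_α, and s denotes the standard deviation of the differences, so that s/√N is the standard error.

```latex
\begin{aligned}
z_\alpha &= \frac{d}{s / \sqrt{N}} = \frac{d \sqrt{N}}{s} \\
\sqrt{N} &= \frac{z_\alpha \cdot s}{d} \\
N &= \frac{(z_\alpha)^2 \cdot s^2}{d^2}
\end{aligned}
```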
        Derivation of the Basic
        Sample Size Formula
Formula for the Calculation of Sample Size
  for studies commonly pursued in Medical
  Research
 Studies using the paired t test (e.g. before
  and after studies) and considering alpha
  (Type I) error only
$$N = \frac{(z_\alpha)^2 \cdot s^2}{d^2}$$
        Derivation of the Basic
        Sample Size Formula
Formula for the Calculation of Sample Size
  for studies commonly pursued in Medical
  Research
 Studies using the Student’s t test (e.g. one
  experimental group and one control
  group) and considering alpha (Type I)
  error only
$$N = \frac{(z_\alpha)^2 \cdot 2 \cdot s^2}{d^2}$$
        Derivation of the Basic
        Sample Size Formula
Formula for the Calculation of Sample Size
  for studies commonly pursued in Medical
  Research
 Studies using the Student’s t test and
  considering alpha (Type I) error and beta
  (Type II) errors
$$N = \frac{(z_\alpha + z_\beta)^2 \cdot 2 \cdot s^2}{d^2}$$
        Derivation of the Basic
        Sample Size Formula
Formula for the Calculation of Sample Size
  for studies commonly pursued in Medical
  Research
 Studies using a test of differences in
  proportions and considering alpha (Type I)
  error and beta (Type II) errors
$$N = \frac{(z_\alpha + z_\beta)^2 \cdot 2 \cdot p(1 - p)}{d^2}$$
                  Derivation of the
             Basic Sample Size Formula
Study Characteristics      Assumptions made by Investigator

Type of Study            Before-and-after study of an antihypertensive drug
Data sets                Pre-treatment and post-treatment
                          observations in the same group
                          of subjects
Variable                 Systolic blood pressure
Standard deviation (s)   15 mm Hg
Variance (s²)            225 (mm Hg)²
Data for alpha (zα)      α = 0.05; therefore, 95% confidence
                          desired (two-tailed test); zα = 1.96
Difference to be         10 mm Hg or larger difference
detected (d)              between pre- and post-treatment blood
                          pressure values
            Derivation of the
       Basic Sample Size Formula
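$$
\begin{aligned}
N &= \frac{(z_\alpha)^2 \cdot s^2}{d^2} \\
  &= \frac{(1.96)^2 \cdot (15)^2}{(10)^2} \\
  &= \frac{(3.84)(225)}{100} \\
  &= \frac{864}{100} = 8.64 \approx 9 \text{ subjects total (rounded up)}
\end{aligned}
$$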
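
The same arithmetic can be checked with a short Python sketch. The function name `paired_sample_size` is an assumption for illustration; the inputs are the ones listed for the before-and-after study.

```python
import math

def paired_sample_size(z_alpha, s, d):
    """N = z_alpha^2 * s^2 / d^2, returned both exact and rounded up."""
    n_exact = (z_alpha ** 2) * (s ** 2) / (d ** 2)
    return n_exact, math.ceil(n_exact)

n_exact, n_subjects = paired_sample_size(z_alpha=1.96, s=15, d=10)
print(round(n_exact, 2), n_subjects)  # 8.64 9
```

The result is rounded up rather than to the nearest integer, since a fraction of a subject cannot be enrolled.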
                  Derivation of the
             Basic Sample Size Formula
Study Characteristics      Assumptions made by Investigator

Type of Study            RCT of an antihypertensive drug
Data sets                Observations in one experimental group
                          and one control group

Variable                 Systolic blood pressure
Standard deviation (s)   15 mm Hg
Variance (s²)            225 (mm Hg)²
Data for alpha (zα)      α = 0.05; therefore, 95% confidence
                          desired (two-tailed test); zα = 1.96
Difference to be         10 mm Hg or larger difference
detected (d)              between mean blood pressure values of the
                          experimental group and control group
              Derivation of the
         Basic Sample Size Formula
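$$
\begin{aligned}
N &= \frac{(z_\alpha)^2 \cdot 2 \cdot s^2}{d^2} \\
  &= \frac{(1.96)^2 \cdot 2 \cdot (15)^2}{(10)^2} \\
  &= \frac{(3.84)(2)(225)}{100} \\
  &= \frac{1728}{100} = 17.28 \approx 18 \text{ subjects per group (rounded up)}
\end{aligned}
$$
18 subjects per group × 2 groups = 36 subjects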
                  Derivation of the
             Basic Sample Size Formula
Study Characteristics      Assumptions made by Investigator

Type of Study            RCT of an antihypertensive drug
Data sets                Observations in one experimental group
                          and one control group
Variable                 Systolic blood pressure
Standard deviation (s)   15 mm Hg
Variance (s²)            225 (mm Hg)²
Data for alpha (zα)      α = 0.05; therefore, 95% confidence
                          desired (two-tailed test); zα = 1.96
Data for beta (zβ)       20% beta error; therefore, 80% power
                          desired (one-tailed test); zβ = 0.84
Difference to be         10 mm Hg or larger difference
detected (d)              between mean blood pressure values of the
                          experimental group and control group
              Derivation of the
         Basic Sample Size Formula
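$$
\begin{aligned}
N &= \frac{(z_\alpha + z_\beta)^2 \cdot 2 \cdot s^2}{d^2} \\
  &= \frac{(1.96 + 0.84)^2 \cdot 2 \cdot (15)^2}{(10)^2} \\
  &= \frac{(7.84)(2)(225)}{100} \\
  &= \frac{3528}{100} = 35.28 \approx 36 \text{ subjects per group (rounded up)}
\end{aligned}
$$
36 subjects per group × 2 groups = 72 subjects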
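
A short Python sketch shows how adding the beta term changes the two-group result for the same blood pressure inputs. The function name `two_group_sample_size` is an assumption; setting `z_beta` to 0 reproduces the alpha-only case.

```python
import math

def two_group_sample_size(z_alpha, s, d, z_beta=0.0):
    """Subjects per group: (z_alpha + z_beta)^2 * 2 * s^2 / d^2, rounded up."""
    n_exact = (z_alpha + z_beta) ** 2 * 2 * s ** 2 / d ** 2
    return math.ceil(n_exact)

alpha_only = two_group_sample_size(1.96, 15, 10)               # alpha error only
with_power = two_group_sample_size(1.96, 15, 10, z_beta=0.84)  # alpha and beta errors
print(alpha_only, with_power)          # 18 36 (per group)
print(alpha_only * 2, with_power * 2)  # 36 72 (both groups)
```

Requiring 80% power doubles the number of subjects needed per group in this example.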
                  Derivation of the
             Basic Sample Size Formula
Study Characteristics     Assumptions made by Investigator

Type of Study           RCT of a drug to reduce the 5-year mortality in
                          patients with a particular form of cancer
Data sets               Observations in one experimental group
                          and one control group
Variable                Success = 5-year survival after treatment; Failure =
                          death within 5 years of treatment
Variance, p(1 - p)      p = 0.55; therefore, (1 - p) = 0.45
Data for alpha (zα)     α = 0.05; therefore, 95% confidence
                          desired (two-tailed test); zα = 1.96
Data for beta (zβ)      20% beta error; therefore, 80% power
                          desired (one-tailed test); zβ = 0.84
Difference to be        0.1 or larger difference between the success
detected (d)             (survival) of the experimental group and that
                          of the control group
              Derivation of the
         Basic Sample Size Formula
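$$
\begin{aligned}
N &= \frac{(z_\alpha + z_\beta)^2 \cdot 2 \cdot p(1 - p)}{d^2} \\
  &= \frac{(1.96 + 0.84)^2 \cdot 2 \cdot (0.55)(0.45)}{(0.1)^2} \\
  &= \frac{(7.84)(2)(0.2475)}{0.01} \\
  &= \frac{3.88}{0.01} = 388 \text{ subjects per group}
\end{aligned}
$$
388 subjects per group × 2 groups = 776 subjects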
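
The proportions calculation can be checked with a short Python sketch. The function name `proportion_sample_size` is an assumption for illustration. Note that the slide rounds the numerator to 3.88 before dividing, which gives 388; carrying the unrounded value gives 388.08, which becomes 389 per group if the result is always rounded up.

```python
import math

def proportion_sample_size(z_alpha, z_beta, p, d):
    """Subjects per group: (z_alpha + z_beta)^2 * 2 * p * (1 - p) / d^2."""
    return (z_alpha + z_beta) ** 2 * 2 * p * (1 - p) / d ** 2

n_exact = proportion_sample_size(z_alpha=1.96, z_beta=0.84, p=0.55, d=0.1)
print(round(n_exact, 2))   # 388.08
print(math.ceil(n_exact))  # 389
```

The gap between 388 and 389 comes only from the rounding convention, not from a different formula.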
Bigger does not always
mean better (in terms of
sample size…at least)

				