Introduction to Biostatistics ZJU

      Introduction to Biostatistics (ZJU 2008)
             Wenjiang Fu, Ph.D
             Associate Professor
   Division of Biostatistics, Department of
                Epidemiology
          Michigan State University
     East Lansing, Michigan 48824, USA
            Email: fuw@msu.edu
     www: http://www.msu.edu/~fuw
      Chapter 3 Probability Theory
   • Probability of an event: Pr (A).
   • Sample space: the collection of all possible outcomes of an experiment.
   • Examples:
       Tossing a coin {H}, {T}. Tossing two coins {HH}, {HT}, {TH}, {TT}.
       Tossing a die {1}, {2}, … {6}. Blood pressure {L}, {H}, {N}.
       Sex {M}, {F}.       Race {white}, {black}, {Asian} …
   • How to estimate the probability (proportion) of male students in the class:
            1. Count {M} and {F}: N(male) / N(total).
            2. Sampling method. Assume 40 students in total.
             Take a random sample of 40 with replacement. Calculate the proportion p.
             Repeat this procedure many times, say B = 200 times.
             Take the average of these 200 proportions: (p1 + … + p200) / 200 = p0.
       This number p0 will be close to the true proportion p in the class.
       This method is called bootstrap sampling, invented by Bradley Efron.
       Reference: Efron and Tibshirani, An Introduction to the Bootstrap, Chapman & Hall, 1993.
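           Bootstrap R program
###### male student proportion in class by bootstrap sampling
x <- c(rep(1, 25), rep(0, 15))    #### 25 males among 40 students
x
p <- sum(x) / length(x)           # true proportion in the class
B <- 200                          # number of bootstrap samples
pp <- rep(0, B)                   # storage for the bootstrap proportions
for (i in 1:B) {
   y <- sample(x, replace = TRUE) # resample the 40 students with replacement
   pp[i] <- sum(y) / length(y)
}
p0 <- mean(pp)                    # average of the B bootstrap proportions
plot(c(1:B), pp)
abline(h = p0, col = "red")
title(main = "Bootstrap Approximation")
p
p0
Answer: p = 0.625, p0 = 0.6188 (p0 varies from run to run because the samples are random)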
                             Events
   • An event is any set of outcomes of interest.
   • Examples: heart attack, diarrhea, accident …
   • Probability (of an event): the relative frequency of the event over
    an indefinitely large (or infinite) number of trials, written Pr (A).
   • Example 1: Student aged 19 – 20 in the classroom.
    Take a student, check the age (1 – yes, 0 – no),
    put the student back in the pool, and record the result:
    1, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 1, ……
    So far the relative frequency is 7 / 15; with more draws it settles around a certain number, the probability.
   • Example 2: Student with systolic BP 95 – 105 mm Hg.
    1 – systolic BP 95 – 105; 0 – otherwise.
          1, 1, 1, 0, 1, 0, 0, 1, 1, 1, 1, 1, 0, 0, 1, … …
   • Example 3: Student has had the hepatitis B vaccine.
          1 – vaccinated;         0 – not yet.
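The relative-frequency idea in these examples can be illustrated with a short simulation in R. This is only a sketch: the true probability `p_true = 0.45`, the number of draws and the random seed are assumed values chosen for illustration, not figures from the class.

```r
# Running relative frequency of a yes/no event over repeated draws
# with replacement. Assumed for illustration: p_true = 0.45, n = 10000.
set.seed(1)
p_true <- 0.45
n <- 10000
draws <- rbinom(n, size = 1, prob = p_true)   # 1 = yes, 0 = no
rel_freq <- cumsum(draws) / seq_len(n)        # relative frequency after each draw

# The recorded sequence from Example 1 gives 7 / 15
ex1 <- c(1, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 1)
sum(ex1) / length(ex1)

# Relative frequency after 15, 100, 1000 and 10000 draws
rel_freq[c(15, 100, 1000, 10000)]
```

With few draws the relative frequency fluctuates noticeably; with many draws it stays close to the assumed probability.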
        Properties of probability
 (1) 0 ≤ Pr (A) ≤ 1
 (2) If A and B cannot occur at the same time, then
  Pr (A or B) = Pr (A) + Pr (B)
 • Mutually exclusive: two events are mutually exclusive if they cannot
  occur at the same time.
 • Example: A = {DBP > 90}, B = {DBP < 75}
 • Notation { . } – event description.
 • Venn diagram – John Venn (1834 – 1923) of England
               Venn Diagram

John Venn popularized the idea of Venn Diagrams.

 John lived from 1834 to 1923 in England. He was a priest
 and taught at Gonville and Caius College of Cambridge. He
 wrote several books including two on logic.
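                 Venn diagram
[Figure: three Venn diagrams. (1) Mutually exclusive: circles A and B do not overlap. (2) Not mutually exclusive: circles A and B overlap in A∩B. (3) Complement: A and Ā, the region outside A.]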
                      Event operations
       • Union A ∪ B: either A or B occurs, or both occur.
       • Intersection A ∩ B: the event that both A and B occur
        simultaneously.
       • Complement Ā: the event that A does not occur.
       • Example: A = { SBP ≥ 70 }, B = { SBP ≤ 90 }
        A ∪ B = { SBP ≥ 70 or SBP ≤ 90 } = { all SBP readings },
        A ∩ B = { 70 ≤ SBP ≤ 90 },
        Ā = { SBP < 70 }
       • Independence: two events are independent if the outcome of
        one does not affect the other, i.e. the probability of the
        intersection satisfies
         Pr (A ∩ B) = Pr (A) × Pr (B)
        Q: If two events are mutually exclusive, are they independent?
                    Probability
   • Example: A = {father has hypertension}, Pr (A) = 0.6;
    B = {mother has hypertension}, Pr (B) = 0.5.
    1) If Pr (A ∩ B) = 0.3, are A and B independent?
    2) How about if Pr (A ∩ B) = 0.6?
   • 1) Pr (A ∩ B) = 0.3 = 0.6 × 0.5 = Pr (A) × Pr (B)
    Conclusion: A and B are independent.
    2) If Pr (A ∩ B) = 0.6, then
    Pr (A ∩ B) = 0.6 > 0.6 × 0.5 = Pr (A) × Pr (B)
    Conclusion: A and B are not independent.
    (Strictly, Pr (A ∩ B) cannot exceed Pr (B) = 0.5; any attainable value other than 0.3, such as 0.4, gives the same conclusion.)
    The causes of hypertension are complex, but high salt intake
    contributes to hypertension. Couples commonly share
    the same food at home, so their hypertension status
    may be highly correlated.
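The independence check in this example is a comparison of Pr (A ∩ B) with Pr (A) × Pr (B). A minimal R sketch follows; `is_independent()` is a hypothetical helper written for this note, and the tolerance is an assumed safeguard against floating-point rounding.

```r
# Compare the joint probability with the product of the marginals.
# is_independent() is a hypothetical helper, not part of base R.
is_independent <- function(p_a, p_b, p_ab, tol = 1e-9) {
  abs(p_ab - p_a * p_b) < tol
}

case1 <- is_independent(0.6, 0.5, 0.3)   # case 1): joint probability 0.3
case2 <- is_independent(0.6, 0.5, 0.6)   # case 2): joint probability 0.6
c(case1 = case1, case2 = case2)
```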
                       Probability
   • Example: Students A and B take an exam.
    A = {Student A passes}; B = {Student B passes}.
    Pr (A) = 0.7, Pr (B) = 0.8,
    Pr (A ∩ B) = ?
    Pr (A ∩ B) = 0.7 × 0.8 = 0.56 if A and B are independent.
    How about if the two students compete for the same award?

   • Example: TB (tuberculosis) tests:
    A = {“+” by skin test}      B = {“+” by X-ray test}
    Ideally use B to confirm A.
    If Pr (A) = 0.3, Pr (B) = 0.5, Pr (A ∩ B) = 0.15 = 0.3 × 0.5,
    then A and B are independent.
               Probability Laws
   • Multiplication law
    If A1, A2, …, Ak are k mutually independent events, then
    Pr (A1 ∩ A2 ∩ … ∩ Ak) = Pr (A1) × Pr (A2) × … × Pr (Ak)
   • Addition law
    If A and B are any two events, then
    Pr (A ∪ B) = Pr (A) + Pr (B) – Pr (A ∩ B)
    [Figure: Venn diagram for the addition law. A ∪ B splits into 3 mutually exclusive events: A ∩ B̄, A ∩ B and Ā ∩ B.]
                Probability Laws
   • Addition law for more than 2 events
    If A, B and C are three events, then
    Pr (A ∪ B ∪ C) = Pr (A) + Pr (B) + Pr (C)
    – Pr (A ∩ B) – Pr (A ∩ C) – Pr (B ∩ C)
    + Pr (A ∩ B ∩ C)
    [Figure: Venn diagram for the addition law with three overlapping circles A, B and C; the central region is A ∩ B ∩ C.]
                       Probability
   • Example: 3.1 - 3.13
    Family with flu: mother, father, 2 kids.
    A1 = {mother has flu}, A2 = {father has flu}, A3 = {1st kid has flu},
    A4 = {2nd kid has flu}, B = {at least one kid has flu},
    C = {at least one parent has flu}, D = {at least one person has flu}
   • Given: Pr (A1) = .1, Pr (A2) = .1, Pr (both parents have flu) = .02,
    Pr (each kid has flu) = .2, Pr (at least one kid has flu) = .3
   • Q: B = ? C = ? D = ? Are A1, A2 independent? Are A3, A4 independent?
    Pr (B) = ? Pr (no kid has flu) = ? Pr (no person has flu) = ?
   • A: B = A3 ∪ A4, C = A1 ∪ A2, D = A1 ∪ A2 ∪ A3 ∪ A4. Pr (A3 ∩ A4) = ?
   • Addition law: Pr (A3 ∪ A4) = Pr (A3) + Pr (A4) – Pr (A3 ∩ A4)
   • Pr (A3 ∩ A4) = Pr (A3) + Pr (A4) – Pr (A3 ∪ A4) = .2 + .2 – .3 = .1
    Since .1 ≠ .2 × .2 = .04, A3 and A4 are not independent.
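The arithmetic of the flu example can be retraced in R. The sketch below uses only the probabilities given in the example; the variable names are chosen here for illustration. Pr (D) is not computed because the example does not give the joint behaviour of parents and kids.

```r
# Probabilities given in the flu example
p_a1 <- 0.1; p_a2 <- 0.1        # mother, father
p_both_parents <- 0.02          # Pr(A1 and A2)
p_a3 <- 0.2; p_a4 <- 0.2        # 1st kid, 2nd kid
p_b <- 0.3                      # Pr(B) = Pr(A3 or A4)

# Addition law rearranged: Pr(A3 and A4) = Pr(A3) + Pr(A4) - Pr(A3 or A4)
p_both_kids <- p_a3 + p_a4 - p_b

# Addition law: Pr(C) = Pr(A1 or A2) = Pr(A1) + Pr(A2) - Pr(A1 and A2)
p_c <- p_a1 + p_a2 - p_both_parents

# Independence would require the joint probability to equal the product
parents_indep <- isTRUE(all.equal(p_both_parents, p_a1 * p_a2))  # 0.02 vs 0.01
kids_indep    <- isTRUE(all.equal(p_both_kids, p_a3 * p_a4))     # 0.10 vs 0.04

# Complement: Pr(no kid has flu) = 1 - Pr(B)
p_no_kid <- 1 - p_b
```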
          Conditional Probability
   • Example. TB tests: skin test vs. X-ray test
    A = {“+” by skin test}          B = {“+” by X-ray}
   • Ideally A ⊂ B and A ⊃ B (i.e. A = B):
    B confirms A, or A and B function the same;
        extremely dependent.
       Opposite: the skin test is independent of the X-ray test,
              i.e. A and B do not depend on each other.
       Not ideal (in practice).
       Need some kind of dependence between A and B.
   • Want to know the probability of B given A.
         Conditional Probability

   • Conditional probability of B given A
    Pr (B | A) = Pr (B ∩ A) / Pr (A)
   • Interpretation: the probability that B occurs, given that A has occurred.
   • Rules:
    (1) A, B independent ⇒ Pr (B | A) = Pr (B) = Pr (B | Ā)
    (2) A, B dependent ⇒ Pr (B | A) ≠ Pr (B) ≠ Pr (B | Ā)
        because Pr (A ∩ B) ≠ Pr (A) Pr (B)
          Applications of Conditional
                  Probability
   • Relative risk or risk ratio (RR) of B given A:
        RR = Pr (B | A) / Pr (B | Ā)
   • Example. TB: Pr (B | A) = 0.01, Pr (B | Ā) = 0.0001
    RR = Pr (B | A) / Pr (B | Ā) = 0.01 / 0.0001 = 100

   • Interpretation: people with A are 100 times as likely to have TB
    as people with Ā.
   • But if A, B are independent, then
         Pr (B | A) = Pr (B) = Pr (B | Ā) ⇒ RR = 1
    Interpretation: equally likely to have TB regardless of A or Ā.
   • Not a good screening test.
          Applications of Conditional
                  Probability
   • Example: Smoking and lung cancer
    A = {smoking},      B = {lung cancer}
    RR = Pr (B | A) / Pr (B | Ā)
        = 0.08 / 0.0001 = 800
    Interpretation: smokers are 800 times as likely to develop
    lung cancer as non-smokers.

   • Screening tests A, B:
         A = {“+” by screening}, B = {“+” by confirmation}
    Pr (B) = Pr (B ∩ A) + Pr (B ∩ Ā)
    From conditional probability,
    Pr (B | A) = Pr (B ∩ A) / Pr (A); Pr (B | Ā) = Pr (B ∩ Ā) / Pr (Ā)

   • Pr (B) = Pr (B | A) Pr (A) + Pr (B | Ā) Pr (Ā)
          Applications of Conditional
                  Probability
   • Example: A group of people had a screening test for TB and
    5% got positive results. Assume 85% of the (+) will be
    confirmed by X-ray test, and 0.5% of the (–) will also be
    diagnosed by X-ray as TB patients. What proportion of
    this group of people had TB?
   • A = {(+) by screening}, B = {diagnosed by X-ray}
    Want to know Pr (B).
   • Answer:      Pr (B) = Pr (B | A) Pr (A) + Pr (B | Ā) Pr (Ā)
    Pr (A) = .05,        Pr (B | A) = .85,        Pr (B | Ā) = .005
    Pr (Ā) = 1 – Pr (A) = 1 – .05 = .95
    Pr (B) = .85 × .05 + .005 × .95 = .0425 + .00475 = .04725
    RR = Pr (B | A) / Pr (B | Ā) = .85 / .005 = 170
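The same calculation can be written as a few lines of R. This is a sketch using only the three probabilities stated in the example; the variable names are illustrative.

```r
# TB screening example: total probability and relative risk
p_a      <- 0.05    # Pr(A): positive by screening
p_b_a    <- 0.85    # Pr(B | A): confirmed by X-ray among the positives
p_b_nota <- 0.005   # Pr(B | not A): diagnosed by X-ray among the negatives

p_nota <- 1 - p_a                         # Pr(not A)
p_b <- p_b_a * p_a + p_b_nota * p_nota    # Pr(B) by the total probability rule
rr  <- p_b_a / p_b_nota                   # relative risk of B given A

c(p_b = p_b, rr = rr)
```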
       Applications of Probability
   • Exhaustive events
    A1, …, Ak are exhaustive events if at least one of them must
    occur.
    A1, …, Ak are mutually exclusive and exhaustive events
    if one and only one of them must occur.
   -- No two events occur at the same time.
   • Example: BP. A = {SBP ≤ 70}, B = {70 < SBP ≤ 110},
    C = {SBP > 110}. These 3 events are mutually
    exclusive and exhaustive.
   • Complementary events: A and Ā.
               Total Probability Rule
   • A1, …, Ak are k mutually exclusive and exhaustive events.
    B is any event. Then Pr (B) = Σ_{i=1}^{k} Pr (B | Ai) Pr (Ai).
   • Note that Pr (B) = Σ_{i=1}^{k} Pr (B ∩ Ai).
   • The total probability is a weighted average of the
    conditional probabilities Pr (B | Ai) with weights Pr (Ai).
   • Example: 3.22 A five-year study of cataract.
    Population: 5000 people aged 60 and up.

    Age group   Population percentage   Cataract rate
    60 – 64     45%                      2.4%
    65 – 69     28%                      4.6%
    70 – 74     20%                      8.8%
    75+          7%                     15.3%

    Q: What percentage of the population will have cataract?
    How many cases of cataract can be expected?
                    Examples using
                  Total Probability Rule
   • Answer: Define events: B = {develop cataract},
    A1 = {in age group 60 – 64}, A2 = {in age group 65 – 69},
    A3 = {in age group 70 – 74}, A4 = {age 75+}.
    Pr (B) = Σ_{i=1}^{4} Pr (B | Ai) Pr (Ai)
    Pr (Ai) :       45%      28%      20%     7%
    Pr (B | Ai):     2.4%     4.6%     8.8%    15.3%
    Pr (B) = 2.4% × 45% + … + 15.3% × 7% = 0.052 = 5.2%
    The number of expected cases = 5000 × 5.2% = 260
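The weighted average above is a one-line sum in R. The sketch below takes the age-group percentages and cataract rates from the example; the vector names are illustrative.

```r
# Total probability rule for the cataract example
p_age      <- c(0.45, 0.28, 0.20, 0.07)      # Pr(Ai): share of each age group
p_cataract <- c(0.024, 0.046, 0.088, 0.153)  # Pr(B | Ai): cataract rate per group

# The age groups must be exhaustive, so the shares should add up to 1
stopifnot(isTRUE(all.equal(sum(p_age), 1)))

p_b <- sum(p_cataract * p_age)    # Pr(B), about 0.052
expected_cases <- 5000 * p_b      # about 260 cases in 5000 people

round(c(p_b = p_b, expected_cases = expected_cases), 3)
```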

   • Example: TB screening. A = {(+) by skin test},
    B = {(+) by X-ray test}. 1000 people went for
    TB screening. 50 got positive results.
    Further X-ray tests confirm 40 out of the 50 positives.
    Assume the skin test missed 0.2% of the negatives, who are
    TB patients. How many TB patients
    are there in this group of 1000 people?
    Pr (B) = Pr (B | A) Pr (A) + Pr (B | Ā) Pr (Ā)
           = (40/50) × (50/1000) + 0.2% × (1000 – 50)/1000
           = 0.04 + 0.0019 = 0.0419
    # TB patients = 0.0419 × 1000 = 41.9 ≈ 42
                            Bayes’ theorem
     Thomas Bayes (1702 ~ 1761)
mathematician who first used probability inductively
and established a mathematical basis for probability
inference (a means of calculating, from the number of
times an event has not occurred, the probability that it
will occur in future trials).
     Bayes’ rule and screening test
   • Screening test: T. A = {T+}; confirmation: B = {disease}.
   • Q: How good is the screening test?
       1) Can it give a “+” to a subject with disease?      Sensitive
       2) Can it give a “–” to a subject with no disease?   Specific
       3) Can it predict disease accurately?                Predictive

   • Predictive value positive (PV+): Pr (disease | T+)
   • Predictive value negative (PV–): Pr (no disease | T–)
   • Sensitivity = Pr (T+ | disease)
   • Specificity = Pr (T– | no disease)
   • False positive = {T+ with no disease}
    False positive rate = Pr (T+ | no disease)
   • False negative = {T– with disease}
    False negative rate = Pr (T– | disease)
     Bayes’ rule and screening test
   • Example: Smoking vs. lung cancer:
    using smoking for screening ⇒ too many false positives.
    Using family history of breast cancer to screen for breast cancer ⇒ too
    many false negatives.
   • Pr (B): probability of disease in the reference population.
   • Bayes’ rule, with A = {T+}, B = {disease} and x = Pr (B) = prevalence of disease:

\[
PV^{+} = \Pr(B \mid A)
= \frac{\Pr(A \mid B)\,\Pr(B)}{\Pr(A \mid B)\,\Pr(B) + \Pr(A \mid \bar{B})\,\Pr(\bar{B})}
= \frac{x \cdot \text{sensitivity}}{x \cdot \text{sensitivity} + (1 - x)(1 - \text{specificity})}
\]

\[
PV^{-} = \Pr(\bar{B} \mid \bar{A})
= \frac{\Pr(\bar{A} \mid \bar{B})\,\Pr(\bar{B})}{\Pr(\bar{A} \mid \bar{B})\,\Pr(\bar{B}) + \Pr(\bar{A} \mid B)\,\Pr(B)}
= \frac{(1 - x) \cdot \text{specificity}}{(1 - x) \cdot \text{specificity} + x\,(1 - \text{sensitivity})}
\]
      Facts about Screening Tests
   • Some important points:
   • Predictive values depend on
    1) sensitivity, 2) specificity, and 3) prevalence of the
    disease.
   • Sensitivity and specificity are properties of the test and stay the same,
    but the prevalence of the disease changes from
    population to population and from time to time.
   • So Pr (A | B) and Pr (Ā | B̄) remain the same for the same test,
    while the predictive values Pr (B | A) and Pr (B̄ | Ā) change with
    Pr (B).
   • False positive rate Pr (A | B̄) = 1 – Pr (Ā | B̄) = 1 – specificity
   • False negative rate Pr (Ā | B) = 1 – Pr (A | B) = 1 – sensitivity
   • So the false positive rate and false negative rate remain the same
    for the test when the population changes.
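To see how the predictive values move with prevalence while the test itself stays fixed, Bayes' rule can be evaluated at several prevalences. The R sketch below is illustrative: `predictive_values()` is a hypothetical helper, and the sensitivity (0.90), specificity (0.95) and the three prevalences are assumed values, not data from the lecture.

```r
# Predictive values from prevalence, sensitivity and specificity (Bayes' rule).
# predictive_values() is a hypothetical helper written for this note.
predictive_values <- function(prev, sens, spec) {
  pv_pos <- prev * sens / (prev * sens + (1 - prev) * (1 - spec))
  pv_neg <- (1 - prev) * spec / ((1 - prev) * spec + prev * (1 - sens))
  c(pv_pos = pv_pos, pv_neg = pv_neg)
}

# Assumed test characteristics, held fixed across populations
sens <- 0.90
spec <- 0.95

# Assumed prevalences in three different populations
prevalences <- c(0.001, 0.01, 0.1)
pv_table <- t(sapply(prevalences, predictive_values, sens = sens, spec = spec))
cbind(prevalence = prevalences, round(pv_table, 3))
```

With these assumed inputs, PV+ rises as the disease becomes more common, while the false positive and false negative rates (1 – specificity and 1 – sensitivity) do not change.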
       Calculation for Screening Test
        Example
                          Disease +   Disease –   Total
    Screening test +         80          20        100
    Screening test –         10         890        900
    Total                    90         910       1000

   • Sensitivity = Pr (T+ | D+) = 80 / 90 = .889
   • Specificity = Pr (T– | D–) = 890 / 910 = .978
   • Pr (D+) = 90 / 1000 = .09
   • PV+ = Pr (D+ | T+) = 80 / 100 = .80
   • PV– = Pr (D– | T–) = 890 / 900 = .989
   • False positive rate Pf+ = Pr (T+ | D–) = 20 / 910 = .022
   • False negative rate Pf– = Pr (T– | D+) = 10 / 90 = .111
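The quantities in this example can be read off the 2 × 2 table directly. The following R sketch stores the four cell counts from the table in a matrix; the object and dimension names are chosen here for illustration.

```r
# 2 x 2 screening table: rows = test result, columns = disease status
tab <- matrix(c(80,  20,
                10, 890),
              nrow = 2, byrow = TRUE,
              dimnames = list(test = c("T+", "T-"), disease = c("D+", "D-")))

sens   <- tab["T+", "D+"] / sum(tab[, "D+"])   # Pr(T+ | D+)
spec   <- tab["T-", "D-"] / sum(tab[, "D-"])   # Pr(T- | D-)
prev   <- sum(tab[, "D+"]) / sum(tab)          # Pr(D+)
pv_pos <- tab["T+", "D+"] / sum(tab["T+", ])   # Pr(D+ | T+)
pv_neg <- tab["T-", "D-"] / sum(tab["T-", ])   # Pr(D- | T-)
f_pos  <- 1 - spec                             # false positive rate
f_neg  <- 1 - sens                             # false negative rate

round(c(sens = sens, spec = spec, prev = prev, pv_pos = pv_pos,
        pv_neg = pv_neg, f_pos = f_pos, f_neg = f_neg), 3)
```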
   • Note: Pf– = 1 – sensitivity = 1 – .889 = .111 and Pf+ = 1 – specificity = 1 – .978 = .022.
               Generalized Bayes’ Rule
   • Let B1, …, Bk be a set of mutually exclusive and exhaustive
    disease states, i.e. at least one occurs, but no two occur at the
    same time. Let A be the symptoms of the disease. Then

\[
\Pr(B_i \mid A) = \frac{\Pr(A \cap B_i)}{\Pr(A)}
= \frac{\Pr(A \mid B_i)\,\Pr(B_i)}{\sum_{j=1}^{k} \Pr(A \mid B_j)\,\Pr(B_j)},
\qquad
\Pr(A) = \sum_{j=1}^{k} \Pr(A \cap B_j) = \sum_{j=1}^{k} \Pr(A \mid B_j)\,\Pr(B_j)
\]

    Example.
    Suppose A = {chronic cough, lung biopsy}
    Disease: B1 = {normal}, B2 = {lung cancer}, B3 = {sarcoidosis}
    Pr (A | B1) = .001,      Pr (A | B2) = .9, Pr (A | B3) = .9
    For a 60-year-old non-smoker:
          Pr (B1) = .99,    Pr (B2) = .001, Pr (B3) = .009
    Q: For a 60-year-old non-smoker with symptoms A, what disease would you
    diagnose?
                  Generalized Bayes’ Rule
   • A: For a 60-year-old non-smoker:
    Pr (B1) = .99,      Pr (B2) = .001, Pr (B3) = .009
    Pr (A) = Σ_{j=1}^{k} Pr (A ∩ Bj) = Σ_{j=1}^{k} Pr (A | Bj) Pr (Bj), with k = 3
         = .001 × .99 + .9 × .001 + .9 × .009 = .00999
    Pr (B1 | A) = Pr (A | B1) Pr (B1) / Pr (A) = .001 × .99 / .00999 = .099
    Pr (B2 | A) = Pr (A | B2) Pr (B2) / Pr (A) = .9 × .001 / .00999 = .090
    Pr (B3 | A) = Pr (A | B3) Pr (B3) / Pr (A) = .9 × .009 / .00999 = .811

    Based on these conditional probabilities, the most likely diagnosis is
    sarcoidosis, with probability p = 0.811.
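The generalized Bayes' rule calculation for this patient can be reproduced in R. The sketch uses the prior and conditional probabilities stated in the example; the vector names and labels are illustrative.

```r
# Generalized Bayes' rule for the 60-year-old non-smoker
prior      <- c(normal = 0.99,  lung_cancer = 0.001, sarcoidosis = 0.009)  # Pr(Bj)
likelihood <- c(normal = 0.001, lung_cancer = 0.9,   sarcoidosis = 0.9)    # Pr(A | Bj)

p_a       <- sum(likelihood * prior)     # Pr(A) by the total probability rule
posterior <- likelihood * prior / p_a    # Pr(Bj | A) for each disease state

round(posterior, 3)
names(which.max(posterior))              # most probable disease state given A
```

The posterior probabilities sum to 1 because the disease states are mutually exclusive and exhaustive.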
    Receiver Operating Characteristic (ROC) Curve
[Figure: a blood pressure test. Distributions of normal and hypertensive subjects, separated by a clinical criterion (cutoff).]
    Disease Prevalence and Incidence
   • Prevalence = Pr (having a disease at one time point)
            = total number of cases of disease at one time point /
    total population at the same time point.
    Prevalence is an instantaneous measure, at one time point only.
   • Cumulative incidence, or incidence
    Incidence = Pr (developing a new case of disease over a period
    of time)
    Its unit is per person-year, or (person-year)^-1.
    It takes the time of exposure into consideration, within one period of time,
    e.g. 1 year.
   • Example.
    The prevalence of hepatitis A is .01. How many cases of hepatitis A are expected in a
    community of 1000 people?
    # cases = prevalence × population = .01 × 1000 = 10.
   • Breast cancer incidence is 5 per 10^5 person-years. How many new cases are
    expected over the next 5 years in a city with a population of 10^5?
    # new cases = incidence × duration × population
                = 5 / (10^5 person-years) × 5 years × 10^5 people = 25 cases
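Both expected-count calculations are simple products, shown here as a short R sketch. The numbers are those of the two examples; the variable names are illustrative.

```r
# Expected number of existing cases from prevalence
prevalence  <- 0.01
community   <- 1000
cases_hep_a <- prevalence * community           # expected hepatitis A cases

# Expected number of new cases from incidence
incidence  <- 5 / 1e5                           # new cases per person-year
years      <- 5
population <- 1e5
new_cases  <- incidence * years * population    # expected new breast cancer cases

c(cases_hep_a = cases_hep_a, new_cases = new_cases)
```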

				