# Introduction to Biostatistics ZJU


Introduction to Biostatistics (ZJU 2008)
Wenjiang Fu, Ph.D.
Associate Professor
Division of Biostatistics, Department of Epidemiology
Michigan State University
East Lansing, Michigan 48824, USA
Email: fuw@msu.edu
www: http://www.msu.edu/~fuw
Chapter 3 Probability Theory
• Probability of an event A: Pr (A).
• Sample space: the collection of all possible outcomes of an experiment.
• Examples:
Tossing a coin: {H}, {T}. Tossing two coins: {HH}, {HT}, {TH}, {TT}.
Rolling a die: {1}, {2}, …, {6}. Blood pressure: {L}, {H}, {N}.
Sex: {M}, {F}. Race: {white}, {black}, {Asian}, …
• How can we find the probability that a student in the class is male?
1. Count {M} and {F}: N(male) / N(total).
2. Sampling method. Assume there are 40 students in total.
Draw a random sample of 40 students with replacement and calculate the proportion of males in it.
Repeat this procedure many times, say B = 200 times.
Take the average of these 200 proportions: p0 = (p1 + … + p200) / 200.
• This number p0 will be close to the true proportion p in the class.
This method is called bootstrap sampling; it was invented by Efron.
Efron and Tibshirani: An Introduction to the Bootstrap, Chapman & Hall, 1993.
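Bootstrap R program
###### male student proportion in
###### class by bootstrap sampling
x <- c(rep(1, 25), rep(0, 15))  #### 25 males among 40 students
x
p <- sum(x) / length(x)         # observed proportion of males
p
B <- 200                        # number of bootstrap samples
pp <- rep(0, B)
for (i in 1:B) {
  y <- sample(x, replace = TRUE)  # resample the 40 students with replacement
  pp[i] <- sum(y) / length(y)     # proportion of males in the i-th sample
}
p0 <- mean(pp)                  # bootstrap estimate
p0
plot(c(1:B), pp)
abline(h = p0, col = 'red')
title(main = 'Bootstrap Approximation')
p = 0.625   p0 = 0.6188 (p0 varies from run to run because the resampling is random)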
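[Figure: Bootstrap Approximation; the 200 bootstrap proportions pp plotted against the sample index, with p0 as a red horizontal line]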
Events
• An event is any set of outcomes of interest.
• Examples: heart attack, diarrhea, accident, …
• Probability (of an event): the relative frequency of the event over an indefinitely large (or infinite) number of trials, denoted Pr (A).
• Example 1: A student aged 19–20 in the classroom.
Take a student, check the age, record 1 (yes) or 0 (no), and put the student back into the pool. Repeating the process gives:
1, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 1, ……
So far the relative frequency is 7/15; as the number of trials grows, it settles around a fixed number, the probability.
• Example 2: A student with systolic BP of 95–105 mm Hg.
1 = systolic BP 95–105; 0 = otherwise.
1, 1, 1, 0, 1, 0, 0, 1, 1, 1, 1, 1, 0, 0, 1, … …
• Example 3: A student who has had the hepatitis B vaccine.
1 = vaccinated against hepatitis B; 0 = not yet.
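To see the relative-frequency idea numerically, here is a minimal R sketch. The first part uses the 15 recorded draws from Example 1; the second part assumes a hypothetical true probability of 0.45 (not from the slides) and simulates 100,000 draws to show the running proportion settling near it.

```r
# Example 1 draws: 1 = student aged 19-20, 0 = otherwise
draws <- c(1, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 1)
running_freq <- cumsum(draws) / seq_along(draws)  # relative frequency after each draw
running_freq[15]                                  # 7/15, about 0.467

# Hypothetical: assume the true probability is 0.45 and simulate many more draws
set.seed(1)
true_p <- 0.45
many <- rbinom(100000, size = 1, prob = true_p)   # 100,000 Bernoulli(0.45) draws
long_run <- cumsum(many) / seq_along(many)
long_run[c(10, 100, 1000, 100000)]                # later entries should lie close to true_p
```

Early values can wander far from 0.45; the long-run value is what the definition of probability refers to.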
Properties of probability (Pr (A))
(1) 0 ≤ Pr (A) ≤ 1
(2) If A and B cannot occur at the same time, then
Pr (A or B) = Pr (A) + Pr (B)
• Mutually exclusive: A and B are mutually exclusive if they cannot both occur at the same time.
• Example: A = {DBP > 90}, B = {DBP < 75}
• Notation: { . } describes an event.
• Venn diagram: named after John Venn (1834 – 1923) of England.
Venn Diagram

John Venn popularized the idea of Venn Diagrams.

He lived in England from 1834 to 1923. He was a priest and taught at Gonville and Caius College, Cambridge. He wrote several books, including two on logic.
Venn diagram
[Figure: Venn diagrams of mutually exclusive events A and B, of events A and B that are not mutually exclusive (overlap A ∩ B), and of an event A and its complement Ā]
Event operations
• Union A ∪ B: the event that A occurs, B occurs, or both occur.
• Intersection A ∩ B: the event that both A and B occur simultaneously.
• Complement Ā: the event that A does not occur.
• Example: A = {SBP ≥ 70}, B = {SBP ≤ 90}
A ∪ B = {SBP ≥ 70 or SBP ≤ 90} = {all SBP readings},
A ∩ B = {70 ≤ SBP ≤ 90},
Ā = {SBP < 70}.
• Independence: two events are independent if the outcome of one does not affect the other; equivalently, the probability of the intersection satisfies
Pr (A ∩ B) = Pr (A) × Pr (B).
• Q: If two events are mutually exclusive, are they independent?
Probability
• Example: A = {father has hypertension}, Pr (A) = 0.6;
B = {mother has hypertension}, Pr (B) = 0.5.
1) If Pr (A ∩ B) = 0.3, are A and B independent?
2) What if Pr (A ∩ B) = 0.6?
• Pr (A ∩ B) = 0.3 = 0.6 × 0.5 = Pr (A) × Pr (B).
Conclusion: A and B are independent in 1).
If Pr (A ∩ B) = 0.6, then
Pr (A ∩ B) = 0.6 > 0.6 × 0.5 = Pr (A) × Pr (B).
Conclusion: A and B are not independent.
The causes of hypertension are complex, but high salt intake can lead to hypertension. Couples commonly share the same food at home, so their hypertension status may be strongly correlated.
Probability
• Example: Students A and B take an exam.
A = {student A passes}; B = {student B passes}.
Pr (A) = 0.7, Pr (B) = 0.8,
Pr (A ∩ B) = ?
Pr (A ∩ B) = 0.7 × 0.8 = 0.56 if A and B are independent.
What if the two students compete for the same award?

• Example: TB (tuberculosis) tests:
A = {"+" by skin test}, B = {"+" by X-ray test}.
Ideally, B is used to confirm A.
Pr (A) = 0.3, Pr (B) = 0.5, Pr (A ∩ B) = 0.15 = 0.3 × 0.5,
so A and B are independent.
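Each independence check in these examples reduces to comparing Pr (A ∩ B) with Pr (A) × Pr (B). A minimal R sketch; the helper `is_independent` and its tolerance `tol` are illustrative choices, and a tolerance is used because decimal probabilities are stored in floating point.

```r
# TRUE when Pr(A and B) equals Pr(A) * Pr(B), up to floating-point tolerance
is_independent <- function(p_a, p_b, p_ab, tol = 1e-9) {
  abs(p_ab - p_a * p_b) < tol
}

# Parents and hypertension: Pr(A) = 0.6, Pr(B) = 0.5
is_independent(0.6, 0.5, 0.3)   # case 1): TRUE
is_independent(0.6, 0.5, 0.6)   # case 2): FALSE
# TB skin test vs. X-ray: Pr(A) = 0.3, Pr(B) = 0.5, Pr(A and B) = 0.15
is_independent(0.3, 0.5, 0.15)  # TRUE
```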
Probability Laws
• Multiplication law
If A1, A2, …, Ak are k mutually independent events, then
Pr (A1 ∩ A2 ∩ … ∩ Ak) = Pr (A1) × Pr (A2) × … × Pr (Ak)
• Addition law
If A and B are two events, then
Pr (A ∪ B) = Pr (A) + Pr (B) − Pr (A ∩ B)
[Figure: The Venn diagram for the addition law; A ∪ B splits into 3 mutually exclusive events, A ∩ B̄, A ∩ B and Ā ∩ B]
Probability Laws
• Addition law for more than 2 events
If A, B and C are three events, then
Pr (A ∪ B ∪ C) = Pr (A) + Pr (B) + Pr (C)
− Pr (A ∩ B) − Pr (A ∩ C) − Pr (B ∩ C)
+ Pr (A ∩ B ∩ C)
[Figure: The Venn diagram for the addition law with three events A, B and C, overlapping in A ∩ B ∩ C]
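The three-event addition law can be checked on a small, fully enumerated sample space. This R sketch assumes one roll of a fair die and three hypothetical events A = {1, 2, 3}, B = {2, 4, 6}, C = {3, 4, 5}, chosen only for illustration. Simply adding Pr (A) + Pr (B) + Pr (C) would give 1.5, because the overlaps are counted twice; the addition law removes them and recovers the correct value 1.

```r
# Sample space: one roll of a fair die, outcomes 1..6 equally likely
omega <- 1:6
pr <- function(event) length(event) / length(omega)

A <- c(1, 2, 3)
B <- c(2, 4, 6)
C <- c(3, 4, 5)

# Direct calculation of Pr(A or B or C) from the union of outcomes
direct <- pr(union(union(A, B), C))

# Addition law for three events
by_law <- pr(A) + pr(B) + pr(C) -
  pr(intersect(A, B)) - pr(intersect(A, C)) - pr(intersect(B, C)) +
  pr(intersect(intersect(A, B), C))

c(direct = direct, by_law = by_law)   # both 1
```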
Probability
• Example: 3.1 - 3.13
A family with flu: mother, father, 2 kids.
A1 = {mother has flu}, A2 = {father has flu}, A3 = {1st kid has flu},
A4 = {2nd kid has flu}, B = {at least one kid has flu},
C = {at least one parent has flu}, D = {at least one person has flu}.
• Suppose Pr (A1) = .1, Pr (A2) = .1, Pr (both parents have flu) = .02,
Pr (each kid has flu) = .2, Pr (at least one kid has flu) = .3.
• Q: B = ? C = ? D = ? Are A1 and A2 independent? Are A3 and A4 independent?
Pr (B) = ? Pr (no kid has flu) = ? Pr (no person has flu) = ?
• A: Pr (A3 ∩ A4) = ?
• Addition law: Pr (A3 ∪ A4) = Pr (A3) + Pr (A4) − Pr (A3 ∩ A4)
• Pr (A3 ∩ A4) = Pr (A3) + Pr (A4) − Pr (A3 ∪ A4) = .2 + .2 − .3 = .1
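Several of the remaining questions can be worked through in R, reading B as A3 ∪ A4 and C as A1 ∪ A2 from the event definitions. A minimal sketch; the variable names are illustrative.

```r
# Given on the slide
p_A1 <- 0.1; p_A2 <- 0.1; p_A1A2 <- 0.02   # mother, father, both parents
p_A3 <- 0.2; p_A4 <- 0.2                   # each kid
p_B  <- 0.3                                # at least one kid: B = A3 or A4

p_A3A4   <- p_A3 + p_A4 - p_B     # addition law rearranged: Pr(A3 and A4) = 0.1
p_C      <- p_A1 + p_A2 - p_A1A2  # Pr(at least one parent) = 0.18
p_no_kid <- 1 - p_B               # Pr(no kid has flu) = 0.7

# Independence checks: joint probability vs. product of the two probabilities
c(kids_joint = p_A3A4, kids_product = p_A3 * p_A4,       # 0.1 vs 0.04
  parents_joint = p_A1A2, parents_product = p_A1 * p_A2) # 0.02 vs 0.01
```

Since 0.1 ≠ 0.04 and 0.02 ≠ 0.01, neither the two kids' flu events nor the two parents' flu events are independent, which is plausible within one household.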
Conditional Probability
• Example. TB tests: skin test vs. X-ray test
A = {"+" by skin test}, B = {"+" by X-ray}
Ideally A ⊂ B and A ⊃ B (i.e., A = B):
B confirms A, or A and B function the same;
the two events are extremely dependent.
• Opposite: the skin test is independent of the X-ray test,
i.e., A and B do not depend on each other.
• Not ideal (in practice).
• We need some kind of dependence between A and B.
• We want to know the probability of B given A.
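Conditional Probability

• Conditional probability of B given A:
Pr (B | A) = Pr (B ∩ A) / Pr (A)
• Interpretation: the probability that B occurs when we restrict attention to the outcomes in which A has occurred.
• Rules:
(1) A, B independent ⇔ Pr (B | A) = Pr (B) = Pr (B | Ā)
(2) A, B dependent ⇔ Pr (B | A) ≠ Pr (B) ≠ Pr (B | Ā),
because Pr (A ∩ B) ≠ Pr (A) Pr (B)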
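Rule (1) can be checked with the TB numbers from the earlier independence example (Pr (A) = 0.3, Pr (B) = 0.5, Pr (A ∩ B) = 0.15). In this R sketch, Pr (B | Ā) uses Pr (B ∩ Ā) = Pr (B) − Pr (A ∩ B) and Pr (Ā) = 1 − Pr (A); `cond_prob` is an illustrative helper, not a standard function.

```r
# Conditional probability: Pr(B | A) = Pr(A and B) / Pr(A)
cond_prob <- function(p_joint, p_given) p_joint / p_given

p_a  <- 0.3   # Pr(A): "+" by skin test
p_b  <- 0.5   # Pr(B): "+" by X-ray
p_ab <- 0.15  # Pr(A and B)

p_b_given_a    <- cond_prob(p_ab, p_a)             # Pr(B | A)
p_b_given_nota <- cond_prob(p_b - p_ab, 1 - p_a)   # Pr(B | not A)
c(p_b_given_a, p_b, p_b_given_nota)                # all equal 0.5: independent
```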
Applications of Conditional Probability
• Relative risk or risk ratio (RR) of B given A:
RR = Pr (B | A) / Pr (B | Ā)
• Example. TB: Pr (B | A) = 0.01, Pr (B | Ā) = 0.0001
RR = Pr (B | A) / Pr (B | Ā) = 0.01 / 0.0001 = 100

• Interpretation: people with A are 100 times as likely to have TB as people with Ā.
• But if A and B are independent, then
Pr (B | A) = Pr (B) = Pr (B | Ā) ⇒ RR = 1.
Interpretation: people are equally likely to have TB regardless of A or Ā.
• Not a good screening test.
Applications of Conditional Probability
• Example: Smoking and lung cancer
A = {smoking}, B = {lung cancer}
RR = Pr (B | A) / Pr (B | Ā)
= 0.08 / 0.0001 = 800
Interpretation: smokers are 800 times as likely to develop lung cancer as non-smokers.

• Screening tests A, B:
A = {"+" by screening}, B = {"+" by confirmation}
Pr (B) = Pr (B ∩ A) + Pr (B ∩ Ā)
From the definition of conditional probability,
Pr (B | A) = Pr (B ∩ A) / Pr (A); Pr (B | Ā) = Pr (B ∩ Ā) / Pr (Ā)

• ⇒ Pr (B) = Pr (B | A) Pr (A) + Pr (B | Ā) Pr (Ā)
Applications of Conditional Probability
• Example: A group of people had a screening test for TB, and 5% got positive results. Assume 85% of the (+) will be confirmed by an X-ray test, and 0.5% of the (−) will also be diagnosed by X-ray as TB patients. What proportion of this group of people had TB?
• A = {(+) by screening}, B = {diagnosed by X-ray}.
We want to know Pr (B).
• Answer: Pr (B) = Pr (B | A) Pr (A) + Pr (B | Ā) Pr (Ā)
Pr (A) = .05, Pr (B | A) = .85, Pr (B | Ā) = .005
Pr (Ā) = 1 − Pr (A) = 1 − .05 = .95
Pr (B) = .85 × .05 + .005 × .95 = .0425 + .00475 = .04725
RR = Pr (B | A) / Pr (B | Ā) = .85 / .005 = 170
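The same calculation as a minimal R sketch of the two-event total probability formula and the relative risk; the variable names are illustrative.

```r
p_a            <- 0.05   # Pr(A): positive on screening
p_b_given_a    <- 0.85   # Pr(B | A): confirmed by X-ray among screen-positives
p_b_given_nota <- 0.005  # Pr(B | not A): diagnosed by X-ray among screen-negatives

# Total probability: Pr(B) = Pr(B | A) Pr(A) + Pr(B | not A) Pr(not A)
p_b <- p_b_given_a * p_a + p_b_given_nota * (1 - p_a)
rr  <- p_b_given_a / p_b_given_nota   # relative risk
c(p_b = p_b, rr = rr)                 # 0.04725 and 170
```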
Applications of Probability
• Exhaustive events
A1, …, Ak are exhaustive events if at least one of them must occur.
A1, …, Ak are mutually exclusive and exhaustive events if exactly one of them must occur.
-- No two events occur at the same time.
• Example: BP. A = {SBP ≤ 70}, B = {70 < SBP ≤ 110}, C = {SBP > 110}.
These 3 events are mutually exclusive and exhaustive.
• Complement events: A and Ā.
Total Probability Rule
• Let A1, …, Ak be k mutually exclusive and exhaustive events, and let B be any event. Then Pr (B) = Σ Pr (B | Ai) Pr (Ai), summed over i = 1, …, k.
• Note that Pr (B) = Σ Pr (B ∩ Ai), summed over i = 1, …, k.
• The total probability is a weighted average of the conditional probabilities Pr (B | Ai) with weights Pr (Ai).
• Example 3.22: A five-year study of cataract.
Population: 5000 people aged 60 and up.

| Age group | Population percentage | Cataract |
|---|---|---|
| 60 – 64 | 45% | 2.4% |
| 65 – 69 | 28% | 4.6% |
| 70 – 74 | 20% | 8.8% |
| 75+ | 7% | 15.3% |

Q: What percentage of the population will have cataract?
How many cases of cataract can be expected?
Examples Using the Total Probability Rule
• Answer: Define events: B = {develop cataract},
A1 = {in age group 60 – 64}, A2 = {in age group 65 – 69},
A3 = {in age group 70 – 74}, A4 = {age 75+}.
Pr (B) = Σ Pr (B | Ai) Pr (Ai), summed over i = 1, …, 4
Pr (Ai):       45%      28%      20%     7%
Pr (B | Ai):   2.4%     4.6%     8.8%    15.3%
Pr (B) = 2.4% × 45% + … + 15.3% × 7% = 0.052 = 5.2%
The number of expected cases = 5000 × 5.2% = 260
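With more than two groups the total probability rule is a weighted sum, which R's vector arithmetic expresses directly. A minimal sketch using the cataract numbers from Example 3.22; the check that the weights sum to 1 reflects the requirement that the age groups be mutually exclusive and exhaustive.

```r
# Weights Pr(Ai): share of the population in each age group
p_age <- c("60-64" = 0.45, "65-69" = 0.28, "70-74" = 0.20, "75+" = 0.07)
# Pr(B | Ai): five-year cataract risk within each age group
p_cat_given_age <- c(0.024, 0.046, 0.088, 0.153)

stopifnot(isTRUE(all.equal(sum(p_age), 1)))  # groups are exhaustive

p_cat <- sum(p_cat_given_age * p_age)   # total probability rule: 0.05199
expected_cases <- 5000 * p_cat          # about 260
c(p_cat = p_cat, expected_cases = expected_cases)
```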

• Example: TB screening. A = {(+) by skin test}, B = {(+) by X-ray test}. 1000 people went for TB screening, and 50 got positive results. A further X-ray test confirms 40 of the 50 positives. Assume the skin test missed 0.2% of the negatives, who are in fact TB patients. How many TB patients are there in this group of 1000 people?
Pr (B) = Pr (B | A) Pr (A) + Pr (B | Ā) Pr (Ā)
= (40/50) × (50/1000) + 0.2% × (1000 − 50)/1000
= 0.04 + 0.0019 = 0.0419
Number of TB patients = 0.0419 × 1000 = 41.9 ≈ 42
Bayes’ theorem
Thomas Bayes (1702 ~ 1761): a mathematician who first used probability inductively and established a mathematical basis for probability inference (a means of calculating, from the number of times an event has and has not occurred, the probability that it will occur in future trials).
Bayes’ rule and screening test
• Screening test: T. A = {T+}; confirmation: B = {disease}.
• Q: How good is the screening test?
1) Can it give a "+" to a subject with the disease?      Sensitive
2) Can it give a "−" to a subject without the disease?   Specific
3) Can it predict disease accurately?                    Predictive

• Predictive value positive (PV+): Pr (dis. | T+)
• Predictive value negative (PV−): Pr (no dis. | T−)
• Sensitivity = Pr (T+ | dis.)
• Specificity = Pr (T− | no dis.)
• False positive = {T+ with no disease}
False positive rate = Pr (T+ | no dis.)
• False negative = {T− with disease}
False negative rate = Pr (T− | dis.)
Bayes’ rule and screening test
• Example: smoking vs. lung cancer:
using smoking to screen for lung cancer ⇒ too many false positives.
Using family history of breast cancer to screen for breast cancer ⇒ too many false negatives.
• Pr (B): probability of disease in the reference population.
• Bayes’ rule

$$PV^{+} = \Pr(B \mid A) = \frac{\Pr(A \mid B)\Pr(B)}{\Pr(A \mid B)\Pr(B) + \Pr(A \mid \bar{B})\Pr(\bar{B})} = \frac{x \cdot \text{sensitivity}}{x \cdot \text{sensitivity} + (1 - x)(1 - \text{specificity})}$$

$$PV^{-} = \Pr(\bar{B} \mid \bar{A}) = \frac{\Pr(\bar{A} \mid \bar{B})\Pr(\bar{B})}{\Pr(\bar{A} \mid \bar{B})\Pr(\bar{B}) + \Pr(\bar{A} \mid B)\Pr(B)} = \frac{(1 - x) \cdot \text{specificity}}{(1 - x) \cdot \text{specificity} + x(1 - \text{sensitivity})}$$

where x = Pr (B) = prevalence of disease.
• Some important points:
• Predictive values depend on
1) sensitivity, 2) specificity, and 3) prevalence of the disease.
• Sensitivity and specificity are properties of the test and stay the same, but the prevalence of the disease changes from population to population and over time.
• So Pr (A | B) and Pr (Ā | B̄) remain the same for the same test, while the predictive values Pr (B | A) and Pr (B̄ | Ā) change with Pr (B).
• False positive rate Pr (A | B̄) = 1 − Pr (Ā | B̄) = 1 − specificity
• False negative rate Pr (Ā | B) = 1 − Pr (A | B) = 1 − sensitivity
• So the false positive rate and false negative rate remain the same for a given test when the population changes.
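To illustrate how predictive values move with prevalence while sensitivity and specificity stay fixed, here is an R sketch of the two Bayes’ rule formulas. It plugs in the sensitivity (80/90) and specificity (890/910) from the screening-test example table that follows; the prevalence 0.09 also comes from that table, while 0.01 and 0.001 are hypothetical values added for comparison.

```r
# Predictive values from Bayes' rule, with x = Pr(B) = prevalence
pv_pos <- function(sens, spec, x) x * sens / (x * sens + (1 - x) * (1 - spec))
pv_neg <- function(sens, spec, x) (1 - x) * spec / ((1 - x) * spec + x * (1 - sens))

sens <- 80 / 90    # sensitivity, from the example table
spec <- 890 / 910  # specificity, from the example table
prev <- c(0.09, 0.01, 0.001)   # 0.09 matches the table; the others are hypothetical

data.frame(prevalence = prev,
           PV_pos = pv_pos(sens, spec, prev),
           PV_neg = pv_neg(sens, spec, prev))
```

At prevalence 0.09 this reproduces PV+ = .80; as the disease becomes rarer, PV+ drops sharply even though the test itself is unchanged.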
Calculation for Screening Test
Example:

| Screening test | Disease + | Disease − | Total |
|---|---|---|---|
| + | 80 | 20 | 100 |
| − | 10 | 890 | 900 |
| Total | 90 | 910 | 1000 |

• Sensitivity = Pr (T+ | D+) = 80/90 = .889
• Specificity = Pr (T− | D−) = 890/910 = .978
• Pr (D+) = 90/1000 = .09
• PV+ = Pr (D+ | T+) = 80/100 = .80
• PV− = Pr (D− | T−) = 890/900 = .989
• Pf+ = Pr (T+ | D−) = 20/910 = .022 = 1 − specificity
• Pf− = Pr (T− | D+) = 10/90 = .111 = 1 − sensitivity
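The same quantities can be computed directly from the 2 × 2 table. A minimal R sketch; the matrix layout (rows = test result, columns = disease status) and the object names are illustrative.

```r
# 2 x 2 table: rows = screening test result, columns = true disease status
tab <- matrix(c(80, 20,
                10, 890),
              nrow = 2, byrow = TRUE,
              dimnames = list(test = c("T+", "T-"), disease = c("D+", "D-")))

sens <- tab["T+", "D+"] / sum(tab[, "D+"])   # Pr(T+ | D+)
spec <- tab["T-", "D-"] / sum(tab[, "D-"])   # Pr(T- | D-)
prev <- sum(tab[, "D+"]) / sum(tab)          # Pr(D+)
ppv  <- tab["T+", "D+"] / sum(tab["T+", ])   # Pr(D+ | T+)
npv  <- tab["T-", "D-"] / sum(tab["T-", ])   # Pr(D- | T-)
pf_pos <- 1 - spec                           # Pr(T+ | D-)
pf_neg <- 1 - sens                           # Pr(T- | D+)

round(c(sens = sens, spec = spec, prev = prev, PV_pos = ppv,
        PV_neg = npv, Pf_pos = pf_pos, Pf_neg = pf_neg), 3)
```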
Generalized Bayes’ Rule
• Let B1, …, Bk be a set of mutually exclusive and exhaustive disease states, i.e., at least one occurs, but no two occur at the same time. Let A be the symptoms of the disease. Then

$$\Pr(B_i \mid A) = \frac{\Pr(A \mid B_i)\Pr(B_i)}{\sum_{j=1}^{k} \Pr(A \mid B_j)\Pr(B_j)}$$

because

$$\Pr(B_i \mid A) = \frac{\Pr(A \cap B_i)}{\Pr(A)}, \qquad \Pr(A) = \sum_{j=1}^{k} \Pr(A \cap B_j) = \sum_{j=1}^{k} \Pr(A \mid B_j)\Pr(B_j).$$
Example.
Suppose A = {chronic cough, lung biopsy}
Diseases: B1 = {normal}, B2 = {lung cancer}, B3 = {sarcoidosis}
Pr (A | B1) = .001, Pr (A | B2) = .9, Pr (A | B3) = .9
For a 60-year-old non-smoker:
Pr (B1) = .99, Pr (B2) = .001, Pr (B3) = .009
Q: For a 60-year-old non-smoker with symptoms A, what disease would you diagnose?
Generalized Bayes’ Rule
• A: For a 60-year-old non-smoker:
Pr (B1) = .99, Pr (B2) = .001, Pr (B3) = .009
Pr (A) = Σ Pr (A ∩ Bj) = Σ Pr (A | Bj) Pr (Bj), summed over j = 1, 2, 3
= .001 × .99 + .9 × .001 + .9 × .009 = .00999
Pr (B1 | A) = Pr (A | B1) Pr (B1) / Pr (A) = .001 × .99 / .00999 = .099
Pr (B2 | A) = Pr (A | B2) Pr (B2) / Pr (A) = .9 × .001 / .00999 = .090
Pr (B3 | A) = Pr (A | B3) Pr (B3) / Pr (A) = .9 × .009 / .00999 = .811

Based on these conditional probabilities, we diagnose this person as having sarcoidosis, with probability p = 0.811.
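The generalized Bayes’ rule calculation for this example as a short R sketch: multiply each prior Pr (Bj) by the likelihood Pr (A | Bj), then divide by their sum Pr (A). The names attached to the vectors are illustrative labels.

```r
# Disease states B1 = normal, B2 = lung cancer, B3 = sarcoidosis
prior <- c(normal = 0.99,  lung_cancer = 0.001, sarcoidosis = 0.009)  # Pr(Bj)
lik   <- c(normal = 0.001, lung_cancer = 0.9,   sarcoidosis = 0.9)    # Pr(A | Bj)

p_a <- sum(lik * prior)          # total probability: Pr(A) = 0.00999
posterior <- lik * prior / p_a   # generalized Bayes' rule: Pr(Bj | A)

round(posterior, 3)
names(which.max(posterior))      # most probable diagnosis: "sarcoidosis"
```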
Receiver Operating Characteristic (ROC) Curve
[Figure: A Blood Pressure Test, showing Normal and Hypertensive subjects separated by a clinical criterion]
Disease Prevalence and Incidence
• Prevalence = Pr (having a disease at one time point)
= total number of cases of disease at one time point / total population at the same time.
Prevalence is a snapshot, taken at one time point only.
• Cumulative incidence or incidence
Incidence = Pr (developing a new case of disease over a period of time).
It is expressed per person-year, i.e., in units of (person-year)^−1.
It takes the time of exposure into account within a period of time, e.g., 1 year.
• Example.
The prevalence of hepatitis A is .01. How many cases of hepatitis A are expected in a community of 1000 people?
Number of cases = prevalence × population = .01 × 1000 = 10.
• Breast cancer incidence is 5 per 10^5 person-years. How many new cases are expected over the next 5 years in a city with a population of 10^5?
Number of new cases = incidence × duration × population
= 5 / (10^5 person-years) × 5 years × 10^5 people = 25 cases
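Both calculations as an R sketch. The incidence example assumes, as the slide does implicitly, that the rate stays constant and the population stays at 10^5 over the 5 years, so the total exposure is 5 × 10^5 person-years.

```r
prevalence <- 0.01
cases_now  <- prevalence * 1000      # existing hepatitis A cases at one time point

incidence  <- 5 / 1e5                # new breast cancer cases per person-year
years      <- 5
population <- 1e5
new_cases  <- incidence * years * population   # rate x duration x people

c(cases_now = cases_now, new_cases = new_cases)  # 10 and 25
```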
