
Introduction to Biostatistics (ZJU 2008)

Wenjiang Fu, Ph.D.
Associate Professor
Division of Biostatistics, Department of Epidemiology
Michigan State University
East Lansing, Michigan 48824, USA
Email: fuw@msu.edu
www: http://www.msu.edu/~fuw

Chapter 3 Probability Theory

Probability of an event: Pr (A).

Sample space: the collection of all possible outcomes.

Examples:
Tossing a coin: {H}, {T}.
Tossing two coins: {HH}, {HT}, {TH}, {TT}.
Rolling a die: {1}, {2}, …, {6}.
Blood pressure: {L}, {H}, {N}.
Sex: {M}, {F}.
Race: {white}, {black}, {Asian}, …

How to obtain the probability (proportion) of male students in the class:
1. Count {M} and {F} and compute N(male) / N(total).
2. Sampling method. Assume a total of 40 students. Draw a random sample of 40 with replacement and calculate the proportion of males. Repeat this procedure many times, say B = 200 times. Take the average of these 200 proportions, p0 = (p1 + … + p200) / 200. This number p0 will be close to the true proportion p in the class.

This method is called bootstrap sampling, invented by Efron.
Reference: Efron and Tibshirani, An Introduction to the Bootstrap, Chapman & Hall, 1993.

Bootstrap R program

    ###### male student proportion in class by bootstrap sampling
    x <- c(rep(1, 25), rep(0, 15))     #### 25 males among 40 students
    x
    p <- sum(x) / length(x)            # true proportion in the class
    B <- 200                           # number of bootstrap samples
    pp <- rep(0, B)                    # proportion of males in each sample
    for (i in 1:B) {
      y <- sample(x, replace = TRUE)   # resample 40 students with replacement
      pp[i] <- sum(y) / length(y)
    }
    p0 <- mean(pp)                     # average of the B proportions
    plot(c(1:B), pp)
    abline(h = p0, col = 'red')
    title(main = 'Bootstrap Approximation')
    p
    p0

Answer: p = 0.625, p0 = 0.6188 (p0 varies slightly from run to run because the samples are random).

Events

An event is any set of outcomes of interest.
Examples: heart attack, diarrhea, accident …

Probability (of an event): the relative frequency of this event over an indefinitely large (or infinite) number of trials, written Pr (A).

Example 1: Students aged 19 – 20 in the classroom. Take a student and check the age (1 – yes, 0 – no), then put the student back into the pool. Record the process:
1, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 1, ……
So far the relative frequency is 7 / 15; as the process continues it settles around a certain number (the probability).
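The running relative frequency in Example 1 can be traced in R. This is a minimal sketch that uses only the 15 outcomes recorded in the example; the names `ages`, `running_prop` and `final_prop` are illustrative and do not come from the original slides.

```r
# Recorded outcomes from Example 1: 1 = student aged 19-20, 0 = otherwise
ages <- c(1, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 1)

# Relative frequency after each draw: cumulative count of 1s / number of draws
running_prop <- cumsum(ages) / seq_along(ages)

# Relative frequency after all 15 draws (7 ones out of 15)
final_prop <- running_prop[length(running_prop)]

print(round(running_prop, 3))
print(final_prop)
```

With more draws the running proportion would be expected to fluctuate less and to settle near Pr (A).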
Example 2: Students with systolic BP 95 – 105 mm Hg. 1 – systolic BP 95 – 105; 0 – otherwise.
1, 1, 1, 0, 1, 0, 0, 1, 1, 1, 1, 1, 0, 0, 1, … …

Example 3: Students who had the hepatitis B vaccine. 1 – vaccinated; 0 – not yet.

Properties of probability

(1) 0 ≤ Pr (A) ≤ 1
(2) If A and B cannot occur at the same time, then Pr (A or B) = Pr (A) + Pr (B).

Mutually exclusive: two events are mutually exclusive if they cannot both occur at the same time.
Example: A = {DBP > 90}, B = {DBP < 75}.
Notation { . } – event description.

Venn Diagram

John Venn popularized the idea of Venn diagrams. He lived from 1834 to 1923 in England. He was a priest and taught at Gonville and Caius College, Cambridge. He wrote several books, including two on logic.

[Figure: Venn diagrams showing two mutually exclusive events A and B, two events that are not mutually exclusive (overlapping in A∩B), and an event A with its complement Ā.]

Event operations

Union A ∪ B: either A or B occurs, or both occur.
Intersection A ∩ B: the event that both A and B occur simultaneously.
Complement Ā: the event that A does not occur.

Example: A = {SBP ≥ 70}, B = {SBP ≤ 90}.
A ∪ B = {SBP ≥ 70 or SBP ≤ 90} = {all SBP readings},
A ∩ B = {70 ≤ SBP ≤ 90},
Ā = {SBP < 70}.

Independence

Two events are independent if the outcome of one event does not affect the outcome of the other; equivalently, the probability of the intersection satisfies
Pr (A ∩ B) = Pr (A) × Pr (B).
Q: If two events are mutually exclusive, are they independent?

Probability Example

A = {father hypertensive}, Pr (A) = 0.6
B = {mother hypertensive}, Pr (B) = 0.5
1) If Pr (A ∩ B) = 0.3, are A and B independent?
2) How about if Pr (A ∩ B) = 0.6?

1) Pr (A ∩ B) = 0.3 = 0.6 × 0.5 = Pr (A) × Pr (B). Conclusion: A and B are independent.
2) Pr (A ∩ B) = 0.6 > 0.6 × 0.5 = Pr (A) × Pr (B). Conclusion: A and B are not independent. (Strictly, Pr (A ∩ B) cannot exceed min{Pr (A), Pr (B)} = 0.5, so 0.6 is not attainable; any feasible value other than 0.3, e.g. 0.4, leads to the same conclusion.)

The causes of hypertension are very complex, but high salt intake contributes to it. Couples commonly share the same kind of food at home, so their hypertension status may be highly correlated.
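The independence check in the hypertension example can be written as a small R function. This is only a sketch: `is_independent()` and its tolerance argument `tol` are hypothetical names introduced here, and the probabilities are the ones given on the slide.

```r
# Check whether Pr(A and B) equals Pr(A) * Pr(B), up to a small numerical tolerance.
# is_independent() is a hypothetical helper, not a base R function.
is_independent <- function(p_a, p_b, p_ab, tol = 1e-9) {
  abs(p_ab - p_a * p_b) < tol
}

p_a <- 0.6  # Pr(father hypertensive)
p_b <- 0.5  # Pr(mother hypertensive)

case1 <- is_independent(p_a, p_b, 0.3)  # scenario 1), value as given on the slide
case2 <- is_independent(p_a, p_b, 0.6)  # scenario 2), value as given on the slide

print(c(case1 = case1, case2 = case2))
```

A tolerance is used instead of `==` because products of decimal probabilities are generally not represented exactly in floating point.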
Probability Example

Example: Students A and B take an exam. A = {student A passes}, B = {student B passes}.
Pr (A) = 0.7, Pr (B) = 0.8, Pr (A ∩ B) = ?
If A and B are independent, Pr (A ∩ B) = 0.7 × 0.8 = 0.56.
How about if the two students compete for the same award?

Example: TB (tuberculosis) tests. A = {“+” by skin test}, B = {“+” by X-ray test}. Ideally B is used to confirm A.
If Pr (A) = 0.3, Pr (B) = 0.5 and Pr (A ∩ B) = 0.15, then A and B are independent, because 0.3 × 0.5 = 0.15.

Probability Laws

Multiplication law: if A1, A2, …, Ak are k mutually independent events, then
Pr (A1 ∩ A2 ∩ … ∩ Ak) = Pr (A1) × Pr (A2) × … × Pr (Ak).

Addition law: if A and B are any two events, then
Pr (A ∪ B) = Pr (A) + Pr (B) – Pr (A ∩ B).

[Figure: Venn diagram for the addition law, with A ∪ B split into 3 mutually exclusive events: A ∩ B̄, A ∩ B and Ā ∩ B.]

Addition law for more than 2 events: if A, B and C are three events, then
Pr (A ∪ B ∪ C) = Pr (A) + Pr (B) + Pr (C) – Pr (A ∩ B) – Pr (A ∩ C) – Pr (B ∩ C) + Pr (A ∩ B ∩ C).

[Figure: Venn diagram for the addition law with three events A, B and C.]

Probability Example: 3.1 – 3.13

Family with flu: mother, father, 2 kids.
A1 = {mother has flu}, A2 = {father has flu}, A3 = {1st kid has flu}, A4 = {2nd kid has flu},
B = {at least one kid has flu}, C = {at least one parent has flu}, D = {at least one person has flu}.
Given: Pr (A1) = .1, Pr (A2) = .1, Pr (both parents have flu) = .02, Pr (each kid has flu) = .2, Pr (at least one kid has flu) = .3.

Q: How are B, C and D expressed in terms of A1, …, A4? Are A1, A2 independent? Are A3, A4 independent? Pr (B) = ? {no kid has flu} = ? {no person has flu} = ?

A: Pr (A3 ∩ A4) = ?
Addition law: Pr (A3 ∪ A4) = Pr (A3) + Pr (A4) – Pr (A3 ∩ A4), so
Pr (A3 ∩ A4) = Pr (A3) + Pr (A4) – Pr (A3 ∪ A4) = .2 + .2 – .3 = .1
Since .1 ≠ .2 × .2 = .04, A3 and A4 are not independent.

Conditional Probability

Example. TB tests: skin test vs. X-ray test.
A = {“+” by skin test}, B = {“+” by X-ray}.
Ideal: A ⊆ B and B ⊆ A, i.e. B confirms A and the two tests function the same; A and B are extremely dependent.
Opposite: the skin test is independent of the X-ray test, i.e. A and B do not depend on each other.
Not ideal (in practice): there is some kind of dependence between A and B, and we want to know the probability of B given A.

Conditional Probability

Conditional probability of B given A (for Pr (A) > 0):
Pr (B | A) = Pr (B ∩ A) / Pr (A)
Interpretation: among the occasions on which A occurs, the proportion on which B also occurs.

Rules:
(1) If A and B are independent, Pr (B|A) = Pr (B) = Pr (B|Ā).
(2) If A and B are dependent, Pr (B|A) ≠ Pr (B) ≠ Pr (B|Ā), because Pr (A ∩ B) ≠ Pr (A) × Pr (B).

Applications of Conditional Probability

Relative risk or risk ratio (RR) of B given A:
RR = Pr (B|A) / Pr (B|Ā)

Example. TB: Pr (B|A) = 0.01, Pr (B|Ā) = 0.0001.
RR = Pr (B|A) / Pr (B|Ā) = 0.01 / 0.0001 = 100
Interpretation: people with A are 100 times as likely to have TB as people with Ā.
But if A and B are independent, Pr (B|A) = Pr (B) = Pr (B|Ā), so RR = 1.
Interpretation: a person is equally likely to have TB regardless of A or Ā. Such a test is not a good screening test.

Example: Smoking and lung cancer. A = {smoking}, B = {lung cancer}.
RR = Pr (B|A) / Pr (B|Ā) = 0.08 / 0.0001 = 800
Interpretation: smokers are 800 times as likely to develop lung cancer as non-smokers.

Screening tests: A = {“+” by screening}, B = {“+” by confirmation}.
Pr (B) = Pr (B ∩ A) + Pr (B ∩ Ā)
From conditional probability, Pr (B|A) = Pr (B ∩ A) / Pr (A) and Pr (B|Ā) = Pr (B ∩ Ā) / Pr (Ā), so
Pr (B) = Pr (B|A) Pr (A) + Pr (B|Ā) Pr (Ā)

Example: A group of people had a screening test for TB and 5% got positive results. Assume 85% of the (+) will be confirmed by X-ray test, and 0.5% of the (–) will also be diagnosed by X-ray as TB patients. What proportion of this group of people had TB?
A = {(+) by screening}, B = {diagnosed by X-ray}. We want to know Pr (B).
Answer: Pr (B) = Pr (B|A) Pr (A) + Pr (B|Ā) Pr (Ā)
Pr (A) = .05, Pr (B|A) = .85, Pr (B|Ā) = .005
Pr (Ā) = 1 – Pr (A) = 1 – .05 = .95
Pr (B) = .85 × .05 + .005 × .95 = .0425 + .00475 = .04725
RR = Pr (B|A) / Pr (B|Ā) = .85 / .005 = 170

Applications of Probability

Exhaustive events: A1, …, Ak are exhaustive events if at least one of them must occur.
A1, …, Ak are mutually exclusive and exhaustive events if one and only one of them must occur, i.e. at least one occurs and no two occur at the same time.
Example: BP. A = {SBP ≤ 70}, B = {70 < SBP ≤ 110}, C = {SBP > 110}. These 3 events are mutually exclusive and exhaustive.
Complement events A and Ā are also mutually exclusive and exhaustive.

Total Probability Rule

A1, …, Ak are k mutually exclusive and exhaustive events, and B is any event. Then

$$\Pr(B) = \sum_{i=1}^{k} \Pr(B \mid A_i)\,\Pr(A_i).$$

Note that

$$\Pr(B) = \sum_{i=1}^{k} \Pr(B \cap A_i).$$

The total probability is a weighted average of the conditional probabilities Pr (B|Ai) with weights Pr (Ai).

Example: 3.22 A five-year study of cataract. Population: 5000 people aged 60 and up.

    Age group    Population percentage    Cataract
    60 – 64      45%                      2.4%
    65 – 69      28%                      4.6%
    70 – 74      20%                      8.8%
    75+           7%                      15.3%

Q: What percentage of the population will have cataract? How many cases of cataract can be expected?

Examples using Total Probability Rule

Answer: Define events B = {develop cataract}, A1 = {in age group 60 – 64}, A2 = {in age group 65 – 69}, A3 = {in age group 70 – 74}, A4 = {age 75+}.

$$\Pr(B) = \sum_{i=1}^{4} \Pr(B \mid A_i)\,\Pr(A_i)$$

Pr (Ai):   45%   28%   20%   7%
Pr (B|Ai): 2.4%  4.6%  8.8%  15.3%
Pr (B) = 2.4% × 45% + 4.6% × 28% + 8.8% × 20% + 15.3% × 7% = 0.052 = 5.2%
The number of expected cases = 5000 × 5.2% = 260

Example: TB screening. A = {(+) by skin test}, B = {(+) by X-ray test}.
1000 people went for TB screening and 50 got positive results. A further X-ray test confirms 40 out of the 50 positives. Assume the skin test missed 0.2% of the negatives, who are TB patients. How many TB patients are there in this group of 1000 people?
Pr (B) = Pr (B|A) Pr (A) + Pr (B|Ā) Pr (Ā)
= (40/50) × (50/1000) + 0.2% × (1000 – 50)/1000
= 0.04 + 0.0019 = 0.0419
# TB patients = 0.0419 × 1000 = 41.9 ≈ 42 TB patients.

Bayes’ theorem

Thomas Bayes (1702 – 1761) was a mathematician who first used probability inductively and established a mathematical basis for probability inference (a means of calculating, from the number of times an event has not occurred, the probability that it will occur in future trials).
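As a check on the cataract calculation in Example 3.22 above, the total probability rule can be evaluated directly in R. This is a minimal sketch using the percentages from that example; the variable names `p_age`, `p_cataract`, `p_b` and `expected_cases` are illustrative only.

```r
# Age-group shares Pr(A_i) and five-year cataract rates Pr(B | A_i) from Example 3.22
p_age      <- c("60-64" = 0.45,  "65-69" = 0.28,  "70-74" = 0.20,  "75+" = 0.07)
p_cataract <- c("60-64" = 0.024, "65-69" = 0.046, "70-74" = 0.088, "75+" = 0.153)

# The age groups are mutually exclusive and exhaustive, so the shares must sum to 1
stopifnot(isTRUE(all.equal(sum(p_age), 1)))

# Total probability rule: weighted average of the conditional probabilities
p_b <- sum(p_cataract * p_age)

# Expected number of cataract cases in a population of 5000
expected_cases <- 5000 * p_b

print(round(p_b, 3))
print(round(expected_cases))
```

The unrounded value of Pr (B) is 0.05199, so the expected count is 259.95, which rounds to the 260 cases quoted in the example.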
Bayes’ rule and screening test

Screening test T: A = {T+}. Confirmation: B = {disease}.
Q: How good is the screening test?
1) Can it give a “+” to a subject with disease? Sensitive
2) Can it give a “–” to a subject with no disease? Specific
3) Can it predict disease accurately? Predictive

Predictive value positive (PV+) = Pr (dis. | T+)
Predictive value negative (PV–) = Pr (no dis. | T–)
Sensitivity = Pr (T+ | dis.)
Specificity = Pr (T– | no dis.)
False positive = {T+ with no disease}; false positive rate = Pr (T+ | no dis.)
False negative = {T– with disease}; false negative rate = Pr (T– | dis.)

Examples:
Smoking vs. lung cancer: if smoking is used for screening, there are too many false positives.
Family history of breast cancer vs. breast cancer (BR CA): if family history is used for screening, there are too many false negatives.

Pr (B): probability of disease in the reference population.

Bayes’ Rule

$$\mathrm{PV}^{+} = \Pr(B \mid A) = \frac{\Pr(A \mid B)\,\Pr(B)}{\Pr(A \mid B)\,\Pr(B) + \Pr(A \mid \bar{B})\,\Pr(\bar{B})} = \frac{x \cdot \text{sensitivity}}{x \cdot \text{sensitivity} + (1 - x)(1 - \text{specificity})}$$

$$\mathrm{PV}^{-} = \Pr(\bar{B} \mid \bar{A}) = \frac{\Pr(\bar{A} \mid \bar{B})\,\Pr(\bar{B})}{\Pr(\bar{A} \mid \bar{B})\,\Pr(\bar{B}) + \Pr(\bar{A} \mid B)\,\Pr(B)} = \frac{(1 - x) \cdot \text{specificity}}{(1 - x) \cdot \text{specificity} + x\,(1 - \text{sensitivity})}$$

where x = Pr (B) = prevalence of disease.

Facts about Screening Tests

Some important points:
Predictive values depend on 1) sensitivity, 2) specificity, and 3) prevalence of the disease.
Sensitivity and specificity remain the same with the test, but the prevalence of the disease changes from population to population and from time to time.
So Pr (A|B) and Pr (Ā|B̄) remain the same with the same test, while the predictive values Pr (B|A) and Pr (B̄|Ā) change with Pr (B).
False positive rate Pr (A|B̄) = 1 – Pr (Ā|B̄) = 1 – specificity
False negative rate Pr (Ā|B) = 1 – Pr (A|B) = 1 – sensitivity
So the false positive rate and the false negative rate remain the same with the test when the population changes.
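The dependence of the predictive values on prevalence can be illustrated with a short R sketch of Bayes’ rule. The helper `predictive_values()` is a hypothetical name introduced here. Sensitivity 80/90 and specificity 890/910 are taken from the 2 × 2 screening table in this chapter; the prevalence values other than 0.09 are assumptions chosen only for illustration.

```r
# Predictive values from prevalence x = Pr(B), sensitivity and specificity,
# following Bayes' rule. predictive_values() is a hypothetical helper.
predictive_values <- function(prevalence, sensitivity, specificity) {
  pv_pos <- prevalence * sensitivity /
    (prevalence * sensitivity + (1 - prevalence) * (1 - specificity))
  pv_neg <- (1 - prevalence) * specificity /
    ((1 - prevalence) * specificity + prevalence * (1 - sensitivity))
  data.frame(prevalence = prevalence, pv_pos = pv_pos, pv_neg = pv_neg)
}

# Test characteristics from the chapter's screening table, held fixed
sens <- 80 / 90
spec <- 890 / 910

# The same test applied to populations with different prevalence
# (0.09 matches the screening table; the other values are assumed)
pv_table <- predictive_values(c(0.001, 0.01, 0.09, 0.30), sens, spec)

print(round(pv_table, 3))
```

In this sketch PV+ rises and PV– falls as prevalence increases, even though sensitivity and specificity are unchanged.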
Calculation for Screening Test Example

                          Disease
                          +       –       Total
    Screening test   +    80      20      100
                     –    10      890     900
    Total                 90      910     1000

Sensitivity = Pr (T+|D+) = 80 / 90 = .889
Specificity = Pr (T–|D–) = 890 / 910 = .978
Pr (D+) = 90 / 1000 = .09
PV+ = Pr (D+|T+) = 80 / 100 = .80
PV– = Pr (D–|T–) = 890 / 900 = .989
Pf+ = Pr (T+|D–) = 20 / 910 = .022 = 1 – specificity
Pf– = Pr (T–|D+) = 10 / 90 = .111 = 1 – sensitivity

Generalized Bayes’ Rule

Let B1, …, Bk be a set of mutually exclusive and exhaustive disease states, i.e. at least one occurs, but no two occur at the same time. Let A be the symptoms of the disease. Then

$$\Pr(B_i \mid A) = \frac{\Pr(A \mid B_i)\,\Pr(B_i)}{\sum_{j=1}^{k} \Pr(A \mid B_j)\,\Pr(B_j)}$$

because

$$\Pr(B_i \mid A) = \frac{\Pr(A \cap B_i)}{\Pr(A)}, \qquad \Pr(A) = \sum_{j=1}^{k} \Pr(A \cap B_j) = \sum_{j=1}^{k} \Pr(A \mid B_j)\,\Pr(B_j).$$

Example. Suppose A = {chronic cough, lung biopsy}.
Disease: B1 = {normal}, B2 = {lung cancer}, B3 = {sarcoidosis}.
Pr (A|B1) = .001, Pr (A|B2) = .9, Pr (A|B3) = .9
For a 60-year-old non-smoker: Pr (B1) = .99, Pr (B2) = .001, Pr (B3) = .009
Q: For a 60-year-old non-smoker with symptom A, what disease would you diagnose?
Generalized Bayes’ Rule

A: For a 60-year-old non-smoker: Pr (B1) = .99, Pr (B2) = .001, Pr (B3) = .009

$$\Pr(A) = \sum_{j=1}^{k} \Pr(A \cap B_j) = \sum_{j=1}^{k} \Pr(A \mid B_j)\,\Pr(B_j)$$

= .001 × .99 + .9 × .001 + .9 × .009 = .00999
Pr (B1|A) = Pr (A|B1) Pr (B1) / Pr (A) = .001 × .99 / .00999 = .099
Pr (B2|A) = Pr (A|B2) Pr (B2) / Pr (A) = .9 × .001 / .00999 = .090
Pr (B3|A) = Pr (A|B3) Pr (B3) / Pr (A) = .9 × .009 / .00999 = .811
Based on these conditional probabilities, the most likely diagnosis for this person is sarcoidosis, with probability p = 0.811.

Receiver Operating Characteristic (ROC) Curve

[Figure: ROC curve illustration for a blood pressure test, with the clinical criterion separating Normal from Hypertensive.]

Disease Prevalence and Incidence

Prevalence = Pr (having a disease at one time point)
= total number of cases of disease at one time point / total number of population at the same time
Prevalence is an instantaneous measure, at one time point only.

Cumulative incidence or incidence:
Incidence = Pr (developing a new case of disease over a period of time)
It has a unit: per person-year or (person-year)^-1.
It takes the time of exposure into consideration, within one period of time, e.g. 1 yr.

Example. The prevalence of hepatitis A is .01. How many cases of hepatitis A are expected in a community of 1000 people?
# cases = prevalence × population = .01 × 1000 = 10.

Breast cancer incidence is 5 per 10^5 person-years. How many new cases are expected over the next 5 years in a city with a population of 10^5?
# new cases = incidence × duration × population
= 5 / (10^5 person-years) × 5 years × 10^5 people = 25 cases
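The two expected-count calculations above can be reproduced in R. This is a sketch only: it assumes, as the example implicitly does, that the incidence rate and the population size stay constant over the 5 years, and all variable names are illustrative.

```r
# Expected existing cases at one time point: prevalence x population
prevalence <- 0.01        # hepatitis A prevalence from the example
community  <- 1000        # community size
existing_cases <- prevalence * community

# Expected new cases over a period: incidence rate x duration x population
# (assumes a constant rate and a constant population over the period)
incidence  <- 5 / 1e5     # breast cancer: 5 per 10^5 person-years
years      <- 5
population <- 1e5
new_cases <- incidence * years * population

print(existing_cases)
print(new_cases)
```

Keeping the rate in units of cases per person-year makes it explicit that duration and population enter the product as person-years of exposure.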
