Bayesian Networks
Reverend Thomas Bayes (1702-1761)

1. Probability theory
2. BN as knowledge model
3. Bayes in Court
4. Dazzle examples
5. Conclusions

Jenneke IJzerman, Bayesiaanse Statistiek in de Rechtspraak (Bayesian Statistics in the Administration of Justice), VU Amsterdam, September 2004.
http://www.few.vu.nl/onderwijs/stage/werkstuk/werkstukken/werkstuk-ijzerman.doc

Expert Systems 8

Thought Experiment: Hypothesis Selection

Imagine two types of bag, each holding balls of two kinds:
• BagA: 250 + 750
• BagB: 750 + 250
Take 5 balls from a bag:
• Result: 4 + 1
What is the type of the bag?

Probability of this result from
• BagA: 0.0144
• BagB: 0.396
Conclusion: The bag is BagB.

But…
• We don’t know how the bag was selected
• We don’t even know that type BagB exists
• The experiment is meaningful only in light of the hypotheses posed a priori (BagA, BagB) and their assumed likelihoods.

Classical and Bayesian statistics

Classical statistics:
• Compute the probability of your data, assuming a hypothesis
• Reject a hypothesis if it makes the data unlikely
Bayesian statistics:
• Compute the probability of a hypothesis, given your data
• Requires an a priori probability for each hypothesis; these are extremely important!

Part I: Probability theory

What is a probability?
• Frequentist: relative frequency of occurrence.
• Subjectivist: amount of belief
• Mathematician: Axioms (Kolmogorov), assignment of non-negative numbers to a set of states, sum 1 (100%).
A state has several variables: product space.
With n binary variables: 2^n states. Variables may also be multi-valued.

                    Blond   Not blond
                      30       70

                    Blond   Not blond
Mother blond          15       15
Mother not blond      15       55

Conditional Probability: Using evidence

• First table: Probability for any woman to deliver a blond baby
• Second table: Describes blond and non-blond mothers separately
• Third table: Describes blond mothers only; the row is rescaled with its weight

                    Blond   Not blond
                      30       70

                    Blond   Not blond
Mother blond          15       15
Mother not blond      15       55

                    Blond   Not blond
Mother blond          50       50

Def. conditional probability:
Pr(A|B) = Pr(A & B) / Pr(B)
Rewrite:
Pr(A & B) = Pr(B) x Pr(A|B)

Dependence and Independence

• The probability of a blond child is 30%, but it is larger for a blond mother and smaller for a non-blond mother.
• The probability of a boy is 50%, also for blond mothers, and also for non-blond mothers.

                    Blond   Not blond
Mother blond          15       15
Mother not blond      15       55

                    Boy     Girl
Mother blond          15       15
Mother not blond      35       35

                    Boy     Girl
Mother blond          50       50

Def.: A and B are independent: Pr(A|B) = Pr(A)
Exercise: Show that Pr(A|B) = Pr(A) is equivalent to Pr(B|A) = Pr(B) (i.e. B and A are independent).

Bayes Rule: from data to hypothesis

          4+1       Other
BagA      0.0144    0.986
BagB      0.396     0.604
Other

• Classical Probability Theory: 0.0144 is the relative weight of 4+1 in the ROW of BagA.
• Bayesian Theory describes the distribution over the COLUMN of 4+1.
Classical statistics: ROW distribution
Bayesian statistics: COLUMN distribution

Bayes’ Rule:
• Observe that Pr(A & B) = Pr(A) x Pr(B|A) = Pr(B) x Pr(A|B)
• Conclude Bayes’ Rule: P(A|B) = P(B|A) P(A) / P(B)

Reasons for Dependence 1: Causality

• Dependency: P(B|A) ≠ P(B)
• Positive correlation: P(B|A) > P(B)
• Negative correlation: P(B|A) < P(B)

Possible explanation: A causes B.
Example:
P(headache) = 6%
P(ha | party) = 10%
P(ha | ¬party) = 2%

Alternative explanation: B causes A.
In the same example:
P(party) = 50%
P(party | h.a.) = 83%
P(party | no h.a.) = 48%
“Headaches make students go to parties.”
In statistics, correlation has no direction.

            h.a.   no h.a.
party         5      45
no party      1      49

Reasons for Dependence 2: Common cause

1. The student party may lead to headache and is costly (money versus broke):

            h.a.       no h.a.
party       5 (2-3)    45 (18-27)
no party    1 (1-0)    49 (49-0)
(in parentheses: money-broke)

2. Table of headache and money:

            h.a.   no h.a.
money         3      67
broke         3      27
Pr(broke) = 30%
Pr(broke | h.a.) = 50%

3. Table of headache and money for party attendants:

            h.a.   no h.a.
money         2      18
broke         3      27

This dependency disappears if the common cause variable is known.

Reasons for Dependence 3: Common effect

A and B are independent (in parentheses: number of instances satisfying C):

            A          non A
B           40 (14)    40 (4)
non B       10 (1)     10 (1)
Pr(B) = 80%
Pr(B|A) = 80%
B and A are independent.

Their combination stimulates C; for instances satisfying C:

            A     non A
B           14      4
non B        1      1
Pr(B) = 90%
Pr(B|A) = 93%, Pr(B|¬A) = 80%

This dependency appears if the common effect variable is known.

Part II: Bayesian Networks

• Probabilistic Graphical Model
• Probabilistic Network
• Bayesian Network
• Belief Network

Consists of:
• Variables (n)
• Domains (here binary)
• Acyclic arc set, modeling the statistical influences
• Per variable V (indegree k): Pr(V | E), for 2^k cases of E.
Information in node: exponential in indegree.

Example network with arcs from pa to ha and from pa to br:
Pr(pa) = 50%

Pr     pa     ¬pa
ha     10%    2%

Pr     pa     ¬pa
br     40%    0%

Example node C with parents A and B:

Pr     A,B    A,¬B   ¬A,B   ¬A,¬B
C      56%    10%    10%    10%

The Bayesian Network Model

Closed World Assumption
• Rule based:
  IF x attends party THEN x has headache WITH cf = .10
  What if x didn’t attend?
• Bayesian model (arc from pa to ha):
  Pr(pa) = 50%
  Pr(ha | pa) = 10%, Pr(ha | ¬pa) = 2%
  Pr(ha | ¬pa) is included: the model claims that all relevant info is modeled.

Direction of arcs and correlation
• The same example with the arc reversed (from ha to pa):
  Pr(ha) = 6%
  Pr(pa | ha) = 83%, Pr(pa | ¬ha) = 48%
1. A BN does not necessarily model causality
2. It is built upon the Human Expert’s (HE) understanding of relationships, which are often causal

A little theorem

• A Bayesian network on n binary variables uniquely defines a probability distribution over the associated set of 2^n states.
• The distribution has 2^n parameters (numbers in [0..1] with sum 1).
• A typical network has in-degree 2 to 3: represented by 4n to 8n parameters (PIGLET!!).
• Bayesian Networks are an efficient representation

The Utrecht DSS group

• Initiated by Prof Linda van der Gaag from ~1990
• Focus: development of BN support tools
• Use experience from building several actual BNs
• Medical applications
• Oesoca, ~40 nodes.
• Courses: Probabilistic Reasoning
• Network Algorithms (Ma ACS).

How to obtain a BN model

Describe Human Expert knowledge:
Metastatic Cancer (MC) may be detected by an increased level of serum calcium (SC). A Brain Tumor (BT) may be seen on a CT scan (CT). Severe headaches (SH) are indicative of the presence of a brain tumor. Both a brain tumor and an increased level of serum calcium may bring the patient into a coma (Co).
Network nodes: mc, bt, ct, sc, co, sh
Probabilities: Expert guess or statistical study

Learn BN structure automatically from data by means of Data Mining:
• Research of Carsten
• Models not intuitive
• Not considered XS
• Helpful addition to Knowledge Acquisition from Human Expert
• Master ACS.

Inference in Bayesian Networks

Example network with arcs from pa to ha and from pa to br:
Pr(pa) = 50%

Pr     pa     ¬pa
ha     10%    2%

Pr     pa     ¬pa
br     40%    0%

The probability of a state S = (v1, .. , vn):
Multiply Pr(vi | S) over all variables, where each factor is the table entry of vi for the values that S gives to its parents.
Pr(pa, ¬ha, ¬br) = 0.50 * 0.90 * 0.60 = 0.27

The marginal (overall) probability of each variable:
Pr(pa) = 50%
Pr(ha) = 6%
Pr(br) = 20%

Sampling: Produce a series of cases, distributed according to the probability distribution implicit in the BN

Consultation: Entering Evidence

Consultation applies the BN knowledge to a specific case
• Known variable values can be entered into the network
• Probability tables for all nodes are updated
• Obtain (something like) a new BN modeling the conditional distribution
• Again, show distributions and state probabilities
• Backward and Forward propagation

Test Selection (Danielle)

• In consultation, enter data until the goal variable is known with sufficient probability.
• Data items are obtained at a specific cost.
• Data items influence the distribution of the goal.
Problem:
• Given the current state of the consultation, find out which variable is the best one to test next.
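The numbers on the Inference and Consultation slides above can be checked by brute-force enumeration. The following is only a minimal sketch: the probability tables are copied from the pa/ha/br example network, while the helper names (`bern`, `joint`, `prob`, `always`) are hypothetical and chosen for illustration. Enumeration over all 2^3 states is used here for clarity; it is not the propagation method a real BN tool would use.

```python
from itertools import product

# Tables copied from the party / headache / broke example network.
P_PA = 0.50                         # Pr(pa)
P_HA = {True: 0.10, False: 0.02}    # Pr(ha | pa), Pr(ha | not pa)
P_BR = {True: 0.40, False: 0.00}    # Pr(br | pa), Pr(br | not pa)


def bern(p, value):
    """Probability that a binary variable with Pr(True) = p takes `value`."""
    return p if value else 1.0 - p


def joint(pa, ha, br):
    """Pr(pa, ha, br): multiply one table entry per node."""
    return bern(P_PA, pa) * bern(P_HA[pa], ha) * bern(P_BR[pa], br)


def always(pa, ha, br):
    """Empty evidence: every state is allowed."""
    return True


def prob(query, evidence=always):
    """Pr(query | evidence), summing the joint over all 2**3 states."""
    states = list(product([True, False], repeat=3))
    num = sum(joint(*s) for s in states if query(*s) and evidence(*s))
    den = sum(joint(*s) for s in states if evidence(*s))
    return num / den


def is_pa(pa, ha, br):
    return pa


def is_ha(pa, ha, br):
    return ha


def is_br(pa, ha, br):
    return br


print(f"Pr(pa, not ha, not br) = {joint(True, False, False):.2f}")
print(f"Pr(ha) = {prob(is_ha):.2f}, Pr(br) = {prob(is_br):.2f}")
# Entering the evidence "headache" updates the belief in "party".
print(f"Pr(pa | ha) = {prob(is_pa, is_ha):.2f}")
```

Entering evidence amounts to restricting the sum to the states that agree with the evidence and renormalising, which is why the same `prob` function serves for both marginals and consultation.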
(Danielle: started CS study 1996, PhD thesis defense Oct 2005.)

Complexity of Network Design (Johan)

• A Boolean formula can be coded into a BN
• SAT problems reformulated as BN problems
• Monotonicity, Kth MPE, Inference
• Complete for PP^PP^NP
• Started PhD Oct 2005
• Graduated Oct 2009

Some more work done in Linda’s DSS group

• Sensitivity Analysis:
  Numerical parameters in the BN may be inaccurate; how does this influence the consultation outcome?
• More efficient inferencing:
  Inferencing is costly, especially in the presence of
  • Cycles (NB: there are no directed cycles!)
  • Nodes with a high in-degree
  Approximate reasoning, network decompositions, …
• Writing a program tool: Dazzle

Part III: In the Courtroom

What happens in a trial?
• Prosecutor and Defense collect information
• Judge decides if there is sufficient evidence that the person is guilty

P(A|B) = P(B|A) P(A) / P(B)

Forensic tests are far more conclusive than medical ones but still probabilistic in nature!
Pr(symptom|sick) = 80%
Pr(trace|innocent) = 0.01%
Tempting to forget statistics.
Need a priori probabilities.

Jenneke IJzerman, Bayesiaanse Statistiek in de Rechtspraak, VU Amsterdam, September 2004.

Prosecutor’s Fallacy

The story:
• A DNA sample was taken from the crime site
• The probability of a match between samples of different people is 1 in 10,000
• 20,000 inhabitants are sampled
• John’s DNA matches the sample
• Prosecutor: the chance that John is innocent is 1 in 10,000
• Judge convicts John

The analysis:
• The prosecutor confuses
  Pr(inn | evid) (a) with
  Pr(evid | inn) (b)
• Forensic experts can only shed light on (b)
• The Judge must find (a); a priori probabilities are needed!! (Bayes)
• Dangerous to convict on DNA samples alone
• Pr(innocent match) = 86%
• Pr(1 such match) = 27%

Defender’s Fallacy

The story:
• Town has 100,001 people
• We expect 11 to match (1 guilty plus 10 innocent)
• Probability that John is guilty is 9%.
• John must be released

Implicit assumptions:
• Offender is from town.
• Equal a priori probability for each inhabitant
It is necessary to take other circumstances into account; why was John prosecuted and what other evidence exists?

Conclusions:
• PF: it is necessary to take Bayes and a priori probabilities into account
• DF: estimating the a priori probabilities is crucial for the outcome

Experts’ and Judge’s task

IJzerman’s ideas about trial:
1. Forensic Expert may not claim a priori or a posteriori probabilities (Dutch Penalty Code, 344-1.4)
2. Judge must set a priori
3. Judge must compute a posteriori, based on statements of experts
4. Judge must have an explicit threshold of probability for beyond reasonable doubt
5. Threshold should be made explicit in law.

Dutch Penalty Code, 344-1.4 (translated from Dutch): “reports of experts containing their opinion on what their science teaches them about that which is submitted to their judgment”

Is this realistic?
1. Avoid confusing Pr(G|E) and Pr(E|G), a good idea
2. A priori probabilities are extremely important; this almost pre-determines the verdict
3. How is this done? Bayesian Network designed and controlled by Judge?
4. No judge will obey a mathematical formula
5. Public agreement and acceptance?

Bayesian Alcoholism Test

• Driving under the influence of alcohol leads to a penalty
• An administrative procedure may revoke the licence
• Judge must decide if the subject is an alcohol addict; incidental or regular (harmful) drinking
• Psychiatrists advise the court by determining if drinking was incidental or regular
• Goal HHAU: Harmful and Hazardous Alcohol Use
• Probabilistically confirmed or denied by clinical tests
• Bayesian Alcoholism Test: developed 1999-2004 by A. Korzec, Amsterdam.
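The figures quoted on the Prosecutor’s Fallacy and Defender’s Fallacy slides above can be reproduced with a few lines of arithmetic. This is a sketch under stated assumptions, not the source’s own derivation: it assumes that matches of innocent people are independent with probability 1 in 10,000 each, that the guilty person always matches, and (for the defender) a uniform prior over the inhabitants. Reading “Pr(innocent match)” as “at least one innocent match among the 20,000 sampled” is an interpretation that happens to give 86%. All function names are hypothetical.

```python
MATCH_PROB = 1 / 10_000   # chance that an innocent person's DNA matches


def prob_at_least_one_match(n_people, p=MATCH_PROB):
    """Chance that at least one of n innocent people matches (assumed reading)."""
    return 1 - (1 - p) ** n_people


def prob_exactly_one_match(n_people, p=MATCH_PROB):
    """Chance that exactly one of n innocent people matches (binomial)."""
    return n_people * p * (1 - p) ** (n_people - 1)


def posterior_guilt(population, p=MATCH_PROB):
    """Pr(guilty | match) by Bayes' Rule with a uniform prior over the town.

    Assumes the offender is an inhabitant and always matches.
    """
    prior = 1 / population
    return prior / (prior + (1 - prior) * p)


# Prosecutor's Fallacy: 20,000 inhabitants sampled.
print(f"at least one innocent match: {prob_at_least_one_match(20_000):.0%}")
print(f"exactly one innocent match:  {prob_exactly_one_match(20_000):.0%}")

# Defender's Fallacy: town of 100,001 people, 1 guilty + about 10 innocent matches.
print(f"Pr(John guilty | match):     {posterior_guilt(100_001):.0%}")
```

Changing the prior inside `posterior_guilt` (for instance, because other evidence singles out John) changes the outcome drastically, which is the point of the DF conclusion.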
Variables in Bayesian Alcoholism Test

Hidden variables:
• HHAU: alcoholism
• Liver disease
Observable causes:
• Hepatitis risk
• Social factors
• BMI, diabetes
Observable effects:
• Skin color
• Lab: blood, breath
• Level of Response
• Smoking
• CAGE questionnaire

Knowledge Elicitation for BAT

Knowledge in the Network
• Qualitative
  - What variables are relevant
  - How do they interrelate
• Quantitative
  - A priori probabilities
  - Conditional probabilities for hidden diseases
  - Conditional probabilities for effects
  - Response of lab tests to hidden diseases

How it was obtained
• Network structure??
  IJzerman does not report on this
• Probabilities
  - Literature studies: 40% of probabilities
  - Expert opinions: 60% of probabilities

Consultation with BAT

Enter evidence about the subject:
• Clinical signs: skin, smoking, LRA; CAGE.
• Lab results
• Social factors

The network will return:
• Probability that the subject has HHAU
• Probabilities for liver disease and diabetes

The responsible Human Medical Expert (HME) converts this probability to a YES/NO for the judge! (Interpretation phase)
The HME may take other data into account (rare disease).
Knowing what the CAGE is used for may influence the answers that the subject gives.

Part IV: Bayes in the Field

The Dazzle program
• Tool for designing and analysing BNs
• Mouse-click the network; fill in the probabilities
• Consult by evidence submission
• Read posterior probabilities
• Development 2004-2006
• Written in Haskell
• Arjen van IJzendoorn, Martijn Schrage
• www.cs.uu.nl/dazzle

Importance of a good model

In 1998, Donna Anthony (31) was convicted of murdering her two children. She was in prison for seven years but claimed her children died of cot death.
Prosecutor: The probability of two cot deaths in one family is too small, unless the mother is guilty.
The Evidence against Donna Anthony

• A BN with priors eliminates the Prosecutor’s Fallacy
• Enter the evidence: both children died
• A priori probability is very small (1 in 1,000,000)
• Dazzle establishes a 97.6% probability of guilt
• Name of expert: Prof. Sir Roy Meadow (born 1933)
• His testimony sent a dozen mothers to prison in a decade

A More Refined Model

Allow for genetic or social circumstances for which the parent is not liable.

The Evidence against Donna?

Refined model: a genetic defect is the most likely cause of repeated deaths
Donna Anthony was released in 2005 after 7 years in prison
6/2005: Meadow struck from GMC register
7/2005: Appeal by Meadow
2/2006: Appeal granted; otherwise experts would refuse to testify

Classical Swine Fever, Petra Geenen

• Swine Fever is a costly disease
• Development 2004/5
• 42 variables, 80 arcs
• 2454 probabilities, but many are 0.
• Pig/herd level
• Prior extremely small
• Probability elicitation with questionnaire

Conclusions

• Mathematically sound model to reason with uncertainty
• Applicable to areas where knowledge is highly statistical
• Acquisition: Instead of the classical IF a THEN b (WITH c), obtain both Pr(b|a) and Pr(b|¬a)
• More work but a more powerful model
• One formalism allows both diagnostic and prognostic reasoning
• Danger: apparent exactness is deceiving
• Disadvantage: Lack of explanation facilities (research); the model is quite transparent, but consultations are not; design questions have high complexity (> NPC).
• Increasing popularity, despite the difficulty of building a BN
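As a closing illustration of why the a priori probabilities matter, the opening thought experiment (BagA versus BagB, result 4 + 1) can be worked through numerically. This is a sketch under assumptions: the five balls are assumed to be drawn without replacement, which reproduces the likelihoods 0.0144 and 0.396 quoted on the slide, and the two priors tried at the end (50% and 1% for BagB) are hypothetical values chosen only to show the effect. The function names are likewise made up for this example.

```python
from math import comb


def likelihood(n_first, n_second, k_first=4, k_second=1):
    """Pr(result | bag) when drawing without replacement (hypergeometric)."""
    total = n_first + n_second
    draws = k_first + k_second
    return comb(n_first, k_first) * comb(n_second, k_second) / comb(total, draws)


LIKE_A = likelihood(250, 750)   # BagA: 250 + 750
LIKE_B = likelihood(750, 250)   # BagB: 750 + 250


def posterior_bag_b(prior_b):
    """Pr(BagB | result) by Bayes' Rule, assuming only BagA and BagB exist."""
    evidence = (1 - prior_b) * LIKE_A + prior_b * LIKE_B
    return prior_b * LIKE_B / evidence


print(f"Pr(4+1 | BagA) = {LIKE_A:.4f}")
print(f"Pr(4+1 | BagB) = {LIKE_B:.3f}")
# Hypothetical priors: the same data supports very different conclusions.
for prior in (0.5, 0.01):
    print(f"prior Pr(BagB) = {prior}: posterior = {posterior_bag_b(prior):.2f}")
```

With equal priors the data makes BagB very likely, but if BagB is assumed to be rare beforehand, BagA remains the more probable hypothesis despite the much smaller likelihood.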