EE562 ARTIFICIAL INTELLIGENCE FOR ENGINEERS
Lecture 15, 5/25/2005
University of Washington, Department of Electrical Engineering, Spring 2005
Instructor: Professor Jeff A. Bilmes

Uncertainty & Bayesian Networks (Chapters 13/14)

Outline
• Inference
• Independence and Bayes' Rule
• Chapter 14:
  – Syntax
  – Semantics
  – Parameterized Distributions

Homework
• Last HW of the quarter
• Due next Wed, June 1st, in class:
  – Chapter 13: 13.3, 13.7, 13.16
  – Chapter 14: 14.2, 14.3, 14.10

Inference by enumeration
• Start with the full joint probability distribution (here, over Toothache, Catch, and Cavity).
• For any proposition φ, sum the probabilities of the atomic events where it is true: P(φ) = Σ_{ω : ω ⊨ φ} P(ω)
• P(toothache) = 0.108 + 0.012 + 0.016 + 0.064 = 0.2
• P(toothache ∨ cavity) = 0.108 + 0.012 + 0.072 + 0.008 + 0.016 + 0.064 = 0.28
• Can also compute conditional probabilities:
  P(¬cavity | toothache) = P(¬cavity ∧ toothache) / P(toothache)
  = (0.016 + 0.064) / (0.108 + 0.012 + 0.016 + 0.064) = 0.4

Normalization
• The denominator can be viewed as a normalization constant α:
  P(Cavity | toothache) = α P(Cavity, toothache)
  = α [P(Cavity, toothache, catch) + P(Cavity, toothache, ¬catch)]
  = α [⟨0.108, 0.016⟩ + ⟨0.012, 0.064⟩]
  = α ⟨0.12, 0.08⟩ = ⟨0.6, 0.4⟩
• General idea: compute the distribution on the query variable by fixing the evidence variables and summing over the hidden variables.

Inference by enumeration, contd.
• Let X be all the variables. Typically, we are interested in the posterior joint distribution of the query variables Y given specific values e for the evidence variables E.
• Let the hidden variables be H = X − Y − E. The required summation of joint entries is done by summing out the hidden variables:
  P(Y | E = e) = α P(Y, E = e) = α Σ_h P(Y, E = e, H = h)
• The terms in the summation are joint entries because Y, E, and H together exhaust the set of random variables.
• Obvious problems:
  1. Worst-case time complexity O(d^n), where d is the largest arity
  2. Space complexity O(d^n) to store the joint distribution
  3. How to find the numbers for the O(d^n) entries?

Independence
• A and B are independent iff P(A | B) = P(A), or P(B | A) = P(B), or P(A, B) = P(A) P(B)
• P(Toothache, Catch, Cavity, Weather) = P(Toothache, Catch, Cavity) P(Weather)
• 16 entries reduced to 10; for n independent biased coins, O(2^n) → O(n)
• Absolute independence is powerful but rare
• Dentistry is a large field with hundreds of variables, none of which are independent. What to do?

Conditional independence
• P(Toothache, Cavity, Catch) has 2^3 − 1 = 7 independent entries
• If I have a cavity, the probability that the probe catches in it doesn't depend on whether I have a toothache:
  (1) P(catch | toothache, cavity) = P(catch | cavity)
• The same independence holds if I haven't got a cavity:
  (2) P(catch | toothache, ¬cavity) = P(catch | ¬cavity)
• Catch is conditionally independent of Toothache given Cavity:
  P(Catch | Toothache, Cavity) = P(Catch | Cavity)
• Equivalent statements:
  P(Toothache | Catch, Cavity) = P(Toothache | Cavity)
  P(Toothache, Catch | Cavity) = P(Toothache | Cavity) P(Catch | Cavity)
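These independence claims can be checked by brute-force enumeration over the full joint table. Below is a minimal Python sketch: the four toothache entries are the numbers quoted above, while the four ¬toothache entries are assumed from the standard textbook dentistry table, since the table itself did not survive into this transcript.

```python
# Full joint P(Toothache, Catch, Cavity) as a dict keyed by truth assignments.
# toothache=True entries come from the slides; toothache=False entries are
# assumed from the standard textbook example.
joint = {
    # (toothache, catch, cavity): probability
    (True,  True,  True):  0.108,
    (True,  False, True):  0.012,
    (True,  True,  False): 0.016,
    (True,  False, False): 0.064,
    (False, True,  True):  0.072,   # assumed
    (False, False, True):  0.008,   # assumed
    (False, True,  False): 0.144,   # assumed
    (False, False, False): 0.576,   # assumed
}

def prob(pred):
    """P(phi): sum the atomic events where the proposition pred holds."""
    return sum(p for event, p in joint.items() if pred(*event))

# P(toothache) = 0.2, as on the slide.
print(prob(lambda t, c, v: t))

# P(cavity | toothache) via the definition of conditional probability
# (0.6, the complement of the slide's P(not-cavity | toothache) = 0.4).
print(prob(lambda t, c, v: v and t) / prob(lambda t, c, v: t))

# Check conditional independence: P(catch | toothache, cavity) vs P(catch | cavity).
p_given_both = prob(lambda t, c, v: c and t and v) / prob(lambda t, c, v: t and v)
p_given_cav  = prob(lambda t, c, v: c and v) / prob(lambda t, c, v: v)
print(p_given_both, p_given_cav)   # both 0.9: the independence holds exactly
```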
Conditional independence contd.
• Write out the full joint distribution using the chain rule:
  P(Toothache, Catch, Cavity)
  = P(Toothache | Catch, Cavity) P(Catch, Cavity)
  = P(Toothache | Catch, Cavity) P(Catch | Cavity) P(Cavity)
  = P(Toothache | Cavity) P(Catch | Cavity) P(Cavity)
  i.e., 2 + 2 + 1 = 5 independent numbers
• In most cases, the use of conditional independence reduces the size of the representation of the joint distribution from exponential in n to linear in n.

Bayes' Rule
• Product rule: P(a ∧ b) = P(a | b) P(b) = P(b | a) P(a)
• Bayes' rule: P(a | b) = P(b | a) P(a) / P(b)
• Or in distribution form:
  P(Y | X) = P(X | Y) P(Y) / P(X) = α P(X | Y) P(Y)
• Useful for assessing diagnostic probability from causal probability:
  – P(Cause | Effect) = P(Effect | Cause) P(Cause) / P(Effect)
  – E.g., let M be meningitis, S be stiff neck: P(m | s) = P(s | m) P(m) / P(s)

Bayes' Rule and conditional independence
• P(Cavity | toothache ∧ catch)
  = α P(toothache ∧ catch | Cavity) P(Cavity)
  = α P(toothache | Cavity) P(catch | Cavity) P(Cavity)
• This is an example of a naïve Bayes model:
  P(Cause, Effect_1, …, Effect_n) = P(Cause) ∏_i P(Effect_i | Cause)

Key Benefit
• Probabilistic reasoning (using tools such as conditional probability, conditional independence, and Bayes' rule) makes it possible to choose sensibly among a set of actions where otherwise (without probability, as in propositional or first-order logic) we would have to resort to random guessing.
• Example: Wumpus World

Bayesian Networks (Chapter 14)

Bayesian networks
• A simple, graphical notation for conditional independence assertions, and hence for compact specification of full joint distributions
• Syntax:
  – a set of nodes, one per variable
  – a directed, acyclic graph (link ≈ "directly influences")
  – a conditional distribution for each node given its parents: P(X_i | Parents(X_i))
• In the simplest case, the conditional distribution is represented as a conditional probability table (CPT) giving the distribution over X_i for each combination of parent values

Example
• The topology of the network encodes conditional independence assertions:
  – Weather is independent of the other variables
  – Toothache and Catch are conditionally independent given Cavity

Example
• I'm at work, neighbor John calls to say my alarm is ringing, but neighbor Mary doesn't call. Sometimes the alarm is set off by minor earthquakes. Is there a burglar?
• Variables: Burglary, Earthquake, Alarm, JohnCalls, MaryCalls
• Network topology reflects "causal" knowledge:
  – A burglar can set the alarm off
  – An earthquake can set the alarm off
  – The alarm can cause Mary to call
  – The alarm can cause John to call

Example contd.
[Figure: the burglary network with its conditional probability tables]

Compactness
• A CPT for Boolean X_i with k Boolean parents has 2^k rows, one for each combination of parent values
• Each row requires one number p for X_i = true (the number for X_i = false is just 1 − p)
• If each variable has no more than k parents, the complete network requires O(n · 2^k) numbers
• I.e., the network grows linearly with n, vs. O(2^n) for the full joint distribution
• For the burglary net: 1 + 1 + 4 + 2 + 2 = 10 numbers (vs. 2^5 − 1 = 31)

Semantics
• The full joint distribution is defined as the product of the local conditional distributions:
  P(X_1, …, X_n) = ∏_{i=1}^{n} P(X_i | Parents(X_i))
• E.g., P(j ∧ m ∧ a ∧ ¬b ∧ ¬e) = P(j | a) P(m | a) P(a | ¬b, ¬e) P(¬b) P(¬e)
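To make the product formula concrete, here is a minimal sketch that encodes the burglary network and evaluates the joint entry from the slide. The CPT numbers are the standard textbook values for this example; they are an assumption here, since the figure carrying the CPTs did not survive into this transcript.

```python
# Burglary network: priors and CPTs, keyed by parent truth values.
# Numbers are the standard textbook values for this example (assumed here).
P_b = 0.001                      # P(Burglary)
P_e = 0.002                      # P(Earthquake)
P_a = {(True, True): 0.95,       # P(Alarm | Burglary, Earthquake)
       (True, False): 0.94,
       (False, True): 0.29,
       (False, False): 0.001}
P_j = {True: 0.90, False: 0.05}  # P(JohnCalls | Alarm)
P_m = {True: 0.70, False: 0.01}  # P(MaryCalls | Alarm)

def joint(b, e, a, j, m):
    """P(B=b, E=e, A=a, J=j, M=m) as the product of local conditionals."""
    p  = P_b if b else 1 - P_b
    p *= P_e if e else 1 - P_e
    p *= P_a[(b, e)] if a else 1 - P_a[(b, e)]
    p *= P_j[a] if j else 1 - P_j[a]
    p *= P_m[a] if m else 1 - P_m[a]
    return p

# P(j and m and a and not-b and not-e) from the slide:
print(joint(b=False, e=False, a=True, j=True, m=True))  # ~ 0.000628
```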
Local semantics
• Each node is conditionally independent of its nondescendants given its parents.
• Thm: local semantics ⇔ global semantics

Markov blanket
• Each node is conditionally independent of all others given its "Markov blanket", i.e., its parents, children, and children's parents.

Constructing Bayesian networks
• Key point: we need a method such that a series of locally testable assertions of conditional independence guarantees the required global semantics.
  1. Choose an ordering of variables X_1, …, X_n
  2. For i = 1 to n:
     – add X_i to the network
     – select parents from X_1, …, X_{i−1} such that P(X_i | Parents(X_i)) = P(X_i | X_1, …, X_{i−1})
• This choice of parents guarantees:
  P(X_1, …, X_n) = ∏_{i=1}^{n} P(X_i | X_1, …, X_{i−1})   (chain rule)
                 = ∏_{i=1}^{n} P(X_i | Parents(X_i))   (by construction)

Example
• Suppose we choose the ordering M, J, A, B, E:
  – P(J | M) = P(J)? No
  – P(A | J, M) = P(A | J)? No;  P(A | J, M) = P(A)? No
  – P(B | A, J, M) = P(B | A)? Yes;  P(B | A, J, M) = P(B)? No
  – P(E | B, A, J, M) = P(E | A)? No;  P(E | B, A, J, M) = P(E | A, B)? Yes

Example contd.
• Deciding conditional independence is hard in noncausal directions.
• (Causal models and conditional independence seem hardwired for humans!)
• Assessing conditional probabilities is hard in noncausal directions.
• The network is less compact: 1 + 2 + 4 + 2 + 4 = 13 numbers needed

Example: car diagnosis
• Initial evidence: car won't start
• Testable variables (green), "broken, so fix it" variables (orange)
• Hidden variables (gray) ensure sparse structure and reduce parameters
[Figure: the car-diagnosis network]

Example: car insurance
[Figure: the car-insurance network]

Compact conditional distributions
• A CPT grows exponentially with the number of parents
• A CPT becomes infinite with a continuous-valued parent or child
• Solution: canonical distributions that are defined compactly
• Deterministic nodes are the simplest case:
  – X = f(Parents(X)) for some deterministic function f (could be a logical formula)
  – E.g., Boolean functions: NorthAmerican ⇔ Canadian ∨ US ∨ Mexican
  – E.g., numerical relationships among continuous variables

Compact conditional distributions, contd.
• "Noisy-OR" distributions model multiple noninteracting causes:
  – 1) Parents U_1, …, U_k include all possible causes
  – 2) Independent failure probability q_i for each cause alone
  – In the noiseless limit, X ⇔ U_1 ∨ U_2 ∨ … ∨ U_k; with noise,
    P(X | U_1, …, U_j, ¬U_{j+1}, …, ¬U_k) = 1 − ∏_{i=1}^{j} q_i
• The number of parameters is linear in the number of parents (see the sketch below).
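A minimal noisy-OR sketch follows: each present cause independently fails to trigger X with its own probability q_i, so X is false only if every present cause fails. The causes and q values echo the textbook fever example and are assumptions, not numbers from these slides.

```python
# Noisy-OR: P(X=true | causes) = 1 - product of failure probs of the
# causes that are present. Only k numbers parameterize the whole CPT.
# The causes and q values below are illustrative assumptions.
q = {"cold": 0.6, "flu": 0.2, "malaria": 0.1}  # q_i = P(X absent | only U_i present)

def noisy_or(present):
    """P(X = true) given the set of causes that are present."""
    failure = 1.0
    for cause in present:
        failure *= q[cause]        # independent failures multiply
    return 1.0 - failure

print(noisy_or({"cold"}))                    # 1 - 0.6 = 0.4
print(noisy_or({"cold", "flu"}))             # 1 - 0.6*0.2 = 0.88
print(noisy_or({"cold", "flu", "malaria"}))  # 1 - 0.6*0.2*0.1 = 0.988
print(noisy_or(set()))                       # 0.0: no cause, no effect (no leak node)
```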
Hybrid (discrete + continuous) networks
• Discrete variables (Subsidy? and Buys?); continuous variables (Harvest and Cost)
• Option 1: discretization, which brings large errors and large CPTs
• Option 2: finitely parameterized canonical families, e.g., Gaussians and logistic distributions (as used in neural networks)
• Continuous variable with discrete + continuous parents (e.g., Cost)
• Discrete variable with continuous parents (e.g., Buys?)
• A sketch of both node types appears after the summary below.

Summary
• Bayesian networks provide a natural representation for (causally induced) conditional independence
• Topology + CPTs = a compact representation of the joint distribution
• Generally easy for domain experts to construct
• Take my Graphical Models class if you want much more theoretical depth
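Finally, the sketch promised on the hybrid-networks slide: a linear-Gaussian conditional for a continuous child (Cost) with one continuous and one discrete parent, and a logistic conditional for a discrete child (Buys?) with a continuous parent. The slides name these node types but give no numbers, so every parameter value here is invented for illustration.

```python
import math
import random

# Linear Gaussian: Cost | Harvest=h, Subsidy?=s ~ Normal(a_s*h + b_s, sigma_s^2).
# One (slope, intercept, sigma) triple per discrete parent value; values invented.
lin_gauss = {True:  (-0.5, 5.0, 0.6),   # subsidized: cost falls gently with harvest
             False: (-1.0, 10.0, 1.0)}  # unsubsidized: steeper, noisier

def sample_cost(harvest, subsidy):
    a, b, sigma = lin_gauss[subsidy]
    return random.gauss(a * harvest + b, sigma)

# Logistic: P(Buys? = true | Cost = c) = 1 / (1 + exp((c - mu) / w)),
# so buying becomes less likely as cost rises; mu and w are invented.
def p_buys(cost, mu=5.0, w=1.0):
    return 1.0 / (1.0 + math.exp((cost - mu) / w))

h = random.uniform(0.0, 10.0)   # crude stand-in for a prior on Harvest
s = random.random() < 0.3       # crude stand-in for P(Subsidy?)
c = sample_cost(h, s)
print(f"harvest={h:.2f} subsidy={s} cost={c:.2f} P(buys)={p_buys(c):.3f}")
```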