
Uncertainty
Cholwich Nattee
Sirindhorn International Institute of Technology, Thammasat University
Lecture 10: Uncertainty

Uncertainty

Logical agents have limitations in handling uncertain knowledge. For example, consider a KB containing:

∀p Symptom(p, Toothache) ⇒ Disease(p, Cavity) ∨ Disease(p, GumDisease) ∨ ...

We cannot conclude that a patient with a toothache has a cavity. Three main reasons make logical agents fail:
- Laziness: too much work to construct the complete KB.
- Theoretical ignorance: no complete theory exists for the domain.
- Practical ignorance: not all necessary percepts can be checked.
The best way to deal with these is to assign a degree of belief based on probability theory.

Probability and Degree of Belief

Probability is used to denote the degree of belief, not the degree of truth. For example, a probability of 0.8 for a cavity in a patient with a toothache means we believe there is an 80% chance that the patient has a cavity. Probability is about the agent's beliefs, not directly about the world. For example, suppose the agent draws a card from a shuffled pack. Before looking at the card, it assigns a probability of 1/52 to the card being the ace of spades. After looking at the card, the appropriate probability is just 0 or 1. An assignment of probability reflects the entailment status given the currently available knowledge base.

Uncertainty: Example [1]

Let A_t be the action of leaving for the airport t minutes before the flight. Will A_t get me there on time? Problems:
- partial observability (road state, other drivers' plans, etc.)
- noisy sensors
- uncertainty in action outcomes (flat tire, etc.)
- immense complexity of modeling and predicting traffic
Using a logical approach either:
1. Risks falsehood: "A25 will get me there on time", or
2.
Leads to conclusions that are too weak for decision making: "A25 will get me there on time if there is no accident on the bridge, and it does not rain, and my tires remain intact, etc."

Uncertainty: Example [2]

Probabilities are used to relate propositions to the current KB, for example,
P(A25 gets me there on time | no accidents) = 0.06
Probabilities change with new evidence:
P(A25 gets me there on time | no accidents, 5 a.m.) = 0.15

Making Decisions under Uncertainty

Suppose I believe the following:
P(A25 gets me there on time | ...) = 0.04
P(A90 gets me there on time | ...) = 0.70
P(A120 gets me there on time | ...) = 0.95
P(A1440 gets me there on time | ...) = 0.9999
Which action should I choose? It depends on my preferences for missing the flight vs. airport cuisine, etc. Utility theory is used to represent and infer preferences, and
decision theory = utility theory + probability theory.

Probability Basics

Let Ω be the sample space, e.g., the 6 possible rolls of a die. Each ω ∈ Ω is a sample point / possible world / atomic event. A probability model is a sample space with an assignment P(ω) for every ω ∈ Ω such that
0 ≤ P(ω) ≤ 1 and Σ_{ω∈Ω} P(ω) = 1
E.g., P(1) = P(2) = P(3) = P(4) = P(5) = P(6) = 1/6.
An event A is any subset of Ω, and
P(A) = Σ_{ω∈A} P(ω)
E.g., P(die roll < 4) = P(1) + P(2) + P(3) = 0.5.

Random Variables

A random variable refers to a part of the world whose status is initially unknown. For example, Cavity refers to whether a tooth has a cavity. Each random variable has a domain of values; e.g., the domain of Cavity might be {true, false}.
Boolean random variables have the domain {true, false}. For example, Cavity = true is also written cavity, and we often write ¬cavity for Cavity = false.
Discrete random variables take on values from a countable domain.
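The probability-model definitions above (a sample space Ω with weights P(ω), and events as subsets of Ω) can be sketched directly in Python; the die model and the event name `roll_lt_4` are just illustrations:

```python
from fractions import Fraction

# A probability model: an assignment P(w) for every sample point w in Omega,
# with 0 <= P(w) <= 1 and the weights summing to 1.
die = {omega: Fraction(1, 6) for omega in range(1, 7)}

def p(event, model):
    """P(A) = sum of P(w) over the sample points w in the event A."""
    return sum(model[omega] for omega in event)

# The event "die roll < 4" is the subset {1, 2, 3} of the sample space.
roll_lt_4 = {omega for omega in die if omega < 4}

print(p(roll_lt_4, die))  # 1/2, matching P(die roll < 4) = 0.5
print(sum(die.values()))  # 1
```

Using Fraction keeps the arithmetic exact, so the weights sum to exactly 1 rather than a float near 1.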
For example, the domain of Weather might be {sunny, rainy, cloudy, snow}.
Continuous random variables take on values from the real numbers, e.g., Temp = 21.6 or Temp < 22.0.

Atomic Event

An atomic event is a complete specification of the state of the world. For example, if the world consists of only the two variables Cavity and Toothache, there are four distinct atomic events:
Cavity = true ∧ Toothache = true
Cavity = true ∧ Toothache = false
Cavity = false ∧ Toothache = true
Cavity = false ∧ Toothache = false

Propositions [1]

A proposition can be thought of as the event in which the proposition is true. For example, given Boolean random variables A and B:
- event a = the set of sample points where A(ω) = true
- event a ∧ b = the points where A(ω) = true and B(ω) = true
With Boolean variables, a sample point is a propositional logic model, e.g., A = true, or a ∧ ¬b. A proposition is the disjunction of the atomic events in which it is true, e.g.,
P(a ∨ b) = P(¬a ∧ b) + P(a ∧ ¬b) + P(a ∧ b)

Propositions [2]

From these definitions, logically related events must have related probabilities. For example,
P(a ∨ b) = P(a) + P(b) − P(a ∧ b)
(illustrated by a Venn diagram of the overlapping events A and B)

Prior Probability

Prior or unconditional probabilities of propositions correspond to belief prior to the arrival of any evidence. E.g., P(cavity) = 0.1 and P(Weather = sunny) = 0.72.
A probability distribution gives values for all possible assignments:
P(Weather) = ⟨0.72, 0.1, 0.08, 0.1⟩
A joint probability distribution gives the probability of every atomic event. E.g., P(Weather, Cavity) is a 4 × 2 table of values:

                 sunny   rain   cloudy   snow
Cavity = true    0.144   0.02   0.016    0.02
Cavity = false   0.576   0.08   0.064    0.08

Probability for Continuous Variables

The probability distribution is expressed as a parameterized function of the value, e.g.,
P(X = x) = U[18, 26](x) = the uniform density between 18 and 26, which takes the value 0.125 on that interval.
Here P(X = 20.5) = 0.125 really means
lim_{dx→0} P(20.5 ≤ X ≤ 20.5 + dx)/dx = 0.125

Gaussian Density

P(X = x) = (1/(√(2π) σ)) e^(−(x−µ)²/(2σ²))

Conditional Probability [1]

Conditional or posterior probabilities are written P(a|b), where a and b are any propositions. This is read as "the probability of a, given that all we know is b." E.g., P(cavity|toothache) = 0.8.
The notation for a conditional distribution is, e.g., P(Cavity|Toothache).
New evidence may change the probability, e.g., P(cavity|toothache, cavity) = 1. However, new evidence may also be irrelevant, allowing simplification, e.g.,
P(cavity|toothache, thaiWins) = P(cavity|toothache) = 0.8

Conditional Probability [2]

Definition of conditional probability:
P(a|b) = P(a ∧ b)/P(b)   if P(b) ≠ 0
Alternative formulation (the product rule):
P(a ∧ b) = P(a|b)P(b) = P(b|a)P(a)
For conditional distributions,
P(Weather, Cavity) = P(Weather|Cavity)P(Cavity)
(view this as a 4 × 2 set of equations, not matrix multiplication).

Inference by Enumeration [1]

Inference by enumeration is a simple method for probabilistic inference: computing posterior probabilities for query propositions from observed evidence. The full joint distribution is used as the knowledge base.
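As a quick check of the definition P(a|b) = P(a ∧ b)/P(b), here is a small sketch using the Weather–Cavity joint distribution from the prior-probability slide:

```python
# Joint distribution P(Weather, Cavity) from the prior-probability slide,
# keyed by (weather value, cavity true/false).
joint = {
    ("sunny", True): 0.144, ("rain", True): 0.02,
    ("cloudy", True): 0.016, ("snow", True): 0.02,
    ("sunny", False): 0.576, ("rain", False): 0.08,
    ("cloudy", False): 0.064, ("snow", False): 0.08,
}

def p_cavity():
    # Marginal P(cavity): sum over all weather values.
    return sum(v for (_, c), v in joint.items() if c)

def p_weather_given_cavity(weather):
    # P(a | b) = P(a and b) / P(b), defined only when P(b) != 0.
    return joint[(weather, True)] / p_cavity()

print(round(p_cavity(), 3))                       # 0.2
print(round(p_weather_given_cavity("sunny"), 3))  # 0.72
# 0.72 equals the prior P(Weather = sunny): in this particular table
# Weather and Cavity happen to be independent, so conditioning changes nothing.
```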
For example:

              toothache            ¬toothache
           catch    ¬catch      catch    ¬catch
cavity     0.108    0.012       0.072    0.008
¬cavity    0.016    0.064       0.144    0.576

For any proposition φ, sum the atomic events where it is true:
P(φ) = Σ_{ω : ω ⊨ φ} P(ω)

Inference by Enumeration [2]

For example, using the table above,
P(toothache) = 0.108 + 0.012 + 0.016 + 0.064 = 0.2

Inference by Enumeration [3]

P(toothache ∨ cavity) = 0.108 + 0.012 + 0.072 + 0.008 + 0.016 + 0.064 = 0.28

Inference by Enumeration [4]

P(¬cavity|toothache) = P(¬cavity ∧ toothache)/P(toothache)
                     = (0.016 + 0.064)/(0.108 + 0.012 + 0.016 + 0.064)
                     = 0.4

Inference by Enumeration [5]

P(Cavity|toothache) = αP(Cavity, toothache)
                    = α[P(Cavity, toothache, catch) + P(Cavity, toothache, ¬catch)]
                    = α[⟨0.108, 0.016⟩ + ⟨0.012, 0.064⟩]
                    = α⟨0.12, 0.08⟩ = ⟨0.6, 0.4⟩

Inference by Enumeration [6]

The general idea is to compute the distribution on the query variable by fixing the evidence variables and summing over the hidden variables:
P(X|e) = αP(X, e) = α Σ_y P(X, e, y)
where X is the query variable, e is the observed values of the evidence variables, and y ranges over the remaining unobserved (hidden) variables.

Exercises

From the given full joint distribution, compute the following probabilities and probability distributions:

                        disease                   ¬disease
                TestA = low  TestA = high  TestA = low  TestA = high
TestB = low        0.10          0.07         0.07          0.03
TestB = norm       0.03          0.07         0.20          0.07
TestB = high       0.17          0.13         0.03          0.03

1. P(disease ∧ TestB = low ∧ TestA = high)
2.
P(disease ∨ TestA = low)
3. P(TestA = high ⇒ disease)
4. P(TestA = high|TestB = low, disease)
5. P(Disease|TestA = high)

Absolute Independence

Variables A and B are independent iff
P(A|B) = P(A)  or  P(B|A) = P(B)  or  P(A, B) = P(A)P(B)
For example, the full joint distribution over Toothache, Catch, Cavity, and Weather decomposes into two independent pieces:
P(Toothache, Catch, Cavity, Weather) = P(Toothache, Catch, Cavity)P(Weather)
The number of entries is reduced from 32 to 12. Absolute independence is powerful but very rare; dentistry is a large field.

Conditional Independence [1]

If a patient has a cavity, the probability that the probe catches in it does not depend on whether the patient has a toothache:
P(catch|toothache, cavity) = P(catch|cavity)
The same independence holds if the patient does not have a cavity:
P(catch|toothache, ¬cavity) = P(catch|¬cavity)
We say that Catch is conditionally independent of Toothache given Cavity:
P(Catch|Toothache, Cavity) = P(Catch|Cavity)

Conditional Independence [2]

Two variables X and Y are conditionally independent given Z iff
P(X|Y, Z) = P(X|Z)
P(Y|X, Z) = P(Y|Z)
P(X, Y|Z) = P(X|Z)P(Y|Z)

Conditional Independence [3]

The full joint distribution P(Toothache, Catch, Cavity) has 2³ − 1 = 7 independent entries. We can write the distribution using the chain rule:
P(Toothache, Catch, Cavity) = P(Toothache|Catch, Cavity)P(Catch, Cavity)
                            = P(Toothache|Catch, Cavity)P(Catch|Cavity)P(Cavity)
                            = P(Toothache|Cavity)P(Catch|Cavity)P(Cavity)
This requires only 2 + 2 + 1 = 5 independent entries.
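Both inference by enumeration (P(X|e) = α Σ_y P(X, e, y)) and the conditional-independence claims above can be verified numerically against the toothache–catch–cavity joint distribution; a minimal sketch:

```python
# Full joint P(Toothache, Catch, Cavity) from the enumeration slides,
# keyed by (toothache, catch, cavity).
joint = {
    (True,  True,  True):  0.108, (True,  False, True):  0.012,
    (False, True,  True):  0.072, (False, False, True):  0.008,
    (True,  True,  False): 0.016, (True,  False, False): 0.064,
    (False, True,  False): 0.144, (False, False, False): 0.576,
}

def p(phi):
    """P(phi): sum the atomic events (worlds) where the proposition holds."""
    return sum(v for world, v in joint.items() if phi(world))

def p_given(a, b):
    """P(a | b) = P(a and b) / P(b)."""
    return p(lambda w: a(w) and b(w)) / p(b)

toothache = lambda w: w[0]
catch = lambda w: w[1]
cavity = lambda w: w[2]

print(round(p(toothache), 3))                # 0.2
print(round(p_given(cavity, toothache), 3))  # 0.6, so P(Cavity|toothache) = <0.6, 0.4>
# Conditional independence: P(catch|toothache, cavity) = P(catch|cavity) = 0.9
print(round(p_given(catch, lambda w: toothache(w) and cavity(w)), 3))
print(round(p_given(catch, cavity), 3))
```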
Knowing P(toothache|cavity) and P(toothache|¬cavity) is enough to specify P(Toothache|Cavity), and so on.

Bayes' Rule

From the product rule, P(a ∧ b) = P(a|b)P(b) = P(b|a)P(a). Thus we have
P(b|a) = P(a|b)P(b)/P(a)
This is known as Bayes' rule. In the more general case of distributions over random variables:
P(Y|X) = P(X|Y)P(Y)/P(X)

Bayes' Rule: Normalization

Bayes' rule can be written
P(Y|X) = P(X|Y)P(Y) / Σ_i P(X|Y = y_i)P(Y = y_i)
or, more generally,
P(Y|X) = αP(X|Y)P(Y)

Applying Bayes' Rule: Example [1]

A doctor knows that the disease meningitis causes the patient to have a stiff neck 50% of the time. The doctor also knows some unconditional facts: the prior probability that a patient has meningitis is 1/50000, and the prior probability that any patient has a stiff neck is 1/20. What is the probability that a patient who has a stiff neck has meningitis?

Applying Bayes' Rule: Example [2]

Let s be the proposition that the patient has a stiff neck, and m the proposition that the patient has meningitis. We have
P(s|m) = 0.5
P(m) = 1/50000
P(s) = 1/20
Thus,
P(m|s) = P(s|m)P(m)/P(s) = (0.5 × 1/50000)/(1/20) = 0.0002

Applying Bayes' Rule: Example [3]

Using Bayes' rule and normalization, we have
P(M|s) = α⟨P(s|m)P(m), P(s|¬m)P(¬m)⟩
In general, P(Y|X) = αP(X|Y)P(Y).

Combining Evidence: Example [1]

When we want to combine several pieces of evidence:
P(Cv|T, Ct) = P(T, Ct|Cv)P(Cv)/P(T, Ct)
However, Catch (Ct) and Toothache (T) are conditionally independent given Cavity (Cv), so
P(Cv|T, Ct) = P(T|Cv)P(Ct|Cv)P(Cv)/P(T, Ct)
and we can combine each piece of evidence sequentially.

Naïve Bayes Model

P(Cause, Effect_1, ..., Effect_n) = P(Cause) Π_i P(Effect_i|Cause)
(diagram: a Cause node with children Effect_1, Effect_2, ..., Effect_n)

Exercises

1. After your yearly checkup, the doctor has bad news and good news. The bad news is that you tested positive for a serious disease and that the test is 99% accurate. The good news is that this is a rare disease, striking only 1 in 10,000 people of your age. What are the chances that you actually have the disease? (AIMA Exercise 13.8)
2. In a dishonest casino, one die out of 100 dice is loaded so that 6 comes up 50% of the time. If someone rolls three 6's in a row, what is the probability that the die is loaded? (http://www.mscs.mu.edu/~cstruble/class/cosc159/spring2003/notes/)

Application to Data Mining [1]

The classification problem aims to predict the class of an object from a given set of evidence. For example, a credit card company wants to predict customers' credit risk (e.g., low or high) from their applications, which consist of several attributes. We need to compute P(Class|E = e), where Class ranges over the set of classes and E is the evidence, and then select the class c with the highest probability:
c_predicted = argmax_c P(Class = c|E = e)
            = argmax_c P(E = e|Class = c)P(Class = c)/P(E = e)
            = argmax_c P(E = e|Class = c)P(Class = c)

Application to Data Mining [2]

By gathering statistical data, we can estimate the probability of the evidence in each class, i.e., we know P(E = e|Class = c) and P(Class = c). Generally, the evidence consists of several attributes: E = ⟨E_1, E_2, E_3, ...⟩. To simplify computation, the attributes of E are assumed to be conditionally independent given the class. Thus
P(E = e|Class = c) = Π_{i=1}^{k} P(E_i = e_i|Class = c)
This data mining technique is called "Naïve Bayesian Classification".

Naïve Bayesian Classification: Example [1]

ID  Credit History  Debt  Collateral  Income  Credit Risk?
 1  bad             high  none        0-15k   high
 2  unknown         high  none        15-35k  high
 3  unknown         low   none        15-35k  moderate
 4  unknown         low   none        0-15k   high
 5  unknown         low   none        >35k    low
 6  unknown         low   adequate    >35k    low
 7  bad             low   none        0-15k   high
 8  bad             low   adequate    >35k    moderate
 9  good            low   none        >35k    low
10  good            high  adequate    >35k    low
11  good            high  none        0-15k   high
12  good            high  none        15-35k  moderate
13  good            high  none        >35k    low
14  bad             high  none        15-35k  high

Source: http://www.mscs.mu.edu/~cstruble/class/cosc159/spring2003/notes/

Naïve Bayesian Classification: Example [2]

What is the credit risk of the following customer?
e = ⟨bad, low, none, 0-15k⟩
To predict the credit risk, we select the most suitable class:
c_predicted = argmax_{c ∈ {low, moderate, high}} P(E = e|Class = c)P(Class = c)

Naïve Bayesian Classification: Example [3]

Since there are three classes, we compute three cases.
1. Case Class = low:
P(E = e|Class = low)P(Class = low)
  = (Π_i P(E_i = e_i|Class = low)) P(Class = low)
  = P(History = bad|low) × P(Debt = low|low) × P(Collateral = none|low) × P(Income = 0-15k|low) × P(Class = low)
  ≈ 0/5 × 3/5 × 3/5 × 0/5 × 5/14 = 0.0

Naïve Bayesian Classification: Example [4]

2. Case Class = moderate:
P(E = e|Class = moderate)P(Class = moderate) ≈ 1/3 × 2/3 × 2/3 × 0/3 × 3/14 = 0.0
3. Case Class = high:
P(E = e|Class = high)P(Class = high) ≈ 3/6 × 2/6 × 6/6 × 4/6 × 6/14 ≈ 0.05
Thus, argmax_c P(E = e|Class = c)P(Class = c) = high.
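The worked example can be reproduced end to end; the sketch below re-derives the frequency estimates from the 14 records above (maximum-likelihood counts, no smoothing), so a zero count zeroes out a class exactly as in cases 1 and 2:

```python
from collections import Counter

# The 14 training records: (credit history, debt, collateral, income) -> risk.
data = [
    (("bad", "high", "none", "0-15k"), "high"),
    (("unknown", "high", "none", "15-35k"), "high"),
    (("unknown", "low", "none", "15-35k"), "moderate"),
    (("unknown", "low", "none", "0-15k"), "high"),
    (("unknown", "low", "none", ">35k"), "low"),
    (("unknown", "low", "adequate", ">35k"), "low"),
    (("bad", "low", "none", "0-15k"), "high"),
    (("bad", "low", "adequate", ">35k"), "moderate"),
    (("good", "low", "none", ">35k"), "low"),
    (("good", "high", "adequate", ">35k"), "low"),
    (("good", "high", "none", "0-15k"), "high"),
    (("good", "high", "none", "15-35k"), "moderate"),
    (("good", "high", "none", ">35k"), "low"),
    (("bad", "high", "none", "15-35k"), "high"),
]

class_counts = Counter(c for _, c in data)

def score(e, c):
    """P(E = e | Class = c) P(Class = c), with naive (conditionally
    independent) frequency estimates for each attribute."""
    members = [x for x, cls in data if cls == c]
    s = class_counts[c] / len(data)  # prior P(Class = c)
    for i, value in enumerate(e):
        s *= sum(1 for x in members if x[i] == value) / len(members)
    return s

e = ("bad", "low", "none", "0-15k")
scores = {c: score(e, c) for c in class_counts}
print(max(scores, key=scores.get))  # high
print(round(scores["high"], 3))     # 0.048, matching 3/6 * 2/6 * 6/6 * 4/6 * 6/14
```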
