Uncertainty in AI (Preliminary to Bayesian Network)
CS570 Lecture Notes by Jin Hyung Kim
Computer Science Department, KAIST

Uncertainty
  Motivation for uncertainty modeling
  - Characteristics of real-world applications
  - The truth value is unknown
  - Too complex to compute before a decision must be made

Source of Uncertainty
  - Uncertainty arises because of both laziness and ignorance. It is inescapable in complex, dynamic, or inaccessible worlds.
  - Cannot be explained by a deterministic model
      Ex: decay of radioactive substances
  - Not well understood
      Ex: disease transmission mechanisms
  - Partial information
  - Too complex to compute, but the detail is not needed
      Ex: coin tossing

Types of Uncertainty
  - Randomness: Which side will be up if I toss a coin?
  - Vagueness: Am I pretty?
  - Confidence: How confident are you in your decision?
  - One formalism for all vs. separate formalisms
  - Representation + computational engine
  - Combining several results into one

Uncertainty Representation
  - Binary logic
  - Multi-valued logic
  - Probability theory
  - Upper/lower probability
  - Possibility theory
  - Most likely, optimistic, and pessimistic estimates in PERT

Applications
  - Decision making under uncertainty: the rational agent
  - Useful answers from uncertain, conflicting knowledge
      acquiring qualitative and quantitative relationships
      data fusion
      aggregating multiple experts' opinions
  - Wide range of applications
      disease diagnosis
      language understanding
      pattern recognition
      managerial decision making

Handling Uncertain Knowledge
  Diagnostic rule
    ∀p Symptom(p, Toothache) ⇒ Disease(p, Cavity)
    ∀p Symptom(p, Toothache) ⇒ Disease(p, Cavity) ∨ Disease(p, GumDisease) ∨ Disease(p, ImpactedWisdom) ∨ …
    Pr(Symptom | Disease)
  Causal rule
    ∀p Disease(p, Cavity) ⇒ Symptom(p, Toothache)
    Not every cavity causes a toothache
    Pr(Disease | Symptom)

Why First-order Logic Fails?
  - Laziness: it is too much work to prepare a complete set of exceptionless rules, and too hard to use such an enormous rule set
  - Theoretical ignorance: medical science has no complete theory for the domain
  - Practical ignorance: not all the necessary tests can be run, even if we know all the rules

Degree of Belief
  - An agent can provide a degree of belief for a sentence
  - Main tool: probability theory
      assigns a numerical degree of belief between 0 and 1 to sentences
      a way of summarizing the uncertainty that comes from laziness and ignorance
  - Probabilities can be derived from statistical data

Degree of Belief vs. Degree of Truth
  Degree of belief
  - The sentence itself is in fact either true or false
  - Same ontological commitment as logic: the facts either do or do not hold in the world
  - Probability theory
  Degree of truth (membership)
  - Not a question about the external world
  - A case of vagueness or uncertainty about the meaning of linguistic terms such as "tall" or "pretty"
  - Fuzzy set theory, fuzzy logic

Probabilistic Reasoning System
  - Assigns a probability to a proposition based on the percepts the agent has received to date
  - Evidence: the percepts that an agent receives
  - Probabilities can change when more evidence is acquired
  - Prior / unconditional probability: before any evidence is obtained
  - Posterior / conditional probability: after evidence is obtained

Uncertainty and Rational Decisions
  - No plan can be guaranteed to achieve the goal
  - To make a choice, an agent must have preferences between the different possible outcomes of the various plans
      Ex: missing the plane vs. a long wait
  - Utility theory is used to represent and reason with preferences
  - Utility: the quality of being useful (degree of usefulness)

Principle of Maximum Expected Utility
  - Decision theory = probability theory + utility theory
  - An agent is rational if and only if it chooses the action that yields the highest expected utility, averaged over all possible outcomes of the action

Prior Probability
  - P(A): unconditional or prior probability that the proposition A is true
  - No other information about the proposition is available
      P(Cavity) = 0.1
  - A proposition can include an equality over a random variable
      P(Weather = Sunny) = 0.7, P(Weather = Rain) = 0.2
      P(Weather = Cloudy) = 0.08, P(Weather = Snow) = 0.02
  - Each random variable X has a domain of possible values <x1, x2, …, xn>

Conditional Probability
  - As soon as evidence is obtained concerning the previously unknown propositions making up the domain, prior probabilities are no longer applicable
  - We use conditional or posterior probabilities instead
  - P(A|B): probability of A given that all we know is B
      P(Cavity|Toothache) = 0.8
  - When new information C becomes known, use P(A|B, C)

Joint Probability Distribution as a Knowledge Base
  - Completely assigns probabilities to all propositions in the domain
  - The joint probability distribution P(X1, X2, …, Xn) assigns probabilities to all possible atomic events
  - The probability of any event can be derived from it
  - A large table or high-dimensional function
  - Difficult to obtain and difficult to maintain
  - Approximation by lower-order probabilities

Where Do Probabilities Come From?
  - Frequentist: the numbers can come only from experiments
  - Objectivist: probabilities are real aspects of the universe (propensities of objects to behave in certain ways)
  - Subjectivist: probabilities are a way of characterizing an agent's beliefs, rather than having any external physical significance
  - Elicitation from human experts
      the human as a probability transducer
      are humans good at it?
  - Endless debate over the source and status of probability numbers

Where Do Probabilities Come From?
  Probability that the sun will still exist tomorrow (a question raised by Hume's Inquiry)
  - The probability is undefined, because there has never been an experiment that tested the existence of the sun tomorrow.
  - The probability is 1, because in all the experiments that have been done (on past days) the sun has existed.
  - The probability is 1 - ε, where ε is the proportion of stars in the universe that go supernova and explode per day.
  - The probability is (d+1)/(d+2), where d is the number of days that the sun has existed so far. (Laplace)
  - The probability can be derived from the type, age, size, and temperature of the sun, even though we have never observed another star with those exact properties.

Probability Elicitation
  - Experiment: show {red, green, yellow} {square, triangle, circle}s many times in random sequence and ask for Pr(red | square)
  - Pr(a, b) vs. Pr(a) and Pr(b|a)
  - Pr(attribute | object) vs. Pr(object | attribute)
  - Pr(effect | cause) vs. Pr(cause | effect)
  - Pr(a) vs. Pr(a | b, c, d)
  - Humans are inconsistent, systematically biased, and anchored

Tversky's Legacy: scenarios by Tversky and Kahneman
  A taxi hit a pedestrian one night and fled the scene. The entire case against the taxi company rests on the evidence of one witness, an elderly man who saw the accident from his window some distance away. He says that he saw the pedestrian struck by a blue taxi. In trying to establish the case, the lawyer for the injured pedestrian establishes the following facts: there are only two taxi companies in town, 'Blue Cabs' and 'Black Cabs'. On the night in question, 85% of all taxis on the road were black and 15% were blue. The witness has demonstrated that he can successfully distinguish a blue taxi from a black taxi 80% of the time. If you were on the jury, how would you decide?

Opaque Urn
  [Figure: two opaque urns, A and B]
  After observing 4 black and 2 white balls, in that sequence, what is your belief that the balls came from urn A?
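The taxi scenario above can be worked through with Bayes' rule. The sketch below is only an illustration: it assumes the witness's 80% accuracy applies symmetrically (he calls a black taxi blue 20% of the time), which the scenario does not state explicitly, and the names `posterior`, `prior`, and `says_blue` are made up for this example.

```python
def posterior(prior, likelihood):
    """Bayes' rule over a finite set of hypotheses.

    prior[h]      = P(h)
    likelihood[h] = P(evidence | h)
    returns         P(h | evidence) for every h
    """
    joint = {h: prior[h] * likelihood[h] for h in prior}
    evidence = sum(joint.values())
    return {h: p / evidence for h, p in joint.items()}


# Base rates on the night in question (from the scenario).
prior = {"blue": 0.15, "black": 0.85}

# P(witness says "blue" | true colour).
# Assumption: the 80% accuracy is symmetric across both colours.
says_blue = {"blue": 0.80, "black": 0.20}

belief = posterior(prior, says_blue)
print(round(belief["blue"], 3))   # 0.414
print(round(belief["black"], 3))  # 0.586
```

Under that assumption the taxi is still more likely to have been black than blue, because the low base rate of blue taxis outweighs the witness's reliability. This is the kind of result on which intuitive human estimates tend to be biased.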
Approximating High-Order Probabilities
  The exact joint distribution factors by the chain rule:

  $$P(C_1, \ldots, C_K) = P(C_1)\, P(C_2 \mid C_1)\, P(C_3 \mid C_1 C_2) \cdots P(C_K \mid C_1 C_2 \cdots C_{K-1})$$

  It is approximated by a product of second-order terms:

  $$P_a(C_1, \ldots, C_K) = \prod_{j=1}^{K} P(C_j \mid C_{i(j)}), \quad \text{where } 0 \le i(j) < j$$

  with the convention that $P(C_j \mid C_0) = P(C_j)$.

  Example: $P(C_1)\, P(C_2 \mid C_1)\, P(C_3 \mid C_1) \cdots P(C_K \mid C_{K-1})$

  Directed tree representation of the product approximation
  - Root: the unconditioned variable
  - Directed arc A → B represents Pr(B|A)
  - There are many product approximations by second-order probabilities
  - Select the "best" among them

Preliminary of Chow Tree
  - Kullback-Leibler (KL) divergence measure

  $$D(P, P_a) = \sum_{C} P(C) \log \frac{P(C)}{P_a(C)} \quad (\ge 0)$$

  - Mutual information I between X and Y

  I(X;Y) = H(Y) - H(Y|X) = H(X) + H(Y) - H(X,Y)

  $$I(X;Y) = \sum_{x_i, y_j} p(x_i, y_j) \log \frac{p(x_i, y_j)}{p(x_i)\, p(y_j)}$$

Chow-Tree Algorithm

  $$
  \begin{aligned}
  D(P, P_a) &= \sum_{C} P(C) \log \frac{P(C)}{P_a(C)} = \sum_{C} P(C) \log \frac{P(C)}{\prod_{j=1}^{K} P(C_j \mid C_{i(j)})} \\
  &= -\sum_{j=1}^{K} \sum_{C} P(C) \log P(C_j \mid C_{i(j)}) + \sum_{C} P(C) \log P(C) \\
  &= -\sum_{j=1}^{K} \sum_{C} P(C) \log \frac{P(C_j \mid C_{i(j)})\, P(C_j)}{P(C_j)} - H(C) \\
  &= -\sum_{j=1}^{K} I(C_j; C_{i(j)}) + \sum_{j=1}^{K} H(C_j) - H(C)
  \end{aligned}
  $$

  The entropies H(C_j) and H(C) do not depend on the choice of tree, so minimizing D(P, P_a) is equivalent to maximizing the total mutual information along the links.
  - Weight each link with its mutual information
  - Select the maximum spanning tree as the best approximation

Chow-Tree Algorithm Example
  Approximate Pr(a | ~b, c, d) from the following pairwise distributions:

  P(A,B)     a       ~a
    b       8/32    12/32
    ~b      7/32     5/32

  P(A,C)     a       ~a
    c       8/32     9/32
    ~c      7/32     8/32

  P(A,D)     a       ~a
    d       5/32     7/32
    ~d     10/32    10/32

  P(B,C)     b       ~b
    c      11/32     6/32
    ~c      9/32     6/32

  P(B,D)     b       ~b
    d       7/32     5/32
    ~d     13/32     7/32

  P(C,D)     c       ~c
    d       4/32     8/32
    ~d     13/32     7/32

Other Uncertainty Formalisms
  - Certainty factors in rule-based (logical) systems
      Rule with a certainty factor: If A then B (cf), where cf lies in (0, 1)
      Interpreted as the added belief ratio if A is fully confirmed
      "smoke → cancer (0.7)" is not the same as "smoke → ~cancer (0.3)"
  - Representing ignorance: Dempster-Shafer theory
      Confidence as a probability interval
  - Representation of vagueness: fuzzy sets

Representing Vagueness: Fuzzy Membership
  Is the character 한 or 힐?
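Returning to the Chow-tree example above, the sketch below weights each link with its mutual information, picks a maximum spanning tree with Kruskal's method, and then evaluates Pr(a | ~b, c, d) under the resulting tree approximation. It is an illustration only: the reading of the tables (rows are the second variable, columns the first, `True` meaning the positive literal) and all names (`counts`, `pair_prob`, `marginal`, `mutual_information`, `chow_tree`, `tree_joint`) are assumptions made for this example.

```python
from itertools import product
from math import log

# Pairwise joints in 32nds: counts[(X, Y)][(x, y)]; True = positive literal.
counts = {
    ("A", "B"): {(True, True): 8, (False, True): 12, (True, False): 7, (False, False): 5},
    ("A", "C"): {(True, True): 8, (False, True): 9, (True, False): 7, (False, False): 8},
    ("A", "D"): {(True, True): 5, (False, True): 7, (True, False): 10, (False, False): 10},
    ("B", "C"): {(True, True): 11, (False, True): 6, (True, False): 9, (False, False): 6},
    ("B", "D"): {(True, True): 7, (False, True): 5, (True, False): 13, (False, False): 7},
    ("C", "D"): {(True, True): 4, (False, True): 8, (True, False): 13, (False, False): 7},
}
joint = {pair: {k: n / 32 for k, n in tbl.items()} for pair, tbl in counts.items()}


def pair_prob(u, u_val, v, v_val):
    """P(u = u_val, v = v_val), looked up in whichever orientation is stored."""
    if (u, v) in joint:
        return joint[(u, v)][(u_val, v_val)]
    return joint[(v, u)][(v_val, u_val)]


def marginal(var, value):
    """P(var = value), obtained by summing out a partner variable."""
    other = "B" if var == "A" else "A"
    return sum(pair_prob(var, value, other, o) for o in (True, False))


def mutual_information(u, v):
    """I(u; v) in nats, from the pairwise joint and its marginals."""
    return sum(
        pair_prob(u, x, v, y) * log(pair_prob(u, x, v, y) / (marginal(u, x) * marginal(v, y)))
        for x, y in product((True, False), repeat=2)
    )


def chow_tree(variables):
    """Maximum spanning tree over mutual-information weights (Kruskal)."""
    edges = sorted(joint, key=lambda e: mutual_information(*e), reverse=True)
    parent = {v: v for v in variables}

    def find(v):
        while parent[v] != v:
            v = parent[v]
        return v

    tree = []
    for u, v in edges:
        root_u, root_v = find(u), find(v)
        if root_u != root_v:  # the link joins two components, so no cycle
            parent[root_u] = root_v
            tree.append((u, v))
    return tree


def tree_joint(assign, tree):
    """P_a(assign) = prod of edge joints / prod of marginals^(degree - 1)."""
    prob = 1.0
    degree = {v: 0 for v in assign}
    for u, v in tree:
        prob *= pair_prob(u, assign[u], v, assign[v])
        degree[u] += 1
        degree[v] += 1
    for v, d in degree.items():
        prob /= marginal(v, assign[v]) ** (d - 1)
    return prob


tree = chow_tree("ABCD")
evidence = {"B": False, "C": True, "D": True}
num = tree_joint({"A": True, **evidence}, tree)
den = num + tree_joint({"A": False, **evidence}, tree)
print(sorted(tree), num / den)
```

Under this reading of the tables, the selected links are A-B, A-D, and C-D, and the approximated Pr(a | ~b, c, d) comes out to 17/32. The product-over-edges form used in `tree_joint` is equivalent to rooting the tree at any variable and multiplying P(root) by P(child | parent) along each arc.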
  Examples of vague terms: pretty girl, old man, very old man

Fuzzy Sets
  - Sets with fuzzy boundaries
  - A = set of tall people
  [Figure: crisp set A vs. fuzzy set A. Membership function plotted against height, with 5'10'' and 6'2'' marked on the height axis and membership values 1.0, .9, and .5 on the vertical axis]

Membership Functions (MFs)
  - Characteristics of MFs:
      subjective measures
      not probability functions
  [Figure: MFs for "tall" in Asia, "tall" in the US, and "tall" in the NBA, with memberships .8, .5, and .1 marked at a height of 5'10'']
  Excerpted from J.-S. Roger Jang (張智星), CS Dept., Tsing Hua Univ., Taiwan (applies to this and the remaining slides)

Fuzzy Sets
  Formal definition: a fuzzy set A in X is expressed as a set of ordered pairs:

  $$A = \{(x, \mu_A(x)) \mid x \in X\}$$

  where A is the fuzzy set, $\mu_A$ is the membership function (MF), and X is the universe or universe of discourse.
  A fuzzy set is totally characterized by a membership function (MF).

Fuzzy Sets with Cont. Universes
  - Fuzzy set B = "about 50 years old"
  - X = set of positive real numbers (continuous)
  - $B = \{(x, \mu_B(x)) \mid x \in X\}$

  $$\mu_B(x) = \frac{1}{1 + \left(\dfrac{x - 50}{10}\right)^2}$$

Fuzzy Partition
  Fuzzy partitions formed by the linguistic values "young", "middle aged", and "old"
  [Figure: lingmf.m]

MF Terminology
  [Figure: an MF over X with membership levels 1, .5, α, and 0, marking the core, the crossover points, the α-cut, and the support]

Set-Theoretic Operations
  [Figures: subset.m, fuzsetop.m]
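As a small illustration of the membership-function slides, the sketch below evaluates the "about 50 years old" MF and computes an α-cut over a sampled universe. The function names (`about_50`, `alpha_cut`, `fuzzy_not`, `fuzzy_and`, `fuzzy_or`) are made up for this example, and the set operations use the common min/max/complement definitions, which is an assumption because the slides only reference the figures subset.m and fuzsetop.m.

```python
def about_50(x):
    """Membership of age x in the fuzzy set B = "about 50 years old"."""
    return 1.0 / (1.0 + ((x - 50.0) / 10.0) ** 2)


def alpha_cut(mf, universe, alpha):
    """Elements of a sampled universe whose membership is at least alpha."""
    return [x for x in universe if mf(x) >= alpha]


# Assumed operators: complement 1 - mu, intersection min, union max.
def fuzzy_not(mf):
    return lambda x: 1.0 - mf(x)


def fuzzy_and(mf1, mf2):
    return lambda x: min(mf1(x), mf2(x))


def fuzzy_or(mf1, mf2):
    return lambda x: max(mf1(x), mf2(x))


ages = range(0, 101, 10)                    # sampled universe: 0, 10, ..., 100
half_cut = alpha_cut(about_50, ages, 0.5)   # ages with membership >= 0.5
not_about_50 = fuzzy_not(about_50)

print(about_50(50))      # 1.0 : age 50 lies in the core
print(about_50(40))      # 0.5 : a crossover point
print(half_cut)          # [40, 50, 60]
print(not_about_50(50))  # 0.0
```

The crossover points at 40 and 60 follow directly from the formula, since the membership equals 0.5 exactly when |x - 50| = 10. Unlike a probability density, the MF values over the universe need not sum or integrate to 1.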