Uncertainty in AI

- Introduction
- Basic Probability Theory
- Probabilistic Reasoning
- Why should we use probability theory? The Dutch Book Theorem

Sources of Uncertainty
- Information is partial.
- Information is not fully reliable.
- The representation language is inherently imprecise.
- Information comes from multiple sources, and it may be conflicting.
- Information is approximate.
- Non-absolute cause-effect relationships exist.

Basic Probability
Probability theory enables us to make rational decisions. Which mode of transportation is safer: car or plane? What is the probability of an accident?

Basic Probability Theory
- An experiment has a set of potential outcomes, e.g., throwing a die.
- The sample space of an experiment is the set of all possible outcomes, e.g., {1, 2, 3, 4, 5, 6}.
- An event is a subset of the sample space, e.g.:
  {3, 6}
  even = {2, 4, 6}
  odd = {1, 3, 5}

Probability as Relative Frequency
An event has a probability. Consider a long sequence of experiments. If we look at the number of times a particular event occurs in that sequence, and compare it to the total number of experiments, we can compute a ratio. This ratio is one way of estimating the probability of the event. P(E) = (# of times E occurred)/(total # of trials)


Example: 100 attempts are made to swim a length in 30 seconds. The swimmer succeeds on 20 occasions, so the probability that the swimmer completes the length in 30 seconds is 20/100 = 0.2, and the probability of failure is 1 − 0.2 = 0.8.

The experiments, the sample space and the events must be defined clearly for probability to be meaningful
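The relative-frequency estimate can be sketched in a few lines of Python. The helper name `relative_frequency` is an illustrative choice and does not come from the slides. The swimmer numbers are the ones from the example above. The die throws are simulated with a fixed seed, under the assumption of a fair die, so the run is reproducible.

```python
import random

def relative_frequency(outcomes, event):
    """Estimate P(event) as (# of trials whose outcome is in event) / (total # of trials)."""
    outcomes = list(outcomes)
    hits = sum(1 for o in outcomes if o in event)
    return hits / len(outcomes)

# Swimmer example: 20 successes out of 100 attempts.
attempts = ["success"] * 20 + ["failure"] * 80
p_success = relative_frequency(attempts, {"success"})
print(p_success, 1 - p_success)  # 0.2 0.8

# Simulated fair die (an assumption): estimate P(even) from 100,000 throws.
rng = random.Random(42)
throws = [rng.randint(1, 6) for _ in range(100_000)]
print(relative_frequency(throws, {2, 4, 6}))  # close to 0.5
```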

What is the probability of an accident?

Theoretical Probability
Principle of Indifference: alternatives are always to be judged equiprobable if we have no reason to expect or prefer one over the other. Each outcome in the sample space is assigned equal probability. Example: throwing a die,

P({1})=P({2})= ... =P({6})=1/6
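Under the principle of indifference, the probability of an event is the number of outcomes it contains divided by the size of the sample space. The sketch below applies this rule to the die events defined earlier. It uses exact fractions, and the function name `classical_probability` is hypothetical.

```python
from fractions import Fraction

S = frozenset(range(1, 7))      # sample space of one die throw
even = frozenset({2, 4, 6})
odd = frozenset({1, 3, 5})

def classical_probability(event, sample_space=S):
    """P(E) = |E| / |S|, valid only when every outcome is judged equiprobable."""
    if not event <= sample_space:
        raise ValueError("an event must be a subset of the sample space")
    return Fraction(len(event), len(sample_space))

print(classical_probability(frozenset({1})))                    # 1/6
print(classical_probability(frozenset({3, 6})))                 # 1/3
print(classical_probability(even), classical_probability(odd))  # 1/2 1/2
```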

Law of Large Numbers
As the number of experiments increases, the relative frequency of an event more closely approximates the theoretical probability of the event, provided the theoretical assumptions hold.

Example: draw parallel lines 1 inch apart on a plane and throw a 1-inch needle onto the plane. Then P(needle crosses a line) = 2/π.

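Buffon’s Needle for Computing π
Since the fraction of throws in which the needle crosses a line approaches 2/π,

$$\pi \approx \frac{2 \times \text{number of throws}}{\text{number of crossings}}$$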
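Below is a minimal Monte Carlo sketch of Buffon's needle, assuming a needle of length 1 and lines 1 apart, as in the example. One design choice is ours and not from the slides: the needle's direction is sampled by rejection from the unit disc, so the simulation never uses π internally. The function name `buffon_pi` is hypothetical.

```python
import random

def buffon_pi(throws: int, seed: int = 0) -> float:
    """Estimate pi with Buffon's needle: needle length 1, lines 1 apart."""
    rng = random.Random(seed)
    crossings = 0
    for _ in range(throws):
        # Distance from the needle's centre to the nearest line: uniform on [0, 1/2].
        d = rng.uniform(0.0, 0.5)
        # Random needle direction without using pi:
        # rejection-sample a point uniformly in the unit disc and use its angle.
        while True:
            x, y = rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)
            r2 = x * x + y * y
            if 0.0 < r2 <= 1.0:
                break
        sin_theta = abs(y) / r2 ** 0.5  # |sin| of the angle between needle and lines
        # The needle crosses a line when half its projected length reaches the line.
        if d <= 0.5 * sin_theta:
            crossings += 1
    if crossings == 0:
        raise ValueError("no crossings observed; use more throws")
    return 2 * throws / crossings

print(buffon_pi(100_000))  # close to 3.14
```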

Large Numbers Reveal Untruth in Assumptions
Results of 1,000,000 throws of a die:

Number    1     2     3     4     5     6
Fraction  .155  .159  .164  .169  .174  .179
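One way to see why these numbers contradict the assumption of a fair die is to measure each observed fraction in standard errors. The sketch below uses the normal approximation to a binomial proportion, a technique the slides do not name, and assumes the throws are independent. Under a fair die, a deviation beyond about 3 standard errors would be very unlikely. Here every face is more than 6 standard errors away from 1/6.

```python
import math

N = 1_000_000
OBSERVED = {1: .155, 2: .159, 3: .164, 4: .169, 5: .174, 6: .179}  # fractions from the table
P_FAIR = 1 / 6

# Standard error of a relative frequency over N independent throws of a fair die.
se = math.sqrt(P_FAIR * (1 - P_FAIR) / N)
z_scores = {face: (f - P_FAIR) / se for face, f in OBSERVED.items()}
for face, z in z_scores.items():
    print(face, round(z, 1))
```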

Axioms of Probability Theory
Suppose P(.) is a probability function, then
1. For any event E, 0 ≤ P(E) ≤ 1.
2. P(S) = 1, where S is the sample space.
3. For any two mutually exclusive events E1 and E2, P(E1 ∪ E2) = P(E1) + P(E2).

Any function that satisfies the above three axioms is a probability function.

Joint Probability
Let A and B be two events. The joint probability that both A and B are true is denoted by P(A, B). Example: P(spade) is the probability that the top card is a spade. P(king) is the probability that the top card is a king. P(spade, king) is the probability that the top card is both a spade and a king, i.e., the king of spades. Is P(king, spade) = P(spade, king)?
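The card example can be checked by enumerating a standard 52-card deck, assuming every card is equally likely to be on top. The predicate names are illustrative. Because "spade and king" and "king and spade" describe the same set of cards, the two joint probabilities coincide.

```python
from fractions import Fraction
from itertools import product

RANKS = ["A"] + [str(n) for n in range(2, 11)] + ["J", "Q", "K"]
SUITS = ["spades", "hearts", "diamonds", "clubs"]
DECK = list(product(RANKS, SUITS))  # 52 equally likely top cards

def prob(pred):
    """Fraction of the deck for which pred(card) holds."""
    return Fraction(sum(1 for card in DECK if pred(card)), len(DECK))

def is_spade(card):
    return card[1] == "spades"

def is_king(card):
    return card[0] == "K"

p_spade = prob(is_spade)                                   # 1/4
p_king = prob(is_king)                                     # 1/13
p_spade_king = prob(lambda c: is_spade(c) and is_king(c))  # 1/52
p_king_spade = prob(lambda c: is_king(c) and is_spade(c))  # the same event
print(p_spade, p_king, p_spade_king, p_king_spade == p_spade_king)
```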

Properties of Probability
1. P(E) = 1– P(E)
2. If E1 and E2 are logically equivalent, then P(E1)=P(E2).
 

E1: Not all philosophers are more than six feet tall. E2: Some philosopher is not more that six feet tall. Then P(E1)=P(E2).

3. P(E1, E2)≤P(E1).

Conditional Probability
The probability of an event may change after knowing another event.
The probability of A given B is denoted by P(A|B). Examples:
- P(W = space) is the probability that a randomly selected word from an English text is ‘space’.
- P(W = space | W’ = outer) is the probability that the word is ‘space’ given that the previous word is ‘outer’.


A: the top card of a deck of poker cards is the king of spades. P(A) = 1/52.

However, if we know
B: the top card is a king,

then the probability of A given that B is true is
P(A|B) = 1/4.

How to Compute P(A|B)?

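$$P(A \mid B) = \frac{N(A \text{ and } B)}{N(B)} = \frac{N(A \text{ and } B)/N}{N(B)/N} = \frac{P(A, B)}{P(B)}$$

where N is the total number of trials.

Example:
$$P(\text{brown} \mid \text{cow}) = \frac{N(\text{brown-cows})}{N(\text{cows})} = \frac{P(\text{brown-cow})}{P(\text{cow})}$$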

Business Students
Of 100 students completing a course, 20 were business majors. Ten students received As in the course, and three of these were business majors. Suppose A is the event that a randomly selected student got an A in the course, and B is the event that a randomly selected student is a business major. What is the probability of A? What is the probability of A after knowing that B is true?

        A    Total
B       3    20
not B   7    80
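Both questions can be answered by counting, using P(A|B) = P(A, B)/P(B). The sketch below stores the four cells of the table. The "not A" counts (20 − 3 = 17 and 80 − 7 = 73) are derived from the totals and are not printed on the slide.

```python
from fractions import Fraction

# (business major?, got an A?) -> number of students
COUNTS = {
    (True, True): 3, (True, False): 17,
    (False, True): 7, (False, False): 73,
}
TOTAL = sum(COUNTS.values())  # 100

def p(pred):
    """Probability that a randomly selected student satisfies pred(is_business, got_a)."""
    return Fraction(sum(n for key, n in COUNTS.items() if pred(*key)), TOTAL)

p_a = p(lambda b, a: a)              # 10/100 = 1/10
p_b = p(lambda b, a: b)              # 20/100 = 1/5
p_a_and_b = p(lambda b, a: a and b)  # 3/100
p_a_given_b = p_a_and_b / p_b        # P(A|B) = P(A, B) / P(B) = 3/20
print(p_a, p_a_given_b)              # 1/10 3/20
```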

Probabilistic Reasoning

What we know about a situation is the evidence; what we want to conclude is the hypothesis:
P( Hypothesis | Evidence )



Credit Card Authorization
E is the data about the applicant's age, job, education, income, credit history, etc. H is the hypothesis that the credit card will provide a positive return. The decision whether to issue the credit card to the applicant is based on the probability P(H|E).

Medical Diagnosis
E is a set of symptoms, such as, coughing, sneezing, headache, ...

H is a disorder, e.g., common cold, SARS, flu.
The diagnosis problem is to find an H (disorder) such that P(H|E) is maximized.

Linda is 31 years old, single, outspoken, and very bright. She majored in philosophy. As a student, she was deeply concerned with issues of discrimination and social justice, and also participated in antinuclear demonstrations. Please rank the following statements by their probability, using 1 for the most probable and 8 for the least probable.
a. Linda is a teacher in elementary school.
b. Linda works in a bookstore and takes yoga classes.
c. Linda is active in the feminist movement.
d. Linda is a psychiatric social worker.
e. Linda is a member of the League of Women Voters.
f. Linda is a bank teller.
g. Linda is an insurance salesperson.
h. Linda is a bank teller and is active in the feminist movement.

A patient takes a lab test for a certain cancer, and the result comes back positive. The test has a false negative rate of 2% and a false positive rate of 3%. Furthermore, 0.8% of the entire population has this cancer.

What is the probability of cancer if we know the test result is positive?

Bayes Theorem
If P(E2) > 0, then P(E1|E2) = P(E2|E1)P(E1)/P(E2).
This follows from the definition of conditional probability: P(E1, E2) = P(E1|E2)P(E2) = P(E2|E1)P(E1), and dividing by P(E2) gives the theorem.
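Bayes' theorem answers the lab-test question from earlier. The sketch reads the false negative rate as P(negative | cancer) = 0.02, so P(positive | cancer) = 0.98. It reads the false positive rate as P(positive | no cancer) = 0.03. P(positive) comes from the law of total probability. The function name `posterior` is illustrative.

```python
def posterior(prior, p_pos_given_h, p_pos_given_not_h):
    """P(H | positive) by Bayes' theorem; P(positive) by the law of total probability."""
    p_pos = p_pos_given_h * prior + p_pos_given_not_h * (1 - prior)
    return p_pos_given_h * prior / p_pos

prior = 0.008            # 0.8% of the population has the cancer
sensitivity = 1 - 0.02   # P(+ | cancer), from the 2% false negative rate
false_pos = 0.03         # P(+ | no cancer), the 3% false positive rate
print(round(posterior(prior, sensitivity, false_pos), 4))  # 0.2085
```

Even after a positive result, the probability of cancer is only about 21%. This is because healthy people vastly outnumber patients, so most positive results come from the 3% false positives.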

The Three-Card Problem
Three cards are in a hat. One is red on both sides (the red-red card). One is white on both sides (the white-white card). One is red on one side and white on the other (the red-white card). A single card is drawn randomly and tossed into the air.
a. What is the probability that the red-red card was drawn? (RR)
b. What is the probability that the drawn card lands with a white side up? (W-up)
c. What is the probability that the red-red card was not drawn, assuming that the drawn card lands with a red side up? (not-RR|R-up)
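The three questions can be answered exactly by listing the six equally likely (card, side facing up) outcomes, assuming each card and each side is equally likely. The enumeration gives P(not-RR|R-up) = 1/3, not the 1/2 that many people guess. Two of the three red faces belong to the red-red card.

```python
from fractions import Fraction

CARDS = {"RR": ("R", "R"), "WW": ("W", "W"), "RW": ("R", "W")}
# Each (card, face up) pair has probability 1/3 * 1/2 = 1/6.
OUTCOMES = [(card, sides[i]) for card, sides in CARDS.items() for i in (0, 1)]

def prob(pred):
    return Fraction(sum(1 for o in OUTCOMES if pred(o)), len(OUTCOMES))

p_rr = prob(lambda o: o[0] == "RR")
p_w_up = prob(lambda o: o[1] == "W")
p_r_up = prob(lambda o: o[1] == "R")
p_not_rr_and_r_up = prob(lambda o: o[0] != "RR" and o[1] == "R")
p_not_rr_given_r_up = p_not_rr_and_r_up / p_r_up
print(p_rr, p_w_up, p_not_rr_given_r_up)  # 1/3 1/2 1/3
```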

Fair Bets
A bet is fair to an individual I if, according to I's probability assessment, the bet will break even in the long run. The following three bets are fair to you, given the beliefs shown in brackets:
Bet (a): Win $4.20 if RR; lose $2.10 otherwise. [since you believe P(RR) = 1/3]
Bet (b): Win $2.00 if W-up; lose $2.00 otherwise. [since you believe P(W-up) = 1/2]
Bet (c): Win $4.00 if R-up and not-RR; lose $4.00 if R-up and RR; neither win nor lose if not-R-up. [since you believe P(not-RR|R-up) = 1/2]

Dutch Book
The bets that you accepted have an interesting property:
No matter what card is drawn in the three-card problem, and no matter how it lands, you are guaranteed to lose money.

This is called a Dutch Book

There are three possible outcomes:
1. Some card other than red-red is drawn, and it lands with a white side up. That is, W-up and not-RR.
2. Some card other than red-red is drawn, and it lands with a red side up. That is, R-up and not-RR.
3. The red-red card is drawn, and it lands (of course) with a red side up. That is, R-up and RR.

Outcome   Bet (a)   Bet (b)   Bet (c)   Total
1         –$2.10    +$2.00    ±$0.00    –$0.10
2         –$2.10    –$2.00    +$4.00    –$0.10
3         +$4.20    –$2.00    –$4.00    –$1.80
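The payoff table can be recomputed from the three bet definitions. The sketch works in cents so the arithmetic is exact. The bet functions simply encode bets (a), (b) and (c) as stated. The outcome "W-up and RR" is omitted because the red-red card cannot land white side up. Every row total comes out negative, which is what makes this a Dutch book.

```python
# Each bet maps an outcome (w_up, rr) to the bettor's gain (+) or loss (-) in cents.
def bet_a(w_up, rr):  # win $4.20 if RR, lose $2.10 otherwise
    return 420 if rr else -210

def bet_b(w_up, rr):  # win $2.00 if W-up, lose $2.00 otherwise
    return 200 if w_up else -200

def bet_c(w_up, rr):  # settled only if R-up: win $4.00 if not RR, lose $4.00 if RR
    if w_up:
        return 0
    return -400 if rr else 400

OUTCOMES = {
    "1. W-up, not-RR": (True, False),
    "2. R-up, not-RR": (False, False),
    "3. R-up, RR": (False, True),
}

totals = {name: bet_a(*o) + bet_b(*o) + bet_c(*o) for name, o in OUTCOMES.items()}
for name, cents in totals.items():
    print(f"{name}: {cents / 100:+.2f}")
```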

The Dutch Book Theorem
Suppose that an individual I is willing to accept any bet that is fair for I. Then a Dutch book can be made against I if and only if I's probability assessments violate the axioms of probability theory.

Independence: Intuition
Events are independent if one has nothing whatever to do with the others. Therefore, for two independent events, knowing that one happens does not change the probability of the other happening.

Examples:
- One toss of a coin is independent of another toss (assuming it is a regular coin).
- The price of tea in England is independent of the result of a general election in Canada.

Independent or Dependent?
- Getting a cold and getting a cat allergy
- Miles per gallon and acceleration
- The size of a person’s vocabulary and the person’s shoe size

Independence: Definition
Events A and B are independent iff: P(A, B) = P(A) x P(B)
which is equivalent to P(A|B) = P(A) and P(B|A) = P(B) when P(A) > 0 and P(B) > 0.

Example:
T1: the first toss is a head.
T2: the second toss is a tail.
P(T2|T1) = P(T2)
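The coin example can be verified by enumerating the four equally likely outcomes of two tosses of a fair coin (an assumption). The check tests both the product form P(T1, T2) = P(T1)P(T2) and the conditional form P(T2|T1) = P(T2).

```python
from fractions import Fraction
from itertools import product

# Two tosses of a fair coin: 4 equally likely outcomes (first, second).
OUTCOMES = list(product("HT", repeat=2))

def prob(pred):
    return Fraction(sum(1 for o in OUTCOMES if pred(o)), len(OUTCOMES))

def t1(o):  # T1: the first toss is a head
    return o[0] == "H"

def t2(o):  # T2: the second toss is a tail
    return o[1] == "T"

p_t1, p_t2 = prob(t1), prob(t2)
p_both = prob(lambda o: t1(o) and t2(o))
print(p_both == p_t1 * p_t2)  # True: P(T1, T2) = P(T1) P(T2)
print(p_both / p_t1 == p_t2)  # True: P(T2 | T1) = P(T2)
```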

Conditional Independence
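Dependent events can become independent given certain other events.

Example: shoe size and vocabulary size are dependent, but both are influenced by age (Size of shoe ← Age → Size of vocabulary). Once age is known, they become independent.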

Two events A, B are conditionally independent given a third event C iff
P(A|B, C) = P(A|C)

Conditional Independence: Definition
Let E1 and E2 be two events. They are conditionally independent given E iff
P(E1|E, E2) = P(E1|E),

that is, the probability of E1 does not change after learning E2, given that E is true. Equivalent formulations:
P(E1, E2|E) = P(E1|E) P(E2|E)
P(E2|E, E1) = P(E2|E)
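The shoe-size/vocabulary example can be made concrete with a small joint distribution in which age influences both quantities. All the numbers below are hypothetical and were chosen only for illustration. Without age, the two events are dependent (37/100 ≠ 1/4). Given either age, the conditional independence formula P(E1, E2|E) = P(E1|E) P(E2|E) holds exactly.

```python
from fractions import Fraction as F
from itertools import product

# Hypothetical numbers: age influences both shoe size and vocabulary size.
P_AGE = {"child": F(1, 2), "adult": F(1, 2)}
P_BIG_SHOE = {"child": F(1, 10), "adult": F(9, 10)}  # P(big shoe | age)
P_BIG_VOCAB = {"child": F(1, 5), "adult": F(4, 5)}   # P(big vocabulary | age)

def joint(age, shoe, vocab):
    """P(age, shoe, vocab) = P(age) P(shoe | age) P(vocab | age)."""
    ps = P_BIG_SHOE[age] if shoe else 1 - P_BIG_SHOE[age]
    pv = P_BIG_VOCAB[age] if vocab else 1 - P_BIG_VOCAB[age]
    return P_AGE[age] * ps * pv

WORLDS = list(product(P_AGE, [True, False], [True, False]))

def prob(pred):
    return sum((joint(*w) for w in WORLDS if pred(*w)), F(0))

# Without knowing age, shoe size and vocabulary size are dependent.
p_shoe = prob(lambda age, s, v: s)
p_vocab = prob(lambda age, s, v: v)
p_both = prob(lambda age, s, v: s and v)
print(p_both, p_shoe * p_vocab)  # 37/100 1/4

# Given age, they are conditionally independent: P(S, V | age) = P(S | age) P(V | age).
for a in P_AGE:
    p_a = prob(lambda age, s, v: age == a)
    p_sv = prob(lambda age, s, v: age == a and s and v) / p_a
    p_s = prob(lambda age, s, v: age == a and s) / p_a
    p_v = prob(lambda age, s, v: age == a and v) / p_a
    print(a, p_sv == p_s * p_v)  # True
```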

Example: Play Tennis?
Outlook   Temperature  Humidity  Windy  Class
sunny     hot          high      false  −
sunny     hot          high      true   −
overcast  hot          high      false  +
rain      mild         high      false  +
rain      cool         normal    false  +
rain      cool         normal    true   −
overcast  cool         normal    true   +
sunny     mild         high      false  −
sunny     cool         normal    false  +
rain      mild         normal    false  +
sunny     mild         normal    true   +
overcast  mild         high      true   +
overcast  hot          normal    false  +
rain      mild         high      true   −

Predict whether tennis will be played on a day described by <sunny, cool, high, strong>, where strong wind means Windy = true. What probability should be used to make the prediction? How can that probability be computed?

Probabilities of Individual Attributes
Given the training set, we can compute the probabilities:

Attribute    Value     P(value|+)  P(value|−)
Outlook      sunny     2/9         3/5
             overcast  4/9         0
             rain      3/9         2/5
Temperature  hot       2/9         2/5
             mild      4/9         2/5
             cool      3/9         1/5
Humidity     high      3/9         4/5
             normal    6/9         1/5
Windy        true      3/9         3/5
             false     6/9         2/5

P(+) = 9/14, P(−) = 5/14

Naïve Bayes Method
Knowledge base contains:
- A set of hypotheses
- A set of evidences
- The probability of an evidence given a hypothesis
Given: a subset of the evidences known to be present in a situation.
Find: the hypothesis with the highest posterior probability P(H|E1, E2, …, Ek). The probability itself does not matter so much.



Naïve Bayes Method

- Hypotheses are exhaustive and mutually exclusive: H1 ∨ H2 ∨ … ∨ Hm, and ¬(Hi ∧ Hj) for any i ≠ j.
- Evidences are conditionally independent given a hypothesis: P(E1, E2, …, Ek|H) = P(E1|H)…P(Ek|H).

Then
P(H|E1, E2, …, Ek) = P(E1, E2, …, Ek, H)/P(E1, E2, …, Ek) = P(E1, E2, …, Ek|H)P(H)/P(E1, E2, …, Ek)

Naïve Bayes Method
The goal is to find the H that maximizes P(H|E1, E2, …, Ek). Since P(H|E1, E2, …, Ek) = P(E1, E2, …, Ek|H)P(H)/P(E1, E2, …, Ek), and P(E1, E2, …, Ek) is the same for all hypotheses, maximizing P(H|E1, E2, …, Ek) is equivalent to maximizing P(E1, E2, …, Ek|H)P(H) = P(E1|H)…P(Ek|H)P(H).

Find a hypothesis that maximizes P(E1|H)…P(Ek|H)P(H).

Example: Play Tennis
Compare P(+| sunny, cool, high, strong) with P(−| sunny, cool, high, strong), i.e., compare
P(sunny|+)P(cool|+)P(high|+)P(strong|+)P(+) with P(sunny|−)P(cool|−)P(high|−)P(strong|−)P(−)
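The comparison can be carried out mechanically. The sketch below trains naive Bayes on the 14 days from the table. It estimates each factor by relative frequency, with no smoothing, just as in the probability table. The classes are written "+" and "-" in ASCII. Exact fractions make it easy to check the result against the table by hand.

```python
from collections import Counter
from fractions import Fraction

# The 14 training days: (Outlook, Temperature, Humidity, Windy, Class).
DATA = [
    ("sunny", "hot", "high", False, "-"), ("sunny", "hot", "high", True, "-"),
    ("overcast", "hot", "high", False, "+"), ("rain", "mild", "high", False, "+"),
    ("rain", "cool", "normal", False, "+"), ("rain", "cool", "normal", True, "-"),
    ("overcast", "cool", "normal", True, "+"), ("sunny", "mild", "high", False, "-"),
    ("sunny", "cool", "normal", False, "+"), ("rain", "mild", "normal", False, "+"),
    ("sunny", "mild", "normal", True, "+"), ("overcast", "mild", "high", True, "+"),
    ("overcast", "hot", "normal", False, "+"), ("rain", "mild", "high", True, "-"),
]

def train(rows):
    """Count classes and (attribute index, value, class) triples."""
    class_counts = Counter(row[-1] for row in rows)
    value_counts = Counter(
        (i, value, row[-1]) for row in rows for i, value in enumerate(row[:-1])
    )
    return class_counts, value_counts

def score(x, h, class_counts, value_counts):
    """P(E1|H)...P(Ek|H) P(H), each factor a relative frequency (no smoothing)."""
    s = Fraction(class_counts[h], sum(class_counts.values()))
    for i, value in enumerate(x):
        s *= Fraction(value_counts[(i, value, h)], class_counts[h])
    return s

class_counts, value_counts = train(DATA)
x = ("sunny", "cool", "high", True)  # strong wind, i.e. Windy = true
scores = {h: score(x, h, class_counts, value_counts) for h in class_counts}
prediction = max(scores, key=scores.get)
print(scores["+"], scores["-"], prediction)  # 1/189 18/875 -
```

Because 18/875 > 1/189, the method predicts "no tennis". Normalizing the two scores gives P(−|E) = 486/611 ≈ 0.80. Note that without smoothing, any zero count, such as P(overcast|−) = 0, makes the whole product zero.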

Application: Spam Detection

Example spam: "Dear sir, We want to transfer to overseas ($ 126,000.000.00 USD) One hundred and Twenty six million United States Dollars) from a Bank in Africa, I want to ask you to quietly look for a reliable and honest person who will be capable and fit to provide either an existing ……"

Legitimate e-mail is called ham, for lack of a better name.

Hypotheses: {Spam, Ham}
Evidence: a document

The document is treated as a set (or bag) of words.

P(Spam): the prior probability of an e-mail message being spam. How can this probability be estimated?

P(w|Spam): the probability that a word is w, given that the word is chosen from a spam message. How can this probability be estimated?
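One common way to estimate these quantities is sketched below. P(Spam) is estimated as the fraction of training messages that are spam. P(w|Spam) is the relative frequency of w among all words in spam messages. The sketch also adds add-one (Laplace) smoothing, a technique the slides do not mention, so that a word never seen in one class does not force that class's probability to zero. Log probabilities are summed to avoid underflow. The tiny training set and the function names are hypothetical.

```python
import math
from collections import Counter

# Hypothetical toy training set: (label, message) pairs.
TRAINING = [
    ("spam", "transfer million dollars bank africa"),
    ("spam", "urgent transfer funds reliable honest person"),
    ("ham", "meeting notes for the course project"),
    ("ham", "please review the project report before the meeting"),
]

def train(docs):
    """Estimate P(class) and per-class word counts from labelled messages."""
    label_counts = Counter(label for label, _ in docs)
    priors = {c: n / len(docs) for c, n in label_counts.items()}  # P(Spam), P(Ham)
    word_counts = {c: Counter() for c in label_counts}
    for label, text in docs:
        word_counts[label].update(text.lower().split())
    vocab = set().union(*word_counts.values())
    return priors, word_counts, vocab

def classify(text, priors, word_counts, vocab):
    """Return the class maximizing log P(class) + sum of log P(w | class)."""
    best, best_score = None, -math.inf
    for c, prior in priors.items():
        total = sum(word_counts[c].values())
        score = math.log(prior)
        for w in text.lower().split():
            if w not in vocab:
                continue  # words never seen in training are ignored
            # Add-one smoothing: an unseen (word, class) pair does not zero the product.
            score += math.log((word_counts[c][w] + 1) / (total + len(vocab)))
        if score > best_score:
            best, best_score = c, score
    return best

model = train(TRAINING)
print(classify("quietly transfer dollars from bank", *model))  # spam
print(classify("review the meeting notes", *model))            # ham
```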

Limitations of Naïve Bayesian
Cannot handle composite hypotheses well

Suppose H1, ..., Hn are independent of each other. Consider a composite hypothesis H1 ∧ H2. How do we compute the posterior probability
P(H1 ∧ H2 | E1, ..., El)?

Using Bayes’ theorem:

$$P(H_1 \wedge H_2 \mid E_1, \ldots, E_l) = \frac{P(E_1, \ldots, E_l \mid H_1 \wedge H_2)\, P(H_1 \wedge H_2)}{P(E_1, \ldots, E_l)}$$

$$P(E_1, \ldots, E_l \mid H_1 \wedge H_2) = \prod_{j=1}^{l} P(E_j \mid H_1 \wedge H_2)$$
assuming the E_j are independent given H1 ∧ H2, and

$$P(H_1 \wedge H_2) = P(H_1)\, P(H_2)$$
because H1 and H2 are independent.

How do we compute P(E_j | H1 ∧ H2)?

What if we assume H1, ..., Hn are independent given E1, ..., El? Then

$$P(H_1 \wedge H_2 \mid E_1, \ldots, E_l) = P(H_1 \mid E_1, \ldots, E_l)\, P(H_2 \mid E_1, \ldots, E_l)$$

but this is a very unreasonable assumption.
Example: E: earthquake, A: alarm set off, B: burglar. E and B are independent. But when A is given, they are (negatively) dependent, because they become competitors to explain A: P(B|A, E) << P(B|A). E explains A away.
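Explaining away can be shown numerically by enumerating a small joint distribution P(B, E, A) = P(B) P(E) P(A | B, E). In this model B and E are independent a priori. All numbers below are hypothetical and serve only as an illustration. With them, P(B|A) comes out around 0.37. Learning that an earthquake also happened drops the probability of a burglary to about 0.003.

```python
from itertools import product

# Hypothetical numbers, for illustration only.
P_B = 0.001  # prior probability of a burglary
P_E = 0.002  # prior probability of an earthquake
P_ALARM = {  # P(A = true | B, E)
    (True, True): 0.95, (True, False): 0.94,
    (False, True): 0.29, (False, False): 0.001,
}

def joint(b, e, a):
    """P(B=b, E=e, A=a) = P(b) P(e) P(a | b, e); B and E are independent a priori."""
    pb = P_B if b else 1 - P_B
    pe = P_E if e else 1 - P_E
    pa = P_ALARM[(b, e)] if a else 1 - P_ALARM[(b, e)]
    return pb * pe * pa

def prob(pred):
    """Sum the joint probability over all worlds (b, e, a) satisfying pred."""
    return sum(joint(b, e, a) for b, e, a in product([True, False], repeat=3) if pred(b, e, a))

p_b_given_a = prob(lambda b, e, a: b and a) / prob(lambda b, e, a: a)
p_b_given_a_e = prob(lambda b, e, a: b and e and a) / prob(lambda b, e, a: e and a)
print(f"P(B|A) = {p_b_given_a:.3f}, P(B|A, E) = {p_b_given_a_e:.4f}")
```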

Need a better representation and a better assumption

Cannot handle causal chaining

Example: A: weather of the year; B: cotton production of the year; C: cotton price of the next year.
- Observed: A influences C.
- The influence is not direct (A → B → C).
- P(C|B, A) = P(C|B): instantiation of B blocks the influence of A on C.
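The blocking effect in a causal chain can be checked with a small hypothetical model. The joint factors as P(a) P(b|a) P(c|b), so C depends on A only through B. All numbers are made up for illustration. P(C|A) still changes with A (3/10 versus 11/20). Once B is fixed, however, conditioning on A leaves P(C|B) unchanged.

```python
from fractions import Fraction as F
from itertools import product

# Hypothetical numbers. A: good weather, B: high cotton production, C: high price next year.
P_A = F(3, 5)
P_B_GIVEN_A = {True: F(4, 5), False: F(3, 10)}  # P(B = high | A)
P_C_GIVEN_B = {True: F(1, 5), False: F(7, 10)}  # P(C = high | B); C depends on B only

def joint(a, b, c):
    """Chain A -> B -> C: P(a, b, c) = P(a) P(b | a) P(c | b)."""
    pa = P_A if a else 1 - P_A
    pb = P_B_GIVEN_A[a] if b else 1 - P_B_GIVEN_A[a]
    pc = P_C_GIVEN_B[b] if c else 1 - P_C_GIVEN_B[b]
    return pa * pb * pc

WORLDS = list(product([True, False], repeat=3))

def cond(target, given):
    """P(target | given), summing the joint over all worlds."""
    num = sum(joint(*w) for w in WORLDS if target(*w) and given(*w))
    den = sum(joint(*w) for w in WORLDS if given(*w))
    return num / den

# A influences C ...
print(cond(lambda a, b, c: c, lambda a, b, c: a))      # 3/10
print(cond(lambda a, b, c: c, lambda a, b, c: not a))  # 11/20
# ... but once B is known, A adds nothing: P(C | B, A) = P(C | B).
for b_val in (True, False):
    p_c_b = cond(lambda a, b, c: c, lambda a, b, c: b == b_val)
    for a_val in (True, False):
        p_c_ba = cond(lambda a, b, c: c, lambda a, b, c: b == b_val and a == a_val)
        print(b_val, a_val, p_c_ba == p_c_b)  # True
```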

Basics of Probability Theory
- Experiment, sample space, events
- Axioms and properties
- Joint probability
- Conditional probability
- Bayes theorem

Probabilistic Reasoning
- Dutch Book Theorem
- Independence and conditional independence
- Naïve Bayes method
