Uncertainty in AI
(Preliminary to Bayesian Networks)

CS570 Lecture Notes

by Jin Hyung Kim
Computer Science Department, KAIST
Uncertainty

- Motivation for uncertainty modeling
  - Characteristic of real-world applications:
    - truth values are unknown
    - too complex to compute fully before a decision must be made
- Sources of uncertainty
  - Uncertainty arises from both laziness and ignorance. It is
    inescapable in complex, dynamic, or inaccessible worlds.
  - Cannot be explained by a deterministic model
    - not well understood (e.g., disease transmission mechanisms)
  - Partial information
    - too complex to compute, but the detail is not needed (e.g., coin tossing)
Types of Uncertainty

- Randomness
  - Which side will come up if I toss a coin?
- Vagueness
  - Am I pretty?
- Confidence
  - How confident are you in your decision?

- One formalism for all vs. separate formalisms
  - Representation + computational engine
  - Combining several results into one
Uncertainty Representation

- Binary logic
- Multi-valued logic
- Probability theory
- Upper/lower probability
- Possibility theory

(cf. the most likely, optimistic, and pessimistic estimates used in PERT)
Applications

- Decision making under uncertainty – the rational agent
- Deriving useful answers from uncertain, conflicting knowledge
  - acquiring qualitative and quantitative relationships
  - data fusion
  - aggregating multiple experts' opinions
- Wide range of applications
  - disease diagnosis
  - language understanding
  - pattern recognition
  - managerial decision making
Handling Uncertain Knowledge

- Diagnostic rule
  - ∀p Symptom(p, Toothache) ⇒ Disease(p, Cavity)
  - ∀p Symptom(p, Toothache) ⇒ Disease(p, Cavity) ∨
    Disease(p, GumDisease) ∨ Disease(p, ImpactedWisdom) ∨ …
  - Pr(Disease | Symptom)
- Causal rule
  - ∀p Disease(p, Cavity) ⇒ Symptom(p, Toothache)
  - Not every cavity causes a toothache
  - Pr(Symptom | Disease)
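Bayes' rule connects the two directions: causal knowledge Pr(Symptom | Disease) plus priors yields the diagnostic probability Pr(Disease | Symptom). A minimal sketch with hypothetical numbers (the marginal P(Toothache) is assumed for illustration):

```python
# Hypothetical figures: experts find causal probabilities easier to state.
p_cavity = 0.1                # prior P(Cavity)
p_tooth_given_cavity = 0.8    # causal: P(Toothache | Cavity)
p_toothache = 0.15            # assumed marginal P(Toothache)

# Bayes' rule recovers the diagnostic direction from the causal one:
p_cavity_given_tooth = p_tooth_given_cavity * p_cavity / p_toothache
print(round(p_cavity_given_tooth, 3))  # 0.533
```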
Why Does First-Order Logic Fail?

- Laziness: it is too much work to prepare a complete
  set of exceptionless rules, and too hard to use the
  enormous rule set
- Theoretical ignorance: medical science has no
  complete theory for the domain
- Practical ignorance: not all the necessary tests
  can be run, even when we know all the rules
Degree of Belief

- An agent can assign a degree of belief to a sentence
- Main tool: probability theory
  - assigns a numerical degree of belief between 0 and 1 to sentences
  - a way of summarizing the uncertainty that comes from laziness and ignorance
- Probabilities can be derived from statistical data
Degree of Belief vs. Degree of Truth

- Degree of belief
  - The sentence itself is in fact either true or false
  - Same ontological commitment as logic: the facts either do
    or do not hold in the world
  - Probability theory
- Degree of truth (membership)
  - Not a question about the external world
  - A case of vagueness, or uncertainty about the meaning of a
    linguistic term such as "tall" or "pretty"
  - Fuzzy set theory, fuzzy logic
Probabilistic Reasoning System

- Assigns a probability to a proposition based on the
  percepts received to date
- Evidence: the perceptions an agent receives
- Probabilities can change as more evidence is acquired
  - Prior / unconditional probability: no evidence at all
  - Posterior / conditional probability: after evidence is obtained
Uncertainty and Rational Decisions

- No plan can guarantee achieving the goal
- To make a choice, the agent must have preferences between the
  different possible outcomes of the various plans
  - e.g., missing a plane vs. a long wait at the airport
- Utility theory represents and reasons with preferences
  - Utility: the quality of being useful (degree of usefulness)
- Principle of Maximum Expected Utility
  - Decision Theory = Probability Theory + Utility Theory
  - An agent is rational if and only if it chooses the action that
    yields the highest expected utility, averaged over all possible
    outcomes of the action
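The airport trade-off above can be sketched numerically. A minimal sketch with illustrative probabilities and utilities (all numbers assumed):

```python
# Each action maps to (probability, utility) outcome pairs; numbers are
# invented to model "long wait vs. risk of missing the plane".
actions = {
    "leave_early": [(0.95, 80), (0.05, 100)],    # long wait, rarely miss the plane
    "leave_late":  [(0.70, 100), (0.30, -500)],  # short wait, real risk of missing it
}

def expected_utility(outcomes):
    return sum(p * u for p, u in outcomes)

# Maximum Expected Utility: pick the action with the highest expectation.
best = max(actions, key=lambda a: expected_utility(actions[a]))
print(best)  # leave_early
```

With these numbers, EU(leave_early) = 81 while EU(leave_late) = −80, so the rational agent leaves early.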
Prior Probability

- P(A): the unconditional or prior probability that
  proposition A is true, given no other information
  - P(Cavity) = 0.1
- A proposition can include an equality involving a random variable
  - P(Weather = Sunny) = 0.7, P(Weather = Rain) = 0.2
  - P(Weather = Cloudy) = 0.08, P(Weather = Snow) = 0.02
- Each random variable X has a domain of possible
  values <x1, x2, …, xn>
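The weather example is a full prior distribution over one random variable. A quick sketch checking the defining property, that the probabilities over the domain sum to 1:

```python
# The slide's distribution for the random variable Weather.
P_weather = {"Sunny": 0.7, "Rain": 0.2, "Cloudy": 0.08, "Snow": 0.02}

# A valid distribution: non-negative values that sum to one.
assert all(p >= 0 for p in P_weather.values())
assert abs(sum(P_weather.values()) - 1.0) < 1e-9
```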
Conditional Probability

- As soon as evidence is obtained about a previously
  unknown proposition in the domain, prior
  probabilities are no longer applicable
- We use the conditional or posterior probability P(A|B):
  the probability of A given that all we know is B
  - P(Cavity | Toothache) = 0.8
- When new information C becomes known, we need P(A | B ∧ C)
Joint Probability Distribution as a Knowledge Base

- Completely assigns probabilities to all propositions in the domain
  - The joint probability distribution P(X1, X2, …, Xn) assigns
    probabilities to all possible atomic events
  - The probability of any event is derivable from it
- But it is a large table (or a high-dimensional function)
  - difficult to obtain and difficult to maintain
- Hence: approximation by lower-order probabilities
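A toy sketch of the joint distribution as a knowledge base. The numbers are illustrative, chosen to agree with P(Cavity) = 0.1 and P(Cavity | Toothache) = 0.8 from the slides; any query is answered by summing atomic events:

```python
# Full joint over two Booleans (Cavity, Toothache); illustrative numbers.
joint = {
    (True,  True):  0.08,   # cavity, toothache
    (True,  False): 0.02,   # cavity, no toothache
    (False, True):  0.02,   # no cavity, toothache
    (False, False): 0.88,
}

def prob(pred):
    """Probability of any event: sum the atomic events that satisfy it."""
    return sum(p for (cavity, tooth), p in joint.items() if pred(cavity, tooth))

p_cavity = prob(lambda c, t: c)                                           # ~0.10
p_cavity_given_tooth = prob(lambda c, t: c and t) / prob(lambda c, t: t)  # ~0.8
```

For n Boolean variables the table has 2^n entries, which is why the later slides approximate it with lower-order probabilities.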
Where Do Probabilities Come From?

- Frequentist
  - The numbers can come only from experiments
- Objectivist
  - Probabilities are real aspects of the universe (propensities of
    objects to behave in certain ways)
- Subjectivist
  - Probabilities are a way of characterizing an agent's beliefs, rather
    than having any external physical significance
  - Elicitation from human experts
    - the human as a probability transducer — but are humans good at it?

- The debate over the source and status of probability numbers is endless
Where Do Probabilities Come From? (cont.)

- What is the probability that the sun will still exist tomorrow?
  (a question raised by Hume's Inquiry)
  - The probability is undefined, because there has never been
    an experiment that tested the existence of the sun tomorrow.
  - The probability is 1, because in all the experiments that
    have been done (on past days) the sun has existed.
  - The probability is 1 − ε, where ε is the proportion of stars in
    the universe that go supernova and explode per day.
  - The probability is (d + 1)/(d + 2), where d is the number of
    days that the sun has existed so far. (Laplace)
  - The probability can be derived from the type, age, size, and
    temperature of the sun, even though we have never
    observed another star with those exact properties.
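Laplace's answer, the rule of succession, is easy to state as code:

```python
# Laplace's rule of succession: after observing `successes` out of `trials`,
# estimate the probability of success on the next trial as (s + 1) / (n + 2).
def laplace(successes, trials):
    return (successes + 1) / (trials + 2)

# With no data at all the estimate is 1/2; after d sunrises in d days the
# estimate (d + 1) / (d + 2) approaches 1 but never reaches it.
print(laplace(0, 0))
print(laplace(10000, 10000))
```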
Probability Elicitation

- Experiment: show {red, green, yellow} {square, triangle, circle}s
  many times in a random sequence, then ask for Pr(red | square)

- Pr(a, b) vs. Pr(a) and Pr(b | a)
- Pr(attribute | object) vs. Pr(object | attribute)
- Pr(effect | cause) vs. Pr(cause | effect)
- Pr(a) vs. Pr(a | b, c, d)

- Humans are inconsistent, systematically biased, and anchored
Tversky's Legacy
(scenarios by Tversky and Kahneman)

A taxi hit a pedestrian one night and fled the scene. The entire
case against the taxi company rests on the evidence of one
witness, an elderly man who saw the accident from his window
some distance away. He says that he saw the pedestrian struck
by a blue taxi. In trying to establish the case, the lawyer for the
injured pedestrian establishes the following facts:
- There are only two taxi companies in town, 'Blue Cabs' and 'Black Cabs'.
- On the night in question, 85% of all taxis on the road were
  black and 15% were blue.
- The witness has demonstrated that he can successfully distinguish
  a blue taxi from a black taxi 80% of the time.
If you were on the jury, how would you decide?
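The Bayesian answer, which most people find counter-intuitive, falls out of a few lines (treating the witness's 80% accuracy as symmetric for both colors):

```python
# Bayesian analysis of the taxi problem.
p_blue = 0.15                  # prior: 15% of taxis are blue
p_say_blue_given_blue = 0.80   # witness is right 80% of the time
p_say_blue_given_black = 0.20  # ...so mistakes a black taxi for blue 20%

# P(blue | witness says blue) by Bayes' rule:
num = p_say_blue_given_blue * p_blue
den = num + p_say_blue_given_black * (1 - p_blue)
posterior = num / den
print(round(posterior, 3))  # 0.414
```

Despite the witness's reliability, the base rate dominates: the taxi is still more likely to have been black (≈ 0.586) than blue (≈ 0.414).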
Opaque Urns A and B

- After observing 4 black and 2 white balls drawn in that
  sequence, what is your belief that the balls came from urn A?
Approximating High-Order Probabilities

Exact chain rule:

  P(C1, …, CK) = P(C1) P(C2 | C1) P(C3 | C1 C2) ⋯ P(CK | C1 C2 ⋯ CK−1)

Second-order product approximation:

  Pa(C1, …, CK) = ∏_{j=1}^{K} P(Cj | Ci(j)),  where 0 ≤ i(j) < j

  Example: P(C1) P(C2 | C1) P(C3 | C1) ⋯ P(CK | CK−1)

- Directed-tree representation of a product approximation
  - Root: the unconditioned variable
  - A directed arc A → B stands for Pr(B | A)
- There are many product approximations by 2nd-order probabilities
- Select the "best" among them
Preliminaries for the Chow Tree

- Kullback-Leibler (KL) divergence measure:

  D(P, Pa) = Σ_C P(C) log [ P(C) / Pa(C) ]   (≥ 0)

- Mutual information I between X and Y:

  I(X; Y) = H(Y) − H(Y|X) = H(X) + H(Y) − H(X, Y)

  I(X; Y) = Σ_{xi, yj} p(xi, yj) log [ p(xi, yj) / ( p(xi) p(yj) ) ]
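The pairwise mutual information formula above is directly computable from a joint table. A small sketch (in bits):

```python
from math import log2

def mutual_information(joint):
    """I(X;Y) in bits from a joint table {(x, y): p}."""
    px, py = {}, {}
    for (x, y), p in joint.items():       # accumulate the marginals
        px[x] = px.get(x, 0.0) + p
        py[y] = py.get(y, 0.0) + p
    return sum(p * log2(p / (px[x] * py[y]))
               for (x, y), p in joint.items() if p > 0)

# Independent variables carry zero mutual information:
indep = {(0, 0): 0.25, (0, 1): 0.25, (1, 0): 0.25, (1, 1): 0.25}
print(mutual_information(indep))  # 0.0

# Perfectly correlated Booleans: I(X;Y) = H(X) = 1 bit
corr = {(0, 0): 0.5, (1, 1): 0.5}
print(mutual_information(corr))  # 1.0
```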
Chow-Tree Algorithm

  D(P, Pa) = Σ_C P(C) log [ P(C) / Pa(C) ]

           = Σ_C P(C) log P(C) − Σ_C P(C) log ∏_{j=1}^{K} P(Cj | Ci(j))

           = − Σ_{j=1}^{K} Σ_C P(C) log P(Cj | Ci(j)) − H(C)

           = − Σ_{j=1}^{K} Σ_C P(C) log [ P(Cj | Ci(j)) P(Cj) / P(Cj) ] − H(C)

           = − Σ_{j=1}^{K} I(Cj; Ci(j)) + Σ_{j=1}^{K} H(Cj) − H(C)

- Only the first term depends on the choice of tree, so minimizing
  D(P, Pa) is equivalent to maximizing the total mutual information
- Weight each link with the mutual information
- Select the maximum spanning tree as the best approximation
Chow-Tree Algorithm Example

- Approximate Pr(a | ~b, c, d) from

  P(A,B)   a      ~a        P(A,C)   a      ~a        P(A,D)   a      ~a
  b        8/32   12/32     c        8/32   9/32      d        5/32   7/32
  ~b       7/32   5/32      ~c       7/32   8/32      ~d       10/32  10/32

  P(B,C)   b      ~b        P(B,D)   b      ~b        P(C,D)   c      ~c
  c        11/32  6/32      d        7/32   5/32      d        4/32   8/32
  ~c       9/32   6/32      ~d       13/32  7/32      ~d       13/32  7/32
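The Chow tree for these tables can be sketched end to end: compute the mutual information of each pair, then take the maximum spanning tree over those weights (a Kruskal-style sketch; variable names are just the slide's A–D):

```python
from math import log2

# Pairwise joints from the slide, {(x_val, y_val): p}; True = positive literal.
joints = {
    ("A", "B"): {(True, True): 8/32,  (False, True): 12/32,
                 (True, False): 7/32, (False, False): 5/32},
    ("A", "C"): {(True, True): 8/32,  (False, True): 9/32,
                 (True, False): 7/32, (False, False): 8/32},
    ("A", "D"): {(True, True): 5/32,  (False, True): 7/32,
                 (True, False): 10/32, (False, False): 10/32},
    ("B", "C"): {(True, True): 11/32, (False, True): 6/32,
                 (True, False): 9/32, (False, False): 6/32},
    ("B", "D"): {(True, True): 7/32,  (False, True): 5/32,
                 (True, False): 13/32, (False, False): 7/32},
    ("C", "D"): {(True, True): 4/32,  (False, True): 8/32,
                 (True, False): 13/32, (False, False): 7/32},
}

def mi(joint):
    """Mutual information (bits) of a pairwise joint table."""
    px, py = {}, {}
    for (x, y), p in joint.items():
        px[x] = px.get(x, 0.0) + p
        py[y] = py.get(y, 0.0) + p
    return sum(p * log2(p / (px[x] * py[y])) for (x, y), p in joint.items() if p)

# Kruskal-style maximum spanning tree over the MI weights.
edges = sorted(joints, key=lambda e: mi(joints[e]), reverse=True)
comp = {v: v for v in "ABCD"}
def find(v):
    while comp[v] != v:
        v = comp[v]
    return v
tree = []
for u, v in edges:
    ru, rv = find(u), find(v)
    if ru != rv:           # keep the edge only if it joins two components
        comp[ru] = rv
        tree.append((u, v))
print(tree)  # [('C', 'D'), ('A', 'B'), ('A', 'D')]
```

For these numbers the C–D link carries by far the most mutual information, so it anchors the tree.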
Other Uncertainty Formalisms

- Certainty factors in rule-based (logical) systems
  - A rule with a certainty factor:
    If A then B (cf), where cf ∈ (0, 1)
  - Interpreted as the added belief ratio if A is fully confirmed
  - "smoke ⇒ cancer (0.7)" is not the same as "smoke ⇒ ~cancer (0.3)"
- Representing ignorance
  - Dempster-Shafer theory
  - Confidence as a probability interval
- Representing vagueness: fuzzy sets
Representing Vagueness: Fuzzy Membership

- Pretty girl
- Old man
- Very old man
Fuzzy Sets

- Sets with fuzzy boundaries
  - A = set of tall people

[Figure: a crisp set A is a step function at 5'10'', while a fuzzy set A is a
membership function rising gradually between 5'10'' and 6'2'' over heights]
Membership Functions (MFs)

- Characteristics of MFs:
  - Subjective measures
  - Not probability functions

[Figure: membership in "tall" vs. height — a 5'10'' person has membership
about .8 in "tall" in Asia, .5 in "tall" in the US, and .1 in "tall" in the NBA]

Excerpted from J.-S. Roger Jang (張智星), CS Dept., Tsing Hua Univ., Taiwan
Fuzzy Sets

- Formal definition: a fuzzy set A in X is expressed as a set of
  ordered pairs:

  A = {(x, μA(x)) | x ∈ X}

  where A is the fuzzy set, μA is the membership function (MF),
  and X is the universe (universe of discourse)

- A fuzzy set is totally characterized by its membership function (MF)
Fuzzy Sets with Continuous Universes

- Fuzzy set B = "about 50 years old"
  - X = set of positive real numbers (continuous)
  - B = {(x, μB(x)) | x ∈ X}

  μB(x) = 1 / (1 + ((x − 50) / 10)²)
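The membership function for "about 50 years old" is a one-liner; its shape is a bell centered at 50 with crossover points at 40 and 60:

```python
# The slide's MF for B = "about 50 years old":
# mu_B(x) = 1 / (1 + ((x - 50) / 10)^2), a bell-shaped curve.
def mu_about_50(x):
    return 1.0 / (1.0 + ((x - 50.0) / 10.0) ** 2)

print(mu_about_50(50))  # 1.0  (full membership exactly at 50)
print(mu_about_50(60))  # 0.5  (crossover point)
```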
Fuzzy Partition

- Fuzzy partitions formed by the linguistic values
  "young", "middle aged", and "old"

[Figure: three overlapping membership functions over age (lingmf.m)]
MF Terminology

[Figure: a membership function annotated with its core (where MF = 1),
crossover points (where MF = .5), an α-cut (where MF ≥ α), and its support
(where MF > 0), over universe X]
Set-Theoretic Operations

[Figure: MATLAB demos subset.m and fuzsetop.m illustrating fuzzy
set-theoretic operations]
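A minimal sketch of the standard (min/max) fuzzy set operations on membership values, which is what such demos typically illustrate:

```python
# Standard fuzzy set operations on membership degrees.
def fuzzy_and(mu_a, mu_b):   # intersection: min
    return min(mu_a, mu_b)

def fuzzy_or(mu_a, mu_b):    # union: max
    return max(mu_a, mu_b)

def fuzzy_not(mu_a):         # complement
    return 1.0 - mu_a

# Unlike crisp sets, the intersection of A with its complement
# need not be empty:
mu = 0.7
print(round(fuzzy_and(mu, fuzzy_not(mu)), 2))  # 0.3
```

The nonzero result for A ∩ ¬A is exactly the sense in which fuzzy logic drops the law of the excluded middle.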
