


Uncertainty in AI
(Preliminary to Bayesian Networks)

CS570 Lecture Notes

by Jin Hyung Kim
Computer Science Department, KAIST
   Motivation of Uncertainty Modeling
        Characteristics of real-world applications
        Truth value is unknown
        Too complex to compute prior to making a decision
   Sources of Uncertainty
   Uncertainty arises because of both laziness and ignorance. It is
    inescapable in complex, dynamic, or inaccessible worlds.
        Cannot be explained by a deterministic model
           Ex: decay of radioactive substances
        Not well understood
           Ex: disease transmission mechanism
        Partial information
        Too complex to compute, but the detail is not needed
           Ex: coin tossing
                    Types of Uncertainty

   Randomness
       Which side will be up if I toss a coin?
   Vagueness
       Am I pretty?
   Confidence
       How confident are you in your decision?

   One formalism for all vs. separate formalisms
                 Representation + Computational Engine
       Combining several results into one
Uncertainty Representation

          Binary Logic
          Multi-valued Logic
          Probability Theory
          Upper/Lower Probability
          Possibility Theory
              Ex: most likely / optimistic / pessimistic estimates in PERT
   Decision making under uncertainty – Rational Agent
   Useful answers from uncertain, conflicting knowledge
       acquiring qualitative and quantitative relationships
       data fusion
       aggregating multiple experts' opinions
   Wide range of applications
       disease diagnosis
       language understanding
       pattern recognition
       managerial decision making
          Handling Uncertain Knowledge

   Diagnostic Rule
       ∀p Symptom(p, Toothache) ⇒ Disease(p, Cavity)
       ∀p Symptom(p, Toothache) ⇒ Disease(p, Cavity)
        ∨ Disease(p, GumDisease) ∨ Disease(p, ImpactedWisdom) ∨ …

       Pr(Disease | Symptom)
   Causal Rule
       ∀p Disease(p, Cavity) ⇒ Symptom(p, Toothache)
       Not every cavity causes a toothache

       Pr(Symptom | Disease)
         Why First-Order Logic Fails

 Laziness: too much work to prepare a complete
  set of exceptionless rules, and too hard to use the
  enormous rule set
 Theoretical ignorance: medical science has no
  complete theory for the domain
 Practical ignorance: all the necessary tests
  cannot be run, even though we know all the rules
                  Degree of Belief

 An agent can provide a degree of belief for a sentence
 Main tool: probability theory
     assigns a numerical degree of belief between 0 and 1
      to sentences
     a way of summarizing the uncertainty that comes
      from laziness and ignorance
   Probabilities can be derived from statistical data
      Degree of Belief vs. Degree of Truth
 Degree of Belief
   The sentence itself is in fact either true or false
   Same ontological commitment as logic: the facts either do
    or do not hold in the world
   Probability theory

 Degree of Truth (membership)
   Not a question about the external world
   Case of vagueness or uncertainty about the meaning of a
    linguistic term such as "tall" or "pretty"
   Fuzzy set theory, fuzzy logic
     Probabilistic Reasoning System

 Assigns a probability to a proposition based on the
  percepts it has received to date
 Evidence: perception that an agent receives
 Probabilities can change as more evidence is acquired
 Prior / unconditional probability: no evidence at all
 Posterior / conditional probability: after evidence is obtained
         Uncertainty and Rational Decisions
   No plan can guarantee to achieve the goal
   To make a choice, an agent must have preferences between the
    different possible outcomes of the various plans
       Ex: missing the plane vs. a long wait
 Utility theory to represent and reason with preferences
 Utility: the quality of being useful (degree of usefulness)
 Principle of Maximum Expected Utility
 Decision Theory = Probability Theory + Utility Theory
 An agent is rational if and only if it chooses the action that
  yields the highest expected utility, averaged over all possible
  outcomes of the action
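The Maximum Expected Utility principle can be sketched directly. The two actions, their outcome probabilities, and the utility numbers below are invented purely to illustrate the missing-plane vs. long-wait trade-off:

```python
# Hedged sketch of the Maximum Expected Utility principle.
# Action names, probabilities, and utilities are illustrative assumptions.

def expected_utility(outcomes):
    """Expected utility of an action: sum of P(outcome) * U(outcome)."""
    return sum(p * u for p, u in outcomes)

# Each action maps to (probability, utility) pairs over its possible outcomes.
actions = {
    "leave_90_min_early": [(0.95, 50), (0.05, -200)],   # long wait vs. missed plane
    "leave_30_min_early": [(0.70, 100), (0.30, -200)],
}

# A rational agent picks the action with the highest expected utility.
best = max(actions, key=lambda a: expected_utility(actions[a]))
```

With these numbers, leaving early has expected utility 0.95·50 − 0.05·200 = 37.5, versus 0.70·100 − 0.30·200 = 10 for leaving late, so the rational choice is the long wait.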
                 Prior Probability

 P(A): unconditional or prior probability that the
 proposition A is true
   No other information on the proposition
   P(Cavity) = 0.1

 Propositions can include equalities involving random variables
   P(Weather = Sunny) = 0.7, P(Weather = Rain) = 0.2
   P(Weather = Cloudy) = 0.08, P(Weather = Snow) = 0.02

 Each random variable X has a domain of possible
 values <x1, x2, …, xn>
           Conditional Probability

 As soon as evidence concerning the previously
  unknown proposition making up the domain, prior
  probabilities are no longer applicable
 We use conditional or posterior probabilities P(A|B) :
  Probability of A given that all we know is B
   P(Cavity|Toothache) =   0.8
 New   information C is known, P(A|BC)
    Joint Probability Distribution as
           a Knowledge Base
 Completely assigns probabilities to all propositions in
 the domain
   The joint probability distribution P(X1, X2, …, Xn) assigns
    probabilities to all possible atomic events
   The probability of any event is derivable from it

 Large table or high-dimensional function
   Difficult to get and difficult to maintain
 Approximation by lower-order probabilities
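As a minimal sketch, a two-variable joint distribution can itself serve as the knowledge base. The four entries below are chosen so that the marginal and conditional match the figures quoted in these notes, P(Cavity) = 0.1 and P(Cavity|Toothache) = 0.8:

```python
# A 2-variable joint distribution used as a knowledge base.
# Entries are illustrative, chosen to reproduce P(Cavity) = 0.1
# and P(Cavity | Toothache) = 0.8 from these notes.

joint = {
    (True,  True):  0.04,   # (Cavity, Toothache)
    (True,  False): 0.06,
    (False, True):  0.01,
    (False, False): 0.89,
}

def p_cavity():
    """Marginal probability: sum Toothache out of the joint."""
    return sum(p for (cav, _), p in joint.items() if cav)

def p_cavity_given_toothache():
    """Conditional: P(Cavity | Toothache) = P(Cavity, Toothache) / P(Toothache)."""
    p_tooth = sum(p for (_, tooth), p in joint.items() if tooth)
    return joint[(True, True)] / p_tooth
```

Any query over the domain reduces to summing entries of the table, which is exactly why the full joint works as a knowledge base, and why its exponential size makes it hard to get and maintain.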
    Where Do Probabilities Come From?

   Frequentist
      The numbers can come only from experiments
   Objectivist
      Probabilities are real aspects of the universe (propensities of objects to
       behave in certain ways)
   Subjectivist
      Probabilities are a way of characterizing an agent's beliefs, rather than
       having any external physical significance
      Elicitation from human experts
          Human as a probability transducer
          Is the human good at it?

   Endless debate over the source and status of probability numbers
Where Do Probabilities Come From?
 Probability that the sun will still exist tomorrow
  (question raised by Hume's Inquiry)
   The probability is undefined, because there has never been
    an experiment that tested the existence of the sun tomorrow
   The probability is 1, because in all the experiments that
    have been done (on past days) the sun has existed.
   The probability is 1 − ε, where ε is the proportion of stars in
    the universe that go supernova and explode per day.
   The probability is (d+1)/(d+2), where d is the number of
    days that the sun has existed so far. (Laplace)
   The probability can be derived from the type, age, size, and
    temperature of the sun, even though we have never
    observed another star with those exact properties.
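Laplace's answer, (d+1)/(d+2), is his rule of succession, and can be checked directly:

```python
# Laplace's rule of succession: after s successes in n trials, estimate the
# probability of another success as (s + 1) / (n + 2) (uniform prior).

def laplace_succession(successes, trials):
    """Posterior predictive probability of success under a uniform prior."""
    return (successes + 1) / (trials + 2)

# The sun has risen on every one of d observed days:
d = 10000
p_tomorrow = laplace_succession(d, d)   # (d + 1) / (d + 2), just under 1
```

Note that with no data at all the rule gives 1/2, and it approaches but never reaches 1, unlike the naive frequentist estimate.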
             Probability Elicitation

Experiment: show {red, green, yellow} {square,
 triangle, circle}s many times in a random sequence and
 ask Pr(red | square)

 Pr(a, b) vs. Pr(a) & Pr(b|a)
 Pr(attribute | object) vs. Pr(object | attribute)
 Pr(effect | cause) vs. Pr(cause | effect)
 Pr(a) vs. Pr(a | b, c, d)

 Humans are inconsistent and systematically biased
                  Tversky's Legacy
                  (scenarios by Tversky and Kahneman)

A taxi hit a pedestrian one night and fled the scene. The entire
   case against the taxi company rests on the evidence of one
   witness, an elderly man who saw the accident from his window
   some distance away. He says that he saw the pedestrian struck
   by a blue taxi. In trying to establish the case, the lawyer for the
   injured pedestrian establishes the following facts:
There are only two taxi companies in town, 'Blue Cabs' and 'Black Cabs'.
   On the night in question, 85% of all taxis on the road were
   black and 15% were blue.
The witness has demonstrated that he can successfully distinguish
   a blue taxi from a black taxi 80% of the time.
If you were on the jury, how would you decide?
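Applying Bayes' rule to the facts above (15% blue cabs, witness correct 80% of the time) shows why intuition misleads here:

```python
# Bayes' rule on the taxi scenario: prior cab colors vs. witness reliability.

p_blue = 0.15
p_black = 0.85
p_say_blue_given_blue = 0.80    # witness correctly identifies a blue cab
p_say_blue_given_black = 0.20   # witness mistakes a black cab for blue

# Total probability that the witness reports "blue"
p_say_blue = p_blue * p_say_blue_given_blue + p_black * p_say_blue_given_black

# Posterior: P(cab was blue | witness says blue)
p_blue_given_say_blue = p_blue * p_say_blue_given_blue / p_say_blue
# = 0.12 / 0.29 ≈ 0.41: despite the witness, the cab is still
# more likely to have been black.
```

The base rate (85% black) outweighs the witness's 80% reliability, which is exactly the base-rate neglect Tversky and Kahneman documented.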
                  Opaque Urn

 After observing 4 black and 2 white balls drawn in that
 sequence, what is your belief that the balls come from urn A?
     Approximating High-Order Probabilities

    P(C1, …, CK) = P(C1) P(C2|C1) P(C3|C1C2) … P(CK|C1C2…CK−1)

                     K
    Pa(C1, …, CK) = ∏ P(Cj | Ci(j)),  where 0 ≤ i(j) < j
                    j=1

 Example:  Pa = P(C1) P(C2|C1) P(C3|C1) … P(CK|CK−1)

   Directed tree representation of the product approximation
      Root: the unconditioned variable
      Directed arc A→B stands for Pr(B|A)
   There are many product approximations by 2nd-order probabilities
   Select the "best" among them
            Preliminary of Chow Tree

   By the Kullback-Leibler (KL) divergence measure

       D(P, Pa) = Σ_C P(C) log [ P(C) / Pa(C) ]   (≥ 0)

   Mutual Information I between X and Y
       I(X;Y) = H(Y) − H(Y|X) = H(X) + H(Y) − H(X,Y)

       I(X;Y) = Σ_{xi, yj} p(xi, yj) log [ p(xi, yj) / (p(xi) p(yj)) ]
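The mutual-information sum can be computed directly from a joint distribution. The two toy distributions below (independent bits and perfectly correlated bits) are invented to show the two extremes:

```python
# Mutual information I(X;Y) in bits from a joint distribution p(x, y).
from math import log2

def mutual_information(joint):
    """I(X;Y) = sum over (x, y) of p(x,y) * log2( p(x,y) / (p(x) p(y)) )."""
    px, py = {}, {}
    for (x, y), p in joint.items():          # accumulate the marginals
        px[x] = px.get(x, 0.0) + p
        py[y] = py.get(y, 0.0) + p
    return sum(p * log2(p / (px[x] * py[y]))
               for (x, y), p in joint.items() if p > 0)

# Independent bits carry no information about each other:
indep = {(0, 0): 0.25, (0, 1): 0.25, (1, 0): 0.25, (1, 1): 0.25}
# Perfectly correlated bits share exactly 1 bit:
copy = {(0, 0): 0.5, (1, 1): 0.5}
```

I(X;Y) = 0 for the independent pair and 1 bit for the copied pair, matching the identity I(X;Y) = H(X) + H(Y) − H(X,Y).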
               Chow-Tree Algorithm

 D(P, Pa) = Σ_C P(C) log P(C) − Σ_C P(C) log ∏_{j=1..K} P(Cj | Ci(j))

          = − Σ_{j=1..K} Σ_C P(C) log P(Cj | Ci(j)) − H(C)

          = − Σ_{j=1..K} Σ_C P(C) log [ (P(Cj | Ci(j)) / P(Cj)) · P(Cj) ] − H(C)

          = − Σ_{j=1..K} I(Cj ; Ci(j)) + Σ_{j=1..K} H(Cj) − H(C)

   Since H(Cj) and H(C) do not depend on which tree is chosen, minimizing
    D(P, Pa) is equivalent to maximizing Σj I(Cj ; Ci(j))

   Weight each link with mutual information
   Select the maximum spanning tree as the best approximation
         Chow-Tree Algorithm Example

   Approximate Pr(a | ~b, c, d) from

   P(A,B)    a      ~a       P(A,C)    a      ~a       P(A,D)    a      ~a
     b      8/32   12/32       c      8/32    9/32       d      5/32    7/32
    ~b      7/32    5/32      ~c      7/32    8/32      ~d     10/32   10/32

   P(B,C)    b      ~b       P(B,D)    b      ~b       P(C,D)    c      ~c
     c     11/32    6/32       d      7/32    5/32       d      4/32    8/32
    ~c      9/32    6/32      ~d     13/32    7/32      ~d     13/32    7/32
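A sketch of the full procedure on this example: compute the mutual information of each pair from the tables above, then keep the maximum spanning tree. Kruskal's algorithm is used here as an implementation choice; the notes themselves only prescribe "select the maximum spanning tree":

```python
# Chow tree for the four binary variables A, B, C, D, using the pairwise
# tables from these notes (all probabilities are counts out of 32).
from math import log2

# pairs[(X, Y)][(x, y)] = count for P(X=x, Y=y); True is the positive literal.
pairs = {
    ("A", "B"): {(True, True): 8,  (False, True): 12, (True, False): 7,  (False, False): 5},
    ("A", "C"): {(True, True): 8,  (False, True): 9,  (True, False): 7,  (False, False): 8},
    ("A", "D"): {(True, True): 5,  (False, True): 7,  (True, False): 10, (False, False): 10},
    ("B", "C"): {(True, True): 11, (False, True): 6,  (True, False): 9,  (False, False): 6},
    ("B", "D"): {(True, True): 7,  (False, True): 5,  (True, False): 13, (False, False): 7},
    ("C", "D"): {(True, True): 4,  (False, True): 8,  (True, False): 13, (False, False): 7},
}

def mutual_information(table):
    """I(X;Y) in bits from a pairwise count table."""
    total = sum(table.values())
    px, py = {}, {}
    for (x, y), n in table.items():
        px[x] = px.get(x, 0.0) + n / total
        py[y] = py.get(y, 0.0) + n / total
    return sum((n / total) * log2((n / total) / (px[x] * py[y]))
               for (x, y), n in table.items() if n)

weights = {edge: mutual_information(t) for edge, t in pairs.items()}

# Kruskal's algorithm: greedily add the heaviest edge that creates no cycle.
parent = {v: v for v in "ABCD"}

def find(v):
    while parent[v] != v:
        v = parent[v]
    return v

tree = []
for (u, v), w in sorted(weights.items(), key=lambda e: -e[1]):
    ru, rv = find(u), find(v)
    if ru != rv:
        parent[ru] = rv
        tree.append((u, v))
```

On these numbers C–D carries by far the most mutual information, and the resulting tree keeps the edges C–D, A–B, and A–D.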
       Other Uncertainty Formalisms

 Certainty factors in rule-based (logical) systems
   Rule with a certainty factor
      If A then B (cf), where cf ∈ (0, 1)
   Interpreted as the added belief ratio if A is confirmed fully.
   "smoke ⇒ cancer (0.7)" is not the same as "smoke ⇒ ~cancer (0.3)"

 Representing ignorance
   Dempster-Shafer theory
   Confidence as a probability interval

 Representation of vagueness: fuzzy sets
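When two independent rules both confirm the same hypothesis, certainty-factor systems need a combination rule. The classic MYCIN-style formula (standard for CF systems, though not stated in these notes) is a one-liner:

```python
# MYCIN-style parallel combination of two confirming certainty factors,
# both in (0, 1). This is the standard CF rule, sketched for illustration.

def combine_cf(cf1, cf2):
    """Combined belief after two confirming rules fire: cf1 + cf2 * (1 - cf1)."""
    return cf1 + cf2 * (1 - cf1)

cf = combine_cf(0.7, 0.5)   # two rules with cf 0.7 and 0.5 combine to 0.85
```

The combined factor always exceeds either input but stays below 1, and the rule is symmetric, so the order in which the rules fire does not matter.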
             Representing Vagueness
               : Fuzzy Membership

   Pretty girl

   Old man
   Very old man
                          Fuzzy Sets

 Sets with fuzzy boundaries
                       A = Set of tall people

 [Figure: membership vs. height. The crisp set A steps from 0 to 1.0
  at 5'10''; the fuzzy set A rises gradually, with membership .5
  near 5'10'' and approaching 1.0 around 6'2''.]
       Membership Functions (MFs)

 Characteristics of MFs:
   Subjective measures
   Not probability functions

 [Figure: three MFs over heights, labeled "tall" in Asia, "tall" in
  the US, and "tall" in the NBA, shifting progressively to the right;
  a height of 5'10'' receives a different membership grade in each.]

        Excerpted from J.-S. Roger Jang (張智星), CS Dept., Tsing Hua Univ., Taiwan
                         Fuzzy Sets

 Formal definition:
   A fuzzy set A in X is expressed as a set of ordered pairs:

                 A = {(x, μA(x)) | x ∈ X}

   where A is the fuzzy set, μA(·) is the membership function, and
   X is the universe (or universe of discourse).

           A fuzzy set is totally characterized by its
                 membership function (MF).

        Excerpted from J.-S. Roger Jang (張智星), CS Dept., Tsing Hua Univ., Taiwan
       Fuzzy Sets with Continuous Universes
 Fuzzy set B = "about 50 years old"
   X = set of positive real numbers (continuous)
   B = {(x, μB(x)) | x ∈ X}

       μB(x) = 1 / (1 + ((x − 50) / 10)^4)

        Excerpted from J.-S. Roger Jang (張智星), CS Dept., Tsing Hua Univ., Taiwan
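The membership function for B = "about 50 years old" can be evaluated directly; the exponent 4 below is taken from Jang's standard bell-shaped example:

```python
# Bell-shaped membership function mu_B(x) = 1 / (1 + ((x - 50) / 10)^4)
# for the fuzzy set B = "about 50 years old" (exponent 4 per Jang's example).

def mu_about_50(x):
    """Membership grade of age x in the fuzzy set 'about 50 years old'."""
    return 1.0 / (1.0 + ((x - 50.0) / 10.0) ** 4)

# Membership is 1 at the prototype age and falls off symmetrically:
# mu(50) = 1.0, mu(40) = mu(60) = 0.5 (the crossover points).
```

Note the contrast with a probability density: membership grades need not sum or integrate to 1, since they answer "how well does x fit the concept?", not "how likely is x?".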
                  Fuzzy Partition

 Fuzzy partitions formed by the linguistic values
  "young", "middle aged", and "old":

 [Figure: three overlapping MFs over age, labeled "young",
  "middle aged", and "old".]

        Excerpted from J.-S. Roger Jang (張智星), CS Dept., Tsing Hua Univ., Taiwan
                 MF Terminology

 [Figure: a membership function over the universe X, annotated with
  its core, crossover points, and α-cut.]

        Excerpted from J.-S. Roger Jang (張智星), CS Dept., Tsing Hua Univ., Taiwan
    Set-Theoretic Operations

 [Figure: fuzzy set-theoretic operations (fuzsetop.m).]

        Excerpted from J.-S. Roger Jang (張智星), CS Dept., Tsing Hua Univ., Taiwan
