					                                        Uncertainty

                                          Cholwich Nattee
                              Sirindhorn International Institute of Technology
                                           Thammasat University




Lecture 10: Uncertainty                                                            1/41




Uncertainty
                     Logical agents have limitations in handling uncertain
                     knowledge.
                     For example, consider a KB:
                          ∀p Symptom(p, Toothache) ⇒
                          Disease(p, Cavity) ∨ Disease(p, GumDisease) ∨ . . .
                     We cannot conclude that a patient with a toothache has a
                     cavity.
                     Three main reasons make logical agents fail:
                          Laziness: too much work to construct the complete KB.
                          Theoretical ignorance: no complete theory for the
                          domain.
                          Practical ignorance: not all necessary percepts can be
                          checked.
                     The best way to deal with these limitations is to assign a
                     degree of belief to each sentence, based on probability theory.
Lecture 10: Uncertainty                                                            2/41
Probability and Degree of Belief
                     Probability is used to denote a degree of belief, not a
                     degree of truth.
                     For example, a probability of 0.8 for a cavity in a patient
                     with a toothache means we believe that there is an 80%
                     chance that the patient has a cavity.
                     The probability is about the agent’s beliefs, not directly
                     about the world.
                     For example, the agent draws a card from a shuffled pack.
                     Before looking at the card, the probability that it is the
                     ace of spades is 1/52. After looking at the card, the
                     appropriate probability is just 0 or 1.
                     An assignment of probability reflects what the currently
                     available knowledge base entails, not a property of the
                     world itself.

Lecture 10: Uncertainty                                                                  3/41




Uncertainty: Example [1]
                     Let At be the action of leaving for the airport t minutes
                     before the flight.
                     Will At get me there on time?
                     Problems
                              partial observability (road state, other drivers' plans,
                              etc.)
                             noisy sensors
                             uncertainty in action outcomes (flat tire, etc.)
                             immense complexity of modeling and predicting traffic
                     Using a logical approach:
                          1. Risks falsehood: “A25 will get me there on time”, or
                          2. Leads to too weak conclusions for decision making:
                             “A25 will get me there on time if there is no accident on
                             the bridge, and it does not rain and my tires remain
                             intact etc etc.”
Lecture 10: Uncertainty                                                                  4/41
Uncertainty: Example [2]


                     Probabilities are used to relate propositions to the current
                     KB, for example,

                          P(A25 gets me there on time|no accidents) = 0.06

                     Probabilities change with new evidence,

                     P(A25 gets me there on time|no accidents, 5 a.m.) = 0.15




Lecture 10: Uncertainty                                                                   5/41




Making Decisions under Uncertainty
                     Suppose I believe the following:

                           P(A25 gets me there on time | . . .)   = 0.04
                           P(A90 gets me there on time | . . .)   = 0.70
                           P(A120 gets me there on time | . . .)  = 0.95
                           P(A1440 gets me there on time | . . .) = 0.9999


                     Which action to choose?
                            It depends on my preferences for missing the flight vs.
                            airport cuisine, etc.
                     Utility theory is used to represent and infer preferences.
                     Decision theory = utility theory + probability theory
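                     As a rough illustration of this combination (not from the original
                     slides), the sketch below picks the action with the highest expected
                     utility; the utility and waiting-cost numbers are made-up assumptions.

# Sketch: choose the action with the highest expected utility.
# The utilities and waiting costs below are assumptions for illustration only.
on_time_prob = {"A25": 0.04, "A90": 0.70, "A120": 0.95, "A1440": 0.9999}
wait_minutes = {"A25": 25, "A90": 90, "A120": 120, "A1440": 1440}
U_CATCH, U_MISS, COST_PER_MINUTE = 1000.0, 0.0, 0.5   # hypothetical preferences

def expected_utility(action):
    p = on_time_prob[action]
    return p * U_CATCH + (1 - p) * U_MISS - COST_PER_MINUTE * wait_minutes[action]

best = max(on_time_prob, key=expected_utility)
print(best)   # "A120" under these particular preferences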

Lecture 10: Uncertainty                                                                   6/41
Probability Basics
                     Let Ω be the sample space, e.g., 6 possible rolls of a die.
                     ω ∈ Ω is a sample point/possible world/atomic event.
                     A probability model is a sample space with an assignment
                     P(ω) for every ω ∈ Ω

                                           0 ≤ P(ω) ≤ 1
                                           Σ_{ω∈Ω} P(ω) = 1

                     E.g., P(1) = P(2) = P(3) = P(4) = P(5) = P(6) = 1/6
                     An event A is any subset of Ω

                                           P(A) = Σ_{ω∈A} P(ω)

                     E.g. P(die roll < 4) = P(1) + P(2) + P(3) = 0.5
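                     A minimal Python sketch of these definitions, using the die
                     example (not part of the original slide):

from fractions import Fraction

# Sample space for one roll of a fair die, with P(w) = 1/6 for each sample point.
omega = {1, 2, 3, 4, 5, 6}
P = {w: Fraction(1, 6) for w in omega}
assert sum(P.values()) == 1          # the probabilities of all sample points sum to 1

def prob(event):
    """P(A) = sum of P(w) over the sample points w in the event A."""
    return sum(P[w] for w in event)

print(prob({w for w in omega if w < 4}))   # P(die roll < 4) = 1/2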
Lecture 10: Uncertainty                                                               7/41




Random Variables
                     A random variable refers to a part of the world whose
                     status is initially unknown.
                     For example, Cavity refers to whether the tooth has a
                     cavity.
                     Each random variable has a domain of values, e.g. the
                     domain of Cavity might be {true, false}
                          Boolean random variables have the domain
                          {true, false}. For example, Cavity = true is also written
                          cavity, and we often write ¬cavity for Cavity = false
                          Discrete random variables take on values from a
                          countable domain. For example, the domain of Weather
                          might be {sunny, rainy, cloudy, snow}
                          Continuous random variables take on values from the
                          real numbers. For example, Temp = 21.6 or
                          Temp < 22.0
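                     A small sketch of one possible way to represent these
                     variables in Python (the variable names follow the slide;
                     the representation itself is just an illustrative choice):

# Discrete random variables represented by their domains.
domains = {
    "Cavity": {True, False},                           # Boolean
    "Weather": {"sunny", "rainy", "cloudy", "snow"},   # discrete
}
# A continuous variable such as Temp has no finite domain; propositions
# about it are ranges of real numbers, e.g. Temp = 21.6 or Temp < 22.0.
assert True in domains["Cavity"] and "snow" in domains["Weather"]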
Lecture 10: Uncertainty                                                               8/41
Atomic Event

                     An atomic event is a complete specification of the state of
                     the world.
                     For example, if the world consists of only two variables,
                     Cavity and Toothache, then there are four distinct
                     atomic events:

                               Cavity = true ∧ Toothache = true
                              Cavity = true ∧ Toothache = false
                              Cavity = false ∧ Toothache = true
                              Cavity = false ∧ Toothache = false




Lecture 10: Uncertainty                                                           9/41




Propositions [1]
                     A proposition can be thought of as the event in which the
                     proposition is true.
                     For example, given Boolean random variables A and B:
                          event a = set of sample points where A(ω) = true
                          event a ∧ b = points where A(ω) = true and
                          B(ω) = true
                     With Boolean variables, sample point = propositional
                     logic model, e.g. A = true, or a ∧ ¬b
                     Proposition = disjunction of atomic events in which it is
                     true, e.g.

                          P(a ∨ b) = P(¬a ∧ b) + P(a ∧ ¬b) + P(a ∧ b)


Lecture 10: Uncertainty                                                          10/41
Propositions [2]
                     From these definitions, logically related events must
                     have related probabilities.
                     For example, P(a ∨ b) = P(a) + P(b) − P(a ∧ b)


                               [Venn diagram: overlapping events A and B,
                                illustrating the inclusion-exclusion rule]




Lecture 10: Uncertainty                                                        11/41




Prior Probability
                     Prior or unconditional probabilities of propositions
                     correspond to belief prior to arrival of any evidence.
                     E.g., P(cavity) = 0.1, and P(Weather = sunny) = 0.72
                     Probability distribution gives values for all possible
                     assignments:
                                 P(Weather) = ⟨0.72, 0.1, 0.08, 0.1⟩ (sunny, rain, cloudy, snow)
                     Joint probability distribution gives the probability of
                     every atomic event.
                     E.g., P(Weather, Cavity) = a 4 × 2 matrix of values

                            Weather        sunny   rain   cloudy   snow
                          Cavity = true    0.144   0.02    0.016    0.02
                          Cavity = false   0.576   0.08    0.064    0.08
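                     The joint distribution above can be written down directly as a
                     table in Python; prior probabilities are then obtained by summing
                     out the other variable (a sketch, not from the slides):

# P(Weather, Cavity) from the table above, keyed by (weather, cavity).
joint = {
    ("sunny", True): 0.144, ("rain", True): 0.02,
    ("cloudy", True): 0.016, ("snow", True): 0.02,
    ("sunny", False): 0.576, ("rain", False): 0.08,
    ("cloudy", False): 0.064, ("snow", False): 0.08,
}

# Prior probabilities are marginals of the joint distribution.
p_sunny = sum(p for (w, c), p in joint.items() if w == "sunny")   # 0.72
p_cavity = sum(p for (w, c), p in joint.items() if c)             # 0.2 in this table
print(round(p_sunny, 3), round(p_cavity, 3))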

Lecture 10: Uncertainty                                                        12/41
Probability for Continuous Variables
                     A probability distribution for a continuous variable is
                     expressed as a parameterized function of the value, e.g.
                     P(X = x) = U[18, 26](x) = uniform density between 18 and 26

                               [Plot: uniform density U[18, 26] with constant height
                                0.125 between 18 and 26, and a small interval of
                                width dx marked]

                     P(X = 20.5) = 0.125 really means

                            lim_{dx→0} P(20.5 ≤ X ≤ 20.5 + dx)/dx = 0.125



Lecture 10: Uncertainty                                                         13/41




Gaussian Density

                     P(X = x) = (1 / (σ√(2π))) e^(−(x−µ)²/(2σ²))




                                [Plot: Gaussian density centered at 0]
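                     Both densities are easy to write as ordinary functions; the
                     sketch below is illustrative only:

import math

def uniform_density(x, a=18.0, b=26.0):
    """U[a, b](x): constant density 1/(b - a) inside the interval, 0 outside."""
    return 1.0 / (b - a) if a <= x <= b else 0.0

def gaussian_density(x, mu=0.0, sigma=1.0):
    """Normal density: exp(-(x - mu)^2 / (2 sigma^2)) / (sqrt(2 pi) sigma)."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)

print(uniform_density(20.5))   # 0.125, as in the previous slide
print(gaussian_density(0.0))   # about 0.399 for the standard normal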




Lecture 10: Uncertainty                                                         14/41
Conditional Probability [1]
                     Conditional or posterior probabilities P(a|b), where a and
                     b are any propositions. This is read as “the probability of
                     a, given that all we know is b.”
                     E.g. P(cavity|toothache) = 0.8
                     Notation for conditional distribution, e.g.,
                     P(Cavity|Toothache)
                     New evidence may change the probability, e.g.,

                                   P(cavity|toothache, cavity) = 1

                     However, new evidence may be irrelevant, allowing
                     simplification, e.g.,

                     P(cavity|toothache, thaiWins) = P(cavity|toothache) = 0.8

Lecture 10: Uncertainty                                                            15/41




Conditional Probability [2]
                     Definition of conditional probability:

                                    P(a|b) = P(a ∧ b) / P(b)   if P(b) ≠ 0

                     Alternative formulation:

                                P(a ∧ b) = P(a|b)P(b) = P(b|a)P(a)

                     For conditional distribution,

                          P(Weather, Cavity) = P(Weather|Cavity)P(Cavity)

                     (View as a 4 × 2 set of equations, not matrix multiplication.)
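                     A small sketch of the definition in Python, reusing the
                     P(Weather, Cavity) table from the earlier slide (not part of
                     the original):

# P(Weather, Cavity), keyed by (weather, cavity), from the prior-probability slide.
joint = {
    ("sunny", True): 0.144, ("rain", True): 0.02,
    ("cloudy", True): 0.016, ("snow", True): 0.02,
    ("sunny", False): 0.576, ("rain", False): 0.08,
    ("cloudy", False): 0.064, ("snow", False): 0.08,
}

def conditional(a_holds, b_holds):
    """P(a | b) = P(a and b) / P(b), defined only when P(b) != 0."""
    p_b = sum(p for event, p in joint.items() if b_holds(event))
    p_ab = sum(p for event, p in joint.items() if a_holds(event) and b_holds(event))
    return p_ab / p_b

# P(Weather = sunny | Cavity = true) = 0.144 / 0.2 = 0.72
print(conditional(lambda e: e[0] == "sunny", lambda e: e[1]))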


Lecture 10: Uncertainty                                                            16/41
Inference by Enumeration [1]
                     Inference by Enumeration is a simple method for
                     probabilistic inference: the computation of posterior
                     probabilities for query propositions from observed
                     evidence.
                     The full joint distribution is used as the knowledge base.
                     For example,
                                               toothache             ¬toothache
                                             catch    ¬catch       catch    ¬catch
                                   cavity    .108      .012        .072      .008
                                  ¬cavity    .016      .064        .144      .576
                     For any proposition φ, sum the atomic events where it is
                     true:
                                         P(φ) = Σ_{ω : ω ⊨ φ} P(ω)

Lecture 10: Uncertainty                                                            17/41




Inference by Enumeration [2]


                     For example,
                          P(toothache) = 0.108 + 0.012 + 0.016 + 0.064 = 0.2

                                [Full joint distribution table repeated from the
                                 previous slide]

Lecture 10: Uncertainty                                                            18/41
Inference by Enumeration [3]


                    P(toothache ∨ cavity) = 0.108 + 0.012 + 0.072 +
                                            0.008 + 0.016 + 0.064
                                          = 0.28

                                [Full joint distribution table repeated from
                                 Inference by Enumeration [1]]
Lecture 10: Uncertainty                                                       19/41




Inference by Enumeration [4]

             P(¬cavity|toothache) = P(¬cavity ∧ toothache) / P(toothache)
                                  = (0.016 + 0.064) / (0.108 + 0.012 + 0.016 + 0.064)
                                  = 0.4

                                [Full joint distribution table repeated from
                                 Inference by Enumeration [1]]
Lecture 10: Uncertainty                                                       20/41
Inference by Enumeration [5]

             P(Cavity|toothache) = α P(Cavity, toothache)
                                 = α [P(Cavity, toothache, catch)
                                     + P(Cavity, toothache, ¬catch)]
                                 = α [⟨0.108, 0.016⟩ + ⟨0.012, 0.064⟩]
                                 = α ⟨0.12, 0.08⟩ = ⟨0.6, 0.4⟩

                                [Full joint distribution table repeated from
                                 Inference by Enumeration [1]]

Lecture 10: Uncertainty                                                            21/41




Inference by Enumeration [6]

                     General idea is to compute distribution on query variable
                     by fixing evidence variables and summing over hidden
                     variables.

                                       P(X|e) = α P(X, e)
                                              = α Σ_y P(X, e, y)

                     where,
                          X is the query variable.
                          e is the observed values of evidence variables.
                          y is the remaining unobserved variables.
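                     A minimal Python sketch of this procedure, using the full joint
                     distribution table from the earlier slides (the function and
                     variable names are illustrative, not from the lecture):

# Full joint distribution P(Toothache, Catch, Cavity),
# keyed by assignments (toothache, catch, cavity).
joint = {
    (True, True, True): 0.108,   (True, False, True): 0.012,
    (False, True, True): 0.072,  (False, False, True): 0.008,
    (True, True, False): 0.016,  (True, False, False): 0.064,
    (False, True, False): 0.144, (False, False, False): 0.576,
}
VARS = ("Toothache", "Catch", "Cavity")

def enumerate_query(query_var, evidence):
    """P(X | e) = alpha * sum over the hidden variables y of P(X, e, y)."""
    q = VARS.index(query_var)
    dist = {}
    for value in (True, False):
        dist[value] = sum(
            p for world, p in joint.items()
            if world[q] == value
            and all(world[VARS.index(v)] == val for v, val in evidence.items()))
    alpha = 1.0 / sum(dist.values())           # normalization constant
    return {v: alpha * p for v, p in dist.items()}

# Matches the worked example above: P(Cavity | toothache) = <0.6, 0.4>
print(enumerate_query("Cavity", {"Toothache": True}))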



Lecture 10: Uncertainty                                                            22/41
Exercises
                     From the given fully joint distribution, compute the
                     following probabilities, and probability distributions.
                                          disease                             ¬disease
                                TestA = low    TestA = high          TestA = low   TestA = high
             TestB = low            0.10           0.07                  0.07          0.03
            TestB = norm            0.03           0.07                  0.20          0.07
            TestB = high            0.17           0.13                  0.03          0.03



            1.       P(disease ∧ TestB = low ∧ TestA = high)
            2.       P(disease ∨ TestA = low)
            3.       P(TestA = high ⇒ disease)
            4.       P(TestA = high|TestB = low, disease)
            5.       P(Disease|TestA = high)

Lecture 10: Uncertainty                                                                           23/41




Absolute Independence
                     Variables A and B are independent iff
        P(A|B) = P(A) or P(B|A) = P(B) or P(A, B) = P(A)P(B)

                     For example,
                          [Diagram: the model over Cavity, Toothache, Catch, and
                           Weather decomposes into one over Cavity, Toothache, and
                           Catch, plus a separate one over Weather]




                          P(Toothache, Catch, Cavity, Weather)
                                 = P(Toothache, Catch, Cavity)P(Weather)
                     The number of entries is reduced from 32 (2 × 2 × 2 × 4) to
                     12 (8 + 4).
                     Absolute independence is powerful but very rare:
                     dentistry is a large field whose many variables are not all
                     independent of one another.
Lecture 10: Uncertainty                                                                           24/41
Conditional Independence [1]
                     If a patient has a cavity, the probability that the probe
                     catches in it does not depend on whether the patient has
                     a toothache:

                           P(catch|toothache, cavity) = P(catch|cavity)

                     The same independence holds if the patient does not
                     have a cavity.

                          P(catch|toothache, ¬cavity) = P(catch|¬cavity)

                     We can say that Catch is conditionally independent of
                     Toothache given Cavity:

                          P(Catch|Toothache, Cavity) = P(Catch|Cavity)
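                     This can be checked numerically against the full joint
                     distribution table used in the enumeration slides (a sketch,
                     not part of the original):

# Full joint P(Toothache, Catch, Cavity), keyed by (toothache, catch, cavity).
joint = {
    (True, True, True): 0.108,   (True, False, True): 0.012,
    (False, True, True): 0.072,  (False, False, True): 0.008,
    (True, True, False): 0.016,  (True, False, False): 0.064,
    (False, True, False): 0.144, (False, False, False): 0.576,
}

def p(holds):
    """Probability of the event described by the predicate `holds`."""
    return sum(pr for world, pr in joint.items() if holds(world))

for cavity in (True, False):
    # P(catch | toothache, cavity-value) computed from the joint ...
    lhs = (p(lambda w: w[1] and w[0] and w[2] == cavity)
           / p(lambda w: w[0] and w[2] == cavity))
    # ... equals P(catch | cavity-value): 0.9 with a cavity, 0.2 without.
    rhs = p(lambda w: w[1] and w[2] == cavity) / p(lambda w: w[2] == cavity)
    print(cavity, round(lhs, 3), round(rhs, 3))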

Lecture 10: Uncertainty                                                          25/41




Conditional Independence [2]



                     Two variables X and Y are conditionally independent
                     given Z iff
                          P(X |Y , Z ) = P(X |Z )
                          P(Y |X , Z ) = P(Y |Z )
                          P(X , Y |Z ) = P(X |Z )P(Y |Z )




Lecture 10: Uncertainty                                                          26/41
Conditional Independence [3]

                     The full joint distribution P(Toothache, Catch, Cavity)
                     has 2³ − 1 = 7 independent entries.
                     We can write the distribution using the chain rule:

                     P(Toothache, Catch, Cavity)
                     = P(Toothache|Catch, Cavity)P(Catch, Cavity)
                     = P(Toothache|Catch, Cavity)P(Catch|Cavity)P(Cavity)
                     = P(Toothache|Cavity)P(Catch|Cavity)P(Cavity)

                     This requires only 2 + 2 + 1 = 5 independent entries.
                          Knowing P(toothache|cavity) and P(toothache|¬cavity)
                          is enough for P(Toothache|Cavity), and so on


Lecture 10: Uncertainty                                                          27/41




Bayes’ Rule
                     From the product rule,

                            P(a ∧ b) = P(a|b)P(b) = P(b|a)P(a)

                     Thus, we have
                                       P(b|a) = P(a|b)P(b) / P(a)

                     This is known as Bayes’ rule
                     In the more general case, for whole distributions we have a
                     set of equations:

                                      P(Y|X) = P(X|Y)P(Y) / P(X)


Lecture 10: Uncertainty                                                          28/41
Bayes’ Rule: Normalization


                     Bayes’ rule can be written:

                              P(Y|X) = P(X|Y)P(Y) / Σ_i P(X|Y = y_i)P(Y = y_i)

                     More generally,

                                    P(Y |X ) = αP(X |Y )P(Y )




Lecture 10: Uncertainty                                                     29/41




Applying Bayes’ rule: Example [1]


        A doctor knows that the disease meningitis causes the patient to
        have a stiff neck, 50% of the time. The doctor also knows some
        unconditional facts: the prior probability that a patient has
        meningitis is 1/50000, and the prior probability that any patient
        has a stiff neck is 1/20.

        What is the probability that a patient who has a stiff neck
        has meningitis?




Lecture 10: Uncertainty                                                     30/41
Applying Bayes’ rule: Example [2]

                     Let s be the proposition that the patient has a stiff neck,
                     and m be the proposition that the patient has meningitis,
                     we have
                           P(s|m) = 0.5
                           P(m) = 1/50000
                           P(s) = 1/20
                     Thus,
                          P(m|s) = P(s|m)P(m) / P(s) = (0.5 × 1/50000) / (1/20) = 0.0002
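                     The arithmetic can be checked with a few lines of Python
                     (just a sketch of the calculation above):

p_s_given_m, p_m, p_s = 0.5, 1 / 50000, 1 / 20

# Bayes' rule: P(m | s) = P(s | m) P(m) / P(s)
p_m_given_s = p_s_given_m * p_m / p_s
print(p_m_given_s)   # 0.0002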




Lecture 10: Uncertainty                                                           31/41




Applying Bayes’ rule: Example [3]


                     Using Bayes’ rule and normalization, we have

                             P(M|s) = α ⟨P(s|m)P(m), P(s|¬m)P(¬m)⟩

                     In general, we have

                                      P(Y |X ) = αP(X |Y )P(Y )




Lecture 10: Uncertainty                                                           32/41
Combining Evidence: Example [1]

                     When we want to combine several pieces of information:
                            P(Cv|T, Ct) = P(T, Ct|Cv)P(Cv) / P(T, Ct)

                     However, since Catch and Toothache are conditionally
                     independent given Cavity, we have

                          P(Cv|T, Ct) = P(T|Cv)P(Ct|Cv)P(Cv) / P(T, Ct)

                     We can thus combine the pieces of evidence sequentially.



Lecture 10: Uncertainty                                                          33/41




Naïve Bayes Model



         P(Cause, Effect_1, . . . , Effect_n) = P(Cause) Π_i P(Effect_i | Cause)


                             [Diagram: a single Cause node with arrows to
                              Effect_1, Effect_2, . . . , Effect_n]




Lecture 10: Uncertainty                                                          34/41
Exercises

                     After your yearly checkup, the doctor has bad news and
                     good news. The bad news is that you tested positive for a
                     serious disease and that test is 99% accurate. The good
                     news is that this is a rare disease, striking only 1 in
                     10,000 people of your age. What are the chances that
                     you actually have the disease?1
                     In a dishonest casino, one die out of 100 dice is loaded so
                     that 6 comes up 50% of the time. If someone rolls three
                     6’s in a row, what is the probability that the die is
                     loaded?2


                1
                    AIMA Exercise 13.8
                2
                    http://www.mscs.mu.edu/~cstruble/class/cosc159/spring2003/notes/
Lecture 10: Uncertainty                                                                35/41




Application to Data Mining [1]
                     The classification problem aims to predict the class of an
                     object from a given set of evidence.
                          For example, a credit card company wants to predict
                          customers' credit risk (true or false) from their
                          applications, which consist of several attributes.
                     We need to compute P(Class|E = e), where Class is the set
                     of possible classes and E is the evidence observed for the
                     object we want to classify.
                     Then, we select the c ∈ Class with the highest probability.

                       c_predicted = argmax_c P(Class = c|E = e)
                                   = argmax_c P(E = e|Class = c)P(Class = c) / P(E = e)
                                   = argmax_c P(E = e|Class = c)P(Class = c)


Lecture 10: Uncertainty                                                                36/41
Application to Data Mining [2]
                     By gathering statistical data, we can estimate the
                     probability of the evidence in each class, i.e., we know
                     P(E = e|Class = c) and P(Class = c)
                     Generally speaking, the evidence consists of several
                     attributes: e = ⟨e1 , e2 , e3 , . . .⟩. To simplify computation,
                     the attributes of E are assumed to be conditionally
                     independent given Class = c. Thus,
                             P(E = e|Class = c) = Π_{i=1}^{k} P(Ei = ei |Class = c)

                     This data mining technique is called “Naïve Bayesian
                     Classification”

Lecture 10: Uncertainty                                                                                   37/41




Naïve Bayesian Classification: Example [1]
             ID           Credit History        Debt       Collateral        Income        Credit Risk?
              1                bad              high         none             0-15k           high
              2             unknown             high         none            15-35k           high
              3             unknown             low          none            15-35k         moderate
              4             unknown             low          none             0-15k           high
              5             unknown             low          none             >35k             low
              6             unknown             low        adequate           >35k             low
              7                bad              low          none             0-15k           high
              8                bad              low        adequate           >35k          moderate
              9               good              low          none             >35k             low
             10               good              high       adequate           >35k             low
             11               good              high         none             0-15k           high
             12               good              high         none            15-35k         moderate
             13               good              high         none             >35k             low
             14                bad              high         none            15-35k           high
                           Source: http://www.mscs.mu.edu/~cstruble/class/cosc159/spring2003/notes/


Lecture 10: Uncertainty                                                                                   38/41
Naïve Bayesian Classification: Example [2]


                     What is the credit risk of the following customer?
                              e = ⟨bad, low, none, 0-15k⟩
                             To predict the credit risk, we need to select the most
                             suitable class:

                              c_predicted = argmax_{c∈{low,mod,high}} P(E = e|Class = c)P(Class = c)




Lecture 10: Uncertainty                                                                   39/41




Naïve Bayesian Classification: Example [3]

                     Since there are three classes, we need to compute the
                     score three times, once for each case:
                          1. Case: Class = low

                             P(E = e|Class = low)P(Class = low)
                               = (Π_i P(Ei = ei |Class = low)) P(Class = low)
                               = P(History = bad|Class = low) × P(Debt = low|Class = low) ×
                                 P(Collateral = none|Class = low) ×
                                 P(Income = 0-15k|Class = low) × P(Class = low)
                               ≈ 0/5 × 3/5 × 3/5 × 0/5 × 5/14 = 0.0



Lecture 10: Uncertainty                                                                   40/41
Naïve Bayesian Classification: Example [4]
                          2. Case: Class = moderate

                                       P(E = e|Class = moderate)P(Class = moderate)
                                         ≈ 1/3 × 2/3 × 2/3 × 0/3 × 3/14 = 0.0
                          3. Case: Class = high

                                       P(E = e|Class = high)P(Class = high)
                                         ≈ 3/6 × 2/6 × 6/6 × 4/6 × 6/14 ≈ 0.05
                     Thus,

                            argmax_c P(E = e|Class = c)P(Class = c) = high
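                     The whole computation can be reproduced with a short Python
                     sketch over the 14-row table from the data slide (probabilities
                     are estimated by simple counting, with no smoothing, so zero
                     counts give a zero product exactly as above; the function name
                     is illustrative):

# Credit data: (history, debt, collateral, income, risk), one tuple per row.
data = [
    ("bad", "high", "none", "0-15k", "high"),
    ("unknown", "high", "none", "15-35k", "high"),
    ("unknown", "low", "none", "15-35k", "moderate"),
    ("unknown", "low", "none", "0-15k", "high"),
    ("unknown", "low", "none", ">35k", "low"),
    ("unknown", "low", "adequate", ">35k", "low"),
    ("bad", "low", "none", "0-15k", "high"),
    ("bad", "low", "adequate", ">35k", "moderate"),
    ("good", "low", "none", ">35k", "low"),
    ("good", "high", "adequate", ">35k", "low"),
    ("good", "high", "none", "0-15k", "high"),
    ("good", "high", "none", "15-35k", "moderate"),
    ("good", "high", "none", ">35k", "low"),
    ("bad", "high", "none", "15-35k", "high"),
]

def naive_bayes_score(evidence, c):
    """P(E = e | Class = c) P(Class = c), with probabilities estimated by counting."""
    rows = [r for r in data if r[-1] == c]
    score = len(rows) / len(data)                 # P(Class = c)
    for i, value in enumerate(evidence):          # product of P(E_i = e_i | Class = c)
        score *= sum(1 for r in rows if r[i] == value) / len(rows)
    return score

evidence = ("bad", "low", "none", "0-15k")
scores = {c: naive_bayes_score(evidence, c) for c in ("low", "moderate", "high")}
print(scores)                        # "high" scores about 0.048 (rounded to 0.05 above)
print(max(scores, key=scores.get))   # "high"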



Lecture 10: Uncertainty                                                         41/41

				