60 RULE003 IFthe defendant has a alibi by MWNM3s


									Dealing with Uncertainty
       Reasoning Under Uncertainty
Monotonic Reasoning

•   A reasoning process that moves in one direction only.
•   The number of facts in the knowledge base is always increasing.
•   The conclusions derived are valid deductions and they remain so.

Reasoning process applied to practical everyday problems must
  recognize uncertainty
• Available information is frequently incomplete
• Conditions change over time
• There is frequently a need to make an efficient but possibly incorrect
  guess when reasoning reaches a dead end.
     Reasoning Under Uncertainty
Non-monotonic Reasoning

Non-monotonic reasoning (NMR) is based on augmenting
  absolute truth with beliefs.

These tentative beliefs are generally based on default
  assumptions that are made in light of lack of evidence.

A NMR system tracks a set of tentative beliefs and revise
  those beliefs when knowledge is observed or derived.
     Reasoning Under Uncertainty

• Uncertainty may cause bad treatment in medicine, loss
  of money in business.

• Classic examples of successful expert systems which
  deal with uncertainty are MYCIN for medical diagnosis
  and PROSPECTOR for mineral exploration.

• In case of medicine, delaying treatment for more tests
  (for more exact knowledge) may add considerable costs;
  the patient may die.
     Reasoning Under Uncertainty
Many different types of errors can contribute to

1. data might be missing or unavailable
2. data might be ambiguous or unreliable due to
   measurement errors
3. the representation of data may be imprecise or
4. data may just be user's best guess (random)
5. data may be based on defaults, and defaults may
   have exceptions
     Reasoning Under Uncertainty
Given these sources of errors, most knowledge base
   systems incorporate some form of uncertainty

There are three issues to be considered:
   1. How to represent uncertain data.
   2. How to combine two or more pieces of uncertain data.
   3. How to draw inference using uncertain data
      Reasoning Under Uncertainty
Errors and Induction

Deduction is going from general to specific
   All men are mortal
   Socrates is a man
      therefore Socrates is mortal

Induction tries to generalize from the specific.

My disk has never crashed.
   therefore my disk will never crash.
       Reasoning Under Uncertainty
Inductive arguments can never be proven correct (except mathematical
   induction). Inductive arguments can provide some degree of
   confidence that the conclusion is correct.

Deductive errors or fallacies may also occur
    If p implies q
    q is true
    therefore p

If the valve is in good condition then the output is normal
The output is normal
Therefore, the valve is in good condition

Uncertainty is major problem in knowledge elicitation, especially when
  the expert's knowledge must be quantized in rules.
Approaches in Dealing with Uncertainty

Numerically oriented methods:
  •   Bayes’ Rules
  •   Certainty Factors
  •   Dempster Shafer
  •   Fuzzy Sets
Quantitative approaches
•   Non-monotonic reasoning
Symbolic approaches
•   Cohen’s Theory of Endorsements
•   Fox’s semantic systems
            Classical Probability

• This is also called a priori probability. It is
  assumed that all possible events are known and
  that each event is equally likely to happen
  (rolling a die).
• Prior or unconditional probability is the one
  before the evidence is received.
• Posterior or conditional probability is the one
  after the evidence is received.
                Theory of Probability
Formal theory of probability can be made using 3 axioms:

Axiom 1         0 =< P(E) =< 1

Axiom 2           P( E )  1
                                   where Ei, i=< 1=< n are mutually


                P(E) + P(E') = 1

Axiom 3         P( E1  E2 )  P( E1 )  P( E2 )  P( E1  E2 )
              Theory of Probability
Experimental or Subjective Probabilities
• In contrast to the prior approach, experimental
  probability defines the probability of an event P(E) as the
  limit of a frequency distribution.

              P(E) =[ lim (N-> infinity)] f(E)/N

This type of probability is called a posterior probability.

A subjective probability is a belief or opinion expressed as
  a probability rather than a probability based on axioms or
  empirical measurements. This is applied on the
  decisions for non-repeatable events.
              Theory of Probability
Compound Probabilities

• What is the probability of rolling a die with an
  outcome of even number divisible by 3.
  Event A=               {2, 4, 6}
  Event B =              {3, 6}
                          A  B  6
                                       n( A  B ) 1
                         P( A  B)              
                                          n( s )   6
                         P( A  B)  P( A) P( B)
                Theory of Probability
The two events are called stochastically independent
  events if and only if the above formula is true.

Stochastic is a Greek word meaning "guess". It is
  commonly used as a synonym for probability, random or

The probability of rolling a die with an outcome of even
  number or divisible by 3.
          P( A  B)  P( A)  P( B)  P( A  B)
                      = 3/6 + 2/6 – 1/6 = 4/6
            Theory of Probability

Conditional Probabilities
• The probability of an event A, given that event B
  occurred, is called a conditional probability and
  indicated by P(A|B).

               P( A  B)
  P( A | B)             forP( B)  0
                 P( B)
  P( A | B) P( B)  P( A  B)
                          Baye's Theorem

Baye's Theorem in terms of events E, and hypothesis,

                  P( E  H i )
  P( H i | E ) 
                  P( E  H j )

                   P( E | H i ) P( H i )
                   P( E | H j ) P( H j )

                P( E | H i ) P( H i )
                      P( E )
                   Baye's Theorem

The conditional probability, P(A|B), states the probability of
event A given that event B occurred. The inverse problem
is to find the inverse probability which states the probability
of an earlier event given that a later one occurred.

Example: Probability of chosing brand X given it has

This is inverse or posterior probability.
Table below shows hypothetical disk crashes using a brand X drive
within one year
                          X          X’      Total of Rows
 Crash C                 0.6        0.1      0.7
 No Crash C’             0.2        0.1      0.3
 Total of Columns        0.8        0.2      1.0
P(C|X) = ?
P(C|X) = P(C  X) / P(X) = 0.6 / 0.8 = 0.75
P(C|X’) = P(C  X’) / P(X’) = 0.1 / 0.2 = 0.50

P(X|C) = ?
P(X|C) = P(C  X) / P(C) = 0.6 / 0.7 = 6/7
P(X|C) = P(C|X) P(X) / P(C) = 0.75 * 0.8 / 0.7
                               = 0.6 / 0.7
Suppose, statistics show that Brand X drive crashes with a probability
of 75% within one year and non-Brand X drive crash within one year is
50%. The inverse question is, what is the probability of a crashed drive
being brand X or non-brand X.
 Hypothetical Reasoning and Backward

• Bayesian decision making is used in
  PROSPECTOR to decide favorable sites for
  mineral exploration.
• Generally conditional probability is forward in
  time, while a posterior probability is backward in
• Example of Bayesian decision making under
                    Oil exploration

• If there is no evidence for or against we may guess that
  P(O) = P(O') = 0.5

• We may believe that the chances are better for finding
  P(O) = 0.6, P(O') = 0.4

• Assume the probabilities for the outcomes of seismic test
  for oil exploration as:
  P(+|O) = 0.8, P(-|O) = 0.2 (false -)
  P(+|O')= 0.1 (false +), P(-|O') = 0.9
Using above conditional (prior) probabilities we can
construct the initial probability tree .
    Advantages and Disadvantages of
           Bayesian Methods
• Bayesian methods have support of probability theory and
  have well defined semantics for decision making.

Disadvantages are
• They require significant amount of probability data to
  construct a knowledge base.
• If the probabilities are statistical, sample size must be
  sufficient. If they are provided by an expert then their
  comprehensiveness and consistency must be queried.
• Reducing associations between the hypotheses and the
  evidences to simple numerical values removes relevant
  information necessary for reasoning (explanation of how
  a conclusion is reached).
     Reasoning with Certainty Factors
During the development of MYCIN, researchers developed certainty
  factors formalism for the following reasons:

• The medical data lacks large quantities of data and/or the numerous
  approximations required by Bayes' theorem.

• There is a need to represent medical knowledge and heuristics
  explicitly, which can not be done when using probabilities.

• Physicians reason by capturing evidence that supports or denies a
  particular hypothesis.
    Certainty Factor (CF) Formalism
Eg of MYCIN rule
        IF the stain of the organism is gram pos
        AND the morphology of the organism is coccus
        AND the growth of the organism is chains
        THEN there is evidence that the organism is streptococcus CF

Given the evidence a doctor only partially believe the

• General Form
  IF E1 And E2 ….THEN H CF = Cfi
                       where E= evidence & H is the conclusion
    Certainty Factor (CF) Formalism
• A measure of belief, MB(h, e) indicates the degree to
  which our belief in hypothesis, h, is increased based
  on the presence of evidence, e
• A measure of disbelief, MD(h, e), indicates the
  degree to which our disbelief in hypothesis, h, is
  increased based on the presence of evidence, e.

     When     p(h | e) = 0 MB(h, e) = 0 MD(h, e) = 1

              p(h | e) = 1 MB(h, e) = 1 MD(h, e) = 0
Certainty Factor (CF) Formalism

                CF interpretation
    Certainty Factor (CF) Formalism

•   CF(h | e) = MB(h, e) - MD(h, e)
    -1 < CF < 1

•   When there is total belief
    – CF = 1, and
•   When there is a total disbelief in hypothesis
    – CF = -1
•   When there is no evidence to make judgment
    – CF = 0
     -1                     0                     1
     F range of disbelief       range of belief   T
    Certainty Factor (CF) Formalism

•   Composite CF can be calculated as follows:
    CFcomp(h, e) = MBcomp(h, e) - MDcomp(h, e)

•   For P1 and P2 premises of the rule,
    CF(P1 and P2)= MIN((CF(P1), CF(P2))
    CF(P1 or P2) = MAX ((CF(P1), CF(P2))
    Certainty Factor (CF) Formalism

For example consider a rule in a knowledge base:
• (P1 and P2) or P3 R1(.7) and R2(.3)

•   If CFs for P1, P2, and P3 are 0.6, 0.4, and 0.2,
    respectively then R1 and R2 may be anticipated with
    CFs 0.28 and 0.12 respectively.

    CF(P1(0.6) and P2(0.4)) = MIN(.6, .4) = 0.4
    CF((0.4) or P3(0.2)) = MAX (0.4, 0.2) = 0.4
    CF(R1) = .7 * .4 = .28
    CF(R2) = .3 * .4 = .12
      Certainty Factor (CF) Formalism

Two properties that are required of the combination
  operation are:
Commutative – The value should not depend on the
  order in which the rules are taken.
Asymptotic – The more evidence we have for the
  belief in a conclusion the higher should be the
  certainty factor, but if it is not absolutely certain,
  then it should remain below 1.
    Certainty Factor (CF) Formalism
Propagation of Certainty Factors

When there are two or more rules supporting the same
   conclusion CFs are propagated as follows:

CFrevised = CFold + CFnew(1 - CFold) if both CFold and
                                     CFnew > 0
          = CFold + CFnew(1 + CFold) if both CFold and
                                     CFnew < 0

        =                            otherwise
           Certainty Factor Example
In a murder trial the defendant is being accused of a first degree
murder (hypothesis).The jury must balance the evidences presented
by the prosecutor and the defense attorney to decide if the suspect is

RULE001 IFthe defendant's fingerprints are on the weapon,
        THEN the defendant is guilty. CF=0.75
RULE002 IFthe defendant has a motive,
        THEN the defendant is guilty. CF=0.60
RULE003 IFthe defendant has a alibi,
        THEN he is not guilty. CF=-.80
        Certainty Factor Example

We start with CF = 0.0 for the defendant being
• After submission of the evidence 1 (fingerprints
  on the weapon)
  CFcomb1 = CF rule1's conclusin * CF evid1
              = 0.75 * 0.90 = 0.675
  CF revised = CF old + CF new * (1 - CFold)
              = 0.0 + 0.675(1-0.0) = 0.675
   Example of CFs Propagation

     CFcon1=CFnew=0.675                     CFnew=0.675 Guilty
                                   CF = 0.0             CFrevised=0.675

                                               CFrevised=CFold + CFnew*(1-CFold)
                                                        =0.0 + 0.675*(1-0.0)
      on weapon

RULE 1. IF the defendant’s fingerprints are on the weapon
        THEN the defendant is guilty

CFcon1=CFevid1*CFrule1 (single premise rule)
         Certainty Factor Example

The defendant’s mother in law says that he had the motive
  for slaying
CFnew       = CFcomb2 = CF rule2's conclusin * CF evid2
            = 0.60 * 0.50 = 0.30

CF revised = CF old + CF new * (1 - CFold)
           = 0.675 + 0.30(1-0.675) = 0.7725
       CFcon2=CFnew=0.30            Guilty          CFnew=0.30 Guilty
                                    CFrevised=0.675             CFrevised=0.772

                                             CFrevised=CFold + CFnew*(1-CFold)
                                                      =0.675 + 0.30*(1-0.675)
       Motive exists

RULE 2. IF the defendant has a motive
        THEN the defendant is guilty of the crime

   CFcon2=CFevid2*CFrule2   (single premise rule)
          Certainty Factor Example
A respected judge witnesses for alibi, so a cf of 0.95 is
assigned for this evidence

CFcomb3       = CF rule3's conclusin * CF evid3
              = 0.95 * (-0.80) = -0.76

CFrevised =

              = (0.7725 - 0.76) / (1 - 0.76) = 0.052
      CFcon3=CFnew=-0.76       Guilty              CFnew=-0.76

                                                         CFold  CFnew
                                                    1  min(| CFold |, | CFnew | )
   Alibi found
   CFevid3=0.95                                    = (0.772-0.76)/(1-0.76)
   CFrule3= -0.80                                  = 0.052

 RULE 3. IF the defendant has an alibi
         THEN he is not guilty

        = -0.76
          Certainty Factor Example

Confidence Factor in guilty verdict after introduction of all
evidences is:
     Advantages of Certainty Factors
• It is a simple computational model that permits experts to
  estimate their confidence in conclusions being drawn.
• It permits the expression of belief and disbelief in each
  hypothesis, allowing the expression of the effect of
  multiple sources of evidence.
• It allows knowledge to be captured in a rule
  representation while allowing the quantification of
• The gathering of the CF values is significantly easier
  than the gathering of values for the other methods. No
  statistical base is required – you merely have to ask the
  expert for the values.
Deep Inference Chains

If we have a chain of inference such as:
   IF A      THEN B        CF=0.8
   IF B      THEN C        CF= 0.9

Then because of the multiplication of CFs the resulting CF
For example if       CF(A) = 0.8, then
                     CF(C) = .8*.8*.9 = .58
With long chain of inferences the final CF may become
  very small

Many Rules with same Conclusion

The more rules with the same conclusion the
  higher the CF value. If there are many rules then
  CF can become artificially high.
Conjunctive Rules
If a rule has a number of conjunctive premises, overall CF may be
    reduced too much.
IF sky dark AND temperature dropping
THEN will rain 0.9

If CF(sky dark) = 1,
    CF(temperature dropping) = .1 then
    CF(will rain) = min(1, .1)*.9 = .09 whereas if we had

IF the sky dark THEN will rain 0.7
IF temperature dropping THEN will rain 0.5

CF1 = 1 * .7 = 0.7,        CF2 = .1 * .5 = 0.05
CF (will rain) = .7 + .05*(1 - .7) = .7 + 0.015 = .715
                   Fuzzy Logic
In everyday speech we use vague or imprecise terms to
describe properties.

  Fuzzy logic was developed by Zadeh to deal with these
  imprecise values in a mathematical way.
                   Fuzzy Logic

• It will allow us to deal with fuzzy rules

   IF the temperature is cold
   THEN the motor speed stops

   IF speed is slow
   THEN make acceleration high.
                     Fuzzy Sets
• In ordinary set theory, an element from the domain is
  either in a set or not in a set.
• In fuzzy sets, a number in the range 0-1 is attached to an
  element – the degree to which the element belongs to
  the set.
• A value of 1 means the element is definitely in the set
• A value of 0 means the element is definitely not in the
• Other values are grades of membership.
• Formally a fuzzy set A from X is given by its membership
  function which has type
                         A : X  [0, 1]
Fuzzy Sets

    Fuzzy set of small men

    Small men – Simpler Curve
                  Fuzzy Sets

• The following figure shows the representation of
  three fuzzy sets for small, medium and tall men.
  We see that a man of height 4.8 feet is
  considered both small and medium to some
                   Boolean Operations
The Boolean operations of union, intersection, and complement can be
defined in the straightforward manner.

The operation is
                       A (x) = 1 - A (x)
               Boolean Operations

The intersection of two fuzzy sets A and B is given by
             AB (x) = min({A (x), B (x)})

The union of two fuzzy sets A and B is given by
            AB (x) = max({A (x), B (x)})
              Fuzzy Reasoning

In this section, fuzzy rules and how inference is
performed on these rules is presented.

This will be illustrated by a fuzzy system used to
control an air conditioner. The variables to be used
(with fuzzy values) are temperature (of the room)
and speed (of the fan motor).
                     Fuzzy Reasoning
The rules are given as follows:
• IF the temperature is cold
  THEN motor speed stops

• IF the temperature is cool
  THEN motor speed slows             Temperature Fuzzy Sets

• IF the temperature is just right
  THEN motor speed medium

• IF the temperature is warm
  THEN motor speed fast

• IF the temperature is hot
                                     Speed Fuzzy Sets
  THEN motor speed blast
               Fuzzy Reasoning

• In a fuzzy system all the rules fire in parallel,
  although in the end many will not contribute to
  the output.
• What we need to determine, in the above
  system is, given a particular value of the
  temperature how do we calculate the motor
                Fuzzy Reasoning

• Now, the temperature can be measured fairly accurately,
  but it will lie in several fuzzy sets. For example if the
  temperature were 17C then from the figure we see that it
  is about 25% cool and 80% just right.
                 Fuzzy Reasoning
• This means that rules 2 and 3 will contribute to the
  output speed of the motor.

• The fuzzy set for the output can be calculated by
  multiplying the slow graph by .25 and the medium graph
  by .80 assuming the contribution is proportional to the
  fuzzy values of the input temperature
                Fuzzy Reasoning

• One way to amalgamate two sets is to sum the values
  (with a maximum of 1).

           Amalgamated sets and average
                 Fuzzy Reasoning

• Other ways of amalgamation (e.g. taking maximum) are

• Now we need to determine the actual speed of the motor.
  This can be done by finding the average value of the
  curve – I.e. the position where the areas on either side of
  the perpendicular through this point are equal.

• Phil Grant: University of Wales Swansea

To top