Reverend Thomas Bayes (1702-1761)


Bayesian Networks

1.   Probability theory
2.   BN as knowledge model
3.   Bayes in Court
4.   Dazzle examples
5.   Conclusions



            Jenneke IJzerman,
            Bayesiaanse Statistiek in de Rechtspraak,
            VU Amsterdam, September 2004.
            http://www.few.vu.nl/onderwijs/stage/werkstuk/werkstukken/werkstuk-ijzerman.doc


                  Expert Systems 8                                                            1
Thought Experiment: Hypothesis Selection

Imagine two types of bag:
• BagA: 250 + 750
• BagB: 750 + 250

Take 5 balls from a bag:
• Result: 4 + 1

What is the type of the bag?

Probability of this result from:
• BagA: 0.0144
• BagB: 0.396
Conclusion: The bag is BagB.

But…
• We don’t know how the bag was selected
• We don’t even know that type BagB exists
• The experiment is meaningful only in light of the a priori posed hypotheses (BagA, BagB) and their assumed likelihoods.
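The two likelihoods can be checked with a short computation. A minimal sketch, assuming each bag holds 250 or 750 favourable balls out of 1000 and treating the five draws as independent (sampling with replacement); this reproduces the slide's figures up to rounding:

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of exactly k successes in n independent draws."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Likelihood of the observed result (4 + 1) under each hypothesis:
p_bag_a = binom_pmf(4, 5, 0.25)   # BagA: 250 favourable out of 1000
p_bag_b = binom_pmf(4, 5, 0.75)   # BagB: 750 favourable out of 1000

print(round(p_bag_a, 4))   # 0.0146 (the slide rounds to 0.0144)
print(round(p_bag_b, 4))   # 0.3955, i.e. the slide's 0.396
```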


Classical and Bayesian statistics

Classical statistics:
• Compute the prob for your
  data, assuming a hypothesis
• Reject a hypothesis if the
  data becomes unlikely

Bayesian statistics:
• Compute the prob for a
  hypothesis, given your data
• Requires a priori prob for
  each hypothesis;
  these are extremely
  important!




Part I: Probability theory

What is a probability?
• Frequentist: relative frequency of occurrence.
• Subjectivist: amount of belief.
• Mathematician: axioms (Kolmogorov); assignment of non-negative numbers to a set of states, with sum 1 (100%).

   Blond   Not blond
    30        70

                  Blond   Not blond
   Mother blond    15        15
   Mother n.b.     15        55

A state has several variables: the product space.
With n binary variables: 2^n states.
Variables may also be multi-valued.


Conditional Probability: Using evidence

   Blond   Not blond
    30        70

                  Blond   Not blond
   Mother blond    15        15
   Mother n.b.     15        55

                  Blond   Not blond
   Mother blond    50        50

• First table: probability for any woman to deliver a blond baby.
• Second table: describes blond and non-blond mothers separately.
• Third table: describes only blond mothers; the row is rescaled with its weight.

Def. conditional probability:
  Pr(A|B) = Pr(A & B) / Pr(B)
Rewrite:
  Pr(A & B) = Pr(B) × Pr(A|B)
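The definition can be exercised directly on the table's counts; a small sketch, using the slide's counts per 100 women:

```python
# Joint counts from the second table (per 100 women):
#                 child blond | child not blond
# mother blond        15      |      15
# mother n.b.         15      |      55

n_blond_mother_and_blond_child = 15
n_blond_mother = 15 + 15

# Pr(child blond | mother blond) = Pr(A & B) / Pr(B); with raw counts the
# two /100 normalisations cancel, leaving a ratio within the row:
p = n_blond_mother_and_blond_child / n_blond_mother
print(p)   # 0.5, i.e. the rescaled row (50 / 50) of the third table
```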


Dependence and Independence
• The probability of a blond child is 30%, but larger for a blond mother and smaller for a non-blond mother.
• The probability of a boy is 50%, also for blond mothers, and also for non-blond mothers.

                  Blond   Not blond
   Mother blond    15        15
   Mother n.b.     15        55

                  Boy     Girl
   Mother blond    15       15
   Mother n.b.     35       35

                  Boy     Girl
   Mother blond    50       50

Def.: A and B are independent if
  Pr(A|B) = Pr(A)

Exercise: Show that Pr(A|B) = Pr(A) is equivalent to Pr(B|A) = Pr(B) (i.e., B and A are independent).
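The boy/girl table can be checked for independence mechanically; a sketch using the slide's counts:

```python
# Counts (per 100 births) from the boy/girl table:
counts = {("blond", "boy"): 15, ("blond", "girl"): 15,
          ("nb", "boy"): 35, ("nb", "girl"): 35}
total = sum(counts.values())                                          # 100

p_boy = (counts[("blond", "boy")] + counts[("nb", "boy")]) / total    # 0.5
p_boy_given_blond = counts[("blond", "boy")] / (15 + 15)              # 0.5

# Pr(A|B) = Pr(A): the child's sex and the mother's hair colour are independent.
print(p_boy == p_boy_given_blond)   # True
```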
   Bayes Rule: from data to hypothesis

            4+1       Other
   BagA    0.0144     0.986
   BagB    0.396      0.604
   Other

• Classical Probability Theory: 0.0144 is the relative weight of 4+1 in the ROW of BagA.
• Bayesian Theory describes the distribution over the COLUMN of 4+1.

Classical statistics: ROW distribution.
Bayesian statistics: COLUMN distribution.

Bayes’ Rule:
• Observe that
  Pr(A & B) = Pr(A) × Pr(B|A) = Pr(B) × Pr(A|B)
• Conclude Bayes’ Rule:
  Pr(A|B) = Pr(B|A) × Pr(A) / Pr(B)
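Bayes' Rule turns the ROW likelihoods into a COLUMN distribution once priors are supplied. A sketch assuming, hypothetically, equal priors for the two bags; exactly the kind of assumption the slide insists must be made explicit:

```python
likelihood = {"BagA": 0.0144, "BagB": 0.396}   # Pr(4+1 | bag), from the table
prior = {"BagA": 0.5, "BagB": 0.5}             # assumed a priori probabilities

evidence = sum(likelihood[h] * prior[h] for h in prior)             # Pr(4+1)
posterior = {h: likelihood[h] * prior[h] / evidence for h in prior}

print(round(posterior["BagB"], 3))   # 0.965: BagB strongly favoured
print(round(posterior["BagA"], 3))   # 0.035
```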


Reasons for Dependence 1: Causality

• Dependency: P(B|A) ≠ P(B)
• Positive correlation: >
• Negative correlation: <

Possible explanation: A causes B.
Example: P(headache) = 6%
         P(ha | party) = 10%
         P(ha | ¬party) = 2%

Alternative explanation: B causes A.
In the same example:
         P(party) = 50%
         P(party | h.a.) = 83%
         P(party | no h.a.) = 48%
“Headaches make students go to parties.”

In statistics, correlation has no direction.

              h.a.   no h.a.
   party        5      45
   no party     1      49



Reasons for Dependence 2: Common cause

1. The student party may lead to headache and is costly (money versus broke):

               h.a.    no h.a.
    party        5       45
     mon-br     2-3     18-27
    no party     1       49
     mon-br     1-0     49-0

2. Table of headache and money:

               h.a.   no h.a.
     money      3       67
     broke      3       27

   Pr(broke) = 30%
   Pr(broke | h.a.) = 50%

3. Table of headache and money for party attendants:

               h.a.   no h.a.
     money      2       18
     broke      3       27

This dependency disappears if the common cause variable is known.
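The disappearing dependency can be verified on the third table; a sketch with the slide's counts for party attendants:

```python
# Among the 50 party attendants: money/broke versus headache/no headache.
attendants = {("money", "ha"): 2, ("money", "no_ha"): 18,
              ("broke", "ha"): 3, ("broke", "no_ha"): 27}

total = sum(attendants.values())              # 50
p_broke = (3 + 27) / total                    # 0.6
p_broke_given_ha = 3 / (2 + 3)                # 0.6

# Conditioning on the common cause (party) removes the dependency:
print(p_broke == p_broke_given_ha)   # True
```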
Reasons for Dependence 3: Common effect

A and B are independent:

   (#C)       A        non A
    B       40 (14)   40 (4)
   non B    10 (1)    10 (1)

Pr(B) = 80%
Pr(B|A) = 80%
B and A are independent.

Their combination stimulates C; for instances satisfying C:

              A     non A
    B        14       4
   non B      1       1

Pr(B) = 90%
Pr(B|A) = 93%, Pr(B|¬A) = 80%

This dependency appears if the common effect variable is known.
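The induced dependency can be checked with the bracketed counts (instances satisfying C); a small sketch:

```python
# Counts of instances satisfying the common effect C:
given_c = {("A", "B"): 14, ("nonA", "B"): 4,
           ("A", "nonB"): 1, ("nonA", "nonB"): 1}

p_b_given_a = 14 / (14 + 1)        # about 0.93
p_b_given_nona = 4 / (4 + 1)       # 0.80

# Knowing C makes A and B dependent (the "explaining away" effect):
print(round(p_b_given_a, 2), round(p_b_given_nona, 2))   # 0.93 0.8
```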


Part II: Bayesian Networks

• Probabilistic Graphical Model
• Probabilistic Network
• Bayesian Network
• Belief Network

Consists of:
• Variables (n)
• Domains (here binary)
• Acyclic arc set, modeling the statistical influences
• Per variable V (indegree k): Pr(V | E), for the 2^k cases of E.

Information in a node: exponential in the indegree.

Example network (pa → ha, pa → br):
  Pr(pa) = 50%
  Pr(ha | pa) = 10%, Pr(ha | ¬pa) = 2%
  Pr(br | pa) = 40%, Pr(br | ¬pa) = 0%

Example node with indegree 2 (A → C ← B):
  Pr(C | A,B) = 56%, Pr(C | A,¬B) = 10%, Pr(C | ¬A,B) = 10%, Pr(C | ¬A,¬B) = 10%
The Bayesian Network Model
Closed World Assumption

• Rule based:
     IF x attends party
     THEN x has headache
     WITH cf = .10
  What if x didn’t attend?

• Bayesian model:
     Pr(pa) = 50%
     Pr(ha | pa) = 10%, Pr(ha | ¬pa) = 2%
  Pr(ha | ¬pa) is included: the claim is that all relevant info is modeled.

Direction of arcs and correlation:
     Pr(ha) = 6%
     Pr(pa | ha) = 83%, Pr(pa | ¬ha) = 48%

1. A BN does not necessarily model causality.
2. It is built upon the Human Expert’s understanding of relationships; these are often causal.
A little theorem

• A Bayesian network on n binary variables
  uniquely defines a probability distribution
  over the associated set of 2^n states.

• The full distribution has 2^n parameters
  (numbers in [0..1] with sum 1).
• A typical network has in-degree 2 to 3:
  represented by 4n to 8n parameters (PIGLET!!).

• Bayesian Networks are an efficient representation
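The efficiency claim is easy to quantify; a sketch comparing a full joint distribution with a bounded-indegree network, for a hypothetical n = 30:

```python
def full_joint_params(n):
    """One number per state of n binary variables."""
    return 2**n

def bn_params(n, indegree):
    """Each binary node stores 2^k conditional probabilities."""
    return n * 2**indegree

n = 30
print(full_joint_params(n))   # 1073741824, infeasible to elicit
print(bn_params(n, 2))        # 120, the '4n' bound
print(bn_params(n, 3))        # 240, the '8n' bound
```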




The Utrecht DSS group

• Initiated by Prof Linda van der Gaag from ~1990
• Focus: development of BN support tools
• Use experience from building several actual BNs
• Medical
  applications
• Oesoca,
  ~40 nodes.

• Courses:
  Probabilistic
  Reasoning
• Network
  Algorithms
  (Ma ACS).


   How to obtain a BN model
Describe Human Expert knowledge:
  Metastatic Cancer (MC) may be detected by an
  increased level of serum calcium (SC). The
  Brain Tumor (BT) may be seen on a CT scan
  (CT). Severe headaches (SH) are indicative
  of the presence of a brain tumor. Both a
  brain tumor and an increased level of serum
  calcium may bring the patient into a coma
  (Co).

       mc           bt          ct

       sc           co          sh

Probabilities: expert guess or statistical study.

Learn BN structure automatically from data by means of Data Mining:
• Research of Carsten
• Models not intuitive
• Not considered XS
• Helpful addition to Knowledge Acquisition from Human Expert
• Master ACS.


 Inference in Bayesian Networks

The probability of a state S = (v1, …, vn):
  multiply the factors Pr(vi | parent values in S).

Network (pa → ha, pa → br):
  Pr(pa) = 50%
  Pr(ha | pa) = 10%, Pr(ha | ¬pa) = 2%
  Pr(br | pa) = 40%, Pr(br | ¬pa) = 0%

The marginal (overall) probability of each variable:
  Pr(pa) = 50%
  Pr(ha) = 6%
  Pr(br) = 20%

Pr(pa, ¬ha, ¬br) = 0.50 × 0.90 × 0.60 = 0.27

Sampling: produce a series of cases, distributed according to the probability distribution implicit in the BN.
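The state probability and the marginals follow mechanically from the tables; a sketch of the slide's pa → ha, pa → br network:

```python
from itertools import product

p_pa = 0.5
p_ha = {True: 0.10, False: 0.02}   # Pr(ha | pa), Pr(ha | ¬pa)
p_br = {True: 0.40, False: 0.00}   # Pr(br | pa), Pr(br | ¬pa)

def joint(pa, ha, br):
    """Pr(pa, ha, br): multiply the local factors of the network."""
    f_pa = p_pa if pa else 1 - p_pa
    f_ha = p_ha[pa] if ha else 1 - p_ha[pa]
    f_br = p_br[pa] if br else 1 - p_br[pa]
    return f_pa * f_ha * f_br

print(round(joint(True, False, False), 2))   # 0.27 = 0.50 * 0.90 * 0.60

# Marginal Pr(ha): sum the joint over all other variables.
p_ha_marg = sum(joint(pa, True, br) for pa, br in product([True, False], repeat=2))
print(round(p_ha_marg, 2))                   # 0.06
```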

Consultation: Entering Evidence

Consultation applies the BN knowledge to a specific case
• Known variable values can be entered into the network
• Probability tables for all nodes are updated

• Obtain (something like) a new BN modeling the conditional distribution
• Again, show distributions and state probabilities

• Backward and forward propagation
Test Selection (Danielle)

• In consultation, enter data
  until goal variable is known
  with sufficient probability.
• Data items are obtained at
  specific cost.
• Data items influence the
  distribution of the goal.

Problem:
• Given the current state of
  the consultation, find the
  best variable to test next.


                                 Started CS study 1996,
                                 PhD Thesis defense Oct 2005
Complexity of Network Design (Johan)

• A Boolean formula can be encoded into a BN
• SAT problems can be reformulated as BN problems
• Monotonicity, Kth MPE, Inference

• Complete for
  PP^PP^NP

• Started PhD
  Oct 2005
• Graduated
  Oct 2009




Some more work done in Linda’s DSS group

• Sensitivity Analysis:
  Numerical parameters in the BN may be inaccurate;
  how does this influence the consultation outcome?

• More efficient inferencing:
  Inferencing is costly, especially in the presence of
   • Cycles (NB.: There are no directed cycles!)
   • Nodes with a high in-degree
  Approximate reasoning, network decompositions, …

• Writing a program tool: Dazzle




Part III: In the Courtroom

What happens in a trial?
• Prosecutor and Defense
  collect information
• Judge decides if there is
  sufficient evidence that the
  person is guilty

  Pr(A|B) = Pr(B|A) × Pr(A) / Pr(B)


Forensic tests are far more
conclusive than medical ones
but still probabilistic in
nature!
  Pr(symptom|sick) = 80%
  Pr(trace|innocent) = 0.01%
                                   Jenneke IJzerman, Bayesiaanse
Tempting to forget statistics.     Statistiek in de Rechtspraak, VU
Need a priori probabilities.       Amsterdam, September 2004.

Prosecutor’s Fallacy

The story:
• A DNA sample was taken from the crime site
• The probability of a match between samples of different people is 1 in 10,000
• 20,000 inhabitants are sampled
• John’s DNA matches the sample
• Prosecutor: the chance that John is innocent is 1 in 10,000
• Judge convicts John

The analysis:
• The prosecutor confuses
    Pr(inn | evid)   (a)
    Pr(evid | inn)   (b)
• Forensic experts can only shed light on (b)
• The Judge must find (a); a priori probabilities are needed!! (Bayes)
• Dangerous to convict on DNA samples alone
• Pr(at least one innocent match) = 86%
• Pr(exactly one such match) = 27%
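The two closing percentages can be reproduced. A sketch assuming the 20,000 sampled people include 19,999 innocents, each matching independently with probability 1/10,000:

```python
p = 1 / 10_000        # random-match probability
n_innocent = 19_999   # sampled people other than the true offender

p_no_innocent_match = (1 - p) ** n_innocent
p_at_least_one = 1 - p_no_innocent_match
p_exactly_one = n_innocent * p * (1 - p) ** (n_innocent - 1)

print(round(p_at_least_one, 2))   # 0.86: an innocent match is very likely
print(round(p_exactly_one, 2))    # 0.27
```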



Defender’s Fallacy

The story:
• Town has 100,001 people
• We expect 11 to match (1 guilty plus 10 innocent)
• The probability that John is guilty is 1/11 ≈ 9%
• John must be released

Implicit assumptions:
• Offender is from town.
• Equal a priori probability for each inhabitant

It is necessary to take other circumstances into account; why was John prosecuted and what other evidence exists?

Conclusions:
• PF: it is necessary to take Bayes and a priori probabilities into account
• DF: estimating the a prioris is crucial for the outcome
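The defender's 9% follows from the same arithmetic, under the stated implicit assumptions (offender is from town, uniform prior over inhabitants):

```python
town = 100_001
p_match = 1 / 10_000

expected_innocent_matches = (town - 1) * p_match   # 10.0
expected_matches = expected_innocent_matches + 1   # 11 matchers: 1 guilty + 10 innocent
p_john_guilty = 1 / expected_matches

print(round(p_john_guilty, 2))   # 0.09, the slide's 9%
```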

Experts’ and Judge’s task

IJzerman’s ideas about the trial:
1. The Forensic Expert may not claim a priori or a posteriori probabilities (Dutch Penalty Code, 344-1.4: “reports of experts containing their opinion on what their science teaches them about that which is submitted to their judgment”)
2. The Judge must set the a priori probabilities
3. The Judge must compute the a posteriori probabilities, based on the statements of experts
4. The Judge must have an explicit threshold of probability for “beyond reasonable doubt”
5. The threshold should be made explicit in law.

Is this realistic?
1. Avoiding confusion of Pr(G|E) and Pr(E|G) is a good idea
2. A prioris are extremely important; this almost pre-determines the verdict
3. How is this done? A Bayesian Network designed and controlled by the Judge?
4. No judge will obey a mathematical formula
5. Public agreement and acceptance?



Bayesian Alcoholism Test

• Driving under the influence of alcohol leads to a penalty
• An administrative procedure may void the licence

• Judge must decide if the subject is an alcohol addict;
  incidental or regular (harmful) drinking
• Psychiatrists advise the court by determining if drinking
  was incidental or regular

• Goal HHAU: Harmful and Hazardous Alcohol Use
• Probabilistically confirmed or denied by clinical tests
• Bayesian Alcoholism Test: developed 1999-2004 by A.
  Korzec, Amsterdam.




Variables in Bayesian Alcoholism Test

Hidden variables:
• HHAU: alcoholism
• Liver disease

Observable causes:
• Hepatitis risk
• Social factors
• BMI, diabetes

Observable effects:
• Skin color
• Lab: blood, breath
• Level of Response
• Smoking
• CAGE questionnaire



Knowledge Elicitation for BAT

Knowledge in the Network:
• Qualitative
  - What variables are relevant
  - How do they interrelate
• Quantitative
  - A priori probabilities
  - Conditional probabilities for hidden diseases
  - Conditional probabilities for effects
  - Response of lab tests to hidden diseases

How it was obtained:
• Network structure?? IJzerman does not report on this
• Probabilities
  - Literature studies: 40% of probabilities
  - Expert opinions: 60% of probabilities




  Consultation with BAT

Enter evidence about the subject:
• Clinical signs:
  skin, smoking, LRA;
  CAGE.
• Lab results
• Social factors

Knowing what the CAGE is used for may influence the answers that the subject gives.

The network will return:
• Probability that the Subject has HHAU
• Probabilities for liver disease and diabetes

The responsible Human Medical Expert converts this probability to a YES/NO for the judge! (Interpretation phase)
The HME may take other data into account (rare disease).




Part IV: Bayes in the Field

The Dazzle program
• Tool for designing and analysing BN
• Mouse-click the network;
  fill in the probabilities
• Consult by evidence submission
• Read posterior probabilities

• Development 2004-2006
• Written in Haskell
• Arjen van IJzendoorn, Martijn Schrage

• www.cs.uu.nl/dazzle




Importance of a good model




In 1998, Donna Anthony (31) was convicted of murdering her two children. She spent seven years in prison but claimed her children died of cot death.

Prosecutor: The probability of two cot deaths in one family is too small, unless the mother is guilty.

The Evidence against Donna Anthony

• BN with priors eliminates
  Prosecutor’s Fallacy
• Enter the evidence:
  both children died
• A priori probability is very
  small (1 in 1,000,000)

• Dazzle establishes a
  97.6% probability of guilt

• Name of expert: Prof. Sir
  Roy Meadow (1933)
• His testimony put a
  dozen mothers in prison
  within a decade

A More Refined Model

Allow for genetic or social circumstances
for which parent is not liable.




     The Evidence against Donna?

     Refined model: genetic
     defect is the most likely
     cause of repeated deaths

     Donna Anthony was
     released in 2005 after 7
     years in prison



6/2005: Struck from the GMC register
7/2005: Appeal by Meadow
2/2006: Appeal granted; otherwise experts
  would refuse to testify



Classical Swine Fever, Petra Geenen

                                  • Swine Fever is a
                                    costly disease
                                  • Development
                                    2004/5
                                  • 42 vars, 80 arcs
                                  • 2454 Prs, but
                                    many are 0.

                                  • Pig/herd level
                                  • Prior extremely
                                    small
                                  • Probability
                                    elicitation with
                                    questionnaire



Conclusions
• Mathematically sound model to reason with uncertainty
• Applicable to areas where knowledge is highly statistical

• Acquisition: Instead of classical IF a THEN b (WITH c),
  obtain both Pr(b|a) and Pr(b|¬a)
• More work but more powerful model
• One formalism allows both diagnostic and prognostic
  reasoning

• Danger: the apparent exactness is deceiving
• Disadvantage: lack of explanation facilities (research);
  the model is quite transparent, but consultations are not;
  design questions have high complexity (> NPC).

• Increasing popularity, despite difficulty in building


				