# Introduction to Bayesian Networks

Université catholique de Louvain
Faculté des Sciences Appliquées - FSA

Laboratoire de Télécommunications et Télédétection (TELE)
Département d'Électricité (ELEC)

Introduction to Bayesian Networks

Bayesian Networks - Dynamic Bayesian Networks
Inference - Learning
OpenBayes

Kosta Gaitanis

Kosta Gaitanis - UCL Tele, Introduction to Bayesian Networks
Outline

- Bayesian Networks
  - What is a Bayesian Network and why use them?
- Inference
  - Probabilistic calculations in practice
  - Belief Propagation
  - Junction Tree construction
  - Monte Carlo methods
- Learning Bayesian Networks
  - Why learning?
  - Basic learning techniques
- Software Packages
  - OpenBayes
Bayesian Networks

- Formal definition of BNs
- Introduction to probabilistic calculations

Where do Bayes Nets come from?

Common problems in real life:

- Complexity
- Uncertainty

Bayesian networks address both:

- Uncertainty -> probability theory: consistency of the model, learning
- Complexity -> modularity: an appealing interface, general-purpose algorithms

[Diagram: Probability Theory + Graph Theory = Bayesian Networks]
What is a Bayes Net?

A compact representation of a joint probability distribution via
conditional independence.

Example: the "family of Alarm" network. Earthquake and Burglary are
parents of Alarm; Earthquake is the parent of Radio; Alarm is the parent
of Call.

- Qualitative part: a directed acyclic graph (DAG)
  - Nodes: random variables
  - Edges: direct influence
- Quantitative part: a set of conditional probability distributions,
  e.g. P(A | E, B):

| E  | B  | P(A=t) | P(A=f) |
|----|----|--------|--------|
| e  | b  | 0.9    | 0.1    |
| e  | ¬b | 0.2    | 0.8    |
| ¬e | b  | 0.9    | 0.1    |
| ¬e | ¬b | 0.01   | 0.99   |

Together, the two parts define a unique distribution in factored form.
Figure from N. Friedman
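The factored form can be checked mechanically. The sketch below encodes the Alarm family with the P(A | E, B) table from the slide; the remaining priors and CPTs (P(E), P(B), P(R | E), P(C | A)) are not given on the slide, so the numbers used for them here are hypothetical placeholders. Any valid factorisation must sum to 1 over all assignments.

```python
# Factored joint for the Alarm family:
#   P(E, B, A, R, C) = P(E) P(B) P(A | E, B) P(R | E) P(C | A)
from itertools import product

p_e = {True: 0.1, False: 0.9}     # hypothetical prior P(E)
p_b = {True: 0.05, False: 0.95}   # hypothetical prior P(B)
# P(A=True | E, B): values from the slide's CPT
p_a = {(True, True): 0.9, (True, False): 0.2,
       (False, True): 0.9, (False, False): 0.01}
p_r = {True: 0.8, False: 0.1}     # hypothetical P(R=True | E)
p_c = {True: 0.7, False: 0.05}    # hypothetical P(C=True | A)

def bern(p, v):
    """Probability of outcome v under a Bernoulli with P(True) = p."""
    return p if v else 1.0 - p

def joint(e, b, a, r, c):
    return (bern(p_e[True], e) * bern(p_b[True], b) *
            bern(p_a[(e, b)], a) * bern(p_r[e], r) * bern(p_c[a], c))

# A valid factorisation must sum to 1 over all 2**5 assignments
total = sum(joint(*v) for v in product([True, False], repeat=5))
```

The factored form stores 1 + 1 + 4 + 2 + 2 = 10 numbers instead of the 2^5 - 1 = 31 needed for the full joint.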
Why are Bayes nets useful?

- The graph structure supports:
  - Modular representation of knowledge
  - Local, distributed algorithms for inference and learning
  - An intuitive (possibly causal) interpretation
- The factored representation may have exponentially fewer parameters
  than the full joint P(X1, ..., Xn), giving:
  - Lower sample complexity (less data for learning)
  - Lower time complexity (less time for inference)
What can Bayes Nets be used for?

- Posterior probabilities: the probability of any event given any
  evidence (including the "explaining away" effect)
- Most probable explanation: the scenario that best explains the evidence
- Rational decision making: maximise expected utility
- Value of information

[Figure: the Alarm network -- Earthquake, Burglary -> Alarm;
Earthquake -> Radio; Alarm -> Call]
Figure from N. Friedman
A real Bayes net: Alarm

Domain: monitoring intensive-care patients

- 37 variables
- 509 parameters

[Figure: the full ALARM network, with nodes such as MINVOLSET,
PULMEMBOLUS, INTUBATION, VENTLUNG, ARTCO2, SAO2, CATECHOL, HR, BP, ...]
Figure from N. Friedman
Formal Definition of a BN

- DAG: a directed acyclic graph
- Nodes: each node is a stochastic variable
- Edges: each edge represents a direct influence between two variables
- CPTs: Pr(X | pa(X)) quantifies the dependency of a variable on its
  parents, e.g. Pr(C | A, B), Pr(D | A)
- A priori distributions: one for each node with no parents,
  e.g. Pr(A) and Pr(B)

[Example DAG with nodes A, B, C, D, E: A and B are parents of C;
A is a parent of D]
Arc Reversal - Bayes Rule

Applying Bayes rule to a factorisation reverses an arc without changing
the joint distribution; networks related this way form a Markov
equivalence class.

Left network:

  p(x1, x2, x3) = p(x3 | x1) p(x2 | x1) p(x1)

which is equivalent to

  p(x1, x2, x3) = p(x3 | x1) p(x2, x1) = p(x3 | x1) p(x1 | x2) p(x2)

Right network:

  p(x1, x2, x3) = p(x3 | x2, x1) p(x2) p(x1)

which is equivalent to

  p(x1, x2, x3) = p(x3, x2 | x1) p(x1) = p(x2 | x3, x1) p(x3 | x1) p(x1)
Conditional Independence Properties

- Formal definition: a node is conditionally independent (d-separated)
  of its non-descendants given its parents
- Bayes Ball algorithm: two variables A and B are conditionally
  independent if a ball cannot travel from A to B under the permitted
  movements, which depend on whether each intermediate variable is
  hidden or known (observed)

[Figure: permitted ball movements through hidden and known variables]
Continuous and discrete nodes

- Discrete stochastic variables are quantified using CPTs
- Continuous stochastic variables (e.g. Gaussian) are quantified using a
  mean μ and standard deviation σ
- Linear Gaussian distributions:
  Pr(x | u1, ..., uk) = N(μ + Σ_k wk·uk, σ²),
  where u1, ..., uk are the values of the parents
- Any combination of discrete and continuous variables can be used in
  the same BN
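A linear Gaussian node can be sketched in a few lines. The weights, mean, and σ below are arbitrary illustration values, not taken from the slides.

```python
import random

def sample_linear_gaussian(parent_values, weights, mu, sigma):
    """Draw x ~ N(mu + sum_k w_k * u_k, sigma) given parent values u_k."""
    mean = mu + sum(w * u for w, u in zip(weights, parent_values))
    return random.gauss(mean, sigma)

random.seed(0)
# Parents fixed at u = (1.0, 2.0), weights w = (0.5, -0.25), mu = 3.0:
# the conditional mean is 3.0 + 0.5*1.0 - 0.25*2.0 = 3.0
samples = [sample_linear_gaussian([1.0, 2.0], [0.5, -0.25], 3.0, 0.1)
           for _ in range(10000)]
avg = sum(samples) / len(samples)
```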
Inference

- Basic inference rules
- Belief Propagation
- Junction Tree
- Monte Carlo methods
Some Probabilities…

- Bayes rule:

  Pr(A | B) = Pr(B | A) Pr(A) / Pr(B)

- Independence: A ⊥ B iff

  Pr(A | B) = Pr(A),  Pr(B | A) = Pr(B),  Pr(A, B) = Pr(A) Pr(B)

- Chain rule:

  Pr(A, B, C, D) = Pr(A) Pr(B | A) Pr(C | A, B) Pr(D | A, B, C)

- Marginalisation:

  Pr(A) = Σ_b Pr(A, B = b)
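These rules can be verified numerically on a tiny joint distribution over two binary variables (the four joint probabilities below are hypothetical illustration values):

```python
# A hypothetical joint distribution over two binary variables A, B
pr = {(True, True): 0.2, (True, False): 0.3,
      (False, True): 0.1, (False, False): 0.4}

def p_a(a):
    """Marginalisation: Pr(A) = sum_b Pr(A, B=b)."""
    return sum(pr[(a, b)] for b in (True, False))

def p_b(b):
    return sum(pr[(a, b)] for a in (True, False))

def p_a_given_b(a, b):
    """Conditioning: Pr(A | B) = Pr(A, B) / Pr(B)."""
    return pr[(a, b)] / p_b(b)

def p_b_given_a(b, a):
    return pr[(a, b)] / p_a(a)

# Bayes rule: Pr(A | B) = Pr(B | A) Pr(A) / Pr(B)
bayes_lhs = p_a_given_b(True, True)
bayes_rhs = p_b_given_a(True, True) * p_a(True) / p_b(True)

# Chain rule (two variables): Pr(A, B) = Pr(A) Pr(B | A)
chain = p_a(True) * p_b_given_a(True, True)
```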
A small example of calculations

Network: Rain -> WetGrass. Given the prior Pr(Rain) and the CPT
Pr(WetGrass | Rain), the posterior is obtained in three steps:

1. Multiply:    Pr(WG = b, R = a) = Pr(WG = b | R = a) Pr(R = a)
2. Marginalise: Pr(WG = b) = Σ_a Pr(WG = b, R = a)
3. Divide:      Pr(R = a | WG = b) = Pr(WG = b, R = a) / Pr(WG = b)

Prior:

| Rain | Pr(Rain) |
|------|----------|
| T    | 0.5      |
| F    | 0.5      |

CPT:

| Rain | WetGrass | Pr(WetGrass \| Rain) |
|------|----------|----------------------|
| F    | F        | 1.0                  |
| F    | T        | 0.0                  |
| T    | F        | 0.1                  |
| T    | T        | 0.9                  |

Joint (product of the two tables):

| Rain | WetGrass | Pr(WetGrass, Rain) |
|------|----------|--------------------|
| F    | F        | 0.50               |
| F    | T        | 0.00               |
| T    | F        | 0.05               |
| T    | T        | 0.45               |

Marginal (summing out Rain):

| WetGrass | Pr(WetGrass) |
|----------|--------------|
| T        | 0.45         |
| F        | 0.55         |

Posterior (joint divided by marginal):

| Rain | WetGrass | Pr(Rain \| WetGrass) |
|------|----------|----------------------|
| F    | F        | 0.91                 |
| F    | T        | 0.00                 |
| T    | F        | 0.09                 |
| T    | T        | 1.00                 |
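The multiply / marginalise / divide steps above translate directly into code, using the numbers from the slide:

```python
# Prior Pr(Rain) and CPT Pr(WetGrass = wg | Rain = r), keyed by (wg, r);
# all values are the ones given on the slide.
p_rain = {True: 0.5, False: 0.5}
p_wg = {(True, True): 0.9, (False, True): 0.1,
        (True, False): 0.0, (False, False): 1.0}

# 1. Multiply: Pr(WG, R) = Pr(WG | R) Pr(R)
joint = {(wg, r): p_wg[(wg, r)] * p_rain[r]
         for wg in (True, False) for r in (True, False)}

# 2. Marginalise: Pr(WG) = sum_r Pr(WG, R = r)
p_wetgrass = {wg: joint[(wg, True)] + joint[(wg, False)]
              for wg in (True, False)}

# 3. Divide: Pr(R | WG) = Pr(WG, R) / Pr(WG)
p_rain_given_wg = {(r, wg): joint[(wg, r)] / p_wetgrass[wg]
                   for r in (True, False) for wg in (True, False)}
```

Running it reproduces the tables: Pr(WG=T) = 0.45, Pr(R=T | WG=T) = 1.0, and Pr(R=T | WG=F) = 0.05/0.55 ≈ 0.09.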
Another example: Water-Sprinkler

Time needed for calculations (network: Cloudy, Sprinkler, Rain, WetGrass).

Using the plain Bayes chain rule:

  Pr(C, S, R, W) = Pr(C) Pr(S | C) Pr(R | S, C) Pr(W | R, S, C)

  2 x 4 x 8 x 16 = 1024

Using the conditional independence properties encoded in the graph:

  Pr(C, S, R, W) = Pr(C) Pr(S | C) Pr(R | C) Pr(W | S, R)

  2 x 4 x 4 x 8 = 256
Inference in a BN

If the grass is wet, there are two possible explanations: rain or
sprinkler. Which is the more likely?

  Pr(S = T | W = T) = Σ_c,r Pr(C = c, S = T, R = r, W = T) / Pr(W = T)
                    = 0.278 / 0.647 = 0.430

  Pr(R = T | W = T) = Σ_c,s Pr(C = c, S = s, R = T, W = T) / Pr(W = T)
                    = 0.458 / 0.647 = 0.708

The grass is more likely to be wet because of the rain.
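The same numbers can be reproduced by brute-force enumeration of the joint. The CPT values are not printed on this slide; the ones below are the standard water-sprinkler parameters from Kevin Murphy's tutorial (listed in the references), which yield the 0.278 / 0.458 / 0.647 figures above.

```python
from itertools import product

# Standard water-sprinkler CPTs (Murphy's tutorial values, assumed here
# since the slide does not print them); each gives the True-probability.
p_c = 0.5
p_s = {True: 0.1, False: 0.5}                    # Pr(S=T | C)
p_r = {True: 0.8, False: 0.2}                    # Pr(R=T | C)
p_w = {(True, True): 0.99, (True, False): 0.9,
       (False, True): 0.9, (False, False): 0.0}  # Pr(W=T | S, R)

def bern(p, v):
    return p if v else 1.0 - p

def joint(c, s, r, w):
    """Pr(C,S,R,W) = Pr(C) Pr(S|C) Pr(R|C) Pr(W|S,R)."""
    return (bern(p_c, c) * bern(p_s[c], s) *
            bern(p_r[c], r) * bern(p_w[(s, r)], w))

bools = [True, False]
p_w_true = sum(joint(c, s, r, True) for c, s, r in product(bools, repeat=3))
p_s_and_w = sum(joint(c, True, r, True) for c, r in product(bools, repeat=2))
p_r_and_w = sum(joint(c, s, True, True) for c, s in product(bools, repeat=2))

p_s_given_w = p_s_and_w / p_w_true   # 0.2781 / 0.6471 ~= 0.430
p_r_given_w = p_r_and_w / p_w_true   # 0.4581 / 0.6471 ~= 0.708
```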
Inference in a BN (2)

- Bottom-up: from effects to causes -> diagnostic
  (e.g. expert systems, pattern recognition, ...)
- Top-down: from causes to effects -> reasoning
  (e.g. generative models, planning, ...)
- Explaining away: Sprinkler and Rain "compete" to explain the fact that
  the grass is wet, so they are conditionally dependent when their common
  child (WetGrass) is observed
Belief Propagation

Also known as Pearl's algorithm or the sum-product algorithm.

The algorithm's purpose is "… fusing and propagating the impact of new
evidence and beliefs through Bayesian networks so that each proposition
eventually will be assigned a certainty measure consistent with the
axioms of probability theory." (Pearl, 1988, p 143)

- Two passes: Collect Evidence (messages flow towards the root) and
  Distribute Evidence (messages flow away from the root)
- Only works for polytrees

Figure from P. Green
Propagation Example

"The impact of each new piece of evidence is viewed as a perturbation
that propagates through the network via message-passing between
neighboring variables …" (Pearl, 1988, p 143)

[Figure: data entering the network and messages propagating between
neighboring nodes]

The example above requires five time periods to reach equilibrium after
the introduction of data (Pearl, 1988, p 174).
Singly Connected Networks (or Polytrees)

Definition: a directed acyclic graph (DAG) in which only one semipath
(a sequence of connected nodes, ignoring the direction of the arcs)
exists between any two nodes.

Multiple parents and/or multiple children are allowed; what is ruled out
is more than one semipath between a pair of nodes (an undirected loop),
since such networks do not satisfy the definition.
Inference in general graphs

- BP is only guaranteed to be correct for trees
- A general graph should be converted to a junction tree by clustering
  nodes
- Computational complexity is exponential in the size of the resulting
  clusters
- Problem: finding an optimal junction tree is NP-hard
Converting to a Junction Tree

[Figure: the steps for converting a DAG into a junction tree]
Approximate inference

Why?

- To avoid the exponential complexity of exact inference in discrete
  loopy graphs
- Because we cannot compute messages in closed form (even for trees) in
  the non-linear/non-Gaussian case

How?

- Deterministic approximations: loopy BP, mean field, structured
  variational, etc.
- Stochastic approximations: MCMC (Gibbs sampling), likelihood
  weighting, particle filtering, etc.

Algorithms make different speed/accuracy tradeoffs, so a toolkit should
provide the user with a choice of algorithms.
Markov Chain Monte Carlo methods

Principle (the basic scheme below is forward, or ancestral, sampling):

- Create a topological sort of the BN
- For i = 1:N
  - For v in topological_sort:
    - Sample v from Pr(v | Pa(v) = s_i,pa(v)),
      where s_i,pa(v) are the values already sampled for Pa(v)
- Pr(v) ≈ (Σ_i s_i,v) / N
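A minimal sketch of the sampling loop above, instantiated on the water-sprinkler network. The CPT values are again the standard ones from Murphy's tutorial (assumed, since the slides do not list them):

```python
import random

random.seed(0)

def sample_one():
    """Sample (C, S, R, W) in topological order, each from Pr(v | Pa(v))."""
    c = random.random() < 0.5                    # Pr(C=T)
    s = random.random() < (0.1 if c else 0.5)    # Pr(S=T | C)
    r = random.random() < (0.8 if c else 0.2)    # Pr(R=T | C)
    w = random.random() < {(True, True): 0.99, (True, False): 0.9,
                           (False, True): 0.9, (False, False): 0.0}[(s, r)]
    return c, s, r, w

N = 50000
samples = [sample_one() for _ in range(N)]

# Pr(v) is estimated as the fraction of samples in which v came out True;
# for W the exact value under these CPTs is 0.6471.
p_w_est = sum(w for _, _, _, w in samples) / N
```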
MCMC with importance sampling (likelihood weighting)

- For i = 1:N
  - For v in topological_sort:
    - If v is not observed:
      - Sample v from Pr(v | Pa(v) = s_i,pa(v)),
        where s_i,pa(v) are the values already sampled for Pa(v)
      - weight_i *= 1 (unchanged)
    - If v is observed:
      - s_i,v = obs
      - weight_i *= Pr(v = obs | Pa(v) = s_i,pa(v))
- Pr(v) ≈ (Σ_i s_i,v · weight_i) / (Σ_i weight_i)
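The weighting scheme can be sketched the same way, again on the water-sprinkler network with Murphy's standard CPT values (an assumption, as before). With WetGrass observed True, the self-normalised estimate should approach Pr(R=T | W=T) ≈ 0.708 from the earlier slide:

```python
import random

random.seed(0)

P_S = {True: 0.1, False: 0.5}                    # Pr(S=T | C)
P_R = {True: 0.8, False: 0.2}                    # Pr(R=T | C)
P_W = {(True, True): 0.99, (True, False): 0.9,
       (False, True): 0.9, (False, False): 0.0}  # Pr(W=T | S, R)

def weighted_sample(w_obs=True):
    """One likelihood-weighting pass with W observed; returns (r, weight)."""
    c = random.random() < 0.5       # C unobserved: sample it
    s = random.random() < P_S[c]    # S unobserved: sample it
    r = random.random() < P_R[c]    # R unobserved: sample it
    # W observed: do not sample, multiply in Pr(W = obs | S, R) instead
    weight = P_W[(s, r)] if w_obs else 1.0 - P_W[(s, r)]
    return r, weight

N = 50000
draws = [weighted_sample() for _ in range(N)]

# Self-normalised estimate of Pr(R=T | W=T); exact value ~= 0.708
p_r_given_w = sum(w for r, w in draws if r) / sum(w for _, w in draws)
```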
References

- A Brief Introduction to Graphical Models and Bayesian Networks
  (Kevin Murphy, 1998)
  http://www.cs.ubc.ca/~murphyk/Bayes/bnintro.html
- Artificial Intelligence I (Dr. Dennis Bahler)
  http://www.csc.ncsu.edu/faculty/bahler/courses/csc520f02/bayes1.html
- Nir Friedman
  http://www.cs.huji.ac.il/~nir/
- Judea Pearl, Causality (on-line book)
  http://bayes.cs.ucla.edu/BOOK-2K/index.html
- Introduction to Bayesian Networks: a tutorial for the 66th MORS
  symposium (Dennis M. Buede, Joseph A. Tatman, Terry A. Bresnick)
Learning Bayesian Networks

- Why learning?
- Basic learning techniques

The learning process:

- Input: a dataset and prior information
- Output: a Bayesian Network

Prior information may include:

- A Bayesian Network (or fragments of it…)
- Dependencies between variables
- Prior probabilities
The Learning Problem

|                 | Known Structure | Unknown Structure |
|-----------------|-----------------|-------------------|
| Complete Data   | Statistical parametric estimation (closed-form eq.) | Discrete optimization over structures (discrete search) |
| Incomplete Data | Parametric optimization (EM, gradient descent, …) | Combined (Structural EM, mixture models, …) |
Example: Binomial Experiment

- When tossed, a thumbtack can land in one of two positions: heads (H)
  or tails (T)
- We denote by θ the (unknown) probability P(H)
- Given a sequence of toss samples D = x[1], x[2], …, x[M], we want to
  estimate the probabilities P(H) = θ and P(T) = 1 - θ
The Likelihood Function

How good is a particular θ? It depends on how likely it is to generate
the observed data:

  L(θ : D) = P(D | θ) = Π_m P(x[m] | θ)

Thus, the likelihood for the sequence H, T, T, H, H is:

  L(θ : D) = θ · (1-θ) · (1-θ) · θ · θ = θ³ (1-θ)²
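A quick numeric check of the likelihood above. The grid search is only for illustration; it confirms that the maximum sits at θ = 3/5, the closed-form MLE derived two slides below:

```python
def likelihood(theta, sequence):
    """L(theta : D) = product over tosses of theta (H) or 1 - theta (T)."""
    L = 1.0
    for x in sequence:
        L *= theta if x == 'H' else 1.0 - theta
    return L

D = "HTTHH"

# L(theta : D) = theta**3 * (1-theta)**2; e.g. at theta = 0.6:
L_06 = likelihood(0.6, D)   # 0.6**3 * 0.4**2 = 0.03456

# Coarse grid search over theta in {0.00, 0.01, ..., 1.00}
best = max((likelihood(t / 100, D), t / 100) for t in range(101))[1]
```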
Sufficient Statistics

To compute the likelihood in the thumbtack example, we only require the
counts N_H and N_T:

  L(θ : D) = θ^N_H (1-θ)^N_T

N_H and N_T are sufficient statistics for the binomial distribution.

A sufficient statistic is a function s(D) that summarizes, from the
data, the relevant information for the likelihood: if s(D) = s(D'),
then L(θ | D) = L(θ | D').
Maximum Likelihood Estimation

MLE principle: learn the parameters that maximize the likelihood
function.

In our example we get:

  θ̂ = N_H / (N_H + N_T)

which is what one would expect…
More on Learning

- More than 2 possible values: the same principle, but more complex
  equations (multiple parameters θ_i, multiple maxima, …)
- Dirichlet priors:
  - Add our knowledge of the system to the training data in the form of
    "imaginary" counts
  - Avoid never-observed distributions and increase confidence, because
    we effectively have a bigger sample size
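A sketch of the pseudo-count idea, with hypothetical counts and a uniform Beta(1, 1)-style prior (one imaginary head and one imaginary tail):

```python
def mle(n_h, n_t):
    """Plain maximum-likelihood estimate of theta = P(H)."""
    return n_h / (n_h + n_t)

def dirichlet_estimate(n_h, n_t, alpha_h=1.0, alpha_t=1.0):
    """Posterior-mean estimate with 'imaginary' pseudo-counts alpha."""
    return (n_h + alpha_h) / (n_h + n_t + alpha_h + alpha_t)

# With a never-observed outcome, the MLE assigns it probability 0;
# pseudo-counts avoid that.
theta_mle = mle(5, 0)                    # 1.0: tails deemed impossible
theta_smooth = dirichlet_estimate(5, 0)  # (5 + 1) / (5 + 2) = 6/7
```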
More on Learning (2)

- Missing data:
  - Estimate the missing data using Bayesian inference
  - Multiple maxima in the likelihood function -> gradient descent
- A complicating issue: the fact that a value is missing might itself
  be indicative of its value
  (e.g. the patient did not undergo an X-ray, since she only complained
  about fever)
Expectation Maximization Algorithm

- While not converged:
  - For s in samples:
    - Calculate Pr(x | s)
  - Calculate the ML estimator using Pr(x | s) as a weight
  - Replace the parameters with the new estimate
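The loop above can be instantiated on the earlier Rain -> WetGrass network: suppose Rain is never observed and we want its parameter θ = Pr(Rain = T). The E-step computes Pr(R | W) under the current θ, and the M-step re-estimates θ from those weights. The CPT Pr(WetGrass | Rain) is the one from the earlier slide; the observed WetGrass data below are hypothetical.

```python
# EM for theta = Pr(Rain=T) in Rain -> WetGrass, with Rain never observed.
# Pr(WetGrass=T | Rain) from the earlier slide's CPT:
P_W_GIVEN_R = {True: 0.9, False: 0.0}

data = [True] * 45 + [False] * 55   # hypothetical observed WetGrass values

theta = 0.3                          # arbitrary starting point
for _ in range(50):                  # fixed iteration count for simplicity
    weights = []
    for w in data:
        # E-step: Pr(R=T | W=w) under the current theta
        num = (P_W_GIVEN_R[True] if w else 1 - P_W_GIVEN_R[True]) * theta
        den = num + ((P_W_GIVEN_R[False] if w else 1 - P_W_GIVEN_R[False])
                     * (1 - theta))
        weights.append(num / den)
    # M-step: ML estimate of theta using the E-step weights
    theta = sum(weights) / len(weights)
```

With these numbers EM converges to θ = 0.5, the value at which the model's Pr(W=T) = 0.9·θ matches the empirical frequency 0.45.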
Structure Learning

- Bayesian Information Criterion (BIC): score each candidate graph by
  its penalised likelihood, and find the graph with the highest BIC
  score
- Greedy structure learning:
  - Start from a given graph
  - Choose the neighbouring network with the highest score
  - Repeat
References

- Learning Bayesian Networks from Data (Nir Friedman, Moises Goldszmidt)
  http://www.cs.berkeley.edu/~nir/Tutorial
- A Tutorial on Learning With Bayesian Networks (David Heckerman,
  November 1996), Technical Report MSR-TR-95-06
Software Packages

OpenBayes for Python
www.openbayes.org

- An open-source project for performing inference on static Bayes nets
  using Python
- Python is a high-level programming language:
  - Easy to learn
  - Easy to use
  - Fast to write programs in
  - Not as fast as C (about 5 times slower), but C routines can be
    called very easily
Using OpenBayes

- Create a network
- Use MCMC for inference
- Use JunctionTree for inference
- Learn the parameters from complete data
- Learn the parameters from incomplete data
- Learn the structure

www.openbayes.org
Rhododendron

- Predict the probability of the plant's existence in other regions of
  the world
- Variables:
  - Temperature
  - Pluviometry (rainfall)
  - Altitude
  - Slope
Other Software Packages

A list of Bayes net software packages (commercial and free) is
maintained by Kevin Murphy:
http://www.cs.ubc.ca/~murphyk/Software/BNT/bnsoft.html

Thank you for your attention!
