# modeling-decision


Modeling Decision

Nur Aini Masruroh
Outline
 Introduction
 Probabilistic thinking
 Decision tree
 Introduction to Bayesian Network and Influence Diagram
Introduction
 Why are decisions hard to make?
1.   Complexity
   There are many alternatives or possible solutions
   There are many factors to be considered and many of these factors are
interdependent
2.   Uncertainty
   The possible future outcomes are uncertain or difficult to predict
   Information may be vague, incomplete, or unavailable
3.   Multiple conflicting objectives
   The decision maker(s) may have many goals and objectives
   Many of these goals or objectives may be conflicting in nature
Good decision versus good outcome
[Diagram: the same good decision may lead to either a good or a bad outcome]

 A good decision does not guarantee a good outcome – it only enhances the chance of one
Probabilistic thinking
 Event is a distinction about some states of the world
 Example:
 Whether the next person entering the room is a beer drinker
 Whether it will be raining tonight, etc

 When we identify an event, we have in mind what we mean.
But will other people know precisely what we mean?
 Even we may not have a precise definition of what we have in
mind
 To avoid ambiguity, every event should pass the clarity test
 Clarity test: to ensure that we are absolutely clear and precise about the
definition of every event we are dealing with in a decision problem
Possibility tree
 Single event tree
 Example: event “the next person entering this room is a businessman”
 Suppose B represents a businessman and B’ otherwise,
Possibility tree
 Two-event trees
 Simultaneously consider several events
 Example: event “the next person entering this room is a businessman”
and event “the next person entering this room is a graduate” can be
jointly considered
Reversing the order of events in a tree
 In the previous example, we considered the distinctions in the order B then G
 The same information can be expressed with the events in the
reverse order, i.e., G to B
Multiple event trees
 More than two events can be considered jointly, e.g., adding a third distinction such as gender
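The possibility trees described above can be sketched in code: a tree over binary events simply enumerates every root-to-leaf path, one per joint possibility. A minimal sketch (the event labels are illustrative):

```python
from itertools import product

# Each event is a binary distinction; the tree's leaves are the joint outcomes.
events = {
    "B": ["B (businessman)", "B' (not businessman)"],
    "G": ["G (graduate)", "G' (not graduate)"],
}

# Every path from root to leaf is one joint possibility.
paths = list(product(*events.values()))
for path in paths:
    print(" -> ".join(path))

# A tree over n binary events has 2**n leaves.
print(len(paths))  # 4
```

Adding a third distinction (e.g. gender) to the `events` dict doubles the number of leaves, which is why multiple-event trees grow quickly.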
Assigning probabilities to events
 The probabilities we assign depend on our state of information
 Example: information relevant to assessment of the likelihood that
the next person entering the room is a businessman might include
the following:
 There is an alumni meeting outside the room and most of the alumni are businessmen
 You have arranged to meet a friend here and, to your
knowledge, she is not a businesswoman. She is going to show up at any moment.
 Etc.
 After considering all relevant background information, we assign
the likelihood that the next person entering the room is a
businessman by assigning a probability value to each of the
possibilities or outcomes
Marginal and conditional probabilities
 In general, given information about the outcome of some events,
we may revise our probabilities of other events
 We do this through the use of conditional probabilities
 The probability of an event X given specific outcomes of another
event Y is called the conditional probability X given Y
 The conditional probability of event X given event Y and other
background information ξ, is denoted by p(X|Y, ξ) and is given by

p(X | Y, ξ) = p(X ∩ Y | ξ) / p(Y | ξ),    for p(Y | ξ) > 0
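As a sketch of this definition, the conditional probability can be computed directly from a joint distribution over two binary events (the joint probabilities below are illustrative, not from the text):

```python
# Joint probabilities p(X, Y) over binary events X and Y (illustrative numbers).
joint = {
    ("x1", "y1"): 0.24, ("x1", "y2"): 0.16,
    ("x2", "y1"): 0.36, ("x2", "y2"): 0.24,
}

def p_y(y):
    """Marginal p(Y = y): sum the joint over all outcomes of X."""
    return sum(p for (x, yy), p in joint.items() if yy == y)

def p_x_given_y(x, y):
    """Conditional p(X = x | Y = y), defined only when p(Y = y) > 0."""
    py = p_y(y)
    if py == 0:
        raise ValueError("p(Y) must be positive")
    return joint[(x, y)] / py

print(round(p_x_given_y("x1", "y1"), 2))  # 0.24 / 0.60 = 0.4
```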
Factorization rule for joint probability
Changing the order of conditioning
 Suppose in the previous tree we have

There is no reason why we should always condition G on B. Suppose we want to
draw the tree in the order G to B:

Need to flip the
tree!
Flipping the tree
 Graphical approach
 Change the ordering of the underlying possibility tree
 Transfer the elemental (joint) probabilities from the original tree to the new
tree
 Compute the marginal probability for the first variable in the new tree, i.e.,
G. We add the elemental probabilities that are related to G1 and G2
respectively.
 Compute conditional probabilities for B given G

 Bayes’ theorem
 Doing the above tree flipping is already an application of Bayes’ theorem
Bayes’ Theorem
 Given two uncertain events X and Y, suppose the probabilities
p(X|ξ) and p(Y|X, ξ) are known. Then

p(X | Y, ξ) = p(X | ξ) p(Y | X, ξ) / p(Y | ξ)

where

p(Y | ξ) = Σ_X p(X | ξ) p(Y | X, ξ)
Application of conditional probability
Direct conditioning: Relevance of smoking to lung cancer
 Suppose:
S: A person is a heavy smoker which is defined as having smoked at least two
packs of cigarettes per day for a period of at least 10 years during a lifetime
L: A person has lung cancer according to standard medical definition

 A doctor not associated with lung cancer treatment assigned the following
probabilities:
Relevance of smoking to lung cancer
(cont’d)
 A lung cancer specialist remarked: “The probability p(L1|S1, ξ) = 0.1 is too low”
 When asked to explain why, he said:
“Because in all these years as a lung cancer specialist, whenever I visited my lung cancer
ward, it is always full of smokers.”
 What’s wrong with the above statement?
 The answer can be found by flipping the tree:
Relevance of smoking to lung cancer
(cont’d)
 What the specialist referred to as “high” is actually the probability
of a person being a smoker given that he has lung cancer:
p(S1|L1, ξ) = 0.769 is the quantity he was referring to.
 He has confused p(S1|L1, ξ) with p(L1|S1, ξ)
 Notice that p(L1|S1, ξ) << p(S1|L1, ξ)
 Hence even a highly trained professional can fall victim to faulty
reasoning
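The specialist's confusion can be reproduced numerically. The text quotes p(L1|S1, ξ) = 0.1 and p(S1|L1, ξ) = 0.769; the prior p(S1) and the cancer rate among non-smokers p(L1|S2) below are assumed values, chosen only to be consistent with those two figures:

```python
# Flipping the smoking/lung-cancer tree via Bayes' theorem (sketch).
p_S1 = 0.25           # assumed prior: person is a heavy smoker
p_L1_given_S1 = 0.10  # from the text
p_L1_given_S2 = 0.01  # assumed: lung cancer rate among non-smokers

# Marginal: p(L1) = p(S1) p(L1|S1) + p(S2) p(L1|S2)
p_L1 = p_S1 * p_L1_given_S1 + (1 - p_S1) * p_L1_given_S2

# Flip: p(S1|L1) = p(S1) p(L1|S1) / p(L1)
p_S1_given_L1 = p_S1 * p_L1_given_S1 / p_L1

print(round(p_S1_given_L1, 3))  # 0.769: most cancer patients are smokers...
print(p_L1_given_S1)            # 0.1: ...yet most heavy smokers never get cancer
```

The two conditionals differ by a factor of almost eight, which is exactly the gap the specialist overlooked.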
Expected value criterion
 Suppose you face a situation where you must choose between
alternatives A and B as follows:
 Alternative A: \$10,000 for sure.
 Alternative B: 70% chance of receiving \$18,000 and 30% chance of losing
\$4,000.
 Compare now Alternative B with:
 Alternative C: 70% chance of winning \$24,600 and 30% chance of losing
\$19,400
 Note that EMV(B) = EMV(C), but are they “equivalent”?
 Alternative C seems to be “more risky” than Alternative B even
though they have the same EMV.
 Conclusion: EMV does not take risk into account
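The equality EMV(B) = EMV(C) claimed above is easy to verify, using the payoffs and probabilities given in the text:

```python
# Expected monetary value of the three alternatives from the text.
def emv(lottery):
    """EMV of a lottery given as a list of (probability, payoff) pairs."""
    return sum(p * x for p, x in lottery)

A = [(1.0, 10_000)]
B = [(0.7, 18_000), (0.3, -4_000)]
C = [(0.7, 24_600), (0.3, -19_400)]

print(round(emv(A), 2), round(emv(B), 2), round(emv(C), 2))
# 10000.0 11400.0 11400.0 -- B and C tie, yet C's spread of outcomes is far wider
```

The EMV criterion alone would rank B and C as equivalent, which is precisely the deficiency the section points out.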
 In 1713 Nicolas Bernoulli suggested playing the following games:
 An unbiased coin is tossed until it lands with Tails
 The player is paid \$2 if tails comes up on the opening toss, \$4 if tails first
appears on the second toss, \$8 if tails first appears on the third toss, \$16 if tails
first appears on the fourth toss, and so forth
 What is the maximum you would pay to play the above game?
 If we follow the EMV criterion:
EMV = Σ_{k=1}^{∞} (1/2)^k (\$2^k) = (1/2)(\$2) + (1/4)(\$4) + (1/8)(\$8) + … = ∞

 This means that you should be willing to pay up to an infinite amount of money
to play the game. Why, then, are people unwilling to pay more than a few dollars?
 25 years later, Nicolas’s cousin, Daniel Bernoulli, arrived at a solution that
contained the first seeds of contemporary decision theory
 Daniel reasoned that the marginal increase in the value or “utility” of money
declines with the amount already possessed.
 A gain of \$1,000 is more significant to a poor person than to a rich one,
though both gain the same amount
 Specifically, Daniel Bernoulli argued that the value or utility of money should
exhibit some form of diminishing marginal return with increase in wealth:

The measure to use to value the game is then the “expected utility”

EU = Σ_{k=1}^{∞} (1/2)^k u(\$2^k)

 If u is an increasing concave function, the sum converges to a finite number
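The contrast between the two sums can be checked numerically. A sketch with truncated series, using the logarithm as one example of an increasing concave utility (the text does not fix a particular u):

```python
import math

# St. Petersburg game: each EMV term is (1/2)**k * 2**k = 1, so the
# truncated EMV grows without bound, while the expected utility under
# a concave u (here log) converges.
def emv_partial(n):
    return sum((0.5 ** k) * (2 ** k) for k in range(1, n + 1))

def eu_partial(n, u=math.log):
    return sum((0.5 ** k) * u(2 ** k) for k in range(1, n + 1))

print(emv_partial(50))           # 50.0 -- grows linearly with n, so it diverges
print(round(eu_partial(50), 4))  # 1.3863 -- converges to 2 ln 2
```

With u = log, EU = ln 2 · Σ k/2^k = 2 ln 2 ≈ 1.386, a finite certain equivalent of only a few dollars, matching what people are actually willing to pay.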
The rules of actional thought
 How should a person act or decide rationally under uncertainty?
 Answer: by following these rules or axioms:
 The ordering rule
 The equivalence or continuity rule
 The substitution or independence rule
 Decomposition rule
 The choice rule

 The above five rules form the axioms for Decision Theory
The ordering rule
 The decision maker must be able to state his preference among the prospects,
outcomes, or prizes of any deal
 Furthermore, the transitivity property must be satisfied: that is, if he prefers X
to Y, and Y to Z, then he must prefer X to Z
 Mathematically,
 The ordering rule implies that the decision maker can provide a complete
preference ordering of all the outcomes from the best to the worst
 Suppose a person does not follow the transitivity property: he can be made to
pay repeatedly to cycle among X, Y, and Z (the money pump argument)
The equivalence or continuity rule
 Given prospects A, B, and C such that A ≻ B ≻ C, there
exists p where 0 < p < 1 such that the decision maker will be
indifferent between receiving the prospect B for sure and receiving
a deal with a probability p for prospect A and a probability of 1 – p
for prospect C

 Given that A ≻ B ≻ C
 B: certain equivalent of the uncertain deal on the right
 p: preference probability of prospect B with respect to prospects A and C
The substitution rule
 We can always substitute a deal with its certainty equivalent without affecting
preference
 For example, suppose the decision maker is indifferent between B and the A – C
deal below

 Then he must be indifferent between the two deals below where B is
substituted for the A – C deal
The decomposition rule
 We can reduce compound deals to simple ones using the rules of probabilities
 For example, a decision maker should be indifferent between the following two
deals:
The choice or monotonicity rule
 Suppose that a decision maker can choose between two deals L1 and L2 as
follows:

 If the decision maker prefers A to B (A ≻ B), then he must prefer L1 to L2 if
and only if p1 > p2

 In other words, the decision maker must prefer the deal that offers the greater
chance of receiving the better outcome
Maximum expected utility principle
 Let a decision maker faces the choice between two uncertain deals or lotteries
L1 and L2 with outcomes A1, A2, …, An as follows:

 There is no loss of generality in assuming that L1 and L2 have the same set of
outcomes A1, A2, …, An because we can always assign zero probability to those
outcomes that do not appear in one of them.
 It’s not clear whether L1 or L2 is preferred
 By the ordering rule, let A1 ≽ A2 ≽ … ≽ An
Maximum expected utility principle
 Again, there is no loss of generality as we can always renumber the subscripts
according to the preference ordering
 We note that A1 is the most preferred outcome, while An is the least preferred
outcome
 By the equivalence rule, for each outcome Ai (i = 1, …, n) there is a number ui
such that 0 ≤ ui ≤ 1 and Ai is indifferent to a deal with probability ui for A1
and probability 1 – ui for An
 Note that u1 = 1 and un = 0. Why?
Maximum expected utility principle
 By the substitution rule, we replace each Ai (i=1,…,n) in L1 and L2 with the
above constructed equivalent lotteries
Maximum expected utility principle
 By the decomposition rule, L1 and L2 may be reduced to
equivalent deals with only two outcomes (A1 and An) each having
different probabilities
 Finally, by the choice rule, since A1 ≻ An, the decision maker
should prefer lottery L1 to lottery L2 if and only if

Σ_{i=1}^{n} u_i p_i > Σ_{i=1}^{n} u_i q_i
Utilities and utility functions
 We define the quantity ui (i=1,…,n) as the utility of outcome Ai
and the function that returns the values ui given Ai as a utility
function, i.e. u(Ai) = ui
 The quantities

Σ_{i=1}^{n} p_i u(A_i)   and   Σ_{i=1}^{n} q_i u(A_i)

are known as the expected utilities for lotteries L1 and L2
respectively
 Hence the decision maker must prefer the lottery with a higher
expected utility
Case for more than 2 alternatives
 The preceding result may be generalized to the case where a decision maker
faces more than two uncertain alternatives: he should choose the one with the
maximum expected utility
 Hence

best alternative = arg max_j Σ_{i=1}^{n} p_ij u(A_i)

where p_ij is the probability of the outcome Ai under alternative j
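The arg-max formula above can be sketched directly. The utilities respect u1 = 1 and un = 0 as derived earlier; the particular numbers and probability rows are assumed for illustration:

```python
# Maximum expected utility over several alternatives (sketch).
u = [1.0, 0.6, 0.0]      # assumed utilities u(A1), u(A2), u(A3); u1 = 1, un = 0
p = [
    [0.5, 0.3, 0.2],     # alternative 1: p[j][i] = prob of outcome A_i under j
    [0.2, 0.7, 0.1],     # alternative 2
    [0.4, 0.2, 0.4],     # alternative 3
]

def expected_utility(probs):
    """Sum over outcomes of p_ij * u(A_i) for one alternative."""
    return sum(pi * ui for pi, ui in zip(probs, u))

eus = [round(expected_utility(pj), 2) for pj in p]
best = max(range(len(p)), key=lambda j: eus[j])
print(eus)       # [0.68, 0.62, 0.52]
print(best + 1)  # alternative 1 attains the maximum expected utility
```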
Comparing expected utility criterion with expected
monetary value criterion
 The expected utility criterion takes into account both return and
risk whereas expected monetary value criterion does not consider
risk
 The alternative with the maximum expected utility is the best,
taking into account the trade-off between return and risk
 The best preference trade-off depends on a person’s risk attitude
 Different types of utility function represent different attitudes and
degree of aversion to risk taking
Decision tree
 Consider the following party problem:

Problem: decide the party location to maximize total satisfaction

Note:
Decisions are represented by squares
Uncertainties are represented by circles
Preference
 Suppose we have the following preferences

Note:
Best case: O – S → set to 1
Worst case: O – R → set to 0
Other outcomes are assigned preference values relative to these two
Assigning probability to the decision
tree
 Suppose we believe that the probability it will rain is 0.6,
Applying substitution rule
Using utility values
 We may interpret preference probabilities as utility values
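Rolling back the party decision tree can be sketched as follows. The text fixes only the best (O – S = 1) and worst (O – R = 0) utilities and p(rain) = 0.6; the indoor utilities below are assumed for illustration:

```python
# Decision-tree rollback for the party problem (sketch).
p_rain = 0.6
utilities = {
    ("Outdoor", "Sun"): 1.0, ("Outdoor", "Rain"): 0.0,  # from the text
    ("Indoor", "Sun"): 0.4,  ("Indoor", "Rain"): 0.7,   # assumed values
}

def expected_utility(location):
    """Roll back one chance node: average utility over rain/sun."""
    return (p_rain * utilities[(location, "Rain")]
            + (1 - p_rain) * utilities[(location, "Sun")])

# Roll back the decision node: pick the branch with the larger EU.
scores = {loc: round(expected_utility(loc), 3) for loc in ("Outdoor", "Indoor")}
best = max(scores, key=scores.get)
print(scores)  # {'Outdoor': 0.4, 'Indoor': 0.58}
print(best)    # Indoor
```

With these assumed numbers the 0.6 chance of rain makes the indoor location the better choice; lowering p_rain below 0.5 flips the decision.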
Introduction to Bayesian Network and
Influence Diagram
“A good representation is the key to good
problem solving”
Probabilistic modeling using BN

 Suppose we have the following problem (represented in decision tree):

 Can be represented using Bayesian Network (BN):

A Conditional Probability Table (CPT) is embedded in each arc
Probabilistic modeling using BN

 The network can be extended …

Can you imagine the
size of decision tree for
these?
Bayesian Network: definition
 Also called relevance diagrams, probabilistic networks, causal
networks, causal graphs, etc.
 BN represents the probabilistic relations between uncertain
variables
 It is a directed acyclic graph; the nodes in the graph indicate the
variables of concern, while the arcs between nodes indicate the
probabilistic relations among the nodes
 In each node, we store a conditional probability distribution of
the variable represented by that node, conditioned on the
outcomes of all the uncertain variables that are parents of that
node
Two layers of representation of
knowledge
 Qualitative level
 Graphical structure represents the probabilistic dependence
or relevance between variables
 Quantitative level
 Conditional probabilities represent the local “strength” of the
dependence relationship
Where do the numbers in a BN come
from?
 Direct assessment by domain experts
 Learn from sufficient amount of data using:
 Statistical estimation methods
 Machine learning and data mining algorithms
 Output from other mathematical models
 Simulation models
 Stochastic models
 Systems dynamics models
 Etc
 Combination of the above
 Expert assess the graphical structure and learning algorithms or other models fill in
the number
 Learn both structure and numbers and let the experts fine-tune the results
Properties of BN
 Presence of an arc indicates possible relevance
 Arc reversal:

 If we are interested in the probability that a specific person is a smoker
given that he has lung cancer …

 The operation will compute and replace the probabilities at the two nodes
 An arc can be drawn in any direction
Arc reversal operation
 Suppose initially we have,

 Then we want,

 The probability distribution p(Y) and p(X|Y) for the new network can be
computed using Bayes’ Theorem as follows:

p(Y) = Σ_X p(X) p(Y | X)        (from the original network)

p(X | Y) = p(X) p(Y | X) / p(Y)
Arc reversal: example

 Note: in arc reversal, we must sometimes add arc(s) to keep the
joint distribution consistent with Bayes’ Theorem. However, if possible, avoid
arc reversals that introduce additional arcs, as the added arcs imply a loss of
conditional independence information
If an arc can be drawn in any direction, which shall
I use?
 During network construction, draw arcs in the directions in
which you know the conditional probabilities, or in which you know
there are data you can use to determine these values
later. Arcs drawn in these directions are said to be in assessment
order.
 During inference, if the arcs are not in the desired directions,
reverse them. Arcs in the directions required for inference are said to
be in inference order.
 Example:
 The network with the arc from “smoking” to “lung cancer” is in assessment
order
 The network with the arc from “lung cancer” to “smoking” is in inference
order
BN represents joint probability distribution
 A BN can simplify the joint probability distribution (JPD)
 Consider the following BN
 With the BN:

p(A,B,C,D,E,F) = p(A) p(B|A) p(C) p(D|B,C) p(E|B) p(F|B,E)

 Without constructing the BN first (pure chain rule):

p(A,B,C,D,E,F) = p(A) p(B|A) p(C|A,B) p(D|A,B,C) p(E|A,B,C,D) p(F|A,B,C,D,E)
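The saving from the BN factorization can be quantified by counting CPT parameters; a sketch assuming all six variables are binary (the structure is the one in the example factorization):

```python
# Parameter counts for the two factorizations of p(A,...,F) over
# binary variables: a node with k parents needs a CPT with 2**k
# independent probabilities (one per parent configuration).
bn_parents = {          # structure of the example BN
    "A": [], "B": ["A"], "C": [], "D": ["B", "C"],
    "E": ["B"], "F": ["B", "E"],
}
full_parents = {        # chain rule with no independence assumptions
    v: [w for w in "ABCDEF" if w < v] for v in "ABCDEF"
}

def n_params(parents):
    return sum(2 ** len(ps) for ps in parents.values())

print(n_params(bn_parents))    # 1+2+1+4+2+4 = 14
print(n_params(full_parents))  # 1+2+4+8+16+32 = 63
```

Even for six binary variables the BN needs 14 numbers against 63 for the unconstrained chain rule, and the gap widens exponentially as variables are added.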
Example of BN:
car starting system
Example of BN:
cause of dyspnea
Example of BN:
inkjet printer troubleshooting
Example of BN:
patient monitoring in an ICU (alarm project)
Decision modeling using Influence
Diagram
 BN represents probabilistic relationship among uncertain
variables
 They are useful for pure probabilistic reasoning and
inferences
 BN can be extended to Influence Diagram (ID) to
represent decision problem by adding decision nodes and
value nodes
 This is analogous to extending a probability tree to a decision
tree by attaching values or utilities to the end points of the tree
Decision node
 Decision variable: variable within the control of the decision
maker
 Represented by rectangular node in an ID
 In each decision node, we store a list of possible alternatives
associated with the decision variable
Arcs
 Information arcs: arc from chance node into decision node

 Influence arcs: arcs from decision node to chance node
Arcs (cont’d)
 Chronological arcs:
 Arc from one decision node to another decision node indicates
the chronological order in which the decisions are being
carried out
Value node and value arc
 Used to represent the utility or value function of the decision maker
 Denoted by a diamond
 Value node must be a sink node, i.e. it has only incoming arcs (known as value
arcs) but no outgoing arc
 Value arcs indicate the variables whose outcomes the decision maker cares
about, i.e. those that have an impact on his utility
 Only one value node is allowed in a standard ID
Deterministic node
 Special type of chance node
 Represents a variable whose outcome is deterministic (i.e. has
probability 1) once the outcomes of its conditioning parent nodes
are known
 Denoted by a double-oval
ID vs decision tree
No 1: Compact (ID) vs. combinatory (decision tree)
The size of an ID is equal to the total number of variables, while the size of a
decision tree grows exponentially with the number of variables: a tree over n
binary variables has 2^n leaf nodes.

No 2: Graphical (ID) vs. numerical (tree) representation of independence
In an ID, conditional independence relations among the variables are represented
by the graphical structure of the network; no numerical computation is needed.
In a decision tree, conditional independence relations can only be determined
through numerical computation using the probabilities.

No 3: Non-directional (ID) vs. unidirectional (tree)
The nodes and arcs of an ID may be added or deleted in any order, which makes
the modeling process flexible. A decision tree can only be built from the root
to the leaf nodes, so the exact sequence of the nodes or events must be known
in advance.

No 4: Symmetric model only (ID) vs. asymmetric model possible (tree)
In an ID, the outcomes of every node must be conditioned on all outcomes of its
parents, so the equivalent tree must be symmetric. In a decision tree, the
outcomes of some nodes may be omitted for certain outcomes of their parents,
leading to an asymmetric tree.
Example 1
Example 2
Example 3
Decision model : example 1

The party problem

Basic risky decision problem
Decision model : example 2

Decision problem with imperfect information
Decision model : example 3
Production/sale problem
Decision model : example 4
Maintenance decision for space shuttle tiles
Decision model : example 5
Basic model for electricity
generation investment evaluation
Evaluating ID
 To find the optimal decision policy of a problem represented by an
ID
 Methods:
 Convert ID into an equivalent decision tree and perform tree
roll back
 Perform operations directly on the network to obtain the
optimal decision policy; the first such algorithm is that of Shachter
(1986)
References
 Clemen, R.T. and Reilly, T. (2001). Making Hard Decisions with Decision Tools. Pacific Grove, CA: Duxbury/Thomson Learning.
 Howard, R.A. (1988). Decision Analysis: Practice and Promise. Management Science, 34(6), pp. 679–695.
 Russell, S. and Norvig, P. (2003). Artificial Intelligence: A Modern Approach, 2nd ed. Prentice-Hall.
 Shachter, R.D. (1986). Evaluating Influence Diagrams. Operations Research, 34(6), pp. 871–882.
