Modeling Decision

   Nur Aini Masruroh
Outline
 Introduction
 Probabilistic thinking
 Decision tree
 Introduction to Bayesian Network and Influence Diagram
Introduction
 Why are decisions hard to make?
1.   Complexity
        There are many alternatives or possible solutions
        There are many factors to be considered and many of these factors are
         interdependent
2.   Uncertainty
        The possible future outcomes are uncertain or difficult to predict
        Information may be vague, incomplete, or unavailable
3.   Multiple conflicting objectives
        The decision maker(s) may have many goals and objectives
        Many of these goals or objectives may be conflicting in nature
   Good decision versus good outcome
[Figure: both good and bad decisions can lead to either good or bad outcomes]
 A good decision does not guarantee a good outcome – it only enhances
  the chance of one
Probabilistic thinking
 An event is a distinction about some state of the world
   Example:
     Whether the next person entering the room is a beer drinker
     Whether it will rain tonight, etc.

 When we identify an event, we have in mind what we mean.
  But will other people know precisely what we mean?
    Even we may not have a precise definition of what we have in
     mind
 To avoid ambiguity, every event should pass the clarity test
   Clarity test: to ensure that we are absolutely clear and precise about the
    definition of every event we are dealing with in a decision problem
Possibility tree
 Single event tree
   Example: event “the next person entering this room is a businessman”
   Suppose B represents “businessman” and B’ represents “not a businessman”
Possibility tree
 Two-event trees
   Simultaneously consider several events
   Example: event “the next person entering this room is a businessman”
    and event “the next person entering this room is a graduate” can be
    jointly considered
Reversing the order of events in a tree
 In the previous example, we have considered the distinctions in
  the order of “businessman” then “graduate”, i.e., B to G.
 The same information can be expressed with the events in the
  reverse order, i.e., G to B.
Multiple event trees
 We can jointly consider three events: businessman, graduate, and
  gender.
Assigning probabilities to events
 The probabilities we assign depend on our state of information
  about the event
 Example: information relevant to assessing the likelihood that
  the next person entering the room is a businessman might include
  the following:
    There is an alumni meeting outside the room and most of the alumni are
     businessmen
    You have made an arrangement to meet a friend here, and she, to your
     knowledge, is not a businesswoman. She is going to show up any moment.
    Etc.
 After considering all relevant background information, we assign
  the likelihood that the next person entering the room is a
  businessman by assigning a probability value to each of the
  possibilities or outcomes
Marginal and conditional probabilities
 In general, given information about the outcome of some events,
  we may revise our probabilities of other events
 We do this through the use of conditional probabilities
 The probability of an event X given specific outcomes of another
  event Y is called the conditional probability of X given Y
 The conditional probability of event X given event Y and other
  background information ξ, is denoted by p(X|Y, ξ) and is given by

          p(X | Y, ξ) = p(X, Y | ξ) / p(Y | ξ)      for p(Y | ξ) > 0
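
 As a worked illustration with hypothetical numbers (not from the slides): if
  p(X, Y | ξ) = 0.24 and p(Y | ξ) = 0.4, then p(X | Y, ξ) = 0.24 / 0.4 = 0.6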
Factorization rule for joint probability
    Changing the order of conditioning
 Suppose in the previous tree we have




      There is no reason why we should always condition G on B. Suppose we want
      to draw the tree in the order G to B: we need to flip the tree!
Flipping the tree
 Graphical approach
   Change the ordering of the underlying possibility tree
   Transfer the elemental (joint) probabilities from the original tree to the new
    tree
   Compute the marginal probability for the first variable in the new tree, i.e.,
    G. We add the elemental probabilities that are related to G1 and G2
    respectively.
   Compute conditional probabilities for B given G


 Bayes’ theorem
   Performing the above tree flipping is already an application of Bayes’ theorem
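
 A minimal sketch of the flipping procedure in Python. The numbers for the
  businessman (B) and graduate (G) events are hypothetical placeholders, since the
  original tree's values are in figures not reproduced here:

```python
# Tree flipping via Bayes' theorem: from p(B) and p(G|B) to p(G) and p(B|G).
# All numbers below are hypothetical placeholders.
p_B = {"B1": 0.3, "B2": 0.7}                       # marginals for "businessman"
p_G_given_B = {                                    # conditionals p(G | B)
    "B1": {"G1": 0.8, "G2": 0.2},
    "B2": {"G1": 0.4, "G2": 0.6},
}

# Steps 1-2: transfer the elemental (joint) probabilities p(B, G) = p(B) p(G|B)
joint = {(b, g): p_B[b] * p_G_given_B[b][g] for b in p_B for g in ("G1", "G2")}

# Step 3: marginals for the first variable of the new tree, p(G)
p_G = {g: sum(joint[(b, g)] for b in p_B) for g in ("G1", "G2")}

# Step 4: conditionals for the new tree, p(B | G) = p(B, G) / p(G)
p_B_given_G = {g: {b: joint[(b, g)] / p_G[g] for b in p_B} for g in ("G1", "G2")}

print(p_G)           # {'G1': 0.52, 'G2': 0.48}
print(p_B_given_G)   # the flipped conditional probabilities
```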
Bayes’ Theorem
 Given two uncertain events X and Y. Suppose the probabilities
  p(X|ξ) and p(Y|X, ξ) are known, then

          p(X | Y, ξ) = p(X | ξ) p(Y | X, ξ) / p(Y | ξ)

          where  p(Y | ξ) = Σ_X p(X | ξ) p(Y | X, ξ)
Application of conditional probability
Direct conditioning: Relevance of smoking to lung cancer
 Suppose:
   S: A person is a heavy smoker which is defined as having smoked at least two
   packs of cigarettes per day for a period of at least 10 years during a lifetime
   L: A person has lung cancer according to standard medical definition

 A doctor not associated with lung cancer treatment assigned the following
   probabilities:
Relevance of smoking to lung cancer
(cont’d)
 A lung cancer specialist remarked: “The probability p(L1|S1, ξ) = 0.1 is too low”
 When asked to explain why, he said:
  “Because in all these years as a lung cancer specialist, whenever I visited my lung cancer
  ward, it is always full of smokers.”
 What’s wrong with the above statement?
 The answer can be found by flipping the tree:
Relevance of smoking to lung cancer
(cont’d)
 What the specialist referred to as “high” is actually the probability
  of a person being a smoker given that he has lung cancer, i.e.,
  p(S1|L1, ξ) = 0.769
 He has confused p(S1|L1, ξ) with p(L1|S1, ξ)
 Notice that p(L1|S1, ξ) << p(S1|L1, ξ)
 Hence even a highly trained professional can fall victim to wrong
  reasoning
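
 To see how both numbers can hold at once, flip the tree with assumed marginals;
  p(S1|ξ) = 0.25 and p(L1|S2, ξ) = 0.01 are hypothetical values chosen to be
  consistent with the quoted 0.769 (the slides' own tree is in a missing figure):

          p(S1 | L1, ξ) = p(S1|ξ) p(L1|S1, ξ) / [p(S1|ξ) p(L1|S1, ξ) + p(S2|ξ) p(L1|S2, ξ)]
                        = (0.25)(0.1) / [(0.25)(0.1) + (0.75)(0.01)]
                        = 0.025 / 0.0325 ≈ 0.769

  so p(L1|S1, ξ) can be “low” even though p(S1|L1, ξ) is “high”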
Expected value criterion
 Suppose you face a situation where you must choose between
  alternatives A and B as follows:
   Alternative A: $10,000 for sure.
    Alternative B: 70% chance of receiving $18,000 and 30% chance of losing
     $4,000.
  What is your personal choice?
 Compare now Alternative B with:
    Alternative C: 70% chance of winning $24,600 and 30% chance of losing
     $19,400
 Note that EMV(B) = EMV(C), but are they “equivalent”?
 Alternative C seems to be “more risky” than Alternative B even
  though they have the same EMV.
 Conclusion: EMV does not take risk into account
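
 Checking the claim: EMV(B) = 0.7($18,000) + 0.3(–$4,000) = $12,600 – $1,200 =
  $11,400, and EMV(C) = 0.7($24,600) + 0.3(–$19,400) = $17,220 – $5,820 = $11,400,
  so the EMV criterion cannot distinguish the two alternatives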
The Petersburg Paradox
 In 1713 Nicolas Bernoulli suggested playing the following game:
     An unbiased coin is tossed until it lands Tails
     The player is paid $2 if Tails comes up on the opening toss, $4 if Tails first
       appears on the second toss, $8 if Tails first appears on the third toss, $16 if
       Tails first appears on the fourth toss, and so forth
 What is the maximum you would pay to play the above game?
 If we follow the EMV criterion:
               EMV = Σ_{k=1}^{∞} (1/2)^k ($2^k) = (1/2)($2) + (1/4)($4) + (1/8)($8) + … = ∞

 This means that you should be willing to pay an arbitrarily large amount of money
   to play the game. Why, then, are people unwilling to pay more than a few dollars?
The Petersburg Paradox
 Twenty-five years later, Nicolas’s cousin, Daniel Bernoulli, arrived at a solution
  that contained the first seeds of contemporary decision theory
 Daniel reasoned that the marginal increase in the value or “utility” of money
  declines with the amount already possessed.
 A gain of $1,000 is more significant to a poor person than to a rich man,
  though both gain the same amount
 Specifically, Daniel Bernoulli argued that the value or utility of money should
  exhibit some form of diminishing marginal return with increasing wealth:

   The measure to use to value the game is then the “expected utility”:

               EU = Σ_{k=1}^{∞} (1/2)^k u($2^k)

   where u is an increasing concave function, so the sum converges to a finite number
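
 A small sketch in Python using the logarithmic utility u(x) = ln x that Daniel
  Bernoulli himself proposed; truncating the infinite sum is the only approximation:

```python
import math

# Expected utility of the St. Petersburg game under u(x) = ln(x).
# The series converges quickly, so truncating at k = 200 is a fine approximation.
eu = sum((0.5 ** k) * math.log(2 ** k) for k in range(1, 200))
print(eu)   # ~1.3863, i.e. 2 ln 2

# Certainty equivalent: the sure amount whose utility equals the game's EU.
ce = math.exp(eu)
print(ce)   # ~$4.00 -- close to what people actually offer to pay
```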
The rules of actional thought
 How should a person act or decide rationally under uncertainty?
 Answer: by following these rules or axioms:
     The ordering rule
     The equivalence or continuity rule
     The substitution or independence rule
     The decomposition rule
     The choice rule


 The above five rules form the axioms for Decision Theory
The ordering rule
 The decision maker must be able to state his preference among the prospects,
  outcomes, or prizes of any deal
 Furthermore, the transitivity property must be satisfied: that is, if he prefers X
  to Y, and Y to Z, then he must prefer X to Z
     Mathematically, X ≻ Y and Y ≻ Z together imply X ≻ Z
 The ordering rule implies that the decision maker can provide a complete
  preference ordering of all the outcomes from the best to the worst
 Suppose a person does not follow the transitivity property: by the money pump
  argument, he can be charged repeatedly to cycle through his own preferences
The equivalence or continuity rule
 Given prospects A, B, and C such that A ≻ B ≻ C, there
  exists p where 0 < p < 1 such that the decision maker will be
  indifferent between receiving the prospect B for sure and receiving
  a deal with a probability p for prospect A and a probability of 1 – p
  for prospect C




 Given that A ≻ B ≻ C:
    B: the certainty equivalent of the uncertain deal on the right
    p: the preference probability of prospect B with respect to prospects A and C
The substitution rule
 We can always substitute a deal with its certainty equivalent without affecting
  preference
 For example, suppose the decision maker is indifferent between B and the A – C
  deal below




 Then he must be indifferent between the two deals below where B is
   substituted for the A – C deal
The decomposition rule
 We can reduce compound deals to simple ones using the rules of probabilities
 For example, a decision maker should be indifferent between the following two
  deals:
The choice or monotonicity rule
 Suppose that a decision maker can choose between two deals L1 and L2 as
   follows:




 If the decision maker prefers A to B, then he must prefer L1 to L2 if and only if
   p1 > p2. That is, if A ≻ B:




 In other words, the decision maker must prefer the deal that offers the greater
   chance of receiving the better outcome
Maximum expected utility principle
 Suppose a decision maker faces the choice between two uncertain deals or lotteries
   L1 and L2 with outcomes A1, A2, …, An as follows:




 There is no loss of generality in assuming that L1 and L2 have the same set of
  outcomes A1, A2, …, An because we can always assign zero probability to those
  outcomes that do not exist in either L1 or L2.
 It’s not clear whether L1 or L2 is preferred
 By the ordering rule, let A1 ≻ A2 ≻ … ≻ An
Maximum expected utility principle
 Again, there is no loss of generality as we can always renumber the subscripts
  according to the preference ordering
 We note that A1 is the most preferred outcome, while An is the least preferred
  outcome
 By the equivalence rule, for each outcome Ai (i = 1, …, n) there is a number ui
  such that 0 ≤ ui ≤ 1 and




 Note that u1 = 1 and un = 0. Why?
Maximum expected utility principle
 By the substitution rule, we replace each Ai (i=1,…,n) in L1 and L2 with the
  above constructed equivalent lotteries
Maximum expected utility principle
 By the decomposition rule, L1 and L2 may be reduced to
  equivalent deals with only two outcomes (A1 and An) each having
  different probabilities
 Finally, by the choice rule, since A1 ≻ An, the decision maker
  should prefer lottery L1 to lottery L2 if and only if

               Σ_{i=1}^{n} ui pi  >  Σ_{i=1}^{n} ui qi
Utilities and utility functions
 We define the quantity ui (i=1,…,n) as the utility of outcome Ai
  and the function that returns the values ui given Ai as a utility
  function, i.e. u(Ai) = ui
 The quantities

               Σ_{i=1}^{n} pi u(Ai)   and   Σ_{i=1}^{n} qi u(Ai)

  are known as the expected utilities for lotteries L1 and L2
  respectively
 Hence the decision maker must prefer the lottery with a higher
  expected utility
   Case for more than 2 alternatives
 The previous result may be generalized to the case where a decision maker is
  faced with more than two uncertain alternatives: he should choose the
  one with the maximum expected utility
 Hence

            best alternative = arg max_j Σ_{i=1}^{n} pij u(Ai)

  where pij is the probability of outcome Ai under alternative j
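
 A direct transcription in Python; the outcome utilities and the probabilities pij
  below are hypothetical placeholders:

```python
# Choose the alternative with the maximum expected utility.
utilities = [1.0, 0.6, 0.0]           # u(A1), u(A2), u(A3) -- placeholders

alternatives = {                      # p_ij: outcome probabilities per alternative
    "alt1": [0.5, 0.3, 0.2],
    "alt2": [0.2, 0.7, 0.1],
}

def expected_utility(probs, utils):
    """Sum_i p_i * u(A_i) for one alternative."""
    return sum(p * u for p, u in zip(probs, utils))

best = max(alternatives, key=lambda j: expected_utility(alternatives[j], utilities))
print(best)   # 'alt1' here: expected utility 0.68 vs 0.62
```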
Comparing expected utility criterion with expected
monetary value criterion
   The expected utility criterion takes into account both return and
    risk, whereas the expected monetary value criterion does not consider
    risk
   The alternative with the maximum expected utility is the best,
    taking into account the trade-off between return and risk
   The best preference trade-off depends on a person’s risk attitude
   Different types of utility function represent different attitudes and
    degree of aversion to risk taking
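
 To illustrate with Alternatives B and C from earlier (identical EMVs of $11,400),
  here is a sketch using an exponential risk-averse utility; the risk tolerance
  R = $20,000 is an illustrative assumption, not a value from the slides:

```python
import math

R = 20_000                            # assumed risk tolerance (illustrative)

def u(x):
    """Exponential utility: concave, so it penalizes downside risk."""
    return 1 - math.exp(-x / R)

def emv(deal):
    return sum(p * x for p, x in deal)

def eu(deal):
    return sum(p * u(x) for p, x in deal)

B = [(0.7, 18_000), (0.3, -4_000)]
C = [(0.7, 24_600), (0.3, -19_400)]

print(emv(B), emv(C))   # 11400.0 11400.0 -- indistinguishable by EMV
print(eu(B), eu(C))     # ~0.349 vs ~0.004 -- the riskier C scores far lower
```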
Decision tree
 Consider the following party problem:


   Problem: decide the party location to maximize total satisfaction

   Note: the decision is represented by a square; uncertainties are
   represented by circles
Preference
 Suppose we have the following preferences

   Note:
   Best case: O – S → set to 1
   Worst case: O – R → set to 0
   Other outcomes are assigned preference values relative to these two
Assigning probability to the decision tree
 Suppose we believe that the probability it will rain is 0.6
Applying substitution rule
Using utility values
 We may interpret preference probabilities as utility values
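
 A sketch of rolling back the party tree with p(rain) = 0.6. Outdoor–sunshine = 1
  and outdoor–rain = 0 follow the preference slide; the indoor values are
  hypothetical placeholders, since the slides' actual numbers are in missing figures:

```python
# Roll back the party decision tree, using preference probabilities as utilities.
p_rain = 0.6

u = {                                        # u[location][weather]
    "outdoor": {"sun": 1.0, "rain": 0.0},    # from the slides
    "indoor":  {"sun": 0.4, "rain": 0.67},   # hypothetical placeholders
}

def expected_utility(location):
    return (1 - p_rain) * u[location]["sun"] + p_rain * u[location]["rain"]

print({loc: round(expected_utility(loc), 3) for loc in u})
# {'outdoor': 0.4, 'indoor': 0.562} -> choose 'indoor' under these assumptions
```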
Introduction to Bayesian Network and
          Influence Diagram
“A good representation is the key to good
           problem solving”
            Probabilistic modeling using BN

 Suppose we have the following problem (represented in decision tree):




 Can be represented using Bayesian Network (BN):


   A conditional probability table (CPT) is stored at each node
         Probabilistic modeling using BN

 The network can be extended …

   Can you imagine the size of the decision tree for these?
Bayesian Network: definition
 Also called relevance diagrams, probabilistic networks, causal
  networks, causal graphs, etc.
 BN represents the probabilistic relations between uncertain
  variables
 It is a directed acyclic graph; the nodes in the graph indicate the
  variables of concern, while the arcs between nodes indicate the
  probabilistic relations among the nodes
 In each node, we store a conditional probability distribution of
  the variable represented by that node, conditioned on the
  outcomes of all the uncertain variables that are parents of that
  node
Two layers of representation of
knowledge
  Qualitative level
    Graphical structure represents the probabilistic dependence
     or relevance between variables
  Quantitative level
    Conditional probabilities represent the local “strength” of the
     dependence relationship
Where do the numbers in a BN come
from?
   Direct assessment by domain experts
    Learn from a sufficient amount of data using:
      Statistical estimation methods
      Machine learning and data mining algorithms
   Output from other mathematical models
      Simulation models
      Stochastic models
      Systems dynamics models
      Etc
   Combination of the above
       Experts assess the graphical structure, and learning algorithms or other models
        fill in the numbers
      Learn both structure and numbers and let the experts fine-tune the results
Properties of BN
 Presence of an arc indicates possible relevance
 Arc reversal:


 If we are interested in the probability that a specific person is a smoker given
   that he has lung cancer …




 The operation will compute and replace the probabilities at the two nodes
    An arc can be drawn in any direction
Arc reversal operation
 Suppose initially we have,




 Then we want,




 The probability distributions p(Y) and p(X|Y) for the new network can be
  computed using Bayes’ theorem as follows:

                p(Y) = Σ_X p(X) p(Y | X)            (from the original network)

                p(X | Y) = p(X) p(Y | X) / p(Y)
Arc reversal: example




 Note: in arc reversal, we must sometimes add arc(s) to preserve
  Bayes’ theorem. However, if possible, avoid arc reversals that
  introduce additional arcs, as that implies a loss of conditional
  independence information
If an arc can be drawn in any direction, which shall
I use?
 During network construction, draw arcs in the directions in
  which you know the conditional probabilities, or for which you know
  that data exist that can be used to determine these values
  later. Arcs drawn in these directions are said to be in assessment
  order.
 During inference, if the arcs are not in the desired directions,
  reverse them. Arcs in directions required for inference are said to
  be in inference order.
 Example:
   The network with the arc from “smoking” to “lung cancer” is in assessment
    order
   The network with the arc from “lung cancer” to “smoking” is in inference
    order
      BN represents a joint probability distribution
       BN can help simplify the JPD
          Consider the following BN

 With the BN structure:
 p(A,B,C,D,E,F) = p(A) p(B|A) p(C) p(D|B,C) p(E|B) p(F|B,E)

 Without constructing the BN first:
 p(A,B,C,D,E,F) = p(A) p(B|A) p(C|A,B) p(D|A,B,C) p(E|A,B,C,D) p(F|A,B,C,D,E)
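
 A quick way to see the savings, assuming for illustration that all six variables
  are binary (each node with k parents then needs 2^k free parameters):

```python
# Independent probabilities needed for the factored vs. unconstrained joint,
# assuming all six variables are binary.
parents = {"A": 0, "B": 1, "C": 0, "D": 2, "E": 1, "F": 2}   # from the BN factorization

bn_params = sum(2 ** k for k in parents.values())   # one per parent configuration
full_joint_params = 2 ** len(parents) - 1           # unconstrained joint table

print(bn_params)          # 14
print(full_joint_params)  # 63
```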
Example of BN:
car starting system
Example of BN:
cause of dyspnea
Example of BN:
ink jet printer trouble shooting
Example of BN:
patient monitoring in an ICU (alarm project)
Decision modeling using Influence
Diagram
 BN represents probabilistic relationship among uncertain
  variables
 They are useful for pure probabilistic reasoning and
  inferences
 BN can be extended to Influence Diagram (ID) to
  represent decision problem by adding decision nodes and
  value nodes
 This is analogous to extending a probability tree to a
  decision tree by adding decision branches and adding
  values or utilities to the end points of the tree
Decision node
 Decision variable: variable within the control of the decision
  maker
 Represented by a rectangular node in an ID
 In each decision node, we store a list of possible alternatives
  associated with the decision variable
Arcs
 Information arcs: arcs from a chance node into a decision node




 Influence arcs: arcs from a decision node to a chance node
Arcs (cont’d)
 Chronological arcs:
   Arc from one decision node to another decision node indicates
    the chronological order in which the decisions are being
    carried out
Value node and value arc
 Used to represent the utility or value function of the decision maker
 Denoted by a diamond
 Value node must be a sink node, i.e. it has only incoming arcs (known as value
  arcs) but no outgoing arc
 Value arcs indicate the variables whose outcomes the decision maker cares
  about or that have an impact on his utility
 Only one value node is allowed in a standard ID
Deterministic node
 Special type of chance node
 Represents a variable whose outcome is deterministic (i.e., has
  probability 1) once the outcomes of its conditioning nodes
  are known
 Denoted by a double-oval
                   ID vs decision tree

 1. Compact vs. combinatorial
    ID: the size of an ID equals the total number of variables.
    Decision tree: the size of a decision tree grows exponentially with the total
    number of variables; a binary tree over n variables has 2^n leaf nodes.

 2. Graphical vs. numerical representation of independence
    ID: conditional independence relations among the variables are represented by
    the graphical structure of the network; no numerical computations are needed
    to determine them.
    Decision tree: conditional independence relations can only be determined
    through numerical computation using the probabilities.

 3. Non-directional vs. unidirectional
    ID: the nodes and arcs of an ID may be added or deleted in any order, which
    makes the modeling process flexible.
    Decision tree: a decision tree can only be built in the direction from the
    root to the leaf nodes; the exact sequence of the nodes or events must be
    known in advance.

 4. Symmetric model only vs. asymmetric model possible
    ID: the outcomes of every node must be conditioned on all outcomes of its
    parents, so the equivalent tree must be symmetric.
    Decision tree: the outcomes of some nodes may be omitted for certain outcomes
    of their parents, leading to an asymmetric tree.
Example 1
Example 2
Example 3
Decision model: example 1
The party problem – a basic risky decision problem
Decision model: example 2
Decision problem with imperfect information
Decision model: example 3
Production/sale problem
Decision model: example 4
Maintenance decision for space shuttle tiles
Decision model: example 5
Basic model for electricity generation investment evaluation
Evaluating ID
 Goal: to find the optimal decision policy of a problem represented by an
  ID
 Methods:
     Convert the ID into an equivalent decision tree and perform tree
      rollback
     Perform operations directly on the network to obtain the
      optimal decision policy; the first such algorithm is that of Shachter
      (1986)
Readings
 Clemen, R.T. and Reilly, T. (2001). Making Hard Decisions
  with Decision Tools. California: Duxbury Thomson Learning.
 Howard, R.A. (1988). Decision Analysis: Practice and
  Promise. Management Science, 34(6), pp. 679–695.
 Russell, S. and Norvig, P. (2003). Artificial Intelligence: A
  Modern Approach, 2nd ed. Prentice-Hall.
 Shachter, R.D. (1986). Evaluating Influence Diagrams.
  Operations Research, 34(6), pp. 871–882.

				