Posted on: 1/13/2011
Modeling Decision
Nur Aini Masruroh

Outline
- Introduction
- Probabilistic thinking
- Decision tree
- Introduction to Bayesian Network and Influence Diagram

Introduction

Why are decisions hard to make?
1. Complexity
   - There are many alternatives or possible solutions.
   - There are many factors to be considered, and many of these factors are interdependent.
2. Uncertainty
   - The possible future outcomes are uncertain or difficult to predict.
   - Information may be vague, incomplete, or unavailable.
3. Multiple conflicting objectives
   - The decision maker(s) may have many goals and objectives.
   - Many of these goals or objectives may conflict with one another.

Good decision versus good outcome
(Figure: a 2 x 2 grid crossing good/bad decision with good/bad outcome.)
A good decision does not guarantee a good outcome; it only enhances the chance of one.

Probabilistic thinking

An event is a distinction about some state of the world.
Examples:
- Whether the next person entering the room is a beer drinker
- Whether it will be raining tonight
When we identify an event, we have in mind what we mean. But will other people know precisely what we mean? Even we may not have a precise definition of what we have in mind.
To avoid ambiguity, every event should pass the clarity test.
Clarity test: a check that we are absolutely clear and precise about the definition of every event we are dealing with in a decision problem.

Possibility tree

Single-event tree
Example: the event "the next person entering this room is a businessman". Let B represent a businessman and B' otherwise.

Two-event trees
Several events can be considered simultaneously.
Example: the event "the next person entering this room is a businessman" and the event "the next person entering this room is a graduate" can be considered jointly.

Reversing the order of events in a tree
In the previous example, we considered the distinctions in the order "businessman" then "graduate", i.e., B to G. The same information can be expressed with the events in the reverse order, i.e., G to B.
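The two orderings of the businessman/graduate tree can be sketched in a few lines of Python. This is only an illustrative sketch: the labels G and G' for "graduate" and "not a graduate" are assumed by analogy with B and B', and the function name possibility_tree is hypothetical.

```python
from itertools import product

# Outcome labels: B / B' come from the text; G / G' are assumed by analogy.
EVENTS = {
    "businessman": ["B", "B'"],
    "graduate": ["G", "G'"],
}

def possibility_tree(order):
    """Return every root-to-leaf path of the tree built in the given event order."""
    return [tuple(path) for path in product(*(EVENTS[name] for name in order))]

b_then_g = possibility_tree(["businessman", "graduate"])
g_then_b = possibility_tree(["graduate", "businessman"])

for path in b_then_g:
    print(" -> ".join(path))

# Ignoring the order within a path, both trees contain the same
# elemental possibilities.
same = {frozenset(p) for p in b_then_g} == {frozenset(p) for p in g_then_b}
print(same)
```

Adding a third entry to EVENTS (for example gender) and listing it in the order extends the same sketch to trees with more events.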
Multiple event trees
We can jointly consider three events: businessman, graduate, and gender.

Assigning probabilities to events
The probabilities we assign depend on our state of information about the event.
Example: information relevant to assessing the likelihood that the next person entering the room is a businessman might include the following:
- There is an alumni meeting outside the room, and most of the alumni are businessmen.
- You have arranged to meet a friend here who, to your knowledge, is not a businessman. She is going to show up any moment.
- Etc.
After considering all relevant background information, we express the likelihood that the next person entering the room is a businessman by assigning a probability value to each of the possibilities or outcomes.

Marginal and conditional probabilities
In general, given information about the outcome of some events, we may revise our probabilities of other events.
We do this through the use of conditional probabilities.
The probability of an event X given a specific outcome of another event Y is called the conditional probability of X given Y.
The conditional probability of event X given event Y and other background information ξ is denoted by p(X|Y, ξ) and is given by

\[ p(X \mid Y, \xi) = \frac{p(X, Y \mid \xi)}{p(Y \mid \xi)} \quad \text{for } p(Y \mid \xi) > 0 \]

Factorization rule for joint probability
Rearranging the definition above gives the joint probability as a product of a marginal and a conditional probability:

\[ p(X, Y \mid \xi) = p(Y \mid \xi)\, p(X \mid Y, \xi) \]

Changing the order of conditioning
In the previous tree, G is conditioned on B. There is no reason why we should always condition G on B. Suppose we want to draw the tree in the order G to B. We then need to flip the tree.

Flipping the tree
Graphical approach:
- Change the ordering of the underlying possibility tree.
- Transfer the elemental (joint) probabilities from the original tree to the new tree.
- Compute the marginal probability for the first variable in the new tree, i.e., G, by adding the elemental probabilities that are related to G1 and G2 respectively.
- Compute the conditional probabilities for B given G.

Bayes' theorem
Flipping the tree as above is already an application of Bayes' theorem.
Given two uncertain events X and Y, suppose the probabilities p(X|ξ) and p(Y|X, ξ) are known. Then

\[ p(X \mid Y, \xi) = \frac{p(X \mid \xi)\, p(Y \mid X, \xi)}{p(Y \mid \xi)} \]

where

\[ p(Y \mid \xi) = \sum_{X} p(X \mid \xi)\, p(Y \mid X, \xi) \]

Application of conditional probability

Direct conditioning: relevance of smoking to lung cancer
Suppose:
- S: A person is a heavy smoker, defined as having smoked at least two packs of cigarettes per day for a period of at least 10 years during a lifetime.
- L: A person has lung cancer according to the standard medical definition.
A doctor not associated with lung cancer treatment assigned the following probabilities:

A lung cancer specialist remarked: "The probability p(L1|S1, ξ) = 0.1 is too low."
When asked to explain why, he said: "Because in all these years as a lung cancer specialist, whenever I visited my lung cancer ward, it was always full of smokers."
What is wrong with the above statement? The answer can be found by flipping the tree:

What the specialist perceived as "high" is actually the probability of a person being a smoker given that he has lung cancer, i.e., p(S1|L1, ξ) = 0.769. He has confused p(S1|L1, ξ) with p(L1|S1, ξ).
Notice that p(L1|S1, ξ) << p(S1|L1, ξ).
Hence even a highly trained professional can fall victim to wrong reasoning.

Expected value criterion
Suppose you face a situation where you must choose between alternatives A and B as follows:
- Alternative A: $10,000 for sure.
- Alternative B: 70% chance of receiving $18,000 and 30% chance of losing $4,000.
What is your personal choice?
Now compare Alternative B with:
- Alternative C: 70% chance of winning $24,600 and 30% chance of losing $19,400.
Note that the expected monetary values are equal, EMV(B) = EMV(C), but are the two alternatives "equivalent"?
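The EMV figures for the three alternatives above can be checked with a short Python sketch. The payoffs and probabilities are taken from the text; the standard deviation is included only as one rough, assumed indicator of spread, since the text itself does not define a risk measure, and the helper names are illustrative.

```python
from math import sqrt

# Each alternative is a list of (probability, payoff in dollars) pairs from the text.
ALTERNATIVES = {
    "A": [(1.0, 10_000)],
    "B": [(0.7, 18_000), (0.3, -4_000)],
    "C": [(0.7, 24_600), (0.3, -19_400)],
}

def emv(lottery):
    """Expected monetary value: probability-weighted sum of the payoffs."""
    return sum(p * x for p, x in lottery)

def std_dev(lottery):
    """Standard deviation of the payoff (an assumed, illustrative spread measure)."""
    mean = emv(lottery)
    return sqrt(sum(p * (x - mean) ** 2 for p, x in lottery))

for name, lottery in ALTERNATIVES.items():
    print(f"{name}: EMV = {emv(lottery):,.0f}, std dev = {std_dev(lottery):,.0f}")
```

Both B and C come out at an EMV of $11,400, while the payoffs of C are spread over a range twice as wide as those of B.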
Alternative C seems to be "more risky" than Alternative B even though they have the same EMV.
Conclusion: EMV does not take risk into account.

The Petersburg Paradox
In 1713 Nicolas Bernoulli suggested playing the following game:
- An unbiased coin is tossed until it lands tails.
- The player is paid $2 if tails comes up on the opening toss, $4 if tails first appears on the second toss, $8 if tails first appears on the third toss, $16 if tails first appears on the fourth toss, and so forth.
What is the maximum you would pay to play the above game?
If we follow the EMV criterion:

\[ EMV = \sum_{k=1}^{\infty} \frac{1}{2^k}\,(\$2^k) = \frac{1}{2}(\$2) + \frac{1}{4}(\$4) + \frac{1}{8}(\$8) + \dots = \infty \]

This means that you should be willing to pay up to an infinite amount of money to play the game. Why, then, are people unwilling to pay more than a few dollars?

25 years later, Nicolas's cousin, Daniel Bernoulli, arrived at a solution that contained the first seeds of contemporary decision theory.
Daniel reasoned that the marginal increase in the value or "utility" of money declines with the amount already possessed. A gain of $1,000 is more significant to a poor person than to a rich one, though both gain the same amount.
Specifically, Daniel Bernoulli argued that the value or utility of money should exhibit some form of diminishing marginal return with increase in wealth:

The measure to use to value the game is then the "expected utility"

\[ EU = \sum_{k=1}^{\infty} \frac{1}{2^k}\, u(2^k) \]

where u is an increasing concave function. For a suitable choice of u (for example, the logarithm), the sum converges to a finite number.

The rules of actional thought
How should a person act or decide rationally under uncertainty?
Answer: by following these rules or axioms:
- The ordering rule
- The equivalence or continuity rule
- The substitution or independence rule
- The decomposition rule
- The choice rule
These five rules form the axioms for decision theory.

The ordering rule
The decision maker must be able to state his preference among the prospects, outcomes, or prizes of any deal.
Furthermore, the transitivity property must be satisfied: if he prefers X to Y, and Y to Z, then he must prefer X to Z.
Mathematically, X ≻ Y and Y ≻ Z imply X ≻ Z.
The ordering rule implies that the decision maker can provide a complete preference ordering of all the outcomes from the best to the worst.
If a person does not follow the transitivity property, the money pump argument applies.

The equivalence or continuity rule
Given prospects A, B, and C such that A ≻ B ≻ C, there exists p with 0 < p < 1 such that the decision maker is indifferent between receiving prospect B for sure and receiving a deal with probability p for prospect A and probability 1 – p for prospect C.
Given that A ≻ B ≻ C:
- B: certainty equivalent of the uncertain deal (A with probability p, C with probability 1 – p)
- p: preference probability of prospect B with respect to prospects A and C

The substitution rule
We can always substitute a deal with its certainty equivalent without affecting preference.
For example, suppose the decision maker is indifferent between B and the A – C deal. Then he must also be indifferent between any two deals that differ only in that B is substituted for the A – C deal.

The decomposition rule
We can reduce compound deals to simple ones using the rules of probability.
For example, a decision maker should be indifferent between a compound deal and the simple deal that offers the same outcomes with the same overall probabilities.

The choice or monotonicity rule
Suppose that a decision maker can choose between two deals L1 and L2, where L1 gives A with probability p1 and B otherwise, and L2 gives A with probability p2 and B otherwise.
If the decision maker prefers A to B, then he must prefer L1 to L2 if and only if p1 > p2.
That is, if A ≻ B, then L1 ≻ L2 if and only if p1 > p2.
In other words, the decision maker must prefer the deal that offers the greater chance of receiving the better outcome.

Maximum expected utility principle
Let a decision maker face the choice between two uncertain deals or lotteries L1 and L2 with outcomes A1, A2, …, An, where L1 gives outcome Ai with probability pi and L2 gives outcome Ai with probability qi.
There is no loss of generality in assuming that L1 and L2 have the same set of outcomes A1, A2, …, An, because we can always assign zero probability to outcomes that do not occur in one of the lotteries.
It is not clear at this point whether L1 or L2 is preferred.
By the ordering rule, let

\[ A_1 \succ A_2 \succ \dots \succ A_n \]

Again, there is no loss of generality, as we can always renumber the subscripts according to the preference ordering.
We note that A1 is the most preferred outcome, while An is the least preferred outcome.
By the equivalence rule, for each outcome Ai (i = 1, …, n) there is a number ui such that 0 ≤ ui ≤ 1 and the decision maker is indifferent between Ai for sure and a lottery giving A1 with probability ui and An with probability 1 – ui.
Note that u1 = 1 and un = 0. Why?
By the substitution rule, we replace each Ai (i = 1, …, n) in L1 and L2 with the equivalent lottery constructed above.
By the decomposition rule, L1 and L2 may be reduced to equivalent deals with only two outcomes (A1 and An), each having different probabilities.
Finally, by the choice rule, since A1 ≻ An, the decision maker should prefer lottery L1 to lottery L2 if and only if

\[ \sum_{i=1}^{n} u_i p_i > \sum_{i=1}^{n} u_i q_i \]

Utilities and utility functions
We define the quantity ui (i = 1, …, n) as the utility of outcome Ai, and the function that returns the value ui given Ai as a utility function, i.e., u(Ai) = ui.
The quantities

\[ \sum_{i=1}^{n} p_i\, u(A_i) \quad \text{and} \quad \sum_{i=1}^{n} q_i\, u(A_i) \]

are known as the expected utilities of lotteries L1 and L2 respectively.
Hence the decision maker must prefer the lottery with the higher expected utility.

Case for more than 2 alternatives
The previous result may be generalized to the case when a decision maker is faced with more than two uncertain alternatives.
He should choose the one with maximum expected utility. Hence

\[ \text{best alternative} = \arg\max_{j} \sum_{i=1}^{n} p_{ij}\, u(A_i) \]

where pij is the probability of outcome Ai in alternative j.

Comparing expected utility criterion with expected monetary value criterion
- The expected utility criterion takes into account both return and risk, whereas the expected monetary value criterion does not consider risk.
- The alternative with the maximum expected utility is the best, taking into account the trade-off between return and risk.
- The best preference trade-off depends on a person's risk attitude.
- Different types of utility function represent different attitudes and degrees of aversion to risk taking.

Decision tree

Consider the following party problem:
Problem: decide the party location to maximize total satisfaction.
Note:
- Decisions are represented by squares.
- Uncertainties are represented by circles.

Preference
Suppose we have the following preference.
Note:
- Best case: O – S, set to 1
- Worst case: O – R, set to 0
- Other outcomes: set the preference relative to these two values

Assigning probability to the decision tree
Suppose we believe that the probability it will rain is 0.6.

Applying substitution rule

Using utility values
We may interpret the preference probabilities as utility values.

Introduction to Bayesian Network and Influence Diagram

"A good representation is the key to good problem solving"

Probabilistic modeling using BN
A problem represented as a decision tree can also be represented using a Bayesian Network (BN).
A Conditional Probability Table (CPT) is stored in each node.
The network can be extended … Can you imagine the size of the decision tree for such an extended network?

Bayesian Network: definition
Also called relevance diagrams, probabilistic networks, causal networks, causal graphs, etc.
A BN represents the probabilistic relations between uncertain variables.
It is a directed acyclic graph; the nodes in the graph indicate the variables of concern, while the arcs between nodes indicate the probabilistic relations among the nodes.
In each node, we store a conditional probability distribution of the variable represented by that node, conditioned on the outcomes of all the uncertain variables that are parents of that node.

Two layers of representation of knowledge
- Qualitative level: the graphical structure represents the probabilistic dependence or relevance between variables.
- Quantitative level: the conditional probabilities represent the local "strength" of the dependence relationship.

Where do the numbers in a BN come from?
- Direct assessment by domain experts
- Learning from a sufficient amount of data using:
  - Statistical estimation methods
  - Machine learning and data mining algorithms
- Output from other mathematical models:
  - Simulation models
  - Stochastic models
  - System dynamics models
  - Etc.
- Combination of the above:
  - Experts assess the graphical structure, and learning algorithms or other models fill in the numbers
  - Learn both structure and numbers, and let the experts fine-tune the results

Properties of BN
- The presence of an arc indicates possible relevance.
- An arc can be drawn in any direction.
- Arc reversal: if we want to know the probability that a specific person is a smoker given that he has lung cancer, we reverse the arc from "smoking" to "lung cancer". The operation computes and replaces the probabilities at the two nodes.

Arc reversal operation
Suppose initially we have the network X → Y, with p(X) and p(Y|X) stored at the nodes. Then we want the network Y → X.
The probability distributions p(Y) and p(X|Y) for the new network can be computed from the original network using Bayes' Theorem as follows:

\[ p(Y) = \sum_{X} p(X)\, p(Y \mid X) \]

\[ p(X \mid Y) = \frac{p(X)\, p(Y \mid X)}{p(Y)} \]

Arc reversal: example
Note: in arc reversal, we must sometimes add arc(s) to preserve Bayes' Theorem.
However, if possible, avoid arc reversals that introduce additional arcs, as that implies loss of conditional independence information.

If an arc can be drawn in any direction, which shall I use?
- During network construction, draw arcs in the directions in which you know the conditional probabilities, or in which you know there are data that you can use to determine these values later. Arcs drawn in these directions are said to be in assessment order.
- During inference, if the arcs are not in the desired directions, reverse them. Arcs in the directions required for inference are said to be in inference order.
Example:
- The network with the arc from "smoking" to "lung cancer" is in assessment order.
- The network with the arc from "lung cancer" to "smoking" is in inference order.

BN represents joint probability distribution
A BN can help simplify the joint probability distribution (JPD).
Using the BN over the variables A, B, C, D, E, F:

\[ p(A,B,C,D,E,F) = p(A)\, p(B \mid A)\, p(C)\, p(D \mid B,C)\, p(E \mid B)\, p(F \mid B,E) \]

Without constructing the BN first:

\[ p(A,B,C,D,E,F) = p(A)\, p(B \mid A)\, p(C \mid A,B)\, p(D \mid A,B,C)\, p(E \mid A,B,C,D)\, p(F \mid A,B,C,D,E) \]

Examples of BN
- Car starting system
- Cause of dyspnea
- Ink jet printer troubleshooting
- Patient monitoring in an ICU (ALARM project)

Decision modeling using Influence Diagram
A BN represents probabilistic relationships among uncertain variables. BNs are useful for pure probabilistic reasoning and inference.
A BN can be extended to an Influence Diagram (ID) to represent a decision problem by adding decision nodes and value nodes.
This is analogous to extending a probability tree to a decision tree by adding decision branches and adding values or utilities to the end points of the tree.

Decision node
- Decision variable: a variable within the control of the decision maker
- Represented by a rectangular node in an ID
- In each decision node, we store a list of possible alternatives associated with the decision variable

Arcs
- Information arcs: arcs from a chance node into a decision node
- Influence arcs: arcs from a decision node to a chance node
- Chronological arcs: an arc from one decision node to another decision node indicates the chronological order in which the decisions are carried out

Value node and value arc
- Used to represent the utility or value function of the decision maker
- Denoted by a diamond
- A value node must be a sink node, i.e., it has only incoming arcs (known as value arcs) and no outgoing arcs
- Value arcs indicate the variables whose outcomes the decision maker cares about or that have an impact on his utility
- Only one value node is allowed in a standard ID

Deterministic node
- A special type of chance node
- Represents a variable whose outcomes are deterministic (i.e., have probability 1) once the outcomes of the other conditioning nodes are known
- Denoted by a double oval

ID vs decision tree

| No | Influence Diagram | Decision tree |
| 1 | Compact. The size of an ID is equal to the total number of variables. | Combinatory. The size of a decision tree grows exponentially with the total number of variables. A tree over n binary variables has 2^n leaf nodes. |
| 2 | Graphical representation of independence. Conditional independence relations among the variables are represented by the graphical structure of the network. No numerical computations are needed to determine conditional independence relations. | Numerical representation of independence. Conditional independence relations among the variables can only be determined through numerical computation using the probabilities. |
| 3 | Non-directional. The nodes and arcs of an ID may be added or deleted in any order. This makes the modeling process flexible. | Unidirectional. A decision tree can only be built in the direction from the root to the leaf nodes. The exact sequence of the nodes or events must be known in advance. |
| 4 | Symmetric model only. The outcomes of all nodes must be conditioned on all outcomes of their parents. This implies that the equivalent tree must be symmetrical. | Asymmetric model possible. The outcomes of some nodes may be omitted for certain outcomes of their parents, leading to an asymmetrical tree. |

Decision model examples
- Example 1: The party problem (basic risky decision problem)
- Example 2: Decision problem with imperfect information
- Example 3: Production/sale problem
- Example 4: Maintenance decision for space shuttle tiles
- Example 5: Basic model for electricity generation investment evaluation

Evaluating ID
Goal: to find the optimal decision policy of a problem represented by an ID.
Methods:
- Convert the ID into an equivalent decision tree and perform tree rollback.
- Perform operations directly on the network to obtain the optimal decision policy. The first such algorithm is that of Shachter (1986).

Readings
- Clemen, R.T. and Reilly, T. (2001). Making Hard Decisions with Decision Tools. California: Duxbury Thomson Learning.
- Howard, R.A. (1988). Decision Analysis: Practice and Promise. Management Science, 34(6), pp. 679 – 695.
- Russell, S. and Norvig, P. (2003). Artificial Intelligence: A Modern Approach, 2nd ed. Prentice-Hall, Inc.
- Shachter, R.D. (1986). Evaluating Influence Diagrams. Operations Research, 34(6), pp. 871 – 882.