Conflicts in Bayesian Networks
January 23, 2007
Marco Valtorta
mgv@cse.sc.edu
UNIVERSITY OF SOUTH CAROLINA
Department of Computer Science and Engineering

Example: Case Study #4
Bayesian Network Fragment Matching

1) Report Date: 1 April, 2003. FBI: Abdul Ramazi is the owner of the Select Gourmet Foods shop in Springfield Mall, Springfield, VA (phone number 703-659-2317). First Union National Bank lists Select Gourmet Foods as holding account number 1070173749003. Six checks totaling $35,000 have been deposited in this account in the past four months and are recorded as having been drawn on accounts at the Pyramid Bank of Cairo, Egypt, and the Central Bank of Dubai, United Arab Emirates. Both of these banks have just been listed as possible conduits in money laundering schemes.

The entities extracted from the report are represented in RDF and matched against a partially-instantiated Bayesian network fragment from the BN fragment repository:

<Protege:Person rdf:about="&Protege;Omniseer_00135"
  Protege:familyName="Ramazi"
  Protege:givenName="Abdulla"
  rdfs:label="Abdulla Ramazi"/>
.....
<Protege:Bank rdf:about="&Protege;Omniseer_00614"
  Protege:alternateName="Pyramid Bank of Cairo"
  rdfs:label="Pyramid Bank of Cairo">
  <Protege:address rdf:resource="&Protege;Omniseer_00594"/>
  <Protege:note rdf:resource="&Protege;Omniseer_00625"/>
</Protege:Bank>
.....
<Protege:Report rdf:about="&Protege;Omniseer_00626"
  Protege:abstract="Ramazi's deposit in the past 4 months (1)"
  rdfs:label="Ramazi's deposit in the past 4 months (1)">
  <Protege:reportedFrom rdf:resource="&Protege;Omniseer_00501"/>
  <Protege:detail rdf:resource="&Protege;Omniseer_00602"/>
  <Protege:detail rdf:resource="&Protege;Omniseer_00612"/>
</Protege:Report>
</rdf:RDF>

Example: Case Study #4
Bayesian Network Fragment Composition
+ Fragments → Situation-Specific Scenario

Value of Information
• An item of information is useful if acquiring it leads to a better decision, that is, to a more useful action.
• An item of information is useless if the actions taken after acquiring it are no more useful than those taken before acquiring it.
• In particular, information is useless if the actions taken after acquiring it are the same as before acquiring it.
• In the absence of a detailed model of the utility of actions, the decrease in uncertainty about a variable of interest is taken as a proxy for the increase in utility: the best item of information to acquire is the one that most reduces the uncertainty about a variable of interest.
• Since the value of the new item of information is not known in advance, we average over its possible values.
• Uncertainty is measured by entropy; reduction in uncertainty is measured by reduction in entropy.

Example: Case Study #4
Computing Value of Information and Surprise

This is the output of the VOI program on a situation-specific scenario for Case Study #4 (Sign of the Crescent). It is known that Ramazi performed illegal banking transactions ("Yes"). Is Ramazi a terrorist? Would it help to know whether he traveled to sensitive locations? Variable Travel (which represents suspicious travel) is significant for determining the state of variable Suspect (whether Ramazi is a terrorist), even when it is already known that Ramazi has performed suspicious banking transactions.

Value of Information: Formal Definition
• Let V be a variable whose value affects the actions to be taken by an analyst. For example, V indicates whether a bomb is placed on a particular airliner.
• Let p(v) be the probability that variable V has value v.
• The entropy of V is: H(V) = − Σ_{v∈V} p(V=v) log p(V=v)
• Let T be a variable whose value we may acquire (by expending resources). For example, T indicates whether a passenger is a known terrorist.
• The entropy of V given that T has value t is: H(V|t) = − Σ_{v∈V} p(V=v | T=t) log p(V=v | T=t)
• The expected entropy of V given T is: EH(V|T) = Σ_{t∈T} p(T=t) H(V|t)
• The value of information is: −(EH(V|T) − H(V)) = H(V) − EH(V|T)

Surprise Detection
• Surprise is the situation in which the evidence (a set of findings) and a situation-specific scenario are incompatible.
• Since situation-specific scenarios are Bayesian networks, it is very unusual for an outright inconsistency to occur.
• In some cases, however, the evidence is very unlikely in a given scenario; this may be because a rare case has been found, or because the scenario cannot explain the evidence.
• To distinguish these two situations, we compare the probability of the evidence in the situation-specific scenario to its probability in a scenario in which all events are probabilistically independent and occur with the same prior probability as in the situation-specific scenario.

Example: Case Study #4
Computing Surprise

The VALUE OF INFORMATION of the test node C for the target node A is 0.0
Parsing the XMLBIF file 'ssn.xml' ... done!
PROBABILITY FOR JOINT FINDINGS = 5.0E-4
Prior probability for NODE: Suspicious Person=yes is 0.01
Prior probability for NODE: Unusual Activities=yes is 0.0656
Prior probability for NODE: Stolen Weapons=yes is 0.05
PROBABILITY FOR INDIVIDUAL FINDINGS = 3.28E-5
No conflict was detected.

This shows the output of the surprise detection program.
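The comparison made by this output can be reproduced directly. The minimal sketch below (in Python, with the probabilities hard-coded from the listing above) computes the product of the individual-finding priors and the conflict index log[P^I(e)/P^S(e)] of the formal definition given in the next section; base-2 logarithms are assumed, consistent with the 2^(−K) bound used there.

```python
import math

# Numbers taken from the program output above
p_joint = 5.0e-4                  # P^S(e): probability of the joint findings in the scenario
priors = [0.01, 0.0656, 0.05]     # prior probabilities of the individual findings

# Independent "straw model" probability: product of the individual priors
p_indep = math.prod(priors)       # 3.28e-5, matching PROBABILITY FOR INDIVIDUAL FINDINGS

# Conflict index: positive values signal surprise (the evidence fits the
# independent straw model better than the situation-specific scenario)
conflict_index = math.log2(p_indep / p_joint)

print(f"P^I(e) = {p_indep:.3e}")
print(f"conflict index = {conflict_index:.2f}")  # negative here, so no conflict
```

Since the joint probability exceeds the product of the priors, the conflict index is negative and the program reports no conflict.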
In this case, the user is informed that no conflict was detected, i.e., the scenario is likely to be a good interpretive model for the evidence received.

Surprise Detection: Formal Definition
• Let the evidence be a set of findings: e = {V_i = v_i | i ∈ I}
• The probability of the evidence in the situation-specific scenario is P^S(e), where P^S(·) is the distribution represented in the situation-specific scenario.
• The probability of the evidence in the model in which all variables are independent is P^I(e) = Π_{i∈I} P^S(V_i = v_i)
• The evidence is surprising if P^S(e) < P^I(e)
• The conflict index is defined as c_s = log[P^I(e) / P^S(e)]
• The probability under P^S that c_s is greater than K is at most 2^(−K)
• Proof [Laskey, 1991]: P^S(c_s ≥ K) = Σ_{e: P^I(e)/P^S(e) ≥ 2^K} P^S(e) ≤ Σ_{e: P^I(e) ≥ 2^K P^S(e)} 2^(−K) P^I(e) ≤ 2^(−K) Σ_e P^I(e) = 2^(−K)
• If the conflict index is high, it is unlikely that the findings could have been generated by sampling the situation-specific scenario.
• It is then reasonable to inform the analyst that no good explanatory model of the findings exists, and that we are in the presence of a novel or surprising situation.

The Independent Straw Model
• In the absence of conflict, the joint probability of all evidence variables is greater than the product of the probabilities of the individual evidence variables. This is normally the case because, when the findings support one another, P(x|y) > P(x), and P(x,y) = P(x|y)P(y).

Straw Models in Diagnosis
A bipartite straw model is obtained by the elimination of some variables from a given model.
In diagnosis by heuristic classification, one can divide the variables into three sets: Target, Evidence, and Other.

How to Compute the Conflict Index (I)
• The marginal probability of each finding is the normal result of any probability update algorithm.

How to Compute the Conflict Index (II)
• The probability of the evidence is a by-product of probability update as computed by the variable elimination or junction tree algorithms.

P(e) from the Variable Elimination Algorithm
[Slide figure: a bucket elimination trace. Each variable of the network gets a bucket holding the conditional probability tables that mention it; observed values ("yes") are entered into the corresponding buckets. Processing a bucket multiplies its tables and sums out the bucket's variable, H_n(u) = Σ_{x_n} Π_{i=1..j} C_i(x_n, u_{s_i}), and places the resulting function in a lower bucket. P(e) is recovered from the normalizing constant of the final result.]

Sensitivity Analysis
• Sensitivity analysis assesses how much the posterior probability of some event of interest changes with respect to the value of some parameter in the model.
• We assume that the event of interest is the value of a target variable.
• The parameter is either a conditional probability or an unconditional prior probability.
• If the sensitivity of the target variable having a particular value is low, then the analyst can be confident in the results, even if the analyst is not very confident in the precise value of the parameter.
• If the sensitivity of the target variable to a parameter is very high, it is necessary to inform the analyst of the need to qualify the conclusion reached, or to expend more resources to become more confident in the exact value of the parameter.

Example: Case Study #4
Computing Sensitivity

This is the output of the Sensitivity Analysis program on a situation-specific scenario for Case Study #4. In the context of the information already acquired (travel to dangerous places, large transfers of money, etc.), the parameter that links financial irregularities to being a suspect is much more important for assessing the belief in Ramazi being a terrorist than the parameter that links dangerous travel to being a suspect. The analyst may want to concentrate on assessing the first parameter precisely.

Sensitivity Analysis: Formal Definition
• Let the evidence be a set of findings: e = {V_i = v_i | i ∈ I}
• Let t be a parameter in the situation-specific scenario.
• Then P(e)(t) = αt + β [Castillo et al., 1997; Jensen, 2000]
• α and β can be determined by computing P(e) for two values of t.
• More generally, if t is a set of parameters, then P(e)(t) is a linear function of each parameter in t, i.e., it is a multi-linear function of t.
• Recall that P(a|b) = P(a,b) / P(b)
• Then P(A=a | e)(t) = P(A=a, e)(t) / P(e)(t)
• We can therefore compute the sensitivity of a target variable V to a parameter t by repeating the same computation with two values for the evidence set, viz. e and e ∪ {V = v}.
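The two-point scheme above can be sketched in code. The example below is illustrative only: it uses a hypothetical two-node network (Suspect → Financial), not the Case Study #4 scenario. It evaluates P(e) at two values of the parameter t = P(Financial=yes | Suspect=yes) to recover the linear coefficients α and β, and then forms the posterior P(Suspect=yes | e)(t) as a ratio of two functions that are linear in t.

```python
# Hypothetical two-node network (Suspect -> Financial); all numbers are
# made up for illustration, not taken from the Case Study #4 model.
P_S = 0.1              # P(Suspect = yes)
P_F_GIVEN_NOT_S = 0.2  # P(Financial = yes | Suspect = no)

def p_evidence(t):
    """P(e)(t) for e = {Financial = yes}; t = P(Financial=yes | Suspect=yes)."""
    return P_S * t + (1 - P_S) * P_F_GIVEN_NOT_S

def p_target_and_evidence(t):
    """P(Suspect=yes, e)(t), i.e. P(e ∪ {Suspect=yes})(t)."""
    return P_S * t

# Recover alpha and beta from two evaluations: P(e)(t) = alpha*t + beta
t0, t1 = 0.0, 1.0
beta = p_evidence(t0)
alpha = p_evidence(t1) - beta

# Posterior as a ratio of two linear functions of t
def posterior(t):
    return p_target_and_evidence(t) / (alpha * t + beta)

# Sensitivity of the posterior to t, by central finite difference
t, h = 0.8, 1e-6
sensitivity = (posterior(t + h) - posterior(t - h)) / (2 * h)
print(f"alpha={alpha:.3f}, beta={beta:.3f}, "
      f"posterior={posterior(t):.4f}, sensitivity={sensitivity:.4f}")
```

In a real system the two evaluations of P(e) would each be a run of the junction tree or variable elimination algorithm; the tiny closed-form network here only stands in for those calls.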