CS188: Artificial Intelligence, Fall 2010
Written 3: Bayes Nets, VPI, and HMMs

Due: Tuesday 11/23 in 283 Soda Drop Box by 11:59pm (no slip days)
Policy: Can be solved in groups (acknowledge collaborators) but must be written up individually.

1 Scouting Information (13 points)

In StarCraft, you choose what types of units to build, and so does your opponent. (Knowledge of the actual game of StarCraft will not make this problem easier or harder.) As in rock-paper-scissors, some units are strong against others, so if you scout your opponent, you can build the unit type which counters their choice. Consider a simple version of the game where you (the Zerg Overmind) are playing against a Terran. Your opponent’s strategy (S) is either to build Tanks (S = t) or Marines (S = m). You must create an army to counter (C) their strategy. Mutalisks (C = mutalisks) are strong against Tanks, while Banelings (C = b) are strong against Marines. If you construct the right counter, you will probably win (W = +w), but if you don’t, you’ll likely lose (W = ¬w). Even before you scout, you know that your opponent is somewhat fond of Tanks.

To make things more interesting, you are actually a professional StarCraft player, and so you aren’t just trying to win: you’re trying to make money! Therefore, you want to maximize your utility U, which we’ll measure in dollars. Winning always carries a prize, but if you win by doing the strategy the crowd wants (Banelings!) then you’ll receive an even larger reward from endorsements. Using all this information, you’ve created the tables below.

S     P(S)
t     0.8
m     0.2

W     C            U(W, C)
+w    mutalisks    4000
+w    b            5000
¬w    mutalisks    0
¬w    b            500

W     S     C            P(W | S, C)
+w    t     mutalisks    1
¬w    t     mutalisks    0
+w    t     b            0.5
¬w    t     b            0.5
+w    m     mutalisks    0.25
¬w    m     mutalisks    0.75
+w    m     b            1
¬w    m     b            0

(a) (2 pts) Draw the decision network over the variables U, C, W, and S.

(b) (2 pts) Given the tables above, what are the expected utilities for your possible actions (C = mutalisks or C = b)? What is the MEU?
Make sure all three numbers are prominently visible in your answer.

Now you must decide whether or not to scout your opponent’s base. Scouting will reveal what is at the front (F) of their base. If your opponent is going for Tanks (S = t), you might see a few Marines (F = r), or you might see nothing at all (F = φ). However, if their strategy is to build Marines (S = m), then you’ll definitely see Marines (F = r). You create the following probability distributions:

F     S     P(F | S)
r     t     0.5
φ     t     0.5
r     m     1.0
φ     m     0.0

(c) (1 pt) Augment your existing drawing in part (a) above with the variable F. Be sure to include any arc(s).

Doing some calculations, you compute the expected utilities E[U(W, C = c) | F = f] for each of the possible values of scouting information F and counter-strategies C. You also compute marginal probabilities P(F = f):

F     C            E[U(W, C) | F = f]
r     mutalisks    3000
r     b            3500
φ     mutalisks    4000
φ     b            2750

F     P(F)
r     0.6
φ     0.4

(d) (2 pts) Compute the VPI of observing F, given no other observations.

(e) (1 pt) Now suppose that you have already observed that your opponent is using Tanks. What is the value of perfect information of observing F given S = t, VPI(F | S = t)?

(f) (1 pt) What property of Bayesian networks explains the answer to (e)?

Finally, there’s even more information to consider! If your opponent is going for Tanks (S = t), you will likely see them harvesting gas early (G = +g), while for Marines (S = m) you likely won’t (G = −g). Therefore, you extend your model with a probability distribution P(G | S) and assume that G is conditionally independent of all other variables given S.

(g) (1 pt) You would like to do inference in this full decision network using variable elimination. However, the U node denotes a function and not a random variable. To formally convert your decision network into a Bayes net, state the conditional probability distribution P(U | C, W), over values of U, that encodes the same information as the current utility function U.
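
The definitions behind parts (b) and (d) can be sketched in a few lines of Python. This is only an illustrative sketch built from the tables given in this problem: the dictionary layout, the function names, and the labels "mutalisks" and "phi" (standing for the Mutalisk counter and for F = φ) are assumptions made here, not notation from the assignment.

```python
# Tables copied from the problem statement; all key names are illustrative.
P_S = {"t": 0.8, "m": 0.2}
P_WIN = {  # P(W = +w | S, C)
    ("t", "mutalisks"): 1.0, ("t", "b"): 0.5,
    ("m", "mutalisks"): 0.25, ("m", "b"): 1.0,
}
UTILITY = {  # U(W, C); True stands for W = +w
    (True, "mutalisks"): 4000, (True, "b"): 5000,
    (False, "mutalisks"): 0, (False, "b"): 500,
}
P_F_GIVEN_S = {
    ("r", "t"): 0.5, ("phi", "t"): 0.5,
    ("r", "m"): 1.0, ("phi", "m"): 0.0,
}
COUNTERS = ("mutalisks", "b")


def expected_utility(counter, belief):
    """E[U | C = counter] under a belief (distribution) over S."""
    total = 0.0
    for s, p_s in belief.items():
        p_win = P_WIN[(s, counter)]
        total += p_s * (p_win * UTILITY[(True, counter)]
                        + (1 - p_win) * UTILITY[(False, counter)])
    return total


def meu(belief):
    """Maximum expected utility: the value of the best counter."""
    return max(expected_utility(c, belief) for c in COUNTERS)


def posterior_over_s(f):
    """Return (P(F = f), P(S | F = f)) using Bayes' rule."""
    joint = {s: P_S[s] * P_F_GIVEN_S[(f, s)] for s in P_S}
    p_f = sum(joint.values())
    return p_f, {s: p / p_f for s, p in joint.items()}


def vpi_of_f():
    """Expected MEU after observing F, minus the MEU under the prior."""
    with_info = 0.0
    for f in ("r", "phi"):
        p_f, belief = posterior_over_s(f)
        with_info += p_f * meu(belief)
    return with_info - meu(P_S)
```

Calling `expected_utility(c, P_S)` for each counter gives the two quantities asked for in part (b), and `vpi_of_f()` follows the usual definition of VPI as the gain in MEU from acting after the observation rather than before it.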
You wish to compute P(U | G = +g, F = r), the distribution over utility values given that you have observed a Marine at the front of the base and gas being harvested early.

(h) (2 pts) You start by eliminating S. What factors will you need to join and what factor will be created after performing the join, but before summing out S?

Factors to join:

Resulting factor:

(i) (1 pt) For any variable X, let |X| denote the size of X’s domain (the number of values it can take). How large is the table for the resulting factor from part (h), again before summing out S?

2 HMM: Search and Rescue (12 points)

You are an interplanetary search and rescue expert who has just received an urgent message: a rover on Mercury has fallen and become trapped in Death Ravine, a deep, narrow gorge on the borders of enemy territory. You zoom over to Mercury to investigate the situation. Death Ravine is a narrow gorge 6 miles long, as shown below. There are volcanic vents at locations A and D, indicated by the triangular symbols at those locations.

[Figure: the ravine drawn as six locations labeled A B C D E F, with vent symbols at A and D]

The rover was heavily damaged in the fall, and as a result, most of its sensors are broken. The only ones still functioning are its thermometers, which register only two levels: hot and cold. The rover sends back evidence E = hot when it is at a volcanic vent (A and D), and E = cold otherwise. There is no chance of a mistaken reading. The rover fell into the gorge at position A on day 1, so X_1 = A. Let the rover’s position on day t be X_t ∈ {A, B, C, D, E, F}. The rover is still executing its original programming, trying to move 1 mile east (i.e. right, towards F) every day. However, because of the damage, it only moves east with probability 0.5, and it stays in place with probability 0.5. Your job is to figure out where the rover is, so that you can dispatch your rescue-bot.

(a) (2 pt) Three days have passed since the rover fell into the ravine. The observations were (E_1 = hot, E_2 = cold, E_3 = cold).
What is P(X_3 | hot_1, cold_2, cold_3), the probability distribution over the rover’s position on day 3, given the observations?

You decide to attempt to rescue the rover on day 4. However, the transmission of E_4 seems to have been corrupted, and so it is not observed.

(b) (2 pt) What is the rover’s position distribution for day 4 given the same evidence, P(X_4 | hot_1, cold_2, cold_3)?

All this computation is taxing your computers, so the next time this happens you decide to try approximate inference using particle filtering to track the rover.

(c) (2 pt) If your particles are initially in the top configuration shown below, what is the probability that they will be in the bottom configuration shown below after one day (after time elapses, but before evidence is observed)?

[Figure: top and bottom particle configurations over locations A B C D E F]

(d) (2 pt) If your particles are initially in the configuration:

[Figure: particle configuration over locations A B C D E F]

and the next observation is E = hot, what is the probability that ALL particles will be at location D after this observation has been taken into account? Assume that the number of particles is fixed at all times.

(e) (2 pt) Your co-pilot thinks you should model P(E | X) differently. Even though the sensors are not noisy, she thinks you should set P(hot | no volcanic vent) and P(cold | volcanic vent) each to a small positive value, meaning that a hot reading at a non-volcanic location has a small probability, and a cold reading at a volcanic location has a small probability.
She performs some simulations with particle filtering, and her version does seem to produce more accurate results, despite the false assumption of noise. Explain briefly why this could be.

(f) (2 pt) The transition model (east: 0.5 and stay: 0.5) turns out to be an oversimplification. The rover’s position X_t is actually determined by both its previous position X_{t-1} and also its current velocity, V_t, which randomly drifts up and down from the previous day’s velocity. Draw the dynamic Bayes net that represents this refined model. You do not have to specify the domains of V, X, or E, just the structure of the net.

3 Credit Card Fraud Detection (practice, not graded)

You are building a fraud detection system for a credit card company. They have the following records of purchases for which the fraud status is known; here A is a coarse amount of a purchase, B is the kind of business purchased from, C is the country (domestic or foreign), and F is whether the transaction is fraudulent.

A            B           C           F
cheap        candy       domestic    legit
cheap        jewelry     foreign     fraud
medium       bike        domestic    legit
expensive    jewelry     domestic    fraud
medium       bike        domestic    legit
cheap        game        domestic    legit
expensive    computer    foreign     fraud

You decide to build a Naive Bayes classifier using these samples.

(a) Using the unsmoothed relative frequency estimates, what is the classifier’s posterior distribution for the example (cheap, computer, foreign)?

(b) Using relative frequency estimates smoothed with add-one Laplace smoothing, what is the classifier’s posterior distribution for the same example? Assume that there are no values for any of the random variables that are not present somewhere above.

(c) For add-k Laplace smoothing, what will the classifier’s posterior distribution approach as k → ∞? (Assume the prior distribution over classes is also smoothed.)

You train your classifier on a larger set of examples and deploy it for the company.
You notice that the posteriors are overly peaked; that is, the classifier tends to overestimate the probability of fraud. You decide the source of this overconfidence is that the independence assumptions are too strong.

(d) Extrapolating from this sample, what single arc would be a best choice to add to the Naive Bayes network to reduce this overconfidence? Justify your answer very briefly.

You decide to try out a perceptron classifier on your original training set. You use an indicator feature for each outcome of each variable (as we did in lecture), and you choose to use the multiclass perceptron (even though there are only two classes here). You break ties by predicting fraud.

(e) What will the weights be for the classifier after one pass through the data?

Class    cheap    medium    expensive    candy    jewelry    bike    game    computer    domestic    foreign
fraud
legit

(f) Will the perceptron ever converge on this training data? Briefly justify your answer.
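
For the Naive Bayes estimates in parts (a) through (c) of this problem, the following Python sketch shows one way add-k smoothing can be applied to the seven training records. It is a minimal illustration under stated assumptions: the names `DATA`, `DOMAINS`, and `posterior` are invented here, and each feature's domain is taken to be exactly the set of values that appear in the table, as part (b) instructs.

```python
from collections import Counter

# Training records (A, B, C, F) copied from the table in this problem.
DATA = [
    ("cheap", "candy", "domestic", "legit"),
    ("cheap", "jewelry", "foreign", "fraud"),
    ("medium", "bike", "domestic", "legit"),
    ("expensive", "jewelry", "domestic", "fraud"),
    ("medium", "bike", "domestic", "legit"),
    ("cheap", "game", "domestic", "legit"),
    ("expensive", "computer", "foreign", "fraud"),
]
CLASSES = ("fraud", "legit")
# Assumption: each feature's domain is the set of values seen in DATA.
DOMAINS = [sorted({row[i] for row in DATA}) for i in range(3)]


def posterior(example, k=0.0):
    """Naive Bayes posterior P(F | example) with add-k smoothing.

    With k = 0 these are the unsmoothed relative-frequency estimates.
    Assumes at least one class ends up with a nonzero score.
    """
    class_counts = Counter(row[3] for row in DATA)
    scores = {}
    for label in CLASSES:
        # Smoothed prior P(F = label).
        score = (class_counts[label] + k) / (len(DATA) + k * len(CLASSES))
        for i, value in enumerate(example):
            matches = sum(
                1 for row in DATA if row[3] == label and row[i] == value
            )
            # Smoothed conditional P(feature_i = value | F = label).
            score *= (matches + k) / (class_counts[label] + k * len(DOMAINS[i]))
        scores[label] = score
    total = sum(scores.values())
    return {label: score / total for label, score in scores.items()}
```

Calling `posterior(("cheap", "computer", "foreign"), k=1)` applies add-one smoothing, and passing a very large `k` gives a numerical feel for the limit asked about in part (c).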