VIEWS: 15 PAGES: 20 POSTED ON: 11/24/2011 Public Domain
Exercises on Expectations Let X be a random variable with the distribution: x Pr(X=x) 1 20% What is E[X] ? 2.75 2 15% #NAME? 3 35% 4 30% What is E[X2] ? The probability distribution of z Pr(Z=z) the random variable Z = X2 is 1 20% 4 15% and therefore E[X2] is 8.75 9 35% #NAME? 16 30% Excel also allows us to do this calculation directly from the original distribution table: 8.75 #NAME? Notice that E[X2] is not the same as ( E[X] )2 . 7.5625 #NAME? What's the cumulative probability distribution of X ? x Pr(X≤x) 1 20% 2 35% 3 70% 4 100% The LOOKUP function The function =LOOKUP(value, lookup-range, result-range) looks for "value" in the "lookup-range," and returns the corresponding entry in the "result-range." It expects the entries in the lookup-range to be in increasing order, and if the value isn't found, it selects the last entry in the lookup-range which is less than the sought-for value. Here's the IRS 2008 Individual tax table: taxable income taxes from … to … paid on excess plus $0.00 $8,025.00 10% $0.00 $8,025.00 $32,550.00 15% $802.50 $32,550.00 $78,850.00 25% $4,481.25 $78,850.00 $164,550.00 28% $16,056.25 $164,550.00 $357,700.00 33% $40,052.25 $357,700.00 35% $103,791.75 $100,000.00 taxable income $16,056.25 #NAME? $78,850.00 #NAME? 28% #NAME? $21,978.25 taxes #NAME? The formula in cell B23 looks in the range B13:B18 for the value $100,000. The last entry less than or equal to this is the fourth, so the fourth entry in E13:E18 is returned. The formula in cell B24 shows a variant of the LOOKUP function: If the result-range is not specified, it's assumed to be the same as the lookup-range. Simulating a Random Variable, Using its Tabulated Distribution How could we build a spreadsheet cell which simulates X ? One way is to use Excel's "lookup" function. We need a table containing the cumulative distribution of X. cumulative 3 simulated value of X distribution Pr(X≤x) #NAME? x Pr(X=x) 0% 1 20% 20% 2 15% 35% Notice the offset in the two ranges: If, for example, 3 35% 70% RAND() takes the value 0.4527, the third value in the 4 30% 100% range J9:J12 (35%) is the final entry which is less than or equal to 0.4527, and therefore the third value in the range H10:H13 (3) is returned. The same value (3) will be returned whenever RAND() is at least 0.35 and less than 0.70, i.e., 35% of the time. Let's test this simulation. is X=1? is X=2? is X=3? is X=4? estimated E[X] 0 0 1 0 $B$23 $C$23 $D$23 $E$23 $B$8 monitored cell 20.040% 14.974% 35.110% 29.876% 2.7482 mean 0.400302 0.356818 0.477317 0.457717 1.089366 sample standard deviation 0 0 0 0 1 minimum 1 1 1 1 4 maximum 100,000 100,000 100,000 100,000 100,000 number of simulation runs 0.248% 0.221% 0.296% 0.284% 0.0068 margin of error (95% confidence) ` Conditional Expectation What is E[ X | X is even ]? x Pr(X=x) 1 20% Given that X is even, the conditional probability distribution 2 15% of X is: 3 35% x Pr(X=x) 4 30% 2 33.33% 4 66.67% Therefore, E[ X | X is even ] is 3.3333 What is E[ X | X is odd ]? Given that X is odd, the conditional probability distribution of X is: x Pr(X=x) 1 36.36% 3 63.64% Therefore, E[ X | X is odd ] is 2.2727 What is E[X] ? Directly, it's 2.75 But we can also work it out "case by case." E[X] = E[ X | X is even ]∙Pr( X is even ) + E[ X | X is odd ]∙Pr( X is odd ) and this yields 2.75 Joint Probability Distributions The joint distribution of two (or more) random variables indicates the likelihood of the variables taking any particular combination of values. X 2 3 5 15 10 5% 12% 3% 18% Y 20 2% 7% 8% 9% 30 14% 13% 0% 9% 100% The table indicates, for example, that Pr( X=3, Y=30 ) = 13% . What is the (marginal) probability distribution of X ? x Pr(X=x) 2 21% ######## 3 32% ######## The marginal distributions of both X and Y 5 11% ######## can, of course, be written at the "margins" 15 36% ######## of the joint-distribution table: X 2 3 5 15 10 5% 12% 3% 18% 38% ######## Y 20 2% 7% 8% 9% 26% ######## 30 14% 13% 0% 9% 36% ######## 21% 32% 11% 36% What is Pr( X = 3 | Y = 10 ) ? By definition, Pr( X = 3 | Y = 10 ) = Pr( X = 3 and Y = 10 ) / Pr( Y = 10 ) = 31.58% ######## What is the probability distribution of X+Y ? Z = X+Y is a random variable with the probability distribution: z Pr(Z=z) 12 5% ######## What is E[ X+Y] ? 13 12% ######## 15 3% ######## From the table to the right: 27.13 22 2% ######## Equivalently, since 23 7% ######## E[ X+Y] = E[X] + E[Y] , 27.13 25 26% ######## 32 14% ######## 33 13% ######## For any random variables X and Y , 35 9% ######## E[ X+Y ] = E[X] + E[Y] . 45 9% ######## 100% What is E[ XY ] ? We could build a table for the distribution of W = XY , and then compute the expected value from that table. An Excel "array" formula does precisely the same arithmetic without requiring the intermediate step: 133.700 {=SUM(E24:H24*D25:D27*E25:H27)} Notice that E[X]∙E[Y] is 145.134 E[XY] is not necessarily equal to E[X]∙E[Y] ! 20 30 Finally, it copied 5% 2% 14% 1.0 0.8 8.4 And it applied the S 133.70 necessarily 100% necessarily 100% As is explained on the next tab, this formula was entered by typing Ctrl+Shift+Enter. Excel automatically expanded E24:H24 to 2 3 5 15 2 3 5 15 2 3 5 15 It also expanded D25:D27 to 10 10 10 10 20 20 20 20 30 30 30 30 Finally, it copied 5% 12% 3% 18% 2% 7% 8% 9% 14% 13% 0% 9% It then did the indicated arithmetic on a cell-by-cell basis: 1.0 3.6 1.5 27.0 0.8 4.2 8.0 27.0 8.4 11.7 0.0 40.5 And it applied the SUM function to this entire array: 133.700 Array formulas Working with Array Formulas (an optional Excel skill) The SUMPRODUCT function returns the x Pr( X = x ) sum of the products of matching terms 1 0.2 in two (or more) lists. For example, the 2 0.5 expected value of X, given the probability 5 0.3 distribution at the right, is 2.7 =SUMPRODUCT(F4:F6,G4:G6) However, there's a slicker way to do the same calculation: 2.7 =SUM(F4:F6*G4:G6), entered as an array formula. Array formulas are very powerful ... and potentially somewhat confusing. An "array" has some number of rows, and some number of columns. In an array formula, Excel finds the largest number of rows, R, in any array in the formula, and the largest number of columns, C. It then expands any array that is smaller than R-by-C in one of two ways. If R or C is 1, it copies the single row or column to fill an entire R-by-C array. Otherwise, it adds extra cells with the value NA (not available). Then it does all the arithmetic on a cell-by-cell basis. And it evaluates any formulas that take array inputs using the resulting array as the input. 1 2 3 4 4 8 12 5 5 10 15 6 6 12 18 90 All nine cells inside the box are part of the same array, defined by the single formula: {=C28:E28*B29:B31} The formula was entered as an array formula. First, I highlighted C29:E31. Then I typed =C28:E28*B29:B31, and entered the formula by pressing Cntrl+Shift+Enter. What Excel did was determine R = 3 (the number of rows in B29:B31) and C = 3 (the number of columns in C28:E28). It expanded B29:B31 into 4 4 4 5 5 5 6 6 6 and C28:E28 into 1 2 3 1 2 3 1 2 3 Page 9 Array formulas Then it computed the products of matching cells, and put them into the space allocated to the array formula. The formula in F31 is =SUM(C28:E28*B29:B31), again entered by pressing Cntrl+Shift+Enter. Page 10 Independent Random Variables X and Y are independent if knowledge of the value of one doesn't affect your assessment of the likelihood of the other taking any specific value. In the previous example, X and Y were not independent. X 2 3 5 15 10 5.00% 12.00% 3.00% 18.00% 38.00% Y 20 2.00% 7.00% 8.00% 9.00% 26.00% 30 14.00% 13.00% 0.00% 9.00% 36.00% 21.00% 32.00% 11.00% 36.00% Show that X and Y , defined by the joint distribution above, are not independent. It takes only a single example to show this: Pr( X=2 ) = 21.00% Pr( X=2 | Y=10) = 13.16% If X and Y are independent, then Pr( X=x )∙Pr( Y=y ) = Pr( X=x and Y=y ) for every pair of values x and y . U 2 3 5 15 10 7.98% 12.16% 4.18% 13.68% 38.00% V 20 5.46% 8.32% 2.86% 9.36% 26.00% 30 7.56% 11.52% 3.96% 12.96% 36.00% 21.00% 32.00% 11.00% 36.00% U and V have the same marginal distributions as do X and Y , respectively, but U and V are independent random variables. The marginal distributions of X and Y separately convey no information concerning whether or not X and Y are independent. Compute E[ UV ] and E[U]∙E[V] . E[ UV ] = 145.134 E[U]∙E[V] = 145.134 These two quantities are equal whenever U and V are independent. But, just because these two amounts are equal, we can't conclude that U and V are independent! S 2 3 5 15 10 21.00% 2.21% 0.00% 14.79% 38.00% T 20 0.00% 13.07% 2.65% 10.28% 26.00% 30 0.00% 16.72% 8.35% 10.93% 36.00% 21.00% 32.00% 11.00% 36.00% E[ ST ] = 145.134 E[S]∙E[T] = 145.134 But S and T are certainly not independent. And, of course, we saw a few tabs ago that E[ XY ] = 133.700 E[X]∙E[Y] = 145.134 x Pr(X=x) From the very first tab (with X having a different distribution): 1 20% 2 15% 3 35% Let Y = X , so the joint distribution is 4 30% X 1 2 3 4 1 20.00% 0.00% 0.00% 0.00% 20.00% 2 0.00% 15.00% 0.00% 0.00% 15.00% Y 3 0.00% 0.00% 35.00% 0.00% 35.00% 4 0.00% 0.00% 0.00% 30.00% 30.00% 20.00% 15.00% 35.00% 30.00% Then (as we've already computed) E[ XY ] = E[ X2 ] = 8.7500 E[X]∙E[Y] = ( E[X] )2 = 7.5625 In summary: If X and Y are independent, then E[ XY ] = E[X]∙E[Y] . However, in general, any of E[ XY ] < E[X]∙E[Y] , E[ XY ] = E[X]∙E[Y] , or E[ XY ] > E[X]∙E[Y] might be true. Plant expansion Comparing Plant Expansions The J.R. Ryland Computer company is considering a plant expansion that will enable the company to begin production of a new computer product. The company’s president must determine whether to make the expansion a medium- or large-scale project. An uncertainty is the demand for the new product, which for planning purposes may be low demand, medium demand, or high demand. The probability estimates for demand are 0.20, 0.50, and 0.30. respectively. Writing the annual profit in $000s, the firm’s planners have developed the following profit forecast for the medium- and large-scale expansion projects. medium-scale large-scale expansion profits expansion profits Prob Prob low $50 20% $0 20% demand medium $150 50% $100 50% high $200 30% $300 30% Compute the expected value for the profit associated with the two expansion alternatives. Which decision is preferred if the objective is to maximize expected profit? expansion profits probability medium large 20% low $50 $0 demand 50% medium $150 $100 30% high $200 $300 Answer: expected profit $145 $140 Note that the profit numbers themselves are, in reality, expectations - the expected profit (in the face of current uncertainty about future interest rates, energy prices, and the like) when demand is low, medium, or high - and our actual calculations above are actually of the form: E[profit] = E[ profit | low demand ]∙Pr( low demand ) + E[ profit } medium demand ]∙Pr( medium demand ) + E[ profit | high demand ]∙Pr( high demand ) . Page 13 It's a Kind of Magic Hairy Houdini, a hirsute magician, uses his long beard to conceal the machinations behind his tricks. His final trick is to “magically” create a number of pigeons that he releases into the air. (Really, they’re hidden in a pouch behind his beard.) Each time he attempts another release, he knows there’s a 90% he’ll successfully fool the audience. He figures, on the basis of past experience, that his end-of-show tips will add up to about $10 for each successful release if he fools the audience every time. However, if he fails to fool the audience on any release, the show will end immediately, and he’ll only receive about $6 in tips for each successful release. How many pigeons should he try to release (i.e., after how many consecutive successes should he stop voluntarily – equivalently, how many pigeons should he put in the pouch before his act)? Hint: This is not a "single-period inventory" problem. However, it yields to either a marginal-analysis approach, or brute force. The answer is on the next tab. It's a Kind of Magic (answer) 23 pigeons Approach 1 (marginal analysis): Approach 2 (brute force): given k successful releases, expected put n pigeons in pouch marginal profit from trying for one more Pr( fail first on k k pigeon k ) profit 1 $8.60 1 10.00% $0.00 2 $8.20 2 9.00% $6.00 3 $7.80 3 8.10% $12.00 4 $7.40 4 7.29% $18.00 5 $7.00 5 6.56% $24.00 6 $6.60 6 5.90% $30.00 7 $6.20 7 5.31% $36.00 8 $5.80 8 4.78% $42.00 9 $5.40 9 4.30% $48.00 10 $5.00 10 3.87% $54.00 11 $4.60 11 3.49% $60.00 12 $4.20 12 3.14% $66.00 13 $3.80 13 2.82% $72.00 14 $3.40 14 2.54% $78.00 15 $3.00 15 2.29% $84.00 16 $2.60 16 2.06% $90.00 17 $2.20 17 1.85% $96.00 18 $1.80 18 1.67% $102.00 19 $1.40 19 1.50% $108.00 20 $1.00 20 1.35% $114.00 21 $0.60 21 1.22% $120.00 22 $0.20 22 1.09% $126.00 23 -$0.20 23 0.98% $132.00 24 -$0.60 24 0.89% $230.00 25 -$1.00 25 0.80% $230.00 26 0.72% $230.00 27 0.65% $230.00 28 0.58% $230.00 29 0.52% $230.00 30 0.47% $230.00 31 0.42% $230.00 32 0.38% $230.00 33 0.34% $230.00 34 0.31% $230.00 35 0.28% $230.00 36 0.25% $230.00 37 0.23% $230.00 38 0.20% $230.00 39 0.18% $230.00 40 or more 1.64% $230.00 if we wish to examine cases involving the release of 40 pigeons or more, this table must be extended. if we wish to examine cases involving the release of 40 pigeons or more, this table must be extended. n E[profit] 23 $57.37 n $57.37 1 $9.00 2 $16.74 3 $23.38 4 $29.07 5 $33.92 6 $38.06 7 $41.56 8 $44.53 9 $47.03 10 $49.12 11 $50.86 12 $52.31 13 $53.49 14 $54.46 15 $55.24 16 $55.85 17 $56.33 18 $56.70 19 $56.97 20 $57.16 21 $57.28 22 $57.35 23 $57.37 24 $57.35 25 $57.30 26 $57.23 27 $57.14 28 $57.04 29 $56.92 30 $56.80 31 $56.67 32 $56.54 33 $56.41 34 $56.28 35 $56.15 36 $56.03 37 $55.91 38 $55.79 39 $55.68 Answer via Simulation success on first failure 13 time of first failure (and 0 if he succeeds at least 50 tim release still going? this try? on this try 13 time of first failure (set to 51 if the first 50 releases suc 1 1 1 0 2 1 1 0 5 10 15 20 3 1 1 0 $50.00 $100.00 $72.00 $72.00 4 1 1 0 5 1 1 0 $G$7 $H$7 $I$7 $J$7 6 1 1 0 $33.89 $48.97 $55.01 $56.80 7 1 1 0 $20.06 $39.57 $52.81 $60.29 8 1 1 0 $0.00 $0.00 $0.00 $0.00 9 1 1 0 $50.00 $100.00 $150.00 $200.00 10 1 1 0 100,000 100,000 100,000 100,000 11 1 1 0 12 1 1 0 $0.12 $0.25 $0.33 $0.37 13 1 0 13 14 0 1 0 15 0 1 0 Simulation Results 16 0 1 0 17 0 1 0 $60.00 18 0 1 0 19 0 0 0 20 0 1 0 $55.00 21 0 1 0 expected profit 22 0 1 0 $50.00 23 0 1 0 24 0 1 0 $45.00 25 0 1 0 26 0 1 0 $40.00 27 0 1 0 28 0 1 0 29 0 1 0 $35.00 30 0 1 0 31 0 1 0 $30.00 32 0 0 0 0 10 20 33 0 1 0 intended number of birds release 34 0 1 0 35 0 1 0 36 0 1 0 37 0 1 0 The expected profit curve appears to top out somewhere between 38 0 0 0 39 0 1 0 We'll see in a while that the expected number of successful releas 40 0 1 0 first failure is 9.0. Therefore, the expected profit, as the planned nu 41 0 1 0 grows larger and larger, is heading for $6∙9.0 = 42 0 1 0 (since, if he plans a large number of releases, he'll just about defin 43 0 1 0 act with a failure). 44 0 1 0 45 0 1 0 46 0 1 0 47 0 1 0 48 0 1 0 49 0 1 0 50 0 0 0 nd 0 if he succeeds at least 50 times) et to 51 if the first 50 releases succeed) 25 30 35 40 45 50 $72.00 $72.00 $72.00 $72.00 $72.00 $72.00 $K$7 $L$7 $M$7 $N$7 $O$7 $P$7 monitored cell $56.88 $56.50 $55.87 $55.21 $54.77 $54.46 mean $63.83 $65.25 $64.95 $63.59 $62.37 $61.28 sample standard deviation $0.00 $0.00 $0.00 $0.00 $0.00 $0.00 minimum $250.00 $300.00 $350.00 $400.00 $450.00 $500.00 maximum 100,000 100,000 100,000 100,000 100,000 100,000 number of simulation runs $0.40 $0.40 $0.40 $0.39 $0.39 $0.38 margin of error (95% confidence) imulation Results 30 40 50 ended number of birds released rs to top out somewhere between 20 and 25 releases. cted number of successful releases before the expected profit, as the planned number of releases ng for $6∙9.0 = $54.00 r of releases, he'll just about definitely end his