A Declaration of Independence?
Random Variables, Simulations, and the Law of Cosines

Peter Flanagan-Hyde
Phoenix Country Day School
peter.flanagan-hyde@pcds.org

Abstract

Many teachers demonstrate the concepts of random variables through simulation. However, the concept of independence doesn't lend itself to this approach: randomly generated sets of values will not be independent (their sample correlation is almost never exactly zero), even when the simulation is built from independent random variables. This leads to interesting connections to other mathematical topics, including the Law of Cosines, vectors, and projectile motion.

Combining Variation in Random Variables

In an introductory statistics class, students learn the following, which forms the basis of most of statistical inference:

If X and Y are two independent random variables, then the sum X + Y (or difference X - Y) has variance given by

$$\sigma_X^2 + \sigma_Y^2 = \sigma_{X+Y}^2 \quad (\text{or } \sigma_X^2 + \sigma_Y^2 = \sigma_{X-Y}^2)$$

These bear at least a formal similarity to the more famous Pythagorean Theorem, which forms the basis of most calculations of length:

If a and b are the legs of a right triangle, then the third side c has length given by

$$a^2 + b^2 = c^2$$

Is there anything to the symmetry in these two important theorems? The discussion that follows answers this question in the affirmative.

A Scenario: Life plus Monopoly

The board game Life uses a spinner numbered 1 to 10, and the game Monopoly uses two dice, each numbered 1 to 6. Imagine a hybrid game that borrows these elements, played by spinning the Life spinner, throwing one of the Monopoly dice, and subtracting the die from the spin. Your token is then advanced by this difference, making for an interesting game: you generally move forward, but might move backwards or not move at all. How does the number of spaces advanced relate to the individual outcomes of the spinner and the die?

At the right is a simulation of 20 plays of this game done on a TI calculator.
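For readers without a TI calculator at hand, the same kind of simulation can be sketched in Python. This is only an illustration: the function name `play_hybrid_game` and the seed are assumptions made here, and Python's generator will not reproduce the calculator's lists.

```python
import random

def play_hybrid_game(n_plays, seed=0):
    """Simulate n_plays of the hybrid game: one spin (1-10) minus one die (1-6)."""
    rng = random.Random(seed)  # arbitrary seed; it does not match the TI seed
    spins = [rng.randint(1, 10) for _ in range(n_plays)]  # plays the role of L1
    dice = [rng.randint(1, 6) for _ in range(n_plays)]    # plays the role of L2
    moves = [x - y for x, y in zip(spins, dice)]          # plays the role of L3
    return spins, dice, moves

spins, dice, moves = play_hybrid_game(20)
print("spins:", spins)
print("dice: ", dice)
print("moves:", moves)
```

A negative entry in `moves` is a play that sends the token backwards, and a zero is a play with no movement.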
To follow along with exactly these numbers on your calculator, reset the random number generator as shown on the top screen; otherwise, you'll generate a different simulation. In the notation of the preceding section, X is the outcome of a spin, with 20 examples stored in L1, and Y is the outcome of a throw of a die, with 20 examples stored in L2. Twenty examples of the difference X - Y are stored in L3.

Can we use this simulation to show that the variances add, $s_X^2 + s_Y^2 = s_{X-Y}^2$? Here is a tabulation of the variables that we have:

Random variable    Mean                         Standard deviation     Variance
X (L1)             $\bar{x} = 5.65$             $s_X = 3.4683$         $s_X^2 = 12.0289$
Y (L2)             $\bar{y} = 3.45$             $s_Y = 2.0641$         $s_Y^2 = 4.2605$
X - Y (L3)         $\bar{x} - \bar{y} = 2.2$    $s_{X-Y} = 3.5333$     $s_{X-Y}^2 = 12.4842$

The means are quite close to the expected values (5.5 and 3.5), and the mean of the difference demonstrates the property, true for all random variables, that $\mu_{X-Y} = \mu_X - \mu_Y$. The variances, however, are another story: the sum of the variances of X and Y is not even close to the variance of the difference!

$$s_X^2 + s_Y^2 \ne s_{X-Y}^2$$
$$12.0289 + 4.2605 \ne 12.4842$$

Let's explore this discrepancy:

$$s_X^2 + s_Y^2 - \,?\, = s_{X-Y}^2$$
$$12.0289 + 4.2605 - 3.8052 = 12.4842$$

The missing quantity is about 3.8052. Does this number have any meaning?

The original theorem stated, "If X and Y are independent, then the variances add." If the variances don't add, then this implies that the lists of X and Y values are not independent, despite the fact that we had set up the simulation that way.

At the right is a scatterplot of the spin (X) and die (Y) values (there are only 17 points, since some combinations are repeated). This doesn't show a striking pattern. But a linear regression reveals that the two lists of values are not independent, since the correlation between them isn't zero: the correlation, 0.2658, measures an association.

So now we have a case in which we might start a theorem, "If X and Y are not independent, then there is a discrepancy between the sum of the variances and the variance of the difference." Not so neat.
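The arithmetic behind the "missing quantity" can be checked directly from the tabulated variances. The sketch below does that, and also wraps the same calculation in a helper for any pair of lists. The helper name `variance_gap` and the two short lists are made up for illustration; they are not the calculator data.

```python
import statistics

# Variances copied from the table (TI output, rounded to 4 decimal places).
var_spin = 12.0289   # s_X^2
var_die = 4.2605     # s_Y^2
var_diff = 12.4842   # s_{X-Y}^2

# How far the sum of the variances overshoots the variance of the difference.
table_gap = var_spin + var_die - var_diff
print(round(table_gap, 4))

def variance_gap(xs, ys):
    """Return s_X^2 + s_Y^2 - s_{X-Y}^2 for two equal-length lists."""
    diffs = [x - y for x, y in zip(xs, ys)]
    return (statistics.variance(xs) + statistics.variance(ys)
            - statistics.variance(diffs))

# Hypothetical lists in which every x value is paired with every y value,
# so there is no association between them and the gap vanishes.
balanced_xs = [1, 1, 2, 2]
balanced_ys = [1, 2, 1, 2]
print(variance_gap(balanced_xs, balanced_ys))
```

Running `variance_gap` on freshly simulated spins and throws will usually give a nonzero result, which is the discrepancy this section describes.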
Can our analogy to the Pythagorean Theorem be of any assistance? That is, does the sentence, "If a and b are the sides of a not-right triangle, then there is a discrepancy between the sum of the squares of the sides and the square of the third side," make any sense? It does, if your students know the Law of Cosines!

$$a^2 + b^2 - 2ab\cos C = c^2$$

Line this up with the previous equations, replacing the a and b in the Law of Cosines with the standard deviations, $s_X$ and $s_Y$ (reasonable by analogy, at least):

$$a^2 + b^2 - 2ab\cos C = c^2$$
$$s_X^2 + s_Y^2 - 2 s_X s_Y \cos C = s_{X-Y}^2$$
$$12.0289 + 4.2605 - 3.8052 = 12.4842$$

But what is the analogy to the angle C in the Law of Cosines? It's easy enough to solve for cos C:

$$2 s_X s_Y \cos C = 3.8052$$
$$2(3.4683)(2.0641)\cos C = 3.8052$$
$$\cos C = 0.2658$$

This is a familiar number: it's the correlation coefficient! This measured the degree to which the two lists of values were actually not independent. Had it been zero, the lists would show no linear association, as independence requires, and the Law of Cosines would reduce to the Pythagorean Theorem.

Students who have taken a course in physics may be familiar with the use of "independent" related to the idea of "perpendicular" in their study of projectile motion. The independence of horizontal motion and vertical motion is a classic demonstration that many students have seen.

An explanation

The correlation coefficient has a formula that includes the sum of a product of "standardized values":

$$r = \frac{\sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{s_x}\right)\left(\frac{y_i - \bar{y}}{s_y}\right)}{n - 1} = \frac{\sum_{i=1}^{n} z_x z_y}{n - 1}$$

The standardized values are calculated as the deviation from the mean, divided by the standard deviation. These measure the number of standard deviations from the mean for each spin or throw of the die. This is a linear transformation of the values in each of the lists. This is easily enough implemented on the calculator, as shown at the right.
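The two calculations above, solving the Law of Cosines for cos C and building r from standardized values, can also be sketched in Python. The helper names `standardize` and `correlation` and the short example lists are assumptions made for this illustration; only the three rounded statistics come from the table.

```python
import statistics

def standardize(values):
    """Return z-scores, using the sample standard deviation (n - 1 divisor)."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]

def correlation(xs, ys):
    """Correlation as the sum of products of z-scores, divided by n - 1."""
    zx = standardize(xs)
    zy = standardize(ys)
    return sum(a * b for a, b in zip(zx, zy)) / (len(xs) - 1)

# cos C from the Law of Cosines, using the rounded values in the text.
s_x, s_y, gap = 3.4683, 2.0641, 3.8052
cos_c = gap / (2 * s_x * s_y)
print(round(cos_c, 4))  # 0.2658

# Hypothetical lists (not the TI data) to exercise the z-score formula.
example_xs = [2, 7, 4, 9, 5]
example_ys = [1, 5, 2, 6, 3]
print(correlation(example_xs, example_ys))
```

Because `standardize` divides by the sample standard deviation, the divisor in `correlation` has to be n - 1 rather than n for the result to match the usual correlation coefficient.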
In another connection to work students may have done elsewhere, the indicated product of the two lists, shown in the second screen, creates a new list in which each element is the product of the corresponding elements in the original lists. The sum then adds these values, but that is exactly how the dot product of two vectors is calculated. So now it appears that the correlation coefficient can be thought of as the dot product of two vectors, divided by some quantity, in this case 19. Why 19? It's n - 1, which appears in the formula used to calculate the standard deviation of a list.

This is familiar to students as well from their studies of vectors in a precalculus course. They may have seen a formula similar to the one below, which describes the angle θ between two vectors, u and v:

$$\cos\theta = \frac{\vec{u} \cdot \vec{v}}{\lVert\vec{u}\rVert \, \lVert\vec{v}\rVert}$$

So, if the numerator is like a dot product, is the denominator of the correlation coefficient like the product of the lengths of two vectors? The length of a vector is the square root of the sum of the squares of its components; for two dimensions this is the Pythagorean Theorem again, and for higher dimensions the formula is the same, just longer. The calculator screen at the right above shows that the "length" of the "vector" for both standardized lists is the same, 4.3589 in this example. (The squared standardized values of any list sum to n - 1, so each length is $\sqrt{n - 1} = \sqrt{19}$.) The product of these "lengths" is indeed 19, or n - 1, the denominator in the formula for the correlation coefficient.

Thinking about representing a list of random numbers as a vector in 20 dimensions is a little mind-boggling! But that's just the beauty of this algebraic approach, where the form doesn't change as the dimensions increase. Whatever the number of dimensions, the two "vectors" L1 and L2 determine a unique plane (as long as they are not parallel). These "vectors" can then be sketched in this plane, and the picture looks just like the illustration in many precalculus books, such as the picture shown at the right.
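The vector reading of the correlation coefficient can be tried out with a few lines of Python. In this sketch the helper names (`dot`, `length`, `angle_degrees`) and the two 20-value lists are hypothetical stand-ins, not the calculator's L1 and L2; only the value 0.2658 is taken from the text.

```python
import math
import statistics

def standardize(values):
    """Return z-scores, using the sample standard deviation (n - 1 divisor)."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]

def dot(u, v):
    """Dot product: multiply matching components, then add."""
    return sum(a * b for a, b in zip(u, v))

def length(v):
    """Euclidean length: square root of the sum of squared components."""
    return math.sqrt(dot(v, v))

def angle_degrees(u, v):
    """Angle between two vectors, from cos(theta) = u.v / (|u| |v|)."""
    cos_theta = dot(u, v) / (length(u) * length(v))
    cos_theta = max(-1.0, min(1.0, cos_theta))  # guard against rounding error
    return math.degrees(math.acos(cos_theta))

# Hypothetical lists of 20 spins and 20 throws (not the TI data).
spins = [3, 9, 1, 7, 10, 4, 6, 2, 8, 5, 5, 1, 9, 3, 7, 10, 2, 6, 4, 8]
dice = [1, 4, 6, 2, 3, 5, 5, 1, 2, 6, 3, 4, 1, 6, 2, 5, 3, 4, 6, 1]
z_spins = standardize(spins)
z_dice = standardize(dice)

print(length(z_spins), length(z_dice), math.sqrt(19))  # all three agree
print(dot(z_spins, z_dice) / 19)        # correlation of the hypothetical lists
print(angle_degrees(z_spins, z_dice))   # angle between the two "vectors"
print(math.degrees(math.acos(0.2658)))  # angle for the r reported in the text
```

The first printed line illustrates why the denominator is n - 1: both standardized lists have length equal to the square root of 19, whatever values they started from.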
The angle between the vectors is about 74.59°, which can be calculated as shown in the screen at the right, using the value for the correlation coefficient, r, that was produced in the linear regression.

The screens at the right demonstrate that the scatterplot in the transformed space of the standardized lists has exactly the same look as the original scatterplot. The only thing that has changed is the position and scale on the x- and y-axes. Standardizing shifts and rescales each axis separately, a transformation of the plane that always preserves this look. The last screens show that the regression line in the transformed space has the same look as the previous regression line. The slope isn't exactly the same, since the axes have been rescaled, but the correlation coefficient is invariant under any such shift and positive rescaling of x and of y.

Screenshots

(Calculator screens; captions only.)
reset random number
random spins and throws of the die
difference spin - die
scatterplot (x for spin, y for die) (ZoomStat)
linear regression
slope is not 0, nor is correlation
scatterplot with line
transformation to standardize spins and die
set up graph with standardized lists
exactly the same as above, after rescaling with ZoomStat
linear regression
same correlation as above, and in the "z-space" slope = r
rescaled line is identical
correlation calculated using the formula
"length" of standardized vectors = $\sqrt{n - 1}$
angle between vectors, 90° = independent
output of program to repeat simulation
Repeating the Simulation

The lists that have been used in this demonstration, along with a program that allows easy repetition of the simulation to explore the distribution of correlation coefficients and angles between vectors, are available in the TI group INDEPRV. The lists are stored with the names listed below. You can reproduce these lists using the command 0→rand, then running the program.

To start the random numbers at an arbitrary point, I typically have students store their birthday or the last 4 digits of their phone number to the random number seed. For example, using the last digits of my cell phone number, I'd enter 1591→rand. Since the "random" number generator isn't really random, if you do this with a group each person should start with a different seed.

The screenshot below shows a typical output of the program. Questions that can be explored are the distribution of the correlation coefficient (should have a mean of 0) or the distribution of the angle (should have a mean of 90°).

Group INDEPRV

LISTS:
L1 = LSPIN     20 spins of the wheel
L2 = LDIE      20 tosses of the die
L3 = LDIF      differences, SPIN - DIE
L4 = LZSPIN    standardized spins
L5 = LZDIE     standardized die
L6 = LZDIF     standardized differences

prgmSPINDIE
20→N
randInt(1,10,N)→L₁
randInt(1,6,N)→L₂
2-Var Stats L₁,L₂
Sx→A:Sy→B
L₁+L₂→L₃
1-Var Stats L₃
Sx→C
- -
Disp "CORRELATION",R
Disp "ANGLE θ",θ

Illustration of the spinner from the game Life
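The exploration suggested above, looking at the distribution of the correlation coefficient and of the angle over many repetitions, can also be sketched in Python. This is not a transcription of SPINDIE: the function names are made up here, r is computed directly from the two lists, and the seed 1591 is borrowed from the example above purely as an arbitrary starting point (it will not reproduce calculator output).

```python
import math
import random
import statistics

def simulated_correlation(rng, n=20):
    """Correlation between n spins (1-10) and n die throws (1-6)."""
    while True:
        spins = [rng.randint(1, 10) for _ in range(n)]
        dice = [rng.randint(1, 6) for _ in range(n)]
        s_x = statistics.stdev(spins)
        s_y = statistics.stdev(dice)
        if s_x > 0 and s_y > 0:  # re-draw in the rare case a list is constant
            break
    mean_x = statistics.mean(spins)
    mean_y = statistics.mean(dice)
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(spins, dice))
    r = cross / ((n - 1) * s_x * s_y)
    return max(-1.0, min(1.0, r))  # guard against rounding just past +/-1

def repeat_simulation(repetitions=2000, seed=1591):
    """Return the correlations and matching angles (degrees) for many runs."""
    rng = random.Random(seed)
    rs = [simulated_correlation(rng) for _ in range(repetitions)]
    angles = [math.degrees(math.acos(r)) for r in rs]
    return rs, angles

rs, angles = repeat_simulation()
print("mean correlation:", statistics.mean(rs))
print("mean angle:      ", statistics.mean(angles))
print("spread of angles:", statistics.stdev(angles))
```

With a couple of thousand repetitions the mean correlation should land near 0 and the mean angle near 90°, while the spread shows how far a single set of 20 plays can stray from "perpendicular."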
