Stat 390
April 15, 2009

Topic 27: Correlation Coefficient [1]

[1] Excerpted from Workshop Statistics: Discovery with Data, 3rd Edition, by Allan J. Rossman & Beth L. Chance, and Minitab Companion for Workshop Statistics, by Julie M. Clark.

In 1970, the United States Selective Service instituted a draft to decide which young men would be forced to join the armed forces. Wanting to be completely fair, they used a random lottery process that assigned draft numbers to birthdays: those born on days with low draft numbers were drafted. But was the lottery process carried out in a fair, truly random manner? In this topic, you will learn a new technique for analyzing such data and answering this question.

Overview

In the previous topic, you saw how scatterplots provide useful visual information about the relationship between two quantitative variables. Rather than relying on visual impressions alone, however, it is also handy to have a numerical measure of the strength of association between two variables, just as you made use of numerical summaries for various aspects of a single variable’s distribution. This topic introduces you to such a measure and asks you to investigate some of its properties. This measure, one of the most famous in statistics, is the correlation coefficient.

Activity 27-1: Car Data

Recall from Activity 26-3 in your previous handout the nine scatterplots related to car data.

a. Check that you have this ordering of those scatterplots according to the direction and strength of association revealed in them:

                          Negative              None         Positive
                          Strongest   Weakest            Weakest   Strongest
Letter of scatterplot     D    G    A    H     C     E    I    F    B
Correlation coefficient

The correlation coefficient, denoted by r, is a number that measures the degree to which two quantitative variables are linearly associated. The calculation of r is very tedious to do by hand, so you will begin by letting technology calculate correlation coefficients while you explore their properties.

b. Use Minitab to calculate the value of the correlation coefficient between time to travel ¼ mile and weight. Record this value in the preceding table in the column corresponding to scatterplot A:
   1. Open the Cars99.MTW worksheet. Notice that the variables you are interested in are in columns C10 and C6, respectively.
   2. Select Stat > Basic Statistics > Correlation. Enter c10 and c6 as the Variables and then click OK.

Minitab Tip: Minitab reports the (Pearson) correlation coefficient and a “P-Value.” The sample correlation coefficient (r) is the first number reported. The reported p-value is the result of a “hypothesis test” of whether or not the correlation coefficient is different from zero. If the p-value is confusing, you can ask Minitab not to report it by unchecking the box labeled Display p-values.

Minitab Tip: As a shortcut, you can obtain the correlation coefficient by typing at the command prompt: MTB > corr c10 c6.

c. Now use Minitab to calculate the value of the correlation coefficient for the other eight scatterplots. Record these in the table above, below the appropriate letter (B-I). Does it matter in what order you give Minitab the variables?

d. Based on these results, what do you suspect is the largest value that a correlation coefficient can assume? What do you suspect is the smallest value?
   Largest:          Smallest:

e. Under what circumstances do you think the correlation coefficient assumes its largest or smallest value? Hint: Consider what would have to be true of the curve in the scatterplot.

f. How does the value of the correlation relate to the direction of the association?

g. How does the value of the correlation relate to the strength of the association?

These examples should convince you that a correlation coefficient has to be between -1 and +1, and it equals one of those values only when the observations form a perfectly straight line.
The sign of the correlation coefficient reflects the direction of the association (e.g., positive values of r correspond to a positive linear association). The magnitude of the correlation coefficient indicates the strength of the association, with values closer to -1 or +1 signifying a stronger linear association.

Activity 27-2: Governors’ Salaries

The following table reports governors’ salaries for the fifty states (as of the year 2005), along with the median housing prices for the states.

State           Governor’s  Median Housing   State           Governor’s  Median Housing
                Salary      Price                            Salary      Price
Alabama         $96,361     $85,100          Montana         $96,462     $99,500
Alaska          $85,776     $144,200         Nebraska        $85,000     $88,000
Arizona         $95,000     $121,300         Nevada          $117,000    $142,000
Arkansas        $77,028     $72,800          New Hampshire   $104,758    $133,300
California      $175,000    $211,500         New Jersey      $175,000    $170,800
Colorado        $90,000     $166,600         New Mexico      $110,000    $108,100
Connecticut     $150,000    $166,900         New York        $179,000    $148,700
Delaware        $114,000    $130,400         North Carolina  $123,819    $108,300
Florida         $129,060    $105,500         North Dakota    $88,926     $74,400
Georgia         $128,903    $111,200         Ohio            $132,292    $103,700
Hawaii          $94,780     $272,700         Oklahoma        $117,571    $70,700
Idaho           $98,500     $106,300         Oregon          $93,600     $152,100
Illinois        $150,691    $130,800         Pennsylvania    $144,416    $97,000
Iowa            $107,482    $82,500          Rhode Island    $105,194    $133,000
Kansas          $103,813    $83,500          South Dakota    $103,222    $79,600
Kentucky        $112,705    $86,700          Tennessee       $85,000     $93,000
Louisiana       $95,000     $85,000          Texas           $115,345    $82,500
Maine           $70,000     $98,700          Utah            $104,600    $146,100
Maryland        $145,000    $146,000         Vermont         $168,600    $111,500
Massachusetts   $135,000    $185,700         Virginia        $175,000    $125,400
Michigan        $177,000    $115,600         Washington      $148,035    $168,300
Minnesota       $120,303    $122,400         West Virginia   $95,000     $72,800
Mississippi     $122,160    $71,400          Wisconsin       $131,768    $112,200
Missouri        $120,087    $89,900          Wyoming         $105,000    $96,600

a. What are the observational units for these data?

b. Use Minitab (Govenors05.mtw) to produce a scatterplot of governor’s salary vs. median housing price. Describe the association (direction, strength, and form) between these two variables.

c. Based on this scatterplot, guess the value of the correlation coefficient between governor’s salary and median housing price.

d. Use Minitab to calculate the value of this correlation. Record this value, and comment on the accuracy of your guess.

e. Suppose Hawaii gives its governor a $100,000 raise. Make this change in the data. Then reproduce the scatterplot, and recalculate the value of the correlation coefficient. Has the correlation coefficient changed much?

f. Repeat part e after giving the governor of Hawaii an additional $100,000 raise.

g. Now suppose Hawaii decides to make its governorship an unpaid position. Change the governor of Hawaii’s salary to $0. Then reproduce the scatterplot and recalculate the value of the correlation coefficient. Has the correlation coefficient changed much?

h. Based on these calculations, would you say the correlation coefficient is a resistant measure of association? Explain.

Activity 27-3: Televisions and Life Expectancy

Reconsider the data from Activity 26-6 about life expectancy and number of televisions per thousand people in a sample of 22 countries. A scatterplot is reproduced here.

a. Describe the direction and strength of the association between life expectancy and number of televisions per thousand people in these countries. Also comment on whether or not this association follows a linear form.

b. Based on this scatterplot, guess the value of the correlation coefficient between life expectancy and televisions per thousand people in these countries.

c. Use Minitab (TVlife06.mtw) to calculate this correlation coefficient. How accurate was your guess?

d. Would you say the value of the correlation coefficient is fairly high, even though the association between the variables is not linear?

e. Does the fairly high value of the correlation coefficient provide evidence of a cause-and-effect relationship between number of televisions and life expectancy? Explain.

Watch Out

• Correlation measures the degree of linear association between two quantitative variables. But even when two variables display a nonlinear relationship, the correlation between them still might be quite high. With these data, the relationship is clearly curved and not linear, and yet the correlation is still fairly high. Do not assume from a high correlation coefficient that the relationship between the variables must be linear. Always look at a scatterplot, in conjunction with the correlation coefficient, to assess the form (linear or not) of the association.

• No matter how close a correlation coefficient is to ±1, and no matter how strong the association between two variables, a cause-and-effect conclusion cannot necessarily be drawn from observational data. There are far more plausible explanations for why countries with lots of televisions per thousand people tend to have long life expectancies. For example, the technological sophistication of the country is related to both number of televisions and life expectancy.

Activity 27-4: Guess the Correlation

This activity will give you practice at judging the value of a correlation coefficient by examining a scatterplot.

http://www.rossmanchance.com/applets/guesscorrelation/GuessCorrelation.html

a. Open the applet Guess the Correlation. Keep 15 for the Number of Points, and click New Sample. The applet will generate some “pseudo-random data” and produce a scatterplot. Based solely on the scatterplot, guess the value of the correlation coefficient. Enter your guess in the Correlation Guess field in the applet, and click Enter.
The applet then reports the actual value of the correlation coefficient. Record your guess and the actual value in the first empty column of the following table:

Repetition Number     1    2    3    4    5    6    7    8    9    10
Your Guess
Actual Correlation

b. Click New Sample to generate another scatterplot of pseudo-random data. Enter your guess for the value of the correlation coefficient in the applet. Then record your guess and the actual value of the correlation coefficient in the preceding table. Repeat for a total of 10 repetitions.

c. After the ten repetitions, guess the value of the correlation coefficient between your guesses for r and the actual values of r.

d. From the applet’s pull-down menu below Show Graph Of, select Guess vs. Actual. The applet will create the scatterplot of your ten guesses and the corresponding actual correlation coefficients and will also report the correlation coefficient between your guesses and the actual values. Record this correlation coefficient. Does the value surprise you?

e. Use the applet to examine a scatterplot of your errors vs. the actual values. Is there evidence you are better at guessing certain correlation coefficient values than other values? Explain.

f. Use the applet to examine a scatterplot of your errors vs. the repetition (trial) number. Is there evidence your guesses were more accurate or less accurate as you went along? Explain.

g. Suppose all of your guesses had been too high by exactly 0.1. What would the correlation coefficient between your guesses and the actual values be? Hint: Think about what the scatterplot would look like.

h. Repeat part g if your guesses had all been too low by exactly 0.5.

i. If the correlation coefficient between your guesses and the actual values is 1.0, does this mean you guessed perfectly every time? What does this value reveal about the utility of the correlation coefficient as a measure of your guessing prowess? Explain.
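
Once you have reasoned through parts g through i, you can check your thinking numerically. The Python sketch below is only an illustration: the ten "actual" values are made-up numbers rather than applet output, the guesses are constructed to be exactly 0.1 too high, and `pearson_r` is a helper written for this example from the usual sample correlation formula.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Sample correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical actual correlations for ten repetitions (not applet output).
actual = [-0.85, -0.40, 0.10, 0.65, 0.30, -0.20, 0.90, 0.55, -0.60, 0.05]

# Hypothetical guesses, each too high by exactly 0.1 (the setup of part g).
guesses = [a + 0.1 for a in actual]

r_guess_vs_actual = pearson_r(guesses, actual)
mean_abs_error = sum(abs(g - a) for g, a in zip(guesses, actual)) / len(actual)

print(f"r between guesses and actual values: {r_guess_vs_actual:.3f}")
print(f"average size of the guessing error:  {mean_abs_error:.3f}")
```

Comparing the two printed numbers is one way to approach part i: the first describes how closely the points follow a line, and the second describes how far off the guesses were.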
Activity 27-5: House Prices

Reconsider the data on house prices from Activity 26-1. The mean house price is $482,386, and the standard deviation is $79,801.5. The mean house size is 1288.1 square feet, and the standard deviation is 369.191 square feet.

You can gain some insight into how the correlation coefficient r measures association by examining the formula for its calculation:

$$ r = \frac{1}{n-1} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s_x} \right) \left( \frac{y_i - \bar{y}}{s_y} \right) $$

where $x_i$ denotes the ith observation of one variable, $y_i$ the ith observation of the other variable, $\bar{x}$ and $\bar{y}$ the respective sample means, $s_x$ and $s_y$ the respective sample standard deviations, and $n$ the sample size. This formula says to standardize each x and y value into its z-score, multiply these z-scores together for each observational unit, add those results, and finally divide the sum by one less than the sample size.

The following table begins the process of calculating the correlation between house price and size by calculating the houses’ z-scores for price and size and then multiplying the results.

Address               Price ($)  Price z-score  Size (sq ft)  Size z-score  Product of z-scores
2130 Beach St.        311,000    ____           460           -2.243        ____
2545 Lancaster Dr.    344,720    -1.725         720           -0.699        1.206
415 Golden West Pl.   359,500    -1.54          883           -1.097        1.69
990 Fair Oaks Ave.    414,000    -0.857         728           -1.517        1.30
845 Pearl Dr.         459,000    -0.293         926           -0.125        0.037
1115 Rogers Ct.       470,000    -0.155         1499          0.355         -0.055
579 Halcyon Rd.       470,000    -0.155         1419          -0.91         0.141
1285 Poplar St.       470,000    -0.155         952           0.571         -0.089
1080 Fair Oaks Ave.   474,000    -0.105         1014          -0.742        0.078
690 Garfield Pl.      475,000    -0.093         1615          0.885         -0.082
1030 Sycamore Dr.     490,000    0.095          1664          1.018         0.097
620 Eman Ct.          492,000    0.120          1160          -0.347        -0.042
529 Adler St.         500,000    0.221          1545          0.696         0.154
646 Cerro Vista Cir.  510,000    0.346          1567          0.755         0.261
926 Sycamore Dr.      520,000    0.471          1176          -0.304        -0.143
227 S Alpine St.      541,000    0.734          1120          -0.455        -0.334
654 Woodland Ct.      567,000    1.067          1549          0.707         0.754
2230 Paso Robles St.  575,000    1.161          1540          0.682         0.792
2461 Ocean St.        580,000    1.223          1755          ____          ____
833 Creekside Dr.     625,000    1.787          1844          1.506         2.691

a. Calculate the z-score for the price of 2130 Beach St. and for the size of 2461 Ocean St. Then calculate the product of the z-scores for these two houses. Show your calculations below and record the results in the table.

b. The sum of the products turns out to equal 14.819. Use this information, and the fact that there are 20 houses in this sample, to determine the value of the correlation coefficient between house price and size.

c. What do you notice about the size z-score for most of the houses with negative price z-scores? Explain how the signs of these z-scores result from the strong positive association between house price and size.

d. Confirm your calculation in part b by using Minitab (HousePricesAG.mtw) to calculate the value of the correlation coefficient between house price and size.

Activity 27-6: Exam Score Improvements

Consider some data on hypothetical exam scores stored in the Minitab file ExamScores.mtw.

a. Use Minitab to produce a scatterplot of exam 2 score vs. exam 1 score. Comment on the direction, strength, and form of the association revealed.

b. Use Minitab to calculate the correlation coefficient between exam 1 and exam 2.

c. Now suppose each student scores 10 points lower on exam 1 than she actually did. How would you expect this result to affect the value of the correlation coefficient between exam 2 and exam 1? Explain.

d. Use Minitab to make this change (subtract 10 points from everyone’s score on exam 1):
   1. Click in the Session window at the Command Prompt (MTB>).
   2. Type let c5 = c1 - 10
   3. Now type a title for column C5 in the Data window (something clever like “Exam 1-10”).
   4. Create a scatterplot of exam 2 vs. new exam 1 score and recalculate the correlation coefficient. How did the correlation value change?

e. Now suppose each student scores twice as many points on exam 2 as she actually did.
How would you expect this result to affect the value of the correlation coefficient between exam 2 and exam 1? Explain.

f. Use Minitab’s let command to make this change: double everyone’s score on exam 2. (You will need to use the * character to multiply in Minitab.) Store your results in column C6. Then reproduce the scatterplot of this new exam 2 vs. new exam 1, and recalculate the correlation. How did the correlation value change?

These questions demonstrate another property of the correlation coefficient: It does not change if the scale of measurement is altered by adding a constant or multiplying by a positive constant.

g. Now consider a different (hypothetical) class of students. Suppose each student scores exactly 10 points higher on exam 2 than he/she does on exam 1. What do you think the value of the correlation coefficient would be between exam 1 and exam 2? Explain your reasoning. Hint: Consider what the scatterplot would look like.

h. Make up some hypothetical bivariate data in Minitab with the property described in part g. Hint: Choose any values at all for the exam 1 scores, and then make sure each exam 2 score is 10 points higher. Do this for at least 5 hypothetical students. Then use Minitab to produce a scatterplot and calculate the correlation. Does this confirm the value you expected in part g, or do you need to revise your thinking?

i. Now suppose each student scores exactly twice as many points on exam 2 as he/she does on exam 1. What do you think the value of the correlation coefficient would be between exam 1 and exam 2? Explain your reasoning. Hint: Consider what the scatterplot would look like.

j. Make up some hypothetical bivariate data in Minitab with the property described in part i. Then use Minitab to produce a scatterplot and calculate the correlation. Does this confirm the value you expected in part i, or do you need to revise your thinking?

Watch Out

• A correlation coefficient is a number! In fact, it is a number between -1 and +1, inclusive. While this may seem obvious by now, many students say “the same” and do not give a number in response to part g.

• The slope, or steepness, of the points in a scatterplot is unrelated to the value of the correlation coefficient. If the points fall on a perfectly straight line with a positive slope, then the correlation coefficient equals 1.0 whether that slope is very steep or not steep at all. What matters for the magnitude of the correlation is how closely the points concentrate around a line, not the steepness of the line.

Activity 27-7: Draft Lottery (Self-Check Activity)

In 1970 the United States Selective Service conducted a lottery to decide which young men would be drafted into the armed forces (Fienberg, 1971). Each of the 366 birthdays of the year was assigned a draft number. Young men born on days assigned low draft numbers were drafted. The file DraftLottery.mtw lists the draft number assigned to each birthday. The “sequential date” column lists the birthday as a number from 1–366 (January 1 is coded as 1 and December 31 as 366).

a. What draft number was assigned to your birthday?

b. In a perfectly fair, random lottery, what should the correlation coefficient between draft number and sequential date of the birthday equal? Explain.

c. Use Minitab to produce a scatterplot of draft number vs. sequential date of the birthday. Based on the scatterplot, guess the value of the correlation coefficient. Explain the reasoning behind your guess.

d. Use Minitab to calculate the value of the correlation coefficient. Does its value surprise you? If so, look back at the scatterplot to see if, in hindsight, its value makes sense. Summarize what the value of this correlation coefficient reveals about how the draft numbers were distributed across birthdays throughout the year.

e. Data for 1971 are also stored in the file DraftLottery.mtw.
Examine a scatterplot, and calculate the correlation coefficient between draft number and sequential date for that year’s lottery. Comment on your findings.

Solution

a. Answers will vary.

b. With a perfectly fair, random lottery, there should be no association between draft number and sequential date for the birthday. In other words, these variables should be independent, so the correlation coefficient would equal zero. With an actual lottery, you would not expect the correlation coefficient to equal exactly zero, but it should be close to zero.

c. The scatterplot is shown here. It’s hard to see a relationship between the variables in this scatterplot, so a reasonable guess for the value of the correlation coefficient would be close to zero.

d. Minitab reveals the correlation coefficient to equal r = -0.226. This indicates a weak negative association between draft number and sequential date. While not large, this correlation value is farther from zero than most people expect. Looking at the scatterplot more closely, you can see there are few points in the top right and bottom left of the graph. This result suggests few birthdays late in the year were assigned high draft numbers, and few birthdays early in the year were assigned low draft numbers, which means young men born late in the year were at a disadvantage and had a better chance of getting a low draft number. Birthdays late in the year were not mixed as thoroughly as those earlier in the year, so they tended to be selected early in the process and thereby assigned a low draft number.

e. The scatterplot for the 1971 draft lottery data is shown here. The correlation coefficient is 0.014, which is very close to 0. This value indicates there is no association between draft number and sequential date, suggesting the lottery process was fair and random in 1971. The mixing mechanism was greatly improved after the anomaly with the 1970 results was spotted.
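
One way to judge how unusual r = -0.226 would be in a fair lottery is to simulate fair lotteries and see what correlations they produce. The Python sketch below is an illustration and not part of the original activity: it assumes a fair lottery amounts to a random shuffle of the draft numbers 1 through 366, and the number of simulated lotteries (1000) and the random seed are arbitrary choices. `pearson_r` is a helper written for this example.

```python
import random
from math import sqrt


def pearson_r(xs, ys):
    """Sample correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


random.seed(1970)            # arbitrary seed so the run is repeatable
dates = list(range(1, 367))  # sequential dates 1..366
observed_r = -0.226          # 1970 value reported in the solution above

# Assumption: a fair lottery is a uniformly random shuffle of 1..366.
simulated = []
for _ in range(1000):
    draft_numbers = dates[:]
    random.shuffle(draft_numbers)
    simulated.append(pearson_r(dates, draft_numbers))

# Count simulated lotteries whose correlation is at least as far from 0.
as_extreme = sum(abs(r) >= abs(observed_r) for r in simulated)
largest = max(abs(r) for r in simulated)

print(f"simulated lotteries at least as extreme as 1970: {as_extreme} of 1000")
print(f"largest |r| among the simulated lotteries: {largest:.3f}")
```

If very few of the shuffled lotteries produce a correlation as far from zero as the 1970 value, that supports the conclusion in part d that the 1970 numbers were not thoroughly mixed.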
Wrap-Up

In this topic, you discovered the correlation coefficient as a measure of the linear relationship between two variables. Analyzing pairs of variables for the car data, you discovered some of the properties of this measure. For example, a correlation value has to be between -1 and +1, inclusive. The sign of the correlation coefficient reflects the direction of the association. The magnitude of the correlation coefficient reflects the strength of the association, with correlation coefficients close to -1 or +1 indicating very strong association, and correlation coefficients close to 0 reflecting very weak linear association.

But also keep in mind that you discovered the correlation coefficient is not resistant to outliers, as altering just one state’s value for governor’s salary changed the value of the correlation considerably. It is important to always accompany your interpretation of the correlation coefficient with a scatterplot. You also learned how to calculate a correlation coefficient based on z-scores and gained practice judging the value of a correlation based on a scatterplot. Finally, with the data on televisions and life expectancy, you saw again that you should not infer a causal relationship between variables based on a high correlation.

Some useful definitions to remember and habits to develop from this topic include:

• The correlation coefficient is a number that measures the direction and strength of linear association between two quantitative variables.

• The correlation coefficient is not resistant to outliers. One very unusual point can produce a large correlation coefficient even when most of the data reveals no pattern, or a small correlation coefficient when most of the data follows a clear linear pattern.

• Always examine a scatterplot in addition to calculating a correlation coefficient. A clear nonlinear relationship can have a small (close to zero) correlation, and a correlation can be close to -1 or +1, even if the relationship follows a curve or other nonlinear pattern.

• Never forget a large correlation coefficient between two variables does not necessarily establish a cause-and-effect relationship between those variables.

Activity 27-8: Hypothetical Exam Scores

Consider the following scatterplots of hypothetical scores on two exams for Class A and Class B (the data are also stored in the file HypoExams.mtw):

a. In class A, do most of the exam scores follow a linear pattern? Are there any exceptions?

b. In class B, are most of the exam scores scattered haphazardly with no apparent pattern? Are there any exceptions?

c. Use Minitab to calculate the correlation coefficient between exam 1 score and exam 2 score for each of these classes. Are you surprised at either of the values? Explain.

d. Describe how these scatterplots pertain to the issue of resistance of the correlation coefficient.

Now consider the following scatterplot of exam data for Class C:

e. Describe what the scatterplot reveals about the relationship between exam scores in class C.

f. Use Minitab to calculate the correlation coefficient between exam scores in class C. Is its value higher than you expected? Explain what this example reveals about correlation.

Activity 27-9: Proximity to the Teacher

Consider the idea of studying whether students who sit closer to the teacher tend to have higher quiz scores than students who sit farther away from the teacher. Suppose you measure distance from the teacher and average quiz score for a group of students. Explain how you know each of the following statements is in error:

a. The correlation between distance and quiz average is –1.8.

b. The correlation between distance and quiz average is –0.8, and the correlation between quiz average and distance is –0.4.

c. The correlation is –0.8, so there is no association between distance and quiz average.

d. The correlation between quiz average and gender is –0.8.

e. The correlation between distance and quiz average is –0.8, so students who sit farther away tend to score higher.

f. The correlation between distance and quiz average is –0.8, so sitting closer to the teacher must cause students to score higher on quizzes.

Activity 27-10: Monthly Temperatures

Reconsider Activity 26-10 and the data on average monthly temperatures in Raleigh, North Carolina:

Month      Jan  Feb  Mar  Apr  May  Jun  Jul  Aug  Sept  Oct  Nov  Dec
Avg. Temp  39   42   50   59   67   74   78   77   71    60   51   43

The following scatterplot displays Raleigh’s average monthly temperature vs. the month number:

a. Does there appear to be any relationship between temperature and month in Raleigh? If so, describe the relationship.

b. Use Minitab to calculate the correlation coefficient between these variables. Does this correlation value seem to indicate a strong or a weak relationship?

c. Explain why the correlation is so close to 0 even though the scatterplot reveals a clear relationship between temperature and month.

Activity 27-11: Planetary Measurements

Consider the data below on planetary measurements. The following scatterplot displays the period of revolution around the sun (in earth days) vs. the distance from the sun (in millions of miles).

a. Describe the association between these variables as revealed in the scatterplot.

b. Would a straight line appear to be a reasonable summary of the relationship between revolution and distance? Explain.

c. The correlation coefficient between revolution and distance turns out to equal 0.989. This value is very close to 1. Does this value mean a straight line is the best model for summarizing the relationship between revolution and distance? Explain.

Activity 27-12: Ice Cream, Drownings, and Fire Damage

a.
Suppose a beach community keeps track of the amount of ice cream sold in a given month and the number of drownings that occur in that month. Would you expect to find a negative correlation, a positive correlation, or a correlation close to zero? Explain your reasoning.

b. If the community in part a were to find a strong positive correlation between ice cream sales and drownings, would that mean ice cream causes drowning? If not, suggest an alternative explanation (i.e., a confounding variable) for the strong association.

c. Explain why you would expect to find a positive correlation between the number of fire engines that respond to a fire and the amount of damage done in the fire. Does this imply the damage would be less extensive if fewer fire engines were dispatched? Explain.

Activity 27-13: Climatic Conditions

The following data, from the 1992 Statistical Abstract of the United States, pertain to a number of climatic variables for a sample of 25 American cities. These variables measure long-term averages of

• January high temperature (in degrees Fahrenheit)
• January low temperature
• July high temperature
• July low temperature
• Annual precipitation (in inches)
• Days of measurable precipitation per year
• Annual snow accumulation
• Percentage sunshine

City            Jan. High  Jan. Low  July High  July Low  Precip.  Days Precip.  Snow  Sun
Atlanta         50.4       31.5      88         69.5      50.77    115           2     61
Baltimore       40.2       23.4      87.2       66.8      40.6     113           21.3  57
Boston          35.7       21.6      81.8       65.1      41.51    126           40.7  58
Chicago         29         12.9      83.7       62.6      35.82    126           38.7  55
Cleveland       31.9       17.6      82.4       61.4      36.63    156           54.3  49
Dallas          54.1       32.7      96.5       74.1      33.7     78            2.9   54
Denver          43.2       16.1      88.2       58.6      15.4     89            59.8  70
Detroit         30.3       15.6      83.3       61.3      32.62    135           41.5  53
Houston         61         39.7      92.7       74.2      46.07    104           0.4   56
Kansas City     34.7       16.7      88.7       68.2      37.62    104           20    62
Los Angeles     65.7       47.8      75.3       62.8      12.01    35            0     73
Miami           75.2       59.2      89         76.2      55.91    129           0     73
Minneapolis     20.7       2.8       84         63.1      28.32    114           49.2  58
Nashville       45.9       26.5      89.5       68.9      47.3     119           10.6  56
New Orleans     60.8       41.8      90.6       73.1      61.88    114           0.2   60
New York        37.6       25.3      85.2       68.4      47.25    121           28.4  58
Philadelphia    37.9       22.8      82.6       67.2      41.41    117           21.3  56
Phoenix         65.9       41.2      105.9      81        7.66     36            0     86
Pittsburgh      33.7       18.5      82.6       61.6      36.85    154           42.8  46
St. Louis       37.7       20.8      89.3       70.4      37.51    111           19.9  57
Salt Lake City  36.4       19.3      92.2       63.7      16.18    90            57.8  66
San Diego       65.9       48.9      76.2       65.7      9.9      42            0     68
San Francisco   55.6       41.8      71.6       65.7      19.7     62            0     66
Seattle         45         35.2      75.2       55.2      37.19    156           12.3  46
Washington      42.3       26.8      88.5       71.4      38.63    112           17.1  56

Use Minitab to calculate the correlation coefficient between all pairs of these eight variables; the data are stored in the file Climate.mtw. Hint: There are a total of 28 such pairs of variables. It’s probably easiest to record the correlation values in a table similar to the following:

              Jan. High  Jan. Low  July High  July Low  Precip.  Days Precip.  Snow  Sun
Jan. High     xxx
Jan. Low      xxx        xxx
July High     xxx        xxx       xxx
July Low      xxx        xxx       xxx        xxx
Precip.       xxx        xxx       xxx        xxx       xxx
Days Precip.  xxx        xxx       xxx        xxx       xxx      xxx
Snow          xxx        xxx       xxx        xxx       xxx      xxx           xxx
Sun           xxx        xxx       xxx        xxx       xxx      xxx           xxx   xxx

To compute all of the correlations simultaneously, select Stat > Basic Statistics > Correlation. Enter c2-c9 as the Variables. To simplify the output, remove the check from the box labeled Display p-values.

a. Which pair of variables has the strongest (either positive or negative) linear association? What is the value of the correlation between those variables?
   Variables:          correlation:

b. Which pair of variables has the weakest (either positive or negative) linear association? What is the value of the correlation between those variables?
   Variables:          correlation:

c. Suppose you want to predict the annual snowfall for an American city and you are allowed to look at that city’s averages for these other variables. Which variable would be most useful to you? Which variable would be least useful?
   Most useful:          Least useful:

d. Suppose you want to predict the average July high temperature for an American city and you are allowed to look at that city’s averages for these other variables. Which variable would be most useful to you? Which variable would be least useful?
   Most useful:          Least useful:

e. Use Minitab to explore the relationship between annual snowfall and annual precipitation more closely. Produce and comment on a scatterplot of these two variables.

Activity 27-14: Muscle Fatigue

Reconsider the matched-pairs study comparing muscle fatigue between men and women from Activity 23-5 (Hunter et al., 2004). In Activity 26-12, you analyzed a scatterplot of time until fatigue for men and women.

a. Calculate the correlation coefficient between time until muscle fatigue for men and time until muscle fatigue for women.

b. Comment on what this correlation coefficient suggests about whether or not men and women of similar strength tend to have similar times until muscle fatigue.
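
As a recap of the z-score calculation from Activity 27-5 and the scale property from Activity 27-6, the following Python sketch applies both to the Raleigh temperatures listed in Activity 27-10. It is only an illustration: the function names are made up for this example, and the Fahrenheit-to-Celsius conversion is an added way of shifting and rescaling the data that does not appear in the activities.

```python
from math import sqrt


def mean(values):
    return sum(values) / len(values)


def sample_sd(values):
    """Sample standard deviation (divides by n - 1)."""
    m = mean(values)
    return sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))


def correlation_from_z_scores(xs, ys):
    """Standardize each value, multiply paired z-scores, sum, divide by n - 1."""
    n = len(xs)
    mean_x, mean_y = mean(xs), mean(ys)
    sd_x, sd_y = sample_sd(xs), sample_sd(ys)
    products = [
        ((x - mean_x) / sd_x) * ((y - mean_y) / sd_y)
        for x, y in zip(xs, ys)
    ]
    return sum(products) / (n - 1)


# Month number and Raleigh average temperature, from Activity 27-10.
months = list(range(1, 13))
temps_f = [39, 42, 50, 59, 67, 74, 78, 77, 71, 60, 51, 43]

# Added for illustration: subtract a constant, then multiply by a
# positive constant (Fahrenheit to Celsius).
temps_c = [(t - 32) * 5 / 9 for t in temps_f]

r_fahrenheit = correlation_from_z_scores(months, temps_f)
r_celsius = correlation_from_z_scores(months, temps_c)

print(f"r using Fahrenheit: {r_fahrenheit:.3f}")
print(f"r using Celsius:    {r_celsius:.3f}")
```

The two printed values should agree, since changing units only shifts and rescales the temperatures. The value itself is modest even though the scatterplot shows a clear curved pattern, which is the point of part c in Activity 27-10.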