# Correlation Coefficient spr09 by 23K0Jb

```									Stat 390                                                                                       April 15, 2009

Topic 27: Correlation Coefficient¹
In 1970, the United States Selective Service instituted a draft to decide which young men would be
forced to join the armed forces. Wanting to be completely fair, they used a random lottery process that
assigned draft numbers to birthdays: those born on days with low draft numbers were drafted. But was
the lottery process carried out in a fair, truly random manner? In this topic, you will learn a new
technique for analyzing such data and answering this question.

Overview
In the previous topic, you saw how scatterplots provide useful visual information about the relationship
between two quantitative variables. Rather than relying on visual impressions alone, however, it is also
handy to have a numerical measure of the strength of association between two variables—just as you
made use of numerical summaries for various aspects of a single variable’s distribution. This topic
introduces you to such a measure and asks you to investigate some of its properties. This measure, one
of the most famous in statistics, is the correlation coefficient.

Activity 27-1: Car Data
Recall the nine scatterplots of car data from Activity 26-3 in your previous handout.

a. Check that you have this ordering of those scatterplots according to the direction and strength of
association revealed in them:

                               Negative            None          Positive
                          Strongest    Weakest              Weakest    Strongest
Letter of Scatterplot     D     G     A     H     C     E     I     F     B
Correlation Coefficient   ___   ___   ___   ___   ___   ___   ___   ___   ___

The correlation coefficient, denoted by r, is a number that measures the
degree to which two quantitative variables are linearly associated.

The calculation of r is very tedious to do by hand, so you will begin by letting technology calculate
correlation coefficients while you explore their properties.

b. Use Minitab to calculate the value of the correlation coefficient between time to travel ¼ mile and
weight. Record this value in the preceding table in the column corresponding to scatterplot A:

1. Open the Cars99.MTW worksheet. Notice that the variables you are interested in are in
columns C10 and C6 respectively.

2. Select Stat > Basic Statistics > Correlation. Enter c10 and c6 as the Variables and then
click OK.

¹ Excerpted from Workshop Statistics: Discovery with Data, 3rd Edition, by Allan J. Rossman & Beth L. Chance, and
Minitab Companion for Workshop Statistics, by Julie M. Clark.

Minitab Tip: Minitab reports the (Pearson) correlation coefficient and a “P-Value.” The sample
correlation coefficient (r) is the first number reported. The reported p-value is the result of an
“hypothesis test” of whether or not the correlation coefficient is different from zero. If the p-
value is confusing, you can ask Minitab not to report it by unchecking the box labeled Display p-
values.

Minitab Tip: As a shortcut, you can obtain the correlation coefficient by typing at the command
prompt: MTB > corr c10 c6.

c. Now use Minitab to calculate the value of the correlation coefficient for the other eight scatterplots.
Record these in the preceding table, below the appropriate letter (B-I). Does it matter
in what order you give Minitab the variables?

d. Based on these results, what do you suspect is the largest value that a correlation coefficient can
assume? What do you suspect is the smallest value?
Largest:                                         Smallest:

e. Under what circumstances do you think the correlation coefficient assumes its largest or smallest
value? Hint: Consider what would have to be true of the curve in the scatterplot.

f.   How does the value of the correlation relate to the direction of the association?

g. How does the value of the correlation relate to the strength of the association?

These examples should convince you that a correlation coefficient has to be between -1 and +1, and it
equals one of those values only when the observations form a perfectly straight line. The sign of the
correlation coefficient reflects the direction of the association (e.g., positive values of r correspond to a
positive linear association). The magnitude of the correlation coefficient indicates the strength of the
association, with values closer to -1 or +1 signifying a stronger linear association.

Activity 27-2: Governors’ Salaries
The following table reports governors’ salaries for the fifty states (as of the year 2005), along with the
median housing prices for the states.
State          Governor’s Salary   Median Housing Price     State          Governor’s Salary   Median Housing Price
Alabama           \$96,361        \$85,100            Montana            \$96,462        \$99,500
Arizona          \$95,000       \$121,300             Nevada           \$117,000       \$142,000
Arkansas          \$77,028        \$72,800         New Hampshire        \$104,758       \$133,300
California       \$175,000       \$211,500           New Jersey         \$175,000       \$170,800
Colorado          \$90,000       \$166,600          New Mexico          \$110,000       \$108,100
Connecticut       \$150,000       \$166,900            New York          \$179,000       \$148,700
Delaware         \$114,000       \$130,400         North Carolina       \$123,819       \$108,300
Florida         \$129,060       \$105,500          North Dakota         \$88,926        \$74,400
Georgia         \$128,903       \$111,200              Ohio            \$132,292       \$103,700
Hawaii           \$94,780       \$272,700           Oklahoma           \$117,571        \$70,700
Idaho          \$98,500       \$106,300            Oregon           \$93,600       \$152,100
Illinois       \$150,691       \$130,800          Pennsylvania      \$144,416        \$97,000
Iowa         \$107,482        \$82,500          Rhode Island      \$105,194       \$133,000
Kansas         \$103,813        \$83,500          South Dakota      \$103,222        \$79,600
Kentucky        \$112,705        \$86,700           Tennessee         \$85,000        \$93,000
Louisiana        \$95,000        \$85,000             Texas          \$115,345        \$82,500
Maine           \$70,000        \$98,700              Utah          \$104,600       \$146,100
Maryland         \$145,000       \$146,000            Vermont         \$168,600       \$111,500
Massachusetts     \$135,000       \$185,700            Virginia        \$175,000       \$125,400
Michigan        \$177,000       \$115,600           Washington       \$148,035       \$168,300
Minnesota        \$120,303       \$122,400          West Virginia      \$95,000        \$72,800
Mississippi      \$122,160        \$71,400           Wisconsin        \$131,768       \$112,200
Missouri        \$120,087        \$89,900            Wyoming         \$105,000        \$96,600

a. What are the observational units for these data?

b. Use Minitab (Govenors05.mtw) to produce a scatterplot of governor’s salary vs. median housing
price. Describe the association (direction, strength, and form) between these two variables.

c. Based on this scatterplot, guess the value of the correlation coefficient between governor’s salary
and median housing price.

d. Use Minitab to calculate the value of this correlation. Record this value, and comment on the
accuracy of your guess.

e. Suppose Hawaii gives its governor a \$100,000 raise. Make this change in the data. Then reproduce
the scatterplot, and recalculate the value of the correlation coefficient. Has the correlation
coefficient changed much?

f.   Repeat part e after giving the governor of Hawaii an additional \$100,000 raise.

g. Now suppose Hawaii decides to make its governorship an unpaid position. Change the governor of
Hawaii’s salary to \$0. Then reproduce the scatterplot and recalculate the value of the correlation
coefficient. Has the correlation coefficient changed much?

h. Based on these calculations, would you say the correlation coefficient is a resistant measure of
association? Explain.

Activity 27-3: Televisions and Life Expectancy
Reconsider the data from Activity 26-6 about life expectancy and number of televisions per thousand
people in a sample of 22 countries. A scatterplot is reproduced here.


a. Describe the direction and strength of the association between life expectancy and number of
televisions per thousand people in these countries. Also comment on whether or not this
association follows a linear form.

b. Based on this scatterplot, guess the value of the correlation coefficient between life expectancy and
televisions per thousand people in these countries.

c. Use Minitab (TVlife06.mtw) to calculate this correlation coefficient. How accurate was your
guess?

d. Would you say the value of the correlation coefficient is fairly high, even though the association
between the variables is not linear?

e. Does the fairly high value of the correlation coefficient provide evidence of a cause-and-effect
relationship between number of televisions and life expectancy? Explain.

Watch Out
• Correlation measures the degree of linear association between two quantitative variables. But
even when two variables display a nonlinear relationship, the correlation between them still
might be quite high. With these data, the relationship is clearly curved and not linear, and yet
the correlation is still fairly high. Do not assume from a high correlation coefficient that the
relationship between the variables must be only linear. Always look at a scatterplot, in
conjunction with the correlation coefficient, to assess the form (linear or not) of the association.

• No matter how close a correlation coefficient is to ±1, and no matter how strong the
association between two variables, a cause-and-effect conclusion cannot necessarily be drawn
from observational data. There are far more plausible explanations for why countries with lots
of televisions per thousand people tend to have long life expectancies. For example, the
technological sophistication of the country is related to both number of televisions and life
expectancy.
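
# Minitab sketch (not part of the original handout) illustrating the first caution above:
# a relationship that is clearly curved can still produce a correlation close to 1.
# The data are made up, and the columns C41 and C42 are hypothetical choices assumed to be
# unused in your worksheet. Type the commands at the MTB > prompt; lines that begin with #
# are comments.
set c41
  1:10
end
# make the second variable the square of the first, so the points follow a curve
let c42 = c41**2
plot c42*c41
corr c41 c42
# The scatterplot bends upward, yet the reported correlation should come out above 0.9.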


Activity 27-4: Guess the Correlation
This activity will give you practice at judging the value of a correlation coefficient by examining a
scatterplot. The applet is available at
http://www.rossmanchance.com/applets/guesscorrelation/GuessCorrelation.html
a. Open the applet Guess the Correlation. Keep 15 for the Number of Points, and click New Sample.
The applet will generate some “pseudo-random data” and produce a scatterplot.

Based solely on the scatterplot, guess the value of the correlation coefficient. Enter your guess in
the Correlation Guess field in the applet, and click Enter. The applet then reports the actual value of
the correlation coefficient. Record your guess and the actual value in the first empty column of the
following table:

Repetition Number      1     2     3     4     5     6     7     8     9     10
Your Guess
Actual Correlation
b. Click New Sample to generate another scatterplot of pseudo-random data. Enter your guess for the
value of the correlation coefficient in the applet. Then record your guess and the actual value of the
correlation coefficient in the preceding table. Repeat for a total of 10 repetitions.

c. After the ten repetitions, guess the value of the correlation coefficient between your guesses for r
and the actual values of r.

d. From the applet’s pull-down menu below Show Graph Of, select Guess vs. Actual. The applet will
create the scatterplot of your ten guesses and the corresponding actual correlation coefficients and
will also report the correlation coefficient between your guesses and the actual values. Record this
correlation coefficient. Does the value surprise you?

e. Use the applet to examine a scatterplot of your errors vs. the actual values. Is there evidence you
are better at guessing certain correlation coefficient values than other values? Explain.

f.   Use the applet to examine a scatterplot of your errors vs. the repetition (trial) number. Is there
evidence your guesses were more accurate or less accurate as you went along? Explain.


g. Suppose all of your guesses had been too high by exactly 0.1. What would the correlation coefficient
between your guesses and the actual values be? Hint: Think about what the scatterplot would look
like.

h. Repeat part g if your guesses had all been too low by exactly 0.5.

i.   If the correlation coefficient between your guesses and the actual values is 1.0, does this mean you
guessed perfectly every time? What does this value reveal about the utility of the correlation
coefficient as a measure of your guessing prowess? Explain.

Activity 27-5: House Prices
Reconsider the data on house prices from Activity 26-1. The mean house price is \$482,386, and the
standard deviation is \$79,801.5. The mean house size is 1288.1 square feet, and the standard deviation
is 369.191 square feet. You can gain some insight into how the correlation coefficient r measures
association by examining the formula for its calculation:
$$ r = \frac{1}{n-1} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s_x} \right) \left( \frac{y_i - \bar{y}}{s_y} \right) $$

where $x_i$ denotes the $i$th observation of one variable, $y_i$ the $i$th observation of the other variable,
$\bar{x}$ and $\bar{y}$ the respective sample means, $s_x$ and $s_y$ the respective sample standard deviations,
and $n$ the sample size.
This formula says to standardize each x and y value into its z-score, multiply these z-scores together for
each observational unit, add those results, and finally divide the sum by one less than the sample size.
The following table begins the process of calculating the correlation between house price and size by
calculating the houses’ z-scores for price and size and then multiplying the results.

Address                Price (\$)   Price z-score   Size (sq ft)   Size z-score   Product of z-scores
2130 Beach St.          311,000                          460          -2.243
2545 Lancaster Dr.      344,720        -1.725           1030          -0.699            1.206
415 Golden West Pl.     359,500        -1.54             883          -1.097            1.69
990 Fair Oaks Ave.      414,000        -0.857            728          -1.517            1.30
845 Pearl Dr.           459,000        -0.293           1242          -0.125            0.037
1115 Rogers Ct.         470,000        -0.155           1419           0.355           -0.055
579 Halcyon Rd.         470,000        -0.155            952          -0.91             0.141
1285 Poplar St.         470,000        -0.155           1499           0.571           -0.089
1080 Fair Oaks Ave.     474,000        -0.105           1014          -0.742            0.078
690 Garfield Pl.        475,000        -0.093           1615           0.885           -0.082
1030 Sycamore Dr.       490,000         0.095           1664           1.018            0.097
620 Eman Ct.            492,000         0.120           1160          -0.347           -0.042
529 Adler St.           500,000         0.221           1545           0.696            0.154
646 Cerro Vista Cir.    510,000         0.346           1567           0.755            0.261
926 Sycamore Dr.        520,000         0.471           1176          -0.304           -0.143
227 S Alpine St.        541,000         0.734           1120          -0.455           -0.334
654 Woodland Ct.        567,000         1.067           1549           0.707            0.754
2230 Paso Robles St.    575,000         1.161           1540           0.682            0.792
2461 Ocean St.          580,000         1.223           1755
833 Creekside Dr.       625,000         1.787           1844           1.506            2.691


a. Calculate the z-score for the price of 2130 Beach St. and for the size of 2461 Ocean St. Then
calculate the product of the z-scores for these two houses. Show your calculations below and record
the results in the table.

b. The sum of the products turns out to equal 14.819. Use this information, and the fact that there are
20 houses in this sample, to determine the value of the correlation coefficient between house price
and size.

c. What do you notice about the size z-score for most of the houses with negative price z-scores?
Explain how the signs of these z-scores result from the strong positive association between house
price and size.

d. Confirm your calculation in part b by using Minitab (HousePricesAG.mtw) to calculate the value
of the correlation coefficient between house price and size.
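
# Minitab sketch (not part of the original handout): the z-score recipe for r described in
# this activity, written as session commands. It assumes, hypothetically, that price is
# stored in C1 and size in C2 of HousePricesAG.mtw, and that C11-C13 and K1 are unused.
# Check the worksheet and adjust the column numbers before running.
# Step 1: standardize each variable into z-scores
let c11 = (c1 - mean(c1)) / stdev(c1)
let c12 = (c2 - mean(c2)) / stdev(c2)
# Step 2: multiply the two z-scores for each house
let c13 = c11 * c12
# Step 3: add the products and divide by one less than the sample size
let k1 = sum(c13) / (count(c13) - 1)
print k1
# Under these assumptions, K1 should agree with the value reported by: corr c1 c2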

Activity 27-6: Exam Score Improvements
Consider some data on hypothetical exam scores stored in the Minitab file ExamScores.mtw.

a. Use Minitab to produce a scatterplot of exam 2 score vs. exam 1 score. Comment on the direction,
strength, and form of the association revealed.

b. Use Minitab to calculate the correlation coefficient between exam 1 and exam 2.

c. Now suppose each student scores 10 points lower on exam 1 than she actually did. How would you
expect this result to affect the value of the correlation coefficient between exam 2 and exam 1?
Explain.

d. Use Minitab to make this change (subtract 10 points from everyone’s score on exam 1):
1. Click in the Session window at the Command Prompt (MTB >).
2. Type let c5 = c1 - 10
3. Now type a title for column C5 in the Data window (something clever like “Exam 1-10”).
4. Create a scatterplot of exam 2 vs. new exam 1 score and recalculate the correlation
coefficient. How did the correlation value change?

e. Now suppose each student scores twice as many points on exam 2 as she actually did. How would
you expect this result to affect the value of the correlation coefficient between exam 2 and exam 1?
Explain.

f.   Use Minitab’s let command to make this change: double everyone’s score on exam 2. (You will need
to use the * character to multiply in Minitab.) Store your results in column C6. Then reproduce the
scatterplot of this new exam 2 vs. new exam 1, and recalculate the correlation. How did the
correlation value change?


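These questions demonstrate another property of the correlation coefficient: It
does not change if the scale of measurement is altered by adding a constant or
multiplying by a positive constant. (Multiplying one variable by a negative
constant reverses the sign of r but leaves its magnitude unchanged.)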
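
# Minitab sketch (not part of the original handout) for checking this property on made-up
# numbers. The scores and the columns C21-C25 are hypothetical and assumed to be unused.
set c21
  62 70 75 81 88 93
end
set c22
  58 74 70 85 84 97
end
# shift one variable by a constant and rescale the other by a positive constant
let c23 = c21 - 10
let c24 = 2 * c22
# a caveat worth checking: rescale by a negative constant instead
let c25 = -2 * c22
corr c21 c22
corr c23 c24
corr c21 c25
# The first two correlations should match exactly. The third should have the same magnitude
# with the opposite sign, because a negative multiplier reverses the direction of the
# association.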

g. Now consider a different (hypothetical) class of students. Suppose each student scores exactly 10
points higher on exam 2 than he/she does on exam 1. What do you think the value of the correlation
coefficient would be between exam 1 and exam 2? Explain your reasoning. Hint: Consider what the
scatterplot would look like.

h. Make up some hypothetical bivariate data in Minitab with the property described in part g. Hint:
Choose any values at all for the exam 1 scores, and then make sure each exam 2 score is 10 points
higher. Do this for at least 5 hypothetical students. Then use Minitab to produce a scatterplot and
calculate the correlation. Does this confirm the value you expected in part g, or do you need to
revise your thinking?

i.   Now suppose each student scores exactly twice as many points on exam 2 as he/she does on
exam 1. What do you think the value of the correlation coefficient would be between exam 1 and
exam 2? Explain your reasoning. Hint: Consider what the scatterplot would look like.

j.   Make up some hypothetical bivariate data in Minitab with the property described in part i. Then use
Minitab to produce a scatterplot and calculate the correlation. Does this confirm the value you
expected in part i, or do you need to revise your thinking?

Watch Out
• A correlation coefficient is a number! In fact, it is a number between -1 and +1, inclusive. While this may
seem obvious by now, many students say “the same” and do not give a number in response to the
question in part g.

• The slope, or steepness, of the points in a scatterplot is unrelated to the value of the correlation
coefficient. If the points fall on a perfectly straight line with a positive slope, then the correlation
coefficient equals 1.0 whether that slope is very steep or not steep at all. What matters for the
magnitude of the correlation is how closely the points concentrate around a line, not the steepness of a
line.

Activity 27-7: Draft Lottery (Self-Check Activity)
In 1970 the United States Selective Service conducted a lottery to decide which young men would be
drafted into the armed forces (Fienberg, 1971). Each of the 366 birthdays of the year was assigned a
draft number. Young men born on days assigned low draft numbers were drafted. The file
DraftLottery.mtw lists the draft number assigned to each birthday. The “sequential date” column
lists the birthday as a number from 1–366 (January 1 is coded as 1 and December 31 as 366).
a. What draft number was assigned to your birthday?

b. In a perfectly fair, random lottery, what should the correlation coefficient between draft number
and sequential date of the birthday equal? Explain.

c. Use Minitab to produce a scatterplot of draft number vs. sequential date of the birthday. Based on
the scatterplot, guess the value of the correlation coefficient. Explain the reasoning behind your
guess.

d. Use Minitab to calculate the value of the correlation coefficient. Does its value surprise you? If so,
look back at the scatterplot to see if, in hindsight, its value makes sense. Summarize what the value
of this correlation coefficient reveals about how the draft numbers were distributed across
birthdays throughout the year.

e. Data for 1971 are also stored in the file DraftLottery.mtw. Examine a scatterplot, and calculate
the correlation coefficient between draft number and sequential date for that year’s lottery.

Solution
b. With a perfectly fair, random lottery, there should be no association between draft number and
sequential date for the birthday. In other words, these variables should be independent, so the
correlation coefficient would equal zero. With an actual lottery, you would not expect the correlation
coefficient to equal exactly zero, but it should be close to zero.
c. The scatterplot is shown here.

It’s hard to see a relationship between the variables in this scatterplot, so a reasonable guess for the
value of the correlation coefficient would be close to zero.
d. Minitab reveals the correlation coefficient to equal r = -0.226. This indicates a weak negative
association between draft number and sequential date. While not large, this correlation value is farther
from zero than most people expect. Looking at the scatterplot more closely, you can see there are few
points in the top right and bottom left of the graph. This result suggests few birthdays late in the year
were assigned high draft numbers, and few birthdays early in the year were assigned low draft numbers,
which means young men born late in the year were at a disadvantage: they had a greater chance of getting
a low draft number. Birthdays late in the year were not mixed as thoroughly as those earlier in the year,
so they tended to be selected early in the process and thereby assigned a low draft number.
e. The scatterplot for the 1971 draft lottery data is shown here.


The correlation coefficient is 0.014, which is very close to 0. This value indicates there is essentially no
linear association between draft number and sequential date, suggesting the lottery process was fair and
random in 1971.
The mixing mechanism was greatly improved after the anomaly with the 1970 results was spotted.

Wrap-Up
In this topic, you discovered the correlation coefficient as a measure of the linear relationship between
two variables. Analyzing pairs of variables for the car data, you discovered some of the properties of
this measure. For example, a correlation value has to be between -1 and +1, inclusive. The sign of the
correlation coefficient reflects the direction of the association. The magnitude of the correlation
coefficient reflects the strength of the association, with correlation coefficients close to -1 or +1
indicating very strong association, and correlation coefficients close to 0 reflecting very weak linear
association. But also keep in mind that you discovered the correlation coefficient is not resistant to
outliers, as altering simply one state’s value for governor’s salary changed the value of the correlation
considerably. It is important to always accompany your interpretation of the correlation coefficient with
a scatterplot. You also learned how to calculate a correlation coefficient based on z-scores and gained
practice judging the value of a correlation based on a scatterplot. Finally, with the data on televisions
and life expectancy, you saw again that you should not infer a causal relationship between variables
based on a high correlation.
Some useful definitions to remember and habits to develop from this topic include:

• The correlation coefficient is a number that measures the direction and strength of linear association
between two quantitative variables.
• The correlation coefficient is not resistant to outliers. One very unusual point can produce a large
correlation coefficient even when most of the data reveals no pattern, or a small correlation coefficient
when most of the data follows a clear linear pattern.
• Always examine a scatterplot in addition to calculating a correlation coefficient. A clear nonlinear
relationship can have a small (close to zero) correlation, and a correlation can be close to -1 or +1, even
if the relationship follows a curve or other nonlinear pattern.
• Never forget a large correlation coefficient between two variables does not necessarily establish a
cause-and-effect relationship between those variables.


Activity 27-8: Hypothetical Exam Scores
Consider the following scatterplots of hypothetical scores on two exams for Class A and Class B (the data
are also stored in the file HypoExams.mtw):

a. In class A, do most of the exam scores follow a linear pattern? Are there any exceptions?

b. In class B, are most of the exam scores scattered haphazardly with no apparent pattern? Are there
any exceptions?

c. Use Minitab to calculate the correlation coefficient between exam 1 score and exam 2 score for each
of these classes. Are you surprised at either of the values? Explain.

d. Describe how these scatterplots pertain to the issue of resistance of the correlation coefficient.

Now consider the following scatterplot of exam data for Class C:

e. Describe what the scatterplot reveals about the relationship between exam scores in class C.


f.    Use Minitab to calculate the correlation coefficient between exam scores in class C. Is its value
higher than you expected? Explain what this example reveals about correlation.

Activity 27-9: Proximity to the Teacher
Consider the idea of studying whether students who sit closer to the teacher tend to have higher quiz
scores than students who sit farther away from the teacher. Suppose you measure distance from the
teacher and average quiz score for a group of students. Explain how you know each of the following
statements is in error:
a. The correlation between distance and quiz average is –1.8.
b. The correlation between distance and quiz average is –0.8, and the correlation between quiz
average and distance is –0.4.
c. The correlation is –0.8, so there is no association between distance and quiz average.
d. The correlation between quiz average and gender is –0.8.
e. The correlation between distance and quiz average is –0.8, so students who sit farther away tend to
score higher.
f. The correlation between distance and quiz average is –0.8, so sitting closer to the teacher must
cause students to score higher on quizzes.

Activity 27-10: Monthly Temperatures
Reconsider Activity 26-10 and the data on average monthly temperatures in Raleigh, North Carolina:

Month        Jan   Feb   Mar   Apr   May   Jun   Jul   Aug   Sept   Oct   Nov   Dec
Avg. Temp     39    42    50    59    67    74    78    77     71    60    51    43

The following scatterplot displays Raleigh’s average monthly temperature vs. the month number:

a. Does there appear to be any relationship between temperature and month in Raleigh? If so,
describe the relationship.
b. Use Minitab to calculate the correlation coefficient between these variables. Does this
correlation value seem to indicate a strong or a weak relationship?
c. Explain why the correlation is so close to 0 even though the scatterplot reveals a clear
relationship between temperature and month.


Activity 27-11: Planetary Measurements
Consider the data below on planetary measurements. The following scatterplot displays the period of
revolution around the sun (in earth days) vs. the distance from the sun (in millions of miles).

a. Describe the association between these variables as revealed in the scatterplot.
b. Would a straight line appear to be a reasonable summary of the relationship between revolution
and distance? Explain.
c. The correlation coefficient between revolution and distance turns out to equal 0.989. This value
is very close to 1. Does this value mean a straight line is the best model for summarizing
the relationship between revolution and distance? Explain.

Activity 27-12: Ice Cream, Drownings, and Fire Damage
a. Suppose a beach community keeps track of the amount of ice cream sold in a given month and the
number of drownings that occur in that month. Would you expect to find a negative correlation, a
positive correlation, or a correlation close to zero? Explain your reasoning.

b. If the community in part a were to find a strong positive correlation between ice cream sales and
drownings, would that mean ice cream causes drowning? If not, suggest an alternative explanation
(i.e., a confounding variable) for the strong association.

c. Explain why you would expect to find a positive correlation between the number of fire engines that
respond to a fire and the amount of damage done in the fire. Does this imply the damage would be
less extensive if fewer fire engines were dispatched? Explain.

Activity 27-13: Climatic Conditions
The following data, from the 1992 Statistical Abstract of the United States, pertain to a number of
climatic variables for a sample of 25 American cities. These variables measure long-term averages of
• January high temperature (in degrees Fahrenheit)
• January low temperature
• July high temperature
• July low temperature
• Annual precipitation (in inches)
• Days of measurable precipitation per year
• Annual snow accumulation
• Percentage sunshine

City             Jan. High   Jan. Low   July High   July Low   Precip.    Days Precip.   Snow    Sun
Atlanta            50.4        31.5         88        69.5      50.77         115          2      61
Baltimore          40.2        23.4        87.2       66.8       40.6         113         21.3    57
Boston             35.7        21.6        81.8       65.1      41.51         126         40.7    58
Chicago             29         12.9        83.7       62.6      35.82         126         38.7    55
Cleveland          31.9        17.6        82.4       61.4      36.63         156         54.3    49
Dallas             54.1        32.7        96.5       74.1       33.7          78         2.9     54
Denver             43.2        16.1        88.2       58.6       15.4          89         59.8    70
Detroit            30.3        15.6        83.3       61.3      32.62         135         41.5    53
Houston             61         39.7        92.7       74.2      46.07         104         0.4     56
Kansas City        34.7        16.7        88.7       68.2      37.62         104          20     62
Los Angeles        65.7        47.8        75.3       62.8      12.01          35          0      73
Miami              75.2        59.2         89        76.2      55.91         129          0      73
Minneapolis        20.7         2.8         84        63.1      28.32         114         49.2    58
Nashville          45.9        26.5        89.5       68.9       47.3         119         10.6    56
New Orleans        60.8        41.8        90.6       73.1      61.88         114         0.2     60
New York           37.6        25.3        85.2       68.4      47.25         121         28.4    58
Philadelphia       37.9        22.8        82.6       67.2      41.41         117         21.3    56
Phoenix            65.9        41.2       105.9         81       7.66          36          0      86
Pittsburgh         33.7        18.5        82.6       61.6      36.85         154         42.8    46
St. Louis          37.7        20.8        89.3       70.4      37.51         111         19.9    57
Salt Lake City     36.4        19.3        92.2       63.7      16.18          90         57.8    66
San Diego          65.9        48.9        76.2       65.7       9.9           42          0      68
San Francisco      55.6        41.8        71.6       65.7       19.7          62          0      66
Seattle             45         35.2        75.2       55.2      37.19         156         12.3    46
Washington         42.3        26.8        88.5       71.4      38.63         112         17.1    56

Use Minitab to calculate the correlation coefficient between all pairs of these eight variables; the data
are stored in the file Climate.mtw. Hint: There are a total of 28 such pairs of variables. It’s probably
easiest to record the correlation values in a table similar to the following:

               Jan. High   Jan. Low   July High   July Low   Precip.   Days Precip.   Snow   Sun
Jan. High      xxx
Jan. Low       xxx         xxx
July High      xxx         xxx        xxx
July Low       xxx         xxx        xxx         xxx
Precip.        xxx         xxx        xxx         xxx        xxx
Days Precip.   xxx         xxx        xxx         xxx        xxx       xxx
Snow           xxx         xxx        xxx         xxx        xxx       xxx            xxx
Sun            xxx         xxx        xxx         xxx        xxx       xxx            xxx    xxx

To compute all of the correlations simultaneously, select Stat > Basic Statistics > Correlation. Enter c2-c9
as the Variables. To simplify the output, remove the check from the box labeled Display p-values.
a. Which pair of variables has the strongest (either positive or negative) linear association? What is the
value of the correlation between those variables?
Variables:                                                                 correlation:

b. Which pair of variables has the weakest (either positive or negative) linear association? What is the
value of the correlation between those variables?
Variables:                                                                 correlation:
c. Suppose you want to predict the annual snowfall for an American city and you are allowed to look at
that city’s averages for these other variables. Which variable would be most useful to you? Which
variable would be least useful?
Most useful:                                               Least useful:
d. Suppose you want to predict the average July high temperature for an American city and you are
allowed to look at that city’s averages for these other variables. Which variable would be most
useful to you? Which variable would be least useful?
Most useful:                                               Least useful:
e. Use Minitab to explore the relationship between annual snowfall and annual precipitation more
closely. Produce and comment on a scatterplot of these two variables.

Activity 27-14: Muscle Fatigue
Reconsider the matched-pairs study comparing muscle fatigue between men and women from Activity
23-5 (Hunter et al., 2004). In Activity 26-12, you analyzed a scatterplot of time until fatigue for men and
women.
a. Calculate the correlation coefficient between time until muscle fatigue for men and time until
muscle fatigue for women.
b. Comment on what this correlation coefficient suggests about whether or not men and women of
similar strength tend to have similar times until muscle fatigue.

```
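For readers without Minitab, the pairwise correlations asked for in Activity 27-13 can also be computed with a short script. The following is a minimal Python sketch, not part of the original handout. It assumes the usual definition of r as the sum of products of z-scores divided by n - 1, and the names `pearson_r`, `CITIES`, `columns`, and `pairs` are hypothetical names chosen for illustration. The data rows are copied from the climate table in Activity 27-13.

```python
from itertools import combinations
from statistics import mean, stdev

VARIABLES = ["Jan. High", "Jan. Low", "July High", "July Low",
             "Precip.", "Days Precip.", "Snow", "Sun"]

# Rows copied from the table in Activity 27-13, in the column order above.
CITIES = {
    "Atlanta":        (50.4, 31.5, 88, 69.5, 50.77, 115, 2, 61),
    "Baltimore":      (40.2, 23.4, 87.2, 66.8, 40.6, 113, 21.3, 57),
    "Boston":         (35.7, 21.6, 81.8, 65.1, 41.51, 126, 40.7, 58),
    "Chicago":        (29, 12.9, 83.7, 62.6, 35.82, 126, 38.7, 55),
    "Cleveland":      (31.9, 17.6, 82.4, 61.4, 36.63, 156, 54.3, 49),
    "Dallas":         (54.1, 32.7, 96.5, 74.1, 33.7, 78, 2.9, 54),
    "Denver":         (43.2, 16.1, 88.2, 58.6, 15.4, 89, 59.8, 70),
    "Detroit":        (30.3, 15.6, 83.3, 61.3, 32.62, 135, 41.5, 53),
    "Houston":        (61, 39.7, 92.7, 74.2, 46.07, 104, 0.4, 56),
    "Kansas City":    (34.7, 16.7, 88.7, 68.2, 37.62, 104, 20, 62),
    "Los Angeles":    (65.7, 47.8, 75.3, 62.8, 12.01, 35, 0, 73),
    "Miami":          (75.2, 59.2, 89, 76.2, 55.91, 129, 0, 73),
    "Minneapolis":    (20.7, 2.8, 84, 63.1, 28.32, 114, 49.2, 58),
    "Nashville":      (45.9, 26.5, 89.5, 68.9, 47.3, 119, 10.6, 56),
    "New Orleans":    (60.8, 41.8, 90.6, 73.1, 61.88, 114, 0.2, 60),
    "New York":       (37.6, 25.3, 85.2, 68.4, 47.25, 121, 28.4, 58),
    "Philadelphia":   (37.9, 22.8, 82.6, 67.2, 41.41, 117, 21.3, 56),
    "Phoenix":        (65.9, 41.2, 105.9, 81, 7.66, 36, 0, 86),
    "Pittsburgh":     (33.7, 18.5, 82.6, 61.6, 36.85, 154, 42.8, 46),
    "St. Louis":      (37.7, 20.8, 89.3, 70.4, 37.51, 111, 19.9, 57),
    "Salt Lake City": (36.4, 19.3, 92.2, 63.7, 16.18, 90, 57.8, 66),
    "San Diego":      (65.9, 48.9, 76.2, 65.7, 9.9, 42, 0, 68),
    "San Francisco":  (55.6, 41.8, 71.6, 65.7, 19.7, 62, 0, 66),
    "Seattle":        (45, 35.2, 75.2, 55.2, 37.19, 156, 12.3, 46),
    "Washington":     (42.3, 26.8, 88.5, 71.4, 38.63, 112, 17.1, 56),
}


def pearson_r(x, y):
    """Correlation coefficient: sum of z-score products divided by n - 1."""
    mx, my = mean(x), mean(y)
    sx, sy = stdev(x), stdev(y)  # sample standard deviations
    n = len(x)
    return sum((a - mx) / sx * (b - my) / sy for a, b in zip(x, y)) / (n - 1)


# One list of 25 values per variable.
columns = {name: [row[i] for row in CITIES.values()]
           for i, name in enumerate(VARIABLES)}

# All unordered pairs of distinct variables: 8 choose 2 = 28.
pairs = {(u, v): pearson_r(columns[u], columns[v])
         for u, v in combinations(VARIABLES, 2)}

# List the pairs from strongest to weakest linear association.
for (u, v), r in sorted(pairs.items(), key=lambda kv: abs(kv[1]), reverse=True):
    print(f"{u:>12} vs {v:<12} r = {r:+.3f}")
```

Sorting by the absolute value of r puts the strongest associations first regardless of sign, which matches how parts a and b of the activity define "strongest" and "weakest". A scatterplot is still needed to judge whether a straight line is a reasonable summary for any given pair.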