Assignment preparation – regression lines and
We have seen how to plot out variables which we believe to be associated and to measure
the strength of that association. We also need to be able to use one variable to predict
another. In the assignment, we will be using data from a weather station to predict the data
from a crime scene.
The purpose of this tutorial is to see how to use Excel to both plot data and to give us the
equation which will allow us to make that kind of prediction.
We start with an example, where we examine data taken from an experiment in which the
circumferences and radii of several circular objects were measured. The data is displayed in
the screen shot below.
Of course, the equation associated with this data is C = 2r, meaning that the circumference
of a circle is equal to two times pi times the circle's radius. In this experiment, the
circumferences and radii are measured, and we use Excel to investigate the relationship
Enter the data onto the worksheet as shown in the above screen shot. Select the data, then
click on ‘Insert’, choose ‘Scatter’ and then the graph with no line on it.
A graph should appear on your page that looks like this (except that you will have different
You now want to improve the labels. Click on ‘Layout’, then ‘Axis titles’ and then either
horizontal or vertical axis titles:
Do the same thing for ‘Chart Title’
Now you may want to add a line to represent the association between the two variables. This
is not a line that joins the dots, but rather one that is the ‘best fit’ between the points. To do
this, right-click on any of the data points and select ‘Add Trendline’:
On the Format Trendline box, select Linear; Display Equation on Chart; Display R-squared
value on chart
When the OK button is pressed the best fit line is drawn and the equation of the line and R-
squared value will be displayed on the graph. It will look something like this. You can move
the equation by clicking and dragging it to the desired location.
The R-squared value is actually the square of the correlation coefficient. The correlation
coefficient, R, gives us a measure of the reliability of the linear relationship between the x and
y values. A value of R = 1 indicates an exact linear relationship between x and y. Values of R
close to 1 indicate excellent linear reliability. If the correlation coefficient is relatively far away
from 1, the predictions based on the linear relationship, y = mx + b, will be less reliable.
The equation displayed on the graph is y = 6.179x + 0.2327. To change ‘x’ and ‘y’ to the
names of your own variables, click on the equation and edit it.
The end result should be a graph that looks like this:
C = 6.179r means that the slope of the graph is 6.179. That is fairly close to 2 x π. Make sure
you understand why.
For our purposes, what we need to understand is that, once we have a regression equation
(which Excel does for us), we can predict the variable on the y axis from the variable on the x
axis. In this particular case, we take the variable on the x axis, multiply it by 6.179, add
0.2327, and we have the variable on the y axis. This will allow you to estimate the
temperature in one location based on the temperature in another location.