DEDA at ITU Spring 2010
Handling data and making graphs in SPSS
SPSS (temporarily PASW) is a computer program used for statistical analysis.
Between 2009 and 2010 it was called PASW (Predictive Analytics SoftWare).
SPSS Inc., the company behind the software, announced on July 28, 2009 that it was
being acquired by IBM for US$1.2 billion. As of January 2010, the company became
"SPSS: An IBM Company".
SPSS (an acronym for Statistical Package for the Social Sciences) was developed at Stanford
University in the late 1960s by a group of researchers who wanted to analyze a large amount of
social science data. The software supports the standard forms of data management, graphical
presentation and statistical analysis. Furthermore, there are different ways of communicating with
the software, from the user-friendly graphical user interface (GUI), with pull-down menus, to more
hard-core programming. In this course, you will get to know SPSS through the graphical user
interface. In SPSS, data are organized in a data sheet format, which is similar to the spreadsheet
format used by programs like Excel.
Starting SPSS and getting help
Knowing how to get help is essential when you use such complex software as SPSS. In addition to
the extensive help function which is part of the SPSS package, there are a number of introductions,
tutorials, user forums and newsgroups on the Internet. On the official SPSS website there is an
“SPSS Statistics Base User’s Guide 17.0” (640 pages) available for download. If you think that this
is a bit too heavy, there is also an “SPSS Statistics 17.0 Brief Guide” (only 224 pages). Anyhow,
you will scarcely be able to memorize all the details of these guides, so the online help provided in
the SPSS package will probably be your first choice in most situations.
On the computers at ITU, you will find SPSS on the All Programs sub-menu on the Start menu.
When you have started the program, before you do anything else, take your time and “take a walk”
through the built-in help facilities. SPSS has so many functions and options that you will soon get
lost if you don’t know how to get help, so spending some time to get acquainted with the help
function is a good investment.
1. Open the Help pull-down menu.
a. Take a look at what’s on the menu.
b. Go to the Topics section.
c. Read the “Getting Help” page, and don’t miss the section on context-sensitive help –
sooner or later (probably sooner) you will need it!
d. Take a look at the table of contents on the left hand side (if it isn’t there, click on the
Contents tab) just to get an overview of the topics covered.
2. Open the Tutorial and browse through the subsections “Sample Files” and “Opening a Data
File” in the Introduction section. (If you can’t open your data files, there will be no data
3. Use the help function to find out how to read (open) Excel files in SPSS. (You will need to
know how to do this in the next exercise).
4. Use the help function to find out about variable names and data labels. (In short, every data
field has both a name and a label. The name is usually shorter than the label, which is
intended to be more descriptive.)
5. Go to Help → Topics → Contents → Base System → Data Preparation → Variable
Properties and find out how to define variable properties. (You will need it in the
next exercise.)
Data import and preparation
Perhaps the most direct way to get your data into SPSS is to open an empty spreadsheet and just
enter your data by hand. Basically, you do this in the same way as in any other spreadsheet
software, except that you now also have to be careful about the definition of your variables. Excel,
for instance, does not care very much about what type of data you enter, but since SPSS is a
statistical software package, it does indeed care about data types. We will return to this topic later
on in the course, but briefly it means that it doesn’t make much sense to compute the average of the
letters of the alphabet or try to find the mean value of a giraffe, a zebra and a lion (what kind of
animal would that be?). Thus, since the data type determines what kinds of calculations are valid,
you have to tell SPSS what type of data you feed it with.
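The idea that the data type decides which calculations are valid can be sketched in a few lines of Python. This is only an illustration, not how SPSS works internally: the function safe_mean, the level names and the numbers are all made up for the example.

```python
from statistics import mean

# Hypothetical measurement levels; only "scale" data may be averaged here.
NUMERIC_LEVELS = {"scale"}


def safe_mean(values, level):
    """Return the mean only when the measurement level allows it."""
    if level not in NUMERIC_LEVELS:
        raise TypeError(f"a mean is not meaningful for {level} data")
    return mean(values)


distances = [112.0, 98.5, 120.5]        # made-up numeric measurements
animals = ["giraffe", "zebra", "lion"]  # names, i.e. nominal data

print(safe_mean(distances, "scale"))

try:
    safe_mean(animals, "nominal")
except TypeError as err:
    print("refused:", err)
```

The point of the sketch is that the software needs to be told the level of each variable before it can decide whether a calculation such as a mean makes sense.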
In this first session with SPSS you will continue to work with some data that you already know,
namely your results from the tabletop hockey experiment. An Excel file with the pooled results
from all the groups is available for download on the course webpage at
As a warm-up, you will do more or less the same type of analysis as you did in Excel. The
information given in the exercises will often be on a “need to know” basis, which means that you
should just try to follow the instructions – the explanations will come later. The important thing is
that you get started, not that you get everything right immediately.
1. Download the pooled tabletop hockey data from the course webpage.
2. Open the Excel file in SPSS (make sure that you get the column headings, i.e. the variable
names, right), highlight the Data Editor window and take a look at the data set, so that you
understand how the data are organized in the data sheet. Identify how columns and rows are
related to variables (factors) and cases (observations).
3. Go to Help → Tutorial → Using the Data Editor and follow the tutorial from there and through
the sections Entering Numeric Data and Entering String Data. Since you have opened an
already existing data file, you don’t have to follow all the steps shown in the tutorial with
your own data, but the tutorial shows how data are organized in SPSS.
4. Return to the Data Editor window and select Variables from the View pull-down menu or
click the Variable View tab in the lower left hand corner of the window. The variables are
now presented by row. Look at the variable names, types, labels and values. You will find
that SPSS has not been able to guess all the properties of your variables correctly (although
it has made a decent try). It is now your job to give SPSS this information.
5. Go to Help → Tutorial → Using the Data Editor → Defining Data and follow the tutorial
from there and through the sections Adding Variable Labels, Changing Variable Type and
Format, Adding Value Labels for Numeric Variables and Adding Value Labels for String
Variables. You are now going to do more or less the same with your data set (but in a
slightly different way).
6. Return to the Data Editor window and select Define Variable Properties from the Data
pull-down menu and select all variables except Distance for scanning. (The variable
Distance is not categorical and therefore not suitable for scanning.) SPSS then scans the data
for information that can tell something about the properties of the different variables in the
data set. When the scan is finished, edit the properties for each variable in the list:
a. Group: Check that the measurement level is nominal (which means that the values of
this variable are names – in this case the names of the groups). Enter an appropriate
label (e.g. Work group), check the counts and the values in the corresponding
columns and finally select Automatic Labels before you select the next variable in
the list.
b. Run: Check that the measurement level is ordinal (which means that the values of this
variable have a natural order – in this case, run 1 comes before run 2, etc.). Enter an
appropriate label (e.g. Shot number), check the counts and the values in the
corresponding columns and then select Automatic Labels before you select the next
variable in the list.
c. Shottype: Check that the measurement level is nominal and then check the label
(actually, SPSS has managed to suggest a label in this case). Check the values in the
count and value columns. Select the cell in the first row of the Label column and
type Drag shot. Then select the cell in the second row of the Label column and type
Slap shot. Since you have now manually given the labels for the different values of
this variable, you will not use the Automatic Labels function.
d. Position: Check that the measurement level is nominal (although you could argue that
the values of this variable, short and long, respectively, have a natural order and
therefore could be treated as ordinal variables). Enter an appropriate label (e.g. Puck
position). Check the values in the count and value columns. To manually enter the
value labels, select the cell in the first row of the Label column and type Long stick.
Select the cell in the second row of the Label column and type Short stick. Finally,
click the OK button.
7. Take a look at the Output window – it gives you a summary of the commands that SPSS
has just executed for you. It may look a bit cryptic at first, but it provides you with an
opportunity to check that SPSS actually did what you wanted it to do.
8. Look at the Data Editor window in Data View and notice the changes. Use View → Value
Labels to toggle between a display of values and value labels, respectively, in each column
(i.e. for each variable). Change to Variable view. If you click on a cell in the Values
column, a small button appears – click on it to see (and change, if you want) the value
labels.
Now you have imported and prepared your data in SPSS. The next step is to do something with the
data – like drawing graphs or computing statistical summaries. However, before we are ready for
that, we need to know how to select subsets from the large data set.
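As a rough picture of what the Define Variable Properties step stores, the properties of two of the variables can be written down as a small Python dictionary. This is a hypothetical sketch, not the SPSS file format: the dictionary layout, the function display_value and the variable label "Shot type" are assumptions, while the values and value labels are the ones used in the exercise.

```python
# Hypothetical picture of the properties entered for two variables.
variable_properties = {
    "Shottype": {
        "measure": "nominal",
        "label": "Shot type",  # assumed label; SPSS suggests one itself
        "value_labels": {"drag": "Drag shot", "slap": "Slap shot"},
    },
    "Position": {
        "measure": "nominal",
        "label": "Puck position",
        "value_labels": {"long": "Long stick", "short": "Short stick"},
    },
}


def display_value(variable, value, show_labels):
    """Return the value label or the raw value, like the Value Labels toggle."""
    labels = variable_properties[variable]["value_labels"]
    if show_labels:
        return labels.get(value, value)
    return value


print(display_value("Shottype", "slap", show_labels=False))
print(display_value("Shottype", "slap", show_labels=True))
```

The stored data never change when you toggle View → Value Labels; only the way each value is displayed does.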
Selecting subsets of data
It would be nice if you could compare your results in SPSS with what you got in Excel. In Excel
you analyzed data only from your own group, so to do the same in SPSS, you will have to select the
cases pertaining to your group from the pooled data set which you imported. In SPSS this is done
with the Select Cases option in the Data pull-down menu. Once a subset of cases has been selected,
all the following operations and calculations will be applied only to the selected data set. Thus, you
will also need to know how to undo your selection.
In the exercise below group A is used as an example, but you can of course choose your own group
(or any other group, for that matter) when you work through the steps of the exercise.
1. Go to Select Cases in the Data pull-down menu:
a. In the Select frame of the dialogue box, select the If condition is satisfied option and
then click the If button.
b. You want to select the cases from group A only, so select the variable Group in the
list and click the arrow button to the box to the right.
c. In this box, you specify the condition for which cases should be selected. In this case,
you should type the formula Group = "A" and then click Continue.
d. In the Output frame of the dialogue box, select the Filter out unselected cases option
and then click OK.
e. Take a look at your data (Data View) in the Data Editor window. If you scroll down,
you will notice that all cases which have been filtered out are now marked by a slash
in the leftmost column. In addition, a new variable, called filter_$, has been created.
This variable has only two possible values: 1 for a selected case and 0 for a case
which has been filtered out. Such a variable is generally called an indicator variable,
because it indicates which cases are “in” and which are “out”. From now on, all
operations, such as drawing graphs or calculating statistics, will be performed only
with the selected subset.
f. To get the data which have been filtered out back again, go to the Select Cases
dialogue box and select the option All cases in the Select frame.
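The logic behind the Select Cases steps above can be sketched in Python as follows. The cases and distances are made up for illustration, and the list-of-dictionaries layout is just an assumption about how to picture the data sheet; only the variable names and the 0/1 indicator come from the exercise.

```python
# Made-up cases; each dictionary is one row (case) of the data sheet.
cases = [
    {"Group": "A", "Run": 1, "Distance": 101.0},
    {"Group": "B", "Run": 1, "Distance": 87.5},
    {"Group": "A", "Run": 2, "Distance": 95.0},
    {"Group": "C", "Run": 1, "Distance": 110.0},
]

# Condition Group = "A": the indicator is 1 for selected cases, 0 otherwise.
for case in cases:
    case["filter_$"] = 1 if case["Group"] == "A" else 0

# Later operations only see the cases whose indicator is 1.
selected = [case for case in cases if case["filter_$"] == 1]
print(len(selected), "of", len(cases), "cases selected")

# Choosing "All cases" corresponds to working with the full list again.
all_cases = cases
```

Note that the filtered-out cases are not deleted in this sketch; they are only ignored, which is why the selection can be undone.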
Making a scatter plot
You can produce many types of graphs in SPSS. In this first encounter with SPSS, we will just try
to produce the same types of graphs that you made in Excel. Along the way, you will discover that
some tasks may require some additional data preparation. The first graph-making exercise will
simply be to produce a scatter plot (a graph where the observations are drawn as points) of the data
from your own group, and then we will proceed to make more complicated graphs. There are at
least three ways to produce the same graph in SPSS, and we will start with the Chart Builder which
lets you make graphs by simple drag-and-drop operations.
Let’s start by making a scatter plot of your own data from the tabletop hockey experiment.
1. Select the data from your group, as described above.
2. Open the Chart Builder in the Graphs menu:
a. A small box reminding you about the importance of defining your variables pops up.
You already have defined your variables, so you can safely click OK in this box, and
it will disappear.
b. In the Chart Builder dialogue box, the first thing to do is to select the type of graph
you want to draw. You do this in the Gallery frame. To make a scatter plot, just
select Scatter/Dot from the Choose from list. Miniature pictures of different types
of scatter plot will appear in the neighbour frame (they are called gallery charts in
c. Point to the second chart in the top row of this frame. You will see that it is called
Grouped Scatter. Drag this plot to the empty frame above. Apart from an example
preview of the chart, three boxes appear next to the chart in the frame. You decide
what to draw in the chart by dragging variables from the list to the left into these
boxes.
d. You want the variable Run (or Shot number) to be represented on the x-axis, so drag
this variable from the Variables list on the left and drop it in the X-Axis box on the
right. Similarly, drag and drop the variable Distance into the Y-Axis box.
e. The Set Color box is still empty. In this box you put the variable which you want to
determine the colour grouping in the graph (i.e. the grouping variable). In the present
case, you could choose to colour the points either by shot type or by puck position,
so drag and drop one of the corresponding variables from the list on the left into the
Set Color box. (This plot function only accepts one grouping variable – you can try
and put both variables in the box and see what happens.)
f. Now that you’ve filled in all the boxes pertaining to the type of graph you’ve chosen,
just click OK and the graph will appear in the Output window in a moment. If you
want to save or edit the graph, or copy it to another place (e.g. if you’re writing a
report), just right-click on the graph and a pop-up menu will show you the options
However, we haven’t quite yet reached our goal, which was to produce the same kind of graphs as
we did in Excel. To get there, we have to do some “data management”.
Creating new variables
When you produced the graph as described above, you used one of the variables Shottype or
Position as grouping variable. Since the Chart Builder did not accept two grouping variables at the
same time, we could get around this problem by creating a new grouping variable, which indicates
which of the four factor combinations each case belongs to. This is, in a way, similar to the
indicator variable which was automatically created during the Select Cases operation – but now we
need a grouping variable with four different values to distinguish between the four different groups
Now hold on and follow the instructions closely, because this is going to be a bit technical – but it’s
well worth the effort! The two variables Shottype and Position are both string variables. This
means that their values are strings, i.e. combinations of letters and other characters. The variable
Shottype can take the values drag and slap, and the variable Position can take the values long and
short. What we will do now is to create a new string variable, which will combine the string values
of Shottype and Position into new strings.
1. Select Compute Variable from the Transform pull-down menu. The Compute Variable
dialogue box appears.
a. In the Target Variable box write the name of the new variable (e.g. FactorComb).
b. Click the Type & Label button and type the label you want (e.g. Factor combination)
in the Type and Label dialogue box. To make sure that there is enough space for the
values of the new string variable, you’ll have to increase the width from the default
of 8 to at least 11 (you do this in the Type frame in the box). Then click Continue.
c. You are going to manipulate some strings, so select String in the Function group
box. A list of functions operating on strings appears in the Functions and Special
Variables box.
d. Since you are going to put some strings together, more specifically concatenate them,
select the function Concat from the list. When you have selected the function, it will
appear in the String Expression box at the top. In the expression, there are two
question marks which you have to replace with the two string variables whose values
you want to concatenate.
e. Select the first grouping variable from the list on the left and click the arrow button.
The highlighted question mark is now replaced by the selected variable. Highlight
the second question mark and replace it with the other grouping variable.
f. You’re not quite finished yet. As the formula reads now, the result will be one single
word, like slaplong, which isn’t very readable. It would be nicer with a comma in
between. To put a comma in the result, type ", ", after the comma in the formula
so that it reads like this: CONCAT(Shottype,", ",Position). Count the commas
carefully, or it won’t work!
g. Finally, click OK and return to the Data View of the Data Editor. Your new
grouping variable should be there. Scroll down the data sheet and check that it has
the correct values.
2. Draw a graph like the one in the previous exercise, but now with your new grouping
variable.
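What the CONCAT formula does to each case can be sketched in Python. The helper concat below is a hypothetical stand-in for the SPSS function, and the sketch assumes that the string values carry no trailing blanks; the variable names and values are those from the exercise.

```python
def concat(*parts):
    """Join strings end to end (a stand-in for the CONCAT function)."""
    return "".join(parts)


# The four possible factor combinations of Shottype and Position.
cases = [
    {"Shottype": "drag", "Position": "long"},
    {"Shottype": "drag", "Position": "short"},
    {"Shottype": "slap", "Position": "long"},
    {"Shottype": "slap", "Position": "short"},
]

for case in cases:
    case["FactorComb"] = concat(case["Shottype"], ", ", case["Position"])
    print(case["FactorComb"])

# The longest result shows why the width must be at least 11 characters.
longest = max(len(case["FactorComb"]) for case in cases)
print("longest value:", longest, "characters")
```

Without the ", " in the middle, the same call would produce run-together words such as slaplong.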
Making line charts
Sometimes you prefer to have your data points connected by lines rather than have them
free-floating, as in the scatter plot.
Now that you have your new grouping variable, you can easily create the line chart that corresponds
to the scatter plot you just made in the last exercise. There are some details which you have to be
aware of, however.
1. Once again, select the Chart Builder from the Graphs pull-down menu.
a. This time, select Line and then Multiple Line from the Gallery.
b. Select the same variables for the x- and y-axes, and the grouping, as before.
c. Look carefully at the label on the y-axis. It might very well be that the mean of the y-
variable has been chosen by default. This is not what you want right now. To change
it, move to the Element Properties dialogue box which should have appeared
somewhere on your screen. In the Statistics frame, select Value from the pull-down
menu and then click Apply. Take a look at the y-axis box in the Chart preview
frame and check that the label contains nothing but the variable name before you
There is only one type of graph left from the collection of graphs you made in Excel: the
interaction plot for comparison of the average results from each factor combination. The procedure is
very similar to what you did in the previous exercise. This time, however, put Shottype on the x-axis
and Position as grouping variable, and make sure that the Mean Distance is on the y-axis
(remember, you select this in the Statistics frame of the Element Properties dialogue box).
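The calculation behind an interaction plot can be sketched in Python: the cases are sorted into cells by the x-axis variable and the grouping variable, and one mean is computed per cell. The function cell_means and the distances are made up for illustration; this is an assumption about the principle, not a description of the SPSS implementation.

```python
from collections import defaultdict
from statistics import mean


def cell_means(cases, x_var, group_var, y_var):
    """Return the mean of y_var for each (x value, group value) cell."""
    cells = defaultdict(list)
    for case in cases:
        cells[(case[x_var], case[group_var])].append(case[y_var])
    return {key: mean(values) for key, values in cells.items()}


# Made-up distances, two shots per factor combination.
cases = [
    {"Shottype": "slap", "Position": "long", "Distance": 100.0},
    {"Shottype": "slap", "Position": "long", "Distance": 120.0},
    {"Shottype": "slap", "Position": "short", "Distance": 80.0},
    {"Shottype": "slap", "Position": "short", "Distance": 90.0},
    {"Shottype": "drag", "Position": "long", "Distance": 60.0},
    {"Shottype": "drag", "Position": "long", "Distance": 70.0},
    {"Shottype": "drag", "Position": "short", "Distance": 50.0},
    {"Shottype": "drag", "Position": "short", "Distance": 54.0},
]

means = cell_means(cases, "Shottype", "Position", "Distance")
for (shot, position), value in sorted(means.items()):
    print(shot, position, value)
```

Each line in the plot then connects the cell means that share the same value of the grouping variable.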
Make another interaction plot, with Position on the x-axis and Shottype as grouping variable.
Compare this graph with the one you made in Excel. What calculations are “hidden” when you
make this kind of graph in SPSS?
Calculating summary statistics
What about the summary statistics (the mean, variance and standard deviation) that you calculated
with the built-in functions in Excel – how do you do that in SPSS? Whenever you want to do
statistical calculations in SPSS, the Analyze pull-down menu is the place to look. There are a
bewildering number of functions and options in this menu, so it takes some time (and, sometimes,
frustration) to find one’s way through to the right choice.
1. Select Reports from the Analyze pull-down menu.
a. Select Case Summaries from the sub-menu. A Summarize Cases dialogue box
appears.
b. Select Distance as the variable to analyse (summarise) and Factor combination as
the grouping variable.
c. Click the Statistics button. A Summary Report: Statistics dialogue box appears. In
the Statistics list on the left, select Mean, Variance, and Standard Deviation and
click the blue arrow to move them to the Cell Statistics list. Click Continue in this
box and then click OK in the Summarize Cases box.
d. Take a careful look at the results in the Output window and make sure that you
understand them. Compare these results with what you got in Excel: the averages should
be exactly the same, but there is a chance that the variances and standard deviations
will differ from the corresponding values in Excel. If that is the case, go back to
Excel and check the formulas in the corresponding cells. The explanation probably is
that you have used the function which calculates the variance based on the entire
population. If so, change it to the variance based on a sample (and correspondingly
for the standard deviation). Now all summary statistics computed in Excel should
have the same values as the corresponding statistics computed in SPSS.
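The difference between the two kinds of variance can be checked with a small Python sketch using the standard statistics module. The numbers are made up for illustration. The sample versions divide the sum of squared deviations by n - 1, the population versions divide by n; in Excel this corresponds to the difference between VAR and VARP (and between STDEV and STDEVP).

```python
from statistics import mean, pstdev, pvariance, stdev, variance

# Made-up distances; the mean is 5 and the squared deviations sum to 32.
distances = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

print("mean:               ", mean(distances))
print("sample variance:    ", variance(distances))   # 32 / (8 - 1)
print("population variance:", pvariance(distances))  # 32 / 8
print("sample std. dev.:   ", stdev(distances))
print("population std. dev:", pstdev(distances))
```

The mean is the same in both cases, which is why only the variances and standard deviations can disagree between the two programs.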
1. Select Descriptive Statistics from the Analyze pull-down menu.
a. Select Descriptives from the sub-menu. A Descriptives dialogue box appears.
b. Select Distance as the variable to analyse.
c. Click the Options button. A Descriptives: Options dialogue box appears. Select
Mean, Sum, Variance, and Std. deviation and then click Continue in this box,
followed by OK in the Descriptives box.
d. Take a look at the results in the Output window and compare them with the results you
got in the previous exercise. What is it that SPSS has computed for you, and what is
the difference between these two ways to obtain summary statistics?