Make Friends with SPSS and SAS Page 1
Psyc930: Making Friends with SPSS and SAS
Overall goals of the course:
Get used to working with SPSS syntax (in addition to windows)
Get used to working with SAS syntax (instead of windows)
Skills to learn: read in data from different formats, merge and restructure data, transform
variables, maybe a little macro programming for even greater efficiency
Why work with syntax instead of point-and-click windows (in any program)?
Key idea: Save the PROCESS, not the PRODUCT. DOCUMENTATION:
When conducting analyses – know exactly how any analysis was conducted (i.e.,
which options were selected, with which versions of the variables, on what dataset)
When modifying datasets – know exactly how new variables were created, how
aberrant values were fixed, etc.
You do not ever need to keep 8,000 versions of any dataset – keep the syntax instead
Make a point to save a ‘completed’ analysis history to accompany manuscripts (just
trust me on this one)
The initial front-end time investment yields HUGE long-term gains in efficiency
Can ‘borrow’ code to use across multiple projects (especially using macro variables)
No longer fear hearing the words “Can you do this again, but with _______?”
Working with SPSS Syntax
Features of SPSS datasets:
Two views: data view (actual data) and variable view (meta data)
Hidden gem – DATA FILE COMMENTS (under UTILITIES menu) – can be used to
leave notes for future reference that get saved with the dataset (DOCUMENT command)
2 main types you’ll use: numeric or string (text)
Store data as numeric whenever possible – string variables are a pain to work with
(and they are case-sensitive and space-sensitive, too)
Do not mix strings and numbers in one variable
Variable NAMES, variable LABELS, and variable VALUES:
SPSS allows up to 64 characters for names, but…
Variable NAMES should be short – try to stay under 8 characters (even though you
are allowed more) for two reasons: for use in other programs with 8-character
restrictions, and so you have room to add characters after transforming the variable
Variable NAMES are not case-sensitive (but will be displayed with the capitalization you typed)
Variable NAMES must start with a letter or underscore; numbers are ok, but try to
avoid periods or special characters (incompatible in some other programs) and avoid
ending in underscores; no spaces allowed
Variable LABELS are long (255 characters) – make them as long as you need to
document exactly what the variable means (you will thank yourself later);
I recommend including the name, label, and value information ALL in the label for
convenience – for example:
“Dead: Whether subject is dead as of 2004, 0=no, 1=yes”
Variable VALUES assign verbal labels to the specific numeric values – use values
instead of string variables wherever possible (note: SAS works differently with
variable values, so your variable values will not be imported from SPSS data – you’ll
need to use PROC FORMAT in SAS to assign value labels for printing in tables)
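As a minimal sketch of the PROC FORMAT step mentioned above, the following attaches No/Yes labels to the 0/1 variable from the "Dead" label example. The data set name (work.example), the format name (yesno), and the three data rows are hypothetical and exist only for illustration.

```sas
/* Hypothetical data: a 0/1 indicator like the "Dead" example */
DATA work.example;
  INPUT id dead;
  LABEL dead = "Dead: Whether subject is dead as of 2004, 0=no, 1=yes";
  DATALINES;
1 0
2 1
3 0
;
RUN;

/* Define a reusable format (the SAS counterpart of SPSS value labels) */
PROC FORMAT;
  VALUE yesno 0 = "No"
              1 = "Yes";
RUN;

/* Attach the format so the table prints No/Yes instead of 0/1 */
PROC FREQ DATA=work.example;
  TABLES dead;
  FORMAT dead yesno.;
RUN;
```

The stored values remain 0 and 1; the format only changes how they are displayed.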
Missing values codes:
Missing value codes are usually unnecessary – if it’s missing, leave it blank
If you must use missing value codes, remember to define them ahead of time
(otherwise SAS won’t recognize them), pick numbers that will never show up in
the data (e.g., pick -99 instead of 99), and use the same codes across all your variables
Better strategy: If you need to keep track of different reasons for missingness (e.g.,
not there that day = -99, wouldn’t answer the question = -98), it is better to create a
separate categorical variable to keep track of missingness reasons
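One way to set up the separate missingness variable described above is sketched below in a SAS DATA step. The data set names, the variable names (score, miss_reason), and the data rows are hypothetical; the codes -99 and -98 follow the example in the text.

```sas
/* Hypothetical data using -99 and -98 as missing value codes */
DATA work.scores;
  INPUT id score;
  DATALINES;
1 12
2 -99
3 -98
4 15
;
RUN;

/* Move the reason into its own variable and set the score to missing */
DATA work.scores_clean;
  SET work.scores;
  IF score = -99 THEN DO;
    miss_reason = 1;
    score = .;
  END;
  ELSE IF score = -98 THEN DO;
    miss_reason = 2;
    score = .;
  END;
  ELSE miss_reason = 0;
  LABEL miss_reason = "Reason score is missing: 0=observed, 1=not there that day, 2=would not answer";
RUN;
```

After this step the analysis variable contains only real values or blanks, so no procedure can mistake -99 for data.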
Variable WIDTH versus COLUMNS:
WIDTH is how many characters are allotted for the values of the variable (including
decimals) – don’t usually need to mess with WIDTH or DECIMALS unless you want
to reduce the size of a file
COLUMNS is how wide the variable is displayed – aesthetic concern only
Alignment is also an aesthetic concern only; measurement type rarely matters (one
exception is when making certain kinds of plots)
Customize your SPSS – Under EDIT menu, go to OPTIONS – here are some useful ones:
Under GENERAL: list variables in order or alphabetically as desired
Under VIEWER: change default font in output and titles
Under OUTPUT LABELS: select just names, just labels, or both to be displayed
Under CHART: select default fonts and style options for charts
Under PIVOT TABLES: output tables in APA style with “academic” look – another
one I like is “report” for pasting neatly in Word/Excel (many other similar options are available)
Relatively new features: use FILE HANDLES and DATASET options:
FILE HANDLE: Abbreviation for path where files reside
DATASET options control which open dataset gets called
Colors in SPSS syntax: BLUE for commands, red for errors (but not always).
Working with SAS Syntax
Welcome to the wonderful world of SAS! SAS has a steeper learning curve than does SPSS, but
I think you’ll find climbing to the top to be well worth the effort. Most of the data management
and analysis that SAS can do SPSS can also do (with some exceptions, of course), but there are
several programming-related features that make SAS more pleasant to work with than SPSS:
Enhanced editor uses color-coding to make writing syntax easier:
Comments are GREEN
Commands are BLUE
Labels, titles, and libraries are PINKY PURPLE
Entered data is shaded YELLOW
Variables, file names, and other user-entered text is BLACK
If you see RED, something is wrong
The use of LIBRARIES and temporary directories
Library = nickname for a physical location where permanent files are stored
You can define and reference multiple libraries simultaneously
By default SAS has a temporary “work” library: all files are deleted from the work
library when closing SAS unless explicitly saved to a permanent, user-defined library.
I always recommend immediately copying files over to the work library and
using that temporary copy instead!
That way, unnecessary (intermediate) data files are not saved as permanent files
Also, if you mess up, your original file is still intact – this brings us to a very
important difference between SAS and other programs like SPSS:
There is no ‘saving the data file’ in SAS: transformations happen immediately.
So if you mess up, you will not know it until it is too late; thus the importance of
using the data set stored in the temporary ‘work’ library.
You can always save the modified data file back into your permanent location as
needed, but you don’t ever really need to – as long as you keep the syntax, all
transformations can be regenerated as needed, and thus you really only need the
original file. The exception is when you have a HUGE file in which transformations
take a long time to run. In that case it might be worthwhile to save the final product
(or the subset you are working with) just to save time.
Only SAS datasets are recognized in libraries. To refer to other kinds of data, you can
use a macro variable as a placeholder for the file location via a %LET statement (see
example in Day 2 syntax).
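The library workflow described above might look like the following sketch. Every path, the library nickname (perm), the data set names, and the CSV file are hypothetical placeholders, and this is not the Day 2 syntax itself.

```sas
/* Hypothetical locations: replace with your own folder and file names */
LIBNAME perm "C:\MyProject\Data";
%LET rawfile = C:\MyProject\Data\study1.csv;

/* Copy the permanent SAS data set into WORK and use the copy from here on */
DATA work.study;
  SET perm.study;
RUN;

/* A non-SAS file cannot live in a library, so point to it with the macro variable */
PROC IMPORT DATAFILE="&rawfile" OUT=work.study1 DBMS=CSV REPLACE;
  GETNAMES=YES;
RUN;
```

Note the double quotes around &rawfile: macro variables are not resolved inside single quotes.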
Data files are referred to EXPLICITLY
It used to be that SPSS only allowed one data file open at a time, which was a pain.
Now SPSS is “helpful” by allowing more than one file… but this is even worse,
because it can be confusing to figure out in which file a given command will be
executed (and that’s what the SPSS DATASET commands are for).
SAS has two main types of commands: DATA steps, and PROCs
All file transformations and variable modifications must happen inside a DATA step.
Thus, you always know which file is being modified because it’s the one you specify.
If your data has already been read in, you must essentially re-define it as itself (using
the DATA and SET commands) in order to do further transformations on it.
PROCs (procedures) run things… PROC MEANS, PROC CORR, PROC REG…
You should always explicitly specify which data file is being used for each PROC. If
you do not, by default SAS will run it on the most recently created data set, which
is just as bad as in SPSS.
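A minimal sketch of the two kinds of steps, using a small made-up data set (work.example with variables x and y, all hypothetical): the second DATA step re-defines the data set as itself to add a variable, and each PROC names its data set explicitly.

```sas
/* Hypothetical data read in with a first DATA step */
DATA work.example;
  INPUT id x y;
  DATALINES;
1 2 10
2 4 14
3 6 19
;
RUN;

/* Re-define the data set as itself (DATA + SET) to transform it further */
DATA work.example;
  SET work.example;
  xsq = x**2;
  LABEL xsq = "x squared";
RUN;

/* Always say which data set each PROC should use */
PROC MEANS DATA=work.example N MEAN STD;
  VAR x xsq y;
RUN;

PROC CORR DATA=work.example;
  VAR x y;
RUN;
```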
SAS syntax can be used to generate plots
SPSS will make graphs for you through windows or syntax, but all the customization
(e.g., changing colors, line styles, fonts, labels) must be done through windows. This
gets annoying quickly, particularly if you have lots of them to make.
SAS syntax can be used to make any kind of plot you can think of, and you can write
syntax to customize absolutely every feature of the plot. SAS plotting can require a
huge learning curve, but with a few relatively basic options you can get nice-looking
plots to put in papers and presentations, and you can size them and save them
directly in whatever format you need.
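As one illustration of customizing a plot entirely through syntax, here is a sketch using PROC SGPLOT with ODS Graphics. This assumes SAS 9.2 or later and is only one of several plotting procedures, not necessarily the one used in this course. SASHELP.CLASS is a sample data set installed with SAS; the image name and the size, color, and marker choices are arbitrary.

```sas
/* Set the size, file name, and file format of the image */
ODS GRAPHICS / RESET WIDTH=5in HEIGHT=4in IMAGENAME="height_weight" IMAGEFMT=PNG;

PROC SGPLOT DATA=sashelp.class;
  /* Points grouped by sex, with custom marker symbol and size */
  SCATTER X=height Y=weight / GROUP=sex MARKERATTRS=(SYMBOL=CircleFilled SIZE=9);
  /* Overall regression line, drawn without re-plotting the points */
  REG X=height Y=weight / NOMARKERS LINEATTRS=(COLOR=black PATTERN=dash);
  XAXIS LABEL="Height (inches)";
  YAXIS LABEL="Weight (pounds)";
  TITLE "Weight by Height";
RUN;
TITLE;
```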
Advanced programming features in SAS
Macro programming can be used to automate repetitive tasks. Whenever you find
yourself doing the same thing over and over again (e.g., running the same series of
models on different outcome variables, doing the same series of transformations to
different datasets), these are good candidates for macro programs.
Macro programming is available in both SPSS and SAS, but it generally seems
clunkier and less intuitive (to me) in SPSS than in SAS.
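A small sketch of a SAS macro program that runs the same model on different outcome variables. The macro name (runreg) and its parameter (outcome) are hypothetical, and SASHELP.CLASS is a sample data set installed with SAS.

```sas
/* Define the macro once; &outcome is filled in at each call */
%MACRO runreg(outcome=);
  TITLE "Regression predicting &outcome";
  PROC REG DATA=sashelp.class;
    MODEL &outcome = age;
  RUN;
  QUIT;
  TITLE;
%MEND runreg;

/* Run the same model on two different outcomes */
%runreg(outcome=height);
%runreg(outcome=weight);
```

Adding a third outcome is then one more line, not another copy of the PROC.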
SAS DATA steps can include arrays and loops to automate repetitive variable
transformation tasks (again, also available in SPSS as ‘vectors’, but clunkier).
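For instance, arrays and a DO loop can reverse-score a set of items in one pass. This is a sketch only: the data set names, the item names, the data rows, and the 1-5 response scale are all hypothetical.

```sas
/* Hypothetical data: five items scored 1 to 5 */
DATA work.items;
  INPUT id item1-item5;
  DATALINES;
1 1 2 3 4 5
2 5 4 3 2 1
;
RUN;

/* Reverse-score all five items in a loop instead of five separate statements */
DATA work.items_rev;
  SET work.items;
  ARRAY old{5} item1-item5;
  ARRAY rev{5} item1r item2r item3r item4r item5r;
  DO i = 1 TO 5;
    rev{i} = 6 - old{i};
  END;
  DROP i;
RUN;
```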
SAS has excellent built-in functions for working with string and date variables.
Another nice feature in SAS is the Output Delivery System (ODS). Again, SPSS has
something similar in 12.0+ called the Output Management System (OMS), but it is
clunkier as well. You can save SAS output as datasets, html, rich text, or pdf files.
The combination of macro programs + ODS is particularly powerful, because it can
enable you to generate many, many analyses, save the tables of output into datasets,
manipulate and combine the datasets, and then export them into something easy to
The ability to write syntax to make tables and figures is particularly helpful when revising
manuscripts, theses, and dissertations…. Change something? One click and all of your data
manipulation, analyses, and tables and pictures of results are updated!
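A sketch of the ODS idea: save a table of output as a data set, then send it to a rich text file. ParameterEstimates is the ODS table name PROC REG uses for its coefficient table; the output path and the data set name work.est are hypothetical, and SASHELP.CLASS is a sample data set installed with SAS.

```sas
/* Capture the parameter estimates table from PROC REG as a data set */
ODS OUTPUT ParameterEstimates=work.est;
PROC REG DATA=sashelp.class;
  MODEL weight = height age;
RUN;
QUIT;

/* Write the saved estimates to a rich text file (hypothetical path) */
ODS RTF FILE="C:\MyProject\Output\estimates.rtf";
PROC PRINT DATA=work.est NOOBS LABEL;
  VAR Variable Estimate StdErr tValue Probt;
RUN;
ODS RTF CLOSE;
```

Because work.est is an ordinary data set, it can be merged, stacked, or transformed in a DATA step before printing.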
Tips for good, readable, and reusable syntax:
Use EXCESSIVE comments/documentation throughout (you’ll thank yourself later)
Start with who wrote it, for what purpose, when (and when last updated)
Record which data files get used (use macro variables for file references – stay tuned)
Add variable labels (and values in SPSS) immediately upon creating new variables
Separate logical sections with blank lines, comment lines, etc.
Use indentation to help delineate structure (not required, but easier to read)
Use capital letters for command words (not required, but easier to read)
Test/allow for all contingencies (and physically LOOK at the data after each step)
Comments can be used to ‘shut off’ parts of code and yet keep it all in the same file
Avoid ‘hard coding’ wherever possible by using macro variables (stay tuned)
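Several of these tips can be seen together in a short SAS program skeleton like the sketch below. The program name, the header fields, and the macro variable names (dataset, outcome) are hypothetical; SASHELP.CLASS is a sample data set installed with SAS.

```sas
/*------------------------------------------------------------
  Program:  example_analysis.sas (hypothetical name)
  Author:   Your Name
  Purpose:  Descriptive statistics for one outcome
  Created:  YYYY-MM-DD     Last updated: YYYY-MM-DD
  Data:     SASHELP.CLASS (sample data installed with SAS)
------------------------------------------------------------*/

/* Macro variables: change these once instead of editing every step */
%LET dataset = sashelp.class;
%LET outcome = weight;

/* Section 1: descriptives */
PROC MEANS DATA=&dataset N MEAN STD MIN MAX;
  VAR &outcome;
RUN;

/* Section 2: shut off for now but kept in the file
PROC UNIVARIATE DATA=&dataset;
  VAR &outcome;
RUN;
*/
```

Block comments cannot be nested in SAS, so a section being shut off this way should not contain block comments of its own.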
General tips for de-bugging syntax:
All commands and variable names spelled correctly?
Refer to correct version of file? Or correct version of file ‘in focus’ in SPSS?
Do the variables you are referring to exist yet (in the file you are working with)?
Comments shut off correctly?
Do you have a command terminator (period in SPSS, semi-colon in SAS)?
In SAS (and SPSS) – Are the colors ok? Remember: red = wrong
In SAS – Check the log – just because it runs doesn’t mean it’s right
In SAS – Is your data set open? It must be closed to use it in most cases.
In SPSS – Did nothing happen? Check to see if the words “transformations pending”
appear at the bottom of the data screen – if so, you are missing an “execute.”
In SAS – Did nothing happen? Does it say that something is still “running” at the top of
the screen – if so, you are missing a “run;”