INTRODUCTION TO SPSS FOR WINDOWS
Version 15.0




                 Summer 2007
Contents
Purpose of handout & Compatibility between different versions of SPSS……………….. 1
SPSS window & menus…………………………………………………………………… 1
Getting data into SPSS & Editing data…………………………………………………….. 3
Reading an SPSS viewer/output (.spo) file & Editing your output…………………………. 7
Saving data as an SPSS data (.sav) file…..………………………………………………... 8
Saving your output (statistical results and graphs)………………………………………… 9
Exporting SPSS Output……………………………………………………………………. 10
Printing your work & Exiting SPSS……………………………………………………….. 12
Running SPSS using syntax or command language (.sps files)….…………………………13
Creating a new variable……………………………………………………………………. 14
Recoding or combining categories of a variable……………………………………………15
Summarizing your data
Frequency tables (& bar charts) for categorical variables…………………………………. 20
Contingency tables for categorical variables………………………………………………. 21
Descriptive statistics (& histograms) for numerical variables…………………………….. 22
Descriptive statistics (& boxplots) by groups for numerical variables……………………. 24
Using the Split File option for summaries by groups……………………………………… 26
Using the Select Cases option for summaries for a subgroup of subjects/observations…… 27
Graphing your data
Bar chart…………………………………………………………………………………… 28
Histogram & Boxplot……………………………………………………………………… 29
Normal probability plot……………………………………………………………………. 30
Error bar plot……………………………………………………………………………….. 31
Scatter plot…………………………………………………………………………………. 32
Adding a line or loess smooth to a scatter plot…………………………………………….. 32
Stem-and-leaf plot………………………………………………………………………….. 33
Hypothesis tests & Confidence intervals
One sample t test & Confidence interval for a mean………………………………………. 34
Paired t test & Confidence interval for the difference between means……………………. 37
Two sample t test & Confidence interval for the difference between means……………… 39
Sign test and Wilcoxon signed rank test………………………………………………....... 42
Mann Whitney U test (or Wilcoxon rank sum test)……………………………….............. 45
One-way ANOVA (Analysis of variance) & Post-hoc tests…………………………......... 47
Kruskal-Wallis test……………………………………………………………………….....50
One-sample binomial test………………………………………………………………...... 52
McNemar’s test……………………………………………………………………………..53
Chi-square test for contingency tables………………..…………………………………….55
Fisher’s exact test………………………………………………………………………....... 55
Trend test for contingency tables/ordinal variables……………………………………....... 55
Binomial, McNemar’s, Chi-square and Fisher’s exact tests using summary data……….... 59
Confidence interval for a proportion………………………………………………………. 63
Correlation & Regression
Pearson and Spearman rank correlation coefficients……………………………………....... 65
Linear regression………………………………………………………………………........ 68
Linear regression via ANOVA commands………………………………………………….. 76
Logistic regression………………………………………………………………………… 80


Purpose of handout
SPSS for Windows provides a powerful statistical and data management system in a graphical
environment. The user interfaces make statistical analysis more accessible for casual users and
more convenient for experienced users. Most tasks can be accomplished simply by pointing and
clicking the mouse.

The objective of this handout is to get you oriented with SPSS for Windows. It teaches you how
to enter and save data in SPSS, how to edit and transform data, how to explore your data by
producing graphics and summary descriptives, and how to use pointing and clicking to run
statistical procedures. It is also intended to serve as a reference guide for SPSS procedures that
you will need to know to do your homework assignments.

Compatibility between different versions of SPSS
SPSS for Windows data files (files ending in .sav) and syntax (command) files (files ending in
.sps) are compatible between different versions of SPSS (at least, versions 11.0 or newer).
However, SPSS output/viewer files (files ending in .spo) are NOT always compatible between
different versions. Usually SPSS output files created with an older version can be read by a
newer version, but an output file created using a newer version cannot be read by an older
version. One option for avoiding compatibility problems between different versions of SPSS is
to export your output in HTML or MS Word format. The compatibility between Windows and
Mac versions of SPSS is limited.

SPSS Windows & Menus
An overview of the SPSS windows, menus, toolbars, and dialog boxes is given in the SPSS
Tutorials under Help. You can also find information under Topics, Case Studies, Statistics
Coach, and Command Syntax (if you are using syntax commands).

Window Types
SPSS Data Editor. When you start an SPSS session, you usually see the Data Editor window
(otherwise you will see a Viewer window). The Data Editor displays the contents of the working
data file. There are two views in the Data Editor window: 1) Data View, which displays the data
in a spreadsheet format with variable names as column headings, and 2) Variable View, which
displays information about the variables in your data set. In the Data View you can edit or enter
data, and in the Variable View you can change the format of a variable, add variable and value
labels, etc.

SPSS Viewer/Output. Statistical results and graphs are displayed in the Viewer window. The
(output) Viewer window is divided into two panes. The right-hand pane contains all the
output and the left-hand pane contains a tree-structure of the results. You can use the left-hand
pane for navigating through, editing and printing your results.


Chart Editor. The Chart Editor is used to edit graphs. When you double-click on a figure or
graph, it will appear in a Chart Editor window.

SPSS Syntax Editor. The Syntax Editor is used to create SPSS command syntax for using the
SPSS production facility. Usually you will be using the point and click facilities of SPSS, and
hence, you will not need to use the Syntax Editor. More information about the Syntax Editor and
using the SPSS syntax is given in the SPSS Help Tutorials under Working with Syntax. A few
instructions to get you started are given later in this handout in the section Running SPSS using
Syntax (or Command Language).

Menus
Data Editor Menu:

File. Use the File menu to create a new SPSS file, open an existing file, or read in spreadsheet or
database files created by other software programs (e.g., Excel).

Edit. Use the Edit menu to modify or copy data and output files.

View. Choose which buttons are available in the window or how the window should look.

Data. Use the Data menu to make changes to SPSS data files, such as merging files, transposing
variables, or creating subsets of cases for subset analysis.

Transform. Use the Transform menu to make changes to selected variables in the data file (e.g.,
to recode a variable) and to compute new variables based on existing variables.

Analyze. Use the Analyze menu to select the various statistical procedures you want to use, such
as descriptive statistics, cross-tabulation, hypothesis testing and regression analysis.

Graphs. Use the Graphs menu to display the data using bar charts, histograms, scatterplots,
boxplots, or other graphical displays. All graphs can be customized with the Chart Editor.

Utilities. Use the Utilities menu to view variable labels for each variable.

Add-ons. Information about other SPSS software.

Window. Choose which window you want to view.

Help. Index of help topics, tutorials, SPSS home page, Statistics Coach, and version of SPSS.

Viewer Menu: This menu is similar to the Data Editor menu, but has two additional options:

Insert. Use the Insert menu to insert new items (e.g., titles or text) into your output.

Format. Use the Format menu to change the format of your output.

Chart Editor Menu: Use SPSS Help to learn more about the Chart Editor.


Toolbars
Most Windows applications provide buttons arranged along the top of a window that act as
shortcuts to executing various functions. In SPSS, you will find such buttons (icons) at the top
of the Data Editor, Viewer, Chart Editor, and Syntax windows. The icons are usually symbolic
representations of the procedure they execute when clicked; unfortunately, their meanings are
not intuitively obvious until one has already used them. Hence, the best way to learn these
buttons is to use them and note what happens.

The Status Bar
The Status Bar runs along the bottom of a window and alerts the user to the status
of the system. Typical messages one will see are “SPSS Processor is ready” and
“Running procedure…”. The Status Bar will also provide up-to-date information concerning
special manipulations of the data file like whether only certain cases are being used in an
analysis or if the data has been weighted according to the value of some variable.

File Types
Data Files. A file with an extension of .sav is assumed to be a data file in SPSS for Windows
format. A file with an extension of .por is a portable SPSS data file. The contents of a data file
are displayed in the Data Editor window.

Viewer (Output) Files. A file with an extension of .spo is assumed to be a Viewer file
containing statistical results and graphs.

Syntax (Command) Files. A file with an extension of .sps is assumed to be a Syntax file
containing SPSS syntax commands.

Getting Data into SPSS & Editing Data
When you read or enter data into SPSS, the data will be displayed in the Data Editor window.
An overview of the basic structure of an SPSS data file is given in the SPSS Help Tutorials:

     1. Choose Help on the menu bar
     2. Choose Tutorial
     3. Choose Reading Data

Reading Data from an SPSS Data (.sav) File
To read a data file from your computer/floppy disk/flash drive that was created and saved using
SPSS (the filename should end with the suffix .sav):

     1. Choose Open an existing data source
     2. Double click on the filename or
     3. Single click on the filename and choose OK
Or


   1.   Choose Cancel
   2.   Choose File on the menu bar
   3.   Choose Open
   4.   Choose Data...
   5.   Edit the directory or disk drive to indicate where the data is located.
   6.   Double click on the filename or
   7.   Single click on the filename and choose Open
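If you prefer syntax, the point-and-click steps above correspond roughly to a single GET FILE
command. This is a sketch; the path and file name are hypothetical:

```
* Open an SPSS data file (the path is hypothetical).
GET FILE='C:\mydata\class.sav'.
```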

Reading Data from a Text Data File
To read a raw/text (ASCII) data file from your computer/floppy disk/flash drive, where the data
for each observation is on a separate line and a space is used to separate variables on the same
line (i.e., the file format is freefield), and the filename ends with the suffix .dat:
   1.   Choose File on the menu bar
   2.   Choose Read Text Data
   3.   Choose Files of Type *.dat
   4.   Edit the directory or disk drive to indicate where the data is located
   5.   Double click on the filename or
   6.   Single click on the filename and choose Open
   7.   Follow the Import Wizard Instructions.
You can also get to the Import Wizard as follows:
   1.   Choose File on the menu bar
   2.   Choose Open
   3.   Choose Data...
   4.   Choose Files of Type *.dat
   5.   Edit the directory or disk drive to indicate where the data is located
   6.   Double click on the filename or
   7.   Single click on the filename and choose Open
   8.   Follow the Import Wizard Instructions.
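The Import Wizard pastes a lengthy GET DATA command, but for a simple freefield file the
classic DATA LIST command does the same job. A sketch, with a hypothetical path and
hypothetical variable names:

```
* Read a freefield (space-delimited) text data file.
* The path and the variable names are hypothetical.
DATA LIST FILE='C:\mydata\class.dat' FREE
  / id age weight.
EXECUTE.
```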

Instructions on how to read a text data file in fixed format are located in SPSS Help Tutorials
under Reading Data from a Text File.


Reading Data from Other Types of External Files
SPSS allows you to read a variety of other types of external files, such as Excel spreadsheet files,
SAS data files, Lotus 1-2-3 spreadsheet files, and dBASE database files. To read data from other
types of external files, you follow the same steps as you would for reading an SPSS save file,
except that you specify the file type according to what package was used to create the save file.
For further instruction on how to read data from other types of external files, see the SPSS for
Windows Base System User's Guide on data files or the SPSS Help Tutorials.

Entering and Editing Data Using the Data Editor
The Data Editor provides a convenient spreadsheet-like facility for entering, editing, and
displaying the contents of your data file. A Data Editor window opens automatically when you
start an SPSS session. Instructions on using the Data Editor to enter data are given in the SPSS
Help Tutorials. Note that if you are already familiar with entering data into a different
spreadsheet program (e.g., MS Excel), you might find it easier to enter your data in the program
you are familiar with and then read the data into SPSS.

Entering Data. Basic data entry in the Data Editor is simple:

Step 1. Create a new (empty) Data Editor window. At the start of an SPSS session a new
(empty) Data Editor window opens automatically. During an SPSS session you can create a new
Data Editor window by
   1. Choose File
   2. Choose New
   3. Choose Data

Step 2. Move the cursor to the first empty column.

Step 3. Type a value into the cell. As you type, the value appears in the cell editor at the top of
the Data Editor window. Each time you press the Enter key, the value is entered in the cell and
you move down to the next row. By entering data in a column, you automatically create a
variable and SPSS gives it the default variable name var00001.

Step 4. Choose the first cell in the next column. You can use the mouse to click on the cell or use
the arrow keys on the keyboard to move to the cell. By default, SPSS names the data in the
second column var00002.

Step 5. Repeat step 4 until you have entered all the data. If you entered any incorrect values,
you will need to edit your data. See the following section on Editing Data.


Editing Data. With the Data Editor, you can modify a data file in many ways. For example, you
can change values; cut, copy, and paste values; or add and delete cases.

To Change a Data Value:
  1. Click on a data cell. The cell value is displayed in the cell editor.
  2. Type the new value. It replaces the old value in the cell editor.
  3. Press the Enter key. The new value appears in the data cell.

To Cut, Copy, and Paste Data Values
  1. Select (highlight) the cell value(s) you want to cut or copy.
  2. Pull down the Edit box on the main menu bar.
  3. Choose Cut. The selected cell values will be copied, then deleted. Or
  4. Choose Copy. The selected cell values will be copied, but not deleted.
  5. Select the target cell(s) (where you want to put the cut or copied values).
  6. Pull down the Edit box on the main menu bar.
  7. Choose Paste. The cut or copied values will be pasted into the target cells.

To Delete a Case (i.e., a Row of Data)
  1. Click on the case number on the left side of the row. The whole row will be highlighted.
  2. Pull down the Edit box on the main menu bar.
  3. Choose Clear.

To Add a Case (i.e., a Row of Data)
  1. Select any cell in the row below where you want to insert the new case.
  2. Pull down the Data box on the main menu bar.
  3. Choose Insert.

Defining Variables. The default name for new variables is the prefix var and a sequential five-
digit number (e.g., var00001, var00002, var00003). To change the name, format, and other
attributes of a variable:

   1. Double click on the variable name at the top of a column, or
   2. Click on the Variable View tab at the bottom of the Data Editor Window.
   3. Edit the variable name under the column labeled Name. Variable names cannot contain
      spaces (versions of SPSS before 12.0 also limited names to eight characters). You can
      also specify the number of decimal places (under Decimals), assign a descriptive name
      (under Label), define missing values (under Missing), define the type of variable (under
      Measure; e.g., scale, ordinal, nominal), and define the values for nominal variables
      (under Values).

After the data is entered (or several times during data entry), you will want to save it as an
SPSS data file. See the section Saving Data as an SPSS Data (.sav) File.
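Variable attributes can also be assigned with syntax. A sketch, using a hypothetical variable
named smoke:

```
* Assign a descriptive label, value labels, and a missing-value code
* to a hypothetical variable named smoke.
VARIABLE LABELS smoke 'Smoking status'.
VALUE LABELS smoke 1 'Never' 2 'Former' 3 'Current'.
MISSING VALUES smoke (9).
```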


Reading an SPSS Viewer/Output (.spo) File
Statistical results and graphs are displayed in the Viewer window. An overview of how to use
the Viewer is given in the SPSS Help Tutorials under Working with Output.

If you saved the results of a Viewer window during an earlier SPSS session, you can use the
following commands to display the Viewer (output) results in your current SPSS session.
Remember the compatibility caution given earlier: output files created with an older version of
SPSS can usually be read by a newer version, but an output file created using a newer version
cannot be read by an older version. One option for avoiding compatibility problems is to export
your output in HTML or MS Word format.

To read a Viewer file from your computer/floppy disk/flash drive that was created and saved
using SPSS (the filename should end with the suffix .spo):

   1.   Choose File on the menu bar
   2.   Choose Open
   3.   Choose Output...
   4.   Edit the directory or disk drive to indicate where the data is located
   5.   Double click on the filename or
   6.   Single click on the filename and choose Open

Editing Your Output
Editing the statistical results and graphs in the Viewer window is beyond the scope of this
handout. Instructions on how to edit your output are given in the SPSS Help Tutorials under
Working with Output and Creating and Editing Charts.

 You can use either the tree-structure in the left hand pane or the results displayed in the right
  hand pane to select, move or delete parts of the output.

 To edit a table or object (an object is a group of results) you first need to double click on the
  table/object so an “editing” box appears around the table/object, and then select the value you
  want to modify. An “editing box” is a ragged box outlining the table. If you only do a
  single click you will get a box with straight/plain lines outlining the table. In general, to create
  “nice looking” tables of your results it is often easier to enter the values by hand into a blank
  MS Word table than to edit an SPSS table/object (either in SPSS or MS Word).

 To edit a chart you first need to double click on the chart so it appears in a new Chart Editor
  window. After you are done editing the chart, close the window and then export the chart, for
  example to a Windows metafile and then into an MS Word file.

 By default in SPSS a P-value is displayed as .000 if the P-value is less than .001. You can
  report the P-value as <.001, or you can have SPSS display more significant digits:


   1. In an SPSS (output) Viewer window double click (with the left mouse button) on the table
      containing the p-value you want to display differently. An “editing box” should appear
      around the table.
   2. Click on the p-value using the right mouse button.
   3. Choose Cell Properties. (If you do not get this option, you need to double click on the table
      to get the ragged box.)
   4. Change the number of decimals to the desired number (default is 3).
   5. Choose OK or
   6. Double click on the p-value with the left mouse button and SPSS will display the p-value
      with more significant digits. If the p-value is very small, the p-value will be displayed in
      scientific notation (e.g., 1.745E-10 = 0.0000000001745).

Saving Data as an SPSS Data (.sav) File
To save data as a new SPSS Data file onto your computer/floppy disk/flashdrive:

   1. Display the Data Editor window (i.e., execute the following commands while in the Data
      Editor window displaying the data you want to save.)
   2. Choose File on the menu bar.
   3. Choose Save As...
   4. Edit the directory or disk drive to indicate where the data should be saved. SPSS will
      automatically add the .sav suffix to the filename.
   5. Choose Save

To save data changes to an existing SPSS data file:

   1. Display the Data Editor window (i.e., execute the following commands while in the Data
      Editor window displaying the data you want to save.)
   2. Choose File box on the menu bar
   3. Choose Save

Caution. The Save command saves the modified data by overwriting the previous version of the
file.

You can save your data in other formats besides an SPSS data file (e.g., as an ASCII file, Excel
file, or SAS data set). To save your data in a given format, follow the same steps as for saving
data in a new SPSS data file, except that you specify the desired format under Save as Type.
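If you paste rather than run the Save As step, the generated syntax is similar to this sketch
(the path is hypothetical):

```
* Save the working data file as an SPSS data (.sav) file.
SAVE OUTFILE='C:\mydata\class.sav'.
```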


Saving Your Output (Statistical Results and Graphs)
To save the statistical results and graphs displayed in the Viewer window as a new SPSS Output
file:

  1. Display the Viewer window (i.e., execute the following commands while in the Viewer
     window displaying the results you want to save.)
  2. Choose File on the menu bar.
  3. Choose Save As...
  4. Edit the directory or disk drive to indicate where the output should be saved. SPSS will
     automatically add the .spo suffix to the filename.
  5. Choose Save

To save Viewer changes to an existing SPSS Output file:

  1. Display the Viewer window (i.e., execute the following commands while in the Viewer
     window displaying the results you want to save.)
  2. Choose File on the menu bar.
  3. Choose Save.

Caution. The Save command saves the modified Viewer window by overwriting the previous
version of the file.

Note that you will not be able to open SPSS output that was created with a newer version than
the version of SPSS that you are using to open the output. Hence, you may want to avoid this
problem by exporting your output in HTML or MS Word format. Also, charts often do not
export properly into an HTML or Word file. Usually you need to export charts separately into a
Windows metafile (.wmf). Sometimes the output, including charts, can be copied and pasted
directly into a Word file.


Exporting SPSS Output
Sometimes you will want to save your SPSS output in a different file format than an SPSS
output file, because you want to avoid compatibility problems between different versions of
SPSS, you want to further edit your output in a Word document, or you want to include graphs
or figures in another document file. The basic steps in exporting SPSS output to another file
type are, while in an SPSS (output) Viewer window:

  1. Choose File
  2. Choose Export

  3. Choose what you want to export:

    Output Document – exports all the output

    Output Document (No Charts) – exports
    only the numerical results

    Charts Only – exports only charts (i.e., graphs
    & figures)

 Note that charts often do not export properly into
 an HTML or Word file. Usually you need to export
 charts separately into a Windows metafile
 (.wmf).

 4. Define further what you want to export:

    All Objects – this option also exports
    other extraneous information (rarely
    useful)

    All Visible Objects – use this option to
    export all the output.

    Selected Objects – this allows you to
    export only the objects you have selected
    in the Viewer window.




5. Choose the file type

   HTML and Word/RTF are good file
   types for numerical results (no
   charts).




Windows Metafile (.WMF) is a good
file type for charts if you want to
include figures in an MS Word
document.

Note that the file type options are
dependent on what you are exporting.




6. Choose the location and file name for the
output you want to export.




7. Choose OK


Printing Your Work in SPSS
To print statistical results and graphs in the Viewer window or data in the Data Editor window:

 1. Display the output or data you want to
    print (i.e., execute the following
    commands while in an output or data
    window)
 2. Choose File on the menu bar.
 3. Choose Print...
 4. Choose All visible output or Selection (if
    you have selected parts of the output).
    When printing from a data file, the
    options are All, Selection and Page # to
    Page #.
 5. Choose OK

Exiting SPSS
To exit SPSS:

   1. Choose File on the menu bar
   2. Choose Exit SPSS

If you have made changes to the data file or the output file since the last time you saved these
files, before exiting SPSS you will be asked whether you want to save the contents of the Data
Editor window and Viewer window. If you are unsure as to whether you want to save the
contents of the data or output window, choose Cancel, then display the window(s) and if you
want to save the contents of the window, follow the instructions in this handout for saving data
or output windows. SPSS will use the overwrite method when saving the contents of the
window.


Running SPSS using Syntax (or Command Language)
This handout describes how to run various statistical summaries and procedures using the
point-and-click menus in SPSS. However, it is possible to run SPSS commands using SPSS
syntax/command language. If you are running similar analyses repeatedly, it can be more
efficient to run your analysis using SPSS syntax. How to run SPSS using the syntax/command
language is beyond the scope of this handout. Help on running SPSS using the syntax/command
language can be found in the SPSS Tutorials under Working with Syntax.

To get you started using SPSS syntax, follow the point-and-click instructions for running a
particular analysis, but select Paste instead of OK at the last step. An SPSS Syntax Editor
window will open containing the SPSS syntax for running the analysis. To run the analysis you
can choose Run on the menu bar, or you can highlight the syntax you want to run, click the right
mouse button, and select Run Current. You can add more syntax to the Syntax Editor window
by using the point-and-click method, selecting Paste instead of OK at the last step. The
additional syntax will be added at the bottom of the Syntax Editor window. You can also write
syntax directly into the syntax file and/or use copy, paste, and editing commands to modify the
syntax. Remember to save your syntax file before exiting SPSS; the file name should end in .sps.
You can open a syntax file by selecting File on the menu bar, then Open, and then Syntax…


Here are two examples of SPSS syntax (shown as screenshots in the original handout):

 The first runs a two-sample test comparing HDL cholesterol (hdl) for subjects without and
  with a family history of heart attack (fhha, coded 0 for no and 1 for yes).

 The second creates 3 indicator variables, neversmoke, formersmoke, and currentsmoke, for
  smoking status (smoke).

Note that a period (.) is used to denote the end of a string of syntax, and Execute. is sometimes
required to run the syntax.
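As a sketch, the two examples described above correspond to syntax like the following (the
exact syntax SPSS pastes may differ slightly):

```
* Two-sample t test comparing HDL cholesterol (hdl) by family
* history of heart attack (fhha; 0 = no, 1 = yes).
T-TEST GROUPS=fhha(0 1)
  /VARIABLES=hdl.

* Create three indicator (dummy) variables for smoking status (smoke).
* Each logical expression evaluates to 1 (true) or 0 (false).
COMPUTE neversmoke = (smoke = 1).
COMPUTE formersmoke = (smoke = 2).
COMPUTE currentsmoke = (smoke = 3).
EXECUTE.
```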


Creating a New Variable
To create a new variable:
1. Display the Data Editor window (i.e., execute the following commands while in the Data
   Editor window displaying the data file you want to use to create a new variable).
2. Choose Transform on the menu bar
3. Choose Compute...
4. Enter the new variable name in the Target Variable box.
5. Enter the definition of the new variable in the Numeric Expression box (e.g., SQRT(visan),
   LN(age), or MEAN(age)) or
6. Select variable(s) and combine with desired arithmetic operations and/or functions.
7. Choose OK
After creating a new variable(s), you will probably want to save the new variable(s) by re-saving
your data using the Save command under File on the menu bar (see Saving Data as an SPSS
Data (.sav) File). Further instructions on creating a new variable are given in the SPSS Help
Tutorials under Modifying Data Values.

Example: Creating a (New) Transformed Variable

You can use the SPSS commands for creating a new variable to create a transformed
variable. Suppose you have a variable indicating triglyceride level, trig, and you want to
transform this variable using the natural logarithm to make the distribution less skewed
(i.e., you want to create a new variable which is natural logarithm of triglyceride levels).


 1. Display the Data Editor
    window
 2. Choose Transform on the
    menu bar
 3. Choose Compute...
 4. Enter, say, lntrig, in the
    Target Variable box.
 5. Enter Ln(trig) in the Numeric
    Expression box.
 6. Choose OK




Now, a new variable, lntrig, which is the natural logarithm of trig, will be added to your
data set. Remember to save your data set before exiting SPSS (e.g., while in the SPSS
Data window, choose Save under File or click on the floppy disk icon).
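If you choose Paste instead of OK in step 6 above, the syntax SPSS generates should look
roughly like this sketch:

```
* Create lntrig as the natural logarithm of trig.
COMPUTE lntrig = LN(trig).
EXECUTE.
```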


Recoding or Combining Categories of a Variable
To recode or combine categories of a variable:

1.  Display the Data Editor window (i.e., execute the following commands while in the Data
    Editor window displaying the data file you want to use to recode variables).
2. Choose Transform on the menu bar
3. Choose Recode
4. Choose Into Same Variable... or Into Different Variable...
5. Select a variable to recode from the variable list on the left and then click on the arrow
    located in the middle of the window. This defines the input variable.
6. If recoding into a different variable, enter the new variable name in the box under Name:,
    then choose Change. This defines the output variable.
7. Choose Old and New Values...
8. Choose Value or Range under Old Value and enter old value(s).
9. Choose New Value and enter new value, then choose Add.
10. Repeat the process until all old values have been redefined.
11. Choose Continue
12. Choose OK

After creating a new variable(s), you will probably want to save the new variable(s) by re-saving
your data using the Save command under File on the menu bar (see Saving Data as an SPSS
Data (.sav) File).

Example: Recoding a Categorical Variable

You can use the commands for recoding a variable to change the coding values of a
categorical variable. You may want to change a coding value for a particular category to
modify which category SPSS uses as the referent category in a statistical procedure. For
example, suppose you want to perform linear regression using the ANOVA (or General
Linear Model) commands, and one of your independent variables is smoking status, smoke,
that is coded 1 for never smoked, 2 for former smoker and 3 for current smoker. By
default SPSS will use current smoker as the referent category because current smoker
has the largest numerical (code) value. If you want never smoked to be the referent
category you need to recode the value for never smoked to a value larger than 3.

Although you can recode the smoking status into the same variable, it is better to recode
the variable into a new/different variable, newsmoke, so you do not lose your original data
if you make an error while recoding.


1. Display the Data Editor
   window
2. Choose Transform
3. Choose Recode
4. Choose Into Different
   Variables...
5. Select the variable smoke as
   the Input variable
6. Enter newsmoke as the name
   of the Output variable, and
   then choose Change.
7. Choose Old and New
   Values...



8. Choose Value under Old
    Value. (It may already be
    selected.)
9. Enter 1 (code for never
    smoker)
10. Choose Value under New
    Value. (It may already be
    selected.)
11. Enter 4 (or any value greater
    than 3)
12. Choose Add
13. Choose All Other Values
    under Old Value.
14. Choose Copy Old Value(s)
    under New Value.
15. Choose Add
16. Choose Continue
17. Choose OK
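The same recode can be sketched in syntax (this is what SPSS would roughly paste from the dialog choices above):

```spss
* Recode never smoked (1) to 4 so it becomes the referent category;
* former (2) and current (3) smoker codes are copied unchanged.
RECODE smoke (1=4) (ELSE=COPY) INTO newsmoke.
EXECUTE.
```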


  Remember to save your data set before exiting SPSS.


Example: Creating Indicator or Dummy Variables

You can use the commands for recoding a variable to create indicator or dummy variables
in SPSS. Suppose you have a variable indicating smoking status, smoke, that is coded 1 for
never smoked, 2 for former smoker and 3 for current smoker. To create three new
indicator or dummy variables for never, former and current smoking:

 1. Display the Data Editor
     window
 2. Choose Transform
 3. Choose Recode
 4. Choose Into Different
     Variables...
 5. Select the variable smoke
     as the Input variable
 6. Enter neversmoke as the
     name of the Output
     variable, and then choose
     Change.
 7. Choose Old and New
     Values...
 8. Choose Value under Old
     Value. (It may already be
     selected.)
 9. Enter 1 (code value for
     never smoker)
 10. Choose Value under New
     Value. (It may already be
     selected.)
 11. Enter 1 (to indicate never
     smoker)
 12. Choose Add
 13. Choose All Other Values
     under Old Value.
 14. Choose Value under New
     Value.
 15. Enter 0
 16. Choose Add
 17. Choose Continue
 18. Choose OK
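These steps correspond to syntax along these lines:

```spss
* Indicator for never smoker: 1 if smoke = 1, otherwise 0.
RECODE smoke (1=1) (ELSE=0) INTO neversmoke.
EXECUTE.
```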

Now, you have created a binary indicator variable for never smoker (coded 1 if never
smoker, 0 if former or current smoker). Next, create a binary indicator variable for
former smoker.



 1. Display the Data Editor
     window
 2. Choose Transform
 3. Choose Recode
 4. Choose Into Different
     Variables...
 5. Select the variable smoke
     as the Input variable
 6. Enter formersmoke as the
     name of the Output
     variable, and then choose
     Change. (Or change (edit)
     never to former, and then
     choose Change).
 7. Choose Old and New
     Values...
 8. Choose 1→1 under
     Old→New and then
     choose Remove.
 9. Choose Value under Old
     Value.
 10. Enter 2 (code value for
     former smoker)
 11. Choose Value under New
     Value.
 12. Enter 1 (to indicate former
     smoker)
 13. Choose Add
 14. Choose Continue
 15. Choose OK


Now, you have created a binary indicator variable for former smoker (coded 1 if former
smoker, 0 if never or current smoker). To create a binary indicator variable for current
smoker you would use commands similar to those for creating the indicator variable for
former smoker, except that now the value of 3 for smoke is coded as 1 and all other values
are coded as 0.
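The former and current smoker indicators can be sketched in syntax as follows (currentsmoke is a suggested name, not one used above):

```spss
* Indicator for former smoker: 1 if smoke = 2, otherwise 0.
RECODE smoke (2=1) (ELSE=0) INTO formersmoke.
* Indicator for current smoker: 1 if smoke = 3, otherwise 0.
RECODE smoke (3=1) (ELSE=0) INTO currentsmoke.
EXECUTE.
```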


Example: Creating a Categorical Variable From a Numerical Variable
You can use the commands for recoding a variable to create a categorical variable from a numerical
variable (i.e., group values of the numerical variable into categories). For example, suppose you have
a variable that is the number of pack years smoked, packyrs, and you want to create a categorical
variable with four categories: 0, >0 to 10, >10 to 30, and >30 pack years smoked.

 1.  Display the Data Editor window
 2.  Choose Transform
 3.  Choose Recode
 4.  Choose Into Different Variables...
 5.  Select the variable packyrs as the
     Input variable
 6. Enter a name for the new variable,
     packcat, for the Output variable, and
     then choose Change.
 7. Choose Old and New Values...
 8. Choose Value under Old Value. (It
     may already be selected.)
 9. Enter 0
 10. Choose Value under New Value.
 11. Enter 0 (to indicate 0 pack years)
 12. Choose Add
 13. Choose Range under Old Value.
 14. Enter 0.01 and 10 in the two blank
     boxes.
 15. Choose Value under New Value
 16. Enter 1 (to indicate >0 to 10 pack
     years)
 17. Choose Add


 18.   Choose Range under Old Value.
 19.   Enter 10.01 and 30 in the two blank boxes.
 20.   Choose Value under New Value
 21.   Enter 2 (to indicate >10 to 30 pack years)
 22.   Choose Add
 23.   Choose Range, value through HIGHEST under Old Value.
 24.   Enter 30.01 in the blank box.
 25.   Choose Value under New Value
 26.   Enter 3 (to indicate >30 pack years)
 27.   Choose Add
 28.   Choose Continue
 29.   Choose OK
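The grouping above corresponds to syntax such as:

```spss
* Group pack years into four categories:
* 0 = none, 1 = >0 to 10, 2 = >10 to 30, 3 = >30 pack years.
RECODE packyrs (0=0) (0.01 THRU 10=1) (10.01 THRU 30=2)
    (30.01 THRU HIGHEST=3) INTO packcat.
EXECUTE.
```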

Note that you may want to use different coding values depending on which category you want to
serve as the referent category in certain statistical procedures. Remember to save your data set
before exiting SPSS.


Summarizing Your Data
Frequency Tables (& Bar Charts) for Categorical Variables. To produce frequency tables
and bar charts for categorical variables:
1.   Choose Analyze from the menu bar
2.   Choose Descriptive Statistics
3.   Choose Frequencies…
4.   Variable(s): To select the variables you want from the source list on the left, highlight a
     variable by pointing and clicking the mouse and then click on the arrow located in the middle
     of the window. Repeat the process until you have selected all the variables you want.
5.   Choose Charts (Skip to step 7 if you do not want bar charts.)
6.   Choose Bar Chart(s)
7.   Choose Continue
8.   Choose OK
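As a sketch in syntax (smoke is an assumed variable name for smoking status):

```spss
* Frequency table and bar chart for smoking status.
FREQUENCIES VARIABLES=smoke
  /BARCHART.
```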

Example: Frequency table and bar chart for the categorical variable, smoking status.

In the Frequencies… dialog, Smoking status is the selected variable, and Bar charts has been
selected under Charts….

                      Frequency table and bar chart of smoking status

Smoking status
              Frequency     Percent     Valid Percent     Cumulative Percent
  never            590        59.0              59.0                   59.0
  former           293        29.3              29.3                   88.3
  current          117        11.7              11.7                  100.0
  Total           1000       100.0             100.0

[Bar chart of smoking status: percent (0 to 60%) for never, former and current.]


Contingency Tables for Categorical Variables. To produce contingency tables for categorical
variables:
1.   Choose Analyze from the menu bar.
2.   Choose Descriptive Statistics
3.   Choose Crosstabs...
4.   Row(s): Select the row variable you want from the source list on the left and then click on the
     arrow located next to the Row(s) box. Repeat the process until you have selected all the row
     variables you want.
5.   Column(s): Select the column variable you want from the source list on the left and then
     click on the arrow located next to the Column(s) box. Repeat the process until you have
     selected all the column variables you want.
6.   Choose Cells...
7.   Choose the cell values (e.g., observed counts; row, column, and margin (total) percentages).
     Note the option is selected when the little box is not empty.
8.   Choose Continue
9.   Choose OK
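A syntax sketch of this dialog (smoke and chd are assumed variable names):

```spss
* Contingency table of smoking status by incident CHD,
* showing observed counts and row percentages.
CROSSTABS
  /TABLES=smoke BY chd
  /CELLS=COUNT ROW.
```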

Example: Contingency table of smoking status by coronary heart disease (CHD).


In the Crosstabs… dialog, Smoking status is the row variable and CHD is the column variable.
Under Cells…, observed counts and row percentages have been selected for display.


                        Smoking status * Incident CHD Crosstabulation

                                                      Incident CHD
                                                      no       yes       Total
  Smoking   never     Count                          537        53         590
  status              % within Smoking status      91.0%      9.0%      100.0%
            former    Count                          257        36         293
                      % within Smoking status      87.7%     12.3%      100.0%
            current   Count                          106        11         117
                      % within Smoking status      90.6%      9.4%      100.0%
  Total               Count                          900       100        1000
                      % within Smoking status      90.0%     10.0%      100.0%


Descriptive Statistics (& Histograms) for Numerical Variables. To produce descriptive
statistics and histograms for numerical variables:

1.  Choose Analyze on the menu bar
2.  Choose Descriptive Statistics
3.  Choose Frequencies...
4.  Variable(s): To select the variables you want from the source list on the left, highlight a
    variable by pointing and clicking the mouse and then click on the arrow located in the middle
    of the window. Repeat the process until you have selected all the variables you want.
5. Choose Display frequency tables to turn off the option. Note that the option is turned off
    when the little box is empty.
6. Choose Statistics
7. Choose summary measures (e.g., mean, median, standard deviation, minimum, maximum,
    skewness or kurtosis).
8. Choose Continue
9. Choose Charts (Skip to step 11 if you do not want histograms.)
10. Choose Histograms(s)
11. Choose Continue
12. Choose OK

An alternate way to produce only the descriptive statistics is at step 3 to choose Descriptives...
instead of Frequencies..., then, select the variables you want. By default SPSS computes the
mean, standard deviation, minimum and maximum. Choose Options... to select other summary
measures.
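Both routes can be sketched in syntax (age is the variable used in the example below):

```spss
* Summary statistics and a histogram, with the frequency table suppressed.
FREQUENCIES VARIABLES=age
  /FORMAT=NOTABLE
  /STATISTICS=MEAN MEDIAN STDDEV MINIMUM MAXIMUM
  /HISTOGRAM.

* Alternate route: Descriptives (mean, SD, minimum and maximum by default).
DESCRIPTIVES VARIABLES=age.
```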


Example: Descriptive summaries and histogram for the numerical variable age.



     Age is the variable to summarize. You can
     select more than one variable to analyze.

     Remember to turn off the Display
     frequency tables option.


                                                                   Mean, standard
                                                                   deviation,
                                                                   minimum and
                                                                   maximum were
                                                                   selected under
                                                                   Statistics…, and
                                                                   histogram was
                                                                   selected under
                                                                   Charts…




Summaries for Age
Statistics: Age
  N Valid             1000
  N Missing              0
  Mean               72.14
  Std. Deviation     5.275
  Minimum               65
  Maximum               90



Histogram of Age

[Histogram of age (roughly 65 to 90 years) with frequency on the vertical axis;
mean = 72.14, std. dev. = 5.275, N = 1,000.]


Descriptive Statistics (& Boxplots) by Groups for Numerical Variables. To produce
descriptive statistics and boxplots by groups for numerical variables:

1.  Choose Analyze on the menu bar
2.  Choose Descriptive Statistics
3.  Choose Explore...
4.  Dependent List: To select the variables you want to summarize from the source list on the
    left, highlight a variable by pointing and clicking the mouse and then click on the arrow
    located next to the dependent list box. Repeat the process until you have selected all the
    variables you want.
5. Factor List: To select the variables you want to use to define the groups from the source list
    on the left, highlight a variable by pointing and clicking the mouse and then click on the
    arrow located next to the factor list box.
6. Choose Plots... (If you do not want boxplots, choose Statistics for the Display option and
    skip to Step 11.)
7. Choose Factor levels together from the Boxplot box.
8. Select Stem-and-leaf option from the Descriptive box to turn off the option.
9. Choose Continue
10. Choose Both for the Display option
11. Choose OK
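The Explore dialog corresponds to syntax roughly like the following (totchol and famhist are assumed variable names for the example below):

```spss
* Descriptive statistics and boxplots of total cholesterol
* by family history of heart attack; /NOTOTAL suppresses the
* overall (all-groups-combined) summary.
EXAMINE VARIABLES=totchol BY famhist
  /PLOT=BOXPLOT
  /STATISTICS=DESCRIPTIVES
  /NOTOTAL.
```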

Example: Total cholesterol by family history of heart attack (yes or no).


 In this example total cholesterol is
 the dependent variable. You can
 select more than one variable.

 Summaries will be computed for each
 group defined by family history of
 heart attack.

 Both numerical summaries
 (statistics) and plots are selected.


                             Under Statistics…
                             Descriptives is usually
                             selected by default.

                             Under Plots select
                             Boxplot option and
                             unselect stem-and-
                             leaf.

                                   Descriptives

Total cholesterol by family history of heart attack:

  no:   Mean 221.93 (std. error 1.417); 95% confidence interval for mean
        219.15 to 224.72; 5% trimmed mean 221.63; median 219.76;
        variance 1350.641; std. deviation 36.751; minimum 111; maximum 363;
        range 252; interquartile range 49; skewness .184 (std. error .094);
        kurtosis .363 (std. error .188)
  yes:  Mean 220.53 (std. error 2.150); 95% confidence interval for mean
        216.30 to 224.76; …

The Explore command by default produces a lot of different summaries, so you need to select
what to report. All summaries are shown for all groups; the table has been cropped in this
example.


                Boxplot of Total Cholesterol by Family History of Heart Attack

[Side-by-side boxplots of total cholesterol (roughly 100 to 400) for no and yes family history
of heart attack, with outliers flagged by case number.]


Using the Split File Option for Summaries by Groups for Categorical and Numerical
Variables. The Split File option in SPSS is a convenient way to produce summaries, graphs, and
run statistical procedures by groups. To activate the option:

1. Choose Data on the menu bar of the Data Editor window
2. Choose Split File
3. Choose Compare groups or Organize output by groups. The two options display the output
   differently. Try each option to see which works best for your needs.
4. Choose the variable that defines the groups.
5. Choose OK

Now, all the summaries, graphs, and statistical procedures you request will be done
(automatically) for each group. To turn off this option:

1.   Choose Data on the menu bar of the Data Editor window
2.   Choose Split File
3.   Choose Analyze all cases, do not create groups
4.   Choose OK
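In syntax, turning Split File on and off looks roughly like this (famhist is an assumed grouping variable; the file must be sorted by the grouping variable first):

```spss
SORT CASES BY famhist.
* Compare groups:
SPLIT FILE LAYERED BY famhist.
* ...or Organize output by groups:
* SPLIT FILE SEPARATE BY famhist.

* ... run summaries, graphs and statistical procedures here ...

* Analyze all cases, do not create groups:
SPLIT FILE OFF.
```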


Example. Use the Split File option to run summaries by family history of heart attack (yes
or no).


 The Compare groups option will try to
 display the results for each group
 side by side when feasible.



 The Organize output by groups option
 will display the results separately
 for each group, starting with the
 group with the lowest numerical
 code value.


Using the Select Cases Option for Summaries for a Subgroup of Subjects/Observations.
The Select Cases option in SPSS is a convenient way to produce summaries and run statistical
procedures for a subgroup of subjects or to temporarily exclude subjects from the analysis. To
activate this option:
1.   Choose Data on the menu bar of the Data Editor window
2.   Choose Select Cases…
3.   Choose If condition is satisfied
4.   Choose If…
5.   Enter the expression that indicates the subjects/observation you want to select.
6.   Choose Continue
7.   Choose OK
Now, all the summaries, graphs, and statistical procedures you request will be done using only
the selected subjects/observations. To turn off this option:
1.   Choose Data on the menu bar of the Data Editor window
2.   Choose Select Cases…
3.   Choose All cases
4.   Choose OK
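A syntax sketch of Select Cases with an If condition (lipid is the variable used in the example below; filter_$ is the filter variable SPSS itself creates when you paste from the dialog):

```spss
* Keep only subjects not on lipid-lowering medications (lipid = 0).
COMPUTE filter_$=(lipid = 0).
FILTER BY filter_$.
EXECUTE.

* ... run summaries and procedures on the selected cases here ...

* Turn the selection off again (All cases).
FILTER OFF.
USE ALL.
EXECUTE.
```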

Example: Select subjects not on lipid-lowering medications (i.e., subjects with lipid = 0,
indicating no medications).



 Select the If condition is satisfied and then If…




 Caution! Usually you do not want to delete
 observations from your dataset, so do not select
 this option.

     Typical expressions will involve
     combinations of the following symbols:

     Symbol      Definition
      =             equal
      ~=            not equal
      >=           greater than or equal
      <=           less than or equal
      >            greater than
      <            less than
      &            and
       |           or


Graphing Your Data
You can produce very fancy figures and graphs in SPSS. Producing fancy figures and graphs is
beyond the scope of this handout. Instructions on producing figures and graphs can be found in
SPSS Help under Topics → Contents → Chart Galleries, Standard Charts, and Chart Editor, as
well as in the SPSS Tutorials under Creating and Editing Charts. The commands for making
charts are located under Graphs (and then Legacy Dialogs, if using Version 15) on the menu bar;
the commands for making simple figures and graphs are relatively easy to use, and some
instruction is given below. The Interactive option under Graphs is another way to produce charts
in SPSS interactively, as well as fancier versions of the basic charts (e.g., 3-dimensional bar
charts).

Bar Charts

The easiest way to produce simple bar charts is to use the Bar Chart option with the
Frequencies... command. See Frequency Tables (& Bar Charts) for Categorical Variables. You
can produce only one bar chart at a time using the Bar command.
1. Choose Graphs (& then Legacy Dialogs, if Version 15) from the menu bar.
2. Choose Bar...
3. Choose Simple, Clustered, or Stacked
4. Choose what the data in the bar chart represent (e.g., summaries for groups of cases).
5. Choose Define
6. Select a variable from the variable list on the left and then click on the arrow next to the
   Category axis.
7. Choose what the bars represent (e.g., number of cases or percentage of cases)
8. Choose OK
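These dialog choices correspond to syntax such as the following (smoke and famhist are assumed variable names):

```spss
* Simple bar chart of percentages for smoking status.
GRAPH /BAR(SIMPLE)=PCT BY smoke.

* Clustered bar chart of smoking status by family history of heart attack.
GRAPH /BAR(GROUPED)=PCT BY smoke BY famhist.
```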

[Left: simple bar chart of smoking status (percent, 0 to 60%). Right: clustered bar chart of
smoking status (percent) by family history of heart attack (no, yes).]


Histograms
The easiest way to produce simple histograms is to use the Histogram option with the
Frequencies... command. See Descriptive Statistics (& Histograms) for Numerical Variables.
You can produce only one histogram at a time using the Histogram command.

 1. Choose Graphs (& then Legacy Dialogs, if Version 15) from the menu bar
 2. Choose Histogram...
 3. Select a variable from the variable list on the left and then click on the arrow in the
    middle of the window.
 4. Choose Display normal curve if you want a normal curve superimposed on the histogram.
 5. Choose OK

[Histogram of body mass index (roughly 10 to 50) with frequency on the vertical axis;
mean = 26.2366, std. dev. = 4.8667, N = 1,000.]
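A syntax sketch of the Histogram command (bmi is an assumed variable name for body mass index):

```spss
* Histogram of body mass index with a superimposed normal curve.
GRAPH /HISTOGRAM(NORMAL)=bmi.
```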
Boxplots

The easiest way to produce simple boxplots is to use the Boxplot option with the Explore...
command. See Descriptive Statistics (& Boxplots) By Groups for Numerical Variables.
You can produce only one boxplot at a time using the Boxplot command.

 1. Choose Graphs (& then Legacy Dialogs, if Version 15) from the menu bar.
 2. Choose Boxplot...
 3. Choose Simple or Clustered
 4. Choose what the data in the boxplots represent (e.g., summaries for groups of cases).
 5. Choose Define
 6. Select a variable from the variable list on the left and then click on the arrow next to
    the Variable box.
 7. Select the variable from the variable list that defines the groups and then click on the
    arrow next to Category Axis.
 8. Choose OK

[Boxplots of serum fasting glucose by ADA diabetes status (normal, impaired fasting glucose,
diabetic), with outliers flagged by case number.]
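These steps can be sketched in syntax (glucose and adacat are assumed variable names for serum fasting glucose and ADA diabetes status):

```spss
* Simple boxplots of serum fasting glucose by ADA diabetes status,
* with numerical summaries and the overall summary suppressed.
EXAMINE VARIABLES=glucose BY adacat
  /PLOT=BOXPLOT
  /STATISTICS=NONE
  /NOTOTAL.
```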


Normal Probability Plots. To produce Normal probability plots:

1. Choose Graphs (& then Legacy Dialogs, if Version 15) from the menu bar.
2. Choose Q-Q... to get a plot of the quantiles (Q-Q plot) or choose P-P... to get a plot of the
   cumulative proportions (P-P plot)
3. Select the variables from the source list on the left and then click on the arrow located in the
   middle of the window.
4. Choose Normal as the Test Distribution. The Normal distribution is the default Test
   Distribution. Other Test Distributions can be selected by clicking on the down arrow and
   clicking on the desired Test distribution.
5. Choose OK

SPSS will produce both a Normal probability plot and a detrended Normal probability plot for
each selected variable. Usually the Q-Q plot is the most useful for assessing if the distribution of
the variable is approximately Normal.
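In syntax, the Q-Q dialog looks roughly like this (glucose is an assumed variable name):

```spss
* Normal Q-Q plot (and detrended Q-Q plot) of serum fasting glucose.
PPLOT /VARIABLES=glucose
  /TYPE=Q-Q
  /DIST=NORMAL.
```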

[Normal Q-Q plots of serum fasting glucose and body mass index: expected normal values
plotted against observed values.]


Error Bar Plot. To produce an error bar plot of the mean of a numerical variable (or the means
for different groups of subjects):

1. Choose Graphs (& then Legacy Dialogs, if Version 15) from the menu bar.
2. Choose Error Bar...
3. Choose Simple or Clustered
4. Choose what the data in the error bars represent (e.g., summaries for groups of cases).
5. Choose Define
6. Select a variable from the variable list on the left and then click on the arrow next to the
   Variable box.
7. Select the variable from the variable list that defines the groups and then click on the arrow
   next to Category Axis.
8. Select what the bars represent (e.g., confidence interval, ±standard deviation, ±standard error
   of the mean)
9. Choose OK
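The quantities an error bar plot displays (here, the mean ± 2 standard deviations within each group) can be computed directly. A minimal Python sketch, using hypothetical group values:

```python
# Sketch: the quantities an SPSS error bar plot displays (mean +/- 2 SD per
# group), computed directly. Group names and values are hypothetical.
import statistics

groups = {
    "normal":   [90, 95, 100, 105, 110],
    "diabetic": [150, 170, 200, 230, 250],
}

for name, values in groups.items():
    mean = statistics.mean(values)
    sd = statistics.stdev(values)              # sample standard deviation
    low, high = mean - 2 * sd, mean + 2 * sd   # the error bar's endpoints
    print(f"{name}: mean={mean:.1f}, bar from {low:.1f} to {high:.1f}")
```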

[Figure: Error bar plot of Mean ± 2 SD Serum fasting glucose by ADA diabetes status (normal,
impaired fasting glucose, diabetic).]

A bar chart of the mean with error bars can be made using the commands for making a bar chart.

[Figure: Bar chart of Mean Serum fasting glucose by ADA diabetes status, with error bars of
± 2 SD.]


Scatter Plot. To produce a scatter plot between two numerical variables:

  1. Choose Graphs (& then Legacy Dialogs, if Version 15) on the menu bar.
  2. Choose Scatter/Dot...
  3. Choose Simple
  4. Choose Define
  5. Y Axis: Select the y variable you want from the source list on the left and then click on
     the arrow next to the y axis box.
  6. X Axis: Select the x variable you want from the source list on the left and then click on
     the arrow next to the x axis box.
  7. Choose Titles...
  8. Enter a title for the plot (e.g., y vs. x).
  9. Choose Continue
  10. Choose OK

[Figure: Scatter plot titled "HDL cholesterol vs BMI" of HDL cholesterol against Body mass
index.]
Adding a linear regression line to a scatter plot. To add a linear regression (least-squares) line
to a scatter plot of two numerical variables:

 1. While in the Viewer window double click on the scatter plot. The scatter plot should now
    be displayed in a window titled Chart Editor.
 2. Choose Elements.
 3. Choose Fit Line at Total. (A line should be added to the plot, because the next 2 steps are
    the default options.)
 4. Choose Linear (in the Properties window).
 5. Choose Apply (in the Properties window).

Additional options:

o Choose Mean under Confidence Intervals (in the Properties window) to add a confidence
    interval for the linear regression line to the scatter plot or
o Choose Individual under Confidence Intervals to add a prediction interval for individual
    observations to the scatter plot.

 6. Click on the ``X'' in the upper right hand corner of the Chart Editor window or choose File,
    and then Close to return to the Viewer window.

[Figure: Scatter plot of HDL cholesterol against Body mass index with a fitted least-squares
line; R Sq Linear = 0.121.]
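The least-squares line and the R Sq Linear value SPSS prints on the chart can be reproduced outside SPSS. A sketch in Python with numpy, on made-up BMI and HDL values (not the handout's data):

```python
# Sketch: the least-squares line and R-squared that SPSS's "Fit Line at
# Total" adds to a scatter plot, computed with numpy on hypothetical data.
import numpy as np

bmi = np.array([18.0, 22.0, 25.0, 28.0, 31.0, 35.0, 40.0])   # x (made up)
hdl = np.array([70.0, 62.0, 55.0, 58.0, 45.0, 48.0, 38.0])   # y (made up)

slope, intercept = np.polyfit(bmi, hdl, 1)       # least-squares fit
fitted = slope * bmi + intercept

# "R Sq Linear" on the SPSS chart is 1 - SS(residual) / SS(total).
ss_res = np.sum((hdl - fitted) ** 2)
ss_tot = np.sum((hdl - hdl.mean()) ** 2)
r_squared = 1 - ss_res / ss_tot
print(f"HDL = {slope:.2f} * BMI + {intercept:.2f}, R-squared = {r_squared:.3f}")
```

For a simple linear fit, R-squared equals the squared correlation between x and y, which makes a handy cross-check.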


Adding a Loess (scatter plot) smooth to a scatter plot. To add a Loess smooth to a scatter plot
of two numerical variables:

 1. While in the Viewer window double click on the scatter plot. The scatter plot should now
    be displayed in a window titled Chart Editor.
 2. Choose Elements.
 3. Choose Fit Line at Total.
 4. Choose Loess (in the Properties window). The default options for % of points to fit (50%)
    and kernel (Epanechnikov) are usually the most appropriate options.
 5. Choose Apply (in the Properties window). If a line was added to the plot in Step 3, it will
    be replaced by the loess smooth.
 6. Click on the ``X'' in the upper right hand corner of the Chart Editor window or choose
    File, and then Close to return to the Viewer window.

[Figure: Scatter plot of HDL cholesterol against Body mass index with a loess smooth.]
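The idea behind a loess smooth is a sequence of local weighted straight-line fits, one per point, each using only the nearest fraction of the data. The sketch below is a minimal illustration in Python with numpy; it uses tricube weights for simplicity (SPSS's default kernel is Epanechnikov) and made-up data, so it is a sketch of the idea rather than SPSS's exact algorithm:

```python
# Minimal loess-style smoother: at each x, fit a weighted straight line to
# the nearest fraction of the points. Tricube weights are used here for
# simplicity (SPSS's default kernel is Epanechnikov); data are hypothetical.
import numpy as np

def loess_smooth(x, y, frac=0.5):
    x, y = np.asarray(x, float), np.asarray(y, float)
    k = max(3, int(frac * len(x)))          # points used in each local fit
    smoothed = np.empty_like(x)
    for i, xi in enumerate(x):
        dist = np.abs(x - xi)
        idx = np.argsort(dist)[:k]          # k nearest neighbours of xi
        dmax = dist[idx].max()
        w = (1 - (dist[idx] / dmax) ** 3) ** 3 if dmax > 0 else np.ones(k)
        # weighted least-squares line through the neighbourhood
        b, a = np.polyfit(x[idx], y[idx], 1, w=np.sqrt(w))
        smoothed[i] = b * xi + a
    return smoothed

x = np.arange(10.0)
y = 2 * x + 1        # exactly linear data, so the smooth recovers the line
print(loess_smooth(x, y))
```

Because each local fit is a straight line, the smoother reproduces exactly linear data; on curved data it bends to follow the local trend.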
Stem-and-leaf Plot. To produce a stem-and-leaf plot:

 1. Choose Analyze on the menu bar
 2. Choose Descriptive Statistics
 3. Choose Explore...
 4. Dependent List: To select the variables you want from the source list on the left,
    highlight a variable by pointing and clicking the mouse and then click on the arrow
    located next to the dependent list box. Repeat the process until you have selected all the
    variables you want.
 5. Choose Plots...
 6. Choose Stem-and-leaf from the Descriptive box. Note the option may already be selected
    if the little box is not empty.
 7. Choose None from the Boxplot box
 8. Choose Continue
 9. Choose Plots for the Display option
 10. Choose OK

Severity of Illness Index Stem-and-Leaf Plot

 Frequency          Stem &   Leaf

      2.00        4      .   34
      7.00        4      .   6688899
     10.00        5      .   0001112344
      3.00        5      .   568
      1.00 Extremes          (>=62)

 Stem width:           10.00
 Each leaf:             1 case(s)
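The logic of a stem-and-leaf display is simple: with a stem width of 10, the stem is the tens digit and each leaf is a units digit. A short Python sketch on hypothetical scores:

```python
# Sketch of the stem-and-leaf logic: with stem width 10, the stem is the
# tens digit and each leaf is a units digit. The scores are hypothetical.
from collections import defaultdict

def stem_and_leaf(values, stem_width=10):
    stems = defaultdict(list)
    for v in sorted(values):
        stems[v // stem_width].append(v % stem_width)
    return dict(stems)

scores = [43, 44, 46, 48, 49, 50, 51, 53, 55, 58]
for stem, leaves in stem_and_leaf(scores).items():
    print(f"{stem} . {''.join(str(leaf) for leaf in leaves)}")
# prints:
# 4 . 34689
# 5 . 01358
```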


Hypothesis Tests & Confidence Intervals

One-Sample t Test
1. Choose Analyze from the menu bar.
2. Choose Compare Means
3. Choose One-Sample T Test...
4. Test Variable(s): Select the variable you want from the source list on the left, highlight
   variables by pointing and clicking the mouse and then click on the arrow located in the
   middle of the window.
5. Edit the Test Value. The Test Value is the value of the mean under the null hypothesis. The
   default value is zero.
6. Choose OK

Confidence Interval for a Mean (from one sample of data)
1. Choose Analyze from the menu bar.
2. Choose Compare Means
3. Choose One-Sample T Test...
4. Test Variable(s): Select the variable you want from the source list on the left, highlight
   variables by pointing and clicking the mouse and then click on the arrow located in the
   middle of the window.
5. The Test Value should be 0, which is the default value.
6. By default a 95% confidence interval will be computed. Choose Options… to change the
   confidence level.
7. Choose OK


SIDS Example. There were 48 SIDS cases in King County, Washington, during the years
1974 and 1975. The birth weights (in grams) of these 48 cases were:

       2466   3941    2807   3118    2098   3175
       3317   3742    3062   3033    2353   3515
       2013   3515    3260   2892    1616   4423             The mean (and standard
       2750   2807    2807   3005    3374   3572             deviation) of these
       2722   2495    3459   3374    1984   2495             measurements is 2891 (623)
       3005   2608    2353   4394    3232   3062             grams.
       2013   2551    2977   3118    2637   1503
       2722   2863    2013   3232    2863   2438

We want to know if the mean birth weight in the population of SIDS infants is different
from that of normal children, 3300 grams. We could construct a 95% confidence interval
to see if the interval contains the value of 3300 grams, or we could perform a one sample t
test to test if the mean in the SIDS population is equal to 3300 (versus not equal to 3300).
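The SPSS results on the following pages can be cross-checked outside SPSS. The sketch below runs the same one-sample t test and confidence interval in Python with scipy (assumed available; not part of the SPSS workflow), using the 48 birth weights listed above:

```python
# Cross-check of the SIDS example outside SPSS: one-sample t test against
# 3300 g and a 95% confidence interval for the mean birth weight.
import numpy as np
from scipy import stats

weights = np.array([
    2466, 3941, 2807, 3118, 2098, 3175, 3317, 3742, 3062, 3033, 2353, 3515,
    2013, 3515, 3260, 2892, 1616, 4423, 2750, 2807, 2807, 3005, 3374, 3572,
    2722, 2495, 3459, 3374, 1984, 2495, 3005, 2608, 2353, 4394, 3232, 3062,
    2013, 2551, 2977, 3118, 2637, 1503, 2722, 2863, 2013, 3232, 2863, 2438,
])

t, p = stats.ttest_1samp(weights, popmean=3300)   # Test Value = 3300
se = stats.sem(weights)                           # standard error of the mean
lo, hi = stats.t.interval(0.95, df=len(weights) - 1,
                          loc=weights.mean(), scale=se)
print(f"t = {t:.3f}, p = {p:.2e}")      # t = -4.544, as in the SPSS output
print(f"95% CI: {lo:.1f} to {hi:.1f}")  # about 2710 to 3072 grams
```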


To construct a 95% confidence interval

When computing the confidence interval for a mean, make sure the Test Value is 0.

                       One-Sample Statistics

                                                          Std. Error
                  N          Mean       Std. Deviation      Mean
birth weight          48   2891.1250        623.39177      89.97885

Number of subjects, mean, standard deviation, and standard error of the mean.

                                        One-Sample Test

                                               Test Value = 0
                                                                       95% Confidence Interval
                                                            Mean          of the Difference
                   t          df        Sig. (2-tailed)   Difference     Lower        Upper
birth weight      32.131           47              .000   2891.12500   2710.1109    3072.1391

The 95% confidence interval for the mean birth weight is 2710 to 3072 grams. Ignore the t test
results (t, df, sig.) because these results are for testing if the mean birth weight is equal to 0
(versus not equal to zero).


To perform a one sample t test to test if the mean in the SIDS population is equal
to 3300 versus not equal to 3300

To run the one-sample t test to test if the mean birth weight is equal to 3300, you need to
change the Test Value from the default value of 0 to 3300.

                                        One-Sample Statistics

                                                        Std. Error
                 N         Mean       Std. Deviation       Mean
birth weight        48   2891.1250        623.39177      89.97885

                                     One-Sample Test

                                          Test Value = 3300
                                                                         95% Confidence
                                                                          Interval of the
                                                                             Difference
                                                          Mean
                  t         df        Sig. (2-tailed)   Difference        Lower       Upper
birth weight    -4.544          47              .000    -408.87500      -589.8891   -227.8609

t = test statistic value = -4.544
df = degrees of freedom = 47
Sig. (2-tailed) = two tailed p-value = <.001

Ignore the results for the 95% confidence interval of the difference, because it is the
confidence interval for the mean minus 3300.


Paired t Test
1. Choose Analyze from the menu bar.
2. Choose Compare Means
3. Choose Paired-Samples T Test...
4. Paired Variable(s): Select two paired variables you want from the source list on the left,
   highlight both variables by pointing and clicking the mouse and then click on the arrow
   located in the middle of the window. Repeat the process until you have selected all the
   paired variables you want to test.
5. Choose OK


Confidence Interval for the Difference Between Means from a Paired Sample
By default a 95% confidence interval for the difference between the means of the paired samples
will be computed when performing a paired t test. Choose Options… to change the confidence level.



Prozac Example. To compare the effect of Prozac on anxiety, 10 subjects were given one
week of treatment with Prozac and one week of treatment with a placebo. The order of
the treatments was randomized for each subject. An anxiety questionnaire was used to
measure a subject's anxiety on a scale of 0 to 30. Higher scores indicate more anxiety.


                   Subject          Placebo          Prozac        Difference
                       1               22              19               3
                       2               18              11               7
                       3               17              14               3
                       4               19              17               2
                       5               22              23               -1
                       6               12              11               1
                       7               14              15               -1
                       8               11              19               -8
                       9               19              11               8
                      10                7               8               -1
              Mean difference, d  1.3
              Standard deviation, sd  4.5
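The paired t test results shown on the next page can be cross-checked outside SPSS. A sketch in Python with scipy (assumed available; not part of the SPSS workflow), using the scores above:

```python
# Cross-check of the Prozac example outside SPSS: paired t test on the
# placebo and Prozac anxiety scores.
from scipy import stats

placebo = [22, 18, 17, 19, 22, 12, 14, 11, 19, 7]
prozac  = [19, 11, 14, 17, 23, 11, 15, 19, 11, 8]

t, p = stats.ttest_rel(placebo, prozac)   # differences are placebo - prozac
print(f"t = {t:.3f}, df = {len(placebo) - 1}, p = {p:.2f}")
# matches the SPSS output: t = .904, df = 9, Sig. (2-tailed) = .39
```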


Paired t test and confidence interval for the difference between paired means.

The order of the variables in calculating the difference is determined by the order of the
variables in the data set (and not the order in which you select the variables).

                         Paired Samples Statistics

                                                   Std. Error
                      Mean      N   Std. Deviation   Mean
Pair 1   placebo    16.1000      10        4.95424   1.56667
         prozac     14.8000      10        4.68568   1.48174

Summaries for each sample of data (or variable).

                    Paired Samples Correlations

                                N      Correlation     Sig.
Pair 1   placebo & prozac           10         .556     .095

Correlation between the paired values - usually not useful.

                                 Paired Samples Test

                                  Paired Differences
                                   Std.      Std. Error  95% Confidence Interval           Sig. (2-
                       Mean      Deviation     Mean        of the Difference       t   df   tailed)
                                                           Lower      Upper
Pair 1   placebo
                      1.30000     4.54728     1.43798    -1.95293    4.55293    .904    9     .390
         - prozac

difference = placebo - prozac
mean difference = 1.3
standard deviation of the differences = 4.5
standard error of the differences = 1.4
95% confidence interval for the mean difference is -1.9 to 4.6

Paired t test:
t = test statistic value = .904
df = degrees of freedom = 9
Sig. (2 tailed) = two-sided p-value = 0.39


Two-Sample t Test
1.  Choose Analyze on the menu bar.
2.  Choose Compare Means
3.  Choose Independent-Samples T Test...
4.  Test Variable(s): Select the test variable you want from the source list on the left and then
    click on the arrow located next to the test variable box. Repeat the process until you have
    selected all the variables you want.
5. Grouping Variable: Select the variable which defines the groups and then click on the
    arrow located next to the grouping variable box.
6. Choose Define Groups...
7. Click on blank box next to Group 1, then enter the code value (numeric or
    character/string) for group 1.
8. Click on blank box next to Group 2, then enter the code value (numeric or
    character/string) for group 2.
9. Choose Continue
10. Choose OK


Confidence Interval for the Difference Between Means from Independent
Samples
By default a 95% confidence interval for the difference between means from two independent
samples will be computed when performing a two sample t test. Choose Options… to change the
confidence level.


Model Cities Example. Two groups of people were studied - those who had been randomly
allocated to a Fee-For-Service medical insurance group and those who had been randomly
allocated to a Prepaid insurance group.

We would like to compare the two groups on the quality of health care they received in
each group, but first we would like to know how comparable the groups are on other
characteristics that might affect medical outcome. For example, we would like to know if
the mean age in the two groups is similar. Hopefully, the process of random allocation
minimizes this possibility, but there is always a chance that it didn't.



     Group                           n           Mean              Standard
                                                                   deviation
     Prepaid (GHC)                 1167           24.0                15.3
     Fee-for-service (KCM)         3207           26.4                17.1

We could compare the average age between the two groups using a two sample t test or a
confidence interval for the difference between the average ages of the two groups.
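The two-sample t test results shown on the next pages can be cross-checked outside SPSS directly from the summary statistics in the table above. A sketch in Python with scipy (assumed available; not part of the SPSS workflow):

```python
# Cross-check of the Model Cities example outside SPSS, using only the
# summary statistics (mean, SD, n) for each insurance group.
from scipy import stats

# Prepaid (GHC): n=1167; Fee-for-service (KCM): n=3207 (full-precision
# means and SDs taken from the SPSS Group Statistics table).
t_eq, p_eq = stats.ttest_ind_from_stats(23.9846, 15.30787, 1167,
                                        26.3676, 17.10260, 3207,
                                        equal_var=True)
t_uneq, p_uneq = stats.ttest_ind_from_stats(23.9846, 15.30787, 1167,
                                            26.3676, 17.10260, 3207,
                                            equal_var=False)
print(f"equal variances:   t = {t_eq:.3f}, p = {p_eq:.4f}")
print(f"unequal variances: t = {t_uneq:.3f}, p = {p_uneq:.4f}")
# matches the SPSS output: t = -4.188 (equal var.), t = -4.410 (unequal var.)
```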




Two sample t test and 95% confidence interval for the difference between means
(from independent samples).



                                                           After you select the Grouping Variable,
                                                           SPSS will put in question marks to
                                                           prompt you to define the code values for
                                                           the two groups. Select Define Groups…
                                                           to enter the code values.




                                                         In this example the group codes
                                                         are numeric, 0 (for GHC) and 1 (for
                                                         KCM)




T-Test
                             Group Statistics

                                                               Std. Error         Summaries for each
        prov        N            Mean       Std. Deviation       Mean             sample/group.
age     GHC           1167       23.9846            15.30787      .44810
        KCM           3207       26.3676            17.10260      .30200



Independent Samples Test

                               Levene's Test for
                              Equality of Variances
                                 F           Sig.
age     Equal variances
                                47.068          .000
        assumed
        Equal variances
        not assumed

SPSS by default tests if the variances are equal using Levene’s test. A small p-value (sig.)
indicates the variances may be different.

F = test statistic value = 47.0
sig. = p-value = <.001

Independent Samples Test



                                               t-test for Equality of Means
                                                                        Mean        Std. Error
                             t           df         Sig. (2-tailed)   Difference    Difference
age     Equal variances
        assumed              -4.188        4372               .000       -2.38306       .56896
        Equal variances
        not assumed          -4.410   2293.698                .000       -2.38306       .54037




 Two Sample t test. SPSS by default always performs both versions of the two
 sample t test: assuming equal variances and assuming unequal variances.

 Sig. (2 – tailed) = two sided p-value = <.001 (equal var.), <.001 (unequal var.)

 t = test statistic value = -4.2 (equal var.), -4.4 (unequal var.)

 df = degrees of freedom = 4372 (equal var.), 2294 (unequal var.)

 mean difference = difference between means = -2.4 (equal and unequal var.)

 std. error difference = standard error of the difference between means = .6 (equal
 var.), .5 (unequal var.)


Independent Samples Test

                            95% Confidence
                             Interval of the
                               Difference
                           Lower       Upper
age     Equal variances
                           -3.49851    -1.26760
        assumed
        Equal variances
                           -3.44273    -1.32338
        not assumed

95% confidence interval for the difference between means is -3.4 to -1.3 (assuming equal
variances) and -3.4 to -1.3 (assuming unequal variances).


Sign Test and Wilcoxon Signed-Rank Test
1.   Choose Analyze from the menu bar.
2.   Choose Nonparametric Tests
3.   Choose 2 Related Samples...
4.   Test Pair(s) List: Select two paired variables you want from the source list on the left hand
     side, highlight both variables by pointing and clicking the mouse and then click on the arrow
     located in the middle of the window. Repeat the process until you have selected all the
     paired variables you want to test.
5.   Choose Sign and/or Wilcoxon as the Test Type.
6.   Choose OK




Aspirin Example. To compare 2 types of Aspirin, A and B, 1 hour urine samples were
collected from 10 people after each had taken either A or B. A week later the same
routine was followed after giving the “other” type to the same 10 people.

                       Person             Type A       Type B         Difference
                          1               15           13              2
                          2               26           20              6
                          3               13           10              3
                          4               28           21              7
                          5               17           17              0
                          6               20           22             -2
                          7                7            5              2
                          8               36           30              6
                          9               12            7              5
                         10               18           11              7

                                Mean =     19.2        15.6       3.6   = d
                 Standard deviation =      8.63        7.78       3.098 = sd


A Sign test or Wilcoxon Signed Rank test could be used to compare the two types of
Aspirin.
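The Sign test and Wilcoxon results shown on the next pages can be cross-checked outside SPSS. A sketch in Python with scipy (assumed available; not part of the SPSS workflow), using the urine measurements above:

```python
# Cross-check of the Aspirin example outside SPSS: an exact sign test via
# the Binomial distribution, and the Wilcoxon signed-rank test.
from scipy import stats

type_a = [15, 26, 13, 28, 17, 20, 7, 36, 12, 18]
type_b = [13, 20, 10, 21, 17, 22, 5, 30, 7, 11]

# Sign test: drop the tied pair, then count how often B > A (1 of 9 pairs)
# and compare that count to a Binomial(9, 0.5) distribution.
diffs = [b - a for a, b in zip(type_a, type_b) if a != b]
n_pos = sum(d > 0 for d in diffs)
sign_p = stats.binomtest(n_pos, n=len(diffs), p=0.5).pvalue
print(f"sign test: exact two-sided p = {sign_p:.3f}")   # .039, as in SPSS

# Wilcoxon signed-rank test on the same pairs (SPSS reports
# Asymp. Sig. (2-tailed) = .015 for these data).
res = stats.wilcoxon(type_b, type_a)
print(f"Wilcoxon signed-rank: p = {res.pvalue:.3f}")
```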



                                                                              The order of the variables in
                                                                              calculating the difference is
                                                                              determined by the order of the
                                                                              variables in the data set (and
                                                                              not the order in which you
                                                                              select the variables).

                                                                              Select Wilcoxon or Sign (or
                                                                              both)

 Under Options you can select the summaries Descriptive (n, mean, etc.) and
 Quartiles (median, 25th and 75th percentiles).

                                                  Descriptive Statistics

                                                                                                  Percentiles
                  N               Mean      Std. Deviation    Minimum      Maximum    25th      50th (Median)        75th
 aspirina                10       19.2000         8.62554          7.00       36.00   12.7500         17.5000        26.5000
 aspirinb                10       15.6000         7.77746          5.00       30.00    9.2500         15.0000        21.2500



Sign Test
                              Frequencies

                                                        N
 aspirinb - aspirina      Negative Differences(a)           8
                          Positive Differences(b)           1
                          Ties(c)                           1
                          Total                            10
a aspirinb < aspirina
b aspirinb > aspirina
c aspirinb = aspirina

             Test Statistics(b)

                              aspirinb -
                               aspirina
 Exact Sig. (2-tailed)      .039(a)
a Binomial distribution used.
b Sign Test

Exact sig. (2-tailed) = exact, two-sided p-value = 0.039. The p-value is exact because it is
computed using the Binomial distribution instead of using an approximation to the Normal
distribution.

                                                                                       Information
Wilcoxon Signed Ranks Test
                                           Ranks                                       used in the
                                                                                       test statistic
                                              N           Mean Rank    Sum of Ranks    – not usually
 aspirinb - aspirina      Negative Ranks           8(a)         5.38           43.00   reported; use
                          Positive Ranks           1(b)         2.00            2.00
                                                                                       the previous
                          Ties                     1(c)
                          Total                                                        descriptives.
                                                    10
a aspirinb < aspirina
b aspirinb > aspirina
c aspirinb = aspirina

             Test Statistics(b)

                              aspirinb -
                               aspirina
 Z                             -2.442(a)
 Asymp. Sig. (2-tailed)            .015
a Based on positive ranks.
b Wilcoxon Signed Ranks Test

Wilcoxon Signed Rank Test

Asymp. Sig. (2-tailed) = two-sided p-value = 0.015

Asymp. is an abbreviation for asymptotic, which means the p-value is
computed using a large sample approximation based on the Normal distribution.
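As described in the syntax section of this handout, the same tests can be run from a
syntax (.sps) file. A minimal sketch for the aspirin example, using the aspirina and
aspirinb variable names shown in the output above:

```spss
* Sign test and Wilcoxon signed ranks test for the paired aspirin data.
NPAR TESTS
  /SIGN = aspirina WITH aspirinb (PAIRED)
  /WILCOXON = aspirina WITH aspirinb (PAIRED).
```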



Mann-Whitney U Test (or Wilcoxon Rank Sum Test)
1.  Choose Analyze on the menu bar.
2.  Choose Nonparametric Tests
3.  Choose 2 Independent Samples...
4.  Test Variable(s): Select the test variable you want from the source list on the left and then
    click on the arrow located next to the test variable box. Repeat the process until you have
    selected all the variables you want.
5. Grouping Variable: Select the variable which defines the grouping and then click on the
    arrow located next to the grouping variable box. The grouping variable must be numeric for
    the variable to appear on the left hand side.
6. Choose Define Groups...
7. Click on the blank box next to group 1, then enter the code value (it must be numeric) for
    group 1.
8. Click on the blank box next to group 2, then enter the code value (it must be numeric) for
    group 2.
9. Choose Continue to return to Two Independent Samples dialog box.
10. Choose Mann-Whitney U as the Test Type. Note that the option may already be selected if
    the little box is not empty.
11. Choose OK


Legionnaires Example. During July and August, 1976, a large number of Legionnaires
attending a convention died of a mysterious, unknown cause. Chen et al. (1977) examined
the hypothesis of nickel contamination as a toxin. They examined the nickel levels in the
lungs of nine cases and nine controls. There was no attempt to match cases and controls.
The data are as follows (μg/100g dry weight):

Legionnaire cases 65 24 52 86 120 82 399 87 139
Controls          12 10 31 6 5 5 29 9 12

The Mann Whitney U test could be used to compare the two groups.
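The menu steps above, applied to this example, correspond to the following syntax
sketch (nickel is the test variable and group is the grouping variable, coded 1 for
cases and 2 for controls, as in the output below):

```spss
* Mann-Whitney U test comparing nickel levels between cases and controls.
NPAR TESTS
  /M-W = nickel BY group(1 2).
```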


                                                        After you select the Grouping
                                                        Variable, SPSS will put in question
                                                        marks to prompt you to define the
                                                        code values for the two groups.
                                                        Select Define Groups… to enter the
                                                        code values.

                                                        Note: The codes must be numeric,
                                                        otherwise the grouping variable will
                                                        not appear on the left hand side.



                                                     In this example the group codes are
                                                     1 for legionnaires and 2 for controls.



Mann-Whitney Test
                                   Ranks

          group            N          Mean Rank       Sum of Ranks
 nickel   1                    9             13.78           124.00
          2                    9              5.22            47.00
          Total               18

Information used in the test statistic – not usually reported. The
descriptives under Options are not useful; you can produce relevant
descriptives (e.g., median and interquartile range for each group) using
the Explore command.

              Test Statistics(b)

                               nickel
 Mann-Whitney U                 2.000
 Wilcoxon W                    47.000
 Z                             -3.403
 Asymp. Sig. (2-tailed)          .001
 Exact Sig. [2*(1-tailed
 Sig.)]                       .000(a)
a Not corrected for ties.
b Grouping Variable: group

Mann-Whitney test

Asymp. Sig. (2-tailed) = two-sided p-value = 0.001

This p-value is computed based on a large sample approximation to the
Normal distribution and it corrects for ties in the data, if present.

Exact Sig. [2*(1-tailed Sig.)] = two-sided p-value = <.001

This p-value is an exact p-value, but it does not correct for ties in the
data, if present.

In this example, given the small sample sizes and few ties in the data,
the exact p-value would be appropriate to report.


One-way ANOVA (Analysis of Variance) (E.g., to compare two or more means
from two or more independent samples)
1. Choose Analyze on the menu bar
2. Choose Compare Means
3. Choose One-Way ANOVA...
4. Dependent: Select the variable from the source list on the left that you want to use to
   compare the groups and then click on the arrow next to the dependent variable box. You can
   run multiple one-way ANOVAs by selecting more than one dependent variable.
5. Factor: Select the variable from the source list on the left which defines the groups.
6. Choose OK

To perform pairwise comparisons to determine which groups are different while controlling for
multiple testing use the Post Hoc... option. There are many methods to choose from (e.g.,
Bonferroni and R-E-G-W-Q).

Other useful options can be found under Options... For example, choose Descriptive to get
descriptive statistics for each group (e.g., mean, standard deviation, minimum value, and
maximum value). Choose Homogeneity-of-variance to perform the Levene Test to test if the
group variances are all equal versus not all equal. A small p-value for the Levene's Test may
indicate that the variances are not all equal.
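For reference, a syntax sketch of this analysis (the variable names hdl and htn are
placeholders for the HDL cholesterol and hypertension status variables in your data file):

```spss
* One-way ANOVA of HDL cholesterol by hypertension status, with
* descriptives, Levene's test, and Bonferroni and R-E-G-W-Q post hoc tests.
ONEWAY hdl BY htn
  /STATISTICS = DESCRIPTIVES HOMOGENEITY
  /POSTHOC = BONFERRONI QREGW ALPHA(.05).
```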

CHD Example. We can use one-way ANOVA to compare HDL levels between subjects with
different hypertensive status (0=normotensive, 1=borderline, 2=definite)

                    Hypertensive                                      Standard
                       Group                    n         Mean        Deviation
                    Normotensive              1568        55.8          15.5
                     Borderline               547         55.7          16.2
                      Definite                1310        53.5          15.2




                                                      You can select 1 or more variables to
                                                      compare between groups.



                                                      The variable selected as the Factor
                                                      defines the groups. The variable can
                                                      be numeric or character/string.


Oneway
                                         ANOVA

HDL cholesterol
                      Sum of
                      Squares       df        Mean Square         F           Sig.
 Between Groups        4344.834           2       2172.417        9.045          .000
 Within Groups     821904.577        3422          240.183
 Total             826249.411        3424


  One-way analysis of variance

  Sig. = p-value = <.001

  F = test statistic = 9.0; df = degrees of freedom

  Sometimes the test statistic and degrees of freedom of the test statistics are
  reported along with the p-value; in this example, F=9.0 with degrees of freedom 2
  and 3422. Sum of squares and mean square are used to compute the test statistic;
  they are usually not reported.



Descriptives
                      Under Options you can request Descriptives for each group to be
                      computed. This information can be used to describe the differences
                      between the groups.

HDL cholesterol
                                    Std.       Std.     95% Confidence Interval for Mean
                  N      Mean     Deviation    Error    Lower Bound    Upper Bound         Minimum   Maximum
 normotensive     1568    55.82      15.500      .391          55.05          56.59             21       138
 borderline        547    55.67      16.202      .693          54.30          57.03             24       149
 definite         1310    53.47      15.192      .420          52.64          54.29             15       129
 Total            3425    54.90      15.534      .265          54.38          55.42             15       149


Post Hoc Tests
Under Post Hoc… you can request further comparisons between each possible pair of
groups to determine which groups differ from each other. These are multiple
comparison procedures, which control for the number of tests/comparisons being
performed. There are many methods to choose from; below is an example of the
Bonferroni method and the Ryan-Einot-Gabriel-Welsch method.

Multiple Comparisons

Dependent Variable: HDL cholesterol
             (I)              (J)                      Mean
             Hypertension     Hypertension           Difference       Std.
             status           status                     (I-J)        Error       Sig.          95% Confidence Interval

                                                                                              Lower Bound    Upper Bound
 Bonferroni    normotensive        borderline               .157        .770      1.000              -1.69           2.00
                                   definite              2.356(*)       .580       .000                .97           3.74
               borderline          normotensive             -.157       .770      1.000              -2.00           1.69
                                   definite               2.198(*)      .789        .016               .31           4.09
               definite            normotensive          -2.356(*)      .580        .000             -3.74           -.97
                                   borderline            -2.198(*)      .789        .016             -4.09           -.31
* The mean difference is significant at the .05 level.

The Bonferroni method shows all pairwise comparisons/differences along with a
p-value (Sig.) adjusted for the number of comparisons. In this example, subjects
with normal blood pressure and borderline hypertension have similar HDL cholesterol
levels, but subjects with definite hypertension have different HDL cholesterol levels
than both subjects with normal blood pressure and subjects with borderline hypertension.

Homogeneous Subsets
                                         HDL cholesterol

                                                                      Subset for alpha = .05
                            Hypertension status             N            1               2
 Ryan-Einot-Gabriel-        definite                         1310         53.47
 Welsch Range               borderline                          547                      55.67
                            normotensive                     1568                        55.82
                            Sig.                                          1.000              .867
Means for groups in homogeneous subsets are displayed.

The Ryan-Einot-Gabriel-Welsch (R-E-G-W-Q) method places groups that are similar in
the same subset and groups that are different in different subsets. In this example,
subjects with normal blood pressure and borderline hypertension are in one subset
and subjects with definite hypertension are in a different subset. Hence, subjects
with definite hypertension have different HDL cholesterol levels than subjects with
normal blood pressure and borderline hypertension, but subjects with normal blood
pressure and borderline hypertension have similar HDL cholesterol levels.



Kruskal-Wallis Test
1.  Choose Analyze on the menu bar.
2.  Choose Nonparametric Tests
3.  Choose K Independent Samples...
4.  Test Variable(s): Select the test variable you want from the source list on the left and then
    click on the arrow located next to the test variable box. Repeat the process until you have
    selected all the variables you want to test.
5. Grouping Variable: Select the variable which defines the grouping and then click on the
    arrow located next to the grouping variable box.
6. Choose Define Range...
7. Click on the blank box next to Minimum, then enter the smallest numeric code value for
    the groups.
8. Click on the blank box next to Maximum, then enter the largest numeric code value for the
    groups.
9. Choose Continue
10. Choose Kruskal-Wallis H as the Test Type. Note that the option may already be selected if
    the little box is not empty.
11. Choose OK

CAUTION: The group variable must be numeric and you must correctly enter the smallest
numeric code value and the largest numeric code value. SPSS will allow you to select a
character/string variable as the grouping variable, as well as allow you to incorrectly enter the
numeric code values. The results displayed for the Kruskal Wallis test in these cases will be
incorrect, but no error or warning message will be displayed.
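A syntax sketch of this test (insulin and htn are placeholder names for the serum
insulin and hypertension status variables; the range in parentheses must match the
smallest and largest group codes):

```spss
* Kruskal-Wallis test of serum insulin across the three hypertension groups.
NPAR TESTS
  /K-W = insulin BY htn(0 2).
```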

CHD Example. We can use the Kruskal-Wallis test to compare serum insulin levels between
subjects with different hypertensive status (0=normotensive, 1=borderline, 2=definite)

                     Hypertensive
                        Group                     n        Median         IQR*
                     Normotensive               1568        12            9, 15
                      Borderline                547         12            9, 17
                       Definite                 1310        14            11, 20

               *IQR, interquartile range = 25th percentile, 75th percentile


Kruskal Wallis test

                                                              You can select 1 or more
                                                              variables to compare between
                                                              groups.

                                                              The variable selected as the
                                                              Grouping Variable defines the
                                                              groups. THE VARIABLE
                                                              SHOULD BE NUMERIC.



                                            In this example the smallest numeric
                                            code is 0 (for normal) and the largest
                                            numeric code is 2 (for definite).




Kruskal-Wallis Test
                                    Ranks

                  Hypertension status        N         Mean Rank
 Serum insulin    normotensive               1568        1526.31
                  borderline                  547        1685.28
                  definite                   1310        1948.03
                  Total                      3425

Information used in the test statistic – not usually reported. The
descriptives under Options are not useful; you can produce relevant
descriptives (e.g., median and interquartile range for each group) using
the Explore command.

                 Test Statistics(a,b)

                  Serum insulin
 Chi-Square           130.816
 df                         2
 Asymp. Sig.             .000
a Kruskal Wallis Test
b Grouping Variable: Hypertension status

Kruskal-Wallis test

Asymp. Sig. = p-value = <.001

Asymp. is an abbreviation for asymptotic, which means the p-value is
computed using a large sample approximation based on the Normal distribution.

Chi-Square = test statistic value = 130.8

df = degrees of freedom = 2


One-Sample Binomial Test
1. Choose Analyze from the menu bar.
2. Choose Nonparametric Tests
3. Choose Binomial...
4. Test Variable List: Select the test variable you want from the source list on the left and then
   click on the arrow located next to the test variable box. Repeat the process until you have
   selected all the variables you want.
5. Test Proportion: Click on the box next to Test Proportion and enter/edit the proportion
   value specified by your null hypothesis.
6. Choose OK

Example. In the TRAP study, 125 of the 527 patients who were negative for
lymphocytotoxic antibodies at baseline became antibody positive. The expected rate for
becoming antibody positive is 30%. We could use the one-sample binomial test to test
whether the rate is different in the TRAP study population.
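A syntax sketch of this test, using the 0/1 variable positive described below:

```spss
* One-sample binomial test of the null hypothesis that the rate is .30.
NPAR TESTS
  /BINOMIAL (.30) = positive.
```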



                                                                                            Positive is a variable
                                                                                            coded 1 if positive and 0
                                                                                            if negative.



Make sure to edit the
test proportion value.
In this case it is .30
or 30%. The default
is .50.




NPar Tests
                                            Binomial Test

                                                       Observed                         Asymp. Sig.
                          Category          N           Prop.        Test Prop.          (1-tailed)
 positive    Group 1     yes                    125           .24            .3             .001(a,b)
             Group 2     no                     402            .76
             Total                            527             1.0
a Alternative hypothesis states that the proportion of cases in the first group < .3.
b Based on Z Approximation.



  One-sample binomial test, two-sided p-value given by 2 x .001 = .002
  (Note: SPSS reports the one-sided p-value).


McNemar's Test
1.  Choose Analyze from the menu bar.
2.  Choose Descriptive Statistics
3.  Choose Crosstabs...
4.  Row(s): Select the row variable you want from the source list on the left and then click on
    the arrow located next to the Row(s) box. Repeat the process until you have selected all the
    row variables you want.
5. Column(s): Select the column variable you want from the source list on the left and then
    click on the arrow located next to the Column(s) box. Repeat the process until you have
    selected all the column variables you want.
6. Choose Cells...
7. For cell values choose total under percentages.
8. Choose Continue
9. Choose Statistics...
10. Choose McNemar
11. Choose Continue
12. Choose OK

There is also another way to run McNemar’s test (but the test pair variables must be numeric,
and an asymptotic (Asymp.) p-value, based on a large sample approximation to the Normal
distribution, is reported instead of a p-value based on exact methods).
1. Choose Analyze from the menu bar.
2. Choose Nonparametric Tests
3. Choose 2 Related Samples...
4. Test Pair(s) List: Select two paired variables you want from the source list on the left,
   highlight both variables by pointing and clicking the mouse and then click on the arrow
   located in the middle of the window. Repeat the process until you have selected all the
   paired variables you want.
5. Choose McNemar as the Test Type.
6. Choose Wilcoxon to turn off the option. Note that the option is turned off when the little box
   is empty.
7. Choose OK
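Syntax sketches of the two approaches (treatmenta and treatmentb stand for the paired
variables, named as in the example below):

```spss
* McNemar's test via Crosstabs (exact, Binomial-based p-value).
CROSSTABS
  /TABLES = treatmenta BY treatmentb
  /CELLS = COUNT TOTAL
  /STATISTICS = MCNEMAR.
* McNemar's test via Nonparametric Tests (asymptotic p-value;
* the paired variables must be numeric).
NPAR TESTS
  /MCNEMAR = treatmenta WITH treatmentb (PAIRED).
```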

Example. Suppose we want to compare two different treatments for a rare form of
cancer. Since relatively few cases of this disease are seen, we want the two treatment
groups to be as comparable as possible. To accomplish this goal, we set up a matched study
such that a random member of each matched pair gets treatment A (chemotherapy),
whereas the other member gets treatment B (surgery). The patients are assigned to pairs
(621 pairs) matched on age (within 5 years), sex, and clinical condition. The patients are
followed for 5 years, with survival as the outcome variable.

The 5-year survival rate for treatment A is 17.1% (106/621) and for treatment B is 15.3%
(95/621). We could use McNemar’s test to compare the survival rate of the two
treatments.


McNemar’s test
It doesn’t matter for McNemar’s
test which variable is selected for
the Row(s): or Column(s):. You can
run more than one test at a time.




                                                           Under
                                                           Statistics…
                                                           select
                                                           McNemar.

                                                           Under Cells…,
                                                           in this
                                                           example,
                                                           select Total
                                                           percentages.


Crosstabs
                    TreatmentA * TreatmentB Crosstabulation

                                                    TreatmentB              Total
                                               died          survived
 TreatmentA     died        Count                  510                5       515
                            % of Total           82.1%             .8%      82.9%
                survived    Count                   16               90       106
                            % of Total            2.6%            14.5%     17.1%
 Total                      Count                  526               95       621
                            % of Total           84.7%            15.3%    100.0%

Survival rate for Treatment A is 17.1%
Survival rate for Treatment B is 15.3%

                Chi-Square Tests

                                      Exact Sig.
                        Value         (2-sided)
 McNemar Test                           .027(a)
 N of Valid Cases         621
a Binomial distribution used.

McNemar’s test

Exact Sig. (2-sided) = exact two-sided p-value = 0.027

The p-value is exact because it is computed using the Binomial distribution
instead of using an approximation to the Normal distribution.


Chi-square Test, Fisher’s Exact test and Trend test for Contingency Tables
If the Chi-square test is requested for a 2 x 2 table, SPSS will also compute the Fisher's Exact
test. If the Chi-square test is requested for a table larger than 2 x 2, SPSS will also compute the
Mantel-Haenszel test for linear or linear by linear association between the row and column
variables.

1.  Choose Analyze from the menu bar.
2.  Choose Descriptive Statistics
3.  Choose Crosstabs...
4.  Row(s): Select the row variable you want from the source list on the left and then click on
    the arrow located next to the Row(s) box. Repeat the process until you have selected all the
    row variables you want.
5. Column(s): Select the column variable you want from the source list on the left and then
    click on the arrow located next to the Column(s) box. Repeat the process until you have
    selected all the column variables you want.
6. Choose Cells...
7. Choose the cell values (e.g., observed and expected counts; row, column, and margin (total)
    percentages). Note the option is selected when the little box is not empty.
8. Choose Continue
9. Choose Statistics...
10. Choose Chi-square
11. Choose Continue
12. Choose OK
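A syntax sketch for the asthma example that follows (SPSS adds Fisher's Exact test
automatically because the table is 2 x 2):

```spss
* Chi-square test for the familytype by asthma table, with row percentages.
CROSSTABS
  /TABLES = familytype BY asthma
  /CELLS = COUNT ROW
  /STATISTICS = CHISQ.
```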

Asthma Example. An investigator studied the relationship of parental smoking habits and
the presence of asthma in the oldest child. Type A families are defined as those in which
both parents smoke and Type B families are those in which neither parent smokes. Of 100
type A families, 15 eldest children have asthma, and of 200 type B families, 6 children
have asthma. We could use a chi-square test or Fisher’s exact test to test whether the
proportion of first-born children with asthma differs between these two types of families.



It doesn’t matter for the chi-square,
Fisher’s Exact or trend test which
variable is selected for the Row(s): or
Column(s):. You can run more than one
test at a time.



                                                         Under
                                                         Statistics…
                                                         select Chi-
                                                         square.

                                                         Under Cells…,
                                                         in this
                                                         example,
                                                         select Row
                                                         percentages.

Crosstabs
                        familytype * asthma Crosstabulation

                                                    asthma            Total
                                                  No        Yes
 familytype   A        Count                        85         15        100
                       % within familytype       85.0%      15.0%     100.0%
              B        Count                       194          6        200
                       % within familytype       97.0%       3.0%     100.0%
 Total                 Count                       279         21        300
                       % within familytype       93.0%       7.0%     100.0%

15% of first-born children in family type A have asthma
3% of first-born children in family type B have asthma

Chi-Square Tests

                                          Asymp. Sig.   Exact Sig.   Exact Sig.
                           Value     df    (2-sided)    (2-sided)    (1-sided)
 Pearson Chi-Square      14.747(b)    1         .000
 Continuity
 Correction(a)            12.961      1         .000
 Likelihood Ratio         13.745      1         .000
 Fisher's Exact Test                                         .000         .000
 N of Valid Cases            300
a Computed only for a 2x2 table
b 0 cells (.0%) have expected count less than 5. The minimum expected count is 7.00.

Fisher’s Exact test

Exact Sig. (2-sided) = exact two-sided p-value = <.001


Chi-square test
      Pearson Chi-square (without continuity correction), p-value = <.001
      Pearson Chi-square with continuity correction, p-value = <.001

Asymp. Sig. (2-sided) = two-sided p-value. Asymp. is an abbreviation for asymptotic, which
means the p-value is computed using a large sample approximation based on the Normal
distribution. Check that all cells have expected cell counts 5 or greater.

Value = test statistic value
df = degrees of freedom


Trend Test Example. A clinical trial of a drug therapy to control pain was
performed. The investigators wanted to investigate whether adverse responses to
the drug increased with larger drug doses. Subjects received either a placebo or
one of four drug doses. In this example dose is an ordinal variable, and it is
reasonable to expect that as the dose increases the rate of adverse events will
increase.
                                                                       Adverse event
                               Dose                       n               % (n)
                                Placebo                  32           18.8% (6)
                                500 mg                   32           21.9% (7)
                                1000 mg                  32           28.1% (9)
                                2000 mg                  32           31.3% (10)
                                4000 mg                  32           50.0% (16)

There are several different methods for performing a trend test with ordinal
variables. One test available in SPSS is the Mantel-Haenszel chi-square, also
called the Mantel-Haenszel test for linear association or the linear-by-linear
association chi-square test.


                                          Adverse events
                                          No             Yes           Total
 dose    0          Count                       26             6               32
                    % within dose         81.3%           18.8%            100.0%
         500        Count                       25             7               32
                    % within dose         78.1%           21.9%            100.0%
         1000       Count                       23             9               32
                    % within dose         71.9%           28.1%            100.0%
         2000       Count                       22             10              32
                    % within dose         68.8%           31.3%            100.0%
         4000       Count                       16             16              32
                    % within dose         50.0%           50.0%            100.0%
 Total              Count                      112             48             160
                    % within dose         70.0%           30.0%            100.0%

                            Chi-Square Tests

                                                  Asymp. Sig.
                               Value       df      (2-sided)
 Pearson Chi-Square           9.107(a)      4          .058
 Likelihood Ratio             8.836         4          .065
 Linear-by-Linear
                              8.876         1          .003
 Association
 N of Valid Cases               160
a 0 cells (.0%) have expected count less than 5. The minimum expected count is 9.60.

In this example, there is a significant trend (p-value = 0.003, chi-square trend
test), and we would conclude that the rate of adverse responses increases with
drug dose.
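If you want to verify this result outside SPSS, the linear-by-linear association statistic can be computed directly as (N - 1) x r-squared, where r is the Pearson correlation between the dose scores and the 0/1 adverse-event indicator. A minimal sketch in Python (NumPy and SciPy assumed available; dose is scored with its mg value, 0 for placebo, as in the table above):

```python
import numpy as np
from scipy.stats import chi2

# Dose scores and adverse-event counts from the table above (n = 32 per group).
doses = [0, 500, 1000, 2000, 4000]
events = [6, 7, 9, 10, 16]
n_per_group = 32

# Expand to one row per subject: dose score and 0/1 event indicator.
x = np.repeat(doses, n_per_group)
y = np.concatenate([[1] * e + [0] * (n_per_group - e) for e in events])

# Linear-by-linear association statistic: (N - 1) * r^2, with 1 df.
r = np.corrcoef(x, y)[0, 1]
m2 = (len(x) - 1) * r ** 2
p = chi2.sf(m2, df=1)

print(round(m2, 3), round(p, 3))  # 8.876 0.003, matching the SPSS output
```

The statistic agrees with the Linear-by-Linear Association row of the SPSS table.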
                                                                                                                 58


Using Standardized Residuals in R x C tables. When the contingency table has
more than 2 rows and 2 columns it can be hard to determine the nature of the
association or where the largest differences lie. Standardized residuals are often
helpful in describing the association, if the chi-square test indicates there is a
statistically significant association. The (adjusted) standardized residual
re-expresses the difference between the observed cell count and expected cell
count in standard deviation units below or above 0 (the expected difference if
there is no association), and the standardized residuals have approximately a
standard Normal distribution. Hence, values less than -2 or greater than 2
indicate large differences and values less than -3 or greater than 3 indicate very
large differences.


                                                      Under Cells…, select Adjusted
                                                      standardized for Residuals




Education vs Stage of Disease at Diagnosis Example. The chi-square test indicated a
significant association between education level and stage of disease at diagnosis
(Chi-square test, p-value = 0.016).

                                                 Stage of Disease
 Education                                   I           II      III or IV
   ≤12 years          Count                     20          24          35
                      % within education    25.3%       30.4%       44.3%
                      Adjusted Residual      -2.6         -.5         3.3
   College            Count                     37          32          23
                      % within education    40.2%       34.8%       25.0%
                      Adjusted Residual        .8          .6        -1.4
   College graduate   Count                     40          29          21
                      % within education    44.4%       32.2%       23.3%
                      Adjusted Residual       1.8         -.1        -1.8

The adjusted standardized residuals indicate that the biggest differences between
the observed and expected cell counts (i.e., the most unusual differences under
the assumption of no association between education and stage of disease) are for
subjects with ≤12 years of education, where there are fewer subjects with Stage I
and more subjects with Stage III or IV than expected if there were no association
between education and stage of disease. Also, to a lesser extent, among the
subjects with a college graduate degree there are more subjects with Stage I and
fewer subjects with Stage III or IV than expected if there were no association
between education and stage of disease.
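As a cross-check outside SPSS, the adjusted standardized residual for a cell is (observed - expected) / sqrt(expected x (1 - row total/N) x (1 - column total/N)). A NumPy sketch using the education-by-stage counts from the table above:

```python
import numpy as np

# Observed counts: rows = education (<=12 yrs, college, college graduate),
# columns = stage of disease (I, II, III or IV).
obs = np.array([[20, 24, 35],
                [37, 32, 23],
                [40, 29, 21]], dtype=float)

n = obs.sum()
row = obs.sum(axis=1, keepdims=True)   # row totals
col = obs.sum(axis=0, keepdims=True)   # column totals
exp = row * col / n                    # expected counts under no association

# Adjusted standardized residuals: approximately standard Normal under the
# null, so |value| > 2 flags a large observed-minus-expected difference.
adj = (obs - exp) / np.sqrt(exp * (1 - row / n) * (1 - col / n))

print(np.round(adj, 1))  # first row matches SPSS: -2.6, -.5, 3.3
```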
                                                                                                 59


One sample binomial test, McNemar's test, Fisher's Exact test and Chi-square
test for 2 x 2 and R x C Contingency Tables Using Summary Data
There is an easy way in SPSS to perform a one sample binomial test, a McNemar's test, a
Fisher's Exact test or a Chi-square test for a 2 x 2 or R x C table when you only have summary
data (i.e., the number of observations in each cell).

One sample binomial test. Suppose you observe 15 cases of myocardial infarction (MI) in 5000
men over a 1 year period and you want to test if the rate of MI is equal to a previously reported
incidence rate of 5 per 1000 (or 0.005).

1. In a new (empty) SPSS Data Editor window enter the following 2
  rows of data:

   MI   Observed
   0    4985
   1    15

   The values of 0 and 1 used to indicate MI (no/yes) are arbitrary. The variable names are also
   arbitrary (e.g., you can leave them as var0001 and var0002).

2. Next, you want to weight cases by Observed:

   Choose Data
   Choose Weight Cases...
   Choose Weight cases by
   Choose Observed and then the arrow button so the variable appears in the Frequency variable
   box.
   Choose OK

3. Now, run the one sample binomial test:
   Choose Analyze
   Choose Nonparametric Tests
   Choose Binomial...
   Choose MI so that it appears in the Test Variable List
   Change (edit) Test Proportion to .005.
   Choose OK
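The same one sample binomial test can be reproduced outside SPSS; for example, with SciPy's `binomtest` (SciPy 1.7 or later):

```python
from scipy.stats import binomtest

# 15 MIs observed among 5000 men; test against the reported rate of 0.005.
res = binomtest(k=15, n=5000, p=0.005, alternative='two-sided')
print(res.pvalue)  # exact two-sided p-value
```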
                                                                                             60


McNemar's test. Suppose you have the following summary table of presence and absence of
DKA before and after therapy for paired data,

                                           After therapy
                                      No DKA          DKA
              Before     No DKA         128            7
             therapy     DKA             19            7

1. In a new (empty) SPSS Data Editor window enter the following 4
  rows of data:

   Before After Observed
   1      1     128
   1      0      19
   0      1       7
   0      0       7

   The values of 0 and 1 used to indicate DKA and no DKA are arbitrary. The variable names
   are also arbitrary (e.g., you can leave them as var0001, var0002, and var0003).

2. Next, you want to weight cases by Observed:
   Choose Data
   Choose Weight Cases...
   Choose Weight cases by
   Choose Observed and then the arrow button so the variable appears in the Frequency variable
   box.
   Choose OK

3. Now, run McNemar's test:
   Choose Analyze
   Choose Nonparametric Tests
   Choose 2 Related Samples...
   Choose Before and After so that they appear in the Test Pair(s) List.
   Choose McNemar as the Test Type
   Choose Wilcoxon to turn off the option
   Choose OK
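McNemar's test depends only on the discordant pairs (here 19 and 7). The exact version of the test is a two-sided binomial (sign) test of the discordant counts against p = 0.5, which can be reproduced with SciPy if you want to check the SPSS result (SPSS reports an exact binomial p-value or a chi-square approximation, depending on the counts):

```python
from scipy.stats import binomtest

# Discordant pairs from the DKA table:
# 19 went from DKA to no DKA, 7 went from no DKA to DKA.
b, c = 19, 7
res = binomtest(min(b, c), n=b + c, p=0.5, alternative='two-sided')
print(round(res.pvalue, 3))  # 0.029
```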
                                                                                            61


Chi-square test and Fisher's Exact test for a 2 x 2 table. Suppose you have the following
summary table for oral contraceptive (OC) use by presence or absence of cancer (case or
control),
                                                     OC Use
                                                  No        Yes
                              Cases (cancer)      111        6
                              Controls            387        8

1. In a new (empty) SPSS Data Editor window enter the following 4
  rows of data:

  Case OCuse Observed
  1 0 111
  1 1    6
  0 0 387
  0 1    8

  The values of 0 and 1 used to indicate case/control and OC use (no/yes)
  are arbitrary. The variable names are also arbitrary (e.g., you can
  leave them as var0001, var0002, and var0003).

2. Next, you want to weight cases by Observed:
   Choose Data
   Choose Weight Cases...
   Choose Weight cases by
   Choose Observed and then the arrow button so the variable appears in the Frequency variable
   box.
   Choose OK

3. Now, run the Chi-square (& Fisher's Exact) test
   Choose Analyze
   Choose Crosstabs
   Choose Case and OCuse as the row and column variables
   Choose Statistics...
   Choose Chi-square
   Choose Continue
   Choose OK
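With summary counts like these, the same tests can also be run directly in SciPy, with no case-weighting step needed:

```python
from scipy.stats import chi2_contingency, fisher_exact

# 2 x 2 summary table: rows = cases/controls, columns = OC use no/yes.
table = [[111, 6],    # cases (cancer)
         [387, 8]]    # controls

chi2_stat, p_chi2, df, expected = chi2_contingency(table)  # Yates-corrected for 2x2
oddsratio, p_fisher = fisher_exact(table)

print(round(oddsratio, 2))  # sample odds ratio = (111*8)/(6*387) = 0.38
```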
                                                                                                  62


The commands are similar for running the Chi-square test for tables larger than 2 x 2. Suppose
you have the following summary table for education level by stage of disease at diagnosis:


                                                    Stage of Disease
                       Education level          I         II     III or IV
                      High school or less      20        24          35
                      College                  37        32          23
                      College graduate         40        29          21


1. In a new (empty) SPSS Data Editor window enter the following 9
  rows of data:

   Educ Stage Observed
   1 1 20
   1 2 24
   1 3 35
   2 1 37
   2 2 32
   2 3 23
   3 1 40
   3 2 29
   3 3 21

The values used to indicate education level and stage are arbitrary, and the variable names are
also arbitrary.

Follow steps 2. and 3. on the previous page (except use variables Educ and Stage, instead of
Case and OCuse).
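For comparison, the R x C chi-square test on this summary table can be checked in SciPy:

```python
from scipy.stats import chi2_contingency

# Education (rows) by stage of disease (columns) summary counts.
table = [[20, 24, 35],
         [37, 32, 23],
         [40, 29, 21]]

chi2_stat, p, df, expected = chi2_contingency(table)
print(round(chi2_stat, 2), df, round(p, 3))  # 12.17 4 0.016
```

This reproduces the p-value of 0.016 quoted in the education-by-stage example.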
                                                                                                  63


Confidence Interval for a Proportion
Constructing a confidence interval for a proportion or rate is rather awkward in SPSS, but you
can do it with the raw data or with summary data (as long as the sample size is large enough to
use the Normal-approximation methods for binomial data).

To construct a confidence interval using the raw data you need 1) a binary indicator variable
equal to 1 if the variable is present for a subject and equal to 0 if the variable is absent for a
subject, and 2) a variable that is equal to 1 for all subjects. For example, suppose you want to
construct a confidence interval for the proportion of males in your data set. First you need a
binary indicator variable for males, e.g. you could have a variable named Gender which is equal
to 1 if the subject is a male and equal to 0 if the subject is a female. Second you need to create a
variable that is equal to 1 for all subjects (e.g., use the Compute statement and create a variable
Allones = 1). Now,

1.  Choose Analyze on the menu bar
2.  Choose Descriptive Statistics
3.  Choose Ratio...
4.  Numerator: Select the binary indicator variable from the source list on the left and then
    click on the arrow located in the middle of the window (e.g. select Gender)
5. Denominator: Select the variable equal to 1 for all subjects from the source list on the left
    and then click on the arrow located in the middle of the window (e.g. select Allones)
6. Choose Statistics...
7. Choose Mean under Central Tendency
8. Choose Confidence intervals (default is a 95% confidence interval)
9. Choose Continue
10. Choose OK

To illustrate how you would construct a confidence interval with summary data, suppose in a
data set of 3425 subjects, 1341 are males and 2084 are females:

1. In a new (empty) SPSS Data Editor window enter the following 2
rows of data:

     Gender Observed Allones
      0      2084     1
      1      1341     1

2. Next, you want to weight cases by Observed:
   Choose Data
   Choose Weight Cases...
   Choose Weight cases by
   Choose Observed and then the arrow button so the variable appears in the Frequency variable
   box.
   Choose OK
                                                                                                      64


3. Now,

   Choose Analyze on the menu bar
   Choose Descriptive Statistics
   Choose Ratio...
   Numerator: Select Gender
   Denominator: Select Allones
   Choose Statistics...
   Choose both Mean and Confidence intervals under Central Tendency
   Choose Continue
   Choose OK


Example of the SPSS output using the previous summary data.

Ratio Statistics
                   Ratio Statistics for Gender / Allones

 Mean                                                            .392
 95% Confidence Interval      Lower Bound                        .375
 for Mean                     Upper Bound                        .408
 Price Related Differential                                     1.000
 Coefficient of Dispersion                                          .
 Coefficient of Variation     Median Centered                       .

The confidence intervals are constructed by assuming a Normal distribution
for the ratios.

The observed proportion was .392, or 39.2%. A 95% confidence interval is
37.5% to 40.8%.
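The interval above agrees with the usual Normal-approximation formula, p-hat ± 1.96 x sqrt(p-hat(1 - p-hat)/n), which is easy to verify by hand or in a few lines of Python:

```python
import math

# 1341 males among 3425 subjects; Normal-approximation 95% CI for the proportion.
x, n = 1341, 3425
p_hat = x / n
se = math.sqrt(p_hat * (1 - p_hat) / n)
lo, hi = p_hat - 1.96 * se, p_hat + 1.96 * se

print(round(p_hat, 3), round(lo, 3), round(hi, 3))  # 0.392 0.375 0.408
```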
                                                                                                    65


Correlation & Regression

Pearson and Spearman Rank Correlation Coefficient
1. Choose Analyze on the menu bar
2. Choose Correlate
3. Choose Bivariate...
4. Variable(s): Select the variables from the source list on the left and then click on the arrow
   located in the middle of the window.
5. Choose Pearson and/or Spearman as the Correlation Coefficients. Note that an option is
   selected if its box has a check mark in it.
6. Choose Two-tailed as the Test of Significance. SPSS will test whether the correlation is
   equal to zero versus not equal to zero.
7. Choose OK
Note that you can use the Crosstabs command to calculate confidence intervals for the
correlation.
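If you also work outside SPSS, the same two coefficients and two-sided p-values are available in SciPy. The data here are hypothetical, purely for illustration:

```python
from scipy.stats import pearsonr, spearmanr

# Hypothetical paired measurements for two variables (e.g. two questionnaire
# scores); in practice these would be two columns from your data file.
x = [2, 4, 5, 7, 9, 12, 15]
y = [10, 14, 13, 18, 20, 24, 30]

r, p_r = pearsonr(x, y)        # Pearson correlation and two-sided p-value
rho, p_rho = spearmanr(x, y)   # Spearman rank correlation and two-sided p-value
print(round(r, 3), round(rho, 3))
```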
Example. Pain-related beliefs, catastrophizing, and coping have been shown to be
associated with measures of physical and psychosocial functioning among patients with
chronic musculoskeletal and rheumatologic pain. However, little is known about the
relative importance of these process variables in the functioning of patients with
temporomandibular disorders (TMD).

Correlation coefficients could be calculated to examine the association between
catastrophizing, depression (Beck Depression Inventory), pain-related activity
interference and jaw opening (maximum assisted opening).

(Reference: JA Turner, SF Dworkin, L Mancl, KH Huggins, EL Truelove. “The roles of
beliefs, catastrophizing, and coping in the functioning of patients with temporomandibular
disorders.” Pain, 92, 41-51, 2001.)


                                                                   Typically, you would only
                                                                   report either the Pearson or
                                                                   Spearman (rank) correlation
                                                                   coefficients, but you might
                                                                   calculate both to see if you
                                                                   get different results or
                                                                   conclusions.




The correlations are shown on the next page. Note that SPSS will display the correlation between
variable 1 and variable 2 and between variable 2 and variable 1, which are equivalent, and similarly
the correlations between all possible pairs of variables. So, all results displayed below the diagonal
of the matrix of results are redundant.
                                                                                                                                      66

Correlations
  1st entry = Pearson correlation coefficient
  2nd entry = Sig. (2-tailed) = p-value
  3rd entry = N = the number of observations or subjects with non-missing data for both variables

                                          Correlations
                                                          Beck                       Maximum
                                       Catastroph-      inventory                    assisted
                                         izing            score      Interference    opening
 Catastroph-    Pearson Correlation         1            .602(**)      .451(**)        -.029
  izing         Sig. (2-tailed)                             .000          .000          .758
                N                          118              118           118           116
 Beck inventory Pearson Correlation     .602(**)             1         .445(**)        -.079
  score         Sig. (2-tailed)            .000                           .000          .397
                N                          118              118           118           116
 Interference   Pearson Correlation     .451(**)         .445(**)         1            -.068
                Sig. (2-tailed)            .000             .000                        .468
                N                          118              118           118           116
 Maximum        Pearson Correlation       -.029            -.079         -.068           1
  assisted      Sig. (2-tailed)            .758             .397          .468
  opening       N                          116              116           116           116
** Correlation is significant at the 0.01 level (2-tailed).

Correlation between Catastrophizing and Interference = .45, p-value < .001, N = 118 subjects.

Nonparametric Correlations
     1st entry = Spearman rank correlation coefficient
     2nd entry = Sig. (2-tailed) = p-value
     3rd entry = N = the number of observations or subjects with non-missing data for both variables

                                                 Correlations
                                                                   Beck                      Maximum
                                                Catastroph-      inventory                   assisted
                                                  izing            score     Interference    opening
 Spearman's  Catastroph-  Correlation Coeff.      1.000           .625(**)      .451(**)       -.013
 rho          izing       Sig. (2-tailed)             .              .000          .000         .892
                          N                         118              118           118          116
             Beck         Correlation Coeff.     .625(**)          1.000        .455(**)       -.110
              inventory   Sig. (2-tailed)           .000               .           .000         .241
              score       N                         118              118           118          116
             Interference Correlation Coeff.     .451(**)         .455(**)       1.000         -.046
                          Sig. (2-tailed)           .000             .000            .          .621
                          N                         118              118           118          116
             Maximum      Correlation Coeff.      -.013            -.110         -.046         1.000
              assisted    Sig. (2-tailed)           .892             .241         .621             .
              opening     N                         116              116           116          116
** Correlation is significant at the 0.01 level (2-tailed).

Rank correlation between Catastrophizing and Interference = .45, p-value < .001, N = 118 subjects.
                                                                                                                   67


Confidence Interval for a Correlation Coefficient
Typically the Crosstabs command is used to produce contingency tables for categorical
variables. One of the options under Statistics… computes the correlation coefficient,
which you might want to calculate for ordinal variables. However, you can also use this
option for quantitative variables.


                                                        The Crosstabs command is found by selecting
                                                        Analyze and then Descriptive Statistics.

                                                        In this example the correlation between the
                                                        quantitative variables catastrophizing and
                                                        interference will be calculated.

                                                        Select Statistics… and then select Correlations.

                                                        SPSS will produce a contingency table of the
                                                        cross-tabulation of the two variables which you
                                                        can ignore.

                                                        SPSS will display the correlation coefficient and
                                                        standard error estimate for the correlation
                                                        coefficient, which can be used to calculate
                                                        confidence intervals.


                                                Symmetric Measures


                                                                      Asymp. Std.
                                                         Value          Error(a)    Approx. T(b)   Approx. Sig.
 Interval by Interval    Pearson's R                          .451           .068         5.445          .000(c)
 Ordinal by Ordinal      Spearman Correlation                 .451           .076         5.449          .000(c)
 N of Valid Cases                                              118
a Not assuming the null hypothesis.
b Using the asymptotic standard error assuming the null hypothesis.
c Based on normal approximation.


An approximate 95% confidence interval for the correlation coefficient is given by

                             Correlation coefficient ± 1.96 x Asymp. Std Error

In this example, the 95% confidence interval for the Pearson correlation coefficient is given
by .451 ± 1.96 x .068, or (.32, .58).

The 95% confidence interval for the Spearman rank correlation coefficient is given by .451 ±
1.96 x .076, or (.30, .60).
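This Normal-approximation interval is simple enough to compute by hand, or in a couple of lines of Python using the Value and Asymp. Std. Error from the Symmetric Measures table:

```python
def corr_ci(r, se, z=1.96):
    """Normal-approximation CI used in this handout: r +/- z * SE."""
    return r - z * se, r + z * se

# Pearson correlation and its asymptotic SE from the Symmetric Measures table.
lo, hi = corr_ci(0.451, 0.068)
print(round(lo, 2), round(hi, 2))  # 0.32 0.58
```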
                                                                                                  68


Linear Regression
1.  Choose Analyze on the menu bar
2.  Choose Regression
3.  Choose Linear...
4.  Dependent: Select the dependent variable from the source list on the left and then click on
    the arrow next to the dependent variable box.
5. Independent(s): Select the independent variable and then click on the arrow next to the
    independent variable(s) box. Repeat the process until you have selected all the independent
    variables you want.
6. Choose Statistics...
7. Choose Estimates. SPSS will print the regression coefficient estimate, standard error, t
    statistic and p-value for each independent variable (as well as the intercept/constant). By
    default the option should be selected (i.e., the box has a check mark in it).
8. Choose Model fit. SPSS will print the multiple R, R squared, Adjusted R-squared, standard
    error of the regression line, and the ANOVA table. By default the option should be selected.
9. Choose Continue
10. Choose Enter as the Method. Enter is the default method for independent variable entry.
    Other methods of variable entry can be selected by clicking on the down arrow and clicking
    on the desired method of entry.
11. Choose OK
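For readers who also use Python, the same simple linear regression output (slope, intercept, R, p-value, and standard error of the slope) is available from `scipy.stats.linregress`. The fev1/height values below are hypothetical, purely for illustration:

```python
from scipy.stats import linregress

# Hypothetical fev1 and height (cm) values for illustration only;
# in practice these would come from your data file.
height = [150, 155, 160, 165, 170, 175, 180]
fev1 = [2.1, 2.5, 2.6, 3.0, 3.3, 3.4, 3.9]

fit = linregress(height, fev1)
# fit.slope, fit.intercept, fit.rvalue, fit.pvalue, and fit.stderr mirror
# the Coefficients table SPSS prints under the Estimates option.
print(round(fit.slope, 4), round(fit.rvalue, 3))
```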

Additional options are available under Statistics..., Plots..., Save..., Method, and Options... For
example:

Statistics...
 Estimates. Default option, which prints the usual linear regression results.
 Model fit. Default option, which prints the usual linear regression results.
 Confidence intervals (for the regression coefficient estimates)
 Covariance matrix (and correlation matrix for the regression coefficient estimates).
 R squared change. If independent variables are entered in Blocks (using the Block option;
   see below), this option computes the change in the R squared between models with different
   blocks of independent variables. It is also useful for computing a partial F test for a
   categorical variable with more than two categories by entering the indicator variables for the
   categorical variable in the second block (Block 2 of 2) and all other independent variables in
   the first block (Block 1 of 2) and using the R squared change option.
 Part and Partial Correlations. This option computes the Pearson correlation coefficient
   between the dependent variable and each independent variable (Zero-order correlation) and
   the correlation coefficient between the dependent variable and an independent variable after
   controlling for all the other independent variables in the regression model (Partial correlation).
   Squaring the partial correlation gives you the partial R-squared for an independent variable.
   This option also computes a Part correlation, which is the correlation between the dependent
   variable and an independent variable after (only) the independent variable has been adjusted
   for all the other independent variables in the regression model. The square of the Part
   correlation is equal to the change in R-squared when an independent variable is added to the
   regression model with all the other independent variables.
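The last fact, that the squared part (semipartial) correlation equals the R-squared change when the variable is added, can be verified numerically. A NumPy sketch with simulated data (variable names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n) + 0.5 * x1            # correlated predictors
y = 1.0 + 2.0 * x1 + 1.5 * x2 + rng.normal(size=n)

def r_squared(predictors, y):
    """R-squared from a least-squares fit with an intercept."""
    X = np.column_stack([np.ones(len(y))] + list(predictors))
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1 - resid @ resid / ((y - y.mean()) @ (y - y.mean()))

# Change in R-squared when x2 is added to a model that already contains x1.
delta_r2 = r_squared([x1, x2], y) - r_squared([x1], y)

# Part correlation of x2: correlate y with the part of x2 not explained by x1.
X1 = np.column_stack([np.ones(n), x1])
b, *_ = np.linalg.lstsq(X1, x2, rcond=None)
e2 = x2 - X1 @ b
part = np.corrcoef(y, e2)[0, 1]

print(abs(part ** 2 - delta_r2) < 1e-10)  # True: part^2 equals the R-squared change
```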
                                                                                              69


 (Multi-)Collinearity diagnostics. This option computes various statistics for detecting
  collinearity between the independent variables. For example, Tolerance is the proportion of a
  variable's variance not accounted for by other independent variables in the equation. A
  variable with a very low tolerance contributes little information to a model, and can cause
  computational problems. Another statistic is the VIF (variance inflation factor). Large values
  are an indicator of multicollinearity between independent variables.

Plots... which are useful for doing regression diagnostics:
 Histogram or Normal Probability Plot (P-P plot) (of the standardized residuals).
 Produce all partial (residual) plots
 Other scatter plots

Save..., which produces variables that are useful for doing regression diagnostics:
 Predicted Values (unstandardized, standardized, adjusted)
 Residuals (unstandardized, standardized, studentized, deleted)
 Distances (Mahalanobis, Cook's, Leverage)
 Influence Statistics (dfBeta, dfFit)

Note that SPSS creates a new variable for each selected Save... option and adds the new
variables to the data file. The variable names are defined in the Variable View of the Data
Editor. Once you are done using these variables you may want to delete them from the data file
or save them (by re-saving the data file).

Method. Click on the down arrow to the right of Method to display the methods available for
independent variable entry (enter, stepwise, remove, backward, forward). Enter is the default
option. The other options enter independent variables into the model using various stepwise
methods.

Options...
 You can modify the entry and removal criteria used by stepwise, remove, backward, and
  forward independent variable entry methods.
 You can define how observations with missing data are handled.

Previous, Block \# of \#, Next
 You can use these options to enter independent variables in blocks into the regression model.
 You can select different methods of variable entry for each block. This option is also useful
  for computing partial F tests with the R squared change option.
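The blocks described above paste into syntax as stacked /METHOD subcommands, one per block. A minimal sketch (the variable names fev1, height, age, and gender are taken from the examples that follow; the CHANGE keyword requests the R squared change statistics; clicking Paste in the dialog shows the exact syntax SPSS generates):

```
REGRESSION
  /STATISTICS COEFF R ANOVA CHANGE
  /DEPENDENT fev1
  /METHOD=ENTER height        /* Block 1 */
  /METHOD=ENTER age gender.   /* Block 2 */
```

The R Square change between Block 1 and Block 2 provides the partial F test for adding age and gender to a model already containing height.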
                                                                                          70


Example. Simple linear regression of forced expiratory volume (volume, 1 second) on
height (cm).

The dependent variable in this example is forced expiratory volume (fev1). There is only
1 independent variable in this example, height. Additional options can be found under
Statistics, Plots, Save, & Options.

Here are the Statistics… options. Usually you want the default options Estimates and
Model fit selected. In this example, the (95%) confidence interval for the regression
coefficients is also selected.

Here are the Plots… options. By default no options are selected. In this example, the
normal probability plot of the residuals is requested.
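The dialog choices in this example correspond to syntax along these lines (a sketch; click Paste rather than OK in the dialog to see the exact command SPSS generates — the NORMPROB(ZRESID) keyword requests the normal P-P plot of the standardized residuals):

```
REGRESSION
  /STATISTICS COEFF CI R ANOVA
  /DEPENDENT fev1
  /METHOD=ENTER height
  /RESIDUALS NORMPROB(ZRESID).
```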
                                                                                                                71


Regression

Variables Entered/Removed(b)

 Model   Variables Entered   Variables Removed   Method
 1       height(a)           .                   Enter
a All requested variables entered.
b Dependent Variable: fev1

Information on the independent variables and dependent variable in the regression model,
and the method of entering the independent variables into the regression model.

Model Summary(b)

 Model   R         R Square   Adjusted R Square   Std. Error of the Estimate
 1       .562(a)   .315       .314                .55337
a Predictors: (Constant), height
b Dependent Variable: fev1

R Square = proportion of the total variation in the dependent variable explained by the
independent variable(s) = .315 or 31.5%. R is the square root of R Square. Adjusted R
Square "adjusts" the R Square for the number of variables in the model. Std. Error of the
Estimate = standard deviation of the error or residuals; not usually reported, but used in
estimating the standard errors of the regression coefficients.

ANOVA(b)

 Model          Sum of Squares   df    Mean Square   F         Sig.
 1 Regression   112.380          1     112.380       366.997   .000(a)
   Residual     244.054          797   .306
   Total        356.434          798
a Predictors: (Constant), height
b Dependent Variable: fev1

ANOVA = analysis of variance table. Not needed when there is only 1 independent variable
in the model. The F test is equivalent to the t test for testing if the slope is equal to
zero in the output that follows (F = t²).
                                                                                                                                             72

Coefficients(a)

                 Unstandardized   Standardized
                  Coefficients    Coefficients                       95% Confidence Interval for B
 Model            B    Std. Error     Beta          t       Sig.     Lower Bound     Upper Bound
 1  (Constant)  -4.330    .335                   -12.943    .000       -4.987          -3.673
    height        .039    .002        .562        19.157    .000         .035            .043
a Dependent Variable: fev1


 Unstandardized coefficients B = regression coefficient

 In this example B = 0.039 is the slope and B = -4.330 the intercept

 Std. Error = standard error of the regression coefficient.

 Standardized coefficients Beta = standardized regression coefficient

 t = t statistic for testing if the regression coefficient is equal to zero (versus not equal
 to zero)

 Sig. = p-value for testing if the regression coefficient is equal to zero (versus not
 equal to zero).

 95% confidence interval for B = 95% confidence interval for the regression coefficient


In this example, you would report the slope (.039), standard error of the slope (.002),
and the p-value (<.001), or the slope (.039) and 95% confidence interval (.035 to .043).


Charts

Normal P-P Plot of Regression Standardized Residual
Dependent Variable: fev1
[Plot of Expected Cum Prob versus Observed Cum Prob]

Normal probability plot of the residuals. The points fall along a straight line,
indicating the residuals have, at least approximately, a Normal distribution.
                                                                                     73


Linear Regression Example with three independent variables

The dependent variable is forced expiratory volume (fev1). The independent variables are
height, age, and gender. The Enter method means all 3 independent variables will be
included in the regression model.

Statistics… options: By default, Estimates and Model fit are selected. In this example,
part and partial correlations and collinearity diagnostics are also selected.

Plots… options: Normal probability plot (of the standardized residuals) and partial
(residual) plots are selected.
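Again, clicking Paste rather than OK shows the equivalent command. A sketch for this example (ZPP requests the zero-order, partial, and part correlations; COLLIN TOL the collinearity diagnostics; PARTIALPLOT ALL the partial regression plots):

```
REGRESSION
  /STATISTICS COEFF CI R ANOVA ZPP COLLIN TOL
  /DEPENDENT fev1
  /METHOD=ENTER height age gender
  /PARTIALPLOT ALL
  /RESIDUALS NORMPROB(ZRESID).
```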
                                                                                                                               74


Regression

Variables Entered/Removed(b)

 Model   Variables Entered        Variables Removed   Method
 1       gender, age, height(a)   .                   Enter
a All requested variables entered.
b Dependent Variable: fev1

Information on the independent variables, method of variable entry, and dependent
variable.

Model Summary(b)

 Model   R         R Square   Adjusted R Square   Std. Error of the Estimate
 1       .601(a)   .361       .358                .53531
a Predictors: (Constant), gender, age, height
b Dependent Variable: fev1

R Square is .361 or 36.1% (adjusted R Square is 35.8%). About 36% of the variation in the
dependent variable can be explained by the 3 independent variables.

ANOVA(b)

 Model          Sum of Squares   df    Mean Square   F         Sig.
 1 Regression   128.623          3     42.874        149.621   .000(a)
   Residual     227.811          795   .287
   Total        356.434          798
a Predictors: (Constant), gender, age, height
b Dependent Variable: fev1

The overall F test indicates that one or more of the independent variables is significant
(P < .001). The degrees of freedom of the F test are 3 and 795.

Coefficients(a)

              Unstandardized   Standardized                                                   Collinearity
               Coefficients    Coefficients                      Correlations                  Statistics
 Model         B   Std. Error      Beta        t      Sig.   Zero-order  Partial   Part    Tolerance    VIF
 (Constant)  -.780    .593                  -1.315    .189
 height       .028    .003         .399      9.143    .000      .562      .308      .259      .423     2.364
 age         -.025    .004        -.200     -6.857    .000     -.206     -.236     -.194      .944     1.059
 gender       .273    .059         .201      4.591    .000      .478      .161      .130      .420     2.379
a Dependent Variable: fev1

Height, age, and gender are all statistically significant (P < .001), i.e., the regression
coefficients are different from zero.

The partial correlations (and partial R-squares, .308² = .095, -.236² = .056, and
.161² = .026) indicate the correlation with the dependent variable adjusted for the other
variables in the regression model.

A low tolerance value (say, < .20) or a high variance inflation factor (VIF = 1/Tolerance)
(say, > 5 or 10) may indicate a multicollinearity problem.
                                                                                                                                                                          75



Normal P-P Plot of Regression Standardized Residual
Dependent Variable: fev1
[Plot of Expected Cum Prob versus Observed Cum Prob]

Normal probability plot of the residuals. The points fall approximately along a straight
line, indicating the residuals have (approximately) a Normal distribution.

Partial Regression Plot
Dependent Variable: fev1
[Plot of fev1 versus height]

Partial Regression Plot
Dependent Variable: fev1
[Plot of fev1 versus age]

Partial regression plots for height and age with lowess smooths. The plot for height is
assessing the relationship between height and fev1 after adjusting for age and gender
(e.g., is the relationship linear). Similarly, the plot for age is assessing the
relationship between age and fev1 adjusting for height and gender.

Note that SPSS will also produce a partial regression plot for gender. In general, the
partial regression plots for categorical/nominal variables are not very useful. Boxplots
of the residuals for each category of a categorical/nominal variable are useful for
regression diagnostics. To produce the boxplots you could use the Save… options to save
the residuals from a regression and then the Boxplot commands to plot the residuals.
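That suggestion can be sketched in syntax (a sketch, not pasted output: /SAVE RESID adds a new variable, named RES_1 by default, and the grouping variable diabetes is borrowed from the example that follows; EXAMINE is the command behind the Explore/Boxplot dialogs):

```
REGRESSION
  /DEPENDENT fev1
  /METHOD=ENTER height age gender
  /SAVE RESID.
EXAMINE VARIABLES=RES_1 BY diabetes
  /PLOT=BOXPLOT
  /STATISTICS=NONE.
```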
                                                                                                   76


Linear Regression via ANOVA Commands
It is possible to use the analysis of variance commands of SPSS to perform a linear regression
analysis, because the methods are mathematically equivalent. Performing a linear regression
analysis via analysis of variance in SPSS is more complicated than using the linear regression
commands. However, the advantage of using the analysis of variance commands to perform a
linear regression is that you do not have to create indicator variables for categorical variables or
create interaction terms. To perform a linear regression via analysis of variance commands

1.  Choose Analyze on the menu bar
2.  Choose General Linear Model
3.  Choose Univariate...
4.  Dependent: Select the dependent variable from the source list on the left and then click on
    the arrow next to the dependent variable box.
5. Fixed Factor(s): Select the independent variables that are categorical/qualitative and then
    click on the arrow next to the fixed factor(s) box. Repeat the process until you have selected
    all the categorical variables you want.
6. Covariate(s): Select the independent variables that are continuous/quantitative and then click
    on the arrow next to the covariate(s) box. Repeat the process until you have selected all the
    continuous variables you want.
7. Choose Model...
8. Choose Custom
9. Factors & Covariates: Select/highlight all the variables, then under Build Terms select
    Main Effects. You may need to click on the down arrow to display the Main Effects option.
    After you have selected Main Effects, select the arrow under the Build Terms. All the
    variables should now appear in the Model box on the right hand side.
10. Choose Continue
11. Choose Options...
12. Choose Parameter Estimates under Display
13. Choose Continue
14. Choose OK

For categorical variables the last category (i.e., the category with the largest numeric coding
value) will be the referent group/category. SPSS will compute the F test for each continuous
independent variable and for each categorical independent variable. By selecting to have the
parameter estimates displayed, SPSS will also compute the regression coefficient estimates,
standard errors, t (statistic) values, p-values, and 95% confidence intervals that you get from the
linear regression commands.

To include interaction terms in the regression model, in Step 9 highlight the two variables for
which you want to create a (two-way) interaction term. Under Build Terms select Interaction, and
then select the arrow under Build Terms. A two-way interaction between the two variables
(variable 1 * variable 2) should now appear in the Model box on the right hand side.
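Steps 1 through 14, and the interaction variant, paste into UNIANOVA syntax. A sketch using the fev1/diabetes/height example that follows (drop the diabetes*height term from /DESIGN for a main-effects-only model):

```
UNIANOVA fev1 BY diabetes WITH height
  /METHOD=SSTYPE(3)
  /INTERCEPT=INCLUDE
  /PRINT=PARAMETER
  /DESIGN=diabetes height diabetes*height.
```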
                                                                                           77


Example. Linear regression of forced expiratory volume on height (continuous variable)
and diabetes status (categorical variable; normal, impaired fasting glucose, diabetic).

Forced expiratory volume (fev1) is the dependent variable. Diabetes is a categorical
variable with 3 categories. Height is a continuous variable.

Under Model…, select Custom, then select each of the variables separately until they all
appear under Model:, or select Main Effects under Build Term(s), select all Factors &
Covariates, and then select the arrow under Build Term(s).

Under Options…, select Parameter estimates to have the usual linear regression results
displayed in the output.
                                                                                                                        78


Univariate Analysis of Variance

Between-Subjects Factors

Tests of Between-Subjects Effects

Dependent Variable: fev1
 Source            Type III Sum of Squares   df    Mean Square   F         Sig.
 Corrected Model   114.617(a)                3     38.206        125.606   .000
 Intercept         51.195                    1     51.195        168.308   .000
 diabetes          2.237                     2     1.118         3.677     .026
 height            111.378                   1     111.378       366.168   .000
 Error             241.817                   795   .304
 Total             3773.779                  799
 Corrected Total   356.434                   798
a R Squared = .322 (Adjusted R Squared = .319)

The overall test for the significance of diabetes is displayed (p-value = 0.026).

Parameter Estimates

Dependent Variable: fev1
                                                          95% Confidence Interval
 Parameter         B        Std. Error   t         Sig.   Lower Bound   Upper Bound
 Intercept         -4.392   .337         -13.025   .000   -5.054        -3.730
 [diabetes=1.00]    .126    .049           2.549   .011     .029          .223
 [diabetes=2.00]    .046    .056            .830   .407    -.063          .156
 [diabetes=3.00]    0(a)    .              .       .        .             .
 height             .039    .002          19.136   .000     .035          .043
a This parameter is set to zero because it is redundant.

This table displays the usual linear regression results. In this example diabetes = 3
(diabetic) is the reference group.
                                                                                                                79


Example. Adding an interaction between diabetes status and height in the regression
model

To add an interaction between two variables, select Interaction under Build Term(s),
select the two variables under Factors & Covariates, and then select the arrow under
Build Term(s).



Univariate Analysis of Variance

Tests of Between-Subjects Effects

Dependent Variable: fev1
 Source              Type III Sum of Squares   df    Mean Square   F         Sig.
 Corrected Model     114.946(a)                5     22.989        75.492    .000
 Intercept           42.741                    1     42.741        140.354   .000
 diabetes            .272                      2     .136          .447      .639
 height              94.349                    1     94.349        309.823   .000
 diabetes * height   .328                      2     .164          .539      .583
 Error               241.488                   793   .305
 Total               3773.779                  799
 Corrected Total     356.434                   798
a R Squared = .322 (Adjusted R Squared = .318)

This table displays the significance of the diabetes status by height interaction
(p-value = 0.58).

Parameter Estimates

Dependent Variable: fev1
 Parameter                  B        Std. Error   t        Sig.
 Intercept                  -4.373   .673         -6.498   .000
 [diabetes=1.00]            -.168    .818          -.206   .837
 [diabetes=2.00]             .614    .963           .637   .524
 [diabetes=3.00]             0(a)    .             .       .
 height                      .039    .004          9.506   .000
 [diabetes=1.00] * height    .002    .005           .361   .719
 [diabetes=2.00] * height   -.003    .006          -.593   .553
 [diabetes=3.00] * height    0(a)    .             .       .
a This parameter is set to zero because it is redundant.

This table displays the usual linear regression results, which include the results for
diabetes status, height, and the interaction between diabetes status and height.
                                                                                                 80


Logistic Regression
1. Choose Analyze on the menu bar
2. Choose Regression
3. Choose Binary Logistic...
4. Dependent: Select the dependent variable from the source list on the left and then click on
   the arrow next to the dependent variable box.
5. Covariate(s): Select the independent variable and then click on the arrow next to the
   Covariate(s) box. Repeat the process until you have selected all the independent variables
   you want.
6. Choose Enter as the Method. Enter is the default method for independent variable entry.
   Other methods of variable entry can be selected by clicking on the down arrow and clicking
   on the desired method of entry.
7. Choose OK

Additional options are available under >a*b>, Categorical..., Save..., Method, or Options... .
For example:

>a*b> (for adding two-way interactions) You can add an interaction between two independent
variables to the regression model by selecting two variables from the source list on the left (hold
down the Ctrl key while selecting the two variables) and then clicking on >a*b> (after you
highlight two variables from the source list on the left, the >a*b> button becomes available).

Categorical... You can use the categorical option to have SPSS create indicator or dummy
variables for categorical variables.
1. Choose Categorical
2. Categorical Covariates: Select a covariate that is categorical and then click on the arrow next
    to the Covariates box.
3. Choose Indicator as the Contrast: Indicator is the default method for creating indicator
    variables. Other methods can be selected by clicking on the down arrow and clicking on the
    desired method.
4. Choose the reference category as the last category (i.e., the category with the largest numeric
    coding value) or the first category (i.e., the category with the smallest numeric coding value).
5. Choose Change.
6. Repeat steps 2 through 5 until you have defined all categorical variables.
7. Choose Continue.

Save...
 Predicted Values (Probabilities and Group Membership). This option creates new variables
  that are the predicted probabilities and the predicted group membership. The predicted group
  membership (0 or 1) is based on whether the predicted probability is less than (group
  membership = 0) or greater than or equal to (group membership = 1) the classification cutoff. By
  default the classification cutoff value is 0.5. You can change the cutoff value using Options...
 Residuals (Unstandardized, Logit, Studentized, Standardized, Deviance)
 Influence (Cook's, leverage, dfBeta)


Note that SPSS creates a new variable for each selected Save... option and adds the new
variables to the data file. The variable names are defined in the Viewer window. Once you are
done using these variables you may want to delete them from the data file or save them (by
re-saving the data file).
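The predicted group membership rule described above can be sketched in a few lines (Python is used here just for illustration; the 0.5 cutoff matches the SPSS default):

```python
def predicted_group(p_hat, cutoff=0.5):
    """Return predicted group membership (0 or 1) for a predicted probability.

    Mirrors the SPSS rule: group 1 when p_hat >= cutoff, group 0 otherwise.
    """
    return 1 if p_hat >= cutoff else 0

print(predicted_group(0.62))  # 1
print(predicted_group(0.50))  # 1 (the cutoff itself is classified as group 1)
print(predicted_group(0.31))  # 0
```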

Method… Click on the down arrow to the right of Method to display the methods available for
independent variable entry (enter, forward:conditional, forward:LR, forward:Wald,
backward:conditional, backward:LR, backward:Wald).

Options...
 Confidence interval for odds ratio (CI for exp(B))
 Hosmer-Lemeshow goodness-of-fit
 You can modify the entry and removal criteria used by the backward and forward variable
  entry methods.

Previous, Block # of #, Next You can use these options to enter independent variables in blocks
into the regression model. You can select different methods of variable entry for each block.


Example. Logistic regression will be used to determine the relationship between any use
of health services (coded 0 = no use, 1 = any use) and age, health index, gender and race.
Subjects in the study (Model Cities Data Set) were followed for a varying amount of time,
so the number of months followed (expos) will also be included as an independent variable
in the logistic regression model.

[Screenshot: Logistic Regression dialog box]

The dependent variable, anyuse, is binary. There are 5 independent variables; female and race
are categorical/nominal variables.

[Screenshots: Define Categorical Variables and Options dialog boxes]

You can use the Categorical option to define which variables are categorical, and SPSS will
create the indicator variables. By default the category with the largest numerical value (last) will
be the reference group; here, the category with the smallest numerical value was selected as the
reference group.

Under Options you can select to have the 95% confidence intervals for the odds ratios displayed
in the output. You can also run the Hosmer-Lemeshow goodness-of-fit test.

Logistic Regression
                        Case Processing Summary

 Unweighted Cases(a)                            N       Percent
 Selected Cases      Included in Analysis      3199       73.1
                     Missing Cases             1175       26.9
                     Total                     4374      100.0
 Unselected Cases                                 0         .0
 Total                                         4374      100.0
a If weight is in effect, see classification table for the total number of cases.

Information on the number of observations used in the logistic regression. Subjects with missing
data are excluded.

    Dependent Variable Encoding

 Original Value     Internal Value
 .00                       0
 1.00                      1

SPSS will always recode the dependent variable to a 0 or 1 binary variable (internal value), and
will estimate the odds ratio for the event coded as 1 (vs. the event coded as 0). If your dependent
variable is not coded 0 or 1, check this table to determine the interpretation of the odds ratios.

Categorical Variables Codings

                                Parameter coding
                    Frequency     (1)       (2)
 race     white         497       .000      .000
          other         455      1.000      .000
          black        2247       .000     1.000
 female   male         1450       .000
          female       1749      1.000

This table gives the definition of the indicator variables. E.g.,
  race(1) = other
  race(2) = black
  (race = white is the reference group)
  female(1) = female
  (male is the reference group)
Caution! – Make sure you understand the interpretation of the indicator variables that
SPSS creates. It is very easy to get confused. For example, in this example the variable
race is coded 1=white, 2=other, 3=black. A common mistake would be to interpret race(1) =
white and race(2) = other.
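The indicator coding shown in the table above can be reproduced by hand. A Python sketch (the function name is hypothetical) that builds the two race indicators with white (code 1) as the reference group:

```python
def race_indicators(race_code):
    """Map race codes (1=white, 2=other, 3=black) to the two indicator
    variables SPSS created, with white as the reference group."""
    race1 = 1 if race_code == 2 else 0   # race(1): other vs white
    race2 = 1 if race_code == 3 else 0   # race(2): black vs white
    return race1, race2

print(race_indicators(1))  # (0, 0) -> white, the reference group
print(race_indicators(2))  # (1, 0) -> other
print(race_indicators(3))  # (0, 1) -> black
```

Note that race(1) is NOT "white" even though white is coded 1 in the data; this is exactly the confusion the caution above warns about.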



Block 0: Beginning Block

Ignore all the output under Block 0. The output displays information for the logistic regression
model with no independent variables in the model.

Block 1: Method = Enter

            Omnibus Tests of Model Coefficients

                    Chi-square     df      Sig.
 Step 1   Step         301.534      6      .000
          Block        301.534      6      .000
          Model        301.534      6      .000

Unless you are using stepwise methods to enter variables or entering variables in different
blocks you can ignore this output.

                       Model Summary

            -2 Log        Cox & Snell      Nagelkerke
 Step     likelihood       R Square        R Square
 1       2609.415(a)         .090            .151
a Estimation terminated at iteration number 5 because parameter estimates changed by less than .001.

"R-square" measures for logistic regression – usually not very useful.

                  Classification Table(a)

                                      Predicted
                                 anyuse          Percentage
 Observed                     .00      1.00       Correct
 Step 1  anyuse     .00         0       542          .0
                    1.00        0      2657        100.0
         Overall Percentage                         83.1
a The cut value is .500

Ignore this table also. It describes how well the logistic regression predicts any use when a
predicted probability > 0.5 is used to indicate any use. All subjects are predicted to have use.
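The overall percentage in the classification table is just the share of subjects whose predicted group matches their observed group. A quick check from the counts in the table:

```python
# Counts from the Classification Table: every subject is predicted as "any use",
# so only the 2657 observed users are classified correctly.
correct = 0 + 2657           # correctly classified: 0 non-users + 2657 users
total = 542 + 2657           # 3199 subjects included in the analysis
overall_pct = 100 * correct / total
print(round(overall_pct, 1))  # 83.1
```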

              Hosmer and Lemeshow Test

 Step     Chi-square       df       Sig.
 1             8.368        8       .398

        Contingency Table for Hosmer and Lemeshow Test

                 anyuse = .00           anyuse = 1.00        Total
              Observed  Expected     Observed  Expected    Observed
 Step 1   1      124     123.653        197     197.347       321
          2      101      97.310        218     221.690       319
          3       79      81.589        241     238.411       320
          4       73      67.769        248     253.231       321
          5       57      54.600        263     265.400       320
          6       33      41.820        287     278.180       320
          7       32      29.724        288     290.276       320
          8       16      21.258        304     298.742       320
          9       13      15.538        307     304.462       320
         10       14       8.740        304     309.260       318

The Hosmer-Lemeshow goodness-of-fit statistic is formed by grouping the data into g groups
(usually g = 10) based on the percentiles of the estimated probabilities and calculating the
Pearson chi-square statistic from the 2 x g table of observed and estimated expected frequencies.
A small p-value indicates a lack of fit. Large differences between the observed
and expected values can be used to help identify where there is lack of fit when present.
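The Hosmer-Lemeshow statistic can be recomputed directly from the contingency table above, as a sketch of the Pearson chi-square calculation it performs:

```python
# Observed and expected counts copied from the Hosmer-Lemeshow contingency table
obs_no  = [124, 101, 79, 73, 57, 33, 32, 16, 13, 14]
exp_no  = [123.653, 97.310, 81.589, 67.769, 54.600,
           41.820, 29.724, 21.258, 15.538, 8.740]
obs_yes = [197, 218, 241, 248, 263, 287, 288, 304, 307, 304]
exp_yes = [197.347, 221.690, 238.411, 253.231, 265.400,
           278.180, 290.276, 298.742, 304.462, 309.260]

# Pearson chi-square over all 2 x 10 cells: sum of (O - E)^2 / E
chi_square = sum((o - e) ** 2 / e
                 for o, e in zip(obs_no + obs_yes, exp_no + exp_yes))
print(round(chi_square, 3))  # 8.368, matching the SPSS output
```

The biggest contribution comes from group 10 (14 observed vs 8.740 expected non-users), which is where any lack of fit would show up.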


The last table of the output usually has the results we are most interested in. It lists the
odds ratios, p-values and 95% confidence intervals for the odds ratios.
                           Variables in the Equation

                                                                   95.0% C.I. for EXP(B)
              B       S.E.      Wald     df    Sig.    Exp(B)      Lower       Upper
 Step 1(a)
  expos      .077     .006   167.398      1    .000     1.080       1.068       1.093
  age        .009     .003     8.118      1    .004     1.009       1.003       1.016
  female(1)  .501     .099    25.363      1    .000     1.650       1.358       2.005
  race                        12.715      2    .002
  race(1)   -.424     .190     4.964      1    .026      .655        .451        .950
  race(2)   -.530     .149    12.689      1    .000      .588        .440        .788
  health     .048     .010    23.603      1    .000     1.049       1.029       1.070
  Constant  -.337     .196     2.958      1    .085      .714
a Variable(s) entered on step 1: expos, age, female, race, health.


Exp(B) = Odds Ratio

95.0% C.I. for EXP(B) = 95% confidence interval for the odds ratio

Sig. = P-value for the individual odds ratio, or the overall significance of a
      categorical/nominal variable if there is no Exp(B) listed.


B = the logistic regression coefficient, the log odds ratio

S.E. = the standard error of the logistic regression coefficient

Wald = the Wald test statistic for testing if B=0 (or equivalently odds ratio = 1)
       or if all B’s = 0 for a categorical variable with >2 indicator variables.

d.f. = degrees of freedom of the test statistic.
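The quantities above are simple functions of B and S.E., so the table can be checked by hand. A sketch using the expos row; because B and S.E. are rounded in the output, the recomputed values differ slightly from the SPSS results (which use full precision):

```python
import math

# B and S.E. for expos from the Variables in the Equation table
B, SE = 0.077, 0.006

odds_ratio = math.exp(B)              # Exp(B)
ci_lower = math.exp(B - 1.96 * SE)    # lower 95% confidence limit for the odds ratio
ci_upper = math.exp(B + 1.96 * SE)    # upper 95% confidence limit for the odds ratio
wald = (B / SE) ** 2                  # Wald chi-square statistic for testing B = 0

print(round(odds_ratio, 3))                      # 1.08
print(round(ci_lower, 3), round(ci_upper, 3))    # close to the reported 1.068 and 1.093
print(round(wald, 1))  # about 165; SPSS reports 167.398 from the unrounded B and S.E.
```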


It is often helpful to write on your output the definition of the indicator variables, so you
don't get confused about the interpretation of the results. It is also helpful to change Exp(B)
to odds ratio, and Sig. to P-value.

                                  95.0% C.I. for
                        Odds        odds ratio
                        Ratio     Lower     Upper     P-value
 Step 1(a)
  expos                 1.080     1.068     1.093       .000
  age                   1.009     1.003     1.016       .004
  female (vs male)      1.650     1.358     2.005       .000
  race                                                  .002
  other vs white         .655      .451      .950       .026
  black vs white         .588      .440      .788       .000
  health                1.049     1.029     1.070       .000
