SMARTPsych SPSS Tutorial

SPSS (Statistical Package for the Social Sciences) Version 9.0

Introduction

I. What is SPSS?
   A. Data analysis software for PCs and for Mac.
      1. Other similar programs are Microsoft Excel (a spreadsheet program), Microsoft Access (a data storage program), and other statistical software packages (e.g., SAS)
      2. Can be used for data storage, as well as for data analysis
   B. Advantages of SPSS over other programs
      1. A multitude of statistical functions; (almost) all you'll ever need
      2. Syntax (the command language of SPSS) makes reproducing your work simpler
      3. Menu-system (typical Windows pull-down menus) for executing commands makes the software easier to use

II. There are three types of files that are created in SPSS.
   A. System Files (data files)--these files have a .sav extension
      These files contain the data that you are using.

      [Screenshot of tutorial1.sav, with the variables and the data labeled]

      The screen above is an example of a system (.sav) file. This file, tutorial1.sav, is a fictional data set that contains five variables:
      1. id--gives an identity to each row of data.
      2. gender--a demographic variable (1 stands for 'male' and 2 for 'female' here)
      3. cond--the independent variable in this study--there are four levels of this variable; let's assume that these four levels represent different conditions of "stress" that have been manipulated by the experimenter (for example, 1 may be under 'normal' conditions, 2 may be after telling the subject that he or she is performing poorly, etc.).
      4. rxntime--a dependent variable--the reaction time of each subject
      5. rating--another dependent variable--the subject's subjective rating of his or her performance (on a scale of 1-7, 1 is 'very poor', 7 is 'excellent')

      -- Note that, in contrast to Microsoft Excel, the variable names are not in the columns of data, but instead are kept separate from the data.
      -- You can make changes to your data by clicking on the cell that you want to change, and then typing a new value into that cell.
      Some variables allow only numeric values, whereas others may allow text and are called string variables. You will learn how to assign these characteristics to different variables shortly.

   B. Output Files (what is generated)--in SPSS v. 9.0, these have a .spo extension
      These files contain all of the output that is generated when you run an analysis in SPSS (e.g., tables, descriptive statistics, t-test output).

      [Screenshot of an output file, showing a table of descriptive statistics and a histogram]

      The screen above is an example of an output (.spo) file.
      -- Note that the screen is divided into two parts:
         1. The outline section on the left helps you to navigate through your output file. You can "open" or "close" any section of your output by clicking on the heading for each output section (for example, double-clicking on the "Descriptives" heading would "close" these statistics from view).
         2. The viewing section is where all of your output is visible. Here, there are two items in the output: a table of descriptive statistics, and a histogram of the dependent variable reaction time (rxntime).

   C. Syntax Files (command language)--these files have a .sps extension
      Using the syntax file means that you are directly controlling a set of commands that the SPSS program will run on your data. You can also control commands through the menus.

      If you click on this button while the syntax you want is highlighted (by clicking and dragging your cursor over the text), SPSS will "run" the commands that are selected.

      Here is the syntax, or command language, that produced the output shown earlier: the descriptive statistics and the histogram. Don't worry about the mechanics of writing this language at this point.
      This syntax is for running further analyses; here, the syntax would correlate the 'rating' and 'rxntime' variables, and would then run a one-way analysis of variance on the dependent variable 'rxntime' to see whether the four conditions in the experiment differ with respect to this variable.

      For now, we won't work much with syntax, but know that by saving the commands that you run in SPSS (by saving a syntax file), you can re-create all of the data analysis you have done simply by re-running your syntax.

III. Why learn SPSS?
   A. Widely used in the sciences
   B. Another way of making your life as a researcher easier (if slightly more difficult while you are learning)

t-tests in SPSS

To demonstrate SPSS’s one-sample t-test command, we have the data set to the left. These data were collected in an experiment in which students (10 groups of 5 individuals) solved a number of complicated anagrams in groups. One person in each group was randomly assigned anagrams that were far more difficult than those that the others received. Participants were asked to allocate a $10 prize among the members of the group in any manner that they wished. The dependent variable is the amount of money allocated to the “least helpful” partner in each of the groups (almost always, the person who had the most difficult anagrams to solve).

The null hypothesis is that all members of the group would receive an equal allocation of the $10 prize. Thus, the null and alternative hypotheses are:

   H0: μ = 2 (they allocate the $10 equally, so each person gets $2)
   H1: μ ≠ 2

We want to subject these data to a one-sample t-test of the null hypothesis above. Use a two-tailed test, α = .05.

Click here for the data set (you must save it to disk in order to access the file).

Once you have the data, it is fairly simple to compute a one-sample t-test in SPSS.
Choose, from the menu, ANALYZE / COMPARE MEANS / ONE SAMPLE T TEST

You will get the dialog box that appears below:

Your test variable is amount; click this over into the test variable column. What is your test value? It is the value of the mean under the null, in this case, 2. Enter 2 into the test value box.

You may want to check out the “options” box to see what is available to you. The box is below. Note that you will automatically get the SPSS default 95% confidence interval for the data. The missing values command tells SPSS how to deal with missing data; the SPSS default is fine for most purposes (this will become more critical when you are dealing with more variables and have missing data in your set).

Click OK once you have set up your dialog boxes correctly, and SPSS will run the test. The output that you will receive is below:

One-Sample Statistics

           N     Mean      Std. Deviation   Std. Error Mean
AMOUNT     10    1.1770    .9499            .3004

One-Sample Test (Test Value = 2)

                                                               95% Confidence Interval
                                                               of the Difference
           t         df    Sig. (2-tailed)   Mean Difference   Lower       Upper
AMOUNT     -2.740    9     .023              -.8230            -1.5026     -.1434

Take a moment to see what you have here. Of course, your critical information is the t value (-2.74), your degrees of freedom (9), and the significance of your test (.023; notice that you can get exact p-values in SPSS). You also get your sample mean, the standard deviation (this is the estimated standard deviation), and the estimated standard error of the mean. You could compute the t-test yourself with this information:

   t = (1.177 - 2) / (.9499 / √10) = -.823 / .3004 = -2.74

Also note that your 95% confidence interval of the difference ranges from -1.50 to -.14.
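As a cross-check outside SPSS, the hand calculation above can be sketched in Python. This is only a minimal sketch that works from the summary statistics in the output. The function name one_sample_t and the hard-coded critical value 2.262 (two-tailed, α = .05, df = 9, taken from a standard t table) are assumptions made for illustration; neither comes from SPSS.

```python
import math

def one_sample_t(mean, sd, n, test_value):
    """Return (t, standard error, df) for a one-sample t-test from summary statistics."""
    se = sd / math.sqrt(n)          # estimated standard error of the mean
    t = (mean - test_value) / se    # distance from the null value, in standard errors
    return t, se, n - 1

# Summary statistics copied from the SPSS output (hypothetical helper, not an SPSS call).
t, se, df = one_sample_t(mean=1.177, sd=0.9499, n=10, test_value=2)

# Assumed critical value from a t table: two-tailed, alpha = .05, df = 9.
T_CRIT_95 = 2.262
margin = T_CRIT_95 * se
diff = 1.177 - 2
ci_diff = (diff - margin, diff + margin)

print(f"t({df}) = {t:.2f}")
print(f"95% CI of the difference: {ci_diff[0]:.2f} to {ci_diff[1]:.2f}")
```

Rounded to two decimals, these values should agree with the t value and the confidence interval of the difference in the SPSS table.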
It’s strange that SPSS computes the confidence interval this way, but you can easily get the confidence interval around the mean. If you were to compute the confidence interval by hand, you would calculate

   Mean +/- tcrit (α = .05, 2-tailed) * σ hat / √N

or

   1.177 +/- 2.262 * .9499 / √10 = 1.177 +/- .6795

so the confidence interval is .4975 to 1.8565.

In both cases the bounds of the interval are given by +/- .6795. Using this value around the mean difference (-.8230) you get the SPSS values; using it around the mean (1.177) you get .4975 to 1.8565. These are the values we would use.

What would a confidence interval look like if you were using a lower alpha level (i.e., having a greater % confidence)? Let’s try using a 99% confidence interval. Run the t-test as before, but select in the options that you want a 99% c.i.

One-Sample Test (Test Value = 2)

                                                               99% Confidence Interval
                                                               of the Difference
           t         df    Sig. (2-tailed)   Mean Difference   Lower       Upper
AMOUNT     -2.740    9     .023              -.8230            -1.7993     .1533

You will get the table above. Notice that your confidence interval is now larger (this makes intuitive sense; to have a greater degree of confidence, you would need a larger range of possible values). This time, the confidence interval of the difference contains 0. This is essentially the same as stating that the confidence interval around the mean contains 2 (the c.i. around the mean would be .201 to 2.153). In other words, if you use an α level of .01, you will fail to reject the null hypothesis, and your confidence interval would contain the null value. You can see that your exact p-value is .023, too high to reject the null if α is .01.

STATISTICAL SIGNIFICANCE OF R

Enter data for two variables (each row is an individual):

Once you have the data, choose from the menu, ANALYZE / CORRELATE / BIVARIATE

You will get the dialog box that appears below:

Select both test variables (hold down the CTRL key while clicking) and move them to the Variables column. Make sure that Pearson and Two-tailed are selected.
Click OK once you have set up your dialog boxes correctly, and SPSS will run the test. The output that you will receive is below:

Correlations

                                 test1       test2
test1    Pearson Correlation     1           .780(*)
         Sig. (2-tailed)         .           .013
         N                       9           9
test2    Pearson Correlation     .780(*)     1
         Sig. (2-tailed)         .013        .
         N                       9           9

* Correlation is significant at the 0.05 level (2-tailed).

The key elements of the output are the Pearson correlation (r = .78) and the significance value of r (.013). Is the correlation significant? What can we conclude?
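One way to check the significance value by hand is to convert r into a t statistic with N - 2 degrees of freedom, t = r * √(N - 2) / √(1 - r²), and compare it with a critical value. The Python sketch below applies that standard formula to the values in the table. The function name r_to_t and the hard-coded critical value 2.365 (two-tailed, α = .05, df = 7, taken from a standard t table) are assumptions made for illustration and are not part of the SPSS output.

```python
import math

def r_to_t(r, n):
    """Convert a Pearson correlation into a t statistic with n - 2 degrees of freedom."""
    df = n - 2
    t = r * math.sqrt(df) / math.sqrt(1 - r ** 2)
    return t, df

# r and N copied from the SPSS Correlations table (hypothetical helper, not an SPSS call).
t, df = r_to_t(r=0.780, n=9)

# Assumed critical value from a t table: two-tailed, alpha = .05, df = 7.
T_CRIT_95 = 2.365
significant = abs(t) > T_CRIT_95

print(f"t({df}) = {t:.2f}, exceeds critical value: {significant}")
```

With these values the t statistic exceeds the critical value, which agrees with the asterisk SPSS places next to the correlation.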