14.2.5 Correlation and Regression Analysis with SPSS
      During this tutorial you will learn how to use SPSS to investigate the association
      between two continuous variables and how to describe it graphically.

2.5.1 In this practical session you will analyse some bivariate data.
      To enter the data:
      File / Open worksheet / Merlin4 (saved in the ANOVA Worksheet, 14.2.4)

      In order to see what this file contains:
      Analyse / Descriptive Statistics / Descriptives. Select all the variables but not the
      filters.
                                    Descriptive Statistics

                               N        Minimum    Maximum    Mean     Std. Deviation
        Ages of Employees          60         17         64    37.75         11.037
        Sex of Employee            60          1          2     1.50            .504
        Job category               60          1          4     2.27          1.039
        Valid N (listwise)         60



      We have three variables. Each is measured on 60 cases. There are no missing values.
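
      The Sex row of the table can be sanity-checked by hand. The sketch below is
      illustrative Python, not part of the SPSS session. It assumes the coding
      1 = male and 2 = female (task 2.5.9 selects the males with Sex = 1) and an
      even split of 30 employees per sex (the N shown in the subgroup correlation
      tables later in this worksheet).

```python
import statistics

# Assumed coding: 1 = male, 2 = female, with 30 employees of each sex.
sex = [1] * 30 + [2] * 30

mean_sex = statistics.mean(sex)   # average of the codes
sd_sex = statistics.stdev(sex)    # sample standard deviation (n - 1 denominator)

print(f"N = {len(sex)}, mean = {mean_sex:.2f}, sd = {sd_sex:.3f}")
```

      Under those assumptions the mean and standard deviation agree with the 1.50
      and .504 reported by SPSS, so the descriptives are consistent with equal
      numbers of males and females.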

      We now give each employee a salary and then see whether it is associated with age.

      In Variable View, in the first empty row name a new variable SALARY and label it
      Annual salary (£000).

      Type the following in Data View in one column:
      (Work down these columns one after the other)

      38.1      38.9         23.2       22.9        19.8      19.7      15.6            31.7   17.3   37.8
      18.7      42.8         19.6       47.5        31.3       8.5      28.5            14.1   33.5   32.9
      42.3      60.1         15.5       15.8        59.3      15.9      37.3            20.3   13.7    9.8
      25.9      60.7         20.7       35.9        33.8      39.3      32.9            19.8    6.4   15.2
      53.6      75.2         40.2       25.3        24.5      14.5      23.9            28.2   35.3    8.7
      37.6      10.9         28.5       63.2        32.0      10.2       8.6            25.3   12.5   38.4
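
      Because the grid is printed row by row but must be typed column by column, it
      is easy to enter the values in the wrong order. The following Python sketch
      (an illustration only, outside SPSS) reads the grid down the columns and
      computes summary figures that can be compared with the Descriptive Statistics
      table for Annual salary in task 2.5.7. The variable names are hypothetical.

```python
import statistics

# Salaries exactly as printed above: 6 rows of 10 values.
rows = [
    [38.1, 38.9, 23.2, 22.9, 19.8, 19.7, 15.6, 31.7, 17.3, 37.8],
    [18.7, 42.8, 19.6, 47.5, 31.3, 8.5, 28.5, 14.1, 33.5, 32.9],
    [42.3, 60.1, 15.5, 15.8, 59.3, 15.9, 37.3, 20.3, 13.7, 9.8],
    [25.9, 60.7, 20.7, 35.9, 33.8, 39.3, 32.9, 19.8, 6.4, 15.2],
    [53.6, 75.2, 40.2, 25.3, 24.5, 14.5, 23.9, 28.2, 35.3, 8.7],
    [37.6, 10.9, 28.5, 63.2, 32.0, 10.2, 8.6, 25.3, 12.5, 38.4],
]

# "Work down these columns one after the other": column 1 top to bottom,
# then column 2, and so on, giving one long SALARY column of 60 values.
salary = [row[col] for col in range(10) for row in rows]

n = len(salary)
mean_salary = statistics.mean(salary)
sd_salary = statistics.stdev(salary)   # sample standard deviation, as SPSS reports

print(f"N = {n}, min = {min(salary)}, max = {max(salary)}")
print(f"mean = {mean_salary:.3f}, sd = {sd_salary:.4f}")
```

      The mean and standard deviation do not depend on the order of entry, so they
      only confirm that all 60 values were typed correctly. The order matters for
      the correlation, because each salary must sit on the row of the right employee.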

      The variables of interest in this practical session are the continuous variables
      SALARY and AGE. We shall investigate the relationship between the Salaries
      earned by the employees of Merlin and their ages.

2.5.2 Save the revised data file as Merlin5.

2.5.3 Produce a scatter plot
      Graphs / Scatter / Define as Simple / Select SALARY for Y and AGE for X. Give
      your graph a suitable title.




      [Figure: scatter plot "Salary v Age". Y axis: Annual salary (£'000), 0.0 to 80.0. X axis: Age, 0 to 80.]


      Examine the plot. You should find that it suggests only a weak linear relationship.

         Does there appear to be a relationship?  Yes, but not a strong one.

         Have a guess as to whether this is likely to be significant or not.  Yes.


2.5.4 Calculate the correlation coefficient.
      Analyse / Correlate / Bivariate

      Select SALARY and AGE as the variables.

                                        Correlations

                                                        Annual salary
                                                           (£'000)         Age
      Annual salary (£'000)   Pearson Correlation              1          .398**
                              Sig. (2-tailed)                              .002
                              N                               60             60
      Age                     Pearson Correlation           .398**            1
                              Sig. (2-tailed)                .002
                              N                               60             60
      **. Correlation is significant at the 0.01 level (2-tailed).



         What is the value of the correlation coefficient?  0.398

         What is the p-value for the test that the true correlation is zero?  0.002

         Is this significant at 5%?  Yes

      If the p value is less than 0.05 the correlation coefficient is significant at the 5% level
      of significance.
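
      The significance test behind the Sig. (2-tailed) figure can be sketched by
      hand. The Python below is an illustration under stated assumptions, not SPSS
      output: it converts r into a t statistic with n - 2 degrees of freedom and
      compares it with a 5% two-sided critical value of about 2.00, which is taken
      from standard t tables rather than from this worksheet.

```python
import math


def t_from_r(r: float, n: int) -> float:
    """t statistic for testing that the population correlation is zero (n - 2 df)."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


r, n = 0.398, 60            # values read from the Correlations table
t_stat = t_from_r(r, n)

# Assumption: two-sided 5% critical value of t on 58 df, roughly 2.00 (from t tables).
T_CRIT_5PCT = 2.00
significant = abs(t_stat) > T_CRIT_5PCT

print(f"t = {t_stat:.2f} on {n - 2} df, significant at 5%: {significant}")
```

      For a regression with a single predictor this is the same t that SPSS reports
      for the Age coefficient in task 2.5.5 (3.306), apart from the rounding of r.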



2.5.5 Find the regression equation:
      Analyse / Regression / Linear / Select SALARY as Dependent, AGE as Independent.
      Use Method Enter.

                                        Coefficients (a)

                                   Unstandardized Coefficients    Standardized Coefficients
      Model                            B        Std. Error                 Beta                 t        Sig.
      1        (Constant)            7.728         6.593                                      1.172      .246
               Age                    .554          .168                   .398               3.306      .002
      a. Dependent Variable: Annual salary (£'000)



      The regression equation is not immediately obvious from the SPSS output.
      In the Coefficients table, under Unstandardized Coefficients, the column headed B
      holds the constant, a, and the coefficient of Age, b, of the line y = a + bx.

         Write down the regression equation:  y = 7.73 + 0.554x
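
      The coefficients can also be recovered from figures already printed in this
      worksheet, which helps to show where a and b come from. The sketch below is
      illustrative Python and assumes the usual least-squares identities
      b = r * s_y / s_x and a = mean(y) - b * mean(x). The function name
      predict_salary is hypothetical.

```python
# Summary statistics read from the SPSS tables in this worksheet.
mean_age, sd_age = 37.75, 11.037            # Descriptive Statistics, task 2.5.1
mean_salary, sd_salary = 28.660, 15.3701    # Descriptive Statistics, task 2.5.7
r = 0.398                                   # Correlations, task 2.5.4

# Least-squares slope and intercept of the line y = a + b*x.
b = r * sd_salary / sd_age
a = mean_salary - b * mean_age


def predict_salary(age: float) -> float:
    """Predicted annual salary (in £'000) at a given age, from the fitted line."""
    return a + b * age


print(f"y = {a:.2f} + {b:.3f}x")
print(f"Predicted salary at age 40: {predict_salary(40):.1f}")
```

      Because the inputs are rounded, the intercept can differ from the SPSS value
      in the second decimal place. Predictions should only be made for ages inside
      the observed range of 17 to 64.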


2.5.6 Produce the regression line on your scatterplot.
      Graphs / Scatter / Define as Simple / Select SALARY for Y and AGE for X and OK.

      Double click on the graph to get into editing mode.
      Click on the points to select them.
      Chart / Add chart element / Fit line at total


      [Figure: scatter plot "Salary v Age" with fitted regression line, R Sq Linear = 0.159. Y axis: Annual salary (£'000), 0.0 to 80.0. X axis: Age, 0 to 80.]




2.5.7 Carry out residual analysis. (This could have been requested at step 2.5.5.)
     Analyse / Regression / Linear / Select SALARY as dependent and AGE as
     independent variables. Select Plots / Standardised residual plots / Histogram. Select
     Save / Residuals / Unstandardised.
                                        Residuals Statistics (a)

                                      Minimum       Maximum       Mean     Std. Deviation      N
        Predicted Value                17.154        43.216      28.660        6.1200          60
        Residual                     -25.0975       37.5294       .0000       14.0992          60
        Std. Predicted Value           -1.880         2.378        .000         1.000          60
        Std. Residual                  -1.765         2.639        .000          .991          60
        a. Dependent Variable: Annual salary (£'000)




     [Figure: histogram of Regression Standardized Residual (x axis, -2 to 3) against Frequency (y axis, 0 to 12). Dependent Variable: Annual salary (£'000). Mean = -6.94E-18, Std. Dev. = 0.991, N = 60.]


     You should see that your residuals appear reasonably normal on the histogram.

     To check that the mean of the residuals is zero and that their standard deviation
     is lower than that of SALARY:
     Analyse / Descriptive Statistics / Descriptives / Select SALARY and
     UNSTANDARDISED RESIDUALS.

                                                            Descriptive Statistics

                                                        N       Minimum        Maximum           Mean        Std. Deviation
          Annual salary (£'000)                             60        6.4           75.2          28.660          15.3701
          Unstandardized Residual                           60 -25.09753       37.52944        .0000000      14.09917166
          Valid N (listwise)                                60


     You should see that the mean of the residuals is zero. The standard deviation has been
     reduced (but not by much for these data).
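
     The size of that reduction follows directly from the correlation. The Python
     sketch below is illustrative only. It assumes the standard identity that the
     residual standard deviation equals the standard deviation of y multiplied by
     sqrt(1 - r^2), and that SPSS standardises residuals by the regression standard
     error, which uses n - 2 degrees of freedom.

```python
import math

n = 60
r = 0.398             # Pearson correlation from task 2.5.4
sd_salary = 15.3701   # standard deviation of SALARY from the table above

r_squared = r ** 2                                   # share of variance explained by AGE
sd_residual = sd_salary * math.sqrt(1 - r_squared)   # spread left after fitting the line

# Assumption: standardised residual = raw residual / sqrt(SS_res / (n - 2)),
# so its sample standard deviation (n - 1 denominator) is slightly below 1.
sd_std_residual = math.sqrt((n - 2) / (n - 1))

print(f"R squared = {r_squared:.3f}")
print(f"Residual sd = {sd_residual:.2f}")
print(f"Sd of standardised residuals = {sd_std_residual:.3f}")
```

     With only about 16% of the variance in salary explained by age, the residual
     standard deviation stays close to the original one. The last figure accounts
     for the Std. Dev. of 0.991 shown on the histogram.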

     In the next three tasks we look at the males and females separately.




2.5.8 Produce the two regression lines on your scatterplot.
      Graphs / Scatter / Define as Simple / Select SALARY for Y and AGE for X, and set
      markers by SEX. OK.

      Double click on the graph to get into editing mode.
      Click on the male points to select them (easiest to do in the legend).

      Chart / Add chart element / Fit line at total

      Repeat for the female points



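      [Figure: scatter plot "Salary v Age" with markers set by Sex of Employee (Male, Female) and a fitted line for each group; the two lines are labelled R Sq Linear = 0.083 and R Sq Linear = 0.233. Y axis: Annual salary (£'000), 0.0 to 80.0. X axis: Age, 0 to 80.]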

         Describe the general differences between the male and the female salaries.

         Male salaries start higher and rise more quickly than those for females.


2.5.9 In the Data editor: Data / Select cases / If condition is satisfied / If Sex = 1.
      Continue.
      Leave unselected cases as Filtered. OK

      Repeat task 2.5.4 and, if the correlation coefficient is significant, task 2.5.5 for the
      males only.




                                   Correlations for males only

                                                        Annual salary
                                                           (£'000)         Age
      Annual salary (£'000)   Pearson Correlation              1          .483**
                              Sig. (2-tailed)                              .007
                              N                               30             30
      Age                     Pearson Correlation           .483**            1
                              Sig. (2-tailed)                .007
                              N                               30             30
      **. Correlation is significant at the 0.01 level (2-tailed).




                                        Coefficients (a)

                                   Unstandardized Coefficients    Standardized Coefficients
      Model                            B        Std. Error                 Beta                 t        Sig.
      1        (Constant)            6.880        10.157                                       .677      .504
               Age                    .734          .252                   .483               2.917      .007
      a. Dependent Variable: Annual salary (£'000)




         Write down the correlation coefficient and the regression equation, if appropriate.

         r = 0.483;  y = 6.88 + 0.734x
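
      The t column of the Coefficients table is each B divided by its standard
      error. The short Python sketch below, an illustration rather than SPSS
      output, repeats that division for the males-only table. The dictionary
      layout is hypothetical.

```python
# (B, Std. Error, t as reported by SPSS) for each row of the males-only table.
coefficients = {
    "(Constant)": (6.880, 10.157, 0.677),
    "Age": (0.734, 0.252, 2.917),
}

# t = B / Std. Error for every coefficient.
t_values = {name: b / se for name, (b, se, _reported) in coefficients.items()}

for name, t in t_values.items():
    reported = coefficients[name][2]
    print(f"{name}: t = {t:.3f} (SPSS reports {reported})")
```

      Small differences in the third decimal place come from the rounding of B and
      Std. Error in the printed table. The large Sig. value for the constant
      (.504) means the intercept is not significantly different from zero, while
      the slope for Age is.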

2.5.10 In the Data editor: Data / Select cases / All cases. Then If condition is satisfied / If
      Sex = 2. Leave unselected cases as Filtered.

      Repeat task 2.5.4 and, if the correlation coefficient is significant, task 2.5.5 for the
      females only.

                                   Correlations for females only

                                                        Annual salary
                                                           (£'000)         Age
      Annual salary (£'000)   Pearson Correlation              1           .288
                              Sig. (2-tailed)                              .123
                              N                               30             30
      Age                     Pearson Correlation            .288             1
                              Sig. (2-tailed)                .123
                              N                               30             30



         Write down the correlation coefficient and the regression equation, if appropriate.

         r = 0.288;  not significant, so no regression equation is given.
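
      Squaring the two subgroup correlations shows which of the R Sq Linear values
      on the plot in task 2.5.8 (0.083 and 0.233) belongs to which line. The
      Python below is a small illustrative check using only the correlations
      reported in tasks 2.5.9 and 2.5.10.

```python
# Pearson correlations between salary and age for each subgroup.
correlations = {"male": 0.483, "female": 0.288}

# R squared is the square of r when there is a single predictor.
r_squared = {sex: round(r ** 2, 3) for sex, r in correlations.items()}

for sex, value in r_squared.items():
    print(f"{sex}: R squared = {value:.3f}")
```

      The larger value belongs to the males and the smaller to the females, so age
      accounts for roughly 23% of the variation in male salaries but only about 8%
      for females, in line with the non-significant female correlation.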



