Document Sample

SPSS Advanced for Windows: Version 13
Under construction

Office of Information Technology / Office of Academic Computing Services

William Dardick
Brief Outline
I.      Introduction
II.     Descriptive statistics
III.    T-tests
a. One sample
b. Two independent samples
c. Paired
IV.     Analysis of variance
a. ANOVA
b. MANOVA
V.      Correlation
a. Bivariate
b. Partial
VI.     Regression
a. Simple
b. Multiple
c. Multivariate
VII.    Analysis of Covariance
a. ANCOVA
b. MANCOVA
VIII.   Logistic Regression
IX.     Factor analysis and principal components
X.      Non-parametric statistics

Introduction to SPSS version 13

The objective of this course is to teach the user how to run statistical analyses in
SPSS version 13 in a Windows setting. Most tasks in SPSS can be accomplished through
the pull-down menus. The course focuses on the analysis functions of SPSS, such as
multiple regression, multivariate regression, ANOVA, MANOVA, factor analysis, and
so on.

General Overview
The majority of work done in SPSS for Windows will be performed in the Data Editor
and the Viewer (output) windows. SPSS syntax is not the focus of this course, but
sample syntax will be provided for the majority of analyses. It is not necessary to
know syntax to use SPSS for Windows.

Descriptive Statistics

The Analyze option on the main menu offers many statistics that can be performed on
your data. One of the most common options on the Analyze menu is the Descriptive
Statistics procedure. By opening the Analyze menu you can select Descriptive
Statistics. Under this option there are four types of analyses that can be performed.

Frequency Distribution

Looking at raw data can often be confusing. There is not much that can be said with
certainty by looking at raw, unordered scores, and the larger the data set the more
difficult it is to speak meaningfully about it. Today, more so than at any other time
in history, we have huge databases of data. Databases in the millions are commonplace;
the near future will bring us databases in the billions, and on their coattails the
trillions. What if a database kept track of each order at every supermarket in the
world? The data would only stay in the billions if it was tracked by person rather
than by order; the number of visits would soon reach into the trillions. In the United
States alone, 100 million people might stop into the supermarket three times a week
for a year. Quickly summing up these values, 3 x 52 x 100 million, we get 15.6 billion
visits to the supermarket that year. Imagine the data set when the number of people
doing something worldwide is in the billions and they do it several times a week or day.

Frequency distributions are one way of organizing data so observations about the data set
can be made. The frequency distribution shows the possible scores and the number of
observations for each score. When working with a large distribution it is sometimes
beneficial to group scores and look at the grouped distribution.
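Although this course works in SPSS, the idea behind a frequency table can be sketched in a few lines of plain Python. The scores below are invented for illustration; in SPSS they would be a column in the Data Editor.

```python
from collections import Counter

# Hypothetical set of scores.
scores = [3, 1, 2, 3, 3, 2, 1, 3, 2, 3]

# Tally how many times each score occurs, like a Frequencies table:
# score, count, and percent of all cases.
freq = Counter(scores)
for score in sorted(freq):
    count = freq[score]
    print(f"{score}\t{count}\t{100 * count / len(scores):.1f}%")
```

Grouping scores into intervals before counting is the same idea applied to binned values.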

Computing Frequencies
Select the first analysis labeled Frequencies. A dialog box will pop up with the heading
Frequencies.

In the left box all of your variables from the data set will be present. By selecting one or
more of these variables and clicking the directional arrow shown in the window you can
place them into the Variable(s): section of the window. The variable is now ready to be
analyzed. Before doing so other options available to you in the frequencies window need
to be discussed.

Select Display frequency tables. This will provide a frequency table in your output or
Viewer window. On the bottom of your dialog box there are three options: Statistics,
Charts, and Format. Select Statistics. The Frequencies: statistics dialog box will open.
The box is composed of several sections.

First is the Percentile Values section. There are several options for dividing the data
to obtain specific percentages above and below specific points. The Quartiles option
will divide the output into 4 equal groups. Cut points allows for a number of equal
groups other than four to be obtained. Individual percentiles can be selected by typing
the appropriate percentile into the Percentile(s) option. Select Quartiles.

Central Tendency has four selections, Mean, Median, Mode and Sum.

The Dispersion section is for measuring the amount of variance in the data set.

The Values are group midpoints selection is used only if the data is midpoint coded. This
selection will give estimates for the median and percentiles as if the data was ungrouped.

Distribution is tested by two statistics, Skewness and Kurtosis. Skewness gives a
measure of symmetry for the data. Kurtosis measures whether the peak of the
distribution is normal. In both cases a normal distribution will have 0 skewness and
kurtosis. A significant positive skew indicates a long right tail in the distribution;
when a data set is skewed negatively it has a long left tail. A positive measure of
kurtosis indicates clustering in the center with longer tails than a normal
distribution. A negative kurtosis has shorter tails and clusters less than the normal
curve.
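As a rough illustration of what these two statistics measure, here is a plain-Python sketch using the simple population moment formulas. Note that SPSS itself reports bias-corrected sample versions, so its values will differ slightly from this sketch.

```python
def skew_kurtosis(data):
    """Population skewness and excess kurtosis.

    A normal distribution scores 0 on both; positive skew means a long
    right tail, positive kurtosis means longer tails than the normal curve.
    """
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n   # variance (2nd moment)
    m3 = sum((x - mean) ** 3 for x in data) / n   # 3rd central moment
    m4 = sum((x - mean) ** 4 for x in data) / n   # 4th central moment
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2 - 3                       # "excess": normal scores 0
    return skew, kurt

skew, kurt = skew_kurtosis([1, 2, 3, 4, 5])
```

Symmetric data like this gives zero skewness; the flat spread gives a negative (platykurtic) kurtosis.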

When finished selecting statistics for the frequencies analysis click continue.

Now you are ready to run the analysis. Select OK.

The output will now appear in the Viewer window.

Descriptives

Next under the Descriptive Statistics option is Descriptives. Selecting Descriptives
will open a dialog box. The box is set up very similarly to the Frequencies dialog box.
The main difference is the Options button.

The Display Order is the only new option in Descriptives. This option allows you to
choose the order in which the descriptives are displayed in the output.

Explore

Explore can be used to run descriptive statistics on data that is grouped. The Dependent
List holds the variables you want to explore. The Factor list creates the groups.

The options are not very different from the other descriptive options.

Crosstabs

The Crosstabs option allows you to count the frequencies that occur in any number of
cells within a larger table. One of the options in the Statistics dialog box allows you
to compute a chi-square test along with the analysis.
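The chi-square statistic that Crosstabs computes can be sketched by hand in Python; the 2x2 table below is invented for illustration. Looking up the p-value for the resulting statistic would still require a chi-square table.

```python
def chi_square(table):
    """Pearson chi-square for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected cell count if rows and columns were independent.
            expected = row_totals[i] * col_totals[j] / grand
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df

chi2, df = chi_square([[10, 20], [20, 10]])
```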

Note:

Be sure to check the type of variables in the data set.
Make sure the type of variable is put in SPSS appropriately.
Use the appropriate variable for analysis.

Z-Scores
Z-score transformation using the compute option:

First you need to run descriptive statistics in order to compute z scores using the
Compute transformation. Use the mean and standard deviation for the variable that is
going to be converted into z scores.

X  x
z
Sx

Variable x – mean divided by standard deviation.

Use this equation in the compute transformation box, using the variable name for x.

The new variable holds values for your previous variable transformed into z scores.
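The same transformation can be sketched in Python with the standard library; the weights below are invented for illustration.

```python
import statistics

weights = [150, 160, 145, 170, 155]      # hypothetical data
mean = statistics.mean(weights)          # the mean Descriptives would report
sd = statistics.stdev(weights)           # sample standard deviation

# z = (X - mean) / standard deviation, applied to every case.
z_scores = [(x - mean) / sd for x in weights]
```

A quick check on any z-score variable: the scores sum to zero and have a standard deviation of 1.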

t-test
The one sample t-test:
The one sample t-test is used to compare the mean of one variable to a predetermined
test value, either known or hypothesized. The objective of a one sample t-test is to
see if a single continuous dependent variable differs from some specified constant.
The t-test is used with a single mean, and in this case it is a non-directional test.
The statistics will state the difference between the mean and the constant.

Examples of when to use the one sample t-test:

The average IQ score is 100; a sample can be used to gather scores to be tested against
this hypothesized mean.
When measuring resting heart rate we can hypothesize that the average rate per minute
will be 70. This would be compared to a sample of actual resting heart rates, with 70
as the hypothesized value used in the analysis.

The following is an example of a one sample t-test performed in SPSS.

Select One sample T Test. The One-Sample T Test window will appear on the screen.
Select the variable to be analyzed and set the test value for which you are testing the
variable. The options button will allow you to change the Confidence Interval from 95 to
whatever value is desired. Missing values allows for the removal of cases for each
variable measured or for individual cases.

Click OK to run the T-test.

Output for the one sample t-test.

T-Test

One-Sample Statistics

          N    Mean       Std. Deviation   Std. Error Mean
WEIGHT    18   156.3889   38.69814         9.12124

One-Sample Test

Test Value = 135
          t       df   Sig. (2-tailed)   Mean Difference   95% CI Lower   95% CI Upper
WEIGHT    2.345   17   .031              21.3889           2.1448         40.6330

Example of Syntax for one-sample t-test:

T-TEST
/TESTVAL=135
/MISSING=ANALYSIS
/VARIABLES=weight
/CRITERIA=CIN (.95) .

Output explained:

The value computed for t is 2.345. With 17 degrees of freedom this value is significant
at the .031 level. Even though this test is non-directional, we can still state that
the mean of this group is significantly higher than the hypothesized mean (Test Value)
of 135.

Deriving this value:

The basic equation for a one sample t-test is similar to the z distribution:

(1.1)  z = (X̄ - μx) / σX̄
(1.2)  t = (X̄ - μx) / SX̄
(1.3)  t = (X̄ - μx) / sqrt( SSx / (n(n - 1)) )

where X̄ is the actual mean, μx is the hypothesized mean, and SX̄ is the standard
error. The denominator of equation 1.3 expands the standard error in terms of the sum
of squares.

If we plug in the numbers from our descriptive statistics above (or calculate them by
hand) we can see how t was derived.

t = (72.952 – 70) / 0.901 = 3.28

This value of 3.28 is significant for 20 degrees of freedom (DF) at the p = .0038
level. The degrees of freedom for the t-test are equal to the number of cases minus
one. In this case there is one sample, so our DF is equal to 21 - 1 = 20. Twenty of
the cases are 'free to vary'; one is used to ensure that the sum of deviation scores
is 0. DF is used to calculate the area under the distribution for x number of cases.
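The derivation above can be sketched as a small Python function using only the standard library. This is an illustration alongside SPSS, not part of it; the significance lookup would still come from a t table.

```python
import math
import statistics

def one_sample_t(data, test_value):
    """t = (sample mean - test value) / standard error, with df = n - 1."""
    n = len(data)
    mean = statistics.mean(data)
    se = statistics.stdev(data) / math.sqrt(n)   # S_xbar, the standard error
    return (mean - test_value) / se, n - 1

t, df = one_sample_t([2, 4, 6], 0)
```

When the sample mean equals the test value, t is exactly zero, as the equation shows.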

The independent t-test:
The independent t-test is used when you have the means of two separate, normally
distributed groups that you want to compare. Unlike the one-sample t-test, the
independent t-test compares means that are both derived from the actual samples used;
there is no hypothesized mean in the equation. The purpose of the independent t-test
is to determine if the two samples are significantly different from one another.

Example of when to use the independent t-test:

Select two randomly selected independent groups of subjects. Both groups are
administered an IQ test. The first group receives breakfast prior to taking the test;
the second group has only a glass of water prior to the test. The groups are then
compared to each other.

We could also have two groups, one of which is given a cup of coffee before testing
their heart rate. The control group is given a glass of water prior to testing. The
heart rates of the two independent groups are compared to determine if there is a
significant difference between groups.

Groups could also be compared by gender or age category (if there are only two) or any
other two independent groups.

The following is an example of an independent t-test performed in SPSS.

This time go to the Analyze menu, Compare Means, and select Independent-Samples T
Test. The window that appears is very similar to the One-Sample T Test window. The
main difference here is the grouping variable section. The grouping variable should be
dichotomous or categorical so as to divide the cases into groups. The options for this
t-test are the same as for the others.

Output for the independent t-test.

T-Test

Group Statistics

HEIGHT   recoded gender   N   Mean      Std. Deviation   Std. Error Mean
         1.00             9   71.4444   2.12786          .70929
         .00              9   63.7778   3.30824          1.10275

Independent Samples Test

Levene's Test for Equality of Variances: F = 5.096, Sig. = .038

t-test for Equality of Means
                              t       df       Sig. (2-tailed)   Mean Diff.   Std. Error Diff.   95% CI Lower   95% CI Upper
Equal variances assumed       5.847   16       .000              7.6667       1.31116            4.88714        10.44620
Equal variances not assumed   5.847   13.652   .000              7.6667       1.31116            4.84777        10.48556

Output explained:

The t value of 5.847 is tested for significance. The t value is significant for a
two-tailed test.

Deriving this value:

The basic equation for an independent t-test:

(2.1)  t = ( (X̄ - Ȳ) - (μx - μy) ) / S(X̄-Ȳ)

(2.2)  t = (X̄ - Ȳ) / sqrt( ( (SSx + SSy) / ((nx - 1) + (ny - 1)) ) * (1/nx + 1/ny) )
t= (64.818-72) / 1.0106 = -7.11
DF total is 19. This is the total number of cases minus the number of groups.
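Equation 2.2, the pooled-variance form, can be sketched directly in Python; this corresponds to the "equal variances assumed" row of the SPSS output. The data values here are invented for illustration.

```python
import math

def independent_t(x, y):
    """Pooled-variance independent t-test; df = nx + ny - 2."""
    nx, ny = len(x), len(y)
    mx, my = sum(x) / nx, sum(y) / ny
    ssx = sum((v - mx) ** 2 for v in x)   # sum of squares, group x
    ssy = sum((v - my) ** 2 for v in y)   # sum of squares, group y
    pooled = (ssx + ssy) / ((nx - 1) + (ny - 1))
    se = math.sqrt(pooled * (1 / nx + 1 / ny))
    return (mx - my) / se, nx + ny - 2

t, df = independent_t([1, 2, 3], [4, 5, 6])
```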

The paired t-test
Example of when to use the paired t-test:

A Paired-Samples T Test is used when there are two variables for the same group being
compared. This test computes the difference between the two values to see if it
differs significantly from zero. This is often the appropriate test for a one-group
pre-post test design.

The following is an example of a paired t-test performed in SPSS.

Computing a paired t-test in SPSS is similar to other versions of the t-test. Select
Analyze, Compare means, and Paired-Samples t test. The dialog box opens with many
variables in one section and an empty section for variables to be placed in for analysis.
Select two variables and place them in the box.

Example of paired t-test with syntax:

T-TEST
PAIRS= lift WITH lift2 (PAIRED)
/CRITERIA=CIN(.95)
/MISSING=ANALYSIS.

Output for the paired t-test.

T-Test

Paired Samples Statistics

Pair 1                      Mean       N    Std. Deviation   Std. Error Mean
  lift untrained            143.0556   18   54.15498         12.76445
  lift after groups split   158.8889   18   62.29804         14.68379

Paired Samples Correlations

Pair 1   lift untrained & lift after groups split   N = 18   Correlation = .991   Sig. = .000

Paired Samples Test

Paired Differences: lift untrained - lift after groups split
Mean       Std. Deviation   Std. Error Mean   95% CI Lower   95% CI Upper   t        df   Sig. (2-tailed)
-15.8333   11.40820         2.68894           -21.5065       -10.1602       -5.888   17   .000

Output explained:

The t value of -5.888 is significant with 17 df. The two variables are correlated,
r = .991.
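The paired t-test reduces to a one-sample t-test on the pairwise differences, which can be sketched as follows (the data values are invented for illustration):

```python
import math
import statistics

def paired_t(before, after):
    """t-test of the pairwise differences against zero; df = n - 1."""
    diffs = [b - a for b, a in zip(before, after)]
    n = len(diffs)
    se = statistics.stdev(diffs) / math.sqrt(n)
    return statistics.mean(diffs) / se, n - 1

t, df = paired_t([10, 12, 14], [11, 14, 15])
```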

ANOVA

One-way ANOVA (Analysis of Variance)
The One-way ANOVA is similar to an independent t-test, except that it can compare
differences in more than two means at once. If you have only 2 levels or groups, the F
result from the ANOVA will be equal to t² from a t-test.

There are two sources of variance: between-group differences and within-group
differences.

Example of when to use a one-way ANOVA:

The ANOVA procedure is used for balanced data.

A study using 3 levels of caffeine intake to determine performance level.
Giving two levels of pain medicine plus a placebo to patients after pulling wisdom
teeth and having them give a self-report on pain level.

The following is an example of a one-way ANOVA performed in SPSS.

From the Analyze menu, select Compare Means, One-Way ANOVA. The One-Way ANOVA dialog
box will open. Select one dependent variable and one IV (this is your grouping
variable). Select OK.

Example of Syntax for one-way ANOVA:

ONEWAY
height BY gendernm
/MISSING ANALYSIS .

Output for the one-way ANOVA.

Oneway

ANOVA

HEIGHT
                 Sum of Squares   df   Mean Square   F        Sig.
Between Groups   264.500          1    264.500       34.190   .000
Within Groups    123.778          16   7.736
Total            388.278          17

Output explained:

The F value is tested for significance. Here our value for F = 34.19 is significant.
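The between/within decomposition behind that F value can be sketched in plain Python. With only two groups the resulting F equals t² from the independent t-test, as noted earlier. The data values are invented for illustration.

```python
def one_way_anova(*groups):
    """One-way ANOVA: F = MS between / MS within."""
    all_vals = [v for g in groups for v in g]
    grand_mean = sum(all_vals) / len(all_vals)
    means = [sum(g) / len(g) for g in groups]
    # Between-groups: how far each group mean sits from the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, means))
    # Within-groups: spread of cases around their own group mean.
    ss_within = sum((v - m) ** 2
                    for g, m in zip(groups, means) for v in g)
    df_between = len(groups) - 1
    df_within = len(all_vals) - len(groups)
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within

f, dfb, dfw = one_way_anova([1, 2, 3], [4, 5, 6])
```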

GLM
GLM stands for the general linear model. This model is the tool from which most of your
common statistics are generated.

Analysis for GLM can be performed in SPSS under the Analyze menu; go to General Linear
Model. There will be four options: Univariate, Multivariate, Repeated Measures, and
Variance Components.

The first types of models being reviewed are univariate ANOVAs. The one-way analysis
of variance can be computed through the Compare Means menu or the GLM.

One-way ANOVA (using GLM)
Using the one-way ANOVA option under Compare Means, we generate an output for group
comparison of weight after three years, with one table. The table's significance is
for the overall effect of the model.

Oneway

ANOVA

estimate after 3 years
                 Sum of Squares   df   Mean Square   F        Sig.
Between Groups   60336.778        2    30168.389     18.941   .000
Within Groups    23891.667        15   1592.778
Total            84228.444        17

Sample syntax:

ONEWAY
weight3y BY lvl
/MISSING ANALYSIS .

We can also run this simple analysis using the general linear model option for Univariate
statistics. From the GLM option select Univariate.

The dialog box for Univariate models opens. In order to compute the simple ANOVA we
need to know where to place our variables. The first two boxes on the right side of
the dialog box are titled Dependent Variable and Fixed Factor(s). Dependent variables
for ANOVAs are continuous variables. The fixed factors are the discrete grouping
variables that differentiate levels.

To compute the same analysis as the one-way ANOVA, place weight after three years in
the DV box and lvl of group in the Fixed Factor(s) box. Select OK.

Univariate Analysis of Variance

Between-Subjects Factors
                      Value Label   N
lvl of group   1.00   L             6
               2.00   M             6
               3.00   H             6

Tests of Between-Subjects Effects

Dependent Variable: estimate after 3 years
Source            Type III Sum of Squares   df   Mean Square   F         Sig.
Corrected Model   60336.778 a               2    30168.389     18.941    .000
Intercept         441173.556                1    441173.556    276.984   .000
LVL               60336.778                 2    30168.389     18.941    .000
Error             23891.667                 15   1592.778
Total             525402.000                18
Corrected Total   84228.444                 17
a. R Squared = .716 (Adjusted R Squared = .679)

Sample syntax:

UNIANOVA
weight3y BY lvl
/METHOD = SSTYPE(3)
/INTERCEPT = INCLUDE
/CRITERIA = ALPHA(.05)
/DESIGN = lvl .

The new analysis produced from the GLM produces two tables. The first gives
information on the number of cases in each group. The second table is the analysis.
The Corrected Model row of this table yields the same results as the one-way ANOVA:
F = 18.941. Error is the same as the Within Groups row, and the Total row is the same
as the Total from the one-way ANOVA. Intercept, LVL, and Corrected Total have been
added, along with significance tests for the intercept and LVL. The intercept is the
Y intercept of the model, and LVL is our IV tested for significance.

Factorial ANOVA
We have seen the basic ANOVA run two different ways. Using the univariate option
ANOVA with several Fixed factors we need to use the GLM method.

A full factorial ANOVA is produced by default when using the Univariate analysis for
the GLM. This model includes all main effects and interactions between effects. The one-
way ANOVA displays only main effects.

Example of when to use a factorial ANOVA:

When there are several different groups to test against one DV.

The following is an example of a factorial ANOVA performed in SPSS.

Taking the previous example, using estimate after three years as the DV and level of
group as the IV, or Fixed Factor, we now add another Fixed Factor.

Output for the factorial ANOVA.

Univariate Analysis of Variance

Between-Subjects Factors
                             Value Label   N
lvl of group          1.00   L             6
                      2.00   M             6
                      3.00   H             6
opinion of results    1.00   high          5
                      2.00   med           6
                      3.00   low           7

This table displays the label and number of cases for each level of the fixed factors.

Tests of Between-Subjects Effects

Dependent Variable: estimate after 3 years
Source            Type III Sum of Squares   df   Mean Square   F         Sig.
Corrected Model   72677.578 a               6    12112.930     11.535    .000
Intercept         352286.559                1    352286.559    335.486   .000
LVL               23081.352                 2    11540.676     10.990    .002
CONTENT           2486.382                  2    1243.191      1.184     .342
LVL * CONTENT     12149.979                 2    6074.989      5.785     .019
Error             11550.867                 11   1050.079
Total             525402.000                18
Corrected Total   84228.444                 17
a. R Squared = .863 (Adjusted R Squared = .788)

Sample syntax:
UNIANOVA
weight3y BY lvl content
/METHOD = SSTYPE(3)
/INTERCEPT = INCLUDE
/CRITERIA = ALPHA(.05)
/DESIGN = lvl content lvl*content .

Output explained:

Our second table now shows both main effects for our two Fixed Factors and an
interaction between them.

There is a main effect for lvl, but not for content. There is also an interaction effect.

Groups do not need to be fully balanced in this model, but it is necessary for all
cells to be filled. If there are missing cells another type of sum of squares can be
used to perform the analysis.

Type of model:

Using the same procedure, open the dialog box for Univariate GLM. The next step in
understanding the proper way to run ANOVAs in SPSS is to understand what is meant by
Type of Sum of Squares. There are 4 different types to select from. Each model will
affect your analysis differently. Knowing your data through descriptive statistics can
help in choosing the correct model.

The correct Type of sum of squares can be chosen under the Model option. The Model
option is also where effect preferences can be modified. The default here is for a
full factorial model. Individual main effects and interactions can be selected with
the custom model. If Custom is selected, only effects placed into the custom design
will be run with the analysis.

At the bottom of the Model dialog box is a pull-down section titled Sum of squares;
the default is Type III.

Type III analysis is appropriate for balanced or unbalanced models as long as there is
no missing data. If there is missing data in some groups, Type IV should be used.
Types I and II are included in Types III and IV, but can be used for analysis under
certain conditions.

One-way repeated measures ANOVA
This ANOVA model is useful when you have repeated measures on some variable type.
Essentially it is an expansion of a pre post test situation.

Example of when to use a one-way repeated measures ANOVA:

Measuring weight 3 times over a two week period and having multiple levels in
experimental groups.

The following is an example of a one-way repeated measures ANOVA performed in
SPSS.

Select Repeated Measures from GLM. Enter the within-subject factor name and the number
of levels (the number of times the variable is repeated over time).
Select between-subjects factors for the ANOVA model.

Sample Syntax:

GLM
weight weight2 weight12 BY lvl
/WSFACTOR = time 3 Polynomial
/METHOD = SSTYPE(3)
/CRITERIA = ALPHA(.05)
/WSDESIGN = time
/DESIGN = lvl .

Output for the one-way repeated measures ANOVA.

General Linear Model

Within-Subjects Factors

Measure: MEASURE_1
TIME   Dependent Variable
1      WEIGHT
2      WEIGHT2
3      WEIGHT12

Levels of the factor Time. Each level was a separate variable.

Between-Subjects Factors
                      Value Label   N
lvl of group   1.00   L             6
               2.00   M             6
               3.00   H             6

Multivariate Tests b

Effect                            Value    F           Hypothesis df   Error df   Sig.
TIME         Pillai's Trace       .814     65.693 a    1.000           15.000     .000
             Wilks' Lambda        .186     65.693 a    1.000           15.000     .000
             Hotelling's Trace    4.380    65.693 a    1.000           15.000     .000
             Roy's Largest Root   4.380    65.693 a    1.000           15.000     .000
TIME * LVL   Pillai's Trace       .965     205.675 a   2.000           15.000     .000
             Wilks' Lambda        .035     205.675 a   2.000           15.000     .000
             Hotelling's Trace    27.423   205.675 a   2.000           15.000     .000
             Roy's Largest Root   27.423   205.675 a   2.000           15.000     .000
a. Exact statistic
b. Design: Intercept+LVL
   Within Subjects Design: TIME

Commonly used tests for multivariate statistics.

Mauchly's Test of Sphericity b

Measure: MEASURE_1
                                                                 Epsilon a
Within Subjects   Mauchly's   Approx.              Greenhouse-   Huynh-   Lower-
Effect            W           Chi-Square   df Sig. Geisser       Feldt    bound
TIME              .000        .            2  .    .500          .571     .500
Tests the null hypothesis that the error covariance matrix of the orthonormalized
transformed dependent variables is proportional to an identity matrix.
a. May be used to adjust the degrees of freedom for the averaged tests of
   significance. Corrected tests are displayed in the Tests of Within-Subjects
   Effects table.
b. Design: Intercept+LVL
   Within Subjects Design: TIME

Tests of Within-Subjects Effects

Measure: MEASURE_1
Source                             Type III SS   df       Mean Square   F         Sig.
TIME          Sphericity Assumed   100.000       2        50.000        65.693    .000
              Greenhouse-Geisser   100.000       1.000    100.000       65.693    .000
              Huynh-Feldt          100.000       1.143    87.500        65.693    .000
              Lower-bound          100.000       1.000    100.000       65.693    .000
TIME * LVL    Sphericity Assumed   626.167       4        156.542       205.675   .000
              Greenhouse-Geisser   626.167       2.000    313.083       205.675   .000
              Huynh-Feldt          626.167       2.286    273.948       205.675   .000
              Lower-bound          626.167       2.000    313.083       205.675   .000
Error(TIME)   Sphericity Assumed   22.833        30       .761
              Greenhouse-Geisser   22.833        15.000   1.522
              Huynh-Feldt          22.833        17.143   1.332
              Lower-bound          22.833        15.000   1.522

Tests of Within-Subjects Contrasts

Measure: MEASURE_1
Source        TIME        Type III SS   df   Mean Square   F         Sig.
TIME          Linear      25.000        1    25.000        65.693    .000
              Quadratic   75.000        1    75.000        65.693    .000
TIME * LVL    Linear      156.542       2    78.271        205.675   .000
              Quadratic   469.625       2    234.813       205.675   .000
Error(TIME)   Linear      5.708         15   .381

Tests of Between-Subjects Effects

Measure: MEASURE_1
Transformed Variable: Average
Source      Type III SS   df   Mean Square   F         Sig.
Intercept   1349004.167   1    1349004.167   269.529   .000
LVL         2697.583      2    1348.792      .269      .767
Error       75075.750     15   5005.050

GLM (multivariate models)
The second option under the general linear model analysis is the multivariate
analysis. Just as in the univariate analysis, several different types of statistics
can be performed. The Multivariate option can perform regressions and ANOVAs with
multiple dependent variables.

MANOVA (Multivariate Analysis of Variance)
Predicting for multiple dependent variables at once using multiple predictors. A
one-way MANOVA would have only one IV and multiple DVs.

Example of when to use a factorial MANOVA:
A MANOVA is used when you have several dependent variables with multiple
classification, or independent variables.
Do your background research on the MANOVA analysis before using it. Make sure MANOVA
is the correct tool to use. It may help protect against Type I error, but many
experimenters will use MANOVA in cases where the experiment calls for multiple
ANOVAs.

The following is an example of a factorial MANOVA performed in SPSS.
Under the general linear model select Multivariate. The Multivariate dialog box will
open. This box will look similar to the Univariate option, except the random effects
box has been removed and the dependent variables box is expanded to allow more than
one DV at a time. A simple MANOVA can be run by adding two dependent variables and one
fixed factor.

Output for the factorial MANOVA:

General Linear Model

Between-Subjects Factors
                      Value Label   N
lvl of group   1.00   L             6
               2.00   M             6
               3.00   H             6

Balanced groups are observed.

Multivariate Tests c

Effect                           Value     F           Hypothesis df   Error df   Sig.
Intercept   Pillai's Trace       .991      775.078 a   2.000           14.000     .000
            Wilks' Lambda        .009      775.078 a   2.000           14.000     .000
            Hotelling's Trace    110.725   775.078 a   2.000           14.000     .000
            Roy's Largest Root   110.725   775.078 a   2.000           14.000     .000
LVL         Pillai's Trace       .935      6.589       4.000           30.000     .001
            Wilks' Lambda        .067      19.993 a    4.000           28.000     .000
            Hotelling's Trace    13.831    44.952      4.000           26.000     .000
            Roy's Largest Root   13.828    103.713 b   2.000           15.000     .000
a. Exact statistic
b. The statistic is an upper bound on F that yields a lower bound on the significance level.
c. Design: Intercept+LVL

The Multivariate Tests table is comprised of four separate multivariate tests that
basically do the same thing with different assumptions. They test the null hypothesis
that there are no differences due to the alternative statistical hypothesis being
tested.
Of the four tests, Wilks' Lambda is the most commonly used, partly due to its
relationship to the likelihood ratio criterion.
The most robust and powerful is Pillai's Trace.
Most of the time all four tests will yield the same result: significance or
non-significance.
Tests of Between-Subjects Effects

Source            Dependent Variable       Type III SS   df   Mean Square   F          Sig.
Corrected Model   estimate after 3 years   60336.778 a   2    30168.389     18.941     .000
                  caloric intake           43463.194 b   2    21731.597     .255       .778
Intercept         estimate after 3 years   441173.556    1    441173.556    276.984    .000
                  caloric intake           88201334.7    1    88201334.72   1034.477   .000
LVL               estimate after 3 years   60336.778     2    30168.389     18.941     .000
                  caloric intake           43463.194     2    21731.597     .255       .778
Error             estimate after 3 years   23891.667     15   1592.778
                  caloric intake           1278927.083   15   85261.806
Total             estimate after 3 years   525402.000    18
                  caloric intake           89523725.0    18
Corrected Total   estimate after 3 years   84228.444     17
                  caloric intake           1322390.278   17
a. R Squared = .716 (Adjusted R Squared = .679)
b. R Squared = .033 (Adjusted R Squared = -.096)

The Tests of Between-Subjects Effects table gives an overall F value for each dependent
variable as the Corrected Model. It also tests the main effects, the intercept, and any
interactions that may be present.
R² is a good estimate of how well the data fit the overall linear model.
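The R Squared footnotes can be recovered from the sums of squares in the table itself: R² is the Corrected Model SS divided by the Corrected Total SS, and the adjustment shrinks it using the error and total degrees of freedom. A quick Python check using the "estimate after 3 years" row above:

```python
# values from the "estimate after 3 years" rows of the table above
ss_model, ss_total = 60336.778, 84228.444
df_error, df_total = 15, 17

r2 = ss_model / ss_total                          # proportion of SS explained
adj_r2 = 1 - (1 - r2) * (df_total / df_error)     # shrinkage adjustment
```

Rounded to three decimals these reproduce the footnote values .716 and .679.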

Sample Syntax:
GLM
weight3y cal BY lvl
/METHOD = SSTYPE(3)
/INTERCEPT = INCLUDE
/CRITERIA = ALPHA(.05)
/DESIGN = lvl .

Correlations

Correlation
The standard bivariate correlation compares x to y. A correlation measures how x and y
relate to one another: how a change in a set of X scores relates to a change in a set of
Y scores, in both direction and magnitude.

Example of when to use a correlation:

Determine if there is a relationship between height and weight.

Example of correlation in SPSS:

Under the Analyze menu select Correlate and then Bivariate. Place the variables to be
correlated in the box and select OK.

Correlations

                                             Engine          Time to
                                          Displacement    Accelerate from
                                          (cu. inches)   0 to 60 mph (sec)   Horsepower
Engine Displacement   Pearson Correlation        1             -.545**          .897**
(cu. inches)          Sig. (2-tailed)             .              .000            .000
                      N                         406               406             400
Time to Accelerate    Pearson Correlation     -.545**              1            -.701**
from 0 to 60 mph      Sig. (2-tailed)          .000                 .            .000
(sec)                 N                         406               406             400
Horsepower            Pearson Correlation      .897**           -.701**            1
                      Sig. (2-tailed)          .000              .000               .
                      N                         400               400             400
**. Correlation is significant at the 0.01 level (2-tailed).

Sample Syntax:

CORRELATIONS
/VARIABLES=engine accel horse
/PRINT=TWOTAIL NOSIG
/MISSING=PAIRWISE .

Output explained:

The diagonal of a correlation matrix, from upper left to lower right, is always equal to
one; these entries are the correlation of each variable with itself. The diagonal also
cuts the matrix in half, leaving a mirror image on either side. In the case of our simple
bivariate correlation we have only one important value in the matrix. Use the
off-diagonal values to search for possible relationships between two or more variables.

Equations for correlations

The regression line: Y' = A + BX

Pearson's r (2-tailed), deviation-score form:

r = Σ(X − X̄)(Y − Ȳ) / √(SSx · SSy)

Raw-score (computational) form:

r = [NΣXY − (ΣX)(ΣY)] / √{[NΣX² − (ΣX)²][NΣY² − (ΣY)²]}

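The deviation-score and raw-score formulas are algebraically equivalent, which is easy to verify in Python on a small made-up data set (the numbers below are hypothetical, not from the handout):

```python
import math

X = [1, 2, 3, 4, 5]
Y = [2, 1, 4, 3, 6]
N = len(X)
mx, my = sum(X) / N, sum(Y) / N

# deviation-score form: r = sum of cross-products / sqrt(SSx * SSy)
sp = sum((x - mx) * (y - my) for x, y in zip(X, Y))
ssx = sum((x - mx) ** 2 for x in X)
ssy = sum((y - my) ** 2 for y in Y)
r_dev = sp / math.sqrt(ssx * ssy)

# raw-score (computational) form
num = N * sum(x * y for x, y in zip(X, Y)) - sum(X) * sum(Y)
den = math.sqrt((N * sum(x * x for x in X) - sum(X) ** 2)
                * (N * sum(y * y for y in Y) - sum(Y) ** 2))
r_raw = num / den
```

Both forms give the same r (about .822 for these numbers); the raw-score form simply avoids computing deviations from the means.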

Partial and Semi-partial Correlations

Partial Correlations
Partial correlations remove the effect of one or more variables from other variables.
These variables are then correlated.

Example of when to use a partial correlation:

This type of correlation is particularly useful when you think there may be an underlying
variable influencing your correlation matrix. Extracting this variable can yield useful
information.
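For a single control variable, the partial correlation can be computed directly from the three zero-order correlations. A sketch of the first-order formula in Python (the input correlations below are made up for illustration):

```python
import math

def partial_corr(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y controlling for z."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# if x and y each correlate strongly with z, much of their apparent
# relationship disappears once z is removed:
reduced = partial_corr(0.60, 0.70, 0.70)   # about .216
```

Note that when the control variable is uncorrelated with x and y, the partial correlation reduces to the ordinary correlation.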

The following is an example of a partial correlation performed in SPSS.

Under the Analyze menu select Correlate and then Partial. Place the variables to be
correlated in the top box, select a variable to control for, and place this variable in
the bottom box. Select OK.

- - -  P A R T I A L   C O R R E L A T I O N   C O E F F I C I E N T S  - - -

Controlling for..  WEIGHT

             ENGINE       ACCEL       HORSE

ENGINE       1.0000      -.4832       .5171
            (    0)     (  397)     (  397)
            P= .        P= .000     P= .000

ACCEL        -.4832      1.0000      -.7419
            (  397)     (    0)     (  397)
            P= .000     P= .        P= .000

HORSE         .5171      -.7419      1.0000
            (  397)     (  397)     (    0)
            P= .000     P= .000     P= .

(Coefficient / (D.F.) / 2-tailed Significance)

" . " is printed if a coefficient cannot be computed

Sample Syntax:
PARTIAL CORR
/VARIABLES= engine accel horse BY weight
/SIGNIFICANCE=TWOTAIL
/MISSING=LISTWISE .

Output explained:

With the effects of the control variable removed, our correlation is .xx and
(non)significant. The interpretation is the same as for an ordinary correlation, except
that smaller partial correlations are more meaningful.

Note: semi-partial correlations underlie the basic regression model.
For a non-parametric correlation use Spearman's rho, which is useful when variables are
not normally distributed and/or are ranks.

Regression
Simple linear regression
In its most basic form a simple bivariate regression is used to determine the strength of
the relationship between a DV and an IV. A regression is very similar to a correlation:
the r value derived from a correlation is used to predict variance, and in a simple
regression the value of r is squared. When two variables are highly correlated there is
likely to be a predictive linear relationship.

Regression is used to predict a score on one variable when the scores on other variables
are known. In a regression model the independent variable is the predictor and the
dependent variable is the predicted variable. Both variables are continuous.

The line of best fit, analogous to a running mean between x and y, is the regression line
used for linear regression. The line is derived from the least squares criterion. The goal of
the least squares criterion is to minimize the sum of squares of discrepancies between y
and y predicted.

The bivariate regression formula:

Y’= A+BX

A is the intercept
B is the slope
X is the predictor variable
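The least squares criterion determines A and B directly: B is the sum of cross-products over the sum of squares of X, and A is chosen so the line passes through the means. A minimal Python sketch with hypothetical data (in the handout's terms, X is the predictor and Y the predicted variable):

```python
X = [60, 62, 65, 70, 74]        # hypothetical predictor scores
Y = [110, 120, 140, 155, 180]   # hypothetical predicted scores
n = len(X)
mx, my = sum(X) / n, sum(Y) / n

# least squares slope and intercept
B = (sum((x - mx) * (y - my) for x, y in zip(X, Y))
     / sum((x - mx) ** 2 for x in X))
A = my - B * mx

predicted = [A + B * x for x in X]
# the least squares line leaves residuals that sum to (essentially) zero
residual_sum = sum(y - p for y, p in zip(Y, predicted))
```

Any other slope or intercept would produce a larger sum of squared discrepancies between Y and predicted Y, which is exactly what "least squares" means.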

Example of when to use a simple linear regression:

Predicting performance in college from SAT scores.
Determining market close from the days opening value.

The following is an example of a simple linear regression performed in SPSS

The regression statistic can be computed by selecting Analyze, Regression, Linear.
Select a variable to place in the Dependent box and one or more variables to place in the
Independent box, then run the linear regression. The Statistics option can help in the
selection of output for the regression.

The Statistics option allows you to select what you want to appear in the output Viewer.

Example of Syntax for regression and options.

REGRESSION
/DESCRIPTIVES MEAN STDDEV CORR SIG N
/MISSING LISTWISE
/STATISTICS COEFF OUTS CI BCOV R ANOVA COLLIN TOL CHANGE ZPP
/CRITERIA=PIN(.05) POUT(.10)
/NOORIGIN
/DEPENDENT height
/METHOD=ENTER weight
/RESIDUALS DURBIN
/SAVE PRED ZPRED RESID ZRESID DFBETA SDBETA
/OUTFILE=COVB('C:\Program Files\SPSS\testsave for regression.sav') .

Output for simple linear regression.

Regression

Descriptive Statistics

             Mean      Std. Deviation    N
HEIGHT      67.6111        4.77911      18
WEIGHT     156.3889       38.69814      18

Correlations

                             HEIGHT    WEIGHT
Pearson Correlation  HEIGHT   1.000      .903
                     WEIGHT    .903     1.000
Sig. (1-tailed)      HEIGHT       .      .000
                     WEIGHT    .000         .
N                    HEIGHT      18        18
                     WEIGHT      18        18

Variables Entered/Removed(b)

        Variables    Variables
Model   Entered      Removed     Method
1       WEIGHT(a)    .           Enter
a. All requested variables entered.
b. Dependent Variable: HEIGHT

Model Summary(b)

                            Adjusted    Std. Error of
Model     R      R Square   R Square    the Estimate
1       .903(a)    .816       .804         2.11429

Change Statistics: R Square Change = .816, F Change = 70.859,
df1 = 1, df2 = 16, Sig. F Change = .000; Durbin-Watson = 1.412
a. Predictors: (Constant), WEIGHT
b. Dependent Variable: HEIGHT

ANOVA(b)

                 Sum of
Model            Squares    df   Mean Square      F       Sig.
1  Regression    316.754     1     316.754      70.859   .000(a)
   Residual       71.523    16       4.470
   Total         388.278    17
a. Predictors: (Constant), WEIGHT
b. Dependent Variable: HEIGHT

Coefficients(a)

                Unstandardized     Standardized
                 Coefficients      Coefficients                     95% Confidence Interval for B
Model            B     Std. Error      Beta          t       Sig.    Lower Bound    Upper Bound
1  (Constant)  50.167     2.131                    23.537    .000      45.648         54.685
   WEIGHT        .112      .013        .903         8.418    .000        .083           .140
Correlations for WEIGHT (Zero-order, Partial, Part): .903, .903, .903;
Collinearity Statistics: Tolerance = 1.000, VIF = 1.000
a. Dependent Variable: HEIGHT

Coefficient Correlations(a)

Model                           WEIGHT
1   Correlations   WEIGHT        1.000
    Covariances    WEIGHT         .000
a. Dependent Variable: HEIGHT

Collinearity Diagnostics(a)

                                 Condition    Variance Proportions
Model   Dimension   Eigenvalue     Index      (Constant)   WEIGHT
1       1              1.972        1.000        .01         .01
        2               .028        8.435        .99         .99
a. Dependent Variable: HEIGHT

Residuals Statistics(a)

                                    Minimum    Maximum      Mean    Std. Deviation    N
Predicted Value                     60.7635    76.9374    67.6111       4.31655      18
Std. Predicted Value                 -1.586      2.161       .000         1.000      18
Standard Error of Predicted Value    .50598    1.21485     .68031        .18937      18
Adjusted Predicted Value            61.2143    78.8781    67.7285       4.49460      18
Residual                            -3.9374     3.0937      .0000       2.05116      18
Std. Residual                        -1.862      1.463       .000          .970      18
Stud. Residual                       -2.275      1.510      -.025         1.064      18
Deleted Residual                    -5.8781     3.2942     -.1174       2.48944      18
Stud. Deleted Residual               -2.679      1.579      -.048         1.130      18
Mahal. Distance                        .029      4.668       .944         1.175      18
Cook's Distance                        .001      1.276       .121          .293      18
Centered Leverage Value                .002       .275       .056          .069      18
a. Dependent Variable: HEIGHT

Output explained:

R is the correlation between x and y.

R² is the proportion of variance in the DV that can be predicted from the IVs used in the
regression model. The R² obtained from one sample can vary from model to model, depending
on which variables are used as predictors and which method is used for selecting and
entering variables.

Adjusted R² is the value of R² adjusted to better estimate the population value; R² tends
to be overestimated in a sample.
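The adjustment is a simple shrinkage formula based on the sample size and the number of predictors. A sketch in Python, using the Model Summary values above (R Square = .816, N = 18, one predictor):

```python
def adjusted_r2(r2, n, k):
    """Shrink a sample R^2 for n cases and k predictors."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

shrunk = adjusted_r2(0.816, 18, 1)   # about .804, as in the Model Summary
```

The shrunk value is always smaller than the raw R², and the gap grows as predictors are added or the sample shrinks.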

The Standard Error of the Estimate is a measure of how much predicted scores would vary
from sample to sample. It is analogous to a standard deviation, but for predicted scores
as opposed to individual scores.

The ANOVA table tests the overall value of R², that is, whether it is significantly
different from zero.

The p value is the significance level at which you are testing R². The value is typically
set to .05.

Coefficients are weights used to adjust values and predict Y. Unstandardized coefficients
are the B weights and are greatly influenced by the scale of measurement of the variable.
Standardized coefficients (Beta) are based on variables modified to have a mean of 0 and
a standard deviation of 1. The B weight is the slope of the line of best fit; it reflects
how much Y will change when there is a one-unit increase in X.
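The unstandardized and standardized weights are linked by the two variables' standard deviations: Beta = B × (SD of X / SD of Y). A quick Python check with the values reported in the output above (B = .112 for WEIGHT, SD of WEIGHT = 38.69814, SD of HEIGHT = 4.77911):

```python
B = 0.112                       # unstandardized weight from the Coefficients table
sd_x, sd_y = 38.69814, 4.77911  # SDs from the Descriptive Statistics table

beta = B * sd_x / sd_y
# beta comes out near the reported .903; the small discrepancy is only
# because B is rounded to three decimals in the printed table
```

This is why B depends on the measurement scales while Beta does not.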

GLM
Simple linear regression (using GLM)

Still using the GLM Univariate model, another type of analysis can be performed: the
linear regression.

First is an example regression using the linear regression method. The second regression
model is generated using the Univariate method in SPSS.

Regression

Variables Entered/Removed(b)

        Variables    Variables
Model   Entered      Removed     Method
1       WEIGHT(a)    .           Enter
a. All requested variables entered.
b. Dependent Variable: HEIGHT

Model Summary

                            Adjusted    Std. Error of
Model     R      R Square   R Square    the Estimate
1       .903(a)    .816       .804         2.11429
a. Predictors: (Constant), WEIGHT

ANOVA(b)

                 Sum of
Model            Squares    df   Mean Square      F       Sig.
1  Regression    316.754     1     316.754      70.859   .000(a)
   Residual       71.523    16       4.470
   Total         388.278    17
a. Predictors: (Constant), WEIGHT
b. Dependent Variable: HEIGHT

Coefficients(a)

                Unstandardized    Standardized
                 Coefficients     Coefficients
Model            B    Std. Error      Beta          t       Sig.
1  (Constant)  50.167    2.131                    23.537    .000
   WEIGHT        .112     .013        .903         8.418    .000
a. Dependent Variable: HEIGHT

Sample syntax:

REGRESSION
/MISSING LISTWISE
/STATISTICS COEFF OUTS R ANOVA
/CRITERIA=PIN(.05) POUT(.10)
/NOORIGIN
/DEPENDENT height
/METHOD=ENTER weight .

The following is a Regression test using the Univariate model under GLM:

Univariate Analysis of Variance

Tests of Between-Subjects Effects

Dependent Variable: HEIGHT

                  Type III Sum
Source             of Squares    df   Mean Square       F       Sig.
Corrected Model     316.754(a)    1      316.754      70.859    .000
Intercept          2476.483       1     2476.483     553.997    .000
WEIGHT              316.754       1      316.754      70.859    .000
Error                71.523      16        4.470
Total             82671.000      18
Corrected Total     388.278      17
a. R Squared = .816 (Adjusted R Squared = .804)

Above, R² and adjusted R² are reported in a footnote to the ANOVA table.

Sample syntax:

UNIANOVA
height WITH weight
/METHOD = SSTYPE(3)
/INTERCEPT = INCLUDE
/CRITERIA = ALPHA(.05)
/DESIGN = weight .

In order to get coefficients under the GLM model the option for parameter estimates must
be selected. The power option has also been added to the table below.
Parameter Estimates

Dependent Variable: HEIGHT

                                                  95% Confidence Interval      Noncent.    Observed
Parameter      B      Std. Error     t      Sig.  Lower Bound   Upper Bound   Parameter    Power(a)
Intercept    50.167      2.131    23.537    .000     45.648        54.685       23.537       1.000
WEIGHT         .112       .013     8.418    .000       .083          .140        8.418       1.000
a. Computed using alpha = .05

Sample syntax: the PRINT option has been added to the base statements to request power
and parameter estimates.

UNIANOVA
height WITH weight
/METHOD = SSTYPE(3)
/INTERCEPT = INCLUDE
/PRINT = OPOWER PARAMETER
/CRITERIA = ALPHA(.05)
/DESIGN = weight .

Comparing the two models we can see that our results are the same for the overall model
and Parameter estimates. The regression model has been duplicated under the GLM
option.

Multiple regression
Multiple regression is the same as a simple regression except that it has multiple
predictors (IVs).

Example of when to use a multiple regression:

Using SAT scores, GPA, and the strength of letters of recommendation to predict
first-year college performance for new students out of high school.

The following is an example of a multiple regression performed in SPSS.

The multiple regression analysis can be obtained the same way as the simple bivariate
regression. A single dependent variable is selected. You can add in multiple independent
variables and run the analysis the same way a linear regression is run with one IV.

Example of Syntax for multiple regression and options.

REGRESSION
/DESCRIPTIVES MEAN STDDEV CORR SIG N
/MISSING LISTWISE
/STATISTICS COEFF OUTS CI BCOV R ANOVA COLLIN TOL CHANGE ZPP
/CRITERIA=PIN(.05) POUT(.10)
/NOORIGIN
/DEPENDENT height
/METHOD=ENTER weight age whindex
/RESIDUALS DURBIN
/SAVE PRED ZPRED RESID ZRESID DFBETA SDBETA
/OUTFILE=COVB('C:\Program Files\SPSS\testsave for regression.sav') .

Output of Multiple regression.

Most of the output will look identical to the simple regression output. The main
difference is that there are multiple coefficients and correlations, one set for each
independent predictor variable used in the analysis.

Regression

Descriptive Statistics

              Mean      Std. Deviation    N
HEIGHT       67.6111        4.77911      18
WEIGHT      156.3889       38.69814      18
AGE          32.3333       10.54960      18
WHINDEX       2.2890         .42953      18

Correlations

                              HEIGHT   WEIGHT     AGE    WHINDEX
Pearson Correlation  HEIGHT    1.000     .903     .047     .840
                     WEIGHT     .903    1.000     .044     .990
                     AGE        .047     .044    1.000     .018
                     WHINDEX    .840     .990     .018    1.000
Sig. (1-tailed)      HEIGHT        .     .000     .426     .000
                     WEIGHT     .000        .     .431     .000
                     AGE        .426     .431        .     .472
                     WHINDEX    .000     .000     .472        .
N                    HEIGHT       18       18       18       18
                     WEIGHT       18       18       18       18
                     AGE          18       18       18       18
                     WHINDEX      18       18       18       18

Variables Entered/Removed(b)

        Variables         Variables
Model   Entered           Removed     Method
1       WHINDEX, AGE,     .           Enter
        WEIGHT(a)
a. All requested variables entered.
b. Dependent Variable: HEIGHT

Model Summary(b)

                            Adjusted    Std. Error of
Model     R      R Square   R Square    the Estimate
1       .988(a)    .976       .971          .81957

Change Statistics: R Square Change = .976, F Change = 188.017,
df1 = 3, df2 = 14, Sig. F Change = .000; Durbin-Watson = 2.674
a. Predictors: (Constant), WHINDEX, AGE, WEIGHT
b. Dependent Variable: HEIGHT

ANOVA(b)

                 Sum of
Model            Squares    df   Mean Square       F        Sig.
1  Regression    378.874     3     126.291      188.017    .000(a)
   Residual        9.404    14        .672
   Total         388.278    17
a. Predictors: (Constant), WHINDEX, AGE, WEIGHT
b. Dependent Variable: HEIGHT

Coefficients(a)

                Unstandardized     Standardized
                 Coefficients      Coefficients                      95% Confidence Interval for B
Model            B      Std. Error      Beta          t       Sig.    Lower Bound    Upper Bound
1  (Constant)  69.806      2.286                    30.532    .000      64.902         74.710
   WEIGHT        .473       .038        3.828       12.467    .000        .391           .554
   AGE          -.031       .019        -.068       -1.606    .131       -.072           .010
   WHINDEX    -32.824      3.414       -2.950       -9.615    .000     -40.146        -25.502
Correlations (Zero-order, Partial, Part): WEIGHT .903, .958, .519;
AGE .047, -.394, -.067; WHINDEX .840, -.932, -.400.
Collinearity Statistics (Tolerance, VIF): WEIGHT .018, 54.507;
AGE .964, 1.037; WHINDEX .018, 54.419.
a. Dependent Variable: HEIGHT
Coefficient Correlations(a)

Model                           WHINDEX      AGE     WEIGHT
1   Correlations   WHINDEX       1.000       .185     -.991
                   AGE            .185      1.000     -.189
                   WEIGHT        -.991      -.189     1.000
    Covariances    WHINDEX      11.654       .012     -.128
                   AGE            .012       .000      .000
                   WEIGHT        -.128       .000      .001
a. Dependent Variable: HEIGHT

Collinearity Diagnostics(a)

                                 Condition         Variance Proportions
Model   Dimension   Eigenvalue     Index      (Constant)   WEIGHT    AGE    WHINDEX
1       1              3.888         1.000       .00         .00      .01     .00
        2               .087         6.687       .00         .00      .70     .00
        3               .024        12.650       .21         .01      .26     .00
        4               .000       104.088       .79         .99      .04    1.00
a. Dependent Variable: HEIGHT

Residuals Statistics(a)

                                    Minimum    Maximum      Mean    Std. Deviation    N
Predicted Value                     60.1977    76.4574    67.6111       4.72088      18
Std. Predicted Value                 -1.570      1.874       .000         1.000      18
Standard Error of Predicted Value    .20615     .68203     .36430        .13239      18
Adjusted Predicted Value            59.3334    77.4876    67.7078       4.97591      18
Residual                            -1.7874      .8784      .0000        .74375      18
Std. Residual                        -2.181      1.072       .000          .907      18
Stud. Residual                       -2.616      1.411      -.047         1.103      18
Deleted Residual                    -2.9020     1.6666     -.0967       1.14049      18
Stud. Deleted Residual               -3.526      1.468      -.136         1.325      18
Mahal. Distance                        .131     10.828      2.833         2.906      18
Cook's Distance                        .001      1.423       .173          .365      18
Centered Leverage Value                .008       .637       .167          .171      18
a. Dependent Variable: HEIGHT

Output explained:

The multiple regression model differs from the bivariate model in its multiple IVs. In
order to get an overall R² value, partial correlations are used.

Optional model-selection methods:

The method by which IVs are entered into the model can change the overall regression
results.

Under linear regression there are five optional methods:

Enter: all of the IVs are entered together in one step.
Stepwise: a combination of the forward and backward methods.
Remove: all of the variables in a block are removed at once.
Backward: all variables are entered, then the variable with the smallest partial
correlation is removed at each step.
Forward: variables are entered one at a time if they satisfy the entry criterion.
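The forward idea can be sketched in a few lines of Python. This is a rough illustration of the strategy, not SPSS's exact entry criterion (SPSS uses an F-probability test; the sketch uses a simple R² improvement threshold on made-up data):

```python
import numpy as np

def r2(cols, y):
    """R^2 from regressing y on the given predictor columns plus an intercept."""
    X = np.column_stack([np.ones(len(y))] + cols)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1 - (resid @ resid) / ((y - y.mean()) @ (y - y.mean()))

def forward_select(candidates, y, threshold=0.01):
    """Enter IVs one at a time, keeping the candidate that most improves
    R^2, until no candidate improves it by more than `threshold`."""
    chosen, remaining = [], dict(candidates)
    while remaining:
        current = r2([candidates[n] for n in chosen], y) if chosen else 0.0
        gains = {name: r2([candidates[n] for n in chosen] + [col], y)
                 for name, col in remaining.items()}
        best = max(gains, key=gains.get)
        if gains[best] - current <= threshold:
            break
        chosen.append(best)
        del remaining[best]
    return chosen

# toy data: y depends almost entirely on x1, so x2 should never enter
rng = np.random.default_rng(0)
x1 = rng.normal(size=50)
x2 = rng.normal(size=50)
y = 2 * x1 + 0.01 * rng.normal(size=50)
```

With these data `forward_select({"x1": x1, "x2": x2}, y)` enters x1 and then stops, since adding x2 no longer improves R² meaningfully.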

Multivariate regression
A multivariate regression has multiple predictors (IV’s) and multiple dependent
variables.

Example of when to use a Multivariate Regression:

Using SAT scores, GPA, and the strength of letters of recommendation to predict
first-year college performance for new students out of high school, and also to predict
final college GPA.

The following is an example of a multivariate regression performed in SPSS.

The multivariate regression can be run through the general linear model (GLM) analysis.
From the Analyze menu select General Linear Model, then Multivariate.

The Multivariate dialog box will open.

Place your variables to be predicted (dependent) in the Box entitled Dependent Variables.
The predictor or Independent variables go in the Covariate(s) box.

Select OK to run the multivariate regression.

Example of syntax for the multivariate regression and options.
GLM
cal timerun WITH height weight age
/METHOD = SSTYPE(3)
/INTERCEPT = INCLUDE
/CRITERIA = ALPHA(.05)
/DESIGN = height weight age .

Output for the multivariate regression.

General Linear Model

Multivariate Tests(b)

Effect                          Value         F        Hypothesis df   Error df   Sig.
Intercept  Pillai's Trace        .187       1.493(a)       2.000        13.000    .261
           Wilks' Lambda         .813       1.493(a)       2.000        13.000    .261
           Hotelling's Trace     .230       1.493(a)       2.000        13.000    .261
           Roy's Largest Root    .230       1.493(a)       2.000        13.000    .261
HEIGHT     Pillai's Trace        .079        .557(a)       2.000        13.000    .586
           Wilks' Lambda         .921        .557(a)       2.000        13.000    .586
           Hotelling's Trace     .086        .557(a)       2.000        13.000    .586
           Roy's Largest Root    .086        .557(a)       2.000        13.000    .586
WEIGHT     Pillai's Trace        .693      14.659(a)       2.000        13.000    .000
           Wilks' Lambda         .307      14.659(a)       2.000        13.000    .000
           Hotelling's Trace    2.255      14.659(a)       2.000        13.000    .000
           Roy's Largest Root   2.255      14.659(a)       2.000        13.000    .000
AGE        Pillai's Trace        .706      15.629(a)       2.000        13.000    .000
           Wilks' Lambda         .294      15.629(a)       2.000        13.000    .000
           Hotelling's Trace    2.405      15.629(a)       2.000        13.000    .000
           Roy's Largest Root   2.405      15.629(a)       2.000        13.000    .000
a. Exact statistic
b. Design: Intercept+HEIGHT+WEIGHT+AGE

Tests of Between-Subjects Effects

                                       Type III Sum
Source           Dependent Variable     of Squares      df   Mean Square      F       Sig.
Corrected Model  caloric intake        1223180.721(a)    3    407726.907    57.537    .000
                 Running one mile           42.811(b)    3        14.270     4.867    .016
Intercept        caloric intake          19535.918       1     19535.918     2.757    .119
                 Running one mile            6.443       1         6.443     2.197    .160
HEIGHT           caloric intake             83.174       1        83.174      .012    .915
                 Running one mile            2.111       1         2.111      .720    .410
WEIGHT           caloric intake         167127.148       1    167127.148    23.584    .000
                 Running one mile             .310       1          .310      .106    .750
AGE              caloric intake         236517.123       1    236517.123    33.376    .000
                 Running one mile           38.790       1        38.790    13.229    .003
Error            caloric intake          99209.557      14      7086.397
                 Running one mile           41.051      14         2.932
Total            caloric intake        89523725.0       18
                 Running one mile         1756.210      18
Corrected Total  caloric intake        1322390.278      17
                 Running one mile           83.863      17
a. R Squared = .925 (Adjusted R Squared = .909)
b. R Squared = .510 (Adjusted R Squared = .406)

Output explained:

Other concerns for regression:

Orthogonal (uncorrelated) predictors are better than correlated predictors. The closer
the correlation between two predictors is to 1.0, the harder it is to separate out their
effects, because each one's contribution depends on the other IVs. Correlated predictors
can produce a significant R even when none of the predictors is significant individually;
recoding the variables may help.

The more variables used to predict, the more likely you are to find significance. If
there are too many variables you will find significance even when it does not exist, and
you will overfit your model.

Don't try multiple models and simply choose the best one.

Models should remain current. Don't try to predict current market trends with a
30-year-old model; the underlying factors can change over time.

The larger the number of unrelated predictors, the larger R² will be.

A regression model can also be used to predict: use the weights to form the model and
predict a new outcome from the model.
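Prediction from a fitted model is just the weighted sum Y' = A + ΣBᵢXᵢ. A Python sketch using the B weights from the multiple regression output earlier (constant 69.806; WEIGHT .473; AGE −.031; WHINDEX −32.824); the new case's predictor values below are made up:

```python
# B weights taken from the Coefficients table of the multiple regression
coef = {"constant": 69.806, "weight": 0.473, "age": -0.031, "whindex": -32.824}

def predict_height(weight, age, whindex):
    """Predicted height = constant + sum of B weights times predictor scores."""
    return (coef["constant"] + coef["weight"] * weight
            + coef["age"] * age + coef["whindex"] * whindex)

# a hypothetical new case
new_height = predict_height(weight=150, age=30, whindex=2.3)
```

This mirrors what SPSS does when it saves predicted values with /SAVE PRED.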

Comparing coefficients: if coefficients are measured on the same scale they can be
compared directly to judge their contribution to the model. If they are on different
scales, each coefficient's weight is affected by the size of its scale (e.g., measured in
seconds versus hours). Standardized coefficients use the standard deviation as the unit
of measure; as standardized scores they can be used to compare predictors when drawn from
a well-defined population.

Post hoc tests…

Assumptions …

Regression coefficients
Regression model using coefficients and the compute option:

Run a simple regression model and use the coefficient(s) from the model.
Open the Compute transformation dialog.
Transform the result into a new variable called xreg.

Example using cars data set from SPSS.

Run a multiple regression using the cars data. The dependent variable is acceleration;
the predictors (IVs) are miles per gallon, engine displacement, horsepower, and vehicle
weight.

Use the unstandardized weights (coefficients); the B weights are used in the model.

After the new variable is computed, use a correlation to view the relationship of accel
to the newly predicted accel.

The correlation in the output is the R from your regression model, and the correlation
value squared is R². This is the method by which regression coefficients generate
predicted scores, and it shows how the overall fit of the model can be judged.
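The check described above can be sketched in Python: correlate the predicted scores (computed from the B weights) with the observed DV, and that correlation is the model's R. The data below are made up for illustration:

```python
import math

def corr(a, b):
    """Pearson correlation of two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sp = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    return sp / math.sqrt(sum((x - ma) ** 2 for x in a)
                          * sum((y - mb) ** 2 for y in b))

# observed DV scores and the predicted scores a model might produce
observed  = [14.0, 16.5, 18.0, 19.5, 21.0, 24.5]
predicted = [14.8, 15.9, 18.2, 19.1, 21.6, 23.9]

R = corr(predicted, observed)   # the model's multiple R
r_square = R ** 2               # the model's R^2
```

The better the model, the closer the predicted-versus-observed correlation is to 1.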

ANCOVA (Analysis of Covariance)
ANCOVA allows a continuous covariate to be analyzed along with categorical predictors.
The analysis is a combination of the ANOVA test and the regression test.
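The "ANOVA + regression" combination can be made concrete: an ANCOVA model is a regression of the DV on the continuous covariate plus dummy codes for the categorical factor, with the last group as the reference category (as in SPSS's parameter estimates). A Python sketch on made-up data, constructed so the true model is y = 6 + 2·covariate − 6·(group 1 dummy) − 3·(group 2 dummy):

```python
import numpy as np

covariate = np.array([10., 12., 14., 11., 13., 15., 10., 12., 16.])
group     = np.array([1, 1, 1, 2, 2, 2, 3, 3, 3])
y         = np.array([20., 24., 28., 25., 29., 33., 26., 30., 38.])

# design matrix: intercept, covariate, dummies for groups 1 and 2
# (group 3 is the reference category and gets no column)
X = np.column_stack([np.ones(len(y)), covariate,
                     (group == 1).astype(float),
                     (group == 2).astype(float)])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
intercept, b_cov, b_g1, b_g2 = beta
```

The fitted dummy coefficients are each group's adjusted difference from the reference group, which is exactly how the redundant last-level parameter shows up as zero in SPSS's parameter estimates table.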

Example of when to use a one-way ANCOVA:

You could use level of group (categorical) and weight (a continuous covariate) to predict
weight after 3 years.

The following is an example of a one-way ANCOVA performed in SPSS.

Output for the one-way ANCOVA.

Univariate Analysis of Variance
Between-Subjects Factors

               Value   Label   N
lvl of group    1.00     L     6
                2.00     M     6
                3.00     H     6

Tests of Between-Subjects Effects

Dependent Variable: estimate after 3 years

                  Type III Sum                                          Noncent.    Observed
Source             of Squares    df   Mean Square       F       Sig.   Parameter    Power(a)
Corrected Model    84005.602(b)   3     28001.867   1759.209    .000    5277.628      1.000
Intercept             23.635      1        23.635      1.485    .243       1.485       .206
WEIGHT             23668.824      1     23668.824   1486.987    .000    1486.987      1.000
LVL                54407.818      2     27203.909   1709.078    .000    3418.156      1.000
Error                222.842     14        15.917
Total             525402.000     18
Corrected Total    84228.444     17
a. Computed using alpha = .05
b. R Squared = .997 (Adjusted R Squared = .997)

Parameter Estimates

Dependent Variable: estimate after 3 years

                                                      95% Confidence Interval     Noncent.    Observed
Parameter          B      Std. Error       t    Sig.  Lower Bound   Upper Bound   Parameter   Power(a)
Intercept        72.471      4.290     16.894   .000     63.270        81.671       16.894      1.000
WEIGHT             .970       .025     38.561   .000       .916         1.023       38.561      1.000
[LVL=1.00]     -135.041      2.310    -58.464   .000   -139.995      -130.087       58.464      1.000
[LVL=2.00]      -67.591      2.304    -29.333   .000    -72.533       -62.648       29.333      1.000
[LVL=3.00]         0(b)         .          .       .          .             .            .          .

a. Computed using alpha = .05
b. This parameter is set to zero because it is redundant.

Sample syntax:

UNIANOVA
weight3y BY lvl WITH weight
/METHOD = SSTYPE(3)
/INTERCEPT = INCLUDE
/PRINT = OPOWER PARAMETER
/CRITERIA = ALPHA(.05)
/DESIGN = weight lvl .
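To illustrate how the parameter estimates in the table above combine into a prediction, here is a small sketch (the dummy-coding logic follows the [LVL=...] rows, with H as the redundant reference category; the example weight of 200 is made up):

```python
# Sketch of how the ANCOVA parameter estimates combine into a predicted
# score: intercept + B(WEIGHT) * weight + the offset for the case's group.
# The reference group (H) has an offset fixed at zero.
B_INTERCEPT = 72.471
B_WEIGHT = 0.970
GROUP_OFFSET = {"L": -135.041, "M": -67.591, "H": 0.0}

def predict(weight, group):
    """Predicted 'estimate after 3 years' for one case."""
    return B_INTERCEPT + B_WEIGHT * weight + GROUP_OFFSET[group]

# Holding the covariate constant, two groups differ only by their offsets.
difference = predict(200, "H") - predict(200, "L")   # 135.041
```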

MANCOVA (Multivariate Analysis of
Covariance)
Example of when to use a MANCOVA:
Similar to ANCOVA except with multiple dependent variables.
The following is an example of a MANCOVA performed in SPSS.

Simple logistic regression
Similar to the simple regression model except that the dependent variable is discrete. The binary dependent variable is assumed to be coded 0 or 1, and the regression model predicts which group a case is more likely to fall under.

Example of when to use a simple logistic regression:

Linear regression models are not appropriate for binary or discrete dependent variables.
When you are trying to find group membership between two groups logistic regression is
the appropriate statistic.

Variables coded 0 and 1 should be exhaustive and mutually exclusive. Your sample should be somewhat larger than for the usual linear regression model.
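A minimal sketch of the prediction step (the intercept and slope here are hypothetical, chosen only to illustrate the logistic function and the .5 classification cutoff):

```python
import math

# How a fitted simple logistic regression turns a predictor value into a
# predicted group. b0 and b1 are made-up coefficients, not SPSS output.
b0, b1 = -4.0, 0.08

def predicted_probability(x):
    """P(group = 1) from the logistic function."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))

def predicted_group(x, cutoff=0.5):
    """Classify a case as 0 or 1; SPSS's classification table also
    defaults to a .5 cutoff."""
    return 1 if predicted_probability(x) >= cutoff else 0
```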

Output explained:

Omnibus Tests of Model Coefficients
Model Summary
Classification Table
Variables in the Equation

Multiple logistic regression
Similar to the simple logistic regression except with more than one predictor.

Multinomial logistic regression
Example of when to use a multinomial logistic regression:

More than one continuous predictor and more than two levels for the dependent variable.

The following is an example of a multinomial logistic regression performed in SPSS.

Select Multinomial Logistic from the Regression options.
Dependent: the DV (lvl)
Factors: categorical predictors
Covariates: continuous predictors (age, height, weight)

Sample Syntax:
NOMREG
lvl WITH age height weight

/CRITERIA CIN(95) DELTA(0) MXITER(100) MXSTEP(5) CHKSEP(20)
LCONVERGE(0)
PCONVERGE(0.000001) SINGULAR(0.00000001)
/MODEL
/INTERCEPT INCLUDE
/PRINT PARAMETER SUMMARY LRT .

Output for multinomial logistic regression:

Nominal Regression
Case Processing Summary

                          N     Marginal Percentage
lvl of group   L           6         33.3%
               M           6         33.3%
               H           6         33.3%
Valid                     18        100.0%
Missing                    0
Total                     18
Subpopulation             18(a)

a. The dependent variable has only one value observed in 18 (100.0%) subpopulations.

Model Fitting Information

Model             -2 Log Likelihood   Chi-Square   df   Sig.
Intercept Only         39.550
Final                  32.722             6.828     6   .337

Pseudo R-Square

Cox and Snell    .316
Nagelkerke       .355
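The two Pseudo R-Square values can be reproduced from the -2 log likelihoods in the Model Fitting Information table:

```python
import math

# Where the Pseudo R-Square values come from, using the -2 log likelihoods
# reported above (intercept-only: 39.550, final: 32.722, n = 18 cases).
neg2ll_null, neg2ll_final, n = 39.550, 32.722, 18

# Cox and Snell: 1 - exp(-chi_square / n), with chi_square the drop in -2LL.
chi_square = neg2ll_null - neg2ll_final
cox_snell = 1.0 - math.exp(-chi_square / n)

# Nagelkerke rescales Cox and Snell by its maximum attainable value.
max_cox_snell = 1.0 - math.exp(-neg2ll_null / n)
nagelkerke = cox_snell / max_cox_snell

# round(cox_snell, 3) -> 0.316 and round(nagelkerke, 3) -> 0.355,
# matching the table.
```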

Likelihood Ratio Tests

             -2 Log Likelihood
Effect       of Reduced Model    Chi-Square   df   Sig.
Intercept         33.452              .731     2   .694
AGE               39.282             6.561     2   .038
HEIGHT            33.150              .428     2   .807
WEIGHT            33.311              .589     2   .745

The chi-square statistic is the difference in -2 log-likelihoods between the final model and a reduced model. The reduced model is formed by omitting an effect from the final model. The null hypothesis is that all parameters of that effect are 0.

Parameter Estimates

                                                                      95% Confidence Interval for Exp(B)
lvl of group(a)         B      Std. Error   Wald   df   Sig.  Exp(B)  Lower Bound    Upper Bound
L    Intercept       -20.042     25.762     .605    1   .437
     AGE                .213       .119    3.213    1   .073   1.237      .980           1.560
     HEIGHT             .294       .491     .358    1   .550   1.342      .512           3.515
     WEIGHT            -.043       .066     .420    1   .517    .958      .842           1.090
M    Intercept        -6.516     17.924     .132    1   .716
     AGE                .100       .102     .951    1   .329   1.105      .904           1.351
     HEIGHT             .056       .321     .030    1   .862   1.057      .564           1.982
     WEIGHT            -.001       .033     .000    1   .986    .999      .936           1.067

a. The reference category is: H.

Factor Analysis
A factor analysis is the better choice of overall analysis when looking to examine the underlying factors of a data set. Data sets are typically made up of observed or measured variables; in factor analysis we can also call these manifest variables. The underlying or latent variables in factor analysis are called factors. The theory behind factor analysis is that a small number of factors lies behind a large number of manifest variables.

One of the methods to obtain scores for a factor analysis is to perform a principal components analysis and rotate the components. A very common method of rotation is called varimax.
The following is an example of Factor Analysis:

Select Data Reduction, Factor from the Analyze menu.
Place the variables for reduction in the Variables box.
The extraction method is set to principal components by default; select the number of factors to extract. If you do not select a rotation, only a principal components analysis will be run.
Select Rotation and use the varimax method.

24 variables from the World 95 data set.

FACTOR
/VARIABLES fertilty lg_aidsr aids_rt birth_rt urban density lifeexpf
lifeexpm literacy pop_incr babymort gdp_cap region calories aids
death_rt
log_gdp b_to_d lit_male lit_fema climate log_pop cropgrow populatn
/MISSING
LISTWISE /ANALYSIS fertilty lg_aidsr aids_rt birth_rt urban density
lifeexpf
lifeexpm literacy pop_incr babymort gdp_cap region calories aids
death_rt
log_gdp b_to_d lit_male lit_fema climate log_pop cropgrow populatn
/PRINT INITIAL EXTRACTION ROTATION
/CRITERIA MINEIGEN(1) ITERATE(25)
/EXTRACTION PC
/CRITERIA ITERATE(25)
/ROTATION VARIMAX
/METHOD=CORRELATION .

Example of when to use a factor analysis:

When you have multiple variables and theory behind them.

You must use syntax to run a factor analysis from a correlation matrix.

Principal Components
Principal components analysis generates linearly combined scores that are orthogonal to one another.

Example of when to use principal components:

Principal components can be used as an exploratory method or as a variable-reduction technique that can be very useful in regression models.

The following is an example of principal components in SPSS:

Run a factor analysis without rotation.
SPSS Cars data set

Total Variance Explained

                 Initial Eigenvalues                   Extraction Sums of Squared Loadings
Component   Total   % of Variance   Cumulative %   Total   % of Variance   Cumulative %
1           6.166       68.514         68.514      6.166       68.514         68.514
2            .923       10.252         78.766       .923       10.252         78.766
3            .856        9.511         88.277       .856        9.511         88.277
4            .515        5.727         94.004       .515        5.727         94.004
5            .256        2.842         96.846       .256        2.842         96.846
6            .114        1.267         98.113       .114        1.267         98.113
7            .090        1.002         99.115       .090        1.002         99.115
8            .051         .561         99.676       .051         .561         99.676
9            .029         .324        100.000
Extraction Method: Principal Component Analysis.
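The percentage columns can be reproduced from the eigenvalues alone (taken, already rounded, from the table above, so the results agree with the table to about two decimals):

```python
# How the "% of Variance" and "Cumulative %" columns are derived. In a PCA
# of a correlation matrix each standardized variable contributes one unit
# of variance, so the 9 eigenvalues sum to 9 and each component's share
# is eigenvalue / 9 * 100.
eigenvalues = [6.166, .923, .856, .515, .256, .114, .090, .051, .029]

total = sum(eigenvalues)                      # 9.000, the number of variables
pct = [100 * e / total for e in eigenvalues]  # % of variance per component

cumulative, running = [], 0.0
for p in pct:
    running += p
    cumulative.append(running)                # ends at 100.0
```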

Component Matrix(a)

                                           Component
                                      1      2      3      4      5      6      7      8
Miles per Gallon                   -.880   .081   .318  -.041  -.266   .164   .131  -.030
Engine Displacement (cu. inches)    .971   .115   .032   .071  -.003   .032   .139   .022
Horsepower                          .938  -.085   .187   .012   .113   .216  -.057   .116
Vehicle Weight (lbs.)               .932   .197  -.042   .177   .150   .083   .031  -.164
Time to Accelerate from 0
to 60 mph (sec)                    -.637   .333  -.525   .442  -.062   .072  -.019   .049
Model Year (modulo 100)            -.477   .725   .472   .011   .135  -.050  -.048   .023
Country of Origin                  -.622  -.453   .405   .476   .127  -.036   .021  -.003
Number of Cylinders                 .947   .121   .056   .147  -.096  -.151   .137   .069
cylrec = 1 | cylrec = 2 (FILTER)   -.884  -.020  -.228  -.184   .320   .025   .166   .032

Extraction Method: Principal Component Analysis.

a. 8 components extracted.

Sample Syntax:
FACTOR
/VARIABLES mpg engine horse weight accel year origin cylinder
filter_$
/MISSING LISTWISE /ANALYSIS mpg engine horse weight accel year origin
cylinder filter_$
/PRINT INITIAL EXTRACTION
/PLOT EIGEN ROTATION
/CRITERIA MINEIGEN(0) ITERATE(25)
/EXTRACTION PC
/ROTATION NOROTATE
/METHOD=CORRELATION .

Output explained:

Interpretation of components is very difficult, even when you have some theory as the
basis of analysis. It is better to do a factor analysis, which rotates the components and
makes them more interpretable. The components are still very useful as variables for a
regression model.

Non-parametric statistics and where to find them in SPSS.

Here are some of the basic nonparametric tests and how they compare to parametric statistics. All of these tests can be found under the Nonparametric Tests section in SPSS.

   Chi Square
o Goodness of fit
o Observed vs. expected group membership
o Replace: one sample t
o SPSS – Chi square

   Mann-Whitney U
o U statistic
o 2 groups medians
o Replace: t 2 groups
o SPSS - Two independent-Samples test

   Wilcoxon signed-rank test
o binomial probability
o 2 Repeated measurements
o Replace: repeated t
o SPSS – Two-Related-Samples tests

   Kruskal-Wallis test
o 2 or more group rank sums
o Replace: 1 factor independent groups ANOVA
o SPSS – K independent samples (Test for several independent samples)

   Friedman test
o 2 or more repeated
o Replace: Repeated ANOVA, more than just pre post test, 3 or more
o SPSS – K related samples (test for several related samples)

   Phi or Spearman correlation
o Correlation of ranks
o Replace: Correlation
o SPSS – Bivariate Correlation, select Spearman.
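As an illustration of the first test above, the chi-square goodness-of-fit statistic is simple to compute by hand (the counts below are made up):

```python
# The chi-square goodness-of-fit statistic that the SPSS Chi-Square
# procedure reports: observed group counts are compared with the counts
# expected under the null hypothesis (equal counts by default).
def chi_square(observed, expected):
    """Sum of (O - E)^2 / E over the groups."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

observed = [10, 20, 30]   # hypothetical observed group membership
expected = [20, 20, 20]   # equal expected counts for 60 cases in 3 groups
statistic = chi_square(observed, expected)   # 10.0 for these counts
```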


```