# Principal Component Analysis by benbenzhou

```									   SW388R7
Data Analysis &
Computers II     Principal Component Analysis: Additional Topics
Slide 1

Split Sample Validation

Detecting Outliers

Reliability of Summated Scales

Sample Problems

Split Sample Validation
Slide 2

   To test the generalizability of findings from a
principal component analysis, we could conduct a
second research study to see if our findings are
verified.
   A less costly alternative is to split the sample
randomly into two halves, do the principal
component analysis on each half, and compare the
results.
   If the communalities and the factor loadings are the
same in the analysis of each half and of the full data
set, we have evidence that the findings are
generalizable and valid because, in effect, the two
analyses represent a study and a replication.

Misleading Results to Watch Out For
Slide 3

   When we examine the communalities and factor
loadings, we are matching up overall patterns, not
exact results: the communalities should all be
greater than 0.50 and the pattern of the factor
loadings should be the same.
   Sometimes the variables will switch their
components (variables loading on the first
component now load on the second and vice versa),
but this does not invalidate our findings.
   Sometimes all of the signs of the factor loadings will
reverse themselves (the pluses become minuses and
the minuses become pluses), but this does not
invalidate our findings because we interpret the size,
not the sign, of the loadings.

When validation fails
Slide 4

   If the validation fails, we are warned that the
solution found in the analysis of the full data set is
not generalizable and should not be reported as valid
findings.

   We do have some options when validation fails:
   If the problem is limited to one or two variables, we can remove
those variables and redo the analysis.
   Randomly selected samples are not always representative. We
might try some different random number seeds and see if our
negative finding was a fluke. If we choose this option, we should
do a large number of validations to establish a clear pattern, at
least 5 to 10. Getting one or two validations to negate the failed
validation and support our findings is not sufficient.

Outliers
Slide 5

   SPSS calculates factor scores as standard scores.
   SPSS suggests that one way to identify outliers is to
compute the factor scores and identify cases that have a
score larger than 3.0 in absolute value (beyond ±3.0) as outliers.
   If we find outliers in our analysis, we redo the
analysis, omitting the cases that were outliers.
   If there is no change in communality or factor
structure in the solution, it implies that the
outliers do not have an impact. If our factor solution
changes, we will have to study the outlier cases to
determine whether or not we should exclude them.
   After testing outliers, restore the full data set before
any further calculations.
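
The outlier check described above can also be run from SPSS command syntax.
The following is a minimal sketch, not taken from the original slides. It uses
the five variables of the final solution in the sample problem; padeg and madeg
are assumed names for the father's and mother's degree variables, and FAC1_1
and FAC2_1 are the names SPSS gives saved factor scores the first time they
are saved in a session (the suffix changes if scores are saved again).

* Save regression factor scores for the extracted components.
FACTOR
  /VARIABLES degree padeg madeg happy hapmar
  /MISSING LISTWISE
  /PRINT INITIAL EXTRACTION ROTATION
  /CRITERIA MINEIGEN(1) ITERATE(25)
  /EXTRACTION PC
  /ROTATION VARIMAX
  /SAVE REG(ALL).
* Flag cases whose score on either component is beyond plus or minus 3.
COMPUTE outlier = (ABS(FAC1_1) > 3 OR ABS(FAC2_1) > 3).
EXECUTE.
FREQUENCIES VARIABLES=outlier.

Cases flagged with outlier = 1 are the ones to omit when the analysis is
redone for comparison.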

Reliability of Summated Scales
Slide 6

   One of the common uses of factor analysis is the
formation of summated scales, where we add the
scores on the variables loading on a component to
create the score for the component.

   To verify that the variables for a component are
measuring similar entities that are legitimate to add
together, we compute Cronbach's alpha.

   If Cronbach's alpha is 0.70 or greater (0.60 or
greater for exploratory research), we have support
for the internal consistency of the items, justifying
their use in a summated scale.
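
Cronbach's alpha can be requested with the SPSS RELIABILITY command. The
following is a minimal sketch, not taken from the original slides, for the
three degree variables that form the first component in the sample problem;
padeg and madeg are assumed names for the father's and mother's degree
variables and should be checked against the data set.

* Cronbach's alpha for the items that would be summed into one scale.
RELIABILITY
  /VARIABLES=degree padeg madeg
  /SCALE('ALL VARIABLES') ALL
  /MODEL=ALPHA.

The alpha reported in the Reliability Statistics table is the value compared
with the 0.70 (or 0.60) guideline.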

Problem 1
Slide 7

In the dataset GSS2000.sav, is the following statement true, false, or an incorrect application
of a statistic? Assume that there is no problematic pattern of missing data. Use a level of
significance of 0.05. Validate the results of your principal component analysis by splitting
the sample in two, using 519447 as the random number seed.

Based on the results of a principal component analysis of the 8 variables "highest academic
degree" [degree], "father's highest academic degree", "mother's highest academic degree",
"spouse's highest academic degree" [spdeg], "general happiness" [happy],
"happiness of marriage" [hapmar], "condition of health" [health], and "attitude toward life"
[life], the information in these variables can be represented with 2 components and 3
individual variables. Cases that might be considered to be outliers do not have an impact on
the factor solution. The internal consistency of the variables included in the components is
sufficient to support the creation of a summated scale.

Component 1 includes the variables "highest academic degree" [degree], "father's highest
academic degree", and "mother's highest academic degree". Component 2
includes the variables "general happiness" [happy] and "happiness of marriage" [hapmar]. The
variables "attitude toward life" [life], "condition of health" [health], and "spouse's highest
academic degree" [spdeg] were not included on the components and are retained as individual
variables.

1.   True
2.   True with caution
3.   False
4.   Inappropriate application of a statistic

The bold text indicates the parts of the problem that have been added this week.

Computing a principal component analysis
Slide 8

To compute a principal
component analysis in SPSS,
select the Data Reduction |
Factor… command from the
Analyze menu.

Add the variables to the analysis
Slide 9

First, move the
variables listed in
the problem to the
Variables list box.

Second, click on the
Descriptives… button to
specify statistics to
include in the output.

Complete the descriptives dialog box
Slide 10

First, mark the Univariate descriptives checkbox to get a
tally of valid cases.

Second, keep the Initial solution checkbox to get
the statistics needed to determine the number
of factors to extract.

Third, mark the Coefficients checkbox to get
a correlation matrix, one of the outputs needed to
assess the appropriateness of factor analysis for the
variables.

Fourth, mark the KMO and Bartlett's test
of sphericity checkbox to get more outputs
used to assess the appropriateness of
factor analysis for the variables.

Fifth, mark the Anti-image checkbox to get more
outputs used to assess the appropriateness of factor
analysis for the variables.

Sixth, click on the Continue button.

Select the extraction method
Slide 11

First, click on the
Extraction… button to
specify statistics to
include in the output.

The extraction method refers
to the mathematical method
that SPSS uses to compute the
factors or components.

Complete the extraction dialog box
Slide 12

First, retain the default
method Principal components.

Second, click on the
Continue button.

Select the rotation method
Slide 13

First, click on the
Rotation… button to
specify statistics to
include in the output.

The rotation method refers to
the mathematical method that
SPSS uses to rotate the axes in
geometric space. This makes
it easier to determine which
variables are loaded on which
components.

Complete the rotation dialog box
Slide 14

First, mark the
Varimax method
as the type of
rotation to be used
in the analysis.

Second, click on the
Continue button.

Complete the request for the analysis
Slide 15

First, click on the
OK button to
request the output.
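
The dialog selections on Slides 8 to 15 can also be expressed as SPSS command
syntax. The following is a minimal sketch, not taken from the original slides.
The names degree, spdeg, happy, hapmar, health, and life come from the problem
statement; padeg and madeg are assumed names for the father's and mother's
degree variables and should be checked against the data set.

* Principal component analysis with varimax rotation for the 8 variables.
FACTOR
  /VARIABLES degree padeg madeg spdeg happy hapmar health life
  /MISSING LISTWISE
  /PRINT UNIVARIATE INITIAL CORRELATION KMO AIC EXTRACTION ROTATION
  /CRITERIA MINEIGEN(1) ITERATE(25)
  /EXTRACTION PC
  /ROTATION VARIMAX.

The PRINT keywords mirror the Descriptives checkboxes (UNIVARIATE, INITIAL,
CORRELATION, KMO, and AIC for the anti-image matrices), and MINEIGEN(1) is the
latent root criterion that SPSS applies by default.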

Level of measurement requirement
Slide 16

The variables "highest academic degree" [degree], "father's highest
academic degree", "mother's highest academic degree",
"spouse's highest academic degree" [spdeg], "general happiness"
[happy], "happiness of marriage" [hapmar], "condition of health"
[health], and "attitude toward life" [life] are ordinal level
variables. If we follow the convention of treating ordinal level
variables as metric variables, the level of measurement
requirement for principal component analysis is satisfied. Since
some data analysts do not agree with this convention, a note of
caution should be included in our interpretation.

Sample size requirement: minimum number of cases
Slide 17

Descriptive Statistics

                            Mean   Std. Deviation   Analysis N
RS HIGHEST DEGREE           1.68        1.085           68
FATHERS HIGHEST DEGREE       .96         .984           68
MOTHERS HIGHEST DEGREE       .85         .797           68
SPOUSES HIGHEST DEGREE      1.97        1.233           68
GENERAL HAPPINESS           1.65         .617           68
HAPPINESS OF MARRIAGE       1.47         .532           68
CONDITION OF HEALTH         1.76         .848           68
IS LIFE EXCITING OR DULL    1.53         .532           68

The number of valid cases for this
set of variables is 68.

While principal component analysis
can be conducted on a sample that
has fewer than 100 cases but more
than 50 cases, we should be
cautious about its interpretation.

Sample size requirement: ratio of cases to variables
Slide 18

Descriptive Statistics

                            Mean   Std. Deviation   Analysis N
RS HIGHEST DEGREE           1.68        1.085           68
FATHERS HIGHEST DEGREE       .96         .984           68
MOTHERS HIGHEST DEGREE       .85         .797           68
SPOUSES HIGHEST DEGREE      1.97        1.233           68
GENERAL HAPPINESS           1.65         .617           68
HAPPINESS OF MARRIAGE       1.47         .532           68
CONDITION OF HEALTH         1.76         .848           68
IS LIFE EXCITING OR DULL    1.53         .532           68

The ratio of cases to
variables in a principal
component analysis should
be at least 5 to 1.

With 68 cases and 8 variables,
the ratio of cases to
variables is 8.5 to 1, which
exceeds the requirement
for the ratio of cases to
variables.

Appropriateness of factor analysis: Presence of substantial correlations
Slide 19

Principal components analysis requires that there be
some correlations greater than 0.30 between the
variables included in the analysis.

For this set of variables, there are 7 correlations in
the matrix greater than 0.30 in absolute value, satisfying this
requirement. They are .490, .410, .595, .677, .319, -.392, and .514.

Correlation Matrix (columns follow the numbered row order)

                                  (1)     (2)     (3)     (4)     (5)     (6)     (7)     (8)
(1) RS HIGHEST DEGREE           1.000    .490    .410    .595   -.017   -.172   -.246   -.138
(2) FATHERS HIGHEST DEGREE       .490   1.000    .677    .319   -.100   -.131   -.174   -.012
(3) MOTHERS HIGHEST DEGREE       .410    .677   1.000    .208    .105   -.046   -.008    .151
(4) SPOUSES HIGHEST DEGREE       .595    .319    .208   1.000   -.053   -.138   -.392   -.090
(5) GENERAL HAPPINESS           -.017   -.100    .105   -.053   1.000    .514    .267    .214
(6) HAPPINESS OF MARRIAGE       -.172   -.131   -.046   -.138    .514   1.000    .282    .161
(7) CONDITION OF HEALTH         -.246   -.174   -.008   -.392    .267    .282   1.000    .214
(8) IS LIFE EXCITING OR DULL    -.138   -.012    .151   -.090    .214    .161    .214   1.000

Appropriateness of factor analysis: Sampling adequacy of individual variables
Slide 20

There are two anti-image matrices: the anti-image
covariance matrix and the anti-image correlation
matrix. We are interested in the anti-image correlation
matrix.

Principal component analysis requires that the
Kaiser-Meyer-Olkin Measure of Sampling Adequacy (MSA)
be greater than 0.50 for each individual variable as well as the
set of variables. The MSA for each variable is the diagonal entry
of the anti-image correlation matrix, marked (a).

On iteration 1, the MSA for all of the individual variables
included in the analysis was greater than 0.5, supporting
their retention in the analysis.

Anti-image Matrices (columns follow the numbered row order)

Anti-image Covariance
                                  (1)     (2)     (3)     (4)     (5)     (6)     (7)     (8)
(1) RS HIGHEST DEGREE            .511   -.101   -.079   -.274   -.058    .067   -.008    .108
(2) FATHERS HIGHEST DEGREE      -.101    .455   -.290   -.024    .103   -.028    .050    .028
(3) MOTHERS HIGHEST DEGREE      -.079   -.290    .476    .028   -.102    .043   -.052   -.121
(4) SPOUSES HIGHEST DEGREE      -.274   -.024    .028    .578   -.014   -.012    .203   -.039
(5) GENERAL HAPPINESS           -.058    .103   -.102   -.014    .666   -.325   -.085   -.085
(6) HAPPINESS OF MARRIAGE        .067   -.028    .043   -.012   -.325    .692   -.099   -.024
(7) CONDITION OF HEALTH         -.008    .050   -.052    .203   -.085   -.099    .749   -.102
(8) IS LIFE EXCITING OR DULL     .108    .028   -.121   -.039   -.085   -.024   -.102    .876

Anti-image Correlation
                                  (1)     (2)     (3)     (4)     (5)     (6)     (7)     (8)
(1) RS HIGHEST DEGREE            .701(a) -.210  -.161   -.503   -.099    .113   -.012    .162
(2) FATHERS HIGHEST DEGREE      -.210    .640(a) -.623  -.048    .187   -.049    .086    .044
(3) MOTHERS HIGHEST DEGREE      -.161   -.623    .586(a)  .053  -.181    .076   -.087   -.188
(4) SPOUSES HIGHEST DEGREE      -.503   -.048    .053    .656(a) -.023  -.018    .309   -.055
(5) GENERAL HAPPINESS           -.099    .187   -.181   -.023    .549(a) -.478  -.120   -.111
(6) HAPPINESS OF MARRIAGE        .113   -.049    .076   -.018   -.478    .619(a) -.137  -.030
(7) CONDITION OF HEALTH         -.012    .086   -.087    .309   -.120   -.137    .734(a) -.126
(8) IS LIFE EXCITING OR DULL     .162    .044   -.188   -.055   -.111   -.030   -.126    .638(a)

Appropriateness of factor analysis: Sampling adequacy for set of variables
Slide 21

KMO and Bartlett's Test

Kaiser-Meyer-Olkin Measure of Sampling Adequacy           .640
Bartlett's Test of Sphericity    Approx. Chi-Square    137.823
                                 df                         28
                                 Sig.                     .000

The overall MSA for the set of variables
included in the analysis
was 0.640, which exceeds
the minimum requirement
of 0.50 for overall MSA.

Appropriateness of factor analysis: Bartlett test of sphericity
Slide 22

KMO and Bartlett's Test

Kaiser-Meyer-Olkin Measure of Sampling Adequacy           .640
Bartlett's Test of Sphericity    Approx. Chi-Square    137.823
                                 df                         28
                                 Sig.                     .000

Principal component analysis requires
that the probability associated with
Bartlett's Test of Sphericity be less
than the level of significance.

The probability associated with the
Bartlett test is <0.001, which satisfies
this requirement.

Number of factors to extract: Latent root criterion
Slide 23

Total Variance Explained

             Initial Eigenvalues                     Extraction Sums of Squared Loadings
Component    Total   % of Variance   Cumulative %    Total   % of Variance   Cumulative %
1            2.600       32.502         32.502       2.600       32.502         32.502
2            1.772       22.149         54.651       1.772       22.149         54.651
3            1.079       13.486         68.137       1.079       13.486         68.137
4             .827       10.332         78.469
5             .631        7.888         86.358
6             .487        6.087         92.445
7             .333        4.161         96.606
8             .272        3.394        100.000
Extraction Method: Principal Component Analysis.

Using the output from iteration 1,
there were 3 eigenvalues greater
than 1.0.

The latent root criterion for
number of factors to derive would
indicate that there were 3
components to be extracted for
these variables.

Number of factors to extract: Percentage of variance criterion
Slide 24

Total Variance Explained

             Initial Eigenvalues                     Extraction Sums of Squared Loadings
Component    Total   % of Variance   Cumulative %    Total   % of Variance   Cumulative %
1            2.600       32.502         32.502       2.600       32.502         32.502
2            1.772       22.149         54.651       1.772       22.149         54.651
3            1.079       13.486         68.137       1.079       13.486         68.137
4             .827       10.332         78.469
5             .631        7.888         86.358
6             .487        6.087         92.445
7             .333        4.161         96.606
8             .272        3.394        100.000
Extraction Method: Principal Component Analysis.

The cumulative proportion of variance criterion can
be met with 3 components, which
satisfy the criterion of explaining
60% or more of the total variance.

A 3-component solution would
explain 68.137% of the total
variance.

Since the SPSS default is to extract
the number of components indicated
by the latent root criterion, our
initial factor solution was based on
the extraction of 3 components.

Evaluating communalities
Slide 25

Communalities represent the
proportion of the variance in
the original variables that is
accounted for by the factor
solution.

The factor solution should
explain at least half of each
original variable's variance, so
the communality value for
each variable should be 0.50
or higher.

Communalities

                            Initial   Extraction
RS HIGHEST DEGREE            1.000       .717
FATHERS HIGHEST DEGREE       1.000       .768
MOTHERS HIGHEST DEGREE       1.000       .815
SPOUSES HIGHEST DEGREE       1.000       .715
GENERAL HAPPINESS            1.000       .763
HAPPINESS OF MARRIAGE        1.000       .711
CONDITION OF HEALTH          1.000       .548
IS LIFE EXCITING OR DULL     1.000       .415
Extraction Method: Principal Component Analysis.

Communality requiring variable removal
Slide 26

Communalities

                            Initial   Extraction
RS HIGHEST DEGREE            1.000       .717
FATHERS HIGHEST DEGREE       1.000       .768
MOTHERS HIGHEST DEGREE       1.000       .815
SPOUSES HIGHEST DEGREE       1.000       .715
GENERAL HAPPINESS            1.000       .763
HAPPINESS OF MARRIAGE        1.000       .711
CONDITION OF HEALTH          1.000       .548
IS LIFE EXCITING OR DULL     1.000       .415
Extraction Method: Principal Component Analysis.

On iteration 1, the communality for the
variable "attitude toward life" [life] was 0.415.
Since this is less than 0.50, the variable should
be removed from the next iteration of the principal
component analysis.

The variable was removed and the principal
component analysis was computed again.

Repeating the factor analysis
Slide 27

From the Dialog Recall tool button,
select Factor Analysis to
reopen the factor analysis
dialog box.

Removing the variable from the list of variables
Slide 28

First, highlight
the life variable.

Second, click on the left
arrow button to remove
the variable from the
Variables list box.

Replicating the factor analysis
Slide 29

The dialog recall command opens
the dialog box with all of the
settings that we had selected the
last time we used factor analysis.

To replicate the analysis without
the variable that we just removed,
click on the OK button.

Communality requiring variable removal
Slide 30

Communalities

                            Initial   Extraction
RS HIGHEST DEGREE            1.000       .642
FATHERS HIGHEST DEGREE       1.000       .623
MOTHERS HIGHEST DEGREE       1.000       .592
SPOUSES HIGHEST DEGREE       1.000       .516
GENERAL HAPPINESS            1.000       .638
HAPPINESS OF MARRIAGE        1.000       .594
CONDITION OF HEALTH          1.000       .477
Extraction Method: Principal Component Analysis.

On iteration 2, the communality for the
variable "condition of health" [health] was
0.477. Since this is less than 0.50, the variable
should be removed from the next iteration of the
principal component analysis.

The variable was removed and the principal
component analysis was computed again.

Repeating the factor analysis
Slide 31

From the Dialog Recall tool button,
select Factor Analysis to
reopen the factor analysis
dialog box.

Removing the variable from the list of variables
Slide 32

First, highlight
the health
variable.

Second, click on the left
arrow button to remove
the variable from the
Variables list box.

Replicating the factor analysis
Slide 33

The dialog recall command opens
the dialog box with all of the
settings that we had selected the
last time we used factor analysis.

To replicate the analysis without
the variable that we just removed,
click on the OK button.

Communality requiring variable removal
Slide 34

Communalities

                            Initial   Extraction
RS HIGHEST DEGREE            1.000       .674
FATHERS HIGHEST DEGREE       1.000       .640
MOTHERS HIGHEST DEGREE       1.000       .577
SPOUSES HIGHEST DEGREE       1.000       .491
GENERAL HAPPINESS            1.000       .719
HAPPINESS OF MARRIAGE        1.000       .741
Extraction Method: Principal Component Analysis.

On iteration 3, the communality for the
variable "spouse's highest academic degree" [spdeg]
was 0.491. Since this is less than 0.50, the
variable should be removed from the next
iteration of the principal component analysis.

The variable was removed and the principal
component analysis was computed again.

Repeating the factor analysis
Slide 35

From the Dialog Recall tool button,
select Factor Analysis to
reopen the factor analysis
dialog box.

Removing the variable from the list of variables
Slide 36

First, highlight the
spdeg variable.

Second, click on the left
arrow button to remove
the variable from the
Variables list box.

Replicating the factor analysis
Slide 37

The dialog recall command opens
the dialog box with all of the
settings that we had selected the
last time we used factor analysis.

To replicate the analysis without
the variable that we just removed,
click on the OK button.

Communality satisfactory for all variables
Slide 38

Communalities

                            Initial   Extraction
RS HIGHEST DEGREE            1.000       .577
FATHERS HIGHEST DEGREE       1.000       .720
MOTHERS HIGHEST DEGREE       1.000       .684
GENERAL HAPPINESS            1.000       .745
HAPPINESS OF MARRIAGE        1.000       .782
Extraction Method: Principal Component Analysis.

Once any variables with communalities less than
0.50 have been removed from the analysis, the
pattern of factor loadings should be examined to
identify variables that have complex structure.

Complex structure occurs when
one variable has high loadings or
correlations (0.40 or greater) on
more than one component. If a
variable has complex structure, it
should be removed from the
analysis.

Variables are only checked for
complex structure if there is more
than one component in the
solution. Variables that load on
only one component are described
as having simple structure.

Identifying complex structure
Slide 39

Rotated Component Matrix(a)

                            Component
                              1        2
RS HIGHEST DEGREE            .732    -.202
FATHERS HIGHEST DEGREE       .848     .031
MOTHERS HIGHEST DEGREE       .810     .169
GENERAL HAPPINESS            .145     .851
HAPPINESS OF MARRIAGE       -.145     .872
Extraction Method: Principal Component Analysis.
Rotation Method: Varimax with Kaiser Normalization.
a. Rotation converged in 3 iterations.

On iteration 4, none of the
variables demonstrated
complex structure. It is not
necessary to remove any
additional variables because
of complex structure.

Slide 40

On iteration 4, the 2
components in the
analysis had more than
one variable loading on
each of them.

No variables need to be
removed because they
are the only variable
loading on a component.

Rotated Component Matrix(a)

                            Component
                              1        2
RS HIGHEST DEGREE            .732    -.202
FATHERS HIGHEST DEGREE       .848     .031
MOTHERS HIGHEST DEGREE       .810     .169
GENERAL HAPPINESS            .145     .851
HAPPINESS OF MARRIAGE       -.145     .872
Extraction Method: Principal Component Analysis.
Rotation Method: Varimax with Kaiser Normalization.
a. Rotation converged in 3 iterations.

Final check of communalities
Slide 41

Once we have resolved any
problems with complex
structure, we check the
communalities one last time to
make certain that we are
explaining a sufficient portion
of the variance of all of the
original variables.

Communalities

                            Initial   Extraction
RS HIGHEST DEGREE            1.000       .577
FATHERS HIGHEST DEGREE       1.000       .720
MOTHERS HIGHEST DEGREE       1.000       .684
GENERAL HAPPINESS            1.000       .745
HAPPINESS OF MARRIAGE        1.000       .782
Extraction Method: Principal Component Analysis.

The communalities for all of the
variables included on the
components were greater than
0.50 and all variables had
simple structure.

The principal component
analysis has been completed.

Interpreting the principal components
Slide 42

The information in 5 of the
variables can be represented
by 2 components.

Component 1 includes the
variables
•"highest academic degree" [degree],
•"father's highest academic degree", and
•"mother's highest academic degree".

Component 2 includes the
variables
•"general happiness" [happy] and
•"happiness of marriage" [hapmar].

Rotated Component Matrix(a)

                            Component
                              1        2
RS HIGHEST DEGREE            .732    -.202
FATHERS HIGHEST DEGREE       .848     .031
MOTHERS HIGHEST DEGREE       .810     .169
GENERAL HAPPINESS            .145     .851
HAPPINESS OF MARRIAGE       -.145     .872
Extraction Method: Principal Component Analysis.
Rotation Method: Varimax with Kaiser Normalization.
a. Rotation converged in 3 iterations.

Total variance explained
Slide 43

Total Variance Explained

             Initial Eigenvalues                     Extraction Sums of Squared Loadings             Rotation Sums
Component    Total   % of Variance   Cumulative %    Total   % of Variance   Cumulative %    Total
1            1.953       39.061         39.061       1.953       39.061         39.061       1.953
2            1.555       31.109         70.169       1.555       31.109         70.169       1.556
3             .649       12.989         83.158
4             .441        8.820         91.977
5             .401        8.023        100.000
Extraction Method: Principal Component Analysis.

The 2 components explain
70.169% of the total
variance in the variables
which are included on the
components.

Split-sample validation
Slide 44

We validate our analysis by conducting an analysis on each half of the
sample. We compare the results of these two split sample analyses
with the analysis of the full data set.

To split the sample into two halves, we generate a random variable that
indicates which half of the sample each case should be placed in.

To compute a random selection of cases, we need to specify the
starting value, or random number seed. Otherwise, the random
sequence of numbers that you generate will not match mine, and we
will get different results.

Before we do the random selection, you must make certain
that your data set is sorted in the original sort order, or the cases in
your two half samples will not match mine. To make certain your
data set is in the same order as mine, sort your data set in ascending
order by case id.

Sorting the data set in original order
Slide 45

To make certain the data set is
sorted in the original order,
highlight the case id column,
right click on the column header,
and select the Sort Ascending
command.

Setting the random number seed
Slide 46

To set the random number
seed, select the Random
Number Seed… command.

Set the random number seed
Slide 47

First, click on the
Set seed to option
button to activate
the text box.

Second, type in the
random seed stated in
the problem.

Third, click on the OK
button to complete the
dialog box.

Note that SPSS does not
provide you with any
feedback about the change.

Select the compute command
Slide 48

To enter the formula for the
variable that will split the
sample in two parts, click
on the Compute…
command.

The formula for the split variable
Slide 49

First, type the name for the
new variable, split, into the
Target Variable text box.

Second, the formula for the
value of split is shown in the
text box.

The uniform(1) function
generates a random decimal
number between 0 and 1.
The random number is
compared to the value 0.50.

If the random number is less
than or equal to 0.50, the
value of the formula will be 1,
the SPSS numeric equivalent
to true. If the random
number is larger than 0.50,
the formula will return a 0,
the SPSS numeric equivalent
to false.

Third, click on the
OK button to
complete the dialog
box.
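
The seed and compute steps on Slides 46 to 49 correspond to a few lines of
SPSS command syntax. The following is a minimal sketch, not taken from the
original slides, assuming the seed 519447 stated in the problem and the
comparison of uniform(1) with 0.50 described above.

* Fix the random number seed so the split can be reproduced.
SET SEED=519447.
* split is 1 when the random number is 0.50 or less, otherwise 0.
COMPUTE split = uniform(1) <= 0.50.
EXECUTE.

The commands should be run together and in this order on the data set sorted
by case id, because the random sequence depends on both the seed and the
order of the cases.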
SW388R7
Data Analysis &
Computers II     The split variable in the data editor
Slide 50

In the data editor, the
split variable shows a
random pattern of zeros
and ones.

To select half of the
sample for each validation
analysis, we will first
select the cases where
split = 0, then select the
cases where split = 1.
SW388R7
Data Analysis &
Computers II     Repeating the analysis with the first validation sample
Slide 51

To repeat the principal
component analysis for the
first validation sample, select
Factor Analysis from the
Dialog Recall tool button.
SW388R7
Data Analysis &
Computers II            Using "split" as the selection variable
Slide 52

First, scroll
down the list of
variables and
highlight the
variable split.

Second, click on the
right arrow button to
move the split variable
to the Selection
Variable text box.
SW388R7
Data Analysis &
Computers II      Setting the value of split to select cases
Slide 53

When the variable named
split is moved to the
Selection Variable text
box, SPSS adds "=?" after
the name to prompt us to
enter a specific value for
split.

Click on the Value… button
to enter a value for split.
SW388R7
Data Analysis &
Computers II           Completing the value selection
Slide 54

First, type the value
for the first half of the
sample, 0, into the
Value for Selection
Variable text box.

Second, click on the
Continue button to
complete the value
entry.
SW388R7
Data Analysis &
Computers II     Requesting output for the first validation sample
Slide 55

Click on the OK
button to
request the
output.

When the value entry
dialog box is closed, SPSS
shows the value we entered
after the equal sign. This
specification now tells
SPSS to include in the
analysis only those cases
that have a value of 0 for
the split variable.

Since the validation analysis
requires us to compare the
results of the analysis using
the two split samples, we will
request the output for the
second sample before doing
any comparison.
SW388R7
Data Analysis &
Computers II     Repeating the analysis with the second validation sample
Slide 56

To repeat the principal
component analysis for the
second validation sample,
select Factor Analysis from the
Dialog Recall tool button.
SW388R7
Data Analysis &
Computers II     Setting the value of split to select cases
Slide 57

Since the split variable is already in the
Selection Variable text box, we only need
to change its value.

Click on the Value… button to enter a
different value for split.
SW388R7
Data Analysis &
Computers II          Completing the value selection
Slide 58

First, type the value
for the second half of
the sample, 1, into the
Value for Selection
Variable text box.

Second, click on the
Continue button to
complete the value
entry.
SW388R7
Data Analysis &
Computers II     Requesting output for the second validation sample
Slide 59

Click on the OK
button to
request the
output.

When the value entry
dialog box is closed, SPSS
shows the value we entered
after the equal sign. This
specification now tells
SPSS to include in the
analysis only those cases
that have a value of 1 for
the split variable.
SW388R7
Data Analysis &
Computers II                           Comparing communalities
Slide 60

All of the communalities for the first split sample (SPLIT = 0) and
for the second split sample (SPLIT = 1) satisfy the minimum
requirement of being larger than 0.50.

Communalities (a)

                               SPLIT = 0               SPLIT = 1
                           Initial   Extraction    Initial   Extraction
RS HIGHEST DEGREE            1.000         .580      1.000         .618
FATHERS HIGHEST DEGREE       1.000         .647      1.000         .802
MOTHERS HIGHEST DEGREE       1.000         .693      1.000         .675
GENERAL HAPPINESS            1.000         .667      1.000         .807
HAPPINESS OF MARRIAGE        1.000         .754      1.000         .830

Extraction Method: Principal Component Analysis.
a. Only cases for which SPLIT = 0 (left columns) or SPLIT = 1 (right
columns) are used in the analysis phase.

Note how SPSS identifies for
us which cases we selected
for the analysis.
SW388R7
Data Analysis &
Slide 61

The pattern of factor loadings for both split samples shows the
variables RS HIGHEST DEGREE; FATHERS HIGHEST DEGREE; and MOTHERS
HIGHEST DEGREE loading on the first component, and GENERAL HAPPINESS
and HAPPINESS OF MARRIAGE loading on the second component.

Rotated Component Matrix (a, b)

                               SPLIT = 0            SPLIT = 1
                               Component            Component
                               1        2           1        2
RS HIGHEST DEGREE            .730    -.215        .755    -.219
FATHERS HIGHEST DEGREE       .789     .154        .895    -.043
MOTHERS HIGHEST DEGREE       .794     .251        .819     .064
GENERAL HAPPINESS            .248     .778        .049     .897
HAPPINESS OF MARRIAGE       -.102     .862       -.183     .893

Extraction Method: Principal Component Analysis.
Rotation Method: Varimax with Kaiser Normalization.
a. Rotation converged in 3 iterations.
b. Only cases for which SPLIT = 0 (left columns) or SPLIT = 1 (right
columns) are used in the analysis phase.
SW388R7
Data Analysis &
Computers II                   Interpreting the validation results
Slide 62

All of the communalities in both validation samples met the criteria.

The pattern of loadings for both validation samples is the same, and
the same as the pattern for the analysis using the full sample.

In effect, we have done the same analysis on two separate sub-
samples of cases and obtained the same results.

This validation analysis supports a finding that the results of this
principal component analysis are generalizable to the population
represented by this data set.

When we are finished with this analysis, we should select all cases
back into the data set and remove the variables we created.
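
The comparison of the two validation samples can be written down as a small check. The Python below is a sketch under stated assumptions: the numbers are copied from the split-sample tables above, each variable is assigned to the component on which its loading is largest in absolute value (a simplification, not a rule stated on the slides), and the function name grouping is hypothetical.

```python
# Extracted communalities from the two split-sample tables.
communalities = {
    "split = 0": [0.580, 0.647, 0.693, 0.667, 0.754],
    "split = 1": [0.618, 0.802, 0.675, 0.807, 0.830],
}

# Rotated loadings (component 1, component 2) from the same tables.
split0 = {
    "RS HIGHEST DEGREE": (0.730, -0.215),
    "FATHERS HIGHEST DEGREE": (0.789, 0.154),
    "MOTHERS HIGHEST DEGREE": (0.794, 0.251),
    "GENERAL HAPPINESS": (0.248, 0.778),
    "HAPPINESS OF MARRIAGE": (-0.102, 0.862),
}
split1 = {
    "RS HIGHEST DEGREE": (0.755, -0.219),
    "FATHERS HIGHEST DEGREE": (0.895, -0.043),
    "MOTHERS HIGHEST DEGREE": (0.819, 0.064),
    "GENERAL HAPPINESS": (0.049, 0.897),
    "HAPPINESS OF MARRIAGE": (-0.183, 0.893),
}

def grouping(loadings):
    """Group variables by the component with the largest absolute loading.

    Returns a set of frozensets, so the comparison ignores the sign of the
    loadings and the order in which the components appear.
    """
    groups = {}
    for name, row in loadings.items():
        strongest = max(range(len(row)), key=lambda k: abs(row[k]))
        groups.setdefault(strongest, set()).add(name)
    return {frozenset(members) for members in groups.values()}

communalities_ok = all(v > 0.50 for vals in communalities.values() for v in vals)
same_pattern = grouping(split0) == grouping(split1)
```

For the values above, both checks come out True. Comparing groupings rather than raw numbers reflects the earlier point that components switching position or loadings reversing sign does not invalidate the findings.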
SW388R7
Data Analysis &
Computers II     Detecting outliers
Slide 63

To detect outliers, we
compute the factor
scores in SPSS.

Select the Factor
Analysis command
from the Dialog
Recall tool button
SW388R7
Data Analysis &
Computers II     Access the Scores Dialog Box
Slide 64

Click on the Scores…
button to access the
factor scores dialog
box.
SW388R7
Data Analysis &
Computers II             Specifications for factor scores
Slide 65

First, click on the
Save as variables
checkbox to create
factor variables.

Second, accept the
default method using
a Regression equation
to calculate the
scores.

Third, click on the
Continue button
to complete the
specifications.
SW388R7
Data Analysis &
Computers II     Compute the factor scores
Slide 66

Click on the Continue
button to compute
the factor scores.
SW388R7
Data Analysis &
Computers II     The factor scores in the data editor
Slide 67

SPSS creates the factor score
variables in the data editor window.
It names the first factor score
“fac1_1,” and the second factor
score “fac2_1.”

We need to check to see if we have
any values for either factor score
that are at or beyond ±3.0. One way
to check for the presence of large
values indicating outliers is to sort
the factor variables and see if any
fall outside the acceptable range.
SW388R7
Data Analysis &
Computers II     Sort the data to locate outliers for factor one
Slide 68

First, select the
fac1_1 column by
clicking on its header.

Second, right click
on the column header
and select the Sort
Ascending command from
the popup menu.
SW388R7
Data Analysis &
Computers II     Negative outliers for factor one
Slide 69

Scroll down past the
cases for whom factor
scores could not be
computed. We see that
none of the scores for
factor one are less than
or equal to -3.0.
SW388R7
Data Analysis &
Computers II     Positive outliers for factor one
Slide 70

Scrolling down to the
bottom of the sorted
data set, we see that
none of the scores for
factor one are greater
than or equal to +3.0.

There are no outliers on
factor one.
SW388R7
Data Analysis &
Computers II     Sort the data to locate outliers on factor two
Slide 71

First, select the
fac2_1 column by
clicking on its header.

Second, right click
on the column header
and select the Sort
Ascending command from
the popup menu.
SW388R7
Data Analysis &
Computers II     Negative outliers for factor two
Slide 72

Scrolling down past the
cases for whom factor
scores could not be
computed, we see that
none of the scores for
factor two are less than
or equal to -3.0.
SW388R7
Data Analysis &
Computers II       Positive outliers for factor two
Slide 73

Scrolling down to the bottom
of the sorted data set, we
see that one of the scores for
factor two is greater than or
equal to +3.0.

We will run the analysis
excluding this outlier and see
if it changes our
interpretation of the analysis.
SW388R7
Data Analysis &
Computers II     Removing the outliers
Slide 74

To see whether or not
outliers are having an
impact on the factor
solution, we will compute
the factor analysis
without the outliers and
compare the results.

To remove the outliers, we will
include the cases that are not
outliers.

Choose the Select Cases…
SW388R7
Data Analysis &
Computers II     Setting the If condition
Slide 75

Click on the If…
button to enter
the formula for
selecting cases
in or out of the
analysis.
SW388R7
Data Analysis &
Computers II     Formula to select cases that are not outliers
Slide 76

First, type the
formula as shown.
The formula says:
include cases if the
absolute values of both
the first and second
factor scores are less
than 3.0.

Second, click on the
Continue button to
complete the
specification.
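
The selection rule can be expressed as a short filter. The Python below is a sketch, not the SPSS formula itself: the function name keep_case is hypothetical, the scores are made up for illustration, and dropping cases whose factor scores could not be computed (represented here as None) is an assumption of this example.

```python
def keep_case(fac1, fac2, cutoff=3.0):
    """True when both factor scores lie strictly inside (-cutoff, +cutoff)."""
    if fac1 is None or fac2 is None:
        return False  # assumption: cases without factor scores are left out
    return abs(fac1) < cutoff and abs(fac2) < cutoff

# Made-up factor scores (fac1_1, fac2_1), for illustration only.
scores = [(0.42, -1.10), (-2.95, 0.30), (1.20, 3.41), (None, None)]
kept = [pair for pair in scores if keep_case(*pair)]
```

Here the third case is excluded because its second factor score exceeds 3.0, which is the situation found for factor two in the sorted data.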
SW388R7
Data Analysis &
Computers II     Complete the select cases command
Slide 77

Having entered the
formula for including
cases, click on the OK
button to complete the
selection.
SW388R7
Data Analysis &
Computers II     The outlier selected out of the analysis
Slide 78

When SPSS selects a case out of the
data analysis, it draws a slash
through the case number. The case
that we identified as an outlier will be
excluded.
SW388R7
Data Analysis &
Computers II     Repeating the factor analysis
Slide 79

To repeat the factor analysis
without the outliers, select the
Factor Analysis command from
the Dialog Recall tool button
SW388R7
Data Analysis &
Computers II     Stopping SPSS from computing factor scores again
Slide 80

On the last factor analysis,
we included the specification
to compute factor scores.
Since we do not need to do
this again, we will remove
the specification.

Click on the Scores…
button to access the
factor scores dialog.
SW388R7
Data Analysis &
Computers II     Clearing the command to save factor scores
Slide 81

First, clear the Save
as variables checkbox.
This will deactivate
the Method options.

Second, click on the
Continue button to
complete the specification.
SW388R7
Data Analysis &
Computers II     Computing the factor analysis
Slide 82

To produce the
output for the factor
analysis excluding
outliers, click on the
OK button.
SW388R7
Data Analysis &
Computers II                      Comparing communalities
Slide 83

All of the communalities for the factor analysis including all cases
and for the factor analysis excluding outliers satisfy the minimum
requirement of being larger than 0.50.

Communalities

                               All cases           Excluding outliers
                           Initial   Extraction    Initial   Extraction
RS HIGHEST DEGREE            1.000         .577      1.000         .579
FATHERS HIGHEST DEGREE       1.000         .720      1.000         .720
MOTHERS HIGHEST DEGREE       1.000         .684      1.000         .681
GENERAL HAPPINESS            1.000         .745      1.000         .726
HAPPINESS OF MARRIAGE        1.000         .782      1.000         .771

Extraction Method: Principal Component Analysis.
SW388R7
Data Analysis &
Slide 84

The pattern of factor loadings for the factor analysis including all
cases is shown on the left. The pattern for the factor analysis
excluding outliers is shown on the right.

Rotated Component Matrix (a)

                               All cases          Excluding outliers
                               Component            Component
                               1        2           1        2
RS HIGHEST DEGREE            .732    -.202        .734    -.201
FATHERS HIGHEST DEGREE       .848     .031        .846     .060
MOTHERS HIGHEST DEGREE       .810     .169        .810     .157
GENERAL HAPPINESS            .145     .851        .159     .837
HAPPINESS OF MARRIAGE       -.145     .872       -.143     .866

Extraction Method: Principal Component Analysis.
Rotation Method: Varimax with Kaiser Normalization.
a. Rotation converged in 3 iterations.

Both analyses show the variables RS HIGHEST DEGREE; FATHERS HIGHEST
DEGREE; and MOTHERS HIGHEST DEGREE loading on the first component, and
GENERAL HAPPINESS and HAPPINESS OF MARRIAGE loading on the second
component.
SW388R7
Data Analysis &
Computers II              Interpreting the outlier analysis
Slide 85

All of the communalities satisfy the criteria of being
greater than 0.50.

Whether we include or exclude outliers, our
interpretation is the same. The outliers do not have an
effect that would support excluding them from the analysis.

The part of the problem statement that outliers do not
have an impact is true.

When we are finished with this analysis, we should select
all cases back into the data set and remove the variables
we created.
SW388R7
Data Analysis &
Computers II     Computing Cronbach's Alpha
Slide 86

To compute
Cronbach's alpha for
each component in
our analysis, we select
Scale | Reliability
Analysis… from the
Analyze menu.
SW388R7
Data Analysis &
Computers II     Selecting the variables for the first component
Slide 87

First, move the three
variables loading
on the first component
to the Items list box.

Second, click on the
Statistics… button to
select the statistics we
will need.
SW388R7
Data Analysis &
Computers II              Selecting the statistics for the output
Slide 88

First, mark the
checkboxes for Item,
Scale, and Scale if
item deleted.

Second, click on the
Continue button.
SW388R7
Data Analysis &
Computers II     Completing the specifications
Slide 89

First, if Alpha is not
selected as the Model
in the drop-down menu,
select it.

Second, click on
the OK button to
produce the output.
SW388R7
Data Analysis &
Computers II     Cronbach's Alpha
Slide 90

Cronbach's Alpha is located at the
bottom of the output. An alpha of
0.60 or higher is the minimum
acceptable level. Preferably, alpha
will be 0.70 or higher, as it is in
this case.
SW388R7
Data Analysis &
Computers II     Cronbach's Alpha
Slide 91

If alpha is too small, this column may
suggest which variable should be removed
to improve the internal consistency of the
scale variables. It tells us what alpha we
would get if the variable listed were
removed from the scale.
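
To make the two statistics concrete, the Python below sketches Cronbach's alpha and the "alpha if item deleted" column using the standard formula alpha = k/(k-1) * (1 - sum of item variances / variance of the summated scale). The function names and the scores are made up for illustration; this is not the SPSS output for the GSS variables.

```python
from statistics import variance

def cronbach_alpha(items):
    """items: one list of scores per scale item, all of the same length."""
    k = len(items)
    totals = [sum(case) for case in zip(*items)]    # summated scale per case
    item_variance = sum(variance(col) for col in items)
    return k / (k - 1) * (1 - item_variance / variance(totals))

def alpha_if_deleted(items):
    """Alpha recomputed with each item left out in turn (needs 3+ items)."""
    return [cronbach_alpha(items[:i] + items[i + 1:]) for i in range(len(items))]

# Made-up scores for three items on five cases, for illustration only.
items = [
    [1, 2, 3, 4, 5],
    [2, 2, 3, 5, 5],
    [1, 3, 3, 4, 4],
]
alpha = cronbach_alpha(items)
alphas_without = alpha_if_deleted(items)
```

An entry in alphas_without that is larger than alpha would point to the item whose removal improves internal consistency, which is how the column on this slide is read.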
SW388R7
Data Analysis &
Computers II     Computing Cronbach's Alpha
Slide 92

To compute
Cronbach's alpha for
each component in
our analysis, we select
Scale | Reliability
Analysis… from the
Analyze menu.
SW388R7
Data Analysis &
Computers II     Selecting the variables for the second component
Slide 93

First, move the two
variables loading
on the second
component to the
Items list box.

Second, click on the
Statistics… button to
select the statistics we
will need.
SW388R7
Data Analysis &
Computers II              Selecting the statistics for the output
Slide 94

First, mark the
checkboxes for Item,
Scale, and Scale if
item deleted.

Second, click on the
Continue button.
SW388R7
Data Analysis &
Computers II     Completing the specifications
Slide 95

First, if Alpha is not
selected as the Model
in the drop-down menu,
select it.

Second, click on
the OK button to
produce the output.
SW388R7
Data Analysis &
Computers II     Cronbach's Alpha
Slide 96

Cronbach's Alpha is located at the
bottom of the output. An alpha of
0.60 or higher is the minimum
acceptable level. Preferably, alpha
will be 0.70 or higher, as it is in
this case.
SW388R7
Data Analysis &
Computers II                    Answering the problem question
Slide 97

The answer to the original question is true with caution.

Total Variance Explained

Component   Total   % of Variance   Cumulative %   Total   % of Variance   Cumulative %
1           1.626          40.651         40.651   1.626          40.651         40.651
2           1.119          27.968         68.619   1.119          27.968         68.619
3            .694          17.341         85.960
4            .562          14.040        100.000

Extraction Method: Principal Component Analysis.

Component 1 includes the variables "highest academic degree" [degree], ...
We can substitute one component variable for this combination of
variables in further analyses.

Component 2 includes the variables "general happiness" [happy] and
"happiness of marriage" [hapmar]. We can substitute one component
variable for this combination of variables in further analyses.

The components explain at least 50% of the variance in each of the
variables included in the final analysis.

The components explain 70.169% of the total variance in the variables
which are included on the components.

A caution is added to our findings because of the inclusion of ordinal level
variables in the analysis.
SW388R7
Data Analysis &
Computers II                  Validation with small samples
Slide 98

   In the validation example completed above, 105 cases were
used in the final principal component analysis model. When we
have more than 100 cases available for the validation analysis,
an even split should generally result in 50+ cases per validation
sample.

   However, if the number of cases available for the validation is
less than 100, then splitting the sample in two may result in
validation samples that are smaller than the minimum of 50 cases
needed to conduct a factor analysis.

   When this happens, we draw two random samples of cases that
are both larger than the minimum of 50. Since some of the
same cases will be in both validation samples, the support for
generalizability is not as strong, but it does offer some
evidence, especially if we repeat the process a number of
times.
SW388R7
Data Analysis &
Computers II                  Validation with small samples
Slide 99

   We randomly create two split variables which we will call split1
and split2, using a separate random number seed for each.

   In the formula for creating the split variables, we set the
proportion of cases sufficient to randomly select fifty cases.

   To calculate the proportion that we need, we divide 50 by the
number of valid cases in the analysis and round up to the next
highest 10% increment.

   For example, if we have 80 valid cases, the proportion we need
for validation is 50 / 80 = 0.625, which we would round up to
0.70 or 70%. The formulas for the split variables would be:
split1 = uniform(1) <= 0.70
split2 = uniform(1) <= 0.70
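
The arithmetic and the two split variables can be sketched as follows. This Python is illustrative only: the function names and the seeds 1 and 2 are hypothetical, and Python's random numbers will not match those SPSS generates.

```python
import random

def validation_proportion(valid_cases, minimum=50):
    """Divide the minimum by the valid cases and round up to the next 10%."""
    tenths = (minimum * 10 + valid_cases - 1) // valid_cases  # integer ceiling
    return min(tenths, 10) / 10

def draw_sample(n_cases, seed, proportion):
    """One 0/1 flag per case: 1 when the uniform draw is <= proportion."""
    rng = random.Random(seed)
    return [1 if rng.random() <= proportion else 0 for _ in range(n_cases)]

p = validation_proportion(80)                      # 50 / 80 = 0.625, rounded up to 0.70
split1 = draw_sample(80, seed=1, proportion=p)     # hypothetical seed
split2 = draw_sample(80, seed=2, proportion=p)     # a separate hypothetical seed
```

Because each case is selected at random, the number of cases flagged 1 is only approximately 70% of the total, so it is worth confirming that each validation sample actually contains at least 50 cases.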
SW388R7
Data Analysis &
Computers II             Validation with very small samples
Slide 100

   When the number of valid cases in a factor analysis
gets close to the lower limit of 50, the results of the
validation may appear to support the analysis, but
this can be misleading because the validation
samples are not really different from the analysis of
the full data set.

   For example, if the number of valid cases were 60, a
90% sub-sample of 54 would result in 54 cases being
the same in both the full analysis and the validation
analysis. The validation may appear to support the
full analysis simply because the validation had
limited opportunity to be different.
SW388R7
Data Analysis &
Computers II                                                   Problem 2
Slide 101

In the dataset GSS2000.sav, is the following statement true, false, or an incorrect application of a statistic?
Assume that there is no problematic pattern of missing data. Use a level of significance of 0.05. Validate the
results of your principal component analysis by repeating the principal component analysis on two 70% random
samples of the data set, using 743911 and 747454 as the random number seeds.

Based on the results of a principal component analysis of the 7 variables "claims about environmental threats
are exaggerated" [grnexagg], "danger to the environment from modifying genes in crops" [genegen], "America
doing enough to protect environment" [amprogrn], "should be international agreements for environment
problems" [grnintl], "poorer countries should be expected to do less for the environment" [ldcgrn], "economic
progress in America will slow down without more concern for environment" [econgrn], and "likelihood of
nuclear power station damaging environment in next 5 years" [nukeacc], the information in these variables can
be represented with 2 components and 3 individual variables. Cases that might be considered to be outliers do
not have an impact on the factor solution. The internal consistency of the variables included in the components
is sufficient to support the creation of a summated scale.

Component 1 includes the variables "danger to the environment from modifying genes in crops" [genegen] and
"likelihood of nuclear power station damaging environment in next 5 years" [nukeacc]. Component 2 includes
the variables "claims about environmental threats are exaggerated" [grnexagg] and "poorer countries should be
expected to do less for the environment" [ldcgrn]. The variables "economic progress in America will slow down
without more concern for environment" [econgrn], "should be international agreements for environment
problems" [grnintl], and "America doing enough to protect environment" [amprogrn] were not included on the
components and are retained as individual variables.

1.   True
2.   True with caution
3.   False
4.   Inappropriate application of a statistic
SW388R7
Data Analysis &
Computers II                 The principal component solution
Slide 102

A principal component analysis found a two-factor solution, with four
of the original seven variables loading on the components. The
communalities and the rotated component matrix are shown below.

Communalities

                                                  Initial   Extraction
ENVIRONMENTAL THREATS EXAGGERATED                   1.000         .615
HOW DANGEROUS MODIFYING GENES IN CROPS              1.000         .694
POOR COUNTRIES LESS THAN RICH FOR ENVIRONMENT       1.000         .691
LIKELIHOOD OF NUCLEAR MELTDOWN IN 5 YEARS           1.000         .744

Extraction Method: Principal Component Analysis.

Rotated Component Matrix (a)

                                                     Component
                                                     1        2
ENVIRONMENTAL THREATS EXAGGERATED                 -.207     .756
HOW DANGEROUS MODIFYING GENES IN CROPS             .801    -.229
POOR COUNTRIES LESS THAN RICH FOR ENVIRONMENT      .051     .830
LIKELIHOOD OF NUCLEAR MELTDOWN IN 5 YEARS          .861     .059

Extraction Method: Principal Component Analysis.
Rotation Method: Varimax with Kaiser Normalization.
a. Rotation converged in 3 iterations.
SW388R7
Data Analysis &
Computers II     The size of the validation sample
Slide 103

Descriptive Statistics

                                                  Mean   Std. Deviation   Analysis N
ENVIRONMENTAL THREATS EXAGGERATED                 3.28            1.008           75
HOW DANGEROUS MODIFYING GENES IN CROPS            3.11             .953           75
POOR COUNTRIES LESS THAN RICH FOR ENVIRONMENT     3.77             .863           75
LIKELIHOOD OF NUCLEAR MELTDOWN IN 5 YEARS         2.47             .935           75

There were 75 valid cases in the final
analysis. The sample is too small to split in
half and have enough cases to meet the
minimum of 50 cases for factor analysis.

We will draw two random samples that
each comprise 70% of the full sample. We
arrive at 70% by dividing the minimum
sample size by the number of valid cases
(50 ÷ 75 = 0.667) and rounding up to the
next 10% increment, 70%.
SW388R7
Data Analysis &
Computers II                               Split-sample validation
Slide 104

To set the random number
seed, select the Random
Number Seed… command.

The first random
number seed stated in
the problem is 743911,
so we enter this in the
SPSS random number
seed dialog.
SW388R7
Data Analysis &
Computers II     Set the random number seed for first sample
Slide 105

First, click on the
Set seed to option
button to activate
the text box.

Second, type in the
random seed stated in
the problem.

Third, click on the OK
button to complete the
dialog box.

Note that SPSS does not
provide you with any
SW388R7
Data Analysis &
Computers II     Select the compute command
Slide 106

To enter the formula for the
variable that will split the
sample in two parts, click
on the Compute…
command.
SW388R7
Data Analysis &
Computers II     The formula for the split1 variable
Slide 107

First, type the name for the
new variable, split1, into
the Target Variable text
box.

Second, the formula for the
value of split1 is shown in the
text box.

The uniform(1) function
generates a random decimal
number between 0 and 1.
The random number is
compared to the value 0.70.

If the random number is less
than or equal to 0.70, the
value of the formula will be 1,
the SPSS numeric equivalent
to true. If the random
number is larger than 0.70,
the formula will return a 0,
the SPSS numeric equivalent
to false.

Third, click on the OK
button to complete the
dialog box.
SW388R7
Data Analysis &
Computers II     Set the random number seed for second sample
Slide 108

First, click on the
Set seed to option
button to activate
the text box.

Second, type in the
random seed stated in
the problem.

Third, click on the OK
button to complete the
dialog box.

Note that SPSS does not
provide you with any
SW388R7
Data Analysis &
Computers II     Select the compute command
Slide 109

To enter the formula for the
variable that will split the
sample in two parts, click
on the Compute…
command.
SW388R7
Data Analysis &
Computers II     The formula for the split2 variable
Slide 110

First, type the name for the
new variable, split2, into
the Target Variable text
box.

Second, the formula for the
value of split2 is shown in the
text box.

The uniform(1) function
generates a random decimal
number between 0 and 1.
The random number is
compared to the value 0.70.

If the random number is less
than or equal to 0.70, the
value of the formula will be 1,
the SPSS numeric equivalent
to true. If the random
number is larger than 0.70,
the formula will return a 0,
the SPSS numeric equivalent
to false.

Third, click on the
OK button to
complete the dialog
box.
SW388R7
Data Analysis &
Computers II     Repeating the analysis with the first validation sample
Slide 111

To repeat the principal
component analysis for the
first validation sample, select
Factor Analysis from the
Dialog Recall tool button.
SW388R7
Data Analysis &
Computers II            Using split1 as the selection variable
Slide 112

First, scroll
down the list of
variables and
highlight the
variable split1.

Second, click on the
right arrow button to
move the split1 variable
to the Selection
Variable text box.
SW388R7
Data Analysis &
Computers II     Setting the value of split1 to select cases
Slide 113

When the variable named
split1 is moved to the
Selection Variable text
box, SPSS adds "=?" after
the name to prompt us to
enter a specific value for
split1.

Click on the
Value… button
to enter a
value for split1.
SW388R7
Data Analysis &
Computers II             Completing the value selection
Slide 114

First, type the value
for the first sample, 1,
into the Value for
Selection Variable text
box.

Second, click on the
Continue button to
complete the value
entry.
SW388R7
Data Analysis &
Computers II     Requesting output for the first validation sample
Slide 115

Click on the OK
button to
request the
output.

When the value entry
dialog box is closed, SPSS
shows the value we entered
after the equal sign. This
specification now tells
SPSS to include in the
analysis only those cases
that have a value of 1 for
the split1 variable.

Since the validation analysis
requires us to compare the
results of the analysis using
the first validation sample
with the results using the
second, we will request the
output for the second
validation sample before
doing any comparison.
SW388R7
Data Analysis &
Computers II     Repeating the analysis with the second validation sample
Slide 116

To repeat the principal
component analysis for the
second validation sample,
select Factor Analysis from the
Dialog Recall tool button.
SW388R7
Data Analysis &
Computers II     Removing split1 as the selection variable
Slide 117

First, highlight
the Selection
Variable text box.

Second, click on the
left arrow button to
move the split1 back to
the list of variables.
SW388R7
Data Analysis &
Computers II           Using split2 as the selection variable
Slide 118

First, scroll
down the list of
variables and
highlight the
variable split2.

Second, click on the
right arrow button to
move the split2 variable
to the Selection
Variable text box.
SW388R7
Data Analysis &
Computers II     Setting the value of split2 to select cases
Slide 119

When the variable named
split2 is moved to the
Selection Variable text
box, SPSS adds "=?" after
the name to prompt us to
enter a specific value for
split2.

Click on the
Value… button
to enter a
value for split2.
SW388R7
Data Analysis &
Computers II            Completing the value selection
Slide 120

First, type the value
for the second sample,
1, into the Value for
Selection Variable text
box.

Second, click on the
Continue button to
complete the value
entry.
SW388R7
Data Analysis &
Computers II     Requesting output for the second validation sample
Slide 121

Click on the OK
button to
request the
output.

When the value entry
dialog box is closed, SPSS
shows the value we entered
after the equal sign. This
specification now tells
SPSS to include in the
analysis only those cases
that have a value of 1 for
the split2 variable.
SW388R7
Data Analysis &
Computers II     Comparing the communalities for the validation samples
Slide 122

All of the communalities for both validation samples satisfy the
minimum requirement of being larger than 0.50.

Communalities

                                                 SPLIT1 = 1 (a)          SPLIT2 = 1 (b)
                                                 Initial   Extraction    Initial   Extraction
ENVIRONMENTAL THREATS EXAGGERATED                1.000     .672          1.000     .631
HOW DANGEROUS MODIFYING GENES IN CROPS           1.000     .679          1.000     .648
POOR COUNTRIES LESS THAN RICH FOR ENVIRONMENT    1.000     .732          1.000     .773
LIKELIHOOD OF NUCLEAR MELTDOWN IN 5 YEARS        1.000     .746          1.000     .691

Extraction Method: Principal Component Analysis.
a. Only cases for which SPLIT1 = 1 are used in the analysis phase (first validation sample).
b. Only cases for which SPLIT2 = 1 are used in the analysis phase (second validation sample).
SW388R7
Data Analysis &
Computers II     Comparing the factor loadings for the validation samples
Slide 123

The rotated component matrix for the first validation analysis
(SPLIT1 = 1) is shown in the left pair of columns; the matrix for
the second validation analysis (SPLIT2 = 1) is shown in the right pair.

Rotated Component Matrix (a)

                                                 SPLIT1 = 1 (b)          SPLIT2 = 1 (c)
                                                 Component               Component
                                                 1         2             1         2
ENVIRONMENTAL THREATS EXAGGERATED                 .807     -.147         -.390      .692
HOW DANGEROUS MODIFYING GENES IN CROPS           -.198      .800          .795     -.123
POOR COUNTRIES LESS THAN RICH FOR ENVIRONMENT     .856      .007          .187      .859
LIKELIHOOD OF NUCLEAR MELTDOWN IN 5 YEARS         .048      .862          .829      .061

Extraction Method: Principal Component Analysis.
Rotation Method: Varimax with Kaiser Normalization.
a. Rotation converged in 3 iterations (both analyses).
b. Only cases for which SPLIT1 = 1 are used in the analysis phase.
c. Only cases for which SPLIT2 = 1 are used in the analysis phase.

The two validation analyses show the same pattern of loadings for the
variables, though the first and second components have switched places.
This supports the generalizability of the factor model.
SW388R7
Data Analysis &
Computers II                     Steps in validation analysis - 1
Slide 124

The following is a guide to the decision process for answering

Is the number of valid cases greater than or equal to 100?

   No:
      •Set the first random seed and compute the split1 variable
      •Re-run factor with split1 = 1
      •Set the second random seed and compute the split2 variable
      •Re-run factor with split2 = 1

   Yes:
      •Set the random seed and compute the split variable
      •Re-run factor with split = 0
      •Re-run factor with split = 1

Are all of the communalities in the validations greater than 0.50?

   No:   False
   Yes:  Continue to the next step
SW388R7
Data Analysis &
Computers II     Steps in validation analysis - 2
Slide 125

Does pattern of factor loadings match pattern for full data set?

   No:   False
   Yes:  True
SW388R7
Data Analysis &
Computers II                    Steps in outlier analysis - 1
Slide 126

The following is a guide to the decision process for answering

Are any of the factor scores outliers (beyond ±3.0)?

   No:   True
   Yes:  Re-run factor analysis, excluding outliers

Are all of the communalities excluding outliers greater than 0.50?

   No:   False
   Yes:  Continue to the next step
SW388R7
Data Analysis &
Computers II     Steps in outlier analysis - 2
Slide 127

Does pattern of factor loadings excluding outliers match
pattern for full data set?

   No:   False
   Yes:  True
SW388R7
Data Analysis &
Computers II                    Steps in reliability analysis
Slide 128

The following is a guide to the decision process for answering

Is Cronbach's Alpha greater than 0.60 for all factors?

   No:   False
   Yes:  Continue to the next question

Is Cronbach's Alpha greater than 0.70 for all factors?

   No:   True with caution
   Yes:  True

```
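
The split-sample steps on the slides above can also be sketched outside SPSS. The Python below is only an illustration under stated assumptions: the function names and the seed value 12345 are hypothetical, and Python's random number generator is not the one SPSS uses, so the same seed will not reproduce the SPSS split. It mimics the uniform(1) <= 0.70 formula, recomputes each communality as the sum of squared rotated loadings taken from Slide 123, and checks whether the variables group onto components the same way in both validation samples even when the components switch places.

```python
import random


def make_split(n_cases, seed, cutoff=0.70):
    """Return one 0/1 flag per case: 1 when a uniform(0, 1) draw is <= cutoff."""
    rng = random.Random(seed)  # fixed seed so the split can be reproduced
    return [1 if rng.random() <= cutoff else 0 for _ in range(n_cases)]


def communalities(loadings):
    """Communality of each variable = sum of its squared component loadings."""
    return [sum(value * value for value in row) for row in loadings]


def dominant_groups(loadings):
    """Group variables by the component they load on most strongly.

    Absolute values are used, and the result ignores component order,
    so switched components or reversed signs do not count as a mismatch.
    """
    groups = {}
    for index, row in enumerate(loadings):
        component = max(range(len(row)), key=lambda j: abs(row[j]))
        groups.setdefault(component, set()).add(index)
    return {frozenset(members) for members in groups.values()}


# Rotated loadings copied from the two validation samples on Slide 123.
# Row order: environmental threats, modifying genes, poor countries, meltdown.
SPLIT1 = [(0.807, -0.147), (-0.198, 0.800), (0.856, 0.007), (0.048, 0.862)]
SPLIT2 = [(-0.390, 0.692), (0.795, -0.123), (0.187, 0.859), (0.829, 0.061)]

# Validation check 1: every communality in both samples exceeds 0.50.
comm_ok = all(c > 0.50 for sample in (SPLIT1, SPLIT2) for c in communalities(sample))

# Validation check 2: the same variables cluster together in both samples.
same_pattern = dominant_groups(SPLIT1) == dominant_groups(SPLIT2)

# Hypothetical seed; in the course problems the seed is given in the problem.
flags = make_split(1000, seed=12345)

print(comm_ok, same_pattern)
```

With the loadings from the slides, both checks come out true, which agrees with the conclusion on Slide 123. Comparing groups of variables rather than component numbers is a deliberate choice here, since the slides note that components may switch places without invalidating the findings.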