Posted on 11/24/2011. Public Domain.
SW388R7 Data Analysis & Computers II

Slide 1: Principal Component Analysis: Additional Topics
Split Sample Validation
Detecting Outliers
Reliability of Summated Scales
Sample Problems

Slide 2: Split Sample Validation
To test the generalizability of findings from a principal component analysis, we could conduct a second research study to see whether our findings are replicated. A less costly alternative is to split the sample randomly into two halves, run the principal component analysis on each half, and compare the results. If the communalities and the factor loadings are the same in the analyses of each half and of the full data set, we have evidence that the findings are generalizable and valid because, in effect, the two analyses represent a study and a replication.

Slide 3: Misleading Results to Watch Out For
When we examine the communalities and factor loadings, we are matching overall patterns, not exact values: the communalities should all be greater than 0.50, and the pattern of the factor loadings should be the same. Sometimes the variables will switch components (variables loading on the first component now load on the second, and vice versa), but this does not invalidate our findings. Sometimes all of the signs of the factor loadings will reverse (the pluses become minuses and the minuses become pluses), but this does not invalidate our findings either, because we interpret the size, not the sign, of the loadings.

Slide 4: When Validation Fails
If the validation fails, we are warned that the solution found in the analysis of the full data set is not generalizable and should not be reported as a valid finding. We do have some options when validation fails:
If the problem is limited to one or two variables, we can remove those variables and redo the analysis.
Randomly selected samples are not always representative.
We might try several different random number seeds to see whether our negative finding was a fluke. If we choose this option, we should run a large number of validations, at least 5 to 10, to establish a clear pattern. Getting one or two validations that contradict the failed validation and support our findings is not sufficient.

Slide 5: Outliers
SPSS calculates factor scores as standard scores. SPSS suggests that one way to identify outliers is to compute the factor scores and treat cases with a score greater than +3.0 or less than -3.0 as outliers. If we find outliers, we redo the analysis omitting the outlying cases. If there is no change in the communalities or the factor structure of the solution, the outliers do not have an impact. If the factor solution changes, we will have to study the outlier cases to determine whether or not we should exclude them. After testing for outliers, restore the full data set before doing any further calculations.

Slide 6: Reliability of Summated Scales
One common use of factor analysis is forming summated scales, where we add the scores on all of the variables loading on a component to create the score for that component. To verify that the variables for a component measure similar entities that are legitimate to add together, we compute Cronbach's alpha. If Cronbach's alpha is 0.70 or greater (0.60 or greater for exploratory research), we have support for the internal consistency of the items, justifying their use in a summated scale.

Slide 7: Problem 1
In the dataset GSS2000.sav, is the following statement true, false, or an incorrect application of a statistic? Assume that there is no problematic pattern of missing data. Use a level of significance of 0.05. Validate the results of your principal component analysis by splitting the sample in two, using 519447 as the random number seed.
Based on the results of a principal component analysis of the 8 variables "highest academic degree" [degree], "father's highest academic degree" [padeg], "mother's highest academic degree" [madeg], "spouse's highest academic degree" [spdeg], "general happiness" [happy], "happiness of marriage" [hapmar], "condition of health" [health], and "attitude toward life" [life], the information in these variables can be represented with 2 components and 3 individual variables. Cases that might be considered to be outliers do not have an impact on the factor solution. The internal consistency of the variables included in the components is sufficient to support the creation of a summated scale. Component 1 includes the variables "highest academic degree" [degree], "father's highest academic degree" [padeg], and "mother's highest academic degree" [madeg]. Component 2 includes the variables "general happiness" [happy] and "happiness of marriage" [hapmar]. The variables "attitude toward life" [life], "condition of health" [health], and "spouse's highest academic degree" [spdeg] were not included on the components and are retained as individual variables.

1. True
2. True with caution
3. False
4. Inappropriate application of a statistic

The bold text indicates the parts of the problem that have been added this week.

Slide 8: Computing a principal component analysis
To compute a principal component analysis in SPSS, select the Data Reduction | Factor… command from the Analyze menu.

Slide 9: Add the variables to the analysis
First, move the variables listed in the problem to the Variables list box. Second, click on the Descriptives… button to specify statistics to include in the output.

Slide 10: Complete the descriptives dialog box
First, mark the Univariate descriptives checkbox to get a tally of valid cases.
Second, keep the Initial solution checkbox marked to get the statistics needed to determine the number of factors to extract.
Third, mark the Coefficients checkbox to get a correlation matrix, one of the outputs needed to assess the appropriateness of factor analysis for the variables.
Fourth, mark the KMO and Bartlett's test of sphericity checkbox to get more outputs used to assess the appropriateness of factor analysis for the variables.
Fifth, mark the Anti-image checkbox to get more outputs used to assess the appropriateness of factor analysis for the variables.
Sixth, click on the Continue button.

Slide 11: Select the extraction method
First, click on the Extraction… button to specify the extraction method. The extraction method refers to the mathematical method that SPSS uses to compute the factors or components.

Slide 12: Complete the extraction dialog box
First, retain the default method, Principal components. Second, click on the Continue button.

Slide 13: Select the rotation method
First, click on the Rotation… button to specify the rotation method. The rotation method refers to the mathematical method that SPSS uses to rotate the axes in geometric space. This makes it easier to determine which variables load on which components.

Slide 14: Complete the rotation dialog box
First, mark Varimax as the type of rotation to use in the analysis. Second, click on the Continue button.

Slide 15: Complete the request for the analysis
Click on the OK button to request the output.
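For readers who want to see what the Principal components extraction and Varimax rotation settings compute, here is a minimal NumPy sketch under stated assumptions. It is not the SPSS implementation: SPSS uses its own iterative varimax procedure, so component order, the signs of whole columns (see Slide 3), and the last decimal place may differ. The function names principal_components and varimax, and the synthetic data in the usage lines, are hypothetical.

```python
import numpy as np


def principal_components(data, n_components):
    """Unrotated principal component loadings, extracted from the correlation matrix.

    data: 2-D array with one row per case and one column per variable
    (assumed complete, i.e. cases with missing values already dropped).
    """
    r = np.corrcoef(data, rowvar=False)          # variables are the columns
    eigvals, eigvecs = np.linalg.eigh(r)         # eigh returns eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]            # largest eigenvalue first, as in SPSS output
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # A loading is eigenvector * sqrt(eigenvalue): the variable-component correlation.
    # Eigenvector signs are arbitrary, which is why whole columns may flip sign.
    loadings = eigvecs[:, :n_components] * np.sqrt(eigvals[:n_components])
    return eigvals, loadings


def varimax(loadings, max_iter=100, tol=1e-6):
    """Orthogonal varimax rotation with Kaiser normalization."""
    h = np.sqrt((loadings ** 2).sum(axis=1, keepdims=True))  # square roots of communalities
    a = loadings / h                                         # Kaiser normalization: unit-length rows
    p = a.shape[0]
    rotation = np.eye(a.shape[1])
    criterion = 0.0
    for _ in range(max_iter):
        b = a @ rotation
        # Gradient of the varimax criterion; U @ Vt is the closest orthogonal matrix to it
        u, s, vt = np.linalg.svd(a.T @ (b ** 3 - b * (b ** 2).sum(axis=0) / p))
        rotation = u @ vt
        previous, criterion = criterion, s.sum()
        if previous > 0 and criterion / previous < 1 + tol:
            break                                            # criterion has stopped improving
    return (a @ rotation) * h                                # undo the normalization


# Usage with synthetic data (hypothetical): variables 0-2 share one source, 3-5 another
rng = np.random.default_rng(0)
data = rng.normal(size=(200, 6))
data[:, [1, 2]] += data[:, [0]]
data[:, [4, 5]] += data[:, [3]]
eigvals, unrotated = principal_components(data, n_components=2)
rotated = varimax(unrotated)
print(np.round(eigvals, 3))
print(np.round(rotated, 3))
```

Because the rotation is orthogonal, each variable's communality (its row sum of squared loadings) is the same before and after rotation; only the way the variance is shared between components changes.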
Slide 16: Level of measurement requirement
"Highest academic degree" [degree], "father's highest academic degree" [padeg], "mother's highest academic degree" [madeg], "spouse's highest academic degree" [spdeg], "general happiness" [happy], "happiness of marriage" [hapmar], "condition of health" [health], and "attitude toward life" [life] are ordinal level variables. If we follow the convention of treating ordinal level variables as metric variables, the level of measurement requirement for principal component analysis is satisfied. Since some data analysts do not agree with this convention, a note of caution should be included in our interpretation.

Slide 17: Sample size requirement: minimum number of cases
Descriptive Statistics
                          Mean   Std. Deviation   Analysis N
RS HIGHEST DEGREE         1.68   1.085            68
FATHERS HIGHEST DEGREE     .96    .984            68
MOTHERS HIGHEST DEGREE     .85    .797            68
SPOUSES HIGHEST DEGREE    1.97   1.233            68
GENERAL HAPPINESS         1.65    .617            68
HAPPINESS OF MARRIAGE     1.47    .532            68
CONDITION OF HEALTH       1.76    .848            68
IS LIFE EXCITING OR DULL  1.53    .532            68

The number of valid cases for this set of variables is 68. Principal component analysis can be conducted on a sample that has fewer than 100 cases but more than 50 cases, but we should be cautious about its interpretation.

Slide 18: Sample size requirement: ratio of cases to variables
The ratio of cases to variables in a principal component analysis should be at least 5 to 1.
With 68 cases and 8 variables, the ratio of cases to variables is 8.5 to 1, which exceeds the requirement for the ratio of cases to variables.

Slide 19: Appropriateness of factor analysis: presence of substantial correlations
Principal component analysis requires that there be some correlations greater than 0.30 between the variables included in the analysis. For this set of variables, there are 7 correlations in the matrix greater than 0.30 in absolute value (.490, .410, .595, .677, .319, .514, and -.392), satisfying this requirement.

Correlation Matrix
                          degree  padeg  madeg  spdeg  happy  hapmar  health   life
RS HIGHEST DEGREE          1.000   .490   .410   .595  -.017   -.172   -.246  -.138
FATHERS HIGHEST DEGREE      .490  1.000   .677   .319  -.100   -.131   -.174  -.012
MOTHERS HIGHEST DEGREE      .410   .677  1.000   .208   .105   -.046   -.008   .151
SPOUSES HIGHEST DEGREE      .595   .319   .208  1.000  -.053   -.138   -.392  -.090
GENERAL HAPPINESS          -.017  -.100   .105  -.053  1.000    .514    .267   .214
HAPPINESS OF MARRIAGE      -.172  -.131  -.046  -.138   .514   1.000    .282   .161
CONDITION OF HEALTH        -.246  -.174  -.008  -.392   .267    .282   1.000   .214
IS LIFE EXCITING OR DULL   -.138  -.012   .151  -.090   .214    .161    .214  1.000

Slide 20: Appropriateness of factor analysis: sampling adequacy of individual variables
There are two anti-image matrices: the anti-image covariance matrix and the anti-image correlation matrix. We are interested in the anti-image correlation matrix. Principal component analysis requires that the Kaiser-Meyer-Olkin Measure of Sampling Adequacy be greater than 0.50 for each individual variable as well as for the set of variables. On iteration 1, the MSA for all of the individual variables included in the analysis was greater than 0.50 (the smallest is .549, for GENERAL HAPPINESS), supporting their retention in the analysis.

Anti-image Matrices
Anti-image Covariance
                          degree  padeg  madeg  spdeg  happy  hapmar  health   life
RS HIGHEST DEGREE           .511  -.101  -.079  -.274  -.058    .067   -.008   .108
FATHERS HIGHEST DEGREE     -.101   .455  -.290  -.024   .103   -.028    .050   .028
MOTHERS HIGHEST DEGREE     -.079  -.290   .476   .028  -.102    .043   -.052  -.121
SPOUSES HIGHEST DEGREE     -.274  -.024   .028   .578  -.014   -.012    .203  -.039
GENERAL HAPPINESS          -.058   .103  -.102  -.014   .666   -.325   -.085  -.085
HAPPINESS OF MARRIAGE       .067  -.028   .043  -.012  -.325    .692   -.099  -.024
CONDITION OF HEALTH        -.008   .050  -.052   .203  -.085   -.099    .749  -.102
IS LIFE EXCITING OR DULL    .108   .028  -.121  -.039  -.085   -.024   -.102   .876
Anti-image Correlation
                          degree  padeg  madeg  spdeg  happy  hapmar  health   life
RS HIGHEST DEGREE          .701a  -.210  -.161  -.503  -.099    .113   -.012   .162
FATHERS HIGHEST DEGREE     -.210  .640a  -.623  -.048   .187   -.049    .086   .044
MOTHERS HIGHEST DEGREE     -.161  -.623  .586a   .053  -.181    .076   -.087  -.188
SPOUSES HIGHEST DEGREE     -.503  -.048   .053  .656a  -.023   -.018    .309  -.055
GENERAL HAPPINESS          -.099   .187  -.181  -.023  .549a   -.478   -.120  -.111
HAPPINESS OF MARRIAGE       .113  -.049   .076  -.018  -.478   .619a   -.137  -.030
CONDITION OF HEALTH        -.012   .086  -.087   .309  -.120   -.137   .734a  -.126
IS LIFE EXCITING OR DULL    .162   .044  -.188  -.055  -.111   -.030   -.126  .638a
a. Measures of Sampling Adequacy (MSA)

Slide 21: Appropriateness of factor analysis: sampling adequacy for set of variables
KMO and Bartlett's Test
Kaiser-Meyer-Olkin Measure of Sampling Adequacy          .640
Bartlett's Test of Sphericity    Approx. Chi-Square   137.823
                                 df                        28
                                 Sig.                    .000

In addition, the overall MSA for the set of variables included in the analysis was 0.640, which exceeds the minimum requirement of 0.50 for overall MSA.

Slide 22: Appropriateness of factor analysis: Bartlett test of sphericity
Principal component analysis requires that the probability associated with Bartlett's Test of Sphericity be less than the level of significance. The probability associated with the Bartlett test (Sig. = .000 in the KMO and Bartlett's Test table) is <0.001, which satisfies this requirement.

Slide 23: Number of factors to extract: latent root criterion
Total Variance Explained
            Initial Eigenvalues                    Extraction Sums of Squared Loadings
Component   Total   % of Variance  Cumulative %    Total   % of Variance  Cumulative %
1           2.600   32.502          32.502         2.600   32.502          32.502
2           1.772   22.149          54.651         1.772   22.149          54.651
3           1.079   13.486          68.137         1.079   13.486          68.137
4            .827   10.332          78.469
5            .631    7.888          86.358
6            .487    6.087          92.445
7            .333    4.161          96.606
8            .272    3.394         100.000
Extraction Method: Principal Component Analysis.

Using the output from iteration 1, there were 3 eigenvalues greater than 1.0. The latent root criterion for the number of factors to derive indicates that 3 components should be extracted for these variables.

Slide 24: Number of factors to extract: percentage of variance criterion
In addition, the cumulative proportion of variance criterion can be met with 3 components, satisfying the criterion of explaining 60% or more of the total variance: a 3-component solution would explain 68.137% of the total variance (the Cumulative % column of the Total Variance Explained table). Since the SPSS default is to extract the number of components indicated by the latent root criterion, our initial factor solution was based on the extraction of 3 components.
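Both criteria for the number of components can be checked directly from the initial eigenvalues in the Total Variance Explained table. The short Python sketch below does this; the function names are hypothetical. Because the eigenvalues in the table are rounded to three decimals, the cumulative percentage comes out as 68.1375 rather than the 68.137 SPSS computes from unrounded values.

```python
# Initial eigenvalues from the Total Variance Explained table (iteration 1, 8 variables)
EIGENVALUES = [2.600, 1.772, 1.079, 0.827, 0.631, 0.487, 0.333, 0.272]


def latent_root_count(eigenvalues, cutoff=1.0):
    """Latent root criterion: count components whose eigenvalue is greater than 1.0."""
    return sum(1 for value in eigenvalues if value > cutoff)


def variance_criterion_count(eigenvalues, target_pct=60.0):
    """Fewest components whose cumulative % of variance reaches target_pct.

    In a PCA of a correlation matrix the total variance equals the number of
    variables, so component i explains eigenvalue_i / n_variables of it.
    """
    n_variables = len(eigenvalues)
    cumulative = 0.0
    for count, value in enumerate(eigenvalues, start=1):
        cumulative += 100.0 * value / n_variables
        if cumulative >= target_pct:
            return count, cumulative
    return n_variables, cumulative


print(latent_root_count(EIGENVALUES))       # 3
count, pct = variance_criterion_count(EIGENVALUES)
print(count, round(pct, 1))                 # 3 68.1
```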
Slide 25: Evaluating communalities
Communalities represent the proportion of the variance in the original variables that is accounted for by the factor solution. The factor solution should explain at least half of each original variable's variance, so the communality value for each variable should be 0.50 or higher.

Communalities
                          Initial   Extraction
RS HIGHEST DEGREE           1.000   .717
FATHERS HIGHEST DEGREE      1.000   .768
MOTHERS HIGHEST DEGREE      1.000   .815
SPOUSES HIGHEST DEGREE      1.000   .715
GENERAL HAPPINESS           1.000   .763
HAPPINESS OF MARRIAGE       1.000   .711
CONDITION OF HEALTH         1.000   .548
IS LIFE EXCITING OR DULL    1.000   .415
Extraction Method: Principal Component Analysis.

Slide 26: Communality requiring variable removal
On iteration 1, the communality for the variable "attitude toward life" [life] was 0.415. Since this is less than 0.50, the variable should be removed from the next iteration of the principal component analysis. The variable was removed and the principal component analysis was computed again.

Slide 27: Repeating the factor analysis
In the Dialog Recall drop-down menu, select Factor Analysis to reopen the factor analysis dialog box.

Slide 28: Removing the variable from the list of variables
First, highlight the life variable. Second, click on the left arrow button to remove the variable from the Variables list box.
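The removal rule applied on each iteration can be written as a small function. This sketch assumes that, if several variables fell below 0.50 at once, only the one with the lowest communality would be removed before rerunning; in this problem only one variable falls below the cutoff per iteration, so the assumption does not change the outcome. The communalities are the Extraction values reported for iterations 1 through 4 of this analysis; variable_to_remove is a hypothetical name.

```python
# Extraction communalities for iterations 1 to 4 of this analysis
ITERATIONS = [
    {"degree": .717, "padeg": .768, "madeg": .815, "spdeg": .715,
     "happy": .763, "hapmar": .711, "health": .548, "life": .415},
    {"degree": .642, "padeg": .623, "madeg": .592, "spdeg": .516,
     "happy": .638, "hapmar": .594, "health": .477},
    {"degree": .674, "padeg": .640, "madeg": .577, "spdeg": .491,
     "happy": .719, "hapmar": .741},
    {"degree": .577, "padeg": .720, "madeg": .684, "happy": .745, "hapmar": .782},
]


def variable_to_remove(communalities, threshold=0.50):
    """Return the variable with the lowest communality below threshold, or None.

    Assumption: only one variable is dropped per iteration, then the PCA is rerun.
    """
    below = {name: value for name, value in communalities.items() if value < threshold}
    if not below:
        return None                     # every communality is 0.50 or higher
    return min(below, key=below.get)


for number, communalities in enumerate(ITERATIONS, start=1):
    print(number, variable_to_remove(communalities))
# 1 life
# 2 health
# 3 spdeg
# 4 None
```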
Slide 29: Replicating the factor analysis
The Dialog Recall command opens the dialog box with all of the settings that we selected the last time we used factor analysis. To replicate the analysis without the variable that we just removed, click on the OK button.

Slide 30: Communality requiring variable removal
On iteration 2, the communality for the variable "condition of health" [health] was 0.477. Since this is less than 0.50, the variable should be removed from the next iteration of the principal component analysis. The variable was removed and the principal component analysis was computed again.

Communalities
                          Initial   Extraction
RS HIGHEST DEGREE           1.000   .642
FATHERS HIGHEST DEGREE      1.000   .623
MOTHERS HIGHEST DEGREE      1.000   .592
SPOUSES HIGHEST DEGREE      1.000   .516
GENERAL HAPPINESS           1.000   .638
HAPPINESS OF MARRIAGE       1.000   .594
CONDITION OF HEALTH         1.000   .477
Extraction Method: Principal Component Analysis.

Slide 31: Repeating the factor analysis
In the Dialog Recall drop-down menu, select Factor Analysis to reopen the factor analysis dialog box.

Slide 32: Removing the variable from the list of variables
First, highlight the health variable. Second, click on the left arrow button to remove the variable from the Variables list box.

Slide 33: Replicating the factor analysis
The Dialog Recall command opens the dialog box with all of the settings that we selected the last time we used factor analysis. To replicate the analysis without the variable that we just removed, click on the OK button.

Slide 34: Communality requiring variable removal
On iteration 3, the communality for the variable "spouse's highest academic degree" [spdeg] was 0.491. Since this is less than 0.50, the variable should be removed from the next iteration of the principal component analysis. The variable was removed and the principal component analysis was computed again.

Communalities
                          Initial   Extraction
RS HIGHEST DEGREE           1.000   .674
FATHERS HIGHEST DEGREE      1.000   .640
MOTHERS HIGHEST DEGREE      1.000   .577
SPOUSES HIGHEST DEGREE      1.000   .491
GENERAL HAPPINESS           1.000   .719
HAPPINESS OF MARRIAGE       1.000   .741
Extraction Method: Principal Component Analysis.

Slide 35: Repeating the factor analysis
In the Dialog Recall drop-down menu, select Factor Analysis to reopen the factor analysis dialog box.

Slide 36: Removing the variable from the list of variables
First, highlight the spdeg variable. Second, click on the left arrow button to remove the variable from the Variables list box.

Slide 37: Replicating the factor analysis
The Dialog Recall command opens the dialog box with all of the settings that we selected the last time we used factor analysis. To replicate the analysis without the variable that we just removed, click on the OK button.

Slide 38: Communality satisfactory for all variables
Communalities
                          Initial   Extraction
RS HIGHEST DEGREE           1.000   .577
FATHERS HIGHEST DEGREE      1.000   .720
MOTHERS HIGHEST DEGREE      1.000   .684
GENERAL HAPPINESS           1.000   .745
HAPPINESS OF MARRIAGE       1.000   .782
Extraction Method: Principal Component Analysis.

Once any variables with communalities less than 0.50 have been removed from the analysis, the pattern of factor loadings should be examined to identify variables that have complex structure. Complex structure occurs when one variable has high loadings or correlations (0.40 or greater) on more than one component. If a variable has complex structure, it should be removed from the analysis. Variables are only checked for complex structure if there is more than one component in the solution.
Variables that load on only one component are described as having simple structure.

Slide 39: Identifying complex structure
On iteration 4, none of the variables demonstrated complex structure, so it is not necessary to remove any additional variables because of complex structure.

Rotated Component Matrix (a)
                          Component 1   Component 2
RS HIGHEST DEGREE              .732        -.202
FATHERS HIGHEST DEGREE         .848         .031
MOTHERS HIGHEST DEGREE         .810         .169
GENERAL HAPPINESS              .145         .851
HAPPINESS OF MARRIAGE         -.145         .872
Extraction Method: Principal Component Analysis.
Rotation Method: Varimax with Kaiser Normalization.
a. Rotation converged in 3 iterations.

Slide 40: Variable loadings on components
On iteration 4, each of the 2 components in the analysis had more than one variable loading on it. No variables need to be removed because they are the only variable loading on a component.

Slide 41: Final check of communalities
Once we have resolved any problems with complex structure, we check the communalities one last time to make certain that we are explaining a sufficient portion of the variance of all of the original variables. The communalities for all of the variables included on the components (.577, .720, .684, .745, and .782 on iteration 4) were greater than 0.50, and all variables had simple structure. The principal component analysis has been completed.
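The checks on Slides 38 to 41 can be reproduced from the rotated component matrix: a communality is the sum of a variable's squared loadings across the retained components, and a variable has complex structure if two or more of its absolute loadings are 0.40 or greater. Recomputing the communalities from these rounded loadings matches the Communalities table to within 0.001. The helper names communality and check_structure are hypothetical.

```python
# Rotated component matrix from iteration 4 (Varimax with Kaiser normalization)
ROTATED = {
    "degree": (0.732, -0.202),
    "padeg": (0.848, 0.031),
    "madeg": (0.810, 0.169),
    "happy": (0.145, 0.851),
    "hapmar": (-0.145, 0.872),
}


def communality(row):
    """Communality = sum of squared loadings across the retained components."""
    return sum(loading ** 2 for loading in row)


def check_structure(loadings, cutoff=0.40):
    """Return (complex variables, variables grouped by the one component they load on).

    Variables with no loading at or above the cutoff end up in neither result.
    """
    complex_vars, components = [], {}
    for name, row in loadings.items():
        # Size, not sign, matters, so compare absolute loadings with the cutoff
        high = [j + 1 for j, value in enumerate(row) if abs(value) >= cutoff]
        if len(high) > 1:
            complex_vars.append(name)
        elif high:
            components.setdefault(high[0], []).append(name)
    return complex_vars, components


complex_vars, components = check_structure(ROTATED)
print(complex_vars)   # []
print(components)     # {1: ['degree', 'padeg', 'madeg'], 2: ['happy', 'hapmar']}
# A component with only one variable loading on it would be flagged (Slide 40)
print([c for c, names in components.items() if len(names) < 2])   # []
print({name: round(communality(row), 3) for name, row in ROTATED.items()})
```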
Slide 42: Interpreting the principal components
The information in 5 of the variables can be represented by 2 components (see the Rotated Component Matrix on Slide 39).
Component 1 includes the variables
• "highest academic degree" [degree],
• "father's highest academic degree" [padeg], and
• "mother's highest academic degree" [madeg].
Component 2 includes the variables
• "general happiness" [happy] and
• "happiness of marriage" [hapmar].

Slide 43: Total variance explained
Total Variance Explained
            Initial Eigenvalues                   Extraction Sums of Squared Loadings   Rotation Sums of Squared Loadings
Component   Total   % of Variance  Cumulative %   Total   % of Variance  Cumulative %   Total
1           1.953   39.061          39.061        1.953   39.061          39.061        1.953
2           1.555   31.109          70.169        1.555   31.109          70.169        1.556
3            .649   12.989          83.158
4            .441    8.820          91.977
5            .401    8.023         100.000
Extraction Method: Principal Component Analysis.

The 2 components explain 70.169% of the total variance in the variables that are included on the components.

Slide 44: Split-sample validation
We validate our analysis by conducting an analysis on each half of the sample. We compare the results of these two split-sample analyses with the analysis of the full data set. To split the sample into two halves, we generate a random variable that indicates which half of the sample each case should be placed in. To compute a random selection of cases, we need to specify the starting value, or random number seed. Otherwise, the random sequence of numbers that you generate will not match mine, and we will get different results.
Before we do the random selection, you must make certain that your data set is sorted in the original sort order, or the cases in your two half samples will not match mine. To make certain your data set is in the same order as mine, sort your data set in ascending order by case id.

Slide 45: Sorting the data set in original order
To make certain the data set is sorted in the original order, highlight the case id column, right click on the column header, and select the Sort Ascending command from the popup menu.

Slide 46: Setting the random number seed
To set the random number seed, select the Random Number Seed… command from the Transform menu.

Slide 47: Set the random number seed
First, click on the Set seed to option button to activate the text box. Second, type in the random seed stated in the problem. Third, click on the OK button to complete the dialog box. Note that SPSS does not provide you with any feedback about the change.

Slide 48: Select the compute command
To enter the formula for the variable that will split the sample in two parts, click on the Compute… command.

Slide 49: The formula for the split variable
First, type the name for the new variable, split, into the Target Variable text box. Second, enter the formula for the value of split, uniform(1) <= 0.50, in the Numeric Expression text box. The uniform(1) function generates a random decimal number between 0 and 1, and the random number is compared to the value 0.50. If the random number is less than or equal to 0.50, the value of the formula will be 1, the SPSS numeric equivalent of true. If the random number is larger than 0.50, the formula will return 0, the SPSS numeric equivalent of false. Third, click on the OK button to complete the dialog box.
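The same idea, a seeded uniform draw compared with 0.50 after sorting by case id, can be sketched in Python. Python's random module uses a different generator from SPSS, so seed 519447 will not reproduce the SPSS halves used in these slides; the sketch only illustrates why fixing both the seed and the sort order makes the split repeatable. assign_halves is a hypothetical helper.

```python
import random


def assign_halves(case_ids, seed):
    """Hypothetical helper: map each case id to a split value of 1 or 0.

    Cases are visited in ascending id order, so the same seed gives the same
    assignment regardless of how the data happen to be sorted on input.
    """
    rng = random.Random(seed)   # Python's Mersenne Twister, NOT the SPSS generator
    split = {}
    for case_id in sorted(case_ids):
        # Mirrors uniform(1) <= 0.50: 1 for "true", 0 for "false"
        split[case_id] = 1 if rng.random() <= 0.50 else 0
    return split


halves = assign_halves([5, 3, 1, 4, 2], seed=519447)
print(halves)
first_half = [case for case, value in halves.items() if value == 0]   # split = 0
second_half = [case for case, value in halves.items() if value == 1]  # split = 1
```

Without the sort, the same seed would hand the same sequence of random numbers to differently ordered cases, which is exactly why the slides insist on sorting by case id first.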
Slide 50: The split variable in the data editor
In the data editor, the split variable shows a random pattern of zeros and ones. To select half of the sample for each validation analysis, we will first select the cases where split = 0, then select the cases where split = 1.

Slide 51: Repeating the analysis with the first validation sample
To repeat the principal component analysis for the first validation sample, select Factor Analysis from the Dialog Recall tool button.

Slide 52: Using "split" as the selection variable
First, scroll down the list of variables and highlight the variable split. Second, click on the right arrow button to move the split variable to the Selection Variable text box.

Slide 53: Setting the value of split to select cases
When the variable named split is moved to the Selection Variable text box, SPSS adds "=?" after the name to prompt us to enter a specific value for split. Click on the Value… button to enter a value for split.

Slide 54: Completing the value selection
First, type the value for the first half of the sample, 0, into the Value for Selection Variable text box. Second, click on the Continue button to complete the value entry.

Slide 55: Requesting output for the first validation sample
When the value entry dialog box is closed, SPSS adds the value we entered after the equal sign. This specification now tells SPSS to include in the analysis only those cases that have a value of 0 for the split variable. Click on the OK button to request the output. Since the validation analysis requires us to compare the results of the analyses of the two split samples, we will request the output for the second sample before doing any comparison.
Slide 56: Repeating the analysis with the second validation sample
To repeat the principal component analysis for the second validation sample, select Factor Analysis from the Dialog Recall tool button.

Slide 57: Setting the value of split to select cases
Since the split variable is already in the Selection Variable text box, we only need to change its value. Click on the Value… button to enter a different value for split.

Slide 58: Completing the value selection
First, type the value for the second half of the sample, 1, into the Value for Selection Variable text box. Second, click on the Continue button to complete the value entry.

Slide 59: Requesting output for the second validation sample
When the value entry dialog box is closed, SPSS adds the value we entered after the equal sign. This specification now tells SPSS to include in the analysis only those cases that have a value of 1 for the split variable. Click on the OK button to request the output.

Slide 60: Comparing communalities
All of the communalities for the first split sample satisfy the minimum requirement of being larger than 0.50, and so do all of the communalities for the second split sample.

Communalities (a), first split sample
                          Initial   Extraction
RS HIGHEST DEGREE           1.000   .580
FATHERS HIGHEST DEGREE      1.000   .647
MOTHERS HIGHEST DEGREE      1.000   .693
GENERAL HAPPINESS           1.000   .667
HAPPINESS OF MARRIAGE       1.000   .754
Extraction Method: Principal Component Analysis.
a. Only cases for which SPLIT = 0 are used in the analysis phase.

Communalities (a), second split sample
                          Initial   Extraction
RS HIGHEST DEGREE           1.000   .618
FATHERS HIGHEST DEGREE      1.000   .802
MOTHERS HIGHEST DEGREE      1.000   .675
GENERAL HAPPINESS           1.000   .807
HAPPINESS OF MARRIAGE       1.000   .830
Extraction Method: Principal Component Analysis.
a. Only cases for which SPLIT = 1 are used in the analysis phase.

Note how SPSS identifies for us which cases we selected for the analysis.

Slide 61: Comparing factor loadings
The pattern of factor loadings for both split samples shows the variables RS HIGHEST DEGREE, FATHERS HIGHEST DEGREE, and MOTHERS HIGHEST DEGREE loading on the first component, and GENERAL HAPPINESS and HAPPINESS OF MARRIAGE loading on the second component.

Rotated Component Matrix (a,b), first split sample
                          Component 1   Component 2
RS HIGHEST DEGREE              .730        -.215
FATHERS HIGHEST DEGREE         .789         .154
MOTHERS HIGHEST DEGREE         .794         .251
GENERAL HAPPINESS              .248         .778
HAPPINESS OF MARRIAGE         -.102         .862
Extraction Method: Principal Component Analysis.
Rotation Method: Varimax with Kaiser Normalization.
a. Rotation converged in 3 iterations.
b. Only cases for which SPLIT = 0 are used in the analysis phase.

Rotated Component Matrix (a,b), second split sample
                          Component 1   Component 2
RS HIGHEST DEGREE              .755        -.219
FATHERS HIGHEST DEGREE         .895        -.043
MOTHERS HIGHEST DEGREE         .819         .064
GENERAL HAPPINESS              .049         .897
HAPPINESS OF MARRIAGE         -.183         .893
Extraction Method: Principal Component Analysis.
Rotation Method: Varimax with Kaiser Normalization.
a. Rotation converged in 3 iterations.
b. Only cases for which SPLIT = 1 are used in the analysis phase.

Slide 62: Interpreting the validation results
All of the communalities in both validation samples met the criterion. The pattern of loadings for both validation samples is the same, and it is the same as the pattern for the analysis using the full sample. In effect, we have done the same analysis on two separate subsamples of cases and obtained the same results. This validation analysis supports a finding that the results of this principal component analysis are generalizable to the population represented by this data set.
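The comparison on Slides 60 to 62 matches patterns rather than exact values. A sketch of that rule: group the variables by the component carrying their largest absolute loading, then compare the groups as unordered sets, so that swapped components and reversed signs (Slide 3) do not count as differences. The loadings are copied from the rotated component matrices for the full sample and the two split samples; loading_pattern is a hypothetical name, and reusing the 0.40 cutoff from the complex-structure rule is an assumption.

```python
# Rotated loadings (component 1, component 2) from the full sample and the two halves
FULL = {"degree": (.732, -.202), "padeg": (.848, .031), "madeg": (.810, .169),
        "happy": (.145, .851), "hapmar": (-.145, .872)}
SPLIT_0 = {"degree": (.730, -.215), "padeg": (.789, .154), "madeg": (.794, .251),
           "happy": (.248, .778), "hapmar": (-.102, .862)}
SPLIT_1 = {"degree": (.755, -.219), "padeg": (.895, -.043), "madeg": (.819, .064),
           "happy": (.049, .897), "hapmar": (-.183, .893)}


def loading_pattern(loadings, cutoff=0.40):
    """Set of variable groups, one group per component, ignoring order and sign."""
    groups = {}
    for name, row in loadings.items():
        sizes = [abs(value) for value in row]          # size, not sign, is interpreted
        best = sizes.index(max(sizes))                 # component with the largest loading
        if sizes[best] >= cutoff:
            groups.setdefault(best, set()).add(name)
    # frozensets inside a set: component numbering no longer matters
    return {frozenset(names) for names in groups.values()}


print(loading_pattern(FULL) == loading_pattern(SPLIT_0) == loading_pattern(SPLIT_1))  # True

# Swapping the components and reversing every sign leaves the pattern unchanged
flipped = {name: (-b, -a) for name, (a, b) in SPLIT_1.items()}
print(loading_pattern(flipped) == loading_pattern(FULL))                              # True
```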
Component Component 1 2 1 2 RS HIGHEST DEGREE .730 -.215 RS HIGHEST DEGREE .755 -.219 FATHERS HIGHEST FATHERS HIGHEST .789 .154 .895 -.043 DEGREE DEGREE MOTHERS HIGHEST MOTHERS HIGHEST .794 .251 .819 .064 DEGREE DEGREE GENERAL HAPPINESS .248 .778 When we are finished with .897 GENERAL HAPPINESS .049 HAPPINESS OF -.102 .862 this analysis, we should select HAPPINESS OF -.183 .893 MARRIAGE all cases back into the data MARRIAGE Extraction Method: Principal Component Analysis. set and remove the variables Extraction Method: Principal Component Analysis. Rotation Method: Varimax with Kaiser Normalization. we Method: Varimax with Kaiser Normalization. Rotation created. a. Rotation converged in 3 iterations. a. Rotation converged in 3 iterations. b. Only cases for which SPLIT = 0 are used in b. Only cases for which SPLIT = 1 are used in the analysis phase. the analysis phase. SW388R7 Data Analysis & Computers II Detecting outliers Slide 63 To detect outliers, we compute the factor scores in SPSS. Select the Factor Analysis command from the Dialog Recall tool button SW388R7 Data Analysis & Computers II Access the Scores Dialog Box Slide 64 Click on the Scores… button to access the factor scores dialog box. SW388R7 Data Analysis & Computers II Specifications for factor scores Slide 65 First, click on the Save as variables checkbox to create factor variables. Second, accept the Third, click on the default method using Continue button a Regression equation to complete the to calculate the specifications. scores. SW388R7 Data Analysis & Computers II Compute the factor scores Slide 66 Click on the Continue button to compute the factor scores. SW388R7 Data Analysis & Computers II The factor scores in the data editor Slide 67 SPSS creates the factor score variables in the data editor window. It names the first factor score “fac1_1,” and the second factor score “fac2_1.” We need to check to see if we have any values for either factor score that are larger than ±3.0. 
One way to check for large values indicating outliers is to sort each factor score variable and see whether any values fall outside the acceptable range.

Slide 68: Sort the data to locate outliers for factor one
First, select the fac1_1 column by clicking on its header. Second, right-click on the column header and select the Sort Ascending command from the drop-down menu.

Slide 69: Negative outliers for factor one
Scroll down past the cases for which factor scores could not be computed. None of the scores for factor one are less than or equal to -3.0.

Slide 70: Positive outliers for factor one
Scrolling down to the bottom of the sorted data set, we see that none of the scores for factor one are greater than or equal to +3.0. There are no outliers on factor one.

Slide 71: Sort the data to locate outliers on factor two
First, select the fac2_1 column by clicking on its header. Second, right-click on the column header and select the Sort Ascending command from the drop-down menu.

Slide 72: Negative outliers for factor two
Scrolling down past the cases for which factor scores could not be computed, we see that none of the scores for factor two are less than or equal to -3.0.

Slide 73: Positive outliers for factor two
Scrolling down to the bottom of the sorted data set, we see that one of the scores for factor two is greater than or equal to +3.0. We will run the analysis excluding this outlier and see whether it changes our interpretation of the analysis.

Slide 74: Removing the outliers
To see whether outliers are having an impact on the factor solution, we will compute the factor analysis without the outliers and compare the results. To remove the outliers, we select only the cases that are not outliers.
Choose the Select Cases… command from the Data menu.

Slide 75: Setting the If condition
Click on the If… button to enter the formula for selecting cases in or out of the analysis.

Slide 76: Formula to select cases that are not outliers
First, type the formula as shown. The formula says: include cases if the absolute values of both the first and the second factor scores are less than 3.0. Second, click on the Continue button to complete the specification.

Slide 77: Complete the Select Cases command
Having entered the formula for including cases, click on the OK button to complete the selection.

Slide 78: The outlier selected out of the analysis
When SPSS selects a case out of the data analysis, it draws a slash through the case number. The case that we identified as an outlier will be excluded.

Slide 79: Repeating the factor analysis
To repeat the factor analysis without the outliers, select the Factor Analysis command from the Dialog Recall tool button.

Slide 80: Stopping SPSS from computing factor scores again
In the last factor analysis, we included the specification to compute factor scores. Since we do not need to do this again, we will remove the specification. Click on the Scores… button to access the factor scores dialog box.

Slide 81: Clearing the command to save factor scores
First, clear the Save as variables checkbox. This will deactivate the Method options. Second, click on the Continue button to complete the specification.

Slide 82: Computing the factor analysis
To produce the output for the factor analysis excluding outliers, click on the OK button.
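Sorting a factor score column and inspecting both ends (Slides 68 to 73) amounts to asking whether any score has an absolute value of 3.0 or more, and the Select Cases formula of Slide 76 keeps the remaining scored cases. The Python sketch below illustrates both steps under stated assumptions: the score lists are hypothetical stand-ins for fac1_1 and fac2_1, and cases SPSS could not score are represented as None or NaN.

```python
import math


def is_missing(x):
    """Treat None or NaN as a case for which no factor score was computed."""
    return x is None or (isinstance(x, float) and math.isnan(x))


def find_outliers(scores, cutoff=3.0):
    """(row, score) pairs whose absolute factor score is >= cutoff (Slides 69-73)."""
    return [(i, s) for i, s in enumerate(scores)
            if not is_missing(s) and abs(s) >= cutoff]


def keep_case(f1, f2, cutoff=3.0):
    """Slide 76 rule: keep a case only when both absolute scores are below cutoff.
    Unscored cases are dropped too; they were not part of the analysis anyway."""
    if is_missing(f1) or is_missing(f2):
        return False
    return abs(f1) < cutoff and abs(f2) < cutoff


# Hypothetical factor scores for five cases (not the GSS2000 values).
fac1_1 = [0.42, None, -1.87, 2.95, -0.03]
fac2_1 = [1.10, None, 3.21, -0.66, float("nan")]

print(find_outliers(fac1_1))  # []
print(find_outliers(fac2_1))  # [(2, 3.21)]
selected = [i for i, pair in enumerate(zip(fac1_1, fac2_1)) if keep_case(*pair)]
print(selected)  # [0, 3]
```

Using >= for outliers and < for inclusion keeps the two rules exact complements, so no scored case is both flagged and kept.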
Slide 83: Comparing communalities
All of the communalities for the factor analysis including all cases satisfy the minimum requirement of being larger than 0.50, and so do all of the communalities for the factor analysis excluding outliers.

Communalities (Initial = 1.000 for every variable; Extraction Method: Principal Component Analysis)
Variable                   All cases   Excluding outliers
RS HIGHEST DEGREE          .577        .579
FATHERS HIGHEST DEGREE     .720        .720
MOTHERS HIGHEST DEGREE     .684        .681
GENERAL HAPPINESS          .745        .726
HAPPINESS OF MARRIAGE      .782        .771

Slide 84: Comparing factor loadings
The factor loadings for the factor analysis including all cases are shown on the left, and those for the factor analysis excluding outliers are shown on the right.

Rotated Component Matrix       All cases            Excluding outliers
                               Comp. 1   Comp. 2    Comp. 1   Comp. 2
RS HIGHEST DEGREE               .732     -.202       .734     -.201
FATHERS HIGHEST DEGREE          .848      .031       .846      .060
MOTHERS HIGHEST DEGREE          .810      .169       .810      .157
GENERAL HAPPINESS               .145      .851       .159      .837
HAPPINESS OF MARRIAGE          -.145      .872      -.143      .866
Extraction Method: Principal Component Analysis. Rotation Method: Varimax with Kaiser Normalization. Rotation converged in 3 iterations in both analyses.

The pattern of factor loadings for both analyses shows the variables RS HIGHEST DEGREE, FATHERS HIGHEST DEGREE, and MOTHERS HIGHEST DEGREE loading on the first component, and GENERAL HAPPINESS and HAPPINESS OF MARRIAGE loading on the second component.
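The two tables on Slides 83 and 84 are linked: because varimax is an orthogonal rotation, each variable's extraction communality equals the sum of its squared rotated loadings. Assuming the analysis is run on the correlation matrix (SPSS's default), each standardized variable contributes a variance of 1, so the mean communality is also the share of total variance the retained components explain. The sketch below recomputes both from the printed loadings; results agree with the printed communalities only to within rounding, because the loadings are shown to three decimals.

```python
# Rotated loadings transcribed from Slide 84: (component 1, component 2).
ALL_CASES = {
    "RS HIGHEST DEGREE": (0.732, -0.202),
    "FATHERS HIGHEST DEGREE": (0.848, 0.031),
    "MOTHERS HIGHEST DEGREE": (0.810, 0.169),
    "GENERAL HAPPINESS": (0.145, 0.851),
    "HAPPINESS OF MARRIAGE": (-0.145, 0.872),
}
NO_OUTLIERS = {
    "RS HIGHEST DEGREE": (0.734, -0.201),
    "FATHERS HIGHEST DEGREE": (0.846, 0.060),
    "MOTHERS HIGHEST DEGREE": (0.810, 0.157),
    "GENERAL HAPPINESS": (0.159, 0.837),
    "HAPPINESS OF MARRIAGE": (-0.143, 0.866),
}


def communalities(matrix):
    """Extraction communality: sum of squared loadings over retained components."""
    return {var: sum(x * x for x in row) for var, row in matrix.items()}


def all_above(comms, minimum=0.50):
    """True when every communality exceeds the 0.50 criterion."""
    return all(h > minimum for h in comms.values())


def percent_total_variance(comms):
    """Share of total variance explained: sum of communalities / number of variables."""
    return 100 * sum(comms.values()) / len(comms)


for label, matrix in (("all cases", ALL_CASES), ("excluding outliers", NO_OUTLIERS)):
    h = communalities(matrix)
    print(label, {k: round(v, 3) for k, v in h.items()})
    print("  all > 0.50:", all_above(h),
          " % of total variance:", round(percent_total_variance(h), 2))
```

For the all-cases solution the recomputed share is about 70.16%, consistent with the 70.169% reported when answering the problem.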
Slide 85: Interpreting the outlier analysis
All of the communalities satisfy the criterion of being greater than 0.50. The pattern of loadings is the same in both analyses. Whether we include or exclude outliers, our interpretation is the same, so the outliers do not have an effect that would justify excluding them from the analysis. The part of the problem statement saying that outliers do not have an impact is true.

When we are finished with this analysis, we should select all cases back into the data set and remove the variables we created.

Slide 86: Computing Cronbach's alpha
To compute Cronbach's alpha for each component in our analysis, select Scale | Reliability Analysis… from the Analyze menu.

Slide 87: Selecting the variables for the first component
First, move the three variables that loaded on the first component to the Items list box. Second, click on the Statistics… button to select the statistics we will need.

Slide 88: Selecting the statistics for the output
First, mark the checkboxes for Item, Scale, and Scale if item deleted. Second, click on the Continue button.
Slide 89: Completing the specifications
First, if Alpha is not selected as the Model in the drop-down menu, select it now. Second, click on the OK button to produce the output.

Slide 90: Cronbach's alpha
Cronbach's alpha is located at the bottom of the output. An alpha of 0.60 or higher is the minimum acceptable level. Preferably, alpha will be 0.70 or higher, as it is in this case.

Slide 91: Cronbach's alpha
If alpha is too small, this column may suggest which variable should be removed to improve the internal consistency of the scale. It tells us what alpha we would get if the listed variable were removed from the scale.

Slide 92: Computing Cronbach's alpha
To compute Cronbach's alpha for the second component, again select Scale | Reliability Analysis… from the Analyze menu.

Slide 93: Selecting the variables for the second component
First, move the two variables that loaded on the second component to the Items list box. Second, click on the Statistics… button to select the statistics we will need.

Slide 94: Selecting the statistics for the output
First, mark the checkboxes for Item, Scale, and Scale if item deleted. Second, click on the Continue button.

Slide 95: Completing the specifications
First, if Alpha is not selected as the Model in the drop-down menu, select it now. Second, click on the OK button to produce the output.

Slide 96: Cronbach's alpha
Cronbach's alpha is located at the bottom of the output. An alpha of 0.60 or higher is the minimum acceptable level. Preferably, alpha will be 0.70 or higher, as it is in this case.
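The quantities SPSS reports in these slides can be sketched in a few lines of Python: Cronbach's alpha, alpha recomputed with each item deleted (Slide 91), and the pass/caution/fail decision from the 0.60 and 0.70 thresholds. This is an illustration, not SPSS's implementation; the item responses are hypothetical, and the verdict function treats exactly 0.60 and 0.70 as passing, following the "or higher" wording of Slide 90.

```python
from statistics import variance


def cronbach_alpha(items):
    """items: one list of scores per item, cases in the same order.
    alpha = k/(k-1) * (1 - sum of item variances / variance of the summed scale)."""
    k = len(items)
    if k < 2:
        raise ValueError("alpha needs at least two items")
    totals = [sum(case) for case in zip(*items)]
    item_var = sum(variance(col) for col in items)
    return k / (k - 1) * (1 - item_var / variance(totals))


def alpha_if_item_deleted(items):
    """Alpha recomputed with each item left out in turn (needs 3+ items)."""
    return [cronbach_alpha(items[:i] + items[i + 1:]) for i in range(len(items))]


def reliability_verdict(alphas, minimum=0.60, preferred=0.70):
    """Answer for the reliability part of a problem, one alpha per component."""
    if any(a < minimum for a in alphas):
        return "False"
    if any(a < preferred for a in alphas):
        return "True with caution"
    return "True"


# Hypothetical responses of five cases to three items (not GSS2000 data).
items = [
    [1, 2, 3, 4, 5],
    [2, 2, 3, 4, 4],
    [1, 3, 3, 3, 5],
]
print(round(cronbach_alpha(items), 3))                      # 0.931
print([round(a, 3) for a in alpha_if_item_deleted(items)])  # [0.8, 0.941, 0.923]
print(reliability_verdict([0.75, 0.65]))                    # True with caution
```

Here deleting the second item would raise alpha slightly (0.941 versus 0.931), which is the kind of signal Slide 91 describes.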
Slide 97: Answering the problem question
The answer to the original question is true with caution.

Total Variance Explained (Extraction Method: Principal Component Analysis)
             Initial Eigenvalues                 Extraction Sums of Squared Loadings
Component    Total   % of Variance   Cumulative %   Total   % of Variance   Cumulative %
1            1.626   40.651          40.651         1.626   40.651          40.651
2            1.119   27.968          68.619         1.119   27.968          68.619
3             .694   17.341          85.960
4             .562   14.040         100.000

Component 1 includes the variables "highest academic degree" [degree], "father's highest academic degree" [padeg], and "mother's highest academic degree" [madeg]. We can substitute one component variable for this combination of variables in further analyses.

Component 2 includes the variables "general happiness" [happy] and "happiness of marriage" [hapmar]. We can substitute one component variable for this combination of variables in further analyses.

The components explain at least 50% of the variance in each of the variables included in the final analysis. The components explain 70.169% of the total variance in the variables included on the components. A caution is added to our findings because of the inclusion of ordinal-level variables in the analysis.

Slide 98: Validation with small samples
In the validation example completed above, 105 cases were used in the final principal component analysis model. When we have more than 100 cases available for the validation analysis, an even split should generally result in 50 or more cases per validation sample. However, if fewer than 100 cases are available for the validation, splitting the sample in two may produce validation samples smaller than the minimum of 50 cases needed to conduct a factor analysis. When this happens, we draw two random samples of cases that are both larger than the minimum of 50.
Since some of the same cases will be in both validation samples, the support for generalizability is not as strong, but it does offer some evidence, especially if we repeat the process a number of times.

Slide 99: Validation with small samples
We randomly create two split variables, which we will call split1 and split2, using a separate random number seed for each. In the formula for creating the split variables, we set the proportion of cases high enough to randomly select about fifty cases. To calculate the proportion we need, we divide 50 by the number of valid cases in the analysis and round up to the next 10% increment. For example, if we have 80 valid cases, the proportion we need for validation is 50 / 80 = 0.625, which we would round up to 0.70, or 70%. The formulas for the split variables would be:
split1 = uniform(1) <= 0.70
split2 = uniform(1) <= 0.70

Slide 100: Validation with very small samples
When the number of valid cases in a factor analysis gets close to the lower limit of 50, the results of the validation may appear to support the analysis, but this can be misleading because the validation samples are not really different from the full data set. For example, if the number of valid cases were 60, a 90% subsample would contain 54 cases, all 54 of which are also in the full analysis. The validation may appear to support the full analysis simply because it had limited opportunity to differ.

Slide 101: Problem 2
In the dataset GSS2000.sav, is the following statement true, false, or an incorrect application of a statistic? Assume that there is no problematic pattern of missing data. Use a level of significance of 0.05.
Validate the results of your principal component analysis by repeating the principal component analysis on two 70% random samples of the data set, using 743911 and 747454 as the random number seeds. Based on the results of a principal component analysis of the 7 variables "claims about environmental threats are exaggerated" [grnexagg], "danger to the environment from modifying genes in crops" [genegen], "America doing enough to protect environment" [amprogrn], "should be international agreements for environment problems" [grnintl], "poorer countries should be expected to do less for the environment" [ldcgrn], "economic progress in America will slow down without more concern for environment" [econgrn], and "likelihood of nuclear power station damaging environment in next 5 years" [nukeacc], the information in these variables can be represented with 2 components and 3 individual variables. Cases that might be considered to be outliers do not have an impact on the factor solution. The internal consistency of the variables included in the components is sufficient to support the creation of a summated scale. Component 1 includes the variables "danger to the environment from modifying genes in crops" [genegen] and "likelihood of nuclear power station damaging environment in next 5 years" [nukeacc]. Component 2 includes the variables "claims about environmental threats are exaggerated" [grnexagg] and "poorer countries should be expected to do less for the environment" [ldcgrn]. The variables "economic progress in America will slow down without more concern for environment" [econgrn], "should be international agreements for environment problems" [grnintl], and "America doing enough to protect environment" [amprogrn] were not included on the components and are retained as individual variables.
1. True
2. True with caution
3. False
4. Inappropriate application of a statistic

Slide 102: The principal component solution
A principal component analysis found a two-factor solution, with four of the original seven variables loading on the components. The communalities and factor loadings are shown below.

Variable                                        Communality   Comp. 1   Comp. 2
ENVIRONMENTAL THREATS EXAGGERATED               .615          -.207      .756
HOW DANGEROUS MODIFYING GENES IN CROPS          .694           .801     -.229
POOR COUNTRIES LESS THAN RICH FOR ENVIRONMENT   .691           .051      .830
LIKELIHOOD OF NUCLEAR MELTDOWN IN 5 YEARS       .744           .861      .059
Initial communalities are 1.000 for every variable. Extraction Method: Principal Component Analysis. Loadings are from the Rotated Component Matrix; Rotation Method: Varimax with Kaiser Normalization; rotation converged in 3 iterations.

Slide 103: The size of the validation sample
Descriptive Statistics
Variable                                        Mean   Std. Deviation   Analysis N
ENVIRONMENTAL THREATS EXAGGERATED               3.28   1.008            75
HOW DANGEROUS MODIFYING GENES IN CROPS          3.11    .953            75
POOR COUNTRIES LESS THAN RICH FOR ENVIRONMENT   3.77    .863            75
LIKELIHOOD OF NUCLEAR MELTDOWN IN 5 YEARS       2.47    .935            75

There were 75 valid cases in the final analysis. The sample is too small to split in half and still have enough cases to meet the minimum of 50 cases for factor analysis. We will draw two random samples that each comprise 70% of the full sample. We arrive at 70% by dividing the minimum sample size by the number of valid cases (50 ÷ 75 = 0.667) and rounding up to the next 10% increment, 70%.

Slide 104: Split-sample validation
The first random number seed stated in the problem is 743911, so we enter this in the SPSS random number seed dialog. To set the random number seed, select the Random Number Seed… command from the Transform menu.
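The proportion rule used on Slides 99 and 103 (divide the minimum of 50 by the number of valid cases, then round up to the next 10% increment) is simple enough to sketch in Python. The sketch only covers the small-sample case; with 100 or more valid cases the slides use an even split instead. Exact fractions are used so the rounding step is not disturbed by floating-point error.

```python
import math
from fractions import Fraction


def validation_proportion(n_valid, minimum=50):
    """Slide 99 rule: minimum / n_valid, rounded up to the next 10% increment."""
    if n_valid < minimum:
        raise ValueError("fewer valid cases than the minimum for factor analysis")
    # Fraction keeps the division exact before rounding up to whole tenths.
    tenths = math.ceil(Fraction(minimum, n_valid) * 10)
    return tenths / 10


for n in (80, 75, 60):
    print(n, validation_proportion(n))
# 80 0.7
# 75 0.7
# 60 0.9
```

The 60-case result reproduces the 90% subsample discussed on Slide 100, which is exactly the situation where the validation samples overlap heavily with the full data set.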
Slide 105: Set the random number seed for the first sample
First, click on the Set seed to option button to activate the text box. Second, type in the first random seed stated in the problem, 743911. Third, click on the OK button to complete the dialog box. Note that SPSS does not provide you with any feedback about the change.

Slide 106: Select the Compute command
To enter the formula for the variable that will select the first validation sample, click on the Compute… command.

Slide 107: The formula for the split1 variable
First, type the name for the new variable, split1, into the Target Variable text box. Second, the formula for the value of split1, uniform(1) <= 0.70, is shown in the text box. The uniform(1) function generates a random decimal number between 0 and 1. The random number is compared to the value 0.70. If the random number is less than or equal to 0.70, the value of the formula will be 1, the SPSS numeric equivalent of true. If the random number is larger than 0.70, the formula will return 0, the SPSS numeric equivalent of false. Third, click on the OK button to complete the dialog box.

Slide 108: Set the random number seed for the second sample
First, click on the Set seed to option button to activate the text box. Second, type in the second random seed stated in the problem, 747454. Third, click on the OK button to complete the dialog box. Note that SPSS does not provide you with any feedback about the change.

Slide 109: Select the Compute command
To enter the formula for the variable that will select the second validation sample, click on the Compute… command.

Slide 110: The formula for the split2 variable
First, type the name for the new variable, split2, into the Target Variable text box. Second, the formula for the value of split2, uniform(1) <= 0.70, is shown in the text box.
The uniform(1) function generates a random decimal number between 0 and 1. The random number is compared to the value 0.70. If the random number is less than or equal to 0.70, the value of the formula will be 1, the SPSS numeric equivalent of true. If the random number is larger than 0.70, the formula will return 0, the SPSS numeric equivalent of false. Third, click on the OK button to complete the dialog box.

Slide 111: Repeating the analysis with the first validation sample
To repeat the principal component analysis for the first validation sample, select Factor Analysis from the Dialog Recall tool button.

Slide 112: Using split1 as the selection variable
First, scroll down the list of variables and highlight the variable split1. Second, click on the right arrow button to move the split1 variable to the Selection Variable text box.

Slide 113: Setting the value of split1 to select cases
When the variable named split1 is moved to the Selection Variable text box, SPSS adds "=?" after the name to prompt us to enter a specific value for split1. Click on the Value… button to enter a value for split1.

Slide 114: Completing the value selection
First, type the value for the first sample, 1, into the Value for Selection Variable text box. Second, click on the Continue button to complete the value entry.

Slide 115: Requesting output for the first validation sample
Click on the OK button to request the output. When the value entry dialog box is closed, SPSS adds the value we entered after the equal sign. This specification now tells SPSS to include in the analysis only those cases that have a value of 1 for the split1 variable. Since the validation analysis requires us to compare the results of the analysis using the first validation sample with those using the second, we will request the output for the second validation sample before doing any comparison.

Slide 116: Repeating the analysis with the second validation sample
To repeat the principal component analysis for the second validation sample, select Factor Analysis from the Dialog Recall tool button.

Slide 117: Removing split1 as the selection variable
First, highlight split1 in the Selection Variable text box. Second, click on the left arrow button to move split1 back to the list of variables.

Slide 118: Using split2 as the selection variable
First, scroll down the list of variables and highlight the variable split2. Second, click on the right arrow button to move the split2 variable to the Selection Variable text box.

Slide 119: Setting the value of split2 to select cases
When the variable named split2 is moved to the Selection Variable text box, SPSS adds "=?" after the name to prompt us to enter a specific value for split2. Click on the Value… button to enter a value for split2.

Slide 120: Completing the value selection
First, type the value for the second sample, 1, into the Value for Selection Variable text box. Second, click on the Continue button to complete the value entry.

Slide 121: Requesting output for the second validation sample
Click on the OK button to request the output. When the value entry dialog box is closed, SPSS adds the value we entered after the equal sign. This specification now tells SPSS to include in the analysis only those cases that have a value of 1 for the split2 variable.
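The seed, Compute, and Selection Variable steps of Slides 105 to 121 can be mimicked in Python to see how two independent 70% samples behave. This is only an analogy: Python's random module (a Mersenne Twister) and its seeding are not SPSS's, so reusing the seeds 743911 and 747454 here will not select the same cases SPSS selects, and n_cases is a hypothetical stand-in for the cases being sampled.

```python
import random


def make_split(n_cases, proportion, seed):
    """1/0 flags in the spirit of SPSS's `uniform(1) <= proportion`.
    Uses Python's generator, so the flags will not match SPSS output."""
    rng = random.Random(seed)
    return [1 if rng.random() <= proportion else 0 for _ in range(n_cases)]


n_cases = 75  # hypothetical number of cases being sampled
split1 = make_split(n_cases, 0.70, seed=743911)
split2 = make_split(n_cases, 0.70, seed=747454)

# Selection Variable "split1 = 1": keep only the flagged case indices.
sample1 = [i for i, flag in enumerate(split1) if flag == 1]
sample2 = [i for i, flag in enumerate(split2) if flag == 1]

# The samples are drawn independently, so on average about
# 0.70 * 0.70 = 49% of cases fall in both validation samples.
in_both = len(set(sample1) & set(sample2))
print(len(sample1), len(sample2), in_both)
```

The overlap count makes concrete why Slide 98 treats this design as weaker evidence than a true split: roughly half of the cases appear in both validation analyses.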
Slide 122: Comparing the communalities for the validation samples
All of the communalities for both the first and the second validation samples satisfy the minimum requirement of being larger than 0.50.

Communalities (Initial = 1.000 for every variable; Extraction Method: Principal Component Analysis)
Variable                                        SPLIT1 = 1   SPLIT2 = 1
ENVIRONMENTAL THREATS EXAGGERATED               .672         .631
HOW DANGEROUS MODIFYING GENES IN CROPS          .679         .648
POOR COUNTRIES LESS THAN RICH FOR ENVIRONMENT   .732         .773
LIKELIHOOD OF NUCLEAR MELTDOWN IN 5 YEARS       .746         .691
a. Only cases for which SPLIT1 = 1 (left) or SPLIT2 = 1 (right) are used in the analysis phase.

Slide 123: Comparing the factor loadings for the validation samples
The factor loadings for the first validation analysis are shown on the left, and those for the second validation analysis are shown on the right.

Rotated Component Matrix                        SPLIT1 = 1           SPLIT2 = 1
                                                Comp. 1   Comp. 2    Comp. 1   Comp. 2
ENVIRONMENTAL THREATS EXAGGERATED                .807     -.147      -.390      .692
HOW DANGEROUS MODIFYING GENES IN CROPS          -.198      .800       .795     -.123
POOR COUNTRIES LESS THAN RICH FOR ENVIRONMENT    .856      .007       .187      .859
LIKELIHOOD OF NUCLEAR MELTDOWN IN 5 YEARS        .048      .862       .829      .061
Extraction Method: Principal Component Analysis. Rotation Method: Varimax with Kaiser Normalization. Rotation converged in 3 iterations in both analyses.

The factor loadings for both validation analyses show the same pattern of variables, though the first and second components have switched places in the first validation analysis. The communalities and factor loadings of the validation analyses support the generalizability of the factor model.

Slide 124: Steps in validation analysis (1)
The following is a guide to the decision process for answering problems about validation analysis:
Is the number of valid cases greater than or equal to 100?
• Yes: Set the random seed and compute the split variable. Re-run the factor analysis with split = 0. Re-run the factor analysis with split = 1.
• No: Set the first random seed and compute the split1 variable. Re-run the factor analysis with split1 = 1. Set the second random seed and compute the split2 variable. Re-run the factor analysis with split2 = 1.
Are all of the communalities in the validations greater than 0.50?
• No: False.
• Yes: Continue with Slide 125.

Slide 125: Steps in validation analysis (2)
Does the pattern of factor loadings match the pattern for the full data set?
• No: False.
• Yes: True.

Slide 126: Steps in outlier analysis (1)
The following is a guide to the decision process for answering problems about outlier analysis:
Are any of the factor scores outliers (an absolute value of 3.0 or larger)?
• No: True.
• Yes: Re-run the factor analysis, excluding outliers.
Are all of the communalities excluding outliers greater than 0.50?
• No: False.
• Yes: Continue with Slide 127.

Slide 127: Steps in outlier analysis (2)
Does the pattern of factor loadings excluding outliers match the pattern for the full data set?
• No: False.
• Yes: True.

Slide 128: Steps in reliability analysis
The following is a guide to the decision process for answering problems about reliability analysis:
Is Cronbach's alpha greater than 0.60 for all factors?
• No: False.
• Yes: Is Cronbach's alpha greater than 0.70 for all factors? No: True with caution. Yes: True.
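The validation decision on Slides 124 and 125 hinges on "the same pattern", and Slide 123 shows why that must tolerate components switching places (and, per the earlier discussion of sign reversals, flipped signs). One way to express this in Python, offered as a sketch rather than the course's procedure, is to group variables by the component carrying their largest absolute loading and compare the groups while ignoring component numbers. The loadings and communalities are transcribed from Slides 102, 122, and 123; the simple dominant-loading rule ignores cross-loadings, which is an assumption.

```python
# Rotated loadings (component 1, component 2) from Slides 102 and 123.
FULL = {
    "grnexagg": (-0.207, 0.756),
    "genegen": (0.801, -0.229),
    "ldcgrn": (0.051, 0.830),
    "nukeacc": (0.861, 0.059),
}
SPLIT1 = {
    "grnexagg": (0.807, -0.147),
    "genegen": (-0.198, 0.800),
    "ldcgrn": (0.856, 0.007),
    "nukeacc": (0.048, 0.862),
}
SPLIT2 = {
    "grnexagg": (-0.390, 0.692),
    "genegen": (0.795, -0.123),
    "ldcgrn": (0.187, 0.859),
    "nukeacc": (0.829, 0.061),
}
# Extraction communalities from Slide 122, in the same variable order.
COMMS = [
    [0.672, 0.679, 0.732, 0.746],  # split1 = 1
    [0.631, 0.648, 0.773, 0.691],  # split2 = 1
]


def groups(matrix):
    """Group variables by the component with the largest absolute loading.
    A set of frozensets drops component numbers and signs, so switched
    components or reversed signs still compare equal."""
    by_comp = {}
    for var, row in matrix.items():
        sizes = [abs(x) for x in row]
        by_comp.setdefault(sizes.index(max(sizes)), set()).add(var)
    return {frozenset(members) for members in by_comp.values()}


def validation_verdict(full, validations, communalities, minimum=0.50):
    """Decision path of Slides 124-125 for the validation part of a problem."""
    if any(h <= minimum for comms in communalities for h in comms):
        return "False"
    target = groups(full)
    if all(groups(v) == target for v in validations):
        return "True"
    return "False"


print(validation_verdict(FULL, [SPLIT1, SPLIT2], COMMS))  # True
```

The same pattern check applies to the outlier decision on Slides 126 and 127 by passing the loadings from the analysis that excludes outliers as the single "validation".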