
Analyzing Patterns of Missing Data

While SPSS contains a rich set of procedures for analyzing patterns of missing data, they are not included in the set of tools licensed by the University. However, we can replicate much of the analysis with other SPSS procedures.

The first set of tasks in the missing data analysis involves creating diagnostic variables that support the analysis: first, a variable that counts the number of variables with missing data for each case; second, one new dichotomous variable for each original variable that indicates whether or not the original variable had a missing value; and third, a single pattern variable for each case that summarizes the missing or valid status of the values for all of the variables in the analysis.

Using the diagnostic variable that counts the missing values for each case, we can identify cases with large concentrations of missing data as candidates for elimination from the analysis. After we remove specific cases with large numbers of missing variables, we run a frequency distribution for the remaining cases to see whether any variable has so many missing cases that it should be considered a candidate for exclusion. Next, we compute a frequency distribution for the pattern variable to identify patterns that occur often in the data, which would indicate a problematic missing data process. Next, using each valid/missing variable as a grouping variable, we examine whether or not the missing cases are statistically different from the valid cases on all of the other variables in the analysis. If the other variable is metric, we do a t-test for group differences; if it is nonmetric, we do a chi-square test of independence. Finally, we compute a correlation matrix of the valid/missing variables to detect concentrations of missing data across multiple variables.

1. Download the data set

Download the HATMISS data set from the course web page and save it in your C:\SW388R7 folder.

2. Tallying the Number of Missing Variables

One of the major information items we need for the missing data analysis is the number of variables that have missing data for each case in the sample. We will create a new variable named num_miss that will contain, for each case, the number of variables among the first ten in the data set, x1 through x10, that have missing values. We include only the first ten variables in this calculation to maintain consistency with the text. The SPSS function NMISS counts how many of the variables listed as its arguments have missing values, so we will use this function to calculate the value of num_miss for each case.

Computing the Number Missing by Case

First, select the 'Compute…' command from the 'Transform' menu. Second, type the name of the variable we want to create, 'num_miss', in the 'Target Variable:' text box. Third, scroll down the list of functions and highlight the 'NMISS' function. Fourth, click on the move arrow to move the function to the 'Numeric Expression:' text area.

Specifying the Variables in the Function

First, type the names of the variables to include in the function as a comma-delimited list between the parentheses after the function name. Second, click on the OK button to produce the new variable. The new variable then appears in a column to the right of the existing columns of data.

3. Creating Dichotomous Valid/Missing Variables for Diagnosing Missing Data

To determine whether or not the pattern of missing data is random, we create a special diagnostic variable for each original variable that indicates whether that variable is missing or valid for each case in the data set.
Each diagnostic variable is dichotomous, using the value 1 for 'Valid' and the value 0 for 'Missing'. Since we may need to refer back to the original variables in the course of the missing data analysis, I recommend a naming convention for the diagnostic variables that makes it easy to identify the original variable. If the original variable name has fewer than eight characters, an underscore is appended to the end of the original name, e.g. the diagnostic variable for race would be race_. If the original variable name has eight characters, the last character is replaced with an underscore, e.g. the diagnostic variable for response would be respons_. If replacing the last character with an underscore duplicates the name already assigned to another diagnostic variable, we drop the last two characters from the original name and append an underscore followed by a sequence letter or digit, e.g. the diagnostic variable for response would be respon_1 if we had already used the name respons_ for another diagnostic variable.

When we assign variable labels to the diagnostic variables, we can add a keyword to the original variable label to designate it as a valid/missing diagnostic variable, e.g. the diagnostic variable for a variable labeled Grade Level could be labeled Grade Level (Valid/Missing).

We will demonstrate the process of creating dichotomous valid/missing variables for diagnosing missing data using the variables in the HATMISS.SAV data set. If the copy of HATMISS.SAV that you are working with does not have variable labels and value labels, do the exercise Applying a Data Dictionary to apply the labels from the HATCO.SAV data set to the HATMISS.SAV data set. A quick test for the presence of variable labels is to position the mouse over a variable name in the Data Editor: if a variable label appears in a yellow tooltip box, a variable label has been added for that variable.
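For readers who want to check these diagnostic variables outside SPSS, the logic of sections 2 and 3 can be sketched in Python. This is a minimal sketch, not part of the original exercise: it assumes each case is a dict keyed by variable name, that None (or a float NaN) stands for a missing value, and that original names have at most eight characters. The helper names (is_missing, num_miss, diagnostic_name, add_diagnostics) are hypothetical, and the sample cases are invented for illustration.

```python
import math


def is_missing(value):
    """Treat None or a float NaN as a missing value (an assumption of this sketch)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def num_miss(case, names):
    """Count the listed variables that are missing for one case, like NMISS(x1, ..., x10)."""
    return sum(1 for name in names if is_missing(case.get(name)))


def diagnostic_name(name, used):
    """Build a valid/missing variable name following the naming convention.

    Assumes original names have at most eight characters. `used` holds the
    diagnostic names assigned so far and is updated in place.
    """
    # Fewer than eight characters: append "_"; eight characters: replace the last one.
    candidate = name + "_" if len(name) < 8 else name[:7] + "_"
    if candidate in used:
        # Duplicate: drop the last two characters, then add "_" and a sequence digit.
        for seq in "123456789":
            candidate = name[:6] + "_" + seq
            if candidate not in used:
                break
        else:
            raise ValueError(f"no unused diagnostic name for {name!r}")
    used.add(candidate)
    return candidate


def add_diagnostics(cases, count_names, diag_names):
    """Add num_miss (over count_names) and a 1 = Valid / 0 = Missing variable for each diag_names entry."""
    used = set()
    mapping = {name: diagnostic_name(name, used) for name in diag_names}
    for case in cases:
        case["num_miss"] = num_miss(case, count_names)
        for name, diag in mapping.items():
            case[diag] = 0 if is_missing(case.get(name)) else 1
    return mapping


# Hypothetical cases, not the HATMISS data.
cases = [
    {"x1": None, "x2": 4.1, "x3": None},
    {"x1": 3.5, "x2": 2.0, "x3": 1.2},
]
mapping = add_diagnostics(cases, ["x1", "x2", "x3"], ["x1", "x2", "x3"])
print(mapping)  # {'x1': 'x1_', 'x2': 'x2_', 'x3': 'x3_'}
print(cases[0]["num_miss"], cases[0]["x1_"], cases[1]["x1_"])  # 2 0 1
print(diagnostic_name("response", {"respons_"}))  # respon_1
```

The count list and the diagnostic list are passed separately because the exercise counts missing values over x1 through x10 only, while it creates valid/missing variables through x14.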
Recoding Diagnostic Variables for Missing Data

First, select the 'Recode | Into Different Variables…' command from the 'Transform' menu. Second, move the first original variable, x1, from the input list to the 'Numeric Variable -> Output Variable' list box. Third, type the new variable name, x1_, into the 'Name:' text box on the 'Output Variable' panel. Fourth, type the label for the new variable, 'Delivery Speed (Valid/Missing)', into the 'Label:' text box on the 'Output Variable' panel. Fifth, click on the Change button to move the new name to the 'Numeric Variable -> Output Variable' list.

Opening the Dialog for Old and New Values

To specify which old values are to be recoded into new values, click on the 'Old and New Values…' button.

Add the Value for Missing Data

First, click on the 'System- or user-missing' option button on the 'Old Value' panel. Second, type 0 into the 'Value:' text box in the 'New Value' panel. Third, to add these value changes to the list of recodes, click on the 'Add' button. The change is added to the 'Old --> New:' list.

Add the Value for Valid Data

First, click on the 'All other values' option button on the 'Old Value' panel. Second, type 1 into the 'Value:' text box in the 'New Value' panel. Third, to add these value changes to the list of recodes, click on the 'Add' button. The change is added to the 'Old --> New:' list.

Completing the Values Dialog Box

Since this is the last value specification, we click on the Continue button to close the dialog box.

Adding Diagnostic Variables for the Remaining Variables

First, add the original name, the new diagnostic variable name, and the variable label for the diagnostic variable for all of the other variables through x14.
The same value changes that we specified for x1_ will be applied to these variables. Second, click on the OK button to complete the recode request.

Adding Value Labels to the Diagnostic Variables

To add value labels to the diagnostic variables, first we go to the Variable View worksheet in the Data Editor. Second, highlight the Values cell for the variable we want to work with. Third, click on the small gray button that appears in the cell to bring up the Value Labels dialog box.

Adding the Value Label for Missing

First, we type a 0 in the 'Value' text box on the 'Value Labels' panel. Second, we type 'Missing' in the 'Value Label' text box on the 'Value Labels' panel. Third, we click on the Add button to add this value label to the list box.

Add the Value Label for Valid

First, we type a 1 in the 'Value' text box on the 'Value Labels' panel. Second, we type 'Valid' in the 'Value Label' text box on the 'Value Labels' panel. Third, we click on the Add button to add this value label to the list box.

Apply the Value Labels

Click on the OK button to apply the value labels.

Displaying the Value Labels for the Variables

To display the value labels in the SPSS Data Editor window, we first return to the Data View worksheet of the Data Editor. There we select the 'Value Labels' command from the View menu. When the command is in effect, a check mark appears before the command. To restore the numeric code display, we select the 'Value Labels' command a second time to toggle it off.

The Diagnostic Variables

The value labels for the variables appear in the SPSS Data Editor. The display would be improved by adjusting the width of the data columns.
This display can be used to examine the pattern of missing values, as the text does in Table 2.3.

4. Adding a Pattern Variable to the Data Set

Another indication of a problematic missing data process is the frequent occurrence of the same pattern of missing data among the variables. While patterns can be detected by sorting and scanning the data set, this task is made easier by creating a pattern variable. The pattern variable is a string variable containing one character for each variable in the analysis. Each character in the pattern variable is set either to a character indicating missing data or to a character indicating valid data. To make the pattern visually intuitive, the characters selected should have the same width when printed; otherwise the characters will not line up in columns from one value to the next, and we cannot scan down the values to compare them. We will use an X for missing data and a tilde, ~, for valid data, because both are full-width characters.

To create the pattern variable, we first create a one-character string variable for each of the original variables. Then, we use the SPSS CONCAT function to join the string variables together into a single variable.

Recode the Original Variables into String Variables

First, select the 'Recode | Into Different Variables…' command from the Transform menu. Second, click on the Reset button to clear the previously recoded variables. Third, move the variable 'Delivery Speed [x1]' to the 'Numeric Variable -> Output Variable' list box. Fourth, type the name for the new variable, 'x1_x', in the 'Name:' text box on the 'Output Variable' panel. Fifth, click on the Change button to move the name 'x1_x' to the 'Numeric Variable -> Output Variable' list box.
Opening the Dialog for Old and New Values

To specify which old values are to be recoded into new values, click on the 'Old and New Values…' button.

Add the Value for Missing Data

First, click on the 'System- or user-missing' option button on the 'Old Value' panel. Second, click on the 'Output variables are strings' check box. Third, set the 'Width' of the output variables to 1 character. Fourth, type 'X' into the 'Value:' text box in the 'New Value' panel. Fifth, to add these value changes to the list of recodes, click on the 'Add' button. The change is added to the 'Old --> New:' list.

Add the Value for Valid Data

First, click on the 'All other values' option button on the 'Old Value' panel. Second, type '~' (a tilde) into the 'Value:' text box in the 'New Value' panel. I chose a tilde rather than a blank because it is easier to see. Third, to add these value changes to the list of recodes, click on the 'Add' button. The change is added to the 'Old --> New:' list.

Completing the Values Dialog Box

Since this is the last value specification, we click on the Continue button to close the dialog box.

Adding String Variables for the Other Original Variables

First, add the original name and the new string variable name for all of the other variables through x10. The same value changes that we specified for x1 will be applied to these variables. Second, click on the OK button to complete the recode request.

The String Variables

The recoded string variables for Delivery Speed (x1) through Satisfaction Level (x10) are added to the Data Editor window.
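The recode into one-character strings, the CONCAT step, and the later frequency table of patterns (section 7) can be sketched together in Python. This is an illustrative sketch, not the SPSS procedure itself: it assumes cases are dicts keyed by variable name with None or NaN for missing values, and the helpers pattern_string and pattern_frequencies are hypothetical names. Counter.most_common returns patterns from the highest count to the lowest, which mirrors the 'Descending counts' option.

```python
import math
from collections import Counter


def is_missing(value):
    """Treat None or a float NaN as missing (an assumption of this sketch)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def pattern_string(case, names, missing_char="X", valid_char="~"):
    """Recode each variable to one character and join them, like the recode plus CONCAT."""
    return "".join(
        missing_char if is_missing(case.get(name)) else valid_char for name in names
    )


def pattern_frequencies(cases, names):
    """Count each pattern, highest count first (like 'Descending counts')."""
    return Counter(pattern_string(case, names) for case in cases).most_common()


# Hypothetical cases, not the HATMISS data.
cases = [
    {"x1": None, "x2": 4.1, "x3": 2.0},
    {"x1": None, "x2": 3.3, "x3": 1.5},
    {"x1": 2.8, "x2": None, "x3": None},
]
for pattern, count in pattern_frequencies(cases, ["x1", "x2", "x3"]):
    print(pattern, count)
# X~~ 2
# ~XX 1
```

Because every pattern has exactly one character per variable, the strings line up in a fixed-width display, which is the same reason the exercise uses X and ~.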
Create the Variable Containing the Concatenated Data

First, select the 'Compute…' command from the Transform menu to create a new variable. Second, after clicking on the Reset button to clear the previous specifications, type the name for the new variable, 'miss_str', into the 'Target Variable:' text box. Third, click on the 'Type & Label…' button to set the type of the variable to string. Fourth, in the 'Type' panel, mark the 'String' option button. Fifth, set the 'Width:' of the new variable to 10 characters, one for each of the ten string variables. Sixth, click on the 'Continue' button to close the 'Type and Label' dialog box.

Enter the Formula for the Concatenated Variable

First, highlight the 'CONCAT' function in the 'Functions:' list box and move it to the 'String Expression:' text area. Second, type the names of the string variables as a comma-delimited list between the parentheses following the CONCAT function name. Third, click the OK button to complete the compute request.

The Missing Data Pattern Variable

One variable now contains a string that has one character for each of the string variables. This variable contains the pattern of missing and valid data for each case in the data set. We have made a lot of changes to the HATMISS data set that we should save, so we click on the Save File tool. This completes the creation of the diagnostic variables we need to conduct the missing data analysis.

5. Removing Cases with a Large Proportion of Missing Variables

To identify the cases that we should consider removing, we will sort the data set in descending order by the number of missing variables. The candidates for elimination will appear at the top of the data set.
Once we have located the cases that we want to eliminate, we specify a filter condition to exclude those cases from further analysis. The cases are not deleted from the data set, so we can include them in later analyses should we desire to do so.

Sorting the Cases

It will be easier to identify problem cases if we sort the cases by the 'num_miss' variable. First, select the 'Sort Cases…' command from the Data menu. Second, click on the 'Descending' option button in the 'Sort Order' panel so that the cases with the largest number of missing values appear at the top of the data set. Third, in the 'Sort Cases' dialog, move the 'num_miss' variable to the 'Sort by:' list box. Fourth, click on the OK button to sort the data set.

The Cases Sorted by Number Missing

At the top of the sorted data set, we see the six cases that had missing values on 5, 6, or 7 of the original ten variables (missing 50% or more of the data). These are the cases that will be excluded from further analysis.

Excluding the Cases

We exclude the cases with too many missing values by not selecting them for inclusion in later analyses. First, we select the 'Select Cases…' command from the Data menu. Second, we mark the 'If condition is satisfied' option button in the Select panel. Third, we click on the 'If…' button to specify the condition for inclusion.

Specifying the If Condition

First, move the 'num_miss' variable to the condition text area on the right. Second, complete the condition by typing '< 5' (less than 5) after the variable name. This 'If' condition specifies that a case will be included if the value of its 'num_miss' variable is less than 5, i.e. 4, 3, 2, 1, or 0. Cases that have a 'num_miss' value of 5 or more will not be included.
Third, click on the Continue button to signal completion of the If condition.

Specify Filtering for Unselected Cases

We have two options for removing cases that do not satisfy the selection criterion: deleting them from the data set or filtering them. Deletion physically and permanently removes the cases from the data set. Filtering leaves the cases in the data set but marks them for exclusion from the analyses; with the cases still in the data set, we can choose to include them in a later analysis. First, mark the 'Filtered' option button on the 'Unselected Cases Are' panel. Second, click on the OK button to complete the selection process.

The Data Set with Filtered Cases

The cases that did not meet the selection criterion are marked with a diagonal line, or slash, through their case number. In addition, SPSS adds a new variable to the data set, 'filter_$', which has a value of 1 if the case is included and a value of 0 if the case is not included. When applying a selection criterion, it is good practice to spot check our cases to make certain we specified the 'If' condition correctly. In this problem, we see that cases 1 through 6, which have num_miss values greater than 4, all have a slash through their case number. Cases 7 through 11, which have num_miss values less than 5, do not have a slash and will still be included in the analyses.

6. Summary Statistics for the Unfiltered Cases

Filtering the cases with 50% or more missing data removed six cases from the data set, reducing our effective sample size to 64 cases. We next look at a frequency distribution for each variable to see whether any variable has such a high proportion of missing data that it should be considered a candidate for removal from the analysis.
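The sort-and-filter steps from section 5 can be sketched in Python as follows. This is a minimal sketch under stated assumptions: cases are dicts that already carry a num_miss value, the helper names sort_by_num_miss and filter_cases are hypothetical, and the num_miss values below are invented, not taken from HATMISS. Like the SPSS 'Filtered' option, the sketch flags cases with a filter_$ value instead of deleting them, and the effective sample size is simply the number of flagged cases (64 in the exercise).

```python
def sort_by_num_miss(cases):
    """Sort cases so the largest num_miss values come first (a descending sort)."""
    return sorted(cases, key=lambda case: case["num_miss"], reverse=True)


def filter_cases(cases, limit=5):
    """Flag cases like filter_$: 1 if num_miss < limit, else 0.

    Cases are flagged, not deleted, so they remain available for later analyses.
    Returns the effective sample size (the number of selected cases).
    """
    for case in cases:
        case["filter_$"] = 1 if case["num_miss"] < limit else 0
    return sum(case["filter_$"] for case in cases)


# Hypothetical num_miss values, not the HATMISS data.
cases = [{"id": i, "num_miss": n} for i, n in enumerate([0, 7, 2, 5, 4, 6], start=1)]
ordered = sort_by_num_miss(cases)
print([case["num_miss"] for case in ordered])  # [7, 6, 5, 4, 2, 0]
print(filter_cases(ordered))                   # 3
```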
We can see the distribution of missing data on each of our variables by using the Frequencies command, which produces SPSS output equivalent to Table 2.2 on page 56 of the text. We use the Frequencies command instead of the Descriptives command because Frequencies reports a count of the remaining missing cases for each variable.

Requesting the Frequency Distributions

First, select the 'Descriptive Statistics | Frequencies…' command from the Analyze menu. Second, move the variables Delivery Speed through Satisfaction Level (x1 through x10) to the 'Variable(s):' list box. Third, clear the check mark from the 'Display frequency tables' check box; frequency tables for continuous variables would generate a large volume of output that we do not need. Fourth, click on the 'Statistics…' button to request the mean and standard deviation.

Requesting Specific Statistics

First, mark the check boxes for 'Mean' and 'Std. Deviation'. All other check boxes should be clear. Second, click on the Continue button to close the 'Frequencies: Statistics' dialog box. Third, when the 'Frequencies: Statistics' dialog is closed, click on the OK button to request the output.

The Frequencies Output

The frequencies table contains all of the information items in Table 2.2 of the text. Its horizontal orientation makes the table difficult to read, so we will change its orientation.

Changing the Orientation of the Table

First, double-click on the table to activate it for editing. When the table is activated, it displays a hatched border. Second, select the 'Transpose Rows and Columns' command from the Pivot menu.
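The per-variable summary that the Frequencies command produces for the selected cases can be sketched in Python. This is an illustrative sketch, not SPSS output: it assumes cases are dicts with None or NaN for missing values and an optional filter_$ flag (cases without the flag are treated as selected), and the helper name missing_summary is hypothetical. The standard deviation uses the n - 1 denominator of statistics.stdev.

```python
import math
import statistics


def is_missing(value):
    """Treat None or a float NaN as missing (an assumption of this sketch)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def missing_summary(cases, names):
    """Valid N, missing N, percent missing, mean, and SD for each variable.

    Only cases with filter_$ == 1 are used; cases without the flag count as selected.
    """
    selected = [case for case in cases if case.get("filter_$", 1) == 1]
    n = len(selected)
    summary = {}
    for name in names:
        values = [case[name] for case in selected if not is_missing(case.get(name))]
        summary[name] = {
            "valid": len(values),
            "missing": n - len(values),
            "pct_missing": 100.0 * (n - len(values)) / n if n else float("nan"),
            "mean": statistics.mean(values) if values else float("nan"),
            "sd": statistics.stdev(values) if len(values) > 1 else float("nan"),
        }
    return summary


# Hypothetical cases, not the HATMISS data; the last case is filtered out.
cases = [
    {"x1": None, "x2": 2.0},
    {"x1": 1.0, "x2": 4.0},
    {"x1": 3.0, "x2": None},
    {"x1": 5.0, "x2": 6.0, "filter_$": 0},
]
for name, row in missing_summary(cases, ["x1", "x2"]).items():
    print(name, row["valid"], row["missing"], round(row["pct_missing"], 1), row["mean"])
# x1 2 1 33.3 2.0
# x2 2 1 33.3 3.0
```

The pct_missing column is the figure to compare against rough guidelines such as the 20 to 30% and 40% levels discussed for the transposed table.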
The Transposed Frequencies Table

The numbers in the column labeled Valid are the numbers of cases that do not have missing data for each variable. From this column, we see that Delivery Speed, Price Level, and Price Flexibility have the lowest numbers of valid cases, and thus the largest numbers of missing cases. For each of these variables, a large number of cases still do not have missing data, so we would not automatically eliminate these variables from the analysis.

There is no specific proportion of missing cases that requires a variable to be eliminated. A variable that has 50% or more missing data would not have much credibility, and a variable with 40% missing data should probably be eliminated. However, a variable with 20 to 30% missing data might or might not be retained, depending on its importance to the research question. Whatever we decide about missing data, we should document our decisions in the research report.

7. Tabulating Missing Data Patterns

In a previous exercise, Adding a Pattern Variable to the Data Set, we created a pattern variable that contained a single string of ten characters representing valid or missing data for the first ten variables in the data set. To create Table 2.4 on page 58 of the text, we compute a frequency distribution for the pattern variable. This frequency distribution will tell us whether one or two patterns of missing data occur often enough to require further investigation.

Request a Frequency Distribution for the Pattern Variable

First, select the 'Descriptive Statistics | Frequencies…' command from the Analyze menu. Second, in the Frequencies dialog box, move the pattern variable, 'miss_str', to the 'Variable(s):' list box. Also be sure the 'Display frequency tables' check box is checked in this dialog box. Third, click on the 'Format…' button in the Frequencies dialog box. Fourth, in the 'Frequencies: Format' dialog, mark the 'Descending counts' option in the 'Order by' panel; this orders the frequency table from the highest count to the lowest. Fifth, click on the Continue button to close the 'Frequencies: Format' dialog. Sixth, click on the OK button to complete the frequency request.

The Frequency of Different Patterns

The frequency table shows the incidence of the different patterns. It agrees with the data in Table 2.4 of the text, though the patterns are in a different order. As the text notes, the most prevalent pattern is x1 missing with all other variables valid, with a frequency of 6. It is followed by x1 and x3 missing, with a frequency of 4. All other patterns occur less frequently. This analysis tells us that we do not have a single missing data pattern that occurs often enough to affect the statistical analysis.

8. T-tests and Chi-square Tests for Diagnosing Randomness of Missing Data

In previous exercises, we created dichotomous grouping variables for the variables x1 through x10, where the grouping variable was assigned a 1 if the data was valid and a 0 if the data was missing. We will use these grouping variables to determine whether the valid and missing groups differ in their relationship to other variables in the data set. If the missing and valid groups do not differ significantly on the other variables, then the missing cases can be characterized as random and of no consequence to our analysis. If the missing group shows a statistically significant relationship to another variable, it suggests that there is a missing data process that requires further understanding.
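The two kinds of tests used in this section can be sketched in Python to show what SPSS computes. This is a minimal sketch, assuming cases are dicts holding a 0/1 valid/missing indicator and the tested variable; the helper names split_by_indicator, welch_t, and chi_square are hypothetical. welch_t computes the separate-variance ('Equal variances not assumed') t statistic with Welch-Satterthwaite degrees of freedom, and chi_square computes the uncorrected Pearson chi-square for a table of observed counts. Neither returns a p-value; if SciPy is available, scipy.stats.ttest_ind(group_1, group_2, equal_var=False) and scipy.stats.chi2_contingency(table, correction=False) report the same statistics along with p-values.

```python
import math
from statistics import mean, variance


def split_by_indicator(cases, indicator, variable):
    """Split one variable's values into Group 1 (indicator 0 = missing) and Group 2 (1 = valid)."""
    missing_group, valid_group = [], []
    for case in cases:
        value = case.get(variable)
        if value is None or (isinstance(value, float) and math.isnan(value)):
            continue  # skip cases that are missing on the tested variable itself
        (valid_group if case[indicator] == 1 else missing_group).append(value)
    return missing_group, valid_group


def welch_t(group_1, group_2):
    """Separate-variance t statistic and Welch-Satterthwaite df."""
    v1 = variance(group_1) / len(group_1)  # squared standard error, group 1
    v2 = variance(group_2) / len(group_2)  # squared standard error, group 2
    t = (mean(group_1) - mean(group_2)) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (len(group_1) - 1) + v2 ** 2 / (len(group_2) - 1))
    return t, df


def chi_square(table):
    """Pearson chi-square statistic and df for a table of observed counts (no continuity correction)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat, (len(table) - 1) * (len(table[0]) - 1)


# Hypothetical numbers, not the HATMISS results.
print(welch_t([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))  # t is about -3.674, df about 4
print(chi_square([[10, 20], [20, 10]]))           # statistic about 6.667, df = 1
```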
The statistical tests that we use in this analysis are chi-square tests of independence, if the variable to be tested is nonmetric, or t-tests for two independent samples, if the variable to be tested is metric. The authors use the separate variance output for all t-tests instead of examining individual tests of homogeneity of variance. We will follow this practice.

When this analysis is conducted, a large number of statistical relationships are usually tested. Using an alpha level of 0.05 implies that, when there is no real relationship, about one test in every twenty will be statistically significant by chance alone. With a large number of tests, we will therefore get some statistically significant relationships even when there is no serious problem with our data. We are less concerned with the individual test results than with the overall pattern of relationships.

NOTE: I cannot reconcile the findings on these tests with the discussion of findings on page 58 of the text. The statistical results are consistent with Table 2.5 on page 59, while the text discussion appears to be a carryover from the fourth edition of the text, which does not contain the same statistical results as the fifth edition.

The Statistical Tests to Be Computed

We will use the grouping variable 'Delivery Speed (Valid/Missing)' (x1_) to explore differences among the next nine variables in the data set, 'Price Level' through 'Satisfaction Level' (x2 through x10). In each statistical test, we are testing the null hypothesis of no relationship with the grouping variable, 'Delivery Speed (Valid/Missing)'. If we reject the null hypothesis, we conclude that persons who did not answer the question on Delivery Speed had a different pattern of responses than did persons who did provide Delivery Speed. The variable 'Firm Size' (x8) is nonmetric, so we will do a chi-square test of independence for this variable.
The variables 'Price Level' (x2), 'Price Flexibility' (x3), 'Manufacturer Image' (x4), 'Service' (x5), 'Salesforce Image' (x6), 'Product Quality' (x7), 'Usage Level' (x9), and 'Satisfaction Level' (x10) are all metric, so we will do t-tests for these variables.

The Chi-square Test of Independence

First, we select the 'Descriptive Statistics | Crosstabs…' command from the Analyze menu. Second, we move the dependent variable, 'Firm Size (x8)', to the 'Row(s):' list box. Third, we move the independent, or grouping, variable 'Delivery Speed (Valid/Missing)' to the 'Column(s):' list box. Fourth, we click on the 'Statistics…' button at the bottom of the Crosstabs dialog to request the statistical test.

Requesting the Chi-square Test

First, we mark the 'Chi-square' check box to request the statistical test. For this problem, we clear all of the other check boxes in this dialog box. Second, click on the Continue button to complete the request for statistical options.

Specifying Cell Contents

First, we click on the 'Cells…' button to specify what we want in the cells of the crosstabs table. Second, we mark the check boxes for 'Observed' counts and 'Column' percentages. If any other check boxes are marked, we clear them. Third, click on the Continue button to conclude our specifications for cell contents. Fourth, click on the OK button in the Crosstabs dialog to request the output.

Chi-square Test Results

First, the chi-square test produced a significant Sig value, so we reject the null hypothesis and conclude that the distribution of firm size was different for missing cases than for valid cases. Second, looking at the column percentages in the crosstabulation table, we see that subjects who had a missing value for delivery speed were much more likely to be large firms than were subjects who had valid data for delivery speed (68.4% to 17.8%). This relationship requires further consideration as a missing data process that could affect our analysis.

Requesting the T-tests

First, we select the 'Compare Means | Independent-Samples T Test…' command from the Analyze menu. Second, we move the variables 'Price Level' (x2), 'Price Flexibility' (x3), 'Manufacturer Image' (x4), 'Service' (x5), 'Salesforce Image' (x6), 'Product Quality' (x7), 'Usage Level' (x9), and 'Satisfaction Level' (x10) to the 'Test Variable(s):' list box. Third, we move the variable 'Delivery Speed (Valid/Missing)' to the 'Grouping Variable:' text box; SPSS lists the name of the variable, 'x1_'. Fourth, we click on the 'Define Groups…' button.

Specifying the Groups by Code Number

First, we enter 0 in the 'Group 1:' text box; 0 indicates missing data on the original Delivery Speed variable. Second, we enter 1 in the 'Group 2:' text box; 1 indicates valid data on the original Delivery Speed variable. Third, we click on the Continue button to close the 'Define Groups' dialog box. Fourth, we note that SPSS has filled in the group identifiers in the 'Grouping Variable:' text box. Fifth, we click on the OK button to request the t-test results.

Results of the T-tests

Using the 'Equal variances not assumed' rows of the table, we see that there is a significant difference in average scores for the variables 'Manufacturer Image' and 'Service.' There is no significant difference in means for 'Price Level' and 'Price Flexibility.' If we scroll down the table, we find that there are also significant relationships with 'Usage Level' and 'Satisfaction Level.'
These significant findings reinforce the notion that 'Delivery Speed' might be involved in a missing data process that requires further understanding before proceeding with the analysis.

9. The Correlation Matrix for Diagnosing Randomness of Missing Data

To continue our missing data analysis, we run a correlation matrix for the dichotomous grouping variables: 'Delivery Speed (Valid/Missing)', 'Price Level (Valid/Missing)', 'Price Flexibility (Valid/Missing)', 'Manufacturer Image (Valid/Missing)', 'Service (Valid/Missing)', 'Salesforce Image (Valid/Missing)', 'Product Quality (Valid/Missing)', 'Usage Level (Valid/Missing)', and 'Satisfaction Level (Valid/Missing)'. We examine the pattern of correlations to see if there are large correlations among multiple pairs of variables that do not have an obvious explanation. An obvious explanation would be that subjects answered some questions only if their answer to another question had a certain value, e.g. they answered the question about job satisfaction only if they were employed. If there are variables that show a strong pattern of systematic missing data without an obvious explanation, we should evaluate the impact of this pattern on our research questions and make our decision about including, eliminating, or substituting for these variables.

Requesting the Correlation Matrix

First, select the 'Correlate | Bivariate…' command from the Analyze menu. Second, move the Valid/Missing diagnostic variables for the metric variables to the 'Variables:' list box. Third, accept the defaults of 'Pearson' for 'Correlation Coefficients', 'Two-tailed' for 'Test of Significance', and 'Flag significant correlations'. Fourth, click on the OK button to produce the correlation matrix.
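The correlation matrix of valid/missing indicators can be sketched in Python to show what is being computed. This is a minimal sketch under stated assumptions: cases are dicts holding 0/1 indicator variables, and the helpers pearson and indicator_correlations are hypothetical names. For two dichotomous 0/1 variables, the Pearson correlation equals the phi coefficient. An indicator that never varies (for example, a variable with no missing data among the selected cases) has no defined correlation, so the sketch returns NaN in that case.

```python
import math


def pearson(x, y):
    """Pearson correlation; for two 0/1 indicators this is the phi coefficient."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        # A constant indicator (all valid or all missing) has an undefined correlation.
        return float("nan")
    return sxy / math.sqrt(sxx * syy)


def indicator_correlations(cases, indicators):
    """Correlation matrix of valid/missing indicators, keyed by (row, column) pairs."""
    columns = {name: [case[name] for case in cases] for name in indicators}
    return {(a, b): pearson(columns[a], columns[b]) for a in indicators for b in indicators}


# Hypothetical indicator values, not the HATMISS data.
cases = [
    {"x1_": 0, "x3_": 0, "x6_": 1},
    {"x1_": 0, "x3_": 1, "x6_": 0},
    {"x1_": 1, "x3_": 1, "x6_": 1},
    {"x1_": 1, "x3_": 1, "x6_": 0},
]
matrix = indicator_correlations(cases, ["x1_", "x3_", "x6_"])
print(round(matrix[("x1_", "x3_")], 3))  # 0.577
print(round(matrix[("x1_", "x6_")], 3))  # 0.0
```

A positive correlation between two indicators means the two variables tend to be missing (or valid) for the same cases, which is the kind of concentration this step is looking for.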
The Correlation Matrix Output

Our correlation matrix shows the same pattern as Table 2.6 on page 60 of the text. As discussed on page 60 of the text, there is only one moderate correlation in this table, between Salesforce Image and Satisfaction Level. The pattern of missing data is restricted to these variables, so we do not have a serious problem.
