Creating Something from Nothing: Synthetic and Dummy files
National DLI Workshop
May 26, 2003
Ottawa, Ontario
Exercise I: Providing Service for the NPHS 2000-2001 Synthetic File
This exercise consists of four parts:
- Looking at the construction of the SPSS system file for the longitudinal
household health file;
- Reviewing the procedure for remote job submission;
- Finding variables;
- Capturing SPSS code from a "dummy" analysis run that could be used with a
remote job submission.
The data used in this exercise are from the nphs/Synthetic_Files-Dummy_files directory
on the DLI FTP site (see Figure 1 below) and are from the 2000-2001 cycle.
Figure 1: Location on the DLI FTP Site
Each of the files in this directory is a very large compressed file that contains several
files, including a directory structure for organizing these files. For example, the
2000-2001 NPHS synthetic file is over 94 megabytes compressed. Uncompressed, the
contents are over 439 megabytes. When working with these files, make sure that you
keep the directory structure when you unzip the contents [1]. This will help your patron
more easily identify the material that she or he requires.
Bo Wandschneider
Chuck Humphrey
For the purposes of this computing exercise, these files have been downloaded and
extracted for you. You will find them located on the C: drive of your workstation under
the directory, C:\NPHS. If you would like to know further details about downloading and
extracting these files, please ask one of the instructors.
Step A: Create an SPSS system file for the 2000-2001 Longitudinal Household File
1. Open an Explore window by right-clicking on the Start button and selecting
“Explore”. Locate the folder, C:\NPHS and click once on the icon for this
directory.
How many folders are within C:\NPHS?
Record the names of these folders:
2. Open the document within the C:\NPHS folder, ReadMe.pdf. This file
documents the files within this folder and their content. It also contains important
information for the use of this synthetic product.
Use the ReadMe.pdf file to answer the following questions.
Which directory contains the synthetic raw data file for the household health file for
2000-2001?
What is the actual file name for this synthetic file?
Which directory contains the file with the SPSS data list command?
What is the actual file name for this SPSS file?
3. A copy of the file editor, PFE [2], has been included with this exercise and is not
part of the official release of this synthetic file on the DLI FTP site. We have
included a copy of this freeware editor so that you can view the contents of the
synthetic file quickly and conveniently. Double-click on the PFE file icon and
then select File / Open. Go to C:\NPHS\DATA and double-click on
dumylong.txt.
Notice the length of each record. Press the "End" key, which places the cursor one
column beyond the last character on the record, and record the length of the record
here:
The line number and column number are reported in the lower-left area of the PFE
window frame.
[1] In WinZip, this is done by selecting "Use folder names" in the Extract dialogue box and
entering C:\NPHS as the directory.
[2] For more information about PFE, see Jackie Godfrey's article about this freeware
product in the DLI Update.
4. Quit PFE by selecting File / Exit from the menu.
5. Start an SPSS session by selecting Start / Programs / SPSS for Windows /
SPSS 11.5. If you are prompted with a "What would you like to do?" dialogue box,
cancel it.
6. From the menu, select File / Open / Syntax. Go to C:\NPHS\LAYOUT and
double-click on the file, LONG_i.sps.
7. In the Syntax Editor, enter the following new line above the Data List command:
File Handle Infile / name='C:\NPHS\DATA\dumylong.txt' / lrecl=6540.
8. Don't forget the period at the end of this command, and note that "dumy" has only
one "m". Next, scroll to the end of this file and enter the following command on a new
line:
EXECUTE.
Note: this command goes on the line after the line that contains a single period.
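After steps 7 and 8, the edited LONG_i.sps should have roughly the shape sketched below. This is only an outline: the Data List command supplied in the file is long, so it is represented here by comments rather than reproduced.

```spss
FILE HANDLE Infile / NAME='C:\NPHS\DATA\dumylong.txt' / LRECL=6540.
* The DATA LIST command supplied in LONG_i.sps stays here, unchanged.
* It ends with a line that contains only a period.
EXECUTE.
```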
9. Run these commands by selecting Run / All from the menu. This command will
take a couple of minutes to complete. Don't panic. The contents of the synthetic
file will be displayed in the Data Editor once the command has finished. If you
don't see data in the Data Editor, ask your instructor for assistance.
Three other SPSS command files need to be processed: one for variable labels,
another for value labels, and a third for missing value declarations. This
version of SPSS for Windows doesn't process these large command files very
well. Therefore, we'll break each of these long command files into thirds and run
the three parts of each file.
10. From the Syntax Editor, select File / Open and double-click on the file,
LONGvare.sps. This will open a new Syntax Editor window. Use Edit / Find on
the menu to locate FS_6_50. Type a period after the label for the preceding
variable, PC_6_46M, and enter a new line containing: VARIABLE LABELS (see
Figure 2 below).
11. Next, search for FH_8_10 and repeat step 10 by placing a period after the label of
ISC8D1 and entering a new line with VARIABLE LABELS.
Figure 2: Breaking the Variable Labels Command into Parts
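In text form, the break made in step 10 can be sketched as follows. The label wording is a placeholder invented for illustration, not the text found in LONGvare.sps, and the real command lists many more variables before and after this point.

```spss
* Sketch only: the label text is hypothetical placeholder wording.
* The first command now ends with a period after the PC_6_46M label.
VARIABLE LABELS
  PC_6_46M 'placeholder label for PC_6_46M'.
* The new command line starts the second part at FS_6_50.
VARIABLE LABELS
  FS_6_50 'placeholder label for FS_6_50'.
```

Splitting the command this way leaves every label in place; it only turns one very long command into several shorter ones.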
12. Scroll to the bottom of the file and, on a new line following the line with just
a period, enter: EXECUTE. (Don't forget the period at the end of the command.) Next,
select Run / All from the menu to process this file. This will take a few minutes to
complete. Once SPSS has finished processing this file, click on the Data Editor
window and select the "Variable View" tab at the bottom left-hand part of this
window's frame. Notice that Labels are now filled in, but that nothing appears under
Values and Missing.
13. Next, edit the missing values command file. From the Syntax Editor,
select File / Open and double-click on LONGmiss.sps. Search for HCC6_4 and
place a period after (6 THRU 9). Enter on a new line: MISSING VALUES. Then
search for MHC4_2, again place a period after (6 THRU 9), and enter
another MISSING VALUES command on a new line. Scroll to the bottom of the
file and, after the line with a period, enter the command: EXECUTE. Then select
Run / All from the menu. Again, this will take a short time to complete. Once
the command has completed, look at the Data Editor and notice the declaration
of missing values for many of the variables.
14. The remaining file to edit is LONGvale.sps. Open this file and find HCC4_1.
Replace the '/' just before HCC4_1 with a period. Then enter a new line beneath
this period and type: VALUE LABELS. Next, find MHC0_1F and repeat the
previous step. Finally, scroll to the bottom of the file and, after the line with
a period, enter: EXECUTE. (But don't run it yet.)
15. Find SP34_MET. This variable has an improper value label specification.
Remove the double quotes from the specification '" "' so that it reads ' '. Then
select Run / All from the menu. This file will take the longest to process. You will
see "SPSS Processor is ready" once the command has finished.
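The edit in step 14 differs from the earlier ones because the value label specifications are separated by slashes. A minimal before-and-after sketch, in which VAR_A, the codes, and the label text are all hypothetical placeholders rather than the contents of LONGvale.sps:

```spss
* Sketch only: VAR_A, the codes and the labels are hypothetical.
* Before the edit, a slash separates the two specifications.
VALUE LABELS
  VAR_A 1 'Yes' 2 'No'
  /HCC4_1 1 'Yes' 2 'No'.
* After the edit, the slash is a period and a new command begins.
VALUE LABELS
  VAR_A 1 'Yes' 2 'No'.
VALUE LABELS
  HCC4_1 1 'Yes' 2 'No'.
```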
16. Upon the successful completion of this command, check the Data Editor's
Variable View and see the addition of value labels for some variables. You now
have an SPSS system file that you can save to your workstation's hard drive. Go
to the Data Editor and select File / Save. Go to the directory, C:\NPHS and save
the file using the name, nphsynthetic2000.sav.
This is the file that you should consider distributing to patrons wanting to work with
the synthetic file for the 2000-2001 NPHS using SPSS.
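The save in step 16 can also be written as a command, which is convenient if the system file ever has to be rebuilt from the syntax files. A minimal sketch, assuming the directory and file name used in step 16:

```spss
* Save the working data file as an SPSS system file.
SAVE OUTFILE='C:\NPHS\nphsynthetic2000.sav'.
```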
Step B: Using Remote Job Submission with the NPHS
17. Go to the document, ReadMe.pdf, from step 2 above and read the first page,
which describes how to gain permission to use Remote Job Submission.
Does the Health Statistics Division require advance approval to use "remote
access"?
Can a proposal be conducted through email?
To whom is the proposal sent?
How many categories of information are expected in a proposal for this type of
access?
Step C: Finding Variables in the NPHS
In the next two parts of this exercise, we will simulate a research project that has
received approval for remote job submission. The research task is to investigate the
use of alternative health care services among the Chinese ethnic community,
differentiating those who have recently immigrated from Canadian-born and long-term
immigrants. The results will also control for the sex and age of the respondent.
18. Two files under C:\NPHS\DOC\PDF_E contain the variable indexes included with this
distribution. Begin an Acrobat session if you don't already have ReadMe.pdf
open and read the file, Index_t_e.pdf. This is a topical index that we will use to
locate the variables to identify Chinese, immigrant status, sex, age and the
number of alternative health care services used. For each of these, enter the
variable name in the table below.
Variable                            Variable Name(s)
Sex                                 ________________
Age                                 ________________
Ethnicity                           ________________
Immigration Status                  ________________
Alternative Health Care Services    ________________
Hints:
Use the search option that looks for the complete word.
Only use the data from the 2000-2001 cycle. Remember, the synthetic file is for the
longitudinal file and contains the variables across all cycles.
Step D: Running a Dummy Analysis and Saving the Code for Remote Job Submission
The code prepared in the Syntax Editor in this step will be the code that the researcher
would send to the Health Statistics Division to be run against the master file.
Therefore, we'll want to keep a clean version of this Syntax file as the end product of
this exercise. After all, it isn't the results from the synthetic file that hold our interest.
Rather, it is the command file set up to accomplish the desired analysis.
Typing commands is always risky because it introduces the possibility of keying errors.
This risk cannot be avoided entirely, since remote job submission is built around
sharing command files to perform analyses. One can use the SPSS menu interface
to paste commands into the Syntax Editor. This might reduce some keying errors, but
some steps will probably still be easier simply to type into the Editor.
This exercise is not about doing analysis; it is intended as an example of
building a command file that could be emailed using the Health Statistics Division's
Remote Access service. Therefore, the analysis in this exercise is not serious research.
The example does include, however, some of the basic tasks that researchers will have
to do in conducting serious research. Don't focus too much on the analysis in this
example, but do pay attention to the types of tasks being accomplished.
19. Open a new Syntax Editor window by selecting from the Data Editor window, File
/ New / Syntax. We need to create a few variables for this hypothetical analysis
including dummy variables indicating Chinese ethnicity, immigrant status, and
females. Dummy variables are encoded 1 and 0 for the presence and absence
of a category. We also need to create an index of the use of alternative health
care services.
20. In the new Syntax Editor, type the following commands:
Comment create a dummy variable for all immigrants.
COMPUTE IMMSTAT = 0.
IF (IMM EQ 1) IMMSTAT = 1.
Comment turn off missing value declarations for Ethnicity and Alternative Health Care services.
MISSING VALUES SDC0_4J HCC0_5A TO HCC0_5L ( ).
Comment create a dummy variable for Chinese ethnicity.
COMPUTE CHINESE = 0.
IF (SDC0_4J EQ 1) CHINESE = 1.
Comment create an index of the number of Alternative Health Care services used.
COUNT ALTHCARE = HCC0_5A TO HCC0_5L (1).
EXECUTE.
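Step 19 calls for a dummy variable for females, and the regression in step 24 uses a variable named FEMALE, but the commands above do not create one. It could be built in the same way as the other dummy variables. In the sketch below, SEXVAR is a hypothetical placeholder for the sex variable you recorded in Step C, and the code 2 for female is an assumption that should be confirmed against the documentation.

```spss
* Sketch only: SEXVAR is a hypothetical placeholder name.
* The code 2 for female is an assumption to confirm first.
COMPUTE FEMALE = 0.
IF (SEXVAR EQ 2) FEMALE = 1.
EXECUTE.
```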
21. From the Data Editor, select from the menu Data / Weight Cases. Click on the
radio button for “Weight cases by” and then scroll until you are near the bottom of
the variable list. Locate WT64LS, and double-click on it. This should define
WT64LS as the Frequency Variable. Click OK.
22. To add the weight command to the Syntax Editor, select Data / Weight
Cases again. This time click Paste, which will write the command in the Syntax Editor.
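The pasted command should be equivalent to the single line below. This is a sketch of the expected result rather than output captured from the session, and SPSS may lay the command out across two lines.

```spss
* Weight cases by the variable chosen in step 21.
WEIGHT BY WT64LS.
```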
23. From the menu, select Analyze / Regression / Linear. Scroll to the bottom of the
variable list and click on ALTHCARE, which is one of the variables that you
created above. Place ALTHCARE in the Dependent variable text box using the
arrow button.
24. In the variable list, click and drag over the variables FEMALE, IMMSTAT, and
CHINESE to highlight them. Place these variables in the Independent variable
text box using the arrow button.
25. Finally, from the variable list, locate DHC0_AGE and add it to the Independent
variable list. Then click OK.
26. After the results are shown, select again Analyze / Regression / Linear but this
time click Paste. Go to the Syntax Editor and confirm that the command has
been added to this file.
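The pasted regression command will probably resemble the sketch below. The subcommands shown are the usual defaults for a linear regression pasted from the dialogue box and may differ slightly in your version of SPSS; the variable names are those used in steps 23 to 25.

```spss
* Sketch of the pasted command; default subcommands are assumed.
REGRESSION
  /MISSING LISTWISE
  /STATISTICS COEFF OUTS R ANOVA
  /CRITERIA=PIN(.05) POUT(.10)
  /NOORIGIN
  /DEPENDENT ALTHCARE
  /METHOD=ENTER FEMALE IMMSTAT CHINESE DHC0_AGE.
```

Pasting the command rather than retyping it keeps the variable names exactly as they appear in the data file, which matters because the same syntax will later be run against the master file.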
27. Save the contents of the command file by selecting File / Save. Put the file in
C:\NPHS and call it Remotejob1.sps.
The file that your researcher would submit to the Health Statistics Division for
processing with the master file is Remotejob1.sps.
28. This ends the exercise. Please exit SPSS and close all other windows with tasks
running from this workshop.