					                                                                 Due Date: 9/23/02
                        Lab 3: Experimental Designs
These exercises are designed to show you how to combine data from different
study conditions. In lab 2 we created a baseline dataset. In lab 3 we will create
a followup dataset from an ASCII file of data. The same followup dataset will be
treated as followup data from a panel (cohort) study and from a cross-sectional
design. Once the data are combined via merging or appending they will be
analyzed. NOTE: As soon as you open SAS or STATA, and before you
begin this assignment, be sure to open a log file, name it, and save it.
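
As a minimal sketch of opening a log in STATA (the log name "lab3" is only a
placeholder, not a required name; SAS users will need a different approach):

```stata
#delimit ;
/* "lab3" is a hypothetical log name; choose your own. */
/* The text option writes a plain-text log that can be printed or emailed. */
log using lab3, replace text;

/* ... run the lab commands here ... */

/* Close the log when you are finished so the file is complete. */
log close;
```

Everything typed between "log using" and "log close" is recorded in the log file.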

Creating a panel dataset:

1.     Download the data set "labdat_1" from Blackboard
              Note that the variables are id, q1 through q14, health, smoke,
              major, class, and condit.
              Sort the dataset on id and save it.
2.     Download the ASCII file “labdat_2” from Blackboard
              Create a “labdat_2” dataset as you did in lab 2.
              When creating “labdat_2” name the variables q1_2, q2_2, etc. to
       distinguish wave 2 variables from wave 1 ones. Note: keep “id” as “id”.
3.     Sort the data set on id.
4.     Save the data to your diskette.

You now have 2 datasets, labdat_1 (baseline) and labdat_2 (followup), with the
same id numbers to signify that they represent responses from the same people, but
with different variable names to indicate the same variables measured at a different time.

5. Merge the data. Both datasets should be sorted on ID. You might name it

6. Verify that all cases merged.

Answer the following questions using the merged dataset.

1. Are smokers at baseline more or less likely to be smokers at time 2? What
percentage of baseline smokers still smoke at time 2? (Hint: a cross tabulation
will work.)

2. What is the correlation of a person’s perceived health at baseline with his/her
perceived health at time 2? (Hint: a correlation coefficient will answer the
question.)

3. Is there a significant increase in students’ desire to “Consider nutrition when you
make food choices” (question 1)? (Hint: use a t-test.)

Creating a cross-sectional dataset:

1.     Download the ASCII file “labdatfu” from Blackboard
2.     Create a “labdatfu” dataset as you did in lab 2; this time use the
       same variable names you used when creating labdat_1, namely id,
       q1, q2, q3, etc.
3.     Sort the data set on id.
4.     Save the data to your diskette, naming it “labdatfu.”
5.     Now append labdatfu onto the bottom of labdat_1.
              Read the baseline dataset, create a variable to indicate that it is
       baseline, and then append the followup data to it.
6.     Save the dataset as “labdatxt.”

Tabulate the id variable. What is the range of id numbers? Why is it different
from the range in the merged (panel) dataset?
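
One possible way to inspect the id numbers, as a sketch that assumes the
appended dataset has already been saved as labdatxt with the wave variable
described later in this handout:

```stata
#delimit ;
use labdatxt, clear;

/* Minimum and maximum id across the whole appended file. */
summarize id;

/* The same summary separately for baseline (wave 1) and followup (wave 2). */
bysort wave: summarize id;

/* Reports how many id values occur once and how many occur more than once. */
duplicates report id;
```

Comparing the per-wave ranges with the overall range may help explain why the
appended file differs from the merged one.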

Answer the following questions using the appended dataset.

1. Does smoking increase from baseline to followup? What percent of the
sample smokes at baseline and what percent at followup? (Hint: a tabulation
using the wave variable will work.)

2. Does perceived health change significantly over time? (Hint: again a tabulation
using the wave variable will work.)

3. Is there a significant increase in students’ desire to “Consider nutrition when you
make food choices” (question 1)?


To merge the datasets:
In a panel study, data are collected from the same respondents at multiple points
in time. Merging allows data from each of these time points to be stored or
organized for each respondent. These commands simply read the second wave
of data and save them as “labdat_2.”
/* Input statement for panel */
# delim ;
infile id q1_2 q2_2 q3_2 q4_2 q5_2 q6_2 q7_2 q8_2 q9_2 q10_2
  q11_2 q12_2 q13_2 q14_2 smoke_2 health_2 major_2 str5 class_2
  condit_2 using labdat_2.txt;
sort id;
save labdat_2, replace;
drop _all;

We then merge the data, being sure to have both datasets sorted on “id,” the
variable we’ll merge on. After merging, the system creates a variable called
“_merge” that indicates the result of the merge for each case.
# delim ;
use labdat_1;
sort id;
merge id using labdat_2;
tab _merge;
drop _merge;
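
The commands above use the older merge syntax. As a hedged alternative for
anyone on a newer release (this sketch assumes Stata 11 or later, where merge
takes an explicit match type), the same merge plus a check that all cases
matched might look like this:

```stata
#delimit ;
use labdat_1, clear;

/* 1:1 states that id identifies exactly one row in each dataset. */
merge 1:1 id using labdat_2;
tab _merge;

/* _merge == 3 means the case was found in both files. */
/* assert stops with an error if any case failed to match. */
assert _merge == 3;
drop _merge;
```

If the assert fails, the "tab _merge" output shows how many cases appeared in
only one of the two files.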

And these commands answer the questions. Later we’ll review them in more
detail; for now, just use them to get the data necessary to answer the questions.
tab smoke smoke_2, row chi2;
corr health health_2;
reg health health_2;
ttest q1 = q1_2;
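
To see what the paired t-test is doing, here is a sketch that computes each
person's change score by hand. It assumes the merged dataset is still in
memory; the variable name q1_diff is made up for illustration.

```stata
#delimit ;
/* q1_diff is a hypothetical name: wave 2 response minus wave 1 response. */
gen q1_diff = q1_2 - q1;

/* A positive mean indicates higher responses at time 2. */
summarize q1_diff;

/* One-sample t-test of the mean change against zero. */
ttest q1_diff == 0;
```

A one-sample test on the differences gives the same t statistic as the paired
test, because the paired test is a test of whether the mean within-person
change is zero.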

To append the datasets:
In cross-sectional studies, datasets are appended. In such studies, data are
collected from different respondents at multiple points in time. These data are
then appended to the previous time period’s data and a new variable which
records time of interview (whether pre- or post-) is created. These statements
read the data; note that the variable names are the same as in the baseline.
/* Input statement for Cross sectional */
# delim ;
infile id q1 q2 q3 q4 q5 q6 q7 q8 q9 q10
  q11 q12 q13 q14 smoke health major str5 class
  condit using labdatfu.txt;
sort id;
save labdatfu;
drop _all;

To append, read the baseline data, then create the variable “wave,” which will be
used to indicate baseline or followup. Wave is 1 in the baseline; after the
append, wave is missing in the followup cases, so we recode that to be 2.
use labdat_1;
gen wave=1;
append using labdatfu;
tab wave;
replace wave=2 if wave==.;
tab wave;
save labdatxt, replace;

These commands can be used to answer the questions in the lab.
tab wave smoke, row chi2;
tab wave health, row chi2;
tab wave q1, row chi2;
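
As an optional alternative for question 3, a two-sample t-test compares the
mean of q1 across waves. This is only a sketch: it assumes labdatxt has been
saved as above and that q1 can reasonably be treated as a numeric scale.

```stata
#delimit ;
use labdatxt, clear;

/* Mean, standard deviation, and count of q1 within each wave. */
tab wave, summarize(q1);

/* Independent-samples t-test: the two waves contain different respondents. */
ttest q1, by(wave);
```

Stata reports the difference as the first group minus the second, so an
increase from baseline to followup shows up as a negative difference.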
