STAT 350 Project
Refer to the General Motors air pollution and mortality observational study described below. Analyze the
associated dataset to address the study question, and prepare a brief written summary of the results. The main
body of your report should be no longer than five pages single-spaced or ten pages double-spaced. Attach any
plots or tables that support the findings (but please, no long unedited excerpts of SAS output), and put any
additional detail in appendices. The goal is to get some practice carrying a data analysis through from start to
finish. Your write-up should include a summary of the scientific question and the objectives of your analysis;
summaries of the exploratory, modelling, and diagnostic phases of the analysis; and your conclusions. Please
consult Carroll for any formatting or other requirements she might have.
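
The three phases named above could be laid out in SAS roughly as follows. This is a minimal sketch, not a prescribed analysis. It assumes the data have already been read into a dataset called smsa with the variable names used in the input statement at the end of this handout (mort for Mortality, pcnwhite for %NonWhite). The choice of procedures and the full main-effects model are illustrative assumptions.

```sas
ods graphics on;

/* Exploratory phase: correlations of mortality with the pollution indices */
proc corr data=smsa;
  var HCpot NOxpot SO2pot;
  with mort;
run;

/* Exploratory phase: scatterplots of mortality against each pollution index */
proc sgscatter data=smsa;
  compare y=mort x=(HCpot NOxpot SO2pot);
run;

/* Modelling and diagnostic phases: one candidate model (an assumption,
   not the required one), with variance inflation factors and the
   standard residual diagnostics panel */
proc reg data=smsa plots=diagnostics;
  model mort = jantem jultem relhum rain educ popdens pcnwhite wc
               pophouse income HCpot NOxpot SO2pot / vif;
run;
quit;
```

Only the edited highlights of this output belong in the report body, per the page limit above.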
Story Name: Air Pollution and Mortality
Topics: Environment, Public Health
Reference: U.S. Department of Labor Statistics
Datafile Name: SMSA
Authorization: free use

Background: In a study of air pollution, researchers at General Motors collected data on 59 U.S.
Standard Metropolitan Statistical Areas (SMSAs). The scientific question was whether air pollution
contributes to mortality. The age-adjusted mortality rate for each area, called "Mortality", is the
dependent variable for analysis. Covariates include variables measuring demographic, social, and economic
characteristics of the cities, variables measuring climate characteristics, and indices of air
pollution potential for three common components of fossil fuel emissions.
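
One possible way to formalize the scientific question, assuming a multiple linear regression is adequate (the handout does not prescribe a model), is shown below. The symbols $x_{ij}$, $\beta_j$, $\gamma_k$, and $\varepsilon_i$ are introduced here for illustration only.

```latex
\[
\text{Mortality}_i = \beta_0 + \sum_{j=1}^{p} \beta_j x_{ij}
  + \gamma_1\,\text{HCPot}_i + \gamma_2\,\text{NOxPot}_i + \gamma_3\,\text{SO2Pot}_i
  + \varepsilon_i, \qquad i = 1, \dots, 59,
\]
% x_{ij}: the demographic, social, economic, and climate covariates for area i
% Study question, under this assumed model: H_0 : \gamma_1 = \gamma_2 = \gamma_3 = 0
```

Under this formulation, the question becomes whether the pollution indices remain associated with mortality after adjusting for the other covariates.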

Data Description: Properties of 59 Standard Metropolitan Statistical Areas (a standard Census Bureau
designation of the region around a city) in the United States, collected from a variety of sources.

Variable Names:
 1. city: City name
 2. Jantem: Mean January temperature (degrees Fahrenheit)
 3. Jultem: Mean July temperature (degrees Fahrenheit)
 4. RelHum: Annual average relative humidity (percent)
 5. Rain: Annual rainfall (inches)
 6. Mortality: Age-adjusted mortality (deaths per 100,000 people per year)
 7. Educ: Median education (years of schooling)
 8. PopDens: Population density (residents per square mile)
 9. %NonWhite: Percentage of nonwhite residents
10. WC: Percentage of white-collar workers, coded 0 if low (<50%) or 1 if high (50% or more)
    [Note: A "white-collar" worker is a skilled worker; WC is a categorical variable taking two values.]
11. pop: Population
12. pophouse: Average number of occupants per household
13. income: Median income in US dollars
14. HCPot: Hydrocarbon (HC) pollution potential (parts per million)
15. NOxPot: Nitrogen oxides (NOx) pollution potential (parts per million)
16. SO2Pot: Sulfur dioxide (SO2) pollution potential (parts per million)
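
Because WC is a 0/1 category rather than a measurement, it can help to attach labels to its codes before summarizing. The sketch below assumes a dataset named smsa with variables wc and mort, as in the SAS input statement later in this handout. The format name wcfmt is hypothetical.

```sas
/* Hypothetical format giving readable labels to the two WC codes */
proc format;
  value wcfmt 0 = 'Low (<50%)'
              1 = 'High (50% or more)';
run;

/* How many areas fall in each white-collar category */
proc freq data=smsa;
  tables wc;
  format wc wcfmt.;
run;

/* Mortality summarized separately for the two categories */
proc means data=smsa n mean std min max;
  class wc;
  var mort;
  format wc wcfmt.;
run;
```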

The Data (excerpts):
city          Jantem  Jultem  RelHum  Rain  Mortality  Educ  PopDens  %NonWhite  WC  pop     pophouse  income  HCPot  NOxPot  SO2Pot
AkronOH       27      71      59      36    921.87     11.4  3243     8.8        0   660328  3.34      29560   21     15      59
AlbanyNY      23      72      57      35    997.87     11.0  4281     3.5        1   835880  3.14      31458   8      10      39
................................................
YoungstownOH  28      72      58      38    954.44     10.7  3451     11.7       0   531350  3.48      28960   14     13      39

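The city column in the pollution dataset is a character variable rather than a numeric variable. SAS needs
to be told this by putting a dollar sign after the variable name in the input statement. With list input, a
character variable defaults to a length of 8, so a length statement is also needed to keep longer names such
as YoungstownOH from being truncated:
options pagesize=60 linesize=75;
data smsa;
  length city $ 20;               /* 20 is an assumed upper bound on name length */
  infile 'smsa.dat' firstobs=2;   /* firstobs=2 skips the header line */
  input city $ jantem jultem relhum rain mort educ popdens pcnwhite wc
        pop pophouse income HCpot NOxpot SO2pot;

etc.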
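
Once the data step has been closed with a run statement, it is worth confirming that the file was read as intended before going further. The following is a minimal sketch, assuming the dataset and variable names from the data step above. The particular checks are suggestions only.

```sas
/* Confirm that city is character and every other variable is numeric */
proc contents data=smsa;
run;

/* Compare the first few observations with the raw data file */
proc print data=smsa(obs=5);
run;

/* Counts, missing values, and ranges: n should be 59 for every variable */
proc means data=smsa n nmiss mean std min max;
  var jantem jultem relhum rain mort educ popdens pcnwhite wc
      pop pophouse income HCpot NOxpot SO2pot;
run;
```

If a count falls short of 59 or a range looks implausible, the usual suspects are a misaligned input statement or a truncated city name.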

						