STAT 350 Project

Refer to the General Motors air pollution and mortality observational study, described below. Conduct an analysis of the associated dataset to address the study question. Prepare a brief written summary of the results, and attach any plots or tables that support the findings (but please, no long unedited excerpts of SAS output). The main body of your report should be no longer than five pages single-spaced, or ten pages double-spaced; if you need to include additional detail, use appendices.

The goal is to get some practice carrying a data analysis through from start to finish. Your write-up should include a summary of the scientific question and the objectives of your analysis; summaries of the exploratory, modelling, and diagnostic phases of the analysis; and conclusions. Please consult Carroll for additional information on any format or other requirements she might have.

Story Name: Air Pollution and Mortality
Topics: Environment, Public Health
Reference: U.S. Department of Labor Statistics
Datafile Name: SMSA
Authorization: free use

Background: In a study of air pollution, researchers at General Motors collected data on 59 U.S. Standard Metropolitan Statistical Areas (SMSAs). The scientific question was whether air pollution contributes to mortality. The age-adjusted mortality rate for each area, called "Mortality", was taken as the dependent variable for analysis. Covariates include variables measuring demographic, social, and economic characteristics of the cities, variables measuring climate characteristics, and indices of air pollution potential for three common components of fossil fuel emissions.

Data Description: Properties of 59 Standard Metropolitan Statistical Areas (a standard Census Bureau designation of the region around a city) in the United States, collected from a variety of sources.

Variable Names:
1. city: City name
2. Jantem: Mean January temperature (degrees Fahrenheit)
3. Jultem: Mean July temperature (degrees Fahrenheit)
4. RelHum: Annual average relative humidity (percent)
5. Rain: Annual rainfall (inches)
6. Mortality: Age-adjusted mortality (deaths per 100,000 people per year)
7. Educ: Median education (years of schooling)
8. PopDens: Population density (residents per square mile)
9. %NonWhite: Percentage of non-whites
10. WC: Percentage of white-collar workers, coded 0 for low (<50%) or 1 for high (50% or more) [Note: A "white-collar" worker is a skilled worker, and WC is a categorical variable taking on two values.]
11. pop: Population
12. pophouse: Average number of occupants per household
13. income: Median income in US dollars
14. HCPot: HC pollution potential (parts per million)
15. NOxPot: Nitrous Oxide pollution potential (parts per million)
16. SO2Pot: Sulfur Dioxide pollution potential (parts per million)

The Data (excerpts):

city          Jantem  Jultem  RelHum  Rain  Mortality  Educ  PopDens  %NonWhite  WC  pop     pophouse  income  HCPot  NOxPot  SO2Pot
AkronOH       27      71      59      36    921.87     11.4  3243     8.8        0   660328  3.34      29560   21     15      59
AlbanyNY      23      72      57      35    997.87     11.0  4281     3.5        1   835880  3.14      31458   8      10      39
................................................
YoungstownOH  28      72      58      38    954.44     10.7  3451     11.7       0   531350  3.48      28960   14     13      39

The column for city in the pollution dataset is a character variable rather than a numeric variable. SAS needs to be told this by putting a dollar sign after the variable name in the input statement of the data step:

    options pagesize=60 linesize=75;
    data smsa;
      infile 'smsa.dat' firstobs=2;
      input city $ jantem jultem relhum rain mort educ popdens pcnwhite
            wc pop pophouse income HCpot NOxpot SO2pot;
    etc.
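As a rough illustration of how the exploratory, modelling, and diagnostic phases might follow the data step, here is a minimal sketch. It assumes the dataset smsa has been created with the variable names from the input statement above. The choice of covariates in the model, the output dataset name diag, and the variable names fitted and resid are assumptions made for illustration, not part of the assignment.

```sas
/* Sketch only: assumes the data step above has created the dataset smsa. */

/* Exploratory phase: summary statistics for the numeric variables. */
proc means data=smsa n mean std min max;
  var mort jantem jultem relhum rain educ popdens pcnwhite
      pop pophouse income HCpot NOxpot SO2pot;
run;

/* Pairwise correlations between mortality and the pollution potentials. */
proc corr data=smsa;
  var mort HCpot NOxpot SO2pot;
run;

/* Modelling phase: one candidate model. The covariate list is an
   assumption; the vif option requests variance inflation factors. */
proc reg data=smsa;
  model mort = jantem jultem relhum rain educ popdens pcnwhite wc
               pophouse income HCpot NOxpot SO2pot / vif;
  /* Save fitted values and residuals (names are hypothetical). */
  output out=diag p=fitted r=resid;
run;
quit;

/* Diagnostic phase: residuals against fitted values, with a
   reference line at zero. */
proc plot data=diag;
  plot resid*fitted / vref=0;
run;
quit;
```

One detail worth checking when reading the data: with list input, a character variable such as city defaults to a length of 8, so longer names like YoungstownOH would be stored truncated unless a length statement (for example, length city $ 20;) is placed before the input statement.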