Special session on NASS CDS
• Objective
– To explain the nature and format of the database used for your homework and exam
– To learn of issues that are generalizable to other datasets used in injury prevention

NASS CDS
• The National Automotive* Sampling System Crashworthiness Data System is owned and maintained by the National Highway Traffic Safety Administration. It has been developed since 1979, but it has had its current structure since 1988.
*Formerly Accident

Disclaimer
• This presentation relies heavily on materials developed by the National Highway Traffic Safety Administration (U.S. Dept. of Transportation). Some of them are available on its web site, www.nhtsa.dot.gov, whereas others were presented in special meetings, such as the PowerPoint presentation that follows, developed by Dr. Carra, Director of the National Center for Statistics and Analysis, a division of the agency. For more information, visit www-nrd.nhtsa.dot.gov/departments/nrd-30/ncsa/nass.html or /availinf.html
• Carra’s slides 1-5

Case Inclusion Criteria
• For a crash to make it into the NASS CDS, it has to:
– Be a crash on an open road that generates a police report
– Involve at least one passenger car, light truck, van or utility vehicle
– Have at least one vehicle towed away from the scene
• Among all eligible cases, a probabilistic sample is drawn (next)
• Carra’s slides 6-8
• Because the system was designed to be representative ONLY of the US as a whole, it is not possible to derive region-, state- or county-level estimates.
• Carra’s slides 9-10
• The number of cases currently collected (approx. 4500 per year) has been declining because of budget limitations in past years. The system used to collect more than 6000 cases per year
• Carra’s slide 11

Implications of the PROBABILISTIC sampling
• Because of this method of selecting cases, anyone who wants the real distribution of any crash-related characteristic in the US must use the “sampling weight” attached to each case.
• Sampling weights range from 0 (cases that were collected but are not representative of any crash that year in the US) to almost 58,000, so they vary widely. They are available in the variable ratwgt

Implications of the PROBABILISTIC sampling (II)
• These weights are derived at the end of the year, once all cases have been selected.
• STATA has survey commands that allow this “weight” variable to be used in most analyses. SUDAAN is a specialized statistical package with similar capabilities.
• The weights are re-evaluated by the agency to account for changes in the number of cases collected per PSU (primary sampling unit) and in the number of PSUs active at any given time.

Who collects data
• Police, through their regular police reports
• Crash investigators who are NHTSA employees and are located near the police jurisdictions that are part of the system.
– They locate the vehicles, photograph and measure them; visit the crash site, photograph it and collect data; and follow up with victims by interviewing them and/or reading medical records

OUTCOME INFORMATION
• Collected in a variety of ways:
– Abbreviated Injury Scale
– Injury Severity Score
– Maximum severity (dead, treated at hospital, treated at ED and released, etc.)
– Work days lost
–…

SEVERITY OF CRASH INFORMATION
• In-depth crash investigation allows for careful assessment of the energy released during the crash (measured in a variety of ways)
• Carra’s slides 13-15

Data files accessibility
• 1978-1987 (not quite the same system)
• 1988-1996: through NHTSA’s offices in Cambridge, MA
• 1997 onward: online

Data structure
• For each year, the approximately 400 variables collected are stored in 6 files, each containing the information from specific forms:
– Accident file (accident record and accident event record), about 40 variables
– General vehicle file (general vehicle form), some 200 variables
– Exterior vehicle file (exterior vehicle form), some 125 variables
– Interior vehicle file (interior vehicle form), some 150 variables
– Occupant assessment file (occupant assessment form), some 125 variables
– Occupant injury file (occupant injury form), some 50 variables
• The sum of the variables per file exceeds 400 because some variables are duplicated across files

Organizing the data
• The hierarchical database can then be managed to generate any new database with selected information on whichever unit of analysis one wants (e.g., crash, person)
• The files are available in SAS and flat file formats

Linking the data
• One can merge files from one year using the identifying information available across files (e.g., psu, case number, record number, version number, accident number, vehicle number, occupant number), and/or
• Append years to create a larger dataset

For your HWs and Exam
• We appended years 1991-2001
• We created an occupant-level file with selected information from the accident, (all) vehicle, and occupant files.

How to understand the data
[Figure: cover page of the Coding Manual, www-nass.nhtsa.dot.gov/NASS/CDS/DataColl/man1995.pdf]
• Every year, the agency produces a “Coding Manual”, an 800+ page document that outlines all the operational issues related to the system and the definitions of the variables.
You can access those at www-nass.nhtsa.dot.gov/NASS/CDS/DataColl (Note: take a peek if you want, NO NEED TO PRINT THEM)

How to understand the data (II)
[Figure: cover page of the summary manual, www-nass.nhtsa.dot.gov/NASS/Manuals/CDS8896.pdf]
• There is also a summary manual that indicates all variables ever collected since 1988 and summarizes changes over time

For example: Sex
• NASS CDS 1988-1994
– Male 1
– Female 2
– Unknown 9
• NASS CDS 1995-2001
– Male 1
– Female, not pregnant 2
– Female, 1st trimester pregnant 3
– Female, 2nd trimester pregnant 4
– Female, 3rd trimester pregnant 5
– Female, pregnant, unknown trimester 6
– Unknown 9

For example: Vehicle curb weight
• NASS CDS 1988-1992
– In hundreds of pounds. E.g., 136 = 13600 pounds
– Special codes:
  000, less than 50 pounds
  135, 13500 pounds or more
  010, less than 1050 pounds
  999, unknown
• NASS CDS 1993-2001
– In tens of kilograms. E.g., 46 = 460 kilograms
– Special codes:
  045, less than 5 kg (until 1995); less than 454 kg since 1996
  610, 6100 kg or more
  612, 6124 kg or more (since 1996)
  unknown

BEFORE YOU ANALYZE THE DATA
• Understand it
• Create the file you need
• Clean the variables

Take home message
• Know your data
• Get the reference manuals
• Get a contact person who is very knowledgeable about the data to assist.
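
A worked illustration of "clean the variables" and of the sampling weights
• The Python sketch below is not part of the NHTSA materials; it only illustrates, under stated assumptions, how the two example variables could be harmonized across coding periods and how ratwgt could enter a weighted distribution. The record layout, the field names year, sex and curbwt, and the three helper functions are hypothetical. Only ratwgt and the code values come from the slides above. Treating 999 as "unknown" in the 1993-2001 curb weight coding is an assumption, and the floor and ceiling special codes are left unconverted.

```python
LB_TO_KG = 0.45359237  # exact definition of the pound in kilograms


def harmonize_sex(code, year):
    """Collapse the 1995-2001 pregnancy detail into male / female.

    Returns None for 'unknown' (9) or for any code not valid in that year.
    """
    if code == 1:
        return "male"
    if code == 2:
        return "female"  # 'female' (1988-1994) or 'female, not pregnant' (1995-2001)
    if year >= 1995 and code in (3, 4, 5, 6):
        return "female"  # pregnant, any or unknown trimester
    return None


def curb_weight_kg(code, year):
    """Convert a curb weight code to kilograms.

    1988-1992: hundreds of pounds; 1993-2001: tens of kilograms.
    Assumption: 999 means 'unknown' in both periods.
    Floor/ceiling special codes (e.g. 010, 135, 045, 610, 612) are NOT handled here.
    """
    if code == 999:
        return None
    if year <= 1992:
        return code * 100 * LB_TO_KG
    return code * 10.0


def weighted_share(records, key):
    """Weighted distribution of key(record), using ratwgt as the weight."""
    totals = {}
    for rec in records:
        value = key(rec)
        if value is None:
            continue  # unknown values are left out of the distribution
        totals[value] = totals.get(value, 0.0) + rec["ratwgt"]
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {value: w / grand_total for value, w in totals.items()}


# Hypothetical occupant-level records (made-up values, for illustration only)
records = [
    {"year": 1991, "sex": 2, "curbwt": 30, "ratwgt": 100.0},
    {"year": 1996, "sex": 4, "curbwt": 136, "ratwgt": 300.0},
    {"year": 2000, "sex": 1, "curbwt": 150, "ratwgt": 600.0},
    {"year": 1999, "sex": 9, "curbwt": 999, "ratwgt": 0.0},
]

shares = weighted_share(records, lambda r: harmonize_sex(r["sex"], r["year"]))
print(shares)
print([curb_weight_kg(r["curbwt"], r["year"]) for r in records])
```

• This gives point estimates only. Standard errors that respect the sample design still call for survey software such as the STATA survey commands or SUDAAN.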