
Special session on NASS CDS

• Objective
  – To explain the nature and format of the
    database used for your homework and exam
  – To learn of issues that are generalizable to
    other datasets used in injury prevention
                    NASS CDS
• The National Automotive* Sampling
  System Crashworthiness Data System is
  owned and maintained by the National
  Highway Traffic Safety Administration. It
  has been in development since 1979, but it has
  had its current structure since 1988.



    *Formerly Accident
                            Disclaimer
• This presentation relies heavily on materials
  developed by the National Highway Traffic
  Safety Administration (U.S. Dept. of
  Transportation). Some of them are available on
  its website, www.nhtsa.dot.gov; others were
  presented at special meetings, such as the
  PowerPoint presentation that follows, developed
  by Dr. Carra, Director of the National Center for
  Statistics and Analysis, a division of the agency.
For more information, visit www-nrd.nhtsa.dot.gov/departments/nrd-30/ncsa/nass.html or
/availinf.html
• Carra’s slides 1-5
       Case Inclusion Criteria
• For a crash to make it into the NASS CDS,
  it has to:
  – Be a crash on an open road that generates a
    police report
  – Involve at least one passenger car, light truck,
    van or utility vehicle
  – Have at least one vehicle towed away from
    the scene
• Among all eligible cases, a probabilistic
  sample is drawn (next)
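The three inclusion criteria above can be read as a simple filter applied before sampling. The following is a minimal sketch only: the `Crash` and `Vehicle` classes, their field names, and the body-type labels are hypothetical and are not NASS CDS variable names.

```python
from dataclasses import dataclass, field

# Body types named on the slide; the string labels are hypothetical.
LIGHT_VEHICLE_TYPES = {"passenger car", "light truck", "van", "utility vehicle"}


@dataclass
class Vehicle:
    body_type: str  # hypothetical label, not a NASS CDS code
    towed: bool     # True if the vehicle was towed away from the scene


@dataclass
class Crash:
    on_open_road: bool
    police_reported: bool
    vehicles: list = field(default_factory=list)


def is_eligible(crash):
    """Return True only if the crash meets all three inclusion criteria."""
    has_light_vehicle = any(v.body_type in LIGHT_VEHICLE_TYPES for v in crash.vehicles)
    has_towed_vehicle = any(v.towed for v in crash.vehicles)
    return (
        crash.on_open_road
        and crash.police_reported
        and has_light_vehicle
        and has_towed_vehicle
    )


towed_van = Crash(True, True, [Vehicle("van", towed=True)])
driven_away = Crash(True, True, [Vehicle("passenger car", towed=False)])
```

Eligible crashes are only candidates: the probabilistic sample is then drawn from among them.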
• Carra’s slides 6-8
• Because the system was designed to be
  representative ONLY of the US as a whole, it is
  not possible to derive region-, state- or
  county-level estimates.
• Carra’s 9-10
• The number of cases currently being
  collected (approx. 4500) has been declining
  because of budget limitations in recent
  years. The system used to collect
  more than 6000 cases per year.
• Carra’s 11
         Implications of the
      PROBABILISTIC sampling
• Because cases are selected this way, anyone who
  wants the real distribution of any crash-related
  characteristic in the US must use the “sampling
  weight” attached to each case.
• Sampling weights range from 0 (cases that were
  collected but are not representative of any crash
  that year in the US) to almost 58,000, so they
  vary widely. They are stored in the variable
  ratwgt.
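To illustrate why the weights matter, here is a minimal sketch that compares an unweighted and a weighted distribution. The four cases and the `belted` variable are made up for illustration; only the variable name `ratwgt` comes from the data set. The sketch gives point estimates only and does not produce the design-based standard errors that survey software computes.

```python
from collections import defaultdict


def weighted_distribution(cases, variable, weight="ratwgt"):
    """Return the share of total sampling weight held by each value of `variable`."""
    totals = defaultdict(float)
    for case in cases:
        totals[case[variable]] += case[weight]
    grand_total = sum(totals.values())
    if grand_total == 0:
        raise ValueError("all sampling weights are zero")
    return {value: total / grand_total for value, total in totals.items()}


# Made-up cases: two belted, two unbelted, with very different weights.
cases = [
    {"belted": "yes", "ratwgt": 10.0},
    {"belted": "no", "ratwgt": 30.0},
    {"belted": "yes", "ratwgt": 0.0},  # collected, but represents no crash
    {"belted": "no", "ratwgt": 60.0},
]

# Unweighted, each value is 2 of 4 cases (50%).
unweighted = weighted_distribution([dict(c, ratwgt=1.0) for c in cases], "belted")

# Weighted, "yes" holds 10 of 100 weight units and "no" holds 90 of 100.
weighted = weighted_distribution(cases, "belted")
```

In this toy example the raw sample looks evenly split, while the weighted estimate says about 10% belted and 90% unbelted.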
                     (II)
• These weights are derived at the end of the
  year, once all cases have been selected.
• Stata has survey commands that allow this
  “weight” variable to be used in most analyses.
  SUDAAN is specialized statistical software with
  similar capabilities.
• The weights are re-evaluated by the agency to
  account for changes in the number of cases
  collected per primary sampling unit (PSU) and
  the number of PSUs active at any given time.
          Who collects data
• Police, through their regular police reports
• Crash investigators who are NHTSA
  employees and are located near the police
  jurisdictions that are part of the system.
  – They locate the vehicles, photograph and
    measure them; visit the crash site,
    photograph it and collect data; and follow up
    with victims by interviewing them and/or
    reviewing medical records
   OUTCOME INFORMATION
• Collected in a variety of ways:
  – Abbreviated Injury Scale (AIS)
  – Injury Severity Score (ISS)
  – Maximum severity (dead, treated at hospital,
    treated at ED and released, etc.)
  – Work days lost
  –…
       SEVERITY OF CRASH
          INFORMATION
• In-depth crash investigation allows for
  careful assessment of the energy released
  during the crash (measured in a variety of
  ways)
• Carra’s 13-15
      Data files accessibility
• 1978-1987 (not quite the same system)
• 1988-1996: through NHTSA’s offices in
  Cambridge, MA
• 1997 onward: online
              Data structure
• For each year, the approximately 400
  variables collected are stored in 6 files, each
  containing the information from specific forms:
  – Accident file (accident record and accident
    event record), about 40 variables
  – General vehicle file (general vehicle form),
    some 200 variables
  – Exterior vehicle file (exterior vehicle form),
    some 125 variables
  – Interior vehicle file (interior vehicle form),
    some 150 variables
  – Occupant assessment file (occupant
    assessment form), some 125 variables
  – Occupant injury file (occupant injury form),
    some 50 variables

The per-file variable counts sum to more than 400 because
some variables are duplicated across files.
         Organizing the data
• The hierarchical database can then be
  manipulated to generate a new database
  with selected information on whichever
  unit of analysis one wants (e.g., crash,
  person)
• The files are available in SAS and flat file
  formats
           Linking the data
• One could merge files from one year
  using the identifying information shared
  across files (e.g., psu, case number,
  record number, version number, accident
  number, vehicle number, occupant
  number), and/or
• Append years to create a larger dataset
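The two operations above (merging files within a year, then appending years) can be sketched in plain Python. This is an illustration under assumptions: the key names `year`, `psu`, `caseno`, `vehno` and `occno`, the other column names, and all values are hypothetical; the real identifiers are defined in the coding manuals. A `year` field is included in the keys here on the assumption that case numbers can repeat across years.

```python
# Hypothetical key names; check the coding manual for the real identifiers.
CASE_KEY = ("year", "psu", "caseno")
VEHICLE_KEY = CASE_KEY + ("vehno",)


def index_by(rows, key):
    """Index a list of row dicts by the tuple of values of the key fields."""
    return {tuple(row[k] for k in key): row for row in rows}


def build_occupant_level(accidents, vehicles, occupants):
    """Attach accident and vehicle information to every occupant record."""
    acc = index_by(accidents, CASE_KEY)
    veh = index_by(vehicles, VEHICLE_KEY)
    merged = []
    for occ in occupants:
        row = {}
        row.update(acc[tuple(occ[k] for k in CASE_KEY)])
        row.update(veh[tuple(occ[k] for k in VEHICLE_KEY)])
        row.update(occ)  # occupant fields win if a name is duplicated
        merged.append(row)
    return merged


def append_years(*yearly_files):
    """Stack yearly occupant-level files into one larger dataset."""
    combined = []
    for rows in yearly_files:
        combined.extend(rows)
    return combined


# Made-up records for two years.
acc_1991 = [{"year": 1991, "psu": 2, "caseno": 15, "ratwgt": 120.5}]
veh_1991 = [{"year": 1991, "psu": 2, "caseno": 15, "vehno": 1, "curbwgt": 30}]
occ_1991 = [
    {"year": 1991, "psu": 2, "caseno": 15, "vehno": 1, "occno": 1, "age": 34},
    {"year": 1991, "psu": 2, "caseno": 15, "vehno": 1, "occno": 2, "age": 8},
]
acc_1992 = [{"year": 1992, "psu": 2, "caseno": 15, "ratwgt": 48.0}]
veh_1992 = [{"year": 1992, "psu": 2, "caseno": 15, "vehno": 1, "curbwgt": 27}]
occ_1992 = [{"year": 1992, "psu": 2, "caseno": 15, "vehno": 1, "occno": 1, "age": 51}]

occupant_file = append_years(
    build_occupant_level(acc_1991, veh_1991, occ_1991),
    build_occupant_level(acc_1992, veh_1992, occ_1992),
)
```

Each row of `occupant_file` is one occupant carrying the crash-level weight and the vehicle-level fields, so the unit of analysis is the person.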
     For your HWs and Exam
• We appended years 1991-2001
• We created an occupant-level file with
  selected information from accident, (all)
  vehicle, and occupant files.
    How to understand the data
• Every year, the agency produces a “Coding
  Manual”, an 800+ page document that outlines
  all the operational issues related to the system
  and the definitions of the variables. You can
  access the manuals at
  www-nass.nhtsa.dot.gov/NASS/CDS/DataColl
  (Note: take a peek if you want, NO NEED TO
  PRINT THEM)
[Figure placeholder: cover page of the Coding Manual, e.g., www-nass.nhtsa.dot.gov/NASS/CDS/DataColl/man1995.pdf]
                            II
• There is also a summary manual that lists
  all variables ever collected since 1988 and
  summarizes changes over time
[Figure placeholder: cover page of the summary manual, www-nass.nhtsa.dot.gov/NASS/Manuals/CDS8896.pdf]
           For example: Sex
• NASS CDS 1988-1994
  – Male 1
  – Female 2
  – Unknown 9
• NASS CDS 1995-2001
  – Male 1
  – Female, not pregnant 2
  – Female, 1st trimester pregnant 3
  – Female, 2nd trimester pregnant 4
  – Female, 3rd trimester pregnant 5
  – Female, pregnant, unknown trimester 6
  – Unknown 9
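When years are appended, a coding change like the one above has to be harmonized before analysis. The sketch below collapses both coding schemes into one three-level variable; the function name and the output labels are hypothetical, and the codes are taken from the lists above.

```python
FEMALE_CODES_1995_ON = {2, 3, 4, 5, 6}  # not pregnant, or pregnant (any trimester code)


def harmonize_sex(code, year):
    """Map a year-specific sex code to 'male', 'female' or 'unknown'."""
    if not 1988 <= year <= 2001:
        raise ValueError(f"year {year} is outside the range covered here")
    if code == 1:
        return "male"
    if code == 9:
        return "unknown"
    if year <= 1994:
        if code == 2:
            return "female"
    elif code in FEMALE_CODES_1995_ON:
        return "female"
    # Anything else is not a valid code for that year; flag it for cleaning.
    raise ValueError(f"unexpected code {code} for year {year}")
```

For example, `harmonize_sex(4, 1998)` returns `"female"`, while `harmonize_sex(4, 1990)` raises an error because code 4 did not exist before 1995. Raising instead of silently recoding is a design choice that surfaces data-entry problems during cleaning.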
 For example: Vehicle curb weight
• NASS CDS 1988-1992
  – In hundreds of pounds. E.g., 136 = 13,600 pounds
  – Special codes:
     000, less than 50 pounds
     010, less than 1,050 pounds
     135, 13,500 pounds or more
     999, unknown
• NASS CDS 1993-2001
  – In tens of kilograms. E.g., 46 = 460 kilograms
  – Special codes:
     045, less than 5 kg (until 1995); less than 454 kg since 1996
     610, 6,100 kg or more
     612, 6,124 kg or more (since 1996)
     unknown
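A change of units like this one must be resolved before years are pooled. The following minimal sketch converts curb weight to kilograms for either period. The function name is hypothetical, and treating 999 as "unknown" in both periods is an assumption. The other special codes (floor and ceiling values) are converted like ordinary values here, so they still need to be handled as censored values during cleaning.

```python
LB_TO_KG = 0.45359237  # exact definition of the pound in kilograms
UNKNOWN_CODE = 999     # assumed to mean "unknown" in both periods


def curb_weight_kg(code, year):
    """Convert a coded curb weight to kilograms, or None if unknown."""
    if code == UNKNOWN_CODE:
        return None
    if 1988 <= year <= 1992:
        return code * 100 * LB_TO_KG  # coded in hundreds of pounds
    if 1993 <= year <= 2001:
        return code * 10.0            # coded in tens of kilograms
    raise ValueError(f"year {year} is outside the range covered here")


weight_1990 = curb_weight_kg(30, 1990)  # 3,000 pounds expressed in kg
weight_1995 = curb_weight_kg(46, 1995)  # the slide's example: 460 kg
```

Without such a conversion, a code of 30 would mean about 1,361 kg in 1990 but 300 kg in 1995.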
   BEFORE YOU ANALYZE THE
            DATA
• Understand it
• Create the file you need
• Clean the variables
       Take home message
• Know your data
• Get the reference manuals
• Get a contact person who is very
  knowledgeable about the data to assist.

				