Crash the Crash Test Dummy

					                         Ressler 0

Crash Test
An Analysis of Individual Factors in Fatal Car Crashes

ESE 302
May 7, 2004
Alexandra Ressler

Table of Contents
Introduction
  Crash’s Friends
  Data Selection
  Logistic Regression
  Assumptions
  A Brief Summary of Findings

Logistic Regression of All Data

Logistic Regression of Driver Data

Conclusions
  Crash’s Friends Revisited
  Questions Raised for Further Study

Appendix A: Data Selection Criteria
Appendix B: Histograms for All Data
Appendix C: Problems with Airbags

Crash the crash test dummy is vitally interested in which individual factors have the greatest effect on his chance of survival in a fatal car crash. Car crashes are the leading cause of accidental death in the United States, and in 2002, there was a car crash fatality every 12 minutes and a disabling injury every 14 seconds. In that year, “motor vehicle crashes were the leading cause of death for people ages 1 to 33” [1].

Leading Causes of Unintentional Injury Deaths
United States, 2002

Cause                                                            Deaths
Motor Vehicle                                                    44,000
Poisoning                                                        15,700
Falls                                                            14,500
Suffocation by Inhalation or Ingestion of Food or Other Object
Drowning                                                         1

Crash simulates a lot of fatal car crashes, but he doesn’t get to pick the circumstances of each crash, such as the causes, environmental conditions, and vehicle characteristics. He does know that due to a limited testing budget, he’ll only be tested in the most common types of private automobiles, including cars, utility vehicles, vans, and pickup trucks. He also knows that due to limited testing equipment, he only has to worry about impacts from the front, left side, right side, and rear. Given this information, Crash would like to know whether he can improve his predicted chance of survival by emulating any of his friends, each of whom personifies a particular individual trait.

[1] National Safety Council.

Crash’s Friends
Seatbelt Sid: Sid always uses a restraining system, whether it’s a lap belt, a shoulder belt, or a child safety seat (in the case of Sid Jr.). He’s always telling Crash to buckle up. Is he being sanctimonious, or does he have a point?

Crash Jr.: Okay, he can’t drive, but just because he’s not behind the wheel doesn’t mean Crash Jr.’s out of harm’s way. Or does it? And will he be any safer when he’s as old as Crash and has his own license? In the meantime, he’d love to ride shotgun (as it’s the cool thing to do), but whether Crash should let him depends on Backseat Bob.

Crashella: Does Crashella give new meaning to the term “femme fatale”? Or is she better off than her bulkier brother? And who’s a safer driver, anyway?

Driver Dan: He’s cool, he’s hot, and he’s driving, thank you. Driver Dan loves the freedom of the road and the wheel at his fingertips, but would he be safer if he let his girlfriend Crashella drive?

Backseat Bob: Forget backseat driver, Bob’s a backseat passenger! He won’t touch the passenger seat, and he doesn’t think anyone else should either, especially Crash Jr. Should Crash listen to Bob and put Crash Jr.’s reputation on the critical list, or would that be overprotective parenting at its worst?

Airbag Al: Al’s a bit of an airhead, but he claims there’s less to damage that way. Should Crash listen to his bubbleheaded philosophy, or is Al just a windbag?



Data Selection
Crash took the data for his analysis from the Fatality Analysis Reporting System’s (FARS) 2002 Case Listings [8]. His data set included information for 56,833 individuals involved (but not necessarily killed or injured) in fatal car crashes in 2002. These individuals represented 35,783 vehicles and 25,765 fatal crashes [9]. In general, Crash took the following data from each individual, converted it to binary data (with the exception of Age), and sorted it as follows:

Variable                                  Code  Meaning
Impact Point-Principal                    1     Front
                                          2     Left
                                          3     Right
                                          4     Rear
Age                                             Not sorted; left as a continuous numerical variable.
Air Bag                                   1     Airbag Deployed
                                          0     No Airbag Deployed
Injury Severity                           0     FATAL
                                          1     NOT
Person Type                               0     Driver
                                          1     Passenger
Restraint System-Use                      0     No Restraint System-Use
                                          1     Restraint System Used
Seating Position                          0     Front Seat
                                          1     Second Seat
Gender                                    0     Male
                                          1     Female
Body Type (refers to the vehicle          1     Automobiles
body type)                                2     Utility Vehicles
                                          3     Vans
                                          4     Pickup Trucks
All independent variables are treated as continuous, while the dependent variable Injury Severity is nominal. Impact Point-Principal and Body Type are not actually used in the regression analysis, but serve as important selection criteria. The remaining variables used in the regression analysis, apart from Age, are Bernoulli variables, to simplify the analysis [10].

[8] Fatality Analysis Reporting System’s Web-Based Encyclopedia.
[9] For the exact criteria used to select individuals, please see Appendix A.
[10] And because crash test dummies like dummy variables. Were these variables not Bernoulli and treated as nominal, the logistic regressions would calculate an estimate for every possible outcome of the nominal variable; for

Logistic Regression [11]
As the dependent variable Injury Severity is nominal, Crash cannot use multiple regression, and consequently cannot determine prediction intervals, r-squared values, or variance inflation factors. Instead, Crash uses logistic regression, which is designed for Bernoulli dependent variables and predicts the probability of an outcome rather than the outcome itself. Under logistic regression, the parameter estimate for each independent variable is called the maximum-likelihood estimate (as opposed to the least-squares estimate for multiple regression). As a set, the maximum-likelihood estimates are such that the given observed values are most likely to occur.


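As an example of how the estimates work, let’s say the estimate for Age is -0.02. Each estimate acts on the log odds of being a fatality, so for each additional year of age the individual’s predicted odds of being a fatality are multiplied by e^-0.02, or about 0.98 (a decrease of roughly two percent in the odds, not in the probability itself). In the case of a binary variable, if Restraint System-Use has an estimate of -0.78, then the use of a restraint system multiplies the individual’s predicted odds of being a fatality by e^-0.78, or about 0.46 (a decrease of roughly 54% in the odds). The resulting change in probability depends on where the individual starts.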
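
Because logistic-regression estimates act on the log odds, one way to read the example estimates above is to exponentiate them. The following is a minimal sketch using the illustrative values -0.02 and -0.78 from the text; the function names and the 50% baseline probability are assumptions made for the example, not part of Crash's analysis.

```python
import math


def odds_multiplier(estimate: float) -> float:
    """Factor by which the odds change for a one-unit increase in the variable."""
    return math.exp(estimate)


def shifted_probability(baseline_probability: float, estimate: float) -> float:
    """Probability after a one-unit increase, starting from a baseline probability."""
    baseline_odds = baseline_probability / (1.0 - baseline_probability)
    new_odds = baseline_odds * odds_multiplier(estimate)
    return new_odds / (1.0 + new_odds)


# Illustrative estimates taken from the example in the text.
age_factor = odds_multiplier(-0.02)        # odds multiplier per extra year of age
restraint_factor = odds_multiplier(-0.78)  # odds multiplier for using a restraint

# Hypothetical baseline: someone with a 50% predicted chance of being a fatality.
belted_probability = shifted_probability(0.50, -0.78)

print(f"Age: odds multiplied by {age_factor:.3f}")
print(f"Restraint: odds multiplied by {restraint_factor:.3f}")
print(f"50% baseline becomes {belted_probability:.3f} with a restraint")
```

The same estimate moves a 50% baseline much further in absolute terms than it moves a 5% or 95% baseline, which is why a single "percent change in probability" cannot be read directly off an estimate.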

The χ2 value for a maximum-likelihood estimate is its equivalent of a least-squares estimate’s F value, and equals the square of its standardized value (the estimate divided by its standard error) under the null hypothesis. The greater the χ2 value, the more significant the variable, or the more it maximizes the chances of having the given observed values occur. The p-value for each estimate is represented as Prob>ChiSq, or the probability that one would get a χ2 value at least this large if the null hypothesis (estimate=0) were true; the lower the p-value, the more significant the variable. Other tests for variable significance are the Wald Tests for Effects, which test each variable from its estimate and standard error, and likelihood-ratio tests, in which one runs the regression with and without the variable and compares the results to determine significance. Crash uses χ2 values to compare the relative significance of independent variables, as the Wald Tests for Effects produce the same relative significances among independent variables in his regressions.

example, each seating position would be treated as a separate variable. N.B. The independent variables must be
[11] Information from “Notes on Logistic Regression” by Tony E. Smith and the JMPIN 4 online manual.

It is important to note that as n grows, the standard errors of the parameter estimates shrink (roughly in proportion to 1/√n), so even a tiny effect produces a large χ2 value, and the χ2 values themselves become less relevant in the absolute sense. In other words, “the scope of Chi-square statistics is limited when n becomes very large, the smallest departure from the target becoming statistically significant” [12]. In regressions with very large sample sizes, χ2 values are appropriate to compare the relative significance among variables and regressions, but the values themselves do not adhere to the normal absolute standards. For example, a value of 4 may reflect a sizeable effect, and count as “reasonably good,” in a sample with n=100, but in a sample with n=10,000 the same value can arise from an effect too small to matter in practice.
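
The sample-size effect described above can be illustrated with a toy calculation. This sketch is not part of Crash's analysis: it uses a simple one-degree-of-freedom test of a single proportion, and the 52% observed rate and 50% null rate are made-up numbers chosen only to show how the same small departure yields a larger χ2 as n grows.

```python
def proportion_chi_square(n: int, observed_rate: float, null_rate: float) -> float:
    """Chi-square statistic (1 df) for testing an observed proportion against a null rate."""
    standard_error = (null_rate * (1.0 - null_rate) / n) ** 0.5
    z = (observed_rate - null_rate) / standard_error
    return z * z


# Hypothetical departure: 52% observed against a 50% null, at three sample sizes.
# 56,833 is the number of individuals in Crash's data set.
chi_square_by_n = {
    n: proportion_chi_square(n, 0.52, 0.50) for n in (100, 10_000, 56_833)
}

for n, value in chi_square_by_n.items():
    print(f"n={n:>6}: chi-square = {value:.2f}")
```

The departure from the null is identical in all three cases; only n changes, and the statistic scales in direct proportion to it.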

To examine goodness of fit, Crash uses two metrics: the ChiSquare from the Whole Model Test, which compares the regression model to a model with all parameters but the intercepts removed, and the success rate. The success rate is calculated by rounding each individual’s predicted chance of being a fatality to predict whether or not he or she was a fatality, then comparing said prediction to the individual’s actual injury severity. The success rate is the percentage of accurate predictions. Since the goal of all regressions is to find the mix of independent variables that allows the most accurate predictions, success rate is obviously a better measure of goodness of fit.
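
The success-rate calculation described above can be sketched in a few lines. The probabilities and outcomes below are invented for illustration, and treating a predicted probability of exactly 0.5 as a predicted fatality is an assumption; the text does not say how ties are rounded.

```python
def success_rate(predicted_probabilities, was_fatality):
    """Share of individuals whose rounded prediction matches the actual outcome."""
    # Assumption: a predicted probability of 0.5 or more counts as a predicted fatality.
    predictions = [p >= 0.5 for p in predicted_probabilities]
    hits = sum(pred == actual for pred, actual in zip(predictions, was_fatality))
    return hits / len(was_fatality)


# Hypothetical predicted chances of being a fatality and the actual outcomes.
example_probabilities = [0.81, 0.35, 0.62, 0.10, 0.55]
example_outcomes = [True, False, False, False, True]

example_rate = success_rate(example_probabilities, example_outcomes)
print(f"Success rate: {example_rate:.2f}")
```

Here four of the five rounded predictions match the actual outcome; the third individual is predicted to be a fatality but survives.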


Assumptions

•   Crash assumes that his independent variables are the most significant individual factors of those included in the FARS case listings. It is possible that he’s neglecting more significant but less intuitive factors (e.g., perhaps he should consider his friend Drake the Drunk or Donny the Designated Driver, though making alcohol level a selection criterion would severely limit his sample size).

•   Crash assumes that he hasn’t inadvertently excluded any variable-specific categories that are significant within his chosen variables [13].

•   Crash assumes that his individuals are independent within his chosen variables. This is a faulty assumption for many reasons, e.g. when two or more people come from the same car, they cannot have independent seating positions and at most one can be the driver.

•   Crash assumes that his variables are independent within each individual (i.e., no multicollinearities). This is a very faulty assumption, as Person Type dictates Seating Position (the driver must be in the front) and Airbag availability is also limited to the front. Furthermore, Person Type influences Age (almost no drivers are younger than 16). This flaw will be addressed within the regressions.

•   The Gauss-Markov assumptions of linearity, independence, and homoscedasticity are not used for logistic regression. Though logistic regression “does not have the requirements of the independent variables to be normally distributed, linearly related, nor equal variance within each group (Tabachnick and Fidell, 1996, p575)”, it requires large sample groups [14].

[13] For example, Person Type 3. For more information, please see Appendix A.

A Brief Summary of Findings
Using logistic regressions, Crash finds that he can accurately predict whether a person was a fatality about 70% of the time. As long as he includes Restraint System as an independent variable, this is true whether he looks at all individuals, just drivers, or just passengers. This figure, while moderately disappointing compared to other logistic regressions (where the mid-80s is considered “reasonably good” and 90% is considered “quite respectable” [15]), is nonetheless impressive when taken in context. The factors that affect a person’s survival in a fatal car crash are not limited to individual variables, but also extend to the causes of the crash, environmental conditions, and vehicle characteristics. If one of the individuals in the sample drove their car off a hundred-foot cliff, the data would register whether the driver used a restraint system, but not that he or she had almost no chance of survival either way. The logistic regression models based on individual variables are limited in their prediction accuracy because they ignore significant external factors that affect chance of survival. When viewed in this light, Crash’s 70% success rate is respectable.

[15] Based on the “Analysis of Changing Religious Perspectives” report and “Notes on Logistic Regression” by Tony E. Smith on the class website.

Logistic Regression of All Data
Crash first examines the entire data set. His first regression includes all variables.

Nominal Logistic Fit for Injury Severity SORTED

Whole Model Test
  Model          -LogLikelihood    DF    ChiSquare    Prob>ChiSq
  Difference           4885.923     6     9771.846        0.0000
  Full                33264.685
  Reduced             38150.608

  RSquare (U)                    0.1281
  Observations (or Sum Wgts)     56833
  Converged by Gradient

Lack Of Fit
  Source           DF    -LogLikelihood    ChiSquare
  Lack Of Fit    1739          1399.472     2798.944
  Saturated      1745         31865.213    Prob>ChiSq
  Fitted            6         33264.685       <.0001

Parameter Estimates
  Term                              Estimate     Std Error    ChiSquare    Prob>ChiSq
  Intercept                                      0.0249661        10.23        0.0014
  Restraint System-Use SORTED     -1.5615478      0.020127       6019.4        0.0000
  Age                             0.02082281     0.0004871       1827.1        0.0000
  Gender SORTED                   0.10701923     0.0200556        28.47        <.0001
  Person Type SORTED              -0.4761275     0.0233042       417.42        <.0001
  Seating Position SORTED 1v2     -0.4809617     0.0340468       199.56        <.0001
  Airbag SORTED                   0.14609065      0.021654        45.52        <.0001
  For log odds of FATAL/NOT

Effect Wald Tests
  Source                          Nparm    DF    Wald ChiSquare    Prob>ChiSq
  Restraint System-Use SORTED         1     1        6019.39665        0.0000
  Age                                 1     1        1827.09479        0.0000
  Gender SORTED                       1     1         28.474345        0.0000
  Person Type SORTED
  Seating Position SORTED 1v2
  Airbag SORTED                       1     1        45.5165284        0.0000

The statistics report has several notable features. First of all, the Whole Model Test has an astronomical ChiSquare value of 9,771.846. Rather than indicating a ludicrously good fit, the order of magnitude reflects the unusually monstrous sample size, which shrinks the standard errors and inflates every χ2 value. Consequently, all of Crash’s logistic regressions will have enormous ChiSquare values, which therefore cannot be used to determine absolute significance, but are nonetheless helpful in determining relative significance.
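
The headline numbers in the report above can be reproduced from its other entries, which is a useful sanity check when reading this kind of output. The sketch below assumes the standard definitions: the Whole Model ChiSquare is twice the difference in -LogLikelihood between the reduced and full models, RSquare (U) is that difference divided by the reduced model's -LogLikelihood, and each parameter's ChiSquare is its estimate divided by its standard error, squared. All input values are copied from the report; the variable names are only illustrative.

```python
# -LogLikelihood values copied from the Whole Model Test.
full_neg_loglik = 33264.685
reduced_neg_loglik = 38150.608

difference = reduced_neg_loglik - full_neg_loglik
whole_model_chi_square = 2.0 * difference
r_square_u = difference / reduced_neg_loglik


def wald_chi_square(estimate: float, std_error: float) -> float:
    """Square of the estimate's standardized value."""
    return (estimate / std_error) ** 2


# Estimates and standard errors copied from the Parameter Estimates table.
restraint_chi_square = wald_chi_square(-1.5615478, 0.020127)
age_chi_square = wald_chi_square(0.02082281, 0.0004871)

print(f"Whole Model ChiSquare: {whole_model_chi_square:.3f}")
print(f"RSquare (U): {r_square_u:.4f}")
print(f"Restraint ChiSquare: {restraint_chi_square:.1f}")
print(f"Age ChiSquare: {age_chi_square:.1f}")
```

The recomputed parameter ChiSquare values agree with the report only to within the rounding of the printed standard errors, so small differences in the last digits are expected.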

Having navigated around this first pothole, Crash hits another bump in the road when he notes that Airbag has a positive estimate, meaning it decreases one’s chances of survival. Since this seems very counterintuitive, ceteris paribus, Crash suspects the multicollinearity among Person Type, Airbag, and Seating Position mentioned earlier. By fitting Person Type and Airbag by Seating Position, Crash realizes that all drivers sit in the front and that no one in the second row of seats has an airbag, meaning these three variables are inescapably collinear.
Contingency Analysis of Person Type SORTED By Seating Position SORTED 1v2

[Mosaic plot: Person Type SORTED (vertical axis, 0.00 to 1.00) by Seating Position SORTED 1v2 (horizontal axis, 0 and 1)]

Contingency Table
  Seating Position                Person Type   Person Type
  SORTED 1v2                      SORTED 0      SORTED 1        Total
  0             Count                34270         13426        47696
                Total %              60.30         23.62        83.92
                Col %               100.00         59.50
                Row %                71.85         28.15
  1             Count                    0          9137         9137
                Total %               0.00         16.08        16.08
                Col %                 0.00         40.50
                Row %                 0.00        100.00
  Total         Count                34270         22563        56833
                Total %              60.30         39.70

Tests
  Source        DF       -LogLike    RSquare (U)
  Model          1       9830.790         0.2575
  Error      56831      28348.409
  C. Total   56832      38179.199
  N          56833

  Test                  ChiSquare    Prob>ChiSq
  Likelihood Ratio       19661.58        0.0000
  Pearson                16536.34        0.0000

  Fisher's Exact Test        Prob
  Left                     1.0000
  Right                    0.0000
  2-Tail                   0.0000

  Kappa      Std Err
  0.45077    0.003462
  Kappa measures the degree of agreement.

Contingency Analysis of Airbag SORTED By Seating Position SORTED 1v2

[Mosaic plot: Airbag SORTED (vertical axis, 0.00 to 1.00) by Seating Position SORTED 1v2 (horizontal axis, 0 and 1)]

Contingency Table
  Seating Position                Airbag        Airbag
  SORTED 1v2                      SORTED 0      SORTED 1        Total
  0             Count                32754         14942        47696
                Total %              57.63         26.29        83.92
                Col %                78.19        100.00
                Row %                68.67         31.33
  1             Count                 9137             0         9137
                Total %              16.08          0.00        16.08
                Col %                21.81          0.00
                Row %               100.00          0.00
  Total         Count                41891         14942        56833
                Total %              73.71         26.29

Tests
  Source        DF       -LogLike    RSquare (U)
  Model          1       3087.878         0.0943
  Error      56831      29652.442
  C. Total   56832      32740.320
  N          56833

  Test                  ChiSquare    Prob>ChiSq
  Likelihood Ratio       6175.756        0.0000
  Pearson                3883.383        0.0000

  Fisher's Exact Test        Prob
  Left                     0.0000
  Right                    1.0000
  2-Tail                   0.0000

  Kappa       Std Err
  -0.24926    0.001821
  Kappa measures the degree of agreement.
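
The structural zeros behind this collinearity are easy to see by recomputing the row percentages from the cell counts. The sketch below copies the counts from the two contingency tables above; the dictionary layout and the labels ("front", "second", "driver" and so on) are illustrative choices based on the coding in the Data Selection table, not JMP output.

```python
def row_shares(counts_by_row):
    """Convert each row of counts into shares of that row's total."""
    shares = {}
    for row_label, counts in counts_by_row.items():
        row_total = sum(counts.values())
        shares[row_label] = {key: value / row_total for key, value in counts.items()}
    return shares


# Counts copied from the contingency tables (rows are seating positions).
person_type_by_seat = {
    "front": {"driver": 34270, "passenger": 13426},
    "second": {"driver": 0, "passenger": 9137},
}
airbag_by_seat = {
    "front": {"no_airbag": 32754, "airbag": 14942},
    "second": {"no_airbag": 9137, "airbag": 0},
}

person_shares = row_shares(person_type_by_seat)
airbag_shares = row_shares(airbag_by_seat)

print(f"Drivers among front-seat occupants: {person_shares['front']['driver']:.4f}")
print(f"Drivers among second-seat occupants: {person_shares['second']['driver']:.4f}")
print(f"Airbag deployed, front seat: {airbag_shares['front']['airbag']:.4f}")
print(f"Airbag deployed, second seat: {airbag_shares['second']['airbag']:.4f}")
```

Knowing that someone sits in the second row fixes both Person Type (passenger) and Airbag (none), so those two variables carry no independent information for that group.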

Additionally, Crash realizes that drivers are almost all 16 years of age or older, producing another collinearity and further degrading the integrity of his regression.

To compensate for these developments, Crash reruns his regression multiple times with different sets of variables [16]. His final results are summarized in the following table:

[16] A dummy’s version of stepwise regression, which JMPIN 4 does not allow for logistic regressions. Crash can approximate stepwise regression by not sorting Injury Level and treating it as a continuous variable, but he does not need to as the number of variables is small enough that he can run the necessary regressions on his own.

Restraint   Age   Gender   Person   Seat       Air Bag   χ2 for WMT   Success
                           Type     Position                          Rate
X           X     X        X        X          X         9771.846     0.706421
X           X     X        X        X                    9726.375     NA
X           X     X        X                   X         9568.723     NA
X           X     X                 X          X         9346.831     NA
X           X     X        X                             9486.005     0.706649
X           X     X                 X                    9274.496     NA
X           X     X                            X         8637.149     NA
X           X     X                                      8414.179     NA
X           X              X                             9453.883     0.705945
X           X                                            8412.845     0.691236
X                                                        5237.82      0.669963
            X                                            2187.603     0.626995
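
The marginal contribution of a variable to the success rate can be read off the table by subtracting the success rate of the regression without it from the rate of the regression with it. The sketch below copies four success rates from the table; the tuple keys and the helper name are illustrative, and the result is expressed in percentage points rather than as a percent of the rate.

```python
# Success rates copied from the table, keyed by the variables in each regression.
success_rate_by_model = {
    ("Restraint", "Age", "Gender", "Person Type"): 0.706649,
    ("Restraint", "Age", "Person Type"): 0.705945,
    ("Restraint", "Age"): 0.691236,
    ("Restraint",): 0.669963,
}


def gain_in_points(with_variable, without_variable):
    """Difference in success rate between two regressions, in percentage points."""
    difference = success_rate_by_model[with_variable] - success_rate_by_model[without_variable]
    return 100.0 * difference


gender_gain = gain_in_points(
    ("Restraint", "Age", "Gender", "Person Type"),
    ("Restraint", "Age", "Person Type"),
)
person_type_gain = gain_in_points(
    ("Restraint", "Age", "Person Type"),
    ("Restraint", "Age"),
)

print(f"Gender adds {gender_gain:.4f} percentage points")
print(f"Person Type adds {person_type_gain:.4f} percentage points")
```

Gender's contribution is a small fraction of a percentage point, while Person Type adds close to one and a half points over Restraint and Age alone.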
Crash notes that the regression that includes all the variables has nearly the best success rate (0.706421, against 0.706649 without Seat Position and Air Bag), but this is still not the best regression because of its multicollinearities. Given that Person Type alone produces a higher ChiSquare than Seat Position and Air Bag combined (9486.005>9346.831), Crash decides to examine Air Bag in a separate driver regression, and to examine Seat Position in a separate passenger regression. Furthermore, since Gender contributes a negligible 0.0704 percentage points (0.706649 versus 0.705945) to the success rate, Crash discards this variable and selects Restraint System, Age, and Person Type as the most significant factors in determining the probability that an individual in a fatal car crash is a fatality [17]. This regression represents the most balanced option between the contradictory goals of increasing success rate and eliminating multicollinearity.

Nominal Logistic Fit for Injury Severity SORTED

Whole Model Test
  Model          -LogLikelihood    DF    ChiSquare    Prob>ChiSq
  Difference           4726.941     3     9453.883        0.0000
  Full

  RSquare (U)                    0.1239
  Observations (or Sum Wgts)     56833
  Converged by Gradient

Lack Of Fit
  Source           DF    -LogLikelihood    ChiSquare
  Lack Of Fit                   650.884     1301.768
                              32772.782    Prob>ChiSq
  Fitted            3         33423.666       <.0001

Parameter Estimates
  Term                              Estimate     Std Error    ChiSquare    Prob>ChiSq
  Intercept                       -0.0829646     0.0236603        12.30        0.0005
  Restraint System-Use SORTED      -1.518155     0.0197314       5919.9        0.0000
  Age                              0.0222467     0.0004792       2155.3        0.0000
  Person Type SORTED              -0.6404254      0.020073       1017.9        <.0001
  For log odds of FATAL/NOT

Effect Wald Tests
  Source                          Nparm    DF    Wald ChiSquare    Prob>ChiSq
  Restraint System-Use SORTED         1     1        5919.94559        0.0000
  Age                                 1     1        2155.31517        0.0000
  Person Type SORTED                  1     1        1017.91277        0.0000

17 Crash tolerates the collinearity between Age and Person Type because the marginal benefit of the latter to success
rate is almost 1.5% and because its ChiSquare is almost half as large as Age’s, making it undesirable to discard.

             Logistic Regression of Driver Data
            By focusing on the drivers in his data set, Crash can get a better idea of the significance
of Air Bag when isolated from its multicollinearities with other individual factors. He can
investigate a possible link between safe driving and gender, and he can determine how the
collinearity between Person Type and Age affects the model from the driver perspective. His
initial regression produces this statistics report:

Nominal Logistic Fit for Injury Severity SORTED
  Whole Model Test
   Model         -LogLikelihood      DF ChiSquare Prob>ChiSq
   Difference        3086.346         4    6172.692       0.0000
   RSquare (U)
   Observations (or Sum Wgts)
  Converged by Objective
  Lack Of Fit
   Source             DF -LogLikelihood ChiSquare
   Lack Of Fit       652        531.338  1062.676
   Saturated                  20032.666 Prob>ChiSq
   Fitted                     20564.004     <.0001
  Parameter Estimates
   Term                                Estimate     Std Error ChiSquare Prob>ChiSq
   Intercept                         0.03350614     0.0313377      1.14     0.2850
   Restraint System-Use SORTED       -1.8182414     0.0260371    4876.6     0.0000
   Age                               0.02168777     0.0006449    1131.1     <.0001
   Gender SORTED                     0.15882526     0.0261276     36.95     <.0001
   Air Bag SORTED                    0.15105195     0.0252463     35.80     <.0001
  For log odds of FATAL/NOT
  Effect Wald Tests
   Source                         Nparm      DF Wald ChiSquare Prob>ChiSq
   Restraint System-Use SORTED         1      1     4876.60705     0.0000
   Age                                 1      1     1131.11759     0.0000
   Gender SORTED                       1      1     36.9521082     0.0000
   Air Bag SORTED                      1      1     35.7978096     0.0000

            Even within the scope of the driver, Air Bag is still the least significant variable and has a
positive estimate. Suspecting further collinearities, Crash runs a contingency analysis on Air Bag
and Restraint System to see if perhaps people who use restraint systems are more safety-sensitive
in general, and therefore more likely to have a functional airbag.

            Crash also runs a contingency analysis on Injury Severity and Gender to investigate whether
males or females have a better chance of survival in the driver’s seat and therefore might be
considered “safer” drivers.
Contingency Analysis of Air Bag SORTED By Restraint System-Use SORTED
  [Mosaic Plot: Air Bag SORTED (0.00 to 1.00) by Restraint System-Use SORTED (0, 1)]
  Contingency Table (rows: Restraint System-Use SORTED; columns: Air Bag SORTED)
                          Air Bag 0    Air Bag 1        Total
   Restraint 0   Count         8554         3776        12330
                 Total %      24.96        11.02        35.98
                 Col %        37.61        32.77
                 Row %        69.38        30.62
   Restraint 1   Count        14192         7748        21940
                 Total %      41.41        22.61        64.02
                 Col %        62.39        67.23
                 Row %        64.69        35.31
   Total         Count        22746        11524        34270
                 Total %      66.37        33.63
  Tests
   Source        DF     -LogLike  RSquare (U)
   Model          1       39.178       0.0018
   Error      34268    21843.275
   C. Total   34269    21882.453
   N          34270

   Test                 ChiSquare  Prob>ChiSq
   Likelihood Ratio        78.356      <.0001
   Pearson                 77.795      <.0001

   Fisher's Exact Test       Prob
   Left                    1.0000
   Right                   <.0001
   2-Tail                  <.0001

   Kappa       Std Err
   0.039578   0.004443
   Kappa measures the degree of agreement.

Contingency Analysis of Injury Severity SORTED By Gender SORTED
  [Mosaic Plot: Injury Severity SORTED (0.00 to 1.00) by Gender SORTED (0, 1)]
  Contingency Table (rows: Gender SORTED; columns: Injury Severity SORTED)
                          Injury 0     Injury 1         Total
   Gender 0      Count        12722        11082        23804
                 Total %      37.12        32.34        69.46
                 Col %        68.89        70.13
                 Row %        53.44        46.56
   Gender 1      Count         5746         4720        10466
                 Total %      16.77        13.77        30.54
                 Col %        31.11        29.87
                 Row %        54.90        45.10
   Total         Count        18468        15802        34270
                 Total %      53.89        46.11
  Tests
   Source        DF     -LogLike  RSquare (U)
   Model          1        3.106       0.0001
   Error      34268    23647.243
   C. Total   34269    23650.350
   N          34270

   Test                 ChiSquare  Prob>ChiSq
   Likelihood Ratio         6.213      0.0127
   Pearson                  6.209      0.0127

   Fisher's Exact Test       Prob
   Left                    0.0066
   Right                   0.9939
   2-Tail                  0.0131

   Kappa       Std Err
   -0.01275   0.005111
   Kappa measures the degree of agreement.

        In both cases, though the data showed a slight disparity (suggesting those who use
restraints are more likely to have functional airbags and that female drivers are less likely to
be fatalities), the disparity was too slight to support a definite conclusion. If Crash analyzed
several years’ worth of data and found similar slight disparities each time, then that might allow
him to reasonably identify relationships between the above variables, but since his data is only
from 2002, the slight inequalities in weight could very well be normal variation.
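
As a rough cross-check, the Pearson ChiSquare values in the two contingency analyses can be recomputed from the cell counts alone. The sketch below is illustrative and is not part of Crash's JMP workflow; the function name pearson_chi_square is hypothetical. Its results should land close to the reported 77.795 and 6.209.

```python
def pearson_chi_square(table):
    """Pearson chi-square statistic for a two-way table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the row and column variables were independent.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat

# Rows: Restraint System-Use 0/1; columns: Air Bag 0/1 (counts from the report).
air_bag_by_restraint = [[8554, 3776], [14192, 7748]]
# Rows: Gender 0/1; columns: Injury Severity 0/1 (counts from the report).
injury_by_gender = [[12722, 11082], [5746, 4720]]

chi_air_bag = pearson_chi_square(air_bag_by_restraint)
chi_gender = pearson_chi_square(injury_by_gender)
print(round(chi_air_bag, 3), round(chi_gender, 3))
```

With over 34,000 drivers, even small differences in the row percentages produce a sizeable statistic, which is one reason to read these values alongside the very small RSquare (U) figures in the same reports.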

        Having failed to identify relationships involving Air Bag and Restraint System or Gender
and Injury Severity, Crash moves on to a known culprit: Age. The histogram and normal quantile
plot for Age are summarized below. It is plainly evident that the number of drivers drops to nearly
zero below 16 and that the distribution’s tails are not normal. In particular, the plot’s residuals
are positive for younger drivers (age less than 21) and older drivers (age greater than 50). Crash
posits that Age’s significance will be dramatically reduced by its lower cutoff point.

Distributions
  [Histogram and Normal Quantile Plot of Age, driver data]
  Quantiles
   100.0%  maximum     97.000
   99.5%               89.000
   97.5%               83.000
   90.0%               70.000
   75.0%   quartile    52.000
   50.0%   median      37.000
   25.0%   quartile    24.000
   10.0%               19.000
   2.5%                16.000
   0.5%                16.000
   0.0%    minimum      7.000
  Moments
   Mean                40.03274
   Std Dev             19.00742
   Std Err Mean
   upper 95% Mean
   lower 95% Mean      39.83149
   N                      34270

        To test this hypothesis, Crash runs a series of regressions whose results are summarized
in the table below:

            Restraint   Age   Gender   Air Bag   χ2 for WMT   Success Rate
            X           X     X        X         6172.692     0.701401
            X           X     X                  6136.883     0.701109
            X           X              X         6135.714     0.700263
            X           X                        6094.431     0.700671
            X                                    4906.507     0.69005
                        X                        633.8483     0.573154

        Both Gender and Air Bag have a negligible impact on success rate, but whereas Gender

always increases it, the inclusion of Air Bag increases it if Gender is included, but decreases it if

Gender isn’t. The fact that Air Bag can actually decrease Success Rate if included and its

general failure to be significant suggest that it is not properly viewed within a regression of

individual variables. In other words, Air Bag is likely closely related to external variables such

as Impact Points and Most Harmful Events in the sense that the worse the fatal accident is, the

more likely an air bag will be deployed. Consequently, Air Bag is a poor predictor within a

context of individual variables18, and Crash eliminates it and Gender from his final regression.
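
The comparison above can be reproduced directly from the summary table. The following is a minimal sketch; the dictionary layout and the helper name marginal_effect are illustrative choices and do not come from Crash's analysis. It computes how much each variable changes the success rate when added to a given base model.

```python
# Success rates from the driver regression summary table,
# keyed by the set of variables included in each regression.
success_rate = {
    frozenset({"Restraint", "Age", "Gender", "Air Bag"}): 0.701401,
    frozenset({"Restraint", "Age", "Gender"}): 0.701109,
    frozenset({"Restraint", "Age", "Air Bag"}): 0.700263,
    frozenset({"Restraint", "Age"}): 0.700671,
}

def marginal_effect(variable, base):
    """Change in success rate when `variable` is added to the model `base`."""
    base = frozenset(base)
    return success_rate[base | {variable}] - success_rate[base]

air_bag_with_gender = marginal_effect("Air Bag", {"Restraint", "Age", "Gender"})
air_bag_without_gender = marginal_effect("Air Bag", {"Restraint", "Age"})
gender_with_air_bag = marginal_effect("Gender", {"Restraint", "Age", "Air Bag"})
gender_without_air_bag = marginal_effect("Gender", {"Restraint", "Age"})

for name, value in [
    ("Air Bag, Gender included", air_bag_with_gender),
    ("Air Bag, Gender excluded", air_bag_without_gender),
    ("Gender, Air Bag included", gender_with_air_bag),
    ("Gender, Air Bag excluded", gender_without_air_bag),
]:
    print(f"{name}: {value:+.6f}")
```

Every one of these changes is well under a fifth of a percentage point, which is consistent with treating both variables as negligible.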

As predicted, Age suffers considerably from its lower cutoff point, to the extent that Restraint
System alone is almost eight times as significant as Age alone, and has a superior success rate by
almost 12%. Interestingly, Crash notes that the WMT χ2 for the combined regression is far greater
than the sum of its parts (6094.431 - (4906.507+633.848) = 554.076), suggesting that Age and
Restraint System complement each other strongly19.

Nominal Logistic Fit for Injury Severity SORTED
  Whole Model Test
   Model          -LogLikelihood      DF ChiSquare Prob>ChiSq
   Difference         3047.215         2    6094.431       0.0000
   Full              20603.134
   Reduced           23650.350
   RSquare (U)
   Observations (or Sum Wgts)
  Converged by Gradient
  Lack Of Fit
   Source              DF -LogLikelihood ChiSquare
   Lack Of Fit        170        283.947  567.8934
   Saturated          172      20319.188 Prob>ChiSq
   Fitted               2      20603.134     <.0001
  Parameter Estimates
   Term                                Estimate     Std Error ChiSquare Prob>ChiSq
   Intercept                         0.11267265     0.0299576
   Restraint System-Use SORTED       -1.7874446     0.0256615
   Age                               0.02171348     0.0006441    1136.4     <.0001
  For log odds of FATAL/NOT
  Effect Wald Tests
   Source                          Nparm      DF Wald ChiSquare Prob>ChiSq
   Restraint System-Use SORTED         1      1     4851.77182     0.0000
   Age                                 1      1     1136.38046     0.0000

18 For more on why Air Bag is a poor predictor, please see Appendix C.
19 Or at least, more so than is noticeable in the other “dummy-stepwise” regressions. The ChiSquare for Age almost
doubles from its stand-alone value, while the ChiSquare for Restraint System decreases slightly.
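
The arithmetic behind this comparison is easy to verify. The sketch below is illustrative only, with variable names chosen for this example. It relies on the fact that the Whole Model Test ChiSquare is twice the drop in -LogLikelihood from the Reduced (intercept-only) model to the Full model, and then reproduces the excess of the combined ChiSquare over the two stand-alone values.

```python
# -LogLikelihood values from the final driver regression report.
neg_loglik_reduced = 23650.350  # Reduced: intercept-only model
neg_loglik_full = 20603.134     # Full: Restraint System and Age

# Whole Model Test chi-square is twice the drop in -LogLikelihood.
wmt_chi_square = 2 * (neg_loglik_reduced - neg_loglik_full)

# Stand-alone chi-squares from the regression summary table.
chi_restraint_alone = 4906.507
chi_age_alone = 633.848

# Excess of the combined chi-square over the sum of the stand-alone values.
excess = 6094.431 - (chi_restraint_alone + chi_age_alone)
# How many times larger the Restraint System chi-square is than Age's.
ratio = chi_restraint_alone / chi_age_alone

print(round(wmt_chi_square, 3), round(excess, 3), round(ratio, 2))
```

The recomputed Whole Model Test value can differ from the reported 6094.431 in the last digit because the -LogLikelihood figures in the report are rounded.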

    Logistic Regression of Passenger Data
            A closer perusal of the passenger data allows Crash to examine Seating Position in a
meaningful context and to determine how the collinearity between Person Type and Age affects
the model from the passenger perspective. Crash’s initial regression is as follows:

Nominal Logistic Fit for Injury Severity SORTED
  Whole Model Test
   Model         -LogLikelihood            DF ChiSquare Prob>ChiSq
   Difference         1147.983              4    2295.966      0.0000
   Full              12570.029
   Reduced           13718.012
   RSquare (U)                          0.0837
   Observations (or Sum Wgts)
  Converged by Gradient
  Lack Of Fit
   Source                 DF -LogLikelihood ChiSquare
   Lack Of Fit           743        516.702  1033.404
   Saturated             747      12053.327 Prob>ChiSq
   Fitted                  4      12570.029     <.0001
  Parameter Estimates
   Term                                Estimate     Std Error ChiSquare Prob>ChiSq
   Intercept                                        0.0360995    370.77     <.0001
   Restraint System-Use SORTED       -1.1355824     0.0319055    1266.8     <.0001
   Age
   Gender SORTED
   Seating Position SORTED 1v2       -0.4471993     0.0336364    176.76     <.0001
  For log odds of FATAL/NOT
  Effect Wald Tests
   Source                               Nparm      DF Wald ChiSquare Prob>ChiSq
   Restraint System-Use SORTED               1      1      1266.7959     0.0000
   Age                                       1      1     732.196063     0.0000
   Gender SORTED                             1      1     0.68290895     0.4086
   Seating Position SORTED 1v2               1      1     176.759869     0.0000

            With the exception of the abysmal significance level of Gender, Seating Position has the
lowest ChiSquare. Crash suspects that Seating Position, like Air Bag, is heavily dependent on
external factors, which could explain its relatively poor showing. He knows for a fact that the
effect of Seating Position on predicted chance of being a fatality depends on both Impact
Point-Principal and Air Bag. This is plain to see when one notes that 63% of all passengers in the
data set were in crashes for which the Impact Point-Principal was the front, suggesting greater
danger for those seated in the front (who are the only ones with air bags), as attested to by
Seating Position’s -0.45 parameter estimate. Though it might not be as significant as Restraint
System, Seating Position has the second largest parameter

  Distributions
   Impact Point-Principal SORTED
   [Histogram of Impact Point-Principal SORTED]
       Level        Count       Prob
       3             3346     0.14830
       4             1815     0.08044
       Total        22563     1.00000
               4 Levels

           It’s already evident to Crash that Age plays a relatively more significant role with
passengers than it does with drivers, as its ChiSquare in the initial regression was more than half
that of Restraint System’s as opposed to less than one fourth for the initial driver regression. Its
normal quantile plot shows a very abnormal distribution with especially skewed tails. The fact that
the median is at 21 suggests that an incredible proportion of passengers involved in fatal car
crashes were under 30. Given that automobile crashes are the leading cause of death for people
under 30, this histogram suggests that it’s in part due to the fact that of the passengers in fatal
automobile crashes, the majority are under 30. Given this fascinating and unexpected piece of
information, Crash predicted that Age would play a much more relatively significant role compared
to Restraint System for passengers than it did for drivers.

  [Histogram and Normal Quantile Plot of Age, passenger data]
  Quantiles
    100.0%  maximum     97.000
    99.5%               89.000
    97.5%               81.000
    90.0%               64.000
    75.0%   quartile    40.000
    50.0%   median      21.000
    25.0%   quartile    15.000
    10.0%                6.000
    2.5%                 1.000
    0.5%                 0.000
    0.0%    minimum      0.000
  Moments
    Mean                28.65842
    Std Dev             21.46074
    Std Err Mean         0.14287
    upper 95% Mean      28.93847
    lower 95% Mean      28.37838
    N                      22563

       Restraint   Age   Gender   Seating    χ2 for WMT   Success Rate
                                  Position
       X           X     X        X          2295.966     0.717591
       X           X              X          2295.284     0.716882
       X           X                         2116.238     0.716616
       X                                     1055.551     0.703364
                   X                         854.1598     0.714621

         As was immediately obvious from the initial regression, Gender plays almost no

role in the determination of success rate, and can be safely and happily eliminated.

Seating Position plays a disappointingly small role, but Crash anticipated this based on its

dependency on external factors and Air Bag. This leaves Restraint System and Age as

the most significant factors, but what is truly exciting is that Age actually has a higher

stand-alone success rate than Restraint System! In fact, Restraint System contributes

only 0.2% to the success rate, suggesting that despite its lower ChiSquare value, Age is a

more powerful predictor than Restraint System for passenger data.

Nominal Logistic Fit for Injury Severity SORTED
 Whole Model Test
  Model          -LogLikelihood      DF ChiSquare Prob>ChiSq
   Difference        1058.119         2    2116.238       0.0000
   Full             12659.893
   Reduced          13718.012

   RSquare (U)                    0.0771
   Observations (or Sum Wgts)     22563
  Converged by Gradient
 Lack Of Fit
  Source              DF -LogLikelihood ChiSquare Prob>ChiSq
   Lack Of Fit       193        206.299  412.5974     <.0001
   Saturated         195      12453.595
   Fitted              2      12659.893
 Parameter Estimates
  Term                                 Estimate     Std Error ChiSquare Prob>ChiSq
   Intercept                        -0.9506663 0.0292207           1058.5        <.0001
   Restraint System-Use SORTED      -1.0905452 0.0312436           1218.3        <.0001
   Age                              0.02277803 0.0007084           1033.9        <.0001
  For log odds of FATAL/NOT
 Effect Wald Tests
  Source                          Nparm      DF Wald ChiSquare Prob>ChiSq
   Restraint System-Use SORTED         1      1       1218.33062            0.0000
   Age                                 1      1       1033.88546            0.0000
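
The figures in this report are tied together by a few simple relationships, which the following Python sketch reproduces. It is a rough check, not the procedure JMP itself runs. It assumes that Restraint System-Use SORTED enters the model as a numeric 0/1 indicator (as coded in Appendix A) and that the estimates describe the log odds of FATAL versus NOT, as the report states. The function names are hypothetical.

```python
import math

# Values copied from the JMP report above (passengers, Restraint System + Age).
NEG_LL_REDUCED = 13718.012   # -LogLikelihood of the reduced model
NEG_LL_FULL = 12659.893      # -LogLikelihood of the full model
INTERCEPT = -0.9506663
B_RESTRAINT = -1.0905452     # assumed to multiply a 0/1 restraint indicator
B_AGE = 0.02277803           # change in log odds per year of age

def likelihood_ratio_chisq(neg_ll_reduced, neg_ll_full):
    """Whole Model Test ChiSquare: twice the drop in -LogLikelihood."""
    return 2.0 * (neg_ll_reduced - neg_ll_full)

def rsquare_u(neg_ll_reduced, neg_ll_full):
    """RSquare (U): the drop in -LogLikelihood as a share of the reduced value."""
    return (neg_ll_reduced - neg_ll_full) / neg_ll_reduced

def wald_chisq(estimate, std_error):
    """ChiSquare of a parameter estimate: (estimate / std error) squared."""
    return (estimate / std_error) ** 2

def p_fatal(restraint_used, age):
    """Predicted probability of FATAL from the fitted log odds."""
    log_odds = INTERCEPT + B_RESTRAINT * restraint_used + B_AGE * age
    return 1.0 / (1.0 + math.exp(-log_odds))

print(round(likelihood_ratio_chisq(NEG_LL_REDUCED, NEG_LL_FULL), 3))
print(round(rsquare_u(NEG_LL_REDUCED, NEG_LL_FULL), 4))
print(round(wald_chisq(B_RESTRAINT, 0.0312436), 1))
print(round(math.exp(B_RESTRAINT), 3))   # odds ratio for restraint use
print(round(p_fatal(0, 30), 3), round(p_fatal(1, 30), 3))
```

Under these assumptions the odds ratio for restraint use is about 0.34, and a 30-year-old passenger in this data set has a predicted probability of being a fatality of roughly 0.43 unrestrained versus roughly 0.20 restrained.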
Conclusions

       Through a series of logistic regressions performed on all 56,833 individuals, on the 34,270 drivers, and on the 22,563 passengers, Crash the crash test dummy succeeded in narrowing down the individual factors that had the greatest effect on an individual’s chance of survival in a fatal car crash. For both the main data set and the two subsets, Crash found logistic regressions that allowed him to predict with greater than 70% accuracy whether an individual from the data sets was a fatality or not.

       Of the variables that were discarded, Gender appeared to have negligible significance, while Air Bag and Seating Position are most likely heavily dependent on external variables. Crash still believes that Air Bag and Seating Position are significant individual variables that have a great effect on an individual’s chance of survival in a fatal car crash, but to prove this one must first find a more appropriate context than a model containing only other individual variables, and also identify which external variables have the greatest effect.

       The most significant variable in terms of parameter estimate χ2 for both drivers and passengers was Restraint System, which also had the highest standalone success rate for drivers, making it the most significant individual factor for drivers. On the other hand, Age had the highest standalone success rate for passengers despite its lower ChiSquare value, making it the most significant individual factor for passengers. In the data set as a whole, Restraint System was the most significant variable, beating out Age because Age is a weak predictor variable for drivers and because drivers make up a much greater proportion of the data.

Crash’s Friends Revisited
                 Seatbelt Sid has every reason to tell Crash to buckle up:
                 under the all-data model, it could improve his predicted
                 chances of survival by 76%!

                 Much as it pains him, Crash Jr. is better off young. In
                 every data set, danger increased with age, in spite of the
                 fact that the majority of passengers were under 22. On
                 the other hand, drivers comprised the majority of the
                 complete data set, and there were practically no drivers
                 younger than 16. In short, the abnormalities in age
                 demographics may have adversely affected results, but the
                 regressions all suggest that younger is safer.

                 The data were not significant enough to support any
                 conclusion on whether gender affects predicted survival
                 rate, whether of driver or passenger; hence Crash cannot
                 conclude whether Crashella is safer in general, a safer
                 driver, or safer as a passenger.

                 Based on the complete data set’s best regression,
                 Driver Dan could improve his predicted chance of
                 survival by 32% if he lets someone else drive.

                 Backseat Bob is right to be leery of the front seat when
                 63% of the passengers in the data set were in a fatal
                 collision whose principal impact was from the front.
                 Nevertheless, due to the omission of powerful external
                 variables, Crash and Backseat Bob can’t conclude with a
                 high significance level that the front seat worsens
                 one’s chances of survival. Until they identify and include
                 such variables, Crash Jr. will have to suffer in the second
                 seat.

                 Again, due to the omission of significant, related
                 variables, Airbag Al can’t say with significance whether
                 airbags improve survival rate; according to Crash’s
                 model, in some cases they worsen a person’s survival
                 rate. For more information, please see Appendix C.

Questions Raised for Further Study

  •   Why do young people form such a large proportion of passengers involved in fatal car crashes? Could it be that, ceteris paribus, young people are more likely to be a fatality?

  •   Is there a possible connection between Seating Position and Age, based on parental decisions like Crash’s?

  •   Could one make a better estimate for Seating Position by looking at only cars which had one passenger, who would not depend on the Seating Position of others and could choose front or back freely?

  •   Is there a way to account for the fact that some individuals represented the same car and the same accident and are hence not independent?

  •   What effect would sorting Age into nominal groups have? Would this improve prediction success rates?

  •   Which external factors have the greatest effect on the significance of Air Bag and the significance of Seating Position?

  •   How is the survival rate of those in the front seat affected by whether those in the back seat used Restraint Systems? (E.g., if the person sitting behind the driver is not using a restraint system and there is a head-on collision, could the driver be killed by the impact of the person hitting from behind?)

Appendix A: Data Selection Criteria
       Below are tables from the FARS website which list the codes used to quantify each variable used in the analysis (in the order they appear in the FARS JMPIN data table). The bold codes are those used to select individuals for the analysis; in other words, each individual used in the analysis has one of the bold codes in every category. Individuals who do not have a bold code for every category are discarded. In general, codes that correspond to unknown data are discarded, unless they fit in with the sorting properties (e.g., an airbag deployed from an unknown direction still counts as a deployed airbag). Immediately below is a summary of how each category was sorted for the analysis.

Impact Point-Principal            11, 12, 1: 1 (Front)
                                  8-10: 2 (Left)
                                  2-4: 3 (Right)
                                  5-7: 4 (Rear)
Age                               Not sorted
Air Bag Availability/Function     < 10: 1 (Airbag Deployed)
                                  ≥10: 0 (No Airbag Deployed)
Injury Severity                   < 4: 0 (Not Fatal)
                                  4: 1 (Fatal)
Person Type                       1: 0 (Driver)
                                  2: 1 (Passenger)
Restraint System-Use              0: 0 (No Restraint System-Use)
                                  1-4, 8-13: 1 (Restraint System Used)
Seating Position                  11-19: 0 (Front Seat)
                                  21-29: 1 (Second Seat)
Sex (Referred to as Gender)       1: 0 (Male)
                                  2: 1 (Female)
Body Type                         1-9: 1 (Automobiles)
                                  14-16, 19: 2 (Utility Vehicles)
                                  20-29: 3 (Vans)
                                  30-39: 4 (Pickup Trucks)
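
The sorting summary above amounts to a lookup from raw FARS codes to the sorted values, with any individual who falls outside the listed ranges being discarded. The Python sketch below illustrates that idea; it is not the procedure actually used in JMP. The field names and function names are hypothetical. Because the bold formatting that marks the selected codes is not visible here, two choices are assumptions: Air Bag "< 10" is read as codes 1 through 9, and the explicitly unknown codes (Air Bag 29 and 99, Age 99) are discarded.

```python
# Sorting rules from the summary table: field -> list of (sorted value, code ranges).
RULES = {
    "impact_point": [(1, [(11, 12), (1, 1)]), (2, [(8, 10)]), (3, [(2, 4)]), (4, [(5, 7)])],
    # Assumption: deployed means codes 1-9; unknown codes 29 and 99 are dropped.
    "air_bag": [(1, [(1, 9)]), (0, [(20, 28), (30, 32)])],
    "injury_severity": [(0, [(0, 3)]), (1, [(4, 4)])],
    "person_type": [(0, [(1, 1)]), (1, [(2, 2)])],
    "restraint": [(0, [(0, 0)]), (1, [(1, 4), (8, 13)])],
    "seating_position": [(0, [(11, 19)]), (1, [(21, 29)])],
    "gender": [(0, [(1, 1)]), (1, [(2, 2)])],
    "body_type": [(1, [(1, 9)]), (2, [(14, 16), (19, 19)]), (3, [(20, 29)]), (4, [(30, 39)])],
}

def first_match(code, rules):
    """Return the sorted value whose ranges contain code, or None if none do."""
    for value, ranges in rules:
        if any(low <= code <= high for low, high in ranges):
            return value
    return None

def sort_individual(raw):
    """Map a dict of raw FARS codes to sorted values; None means 'discard'."""
    age = raw["age"]
    if not 0 <= age <= 97:          # Age is not sorted; 99 (Unknown) is dropped
        return None
    sorted_row = {"age": age}
    for field, rules in RULES.items():
        value = first_match(raw[field], rules)
        if value is None:           # code outside every listed range
            return None
        sorted_row[field] = value
    return sorted_row

# A made-up record: belted 34-year-old female driver, frontal impact, fatal.
example = {"impact_point": 12, "age": 34, "air_bag": 1, "injury_severity": 4,
           "person_type": 1, "restraint": 3, "seating_position": 11,
           "gender": 2, "body_type": 4}
print(sort_individual(example))
print(sort_individual(dict(example, person_type=5)))  # pedestrian is discarded
```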

                                  Impact Point-Principal
              Code        Definition
                 _        Blank
                 0        Non-Collision
                 1        1 Clock Point
                 2        2 Clock Point
                 3        3 Clock Point
                 4        4 Clock Point
                 5        5 Clock Point
                 6        6 Clock Point
                 7        7 Clock Point
                 8        8 Clock Point
                 9        9 Clock Point
                10        10 Clock Point
                11        11 Clock Point
                12        12 Clock Point
                13        Top
                14        Undercarriage
                99        Unknown

* Only the Clock Point codes are included as it is hypothesized that they have collinear
properties with Seating Position.

                                  Age
              Code        Definition
                 _        Blank
                 0        Up To One Year
                 1        1 Year
                 2        2 Years
                 3        3 Years
                 4        4 Years
                 5        5 Years
                 6        6 Years
                 7        7 Years
                 8        8 Years
                 9        9 Years
                10        10 Years
                11        11 Years
                12        12 Years
                13        13 Years
                14        14 Years
                15        15 Years
                16        16 Years
                17        17 Years
                18        18 Years
                19        19 Years
                20        20 Years
                21        21 Years
                22        22 Years
23   23 Years
24   24 Years
25   25 Years
26   26 Years
27   27 Years
28   28 Years
29   29 Years
30   30 Years
31   31 Years
32   32 Years
33   33 Years
34   34 Years
35   35 Years
36   36 Years
37   37 Years
38   38 Years
39   39 Years
40   40 Years
41   41 Years
42   42 Years
43   43 Years
44   44 Years
45   45 Years
46   46 Years
47   47 Years
48   48 Years
49   49 Years
50   50 Years
51   51 Years
52   52 Years
53   53 Years
54   54 Years
55   55 Years
56   56 Years
57   57 Years
58   58 Years
59   59 Years
60   60 Years
61   61 Years
62   62 Years
63   63 Years
64   64 Years
65   65 Years
66   66 Years
67   67 Years
68   68 Years
69   69 Years
70   70 Years
71   71 Years
72   72 Years
 73    73 Years
 74    74 Years
 75    75 Years
 76    76 Years
 77    77 Years
 78    78 Years
 79    79 Years
 80    80 Years
 81    81 Years
 82    82 Years
 83    83 Years
 84    84 Years
 85    85 Years
 86    86 Years
 87    87 Years
 88    88 Years
 89    89 Years
 90    90 Years
 91    91 Years
 92    92 Years
 93    93 Years
 94    94 Years
 95    95 Years
 96    96 Years
 97    97 Years or Older
 99    Unknown

       Air Bag Availability/Function
Code   Definition
 0     Non-Motorist
 1     From the FRONT
 2     From the SIDE
 7     From OTHER Direction
 8     From MULTIPLE Directions
 9     From UNKNOWN Direction
 20    Airbag Available-NO DEPLOYMENT
 28    Airbag Available-SWITCHED OFF
 29    Airbag Available-UNKNOWN IF DEPLOYED
 30    Not Available (This Seat)
 31    Previously Deployed/Not Replaced
 32    Disabled/Removed
 99    Unknown if Airbag Available (For this Seat)

                      Injury Severity
Code   Definition
 _     Blank
 0     No Injury (0)
 1     Possible Injury (C)
 2     Nonincapacitating Evident Injury (B)
 3     Incapacitating Injury (A)
                 4        Fatal Injury (K)
                 5        Injured, Severity Unknown
                 6        Died Prior to Accident*
                 9        Unknown

                                           Person Type
               Code       Definition
                 _        Blank
                 1        Driver of a Motor Vehicle in Transport
                 2        Passenger of a Motor Vehicle in Transport

                 3*       Occupant of a Motor Vehicle Not in Transport
                 4        Occupant of a Non-Motor Vehicle Transport Device
                 5        Pedestrian
                 6        Bicyclist
                 7        Other Cyclist
                 8        Other Pedestrians
                 9        Unknown Occupant Type in a Motor Vehicle in Transport
                 19       Unknown Type of Non-Motorist
                 99       Unknown Person Type

* Code 3 is eliminated as there is no available definition for “in Transport”, which might refer to
a motor vehicle that is in motion (not stopped at a traffic light) or to a motor vehicle that is en
route (rather than parked).

                                  Restraint System-Use
               Code       Definition
                 _        Blank
                 0        None Used - Vehicle Occupant; Not Applicable
                 1        Shoulder Belt
                 2        Lap Belt
                 3        Lap and Shoulder Belt
                 4        Child Safety Seat
                 5        Motorcycle Helmet
                 6        Bicycle Helmet
                 8        Restraint Used - Type Unknown
                13        Safety Belt Used Improperly
                14        Child Safety Seat Used Improperly
                 15       Helmets Used Improperly
                 99       Unknown

                                       Seating Position
               Code       Definition
                 _        Blank
                 0        Non-Motorist
                11        Front Seat - Left Side(Driver's Side)
                12        Front Seat - Middle
                13        Front Seat - Right Side
                18        Front Seat - Other
                19        Front Seat - Unknown
 21    Second Seat - Left Side
 22    Second Seat - Middle
 23    Second Seat - Right Side
 28    Second Seat - Other
 29    Second Seat - Unknown
 31    Third Seat - Left Side *
 32    Third Seat - Middle *
 33    Third Seat - Right Side *
 38    Third Seat - Other *
 39    Third Seat - Unknown
 41    Fourth Seat - Left Side
 42    Fourth Seat - Middle
 43    Fourth Seat - Right Side
 48    Fourth Seat - Other
 49    Fourth Seat - Unknown
 50    Sleeper Section of Cab (Truck)
 51    Other Passenger in enclosed passenger or cargo area
 52    Other Passenger in unenclosed passenger or cargo area
       Other Passenger in passenger or cargo area, unknown whether or not
 54    Trailing Unit
 55    Riding on Vehicle Exterior
 99    Unknown

                         Sex
Code   Definition
 _     Blank
 1     Male
 2     Female
 9     Unknown

                         Body Type
Code   Definition
 _     Blank
 1     Convertible(excludes sun-roof,t-bar)
 2     2-door sedan,hardtop,coupe
 3     3-door/2-door hatchback
 4     4-door sedan, hardtop
 5     5-door/4-door hatchback
 6     Station Wagon (excluding van and truck based)
 7     Hatchback, number of doors unknown
 8     Sedan/Hardtop, number of doors unknown
 9     Other or Unknown automobile type
       Auto-based pickup (includes El Camino, Caballero, Ranchero, Subaru
       Brat,Rabbit Pickup)
 11    Auto-based panel (cargo station wagon, auto-based ambulance or hearse)
 12    Large Limousine-more than four side doors or stretched chassis
 13    Three-wheel automobile or automobile derivative
       Compact utility (Jeep CJ-2-CJ-7, Scrambler, Golden Eagle, Renegade,
       Laredo, Wrangler, .....)
     Large utility (includes Jeep Cherokee [83 and before], Ramcharger,
     Trailduster, Bronco-fullsize ..)
     Utility station wagon (includes suburban limousines, Suburban,
     Travellall, Grand Wagoneer)
19   Utility, Unknown body type
     Minivan (Chrysler Town and Country, Caravan, Grand Caravan,
     Voyager, Grand Voyager, Mini-Ram, ...)
     Large Van (B150-B350, Sportsman, Royal Maxiwagon, Ram,
     Tradesman, Voyager [83 and before], .....)
22   Step van or walk-in van
23   Van based motorhome
24   Van-based school bus
25   Van-based transit bus
28   Other van type (Hi-Cube Van, Kary)
29   Unknown van type
     Compact pickup (GVWR <4,500 lbs.) (D50,Colt P/U, Ram 50, Dakota,
     Arrow Pickup [foreign], Ranger, ..)
     Standard pickup (GVWR 4,500 to 10,000 lbs.)(Jeep Pickup, Comanche,
     Ram Pickup, D100-D350, ......)
32   Pickup with slide-in camper
33   Convertible pickup
39   Unknown (pickup style) light conventional truck type
     Cab chassis based (includes light stake, light dump, light tow, rescue
41   Truck based panel
42   Light truck based motorhome (chassis mounted)
45   Other light conventional truck type (includes stretched suburban limousine)
48   Unknown light truck type (not a pickup)
49   Unknown light vehicle type (automobile, van, or light truck)
50   School Bus
51   Cross Country/Intercity Bus (i.e., Greyhound)
52   Transit Bus (City Bus)
58   Other Bus Type
59   Unknown Bus Type
60   Step van
61   Single unit straight truck (10,000 lbs < GVWR < or= 19,500 lbs)
62   Single unit straight truck (19,500 lbs < GVWR < or= 26,000 lbs.)
63   Single unit straight truck (GVWR > 26,000 lbs.)
64   Single unit straight truck (GVWR unknown)
65   Medium/heavy truck based motorhome
66   Truck-tractor (Cab only, or with any number of trailing unit; any weight)
67   Medium/Heavy Pickup
     Unknown if single unit or combination unit Medium Truck (10,000 < GVWR <
72   Unknown if single unit or combination unit Heavy Truck (GVWR > 26,000)
73   Camper or motorhome, unknown truck type
78   Unknown medium/heavy truck type
79   Unknown truck type (light/medium/heavy)
80   Motorcycle
81   Moped (motorized bicycle)
82   Three-wheel Motorcycle or Moped - not All-Terrain Vehicle
83   Off-road Motorcycle (2-wheel)
88   Other motored cycle type(minibikes, Motorscooters)
89   Unknown motored cycle type
90   ATV (All-Terrain Vehicle; includes dune/swamp buggy - 3 or 4 wheels)
91   Snowmobile
92   Farm equipment other than trucks
93   Construction equipment other than trucks (includes graders)
97   Other vehicle type (includes go-cart, fork-lift, city street sweeper)
99   Unknown body type

Appendix B: Histograms for All Data
Distributions                         Distributions
 Injury Severity SORTED                Restraint System-Use SORTED
   [Histogram]                           [Histogram]
   Frequencies                           Frequencies
     Level        Count    Prob            Level        Count    Prob
     0            34338 0.60419            0            21270 0.37425
     1            22495 0.39581            1            35563 0.62575
     Total        56833 1.00000            Total        56833 1.00000
             2 Levels                              2 Levels

Distributions                         Distributions
 Gender SORTED                         Person Type SORTED
   [Histogram]                           [Histogram]
   Frequencies                           Frequencies
     Level        Count    Prob            Level        Count    Prob
     0            35516 0.62492            0            34270 0.60299
     1            21317 0.37508            1            22563 0.39701
     Total        56833 1.00000            Total        56833 1.00000
             2 Levels                              2 Levels

Distributions                           Distributions
 Seating Position SORTED 1v2             Airbag SORTED
   [Histogram]                             [Histogram]
   Frequencies                             Frequencies
     Level        Count     Prob             Level        Count     Prob
     0            47696 0.83923              0            41891 0.73709
     1             9137 0.16077              1            14942 0.26291
     Total        56833 1.00000              Total        56833 1.00000
             2 Levels                                2 Levels

Distributions                           Distributions
 Impact Point-Principal SORTED            Body Type SORTED
   [Histogram]                             [Histogram]
   Frequencies                             Frequencies
     Level        Count     Prob             Level        Count     Prob
     1            37954   0.66782            1            32845   0.57792
     2             7773   0.13677            2             7471   0.13146
     3             7218   0.12700            3             5036   0.08861
     4             3888   0.06841            4            11481   0.20201
     Total        56833   1.00000            Total        56833   1.00000
             4 Levels                                4 Levels


Distributions
 Age
   [Figure: Normal Quantile Plot]

     100.0% maximum     97.000
     99.5%              89.000
     97.5%              82.000
     90.0%              68.000
     75.0%   quartile   48.000
     50.0%   median     31.000
     25.0%   quartile   19.000
     10.0%              15.000
     2.5%                3.000
     0.5%                0.000
     0.0%   minimum      0.000
     Mean               35.51708
     Std Dev            20.77647
     Std Err Mean        0.08715
     upper 95% Mean     35.68790
     lower 95% Mean     35.34626
     N                     56833
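
The standard error and the 95% bounds in the moments above follow from the mean, standard deviation, and N. A small Python sketch of that arithmetic is given below. It assumes the normal multiplier 1.96, which is a close stand-in for the t quantile at this sample size, and the function name is hypothetical.

```python
import math

def mean_with_bounds(mean, std_dev, n, z=1.96):
    """Return (standard error, lower bound, upper bound) for a sample mean."""
    std_err = std_dev / math.sqrt(n)
    return std_err, mean - z * std_err, mean + z * std_err

# Moments copied from the table above (all 56,833 individuals).
std_err, lower, upper = mean_with_bounds(35.51708, 20.77647, 56833)
print(round(std_err, 5), round(lower, 4), round(upper, 4))
```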

Appendix C: Problems with Airbags
         There are a number of reasons why airbags may increase one’s chances of being a fatality within this dataset [20].

     •   Airbags can kill people who don’t wear seatbelts, adult or child, particularly in frontal
         crashes where pre-crash braking throws the victim forward so their heads are close to the
         airbag when it deploys.
     •   Airbags increase risk for right-front passengers less than 13 years old.
     •   Airbags by themselves protect only in frontal crashes.
     •   Airbags are not designed to deploy in side, rear, or rollover crashes.
     •   Airbags have a negligible effect in non-frontal crashes.
     •   Absolute benefits are larger for unbelted drivers, but they still have a lower chance of
     •   Airbags are less effective for older drivers.

     Despite these problems, Crash did succeed in finding a set of subsets that suggest the issues listed above are, in fact, what distorts Air Bag in the main regressions. Crash took only those accidents for which the principal impact was at 12 o’clock, that is, the accidents for which airbags were the most critical. He then ran a regression for each of the subsets below, using Injury Severity as the nominal dependent variable and Air Bag as the continuous independent variable. It makes sense that belted drivers’ risk would increase if their air bag deployed, as deployment indicates a more severe accident, and that unbelted drivers are better off with airbags, since the absolute benefits are greater for them. The fact that airbags increased risk for passengers under the age of 13 corresponds with the known facts, while the decrease in risk for older passengers may be the expected result. Note, however, the low ChiSquares: only the belted-driver estimate is statistically significant (Prob>ChiSquare < 0.0001), while the other three have Prob>ChiSquare above 0.25. Airbags are still problematic.

                             Estimate            Std. Error           ChiSquare   Prob>ChiSquare
12 Drivers Belted            0.36908419          0.0404951            83.07       <0.0001
12 Drivers Not Belted        -0.0592098          0.0516979            1.31        0.2521
12 Passengers Child          0.17742011          0.2424533            0.54        0.4643
12 Passengers Adult          -0.0623108          0.0597072            1.09        0.2967
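
The ChiSquare and Prob>ChiSquare columns above can be recovered from the estimates and standard errors, which shows why three of the four subsets are inconclusive. The Python sketch below assumes the ChiSquare is a Wald statistic with one degree of freedom and that a positive estimate means higher log odds of a fatality when an airbag deployed. The function name is hypothetical, and the odds ratios are an added illustration rather than figures from the analysis.

```python
import math

# (estimate, std error) for the Air Bag term, copied from the table above.
SUBSETS = {
    "12 Drivers Belted": (0.36908419, 0.0404951),
    "12 Drivers Not Belted": (-0.0592098, 0.0516979),
    "12 Passengers Child": (0.17742011, 0.2424533),
    "12 Passengers Adult": (-0.0623108, 0.0597072),
}

def wald_test(estimate, std_error):
    """Return (chi-square, p-value) for a single estimate, 1 degree of freedom."""
    chisq = (estimate / std_error) ** 2
    # Upper tail of a chi-square with 1 df equals erfc(sqrt(chisq / 2)).
    p_value = math.erfc(math.sqrt(chisq / 2.0))
    return chisq, p_value

for name, (estimate, std_error) in SUBSETS.items():
    chisq, p_value = wald_test(estimate, std_error)
    odds_ratio = math.exp(estimate)
    print(f"{name:22s} chisq={chisq:6.2f}  p={p_value:.4f}  odds ratio={odds_ratio:.3f}")
```

Under these assumptions the belted-driver odds ratio is about 1.45, while the other three odds ratios cannot be distinguished from 1 at these sample sizes.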

  For more information, please see or