JMP Tutorial #2 - Bivariate Displays for Categorical Data (i.e. Contingency Tables and Correspondence Analysis)
Data File: Seatbelt-injury.JMP
Background:
These data come from a study to examine the relationship between the extent of injuries in an automobile accident and use of seatbelts. In particular, individuals were classified as sustaining no injuries, minor injuries, major injuries, or death. Their use of seatbelts was also recorded as none, lap belt only, or lap belt and shoulder harness.
Variables:
Belt Use - variable representing the use of seat belts (None, Belt only, or Belt+harness).
Injury - variable representing extent of injury (None, Minor, Major, or Death).
Freq - number of individuals in each use/injury category.
Goal:
Investigate any relationship that may exist between seat belt use and extent of injury.
A First Approach:
By using the Distribution option we begin by examining univariate displays for both seat belt use and injury type. By clicking on the different injury categories you can observe the shading of those individuals in the belt use bar graph to get an idea of the differences in belt use across the injury categories. By using the Shift key (for selecting multiple adjacent categories) or Ctrl key (for selecting multiple non-adjacent categories) you can click on multiple bars at once. Here we have selected both Death and Major injury categories.
From the above display we can see that the most severe injuries are found predominantly in the no belt or belt only categories.
Next we use the Chart option from the Graph menu to construct bar charts of injury type broken down by seatbelt use. More specifically, first select % of Total from the Statistics pull-down menu. Next place Injury in the X, Level box and Belt Use in the Grouping box and click OK. The resulting bar graphs are shown below. To change to a pie chart representation, select the Pie option from the Chart pull-down menu.
Here we can see that the most severe injuries are associated with individuals who did not use belts.
Contingency Tables and Mosaic Plots:
We will now use the Fit Y by X option to examine the relationship between Belt Use (X) and Injury (Y). The mosaic plot and contingency table for these data are shown below.
From the mosaic plot we can clearly see that individuals using both their lap belt and shoulder harness had the highest proportion of minor or no injuries, while individuals who did not use any restraint had the highest proportion of severe injuries or death. The contingency table for these data is shown below with Row % added to each cell. The total, row, and column percentages are included by default with each crosstab in JMP. To avoid clutter you may choose to remove the column and total percents from the table by deselecting these options from the Contingency Analysis pull-down menu.
The row percentages can be interpreted as the conditional chance or probability of each injury classification within each belt use category. For example, P(Death | No Belt) = 0.0625, or 6.25%, and P(Death | Belt) = 0.0429, or 4.29%.
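For readers who want to check a Row % by hand, the calculation can be sketched outside JMP. The short Python example below divides each cell count by the total for its belt use category. The counts are hypothetical, chosen only to make the arithmetic easy to follow; they are not the values in Seatbelt-injury.JMP, and the names `records`, `row_totals`, and `row_pct` are illustrative.

```python
from collections import defaultdict

# (Belt Use, Injury, Freq) rows in the same long layout as the data file.
# The counts are HYPOTHETICAL, chosen for easy arithmetic; they are NOT
# the values stored in Seatbelt-injury.JMP.
records = [
    ("None", "None", 80), ("None", "Minor", 60),
    ("None", "Major", 40), ("None", "Death", 20),
    ("Belt only", "None", 60), ("Belt only", "Minor", 25),
    ("Belt only", "Major", 10), ("Belt only", "Death", 5),
    ("Belt+harness", "None", 140), ("Belt+harness", "Minor", 44),
    ("Belt+harness", "Major", 12), ("Belt+harness", "Death", 4),
]

# Total number of individuals in each belt use category (the row totals).
row_totals = defaultdict(int)
for belt, _injury, freq in records:
    row_totals[belt] += freq

# Row % = 100 * cell count / row total, i.e. P(Injury | Belt Use) as a percent.
row_pct = {
    (belt, injury): 100.0 * freq / row_totals[belt]
    for belt, injury, freq in records
}

for (belt, injury), pct in row_pct.items():
    print(f"P({injury} | {belt}) = {pct:.2f}%")
```

Within each belt use category the row percentages sum to 100, which is a quick check that the table is being read across rows rather than down columns.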
Correspondence Analysis:
Finally we use correspondence analysis to examine the relationship between injury and belt use. To do this select Correspondence Analysis from the pull-down menu to the left of the Contingency Analysis main heading. The results are displayed below.
From the above plot we can see that major injuries and death are most closely associated with individuals who were not wearing any seatbelts, while individuals who were using both their lap belt and shoulder harness are most closely associated with no injuries or only minor injuries.
Note: The ovals were added using AutoShapes in Word.
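To give a sense of what lies behind a correspondence analysis plot, here is a minimal sketch of the standard textbook computation: a singular value decomposition of the standardized residuals of the contingency table. It assumes NumPy is available and uses hypothetical counts, not the values in Seatbelt-injury.JMP. The scaling and sign of the coordinates JMP plots may differ from this formulation, so treat it as an illustration rather than a reproduction of JMP's output.

```python
import numpy as np

# HYPOTHETICAL counts (rows: belt use, columns: injury).
# These are NOT the values stored in Seatbelt-injury.JMP.
belt_levels = ["None", "Belt only", "Belt+harness"]
injury_levels = ["None", "Minor", "Major", "Death"]
N = np.array([[80, 60, 40, 20],
              [60, 25, 10, 5],
              [140, 44, 12, 4]], dtype=float)

P = N / N.sum()      # correspondence matrix (cell proportions)
r = P.sum(axis=1)    # row masses (belt use margins)
c = P.sum(axis=0)    # column masses (injury margins)

# Standardized residuals: (observed - expected) / sqrt(expected), in proportions.
S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))

# The SVD yields the principal axes; squared singular values are the inertias.
U, sigma, Vt = np.linalg.svd(S, full_matrices=False)

# Principal coordinates for the rows and the columns.
row_coords = (U * sigma) / np.sqrt(r)[:, None]
col_coords = (Vt.T * sigma) / np.sqrt(c)[:, None]

# Total inertia equals the Pearson chi-square statistic divided by n.
total_inertia = float((sigma ** 2).sum())

# A 3 x 4 table has at most two non-trivial dimensions, so print the first two.
for name, xy in zip(belt_levels, row_coords[:, :2]):
    print(f"Belt Use {name}: ({xy[0]:.3f}, {xy[1]:.3f})")
for name, xy in zip(injury_levels, col_coords[:, :2]):
    print(f"Injury {name}: ({xy[0]:.3f}, {xy[1]:.3f})")
```

Plotting the first two columns of `row_coords` and `col_coords` on the same axes gives a display of the same kind as the one interpreted above: categories whose points fall near each other occur together more often than independence would predict.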