Cross Tabulation and Chi-Square
Business Research Methods
January 2002
Cross Tabulation
• Examines whether a relationship exists in the data collected
• i.e. whether there is a contingency (association) between two variables
• Shows whether there are any differences or similarities in the responses across the categories of 2 or more variables
• Usually only run cross tabs between variables where a relationship would make sense
• The most common x-tabs are by personal characteristics, such as:
- Gender
- Age
- Income level
- Marital status
- Place of residence
- Ethno-cultural background
- Type of household
- Educational background
Example
• Conducting research on the number of accident claims for a car insurance company
• Want to see if the number of claims varies by different types of respondents
• What would be some other meaningful x-tabs for insurance claims, other than personal characteristics?
Meaningful x-tabs
• Type of car (sports, family, mini-van)
• Whether the driver has any previous driving convictions
• Whether the driver took driving lessons as a youth
• Quality of vision
• Colour of hair (but would this one be meaningful?)
Number of insurance claims by gender, NDJ insurance 2001
Number of claims  Males     Females   Total
0                 10 032    13 478    23 510
1                 2 156     1 430     3 586
2                 120       25        145
3                 13        4         17
Total             12 321    14 937    27 258
• Is there a difference in the number of claims made by males and females?
• Difficult to tell from absolute numbers, because the sample contains more females (14 937) than males (12 321)
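Converting the counts to column percentages removes the effect of the different group sizes. A minimal sketch in Python (standard library only); `claims` and `column_percentages` are illustrative names, and the males / 2-claims count is entered as 120, the value at which both the row total (145) and the male total (12 321) add up.

```python
# Number of insurance claims by gender, NDJ insurance 2001.
# Rows are 0, 1, 2 and 3 claims; columns are (males, females).
# Assumption: the males / 2-claims cell is entered as 120, the value at
# which the row total (145) and the male column total (12 321) add up.
claims = [
    [10032, 13478],
    [2156, 1430],
    [120, 25],
    [13, 4],
]


def column_percentages(table):
    """Illustrative helper: express each cell as a % of its column total."""
    col_totals = [sum(col) for col in zip(*table)]
    return [
        [100 * cell / total for cell, total in zip(row, col_totals)]
        for row in table
    ]


for n_claims, (males, females) in enumerate(column_percentages(claims)):
    print(f"{n_claims} claims: males {males:5.1f}%   females {females:5.1f}%")
```

Dividing by the column total answers "what share of males (or females) made n claims?", which is the comparison the percentage table makes.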
Number of insurance claims by gender, NDJ 2001
Number of claims  Males %   Females %
0                 81.4      90.2
1                 17.5      9.6
2                 1.0       0.2
3                 0.1       0.0
Total             100.0     100.0
• Conclusion? Yes, there is a difference: roughly 19% of males made at least one claim, compared with about 10% of females. But is it statistically significant? A chi-square test is needed to determine this
Chi-square test
• It tells you whether the two variables are independent or associated
• If they are independent, there is no relationship, i.e. the number of claims does not vary significantly by gender
• If they are associated, there is a relationship, i.e. the number of claims does vary significantly by gender
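The test compares each observed count with the count expected if the two variables were independent, (row total × column total) / grand total, and sums (observed - expected)² / expected over all cells. A minimal standard-library Python sketch of that Pearson statistic for the claims table; the function names are illustrative, not from any package, and the males / 2-claims cell is again taken as 120 to match the printed totals.

```python
# Claims by gender: rows = 0..3 claims, columns = (males, females).
# Assumption: the males / 2-claims cell is 120, consistent with the totals.
claims = [
    [10032, 13478],
    [2156, 1430],
    [120, 25],
    [13, 4],
]


def expected_counts(table):
    """Counts expected if rows and columns were independent:
    (row total * column total) / grand total for every cell."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def chi_square_statistic(table):
    """Pearson chi-square: sum of (observed - expected)^2 / expected."""
    expected = expected_counts(table)
    return sum(
        (obs - exp) ** 2 / exp
        for obs_row, exp_row in zip(table, expected)
        for obs, exp in zip(obs_row, exp_row)
    )


for n_claims, (obs, exp) in enumerate(zip(claims, expected_counts(claims))):
    print(f"{n_claims} claims: observed {obs}, "
          f"expected [{exp[0]:.1f}, {exp[1]:.1f}]")
print(f"chi-square = {chi_square_statistic(claims):.1f}")
```

Males show more 1-claim cases than independence would predict and fewer 0-claim cases; the larger such gaps are, the larger the statistic.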
Two Requirements for the Chi-Square Test
1 Try to get at least 50 cases in each sub-group of the variables being cross-tabulated
E.g. to examine the relationship between age and number of claims, you would need at least 50 cases in each age group, i.e. 18-24, 25-34, 35-44, 45-54, 55-64, 65+
If not, collapse the sub-groups, e.g. into 18-34, 35-54 and 55+
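Requirement 1 is easy to check mechanically. A minimal Python sketch, assuming the case counts per age band are already in a dictionary; `too_small`, `collapse` and `COLLAPSE_TO` are hypothetical names, and the mapping just encodes the broader groups suggested above.

```python
# Hypothetical mapping from the fine age bands to the collapsed groups
# suggested in the requirement above.
COLLAPSE_TO = {
    "18-24": "18-34", "25-34": "18-34",
    "35-44": "35-54", "45-54": "35-54",
    "55-64": "55+", "65+": "55+",
}
MIN_CASES = 50  # rule of thumb from requirement 1


def too_small(counts, minimum=MIN_CASES):
    """Hypothetical helper: list the groups with fewer than `minimum` cases."""
    return [group for group, n in counts.items() if n < minimum]


def collapse(counts, mapping=COLLAPSE_TO):
    """Hypothetical helper: add up the cases of fine bands into broader ones."""
    merged = {}
    for group, n in counts.items():
        target = mapping[group]
        merged[target] = merged.get(target, 0) + n
    return merged
```

Run `too_small` on the per-band counts; if it returns any bands, pass the counts through `collapse` and check the merged groups again.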
Two Requirements for the Chi-Square Test (cont.)
2 No more than 20% of cells should have an expected count of less than 5
Therefore, collapse categories to reduce the number of cells whenever possible
For example…
Example
Number of holidays per year by age group (years)
Holidays     18-24   25-34   35-44   45+
0            12      4       7       10
1            8       12      7       5
2            8       29      10      4
3            3       2       14      6
4            16      2       11      4
5            4       14      4       7
6            6       13      2       2
7-10         2       2       4       1
11 or more   0       1       4       2
• Number of cells = 36, and 18 of them (50%) contain fewer than 5 responses, far more than 20%
• Thus, collapse the number of categories...
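Strictly, the 20% rule applies to expected counts, whereas the 50% figure above counts cells with fewer than 5 observed responses. A minimal Python sketch (illustrative names, standard library only) that checks both for the holidays table. On these counts about 31% of the expected counts (11 of 36) are below 5, so the table fails the rule either way.

```python
# Number of holidays per year (rows) by age group (columns:
# 18-24, 25-34, 35-44, 45+ years), from the example table.
holidays = [
    [12, 4, 7, 10],   # 0
    [8, 12, 7, 5],    # 1
    [8, 29, 10, 4],   # 2
    [3, 2, 14, 6],    # 3
    [16, 2, 11, 4],   # 4
    [4, 14, 4, 7],    # 5
    [6, 13, 2, 2],    # 6
    [2, 2, 4, 1],     # 7-10
    [0, 1, 4, 2],     # 11 or more
]


def expected_counts(table):
    """(row total * column total) / grand total for every cell."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def share_below(cells, threshold=5):
    """Illustrative helper: fraction of cells with a value below `threshold`."""
    flat = [x for row in cells for x in row]
    return sum(x < threshold for x in flat) / len(flat)


observed_share = share_below(holidays)
expected_share = share_below(expected_counts(holidays))
print(f"observed cells < 5: {observed_share:.0%}")
print(f"expected cells < 5: {expected_share:.0%}")
print("meets the 20% rule" if expected_share <= 0.20 else "collapse categories")
```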
Thus...
Number of holidays per year by age group (years)
Holidays     18-34   35+
0            16      17
1-2          57      26
3-4          23      35
5+           42      26
• Number of cells = 8, none less than 5
• Now meets both requirements
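Collapsing amounts to summing the cells that fall into each broader category. A minimal Python sketch (hypothetical helper names) that rebuilds the collapsed table from the original counts and re-checks both requirements.

```python
# Original holidays-by-age counts (rows: 0, 1, 2, 3, 4, 5, 6, 7-10, 11+;
# columns: 18-24, 25-34, 35-44, 45+ years).
holidays = [
    [12, 4, 7, 10], [8, 12, 7, 5], [8, 29, 10, 4],
    [3, 2, 14, 6], [16, 2, 11, 4], [4, 14, 4, 7],
    [6, 13, 2, 2], [2, 2, 4, 1], [0, 1, 4, 2],
]

# Hypothetical groupings (indices into the original rows / columns)
# matching the collapsed categories.
ROW_GROUPS = {"0": [0], "1-2": [1, 2], "3-4": [3, 4], "5+": [5, 6, 7, 8]}
COL_GROUPS = {"18-34": [0, 1], "35+": [2, 3]}


def collapse(table, row_groups, col_groups):
    """Sum the cells belonging to each (row group, column group) pair."""
    return [
        [sum(table[r][c] for r in rows for c in cols)
         for cols in col_groups.values()]
        for rows in row_groups.values()
    ]


def expected_counts(table):
    """(row total * column total) / grand total for every cell."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


original_sizes = [sum(col) for col in zip(*holidays)]
collapsed = collapse(holidays, ROW_GROUPS, COL_GROUPS)
group_sizes = [sum(col) for col in zip(*collapsed)]
cells = [x for row in expected_counts(collapsed) for x in row]
share_small = sum(x < 5 for x in cells) / len(cells)

for label, row in zip(ROW_GROUPS, collapsed):
    print(label, row)
print("age-group sizes before:", original_sizes)
print("age-group sizes after:", group_sizes)    # requirement 1: >= 50 each
print(f"expected cells < 5: {share_small:.0%}")  # requirement 2: <= 20%
```

Before collapsing, the 45+ column holds only 41 cases, below the 50-case guideline, so merging age groups also fixes requirement 1. Summing the original counts gives 42 for the 18-34 / 5+ cell (4 + 14 + 6 + 13 + 2 + 2 + 0 + 1).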
How to check whether 2 variables are associated...
• Run the cross tab for your 2 variables
• Check the chi-square value and the degrees of freedom, which for a table with r rows and c columns is (r - 1) × (c - 1)
• Then compare the chi-square value with the critical value in Table 2, “Critical Values for Chi-Square Probability”, to see if it is significant
• In Table 2, assume we are working at the .05 probability level (a 95% confidence level)
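The degrees of freedom and the table lookup can be sketched as follows. The critical values below are standard upper-tail chi-square values at the .05 level, typed in by hand; treat them as an assumption to be checked against Table 2. Names are illustrative.

```python
# Upper-tail chi-square critical values at the .05 level (95% confidence)
# for 1-10 degrees of freedom: standard textbook values, typed in by hand.
CRITICAL_05 = {
    1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070,
    6: 12.592, 7: 14.067, 8: 15.507, 9: 16.919, 10: 18.307,
}


def degrees_of_freedom(n_rows, n_cols):
    """df for an r x c cross tab: (r - 1) * (c - 1)."""
    return (n_rows - 1) * (n_cols - 1)


def critical_value(df):
    """Look up the .05 critical value; larger df need a fuller table."""
    if df not in CRITICAL_05:
        raise ValueError(f"no tabulated value for df={df}")
    return CRITICAL_05[df]


# Claims (4 rows: 0-3 claims) by gender (2 columns).
df = degrees_of_freedom(4, 2)
print(df, critical_value(df))
```

If SciPy is available, `scipy.stats.chi2.ppf(0.95, df)` returns the same critical values without a hand-typed table.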
How to check whether 2 variables are associated (cont.)
• If the chi-square statistic from your cross tab is higher than the critical value from Table 2, your two variables are said to be associated, i.e. there is a relationship between them
• You can then say, with 95% confidence, that the number of claims is related to gender
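Putting the steps together for the claims-by-gender example: a minimal standard-library Python sketch that computes the statistic, the degrees of freedom and the decision at the .05 level. Names are illustrative; the males / 2-claims cell is taken as 120 to match the printed totals, and only the critical values for df 1 to 4 are included.

```python
# Claims by gender (rows: 0-3 claims; columns: males, females).
# Assumption: the males / 2-claims cell is 120, consistent with the totals.
claims = [[10032, 13478], [2156, 1430], [120, 25], [13, 4]]

# Standard .05 chi-square critical values for small df (typed in by hand).
CRITICAL_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488}


def chi_square_test(table, critical=CRITICAL_05):
    """Illustrative helper: return (statistic, df, associated?) at .05."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, r in enumerate(row_totals):
        for j, c in enumerate(col_totals):
            expected = r * c / n  # count expected under independence
            stat += (table[i][j] - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df, stat > critical[df]


stat, df, associated = chi_square_test(claims)
print(f"chi-square = {stat:.1f}, df = {df}, critical = {CRITICAL_05[df]}")
print("associated" if associated else "no evidence of association")
```

With SciPy, `scipy.stats.chi2_contingency(claims)` returns the statistic, p-value, degrees of freedom and expected counts in one call; it applies Yates' continuity correction only when there is 1 degree of freedom, so for this 4 × 2 table its statistic matches the plain formula.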