chi square - PowerPoint by qjVIdpX

```									Survey of Statistical Methods

April 25, 2005
Sample Problem
• Suppose a bill that proposes to lower the
legal drinking age to 18 is pending before
the state legislature. A political scientist
is interested in determining whether
there is an association between political
affiliation and attitude toward the bill.
He sends out a survey and receives responses from
Republicans and Democrats.
Data Collection
• A questionnaire was sent to 1,200 individuals.
A total of 500 questionnaires were returned
(42%); however, because of incomplete data, the
final sample size for this study was N=400.
• Political affiliation
– Response to question #4 (Do you consider yourself to
be Republican or Democrat?)
• Attitude toward the bill
– Response to question #15 (How would you
characterize your attitude toward lowering the drinking
age to 18? For, Against, or Undecided?)
Chi Square Test of Independence
• Purpose
– To determine whether two variables of interest are independent (not related)
or dependent (related).
– When the variables are independent, we are saying that knowledge of one
gives us no information about the other variable. When they are dependent,
we are saying that knowledge of one variable is helpful in predicting the
value of the other variable.
– The chi-square test of independence is a test of the influence or impact that a
subject's value on one variable has on the same subject's value for a second
variable.
– Some examples where one might use the chi-squared test of independence
are:
• Is level of education related to level of income?
• Is the level of price related to the level of quality in production?
• Is a person's party affiliation related to that person's preferred television network?

• Hypotheses
– The null hypothesis is that the two variables are independent. This will be
true if the observed counts in the sample are similar to the expected counts.
• H0: X and Y are independent
• H1: X and Y are dependent
Displaying Independent and
Dependent Relationships
When the variables are independent, the proportion in both groups is close to the
same size as the proportion for the total sample.

When group membership makes a difference, the dependent relationship is indicated
by one group having a higher proportion than the proportion for the total sample.

[Figure: two bar charts of Proportion Attending College (axis 0% to 100%) for Males,
Females, and Total. Left panel: "Independent Relationship between Gender and College"
(all three bars labeled 40%). Right panel: "Dependent Relationship between Gender and
College" (Total bar labeled 40%).]

From: http://www.utexas.edu/courses/schwab/sw318_spring_2004/SolvingProblems/Class24_ChiSquareTestOfIndependencePostHoc.ppt
Chi Square Test of Independence

• Wording of Research questions
– Are X and Y independent?
– Are X and Y related?
– The research hypothesis states that the two
variables are dependent or related. This will be
true if the observed counts for the categories of
the variables in the sample are different from
the expected counts.

• Level of Measurement
– Both X and Y are categorical
Assumptions
Chi Square Test of Independence
• Each subject contributes data to only one cell

• Finite values
– Observations must be grouped in categories. No assumption is made about
level of data. Nominal, ordinal, or interval data may be used with chi-
square tests.

• A sufficiently large sample size
– In general N > 20.
– No one accepted cutoff – the general rules are
• No cells with observed frequency = 0
• No cells with the expected frequency < 5
• Applying chi-square to small samples exposes the researcher to an
unacceptable rate of Type II errors.
Note: chi-square must be calculated on actual count data, not substituting
percentages, which would have the effect of pretending the sample size is
100.
Raw Data for 1st 20 subjects (N=400)

idno   political   voting
1       1          1
2       1          1
3       1          2
4       1          2
5       1          3
6       1          3
7       1          1
8       1          1
9       1          2
10       1          2
11       2          3
12       2          3
13       2          1
14       2          1
15       2          2
16       2          2
17       2          3
18       2          3
19       2          1
20       2          1

• Political
– 1 = Republican
– 2 = Democrat
• Voting on a bill that proposes to lower the legal drinking age to 18
– 1 = For
– 2 = Against
– 3 = Undecided
Setup for Analysis

• Research Question
– Is there an association between political affiliation
and attitude toward the bill?
• Statistical Hypotheses
– H0: Political affiliation and attitude toward the bill
are independent
– H1: Political affiliation and attitude toward the bill
are not independent
• Level of Significance
– α = .05
How to Compute the Chi Square Test of
Independence using SPSS

• Analyze – Descriptive Statistics – Crosstabs

• Do not go to
– Analyze – Nonparametric – Chi Square
– This is a different type of Chi Square Test
Set up in SPSS for a
Chi Square Test of Independence
Analyze – Descriptive Statistics --
Crosstabs
Refer to handout of the SPSS output for
information about how to interpret the results to
reach the following conclusions

• There is a significant association between
political affiliation and attitude toward
the bill [2(2) = 6.0: p=.05].

• More democrats are FOR the bill.

• More republicans are AGAINST the bill.
[Figure 1a: bar chart of # of respondents (axis 0 to 120) by attitude toward the bill
(For, Against, Undecided), with separate bars for Democrat and Republican.]

Figure 1a. Histogram showing the association between
political affiliation and attitude toward the bill. A
Chi-Square Test of Independence revealed a significant
relationship between these variables [χ²(2) = 6.0, p = .05].
Democrats were found to have a more favorable attitude
toward the bill than Republicans.
[Figure 1b: bar chart of # of respondents (axis 0 to 120) by attitude toward the bill
(For, Against, Undecided), with separate bars for Democrat and Republican.]

Figure 1b. Histogram showing the association between
political affiliation and attitude toward the bill. A
Chi-Square Test of Independence revealed a significant
relationship between these variables [χ²(2) = 6.0, p = .05].
Republicans were found to have a more unfavorable
attitude toward the bill than Democrats.
• How to determine the Critical Region for
the Test Statistic by hand
– Utilizes the Chi Square Distribution
– df = (r-1)*(c-1) = (2-1)*(3-1) = 1*2 = 2
– Critical χ² for df = 2 at α = .05: 5.991
Post-Hoc analysis for a Chi Square Test of Independence
Which Cell or Cells Caused the Difference

• You can conduct a post-hoc procedure only if the result of the chi-
square test was statistically significant.
• Examination of percentages in the contingency table and expected
frequencies can be misleading.
• The residual, or the difference, between the observed frequency
and the expected frequency is a more reliable indicator.
• Notice the values labeled “standardized residual” that are computed
for each cell. Each value is a z-score.
• Compare the value for each standardized residual against the
critical z-value for your α level.
• This is equivalent to testing the null hypothesis that the actual
frequency equals the expected frequency for a specific cell versus
the research hypothesis of a difference greater than zero.
• There can be 0, 1, 2, or more cells with statistically significant
standardized residuals to be interpreted.
Interpreting Standardized Residuals

• Standardized residuals that have a positive value
mean that the cell was over-represented in the
actual sample, compared to the expected
frequency, i.e. there were more subjects in this
category than we expected.

• Standardized residuals that have a negative value
mean that the cell was under-represented in the
actual sample, compared to the expected
frequency, i.e. there were fewer subjects in this
category than we expected.
Post Hoc Strategy
for the Chi Square Test of Independence

• If there is at least one cell with a significant
standardized residual
– Formulate your conclusion based on a comparison of all of
the cells containing significant standardized residuals.

• If none of the cells have a significant standardized
residual
– Interpret the findings based on a comparison of the 'sign (+
or -)' of the largest values for the standardized residuals.
– Apply caution when this is the case!
What is a Categorical Variable?
•   A categorical variable represents a set of discrete
events, such as groups, decisions, or anything else that
can be classified into categories. In contrast to a
continuous variable, a value of a categorical variable
indicates a discrete category, whereas a value of a
continuous variable can fall on any point on a numeric
continuum. One example of a categorical variable is a
person's sex, which can be represented by two
exhaustive and mutually exclusive categories: male and
female. A categorical variable may also consist of more
than two categories. For example, a person's major in
college can be categorized as biology, history,
engineering, psychology, etc.

•   A categorical variable can be ordered or unordered. For instance, a person's level of schooling is an ordered variable;
a person's sex is an unordered variable. Although the levels of a categorical variable are often represented by
numerals, these symbols are not interpreted numerically if the variable is unordered.

•   Categorical data are often presented in a contingency table which tabulates the number of observations that fall into
each cell of the table. The table above is a simple 2 x 2 contingency table that crosstabulates whether graduate school
applicants were accepted or rejected and whether they were male or female. Each cell represents a joint event which
is a unique combination of the categorical variables. The crosstabulation of the graduate school's decision and
applicant's gender results in four possible outcomes, or joint events. For example, in Table 1, the joint event
representing rejected males contains 166 observations. Marginal events refer to the total number of observations for a
category of a particular variable. Here, the marginal event for the category female is 245.

From: http://www.utexas.edu/cc/docs/stat57.html#variable

```
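
The by-hand steps on the slides (expected counts, the chi-square statistic, df = (r-1)*(c-1), the critical value, and the standardized residuals used for post-hoc analysis) can be sketched in a few lines of Python. This is a minimal illustration, not the SPSS procedure. The slides do not give the actual 2 x 3 table, so the counts below are hypothetical, chosen only so that N = 400 and the statistic reproduces the reported χ²(2) = 6.0. The function names are also assumptions made for this sketch. The p-value and critical-value helpers use the closed form that holds only for df = 2, where the upper-tail probability is exp(-χ²/2).

```python
import math


def chi_square_independence(observed):
    """Chi-square test of independence for an r x c table of raw counts.

    Returns (chi2, df, expected, std_residuals).
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)

    # Expected count under independence: row total * column total / N
    expected = [[r * c / n for c in col_totals] for r in row_totals]

    # Chi-square statistic: sum over cells of (O - E)^2 / E
    chi2 = sum(
        (o - e) ** 2 / e
        for o_row, e_row in zip(observed, expected)
        for o, e in zip(o_row, e_row)
    )

    # Standardized residual per cell: (O - E) / sqrt(E), read as a z-score
    std_residuals = [
        [(o - e) / math.sqrt(e) for o, e in zip(o_row, e_row)]
        for o_row, e_row in zip(observed, expected)
    ]

    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, df, expected, std_residuals


def p_value_df2(chi2):
    """Upper-tail probability of the chi-square distribution; valid for df = 2 only."""
    return math.exp(-chi2 / 2)


def critical_value_df2(alpha):
    """Critical chi-square value at level alpha; valid for df = 2 only."""
    return -2 * math.log(alpha)


# HYPOTHETICAL counts (not the study's data), columns = For, Against, Undecided
observed = [
    [84, 108, 8],   # Republican (political = 1)
    [108, 84, 8],   # Democrat   (political = 2)
]

chi2, df, expected, std_residuals = chi_square_independence(observed)
critical = critical_value_df2(0.05)

print(f"chi2({df}) = {chi2:.2f}, critical value = {critical:.3f}")
print(f"p = {p_value_df2(chi2):.4f}, reject H0: {chi2 > critical}")
for label, row in zip(["Republican", "Democrat"], std_residuals):
    print(label, [round(z, 2) for z in row])
```

With these hypothetical counts no single standardized residual exceeds the two-tailed critical z of 1.96, which is the situation the Post Hoc Strategy slide says to interpret with caution, using the signs of the largest residuals. For tables with other degrees of freedom, the two df = 2 helpers would need to be replaced by a chi-square table lookup or a statistics library.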