# Chi-square

```
Chi-square
A very brief intro
Distinctions
• The distribution
  – Chi-square is a probability distribution
    ▪ A special case of the gamma distribution
  – The t and F distributions are derived from it
    ▪ t = ratio of a standard normal to the square root of an independent chi-square divided by its df
    ▪ F = ratio of two independent chi-squares, each divided by its df
• Goodness-of-fit tests
  – You may see it as the test statistic in a variety of procedures to
    determine whether some data ‘fit’ what is theoretically expected
• Tests of independence
  – Assess whether paired observations on two categorical
    variables are independent of each other
    ▪ Contingency table
Goodness of Fit
• Does the data conform to expectations?
• The following are program numbers for 5700
• If we expected a balanced distribution, does the data suggest that is true?

Psychology Program    Observed N   Expected N   Residual
Clinical                   7           7.0          .0
Clinical Health            6           7.0        -1.0
Experimental               5           7.0        -2.0
Counseling PhD             8           7.0         1.0
Counseling Masters         9           7.0         2.0
Total                     35

• Calculation: Sum the squared differences of the observed frequencies and
  expected frequencies, divided by the expected

  X^2 = \sum (O - E)^2 / E

  X^2 = (7 - 7)^2/7 + (6 - 7)^2/7 + (5 - 7)^2/7 + (8 - 7)^2/7 + (9 - 7)^2/7

• X^2 = 1.4286, df = 4, p-value = 0.84
• Conclusion? Not statistically different from expectations
• Note however that we wouldn’t expect a balanced distribution, and could have
  changed our expected values to conform to a more reasonable estimate based on
  past entry rates.
Independence
• Moving beyond the single variable, we can test for the independence of two
  categorical variables
• What do undergrad stat students do with their free time?

              Updating their            Talking on cell phone    Texting instead of just    Staring at Ceiling
              whatever blog thing       loudly enough that now   actually talking to them
              whose contents will get   total strangers know
              them fired from some      how the ‘tests’ turned
              job in the future         out

Males         30                        40                       20                         10
Females       20                        30                       40                         10

• Is there a relationship between gender and what the stats kids do with their
  free time?
              Updating their            Talking on cell phone    Texting instead of just    Staring at the ceiling   Total
              whatever blog thing       loudly enough that now   actually talking to them
              whose contents will get   total strangers know
              them not hired/fired      how the ‘tests’ turned
              from some job in the      out
              future

Males         30                        40                       20                         10                       100
Females       20                        30                       40                         10                       100
Total         50                        70                       60                         20                       200

• Expected = (R_i * C_j) / N, where R_i is the total for row i, C_j is the total
  for column j, and N is the grand total
• Example for males Updating: (100 * 50) / 200 = 25

Observed counts, with expected counts (E) in parentheses:

              Updating their            Talking on cell phone    Texting instead of just    Staring at the ceiling   Total
              whatever blog thing       loudly enough that now   actually talking to them
              whose contents will get   total strangers know
              them not hired/fired      how the ‘tests’ turned
              from some job in the      out
              future

Males (E)     30 (25)                   40 (35)                  20 (30)                    10 (10)                  100
Females (E)   20 (25)                   30 (35)                  40 (30)                    10 (10)                  100
Total         50                        70                       60                         20                       200

• df = (R - 1)(C - 1), with R rows and C columns: (2 - 1)(4 - 1) = 3
Interpretation
• X^2 = 10.0952, df = 3, p-value = 0.018
• Reject H0: there is some relationship between gender and how stats students
  spend their free time
Assumptions
• Obviously the data themselves do not have to follow any particular distribution
  – Nonparametric
• Independence
  – As usual, we assume observations are independent of one another
• Inclusion of non-occurrences
  – The data must include all categories of information
  – If you put ‘Don’t know’ as a response on your survey, suffer the consequences!
Other Versions/Extensions
• For 2 x 2: Yates correction, Fisher’s exact test
• Beyond the two-way setting: Loglinear analysis (covered in your Howell text)
• Categorical x Ordinal outcomes
  – Tests of linear associations
  – Correlational approach (see Howell 10.4)
Effect Size
• 2 x 2
• d family measures of difference
  – Relative risk
  – Odds ratio
• r family measures of association
  – Phi and Cramér’s phi (Cramér’s V)
• Measure of agreement
  – Kappa
Summary
• While you may see the chi-square statistic used frequently, chi-square tests
  themselves are becoming less common
  – The reason is that it is relatively rare that a research question would
    entail only categorical variables
• However, the tests are still viable for descriptive and exploratory forays
  into data, and are often used as such

```
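
The two worked examples in the slides (the goodness-of-fit test on the program counts and the test of independence on the gender by free-time table) can be checked by hand with a short script. The following is a minimal sketch in Python, not part of the original slides: all names (`chi2_sf`, `chi2_stat`, `program_counts`, and so on) are hypothetical, and the p-value helper is an assumption that relies on the standard closed-form tail sums for integer degrees of freedom rather than on a statistics library. It also computes Cramér's V for the independence table, as one instance of the "r family" effect sizes listed above.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-square variable (integer df >= 1).

    Hypothetical helper: uses the closed-form finite sums for integer df.
    """
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) times a finite sum of (x/2)^i / i!
        return math.exp(-half) * sum(half**i / math.factorial(i) for i in range(df // 2))
    # Odd df: complementary error function plus a finite correction sum
    tail = math.erfc(math.sqrt(half))
    for i in range(1, (df - 1) // 2 + 1):
        tail += math.exp(-half) * half ** (i - 0.5) / math.gamma(i + 0.5)
    return tail

def chi2_stat(observed, expected):
    """Sum of (O - E)^2 / E over paired observed and expected counts."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Goodness of fit: program counts against a balanced expectation (7 per program)
program_counts = [7, 6, 5, 8, 9]
balanced = [sum(program_counts) / len(program_counts)] * len(program_counts)
gof_stat = chi2_stat(program_counts, balanced)
gof_df = len(program_counts) - 1
gof_p = chi2_sf(gof_stat, gof_df)

# Independence: rows are Males, Females; columns are the four free-time activities
table = [[30, 40, 20, 10],
         [20, 30, 40, 10]]
row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]
n = sum(row_totals)
# Expected count for each cell: (row total * column total) / grand total
expected = [[r * c / n for c in col_totals] for r in row_totals]
ind_stat = sum(chi2_stat(o_row, e_row) for o_row, e_row in zip(table, expected))
ind_df = (len(table) - 1) * (len(table[0]) - 1)
ind_p = chi2_sf(ind_stat, ind_df)

# Effect size: Cramér's V = sqrt(X^2 / (N * (min(R, C) - 1)))
cramers_v = math.sqrt(ind_stat / (n * (min(len(table), len(table[0])) - 1)))

print(f"goodness of fit: X^2 = {gof_stat:.4f}, df = {gof_df}, p = {gof_p:.2f}")
print(f"independence:    X^2 = {ind_stat:.4f}, df = {ind_df}, p = {ind_p:.3f}")
print(f"Cramér's V:      {cramers_v:.3f}")
```

Run as written, the statistics, degrees of freedom, and p-values should agree with the figures reported on the slides. In practice a statistics package would normally be used instead of a hand-rolled tail probability; the hand computation is shown only to make each step of the calculation visible.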