
									Survey of Statistical Methods




                                April 25, 2005
             Sample Problem
• Suppose a bill that proposes to lower the
  legal drinking age to 18 is pending before
  the state legislature. A political scientist
  is interested in determining whether
  there is an association between political
  affiliation and attitude toward the bill.
  He sends out a survey and receives
  answers from 400 Republicans and
  Democrats.
                  Data Collection
• A questionnaire was sent to 1,200 individuals.
  A total of 500 questionnaires were returned
  (42%); however, due to incomplete data, the
  final sample size for this study was N = 400.
• Political affiliation
   – Response to question #4 (Do you consider yourself to
     be Republican or Democrat?)
• Attitude toward the bill
   – Response to question #15 (How would you
     characterize your attitude toward lowering the drinking
     age to 18? For, Against, or Undecided?)
      Chi Square Test of Independence
• Purpose
  – To determine whether two variables of interest are independent (not related)
    or related (dependent).
  – When the variables are independent, we are saying that knowledge of one
    gives us no information about the other variable. When they are dependent,
    we are saying that knowledge of one variable is helpful in predicting the
    value of the other variable.
  – The chi-square test of independence tests whether a subject's value on one
    variable is associated with the same subject's value on a second
    variable.
  – Some examples where one might use the chi-square test of independence
    are:
      • Is level of education related to level of income?
      • Is the level of price related to the level of quality in production?
      • Is a person's party affiliation related to that person's preferred television network?

• Hypotheses
  – The null hypothesis is that the two variables are independent. This will be
    true if the observed counts in the sample are similar to the expected counts.
      • H0: X and Y are independent
      • H1: X and Y are dependent
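
As a minimal sketch of what "expected counts" means here: under independence, each cell's expected count is (row total × column total) / N, and the Pearson chi-square statistic sums (observed − expected)² / expected over all cells. The 2 × 3 table below is hypothetical, chosen only to keep the arithmetic easy to follow; it is not the survey data, and the function names are illustrative.

```python
# Hypothetical observed counts (rows: two groups, columns: three responses).
# These numbers are NOT from the survey; they only illustrate the arithmetic.
observed = [
    [30, 50, 20],
    [50, 30, 20],
]


def expected_counts(table):
    """Expected cell counts under independence: row total * column total / N."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def chi_square_statistic(table):
    """Pearson chi-square: sum over cells of (observed - expected)^2 / expected."""
    expected = expected_counts(table)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )


expected = expected_counts(observed)
statistic = chi_square_statistic(observed)
print(expected)   # every row is [40.0, 40.0, 20.0]
print(statistic)  # 10.0
```

The closer the observed counts are to the expected counts, the smaller the statistic, which is why a small value is consistent with H0.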
                                     Displaying Independent and
                                      Dependent Relationships
• When the variables are independent, the proportion in both groups is close to
  the proportion for the total sample.
• When group membership makes a difference, the dependent relationship is
  indicated by one group having a higher proportion than the proportion for the
  total sample.

[Figure: two bar charts of the proportion attending college, by gender.
  Independent Relationship between Gender and College: Males 40%, Females 40%, Total 40%.
  Dependent Relationship between Gender and College: Males 60%, Females 20%, Total 40%.]



                                        From: http://www.utexas.edu/courses/schwab/sw318_spring_2004/SolvingProblems/Class24_ChiSquareTestOfIndependencePostHoc.ppt
  Chi Square Test of Independence

• Wording of Research questions
  – Are X and Y independent?
  – Are X and Y related?
  – The research hypothesis states that the two
    variables are dependent or related. This will be
    true if the observed counts for the categories of
    the variables in the sample are different from
    the expected counts.

• Level of Measurement
  – Both X and Y are categorical
                     Assumptions
            Chi Square Test of Independence
• Each subject contributes data to only one cell

• Finite values
   – Observations must be grouped in categories. No assumption is made about
     level of data. Nominal, ordinal, or interval data may be used with chi-
     square tests.

• A sufficiently large sample size
   – In general N > 20.
   – No one accepted cutoff – the general rules are
       • No cells with observed frequency = 0
       • No cells with the expected frequency < 5
       • Applying chi-square to small samples exposes the researcher to an
         unacceptable rate of Type II errors.
     Note: chi-square must be calculated on actual count data, not substituting
     percentages, which would have the effect of pretending the sample size is
     100.
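
The note about percentages can be illustrated with a short sketch. For a fixed pattern of proportions, the Pearson chi-square statistic grows in direct proportion to the sample size, so feeding in percentages yields the value one would get from a sample of exactly 100. The counts below are hypothetical (N = 400 was chosen only to echo the sample size of the study), and the function name is illustrative.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic for a two-way table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    total = 0.0
    for row, r in zip(table, row_totals):
        for observed, c in zip(row, col_totals):
            expected = r * c / n
            total += (observed - expected) ** 2 / expected
    return total


# Hypothetical counts, N = 400 (not the survey results).
counts = [[60, 100, 40], [100, 60, 40]]
n = sum(sum(row) for row in counts)

# The same table expressed as percentages of the grand total (sums to 100).
percentages = [[100 * cell / n for cell in row] for row in counts]

stat_from_counts = chi_square_statistic(counts)
stat_from_percentages = chi_square_statistic(percentages)

print(stat_from_counts)       # 20.0
print(stat_from_percentages)  # 5.0, i.e. 20.0 * (100 / 400)
```

With 2 degrees of freedom and α = .05 (critical value 5.991), the count-based statistic would be significant while the percentage-based one would not, even though both describe the same table.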
Raw Data for 1st 20 subjects (N=400)

idno   political   voting
  1       1          1
  2       1          1
  3       1          2
  4       1          2
  5       1          3
  6       1          3
  7       1          1
  8       1          1
  9       1          2
 10       1          2
 11       2          3
 12       2          3
 13       2          1
 14       2          1
 15       2          2
 16       2          2
 17       2          3
 18       2          3
 19       2          1
 20       2          1

• Political
    1 = Republican
    2 = Democrat
• Voting on a bill that proposes to lower the legal drinking age to 18
    1 = For
    2 = Against
    3 = Undecided
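
The coded rows above can be tallied into a contingency table, which is the form the chi-square test actually works on. The sketch below does this for the 20 listed subjects only, using the codes as read from the slide (the values for subjects 6 and 7 are read as political = 1, 1 and voting = 3, 1). The variable names are illustrative, and 20 rows are far too few for the test itself; this only shows the tallying step.

```python
from collections import Counter

# (political, voting) codes for subjects 1-20, as read from the slide.
raw = [
    (1, 1), (1, 1), (1, 2), (1, 2), (1, 3),
    (1, 3), (1, 1), (1, 1), (1, 2), (1, 2),
    (2, 3), (2, 3), (2, 1), (2, 1), (2, 2),
    (2, 2), (2, 3), (2, 3), (2, 1), (2, 1),
]

# Value labels from the slide's legend.
POLITICAL = {1: "Republican", 2: "Democrat"}
VOTING = {1: "For", 2: "Against", 3: "Undecided"}

# Count how many subjects fall into each (political, voting) cell.
cell_counts = Counter(raw)

# Arrange the counts as a labeled two-way table.
crosstab = {
    POLITICAL[p]: {VOTING[v]: cell_counts[(p, v)] for v in VOTING}
    for p in POLITICAL
}

for party, row in crosstab.items():
    print(party, row)
```

Each subject appears in exactly one cell, which matches the assumption that each subject contributes data to only one cell.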
               Setup for Analysis

• Research Question
  – Is there an association between political affiliation
    and attitude toward the bill?
• Statistical Hypotheses
  – H0: Political affiliation and attitude toward the bill
    are independent
  – H1: Political affiliation and attitude toward the bill
    are not independent
• Level of Significance
  – α = .05
   How to Compute the Chi Square Test of
         Independence using SPSS


• Analyze – Descriptive Statistics – Crosstabs

• Do not go to
  – Analyze – Nonparametric – Chi Square
  – This is a different type of Chi Square Test
                Set up in SPSS for a
          Chi Square Test of Independence
Analyze – Descriptive Statistics – Crosstabs
    Refer to handout of the SPSS output for
information about how to interpret the results to
        reach the following conclusions

  • There is a significant association between
    political affiliation and attitude toward
    the bill [χ²(2) = 6.0, p = .05].

  • More Democrats are FOR the bill.

  • More Republicans are AGAINST the bill.
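
As a quick check on the reported result, the p-value for a chi-square statistic with 2 degrees of freedom has a simple closed form: the upper-tail probability is exp(−x / 2). The sketch below applies it to the reported value of 6.0. It assumes df = 2 and does not generalize to other degrees of freedom; the function name is illustrative.

```python
import math


def chi2_upper_tail_df2(x):
    """Upper-tail probability of a chi-square variable with 2 degrees of freedom.

    For df = 2 the chi-square distribution is an exponential distribution
    with mean 2, so P(X > x) = exp(-x / 2).
    """
    return math.exp(-x / 2)


p_value = chi2_upper_tail_df2(6.0)
print(round(p_value, 4))  # 0.0498
```

The result is just under .05, consistent with the reported p = .05 after rounding.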
[Figure 1a: chart of # of respondents (axis 0 to 120) by attitude (For, Against, Undecided), with separate series for Democrat and Republican.]

Figure 1a. Histogram showing the association between
political affiliation and attitude toward the bill. A Chi-
Square Test of Independence revealed a significant
relationship between these variables [χ²(2) = 6.0, p = .05].
Democrats were found to have a more favorable attitude
toward the bill than Republicans.
[Figure 1b: chart of # of respondents (axis 0 to 120) by attitude (For, Against, Undecided), with separate series for Democrat and Republican.]

Figure 1b. Histogram showing the association between
political affiliation and attitude toward the bill. A Chi-
Square Test of Independence revealed a significant
relationship between these variables [χ²(2) = 6.0, p = .05].
Republicans were found to have a more unfavorable
attitude toward the bill than Democrats.
• How to determine the Critical Region for
  the Test Statistic by hand
  – Utilizes the Chi Square Distribution
  – df = (r-1)*(c-1) = (2-1)*(3-1) = 1*2 = 2
  – For df = 2 and α = .05, critical χ² = 5.991
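
The by-hand lookup can be reproduced in a few lines. The sketch below computes the degrees of freedom from the table dimensions and, for the special case df = 2, obtains the critical value by solving exp(−x / 2) = α. This closed form holds only for df = 2; for other degrees of freedom one would use a printed table or a library routine such as `scipy.stats.chi2.ppf`. The function names are illustrative.

```python
import math


def degrees_of_freedom(n_rows, n_cols):
    """df for a test of independence on an r x c table: (r - 1) * (c - 1)."""
    return (n_rows - 1) * (n_cols - 1)


def critical_value_df2(alpha):
    """Critical chi-square value for df = 2 only: solves exp(-x / 2) = alpha."""
    return -2 * math.log(alpha)


df = degrees_of_freedom(2, 3)        # 2 affiliations x 3 attitudes
critical = critical_value_df2(0.05)  # alpha = .05

print(df, round(critical, 3))  # 2 5.991

# Decision rule: reject H0 when the computed statistic exceeds the critical value.
reported_statistic = 6.0
reject_null = reported_statistic > critical
print(reject_null)  # True
```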
 Post-Hoc analysis for a Chi Square Test of Independence
      Which Cell or Cells Caused the Difference

• You can conduct a post-hoc procedure only if the result of the chi-
  square test was statistically significant.
• Examination of percentages in the contingency table and expected
  frequency table can be misleading.
• The residual, or the difference, between the observed frequency
  and the expected frequency is a more reliable indicator.
• Notice the values labeled “standardized residual” that are computed
  for each cell. Each value is a z-score.
• Compare the value for each standardized residual against the
  critical z-value for your α level.
• This is equivalent to testing the null hypothesis that the actual
  frequency equals the expected frequency for a specific cell versus
  the research hypothesis that the two frequencies differ.
• There can be 0, 1, 2, or more cells with statistically significant
  standardized residuals to be interpreted.
   Interpreting Standardized Residuals

• Standardized residuals that have a positive value
  mean that the cell was over-represented in the
  actual sample, compared to the expected
  frequency, i.e. there were more subjects in this
  category than we expected.

• Standardized residuals that have a negative value
  mean that the cell was under-represented in the
  actual sample, compared to the expected
  frequency, i.e. there were fewer subjects in this
  category than we expected.
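
The sign-based reading above can be sketched in code. The example assumes the standardized residual is defined as (observed − expected) / √expected and uses the two-tailed critical z of 1.96 for α = .05. The counts are hypothetical (not the survey results), and the names are illustrative.

```python
import math

# Hypothetical observed counts (rows: two groups, columns: three responses).
observed = [[60, 100, 40], [100, 60, 40]]
Z_CRITICAL = 1.96  # two-tailed critical z for alpha = .05


def standardized_residuals(table):
    """(observed - expected) / sqrt(expected) for every cell of the table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    residuals = []
    for row, r in zip(table, row_totals):
        res_row = []
        for o, c in zip(row, col_totals):
            expected = r * c / n
            res_row.append((o - expected) / math.sqrt(expected))
        residuals.append(res_row)
    return residuals


def describe(z, z_critical=Z_CRITICAL):
    """Label a cell by the sign and size of its standardized residual."""
    if z > z_critical:
        return "over-represented"
    if z < -z_critical:
        return "under-represented"
    return "not significant"


residuals = standardized_residuals(observed)
labels = [[describe(z) for z in row] for row in residuals]

for res_row, label_row in zip(residuals, labels):
    print([round(z, 2) for z in res_row], label_row)
```

In this hypothetical table the first two columns carry the association, while the third column matches its expected count exactly and contributes nothing.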
               Post Hoc Strategy
   for the Chi Square Test of Independence

• If there is at least one cell with a significant
  standardized residual
   – Formulate your conclusion based on a comparison of all of
     the cells containing significant standardized residuals.


• If none of the cells have a significant standardized
  residual
   – Interpret the findings based on a comparison of the sign (+
     or -) of the largest values for the standardized residuals.
   – Apply caution when this is the case!
                  What is a Categorical Variable?
•   A categorical variable represents a set of discrete
    events, such as groups, decisions, or anything else that
    can be classified into categories. In contrast to a
    continuous variable, a value of a categorical variable
    indicates a discrete category, whereas a value of a
    continuous variable can fall on any point on a numeric
    continuum. One example of a categorical variable is a
    person's sex, which can be represented by two
    exhaustive and mutually exclusive categories: male and
    female. A categorical variable may also consist of more
    than two categories. For example, a person's major in
    college can be categorized as biology, history,
    engineering, psychology, etc.


•   A categorical variable can be ordered or unordered. For instance, a person's level of schooling is an ordered variable;
    a person's sex is an unordered variable. Although the levels of a categorical variable are often represented by
    numerals, these symbols are not interpreted numerically if the variable is unordered.

•   Categorical data are often presented in a contingency table which tabulates the number of observations that fall into
    each cell of the table. The table above is a simple 2 x 2 contingency table that crosstabulates whether graduate school
    applicants were accepted or rejected and whether they were male or female. Each cell represents a joint event which
    is a unique combination of the categorical variables. The crosstabulation of the graduate school's decision and
    applicant's gender results in four possible outcomes, or joint events. For example, in Table 1, the joint event
    representing rejected males contains 166 observations. Marginal events refer to the total number of observations for a
    category of a particular variable. Here, the marginal event for the category female is 245.




                                                                                    From: http://www.utexas.edu/cc/docs/stat57.html#variable

								