# Chapter 13 by 1u9WTsx

```									Statistics

Chapter 13: Categorical Data Analysis
Where We’ve Been
• Presented methods for making inferences with a two-level qualitative variable (i.e., a binomial variable)
• Presented methods for making inferences about the difference between two binomial proportions

McClave, Statistics, 11th ed. Chapter 13:   2
Categorical Data Analysis
Where We’re Going
• Discuss qualitative (categorical) data with more than two outcomes
• Present a chi-square hypothesis test for comparing the category proportions associated with a single qualitative variable, called a one-way analysis
• Present a chi-square hypothesis test relating two qualitative variables, called a two-way analysis
13.1: Categorical Data and the
Multinomial Experiment
   Properties of the Multinomial Experiment
1. The experiment consists of n identical trials.
2. There are k possible outcomes (called classes,
categories or cells) to each trial.
3. The probabilities of the k outcomes, denoted by p1, p2,
…, pk, where p1+ p2+ … + pk = 1, remain the same from
trial to trial.
4. The trials are independent.
5. The random variables of interest are the cell counts n1,
n2, …, nk of the number of observations that fall into
each of the k categories.
13.2: Testing Categorical
Probabilities: One-Way Table
   Suppose three candidates are running
for office, and 150 voters are asked
their preferences.
   Candidate 1 is the choice of 61 voters.
   Candidate 2 is the choice of 53 voters.
   Candidate 3 is the choice of 36 voters.
   Do these data suggest the population
may prefer one candidate over the
others?
Observed counts ($n = 150$): $n_1 = 61$, $n_2 = 53$, $n_3 = 36$

$H_0$: $p_1 = p_2 = p_3 = 1/3$ (no preference)
$H_a$: At least one of the proportions exceeds $1/3$

$E(\text{number of votes for each candidate} \mid H_0) = 150/3 = 50$, so $E_1 = E_2 = E_3 = 50$

A chi-square ($\chi^2$) test is used to test $H_0$.

$\chi^2 = \frac{[n_1 - E_1]^2}{E_1} + \frac{[n_2 - E_2]^2}{E_2} + \frac{[n_3 - E_3]^2}{E_3}$
$\chi^2 = \frac{[61 - 50]^2}{50} + \frac{[53 - 50]^2}{50} + \frac{[36 - 50]^2}{50} = 6.52$

$\chi^2_{.05,\,df=2} = 5.99147$


Since $\chi^2 = 6.52 > 5.99147$, reject the null
hypothesis at $\alpha = .05$.

Test of a Hypothesis about Multinomial Probabilities:
One-Way Table
$H_0$: $p_1 = p_{1,0},\ p_2 = p_{2,0},\ \ldots,\ p_k = p_{k,0}$
where $p_{1,0}, p_{2,0}, \ldots, p_{k,0}$ represent the hypothesized values of the multinomial
probabilities
$H_a$: At least one of the multinomial probabilities does not equal its
hypothesized value
Test statistic: $\chi^2 = \sum_i \frac{[n_i - E_i]^2}{E_i}$
where $E_i = n p_{i,0}$ is the expected cell count given the null hypothesis.
Rejection region: $\chi^2 > \chi^2_{\alpha}$, with $(k-1)$ df.

Conditions Required for a Valid $\chi^2$ Test:
One-Way Table

1.   A multinomial experiment has been conducted.
2.   The sample size n will be large enough so that, for every cell,
the expected cell count E(ni) will be equal to 5 or more.

Example 13.2: Distribution of Opinions About Marijuana
Possession Before Television Series has Aired

Legalization    Decriminalization    Existing Law    No Opinion
7%              18%                  65%             10%

Table 13.2: Distribution of Opinions About Marijuana
Possession After Television Series has Aired

Legalization    Decriminalization    Existing Law    No Opinion
39              99                   336             26

Expected Distribution of 500 Opinions About Marijuana
Possession After Television Series has Aired

Legalization     Decriminalization               Existing Law          No Opinion
500(.07)=35         500(.18)=90                   500(.65)=325         500(.10)=50

$H_0$: $p_1 = .07,\ p_2 = .18,\ p_3 = .65,\ p_4 = .10$
$H_a$: At least one of the proportions differs
from its null hypothesis value.
Test statistic: $\chi^2 = \sum_i \frac{[n_i - E_i]^2}{E_i}$
Rejection region: $\chi^2 > \chi^2_{.01,\,df=3} = 11.3449$



$\chi^2 = \frac{(39 - 35)^2}{35} + \frac{(99 - 90)^2}{90} + \frac{(336 - 325)^2}{325} + \frac{(26 - 50)^2}{50}$
$\chi^2 = 13.249$

Since $\chi^2 = 13.249 > 11.3449$, reject the null
hypothesis at $\alpha = .01$.

• Inferences can be made on any single proportion as well.
• A 95% confidence interval on the proportion of citizens in the viewing area with no opinion is

$\hat{p}_4 \pm 1.96\,\sigma_{\hat{p}_4}$

where $\hat{p}_4 = \frac{n_4}{n} = \frac{26}{500} = .052$
and $\sigma_{\hat{p}_4} \approx \sqrt{\frac{\hat{p}_4 (1 - \hat{p}_4)}{n}} = \sqrt{\frac{.052(.948)}{500}} = .0099$, so

$\hat{p}_4 \pm 1.96\,\sigma_{\hat{p}_4} = .052 \pm 1.96(.0099) = .052 \pm .019$

13.3: Testing Categorical
Probabilities: Two-Way Table

• Chi-square analysis can also be used to investigate studies based on qualitative factors.
• Does having one characteristic make it more/less likely to exhibit another characteristic?

The columns are divided according to the subcategories for one
qualitative variable and the rows for the other qualitative variable.
                            Column
                   1      2      …      c      Row Totals
Row          1     n11    n12    …      n1c    R1
             2     n21    n22    …      n2c    R2
             …     …      …      …      …      …
             r     nr1    nr2    …      nrc    Rr
Column Totals      C1     C2     …      Cc     n

General Form of a Two-Way (Contingency) Table Analysis:
A Test for Independence
$H_0$: The two classifications are independent
$H_a$: The two classifications are dependent
Test statistic: $\chi^2 = \sum_i \sum_j \frac{[n_{ij} - E_{ij}]^2}{E_{ij}}$
where $E_{ij} = \frac{R_i C_j}{n}$
and $R_i$ = total for row $i$, $C_j$ = total for column $j$, $n$ = sample size
Rejection region: $\chi^2 > \chi^2_{\alpha}$, df = $(r - 1)(c - 1)$

• The results of a survey regarding marital status and religious affiliation are reported below (Example 13.3 in the text).

                                      Religious Affiliation
Marital Status                   A       B       C       D    None  Totals
Divorced                        39      19      12      28      18     116
Married, never divorced        172      61      44      70      37     384
Totals                         211      80      56      98      55     500

H0: Marital status and religious affiliation are independent
Ha: Marital status and religious affiliation are dependent
• The expected frequencies (see Figure 13.4) are included below, in parentheses:

                                          Religious Affiliation
Marital Status                     A         B         C         D      None    Totals
Divorced                          39        19        12        28        18       116
                             (48.95)   (18.56)   (12.99)   (22.74)   (12.76)
Married, never divorced          172        61        44        70        37       384
                            (162.05)   (61.44)   (43.01)   (75.26)   (42.24)
Totals                           211        80        56        98        55       500

The chi-square value computed with SAS is 7.1355, with p-value = .1289.
Even at the $\alpha = .10$ level, we cannot reject the null hypothesis.
13.4: A Word of Caution About
Chi-Square Tests

[Diagram: relative ease of use; misuse and misinterpretation; applications]

Be sure


```
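The one-way calculations in Section 13.2 can be checked with a short script. The sketch below is illustrative only: the function names (`chi_square_one_way`, `proportion_ci`) are made up for this example, it uses only the Python standard library, and the critical values are copied from the slides, not computed.

```python
import math


def chi_square_one_way(observed, null_probs):
    """Return (statistic, expected counts, df) for a one-way chi-square test."""
    n = sum(observed)
    # Expected cell count under H0: E_i = n * p_{i,0}
    expected = [n * p for p in null_probs]
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return statistic, expected, len(observed) - 1


def proportion_ci(count, n, z=1.96):
    """Large-sample confidence interval for a single category proportion."""
    p_hat = count / n
    std_err = math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - z * std_err, p_hat + z * std_err


# Critical values as quoted on the slides (assumed, not computed here).
CRITICAL_05_DF2 = 5.99147
CRITICAL_01_DF3 = 11.3449

# Candidate preference example: H0 is p1 = p2 = p3 = 1/3.
votes_stat, votes_expected, votes_df = chi_square_one_way(
    [61, 53, 36], [1 / 3, 1 / 3, 1 / 3]
)

# Opinion example: H0 uses the proportions from before the series aired.
opinion_stat, opinion_expected, opinion_df = chi_square_one_way(
    [39, 99, 336, 26], [0.07, 0.18, 0.65, 0.10]
)

# 95% interval for the "no opinion" proportion (26 of 500).
low, high = proportion_ci(26, 500)

print(f"candidates: chi2 = {votes_stat:.2f}, reject H0: {votes_stat > CRITICAL_05_DF2}")
print(f"opinions:   chi2 = {opinion_stat:.3f}, reject H0: {opinion_stat > CRITICAL_01_DF3}")
print(f"no-opinion 95% CI: ({low:.3f}, {high:.3f})")
```

Both statistics should agree with the slide values (6.52 and 13.249), and the interval should match .052 ± .019.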
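The test for independence in Section 13.3 follows the same pattern, with expected counts built from the row and column totals. This is a minimal sketch, not the SAS procedure the slides used: the helper names are hypothetical, and the p-value uses the closed-form upper tail of the chi-square distribution that holds only for even degrees of freedom (here df = 4).

```python
import math


def expected_counts(table):
    """Expected cell counts under independence: E_ij = R_i * C_j / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def chi_square_two_way(table):
    """Return (statistic, expected counts, df) for a two-way table."""
    expected = expected_counts(table)
    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, expected, df


def chi_square_upper_tail_even_df(x, df):
    """P(chi-square > x); this closed form is valid only for even df."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs a positive even df")
    half = x / 2
    return math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(df // 2))


# Marital status (rows) by religious affiliation A, B, C, D, None (columns).
survey = [
    [39, 19, 12, 28, 18],    # divorced
    [172, 61, 44, 70, 37],   # married, never divorced
]

stat, expected, df = chi_square_two_way(survey)
p_value = chi_square_upper_tail_even_df(stat, df)

print(f"chi2 = {stat:.4f}, df = {df}, p-value = {p_value:.4f}")
print("expected, divorced row:", [round(e, 2) for e in expected[0]])
```

The statistic and p-value should be close to the SAS figures quoted in the slides (7.1355 and .1289), so the null hypothesis of independence is not rejected at the .10 level.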