Chapter 13: Categorical Data Analysis
Where We’ve Been
   Presented methods for making inferences
    about the population proportion associated
    with a two-level qualitative variable (i.e., a
    binomial variable)
   Presented methods for making inferences
    about the difference between two binomial
    proportions



                McClave, Statistics, 11th ed. Chapter 13:   2
                      Categorical Data Analysis
Where We’re Going
   Discuss qualitative (categorical) data with
    more than two outcomes
   Present a chi-square hypothesis test for
    comparing the category proportions
    associated with a single qualitative variable
    – called a one-way analysis
   Present a chi-square hypothesis test relating
    two qualitative variables – called a two-way
    analysis
     13.1: Categorical Data and the
     Multinomial Experiment
   Properties of the Multinomial Experiment
    1. The experiment consists of n identical trials.
    2. There are k possible outcomes (called classes,
       categories or cells) to each trial.
    3. The probabilities of the k outcomes, denoted by p1, p2,
       …, pk, where p1+ p2+ … + pk = 1, remain the same from
       trial to trial.
    4. The trials are independent.
    5. The random variables of interest are the cell counts n1,
       n2, …, nk of the number of observations that fall into
       each of the k categories.
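
The five properties above can be illustrated with a small simulation. This is only a sketch: the function name `simulate_multinomial`, the seed, and the equal probabilities are assumptions chosen for illustration and do not come from the text.

```python
import random
from collections import Counter


def simulate_multinomial(n, probabilities, seed=None):
    """Run n independent, identical trials and return the k cell counts."""
    if abs(sum(probabilities) - 1.0) > 1e-9:
        raise ValueError("category probabilities must sum to 1")
    rng = random.Random(seed)
    categories = range(len(probabilities))
    # Each trial lands in one of the k categories, with the same
    # probabilities p1, ..., pk on every trial.
    outcomes = rng.choices(categories, weights=probabilities, k=n)
    tally = Counter(outcomes)
    # The cell counts n1, ..., nk are the random variables of interest.
    return [tally[c] for c in categories]


# Illustrative values only: 150 trials, three equally likely categories.
counts = simulate_multinomial(n=150, probabilities=[1/3, 1/3, 1/3], seed=1)
print(counts)
```

Whatever the seed, the k cell counts always add up to n, because every trial falls into exactly one category.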
13.2: Testing Categorical
Probabilities: One-Way Table
   Suppose three candidates are running
    for office, and 150 voters are asked
    their preferences.
       Candidate 1 is the choice of 61 voters.
       Candidate 2 is the choice of 53 voters.
       Candidate 3 is the choice of 36 voters.
   Do these data suggest the population
    may prefer one candidate over the
    others?
H0: p1 = p2 = p3 = 1/3 (no preference)
Ha: At least one of the proportions exceeds 1/3

With n = 150 voters, the expected number of votes for each candidate under H0 is
E1 = E2 = E3 = 150/3 = 50.

A chi-square ($\chi^2$) test is used to test H0:

$$\chi^2 = \frac{[n_1 - E_1]^2}{E_1} + \frac{[n_2 - E_2]^2}{E_2} + \frac{[n_3 - E_3]^2}{E_3}
         = \frac{[61 - 50]^2}{50} + \frac{[53 - 50]^2}{50} + \frac{[36 - 50]^2}{50} = 6.52$$

$$\chi^2_{.05,\ df = 2} = 5.99147$$
Because $\chi^2 = 6.52$ exceeds the critical value 5.99147, reject the null
hypothesis at $\alpha = .05$.
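
The arithmetic for the three-candidate example can be checked with a short Python sketch. The vote counts and the critical value 5.99147 are taken from the example; the variable names are illustrative assumptions.

```python
# Observed vote counts from the example (n = 150 voters).
observed = [61, 53, 36]
n = sum(observed)

# Under H0 (no preference) each candidate is expected to receive n/3 votes.
expected = [n / len(observed)] * len(observed)

# Chi-square statistic: sum over cells of (observed - expected)^2 / expected.
chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Critical value for alpha = .05 with df = 2, as quoted in the example.
critical_value = 5.99147
reject_h0 = chi_square > critical_value

print(f"chi-square = {chi_square:.2f}")
print(f"reject H0: {reject_h0}")
```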
Test of a Hypothesis about Multinomial Probabilities: One-Way Table

H0: p1 = p1,0, p2 = p2,0, … , pk = pk,0
    where p1,0, p2,0, …, pk,0 represent the hypothesized values of the multinomial
    probabilities
Ha: At least one of the multinomial probabilities does not equal its
    hypothesized value

Test statistic: $\chi^2 = \sum \frac{[n_i - E_i]^2}{E_i}$
    where $E_i = n p_{i,0}$ is the expected cell count given the null hypothesis.

Rejection region: $\chi^2 > \chi^2_{\alpha}$, with (k-1) df.
Conditions Required for a Valid $\chi^2$ Test: One-Way Table

1.   A multinomial experiment has been conducted.
2.   The sample size n will be large enough so that, for every cell,
     the expected cell count E(ni) will be equal to 5 or more.
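
One way to make the second condition concrete is to check the expected cell counts before computing the statistic. The sketch below is an assumption about how such a check might be written; the function name `chi_square_one_way` and the `min_expected` parameter are hypothetical, and the usage line reuses the three-candidate vote counts.

```python
def chi_square_one_way(observed, null_probabilities, min_expected=5):
    """Return (chi-square statistic, degrees of freedom) for a one-way table."""
    if len(observed) != len(null_probabilities):
        raise ValueError("need one hypothesized probability per cell")
    n = sum(observed)
    # Expected cell count under H0: E_i = n * p_{i,0}.
    expected = [n * p for p in null_probabilities]
    # Condition 2: every expected cell count should be 5 or more.
    too_small = [e for e in expected if e < min_expected]
    if too_small:
        raise ValueError(f"expected cell counts below {min_expected}: {too_small}")
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return statistic, len(observed) - 1


statistic, df = chi_square_one_way([61, 53, 36], [1/3, 1/3, 1/3])
print(f"chi-square = {statistic:.2f}, df = {df}")
```

Raising an error is a deliberately strict choice for the sketch; a warning would serve equally well when the analyst wants to inspect the table before deciding.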




Example 13.2: Distribution of Opinions About Marijuana
Possession Before Television Series has Aired

Legalization    Decriminalization    Existing Law    No Opinion
7%              18%                  65%             10%

Table 13.2: Distribution of Opinions About Marijuana
Possession After Television Series has Aired

Legalization    Decriminalization    Existing Law    No Opinion
39              99                   336             26
Expected Distribution of 500 Opinions About Marijuana
Possession After Television Series has Aired

Legalization    Decriminalization    Existing Law     No Opinion
500(.07)=35     500(.18)=90          500(.65)=325     500(.10)=50

H0: p1 = .07, p2 = .18, p3 = .65, p4 = .10
Ha: At least one of the proportions differs from its null hypothesis value.

Test statistic: $\chi^2 = \sum \frac{[n_i - E_i]^2}{E_i}$

Rejection region: $\chi^2 > \chi^2_{.01,\ df = 3} = 11.3449$
$$\chi^2 = \frac{(39 - 35)^2}{35} + \frac{(99 - 90)^2}{90} + \frac{(336 - 325)^2}{325} + \frac{(26 - 50)^2}{50} = 13.249$$

Because $\chi^2 = 13.249$ exceeds the critical value 11.3449, reject the null
hypothesis.
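
The same calculation can be sketched in Python, this time printing each cell's contribution to the statistic. The proportions, counts, and critical value come from the example; the variable names are illustrative. The breakdown shows that most of the statistic comes from the No Opinion cell, whose term is (26 - 50)^2 / 50 = 11.52.

```python
categories = ["Legalization", "Decriminalization", "Existing Law", "No Opinion"]
null_proportions = [0.07, 0.18, 0.65, 0.10]  # distribution before the series aired
observed = [39, 99, 336, 26]                 # counts after the series aired

n = sum(observed)
expected = [n * p for p in null_proportions]

# One term per cell: (observed - expected)^2 / expected.
contributions = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
chi_square = sum(contributions)

for name, term in zip(categories, contributions):
    print(f"{name:<18} {term:7.3f}")
print(f"chi-square = {chi_square:.3f}")

# Critical value for alpha = .01 with df = 3, as quoted in the example.
critical_value = 11.3449
reject_h0 = chi_square > critical_value
print(f"reject H0: {reject_h0}")
```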
Inferences can be made on any single proportion as well:
    A 95% confidence interval on the proportion of citizens in the
    viewing area with no opinion is

    $$\hat{p}_4 \pm 1.96\,\sigma_{\hat{p}_4}$$

    where $\hat{p}_4 = \frac{n_4}{n} = \frac{26}{500} = .052$

    and $\sigma_{\hat{p}_4} \approx \sqrt{\frac{\hat{p}_4 (1 - \hat{p}_4)}{n}} = \sqrt{\frac{.052(.948)}{500}} = .0099$

    $$\hat{p}_4 \pm 1.96\,\sigma_{\hat{p}_4} = .052 \pm 1.96(.0099) = .052 \pm .019$$
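
As a quick check of the interval, the following sketch recomputes the point estimate, the estimated standard error, and the margin of error. The values n = 500, n4 = 26, and z = 1.96 are from the slide; the variable names are assumptions.

```python
import math

n = 500    # total number of opinions sampled
n4 = 26    # respondents with no opinion
z = 1.96   # standard normal multiplier for 95% confidence

p_hat = n4 / n
# Estimated standard error of the sample proportion.
std_error = math.sqrt(p_hat * (1 - p_hat) / n)
margin = z * std_error
lower, upper = p_hat - margin, p_hat + margin

print(f"{p_hat:.3f} +/- {margin:.3f}  ->  ({lower:.3f}, {upper:.3f})")
```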
13.3: Testing Categorical
Probabilities: Two-Way Table

   Chi-square analysis can also be used
    to investigate studies based on
    qualitative factors.
       Does having one characteristic make it
        more/less likely to exhibit another
        characteristic?



The columns are divided according to the subcategories for one
qualitative variable and the rows for the other qualitative variable.

                                Column
                    1      2      …      c      Row Totals
Row           1     n11    n12    …      n1c    R1
              2     n21    n22    …      n2c    R2
              …     …      …      …      …      …
              r     nr1    nr2    …      nrc    Rr
Column Totals       C1     C2     …      Cc     n
General Form of a Two-Way (Contingency) Table Analysis:
A Test for Independence

H0: The two classifications are independent
Ha: The two classifications are dependent

Test statistic: $\chi^2 = \sum \frac{[n_{ij} - E_{ij}]^2}{E_{ij}}$
    where $E_{ij} = \frac{R_i C_j}{n}$
    and $R_i$ = total for row i, $C_j$ = total for column j, n = sample size

Rejection region: $\chi^2 > \chi^2_{\alpha}$, df = (r - 1)(c - 1)
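
The expected-count formula and the degrees of freedom can be sketched as two small helpers. The function names and the 2 x 3 table are hypothetical, invented only to show the mechanics; the table was built so that rows are exactly proportional, which makes every expected count equal the observed count.

```python
def expected_counts(table):
    """Return E_ij = R_i * C_j / n for a two-way table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def degrees_of_freedom(table):
    """Return (r - 1)(c - 1) for a table with r rows and c columns."""
    return (len(table) - 1) * (len(table[0]) - 1)


# Hypothetical 2 x 3 table, invented purely for illustration.
example = [[10, 20, 30],
           [20, 40, 60]]

print(expected_counts(example))
print(degrees_of_freedom(example))
```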
The results of a survey regarding marital status and
religious affiliation are reported below (Example
13.3 in the text).

                                     Religious Affiliation
Marital Status              A      B      C      D      None    Totals
Divorced                    39     19     12     28     18      116
Married, never divorced     172    61     44     70     37      384
Totals                      211    80     56     98     55      500

H0: Marital status and religious affiliation are independent
Ha: Marital status and religious affiliation are dependent
The expected frequencies (see Figure 13.4) are
included below in parentheses:

                                     Religious Affiliation
Marital Status              A           B          C          D          None       Totals
Divorced                    39          19         12         28         18         116
                            (48.95)     (18.56)    (12.99)    (22.74)    (12.76)
Married, never divorced     172         61         44         70         37         384
                            (162.05)    (61.44)    (43.01)    (75.26)    (42.24)
Totals                      211         80         56         98         55         500

The chi-square value computed with SAS is 7.1355, with p-value = .1289.
Even at the $\alpha$ = .10 level, we cannot reject the null hypothesis.
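
The reported chi-square value and p-value can be reproduced from the observed counts alone. The sketch below is not the SAS procedure used in the text. It relies on one assumption worth stating loudly: for df = 4 specifically, the upper-tail chi-square probability has the closed form exp(-x/2)(1 + x/2), so no statistics library is needed; any other df would require one.

```python
import math

# Observed counts by religious affiliation (A, B, C, D, None).
observed = [
    [39, 19, 12, 28, 18],    # divorced
    [172, 61, 44, 70, 37],   # married, never divorced
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected count under independence: E_ij = R_i * C_j / n.
expected = [[r * c / n for c in col_totals] for r in row_totals]

chi_square = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
df = (len(observed) - 1) * (len(col_totals) - 1)

# Closed form valid for df = 4 only (assumption noted in the lead-in).
assert df == 4
p_value = math.exp(-chi_square / 2) * (1 + chi_square / 2)

print(f"chi-square = {chi_square:.4f}, df = {df}, p-value = {p_value:.4f}")
```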
13.4: A Word of Caution About
Chi-Square Tests

Relative ease of use and widespread applications of chi-square tests
lead to misuse and misinterpretation.
 Be sure



