Posted on: 3/26/2012. Public Domain.
Statistics
Chapter 13: Categorical Data Analysis
McClave, Statistics, 11th ed. Chapter 13

Where We’ve Been

- Presented methods for making inferences about the population proportion associated with a two-level qualitative variable (i.e., a binomial variable)
- Presented methods for making inferences about the difference between two binomial proportions

Where We’re Going

- Discuss qualitative (categorical) data with more than two outcomes
- Present a chi-square hypothesis test for comparing the category proportions associated with a single qualitative variable, called a one-way analysis
- Present a chi-square hypothesis test relating two qualitative variables, called a two-way analysis

13.1: Categorical Data and the Multinomial Experiment

Properties of the Multinomial Experiment

1. The experiment consists of n identical trials.
2. There are k possible outcomes (called classes, categories or cells) to each trial.
3. The probabilities of the k outcomes, denoted by $p_1, p_2, \ldots, p_k$, where $p_1 + p_2 + \cdots + p_k = 1$, remain the same from trial to trial.
4. The trials are independent.
5. The random variables of interest are the cell counts $n_1, n_2, \ldots, n_k$ of the number of observations that fall into each of the k categories.

13.2: Testing Categorical Probabilities: One-Way Table

Suppose three candidates are running for office, and 150 voters are asked their preferences.

- Candidate 1 is the choice of 61 voters.
- Candidate 2 is the choice of 53 voters.
- Candidate 3 is the choice of 36 voters.

Do these data suggest the population may prefer one candidate over the others?

$H_0: p_1 = p_2 = p_3 = 1/3$ (no preference)
$H_a$: At least one of the proportions exceeds 1/3

Under $H_0$, the expected number of votes for each candidate is

$$E(\text{number of votes for each candidate} \mid H_0) = \frac{150}{3} = 50, \qquad E_1 = E_2 = E_3 = 50$$

A chi-square ($\chi^2$) test is used to test $H_0$. With $n = 150$:

$$\chi^2 = \frac{[n_1 - E_1]^2}{E_1} + \frac{[n_2 - E_2]^2}{E_2} + \frac{[n_3 - E_3]^2}{E_3}$$

$$\chi^2 = \frac{[61 - 50]^2}{50} + \frac{[53 - 50]^2}{50} + \frac{[36 - 50]^2}{50} = 6.52$$

$$\chi^2_{.05,\,df=2} = 5.99147$$

Since $6.52 > 5.99147$, reject the null hypothesis.

Test of a Hypothesis about Multinomial Probabilities: One-Way Table

$H_0: p_1 = p_{1,0},\ p_2 = p_{2,0},\ \ldots,\ p_k = p_{k,0}$, where $p_{1,0}, p_{2,0}, \ldots, p_{k,0}$ represent the hypothesized values of the multinomial probabilities
$H_a$: At least one of the multinomial probabilities does not equal its hypothesized value

Test statistic: $\chi^2 = \sum_{i} \frac{[n_i - E_i]^2}{E_i}$, where $E_i = n p_{i,0}$ is the expected cell count given the null hypothesis.

Rejection region: $\chi^2 > \chi^2_{\alpha}$, with $(k-1)$ df.

Conditions Required for a Valid $\chi^2$ Test: One-Way Table

1. A multinomial experiment has been conducted.
2. The sample size n is large enough so that, for every cell, the expected cell count $E(n_i)$ is equal to 5 or more.

Example 13.2: Distribution of Opinions About Marijuana Possession Before Television Series has Aired

Legalization   Decriminalization   Existing Law   No Opinion
7%             18%                 65%            10%

Table 13.2: Distribution of Opinions About Marijuana Possession After Television Series has Aired

Legalization   Decriminalization   Existing Law   No Opinion
39             99                  336            26
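Returning to the three-candidate poll from earlier in this section, the one-way test can be sketched in a few lines of Python. This is a minimal illustration rather than code from the text: the helper name `chi_square_one_way` is hypothetical, and the critical value 5.99147 is copied from the slides instead of being computed.

```python
def chi_square_one_way(observed, null_probs):
    """One-way chi-square statistic for observed counts under H0.

    Hypothetical helper for illustration; returns (statistic, expected, df).
    """
    n = sum(observed)
    # Expected cell counts under H0: E_i = n * p_{i,0}
    expected = [n * p for p in null_probs]
    # Validity condition from the slides: every expected count is 5 or more
    if min(expected) < 5:
        raise ValueError("expected cell counts should all be 5 or more")
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return statistic, expected, len(observed) - 1


votes = [61, 53, 36]                      # candidates 1, 2, 3 (n = 150)
stat, expected, df = chi_square_one_way(votes, [1 / 3] * 3)

CRITICAL_05_DF2 = 5.99147                 # critical value quoted in the slides
reject_h0 = stat > CRITICAL_05_DF2

print(f"chi-square = {stat:.2f}, df = {df}, reject H0: {reject_h0}")
```

The same helper accepts unequal null probabilities, so it could be reused for any one-way table whose hypothesized proportions sum to 1.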
Expected Distribution of 500 Opinions About Marijuana Possession After Television Series has Aired

Legalization   Decriminalization   Existing Law    No Opinion
500(.07)=35    500(.18)=90         500(.65)=325    500(.10)=50

$H_0: p_1 = .07,\ p_2 = .18,\ p_3 = .65,\ p_4 = .10$
$H_a$: At least one of the proportions differs from its null hypothesis value.

Test statistic: $\chi^2 = \sum_{i} \frac{[n_i - E_i]^2}{E_i}$

Rejection region: $\chi^2 > \chi^2_{.01,\,df=3} = 11.3449$

$$\chi^2 = \frac{(39 - 35)^2}{35} + \frac{(99 - 90)^2}{90} + \frac{(336 - 325)^2}{325} + \frac{(26 - 50)^2}{50} = 13.249$$

Reject the null hypothesis.

Inferences can be made on any single proportion as well. A 95% confidence interval on the proportion of citizens in the viewing area with no opinion is

$$\hat{p}_4 \pm 1.96\,\sigma_{\hat{p}_4}$$

where

$$\hat{p}_4 = \frac{n_4}{n} = \frac{26}{500} = .052 \quad \text{and} \quad \sigma_{\hat{p}_4} \approx \sqrt{\frac{\hat{p}_4 (1 - \hat{p}_4)}{n}} = \sqrt{\frac{.052(.948)}{500}} = .0099$$

$$\hat{p}_4 \pm 1.96\,\sigma_{\hat{p}_4} = .052 \pm 1.96(.0099) = .052 \pm .019$$
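The arithmetic in Example 13.2 can be checked with a short script. This is only a sketch under the slide's own numbers: the variable names are illustrative, and the interval uses the large-sample formula with z = 1.96 as given above.

```python
import math

observed = [39, 99, 336, 26]              # Table 13.2 counts after the series aired
null_probs = [0.07, 0.18, 0.65, 0.10]     # pre-broadcast distribution (H0)
n = sum(observed)

# Chi-square statistic: sum of (n_i - E_i)^2 / E_i with E_i = n * p_{i,0}
expected = [n * p for p in null_probs]
chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Large-sample 95% interval for the "No Opinion" proportion p4
p4_hat = observed[3] / n
std_error = math.sqrt(p4_hat * (1 - p4_hat) / n)
margin = 1.96 * std_error
interval = (p4_hat - margin, p4_hat + margin)

print(f"chi-square = {chi_square:.3f}")
print(f"p4_hat = {p4_hat:.3f} +/- {margin:.3f}")
```

Carrying the unrounded standard error through gives a margin of about .0195, which agrees with the slide's .019 to the precision shown.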
13.3: Testing Categorical Probabilities: Two-Way Table

Chi-square analysis can also be used to investigate studies based on qualitative factors. Does having one characteristic make it more/less likely to exhibit another characteristic?

The columns are divided according to the subcategories for one qualitative variable and the rows for the other qualitative variable.

                          Column
                 1      2      ...    c      Row Totals
Row   1          n11    n12    ...    n1c    R1
      2          n21    n22    ...    n2c    R2
      ...        ...    ...    ...    ...    ...
      r          nr1    nr2    ...    nrc    Rr
Column Totals    C1     C2     ...    Cc     n

General Form of a Two-Way (Contingency) Table Analysis: A Test for Independence

$H_0$: The two classifications are independent
$H_a$: The two classifications are dependent

Test statistic: $\chi^2 = \sum_{i,j} \frac{[n_{ij} - E_{ij}]^2}{E_{ij}}$, where $E_{ij} = \frac{R_i C_j}{n}$, $R_i$ = total for row i, $C_j$ = total for column j, and n = sample size

Rejection region: $\chi^2 > \chi^2_{\alpha}$, df = $(r-1)(c-1)$

The results of a survey regarding marital status and religious affiliation are reported below (Example 13.3 in the text).

                                      Religious Affiliation
Marital Status             A      B      C      D      None   Totals
Divorced                   39     19     12     28     18     116
Married, never divorced    172    61     44     70     37     384
Totals                     211    80     56     98     55     500

$H_0$: Marital status and religious affiliation are independent
$H_a$: Marital status and religious affiliation are dependent
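The test for independence on this survey table can be sketched directly from the formulas above. The code below is an illustrative assumption, not the text's method: the helper `chi2_sf_even_df` is hypothetical and relies on the closed-form chi-square tail probability that holds only for even degrees of freedom, which happens to apply here because df = (2-1)(5-1) = 4.

```python
import math

# Observed counts: rows = marital status, columns = affiliation A, B, C, D, None
table = [
    [39, 19, 12, 28, 18],     # Divorced
    [172, 61, 44, 70, 37],    # Married, never divorced
]

row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]
n = sum(row_totals)

# Expected counts under independence: E_ij = R_i * C_j / n
expected = [[r * c / n for c in col_totals] for r in row_totals]

statistic = sum(
    (table[i][j] - expected[i][j]) ** 2 / expected[i][j]
    for i in range(len(row_totals))
    for j in range(len(col_totals))
)
df = (len(row_totals) - 1) * (len(col_totals) - 1)


def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability, valid only for even df (hypothetical helper)."""
    if df % 2 != 0:
        raise ValueError("closed form used here requires even df")
    half = x / 2
    return math.exp(-half) * sum(half ** j / math.factorial(j) for j in range(df // 2))


p_value = chi2_sf_even_df(statistic, df)
print(f"chi-square = {statistic:.4f}, df = {df}, p-value = {p_value:.4f}")
```

For tables with odd degrees of freedom, a statistics library or a printed chi-square table would be needed in place of the closed-form tail probability.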
The expected frequencies (see Figure 13.4) are included below in parentheses:

                                      Religious Affiliation
Marital Status             A          B         C         D         None      Totals
Divorced                   39         19        12        28        18        116
                           (48.95)    (18.56)   (12.99)   (22.74)   (12.76)
Married, never divorced    172        61        44        70        37        384
                           (162.05)   (61.44)   (43.01)   (75.26)   (42.24)
Totals                     211        80        56        98        55        500

The chi-square value computed with SAS is 7.1355, with p-value = .1289. Even at the $\alpha = .10$ level, we cannot reject the null hypothesis.

13.4: A Word of Caution About Chi-Square Tests

- Relative ease of use
- Misuse and misinterpretation
- Widespread applications

Be sure