Using Cross Tabulation

Shared by: Sherwin Steffin
Posted: 6/26/2008
Abstract

Recently, OpEd News (OEN), a liberal/progressive political site, once again distinguished itself from its many competitors by making it possible for its members to construct simple polls, and then to review how the answers to these questions are associated with some essential demographic characteristics of its readership. As the results are currently presented, they can be very confusing, and many readers have difficulty drawing useful or accurate inferences from the data. This article is designed to walk the reader through the process of decoding and interpreting what is often a confusing, and at times indecipherable, presentation of the information. For illustrative purposes, we will be looking at a simple poll constructed by the OEN Managing Editor, Rob Kall.

The Poll

The “Good Guy/Jerk” answer pair is what we call the y, or Dependent, Variable. The first question requiring our attention is whether, after adding another 99 votes, the 45/55 ratio we see here is going to remain the same, or whether it will change substantially. In other words, we want to know whether the answer reported above is reliable. Of equal interest is the question, “Are there any characteristics of the respondents which affect the answer shown above?” To answer it, we look at a number of demographic characteristics.

The first of these is the Gender of respondents. For any of a number of possible reasons, only 77 respondents provided us with information about their gender. As you can see, almost exactly the same split of answers was found for both males and females with respect to the y variable. Unanswered is the question of why many more males than females chose to select a y-variable choice. Is this because the question was of greater interest to males than females, or because the readership of this site is composed of many more male readers than female readers?
To determine this, we have to look at a number of polls, to see whether this ratio is maintained or is specific to this poll. The table below reveals the results of this test:

It would appear that women are not underrepresented, but rather, their participation varies with their engagement with a specific poll.

Among the polls receiving the most response was a question about the editorial posture which should be maintained toward story submissions related to UFOs and reported contacts with extraterrestrial beings. Here is an abbreviated description of the choices available to Respondents, set against the level of Education completed. Looking at it in this format, it is evident that the most popular choice was to permit such articles, since they are in keeping with the OEN editorial philosophy of openness to all kinds of information, be it political or drawn from other areas of readers' lives.

| Education | No UFO Stories | Limit to Inside Pages | Fit with OEN Goals | Good to Attract New | Only Political Stories | Other | Total |
| --- | --- | --- | --- | --- | --- | --- | --- |
| No College | 0 | 1 | 7 | 5 | 0 | 0 | 13 |
| Some College | 2 | 6 | 35 | 10 | 8 | 4 | 65 |
| Bachelors Degree | 8 | 6 | 49 | 11 | 18 | 7 | 99 |
| Advanced Degree | 10 | 5 | 26 | 10 | 3 | 0 | 54 |
| Total | 20 | 18 | 117 | 36 | 29 | 11 | 231 |

There remains much more that can be derived from this data, and it is to these other questions that we now wish to direct our attention. Before posing these questions, we must first go through some calculations to answer them.

Calculations

There are 4 calculations which must be performed to answer these questions:

Expected Values

The first quantitative question to be addressed is this: if another group of OEN members were asked the same Question and categorized by educational background, how likely is it that similar numbers would appear in each of the table cells?
For those who have long been away from algebra, or from building formulae in a spreadsheet, the process needed to answer this question may at first appear more trouble than it is worth. Perhaps we can reduce it to a more familiar everyday problem:

You have invited 100 of your guy friends to a July 4th barbecue. You expect that roughly 80 of them will bring a spouse or current girlfriend as a guest. You also expect that some of the women will prefer a vegetarian menu, while the men will much prefer meat-based servings. Guys being guys, almost all will pick the “he-man” meat selection for themselves, and not really knowing what their guests prefer, will probably suggest that most of the women will prefer the more “feminine” vegetable servings. Thus, you get this report from the invited men:

REPORTED

| Gender | Meat | Vegetables | Total |
| --- | --- | --- | --- |
| Men | 90 | 10 | 100 |
| Women | 10 | 70 | 80 |
| Total | 100 | 80 | 180 |

While you know that this report is likely to be inaccurate, how do you calculate the proportions of meat and vegetable servings you should make available to the two groups? To do this we employ this formula:

$$\text{Cell}_{rc} = \frac{\text{Total}_r}{\text{Total}_{rc}} \times \text{Total}_c$$

Here $\text{Total}_r$ is the total for the cell's row, $\text{Total}_c$ is the total for its column, and $\text{Total}_{rc}$ is the grand total of the table. For the men who choose meat, for example, the calculation is (100 / 180) x 100, or about 55.56. Statistically, you get this very different result:

EXPECTED

| Gender | Meat | Vegetables | Total |
| --- | --- | --- | --- |
| Men | 55.56 | 44.44 | 100 |
| Women | 44.44 | 35.56 | 80 |
| Total | 100 | 80 | 180 |

Why? We assume that the Row Totals and Column Totals have to be correct, since they represent the numbers of responses provided by those who are going to be present at the event. But the numbers in the cells that make up each row and column total are the counts we would see, on average, if menu choice had nothing to do with gender. That is to say, if we repeated this event many times with the same Row and Column totals, but with different people attending, and gender and menu choice were unrelated, the cell counts would settle around these EXPECTED VALUES.

Residuals

Recall that in this example we are trying to predict the y, or dependent, variable (Meat vs. Vegetables) from the x, or independent, variable (men vs. women).
The larger the difference between the Observed and Expected value in each cell, the greater is our confidence that the x-variable is associated, or correlated, with the y-variable. Thus, we need to calculate these RESIDUALS, or differences between OBSERVED and EXPECTED Values. Since the next step requires that all of the Residual values be positive, we cannot simply subtract EXPECTED from OBSERVED. Instead we must use this formula:

$$\text{Residual}_{rc} = \frac{(\text{Observed}_{rc} - \text{Expected}_{rc})^2}{\text{Expected}_{rc}}$$

Squaring the difference makes every value positive, and dividing by the Expected value keeps cells with large counts from dominating. (Statistics texts usually call this quantity the cell's contribution to Chi-Squared.)

Rather than applying the EXPECTED and RESIDUAL calculations to the barbecue example, let us now apply them to the UFO vs. Education problem, the actual focus of our investigation. Recall that this was the table of OBSERVED Frequencies:

OBSERVED

| Education | No UFO Stories | Limit to Inside Pages | Fit with OEN Goals | Good to Attract New | Only Political Stories | Other | Total |
| --- | --- | --- | --- | --- | --- | --- | --- |
| No College | 0 | 1 | 7 | 5 | 0 | 0 | 13 |
| Some College | 2 | 6 | 35 | 10 | 8 | 4 | 65 |
| Bachelors Degree | 8 | 6 | 49 | 11 | 18 | 7 | 99 |
| Advanced Degree | 10 | 5 | 26 | 10 | 3 | 0 | 54 |
| Total | 20 | 18 | 117 | 36 | 29 | 11 | 231 |

After applying the calculation shown above, here are the EXPECTED frequencies:

EXPECTED

| Education | No UFO Stories | Limit to Inside Pages | Fit with OEN Goals | Good to Attract New | Only Political Stories | Other | Total |
| --- | --- | --- | --- | --- | --- | --- | --- |
| No College | 1.13 | 1.01 | 6.58 | 2.03 | 1.63 | 0.62 | 13 |
| Some College | 5.63 | 5.06 | 32.92 | 10.13 | 8.16 | 3.10 | 65 |
| Bachelors Degree | 8.57 | 7.71 | 50.14 | 15.43 | 12.43 | 4.71 | 99 |
| Advanced Degree | 4.68 | 4.21 | 27.35 | 8.42 | 6.78 | 2.57 | 54 |
| Total | 20 | 18 | 117 | 36 | 29 | 11 | 231 |

Next, the RESIDUALS are calculated:

RESIDUALS

| Education | No UFO Stories | Limit to Inside Pages | Fit with OEN Goals | Good to Attract New | Only Political Stories | Other | Total |
| --- | --- | --- | --- | --- | --- | --- | --- |
| No College | 1.13 | 0.00 | 0.03 | 4.37 | 1.63 | 0.62 | 7.77 |
| Some College | 2.34 | 0.17 | 0.13 | 0.00 | 0.00 | 0.26 | 2.91 |
| Bachelors Degree | 0.04 | 0.38 | 0.03 | 1.27 | 2.50 | 1.11 | 5.32 |
| Advanced Degree | 6.06 | 0.15 | 0.07 | 0.30 | 2.11 | 2.57 | 11.26 |
| Total | 9.57 | 0.70 | 0.25 | 5.94 | 6.24 | 4.56 | 27.26 |

Notice that the Sum of all the residuals appears in the Lower Right Corner of the table.
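For readers who would rather let a computer do the arithmetic, the EXPECTED and RESIDUALS tables above can be reproduced with a few lines of Python. This is only a sketch: the function names `expected_counts` and `residuals` are our own and are not part of the OEN polling software. It assumes the observed counts are typed in by hand, one list per Education row.

```python
# Observed counts from the UFO vs. Education table.
# Rows: No College, Some College, Bachelors Degree, Advanced Degree.
# Columns: the six answer choices, in the order shown in the table.
observed = [
    [0, 1, 7, 5, 0, 0],
    [2, 6, 35, 10, 8, 4],
    [8, 6, 49, 11, 18, 7],
    [10, 5, 26, 10, 3, 0],
]


def expected_counts(table):
    """Expected count per cell: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def residuals(table):
    """Per-cell (observed - expected)^2 / expected, as in the RESIDUALS table."""
    expected = expected_counts(table)
    return [
        [(o - e) ** 2 / e for o, e in zip(obs_row, exp_row)]
        for obs_row, exp_row in zip(table, expected)
    ]


expected = expected_counts(observed)
resid = residuals(observed)

# Print both tables rounded to two decimals for comparison with the article.
for label, table in (("EXPECTED", expected), ("RESIDUALS", resid)):
    print(label)
    for row in table:
        print("  ".join(f"{value:6.2f}" for value in row))

# Sum of every residual: the figure in the lower right corner of the table.
print("Sum of residuals:", round(sum(sum(row) for row in resid), 2))
```

The same two functions work on the barbecue table if `observed` is replaced with `[[90, 10], [10, 70]]`.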
This sum (27.26) is called the Chi-Squared Value, or the “Critical Ratio.” It is used to determine whether or not the Residuals are Statistically Significant. What we are asking here is whether the differences (Residuals) in each cell have occurred by chance, or whether they actually represent an important association between the x and y variables. When this association is large enough, we can predict that cells with the biggest or smallest Residuals will have a high relationship to the associated y-value.

To calculate this level of association between x and y, we first calculate Degrees of Freedom (df). This number is (Rows - 1) x (Columns - 1). The table above has 4 Education rows and 6 answer columns, not counting the totals, so df = (4 - 1) x (6 - 1) = 15.

To find the significance level, go to this site: http://www.psychstat.missouristate.edu/introbook/chisq.htm

It will look like this:

Notice that our Chi-Squared Value (27.26) is greater than the standard critical value for the .05 level at 15 degrees of freedom (25.00), although it falls short of the value for the .01 level (30.58). You can interpret this to mean that, if there were no association between the readers' choices and their level of education, differences this large would turn up by chance fewer than 5 times in 100.

After All This Work, So What!

Admittedly, setting up and calculating all of these values takes a lot of work, especially when you consider that there are many demographics shown for each poll. Some of the demographics have so many elements that finding any significant association with a poll question is extremely unlikely. As but one example, the religion demographic contains 27 categories. With OEN Management's concern for both innovation and value, we can hope that, as the polling process is refined, the following will eventually become part of the reporting process:

1. Management will automate the calculation process, and provide the same calculations which have been explained here.
2. Those demographics lacking statistical significance will simply not be displayed.
3. Demographics containing an excessive number of elements will be collapsed or combined into a smaller number of categories, such that they will yield useful data.

Interpreting the Results

RESIDUALS

| Education | No UFO Stories | Limit to Inside Pages | Fit with OEN Goals | Good to Attract New | Only Political Stories | Other | Total |
| --- | --- | --- | --- | --- | --- | --- | --- |
| No College | 1.13 | 0.00 | 0.03 | 4.37 | 1.63 | 0.62 | 7.77 |
| Some College | 2.34 | 0.17 | 0.13 | 0.00 | 0.00 | 0.26 | 2.91 |
| Bachelors Degree | 0.04 | 0.38 | 0.03 | 1.27 | 2.50 | 1.11 | 5.32 |
| Advanced Degree | 6.06 | 0.15 | 0.07 | 0.30 | 2.11 | 2.57 | 11.26 |
| Total | 9.57 | 0.70 | 0.25 | 5.94 | 6.24 | 4.56 | 27.26 |

Returning to our RESIDUALS table, we see something quite unusual. All four categories of the EDUCATION demographic show extremely close agreement between the OBSERVED and the EXPECTED values for just one of the six choices in the y-variable (Poll Question): in the Fit with OEN Goals column, no residual is larger than 0.13. Regardless of Education, there is wide agreement that the insertion of UFO stories is consistent with the overall editorial goals of this site. Having previously established the reliability of the result (less than a .05 chance of the differences arising by chance alone), we can have considerable confidence that this represents the sentiment of the majority of OEN readers. While an inspection of the OBSERVED counts would yield the same inference, this would not be the case when sentiment is less universal.

Interestingly, inspection of other demographic attributes of the readers yields the same result: it is the preferred option in almost all cases. Moreover, as predicted, it has held constant as the n (number of respondents) has climbed in the last few days from 231 to 293.
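Until the calculation is automated on the site, the significance check itself can be scripted. The following is a minimal sketch under stated assumptions: the names `chi_square`, `degrees_of_freedom`, and `CRITICAL_DF15` are our own, and the two critical values are copied from a standard chi-square table for 15 degrees of freedom rather than taken from the poll data. The degrees of freedom are counted from the table itself (4 Education rows, 6 answer columns, totals excluded).

```python
# Observed counts from the UFO vs. Education table (totals excluded).
observed = [
    [0, 1, 7, 5, 0, 0],      # No College
    [2, 6, 35, 10, 8, 4],    # Some College
    [8, 6, 49, 11, 18, 7],   # Bachelors Degree
    [10, 5, 26, 10, 3, 0],   # Advanced Degree
]

# Assumption: critical values for df = 15 from a standard chi-square table.
CRITICAL_DF15 = {0.05: 24.996, 0.01: 30.578}


def chi_square(table):
    """Sum of (observed - expected)^2 / expected over every cell."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    total = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            exp = row_totals[i] * col_totals[j] / grand_total
            total += (obs - exp) ** 2 / exp
    return total


def degrees_of_freedom(table):
    """(Rows - 1) x (Columns - 1), counting only the data cells."""
    return (len(table) - 1) * (len(table[0]) - 1)


stat = chi_square(observed)
df = degrees_of_freedom(observed)

# True where the statistic exceeds the critical value for that level.
significant = {alpha: stat > cutoff for alpha, cutoff in CRITICAL_DF15.items()}

print(f"Chi-Squared = {stat:.2f}, df = {df}")
for alpha, passed in significant.items():
    print(f"Significant at the {alpha} level: {passed}")
```

Because the critical values are hard-coded for df = 15, a table with a different number of rows or columns would need its own pair of values looked up.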
