# Statistics 2070

Statistics 2070
Introductory Statistics for the Social Sciences

Stephen Bieber
statout@uwyo.edu
Office: 334 Ross Hall
Phone: 307-766-4229
Office Hours:

Unit 1 – Statistics                                Pages 9-19
Terms                                       Page 9
Quotes                                      Pages 10-11
Examples                                    Pages 11-12
What are Statistics                         Pages 13-14
Research                                    Pages 14-17
Steps in the Research Process      Pages 15-16
Basic Research Questions           Pages 16-17
Article                                     Pages 18-19

Unit 2 – Variables and Measurement                 Pages 20-41
Part I: General Presentation                Pages 20-31
Terms                              Page 20
What is a Variable?                Page 21
What is Measurement?               Pages 21-22
Varies                             Page 22
Some Examples                      Pages 23-24
Levels of Measurement              Pages 24-27
Categorical                Page 25
Ordinal                    Pages 25-26
Ratio                      Page 26
“Ratio”                    Page 27
Quality of Measurement             Pages 27-28
Summarizing Example                Pages 28-30
Quiz                               Pages 30-31
Part II: Special Case – Survey Data         Pages 32-41
Surveys, Questionnaires, Polling   Page 32
Questions                          Page 33
Types of Questions                 Pages 33-34
Scaling                            Pages 34-36
Magnitude                  Page 34
Likert                     Page 35
Yes/No                     Page 35
Forced Choice              Page 36
Wording                            Pages 36-38
Simple Language            Pages 36-37
Be Neutral                 Page 37
No Double Barreled         Pages 37-38
Interview Method                   Pages 38-41
Face to Face               Pages 38-39
Telephone                  Page 39
Mail                       Page 40
Computer                   Pages 40-41
Polling                            Page 41

Unit 3 – Populations, Samples, and Sampling                        Pages 42-60
Terms                                                       Pages 42-44
Detailed Example                                            Pages 45-47
Sampling Methods                                            Pages 48-55
Simple Random Sampling                               Page 51
Stratified Random Sampling                           Page 52
Cluster Sampling                                     Page 53
Systematic Sampling                                  Page 54
Convenience Sampling                                 Page 55
Quiz                                                        Pages 56-57
Problems in Sampling                                        Page 58
Infamous Literary Digest Poll of 1936                       Page 59
Draft Lottery Example                                       Page 60

Unit 4 – Summaries and Summary Tables                              Pages 61-68
Terms                                                       Page 61
Example                                                     Pages 62-65
Enumeration                                           Page 62
Summary Statistics                                    Pages 63-64
Table Summarization                                   Pages 64-65
Quiz                                                        Pages 66-68

Unit 5 – Graphs                                                    Pages 69-89
Terms                                                       Page 69
Graphs of Categorical Variables                             Pages 70-73
Bar Graph                                           Page 71
Pie Chart                                           Page 72
Pictogram                                           Page 73
Graphical Summarization of an Ordinal or a Ratio Variable   Pages 74-79
Histogram                                           Pages 74-78
Line Chart                                          Page 79
Bias in Graphs                                              Pages 80-86
Time Plots                                                  Pages 87-89

Unit 6 – Numerical Summarization: Most Representative Value        Pages 90-104
Terms                                                       Page 91
Equations                                                   Pages 91-92
Most Representative Value                                   Pages 92-98
Mode                                                 Page 93
Median                                               Pages 94-95
Mean                                                 Pages 96-98
Skew and Example                                            Pages 99-101
Percentiles and Quartiles                                   Page 102
Quartile Example                                            Page 103
Quiz                                                        Page 104

Unit 7 – Numerical Summarization: Deviation            Pages 105-116
Terms                                           Page 105
Equations                                       Pages 105-106
Quiz 1                                          Pages 107-110
Variability                                     Pages 110-115
Deviation – Example                      Page 111
Simple Trick: Standard Deviation         Pages 112-115
Quiz 2                                          Pages 115-116

Unit 8 – Probability                                   Pages 117-129
Terms                                           Page 117
Bernoulli                                       Pages 118-119
Binomial                                        Pages 119-129
Digression – What is “Fair?”             Pages 119-122

Unit 9 – Normal Distribution and Probabilities         Pages 130-152
Terms                                           Page 130
Binomial to Normal and Standard Normal          Pages 131-137
Using the Standard Normal Table                 Pages 138-141
Probability Help Section                        Pages 142-146
Quiz                                            Pages 147-152

Unit 10 – Estimation and Confidence Intervals          Pages 153-168
Terms                                           Page 153
Equations                                       Page 153
Estimation                                      Pages 154-157
Quiz 1                                          Pages 158-160
Confidence Interval for the Sample Proportion   Pages 161-162
Quiz 2                                          Pages 163-164
Confidence Interval for the Sample Mean         Pages 165-166
Confidence Statements                           Pages 167-168

Unit 11 – Hypothesis Testing                           Pages 169-178
Terms                                           Page 169
Hypotheses                                      Pages 170-171
Five Basic Steps                                Pages 171-172
Legal System                                    Pages 173-178
Decision Table                                  Page 178

Unit 12 – One Sample Proportion Test                                     Pages 179-193
Test Situation Overview                                           Pages 179-180
One Sample Proportion Test                                        Pages 181-190
Terms                                                     Page 181
Equations                                                 Page 181
One Sample Proportion                                     Pages 181-190
Digression – How to Determine the Critical Value   Pages 186-190
Another Example                                                   Pages 191-192
Quiz                                                              Pages 192-193

Unit 13 – Multinomial and the Chi-Square Distribution                    Pages 194-206
Terms                                                             Page 194
Equations                                                         Page 194
Multinomial                                                       Pages 194-200
Digression 1 – Chi Square Distribution                     Pages 196-197
Digression 2 – Multinomial Test Statistic                  Pages 197-199
Template                                            Page 199
Another Example                                                   Pages 201-203
Quiz                                                              Pages 204-206

Unit 14 – Homogeneity and Independence                                   Pages 207-224
Terms                                                             Page 207
Equations                                                         Page 207
Problem Scenario                                                  Pages 208-213
Homogeneity                                                Pages 209-213
Digression – Homogeneity Test Statistic             Pages 210-212
Template                                            Page 212
Another Problem Scenario                                          Pages 213-217
Independence                                               Pages 215-217
Another Example                                                   Pages 218-220
Quiz 1                                                            Pages 221-222
Quiz 2                                                            Pages 223-224

Unit 15 – Scatterplot and Correlation                                    Pages 225-243
Terms                                                             Page 225
Equations                                                         Page 226
Scatterplot                                                       Pages 226-230
Correlation                                                       Pages 231-235
Template                                                  Page 232
Example 1                                                         Pages 235-238
Example 2                                                         Pages 239-241
Quiz                                                              Pages 241-243

Unit 16 – Other Forms of Correlation                                     Pages 244-246

Unit 17 – Regression                                                  Pages 247-267
Terms                                                          Page 247
Equations                                                      Pages 247-248
Regression                                                     Pages 249-258
Determination and Testing of the Regression Line        Pages 252-258
Template 1                                      Page 254
Template 2                                      Page 256
Another Example                                                Pages 258-261
Quick Guesses for the Constant and the Slope in Scatterplots   Page 262
Quiz                                                           Pages 262-265
Causality versus Dependency                                    Pages 265-267
Regression Guessing Web Site                                   Page 267

Unit 18 – Two Independent Samples T-Test                              Pages 268-280
Terms                                                          Page 268
Equations                                                      Page 268
Problem Scenario                                               Pages 269-275
What does Independence Mean?                            Pages 269-270
Independent Samples Statistical Test                    Pages 272-275
Template                                         Page 273
Another Example                                                Pages 276-278
Quiz                                                           Pages 278-280

Unit 19 – Matched Samples T-Test                                      Pages 281-294
Terms                                                          Page 281
Equations                                                      Page 281
What are Matched Samples?                                      Page 282
Matched Samples Form 1 – Traditional Definition         Page 282
Matched Samples Form 2 – New Definition                 Page 282
Problem Scenario                                               Pages 283-288
Template                                                Page 285
Matched Samples Statistical Test                        Pages 286-288
Comparison                                                     Page 288
Another Example                                                Pages 289-291
Quiz                                                           Pages 291-294

Unit 20 – One Way Analysis of Variance                                Pages 295-307
Terms                                                          Page 295
Equations                                                      Page 296
Example                                                        Pages 296-301
Template                                                Page 297
One Way Analysis of Variance Test                       Pages 299-301
Another Example                                                Pages 301-304
Quiz                                                           Pages 304-307

Unit 21 – Tables                                                  Pages 308-311
Standard Normal Table                                      Page 308
Chi Square Table                                           Page 309
T Table                                                    Page 310
F Table                                                    Page 311

Unit 22 – Experiments                                             Pages 312-325
Highly Controlled Environment                              Page 312
Design                                                     Pages 312-314
Multi-group Design                                  Pages 312-313
Treatment Group                             Page 312
Control Group                               Page 313
Comparison Group                            Page 313
Longitudinal Design                                 Page 313
Intervention                                Page 313
Bias                                                       Pages 314-315
Random Assignment                                   Page 314
Confounding Variable                                Page 314
Placebo Group                                       Page 315
Placebo Example                                     Page 315
Blinding                                            Page 315
Manipulation of the Explanatory Variable                   Pages 316-317
Biggest Public Health Experiment Ever                      Pages 318-325

Unit 23 – Probability and Some Examples                           Pages 326-342
Example 1 – Coin Flip                                      Pages 327-331
Example 2 – Coin Flip – Personal Probability Perspective   Pages 332-333
Example 3 – Monte Carlo – Roulette                         Pages 333-334
Expected Value                                             Pages 334-337
Expected Value Example 1 – California Lottery              Page 338
Expected Value Example 2 – Colorado Lotto                  Pages 339-340
Expected Value Example 3 – Flood Insurance                 Pages 341-342
Let’s Make a Deal – Monty Hall Problem                     Page 342

Unit 24 – Personal Probability                                     Pages 343-352
Personal Probability Rules                                  Page 344
Comparative Rules                                           Pages 345-347
Certainty Effect                                     Page 345
Pseudocertainty Effect                               Page 346
Logical Intransitivities                             Page 347
Individual Rules                                            Pages 348-352
Availability Heuristic                               Page 348
Detailed Imagination                                 Pages 348-349
Anchoring                                            Pages 349-350
Representativeness Heuristic                         Pages 350-351
Optimism                                             Page 351
Reluctance to Change                                 Pages 351-352
Calibrating                                                 Page 352

Unit 24A – Probability                                             Pages 353-374

Unit 25 – Summary Help Section                                     Pages 375-380
Table 1 – Key to finding the appropriate test situation     Page 377
Table 2 – Hypotheses, Critical Values, Degrees of Freedom   Pages 378-379
Table 3 – Necessary Test Statistic Calculations             Page 380

Unit 1: Statistics
Terms
Statistics - Statistics is the science of making accurate and reliable observations, decisions, and
predictions about populations (what is a population; Unit 3) by using only information obtained
from samples.

Descriptive Statistics - information derived from samples to summarize and describe samples in
general.

Inferential Statistics - information derived from samples to make predictions about populations.

Form 1 Question - a question in which we are only concerned about a single characteristic or
variable (what is a variable; Unit 2).

Form 2 Question - a question in which we are concerned about how 2 characteristics (variables)
are related to one another.

Form 3 Question - a question in which we are concerned about how one characteristic (variable)
can be used to predict another characteristic (variable).
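
The split between descriptive and inferential statistics can be illustrated with a few lines of code. This is only a sketch using made-up ages and Python's standard statistics module; none of it comes from the course itself, which relies on a hand calculator rather than programming.

```python
import statistics

# Hypothetical sample: ages of 8 students drawn from a large university population.
sample_ages = [19, 21, 20, 22, 18, 20, 23, 21]

# Descriptive statistics: summarize the sample itself.
sample_mean = statistics.mean(sample_ages)   # average age within this sample
sample_sd = statistics.stdev(sample_ages)    # spread of ages within this sample
print(f"Sample mean: {sample_mean:.1f}, sample SD: {sample_sd:.1f}")

# Inferential statistics: use the same sample to say something about the population.
# The sample mean serves as our estimate of the unknown population mean age.
estimated_population_mean = sample_mean
print(f"Estimated population mean age: {estimated_population_mean:.1f}")
```

The same eight numbers drive both uses: description stops at the sample, while inference reaches beyond it to the population, which is exactly the distinction the two terms above draw.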

What are statistics?
Probably the most famous quote about statistics, and a commonly held definition, comes from
Benjamin Disraeli, one of the prime ministers of England during the 19th Century. He stated,
“There are three kinds of lies: lies, damned lies, and statistics.”

Following Benjamin’s lead, here is a simple game of two truths and a lie. Below are
three statements about statistics. Two of the statements are true and one is a lie. Which is the lie?

Statement 1: The original definition of the word statistics was the quantum of happiness.
[Quantum of happiness would mean the measurement of happiness]

Statement 2: Joe Friday’s repeated statement in Dragnet, “Give us the facts and nothing but the
facts,” comes from Charles Dickens’ novel Hard Times in which he is making a derogatory
comment about the use of statistics in 19th Century England.

Statement 3: Statistics is an area of mathematics.


As hard as it might seem, statement 3 is the lie. Although obscure, the first two statements are
supposedly true. The quantum of happiness definition of statistics is attributed to Sir John
Sinclair (1799). The third statement is actually a near truth: statistics certainly uses mathematics,
but it isn’t an area of mathematics. Although you will encounter many equations in this course
and you will be required to make numerous calculations, these calculations are arithmetically
based and rather simple to do, especially with a hand calculator. In addition, these calculations
are secondary to the primary emphasis of the course, which is answering questions (real-world
problems) using numbers. The numbers are merely a tool through which you can obtain
answers to interesting and relevant questions.

--------------------------------

Before moving on to a definition of statistics, it might be interesting to see what some other
fairly famous people have said about statistics. We have already seen Benjamin Disraeli’s
opinion; here are the views of some others.

Quotes
The Importance of Statistics
Statistics is the most important science in the whole world: for upon it depends the practical
application of every other science and of every art. It is the one science essential to all political
and social administration, all education, all organization based on experience, for it only gives
results to our experience. --- Florence Nightingale

--------------------------------------------------------------------------------------------------------

Figures won’t lie but liars will figure. --- Charles Grosvenor

It’s easy to lie with statistics. But it is easier to lie without them. --- Frederick Mosteller

There are three kinds of lies: lies, damned lies, and statistics. --- Benjamin Disraeli

The secret language of statistics, so appealing in a fact minded culture, is employed to
sensationalize, inflate, confuse, and oversimplify. The crooks already know these tricks; honest
people must learn them in self-defense. --- Darrell Huff (How to Lie with Statistics)

------------------------------------------------------------------------------------------------------

What the feds have to say
An informed populace will be better able to tell the difference between being informed and being
manipulated.

If this is truly the information age, then those skilled in the science of information (statistics) will
be at a decided advantage over those who are not.

Students should begin to develop a critical attitude toward information presented in the media
and learn to ask relevant questions before making judgments based on that information.

Students should analyze statistical information from the media, interpret results of polls and
surveys, and recognize valid and misleading uses of statistics.

Class activities should make use of statistics in the media, especially advertising claims.

[These come from the Curriculum and Evaluation Standards for School Mathematics (National
Council of Teachers of Mathematics) for grades K-4!]

-----------------------------------------------------------------------------------------------------

Conclusion
Statistical thinking will one day be as necessary for efficient citizenship as the ability to read and
write. --- H. G. Wells

-------------------------------

Can you somehow escape the influence of statistics?

No. You might not be aware of them or their influence, but statistics in some fashion or another
can be found lurking in almost every corner of our lives.

Here are some examples.
Do statistics influence who I will vote for?

All polls of potential voters are conducted using numerous applications of statistics. The
selection of people to interview [Sampling] is statistics. The selection of the questions to
ask [Variables and Measurement (Unit 2)] is statistics. The summarization and
presentation of the results of the poll [Tables] is statistics. The results of these polls
definitely affect how voters perceive candidates and consequently also influence how
they will vote.

Do statistics influence my life style?

Almost every day, studies are reported in the news that tell us the benefits of low-carb
diets, low-fat diets, taking vitamins, exercise programs, emotion-enhancing drugs (to
enable us to calm down or to enable us to be alert), and a host of other suggestions. All of
these recommendations come from studies (experiments) in which the benefits
recommended have been affirmed through statistics.

Do statistics influence the clothes that I will buy?

The choice of fabrics, colors, and even designs is often the product of marketing surveys
conducted on us. What will be the new fad is not a matter of coincidence or luck. It is the
result of product testing, promotion, and timing. Success is always the by-product of
hard work, but hard work in these areas always includes statistics.

Do statistics influence the clothes that I will wear today?

In the examples above, studies are done, information is collected, and statistics are used to
answer questions and make decisions. In this example, no one is conducting a study on
us, but we are in the midst of an ongoing study that each and every one of us does on
ourselves every day. Each of us is the recorder of our own past history.

What clothes (styles and colors) are associated with us feeling more positive, happier,
and more ready to meet the day? How do I feel, how would I like to feel? Today will I
choose a power tie, a suit, jeans, a cotton blouse, a silk shirt, or … Each of us is different,
but each of us knows ourselves fairly well, even if it might be subconsciously. Here we
see a very fundamental aspect of statistics, the use of past information to assist us in
making decisions that confront us today.

What will the weather be like today? If we think it will be cold, then what will we wear?
Wool, flannel, fleece, a sweater, or … Here we see the same aspect of the personal use of
statistics as above, but I have added another component, Prediction. How have I made a
prediction about what the weather will be, so that I can make a decision about my
clothing choice for today? Have you used a weather report (definitely a statistical
prediction)? Have you used your own personal experience (this is called personal
probability and will be visited later in the course)?

The truth of the matter is that people, ourselves included, are statistical organisms. Everything you are going to
learn in this course is already operating in the world around us and in us as well. One of the
emphases of this course will be to enable you to see where this is happening. As a consequence,
it should also enable you to become part of “an informed populace (who) will be better able to
tell the difference between being informed and being manipulated.”

Thus, we have already arrived at the day that H. G. Wells foresaw a century ago. “Statistical
thinking (has become) as necessary for efficient citizenship as the ability to read and write.”

So

What are "Statistics?"
Interestingly enough, this question is not often answered directly in many introductory statistics
textbooks.

A simple and nearly useless answer to our question is, "statistics is the field of studying
statistics."

As we will see in Unit 3 (Populations, Samples, and Sampling), statistics are characteristics of
samples. Thus, we use the word statistics in two different but similar manners. Statistics are the
characteristics/descriptors of samples, but statistics is also the name of the field in which these
descriptors are systematically studied. In this introductory section, we will deal with the "is"
statistics and not the "are" statistics.

As noted in Moore (Statistics: Concepts and Controversies, 3rd Edition, page xvi), "The aim of
statistics is to provide insight by means of numbers. In pursuit of this aim, statistics divides
the study of data into three parts: I. Producing Data, II. Organizing and Analyzing Data,
and III. Drawing Conclusions from Data." This implies that statistics is the "systematic study
of data," which would make more sense to us if we knew what data were. [Note: data is plural;
the singular of data is datum.] Data are information that we have obtained from samples.
Thus, we see the connection to the paragraph above. Whether we are dealing with data or
characteristics, we are dealing with samples.

As noted in Utts (Seeing Through Statistics, 2nd Edition, page 3), "Statistics is a collection of
procedures and principles for gaining and processing information in order to make
decisions when faced with uncertainty." Utts’ use of the word information is equivalent to
Moore’s use of the word data. Utts’ definition provides us with a focused purpose (to make
decisions) whereas Moore’s is considerably broader (systematic study).

While these definitions appear quite different, we can see that they are basically talking
about the same thing. However, it should be noted that neither directly addresses the fundamental
issue that we are studying samples. As mentioned above, this point will be expanded greatly in
Unit 3.

Here is yet another definition. "Statistics is the science of making accurate and reliable
observations, decisions, and predictions about populations by using only information
obtained from samples." While this definition is really not any better (more universally
accepted) than the two given above, it will be the definition that I will use throughout this course.
Why? Because I like it better. But I am biased, since it is my own. Here is another definition of
my own, "Statistics is a way of looking at the world and attempting to help us make sense of
what is happening." This last definition is probably most consistent with the goal and
orientation of this course.

As noted earlier, it is impossible to get away from statistics. They and this field are everywhere.
This period in history has often been referred to as the Information Age. If this is correct, then it
is obvious that a systematic science of information (essentially Moore’s definition of statistics)
would be of critical and necessary value (the notion proposed by H. G. Wells). It is equally clear
that those skilled in the science of information will be at a decided advantage over those who are
not. Our own government has recognized this situation. In each new edition of the mathematics
standards for K-12 education, increasing importance is placed on teaching the citizens of our
country to be better consumers of information. In particular, an informed populace will be better
able to tell the difference between being informed and being manipulated.

This brings us full circle back to the most famous quote associated with statistics. "There are
three kinds of lies: lies, damned lies, and statistics."

The true situation is that statistics don’t and can’t lie. However, it is certainly true that liars use
statistics. Hopefully, this course will enable you to become a better user and consumer of
statistics, and someone less prone to being manipulated by those who, purposefully or in
ignorance, mishandle information. It should also make you less likely to lie, in ignorance of
course, to others.

What is Research?
Often when we speak about doing research, we think of scientists conducting experiments to
answer the deep questions confronting our society, culture, future, etc. While this particular
activity certainly can be referred to as research, this image reflects an extremely small portion of
what research really is. Research is much, much broader.

What is research? Once again, we can come up with various definitions of this term, but let’s go
at it from a simple perspective only. Research is the process whereby we attempt to answer
questions in an informed manner. Informed means that we are using information and this brings
us back to Moore‟s definition of statistics.

Consider the following example. You have just been presented with a question about which you
know nothing. For instance, the question might be "are left handed people smarter than right
handed people?" There are at least 3 distinct manners in which you might pursue an answer to
this question.

Situation 1: You could answer this question without having any information. This question might
not mean anything to us; therefore, we could make our decision on the basis of a coin flip. Heads
means yes and tails means no.

Situation 2: You could answer this question on the basis of personal bias, again without
having any information. You might be left-handed. Therefore, you might want the answer to be
yes so that you will be seen as smarter.

Situation 3: You could answer this question by performing a study. You might collect
intelligence information on several right and left handed people and on the basis of this
information determine whether the right or left-handed people are smarter.

Situation 1 is called guessing, situation 2 is called prejudice, and situation 3 is called research. It
should be easy to see that the answers provided through the research process should be of a
higher quality and more likely to be true than those obtained in either of the other manners.

Thus we can see that research is a very common activity, one in which we all participate daily.
Consider something as simple as deciding which movie to go to. If you look in the paper to see
your options, then you are collecting information to assist you in making your decision. This is
research.

Why are we talking about research? It is a misconception today, shared even by professionals,
that research is statistics and statistics is research. This is definitely not true. Statistics is only a
part of the research process. Below is presented one conceptualization of the research process. It
is not unique and possibly not even the best, but it will serve our purpose for the present
discussion.

Steps in the Research Process
Step 1: The Question. What would we like to know something about? What is our question of
interest? This is the obvious beginning. This is not statistics.

Step 2: Measurement. This step of the process involves two different aspects. First, given our
question of interest, what are we specifically looking for? What would we like to know
something about? What information are we trying to obtain? Second, how are we going to assign
numbers to the answers we get? This is not statistics. This is called psychometrics, the process
of converting questions written in English to numerical questions that produce data.

Step 3: Design. How are we going to collect the information? This is not statistics.

Step 4: Sampling/Data Collection. How do we determine who to collect the information on?
How do we go out and actually collect the information on these people? This is a very small
area of statistics.

Step 5: Analysis. The analytic methods whereby we extract the answers to our questions from
the information (data) we have collected. THIS IS STATISTICS, the process of extracting the
numerical answers from our numerical questions.

Step 6: Answer. If we have done steps 2 through 5 correctly, then this step becomes the
conversion of our numerical answers back into English to answer the question posed in step 1.
This is not statistics.

15
Here is an example to illustrate these steps in the Research Process.

Step 1: "Are left-handed people smarter than right-handed people?"

Step 2: Specifically, we would like to know two pieces of information for each person we are
going to include in our study. First, handedness. Is the person right or left handed? Second,
smartness. How smart is each person? Now that we know what we are looking for, how are we
going to measure each of these pieces of information? Handedness, we could simply record
whether each person is right or left handed by asking them (1=right, 2=left). Smartness, we could
determine how smart each person is by asking for their IQ score, if known (record the numerical
score). This step is the primary topic of Unit 2.

Step 3: We could collect the data by talking with 30 people that we stop in the Student Union.
This would be an observational study.

Step 4: In order to get the 30 people, we might go to the Student Union on a Saturday and ask
the first 30 people we meet after 10:30 in the morning. This is not a very good sampling plan and
is called convenience sampling. This step is the primary topic of Unit 4.

Step 5: This is the tricky step. There are many ways to extract answers from data. While we will
look at only a few extraction methods they are the essential and fundamental methods for all of
statistics. Basically they can be divided into the descriptive methods (Units 4, 5, 6, 7, and 10)
and the inferential methods (Units 11 – 20). The simple answer to our current example question
might be to compare the mean IQ of the right-handed subjects and the mean IQ of the left-
handed subjects. (Material from Units 6, 10, and 15).
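As a preview of the analysis step, the mean comparison described here can be sketched in a few lines of code. The data and the `mean_iq` helper below are invented purely for illustration; only the coding scheme (1=right, 2=left) comes from Step 2.

```python
# Hypothetical (handedness, IQ) records, coded as in Step 2: 1=right, 2=left.
subjects = [(1, 104), (2, 98), (1, 110), (2, 115), (1, 95), (2, 102)]

def mean_iq(data, code):
    """Mean IQ of the subjects whose handedness equals the given code."""
    scores = [iq for hand, iq in data if hand == code]
    return sum(scores) / len(scores)

right_mean = mean_iq(subjects, 1)  # mean IQ of the right-handed subjects
left_mean = mean_iq(subjects, 2)   # mean IQ of the left-handed subjects
print(right_mean, left_mean)       # Step 6 then compares these two numbers
```

Whichever mean is larger would be carried back into English in Step 6; whether the difference is large enough to matter is exactly the inferential question taken up later in the course.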

Step 6: Whichever group has the higher mean will be determined to be the smarter.

Some Basic Research Questions
One of the many tricks in statistics is that Step 5 of the research process (the selection of the
statistical analysis method) must be matched to the numerical question specified in Step 2, which
is the numerical translation of the question of interest presented in Step 1. The type of question
that is being asked will directly affect the type of analysis method to be used. We will see this
association develop throughout the course of the entire semester. It is sufficient at this time to
merely be aware of this relationship. Thus, let's finish off this introduction and begin the course
with the specification of three types of general questions of interest (as might be seen in Step 1)
that can be asked. In an effort to simplify this illustration, assume that we know two
characteristics about a group of people. These will be their handedness and “smartness,” as
mentioned earlier.

16
Form 1 Question: What proportion of the people we have studied are left handed?

In this question we are concerned about only a SINGLE characteristic: handedness.

Form 2 Question: What is the relationship between the “handedness” and the “smartness” of
those people we have studied?

In this question we are concerned about both characteristics and how they are related to one
another.

Form 3 Question: Can we predict how smart a person is by knowing if she/he is right or left
handed?

In this question we use both characteristics and would like to know if it is possible to make a
prediction about one characteristic (smartness) based on the other (handedness).

Several points are worth noting about these three forms of questions:
1. Question 1 deals with only one characteristic (more on this in Unit 2).
2. Questions 2 and 3 both deal with two characteristics (variables).
3. Questions 2 and 3 could deal with more than two characteristics, but we will limit these
question types to only two characteristics, for simplicity, throughout the entire semester.
4. Questions 1 and 2 typically arise from observational studies.
5. Question 3 typically arises from an experimental study.
6. Question 3 is the most powerful type of question, which leads us into answers called
predictions.

17
The following article is presented at this time solely as an illustration of a few of the points in
this first unit. Statistics are being
used by a great number of people in order to justify their beliefs, positions, decisions, and
recommendations. It would be nice if everyone using the same information could arrive at the
same conclusion, but this is an ideal that is rarely achieved in the real world. As was noted earlier
in this unit by the National Council of Teachers of Mathematics for grades K-4, “An informed
populace will be better able to tell the difference between being informed and being
manipulated.”

Article
Trust in tax bill’s fairness at stake in political melee
By Rob Wells (AP Tax Writer)

Washington – At the heart of a political brawl in Congress over the emerging tax bill are
accusations that both sides manipulate statistics to advance their respective political agendas.

More than routine Washington finger-pointing, the debate over who benefits from the tax-cut
bills moving through the House and Senate goes to the question of whether the public perceives
the tax packages as fair.

It's a debate almost certain to be a major theme in the 1998 midterm elections.

House Minority Leader Dick Gephardt, D-Mo., said this week's vote on the Republican tax plan
"is one of the most important votes that will be taken in this Congress."

House Ways and Means Chairman Bill Archer, R-Texas, lobbed the first grenade in the battle.
He accused the Treasury of using statistics that inflated wealth and led to the assertion that the
wealthy benefit from the tax package.

"If you ever wondered why the administration decries tax cuts for the rich, it's because they use
a misleading and artificial method of defining income that is out of sync with reality," Archer
said.

Archer contends people who earn $20,000 to $75,000 a year get 71 percent of the benefits of his
tax package.

The Treasury Department, using the "Family Economic Income" method of calculating family
wealth, says 77.6 percent of the benefits go to people making more than $75,000 a year.

18
Archer blames the discrepancy on such factors as Treasury's inclusion of "imputed rent on
owner-occupied housing," an odd tax concept that increases a homeowner's income as if the
family pays itself rent for the residence.

It also adds to a person‟s gross income the value of fringe benefits and increased value of
insurance policies and pensions.

A senior Treasury official defended the calculation as legitimate because 90 percent of family
income remains cash income.

Robert Greenstein of the liberal Center on Budget and Policy Priorities said Archer's argument
doesn't hold up when the distribution of tax cuts is viewed in general categories of income, such
as the upper and lower fifths of society.

"If that's a point Archer is making, he's grasping at straws," Greenstein said.

The accuracy of the Joint Committee on Taxation numbers cited by Archer is being criticized as
well.

The Center on Budget and Policy Priorities says Archer's numbers are distorted because they
calculate changes in Individual Retirement Accounts and reductions in capital gains taxes
through 2002, well before they fully kick in.

In fact, the proposed capital gains tax changes will increase tax revenue by $2.6 billion. The
projected revenues in the first five years are based on the expectation that investors
will cash in their stock market winnings.

19
Unit 2: Variables and Measurement
Part I: General Presentation

Terms
Variable - a characteristic of something that can be measured and varies.

Measurement - a means of assigning values (usually numbers) to a variable's response
possibilities

Varies - measurements that differ; are not the same

Constant - measurements that do not vary

Quantitative Measurement - numbers are used to express quantity (here the numbers express the
amount of something)

Qualitative Measurement - numbers are used to express quality (here the numbers are
replacements for the names of things)

Categorical Level of Measurement (AKA, nominal) - the commonly used name for a variable
that is measured qualitatively

Ordinal Level of Measurement - the measurement associated with a variable where the numbers
express only concepts such as more, less, greater, smaller

Ratio Level of Measurement - the measurement associated with a variable where the numbers
express precise amounts

"Ratio" Level of Measurement - a special case of ordinal measurement where there are at least 5
possible variable responses

Validity - the quality of the measurement of a variable that indicates that the variable is
measuring what it is supposed to measure

Reliability - the quality of the measurement of a variable that indicates that the variable when
measured results in repeatable (consistent) responses

20
What is a variable?
Here is a collection of definitions from a variety of sources. It is interesting to note that many
introductory textbooks in statistics do not define this word at all, even though 100% of the
textbooks talk about variables throughout.

A Variable is

-     A characteristic of interest.

-     A characteristic of the world that can be measured.

-     A characteristic of a member of the population. (Population is presented in Unit 3)

-     Any characteristic of a person, environment, or experimental situation that varies from
person to person, environment to environment, or experimental situation to situation.

From these definitions we see an obvious point of similarity. Whatever a variable is, it is a
characteristic of something. Before presenting some examples, which will hopefully clarify the
definition of a variable, there are two additional points that should be extracted from these
definitions. These two points are measurement and varies.

What is measurement?
Measurement is the assignment of unique numerical values to unique aspects of the variable of
interest.

A measurement system is a systematic means of assigning values to a variable.

For illustration consider the following:

A ruler (with marks every ¼ inch) is an example of a measurement system. We can lay it
alongside something that we are interested in and we can associate values (in inches)
corresponding to the size of that which we are measuring. Every mark on the ruler indicates a
unique value.

Color is another example of a measurement system. Given a color chart (the measurement
system), we could place it alongside anyone's eyes to determine their eye color. Every color in
the chart indicates a unique value. Although this measurement system doesn't appear to produce
numbers, we will see later in this unit how it is possible to assign numbers even in a situation like
this one.

Are either of these two example measurement systems perfect? What does perfect mean? For a
measurement system to be perfect, it means that each and every thing that can be measured will
be measured without any errors, mistakes, or inaccuracies.

21
Of the two measurement systems above, let's look at the ruler example at greater length. What
would happen if an object we were measuring turned out to be a length that fell between two of
the marks? In some fashion we would have to create a supplementary system which would
enable us to select either the marked value above or below the true length. We are already very
familiar with at least one such supplementary system called rounding or rounding off. Would the
rounded off value represent the true length of the object? No, it would be imperfect, but it would
be close. We often refer to measurements taken from a system like this one in the following
manner: the size of this object is 12 and ¾ inches, measured to the nearest ¼ inch; or the size of
this object is 12 and ¾ inches, accurate to the nearest ¼ inch.
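The rounding-off supplement can be written down explicitly. This is a minimal sketch; the helper name `to_nearest_mark` is made up for illustration. Shrinking `mark_spacing` is precisely the "finer ruler" improvement discussed next.

```python
def to_nearest_mark(true_length, mark_spacing=0.25):
    """Round a true length (in inches) to the nearest mark on the ruler."""
    return round(true_length / mark_spacing) * mark_spacing

# A true length of 12.70 inches, read from rulers of increasing accuracy:
print(to_nearest_mark(12.70))                       # 1/4-inch marks  -> 12.75
print(to_nearest_mark(12.70, mark_spacing=0.0625))  # 1/16-inch marks -> 12.6875
```

Neither reading is the true length, but the finer ruler's error (0.0125 inches) is smaller than the coarser ruler's (0.05 inches); no choice of `mark_spacing` drives the error all the way to zero.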

Could we make this particular measurement system better (more accurate)? What does more
accurate mean? A more accurate measurement system is one that would produce smaller errors
or smaller inaccuracies. Is this possible? Certainly. We could make it better by using a ruler that
was now marked off to the nearest 1/8 inch. Would this new measurement system be perfect?
No. Could we devise a still better measurement system? Certainly. We could use a ruler marked
off to the nearest 1/16 inch. Would this new measurement system be perfect? No. Could we
devise a still better system? Certainly. We could continue with this discussion, but the real
question is, can we ever devise a perfect measurement system? Probably not. In theory we can,
but in practice we can't.

This fact is often overlooked by professionals and almost universally overlooked in most
introductory textbooks. But it is true. None of our measurement systems are perfect. Thus, the
real question is, “can we devise a measurement that is as perfect (accurate) as we need it to be?”
The answer to this question is probably yes.

Varies
The other additional point about a variable was the word varies. This means that the “things” that
we are taking measurements of will not all produce the same response. For instance, in the ruler
illustration above it is very unlikely that everything we measure will be exactly the same size. If
we are using this ruler to measure the size of shoe boxes, then some of the boxes will definitely
be the same size as others, but most will be of different sizes. Fairly obviously the boxes that size
5 ½ shoes come in will be shorter in length than the boxes that size 11 ½ shoes come in. More on
this issue will be presented in the following examples.

22
Some Examples
Since this is a course in introductory statistics for the social sciences, let's suppose that the
organism we are interested in studying is the human being. Given the diversity of humankind, there are
a great many things (characteristics, variables) that are possible to measure. Remember for us to
label “something” as a variable it must (a) be a characteristic of that which we wish to study
(human beings here), (b) it must be measurable, and (c) it must vary.

1. Height.
(a) certainly height is a characteristic of humans
(b) we can measure a person's height in inches
(c) since not everyone is the same height, then the measurement values for this variable will
certainly vary
Conclusion: height is a variable; possible values are 6' 2", 5' 3", 5' 9", etc.

2. Age.
(a) certainly age is a characteristic of humans
(b) we can measure a person's age in years, in months
(c) since not everyone is the same age, then the measurement values for age will certainly
vary
Conclusion: age is a variable; possible values are 28 years 3 months, 35 years 2 months, etc.

3. Vision.
(a) certainly vision, how well a person sees, is a characteristic of humans
(b) we can measure a person's vision through any one of several vision tests
(c) since not everyone's vision is the same, then the measurement values for vision will
certainly vary
Conclusion: vision is a variable; possible values are 20/20, 20/100, etc.

For examples 2 and 3, let‟s change the group of people we are interested in obtaining
information about slightly. Rather than being interested in studying any human, we are only
interested in studying totally blind humans.

4. Age.
(a) certainly age is a characteristic of totally blind humans
(b) we can measure a blind person's age in years, in months, in days
(c) since not every blind human is the same age, then the measurement values for age will
certainly vary
Conclusion: age is a variable for blind people

23
5. Vision.
(a) certainly vision, how well a person sees, is a characteristic of totally blind humans (it is
just that a totally blind person's vision is zero)
(b) we can measure a totally blind person's vision through any one of several vision tests;
however, the results will always be zero
(c) every totally blind person's vision will be exactly the same, zero. Thus, this
characteristic of totally blind people does not vary.
Conclusion: vision for totally blind people is not a variable. It is what we call a constant.

What is a constant? A constant is a special variable in which the values for the people measured
are all exactly the same.

Levels of Measurement
Although there are an infinite number of possible measurement systems, they can all be
identified as one of three general types (levels of measurement). These general types are
categorical, ordinal, and ratio. There is actually a fourth general type of measurement system,
called interval, but these are so rare as to not be worth our attention.

There are many good discussions of this topic. Chapter 2 in Statistical Reasoning for Everyday
Life by Bennett, Briggs, and Triola (2nd Edition) is one of the best.

Quantitative means to be able to express something in a numerical fashion. According to the
American College Dictionary, quantitative means the describing or measuring of quantity.
Questions such as how big, how many, how much, lead to quantitative measurement systems.

Qualitative means not quantitative. According to the American College Dictionary, qualitative
means the consideration of quality. Questions such as what type, what color, what nationality,
lead to qualitative measurement systems.
In our English language, when we say we can measure something, we typically are referring to
quantitative measurement systems. In statistics, however, measurement refers to both quantitative
and qualitative measurement systems, with the acknowledgment that in qualitative systems the
numbers carry no quantitative meaning; they serve only as names.

24
Categorical
Categorical is also called nominal.

Categorical means that we differentiate aspects of a variable based on categories. Even though
we might use numbers to differentiate between categories, the numbers have no quantitative
meaning. See example 2 below.

A categorical level of measurement is one that examines characteristics (variables) which are
qualitative. Categorical variables are qualitative.

Examples of Categorical Variables:

1. Political Party Affiliation. Which could take the values: Democrat, Republican, Other

2. Sex of subject. Which can take the values: female, male. Rather than referring to each
subject as a female or a male, we could introduce a shorthand symbolic notation, where f
stands for female and m stands for male. Or if we wished (and this is very common) we
could introduce a shorthand numerical notation, where 1 stands for female and 2 stands for
male. In this example we have introduced numbers into a qualitative measurement system.
Notice that the numbers do not mean anything in a quantitative sense; they are only acting as
names (1 simply means female and 2 simply means male).

3. Type of vitamin. Which could take the values: A, B12, C, E, etc.
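The point that categorical codes are only names can be made concrete. This sketch uses the coding from example 2 (1=female, 2=male); the "average" at the end is computable but meaningless, which is exactly the sense in which categorical numbers carry no quantity.

```python
# Qualitative measurement: numbers acting purely as names (example 2's coding).
code_for = {"female": 1, "male": 2}

subjects = ["female", "male", "female", "female"]
codes = [code_for[s] for s in subjects]
print(codes)  # [1, 2, 1, 1]

# Arithmetic on the codes still runs, but the result names nothing:
print(sum(codes) / len(codes))  # 1.25 -- not a sex, so not a meaningful quantity
```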

Ordinal
Ordinal means that we differentiate aspects of a variable based on general differences. General
differences mean that we can tell when something is larger than something else, but we do not
know specifically by how much.

An ordinal level of measurement is one that examines characteristics (variables) whose various
aspects can be ranked or ordered in relation to one another.

Examples of Ordinal Variables:

1. Places for the best jelly at the State Fair. Values would be First Place, Second Place, Third
Place, etc.

2. What class are you in college? Values would be Freshman, Sophomore, Junior, Senior.
25
3. Respond to the following statement using the scale below: “The stock market is over-
extended.” [This particular type of ordinal variable is called a Likert Scale]

1 = Strongly Disagree    2 = Disagree    3 = Neutral    4 = Agree    5 = Strongly Agree

Seeing words, or concepts, such as rank or rate in questions is usually a strong indication of an
ordinal variable. Seeing words such as greater, lesser, more, or fewer for measurement values is
usually a strong indicator of an ordinal variable as well.

Ratio
Ratio means that we differentiate aspects of a variable based on specific differences. Specific
differences mean that we can tell how much larger something is than something else. Ratio
variables are ordinal variables in which the question of how much difference there is between
values can also be answered.

A ratio level of measurement is one that examines characteristics (variables) whose various
aspects can be quantitatively related. There are three components to a quantitative relationship.
These are (1) order, (2) magnitude, and (3) a meaningful zero. As we saw above, an ordinal
level of measurement addresses (1) only. A ratio level of measurement addresses all three. Thus,
we can answer questions which require us to know specifically how much bigger one number is
than another, and how much bigger a particular number is than zero. This third feature enables us
to address an additional component where we can compare one number to another in a ratio
sense. For instance, if we have the numbers 6 and 3, we know by (2) above that 6 is three units
bigger than 3, but from (3) we know that 6 is twice as big as 3. This is where this level of
measurement gets its name.
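The 6-and-3 example can be checked directly; the three lines below mirror the three components of a quantitative relationship in order.

```python
a, b = 6, 3
print(a > b)   # (1) order: 6 is bigger than 3
print(a - b)   # (2) magnitude: 6 is exactly 3 units bigger than 3
print(a / b)   # (3) meaningful zero: 6 is 2.0 times as big as 3
```

An ordinal scale licenses only the first line; the ratio comparison in the third line is what gives this level of measurement its name.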

Examples of Ratio Variables:

1. Cost of movie tickets in dollars and cents. Possible values would be \$6.25, \$7.50, etc.

2. Gas mileage in cars in miles per gallon. Possible values would be 18.5 mpg, 19.1 mpg, etc.

3. Time it takes to run a mile in minutes to the nearest 1/100th of a second.

26
“Ratio”
A special case of the ratio level of measurement should be presented which is particularly
applicable in most of the social and behavioral sciences. Probably the most popular method of
collecting data (information) in these sciences is through surveys or questionnaires, and one of
the primary ways of asking questions (variables) of people is through the use of the Likert Scale
(mentioned in the ordinal level of measurement section). Likert Scales are ordinal. However, it is
a common practice in the social and behavioral sciences to consider ordinal scales that have at
least 5 different responses (the Likert Scale example presented earlier) to be ratio. To
differentiate this type of ratio level of measurement from the true ratio level of measurement, I
will use the designation “ratio.” The benefits of this elevation of an ordinal scale to a pseudo-
ratio level will be discussed throughout the course.

Quality of Measurement
When choosing a measurement system, there are several questions that seem logical to consider.
A couple of the bigger questions are:

1. Does the measurement system actually measure what it is supposed to be measuring?
Validity

2. Does the measurement system produce repeatable numbers? Reliability

These two issues, validity and reliability, are the foundational principles of the field of
Psychometrics. There are many different types of validity and of reliability, but we will deal with
only their general forms (as reflected above) in this course. Variables of the highest quality are
both valid and reliable. Variables of the lowest quality are invalid and unreliable.

For most physical variables validity is a rather simple concept. The height, weight, width, and
color of something are easily measured and directly related to what we intend to measure.
However, most social or behavioral variables are much more problematic. What are easy and
direct measures of assessing frustration, intelligence, feeling good, feeling bad, etc.?

For instance, if we want to know a baby's heart rate, it is easy to take a pulse and this pulse count
is a valid measure of the baby's heart rate. However, if we want to know when a baby is hungry,
how do we assess this? When the baby cries? This of course can mean hunger, but it can also
mean teething, sleepiness, etc. Hence, crying is not a valid measure of hunger because it is an
indicator of other issues as well.

What about the reliability of the variables above? How reliable is height? The answer here is
dependent upon how we wish to measure height. Let's consider two options.

27
Option 1. I have a steel rod 3 meters long, which is marked off to the nearest millimeter (mm).

Option 2. I have a rod made out of a heat-sensitive polymer. At 70°F it is 3 meters long and
marked off to the nearest mm. However, this rod contracts or expands considerably with
temperature fluctuations. For instance, at 60°F this rod is actually only 1.5 meters long and at
80°F this rod is actually 6 meters long.

I have an object I would like to measure, but I do not know the ambient temperature. Evaluate
Option 1 and Option 2 for their validity and reliability. Regardless of temperature, both measure
distance (Option 1 does this well and Option 2 does this poorly), hence both are valid.
Regardless of temperature, Option 1 will replicate its measurement of this object very well,
hence Option 1 is extremely reliable. However, the measured size of the object will vary greatly
using Option 2, depending upon the ambient temperature. Most likely Option 2 is extremely reliable for
measurements made at exactly the same temperature, but since these measurements depend upon
temperature (as much as the size of the object), then Option 2 introduces a serious bias into the
measurement (temperature) and cannot be considered reliable.
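A small simulation makes the contrast visible. This is a sketch under an assumption: the doubling-per-10°F model below is my interpolation of the three stated lengths (1.5 m at 60°F, 3 m at 70°F, 6 m at 80°F). The reading scales up when the rod shrinks, because its millimeter marks are compressed.

```python
def steel_reading(true_size):
    """Option 1: the steel rod reports the true size at any temperature."""
    return true_size

def polymer_reading(true_size, temp_f):
    """Option 2: the polymer rod's marks stretch or shrink with temperature."""
    rod_length = 3 * 2 ** ((temp_f - 70) / 10)  # meters: 1.5 @ 60F, 3 @ 70F, 6 @ 80F
    return true_size * 3 / rod_length           # compressed marks -> larger reading

true_size = 1.0  # meters
print([steel_reading(true_size) for _ in range(3)])           # repeatable: reliable
print([polymer_reading(true_size, t) for t in (60, 70, 80)])  # drifts: unreliable
```

The same one-meter object reads anywhere from 0.5 to 2.0 meters on the polymer rod as the temperature moves, which is the unreliability described in the text.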

How about the reliability of the baby's crying? We can accurately ascertain when the baby is
crying and two observers would most likely agree when this is occurring, hence crying is
reliable; even though we are not necessarily certain what the crying means (lack of validity).

Summarizing Example
Task: create a measurement system to ascertain how many words are in a book. We will assume
the book starts on page 1.

Method 1: Count all of the words in the book.

Valid?                         Yes
Reliable?                      Yes
How long to complete?           Very, Very Long
Assumptions?                   You must stay focused on the task; as your mind wanders,
unreliability is introduced.

Method 2:
a. find the page number of the last page of text
b. select one page at random and count the number of words on this page
c. multiply the number of pages in a. by the number of words in b., to get an estimate
for the total number of words in the book

Valid?                         Yes
Reliable?                      Somewhat
How long to complete?           Few minutes

28
Assumptions?

1. All pages have the same or very nearly the same number of words on them. This is
what is reducing the reliability.
Note: any fluctuation in this number is magnified by the page-count multiplier.
2. Much easier to stay focused than in Method 1.

Method 3:
a. find the page number of the last page of text
b. select one page at random
c. count the number of lines on this page
d. select one line at random
e. count the number of words in this line
f. multiply the number of pages in a. by the number of lines in c. by the number of
words in e., to get an estimate for the total number of words in the book

Valid?                        Yes
Reliable?                     Much worse than Method 1 or Method 2
How long to complete?         One minute
Assumptions?

1. All pages have the same or very nearly the same number of lines on them.
2. Every line has the same or very nearly the same number of words in it.
3. Assumptions 1 and 2 are producing the loss of reliability. Note: any fluctuation in the
number of words per line is multiplied by the number of lines per page and the
number of pages, and any fluctuation in the number of lines is multiplied by the
number of pages; hence, there are two sources of bias in this method.

Method 4:
a. measure the spine width of the book in inches
b. multiply the width of the spine by the magic number 100,000, to get an estimate of
the total number of words in the book.

Valid?                        Probably not
Reliable?                     Yes
How long to complete?          15 seconds
Assumptions?

That every book is the same in terms of words per page, thickness of paper, thickness of
cover. Since some books could be novels, cookbooks, cartoon books, etc., the violation of
this assumption reduces our validity. This assumption can quite literally be boiled down
to, is spine width a measure of the number of words?

29
Of the four methods, method 1 certainly will produce the best result. But it will be very painful
to obtain our answer in this manner. Of methods 2, 3, and 4, method 2 is probably the best,
because it is at least somewhat valid and reliable. But if you could choose only between methods
3 and 4, which would you select?
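The reliability ranking of the counting methods can be tried out on a made-up book. Everything here (the randomly generated book and the helper names) is invented for illustration; the point is that Method 2's estimates scatter around the true count, and Method 3's scatter more widely, because each extra multiplier magnifies the fluctuation it rides on.

```python
import random

random.seed(0)
# A made-up book: 200 pages, each page a list of per-line word counts.
book = [[random.randint(8, 12) for _line in range(random.randint(28, 32))]
        for _page in range(200)]

def method1(book):
    """Count every word: valid and reliable, but slow."""
    return sum(sum(page) for page in book)

def method2(book):
    """Pages x (words on one random page)."""
    return len(book) * sum(random.choice(book))

def method3(book):
    """Pages x (lines on one random page) x (words in one random line)."""
    page = random.choice(book)
    return len(book) * len(page) * random.choice(page)

print(method1(book))                      # the true count
print([method2(book) for _ in range(3)])  # estimates scatter around the truth
print([method3(book) for _ in range(3)])  # estimates scatter more widely
```

Running the two estimators repeatedly and comparing their spread against `method1`'s answer is a direct way to see "somewhat reliable" versus "much worse."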

Quiz
What level of measurement is being illustrated in 1 through 3? Answer using Categorical,
Ordinal, or Ratio.

1. Grade assigned at the end of a class (A, B, C, D, F).

2. Grade earned on an examination in percent.

3. Course schedule. Which courses are you taking this semester?

If we are trying to assess the likelihood of student success in college, is the SAT examination
valid and reliable?

4. Valid?

5. Reliable?

30

1. Ordinal. Even though we typically give an A 4 points, a B 3 points, a C 2 points, a D 1 point,
and an F 0 points, this does not imply that an A is twice as good as a C. This is easy to see in that
two Cs do not equal one A. Hence, grade is not ratio. Grade is also not nominal, in that we all
recognize that an A is better than a B, a B is better than a C, etc. So there is an ordering; hence,
grade is ordinal.

2. Ratio. Certainly the numbers are meaningful in this context (hence not nominal), but do they
mean something in a ratio sense or merely an ordinal sense? Does a grade of 80% mean that you
got twice as much correct as if you received a 40%? Yes. Not only is 80% bigger than 40%, we
know specifically how much bigger it is: 80% is an additional 40% bigger than 40%.

3. Categorical and Ordinal. This is a somewhat tricky question, since there are two separate
contexts in which we can place the answer. First, you could be taking a course in history, another
in chemistry, another in English, and this course in statistics. At this level, the measurement would
be categorical. However, you could be taking a beginning-level course in history, an advanced
course in history, a beginning course in chemistry, and this course in statistics. In this situation,
history, chemistry, and statistics reflect a categorical level of measurement. However, the
beginning course in history and the advanced course in history reflect an ordinal level of
measurement. The advanced course is a higher-level course than a beginning course. In this
regard we also see that it would be necessary to take the beginning course prior to the advanced
course. The beginning course would be seen as a prerequisite.

4. Valid --- no. The SAT examination is a valid measure of itself, but has been demonstrated to be
a poor predictor of success in college. Hence, the SAT does not assess success in college and it is
not valid.

5. Reliable --- yes. While it is generally believed that you can study for this test (and there are
many study guides around) and improve your scores, this generally does not happen. Most
students who repeatedly take this examination score at very similar levels each time and typically
do not “meaningfully” increase their performance over repeated attempts.

31
Part II: Special Case – Survey Data

Surveys, Questionnaires, and Polling

The word survey has been used in the popular media and professional literature to mean a great
variety of things. On the professional side,

Survey is a method of gathering information from a sample of individuals (from the American
Statistical Association).

Survey Sample is a method of obtaining information on a subgroup from a large population.

Survey is to examine for some specific purpose, to look at, to observe (from Webster).

In the popular media, survey more closely resembles the word questionnaire. A questionnaire is
a collection of questions asked of respondents. Thus, a questionnaire is really a subset of a
survey.

Whether you run across the word survey or the word questionnaire is inconsequential. In this
unit, we will only be concerned with the issues surrounding the gathering of information, the
obtaining of information, and the observing of people in the context of asking these people
questions. So within the context of this unit we will use the word questionnaire to relate to the
collection of the questions and the word survey to convey the process of collecting answers to
these questions from people who are called respondents.

There are three important aspects in the gathering and obtaining of information (survey) using a
questionnaire. These are

1. The questions to be asked. These form the basis of the questionnaire.

2. The method of measuring the answers to these questions. These were the issues of the level of
measurement, validity, and reliability (Unit 2). In the context of questionnaires, this is called
scaling.

3. The method of collecting the data. These were the issues of sampling (Unit 4). In
questionnaires and surveys this point is complicated somewhat by the manner in which we
interact with the respondent. In the context of surveys this is called the Interview Method.

32
Questions
The material in this section is really within the domain of psychology more than it is in statistics,
but it is critically important to the understanding of questionnaires and the data that is generated
from them. In addition, the total volume of material that could be included in this section and
unit is considerable. In fact, many books and even courses have been written and developed to
present the methods, nuances, and challenges of the construction, analysis, and interpretation of
surveys and questionnaires. Consequently the discussion presented in this unit should be viewed
as only a brief introduction to the area.

Types of Questions
In general, there are two types of questions, those that are open and those that are closed.

For illustration, consider the following main question, "what do you like to eat?"

Open questions (also called open-ended) are those that do not limit the respondent to a specific
set of responses. The main question above is actually an open question. The respondent is free to
say anything.

Closed questions are those that provide the respondent with a specific set of responses. The
main question above can easily be converted to a closed question: "What do you like to eat:
tacos, burgers, pizza, or salad?"
In a closed question, the respondent is given a set of responses (tacos, burgers, pizza, salad) to
choose from. What is implied by the question (what do you like to eat) is decidedly different in
these two situations. The open question implies that we consider the question from the context of
what is our favorite food to eat. The closed question implies that we consider the question only
from the context of which of the listed foods do we like best. If your favorite food is pizza, then
it doesn't matter whether you were presented with the closed or open question, because your
response would be the same. However, if your favorite food were steak, then it would matter.

Which is the better question, the open form or the closed? Unfortunately, both have advantages
and disadvantages.

On the plus side
Open questions permit respondents to provide the best answer from their own perspective.
Closed questions enable us to summarize the data obtained from our question in a more succinct
manner.

On the negative side
Closed questions do not permit respondents to provide the best answer from their own
perspective. Open questions do not enable us to summarize the data obtained from our question
in a more succinct manner.

33
For instance, if we asked the open question above of 100 people, it would be possible to obtain
100 different responses. Therefore, no effective summarization of the data would be possible.
However, if we asked the closed question above of 100 people, we would have at most a total of four
different responses, because we placed this limitation within the context of the question.
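The summarization advantage of the closed question can be sketched in Python (a minimal sketch; the 100 responses are simulated, not real data):

```python
from collections import Counter
import random

# Four response options from the closed question above
options = ["tacos", "burgers", "pizza", "salad"]

# Hypothetical responses from 100 people (simulated here)
random.seed(1)
responses = [random.choice(options) for _ in range(100)]

# A closed question yields at most four distinct answers, so a
# simple frequency table summarizes all 100 responses at once.
tally = Counter(responses)
for food, count in tally.items():
    print(food, count)
```

The open question, by contrast, could produce up to 100 distinct answers, and no such compact table would be possible.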

As with most things, the answer to which type of question is best depends upon what you wish to
do and how much effort you are willing to go to.

Within the context of closed questions there are a great variety of means whereby we can limit
the choices presented to the respondents. The scaling section below presents a simple overview
of some of the more common scaling methods along with a brief description and some examples.
I am using the word scaling to indicate the type of measurements or numbers (choices) that we
are providing to the respondents to select from.

Scaling
Magnitude Scaling – In magnitude scaling we are attempting to determine the degree to which
an attitude, feeling, belief, etc. is being endorsed or experienced. In these types of questions, we
often see the phrase "how much." Note: when using magnitude scaling we are most commonly
presented with a question.

Example: "How much do you like the smell of pizza?"
0 = not at all
1 = a little
2 = somewhat
3 = a lot

The magnitude is expressed by the increasing level of endorsement (not at all, a little,
somewhat, a lot). Note that the level of measurement in this example is ordinal.

34
Likert Scaling – In Likert scaling we are attempting to determine the degree and direction of
endorsement or experience. Quite often in Likert scaling we are being asked how much we are in
agreement with some statement. It should be noted that Likert Scaling is the most popular form
by far of obtaining numerical information used in questionnaires. Note: when using Likert
scaling we are most commonly presented with a statement.

Example: "The proposed national smoking policy will be good for America."
1 = strongly disagree
2 = disagree
3 = neutral
4 = agree
5 = strongly agree

The direction of our endorsement is expressed in the example above through the words agree
(positive endorsement) and disagree (negative endorsement). The degree or magnitude of our
endorsement is seen through the increasing level of endorsement from agree (or disagree) to
strongly agree (or strongly disagree). Note that the level of measurement in this example is also
ordinal.
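Treating the Likert codes as numbers (a common, if debated, practice for ordinal data), a small sketch with hypothetical responses shows how degree and direction can be read off:

```python
# Likert codes for the statement above
scale = {1: "strongly disagree", 2: "disagree", 3: "neutral",
         4: "agree", 5: "strongly agree"}

# Hypothetical responses from ten respondents
responses = [4, 5, 2, 3, 4, 1, 5, 4, 3, 4]

# Frequency of each response option
for code in sorted(set(responses)):
    print(scale[code], responses.count(code))

# Values above 3 lean toward agreement (direction), below 3 toward
# disagreement; the distance from 3 reflects the degree.
mean_response = sum(responses) / len(responses)
direction = "agree" if mean_response > 3 else "disagree"
print(round(mean_response, 1), direction)  # 3.5 agree
```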

Yes/No Scaling – In Yes/No scaling we are providing respondents with only the opportunity to
express the direction of their endorsement. This is a simplified version of the magnitude scale
and once again it is most commonly presented as a question.

Example: "Do you believe in the death penalty?"
0 = No
1 = Yes

What is the level of measurement in this example? This is one of the hardest questions to answer.
Because it has only two possible values it is most commonly considered categorical.
Alternatively a yes response does convey greater endorsement than a no response and hence can
be considered ordinal. However, when and only when the no is coded as a 0 and the yes is coded
as a 1 (as is true in this example), the 0 can be interpreted as 0% endorsement and the 1 as 100%
endorsement. As a result this variable can also be considered ratio. Even though it is hard to
answer what level of measurement a yes/no scaled variable is, it is impossible to be wrong!
Hence, choose whatever level of measurement you would prefer. Most people would choose
ratio.
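A small sketch of the 0/1 interpretation (hypothetical data): with no coded as 0 and yes coded as 1, the mean of the responses is exactly the proportion of yes answers, i.e., the percent endorsement.

```python
# Yes/No responses coded no = 0, yes = 1 (hypothetical data)
responses = [1, 0, 1, 1, 0, 1, 0, 1, 1, 1]

# With this coding, the mean of the responses equals the
# proportion of "yes" answers, i.e., the percent endorsement.
endorsement = sum(responses) / len(responses)
print(endorsement)  # 0.7, i.e., 70% endorsement
```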

35
Forced Choice Scaling – In forced choice scaling the respondent is not permitted to remain
neutral toward the statement being presented. Yes/No scaling is an example of this. In general,
forced choice scaling is a modification of Likert scaling in which the neutral option has been
eliminated. The belief in forced choice scaling is that people are not truly neutral about anything
and giving them the option to be neutral allows them the comfort of not having to express
commitments.

Example: "The proposed national policy forbidding smoking in public places will be good for
America."
1 = strongly disagree
2 = disagree
3 = agree
4 = strongly agree

The last component of questions that is important for us to consider is the manner in which the
questions are stated.

Wording
This is the most problematic aspect of questionnaires. In the wording of our questions or
statements to be endorsed, three simple guidelines should be followed. There are actually many
more guidelines, but I have provided three as the tip of the iceberg.

Use Simple Language
1. Be clear, don't be ambiguous
2. Be brief

These are two obvious components to the writing of "good" questions. However, they could
easily compete with one another. For instance, certain questions might be intrinsically confusing.
Thus, in order to clearly state the situation and our intention to a prospective respondent we
would have to write a longer and more descriptive question. Such a question would violate our
second principle of brevity. In almost every situation the concepts of clarity and brevity clash.
Since there is typically no way to resolve this situation such that both conditions are met, it is
wise to apply the following directive. Make the questions as brief as possible and yet still clear
enough to convey completely to a respondent what is being asked.

Example: "Do you like pizza?"

This is certainly a briefly stated question, but is it clear? What aspect of pizza are we being asked
to consider? Its smell, its taste, or ? What type of pizza? Our favorite or something else?

Correction to the example: "Do you like the aroma of pepperoni pizza?"

36
This question is not as brief, but it certainly is much clearer. The problem with the first example
is that some people could be responding to this question by smell, some by taste, some by the
taste of a veggie pizza, others by …. The point here is that the question will produce poor results
because it doesn't adequately constrain all respondents to the same context. In this respect the
corrected example is much better. However, the corrected example does constrain us to only considering
smell and only considering pepperoni pizza. If this is what we wanted, then the corrected
example is good. However, if it overly constrains us, then the corrected example is not only poor,
but probably even worse than the original example, since it is then guaranteed to give us answers
to a question that we aren't interested in.

Be Neutral
Do not lead the respondent toward a specific response.

Example: Do you like the aroma of a freshly baked pepperoni pizza served on a cold Friday
evening at the end of a long week in which you haven't had much time to relax and enjoy a good
meal?

This question is very specific, is not very brief, and has us considering a lot of information that is
not necessarily directed toward our question of interest. It makes use of a lot of imagery that
seems to be specifically included to draw us into making a positive endorsement of the question.
This particular form of biasing a question is called detailed imagination. Psychologically, we
know that we can typically increase a person's endorsement of a situation simply by including
more detail in our description; the added detail activates our imagination and, as a consequence,
increases our willingness to endorse in the direction of the bias. The
question above would have us consider much beyond the question of the smell of a pizza. It
directs us to the concept of a warm meal on a cold night at the end of what has been a long week
where we haven't had any time to relax, etc. This additional information has nothing to do with
smell. Here is another example from a negative context.

Example: Do you believe people should be allowed to smoke in bars contributing to the foul
smelling air that we know causes short and long term painful health problems?

The key issue in considering neutrality is that we should not include information in a question
that is not relevant to the primary question of interest. Certainly we need to include enough
information to be clear what we are asking, but beyond this point our superfluous wording will
usually result in creating an ambiguous or biased (non-neutral) situation.

No Double Barreled Questions

Example: "Are you satisfied with recent activities of President Bush and the members of the
Senate?"

37
First off, this question is not very clear. What activities are we referring to? But most
problematic in this example is our asking of two potentially very different questions. If we
assume that the recent activities referred to in this question could be adequately clarified, we are still left
with having to answer the question from the context of President Bush AND from the context of
the Senate. If we feel very similarly about the activities of the President and the Senate, then
there is no conflict or problem. However, if we feel differently about the activities of the President
and those of the Senate, then how do we respond? The only way to resolve this problem would
be to split the item into two separate questions, one about the President and one about the Senate.

Interview Method
In the collection of questionnaire data from respondents there are three general methods
employed.

Face to Face Interviewing
In face to face interviewing respondents are interacted with in person. A common place where
this type of interviewing takes place is at the mall. I am sure that upon entering a mall you have
seen a person carrying a clipboard who occasionally approaches people to be interviewed.
What type of sampling is taking place? Since there is no sample frame, they cannot be doing any
form of random sampling. And although this might seem like a convenience sample, there is
more order to what is taking place than might first be expected. Usually on the clipboard is a list
of the type of people that the interviewer is supposed to find and get data from. For instance, one
of the subjects needed might be a female, 20-25 years old, upper middle class income, who is
married with at least one child. These interviewers are exceptionally talented at being able to
look at someone in the mall and accurately answer all of these questions without even talking to
the potential respondent. We generally have the feeling that if we don't make eye contact with
these people that they will leave us alone. This is hardly the case. If you are not the type of
person that they are looking for, then they will not attempt to interview you. On the other hand, if
you are the type of person that they are looking for, then they will attempt to interact with you
regardless of making eye contact or not. This type of sampling is called quota sampling and
closely resembles stratified random sampling. What are some of the advantages and disadvantages
of face to face interviewing?

38
On the plus side
a. easiest of the three interview methods to get participants
b. easiest to meet quotas

On the negative side
a. most expensive (usually the interviewers are paid)
b. easiest to introduce bias from the interviewer's perspective (interviewers can consciously and
unconsciously attempt to sway respondents in their answers)
c. easiest to introduce bias from the respondent's perspective (respondents may attempt to
give answers they feel might please the interviewer or might give the interviewer a
positive impression of themselves)
d. the data is not much above the quality of a convenience sample

Telephone Interviewing
In telephone interviewing the respondent is obviously called at home or the office (a relatively
new practice). If you own a phone, then you have experienced this type of interaction. There are
two primary ways you can be contacted for an interview by this method. First, is that you belong
to some organization or subscribe to some service and they have sold your name to a company
desiring to profile its members or target you as a potential user of some product or service. It is
especially important to differentiate between tele-marketers and tele-interviewers (we are only
considering the latter). The second method whereby you might be selected for a telephone
interview is random digit dialing. In this, the interviewer has a computer program that randomly
dials phone numbers within a particular area code. Usually the computer dials about five
numbers at one time and then connects the interviewer with the first person answering the phone.
This is one of the reasons the number of apparent no-response calls is going up. If you are the
second or subsequent person to pick up the phone, the computer automatically disconnects you
without acknowledgement. Hence, one of the ways to minimize tele-interviewer and tele-marketer
calls is to be slow in answering the phone. The impact from the recent legislation to
create a national do-not-call list should not have any immediate consequences for tele-interviewers,
since it is supposedly designed only for tele-marketers. However, the issue of how
these people are invading the privacy of our homes through the telephone has now begun to
be addressed, and it is quite logical to think that tele-interviewers might eventually end up being
similarly affected. What are some of the advantages and disadvantages of telephone
interviewing?

On the plus side
a. easiest to collect a simple random or stratified random sample
b. one of the fastest methods of collecting questionnaire data

On the negative side
a. the negative stigma associated with tele-marketers
b. harder to communicate clearly with respondents
c. harder to get respondents to participate

39
Mail Interviewing
In mail interviewing the respondent is obviously contacted through the mail. In questionnaires
that are mailed out, you can be certain that you were selected for some purpose. These
questionnaires are rarely, if ever, mailed out at random. In most cases, you should expect that the
people conducting the study know who you are and will know if you return the questionnaire or
not. However, your identity is rarely, if ever, associated with the information you provide on the
questionnaire, so you should be reasonably confident that your responses are being treated
anonymously. Your identity is used to minimize mailing and processing costs. How do these people
get your name? Typically they have obtained access in some fashion to a sample frame, which is
a directory of some sort. We are on a number of “junk” mailing lists (such as magazine
subscriptions) and even several legitimate mailing lists (such as the telephone directory). This is a
very similar process to the way in which tele-interviewers get their lists (sample frames). What
are some of the advantages and disadvantages of mail interviewing?

On the plus side
a. the least expensive method of collecting questionnaire data
b. the easiest to initiate a simple random or a stratified random sampling
c. a great cover letter, explaining the intent of the questionnaire and the importance of the
respondent's responses, can vastly improve the response rate and minimize disadvantages
a. and b. below

On the negative side
a. easiest for the respondent to get out of responding
b. hence, even though we started with a nice simple random or stratified random sampling,
it is the hardest to maintain
c. takes the longest to collect the data

Computer Interviewing
Computer interviewing is a very recent phenomenon and probably will expand to fill in the void
that will be created when telephone interviewing is no longer effective and efficient. The notion
of computer interviewing is incredibly simple. As with mail and tele-interviewing, the first step is
to get a sample frame from which to send out the questionnaires. The main advantage here is that
there are relatively very few costs; no postage, no long distance phone time, no paper costs, no
printing costs, etc. These costs are one of the seriously limiting issues in mail and tele-
interviewing whose direct impact is to reduce the size of our samples. Freed from such
constraints, it is now a much easier process to send out the questionnaire to everyone in the
sample frame and thus try to conduct a census. What are some of the advantages and
disadvantages of computer interviewing?

40
On the plus side
a. Very inexpensive
b. Very fast
c. Very easy
d. Can obtain a very large sample size or even possibly a census

On the negative side
a. Very easy for a person to hit the delete button with little or no guilt (this will have a
tendency to create biases very similar to those discussed in the Census unit)
b. Hard to control for people responding more than once
c. Tremendously susceptible to the volunteer bias (the bias whereby people with strong
negative feelings, and to a lesser degree those with strong positive feelings, have a
greater likelihood of responding)

In general, the most positive responses are produced in face to face interviewing, the most
negative responses are produced in telephone interviewing, and mail interviewing is in-between.

If we put all of these pieces together we can begin to examine what is done in Polling.

Polling
Polling is another name for surveying and a poll is essentially a special case of a questionnaire.
Usually in a poll there is only one question of specific interest, concerned with your feelings
regarding a specific attitude, belief, position, endorsement, or opinion on a topic of
current interest. Often, polling is referred to as public opinion polling. Several polling companies
exist and every media service has a polling division. Public opinion polling may be the most
common place where everyday citizens come into contact with statistics. These polls are loaded
with statistical examples.

The most famous (or infamous) example of polling is the Literary Digest Poll of 1936. It is
presented in the next unit on sampling.

41
Unit 3: Populations, Samples, and Sampling
Although sampling is a very small piece of the discipline of statistics, it does represent a rather
critical element. Recall that in statistics we are trying to make good observations, decisions,
and/or predictions from information we have collected. There are two essential elements to
generating good information. We dealt with the first element during Unit 2, the issues
surrounding measurement; having an appropriate level of measurement, and having
measurement systems that are both valid and reliable. Knowing that the information we are
going to be collecting will be good, enables us to move on to the second element, sampling.
Good sampling, the topic of this unit, will complete the preliminary phase of research. If we are
going to make good observations, decisions, and predictions, they must be based upon good
information. Good measurements on appropriately selected subjects will result in good
information. This leads us to a fairly decent definition of sampling, the science of selecting
subjects to be measured. Actually, we will get to a better definition in a moment, but this will
start us out for this unit.

Terms
Population - the complete collection of everything that we might wish to study

Sample Frame - the complete listing of the entire population (member by member list)

Sample Unit - any single member from the population

Sample - any collection of any number of members obtained from the population

Sample Size - the number of members from the population who are in the sample (usually
designated by the lower case letter "n")

Census - a sample that contains all members of the population

Statistic - any information obtained from the sample (s for statistic goes with s for sample)

Parameter - any information obtained from the population (p for parameter goes with p for
population)

Sampling - the science of selecting a subset (sample) of the population from all of the members
of the population

Estimate - a statistical word which actually is best translated as guess (e.g., we use
sample statistics to estimate population parameters or we use statistics to make guesses about
parameters)

42
Random - means two important things: first, that every member of the population has an equal
chance of being selected into the sample; and second, that the selection is without bias

Bias - systematic error or systematic mistakes

Simple Random Sampling - it is our goal to select members from the population in a manner
such that the following conditions are met; (1) each member of the population has an equal
probability of being selected and (2) that the members selected in this manner will represent an
unbiased reflection of the whole of the population.

Strata - a sub-population within the population

Stratified Random Sampling - a sample composed of a simple random sample from each stratum

Cluster - a division of the population based on geography or some notion of physical closeness

Cluster Sampling - a sampling method based on the selection of random clusters from the
population

Systematic Sampling - a sampling method based on taking a random sample from a population
that is somehow ordered (e.g., the telephone book)

Convenience Sampling - almost every other sampling method that doesn't start with a
population or a sample frame, or doesn't have a random selection component
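Three of these selection schemes can be sketched with Python's standard library (a minimal sketch; the frame of 100 hypothetical members and the two strata are invented for illustration):

```python
import random

random.seed(0)

# Hypothetical sample frame: a member-by-member list of a population
frame = [f"member_{i}" for i in range(100)]

# Simple random sampling: every member has an equal chance of selection
simple = random.sample(frame, 10)

# Stratified random sampling: a simple random sample from each stratum
# (here, two hypothetical strata split the frame in half)
strata = {"A": frame[:50], "B": frame[50:]}
stratified = [m for s in strata.values() for m in random.sample(s, 5)]

# Systematic sampling: from an ordered frame, take every k-th member
# starting at a random position
k = 10
start = random.randrange(k)
systematic = frame[start::k]

print(len(simple), len(stratified), len(systematic))
```

Convenience sampling has no such recipe, which is precisely its weakness: there is no frame and no random selection component.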

The following question will be used to illustrate the terms above.

“What is the average age of the governors in the United States?”

A population is the complete collection of everything that we might wish to study. From the
illustration above, the population consists of the 50 governors.

A sample frame is the complete listing of the entire population. From the illustration, the sample
frame would be the list of all 50 of the governors.

Side note 1: The population is the 50 governors. We don't have to know them in order to identify
the population. However, we have to know each one of them by name (either by their own name
or by the name of the state that they represent) in order to get the sample frame.

A sample unit is any single member from the population. For instance, a sample unit would be
any one of the governors.

Side note 2: Even though the word sample is used in the sample frame and sample unit, these
two terms apply to the population.

43
A sample is any collection of any number of members obtained from the population. Any one of
the governors would be a sample. Any 10 of the governors would be a sample. Any 20 of the
governors would be a sample. What would we have if we had a collection of governors which
included all 50? Such a sample, one that contains all members of the population, is called a
census, and we rarely refer to this special case as a sample. Hence, when we use the
word sample we will be referring to a collection of members of the population, fewer in number
than everyone (a subset of the population).

The sample size is the number of members from the population who are in the sample.

Age is a variable. We could ask each governor her or his age and then calculate the average age
from these values. (This will be explained in detail in Unit 6).

If we ask all 50 governors their age and calculate the average, what do we know? First, we have
taken a census. Second, we will know the exact average age of this population. When we know a
piece of information (average age) about a population, this piece of information is called a
parameter.

If we ask 10 of the governors their age and calculate the average, what do we know? First, we
have taken a sample. Second, we will know the exact average age of this sample. We do not
know what the exact average age of the population is from the information in this sample.
However, we can guess (estimate) the average age of the members in the population from our
knowledge of the average age of the members in our sample. Will this guess be any good?
When we know a piece of information (average age) about a sample, this piece of information is
called a statistic.
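The parameter/statistic distinction can be sketched numerically (the ages are simulated, not the governors' actual ages):

```python
import random

random.seed(42)

# Hypothetical ages for all 50 governors (the population)
population_ages = [random.randint(40, 75) for _ in range(50)]

# Parameter: the average age computed from the census (all 50)
parameter = sum(population_ages) / len(population_ages)

# Statistic: the average age computed from a sample of 10
sample_ages = random.sample(population_ages, 10)
statistic = sum(sample_ages) / len(sample_ages)

# The statistic is our estimate (guess) of the parameter.
print(parameter, statistic)
```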

Side note 3: A simple thing to remember is that statistics go with samples (s words go together)
and that parameters go with populations (p words go together).

Sampling is the science of selecting a subset of the population (sample) from all of the members
of the population. Thus, we see that sampling is the manner in which we obtain any subset of the
population. If we did a poor job, we are still sampling. If we did a good job, we are still
sampling. What is the difference between doing a good job and a poor job, and why should
we care?

The goal of sampling is to select a sample from the population that accurately reflects the
population. Ideally it would be exactly like the population in every regard with the exception that
only some of the members were collected rather than all of the members. Thus, a good sample is
one that reflects the population well and a poor sample is one that does not reflect the population
well.

44
Detailed Example
I am interested in measuring the heights of basketball players. Height can be very easily
measured by using a scale in the doctor's office. Measured in this manner, height is a ratio
variable, which is probably very valid and reliable. (Unit 2)

Now, on whom do I want to collect this information?        On basketball players.

Specified like this, the population is the complete set of all basketball players.

Is this population well defined or poorly defined?

The answer to this question comes down to two other questions. First, is it clear who would be
members of the population and who wouldn't? For instance, do we mean all basketball players
anywhere in the world, all basketball players only in the United States, only professional
basketball players, or ….? The definition of the population above is too vague and needs to be
much more carefully specified. Second, how easy would it be to get a sample frame for this
population? If the answer is easy, then the population is almost certainly well defined. The
inability to get a sample frame is usually a strong indicator of an insufficiently specified
population.

Here are some alternative delineations of the population for this example. Evaluate each as
either poorly defined or well defined.

1. The population is all basketball players in Laramie, Wyoming.           Good or Bad

Bad. By the first aspect of a good definition this definition appears to present a fairly clear and
well specified population. However, getting the sample frame will be impossible. Why? The
problem stems from the lack of specificity in the definition of who is a basketball player.
Anyone who plays at all, anyone who plays in a league, anyone who plays more than once a
week, …?

2. The population is all basketball players in Laramie, Wyoming who play more than once a
week.           Good or Bad

Better, but still bad. Although this definition is more specific than the one above, it would still
be very difficult to get a sample frame. Why? Who would know such information? Do kids play
basketball in school as part of their physical education class and hence would be defined as
basketball players? Do adults play once a week with their kids in the driveway and hence would
be defined as basketball players? Here we are confronted with a better definition of who a
basketball player might be, but it is still very vague. In addition, we are confronted with the
problem that no one really knows who belongs to some of these sub-groups and hence a sample
frame cannot be constructed.

45
3. The population is all basketball players in Laramie, Wyoming who are over 18 years old who
belong to a recognized team.            Good or Bad

Good. This definition is much more precise. In addition, it would be a relatively simple task to
get a sample frame since any recognized team must have a list of its players. These lists could be
collected and when combined would form the sample frame.

To move the discussion along, let's assume that our population is all professional basketball
players in the world.

This is a fairly good definition of a population and we would be able to get a sample frame,
although this would probably take some time.

If we want to know the average height of all professional basketball players, then how are we
going to arrive at our answer? We could measure the height of every member in our sample
frame. For illustration, let's say that there are 5,700 professional basketball players around the
globe. Thus, if we wanted to know the average height of a professional basketball player, we
could measure the height of every one of the 5,700 professionals (census), add these heights
together, divide this sum by 5,700, and the result would be the answer to our question (this
average height would be the parameter).

How easy is it going to be to collect the height of every one of the 5,700 professional basketball
players? How much time will it take? How much money will it cost? And, horror of horrors,
what will happen if we can't find some of them, what if some of them refuse to be in our study,
what if …. (any host of problems that might result in our obtaining fewer than all of the members
of the population)? Without everyone we no longer have a census, but a subset of the
population; thus, a sample.

For illustration, let's say that we obtained only 5,600 of the 5,700. Now when we take the
average height of these 5,600 professionals, will we have the true average height? The true
average height is the parameter we discussed above. While we will probably be close, we
will no longer have the true answer; we will now have an estimate of it. [Estimate is a
statistical word which is best translated as guess.] Whenever we don't have the
complete information (census), we have to guess what the true answer (parameter) might be
from the information we have collected with our sample. As presented earlier, the information
that comes from a sample is called a statistic. Here we would use the average height (statistic)
obtained from the 5,600 basketball players (sample) to estimate the true average height
(parameter) of the 5,700 professional basketball players (population). Good sampling enables us
to make good estimates about parameters from our samples, and poor sampling reduces us to
making poor estimates about parameters. While any sample we select will produce estimates of
parameters, only good samples will enable us to make good observations, decisions, and
predictions.
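The census-versus-estimate logic above can be sketched in a few lines of Python. This is only my illustration: the heights are simulated values, not real player data, and the point is simply that the mean of a near-complete sample is a close, but not exact, estimate of the parameter.

```python
import random

random.seed(0)  # fixed seed so the illustration is reproducible

# Hypothetical population: 5,700 simulated heights in cm (not real data).
population = [random.gauss(200, 8) for _ in range(5700)]

# Census: measure everyone -> the parameter (true average height).
parameter = sum(population) / len(population)

# Sample: suppose 100 players could not be measured -> 5,600 remain.
sample = random.sample(population, 5600)
statistic = sum(sample) / len(sample)  # an estimate ("guess") of the parameter

print(round(parameter, 2), round(statistic, 2))  # close, but not identical
```

Running this shows the statistic landing very near, but not exactly on, the parameter, which is exactly the situation described in the text.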

This leads us to a more complete definition of sampling. Sampling is the science of selecting
good samples, those that are representative of their parent populations. Good samples are

necessary in order to make good estimates about population parameters, to make good decisions,
and to make good predictions.

Since it will be difficult (too much time, too much cost, too many obstacles to measuring
everyone) to obtain a census, might it be possible to collect some "small" subset of the
population (a sample) and still do a very good job of knowing the answer to our question (a good
estimate of our parameter)? The answer here is yes, and it is the topic of the next section.

Sampling Methods
There are many ways in which to select a subset of members from the population. With the
example of the basketball players, any collection of the 5,700 members (which doesn't include
all of them) is a sample.

If I collected information from the first 50 professional basketball players I met, would this be a
sample? Yes. Would it be a good sample (representative of the whole population)? Probably not.

If I went to Denver and measured the height of all the Denver Nuggets, would this be a sample?
Yes. Would it be a good sample (representative of the whole population)? Probably not.

I have a measuring rod that is 2 meters long and has marks every 1 millimeter. If I went to
Denver and measured the height of all the Denver Nuggets, would this be a sample? Yes. Any
problems here other than the one expressed immediately above? Yes. Although the measuring
rod that I have described is certainly valid and reliable, and would produce very nice ratio data,
it is inappropriate for this situation. Why? Because most professional basketball players are
taller than 2 meters, and my measuring device will be unable to measure their heights. [This
really doesn't belong in this sampling section, but it is where the measurement information from
Unit 2 applies.]

How many possible samples could be selected from my population of 5,700 professional
basketball players? Almost literally an infinite number. How many of them would be good?
Almost literally none.
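"Almost literally an infinite number" can be made concrete. The number of distinct samples of any one size is a binomial coefficient, which Python computes exactly; the sample size of 25 below is just one illustrative choice on my part.

```python
import math

# Number of distinct 25-member samples from a population of 5,700:
n_samples = math.comb(5700, 25)

print(n_samples)            # an enormous integer
print(len(str(n_samples)))  # how many digits that integer has
```

Even for a modest sample size, the count of possible samples is astronomically large, which is why selecting *a* sample is trivial but selecting a *good* one is not.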

Selecting a sample is an incredibly simple task. However, the selection of good samples is a
daunting task. Can the field of statistics assist you in the collection of a good sample? Yes.
Recall the definition of sampling. Sampling is the science of selecting good samples, those
that are representative of their parent populations. There is a science component. The
material presented throughout the rest of this section provides the foundation of this science.
However, in many situations populations are defined in such a vague manner as to make
sampling impossible. In these situations, a researcher should first try to better define (more
clearly and more narrowly) the population. This will help.

Let's begin by examining the word random.

Random means without definite aim, purpose, or reason (American College Dictionary).

Without intent.

More statistically, random means two important things. First, that every member of the
population has an equal chance of being selected into the sample; and second, it means
without bias.

While these definitions are probably next to worthless unless you already know the concept, they
do serve to connect us to the concept that something done randomly should result in an absence
of bias. What is bias?

Bias can be thought of as systematic error. Errors, of course, are mistakes and thus systematic
error is systematic mistakes. Let's look at an example associated with our on-going basketball
problem. If I were to use a 3-meter steel rod (with marks on it to the nearest millimeter) to
measure all of our professional basketball players, this rod should provide us with an extremely
accurate, valid, and reliable measurement system. Any errors in measurement we are probably
making will be attributable to our rounding to the nearest mark when using the rod. Since our
errors are probably equally likely to be slightly above as below the true height of any player, then
our errors should be unbiased (without systematic error).

However, if our measuring rod had the first 20mm cut off from it, so that the rod started at 20
mm and went to 3000mm, then would this rod still be accurate, valid, and reliable? Don't laugh
(too hard); this is an actual situation, which was not realized until after all of the data had been
collected. Does this new "3 meter" rod (20mm short) still measure to the nearest mm? Yes.
Hence, it is just as accurate as the true 3-meter rod. Does this new "3 meter" rod still measure
height (even though all of the heights will be 20mm too tall)? Yes. Hence, it is still valid. If two
of us were to measure the same player, would we get the same answer or very, very close to the
same answer? Yes. Hence, this new "3 meter" rod is reliable. So if the new "3 meter" rod is still
accurate, valid, and reliable, what is the problem?

Obviously it records heights at a consistent 20mm too tall. Thus, all of our measurements are in
error. But since every measurement is too tall, they are systematically wrong. They are wrong by
20mm. They are biased. In this case, we know the actual amount of the bias: 20mm. If the
measurements are not systematically too long or too short, then they are said to be unbiased. Of
course, one of the primary aims of psychometrics is to produce measurements which are
unbiased. And one of the aims of sampling is to produce samples which are unbiased reflections
of their parent populations, and one of the aims of statistics is to produce statistics which are
unbiased estimates of parameters.

This illustration of bias is in terms of measurement. However, the same exact principle applies in
sampling. If our method of selecting a sample was systematically in error, then our resulting
sample would be biased.

Here is a sampling illustration. I am going to select a sample of 25 professional basketball
players from my population.

Sampling Method 1: I am going to number each one of the 5,700 members of my population
from 1 to 5,700. Next I am going to write the numbers 1 to 5,700 on one-inch square pieces of
paper and place them in a hat. Next I am going to draw 25 pieces of paper from this hat, and
these players will be the 25 in my sample.

Sampling Method 2: There are 2,350 basketball players who play the guard position. I am going
to number each one of the 2,350 members of this sub-population from 1 to 2,350. Next I am
going to write the numbers 1 to 2,350 on one-inch square pieces of paper and place them in a
hat. Next I am going to draw 25 pieces of paper from this hat, and these players will be the 25
in my sample.

Which one of these two methods is the best? Which one is biased? Which one is random?

Which one is best? Sampling method 1 is best for assessing the entire population. Why? Because
it gives each and every member of the population an equal chance of being in the sample.

Which one is biased? Sampling method 2 is biased for assessing the entire population. Why?
Because sampling method 2 limits itself to only considering the sub-population of basketball
players who play guard. In general, these are the shortest players. Hence, this sampling method is
biased (systematic error) in that we are systematically selecting only the shorter players. As a
consequence, our estimate of the height of basketball players will be too short (biased).

Side Note: If the sub-population were our actual population, then sampling method 2 would be
better and it would be unbiased (without systematic error). In this situation sampling method 1
would be inappropriate since it would allow us to select members into our sample that don't
belong to the population of interest!

Which one is random? Sampling method 1 is certainly random. Is sampling method 2 random?
NO! Why? Because sampling method 2 does not provide an equal opportunity to all members of
the population to be in the sample, it only provides this opportunity to the members of the sub-
population.

Here are some of the more fundamental sampling methods.

Simple Random Sampling
In simple random sampling, it is our goal to select members from the population in a manner
such that the following conditions are met.

1. Each member of the population has an equal probability of being selected.
2. The members selected in this manner represent an unbiased reflection of the whole of the
population.

Ex: From our on-going basketball example, how might we randomly select 125 subjects from
our population of 5,700 players?

Illustration 1: Given the sample frame (the complete listing of each member of the
population), we could assign each individual a number from 1 to 5,700 (the total number of
members in the population). Then we could enter a random number table and select 125
numbers. We could then go out and measure the 125 players corresponding to these randomly
selected numbers. The average height from these 125 players (our statistic) should then give us
an unbiased estimate of the true average height of the 5,700 members of our population (the
parameter). The random numbers should result in a randomly selected sample, which should
produce an unbiased result.

Illustration 2: We could write down each name from the sample frame on a separate one-inch
square card, and then place all 5,700 cards in a hat. We could then select 125 cards from the
hat. [This is Sampling Method 1 as presented earlier.]
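Both illustrations amount to drawing names at random, without replacement, from the sample frame. A minimal sketch in Python (the player names here are placeholder strings I made up, standing in for the real sample frame):

```python
import random

random.seed(42)

# Sample frame: a numbered list standing in for the 5,700 player names.
sample_frame = [f"player_{i}" for i in range(1, 5701)]

# Simple random sample of 125: every member has an equal chance,
# and no member can be drawn twice (sampling without replacement).
srs = random.sample(sample_frame, 125)

print(len(srs), len(set(srs)))  # 125 125
```

`random.sample` plays the role of the random number table or the hat: it gives every name on the frame the same chance of selection.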

As can be seen from these two illustrations, simple random sampling is simple. However, does it
work?

Simple random sampling is particularly effective when the members of the population are fairly
homogeneous. In this context, homogeneous means alike. Would our population of all
professional basketball players be fairly homogeneous in the context of what we are trying to
measure (height)? Are European players of relatively similar heights to American players? Are
female professionals of similar heights to male professionals? Are guards of similar heights to
centers? There are potentially a lot of subgroups (sub-populations) within our population that are
quite different from one another. In our sample of 125 players (out of the 5,700 players possible),
would it be possible to have selected only centers? Yes. [Not necessarily very likely, but
possible.] Would it be possible to have only females? Yes. Obviously if we measured only
centers, the statistic from our sample would be an over-estimate of the true height (biased). And
if we measured only females, our statistic would probably be an under-estimate (biased). Thus, is
there a better way to select a sample from a population that is diverse? Yes. [Next Section]

Stratified Random Sampling
More complicated (diverse) populations are almost universally composed of an extensive array
of sub-populations. Each of these sub-populations is fairly homogeneous in itself, but they are
often quite distinct from one another. Consequently, this situation makes the collection of a
single global sample exceedingly difficult. The answer is to address each of the sub-populations
separately. In sampling, each of these sub-populations is called a stratum (the plural is strata).
In collecting a stratified random sample, we start by placing all members of the sample frame
into one of the strata. If we are stratifying on the basis of gender, then the entire sample
frame is split into two sub-sample frames: the first containing the list of all females (female
stratum) and the second containing the list of all males (male stratum). A random sample is then
selected from each of these sub-sample frames (strata), and the results are combined to form a
composite sample. Sampling in this manner ensures that each of the strata is included in the
final sample, thus enabling the global sample to be more representative of the whole than would
be possible using simple random sampling. Sampling in this manner is called stratified random
sampling.

The key to stratified random sampling is that the strata need to be proportionately sampled. Thus,
if 20% of the names on our sample frame are females and we are stratifying on the basis of
gender, then our sample of 125 players should include 25 females (20% of 125) and 100
males (80% of 125). Certainly, stratified sampling is somewhat more complicated than
simple random sampling, so why would anyone do it? Remember that the main goal of
sampling is to select a representative sample from the population. Of course, representative
means that our sample should be unbiased. The advantage of stratified random sampling is that it
produces a more representative sample than simple random sampling for any size sample. More
specifically, the 125 players selected via stratified random sampling will be more representative
of the population than 125 players selected via simple random sampling. Thus, stratified random
sampling is more efficient.
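Proportional stratified sampling can be sketched directly from the 20% female / 80% male split in the text. The names below are placeholders of my own invention; the mechanics are the point: allocate the sample size proportionally to each stratum, then draw a simple random sample within each stratum.

```python
import random

random.seed(7)

# Hypothetical sub-sample frames: 20% female, 80% male, as in the text.
females = [f"F_{i}" for i in range(1140)]  # 20% of 5,700
males = [f"M_{i}" for i in range(4560)]    # 80% of 5,700

n = 125  # desired total sample size

# Allocate the sample proportionally to each stratum...
n_female = round(n * len(females) / (len(females) + len(males)))  # 25
n_male = n - n_female                                             # 100

# ...then draw a random sample within each stratum and combine.
sample = random.sample(females, n_female) + random.sample(males, n_male)

print(n_female, n_male, len(sample))  # 25 100 125
```

Unlike simple random sampling, this procedure cannot accidentally produce a sample that is all male or all female; the strata proportions are guaranteed to match the population.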

Cluster Sampling
Cluster sampling on the surface appears to be quite similar to stratified random sampling, but in
actuality is considerably different. To begin, in cluster sampling the population is split into
clusters rather than strata. What is a cluster? A cluster is a division of the population based on
geography or some notion of physical closeness. For instance, if our population was the total
residents in your community, then clusters could be established for each block of each street in
your community. A stratum is also a division of the population. However, a stratum is a division
based on some shared characteristic (something everyone in the population has, such as gender),
which is supposed to result in sub-populations which are more homogeneous in themselves than
is the total population. (See the Stratified Random Sampling section, which uses gender as a
stratification.) Having the strata, we then sample randomly from each stratum. In contrast,
clusters share only physical proximity. In theory, each cluster should be a micro-reflection of the
total population. The first step in cluster sampling is to divide the sample frame (population) into
clusters. The second step is to randomly select however many clusters we wish to sample. The
last step is to take a census of all of the members within each selected cluster.

Ex: Within the context of the basketball problem, a natural set of clusters is team association.
Thus each team represents one cluster. For illustration, let's assume that our 5,700 players play
on a total of 570 teams. The sample frame now would contain two pieces of information, the
name of every player and the name of her/his team. From this information, a second sample
frame could be constructed, based on the clusters. In this sample frame, each team would be
listed along with its members. The original sample frame would hold the names of 5,700 players.
The second sample frame would hold the names of the 570 teams along with the names of the
5,700 players. If we wanted to select a sample of 12 of the teams (clusters), we could assign the
numbers 1 to 570 to the teams, select 12 random numbers between 1 and 570, and go to each of
the selected teams (clusters) to collect the heights of its players. We would then measure all of
the players from each of the selected clusters.

Note the differences between stratified and cluster sampling.

1. Strata are divisions based on some logical shared characteristic. Each stratum should be more
homogeneous than the total population.
2. Clusters are divisions of the population based on some notion of closeness, proximity, or
geography. Each cluster should be as homogeneous or as heterogeneous as the total
population.
3. In stratified random sampling, the members of each stratum are randomly sampled.
4. In cluster sampling, the clusters are sampled randomly and each member of the selected
clusters is measured.
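The 570-team example can be sketched as follows. The team rosters are hypothetical placeholders I generated; the structure is what matters: clusters are sampled randomly, and then every member of each selected cluster is measured (a census within the cluster).

```python
import random

random.seed(3)

# Hypothetical cluster frame: 570 teams of 10 players each (5,700 total).
teams = {t: [f"team{t}_player{p}" for p in range(10)] for t in range(1, 571)}

# Steps 1-2: randomly select 12 of the 570 clusters (teams).
chosen = random.sample(sorted(teams), 12)

# Step 3: a census within each chosen cluster -- take every member.
sample = [player for t in chosen for player in teams[t]]

print(len(chosen), len(sample))  # 12 120
```

Note the contrast with stratified sampling: here the randomness is applied to the clusters themselves, not to the individuals within them.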

Systematic Sampling
This type of sampling is often considered a special case of obtaining a simple random sample.
In systematic sampling the population is divided into groups of members based on nothing more
than being next to one another on the sample frame. For instance, if our sample frame contained
250 names and we wanted to select a sample of 25 members, then we would divide the entire
sample frame into 25 pieces (the number of pieces equals the size of the sample that you wish to
select). This would produce a sample frame composed of 25 smaller pieces of 10
members each (250 members/25 pieces = 10 members each). If we identified each member of the
sample frame with the consecutive numbers 1 to 250, then the first piece would contain
members 1 through 10, the second piece members 11 through 20, the third piece members
21 through 30, etc. Thus, the first step in collecting a systematic sample is to divide the sample
frame in the manner just described. The second step is to select a number randomly between 1
and 10 (select a random member of the first piece). For illustration, let's say that the number
selected was 6. Then the 25 members of the sample are every 10th member, selected
systematically, after and including the start point. In this example, the sample would contain
members 6, 16, 26, 36, 46, etc. Note that one member from each of the 25 pieces would be
selected.
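The 250-name example translates directly into code: pick a random start within the first piece of 10, then take every 10th member after and including it. The numbered frame below is a stand-in of my own for the list of names.

```python
import random

random.seed(5)

frame = list(range(1, 251))  # sample frame of 250 numbered members
n = 25                       # desired sample size
k = len(frame) // n          # piece size: 250 / 25 = 10

start = random.randint(1, k)  # a random member of the first piece
sample = frame[start - 1::k]  # that member, then every 10th one after it

print(len(sample))  # 25: one member from each of the 25 pieces
```

Whatever start is drawn, exactly one member comes from each of the 25 pieces, and consecutive selections are always 10 apart.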

Convenience Sampling
The four sampling methods above all conduct sampling in some systematic fashion. They all
require that a population be specified in advance, that a sample frame be known in some form or
another, and that every member of the population (sample unit) be given an equal chance to be in
the sample (random). Almost every other sampling method that doesn‟t start with a population
and with a sample frame, or have a random selection component is by definition called
convenience sampling. The samples collected in this manner are rarely very good, which means
that they are poor representations of the population they supposedly came from and are
exceedingly biased. What does this mean? Unfortunately, what this means is that the information
obtained from convenience samples is essentially worthless to answer any questions that we
might be interested in. The information we obtain from cluster, simple random, systematic, and
stratified random sampling methods are typically relatable to the population; however, the
information obtained from convenience sampling is typically relevant only to the sample we
have collected. Since there are apparently no benefits to convenience sampling, why would
anyone use this method?

Selecting a good sample that is representative of the population can be a time consuming,
expensive, and challenging task; and it is often very tempting to collect data in a manner that
tries to get around these difficulties and is more convenient for the researcher, thus the name.
Convenience sampling is also known as haphazard sampling, which is probably more
appropriately descriptive, but less professional sounding. Of the five methods presented in this
unit, convenience sampling is by far the worst method to collect a sample (even much worse than
the graph depicts below), and unfortunately it is the most prevalent method employed.

If we were to evaluate these sampling methods, based on quality of their estimates, the following
sequence from poor to great might result.

[Figure: the sampling methods arranged along four continua, from Little Effort to Much Effort,
Little Time to Much Time, Little Expense to Much Expense, and Poor Quality to Great Quality.]

Quiz
In my department there are 50 undergraduate majors (this is the population); 25 of these students
are female and 25 of them are male. In addition, I know that by class rank 20 are sophomores, 20
are juniors, and 10 are seniors.

1. What is the sample frame for this population?

2. What is the sample unit?

3. Is any one of these students selected in any fashion a sample?

4. In a simple random sample of 20 of my undergraduate majors, how many should be female?

5. In a simple random sample of 20 of my undergraduate majors, how many will be female?

6. In a stratified random sample of 20 of my undergraduate majors based on gender, how many
will be female?

7. In a stratified random sample of 20 of my undergraduate majors based on class rank, how
many will be female?

1. What is the sample frame for this population? Answer: the names of all 50 of my
undergraduate majors.
2. What is the sample unit? Answer: any one of these students.

3. Is any one of these students selected in any fashion a sample? Answer: yes. Any subset of the
50 students is a sample. It might not be a good sample, but it will be a sample.

4. In a simple random sample of 20 of my undergraduate majors, how many should be female?
Answer: 10. Since 50% of my population is female (25 of the 50), then it is quite reasonable and
logical to expect that 50% of the sample should be female. 50% of 20 is 10.

5. In a simple random sample of 20 of my undergraduate majors, how many will be female?
Answer: unknown. Even though it is logical to expect that 50% of the sample should be female,
as explained in question 4, it is not possible to know for certain (will) how many will be female.
Thus, it is not possible to determine the answer.

6. In a stratified random sample of 20 of my undergraduate majors based on gender, how many
will be female? Answer: 10. In this case we know the precise answer (will) since in a stratified
random sample we take one random sample for each strata in accordance with the proportions
expressed in the population. Since the stratification is based on gender, then we will take one
random sample of females and a second random sample of males. Since 50% of the population is
female, then 50% of the final sample must be female; the proportions in the sample must match
the proportion in the population according to gender (the basis of our stratification).

7. In a stratified random sample of 20 of my undergraduate majors based on class rank, how
many will be female? Answer: unknown. We know for certain that the sample must contain 8
sophomores (40% of the sample), 8 juniors (40% of the sample), and 4 seniors (20% of the
sample), since these are the proportions of sophomores, juniors, and seniors in the population. But
we do not know for certain how many will be female since our stratification is not based on
gender. We should be able to guess that 50% of the sample should be female, since 50% of the
population is female, but we will not know the exact number for certain (will).
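The proportional allocations behind answers 4, 6, and 7 are just percentages of the sample size. A quick arithmetic check, using only the counts stated in the quiz:

```python
# Population counts from the quiz.
population = 50
females = 25
sophomores, juniors, seniors = 20, 20, 10
n = 20  # sample size

# Answers 4 and 6: gender stratification -> 50% of 20 should be female.
n_female = n * females // population

# Answer 7: class-rank stratification -> 40% / 40% / 20% of 20.
n_soph = n * sophomores // population
n_jun = n * juniors // population
n_sen = n * seniors // population

print(n_female, n_soph, n_jun, n_sen)  # 10 8 8 4
```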

Problems in Sampling
Any problem encountered in the collection of the sample potentially, and probably, introduces
bias. And of course, bias of any sort reduces the ability of our statistics to estimate parameters.
What are some of these problems?

1. Problem 1 – The sample frame is incomplete. If you don't have a chance to get everyone from
the population, then everyone doesn't have an equal chance of being in the sample, and thus the
sample can be tainted. This problem is illustrated in the "Infamous Literary Digest Poll of 1936,"
which appears below. This example shows how sampling is a very important aspect of statistics.
Additionally, it points out how looking for patterns and making predictions are highly dependent
upon the quality of the information contained in the data. Gallup was the star of the
1936 election and his company is still one of the leaders in survey sampling. For lots of
interesting and useful information, check out the Gallup web site at http://www.gallup.com.

The Infamous Literary Digest Poll story below is arguably the most famous in the entire area of
sampling. It is taken from Utts, Seeing Through Statistics (1999, 2nd Edition).

2. Problem 2 – Volunteer bias. Volunteer bias is the situation where people with stronger
feelings are more inclined to participate in a study than those who don't feel so strongly.

3. Problem 3 – The subjects identified for selection by our random process could not be
measured. They were not home, they were not willing to participate, etc.

4. Problem 4 – The random process was not truly random. For an example of this form of bias
look at the 1970 Draft Lottery example following the Infamous Literary Digest Poll.

The Infamous Literary Digest Poll of 1936
Before the election of 1936, a contest between Democratic incumbent Franklin Delano Roosevelt
and Republican Alf Landon, the magazine Literary Digest had been extremely successful in
predicting the results in U. S. presidential elections. But 1936 turned out to be the year of their
downfall, when they predicted a 3-to-2 victory for Landon. To add insult to injury, young
pollster George Gallup, who had just founded the American Institute of Public Opinion in 1935,
not only correctly predicted Roosevelt as the winner of the election, he also predicted that the
Literary Digest would get it wrong. He did this before they even conducted their poll. And
Gallup surveyed only 50,000 people, whereas the Literary Digest sent questionnaires to 10
million people.

The Literary Digest made two classic mistakes. First, the lists of people to whom they mailed the
10 million questionnaires were taken from magazine subscribers, car owners, telephone
directories, and, in just a few cases, lists of registered voters. In 1936, those who owned
telephones or cars, or subscribed to magazines, were more likely to be wealthy individuals who
were not happy with the Democratic incumbent (Roosevelt).

Despite what many accounts of this famous story conclude, the bias produced by the more
affluent list was not likely to have been as severe as the second problem (Bryson, 1976). The
main problem was volunteer response. The magazine received 2.3 million responses, a response
rate of only 23% (2,300,000/10,000,000 = .23). Those who felt strongly about the outcome of the
election were most likely to respond. And that included a majority of those who wanted a
change, the Landon supporters. Those who were happy with the incumbent were less likely to
bother to respond. [This is a very common problem in all mail interviews, and is not nearly as
common in face to face and telephone interviews.]

Gallup, however, knew the value of random sampling. He was able not only to predict the
election, but to predict the results of the Literary Digest poll within 1%. How did he do this?
According to Freedman and colleagues (1991, p.308), “he just chose 3000 people at random
from the same lists the Digest was going to use, and mailed them all a postcard asking them how
they planned to vote.”

This example illustrates the beauty of random sampling and the idiocy of trying to base
conclusions on nonrandom and biased samples. The Literary Digest went bankrupt the following
year, and so never had a chance to revise its methods. The organization founded by George
Gallup has flourished, although not without making a few blunders of its own. [This was taken
from Freedman, Pisani, Purves, and Adhikari, 1991, p. 307, Statistics. 2nd Edition, New York:
W.W. Norton]

As a note, I presented a new method of conducting surveys and asking questions to Gallup in
1980. The organization was not interested in these new ideas. However, in a conference in
Denver in 1980, a month before the election, I predicted a landslide Reagan victory. At the
National Press Club the week before the election, the Gallup organization indicated that the race
was too close to call!

Draft Lottery Example from Moore "Statistics:
Concepts and Controversies" (pages 63-64)
Physical mixing (such as mixing names on cards in a hat) looks random, but … it is devilishly
hard to achieve a mixing that is random. There is no better illustration of this than the first draft
lottery, held in 1970. Because a simple random sample (SRS) of all eligible men would be
hopelessly awkward, the draft lottery aimed to select birth dates in a random order. Men born on
the date chosen first would be drafted first, then those born on the date number 2, and so on.
Because all men ages 19 to 25 were included in the first lottery, 366 birth dates were to be
drawn. Here is a news account of the 1970 draft lottery:

They started out with 366 cylindrical capsules, one and a half inches long and one inch in
diameter. The caps at the end were round. The men counted out 31 capsules and inserted in them
slips of paper with January dates. The January capsules were then placed in a large square
wooden box and pushed to one side with a cardboard divider, leaving part of the box empty. The
29 February capsules were then poured into the empty portion of the box, counted again, and
then scraped with the divider into the January capsules. Thus, according to Captain Pascoe, the
January and February capsules were thoroughly mixed. The same process was followed with
each subsequent month, counting the capsules into the empty side of the box and then pushing
them with the divider into the capsules of the previous months. Thus, the January capsules were
mixed with the other capsules 11 times, the February capsules 10 times and so on with the
November capsules intermingled with the others only twice and the December ones only once.
The box was then shut, and Colonel Fox shook it several times. He then carried it up three flights
of stairs, a process that Captain Pascoe says further mixed the capsules. The box was carried
down the three flights shortly before the drawing began. In public view, the capsules were
poured from the black box into the two-foot deep bowl. From the New York Times, January 4,
1970.

You can guess what happened. Dates in January tended to be on the bottom, while birth dates in
December were put in last and tended to be on the top. News reporters noticed at once that men
born later in the year seemed to receive lower draft numbers, and statisticians soon showed
that this trend was so strong that it would occur less than once in a thousand years of truly
random lotteries!

As an aside, I was one of the men subjected to this lottery. I had a birth date in August and
received a draft number of 48, which led to two tours of duty in Vietnam during the war. We
certainly noticed the bias then and for several years to come.

60
Unit 4: Summaries and Summary Tables
Now that the foundation has been laid (measurement and sampling), we should have access to
good data. This means that we should be able to start asking good questions, collecting good
information, and getting good answers to these questions. Throughout the remainder of the
course, we shall get progressively more sophisticated in the type of question we can consider.
But at this point, the type of question we are going to consider will be rather simple. In fact, we
shall consider only one form of question.

Form 1: what is the nature of a characteristic of something we would like to know about?

From our last couple of units, we can transform this question somewhat.

Form 1: what is the nature of a characteristic of something about our population?

Form 1: what is the value of a parameter of interest from our population?

Of course, in statistics we cannot answer this question, but we can approximate it (estimate it)
with the following question,

Form 1: what is the value of a statistic of interest from our sample that will tell us (estimate) the
value of the parameter of interest in our population?

Terms
An enumeration is the complete listing of the responses to our variable for the entire sample.

A summary statistic is a piece of information obtained from our sample that in some fashion
condenses information in our enumeration (our sample) into a single number.

A frequency (f) is the count of how often a particular response occurs within our sample.
Frequency is how often something happens.

A relative frequency (rf) is the proportion of how often a particular response occurs within our
sample. Mathematically rf = f/n (relative frequency = frequency divided by the size of our
sample). Relative frequency is the scaled frequency of how often something happens (scaled to
be between 0 and 1).

A percentage (%) is the scaled occurrence: % = (100)(rf) (100 times the rf). Percentage is the scaled
frequency of how often something happens (scaled to be between 0 and 100).

A table is one method of presenting summary statistics. Other common methods of presenting
summary statistics are graphical summaries (Unit 5) and numerical summaries (Units 6 and 7).
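These definitions can be sketched in a few lines of Python. The ten-value sample below is hypothetical, chosen only to show the arithmetic; the percentage is computed as 100·f/n, which is the same as (100)(rf).

```python
from collections import Counter

# Hypothetical enumeration of a categorical variable (n = 10)
sample = ["Yes", "No", "Yes", "Yes", "No", "Yes", "Maybe", "Yes", "No", "Yes"]
n = len(sample)

f = Counter(sample)                           # frequency: how often each response occurs
rf = {k: v / n for k, v in f.items()}         # relative frequency: rf = f / n
pct = {k: 100 * v / n for k, v in f.items()}  # percentage: % = (100)(rf)

print(f["Yes"], rf["Yes"], pct["Yes"])  # 6 0.6 60.0
```

Note that the frequencies always sum to n, and the relative frequencies sum to 1, just as the definitions require.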

61
Example
Form 1 Question: What is the most popular network on TV over the past 10 years? (The data for
this problem is fictitious.)

Recall that in form 1 questions there is one variable of interest. What is that variable in this
question? Popularity of networks.

How are we going to measure popularity? There are many ways to do this, but the way I have
chosen is as follows (presented along with the sampling discussion). I will consider only the four
“major” networks (ABC, CBS, FOX, and NBC). Each week the newspaper publishes the top TV
show (highest viewer rating) with its respective network.

Population: the 520 highest rated weekly shows over the past 10 years.

Sample Frame: the complete listing of these 520 shows.

Sample Unit: any one of these 520 listed shows.

Sampling Method: I will list these shows from 1 to 520, access the computer for 50 random
numbers between 1 and 520, and then note the network associated with each of the 50 selected
shows. Hence, we will be conducting a simple random sample. This sample should be a fairly
decent representation of the population.

Variable: most popular network. Possible values: ABC, CBS, FOX, NBC.

Level of Measurement: categorical

Reliable: extremely. Any two of us would probably have no problems identifying a particular
show with the very same network.

Valid: does this method of ascertaining popularity really measure popularity? This is probably
debatable. Hence, it is only somewhat valid.

Sample size: 50 = n

Enumeration
Here are the results of my random selection of 50 networks.

CBS CBS ABC CBS NBC NBC ABC ABC CBS CBS ABC ABC CBS CBS FOX ABC CBS
CBS ABC CBS NBC ABC FOX NBC NBC NBC NBC NBC NBC NBC NBC FOX NBC NBC
ABC ABC FOX ABC NBC FOX ABC FOX FOX NBC FOX ABC CBS NBC ABC CBS

62
A complete listing of 50 sample values is not terribly difficult to present. However, if you look at
this listing, it is still fairly challenging to ascertain the answer to several simple questions from it.
Such as

1. How many times is ABC listed? CBS? FOX? NBC?
2. What is the answer to our primary question? Of the four networks, which is listed most?

Summary Statistics

Rather than the enumeration, the complete listing of the sample, we could count up how many
times each possible value of the variable is present in the sample. This type of counting is called
a frequency count or frequency. Thus we could ask, what is the frequency of ABC in our
sample? What is the frequency of CBS in our sample? FOX? NBC? The results might look like

ABC occurs 14 times (f = 14 or fABC = 14)
CBS occurs 12 times (f = 12 or fCBS = 12)
FOX occurs 8 times (f = 8 or fFOX = 8)
NBC occurs 16 times (f = 16 or fNBC = 16)

Side Note 1: If we sum up all of the frequencies for any problem, what will they always add to?

The sample size = n

If we would prefer, we might wish to summarize our sample from one of the two scaled
perspectives, say as relative frequency. Recall, rf = f/n. Here n = 50.

ABC relative frequency = rf = 14/50 = .28
CBS relative frequency = rf = 12/50 = .24
FOX relative frequency = rf = 8/50 = .16
NBC relative frequency = rf = 16/50 = .32

Side Note 2:    rf(ABC) + rf(CBS) + rf(FOX) + rf(NBC) = .28 + .24 + .16 + .32 = 1.00

If we sum up all of the relative frequencies for any problem, will they always sum to 1?
YES

Lastly, if we wish to summarize our sample in terms of percentage, what would be the results?
Recall % = (100)(rf).

ABC percent = (100)(.28) = 28%
CBS percent = (100)(.24) = 24%
FOX percent = (100)(.16) = 16%
NBC percent = (100)(.32) = 32%

63
Side Note 3: %(ABC) + %(CBS) + %(FOX) + %(NBC) = 28% + 24% + 16% + 32% = 100%

Of course, if we sum up all of the percentages for any problem, they will always sum to 100%.

Is it possible in theory for the sum of the relative frequencies to not equal 1? No
Is it possible in practice for the sum of the relative frequencies to not equal 1? Yes

This occurs due to round-off errors. The same is true for percentage.
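All of the counts above, and the three side notes, can be checked with a short Python sketch over the 50-value enumeration. The round-off caveat is exactly why the sums of rf and % are compared with a small tolerance rather than tested for exact equality:

```python
from collections import Counter

# The 50 networks from the enumeration
networks = (
    "CBS CBS ABC CBS NBC NBC ABC ABC CBS CBS ABC ABC CBS CBS FOX ABC CBS "
    "CBS ABC CBS NBC ABC FOX NBC NBC NBC NBC NBC NBC NBC NBC FOX NBC NBC "
    "ABC ABC FOX ABC NBC FOX ABC FOX FOX NBC FOX ABC CBS NBC ABC CBS"
).split()
n = len(networks)                     # 50
f = Counter(networks)                 # frequencies: ABC 14, CBS 12, FOX 8, NBC 16
rf = {k: f[k] / n for k in f}         # relative frequencies
pct = {k: 100 * f[k] / n for k in f}  # percentages

print(sum(f.values()))                        # 50 (Side Note 1: frequencies sum to n)
print(abs(sum(rf.values()) - 1.0) < 1e-9)     # True (Side Note 2, within round-off)
print(abs(sum(pct.values()) - 100.0) < 1e-9)  # True (Side Note 3, within round-off)
```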

Table Summarization
The summaries above are certainly a better, shorter way of presenting the results than the entire
enumeration. However, they still lack organization. We use a table to provide this organization.
In effect, a table summary is simply an organized way to present the summary statistics. If we
are organizing the frequencies, then the logical name for the summary table is the frequency
table. If we are organizing the relative frequencies, then it would be called the relative frequency
table, and similarly we could arrive at the percentage table.

Frequency Table

Network                f
ABC                   14
CBS                  12
FOX                   8
NBC                  16
Total                 50

From this frequency table, it is very easy to answer the two questions that we initially proposed.

1. How frequently does each network appear in our sample?
ABC (14 times), CBS (12 times), FOX (8 times), and NBC (16 times)

2. Which network is listed most frequently? NBC at 16 times.

The answer to the second question is easy to relate back to our population of interest. Over the
past 10 years, which network has been most popular? NBC, because it occurs most in our
sample.

However, the answers to our first question are much more difficult to relate back to our
population. For instance, if I were to tell you that ABC has a 14 frequency of occurrence in our
sample, what does this tell you? Is 14 big? Is 14 small? It is not possible to answer this without
including some additional information. What information? The size of the sample.

64
In such a situation as this, it is usually more meaningful to convert frequency of occurrence in
our sample to relative frequency or percentage of occurrence in our sample. By bringing in this
missing piece of information (n), the summary information is immediately more meaningful. For
instance, if I now tell you that ABC occurs 28% of the time in our sample it gives you some
sense of how often I should expect this result in the population as well. Thus, while the
frequency table, relative frequency table, and percentage table all provide the same information,
the relative frequency and especially the percentage table provide us with a particular
organization that more easily communicates beyond the context of our sample. In most non-
statistical discussions, percentage is used. In most statistical discussions, relative frequency is
used.

Relative Frequency Table

Network                rf
ABC                   .28
CBS                  .24
FOX                 .16
NBC                 .32
Total               1.00

Percentage Table

Network               %
ABC                  28
CBS                  24
FOX                  16
NBC                  32
Total               100

65
Quiz
At the University of Wyoming (Laramie Campus) there are approximately 10,000 students of
whom 5,500 (55%) are females and 4,500 (45%) are males. I would like to draw a sample of 120
students, appropriately balanced for gender, and ask them the following question.

“Of the following four foods, which is your favorite?”   Salad   Tacos   Hamburgers   Pizza

In order to collect the 120 students for the sample the following procedure was done. A list was
made of all 5,500 females and a separate list was made of all 4,500 males. From the female list
66 were chosen at random and from the male list 54 were chosen at random. These students were
then combined to form the one sample of 120 students.

Here are the results: Salad (f=30), Tacos (f=30), Hamburgers (f=26), Pizza (f=34)

For the problem above,

1. What is population, sample frame, sample unit, sampling method, variable, level of
measurement, and sample size?

2. What is the enumeration?

3. What is the frequency table?

4. What is the relative frequency table?

5. What is the percentage table?

66

Question 1

Population – 10,000 students at the University of Wyoming (Laramie Campus)

Sample Frame – The list of all 10,000 students that was broken into two pieces; the female
sample frame that listed all 5,500 female students and the male sample frame that listed all 4,500
male students.

Sample Unit – any one of the students at the University of Wyoming (Laramie Campus)

Sampling Method – stratified random sampling

Variable – favorite food

Level of measurement – categorical

Sample size – 120
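The stratified random sampling described in the quiz can be sketched in Python. We only know the stratum sizes, so the student ID lists below are placeholders; the allocation of 66 and 54 follows from drawing each stratum in proportion to its share of the population:

```python
import random

N_female, N_male = 5500, 4500  # stratum sizes at the university
n = 120                        # desired total sample size

# Proportional allocation: each stratum contributes in proportion to its size
n_female = round(n * N_female / (N_female + N_male))  # 120 * 0.55 = 66
n_male = n - n_female                                 # 120 * 0.45 = 54

# Simple random sample within each stratum (hypothetical ID lists)
females = [f"F{i}" for i in range(N_female)]
males = [f"M{i}" for i in range(N_male)]
sample = random.sample(females, n_female) + random.sample(males, n_male)
print(n_female, n_male, len(sample))  # 66 54 120
```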

Question 2

Salad Salad Salad Salad Salad Salad Salad Salad Salad Salad
Salad Salad Salad Salad Salad Salad Salad Salad Salad Salad
Salad Salad Salad Salad Salad Salad Salad Salad Salad Salad
Tacos Tacos Tacos Tacos Tacos Tacos Tacos Tacos Tacos Tacos
Tacos Tacos Tacos Tacos Tacos Tacos Tacos Tacos Tacos Tacos
Tacos Tacos Tacos Tacos Tacos Tacos Tacos Tacos Tacos Tacos
Burgers Burgers Burgers Burgers Burgers Burgers Burgers Burgers Burgers Burgers
Burgers Burgers Burgers Burgers Burgers Burgers Burgers Burgers Burgers Burgers
Burgers Burgers Burgers Burgers Burgers Burgers
Pizza Pizza Pizza Pizza Pizza Pizza Pizza Pizza Pizza Pizza
Pizza Pizza Pizza Pizza Pizza Pizza Pizza Pizza Pizza Pizza
Pizza Pizza Pizza Pizza Pizza Pizza Pizza Pizza Pizza Pizza
Pizza Pizza Pizza Pizza

67
Question 3

Frequency Table

Food       f
Salad     30
Tacos     30
Burgers   26
Pizza     34
Total    120

Question 4

Relative Frequency Table

Food       rf
Salad    .250
Tacos    .250
Burgers  .217
Pizza    .283
Total   1.000

Question 5

Percentage Table

Food       %
Salad    25.0
Tacos    25.0
Burgers  21.7
Pizza    28.3
Total   100.0

68
Unit 5: Graphs
Constructing a graphical summary is going to be more difficult than constructing a table
summary. In fact, a graphical summary can only be drawn after we have constructed a table
summary. Thus, why would anyone want to do a graphical summary?

A picture is worth a thousand words. If done appropriately, a graph communicates more
completely than words alone. Conversely, if done inappropriately, a lie is
easier to communicate in a graph than in words.

A table summary is excellent for communicating what is happening at a particular level of the
variable as seen through the frequency, relative frequency, or percent. In contrast, a graphical
summary is excellent for communicating a comparison of one level of the variable with
another level of the variable. This comparison is usually perceived psychologically in a
relative sense, even if the graph is constructed from the frequency (f) data.

Terms
Horizontal Axis - the axis in which the variable responses are displayed

Vertical Axis - the axis in which the frequencies, relative frequencies, or percentages of the
variable responses are displayed

Bar Graph - the graph in which the results of a categorical or ordinal variable are displayed as
bars in a horizontal/vertical context

Pie Chart - the graph in which the results of a categorical or ordinal variable are displayed in a
circle

Pictogram - the graph in which the results of a categorical or ordinal variable are displayed as
pictures

Histogram - the graph in which the results of an ordinal or ratio variable are displayed as bars in
a horizontal/vertical context

Line Graph (also called a Line Chart) - the graph in which the results of an ordinal or ratio
variable are displayed as a line in a horizontal/vertical context

Time Plot – a variation of the Line Chart in which the horizontal axis is time and the vertical
axis represents the variable of interest

69
Graphs for Categorical/Ordinal Variables
In this section, I will use the network data from the previous unit. As a reminder here is the
relative frequency table for these data.

Relative Frequency Table

Network               rf
ABC                 .28
CBS                  .24
FOX                 .16
NBC                 .32
Total               1.00

There are three basic ways of graphing the results of a categorical variable.

1. Bar Graph (also referred to as a Bar Chart) is used for f, %, and rf data
2. Pie Chart is used for % and rf data
3. Pictogram is used for f, %, and rf data

The graphical display of categorical data is extremely popular in newspapers and magazines. The
newspaper USA Today has numerous examples of bar graphs, pie charts, and pictograms in
nearly every issue. See for yourself at http://www.USAToday.com.

70
Bar Graph
In a bar graph, we reflect each value of the categorical variable in f, %, or rf terms as the height
of a bar above the value. In the horizontal axis of a bar graph we present the values of the
categorical or ordinal variable. In the vertical axis of a bar graph we present the f, %, or rf
(depending on which form of the data we wish to consider). For illustration purposes only, we
will use the rf data for all of the graphs in this section.

Thus, for the network example we have been using, our bar graph will have the values ABC,
CBS, FOX, and NBC in the horizontal axis; and the corresponding rf (relative frequency) values
in the vertical axis.

It is easy to see from this graph that NBC was rated as the top network more times than any
other network in our sample (the bar is the tallest). It is also easy to see that NBC is twice as
popular as FOX. In addition, it is clear that the popularity of CBS is in about the middle between
NBC and FOX.

71
Pie Chart
Although the bar graph is an efficient means of summarizing data, the pie chart is also a very
popular means of presenting a result graphically. In the pie chart, a circle is used to display the
relative proportions of the levels of the categorical or ordinal variable through slices of the pie.

Each pie slice in degrees = 360o times rf. [The superscript o represents degrees] The “slice” table
can easily be obtained from the relative frequency table.

Network          rf       Pie slices in degrees
ABC            .28      (360) (.28) = 100.8o
CBS             .24     (360) (.24) = 86.4o
FOX            .16       (360) (.16) = 57.6o
NBC            .32       (360) (.32) = 115.2o
Total         1.00                       360o
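The slice table amounts to one multiplication per network, which a short Python sketch can verify (round-off again explains the tolerance on the total):

```python
rf = {"ABC": 0.28, "CBS": 0.24, "FOX": 0.16, "NBC": 0.32}

# Each slice, in degrees, is 360 times the relative frequency
slices = {k: 360 * v for k, v in rf.items()}
for k, deg in slices.items():
    print(k, round(deg, 1))  # ABC 100.8, CBS 86.4, FOX 57.6, NBC 115.2
print(round(sum(slices.values()), 1))  # 360.0
```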

There are a great many ways of slicing the pie, but an easy one is to start at the 12:00 position of
the pie (top) and start slicing off clockwise. Such a method would produce the following pie.

72
Pictogram
The bar graph and pie chart are very useful and professional means of graphically summarizing
data. But, the pictogram is the most consumer friendly method of displaying this type of data in
non-professional arenas, such as newspapers, TV, etc. The pictogram is in essence a bar graph, in
which the bars have been replaced by representative pictures of the topic being displayed. There
are two general ways in which pictures are used in the pictogram. The first is to lengthen the
picture in relation to the frequency, relative frequency, or percentage (whatever is being
communicated). In effect, this lengthening distorts the basic picture by appearing to stretching it
to reach the height depicted by of the corresponding bar graph. This can be seen in Pictogram
Figure 1.

Figure 1

The second is to keep the proportion of the picture correct, but to increase its overall size to
adjust for increasing or decreasing frequencies, relative frequencies, or percentages (Pictogram
Figure 2). While this type of figure might logically make more sense, it is far more challenging
to construct.

Figure 2

73
Graphical Summarization of an Ordinal or a
Ratio Variable
There are essentially three ways to graph the results of an ordinal or a ratio variable. All three
can be used with f, %, and rf data. These are

1. Histogram
2. Line Graph
3. Scatterplot (this will be presented much later in the course; Unit 15)

The histogram and line graph are presented in this section for your information. The histogram
and line graph will both use the following example.

There has been some concern for many years about the use of standardized tests in evaluating
performance and in establishing admission standards. I happen to have SAT data from the 50
states. These data are already summarized in the sense that the scores of everyone who took the
SAT examination during the year for which I have the data were averaged by state to provide
a statewide score. Thus, I have the mathematics and verbal SAT scores for each state. In this
illustration, we will only use the mathematics scores for the 50 states. As in the previous
Network Popularity problem, we could enumerate all 50 of the scores, but such an enumeration
does not provide us with useful information. Hence, some summarization of the 50 individual
pieces of information, should prove helpful.

Enumeration of the mathematics SAT scores for the 50 U.S. States

437 444 450 459 459 460 460 463 466 466 468 468 468 470 471 474 475
476 477 483 484 484 484 486 488 493 501 503 507 516 516 518 520 520
521 523 523 526 527 529 529 537 540 545 546 548 550 561 567 584

Histogram
The histogram is very similar in appearance to the bar graph. The only noticeable differences are
(1) the horizontal axis represents the values from an ordinal or a ratio variable, and (2) the bars
touch one another in the histogram and are separated from one another in the bar graph. For this
illustration, the horizontal axis will be scores on the mathematics portion of the SAT. A
difficulty in ordinal data, and in ratio data in particular, is that there may be very few replicated
values. Thus, if we were to construct a histogram in the same manner as a bar graph (the
horizontal axis reflecting unique values of the variable), most of the frequency counts would be
one. As a result, there would be very little savings attributable to the summarization. This can be seen
in a comparison of the frequency table below (Math SAT Frequency Table) with the enumeration
above.

74
Frequency Table

x       f
437     1
444     1
450     1
459     2
460     2
463     1
466     2
468     3
470     1
471     1
474     1
475     1
476     1
477     1
483     1
484     3
486     1
488     1
493     1
501     1
503     1
507     1
516     2
518     1
520     2
521     1
523     2
526     1
527     1
529     2
537     1
540     1
545     1
546     1
548     1
550     1
561     1
567     1
584     1
Total  50

What we see in this frequency table is a summary which is little better (smaller in size) than the
initial enumeration. Therefore, we have to slightly modify the manner in which we summarize
the data for ordinal and especially ratio variables. This slight modification is to group the data
together. This modification is also the step in the graphical summarization of ratio data that
seems to make histograms and line graphs difficult and confusing to construct.

75
Grouping is a process that is similar to creating some number of boxes and then placing the
individual scores into the appropriate box. I have decided to use 6 boxes. The first box will be
labeled 435 to 459, the second box will be labeled 460 to 484, the third box will be labeled 485
to 509, etc. You will notice that each box has a beginning number and an ending number. These
are referred to as the lower limit (beginning number) and upper limit (ending number) of each
box. Having these boxes, we now place the 50 scores into the appropriate box.

Box     Label            Scores from the 50 States

Box 1 – 435 to 459:      437,444,450,459,459
Box 2 – 460 to 484:      460,460,463,466,466,468,468,468,470,471,474,475,
476,477,483,484,484,484
Box 3 – 485 to 509:      486,488,493,501,503,507
Box 4 – 510 to 534:      516,516,518,520,520,521,523,523,526,527,529,529
Box 5 – 535 to 559:      537,540,545,546,548,550
Box 6 – 560 to 584:      561,567,584

This collection of the individual scores into boxes (summary) can now be put into a frequency
table.
Grouped Frequency Table

SAT Math Scores            f
Box 1 (435 – 459)          5
Box 2 (460 – 484)         18
Box 3 (485 – 509)          6
Box 4 (510 – 534)         12
Box 5 (535 – 559)          6
Box 6 (560 – 584)          3
Total                     50
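The grouping step above, placing each of the 50 scores into one of the six boxes, can be written as a short Python sketch that reproduces the grouped frequency table:

```python
# The 50 statewide SAT Math scores from the enumeration
scores = [437, 444, 450, 459, 459, 460, 460, 463, 466, 466,
          468, 468, 468, 470, 471, 474, 475, 476, 477, 483,
          484, 484, 484, 486, 488, 493, 501, 503, 507, 516,
          516, 518, 520, 520, 521, 523, 523, 526, 527, 529,
          529, 537, 540, 545, 546, 548, 550, 561, 567, 584]

# Six boxes, each with a lower and an upper limit, as defined above
bins = [(435, 459), (460, 484), (485, 509),
        (510, 534), (535, 559), (560, 584)]

# Count how many scores fall inside each box (limits inclusive)
freq = [sum(lo <= x <= hi for x in scores) for lo, hi in bins]
print(freq)  # [5, 18, 6, 12, 6, 3]
```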

To construct the histogram we will essentially draw a bar graph with only one major difference;
the bars in the histogram will touch one another rather than having a space between them. While
this may sound rather simple, the discussion about how to make the bars touch is unfortunately
complicated and involves two different methods. Please bear with me in this presentation.

Method 1 – When the bins (boxes) do not share a common connection value. This is the
situation presented in the summary table for the SAT Math Scores in the example immediately
above. You will notice that the upper limit of one bin IS NOT equal to the lower limit of the next
bin. For instance in this SAT Math example, the upper limit of the first box is 459 and the lower
limit of the second box is 460. These two values are not the same; hence you will need to use
method 1 to construct the histogram. In this situation, we need to find a common connection
point between the bins or boxes. This is done by adding the upper limit of one bin to the lower
limit of the next bin and dividing by two. Using this method produces the following five
connection points for the SAT Math data.

76
Common Connection Point 1          (459 +460)/2 = 459.5

Common Connection Point 2          (484 +485)/2 = 484.5

Common Connection Point 3          (509 +510)/2 = 509.5

Common Connection Point 4          (534 +535)/2 = 534.5

Common Connection Point 5          (559 +560)/2 = 559.5

Each of these five points is the common connection point (shared wall) of the bars that will be
raised above these bins. All that now remains is to construct the bars, which is done in exactly
the same manner as was performed in the construction of the bar graph. We draw the height of
the bar above any bin at the level of the appropriate f, rf, or % values from the summary table.
The conversion of the frequency data in the summary table above for the SAT Math scores can
be found in the attached histogram figure.
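The five connection points follow mechanically from the bin limits, which a quick sketch confirms:

```python
bins = [(435, 459), (460, 484), (485, 509),
        (510, 534), (535, 559), (560, 584)]

# Common connection point: average of one bin's upper limit
# and the NEXT bin's lower limit
points = [(bins[i][1] + bins[i + 1][0]) / 2 for i in range(len(bins) - 1)]
print(points)  # [459.5, 484.5, 509.5, 534.5, 559.5]
```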

77
Method 2 – When the bins (boxes) do share a common connection point. In this situation the upper
limit of one bin IS exactly equal to the lower limit of the next bin. The problem here is not in the
determination of the common connection point, but what to do with a value in the data set that
happens to equal the common connection point. Should this case in the data set be added to the
lower bin, the upper bin, or both bins? [This problem is not graphical in nature, but deals with
the appropriate summarization.] In this class we will assume the following: if a data value
happens to equal a common connection point between two bins, it will be placed in the upper
(larger) of the two bins only. For instance, let's consider a situation that has only three bins, which are I
(460 – 485), II (485 – 510), and III (510 – 535).

Which bin (I, II, or III) would the case associated with the value 512 go into?     III

Which bin (I, II, or III) would the case associated with the value 484 go into?     I

Which bin (I, II, or III) would the case associated with the value 485 go into?     II
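The Method 2 assignment rule, with a value equal to a shared boundary going into the upper of the two bins, can be sketched as a small function (the three-bin setup here matches the example above):

```python
bins = {"I": (460, 485), "II": (485, 510), "III": (510, 535)}

def assign(x):
    """Return the bin for x; a boundary value goes to the upper bin."""
    names = list(bins)
    for i, name in enumerate(names):
        lo, hi = bins[name]
        last = (i == len(names) - 1)
        # Half-open check [lo, hi) pushes boundary values upward;
        # the last bin keeps its own upper limit
        if lo <= x < hi or (last and x == hi):
            return name
    raise ValueError(f"{x} is outside every bin")

print(assign(512), assign(484), assign(485))  # III I II
```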

The issue in the summarization of ratio data and in the construction of the histogram is to
determine whether the bins provided have limits without a common connection point (Method 1)
or with a common connection point (Method 2), then to use the identified method to summarize
the data and construct the histogram.

Note: The determination of the appropriate number and size of bins (boxes) to use when
grouping ordinal and ratio data is not an overly difficult task, but it is extremely difficult to
communicate in a succinct manner. Rather than attempt to communicate a process that has little
value beyond the context of this course, I have chosen to eliminate this discussion for simplicity
sake. Thus, for every situation presented in this course which will require or use a histogram, line
graph, or table summary of an ordinal or ratio variable, I will provide you with the number of
bins to use and their limits.

78
Line Graph
While our discussion of histograms might have seemed overly complex, the construction of line
graphs is extremely simple.

1. Construct the histogram (refer back to the previous graph)
2. Replace the end point numbers (lower limit and upper limit from the same bin) from
the bins (boxes) used in the histogram and table summary with the midpoint numbers
of the bins. These can be obtained through the following formula.

Midpoint = (lower limit + upper limit) / 2

NOTE: THESE LIMITS ARE WITHIN THE SAME BIN. IN THE PREVIOUS
DISCUSSION OF THE HISTOGRAM, THE LIMITS DISCUSSED WERE
FROM DIFFERENT BINS.

3. Place a dot at the top of each bar in the middle of the bar. These can be seen in the
previous graph.
4. Connect each of the dots
5. Throw away the bars
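Step 2's midpoint computation, using the six SAT Math bins from the histogram discussion, is a single pass over the bins:

```python
bins = [(435, 459), (460, 484), (485, 509),
        (510, 534), (535, 559), (560, 584)]

# Midpoint = (lower limit + upper limit) / 2, both limits from the SAME bin
midpoints = [(lo + hi) / 2 for lo, hi in bins]
print(midpoints)  # [447.0, 472.0, 497.0, 522.0, 547.0, 572.0]
```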

79
Bias in Graphs
As seen earlier, graphical summaries are excellent for communicating a comparison of one level
of a variable with another level of a variable. However, such summarizations are often
misrepresented in newspapers, magazines, and even professional journal articles through
ignorance and intentional manipulation. This distortion in graphs often occurs because we are
trying to oversimplify, trying to be cute in our depiction, or just because we want to sell a
message regardless of what the data says. Said another way, if a picture is worth a thousand
words, then a bad picture is worth a thousand lies (AKA – nothing lies as well as a bad
picture). In order to become an educated consumer of graphical summarizations, it is as
important to know how such graphs can be used to present information in a biased manner as it is
to know how to construct them in an unbiased manner. There is an excellent presentation of this
topic in the book entitled How to Lie with Statistics written by Darrell Huff.

While there are numerous illustrations of how to bias a graphical summarization, there are
essentially only three distinct biasing elements (one for each type of graph for a categorical
variable) and they all deal with the issue of perspective. Recall that in graphs, the fundamental
concept being communicated is the relative response from one value of the variable to another.
Thus, if we can somehow influence the perceived magnitude of a particular response, we can
greatly influence its perceived relative response.

80
Bias in the Bar Graph
In the bar graph, bias is introduced through the manipulation of the perceived magnitude of the
response. [This is also true for the Histogram] Remember the vertical axis in the bar graph
reflects frequency, relative frequency, or percentage, and thus the heights of the bars are
supposed to represent the magnitude of the responses. In the bar graph, the manipulation of the
perceived magnitude of response is accomplished through the severing of the vertical axis at
some point (usually near 0). In figures 1 and 2, the frequencies of types of summer employment
for high school students are presented. Take a quick look at figure 1 and note what you see.

Figure 1

What did you see? A quick look at figure 1 gives the impression that there are 4 or 5 times as
many students employed over the summer in the fast food industry as there are in family
businesses. Take another look at Figure 1. However, a more careful examination of figure 1
shows that the vertical axis has been severed between 0 and 9000. What impact does this
severing have on the actual bar graph? Look at figure 2 for the actual graph.

81
Figure 2

In figure 2 the heights of the bars are much more similar in size than was intimated in figure 1.
The actual relative frequencies are quite similar. Instead of the fast food industry being 400% or
500% bigger than family businesses in employing students during the summer, the fast food
industry is only about 25% bigger. Thus, the result of the manipulation in figure 1 was rather
dramatic. The nature of this type of bias is to make the relative differences between the values of
the categorical variable appear much bigger than they really are. This type of bias is extremely
common and should be looked for every time a bar graph is presented.

82
Bias in the Pictogram
The manipulation of perspective in the pictogram is quite similar to that just discussed in the bar
graph. However, rather than severing the vertical axis, it is presented pictorially in a manner that
distorts the apparent heights of the columns of the pictures. From a psychological or artistic
notion of perspective, if we look at two identical objects, one closer and one farther away, the
one that is closer looks both taller and wider. Thus by manipulating the pictogram so that the
desired column appears closer to the viewer in the perspective of the pictogram, it is possible to
make even a smaller column in reality appear larger. In Figure 3, we have summarized the daily
circulation data for two newspapers. This figure depicts a correctly designed pictogram. In this
figure, the two columns of newspapers appear at an equal distance from us, hence these two
columns differ only in accordance with their true magnitudes. The very same data in Figure 3 is
also presented in Figure 4. Note that by making one column seem closer to us (by changing
our perspective of the columns from frontal to above), we have not only exaggerated its
apparent height, we have greatly exaggerated its overall apparent size by additionally increasing
its width and depth. This type of manipulation, which actually appeared in print, is extremely
common in pictograms where the data are being made more attractive to consumers. It should be
noted that the average daily circulation of newspaper 1 is 1.7 million and that of newspaper 2 is 1.4
million. This comparison of the two newspapers is very accurately depicted in Figure 3 and very
grossly misrepresented in Figure 4.

Figure 3

83
Figure 4

84
Bias in the Pie Chart
The manipulations we have seen above in the Bar Graph and Pictogram are very conscious
intentional acts to misrepresent the findings of studies in a graphical context. In the case of pie
charts, the most common “manipulation” is in fact not a manipulation at all. All of the graphs we
have seen so far have been relatively simple. This is probably the most important feature of all
summarizations (numerical and graphical): simplicity. We first saw the issue of simplicity arise
earlier in our discussion about histograms. When the data was ungrouped, there were too many
values with very small frequencies, which in effect produced little or no summarization
(simplification). Pie charts such as those seen in the last unit were simple to understand. For
example, look at Figure 5.

Figure 5

Without even knowing the variable being examined or the subjects being questioned, what do we
know from just looking at this figure?

1. The variable being depicted is very likely looking at food in some manner. We know this
from the slices of the pie (burgers, tacos, pizza).
2. Whatever is being asked, we know that pizza is the most popular response (more than 25%
of the pie).
3. We also know that burgers are the least popular response (less than 25% of the pie).

The data in Figure 5 came from asking the following question of 120 people, “what is your
favorite food: burgers, tacos, pizza, or something else?”

A common way to render almost any graph essentially useless is to make it overly complex, thus
violating the fundamental principle of simplicity. This is especially true, and most commonly
true, in the case of Pie Charts. Figure 6 reflects such a situation.

85
Figure 6

Without even knowing the variable being examined or the subjects being questioned, what do we
know from just looking at this figure?

1. It seems fairly clear that we are talking about “someone's” monthly expenses.
2. It would seem that the biggest expenses are for food and rent.
3. After this the pie is sliced into many sections and it would take us considerable time and
space to reflect all that is going on.
4. With so many of the slices being rather small, it makes a precise comparison of their relative
size rather difficult.

It should be noted that this figure does not misrepresent the findings from our study. It just
makes them more difficult to understand.

86
Time Plots
A variation on the line graph is the time plot. In almost every way the time plot looks like a line
graph, with two exceptions. First, the horizontal axis is time and the vertical axis represents the
values of the variable of interest. While these two graphs look alike, if you look at the horizontal
axis you will never be fooled: if it shows time, it is a time plot, and if it shows the values of the
variable of interest, it is a line graph. Second, the time plot is actually composed of two
variables, one of which is time. Thus, in actuality the time plot is closer
to the scatterplot (Unit 15). From my experience with the popular media (TV, newspaper, and
magazines), the time plot is used a lot and the line graph is used very little. The time plot is also
commonly referred to as the Time Series Plot.

There are several very popular examples of the time plot to which you are exposed almost daily.
These are the trends in the stock market (Dow Jones Average, NASDAQ, etc.) and a host of
other economic indicators, the weather, and sports.

Here is an example of a time plot from weather data.

During the month of May, I collected the high temperature and the low temperature for all 31
days. Notice that there are three variables here. The first is the high temperature, the second is
the low temperature, and the third is day in the month of May. The enumeration of the data is
(the enumeration is in order from day 1 to day 31):

Highs:
67,68,70,72,67,62,58,48,52,55,62,65,67,68,63,55,57,59,81,68,74,78,68,68,54,70,64,73,74,75,73

Lows:
42,37,38,40,40,43,51,46,42,48,46,38,40,40,38,37,32,40,50,47,47,47,52,53,50,42,49,42,45,50,46

The enumeration is once again not terribly helpful in "seeing" patterns within the data. I have
used this information for the 31 high and low temperatures for the month of May to construct the
three figures below. The first time plot reflects the trend in time (third variable) for the low
temperature (second variable); and the second time plot reflects the trend in time (third variable)
for the high temperature (first variable). The third plot represents a compound time plot, in which
the low and high temperature are plotted together. This would be done to see if there is a
relationship between low and high temperatures.

87
Low Temperature Time Plot

High Temperature Time Plot

88
Compound Temperature Time Plot

The first two time plots are associated with Form 2 questions. How does temperature relate to
day of the month (low in plot 1 and high in plot 2)? The third figure is an illustration of a time
plot for complex Form 2 question in which we are asking how both high and low temperature
relate to day of the month.

There doesn't seem to be much pattern to any of the plots. Maybe it is wishful thinking, but just
maybe we can see that over the course of the month the temperatures (both high and low) are
creeping somewhat upward. Certainly it can be said that both the high and low temperatures over
the course of May are erratic. There is a numerical summarization method for time plot data and
it is called Time Series Analysis (no big surprise here).
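
The "creeping upward" impression can be checked with a quick calculation: compare the average
temperature in the first half of the month with the second half. This is only an illustrative
sketch in Python (not a formal time series analysis), using the May data enumerated above:

```python
# May temperature data from the enumeration above (day 1 through day 31)
highs = [67, 68, 70, 72, 67, 62, 58, 48, 52, 55, 62, 65, 67, 68, 63, 55,
         57, 59, 81, 68, 74, 78, 68, 68, 54, 70, 64, 73, 74, 75, 73]
lows = [42, 37, 38, 40, 40, 43, 51, 46, 42, 48, 46, 38, 40, 40, 38, 37,
        32, 40, 50, 47, 47, 47, 52, 53, 50, 42, 49, 42, 45, 50, 46]

def half_means(temps):
    """Average of the first half of the month vs. the second half."""
    mid = len(temps) // 2
    first, second = temps[:mid], temps[mid:]
    return sum(first) / len(first), sum(second) / len(second)

# For both series the second-half average is a few degrees higher,
# which supports the "creeping somewhat upward" reading.
print(half_means(lows))
print(half_means(highs))
```

Even when a time plot looks erratic, a simple before/after comparison like this can reveal a
weak trend that the eye only suspects.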

89
Unit 6: Numerical Summarization: The Most
Representative Value
When we are working on problems in which there are only one or two variables of interest or in
which we can consider one variable at a time, then graphical methods of summarizing data are
extremely efficient and effective. However, most real life problems are considerably more
complex. As such, it is tremendously beneficial to be able to summarize problems even more
succinctly than is possible through graphical methods.

The primary purpose of this course is to introduce you to the application of statistics and
statistical thinking existing in the world we live in. The majority of these applications can be
presented logically and/or through graphical means. The numerical summarization methods
bridge the gap between simple everyday applications and the more complicated world of science,
business, forecasting, etc. By learning graphical methods and the principles in the course to this
point, you are becoming a more sophisticated consumer of statistics. The numerical methods are
the foundational elements to becoming a more sophisticated user of statistics. The discipline of
statistics is traditionally split into two main sub-fields. The first is called descriptive statistics.
Descriptive statistics (graphical and numerical summarizations) are used to tell what is currently
happening or what has happened; to describe the world in which we live. The second is called
inferential statistics. Inferential statistics are used to tell us what will happen; to enable us to
make decisions and predictions. These methods are the primary focus of this course. It will take
us the first 10 Units to establish the foundation necessary for the last 10 Units. However, the
entire field of modern statistics has been built upon the principles covered in these last 10 Units.

In Unit 4 we saw that the enumeration of the data was a very accurate but cumbersome means of
illustrating the results from a study or experiment. In the frequency table the data could be
presented clearly in much less space; however, in all of these methods, all of the data are present
in one form or another. If we were trying to ultimately simplify a set of data, what would this
simplification look like? The ultimate simplification would be to reduce the data set down to a
single number. Is it possible to condense a data set into a single number, a single number that
would be reflective of all the information? The answer to these questions is yes. The single
number that is reflective of the entire set of data is the number which might be defined as the
most representative or the most common or the most typical of all the possible values.

90
Terms
Mode – the most often (frequently) occurring value in the data set

Median – the middle value in the data set

Mean – the average value in the data set

Distribution – the graphical presentation of an actual collected set of data or a theoretical set of
data

Symmetric distribution – the graphical display above the mean looks identical (as a reversed or
mirror image) to the graphical display below the mean

Skewed distribution – a graphical display that is not symmetric

Left skewed distribution – a graphical display in which the distribution appears to be stretched
out to the left (mean < median)

Right skewed distribution – a graphical display in which the distribution appears to be stretched
out to the right (median < mean)

Percentile – the breaking up of a distribution into 100 “equal” parts. A percentile is the
percentage of the distribution at this identified value or smaller

Quartiles – the breaking up of a distribution into 4 “equal” parts

Equations
X = the symbolic name for a variable

Xi = the score for a specific person on the variable x

n = the sample size

X1 = the score on the variable for the 1st person in the sample

X2 = the score on the variable for the 2nd person in the sample

etc.

Xn = the score on the variable for the nth (last) person in the sample

91
Σ (the capital Greek letter sigma) is a symbolic way of saying "the sum of everything that
follows," thus

 n
 Σ Xi = X1 + X2 + X3 + ... + Xn        is the sum of the scores for all the people in the sample
i=1

       n
       Σ Xi
      i=1        X1 + X2 + X3 + ... + Xn
X̄ = -------- = -------------------------        This is the equation for the mean.
        n                   n

X̄        This is read as xbar and is the symbolic name for the mean.

What does this equation mean in English?

The mean of a sample is equal to the sum of the scores for all the people in the sample divided
by the number of people in the sample. This is also called the average.

The most representative value
For any set of data, what is the most representative value of all the possible values?

This is a very fundamental question and it is one in which the answer is not as simple as we
would like it to be. The complexity primarily comes from the consideration of the level of
measurement in the variable we are examining. Hence, the most representative value depends
upon whether the data is categorical, ordinal, or ratio.

The following table can be used to assist us in finding the answer. In this table, three types of
representative values (mode, median, and mean) are presented. For some levels of measurement
only some of the representative values are possible. When several representative values are
possible, the one that is best (most representative) is marked with an asterisk (*) in this table.

                      Level of Measurement
   Categorical             Ordinal             Ratio or “Ratio”
      Mode*                 Mode                    Mode
                           Median*                 Median
                                                    Mean*

92
Mode
The mode defines the most representative value for a set of data as the value that has the highest
frequency. As was seen in the table of the last section, it is possible to calculate the mode for any
set of data. However, the concept of the mode is most useful and the mode is the most
representative value of a set of data, when that data arises from a categorical variable.

Examples

In the frequency table below, what is the mode (sometimes referred to as the modal value)? This
data was collected from opening a 1.69 oz. bag of M & Ms and counting the number of each
color.

Color         f

Yellow         9
Red            9
Orange         4
Blue           8
Green          4
Brown          16

Since Brown has the largest frequency (f=16), then Brown is the mode.

In the following frequency table, what is the mode? This data was collected from opening a 2.17
oz. bag of Skittles and counting the number of each color.

Color             f

Yellow         11
Red            9
Orange         10
Purple         15
Green          15

In this table, there are two values that share the largest frequency. The largest frequency is 15,
and both purple and green achieve this result. Hence, purple and green are the modal values.
Since there are two modes, the data in this frequency table are said to be bi-modal. If there were
three modes the data would be called tri-modal, with four modes quad-modal, and so on.
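
A short sketch of how the mode can be found by computer, using Python's standard-library
Counter on the Skittles data above (the color list is reconstructed from the frequency table):

```python
from collections import Counter

def modes(data):
    """Return every value that attains the highest frequency,
    so bi-modal (or tri-modal, etc.) data yields several values."""
    freq = Counter(data)
    top = max(freq.values())
    return sorted(value for value, f in freq.items() if f == top)

# Colors reconstructed from the Skittles frequency table
skittles = (["Yellow"] * 11 + ["Red"] * 9 + ["Orange"] * 10
            + ["Purple"] * 15 + ["Green"] * 15)
print(modes(skittles))  # ['Green', 'Purple'] -> bi-modal
```

Returning every top-frequency value (rather than a single one) is what lets the same function
report bi-modal or tri-modal results.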

93
Median
The median defines the most representative value as the middle point of a set of data or as that
score such that the same number of data points lie below it as lie above it. You can think of a
median in the context of an interstate highway. The median is that part in the middle of the
highway such that the lanes going in one direction are split off from the lanes going in the other
direction. Roughly half of the highway is split off from the other half of the highway by the
median. On a two-lane road, the white dashed line is the median. The definition of the median,
above, implies that the data can be ordered in some fashion, so that some of the data is identified
as below the median and the same amount of data is identified above the median. This ordering
of the data indicates that we must be considering a variable that is at least measured on an ordinal
scale. This is why the median was not listed as one of the numerical summarization methods, in
the table presented earlier, for categorical data.

If you went out and asked the following question of a sample of people selected in almost any
manner from any population, what would the modal answer be?

“When you think of the most typical or representative response, what do you think it would be?”

When I have asked this question in the manner indicated, by far and away, the modal answer has
been “a response in the middle.” The median is the definitive middle response. Conceptually, the
median is the statistical numerical summarization method that corresponds most closely with
people‟s intuitive understanding of the most representative single response from a set of
responses.

How do we calculate the median from a set of data?

1. Order the data, from the lowest value to the highest value.
2. If n is odd, take the middle value.
3. If n is even, take the average of the two middle values.
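
The three steps above translate directly into code. A minimal Python sketch:

```python
def median(data):
    xs = sorted(data)                     # Step 1: order the data, lowest to highest
    n = len(xs)
    mid = n // 2
    if n % 2 == 1:
        return xs[mid]                    # Step 2: n odd -> the single middle value
    return (xs[mid - 1] + xs[mid]) / 2    # Step 3: n even -> average the two middle values

print(median([2, 4, 6, 7, 6, 7, 3, 4, 6, 5, 4]))           # odd n (11 values): 5
print(median([2, 3, 4, 5, 2, 3, 6, 8, 5, 4, 6, 5, 4, 7]))  # even n (14 values): 4.5
```

The two calls reproduce the two worked examples that follow.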

Examples

On a scale from 0 (never) to 10 (all the time) answer the following question. “When you go out,
how often do you go with other people?” What level of measurement is this scale? Ordinal. Both
of the examples use these data that were collected from this question.

Example 1

These data come from 11 females.

2 4 6 7 6 7 3 4 6 5 4

What is the median of these responses?

94
Step 1: Order the data from the lowest to the highest value

2 3 4 4 4 5 6 6 6 7 7

Since the number of responses is 11 (odd), then use step 2.

The middle value is 5. It is the 6th number in the sequence. Note that there
are an equal number of values below it (2, 3, 4, 4, 4) and as above it (6, 6, 6, 7, 7).

Example 2

These data come from 14 males.

2 3 4 5 2 3 6 8 5 4 6 5 4 7

What is the median of these responses?

Step 1: Order the data from the lowest to the highest value

2 2 3 3 4 4 4 5 5 5 6 6 7 8

Since the number of responses is 14 (even), then use step 3.

The middle two values are 4 (7th number) and 5 (8th number). The average of
the middle two numbers is 4.5 (4+5 divided by 2). Note that there are 7
values below this number (2, 2, 3, 3, 4, 4, 4) and 7 values above this number
(5, 5, 5, 6, 6, 7, 8). Conceptually, you should think of the median going in between
the two middle values; kind of like a 15th value.

The issue being addressed in example 1 and 2 reflects a Form 1 Question. For instance, in
example 1 the question would be, how often do females go out with other people? We could
combine these two examples into one and ask a Form 2 question. This would be, do males or
females go out more often with other people?

95
Mean
Of the three ways of determining the most representative value from a set of data, the mean is
probably the notion most familiar to everyone. The mean defines the most representative value
for a set of data as the average. The average is simple to calculate. It is merely the sum of all
values (in the sample) divided by the number of values.

The equation for the mean was given earlier and repeated here for convenience of presentation.


       n
       Σ Xi
      i=1
X̄ = --------
        n

X̄ = xbar = mean = sum of all the values in the sample divided by the sample size

On a special and important note, the mean of a sample is designated symbolically as X̄ and
the mean of a population is designated symbolically as μ (the Greek letter mu).

Roman letters (the ones we use in our own alphabet, such as "x") are used to designate
statistics that come from samples and Greek letters (such as μ) are used to designate
statistics that come from populations.

Example

Recently, I asked 5 people how many coins they had in their pocket or purse. These five people
produce the following data.

2 4 4 5 16

What is the mean (average) of these 5 numbers?

sum of the values = x1 + x2 + x3 + x4 + x5 = 2 + 4 + 4 + 5 + 16 = 31

number of values = n = 5

mean = sum divided by the number of values = 31 / 5 = 6.2

What does 6.2 mean? That the average person carries 6.2 coins in her/his pocket or purse. Sort
of. Six point two coins is difficult, actually impossible, to carry. So what does 6.2 really mean? If
we collect another sample of people, then the average number of coins that will be in their
pockets and purses should be 6.2. We will pursue the interpretation in greater detail below.

What is the mode and median of the number of coins that we found in pockets and purses?

Mode = 4 (highest frequency of occurrence, it occurs twice)

96
It is especially important to recognize that the mode is not 16 (the highest response), but 4 the
most frequently occurring response. What does the mode indicate? That more people carried 4
coins than any other number of coins.

Median = middle value (since the number of values is odd) = 4

What does the median indicate? That as many people carried fewer than 4 coins as carried more
than four coins. Note: this is still true even though 4 is one of the numbers below the middle
value!

As we have seen in the previous two sections, the interpretations of the mode and median are
very simple and obvious. However, the interpretation of the mean is somewhat more
challenging.

Interestingly enough, the most commonly used concept for the most representative value of a set
of numbers is the most difficult to interpret and least intuitive of the 3 possible notions.

The mean is the balance point for the set of numbers. What is the balance point for a set of
data?

For illustration of this concept, refer to the figure below, which is supposed to represent a teeter-
totter. Each X in the figure represents one value from the set of data and can be thought of as
weighing an equal amount to all other values (say 1 pound). Recall that these data reflect how
many coins are in a person's pocket or purse. WE DO NOT PUT A ONE POUND WEIGHT
ON THE TEETER-TOTTER FOR EACH COIN (there were a total of 31 coins). WE DO PUT
A ONE POUND WEIGHT ON THE TEETER-TOTTER FOR EACH RESPONSE. Since there
are a total of 5 values (responses) in this data set, the total weight of the data set is 5 pounds. The
average is the point where we would place the fulcrum under the teeter-totter such that it would
balance. The balance point is the point where the weight on the left side (left of the fulcrum)
would balance the weight on the right side (right of the fulcrum). This is represented by ^ in the
figure. It should be noted that we have seen this figure before. In the graphing unit it was called a
histogram.

      X
  X   X X                           X
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
          ^

Looking at the data in this figure, which value do you think is the most representative, the mode,
the median, or the mean?

As an aside, but a very important point, the mean is considered to be the very best predictor
of an outcome expected to occur in the future. Hence, if I were to go up to a 6th person and ask
the question, “how many coins do you have in your pocket or purse?”, then the guess I should
make (prediction) would be 6.2 coins. Since 6.2 coins aren't really possible, I would round to the
nearest whole number and guess 6 coins. The reason why the mean is the best predictor will be

97
discussed at various points throughout the remainder of the course. In contrast, the median is
considered by many to be the best descriptor of the sample we have collected. In an ideal world
(situation) this difference becomes a non-issue and is addressed in the next section of this unit.

To say the least however, the use of the mean versus median is a hotly contested and debated
issue. It is the basis for the disagreement expressed in the very first article we considered at the
beginning of this course.

The key issues in this debate are illustrated in the next section on SKEW.

98
Skew
The skewness of data is a way of describing the pattern graphically displayed in a graph such as
the histogram. This pattern is only relevant for ordinal or ratio data. Hence, the question actually
motivating the discussion in this section is whether the median or the mean is the most
representative value.

Skewness is the opposite concept of symmetry. As presented earlier, a symmetric
distribution is one in which the graphical display above the mean looks identical (as a
reversed or mirror image) to the graphical display below the mean; and a skewed
distribution is a graphical display that is not symmetric.

Examples

First Example: The following data were collected from a coffee machine that is supposed to
dispense 10 ounce cups of coffee. We purchased 23 cups of coffee (n = 23) and then measured
the number of ounces of coffee dispensed by this machine into each cup (the data below are in
ounces). Are these data symmetric or skewed?

3 4 5 5 5 6 6 6 6 7 7 7 7 7 8 8 8 8 9 9 9 10 11

Even in this ordered enumeration of the data it is hard to tell if the data are symmetric or skewed,
so let's graph the results.

            X
          X X X
        X X X X X
        X X X X X
    X X X X X X X X  X
1 2 3 4 5 6 7 8 9 10 11
            /\

Is the data set above symmetric? Is the “left” side the mirror image of the “right” side? What
defines the “left” and the “right”? Are we going to use the median or the mean to define the
middle and hence the dividing point of the left from the right?

What is the median and what is the mean of this data?

Median – there are 23 values (odd) in the data set, hence the median is the middle value. The
middle value is the 12th number (11 below and 11 above). Thus, the median is 7.

Mean – the sum of the values is 161. The number of values is 23. The average (mean) is thus
161/23 = 7. This is indicated by the balance point /\ below the scale.

99
In this problem the mean = median. If 7 defines the “middle,” then does the display to the left of
7 look like the mirror image of the display to the right of 7?

The value next to 7 on the left is 6. There are 4 sixes. The value next to 7 on the right is 8. There
are 4 eights. Thus for the numbers next to 7, the left is like the right (frequency of 4). The value
two away from the 7 on the left is 5. There are 3 fives. The value two away from the 7 on the
right is 9. There are 3 nines. Thus for the numbers two away from the seven, the left is like the
right (frequency of 3). The left will be found to be like the right for the numbers 3 away from 7
and for the numbers 4 away from 7. Thus the left, for all values, looks like the mirror image of
the right, for all values. Hence, for this set of data, the values are symmetric. Actually we could
have anticipated this earlier when we found the median and the mean to have the same value. In
symmetric displays, the median = mean (although equal values alone do not guarantee
symmetry, they are a strong hint). Thus, for a symmetric set of data, it
does not matter whether we use the median or the mean as the most representative value, since
they will be the same value. This is the ideal world situation referred to at the end of the previous
section.

Second Example: After seeing the data I collected from the coffee machine in the example
above, my boss suspects that I didn't do a very good job. So he tells me to go and collect another
sample from this machine. The data for my second sample from this coffee machine appears
below (once again presented in ounces). Are these data symmetric or skewed?

1 2 3 4 4 5 5 6 6 6 7 7 7 7 7 8 8 8 8 9 9 9 10

Once again, it is a little hard to tell from the enumeration of the data, so let's graph the results.

            X
            X X
          X X X X
      X X X X X X
X X X X X X X X X X
1 2 3 4 5 6 7 8 9 10
           /\

Is the data set above symmetric? After seeing the previous example, we now know that the place
to start is with the determination of the mean and the median.

Median – once again there are 23 values, so the median corresponds with the 12th value (the
middle value), which is 7.

Mean – the sum of the numbers above is 146. The number of values is 23. Thus the mean for this
data is 146/23 = 6.35.

If the median defines the middle of a set of data, then does the data to the left of the
median look like the mirror image of the data to the right of the median?

If 7 is the median, the value 6 is just to the left. How many sixes are there? Three.

100
The value just to the right of the median is 8. How many eights are there? Four. If you continue
to compare the frequencies of the values moving outward from 7 on the left with the frequencies
of the values moving outward from 7 on the right, then you will see that these frequencies do not
match up as they did in the first coffee machine example. In fact, the left side seems to be more
pulled away (stretched) from the median.

This figure, for our second coffee machine example, can be described in one of two ways. First,
the data is not symmetric, because the left side has been pulled too far off. Or second, the data is
not symmetric, because the right side has not been pulled off enough. In either case, when the
data on the left doesn't look like the data on the right, then the display (distribution) is skewed.
The skew can either be to the left (also called negative) or to the right (also called positive). The
direction of the skew is toward the side where the data appears to be overly pulled away. For this
example, the data was overly pulled off to the left; hence this data set is said to be left skewed.
The term negative skew could also be used and means that in the sense of the real number line,
the data has been overly pulled off in the direction of the negative numbers. The reverse situation
would indicate a right or positive skew.

When the data are symmetric, it doesn't matter whether we use the median or the mean to
indicate the most representative value of a set of data, because they are the same value. However,
when the data are skewed (such as in example 2) the median will never equal the mean. In fact,
while the median is always the middle value for any set of data, the mean moves in the
direction of the skew. When we have a set of data which is left skewed, then we know that the
mean will be less than the median (the mean is to the left of the median; 6.35 = mean and 7.0 =
median for this example). When we have a set of data that is right skewed, then we know that
the mean will be greater than the median (the mean is to the right of the median).
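
This rule can be verified directly on the two coffee-machine samples. A short sketch, reusing
simple mean and median helpers:

```python
def mean(xs):
    return sum(xs) / len(xs)

def median(xs):
    xs = sorted(xs)
    n = len(xs)
    return xs[n // 2] if n % 2 else (xs[n // 2 - 1] + xs[n // 2]) / 2

sample1 = [3, 4, 5, 5, 5, 6, 6, 6, 6, 7, 7, 7, 7, 7,
           8, 8, 8, 8, 9, 9, 9, 10, 11]               # first sample: symmetric
sample2 = [1, 2, 3, 4, 4, 5, 5, 6, 6, 6, 7, 7, 7, 7, 7,
           8, 8, 8, 8, 9, 9, 9, 10]                   # second sample: left skewed

print(mean(sample1), median(sample1))  # equal (7.0 and 7) -> symmetric
print(mean(sample2), median(sample2))  # mean (about 6.35) below median (7) -> left skew
```

In the symmetric sample the two values coincide; in the left-skewed sample the mean is pulled
toward the stretched-out left tail.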

Here then is the controversy: which value (mean or median) is more representative, more
appropriate? As an illustration, consider a set of data that represents the distribution of annual
income. Do you think that this distribution will be right or left skewed, or symmetric? Most
likely this distribution will be extremely right skewed. There are a lot of people who make at or
near minimum wage and there are only a few people who make in the millions of dollars per
year. (There are a lot of workers and few CEOs.) These data will be very right skewed toward
the higher salaries. What does this indicate to us in the context of the mean and median?
Remember that the mean is more influenced by the skew than is the median. Hence, the mean of
these annual income data will be much larger than the median. As a consequence, those people
wanting to say that times are good will use the mean as the most representative value (higher
value/salary), and those people wanting to say that the times are bad (or simply not as good as
the other people would have us believe) will use the median as the most representative value
(lower value/salary). Both the median and the mean have something important to say about any
set of data, it is just that they have something different to say when the data happens to be
skewed. This is our first real introduction to the notion that statistics can be used to present two
very different stories about the same data. The question is which story is more appropriate.
Recall Disraeli's conclusion about statistics, “there are lies, damned lies, and statistics.” While
this probably makes considerable sense by this time, it really is a comment about users
(presenters) of statistics rather than the statistics themselves.

101
Percentiles and Quartiles
Another way of looking at data in a numerical form is to convert the original data into
percentiles. A percentile is 1/100th of the data set. In essence a percentile is the breaking up of a
distribution into 100 “equal” parts. A percentile is the percentage of the distribution at this
identified value or smaller. For instance, the score associated with the 30th percentile is that score
such that 30% of the people in the sample scored this value or smaller. The 50th percentile is that
value of the data set such that 50% of the data set is that value or smaller. Another name for the
50th percentile is the median.

What is the 0th percentile? What value of the data set is such that no value is smaller than it? The
smallest value (the minimum value).

What is the 100th percentile? What value of the data set is such that no value is larger than it?
The largest value (the maximum value).

Breaking up the entire data set into percentiles would not reflect much of a simplification in the
presentation of most data sets. However, 5 of the percentile values, called the quartiles, are
particularly useful in creating one of the more popular scientific graphing methods today. This is
called the box plot and is presented in the next section of this unit.

The quartiles are a quartering of the data set and hence are the 0th percentile (the minimum), the
25th percentile (called Q1), the 50th percentile (called Q2 or the median), the 75th percentile
(called Q3), and the 100th percentile (called Q4, the maximum).

While in statistics we speak of percentiles it is common outside of statistics to refer to the same
concept in terms of percentages.

102
Quartile Example
This example is the first example presented in the Median section. On a scale from 0 (never)
to 10 (all the time) answer the following question. “When you go out, how often do you go with
other people?” There were 11 females who answered this question and produced the following
scores. Calculate the 4 quartile values for these data.

2 4 6 7 6 7 3 4 6 5 4

Step 1: Order the data from smallest to largest. Since the sample size is odd (n = 11), the
median (Q2) is the middle ranked value (the 6th), which is 5.

2 3 4 4 4 5 6 6 6 7 7

Step 2: The first quartile (Q1) is the median of the values below the median. There are 5 values
(an odd number) below the median: 2, 3, 4, 4, 4. Thus, the median of these lower 5 numbers is the
middle ranked value, 4 (the 3rd ranked value of the 5 numbers).

Step 3: The third quartile (Q3) is the median of the values above the median. There are 5 values
(an odd number) above the median: 6, 6, 6, 7, 7. Thus, the median of these upper 5 numbers is the
middle ranked value, 6 (the 3rd ranked value of the 5 numbers).

Step 4: The lowest ranked value out of the 11 is the minimum value, which is 2.

Step 5: The highest ranked value out of the 11 (Q4) is the maximum value, which is 7.
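The median-of-halves procedure above can be sketched in Python (the language and function names are my choices for illustration; note that spreadsheets and statistics packages may use slightly different quartile rules):

```python
def median(values):
    """Median of a list: middle value if n is odd, average of the
    two middle values if n is even."""
    s = sorted(values)
    n = len(s)
    if n % 2 == 1:
        return s[n // 2]
    return (s[n // 2 - 1] + s[n // 2]) / 2

def quartiles(values):
    """Return (minimum, Q1, Q2, Q3, maximum) using the method in the
    text: Q1 and Q3 are the medians of the values below and above the
    overall median."""
    s = sorted(values)
    n = len(s)
    q2 = median(s)
    if n % 2 == 1:                       # odd n: exclude the median itself
        lower, upper = s[:n // 2], s[n // 2 + 1:]
    else:                                # even n: split the data in half
        lower, upper = s[:n // 2], s[n // 2:]
    return s[0], median(lower), q2, median(upper), s[-1]

scores = [2, 4, 6, 7, 6, 7, 3, 4, 6, 5, 4]   # the 11 responses above
print(quartiles(scores))                      # (2, 4, 5, 6, 7)
```

The function reproduces the worked example: minimum 2, Q1 = 4, median 5, Q3 = 6, maximum 7.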

Quiz
Here is a set of data to use as a quiz for calculating the mode, median, mean, and the quartiles.
You might find it useful to plot the data.

Quiz Data Set

4 4 5 5 5 6 6 6 7 7 7 7 7 8 8 8 8 8 9 9 9 10 10 10 11 11

103
Plot

         X  X
         X  X
   X  X  X  X  X  X
X  X  X  X  X  X  X  X
X  X  X  X  X  X  X  X
4  5  6  7  8  9  10 11

Mode – there are two modes, 7 and 8, both with frequency = 5. Thus this set of data is bi-modal.

Median – since there are an even number of values (26), then the median is the average of the
two middle values. These two values are 7 and 8. The average of 7 and 8 is 7.5 = median. This
means that 13 values are below the median (an odd number) and 13 values are above the median
(an odd number).

Mean – the sum of the 26 numbers is 195. The average is therefore 195/26 = 7.5. You should
have been able to guess this because the display is symmetric; hence the mean and the median
should have the same value and do.

The five summary values are the 0th percentile (the minimum value, which is 4), the 25th
percentile (Q1; the median of the 13 scores in the first half of the data set, the 7th ranked value,
which is 6), the 50th percentile (Q2; the median for the entire data set, which is 7.5), the 75th
percentile (Q3; the median of the 13 scores in the second half of the data set, the 7th ranked value
of these 13, which is 9), and the 100th percentile (the maximum value, or Q4, which is 11).
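The answers above can be double-checked with Python's built-in statistics module (the module choice is mine for illustration; statistics.multimode requires Python 3.8 or later):

```python
import statistics

# The quiz data set (26 values)
data = [4, 4, 5, 5, 5, 6, 6, 6, 7, 7, 7, 7, 7,
        8, 8, 8, 8, 8, 9, 9, 9, 10, 10, 10, 11, 11]

print(statistics.multimode(data))   # both modes of this bi-modal set
print(statistics.median(data))      # average of the two middle values: 7.5
print(statistics.mean(data))        # 195 / 26 = 7.5
```

As the text notes, the display is symmetric, so the mean and median agree.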

104
Unit 7: Numerical Summarization: Deviation
Terms
Deviation – how far any score is from the mean; equivalently, how different or how distant any
score is from the mean

Standard Deviation – in concept the average deviation; also the square root of the variance

Variance – the square of the standard deviation

Range – the maximum value in the data set minus the minimum value in the data set

Equations
The first 8 lines are borrowed from the previous unit.

X = the symbolic name for a variable

Xi = the score for a specific person on the variable x

n = the sample size

X1 = the score on the variable for the 1st person in the sample

X2 = the score on the variable for the 2nd person in the sample

etc.

Xn = the score on the variable for the nth (last) person in the sample


Σ(i = 1 to n) Xi = X1 + X2 + X3 + … + Xn        is the sum of the scores for all the people in the sample

Mean = X̄ = [Σ(i = 1 to n) Xi] / n = (X1 + X2 + X3 + … + Xn) / n        This is the equation for the mean.

Deviation = any score in the sample minus the mean = (Xi − X̄)

Squared deviation = the deviation score squared = (Xi − X̄)²

105

Sum of the squared deviations = the sum of all the squared deviations = Σ(i = 1 to n) (Xi − X̄)²

Variance = the sum of all the squared deviations divided by (the sample size − 1)

Variance = s² = (Sum of Squared Deviations) / (n − 1)

s² = [Σ(i = 1 to n) (Xi − X̄)²] / (n − 1) = [(X1 − X̄)² + (X2 − X̄)² + … + (Xn − X̄)²] / (n − 1)

As in Unit 6 (Roman letters are used to designate sample statistics and Greek letters are used to
designate population parameters), it should be noted that the variance of a sample is designated
symbolically as s², and the variance of a population is designated symbolically as σ².

Standard deviation = square root of the variance

Standard deviation = s = √s²

And of course the "s" is the sample standard deviation and "σ" would be the population standard
deviation.

In Unit 6 we saw how to calculate and interpret several numerical descriptors for the most
representative value of a set of data. While there is much we can do from our knowledge of these
basic descriptors, we are fairly limited in the variety of applications we can consider. The
descriptors to be presented in this unit will greatly increase our ability to consider more problems
and to consider them in much greater depth from a statistical perspective. They all focus on the
fundamental concept of deviation.

One of the major drawbacks of graphical summarizations is that they take up a lot of space and
are particularly beneficial only in simple problems. However, a picture is worth a
thousand words (probably more), and hence graphs are hard to beat for the information they
present. One of the primary goals of numerical summarization is to convey the essence of what
the picture would look like if it had been drawn.

In the previous unit, you were introduced to two numerical summarization
concepts: the most representative value and skewness. The most representative value
gives us some sense of where the middle of the data is located. The skew indicates the
general form of the display: pulled off from the middle more to one side than the other, or evenly
balanced. The quiz below illustrates how useful these concepts are in communicating pictorial
images of data sets.

106
Quiz 1
In the quiz to follow, a description of a set of data (mean, median, and skewness) will be given.
Given the description, try to imagine what the graphical presentation of the data might
look like. Once you have established a mental image, look at the 2 choices. Two different
graphs are provided. Which of the two graphs is most like your mental image? After you have
made this decision look at the answer and discussion. Each quiz question presents a set of data
obtained in response to the question, "how many books are you currently carrying in your
back pack?"

Question 1. Mean = 5, Median = 5, and the data is symmetric.

Now form the mental image.

Choices

Data Set A

               X
            X  X  X
      X  X  X  X  X  X  X
0  1  2  3  4  5  6  7  8  9  10

Data Set B

               X
            X  X  X
      X  X  X  X  X  X  X
10 11 12 13 14 15 16 17 18 19 20

Data Set A. The mean of Data Set A is 5. Although Data Set B looks just like Data Set A, its
mean is obviously 15 and hence can not be correct. What does a mean of 5 indicate for this
problem? That the average person in the data set is carrying 5 books. If we were to ask one more
person our target question, then we would predict that this person would be carrying 5 books (the
mean).

107
Question 2. Mean = 5.5, Median = 5, and the data is right skewed.

Now form the mental image.

Choices

[Two dot plots on the 0-to-10 scale. Data Set A is pulled out toward the right (a longer upper
tail); Data Set B is its mirror image, pulled out toward the left.]

Data Set A. The skew is in the direction that the data set is seemingly pulled off from the
"center." The data set appears to be pulled off to the right. If this is true, then the mean (which is
pulled off most by the skew) should be larger than the median. This is the case in the second
problem; hence Data Set A is correct.

108
Question 3. Mean = 5, Median = 5, and the data is symmetric.

Now form the mental image.

Choices

Data Set A

X                             X
X                             X
X                             X
X                             X
X              X              X
0  1  2  3  4  5  6  7  8  9  10

Data Set B

X  X  X  X  X  X  X  X  X  X  X
0  1  2  3  4  5  6  7  8  9  10

Both Data Set A and Data Set B fit the description perfectly.

109
The first two questions of the quiz illustrate that knowing the mean, median, and skewness of a
set of data is very helpful in forming a first impression of that data set. However, the
third problem shows that these three concepts are not enough. Another very helpful numerical
index is called variability. Variability can be simply defined as how our measurements differ
from one another. Recall that if our measurements are all the same, then the data set is constant.
In this instance, there would be no variability (the measurements do not differ from one another).

Variability
If variability is how our measurements differ from one another, then how might we approach the
measurement of this concept?

Method 1: we could assess specifically how each score differs from every other score.

Method 2: we could establish a reference point and assess how each score differs from this
reference.

Both methods have their relative merits, and even though method 1 might seem to be a more
logical application of the definition of variability, method 2 is the method of choice. This is
especially true when the reference point is in the center of the data, such as the mean or the
median. The concept behind method 2 is called deviation.

Deviation is defined as the distance a data point is from the "center" of the data set. This
sounds simple and pretty much is. But how are we going to define center? In the previous unit on
the most representative value we saw that there are two different methods (excluding the mode)
whereby the center (most representative value) is conceptualized, the mean and the median.
Thus, we could define deviation as

Deviation (median) = score for a particular data point – median

or

Deviation (mean) = score for a particular data point – mean

If the data set is symmetric, then both calculations of the deviation are identical. If the data is
skewed, then the calculations will produce different answers. Which would be better? Logically,
deviation (median) would probably be better since the median is always the exact middle of any
set of data. However, most advanced statistical theory is based on using deviation (mean). This is
for a great many reasons, some of which are even good; however most of these reasons are far
removed from the intent and scope of this class. Hence, even though it might seem less logical,
we will use deviation (mean), just called deviation from here out.

110
Deviation Example
It was the third question in the earlier quiz that initiated this discussion on variability and has led
us to the deviation. Applying the deviation to these two data sets (A and B) generates the
following results. For the discussion below, let

di = the deviation associated with the ith person in the sample (data set).

= the score of the ith person in the sample – the sample mean

Hence, d4 is the deviation associated with the 4th person in the sample

d4 = the score for the 4th person in the sample – the sample mean

Quiz Question 3: Data Set A [recall that the description for this data set had a mean of 5]

d1 = 0-5 = -5           d7 = 10-5 = 5
d2 = 0-5 = -5           d8 = 10-5 = 5
d3 = 0-5 = -5           d9 = 10-5 = 5
d4 = 0-5 = -5           d10 = 10-5 = 5
d5 = 0-5 = -5           d11 = 10-5 = 5
d6 = 5-5 = 0

Quiz Question 3: Data Set B [recall that the description for this data set had a mean of 5]

d1 = 0-5 = -5           d7 = 6-5 = 1
d2 = 1-5 = -4           d8 = 7-5 = 2
d3 = 2-5 = -3           d9 = 8-5 = 3
d4 = 3-5 = -2           d10 = 9-5 = 4
d5 = 4-5 = -1           d11 = 10-5 = 5
d6 = 5-5 = 0

In Data Set B, what does a deviation of d1 = -5 indicate? This indicates that the first person in
this sample is carrying 5 books fewer than the average person. What does a deviation of d6 = 0
indicate? This indicates that the 6th person is carrying exactly the same number of books as the
average person. What does a deviation of d9 = 3 indicate? This indicates that the 9th person is
carrying 3 books more than the average person.

The first thought that comes to most people, once they have been introduced to the notion of the
deviation and consider how to measure variability, is to propose calculating the average
deviation. The concept is great in theory, but in practice it falls short. What is the average
deviation for these data sets? If we add up the deviations for Data Set A, they sum to 0. Thus, the
average deviation (0/11) for Data Set A is 0. The average deviation for Data Set B also turns out
to be 0. In fact, the average deviation for any and all data sets is 0. To get around this particular
problem mathematicians and statisticians have used a very simple "trick."
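The claim that the deviations always sum to 0 is easy to verify numerically. A minimal sketch in Python (my choice of language for illustration), using the two data sets from Quiz Question 3:

```python
data_a = [0, 0, 0, 0, 0, 5, 10, 10, 10, 10, 10]   # Data Set A
data_b = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]       # Data Set B

for data in (data_a, data_b):
    mean = sum(data) / len(data)
    deviations = [x - mean for x in data]
    # The sum of deviations about the mean is 0 (up to rounding error)
    # for ANY data set, so the "average deviation" is always 0.
    print(sum(deviations))
```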

111
The Simple Trick known as the Standard Deviation
The problem in calculating the average deviation is that the sum of the deviations is always 0.
This really shouldn't be too surprising since the mean is the balance point of the data set. In
terms of weights, the balance point is the place where the data to the left of the mean balances
the data to the right of the mean. This is seen in the deviations as the negative deviations
perfectly canceling out the positive deviations. How do we somehow remove the signs of the
negative deviations and yet still consider their magnitude in the determination of the variability?
Although what follows in this section may at first seem complicated, it will be illustrated in
detail with several examples. The trick that has historically been used is to

First, calculate all of the deviations:  (score − mean) = (Xi − X̄)

Second, square all the deviations:  (score − mean)² = (Xi − X̄)²

Third, sum all the squared deviations:  Σ(i = 1 to n) (Xi − X̄)²

Fourth, divide the sum of the squared deviations by (n − 1). [At this step, you now have what is
known as the variance; this is symbolically represented as "s²".]

s² = [Σ(i = 1 to n) (Xi − X̄)²] / (n − 1)

Fifth, take the square root of the variance. [At this step, you now have what is known as the
standard deviation, symbolically represented as "sd" or simply as "s".]

s = square root of [the sum of the squared deviations divided by (n − 1)]

The equation that represents all five of these steps is

s = √( [Σ(i = 1 to n) (Xi − X̄)²] / (n − 1) )

To complete the illustration we started with in this section, I will calculate the standard deviation
for both Data Set A and Data Set B of Quiz Question 3.

Each of the measures presented so far (the deviation, the standard deviation, and the variance)
describes in some fashion how spread out the data is from the mean. The last
indicator of "spread" is called the range. The range is typically defined in one of two fashions:
first, as the span from the minimum value to the maximum value, and second, as the maximum
value minus the minimum value. You are probably almost equally likely to see the first definition

112
as the second presented in the media. Since the range is a vastly inferior method of representing the
spread when compared to any of the other methods presented in this unit, it will not be used very
much, and is presented primarily for your information. However, in the context of this class the
range will be defined by our second definition: range = maximum value − minimum value.

Look again at Data Set A for Quiz Question 3

Deviations

Scores         Scores – Mean          Squared Deviations

0 (min.)       0 – 5 = -5            (-5)(-5) = (-5)2 = 25

0              0 – 5 = -5            (-5)(-5) = (-5)2 = 25

0              0 – 5 = -5            (-5)(-5) = (-5)2 = 25

0              0 – 5 = -5            (-5)(-5) = (-5)2 = 25

0              0 – 5 = -5            (-5)(-5) = (-5)2 = 25

5              5–5=0                  (0)(0)    = 02 = 0

10             10 – 5 = 5             (5)(5)    = 52 = 25

10             10 – 5 = 5             (5)(5)    = 52 = 25

10             10 – 5 = 5             (5)(5)    = 52 = 25

10             10 – 5 = 5             (5)(5)    = 52 = 25

10 (max.)      10 – 5 = 5             (5)(5)     = 52 = 25

SUM = 0                          SUM = 250

Sum of the squared deviations is 250

The sum of the squared deviations divided by (n − 1) is 250/10 = 25

Thus, the variance is 25 and the standard deviation is √25 = 5

The range is the maximum value - minimum value = 10 - 0 = 10

113
Look again at Data Set B for Quiz Question 3

Deviations

Scores          Scores – Mean          Squared Deviations

0 (min.)       0 – 5 = -5             (-5)(-5) = (-5)2 = 25

1              1 – 5 = -4             (-4)(-4) = (-4)2 = 16

2              2 – 5 = -3             (-3)(-3) = (-3)2 = 9

3              3 – 5 = -2             (-2)(-2) = (-2)2 = 4

4              4 – 5 = -1             (-1)(-1) = (-1)2 = 1

5              5–5=0                  (0)(0)    = 02 = 0

6              6–5=1                  (1)(1)    = 12 = 1

7              7–5=2                  (2)(2)    = 22 = 4

8              8–5=3                  (3)(3)    = 32 = 9

9              9–5=4                  (4)(4)    = 42 = 16

10 (max.)       10 – 5 = 5            (5)(5)    = 52 = 25

SUM = 0                         SUM = 110

Sum of the squared deviations is 110

The sum of the squared deviations divided by (n − 1) is 110/10 = 11

Thus, the variance is 11

The standard deviation is √11 = 3.317

The range is the maximum value - minimum value = 10 - 0 = 10

What do we note from these results? The farther the data points are from the mean (Data Set A),
the higher the standard deviation will be, and the closer the data points are to the mean
(Data Set B), the smaller the standard deviation will be. Thus, by including the standard deviation
among our numerical hints in the quiz, it would have been possible to differentiate these two
data sets from one another, which wasn't possible at the beginning of this unit.

114
Going back to the last unit, it should be relatively clear that the standard deviation requires a
ratio or at least "ratio" level of measurement, since the mean is required in its calculation. And of
course, since the standard deviation requires a ratio or at least "ratio" level of measurement, the
variance must require this level of measurement as well.

Quiz 2
Let's return to the first example in the skew section of the previous unit, which is reproduced
below.

The following data were collected from a coffee machine that is supposed to dispense 10 ounce
cups of coffee. We purchased 23 cups of coffee (n = 23) and then measured the number of
ounces of coffee dispensed by this machine into each cup (the data below are in ounces).

3 4 5 5 5 6 6 6 6 7 7 7 7 7 8 8 8 8 9 9 9 10 11

Calculate

1. The deviations and squared deviations for these data.

2. The variance for these data.

3. The standard deviation for these data.

4. The range for these data.

115
Recall that the mean for these data was 7.

Deviations
Scores          Scores – Mean          Squared Deviations

3              (3-7) = -4             (-4)(-4) = (-4)2 = 16
4              (4-7) = -3             (-3)(-3) = (-3)2 = 9
5              (5-7) = -2             (-2)(-2) = (-2)2 = 4
5              (5-7) = -2             (-2)(-2) = (-2)2 = 4
5              (5-7) = -2             (-2)(-2) = (-2)2 = 4
6              (6-7) = -1             (-1)(-1) = (-1)2 = 1
6              (6-7) = -1             (-1)(-1) = (-1)2 = 1
6              (6-7) = -1             (-1)(-1) = (-1)2 = 1
6              (6-7) = -1             (-1)(-1) = (-1)2 = 1
7              (7-7) = 0               (0)(0) = (0)2 = 0
7              (7-7) = 0               (0)(0) = (0)2 = 0
7              (7-7) = 0               (0)(0) = (0)2 = 0
7              (7-7) = 0               (0)(0) = (0)2 = 0
7              (7-7) = 0               (0)(0) = (0)2 = 0
8              (8-7) = 1               (1)(1) = (1)2 = 1
8              (8-7) = 1               (1)(1) = (1)2 = 1
8              (8-7) = 1               (1)(1) = (1)2 = 1
8              (8-7) = 1               (1)(1) = (1)2 = 1
9              (9-7) = 2               (2)(2) = (2)2 = 4
9              (9-7) = 2               (2)(2) = (2)2 = 4
9              (9-7) = 2               (2)(2) = (2)2 = 4
10             (10-7) = 3             (3)(3) = (3)2 = 9
11             (11-7) = 4             (4)(4) = (4)2 = 16
SUM = 0                    SUM = 82

Sum of the squared deviations is 82

The sample size was 23, thus the sample size - 1 = 22

The sum of the squared deviations divided by (n − 1) is 82/22 = 3.727

Thus, the variance is 3.727 and the standard deviation is √3.727 = 1.931

The range is the maximum value - minimum value = 11 - 3 = 8
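The quiz answers above can be reproduced in a few lines of Python (illustration only; the variable names are mine):

```python
import math

# Coffee machine data in ounces (n = 23, mean = 7 from the previous unit)
coffee = [3, 4, 5, 5, 5, 6, 6, 6, 6, 7, 7, 7, 7, 7,
          8, 8, 8, 8, 9, 9, 9, 10, 11]

n = len(coffee)
mean = sum(coffee) / n
ss = sum((x - mean) ** 2 for x in coffee)   # sum of squared deviations
variance = ss / (n - 1)                     # divide by (n - 1), not n
sd = math.sqrt(variance)
data_range = max(coffee) - min(coffee)
print(ss, round(variance, 3), round(sd, 3), data_range)
```

This matches the worked answer: sum of squared deviations 82, variance 3.727, standard deviation 1.931, range 8.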

116
Unit 8: Probability - Binomial
Historically, the probability unit in any introductory course is the most problematic. Twenty-five
years ago the introductory course included several weeks on probability and dealt with such
problems as selecting particular combinations of cards from a deck of cards or drawing red and
black balls from a bag containing some known number of each. These problems were complex and
had very little relevance to the course objectives. Although this course and nearly every other
introductory course offered anywhere will contain a unit on probability even today, the topic is
much shorter and much more relevant; yet it is still one of the most challenging of the entire
course. The truth is that probability is one of the essential elements required to make the
transition from the more descriptive previous units to the more inferential units that follow. So
here goes.

Terms
Probability (pr) is the likelihood of something happening in the future. pr is the abbreviation of
the word probability.

Relative frequency is the likelihood of something happening in our sample.

A Bernoulli variable is one in which there are two and only two possible outcomes.

A Binomial variable is one in which a Bernoulli variable is conducted n independent times.

Fair is a way of describing a variable in which all possible individual outcomes are equally
likely.

Rule 1 - Any probability must be greater than or equal to 0

Rule 2 - Any probability must be less than or equal to 1

Rule 3 - The sum of the probabilities for all response possibilities (outcomes) must be 1.0

As in the Terms above, the definitions of probability and relative frequency are very similar. The
difference between the two is reminiscent of the distinction between samples and populations,
and the distinction between statistics and parameters. As we saw in Unit 3, we use statistics from
samples to make predictions about parameters from populations. In a completely analogous
fashion we are going to use relative frequencies to make predictions about probabilities.

117
Bernoulli
Let the letter Y symbolically represent a Bernoulli variable. In a Bernoulli variable there are two
possible individual outcomes. For simplicity let's designate these two possible outcomes
numerically as 0 and as 1.

As an illustration, the single flip of a coin is a Bernoulli variable. Here, the outcome of the coin
flip = Y. The outcome of the flip can be either a head or a tail. We will ignore the possibility of
the coin landing and staying on its edge. Thus, we could designate the outcome head by the
number 0 and the outcome tail by the number 1.

Let the probability of outcome 1 (a tail will be flipped) be written as

(*)        pr(outcome of the coin flip is tail) = pr(Y = 1) = p

and the probability of outcome 0 (a head will be flipped) be written as

pr(outcome of coin flip is head) = pr(Y = 0) = q = 1 – p

Please note; the letters "p" and "q" represent probabilities. "p" is the probability that a tail will be
the result of one flip of the coin and "q" is the probability that a head will be the result of one flip
of the coin.

If we repeat this Bernoulli variable "n" times, then the outcomes can be designated as

Y1, Y2, Y3, …., Yn

What do these letters mean? Y1 = the result from the first flip of the coin, Y2 = the result from
the second flip of the coin, Y3 = the result from the third flip of the coin, etc.

Example: If we flip the coin 5 times (n = 5), then we will have 5 separate outcomes; one from
each flip. Assume that in these 5 flips we obtained a head on the first flip, a head on the second, a
tail on the third, a head on the fourth, and a tail on the fifth. Symbolically we have, Y1 as the
outcome from the first flip of the coin, Y2 as the outcome from the second flip, Y3 as the
outcome from the third flip, Y4 as the outcome from the fourth flip, and Y5 as the outcome from
the fifth flip. The result of these 5 flips can be expressed in three different ways: the actual result,
the symbolic result, and the numeric result (recall from above that a head is represented
numerically as a 0 and a tail as a 1).

                 1st Flip   2nd Flip   3rd Flip   4th Flip   5th Flip
Actual Result    head       head       tail       head       tail
Symbolic Result  Y1         Y2         Y3         Y4         Y5
Numeric Result   0          0          1          0          1

A Binomial variable is the combined result from n Bernoulli variables, such that the Binomial
variable (symbolically designated by the letter X) is the sum of the n Bernoulli variables.

118
X = Y1 + Y2 + Y3 + …. + Yn

Note that this looks a lot like the equation for the mean, only we haven't divided by n.
The equation to calculate the Binomial probability of an outcome is overly complicated,
unnecessary, and will not be used in this course.
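The Bernoulli-to-Binomial relationship can be made concrete with a tiny simulation. A sketch in Python, keeping the text's coding of head = 0 and tail = 1 (the language and the seed are arbitrary choices of mine for illustration):

```python
import random

random.seed(1)   # fixed seed so the illustration is reproducible

n = 4      # number of independent Bernoulli repetitions (coin flips)
p = 0.5    # pr(Y = 1), i.e. pr(tail), for a fair coin

# One Binomial observation: flip the coin n times, record each Y as
# 0 (head) or 1 (tail), and sum the results.
flips = [1 if random.random() < p else 0 for _ in range(n)]
X = sum(flips)          # X = Y1 + Y2 + ... + Yn
print(flips, X)         # X counts how many tails occurred in the n flips
```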

Binomial
To begin this section I will develop most of the necessary mechanics of probability through the
following example.

Coin Flip Example: I am going to conduct an experiment in which I will flip a “fair” coin 4
times. As a reminder, the flip of each coin is a Bernoulli variable since there are only two
possible outcomes (head or tail). Symbolically these four outcomes can be represented by Y1, Y2,
Y3, and Y4.

Question 1: If I flip a coin once, then how many different outcomes are possible?

Answer 1: there are 2 possible outcomes; the first outcome could be a head (Y1 = 0) and the
second outcome could be a tail (Y1 = 1).

Question 2: If I flip a coin once, what type of variable have I obtained information about?

Answer 2: Bernoulli; a Bernoulli variable is one in which there are only two possible outcomes.

Question 3: What is the probability of the outcome of a single flip of the coin being a tail?

Answer 3: From (*) on the previous page, the probability of this outcome is “p”.

Question 4: In the scenario above for the Coin Flip Example, what does the word "fair" mean?

Digression - Fair
The point of this digression is not to illustrate some simple probability situations, but to present
some very fundamental and extremely important points about probability.

Rule 1: How small can any probability be? Is it possible for anything to occur less often than
never? Of course not. What is the number associated with the concept of never? Zero. If an
outcome has a zero probability of occurring, then it will never be possible for this outcome to
occur. For instance, is it possible to flip a tail with a two headed coin? No, this would be
impossible. Here pr(flip of a coin is a tail) = 0. Zero is the smallest any probability can be; this is
Rule 1.

119
Rule 2: How large can any probability be? Is it possible for anything to occur more often than
always? Of course not. What is the number associated with the concept of always? One. If a coin
has 2 heads (two headed), then how often will a head be the outcome of flipping this coin?
Always. Here pr(flip of a coin is a head) = 1. One is the largest any probability can be; this is
Rule 2.

Rule 3: If there are only two possible outcomes, then what is the probability that one or the other
will occur? For the coin illustration the question is, “if we flip a coin, what is the probability of
getting a head or a tail?” Well we know for certain that the result must be either a head or a tail,
thus we know with absolute certainty (probability equal to one) that the result of our flip will be a
head or a tail. This is not a very interesting question, but it does tell us that the sum of the
probabilities of all possible outcomes is always equal to one; this is Rule 3. For this coin flip
problem this means that pr(flip a head) + pr(flip a tail) = 1.00

Definition: A Fair situation is one in which each possible outcome has exactly the same
probability of occurring.

First example: Bernoulli.

Flip of a single coin results in a head (Y=0)

Flip of a single coin results in a tail (Y=1)

pr(Y=1) = p

pr(Y=0) = q

But if the coin is "fair," then by the definition above each possible outcome (head and tail) must
have exactly the same probability of occurring, which can be stated as

pr(outcome of coin flip is head) = pr(outcome of coin flip is tail)

or symbolically as

pr(Y = 0) = pr(Y = 1)

or as

p=q.

From Rule 3: THE SUM OF THE PROBABILITIES OF ALL POSSIBLE OUTCOMES IS ALWAYS 1.

Thus    p+q = 1

and by subtracting p from each side, we have       q=1–p

120
If the coin is fair, then

p=q           (definition of fair)

and we now have

p+q=1          (from above)
and
p+p=1
Thus
2p = 1
and
p =½

Therefore

q = 1 – p = 1 – ½ = ½

Check p + q must equal 1 and here ½ + ½ does equal 1.

Also notice that p = ½ and q = ½, thus satisfying the condition imposed by “fair.”

Example 2: Roll of a "fair" die.

Let Y = result of one roll of a die. How many possible outcomes are there? We could roll a 1, 2,
3, 4, 5, or 6. Thus, there are 6 possible outcomes. They are

Roll of a single die results in a 1 (Y=1)

Roll of a single die results in a 2 (Y=2)

Roll of a single die results in a 3 (Y=3)

Roll of a single die results in a 4 (Y=4)

Roll of a single die results in a 5 (Y=5)

Roll of a single die results in a 6 (Y=6)

This is obviously not a Bernoulli variable. Why? Because there are 6 possible outcomes, not two.
What are the probabilities associated with each outcome?

121
Let,

pr(Y=1) = p1

pr(Y=2) = p2

pr(Y=3) = p3

pr(Y=4) = p4

pr(Y=5) = p5

pr(Y=6) = p6

From Rule 3: THE SUM OF THE PROBABILITIES OF ALL POSSIBLE OUTCOMES IS ALWAYS 1.

Thus,

p1 + p2 + p3 + p4 + p5 + p6 = 1

Recall if the variable is fair, then all possible outcomes must have the same probability of
occurring. This means that

p1 = p2 = p3 = p4 = p5 = p6

If any situation is fair, then the probability of each outcome will equal

1 / (number of possible individual outcomes)

In this case, the number of possible outcomes is 6, thus the probability of each individual outcome
must be

1/6

and

p1 = p2 = p3 = p4 = p5 = p6 = 1/6 .

As a check,

p1 + p2 + p3 + p4 + p5 + p6 = 1/6 + 1/6 + 1/6 + 1/6 + 1/6 + 1/6 = 1
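The connection between probability and relative frequency can be seen by simulating many rolls of a fair die: the relative frequency of each face settles near 1/6. A sketch in Python (the number of rolls and the seed are arbitrary choices of mine):

```python
import random

random.seed(2)

# Simulate many rolls of a fair six-sided die
rolls = [random.randint(1, 6) for _ in range(60_000)]

for face in range(1, 7):
    rel_freq = rolls.count(face) / len(rolls)
    # Each relative frequency should land near 1/6 = 0.1667
    print(face, round(rel_freq, 3))
```

This is exactly the plan of the later units: use relative frequencies from samples to make predictions about probabilities.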

End of the "Fair" Digression

122

Answer 4: “Fair” means that all possible outcomes have exactly the same probability of occurring. In the
Coin Flip Example, since there are only two possible outcomes, each outcome is equally
likely; hence ½. Thus at this point we now know that p = ½ if our coin is "fair."

Question 5: As presented in the Coin Flip Example scenario above, what type of variable have I obtained information about?

Answer 5: Binomial; a Binomial variable is one in which a Bernoulli variable is repeated "n"
independent times. In this case we have repeated the Bernoulli variable (the flip of a single coin
with only two possible outcomes) 4 times (n=4).

Question 6: What does independent mean?

Answer 6: Independent means that each repetition of the Bernoulli variable does not affect or
influence any other of the Bernoulli variables. In this example, it means that flipping the coin the
first time does not affect the outcome of the second flip of the coin. In other words, the
probability of flipping a head on the first flip of the coin remains the same on the second flip,
remains the same on the third flip, etc.

Question 7: Independence is a pretty reasonable assumption, but when might it not be?

Answer 7: There are many possible answers here. For instance, if the coin were made out of soft
lead, then the flipping of the coin the first time might easily result in the shape of the coin
changing. If its shape changes, then would it be reasonable to assume that the probability of
getting a head or tail might also change? YES ! IF this were the case, then the flipping of such a
coin would NOT result in a Binomial variable.

Question 8: If we flip a “fair” coin 4 times, then how many possible individual outcomes are
there?

Answer 8: It is possible to get a head or a tail on the first flip, a head or a tail on the second flip,
a head or a tail on the third flip, and a head or a tail on the fourth flip. Thus, how many
individual outcomes are there? The simple way to calculate the general answer to this question is
by multiplying the number of outcomes from each repetition. For example, if we collect the
result from any variable 3 times, then the total number of outcomes is equal to the number of
possible outcomes (the first time) times the number of possible outcomes (the second time) times
the number of possible outcomes (the third time).

In this situation we have repeated the Bernoulli variable four times and there are two possible
outcomes from each Bernoulli variable. Thus, the number of individual outcomes should be 2
(the first time collected) times 2 (the second time collected) times 2 (the third time collected)
times 2 (the fourth time collected) or 16.

123
Question 9: What is each of the 16 possible individual outcomes?

Answer 9: This answer is easiest to derive if we create a column for every variable (4 in this
case) and then write down the possible outcomes in some logical manner. If we use an H for a
Head and a T for a tail, then the logical manner is the following. Start in the fourth column and
list the outcome possibilities alternating between the two. In this case, H then T, then H, then T,
etc. Go to the third column. List the outcomes once again alternating between the possibilities, H
then T. However, H must be listed twice to cover each of the possibilities presented in the fourth
column. Then the T must be listed twice, then the H twice again, etc. Go to the second column.
List the outcomes once again alternating between the possibilities, H then T. Repeat H enough to
cover one cycle of H and T in the third column, then repeat T enough to cover one cycle of H
and T in the third column. Continue. Finally go to the first column. List the outcomes once again
alternating between the possibilities, H then T. Repeat H enough to cover one cycle of H and T
in the second column, then repeat T enough to cover one cycle of H and T in the second column.
Note: when all the individual outcomes have been listed, there should not be any other unique
arrangement of the letters (H and T) possible and no arrangement of the letters should be
duplicated. (I know that this sounds complicated, but trust me in this presentation. You will not
be required to generate this answer or one like it.)

Table 1
Outcome    1ST    2ND    3RD    4TH
1           H      H      H      H
2           H      H      H      T
3           H      H      T      H
4           H      H      T      T
5           H      T      H      H
6           H      T      H      T
7           H      T      T      H
8           H      T      T      T
9           T      H      H      H
10          T      H      H      T
11          T      H      T      H
12          T      H      T      T
13          T      T      H      H
14          T      T      H      T
15          T      T      T      H
16          T      T      T      T
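If you would rather have the computer do the listing, a short Python sketch (my illustration, not required for the course) enumerates the same 16 arrangements; `itertools.product` happens to list them in exactly the order used in Table 1, with the fourth flip alternating fastest:

```python
from itertools import product

# Enumerate every possible result of four coin flips, as in Table 1.
outcomes = list(product("HT", repeat=4))   # varies the last flip fastest
print(len(outcomes))                       # 16
print("".join(outcomes[0]))                # HHHH (Table 1, outcome 1)
print("".join(outcomes[15]))               # TTTT (Table 1, outcome 16)
```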

Question 10: What is the probability of each of the individual outcomes from question 9 if the
coin is "fair?"

Answer 10: Recall that “fair” means that each of the individual outcomes will have the same
probability which is equal to

1 / (the number of possible individual outcomes) = 1/16

124
Question 11: If we define the result of the Binomial variable as X which is the sum of the results
from the repeated Bernoulli variables, then how many different Binomial results will there be?

X = Y1 + Y2 + Y3 + …. + Yn

and since n = 4

X = Y1 + Y2 + Y3 + Y4

and if we let Y = 0 when a head is flipped and Y =1 when a tail is flipped, then how small can
the sum be? If we flipped four heads, then the sum would be 0 = 0 + 0 + 0 + 0. How large can the
sum be? If we flipped four tails, then the sum would be 4 = 1 + 1 + 1 + 1.

Now we can convert the Table 1 presented in answer 9 to X, which is the sum of the four
variables. I have now replaced the H with 0 and the T with 1.

Table 2
Outcome       1ST    2ND    3RD    4TH            SUM
1              0      0      0       0              0+0+0+0=0
2              0      0      0       1              0+0+0+1=1
3              0      0      1       0              0+0+1+0=1
4              0      0      1       1              0+0+1+1=2
5              0      1      0       0              0+1+0+0=1
6              0      1      0       1              0+1+0+1=2
7              0      1      1       0              0+1+1+0=2
8              0      1      1       1              0+1+1+1=3
9              1      0      0       0              1+0+0+0=1
10             1      0      0       1              1+0+0+1=2
11             1      0      1       0              1+0+1+0=2
12             1      0      1       1              1+0+1+1=3
13             1      1      0       0              1+1+0+0=2
14             1      1      0       1              1+1+0+1=3
15             1      1      1       0              1+1+1+0=3
16             1      1      1       1              1+1+1+1=4

Thus, how many different binomial outcome results are there?

Five; these results are 0, 1, 2, 3, and 4. The number simply answers the question, “how many
tails were flipped out of four flips?”
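The conversion from Table 1 to Table 2, and the counting of how often each sum occurs, can also be sketched in a few lines of Python (an optional illustration; the variable names are mine):

```python
from collections import Counter
from itertools import product

# Code a head as 0 and a tail as 1; X is the sum = number of tails in 4 flips.
counts = Counter(sum(flips) for flips in product((0, 1), repeat=4))
print(sorted(counts.items()))   # [(0, 1), (1, 4), (2, 6), (3, 4), (4, 1)]
```

The five keys (0 through 4) are the five unique Binomial outcomes, and the counts (1, 4, 6, 4, 1) are how many of the 16 individual outcomes produce each sum.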

125
Question 12: What are the probabilities associated with each of the five possible outcomes from
the Binomial variable?

Answer 12: The easiest way to do this is simply to add the probabilities of all the individual
outcomes that each produce the same binomial outcome from Table 2. For instance, “what is the
probability of getting the binomial outcome 2?” What this question says in English is, “What is
the probability of getting 2 tails out of four flips of a "fair" coin?” The easiest way to calculate
this probability is to add the probabilities of the individual outcomes 4, 6, 7, 10, 11, and 13 from
Table 2 in answer 11. Since each of the individual outcomes has the same probability of 1/16
(answer 10), this probability should be

pr(X = 2) = pr(outcome 4) + pr(outcome 6) + pr(outcome 7) + pr(outcome 10) +
pr(outcome 11) + pr(outcome 13)

= 1/16 + 1/16 + 1/16 + 1/16 + 1/16 + 1/16 = 6/16.

Note: since in this example each of the individual outcomes is equally likely, then all we
have done is multiply the number of times that the outcome 2 occurs in Table 2 by the
common probability of 1/16.

Similarly we can calculate the probability of the other Binomial outcomes (0, 1, 3, and 4)

pr(X = 0) = pr(outcome 1) = 1/16

pr(X = 1) = pr(outcome 2) + pr(outcome 3) + pr(outcome 5) + pr(outcome 9)

= 1/16 + 1/16 + 1/16 + 1/16 = 4/16

pr(X = 3) = pr(outcome 8) + pr(outcome 12) + pr(outcome 14) + pr(outcome 15)

= 1/16 + 1/16 + 1/16 + 1/16 = 4/16

pr(X = 4) = pr(outcome 16) = 1/16

It is important to see that in the four repetitions of a Bernoulli variable there are 16
total outcomes. However, within these 16 total outcomes there are only 5 unique Binomial
outcomes; the number of tails flipped in 4 flips of a coin can be 0 tails (outcome 1), 1 tail
(outcome 2), 2 tails (outcome 3), 3 tails (outcome 4), or 4 tails (outcome 5).

Now it is possible to collect these results in a table form as seen in Unit 5, except that I am now
summarizing based on probability where we used relative frequency in Unit 5.

126
Table 3

Outcome         probability
1 (0 tails)       1/16
2 (1 tail)        4/16
3 (2 tails)       6/16
4 (3 tails)       4/16
5 (4 tails)       1/16

Recall from the Digression that the sum of all possible outcomes MUST EQUAL 1. Is this true
for the Binomial?

pr(X=0) + pr(X=1) + pr(X=2) + pr(X=3) + pr(X=4)

= 1/16 + 4/16 + 6/16 + 4/16 + 1/16 = 16/16 = 1

It works! It's a miracle. Well, not really.

NOTE: the repeated performance of the fair Bernoulli variable does NOT result in a fair
Binomial variable. Specifically, the probabilities for the 5 outcomes from Table 3 are not
equal to one another.

Question 13: If I were to repeat this Bernoulli variable 10 times (10 flips of the fair coin), then
how many possible individual outcomes would be possible?

Answer 13: Recall from answer 8 that the number of individual outcomes can be found by
multiplying the number of possibilities from the first variable times the number of possibilities
from the second variable times the number of possibilities from the third variable etc. In this case
there are 10 repetitions each with two outcomes. So the total number of individual outcomes
should be

2 times 2 times 2 times 2 times 2 times 2 times 2 times 2 times 2 times 2 = 2^10 = 1,024

Question 14: What is each of the 1,024 possible individual outcomes?

Answer 14: While this question could be answered, it would be very time consuming to produce
and hence it will not be done.

Question 15: What is the probability associated with each of the 1,024 possible individual
outcomes if the coin was “fair?”

Answer 15: 1 / (the number of possible individual outcomes) = 1/1024

127
Question 16: If we define the result of the Binomial variable as X which is the sum of the results
from the repeated Bernoulli variables, then how many unique Binomial results will there be?
This is similar to the discussion in answer 12 and the use of Table 3.

Answer 16: Recall X is simply the sum, hence the sum can be as small as 0 (all heads) and as
high as 10 (all tails). How many unique possible Binomial outcomes will there be? 11, which is
equal to 1 more than the number of times that the Bernoulli variable was repeated (n+1). Note:
this also works for Question 11. The Bernoulli was repeated 4 times (n=4) and there were 5
unique (4+1) Binomial outcomes.

Question 17: What is the probability of each of the 11 possible Binomial outcomes?

Answer 17: This is impossible to determine using the method of answers 11 and 12, unless all of
the 1,024 individual outcomes were listed.

Question 18: Is there a short cut way to answer question 17?

Answer 18: Yes. This is where we would use the equation for calculating Binomial
probabilities. I will leave this equation to a course in probability. It will not be shown or
used in this course.

Rather than learning to use the Binomial equation for calculating probabilities, I have provided
the probabilities associated with the 11 possible outcomes.

Using the Binomial Equation for n = 10 (number of flips), p = ½ (the probability of a fair
outcome from the Bernoulli variable that the Binomial is based on), and letting X = the number
of tails that we expect to occur (Outcome), then

Outcome                     Probability Statement       Actual Probability

0 (0 tails in 10 flips)      pr (X = 0)                  .00098
1 (1 tail in 10 flips)       pr (X = 1)                  .00977
2 (2 tails in 10 flips)      pr (X = 2)                   .04395
3 (3 tails in 10 flips)      pr (X = 3)                   .11719
4 (4 tails in 10 flips)      pr (X = 4)                   .20508
5 (5 tails in 10 flips)      pr (X = 5)                   .24609
6 (6 tails in 10 flips)      pr (X = 6)                   .20508
7 (7 tails in 10 flips)      pr (X = 7)                   .11719
8 (8 tails in 10 flips)      pr (X = 8)                   .04395
9 (9 tails in 10 flips)      pr (X = 9)                   .00977
10 (10 tails in 10 flips)    pr (X = 10)                  .00098
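For the curious, these 11 probabilities can be reproduced by counting rather than by the Binomial equation itself: pr(X = k) = (number of arrangements with k tails) / 1,024. The short Python sketch below (optional, not part of the course; it assumes Python 3.8+ for `math.comb`) does that counting:

```python
from math import comb

# Binomial probabilities for n = 10 flips of a fair coin (p = 1/2):
# pr(X = k) = (number of arrangements with k tails) / 2**10
n = 10
probs = {k: comb(n, k) / 2**n for k in range(n + 1)}
print(round(probs[0], 5))              # 0.00098
print(round(probs[5], 5))              # 0.24609
print(round(sum(probs.values()), 5))   # 1.0 -- Rule 3 again
```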

128
Question 19: What have you learned?

Answer 19: In part you will have to answer this question for yourself. In reality there isn't much
use for this entire unit, except for the fact that the table from Question 18 is very important for the
next unit and critical to understanding the remainder of the course. Thus, in itself this unit isn't
much, but in the bigger picture it is vital. In any event, I would try to carry away the big picture
items from this unit (the statements in color) and proceed to the next unit after you catch your
breath.

KEY THINGS TO REMEMBER

(1) What does "fair" mean
(2) The 3 probability rules, presented in the "Fair" digression
(3) The probabilities listed in answer 18

129
Unit 9: Normal Distribution and Probabilities

Terms
Distribution - another name for a histogram or line graph (the vertical axis displays frequencies,
relative frequencies, or percentages)

Probability Distribution - a distribution in which the vertical axis displays probabilities

Normal Distribution (AKA Normal Probability Distribution, but the word probability is usually
omitted) - for the purpose of this course a distribution that is unimodal and symmetric in which
the mean = μ and the standard deviation = σ

μ = population mean

σ = population standard deviation

σ² = population variance

Standard Normal (this is also a probability distribution, but the word probability is usually
omitted) - a distribution that is unimodal and symmetric in which the mean = 0 and the standard
deviation = 1.

Standard Normal Conversion Formula (SNCF 1)                   Z = (X – μ) / σ

Standard Normal Conversion Formula 2 (SNCF 2)                 X = (Z)(σ) + μ

Fact 1     pr(Z < z) = 1.0000 - pr(Z > z)

Fact 2     pr(Z < -z) = pr(Z > z)

Fact 3     pr(z1 < Z < z2) = pr(Z < z2) - pr(Z < z1)

130
Binomial to Normal and Standard Normal
In the last unit the probabilities associated with the Binomial variable were presented. As a
reminder, when p = .5 and n = 10, the resultant 11 probabilities were the following rounded to
three decimal places.

pr (X = 0) = .001
pr (X = 1) = .010
pr (X = 2) = .044
pr (X = 3) = .118
pr (X = 4) = .205
pr (X = 5) = .246
pr (X = 6) = .205
pr (X = 7) = .118
pr (X = 8) = .044
pr (X = 9) = .010
pr (X = 10) = .001

Using the histogram graphing principles presented earlier in the course (Unit 5) we can create a
graph to reflect these results. Such a graph would have the values of X (0 to 10) in the horizontal
axis and the values of the probabilities in the vertical axis. This plot would look like

Figure 1

131
Another name for the histogram is distribution and this word is more commonly used by
statisticians. If the vertical axis presented the frequencies, then the graph would be called the
frequency distribution. In this case the vertical axis presents the probabilities; hence this graph
would be called the probability distribution. In particular the graph above would be called the
Binomial probability distribution.

What are some of the characteristics of any probability distribution?

The first characteristic is that the “bars” are reflective of the probabilities associated with the
value of the variable displayed in the horizontal axis. For instance the height of the bar above the
value of the variable = 4 is what? It is .205, the probability of having four tails out of 10 flips of
the coin. If the width of the bar is 1 and the height of the bar is .205, what is the area of the
rectangle above the number 4? The area of any rectangle is the width times the height, or in this
case 1 times .205 = .205. Note here that the area of the rectangle is equal to the probability.

If the areas of the rectangles are equal to the probabilities, what is the total area represented by
all of the rectangles? By Rule 3 presented in Unit 8, the sum of all probabilities for any problem
is always equal to 1; hence the total area of the rectangles must also equal 1. This is the second
characteristic of a probability distribution.

From a Binomial distribution it would be an easy task to calculate probabilities. For instance

pr (X = 6) = area of the rectangle above 6 = .205.

pr (X ≥ 6) = area of the rectangles for 6, 7, 8, 9, and 10
= .205 + .118 + .044 + .010 + .001 = .378

This probability does include 6, since the ≥ sign means all values above 6
and including 6.

pr (X < 6) = area of the rectangles for 0, 1, 2, 3, 4, and 5
= .001 + .010 + .044 + .118 + .205 + .246 = .624

This probability does not include 6, since the < sign means all values below 6, but
not including 6.

pr (X < 6) + pr (X ≥ 6) should always equal 1.

Why? Because the sum of all possible probabilities must always equal 1.

Here they add to .378 + .624 = 1.002.

Why is this number unequal to 1.000?

Because we rounded the probabilities to three decimal places. This is round-off error.

132
Thus, if we know that pr (X < 6) + pr (X ≥ 6) = 1.000, then using some algebra (subtracting pr
(X ≥ 6) from both sides)

pr (X < 6) = 1.000 – pr (X ≥ 6)

or by subtracting pr (X < 6) from both sides, that

pr (X ≥ 6) = 1.000 – pr (X < 6)
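The complement relationship can be verified with a short Python sketch (an optional aside of mine; it assumes Python 3.8+ for `math.comb`). Using the exact counts in place of the rounded probabilities, the round-off error disappears:

```python
from math import comb

# pr(X < 6) and pr(X >= 6) are complements, so they must add to exactly 1.
n = 10
pmf = [comb(n, k) / 2**n for k in range(n + 1)]
p_ge_6 = sum(pmf[6:])    # pr(X >= 6), using exact fractions of 1024
p_lt_6 = sum(pmf[:6])    # pr(X < 6)
print(p_ge_6 + p_lt_6)        # 1.0 -- no round-off error this time
print(round(1 - p_lt_6, 4))   # 0.377 -- close to the .378 from rounded values
```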

Now let's change our mindset a little. If I were to convert the binomial probability distribution
above to a line chart, it would look like the following.

Figure 2

What does this graph look like?

1. It looks like it has only one mode. This is called unimodal. (from Unit 6)

2. It looks like it is symmetric. (from Unit 6)

3. If it is symmetric and unimodal, then we know that the median = mean = mode. Hence we
know that the middle of this graph is the mean. (from Unit 6)

133
In Figure 2, if you smooth out the line a little (so that it looks more like curved pieces than
straight-line pieces), then what would this shape look like? With a little imagination you might
be able to see a bell. This shape is often called the bell-shaped distribution or the normal
distribution. Unfortunately, with smooth curves like Figure 3 below it is impossible to simply
calculate probabilities; the determination of areas under curves (the probabilities) requires
calculus. Fortunately for us, many years ago someone took it upon themselves to calculate
various probabilities for normal distributions and placed these calculations in convenient tables,
so we don't have to do the calculations on our own.

The primary purpose of this unit is to help you to become familiar with using tables to
calculate probabilities. In this unit you will be introduced to the Table for the Normal
Distribution. In later units you will be introduced to three other tables. All of the tables used
and needed in this course can be found in Unit 21 - Tables.

From the Binomial Probability Distribution above (Figure 1) I was able to calculate the
probability pr(X ≥ 6) very easily. From the perspective of the smoothed Binomial Probability
line chart, what would this probability look like?

Figure 3

Without the rectangles calculating the area under the curve and to the right of 6 would be
difficult. Here is where we need calculus or access to a table. The table we are going to use in
this unit is the Standard Normal Table which is Table 1 in Unit 21. However, before we can get
to this table we will need to do some preliminary calculations.

134
Preliminaries
How many different normal shapes are there? Looking at Figure 4 below, how many of these are
normal?

Figure 4

135
The answer is that all of them are normal. They all look unimodal and symmetric. In fact, there
are literally an infinite number of different normal shapes. With so many different forms how is
it possible to create a table that would produce the probabilities for all of them? The answer is
found in the following equation. I will call it the standard normal conversion formula (sncf).

Z = (X – μ) / σ

What do these letters mean?

X is our variable. It is a short hand way of representing all of the possible values associated with
this variable. For instance in our long running coin flip problem, X was the number of tails
produced in an experiment where a coin was to be flipped 10 times. X could be anyone of the
values from 0 (no tails flipped) to 10 (10 tails flipped).

μ is the population mean (as seen in Unit 6). We write statistics (values that come from samples)
in our familiar Roman letters and write parameters (values that come from populations) in Greek
letters.

σ is the population standard deviation and σ² would be the population variance (as seen in Unit
7).

Finally, Z represents the standard normal distribution.

The purpose of the standard normal conversion formula (sncf) is to convert any normal of any
shape (Figure 4) into the very same standard normal which appears in Figure 5 below.

The next statement is not entirely accurate from a statistical perspective; however, it is an
accurate statement from a conceptual perspective consistent with the presentation in this unit.
If we know the population mean and standard deviation for a probability distribution, then we
can calculate the probabilities for this distribution using the standard normal conversion formula.
How can we use the standard normal conversion formula for calculating the probability in Figure
3, pr (X > 6), and how well does this compare to the probability that we calculated from the
original Binomial Probability Distribution, Figure 1?

From Figure 3 we wanted to know       pr (X ≥ 6) which becomes the following after applying
the standard normal conversion formula and using some algebra

pr (X ≥ 6) = pr (X – μ ≥ 6 – μ)                        Subtracting μ from each side.

= pr [(X – μ)/σ ≥ (6 – μ)/σ]                Dividing each side by σ, we now have
(X – μ)/σ. [This is sncf]

= pr [Z ≥ (6 – μ)/σ]                       Note that the "sign" in the expression,
which is "≥", never changes.

136
This is now in the proper form (in terms of Z) and if we knew the population mean and standard
deviation for the Binomial, then we should be able to access Table 1 in Unit 21. Although I am
not going to show how to get the population mean and standard deviation for the Binomial, I will
tell you that these are well known and that for a Binomial

μ = (n)(p)   and   σ² = (n)(p)(1-p)

In the problem we have been examining, p was equal to the probability of flipping a tail using a
fair coin which is .5 and the number of times this was repeated was n or 10 times. Thus we know

μ = (n)(p) = (10)(.5) = 5
and
σ² = (n)(p)(1-p) = (10)(.5)(1-.5) = (10)(.25) = 2.5
and
σ = square root of σ² = square root of 2.5 = 1.58

and now replacing μ with 5 and σ with 1.58, we have

pr [Z ≥ (6 – 5)/1.58] = pr [Z ≥ .63]

Now we are finally ready to use Table 1 found in Unit 21. Looking at the graph on the top of
Table 1 you will see a shaded region to the right of a cutoff value. This shaded region is the area
under the curve to the right of the cutoff value. Recall that the area under the curve is the
probability. So the shaded area in the graph is the probability associated with the probability
statement

pr (Z > cutoff)

What does this say about the table? It says that all of the values in the table are calculated for
ONLY THIS TYPE OF PROBABILITY STATEMENT. (Z > cutoff)

137
Using the Standardized Normal Table - Unit 21
Upon close inspection it can be seen that the table is composed of numbers in a first row which is
labeled CUTOFF. There are also numbers in a first column which is also labeled CUTOFF. If we
remove the first row from the table and the first column from the table, you will see a bunch of
numbers which are presented to four decimal places. Not too surprisingly, the first row and first
column are used to identify the CUTOFF value. The numbers to four decimal places are
the probabilities.

The probability statement found at the top of the Standardized Normal Distribution Table is very
similar to the one we are trying to find the answer to. The only difference is that in the table we
have > and in our probability statement we have ≥. This is easily resolved by knowing that for
the standard normal distribution

pr (Z ≥ cutoff) = pr (Z > cutoff)

so it does not matter if the equal sign is present or not when using this table. Knowing this, then
the probability statement of the table is identical to our probability statement if

Cutoff = .63

Now, if cutoff = .63, then what is the probability associated with   pr [Z > .63] ?

(***) Recall that we use the first column and the first row of this table to identify the CUTOFF
value. Specifically, we use the first column of the table to identify the integer and first decimal
place of the cutoff value, and we use the first row to identify the second decimal place. Thus for
our cutoff we need to go down the first column to the value marked as .6 which is the same as
0.6. Then we need to go across the first row until we reach the value marked as .03. Putting 0.6
and .03 together (add them) produces 0.63, which is our cutoff value of interest. The probability
associated with the cutoff value of .63 can be found by placing a ruler under the row labeled .6
and going over to the column labeled .03. At the intersection of the .6 row and the .03 column
you should find the value .2643. If you did not get this value, then go over this paragraph a
second time. Now we know

pr [Z > .63] = .2643
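If you have Python handy, the table lookup can be cross-checked with a short sketch (an optional aside, not part of the course); the `math.erf` function does the calculus-based area computation for us:

```python
from math import erf, sqrt

def upper_tail(z):
    # pr(Z > z) for the standard normal, computed via the error function
    return 0.5 * (1.0 - erf(z / sqrt(2.0)))

mu, sigma = 5, 1.58                # Binomial mean and sd for n = 10, p = .5
z = (6 - mu) / sigma               # sncf 1
print(f"{z:.2f}")                  # 0.63
print(f"{upper_tail(0.63):.4f}")   # 0.2643 -- matches the table lookup
```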

How does this compare to the probability we calculated from the Binomial Probability
Distribution?
pr (X ≥ 6) = .378
Is .2643 (from the Standard Normal) very close to .378 (from the Binomial Probability
Distribution)?

No !

However, the comparison above really isn't completely fair. You will notice in Figure 3 that the
line separating the shaded region from the unshaded region is directly above 6. In Figure 1

138
however the rectangle containing 6 really begins at 5.5, only half of this rectangle is above 6.
Thus a better comparison would be to take all of the rectangle for value 10, all of the rectangle
for value 9, all of the rectangle for value 8, all of the rectangle for value 7, and only half of the
rectangle for value 6. These would add to .2744.

Now the comparison doesn't look so bad: .2744 for the Binomial and .2643 for the Standard
Normal.

We could continue to press on with the similarities between these two distributions, but the
Binomial has really completed its purpose, the lead in to the Standard Normal. From here we will
not use the Binomial again.

Here are some direct examples from the Standard Normal Table in Unit 21 to help you get
comfortable using the table. See if you can obtain the probabilities for the cutoffs as specified. If
you can't, then review the paragraph in the indigo color again. (marked by ***)

Pr [Z > .44] = .3300
Pr [Z > 1.57] = .0582
Pr [Z > 1.96] = .0250
Pr [Z > 2.33] = .0099
Pr [Z > 2.82] = .0024
Pr [Z > 3.00] = .0013
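These practice values can likewise be checked with the same optional erf-based Python sketch (my illustration, not required for the course):

```python
from math import erf, sqrt

def upper_tail(z):
    # pr(Z > z) for the standard normal
    return 0.5 * (1.0 - erf(z / sqrt(2.0)))

# The six practice cutoffs from the list above
for z in (0.44, 1.57, 1.96, 2.33, 2.82, 3.00):
    print(f"{z:.2f}  {upper_tail(z):.4f}")
```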

Notice that as Z gets bigger and bigger, the associated probability gets smaller and smaller.

Here is a comparison of any normal distribution with the standard normal distribution.

Normal Characteristics

1.   Unimodal
2.   Symmetric
3.   Mean = μ
4.   Standard deviation = σ
5.   The normal distribution looks very similar to Figure 2 and those in Figure 4

Standard Normal Characteristics

1.   Unimodal
2.   Symmetric
3.   Mean = 0
4.   Standard deviation = 1
5.   The standard normal distribution looks very similar to Figure 5 below

139
Figure 5

IT IS VERY IMPORTANT TO NOTE THAT 0 IS THE MEAN OF THE STANDARD
NORMAL, WHICH INDICATES THAT ALL VALUES (CUTOFFS) BELOW THE
MEAN WILL BE NEGATIVE AND ALL VALUES (CUTOFFS) ABOVE THE MEAN
WILL BE POSITIVE. THIS IS THE RESULT OF THE STANDARD NORMAL
CONVERSION FORMULA.

Let's finish this unit with several examples that will assist you in learning how to use the
Standardized Normal Distribution Table specifically, and they should help you to use the other
tables found in Unit 21 in the later units.

Example 1. What is the probability associated with the shaded region of the standard normal
distribution in Figure 6?

Figure 6

This is a pretty easy one to calculate. We find the row identified by 1.0 in the first column and
the column identified by .00 in the first row, which intersect at the value .1587 in the table.

Example 2. What is the probability associated with the unshaded region of the standard normal
distribution in Figure 6?

140
Recalling that the sum of all probabilities must be 1 and that the entire area under the curve in a
probability distribution is 1, then we know that the sum of the shaded region in Figure 6 and the
unshaded region in Figure 6 must be 1.

pr (Z < 1.00) + pr (Z > 1.00) = 1.0000

Thus, using some algebra we know that

Unshaded region = pr (Z < 1.00) = 1.0000 – pr (Z > 1.00) = 1.0000 – .1587 = .8413

In general then we know that

Fact 1          pr (Z < z) = 1.0000 – pr (Z > z)        Little z refers to any specific cutoff value.

Example 3. What is the probability associated with the shaded region of the standard normal
distribution in Figure 7?

Figure 7

At first this doesn't seem as easy until we remember that the standard normal distribution is
symmetric. This means that the left-hand side of the graph looks exactly like the mirror image of
the right-hand side. If you hold a mirror to Figure 7 you will notice that it is the mirror image of
Figure 6. What does this mean? It means that

pr (Z < – 1.00) = pr (Z > 1.00) = .1587

Although it is not immediately apparent, a close look at the Standardized Normal Distribution
Table shows that there are no negative numbers in either the first row or first column of the table,
and hence the table cannot be used to find probabilities for negative cutoff values. Thus we must
use a trick to convert negative cutoff values into positive cutoff values. This trick is presented in
Fact 2.

In general then we know that

Fact 2          pr (Z < - z) = pr (Z > z)
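Fact 2 can also be verified numerically with the same optional erf-based Python sketch (again, just an illustration):

```python
from math import erf, sqrt

def upper_tail(z):
    return 0.5 * (1.0 - erf(z / sqrt(2.0)))   # pr(Z > z)

def lower_tail(z):
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))   # pr(Z < z)

print(f"{lower_tail(-1.00):.4f}")   # 0.1587 -- pr(Z < -1.00)
print(f"{upper_tail(1.00):.4f}")    # 0.1587 -- pr(Z > 1.00): Fact 2 in action
```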

141
The probability help section link below will take you to a number of example problems
using these basic principles, which are worked out in DETAIL. I would suggest looking at
the probability help section prior to attempting the quiz. If you can do the quiz problems
on your own (use the answers only to check yourself), then you should be well prepared to
take the test over this material.

Probability Help Section
When calculating probabilities using the Standardized Normal Distribution table, things to note

1. standard normal conversion formula (sncf)

Z = (X – μ) / σ                                             Label as       sncf 1

Which also means that

X = (Z) (σ) + μ                                             Label as       sncf 2

2. pr (Z < z) = 1.0000 – pr (Z > z)                                 Label as      fact 1

3. pr (Z < -z) = pr (Z > z)                                        Label as       fact 2

4. pr (z1 < Z < z2) = pr (Z < z2) – pr (Z < z1)                    Label as       fact 3

Why are facts 1 and 2 necessary?

The table that we are using ONLY GIVES probabilities for pr (Z > z). This is the standard form
for our table. This means that if the information in the parentheses is in any other form, then we
can not calculate the probability from the table. Thus, it is necessary to know how to convert any
probability expression (the material in the parentheses) to the standard form. This is the purpose
of facts 1 and 2. Fact 3 helps us to convert more complicated probability expressions into two
simpler pieces. Then we can use facts 1 and 2 to convert to the standard form presented in our
table. Fact 3 is found in the quiz section and is not needed immediately.
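If you would like to check these facts with a computer, here is a short Python sketch (the code and function names are mine, not part of the course materials). It uses the exact upper-tail probability pr (Z > z) = 0.5·erfc(z/√2), which is what the Standardized Normal Distribution Table tabulates:

```python
from math import erfc, sqrt

def upper_tail(z):
    """pr(Z > z): the standard form our table gives."""
    return 0.5 * erfc(z / sqrt(2))

def lower_tail(z):
    """pr(Z < z), via fact 1: 1.0000 - pr(Z > z)."""
    return 1.0 - upper_tail(z)

# Fact 2 check: pr(Z < -z) equals pr(Z > z) -- both lines print the same value
print(lower_tail(-1.50), upper_tail(1.50))

# Fact 3 check: pr(z1 < Z < z2) = pr(Z < z2) - pr(Z < z1)
print(round(lower_tail(1.96) - lower_tail(-1.96), 4))  # prints 0.95
```

The table entries are these same values rounded to four decimal places.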

In general, probability problems can take one of six possible types. These are

1.   Given X, what is Z
2.   Given Z, what is X
3.   Given Z, what is the probability
4.   Given X, what is the probability
5.   Given the probability, what is Z (this will NOT be used in this course)
6.   Given the probability, what is X (this will NOT be used in this course)

142
The first step in using this section is to identify which type of problem you are trying to
solve and then to follow the specific instructions, provided below, as best you can.

For the examples below, use μ = 100 and σ = 10

1. Given X, what is Z          ex: If X = 85, then Z = ?

To solve this problem, use sncf 1 directly.

Z = (X – μ) / σ = (85 – 100) / 10 = (-15) / 10 = -1.50

Here is an important note. The correct equation is (X – μ) divided by σ. It is not X – (μ / σ).

The correct use of sncf 1 produces Z = -1.50 (as seen above)

The incorrect use of sncf 1 produces Z = 85 - (100/10) = 85 - 10 = 75.

2. Given Z, what is X          ex: If Z = -2.25, then X = ?

To solve this problem, use sncf 2 directly.

X = (Z) (σ) + μ = (-2.25) (10) + 100 = (-22.5) + 100 = 77.5
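Problem types 1 and 2 are one-line calculations, so they translate directly into code. A minimal Python sketch (the function names are my own):

```python
def x_to_z(x, mu, sigma):
    # sncf 1: Z = (X - mu) / sigma
    return (x - mu) / sigma

def z_to_x(z, mu, sigma):
    # sncf 2: X = (Z)(sigma) + mu
    return z * sigma + mu

print(x_to_z(85, 100, 10))     # -1.5, matching problem 1
print(z_to_x(-2.25, 100, 10))  # 77.5, matching problem 2
```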

3. Given Z, what is the probability (in these problems you do not need to know the values for μ
and σ; they are not used since we already have Z)

There are 3 different types for this problem that we will use. Let's call them 3A, 3B, and 3C

3A.     pr (Z > z)      This problem is exactly in the form of our table. All that
needs to be done is look up the value (z) in the table directly.
Note: in our table this value is called cutoff.

Ex:      pr (Z > 1.74)          Look in the row 1.7 and the column .04 and you will find
the probability, which is .0409

3B.     pr (Z < z)      This problem must be converted into the standard form. The
appropriate conversion formula is expressed in fact 1.

= 1.0000 – pr (Z > z)           The unknown part is now problem 3A. Look up the
probability from the table and subtract it from one.

Ex:      pr (Z < 0.86) = 1.0000 – pr (Z > 0.86)     Look up the probability for
pr (Z > 0.86) in the row .8 and
the column .06. It is .1949

= 1.0000 - .1949 = .8051   This is the answer

143
3C.     pr (Z < - z)   Once again, this problem is not in the standard form for our table
and must be converted. The conversion formula that is appropriate
here is fact 2.

= pr (Z > z)   This is now in the form necessary to use our table.

Ex:   pr (Z < -1.61) = pr (Z > 1.61) This probability is found by directly using
the table. In the row 1.6 and the column .01
is the probability .0537

4. Given X, what is the probability

The first step in these problems is to convert the probability expression to Z using the
standard normal conversion formula (sncf 1)

Ex:     pr (X < 80) = pr [(X – μ) < (80 – 100)]        Notice that we have only subtracted
the mean (μ on the left and 100 on
the right) from the score (X on the left
and 80 on the right)

= pr [(X – μ) / σ < (80 – 100) / 10]      Notice that we have only divided by the
standard deviation (σ on the left and 10
on the right)

= pr [Z < (-20) / 10] = pr (Z < -2.00)    This gives Z by using sncf 1 and its value
(-2.00)

While it might be difficult to see at this point, once you have converted the expression in
X to the expression in Z, you will have one of the three types in 3 above. Simply go back
to 3 and do the problem. For instance, continuing with this example

pr (Z < -2.00) = pr (Z > 2.00) = .0228                 From the strategy in 3C above.
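The whole type-4 pipeline (apply sncf 1, then read the tail) can be sketched in Python. As before, erfc stands in for the table lookup, and the function name is my own:

```python
from math import erfc, sqrt

def prob_x_below(x, mu, sigma):
    z = (x - mu) / sigma             # sncf 1
    return 0.5 * erfc(-z / sqrt(2))  # pr(Z < z)

print(round(prob_x_below(80, 100, 10), 4))  # 0.0228, matching the worked example
```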

Here are 9 example problems presented in random order. Remember that the "trick" in doing
these problems is to first identify the type of the problem (match to one of the four types
presented and illustrated). For these 9 problems use μ = 75 and σ = 8.

Example 1: pr (Z > .87) = ?

This is a type 3 problem. Specifically it is a type 3A problem. It is immediately in the form of the
table and can be looked up directly.

pr (Z > .87) = .1922

144
Example 2: pr (Z < - .38) = ?

This is a type 3 problem. Specifically it is a type 3C problem. It must be converted to the
appropriate form using fact 2.

pr (Z < -.38) = pr (Z > .38) = .3520

Example 3: If X = 95, then Z = ?

This is a type 1 question. Simply use sncf 1.

Z = (X – μ) / σ = (95 – 75) / 8 = 20 / 8 = 2.50

Example 4: pr (X > 95) = ?

This is a type 4 problem and must FIRST be converted to Z.

pr (X > 95) = pr [(X – μ) > (95 – 75)] = pr [(X – μ)/σ > (95 – 75)/8] = pr (Z > 2.50)

This is now a type 3A problem. So we can use the table immediately.

pr (Z > 2.50) = .0062

Example 5: pr (Z < 1.44) = ?

This is a type 3 problem. Specifically it is a type 3B problem. It must be converted to the
appropriate form using fact 1.

pr (Z < 1.44) = 1.0000 - pr (Z > 1.44) = 1.0000 - .0749 = .9251

Example 6: If Z = .35, then X = ?

This is a type 2 question. Simply use sncf 2.

X = (Z) (σ) + μ = (.35) (8) + 75 = 2.80 + 75 = 77.80

Example 7: pr (Z > 1.96) = ?

This is a type 3 problem. Specifically it is a type 3A problem. It is immediately in the form of the
table and can be looked up directly.

pr (Z > 1.96) = .0250

145
Example 8: pr (Z < -1.96) = ?

This is a type 3 problem. Specifically it is a type 3C problem. It must be converted to the
appropriate form using fact 2.

pr (Z < -1.96) = pr (Z > 1.96) = .0250

Example 9: pr (-1.96 < Z < 1.96) = ?

This is definitely a type 3 question; however, it is not in one of the three subtypes (3A, 3B, or
3C) that were presented. But it should be recognizable as the general form listed as "fact 3."
We can break this problem apart using fact 3 and then determine which specific subtype each
piece is.

pr (-1.96 < Z < 1.96) = pr (Z < 1.96) - pr (Z < -1.96)           using fact 3

Now this problem has two parts. The first part is pr (Z < 1.96) which is specifically a type 3B
problem. The second part is pr (Z < -1.96) which is specifically a type 3C problem. So now we
should be able to complete the problem.

pr (Z < 1.96) - pr (Z < -1.96) = 1.0000 - pr (Z > 1.96) - pr ( Z < -1.96)

the pr (Z < 1.96) term was converted using 3B

and now

1.0000 - pr (Z > 1.96) - pr (Z < -1.96) = 1.0000 - pr (Z > 1.96) - pr (Z > 1.96)

the pr (Z < -1.96) term was converted using 3C

and now

1.0000 - .0250 - .0250 = 1.0000 - .0500 = .9500

so     pr (-1.96 < Z < 1.96) = .9500

The final result of this last problem, pr (-1.96 < Z < 1.96) = .9500, is important to remember.
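Example 9's chain of conversions can be verified numerically. A quick Python check (erfc again stands in for the table; this is my own sketch):

```python
from math import erfc, sqrt

def upper_tail(z):
    return 0.5 * erfc(z / sqrt(2))  # pr(Z > z), the table's standard form

# 1.0000 - pr(Z > 1.96) - pr(Z > 1.96), exactly as derived in example 9
result = 1.0 - upper_tail(1.96) - upper_tail(1.96)
print(f"{result:.4f}")  # 0.9500
```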

146
Quiz
Use the Standard Normal Conversion Formula, the Standard Normal Table, Fact 1, and Fact 2, to
answer the following 4 questions. Hint: draw the graph first and then shade the identified area. I
WOULD STRONGLY ENCOURAGE YOU TO TRY THESE PROBLEMS PRIOR TO LOOKING
AT THE ANSWERS.

1. If μ = 75 and σ = 10, then pr (X > 83) = ?

2. If μ = 75 and σ = 10, then pr (X < 86.4) = ?

3. If μ = 75 and σ = 10, then pr (X < 66.8) = ?

4. If μ = 75 and σ = 10, then pr (66.8 < X < 86.4) = ?

Question 5 is a good challenge question to see if you know how to fully use the material of this
unit.

5. If pr (-z < Z < z) = .95, then what does z equal?

Another type of problem

6. If Z = 1.80, and μ = 75 and σ = 10, then what does X equal?

The figures associated with these probability statements are on the next page. The quiz answers follow the figures.

147
Graph for Question 1

Graph for Question 2

Graph for Question 3

148
Graph for Question 4

Graph for Question 5

149
1. pr (X > 83) = pr [(X – μ)/σ > (83 – 75)/10]           Standard Normal Conversion (sncf 1)

= pr [Z > .80] = .2119

2. pr (X < 86.4) = pr [(X – μ)/σ < (86.4 – 75)/10]       Standard Normal Conversion (sncf 1)

= pr [Z < 1.14]

= 1.0000 – pr [Z > 1.14]                 Fact 1

= 1.000 - .1271 = .8729

3. pr (X < 66.8) = pr [(X – μ)/σ < (66.8 – 75)/10]       Standard Normal Conversion (sncf 1)

= pr [Z < -.82]

= pr [Z> .82] = .2061                    Fact 2

4. pr (66.8 < X < 86.4)

This is not a probability statement like any other we have seen so far. However, if you draw the
graph and shade the indicated area you will see we want to know the probability of being
between the two indicated values. There is a really easy answer to problem 4, which is to note
that problem 2 has you calculating the area of the entire curve below 86.4. In problem 4 we want
the entire area of the curve below 86.4, but we want to remove the area below 66.8, which is the
probability calculated in problem 3. Hence the easy answer is to subtract the answer to problem 3
from the answer to problem 2. This produces .8729 - .2061 = .6668.

The not as easy method uses the Standard Normal Conversion, Fact 1, Fact 2, and Fact 3 in the
following manner

pr (66.8 < X < 86.4)

= pr [(66.8 – 75)/10 < (X – μ)/σ < (86.4 – 75)/10]   Standard Normal Conversion (sncf 1)

= pr [ -.82 < Z < 1.14]

Here we need Fact 3 which is pr [z1 < Z < z2] = pr [Z < z2] – pr [Z < z1]

= pr [Z < 1.14] – pr [Z < -.82]                      Fact 3

= 1.0000 – pr (Z > 1.14) – pr [Z < -.82]             Fact 1 (applied to the first term)

150
= 1.0000 – pr (Z > 1.14) – pr [Z > .82]                Fact 2 (applied to the second term)

= 1.000 - .1271 - .2061                               Problem 2 – Problem 3

= .8729 - .2061 = .6668                               Easy way answer
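Both routes to the answer can be confirmed in Python. This sketch (my own code) applies sncf 1 and fact 3 on the X scale; the exact value differs from the table-based .6668 only in the fourth decimal place because the table rounds each tail:

```python
from math import erfc, sqrt

def lower_tail_x(x, mu, sigma):
    z = (x - mu) / sigma             # sncf 1
    return 0.5 * erfc(-z / sqrt(2))  # pr(Z < z)

# fact 3 on the X scale: pr(66.8 < X < 86.4) with mu = 75, sigma = 10
answer = lower_tail_x(86.4, 75, 10) - lower_tail_x(66.8, 75, 10)
print(round(answer, 4))  # approximately .6668 (the table answer)
```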

5. Having answered problem 4, this should now be easier.

pr (-z < Z < z) = .95                                      Given in the problem

pr (-z < Z < z) = pr (Z < z) – pr (Z < -z)                 Fact 3

= 1.0000 – pr (Z > z) – pr (Z < -z)          Fact 1 (applied to the first term)

= 1.0000 – pr (Z > z) – pr (Z > z)          Fact 2 (applied to the second term)

= 1.0000 – 2 [pr (Z > z)]                   Add the two common terms

1.0000 – 2 [pr (Z > z)] = .95                              Couple the above result & the given

1.0000 – 2 [pr (Z > z)] - .95 = .0000                      Subtract .95 from each side

.0500 = 2 [pr (Z > z)]                                     Add 2 [pr (Z > z)] to each side

.0500 / 2 = pr (Z > z)                                    Divide each side by 2

.0250 = pr (Z > z)

Thus, we have the reverse problem from all of the ones we have calculated so far. We know the
probability, but we do not know the cutoff value. What you have to do now is to look in the table
to find the value .0250 or the one closest to it. Once this value is found, then you know the
location in the table (the intersection of the row and column). The value of the cutoff that
produces this probability is identified by the addition of the row and column.

When you look in the table you will find one and only one value of .0250. It is in the row
identified as 1.9 and in the column identified as .06. Thus the value of the cutoff that produces
the probability of .0250 is 1.9 + .06 or 1.96. This is your value of z. To check your result,
calculate the simple solution to the following problem.

pr (Z > 1.96) = ?
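This reverse lookup can also be mimicked in code by scanning the same grid of cutoffs the table uses and keeping the one whose tail probability is closest to .0250. A Python sketch (my own, with erfc standing in for the table):

```python
from math import erfc, sqrt

def upper_tail(z):
    return 0.5 * erfc(z / sqrt(2))  # pr(Z > z)

# scan cutoffs 0.00, 0.01, ..., 3.99, exactly as the table lays them out
cutoffs = [i / 100 for i in range(400)]
z = min(cutoffs, key=lambda c: abs(upper_tail(c) - 0.0250))
print(z)  # 1.96
```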

151
6. Go back to the standard normal conversion formula, which is

Z = (X – μ) / σ

Or doing some slight algebra,

Z times σ = (X – μ) and this becomes

(Z times σ) + μ = X                      This is sncf 2

Here we have μ = 75 and σ = 10, and Z = 1.80, thus

X = (1.80 * 10) + 75 = 18 + 75 = 93.

To check, use the standard normal conversion formula on the X value that you obtained.

Z = (93 – 75)/10 = 18/10 = 1.80. This is the same value as the one we started with, thus X
= 93 must be correct.

152
Unit 10: Estimation and Confidence Intervals
Terms
Estimate - a statistical word which actually is best translated as guess (e.g., we use
sample statistics to estimate population parameters or we use statistics to make guesses about
parameters)

Point Estimate - an estimate in which we only use a single number (point) to make our guess
regarding the population parameter

Margin of Error - the amount of error contained in our sample data

Interval Estimate - an estimate in which we use many numbers to make our guess regarding the
population parameter, usually all of the numbers between two determined numbers; between a
lower value and an upper value

Confidence Interval (C.I.) – an interval estimate in which we desire to have some
predetermined level of confidence that our estimate actually does predict the population
parameter (usually we would like to have either 95% or 99% confidence in our interval estimate)

Equations

Margin of Error          1 divided by the square root of the sample size = 1 / √n

Confidence Interval      point estimate ± margin of error

t(ν, α/2)                t-value (used in the determination of the C.I. for the mean)

α (alpha)                error rate (1 minus the level of confidence); in this class α will always equal .05

ν (nu)                   degrees of freedom

153
Estimation
In Unit 3 on Populations, Samples, and Sampling we use statistics as estimates of parameters.
The word estimate is a fancy statistical techno-speak term which is most closely translated in
English as guess. There are a great many ways to generate guesses. We can make them on the
basis of our own personal feelings (very biased and probably a very poor guess), we can make
them on the basis of our own experience (not as biased, but certainly limited; and probably only
a fair guess), or we could make them based on collected information (the least likely to be
biased and most likely to be a very good guess). As the quality of the collected information goes
up (more representative of the population) the quality of the guess will correspondingly go up. In
statistics we only make guesses based on collected information (samples) and as such these
educated guesses qualify as estimates.

Estimates are used in every aspect of the world around us. Weather forecasts are estimates,
economic forecasts are estimates, popularity polls are estimates, and on and on. Any place that
collects data (information from samples) and anyone who uses data makes estimates. Here is an
example taken from the results of a political poll conducted by USAToday/CNN/Gallup in May
2004 and presented in U.S.A. Today.

USAToday Example - Do you approve or disapprove of the way George W. Bush is
handling his job as president? 47% of the people who responded to this question answered
approve. Results are based on telephone interviews with 1,002 National Adults, aged 18+,
conducted May 21-23, 2004. For results based on the total sample of National Adults, one can
say with 95% confidence that the margin of sampling error is ±3 percentage points.

This type of question, called the approval rating, is collected on average about 2-3 times each
month on the sitting president.

What is happening in this example?

1. We know that the survey was conducted on 1,002 selected individuals.

2. We know that the approval rating was reported as 47%.

3. We have two possible responses to the question; these are approve and disapprove. The
question then represents a Bernoulli variable (only two responses). Since we have asked 1,002
people (n=1002), then the whole sample represents a Binomial variable.

4. We have a sample of 1,002 people from some population. Who might the population be?
Registered voters, potential voters, adults (over 18), …? From the poll we learn that the
population is National Adults age 18+. It is never specified what National Adult means.

5. What is being estimated? How the members of the population, quite possibly adults in the
U.S., feel about the president's performance.

154
6. What do we know from the poll? We know that 47% of the people sampled indicated that they
approved of the way the president is handling his job. What does this mean? We are going to use
this collected information to conclude (predict) that 47% of the adult population in our nation is
supportive of the way the president is handling his job. Is this estimate the truth? Probably not.

7. Is this a “good” estimate?

What does “good” mean in this context? What is an estimate supposed to do? It is supposed to
provide us some information about the population. Thus, if an estimate is good, then it should
provide us with some accurate information about the population. The better the estimate the
closer it should be to the real truth (the parameter of the population). How do we obtain good
estimates? There are two necessary conditions that must be met in order to have confidence that
our estimates are good. The first is that the sample must be representative of the population and
second that the sample size must be sufficiently large. Neither of these is easy to address. Is the
sample representative of the population? We will have to trust that the sampling methods of Unit
3 will generate representative samples as they should. Is the sample size large enough? It is easy
to understand that a sample of size 100 is definitely large enough if there are only 150 people in
the population and that a sample of size 100 is definitely too small if there are 1,000,000 people
in the population. So how big is big enough obviously depends to some degree on the size of the
population that we are trying to represent with our sample. If our population is large, like that
alluded to in this example (National Adults age 18+), then how large is large enough? This is a
difficult question to answer, so in general we substitute another in its place, which is “how large
does our sample have to be in order to get some specified level of accuracy?”

If we assume that our sample is appropriately representative of the population, then “good” is
simply defined as being accurate to some pre-determined degree. For instance do we want to be
accurate to 1%, 2%, 5%, …?

One of the problems with statistics is that we never determine truth with statistics. The truth in
the context of this example is the true proportion (parameter) of adults in the nation who approve
of the way in which the president is handling his job. Since the population is so large, it would be
too costly and too time consuming to attempt to find the truth. In addition, once the truth was
known probably enough time has elapsed such that the truth may actually have changed. As a
consequence, we take samples and make estimates from them. These answers are not the truth
but are our educated guesses about what the truth might be. Better guesses should be closer to the
truth than poorer guesses. The way in which statistical guesses, estimates, are assessed for their
quality is in their accuracy. If you could only guess one value for the true proportion of the
population who approved of the president, what would you guess? The answer is simple, the
proportion from the sample; 47%.

If I told you that this guess was accurate to within 3%, what would this mean to you? Even
though 47% is probably not the true answer, I would be willing to bet that the true answer is
between 44% and 50% (47% – 3% and 47% + 3%).

155
If I told you that this guess was accurate to within 1%, what would this mean to you? Even
though 47% is probably not the true answer, I would be willing to bet that the true answer is
between 46% and 48% (47% – 1% and 47% + 1%).

Which of these two situations is preferable? The second is preferable because the interval of our
prediction is smaller than in the first. The general principle here is that the smaller the interval
the more accurate is our prediction and the wider the interval the less accurate our prediction.
How is the accuracy of our estimate obtained? In the sense of the polls that we see here in the
U.S.A. Today newspaper and those universally presented on television the accuracy is produced
by the following equation

Accuracy = 1 / square root of the sample size

In statistics we call accuracy measured in this fashion the margin of error. Accuracy measured
in this fashion is really a misnomer, since what is truly being measured is the amount of error
contained in our sample data. Recall that the sample size in this poll was 1,002. The square root
of 1,002 is 31.65, and 1 divided by 31.65 is .03 or 3%. If you look again at the end of the
statement presented in the USAToday Example it states,

Results are based on telephone interviews with 1,002 National Adults, aged 18+, conducted May
21-23, 2004. For results based on the total sample of National Adults, one can say with 95%
confidence that the margin of sampling error is ±3 percentage points.

It appears that our calculation for margin of error corresponds with their calculation of margin of
sampling error. These two phrases are identical even though USAToday has added the word
sampling to the phrase.
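This margin of error calculation is one line of code. A Python sketch (the function name is my own):

```python
from math import sqrt

def margin_of_error(n):
    # popular-media margin of error: 1 over the square root of the sample size
    return 1 / sqrt(n)

print(round(margin_of_error(1002), 3))  # 0.032, i.e. about 3 percentage points
```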

Now that you know how to calculate accuracy (margin of error, also called margin of sampling
error), what does 95% confidence mean?

As presented above, one of the problems with statistics is that we never determine truth with
statistics. [Doesn't this sound familiar; as Benjamin Disraeli said, “there are three kinds of lies;
lies, damned lies, and statistics.”] Thus, even with the margin of error taken into account we still
do not know for certain the truth. Here are two statements.

Statement 1: I am 100% certain that the true proportion of adults in our nation who approve of
the way that the president is handling his job is between 0% and 100%.

Statement 2: I am 95% certain that the true proportion of adults in our nation who approve of
the way that the president is handling his job is between 44% and 50%.

Which is a true statement? Since it is guaranteed that a proportion must be bigger than 0% and
less than 100%, statement 1 is obviously true. Before we can assess statement 2 we need to
understand what it means. In a simplistic and conceptual sense it is making the statement that if
we selected 100 samples similar to this one, then 95 out of the 100 samples would produce an
estimate that would be greater than 44% and less than 50%. Said another way we are 95% certain

156
(very very very certain but not completely certain) that the truth is that the true proportion of
adults in our nation who approve of the way that the president is handling his job is between 44%
and 50%. Said in this way then statement 2 is also true.

Which statement is more usable? The interpretation of statement 1 is that we are absolutely
certain that anything can be true. Thus this statement doesn't really say anything and is
completely worthless. Statement 2 on the other hand is far more meaningful even though it
recognizes that the truth might be found outside our prediction.

I realize that you are probably thinking that I have beaten this horse quite enough. However,
these last couple of pages attempt to present the most complicated part of statistics. This is the
notion that all of our predictions have a high degree of confidence and YET THEY STILL CAN
BE WRONG. Recall the Literary Digest Poll of 1936 article presented earlier in the class (Unit
3). In order to make a prediction that is guaranteed to be correct, we need to make a prediction
like that presented in statement 1. Most of the time we would be satisfied to be right 95% of the
time, accepting that we will be wrong 5% of the time, especially if our predictions have a small
enough margin of error associated with them to be meaningful (statement 2).

How often would you be willing to be wrong? Within the scientific community the typical error
rate is pretty much universally set at 5%. However, in making life and death decisions we might
want our error rate to be only 1%; but note that as we drive our error rate toward 0% we will get closer
and closer to statement 1. This discussion will be continued in the next unit; however, for the
convenience of presentation throughout the remainder of this course the error rate will be set at
5%.

Throughout this discussion I have really talked about two different kinds of estimates. Both are
answers to the question, “what is the true proportion of adults in our nation who approve of the
way that the president is handling his job?” The two answers were 47%, and between 44% and
50%.

The first answer is a single value which is the estimated proportion obtained from our sample
and is called a point estimate.

The second answer is a range of values which is centered around our point estimate and takes
into consideration the size of our sample. It is called an interval estimate or a confidence
interval.

Confidence intervals are calculated in a great number of ways depending upon what is being
estimated; however, ALL OF THEM HAVE THE SAME CONCEPTUAL INTERPRETATION
AS PRESENTED ABOVE.

Thus I am not going to do any of their calculations beyond the concept of the margin of error in
this course. For us then the 95% confidence interval is equal to the

Point estimate ± margin of error

157
Quiz 1
Here is a question taken from this same USAToday/CNN/Gallup Poll as previously presented in
this unit.

If Massachusetts Senator John Kerry were the Democratic Party's candidate and George W. Bush
were the Republican Party's candidate, who would you be more likely to vote for?

The result of this question asked to 1,002 National Adults aged 18+ was

Bush 48.5%
Kerry 51.5%

1. What is the point estimate for Bush? What does it mean?

2. What is the point estimate for Kerry? What does it mean?

3. What is the margin of error?

4. What is the 95% confidence interval for Bush? What does it mean?

5. What is the 95% confidence interval for Kerry? What does it mean?

6. If it takes 50% or more of the votes to win, then based on this poll who do you think will win?

Bush, Kerry, or too close to call

158
1. 48.5% It would seem that 48.5% of the adults in our nation favor Bush over Kerry.

2. 51.5% It would seem that 51.5% of the adults in our nation favor Kerry over Bush.

3. margin of error = 1 / square root of the sample size = 1 / 31.65 = .032

4. 48.5% ± 3.2% or 45.3% to 51.7%. We are 95% confident (certain) that the true percentage
of adults in our nation who favor Bush over Kerry is somewhere between 45.3% and 51.7%.

5. 51.5% ± 3.2% or 48.3% to 54.7%. We are 95% confident (certain) that the true percentage
of adults in our nation who favor Kerry over Bush is somewhere between 48.3% and 54.7%.

6. Too close to call. Why? I will provide a very extensive answer to this question, which is
very relevant and very important.

Here are the two confidence intervals side by side.

Bush      45.3% to 51.7%
Kerry     48.3% to 54.7%

Notice that the two confidence intervals overlap. The values between 48.3% and 51.7% are
contained in both intervals. What does this mean? The confidence interval for Bush can be
interpreted as indicating that we are 95% confident that the true percentage of people in favor of
Bush is between 45.3% and 51.7%. Thus, it would be possible for the true value in favor of Bush
to be 50%, which is between 45.3% and 51.7%. The confidence interval for Kerry can be
interpreted as indicating that we are 95% confident that the true percentage of people in favor of
Kerry is between 48.3% and 54.7%. Thus, it would be possible for the true value in favor of
Kerry to be 50%, which is between 48.3% and 54.7%. This means that it would be possible for
Bush and Kerry to share exactly the same percentage of support. This is why the correct answer
to problem 6 is too close to call.

Although what follows is not part of question 6, let's take a moment to expand this problem
somewhat.

If Bush's point estimate was 54.5%, then what would be his 95% confidence interval?

54.5% ± 3.2% or 51.3% to 57.7%

In contrast, if Kerry's point estimate was 45.5%, then what would be his 95% confidence
interval?

45.5% ± 3.2% or 42.3% to 48.7%

159
In our expanded problem we are 95% confident that the true percentage of people in favor of
Bush is between 51.3% and 57.7%, and we are 95% confident that the true percentage of people
in favor of Kerry is between 42.3% and 48.7%. Based on these results, who do you think will
win?

Bush. Why?

Here are the two confidence intervals side by side.

Kerry     42.3% to 48.7%
Bush      51.3% to 57.7%

Notice that these two confidence intervals do not have any overlap. The highest possible value in
Kerry's confidence interval (48.7%) is BELOW the lowest possible value in Bush's confidence
interval (51.3%). Since there is no overlap in the confidence intervals, we are now justified in
concluding that the candidate with the higher confidence interval values is 95% likely to win
over the candidate with the lower confidence interval values.

Here is a summary of these two situations.

When using confidence intervals for two conditions (here two candidates) we can conclude the
following:

1. the two conditions are not different (we make this conclusion when the confidence intervals
overlap)

2. one of the conditions is bigger than the other (we make this conclusion when the confidence
intervals do not overlap)

In our expanded problem the confidence intervals do not overlap and all of the values in Bush's
interval are above 50%, so we can further conclude that Bush will win (since all of the values are
> 50%). In this case we would predict that we are 95% confident that Bush would win the
upcoming election based on the results of this poll.

Although the entire discussion up to this point has been of the first type of confidence
interval, as presented at the beginning of this unit, the general definition of a confidence
interval as the point estimate ± margin of error still holds true for the following two types
of confidence intervals.

160
Confidence Interval for the Sample Proportion
In the popular media form of the confidence interval that you have just gone over the margin of
error was defined as

1 / √n

Although this definition is rather simple and typically fairly accurate, it is in fact not statistically
precise. The appropriate margin of error for the sample proportion is actually defined as

( Z / 2 )   p (1 p) / n
ˆ     ˆ

where

p̂ is the estimate of the sample proportion = frequency / sample size = f / n

n is the sample size

Z  / 2 is the symbolic way of referring to the value we obtain from the Standard Normal Table
(Unit 21). In a 95% confidence interval this value will be 1.96

With the definition of the 95% confidence interval being the point estimate ± the margin of
error, then the 95% confidence interval for the sample proportion becomes

p  (Z / 2 )
ˆ                 p (1 p) / n
ˆ     ˆ

As a small digression, note that a rough approximation of the margin of error above when p̂
is near .5 becomes

1.96 times square root [(.5)(1-.5)/n]

= 1.96 times square root [.25/n]

= (1.96)(.5) times square root of (1/n)

Since (1.96)(.5) = .98, which is approximately equal to 1, then our margin of error simplifies to

1 / √n

which is the margin of error used in the main body of this unit (the popular media form).

End of this small Digression
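You can see how close the approximation is for this unit's sample size with a quick Python check (my own sketch, assuming p̂ = .5 as the digression does):

```python
from math import sqrt

n = 1002
precise = 1.96 * sqrt(0.5 * (1 - 0.5) / n)  # statistically precise margin of error at p-hat = .5
approx = 1 / sqrt(n)                        # popular-media shortcut
print(round(precise, 4), round(approx, 4))  # the two differ by well under a tenth of a percentage point
```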

161
Example - Problem Scenario - A Poll was taken of 1010 U.S. employees. The employees
sampled were asked whether they call in sick at least once a year when they simply need to relax;
202 responded yes. What is the 95% confidence interval for the proportion of employees who
falsely call in sick each year?

The sample size (n) = 1010

The frequency who falsely call in sick each year is 202

The point estimate for the proportion who falsely call in sick is the relative frequency

p̂ = f / n = 202/1010 = .20

The margin of error is (Z(α/2)) √[p̂ (1 – p̂) / n]

= (1.96) [square root {(.20)(1-.20)/1010}]

= (1.96) [square root {(.20)(.80)/1010}]

= (1.96) [square root {.16/1010}]

= (1.96) [square root {.000158}]

= (1.96) [.013] = .025

Confidence interval is    p̂ ± (Z(α/2)) √[p̂ (1 – p̂) / n]

= .20 ± .025

which produces       .20 - .025 = .175

and                .20 + .025 = .225

So the confidence interval is .175 to .225

What does this confidence interval mean?

We are 95% confident that the percentage of all U.S. employees who falsely call in sick each
year is between 17.5% and 22.5%.
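The whole proportion confidence interval calculation can be sketched in Python (the function name is my own; 1.96 is the 95% value of Z(α/2)):

```python
from math import sqrt

def proportion_ci(f, n, z=1.96):
    """95% C.I. for a sample proportion: point estimate +/- margin of error."""
    p_hat = f / n                            # point estimate
    moe = z * sqrt(p_hat * (1 - p_hat) / n)  # margin of error
    return p_hat - moe, p_hat + moe

low, high = proportion_ci(202, 1010)
print(round(low, 3), round(high, 3))  # 0.175 0.225, matching the worked example
```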

162
Quiz 2 -     I am interested in the use of cell phones while driving. What proportion of accidents
occur while the driver is engaged in a conversation on a cell phone? I contacted my automobile
insurance company and asked for a sample of 100 randomly selected traffic accident records for
my area. [By the way, all insurance companies have this type of information.] From these 100
records I found that 58 involved a driver using a cell phone. From this information

1. Calculate and interpret the appropriate point estimate

2. Calculate and interpret the 95% confidence interval using the more statistically precise
equation from this section

Question 1

The sample size (n) = 100

The frequency of accidents involving a driver on a cell phone is 58

The point estimate for the proportion of accidents involving a driver on a cell phone is the
relative frequency

p̂ = f / n = 58/100 = .58

Interpretation of this point estimate. 58% of accidents in my insurance area involve the use of a
cell phone.

Question 2

The margin of error is (Zα/2) times square root of [p̂(1 - p̂)/n]

= (1.96) [square root {(.58)(1-.58)/100}]

= (1.96) [square root {(.58)(.42)/100}]

= (1.96) [square root {.24/100}]

= (1.96) [square root {.0024}] = (1.96) [.049] = .096

Confidence interval is    p̂ ± (Zα/2) times square root of [p̂(1 - p̂)/n]

= .58 ± .096

which produces .58 - .096 = .484

and             .58 + .096 = .676

So the confidence interval is .484 to .676

What does this confidence interval mean (interpretation)?

We are 95% confident that the percentage of driving accidents in my insurance area involving a
cell phone is between 48.4% and 67.6%

Confidence Interval for the Sample Mean
In this section you will be exposed to the confidence interval for the mean. Although we are
estimating a different parameter (the mean) than we have throughout all of the preceding portion
of this unit (the proportion), the concept of the confidence interval does not change. In fact, the
definition of the confidence interval remains as

point estimate ± the margin of error

Naturally since we are now estimating the mean, we can expect the point estimate and the
margin of error to be different from before. In the case of the confidence interval for the sample
mean, the point estimate is the sample mean (xbar) and the margin of error is defined as

(tν,α/2) times [s / square root of n]

where

s    is the sample standard deviation

n    is the sample size, and

t , / 2 is the symbolic way of referring to the value we obtain from the T distribution, which is
presented in Unit 21.

This value involves a concept of degrees of freedom (ν) which we will not see until Unit 13 and
the T distribution which we will not see until Unit 15. At this point in the course the t-value will
be directly given to you in order to make the calculations for the confidence interval for the
mean. By the end of the course you should be able to find these t-values on your own.

The 95% confidence interval for the mean is defined as point estimate ± the margin of error,
which is

x̄ ± (tν,α/2) times [s / square root of n]

Example - Problem Scenario - How much chicken does an average person in the United States
consume each year? To answer this question I took a sample of 16 adults in Cheyenne Wyoming
and asked them to record how much chicken (measured to the nearest pound) they ate in the next
year. Here are the data

Person        1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16

Chicken      47  59  53  39  53  51  62  55  50  46  72  50  65  45  70  63


Question 1

The appropriate point estimate is the mean = (47 + 59 + ... + 70 + 63)/16 = 880 / 16 = 55.00

Interpretation of the mean - the average adult person in Cheyenne (possibly the United States)
eats 55 pounds of chicken each year

Question 2

Even though this will not make sense until after Unit 15,

I will tell you that for this problem ν = n - 1 = 16 - 1 = 15 and that the t-value (tν,α/2) = 2.131

The standard deviation, s, is

square root of [SUM of the squared deviations divided by (n-1)]

= square root of [1318 / 15] = square root of [87.867] = 9.37

Very nicely here, the square root of n = square root of 16 = 4

Thus, the 95% confidence interval for the sample mean is

x̄ ± (tν,α/2) times [s / square root of n]

= 55.00 ± (2.131) [9.37 / 4]

= 55.00 ± (2.131) [2.34]

= 55.00 ± 4.99

And the confidence interval is

55.00 - 4.99 = 50.01 and

55.00 + 4.99 = 59.99

Interpretation of the confidence interval

We are 95% confident that the average adult in Cheyenne eats between 50.01 and 59.99 pounds
of chicken each year.
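As a sketch, the same interval can be computed in Python, carrying full precision rather than the rounded intermediate values (the function name is mine; the t-value is the one supplied in the example):

```python
import math

def mean_ci(data, t):
    """95% confidence interval for a mean: xbar ± t * s / sqrt(n).

    The t-value must be supplied directly (as it is in this unit)."""
    n = len(data)
    xbar = sum(data) / n
    # sample standard deviation: sqrt of the sum of squared deviations over (n - 1)
    s = math.sqrt(sum((x - xbar) ** 2 for x in data) / (n - 1))
    moe = t * s / math.sqrt(n)
    return xbar - moe, xbar + moe

chicken = [47, 59, 53, 39, 53, 51, 62, 55, 50, 46, 72, 50, 65, 45, 70, 63]
low, high = mean_ci(chicken, t=2.131)
print(round(low, 2), round(high, 2))  # → 50.01 59.99
```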

A pretty good discussion of this concept can be found in Statistics: Concepts and Controversies,
3rd Edition by David S. Moore. This discussion is provided for your examination.

Confidence Statements
The results of a Simple Random Sample (SRS) are random in the special sense in which
statisticians use that word: the outcome of a simple sample is unpredictable but there is a definite
pattern of results in the long run. The results of tossing a coin are also random, as are the sexes of
human or animal offspring. Random phenomena are described by the laws of probability, which
allow us to calculate how often each outcome will occur in the long run. … The most important
reason for the deliberate use of chance in collecting data is that we can then apply the laws of
probability to draw conclusions from the data.

One form of statement based on probability is so commonly used to describe the accuracy of
opinion polls and other samples that every newspaper reader and TV viewer should be familiar
with it. Often the statement is casual. For example, a news report on the shifts in public attitudes
in response to a presidential speech said,

The poll, conducted by Gordon Black Associates of Rochester, N.Y., is based on
714 interviews with adults randomly selected from across the USA. Results before
the speech have a margin of error of 5.5 percent. Post-speech results have a 5
percent margin of error. (From USA Today, November 1, 1983)

That 5 percent margin of error describes the accuracy of the poll, but it leaves much unsaid. As
connoisseurs of data, we want to know the whole story.

[Here is another example] In November 1989, the Gallup Poll asked 1234 adults, "Do you expect
that the overall quality of your life will be better by the year 2000?" Here the parameter of
interest is the proportion (usually associated with the letter p) of all U.S. adults who expect life
to be better in the future. In the sample, 950 answered "Yes." The statistic used to estimate p is
therefore
estimate of p = (# of people who said yes) / (number of people sampled) = 950 / 1234 = .77

Gallup statisticians, from their knowledge of the sampling distribution of the estimate of p,
describe the precision of the statistic as follows: In 95% of all possible samples, the statistic
(estimate of p) will take a value within ± 4% of the true proportion. The news article about
the poll says this in less technical language: "The poll had a margin of sampling error of plus or
minus four percentage points." (Reported in the New York Times, January 1, 1990)

[The above example uses a statement of precision about our estimate of the true proportion.] This
statement of precision has two parts: a level of confidence (95%) and a margin of error (± 4%).
The level of confidence says how often in the long run the margin of error will be met. We could
demand higher confidence (99%), or settle for lower confidence (90%). But we cannot achieve
100% confidence. … The news report left out the level of confidence that should accompany its
margin of error. A naïve reader might imagine that the sample proportion is certain to be within
± 4% of the true proportion. The connoisseur knows that polling organizations almost always
make 95% confidence statements, and fills in the missing confidence level.

A confidence statement turns a long-run fact about the sampling distribution of a statistic
into a statement of our confidence in the result of a simple sample. The usual form of a 95%
confidence statement about a parameter estimated by an unbiased statistic is

With 95% confidence, the parameter lies in the range

Statistic ± margin of error

Unit 11: Hypothesis Testing
Terms
Hypothesis - a speculation (statement) we make about a population

Null Hypothesis - a statement about our population that we are willing to assume is true or
correct

Alternative Hypothesis - a statement about our population that we would like to demonstrate is
true or correct

Five Basic Steps of Hypothesis Testing

Step 1 - Determine the Hypotheses (null and alternative)

Step 2 - Identify the Test Statistic

Step 3 - Calculate the value of the Test Statistic from the data

Step 4 - Compare the value of the Test Statistic to the Critical Value

Step 5 - Make a decision from the information in Step 4

Test Statistic - collects information from the sample in support of the alternative hypothesis

Critical Value - the point where there is sufficient evidence for the alternative hypothesis so that
we no longer can hold onto our assumption about the null hypothesis being correct

Significant - the decision to reject the null hypothesis

Type I error - the conclusion of the alternative hypothesis when the null in actuality is correct

Type II error - the conclusion of the null hypothesis when the alternative in actuality is correct

Alpha - the probability associated with the Type I error

Level of Confidence - the probability of the conclusion of the null hypothesis when in actuality
the null is correct = 1 - α

Power - the probability of the conclusion of the alternative hypothesis when in actuality the
alternative is correct

Hypotheses
Hypothesis testing is quite simply the testing of a hypothesis. Logically this definition can be
reduced to two issues, hypothesis and testing.

What is a hypothesis?

A hypothesis is a foundation, a supposition, or an unproved theory. (Webster's New World
Dictionary of the American Language, College Edition).

Statistically, a hypothesis is a statement about a population.

Given the development of this course around the notion that statistics is the art and/or science of
making decisions and predictions about populations given samples, it is easy to see that this
second definition fits well within our context. This should also give you a hint about the second
issue, testing. As indicated in Unit 3, unless we sample the entire membership of the population
(conduct a census), we will never know with absolute certainty the values of the parameters of
that population. Thus, in statistics, we are always trying to make predictions and decisions about
populations from the samples that we have taken. In addition, we are more specifically trying to
make predictions and decisions about the parameters of those populations from the statistics
obtained from our samples. Therefore, since a hypothesis is a statement about a population, it
must be a speculation. The viability of this speculation must be considered in light of data we
obtain from collecting a sample from this population. The determination of the viability of such
speculations (hypotheses) is called testing. Thus, through the use of sample data, we attempt to
ascertain whether a speculation we have made about a population (hypothesis) is true or not
(test). Since the test rests upon the sample we have collected, the test is therefore not perfect and
only reflects the true veracity about our speculation as our sample truly represents the population
from which it came.

In summary, a hypothesis is a speculation we make about a population that is tested for its merit
through the collection and analysis of data obtained from a sample from this population.

Hypotheses come in two forms. There is a null version of the hypothesis and there is an
alternative version.

The null hypothesis (symbolically labeled H0) is a statement about our population that we are
willing to assume is true or correct. This actually corresponds well with one of the definitions in
the Webster's New World Dictionary referred to above. A working hypothesis is a supposition
which is tentatively accepted to explain certain facts to provide a basis for further investigation.

The alternative hypothesis (symbolically labeled HA) is a statement about our population that
we would like to demonstrate is true or correct.

The null and alternative hypotheses are set in opposition to one another, such that by using the
information (data) in our sample, we would like to be able to conclude that we have enough
evidence to ascertain which of the two hypotheses is more likely to be correct. Does the evidence
(data) indicate that our initial speculation (null) is more likely to be correct or does it indicate
that alternative speculation is more likely to be correct? This is the essence of testing.

Five Basic Steps of Testing
While there is great disagreement about how many steps are involved in testing (some texts
indicate four, others five, and still others six), nearly everyone is in agreement about the material
contained in them regardless of number. Therefore, let's consider the five steps below from the
perspective of their content, rather than their number.

Step 1: Determine the null hypothesis and the alternative hypothesis. In the area of
hypothesis testing this is obviously the beginning. We need to have speculations (2, one null and
one alternative).

Step 2: Specify the alpha (this term and its use will be explained later in this unit; however, in
this course it will always be .05) and identify the appropriate test statistic.

Step 3: Collect and summarize the data and calculate the test statistic. This is essentially old
hat to us, because it deals with the collection of an appropriate sample from the population
(sampling, Unit 3) and the appropriate summarization of the data (several units). The new part is
the test statistic.

What is a test statistic? Given what we have had so far in this unit, this definition should be
relatively simple to concoct. A statistic is a piece of information we determine from a sample. A
test statistic is therefore a piece of information we determine from a sample that enables us to
conclude (test) whether the null or the alternative is more likely correct. (That is the general idea.)

However, more specifically the test statistic extracts information from our sample about the
viability of the alternative hypothesis ONLY, such that if we have sufficient information (beyond
the threshold, see the Legal System illustration below) that the alternative is true, then we may
reject the null hypothesis and conclude the alternative. If we do not have sufficient information
(not enough information to conclude the alternative), then we fail to reject the null hypothesis
and in effect conclude the null. The test statistic collects information from the sample in support
of the alternative hypothesis.

Step 4: Determine how unlikely the test statistic is if the null hypothesis is true. This step is
where we meet the complexity of hypothesis testing head on and is related to the presentation in
the previous unit on confidence intervals. In this class step 4 will only be the comparison of the
test statistic to the critical value. I will approach the explanation of this step through the
discussion below entitled The Legal System.

Step 5: Make a decision. What does the result of step 4 mean in the context of the problem that
we are examining?

Often when we conclude the alternative hypothesis, we state that our findings or results
were significant or determined to be significant.

In contrast, when we conclude the null hypothesis, we state that our findings or results
were insignificant or determined to be insignificant.

Thus, when you see the words significant and insignificant in a magazine or newspaper article,
or hear them on TV, it is almost always the case that they are being used in the context of finding
for the alternative or the null hypothesis in some study or experiment that was conducted.

Summary of these 5 steps from the perspective of H0 and HA.

- Determine H0 and HA

- Collect data

- Use the test statistic to extract information from the data in support of HA (note in support of
HA, not H0)

- If enough information in support of HA can be found, then we will conclude that HA is true.
When we conclude HA we say that our findings are significant.

- If not enough information in support of HA can be found, then we will fail to conclude that HA
is true (although not statistically correct, we often indicate in this instance that we conclude that
H0 is true). When we fail to conclude that HA is true we say that our findings are not significant.

The Legal System
Some of the points that I make below about our legal system have been somewhat simplified and
corrupted for clarity of presentation.

Within the American legal system, a person is assumed to be innocent unless s(he) can be proven
guilty. Within the context of our discussion this situation very nicely breaks down into our null
and alternative hypotheses. The null hypothesis is our statement that we are willing to assume is
true or correct, that this person is innocent. The alternative hypothesis is our statement that we
would like to demonstrate is true or correct, that this person is guilty. During the trial, evidence
is presented by various people specific to the charges being levied against the defendant. Please
note that the symbol of the American legal system is a blindfolded woman holding a balance
scale. The figure below represents this balance (I have deleted the woman).

A greatly simplified but accurate depiction of the legal system is that a court case before a jury
works like the following.

First, evidence is presented.

The prosecuting attorney attempts to make the scale look like this.

The defense attorney attempts to make the scale look like this.

In the end, after all of the evidence has been presented, the scale can look like almost anything in
between the two figures presented immediately above.

Before the case goes before the jury, the judge gives some instructions to the jury. Once again, a
greatly simplified but fairly accurate rendering of these instructions might be the following. It is
not your duty to determine that the defendant IS or IS NOT guilty, but to determine that
sufficient evidence has been presented to ascertain beyond reasonable doubt that the defendant
is guilty. Often the defense attorney's only strategy is to cast doubt upon the prosecution's
evidence, thus placing a serious obstacle in the path of the jury being able to conclude beyond
reasonable doubt. Let's place the concept of reasonable doubt into our balance scale picture
through the use of a dashed line.

In this last figure, we see the basic components of hypothesis testing. The two hypotheses are
present, one on each side of the balance. (Step 1). The evidence is collected and depresses the
scale on one side or the other (the tilt of the scale represented by the solid line is the indicator of
the test statistic, Step 3). The figure depicts the relationship of the dashed line to the solid line,
and based on where the solid line is in relation to the dashed line a decision is made (Step 4).
What does our decision mean? (Step 5)

When the jury considers the evidence amassed throughout the trial, the balance scale of a
particular case might look like the figure below. In this situation, the evidence presented does not tip
the scale below the reasonable doubt threshold. Therefore, the jury must conclude not guilty or
that not enough evidence was presented to reach a decision of guilty.

In contrast, the balance scale of a particular case might look like the figure below. In this
situation, the evidence presented does tip the scale beyond the reasonable doubt threshold.
Therefore, the jury would conclude that this person has been found to be guilty (based on the
evidence provided).

In statistics, we call the reasonable doubt threshold the critical value. The critical value can be
defined as the point where there is sufficient evidence for the alternative hypothesis so that
we no longer can hold onto our assumption about the null hypothesis being correct.
Therefore in the situation where our test statistic (solid line) reflects insufficient evidence in
relation to our critical value,

we must conclude that there is insufficient evidence to reject our null hypothesis and must
therefore accept the null as being acceptable (the person is not guilty).

However, in the situation where our test statistic (solid line) reflects sufficient evidence in
relation to our critical value,

we must conclude that there is sufficient evidence to reject our null hypothesis and must
therefore accept the alternative as being acceptable (the person is guilty).

Now for the fly in the ointment. The legal system is not perfect and neither is statistics. Hence,
mistakes will sometimes be made.

In the legal system this means two things. First, that people who are truly innocent are declared
guilty and second, that people who are truly guilty are declared not guilty. Both of these are
regrettable situations, but they do occur. Let's translate both of these situations into statistics. If
innocent is the null and guilty is the alternative, then the first mistake (error) is the conclusion of
the alternative hypothesis when we should have concluded the null. This is called the type I
error (the rejection of the null when in fact it is true). This particular mistake is often called the
false positive (an innocent person is convicted). In statistical jargon the probability of this error is
identified by the Greek letter alpha (α). The second mistake (error) is the conclusion of the null
hypothesis when we should have concluded the alternative. This is called the type II error
(the acceptance of the null hypothesis when in fact it is false). This particular mistake is often
called the false negative (a guilty person is declared innocent). In statistical jargon the
probability of this error is identified by the Greek letter beta (β).

Are these two mistakes, false positive and false negative, of equal importance? Is it a more
serious mistake to let a guilty person go free or to incarcerate an innocent person? If you are
thinking that it is a more serious ethical problem to incarcerate an innocent person, can I fix the
system such that it is impossible to make this mistake? If you look at the first row of the table
below (the actual truth is that the person is innocent), you will notice that we can make the
second column (false positive decision) disappear if we set α = 0. Notice that by making the
second column disappear we have eliminated the guilty decision by the jury. Hence, in order to
eliminate the possibility of a false positive mistake (the incarceration of an innocent person) we
must always conclude that the person is innocent no matter what evidence is presented. Since
this is unacceptable, we must tolerate some level of possibility of making the false positive
mistake (type I error), but let's make this probability as low as possible or reasonable.
Statistically we have traditionally set this value at 5%. In the terms of the legal system example
this means that we would only want to incarcerate an innocent person 5% of the time. If this
seems too high to you, then you would have to set α even smaller, such as 1% or .5%, etc. But in
order to do this you correspondingly increase the amount of evidence that is necessary to
convict; hence you will be letting more guilty people go free.

Notice that there are also two correct decisions in the table below. The first is the conclusion that
a person is innocent (H0) when in fact the person is innocent (H0) and this is called the Level of
Confidence. The second is the conclusion that a person is guilty (HA) when in fact the person is
guilty (HA) and this is called Power.

These correct and incorrect decisions can then be collected and placed in a single table. This
important table appears below.

Decision Table

                                           Decision
                           Not Guilty                   Guilty
                           (H0 is true,                 (H0 is false,
                           HA is false)                 HA is true)

Actual    Innocent         Correct Decision             Incorrect Decision
Truth     (H0 is true,     1 - α                        α
          HA is false)     Level of Confidence          Type I Error
                                                        (Level of Significance)
                                                        False Positive

          Guilty           Incorrect Decision           Correct Decision
          (H0 is false,    β                            1 - β
          HA is true)      Type II Error                Power
                           False Negative
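The meaning of α can also be seen by simulation. The sketch below (my own illustration, using the two-sided one sample proportion Z test that the next unit introduces) draws many samples from a population where the null hypothesis really is true; with the critical value set at 1.96, roughly 5% of the samples still produce a "guilty" verdict, and those rejections are exactly the Type I errors:

```python
import math
import random

random.seed(1)  # fixed seed so the run is reproducible

def z_stat(f, n, p0=0.5):
    """One sample proportion test statistic (see Unit 12)."""
    phat = f / n
    return (phat - p0) / math.sqrt(p0 * (1 - p0) / n)

trials, n = 10_000, 100
rejections = 0
for _ in range(trials):
    # Draw a sample of 100 where H0 (p = .5) is actually TRUE.
    f = sum(random.random() < 0.5 for _ in range(n))
    # Two-sided test: reject the null when |Z| exceeds the critical value 1.96.
    if abs(z_stat(f, n)) > 1.96:
        rejections += 1

rate = rejections / trials
print(rate)  # close to alpha = .05; every one of these rejections is a Type I error
```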

Unit 12: One Sample Proportion Test
Before considering the topic of this unit, the following Test Situation overview is presented to
provide some guidance and context for this and all of the remaining units.

Test Situation Overview
From here to *************** is very important !!!!!

This special note is for Units 12, 13, 14, 15, 16, 17, 18, 19, and 20. From this unit until the end of
the course we are going to examine one test situation after another. There are many complex
issues that are presented in an Introductory Statistics course, and two of the most challenging
will be presented in this special note. In essentially the second half of the semester, 9 different
test situations will be presented. The first challenging issue is how to determine which of these 9
tests is appropriate for any given problem. The second challenging issue is whether a specific
question or a general question is being asked. We saw in Unit 11 that the question being asked is
called the alternative hypothesis. Each unit from 12 to 20 presents a different test situation. Each
of these test situations is summarized and presented in a comparative form in the link entitled,
"Summary Help Section." The link to the Summary Help Section is provided in Units 12, 13, 14,
15, 17, 18, 19, and 20.

Challenging Issue 1 (Determination of the Appropriate Test Situation)

In order to determine the appropriate test situation 3 questions MUST be answered. These are

Question 1: How many variables are mentioned in the problem scenario?

Question 2: What is the level of measurement of each variable?

Question 3: How many samples are mentioned in the problem scenario?

These 3 questions will be specifically answered for each of the 9 test situations presented in this
course. Each test situation has a unique set of answers. Thus, by answering these 3 questions you
will be led to the specific test situation which is appropriate. The Summary Help Section is
organized by the answers to these 3 questions.

Challenging Issue 2 (Is the form of the alternative hypothesis (the question expressed in the
problem scenario) specific or general?)

Here is a little chart to print off or memorize. In this chart the 9 test situations for this class are
presented. Five of the tests have both a specific and a general form. For these 5 tests (those
presented in Units 12, 15, 17, 18, and 19) this second challenging issue is relevant. However, 4
of the tests (those presented in Units 13, 14, and 20) only have a general form. This second
challenging issue does NOT have to be addressed for them.

Test Situation                         General Form        Specific Form

One Sample Proportion (Unit 12)             Yes                Yes

Multinomial (Unit 13)                       Yes                 No

Homogeneity (Unit 14)                       Yes                 No

Independence (Unit 14)                     Yes                  No

Correlation (Unit 15)                       Yes                Yes

Regression (Unit 17)                        Yes                Yes

Independent Samples t-test (Unit 18)        Yes                Yes

Matched Samples t-test (Unit 19)            Yes                Yes

One-way Analysis of Variance (Unit 20)       Yes                No

How can we see the difference between the specific alternative and the general alternative?

First, in each of the units which present test situations that have both specific and general
alternatives, this issue will be addressed, so look for it in the discussion.

Second, in a generic sense a specific alternative is one in which we are looking for a specific
outcome. For instance, if we have two samples does our question (alternative hypothesis) ask
about the mean of one of the samples being LARGER than the other sample? We can replace the
word in capital letters in the previous sentence with such specific key words as smaller, bigger,
greater, less, etc. Notice that these words all express one specific outcome. In contrast, in a
generic sense a general alternative is one in which we are looking for any outcome. A key word
to look for in this situation is different. However, sometimes we might recognize the existence of
a general alternative simply by noting that none of the specific key words are mentioned. This is
simply a generalization. The important point is to read VERY CAREFULLY the question being
asked in the problem scenario. Does the question limit us to a SINGLE outcome? If yes, then the
question leads us to a specific alternative hypothesis. Or does the question provide us with more
than one outcome being possible? If yes, then the question leads us to a general alternative
hypothesis. (Remember, this second challenging issue applies only to the tests presented in Units 12,
15, 17, 18, and 19.)

***************

Now for the Presentation of the
One Sample Proportion Test
Terms
One Sample Proportion Test Situation - When you have only one sample and only one
categorical variable (which has only two possible outcomes)

Distribution of the Test Statistic - Standard Normal (Z)

p = the proportion specified in the null hypothesis

p̂ = the proportion estimated (predicted) from the sample

Equations

p p
ˆ
Test Statistic   Z 
p (1  p )
n
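A minimal sketch of this equation in Python (the counts below are hypothetical, chosen only to illustrate; note that the denominator uses the null proportion p, not p̂):

```python
import math

def one_sample_proportion_z(f, n, p0):
    """Z = (phat - p0) / sqrt(p0 * (1 - p0) / n).

    f  = observed frequency, n = sample size,
    p0 = the proportion specified in the null hypothesis."""
    phat = f / n
    return (phat - p0) / math.sqrt(p0 * (1 - p0) / n)

# Hypothetical example: 90 female children out of 160, testing H0: p = .5
print(round(one_sample_proportion_z(90, 160, 0.5), 2))  # → 1.58
```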

One Sample Proportion
Initial Problem Scenario: Are the children of presidents more likely to be female or male?
In this very simple problem scenario, what are the answers to the 3 questions for determining the
appropriate test situation?

Question 1. How many variables are there? There is only one; the gender of the children.

Question 2. What is the level of measurement of the variable gender? Categorical with only two
possible outcomes; male and female

Question 3. How many samples are there? Only one sample of the children.

If the answers to these 3 questions are not clear, then contact me.

A key in the determination of the number of samples is the following. If two or more samples are
specified, then you will see this in the problem scenario. Here is an example scenario in which
two samples are mentioned.

Alternative Scenario: I would like to answer the following question, "are the children of
presidents more likely to be female or male?" In order to answer this question, I took one random
sample of 10 republican presidents and another random sample of 10 democratic presidents.

Notice that in this alternative scenario two samples are specifically mentioned, one of republican
presidents and one of democratic presidents. In the Initial Problem Scenario above, notice that
there is no mention of samples at all. When no samples are mentioned, then it is safe to assume
that there is only one sample.

Putting these answers together for the Initial Problem Scenario we get

When you have only one sample and there is only one categorical variable (which has only
two possible outcomes), then the test situation is the One Sample Proportion.

Having determined that the appropriate test situation is the One Sample Proportion, then the next
thing to determine is whether the question mentioned in the problem scenario is specific or
general.

Question 4. General or specific question? General.

How did I determine this answer? Below are three questions, two are specific and one is general.
Which are specific and which is general?

A. Do presidents have more female children than male children?

B. Do presidents have more male children than female children?

C. Do presidents have more female or male children?

"A" and "B" are specific questions. Why? Remember that specific questions limit us to only a
single outcome. In "A" we are looking at ONLY the outcome in which there are more females
than males. In "B" we are looking at ONLY the outcome in which there are more males than
females.

"C" is the general question. Why? Because there are two possible answers to this
question. Presidents having more female children is an answer to this question;
however, presidents having more male children is also an answer to this question. The key word
to look for here is "or." "Or" immediately implies more than one outcome.

Continuing

Recall from the previous unit on hypothesis testing that hypotheses come in pairs. We will need
to specify a null and an alternative hypothesis. The null hypothesis is a statement we are willing
to establish as the reference point; a statement about our population that we are willing to assume
is true or correct. The alternative hypothesis is a statement about our population that we would
like to demonstrate is true or correct. If we are trying to demonstrate that there are more girls or
that there are more boys, then the appropriate null hypothesis would be that there is an equal
number of each. A key here is that the null hypothesis for almost any test situation is nearly
always no difference or no relationship.

In the One Sample Proportion test the hypotheses are always written in terms of the
proportions.

The null hypothesis therefore in English would appear as, "the proportion of females born to
presidents is no different from the proportion of males born to presidents." What would that look like
in statistical symbols? If we use the letter "p" to designate proportion, pfemales = proportion of
females, and pmales = proportion of males, and the symbol H0 to designate the null hypothesis,
then we have

H0 : pfemales = pmales

And we know that the sum of all possible probabilities, relative frequencies, or proportions must
always equal 1 (Rule 3 from Unit 8). Thus

pfemales + pmales = 1

And since we are speculating in the null hypothesis that the proportion of females and males is
the same, then

pfemales = pmales = ½ = .5

Thus we can write the null hypothesis in 3 different ways which all say the same thing

Way 1          H0 : pfemales = pmales

Way 2          H0: pmales = .5

Way 3          H0: pfemales = .5

Which of these 3 ways is best? All 3 are equivalent, but Way 2 or Way 3 is generally
considered better in the use of the One Sample Proportion test. Does it matter if we use Way 2 or
Way 3? No, but for any particular problem either Way 2 or Way 3 might be more convenient.
This will be determined problem by problem, but I have selected Way 2 for a reason that will be
explained later.

Now for the alternative hypothesis. Although we know that the alternative for the problem
scenario that I specified above is general, I will write the three possible alternatives (A,B,C
above) so that you can see what each would look like.

A. If I am looking for more females than males then the alternative should be that the proportion
of females is greater than .5

HA: pfemales > pmales   or

HA: pfemales > .5        If you use Way 3, then this should be HA

B. If I am looking for more males than females then the alternative should be that the proportion
of males is greater than .5

HA: pmales > pfemales   or

HA: pmales > .5          If you use Way 2, then this should be HA

C. If I am looking for either more females or more males then the alternative should be that the
proportion of males is unequal to .5. I am looking for either result; the proportion of
males greater than .5 (more males) or the proportion of males less than .5 (more females).

HA: pmales  pfemles     If you use Way 1, then this should be HA

HA: pmales  .5          If you use Way 2, then this should be HA

In this case I am looking for the general alternative (C immediately above). Our hypotheses now
look like

H0: pmales = .5

HA: pmales  .5

Using the 5 step outline for testing presented in Unit 11, we now have the answers to the
hypotheses (Step 1) and the test statistic (Step 2); and all of the hard work is over. All we have to
do now is work through the calculations (Step 3) which will quickly lead us to the answers for
Steps 4 and 5.

In a one sample proportion test, the test statistic is the standard Normal and is defined as
the following

p p
ˆ
Z 
p (1  p )
n

What do all of these symbols mean?

n = sample size

Z = the Standard Normal Distribution

p = the proportion specified in the null hypothesis

phat = the proportion estimated (predicted) from the sample (this would be the appropriate
relative frequency)
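Putting the formula and the symbol definitions together, the test statistic can be computed with a few lines of Python. This is a minimal sketch; the function name is mine, not part of the course materials.

```python
import math

def one_sample_proportion_z(p_hat, p_null, n):
    """Z test statistic for the One Sample Proportion test.

    p_hat  - the proportion estimated from the sample (relative frequency)
    p_null - the proportion p specified in the null hypothesis
    n      - the sample size
    """
    numerator = p_hat - p_null                          # phat - p
    denominator = math.sqrt(p_null * (1 - p_null) / n)  # square root of {p(1-p)/n}
    return numerator / denominator
```

With the presidents data used below (91 males out of 150 children), this returns a Z of about 2.61.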

If you look back over this unit so far it is rather amazing what we have been able to do with the
very short 12 word question we started with. Now let's add in the data as it was collected prior to
1995.

Prior to 1995 there had been 150 children fathered by U. S. presidents. Of these 150 children 91
had been males and 59 had been females. Using this information let's TEST our hypotheses
using the one sample proportion test.

Step 1: H0: pmales = .5

HA: pmales  .5

Step 2:  = .05 (this will always be the case in this class); Test situation = one sample proportion

Step 3: The test statistic is

p p
ˆ
Z 
p (1  p )
n

Side note: in any ratio of two things such as “a” divided by “b” which looks like “a/b,” the
top part (“a”) is called the numerator and the bottom part (“b”) is called the denominator.
In the above equation for the test statistic, the top part (numerator) is

phat minus p   (that is, phat - p)

and the bottom part (denominator) is

square root of [(p times (1-p)) divided by n]

The p without a hat in this equation is the specified proportion that appears in H0. Here p = .5

The p with the hat is our point estimate of the proportion (relative frequency) of males in our
sample = 91/150 = .607

So we have the numerator (top) in our test statistic = phat - p = .607 - .5 = .107

and we have the denominator (bottom) in our test statistic = square root of {(p)(1-p)/n}
Note this is "p," not phat.

= square root of {(.5)(1-.5)/150}

= square root of {.25/150} = square root of {.00167} = .041

and now

Z = top / bottom = numerator / denominator = (.107)/.041 = 2.610

Here is why I chose males. Since the estimated proportion for males was LARGER than that for
females, I chose to write the hypotheses in terms of the males so that the subtraction in the
numerator would be positive (just easier to deal with).
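As a check, the Step 3 arithmetic can be reproduced in a few lines of Python. Note that carrying full precision gives about 2.613 rather than the hand-rounded 2.610; the decision is unaffected.

```python
import math

n = 150           # children fathered by U.S. presidents prior to 1995
males = 91        # of whom 91 were male
p = 0.5           # the proportion specified in H0

p_hat = males / n                          # relative frequency of males, about .607
numerator = p_hat - p                      # about .107
denominator = math.sqrt(p * (1 - p) / n)   # about .041
z = numerator / denominator                # about 2.61
```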

Step 4: Compare the test statistic to the critical value. Before we get through this step we will
have to take a rather lengthy digression. Unfortunately I have to take this digression in the
middle of this step rather than linking to it separately, since our on-line platform doesn't really
support linked graphics and they are necessary for this digression.

DIGRESSION – How to determine the critical value.

How many different alternative hypotheses are possible for the one sample proportion test?
Three. Two specific ones and one general one. Unfortunately each of these alternative
hypotheses generates a different critical value. While this adds a level of complexity at the
beginning, the nice part about it is that once through this digression you will know how to
calculate all possible critical values for any test. Let's build our discussion from the simplest case
to the most complex.

If our alternative hypothesis was HA: p > .5, what would be our critical value? [The key here is
that the arrow (>) in the alternative points us in the proper direction.]

Recall from the previous unit on Hypothesis Testing that in order to reject the null hypothesis we
must have more than the threshold (critical value) amount of evidence for the alternative (test
statistic), which means that the value of the test statistic must be greater than the value of the
critical value.

Since the test statistic is a Z value (Standard Normal value), then we will need to access the
Standard Normal Table in Unit 21 to find the critical value.

Side Note: whatever distribution is identified for the test statistic (Z here) we must access
that corresponding table in Unit 21 to find the critical value. If I gave you a test statistic
identified as F (we will see this later in the course), then you will need to use the F table to
find the critical value.

Recall in the Hypothesis Testing unit that a balance was used to present the test statistic. As more
and more evidence was collected for the alternative the balance began to tip to the right side and
eventually passed the critical value.

Below is a Standard Normal graph with a point on the right identified as the critical value. [Here
is where the arrow in the alternative (>) points to the side of the graph where the critical value
can be found. The critical value in the graph is on the right side and the arrow points to the right.]

Figure 1

If you consider this graph as the balance, then it must be perfectly balanced at 0, since this is the
exact middle of a symmetric distribution. Recall that the test statistic collects evidence for the
alternative. On this balance I am going to place a one gram weight at the point equal to the
calculated value of the test statistic. Thus, if there was absolutely no evidence for the alternative
then the value of Z would equal what? It would equal 0. If we placed the one gram weight at 0
(the point on the graph equal to the value of the calculated test statistic) what would happen?
Nothing. It would still be balanced perfectly.

For convenience of discussion at this point I will tell you that the critical value is 1.645. What
would happen if there was just a little evidence for the alternative; say Z = .5 amount? We would
now place the one gram weight at the point .5 on this balance and the balance would begin to tip
down to the right. Here you will have to use your imagination to see the scale tipping down. The
purpose of the critical value is to identify that point such that a value of the test statistic less than
the critical value will place our one gram weight to the left of the point identified as the critical
value. What this means is that there isn't sufficient evidence for the alternative to be able to
reject the null. However, as the test statistic is able to find more and more evidence for the
alternative, eventually the one gram weight will be pushed to the right beyond the critical value.
At this point the balance is sufficiently tipped over and it is no longer reasonable to believe in the
null hypothesis; it should be rejected.

So how do you find the critical value? Here is where the α, specified in step 2, comes in. In
Figure 1 above, you will notice that I have an arrow pointing to the area in the Standard Normal
Curve beyond the critical value and have identified this area as α. Alpha is the concept from Unit
10 (Estimation and Confidence Intervals) and Unit 11 (Hypothesis Testing) that relates to the

proportion of errors you are willing to make in your predictions. Alpha has been set in this class
to always be .05, which means that all of our decisions to reject the null hypothesis are being
made with the understanding that 5% of our decisions will be wrong. To find the critical value
we know the probability (.0500, the value inside the Z table) and we need to find the Z value
(CUTOFF) associated with this probability. Remember that the CUTOFF value is found by a
combination of the first column and first row of the table. This was like question 5 in the quiz of
Unit 9. We will see in the Standard Normal Table that the probability associated with a Z value
of 1.64 is .0505 and the probability associated with a Z value of 1.65 is .0495. The probability
that we are seeking is .0500 which is half way between these two or a Z value half way between
1.64 and 1.65 or 1.645 (the number given to you above).
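If you have Python available, the same cutoff can be recovered from the standard Normal inverse CDF in the standard library, instead of interpolating in the printed table. This is offered only as a check on the table work, not as part of the course method.

```python
from statistics import NormalDist

alpha = 0.05
# The critical value is the Z with probability alpha beyond it in the
# right tail, i.e. the 95th percentile of the Standard Normal.
cv = NormalDist().inv_cdf(1 - alpha)
print(round(cv, 3))  # 1.645
```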

The alternative hypothesis associated with Figure 1 above is called one-tailed since the critical
value is on only one side of the graph.

Now we are ready for the second specific alternative; HA: p < .5

Recalling that the arrow in the alternative points the way (<), produces the following Figure.

Figure 2

Once again we see that the alternative hypothesis that generates this graph would be called one-
tailed, since the critical value is only on one side of the graph. If we are collecting evidence now
for the alternative the one gram weight will be going to the left rather than to the right. The
discussion here is simply the mirror image of the one above. Using the same argument we are
now looking for that value of Z that is associated with  (.05) on the left hand side of the graph.
There are two easy ways to get to this answer.

First, the Standard Normal Distribution is symmetric. This means that the left hand side is simply
the reverse of the right hand side.

Hence, if our critical value before was 1.645 it now must be -1.645.

Second, recalling the Normality Unit (Unit 9; Fact 2) we have that pr (Z > z) = pr (Z < -z)

or that pr (Z > 1.645) = pr (Z < -1.645).

In either case, we get the critical value for the alternative hypothesis (HA: p < .5) as -1.645. Here
we will reject the null hypothesis as soon as the value of the test statistic goes beyond the critical
value or becomes LESS than -1.645 (more negative).

Now for the third alternative hypothesis; the general alternative: HA: p ≠ .5

Since there is no arrow in this alternative then we must put the critical value in BOTH tails of the
Standard Normal Distribution. This is depicted in Figure 3.

Figure 3

Since there are two critical values associated with this graph, not surprisingly this situation is
called two-tailed. While at first a little daunting, the critical values in this graph are actually
fairly easy to calculate given the discussion above. All we need to do is calculate the critical
value on the right and then the one on the left is merely its negative. The only difference between
this two-tailed situation and the earlier one-tailed situation is that now the alpha must be divided
between its two parts. If α = .05, then α/2 = .025. Alpha still represents the proportion of errors
that we are willing to make, but now we are willing to make 2.5% errors declaring greater than
(the right hand side) and 2.5% errors declaring less than (the left hand side). Together they total
our 5% errors. Now we are looking for the Z value associated with a probability of .025 (this is
directly the answer to quiz question 5 from Unit 9). By looking at the Z table we see that the
probability .0250 is found in the row marked by 1.9 and the column marked by .06, thus
producing our desired Z value of 1.96. Now we know everything. The critical value on the right
hand side of the graph is 1.96 and the critical value on the left hand side of the graph is -1.96.
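The two-tailed cutoffs can be checked the same way: splitting α between the tails means looking up 1 − α/2, and the symmetry of the Normal supplies the left-hand value. Again, this is a sketch for verification only.

```python
from statistics import NormalDist

alpha = 0.05
cv_right = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96
cv_left = -cv_right                             # symmetry: about -1.96
print(round(cv_right, 2), round(cv_left, 2))
```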

What does this mean? 1., 2., and 3. below are VERY IMPORTANT.

As we collect evidence for the alternative one of three things can happen.

1. The value of the test statistic could be greater than -1.96 and less than 1.96. This would place
the one gram weight somewhere in the middle portion of the graph. In this situation we would
not have enough evidence to reject the null hypothesis.

2. The value of the test statistic could be greater than 1.96. This would place the one gram weight
somewhere in the extreme right portion of the graph. In this situation we would have enough
evidence to reject the null hypothesis AND MAKE THE CONCLUSION THAT P > .5 !

3. The value of the test statistic could be less than -1.96. This would place the one gram weight
somewhere in the extreme left portion of the graph. In this situation we would have enough
evidence to reject the null hypothesis AND MAKE THE CONCLUSION THAT P < .5 !

If you have any questions regarding the material presented in this digression, then please
contact me.

END of the DIGRESSION on How to Determine the Critical Value

Since we had the general form of the alternative hypothesis we now know that we have two
critical values (1.96 and -1.96).

Comparing our test statistic to the critical value, which one of the three possibilities are we in?

Possibility 1 – Value of the Test Statistic is between -1.96 and 1.96 (fail to reject H0)

Possibility 2 – Value of the Test Statistic is greater than 1.96 (reject H0)

Possibility 3 – Value of the Test Statistic is less than -1.96 (reject H0)

Here our value of the test statistic (Step 3) was 2.610. Since 2.610 is GREATER THAN 1.96,
then we reject H0

Step 5: What does this conclusion in step 4 mean? Since 2.610 > 1.96 we are in possibility 2,
which means that the proportion of males born to presidents is greater than .5. Said
another way, the proportion of males born to presidents is greater than the proportion of
females.

Another Example - Problem Scenario: Parking has long been considered to be a
problem at the University of Wyoming by the students. In order to examine this problem, a
sample of 100 students was randomly selected and each student was asked the question, "do you
think parking at UW is a problem?" Each student could respond YES or NO. Of the 100
students, 61 responded YES to this question.

Which test situation is appropriate?

Question 1: How many variables are there? Only one. The variable is "do you think parking at
UW is a problem?"

Question 2: What level of measurement is this variable? Categorical with only two possible
outcomes, YES and NO.

Question 3: How many samples are there? There is only one sample mentioned of 100 students.

Thus from these answers the One Sample Proportion Test is appropriate.

Is the alternative hypothesis specific (one tailed) or general (two tailed)?

The problem scenario very clearly indicates that there is a single belief, YES parking is a
problem. We are only trying to determine if parking is a problem, we are not interested in the
outcome where parking is not found to be a problem. Thus this situation is specific (one tailed).
Remember the alternative is the hypothesis that we would like to demonstrate is true.
Specifically in this example that parking IS a problem. If this is true, then we should have a high
proportion (relative frequency) of YES responses, specifically more than 50% of the responses
should be YES or symbolically stated as pyes > .5

Now for the test.

Step 1: H0: pyes = .5

HA: pyes > .5

Step 2: α = .05, Test Situation = One Sample Proportion Test

Step 3: phat = number of YES responses / total in the sample = relative frequency of YES
responses = 61/100 = .61

Calculation of the numerator (top part of the test statistic) = phat - p = .61 - .50 = .11

Calculation of the denominator (bottom part of the test statistic)

= square root of {(p)(1-p)/n} = square root of {(.5)(.5)/100}

= square root of {.25/100} = square root of {.0025} = .05

Z = numerator / denominator = .11/.05 = 2.20

Step 4: This is a one-tailed alternative situation. The tail is on the right (the arrow, >, points to the
right). Thus the critical value is 1.645.

Since 2.20 > 1.645, the test statistic is greater than the critical value, thus we reject H0

Step 5: Our conclusion is that students do in fact believe that parking is a problem at UW.
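The whole test for a specific (right-tailed) alternative can be bundled into one small function. This is a sketch with names of my own choosing, applied to the parking data above.

```python
import math

def one_sample_proportion_test(successes, n, p_null=0.5, cv=1.645):
    """Right-tailed One Sample Proportion test (HA: p > p_null).

    Returns the Z statistic and whether H0 is rejected at the given
    critical value (1.645 for a right-tailed test with alpha = .05).
    """
    p_hat = successes / n
    z = (p_hat - p_null) / math.sqrt(p_null * (1 - p_null) / n)
    return z, z > cv

z, reject = one_sample_proportion_test(61, 100)  # 61 YES out of 100 students
print(round(z, 2), reject)                       # 2.2 True
```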

Note: for a One Sample Proportion test, we have only 3 possible critical values

Possibility 1 (one-tailed): if HA: p > .5, then the CV = 1.645

Possibility 2 (one-tailed): if HA: p < .5, then the CV = -1.645

Possibility 3 (two-tailed): if HA: p ≠ .5, then CV1 = -1.96 and CV2 = 1.96

Quiz
A new medical diagnostic test has been developed to determine whether an individual is in the
early stages of AIDS or not. The current test is rather expensive and the new test is very cheap.
The question is whether the new test can actually distinguish between those individuals who
have AIDS and those who don't. Eighty individuals who were known to either have AIDS or not
were given the new medical diagnostic test. Of the 80 people 54 were diagnosed correctly and 26
were diagnosed incorrectly. Is there evidence that this test works? Test appropriately using α
= .05 and the 5-step guideline.

There is only one sample of 80 people. There is one variable, which is test performance result
(correct, incorrect). This is a categorical variable with only two possible outcomes; hence the
appropriate test statistic is the one sample proportion. The question of interest is to ascertain
whether the test works. Working means that it would have to be more successful (correct) than not
(incorrect). Thus, the alternative is the specific one-tailed alternative (p > .5).

Step 1: H0: pcorrect = .5
HA: pcorrect > .5

Step 2:  = .05 and the test situation is the one sample proportion test

Step 3: phat = number of correct decisions / total decisions = 54/80 = .675

Calculation of the numerator (top part) = phat - p = .675 - .500 = .175

Calculation of the denominator (bottom part)

= square root of {(p)(1-p)/n} = square root of {(.5)(.5)/80} = square root of {.003125} = .056

Z = numerator / denominator = .175 / .056 = 3.125

Step 4: This situation is a one-tailed alternative to the right, thus the critical value = 1.645.

Compare the test statistic to the critical value.

Test statistic (3.125) > critical value (1.645), thus we reject H0

Step 5: What does this mean? The rejection of the null hypothesis means that we have enough
evidence to conclude that the proportion of correct answers produced by our new medical
diagnostic test is greater than 50% (HA). So the new medical diagnostic test works.
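Reproducing the quiz calculation in Python: the hand calculation rounds the denominator to .056 and gets 3.125, while carrying full precision gives about 3.13. Either way the statistic is far beyond the critical value of 1.645, so the decision is the same.

```python
import math

n = 80            # individuals given the new diagnostic test
correct = 54      # diagnosed correctly
p = 0.5           # proportion specified in H0

p_hat = correct / n                           # .675
z = (p_hat - p) / math.sqrt(p * (1 - p) / n)  # about 3.13 at full precision
print(z > 1.645)                              # True: reject H0
```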

Unit 13: Multinomial and the Chi Square
Distribution
Terms
Multinomial Test Situation - a problem in which there is only one sample and only one variable
measured at either a categorical level of measurement or at an ordinal level of measurement with
3 or more possible values

Distribution of the Test Statistic - Chi Square with δ degrees of freedom (χ²δ)

degrees of freedom = number of possible outcomes - 1       (note: not the sample size - 1)

n = sample size

O = observed frequency

E = expected (predicted) frequency

Equations

(O  E ) 2
Test Statistic          Sum
2

E

E = (n)(p)
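The two equations above translate directly into code. Here is a minimal sketch (the function names are mine, not from the text):

```python
def expected_frequencies(n, num_outcomes):
    """E = (n)(p) with p = 1/(number of possible outcomes) under a fair H0."""
    p = 1 / num_outcomes
    return [n * p] * num_outcomes

def chi_square_statistic(observed, expected):
    """Test statistic: the SUM over all outcomes of (O - E)^2 / E."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))
```

For the gas-price data worked out below, `chi_square_statistic([9, 15, 21], expected_frequencies(45, 3))` gives 4.8.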

Multinomial
Problem Scenario: Are gas prices really going up? To answer this question I took a random
sample of 45 gas stations two weeks ago and recorded the price of one gallon of regular unleaded
gasoline. Yesterday I went to each of these gas stations for a second time and once again
recorded the price of one gallon of gasoline. In comparing the second price to the first price I
recorded the following for each station:

Did the price go down at this station – Down

Did the price stay the same at this station – Same

Did the price go up at this station – Up

Here are the results of my study. The prices in 9 of the stations went down, 15 stayed the same,
and 21 went up.

To identify the test situation from the above problem scenario we once again need to address our
preliminary 3 questions.

1. How many variables are there? One; the price change as measured by down, same, up

2. What is the level of measurement of price measured in this fashion? It is ordinal with 3
possible outcomes (down, same, up).

3. How many samples are there? There is one sample of 45 gas stations.

If you have a problem in which there is only one sample and only one variable measured at
either a categorical level of measurement or at an ordinal level of measurement with 3 or
more possible values then the test situation is the multinomial.

How does this differ from the one sample proportion test situation? In essence the One
Sample Proportion Test situation and the Multinomial Test situation are IDENTICAL.
The ONLY difference is that in the One Sample Proportion Test situation the variable
ONLY HAS 2 VALUES. In the Multinomial Test situation the variable has 3 or more
values.

Knowing that the test situation is multinomial we can now determine the hypotheses. If you
recall from the general versus specific hypothesis table in Unit 12, the multinomial is one of the
four test situations in this class that does not have a specific form. This is great news. Why?

The great news is this. Regardless of the problem scenario the null and alternative hypotheses
will always be the same. For all of the multinomial problems that we will consider the null
hypothesis will be that the proportions for all possible outcomes are the same; essentially this is
saying that the null hypothesis represents the concept of fair. Thus, the speculated proportion (the
one specified in the null hypothesis) is simply 1 divided by the number of possible outcomes.
The only thing that needs to be determined from the scenario is how many outcomes are in our
variable. In the problem scenario above the number of outcomes is three; the prices can go up,
stay the same, or go down. Hence, we know that the speculated proportion of each outcome as
specified in the null hypothesis must be 1/(number of possible outcomes) = 1/3 (See step 1
below).

In the multinomial test situation, there is one and only one alternative hypothesis. It is
THE QUESTION OF WHETHER THE DATA REPRESENT A FAIR DISTRIBUTION
OF OUTCOMES OR NOT. The alternative hypothesis is simply not H0. This is stated in
English as "the proportions for the various outcomes are not the same."

Given this beginning, let's start to work through the 5-step hypothesis testing guideline.

Step 1: H0: pdown = psame = pup = p = 1/3      [The proportions are the same]

HA: not H0                                [The proportions are not the same]

Here pdown = proportion or probability of a gas station's prices going down

psame = proportion or probability of a gas station's prices staying the same

pup = proportion or probability of a gas station's prices going up

p = speculated proportion under the assumption that our situation is FAIR

Step 2: = .05 and the test situation is the multinomial

Step 3: How do we calculate the test statistic for the multinomial? This will require two
digressions to accomplish. The first will be the introduction of the Chi Square Distribution and
the second will be the calculation of the test statistic itself.

Digression 1 – Chi Square Distribution

As the name implies in Chi Square we are looking at something that deals with squared terms.
What does this mean? Since anything squared is positive, then it is logical to assume that the Chi
Square Distribution can not produce any negative values. If you look at the Chi Square
Distribution table in Unit 21 you will see a figure at the top of the page that starts at zero and
only goes to the right. This table is much easier to use than the Standard Normal Table. In this
class since we are only going to use α = .05 we will only be interested in two columns; the first
which is labeled degrees of freedom and the third which is labeled .05. The third column is our
alpha. Note: Chi is pronounced as K + hi, but pronounced as a one syllable word, not two.

What is the first column, degrees of freedom? Recall from the Normal Distribution unit that there
are many different shapes for the Normal Distribution. Since each shape would have us
calculating probabilities in a unique way we were fortunate to be able to use the Standard
Normal Conversion equation to transform each Normal to this ONE common form. In the case of
the Chi Square Distribution and all of the others used later in this course (T and F) there is no
ONE common form. The shape of these distributions changes in relation to their degrees of
freedom. What degrees of freedom ARE is not of any interest or value in this course. However,
for each distribution you will need to know how to calculate the degrees of freedom in
order to access the Table successfully. In the case of the multinomial the degrees of freedom
always are equal to: the number of possible outcomes – 1. In our particular problem we have 3
possible outcomes (down, same, up) and thus the degrees of freedom are 3 – 1 or 2. The symbol
that we will use in this class for degrees of freedom will be the lower case Greek letter delta,
δ. [As a note, the Standard Normal Distribution does not have any degrees of freedom associated
with it.]

How to calculate the critical value from the Chi Square Table in Unit 21.

In order to find the critical value you need to answer 2 questions.

Question 1: what are the degrees of freedom for your problem scenario?

Question 2: what is the alpha (α)? This question is easy, it is always .05

Knowing the answers to these two questions you can now use the table. Here are some examples.

Example 1: If we have 6 possible outcomes, then what is the CV (critical value)?

If we have 6 possible outcomes, then δ = 6-1 or 5. Since α = .05, then the CV is found in the 5th
row of the Chi Square table (δ=5) and the third column (.05), which is the number 11.07

Example 2: If we have 10 possible outcomes, then what is the CV?

If we have 10 possible outcomes, then δ = 10-1 or 9. Since α = .05, then the CV is found in the
9th row of the Chi Square table (δ=9) and the third column (.05), which is the number 16.92

Example 3: If we have 3 possible outcomes, then what is the CV?

If we have 3 possible outcomes, then δ = 3-1 or 2. Since α = .05, then the CV is found in the 2nd
row of the Chi Square table (δ=2) and the third column (.05), which is the number 5.99.

End of Digression 1

Digression 2 – The Multinomial Test Statistic
            (O – E)²
χ²δ = SUM ------------
                E

This equation will need considerable explanation.

What looks like a capital X is actually the capital Greek letter Chi and is the symbol of the Chi
Square distribution.

The lower case Greek letter δ (delta) stands for the degrees of freedom, which is 2 in our current
situation.

The O stands for the observed frequency. There is one observed frequency for each possible
outcome. In our problem there are three possible outcomes (down, same, up) and thus three
possible observed frequencies. These are Odown, Osame, and Oup. From our problem scenario we
have:

Odown = 9
Osame = 15
Oup = 21

The E stands for the expected frequency. The expected frequency is what we would expect
(or predict) to occur if in fact our null hypothesis is actually true (FAIR). Since the null
proportion (probability) of each possible outcome is the same (p = speculated proportion under
the assumption that our situation is FAIR), then each expected frequency is equal to the sample
size ("n") times p (identified in the null hypothesis). For our problem, n = 45 and p = 1/3, thus E
for any possible outcome = (45)(1/3) = 15. [Please note that E must be the same for every one
of the possible outcomes.]

Edown = 15
Esame = 15
Eup = 15

O - E = the deviation between the observed frequency and the expected frequency.
[This is the same concept as deviation (score - mean) from Unit 7.]

(O – E)² = the squared deviation = (O - E) times (O - E)

(O – E)²/E = the squared deviation divided by the expected frequency
= contribution to the test statistic from each of the outcomes

The SUM stands for the sum of all the squared deviations (O – E)², each divided by its own E
(the sum of the contributions to the test statistic, which equals the value of the test statistic).
Thus, we will need to calculate each of the deviations (O – E), then square all of these
deviations, then divide each of these squared deviations by E, and then add them together to
get the SUM of (O – E)²/E.

It should be noted that there will be one (O – E)²/E value for each possible outcome. Once again,
this value for each outcome is the contribution to the test statistic for that particular outcome.

Putting all of this together we get the following for our problem scenario:

Deviations (O-E)
Odown – Edown = 9 - 15 = - 6
Osame – Esame = 15 - 15 = 0
Oup – Eup = 21 - 15 = 6

Squared Deviations (O – E)²
(Odown – Edown)² = (-6)² = (-6)(-6) = 36
(Osame – Esame)² = 0² = (0)(0) = 0
(Oup – Eup)² = 6² = (6)(6) = 36

Contributions to the Chi Square Test Statistic
(Odown – Edown)² / Edown = (9 – 15)² / 15 = (-6)² / 15 = 36/15 = 2.4
(Osame – Esame)² / Esame = (15 – 15)² / 15 = (0)² / 15 = 0/15 = 0
(Oup – Eup)² / Eup = (21 – 15)² / 15 = (6)² / 15 = 36/15 = 2.4

And finally

SUM of all these terms = 2.4 + 0 + 2.4 = 4.8

Thus the value of our test statistic is

2 = 22 = 4.8

Here is a template that should enable you to easily get through this calculation.

p = proportion identified in the null hypothesis = 1/3

E = (n)(p) = (45)(1/3) = 15

Template
Possible Outcomes
Down Same Up                Sum         Comment on the Sum
O               9      15    21           45         This row must add to n
E              15      15    15           45         This row must add to n
(O-E)          -6      0      6            0         The sum of deviations must always add to 0
(O-E)²         36      0    36                       Not of interest
(O-E)²/E      2.4      0   2.4           4.8        This is the value of your test statistic

End of Digression 2
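If you would like to verify the template arithmetic on a computer, here is a minimal Python sketch (the observed counts and p = 1/3 come straight from the problem scenario; Python itself is not part of this course):

```python
# Observed frequencies from the gas-price sample (one count per possible outcome)
observed = {"down": 9, "same": 15, "up": 21}

n = sum(observed.values())  # sample size = 45
p = 1 / 3                   # proportion specified by the null hypothesis
E = n * p                   # expected frequency = (n)(p) = 15, same for every outcome

# Test statistic = SUM of (O - E)^2 / E over all possible outcomes
test_statistic = sum((O - E) ** 2 / E for O in observed.values())
print(round(test_statistic, 1))  # 4.8, matching the template
```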

199

Step 4: The critical value can easily be found from the Chi Square Table in Unit 21. Since there
is one and only one possible alternative hypothesis for a multinomial situation, then there is one
and only one way to find the critical value (unlike the situation in the previous unit). By looking
at the following figure we see that we are only interested in the critical value associated with the
right tail of the distribution. This figure perfectly corresponds to the figure at the top of the Chi
Square Table. Thus we know that the table provides us directly with our critical value
(Digression 1).

Figure 1

As a review to access the table we need to know alpha, which is .05 and we need to know our
degrees of freedom (), which is the number of outcomes minus 1 or (3 - 1) = 2. To use the table
go down the first column until you reach the appropriate degrees of freedom for your problem
then go over to the third column (the title of this column should be .05). The value indicated will
be your critical value. In our problem you should find the critical value to be 5.99.

How do you use the critical value in the multinomial situation?

Possibility 1: If your test statistic > critical value, then Reject H0. The outcomes are NOT
EQUALLY likely (this is HA)

Possibility 2: If your test statistic < critical value, then Fail to Reject H0. The outcomes seem to be
equally likely (this is H0)
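The two possibilities can be written as a tiny decision function; a sketch (the critical values themselves still come from the Chi Square Table in Unit 21):

```python
def chi_square_decision(test_statistic, critical_value):
    """Apply the multinomial decision rule."""
    if test_statistic > critical_value:
        return "Reject H0: the outcomes are NOT equally likely"
    return "Fail to reject H0: the outcomes seem to be equally likely"

# Gas-price example: test statistic 4.8 versus the critical value 5.99
print(chi_square_decision(4.8, 5.99))
```

For the gas-price data this prints the "Fail to reject H0" message, in agreement with Step 4.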

For our problem the value of the test statistic was 4.8 and our critical value was 5.99. Thus, our
results indicate that we do not have evidence to indicate that the outcomes are not equally likely.

Step 5: What do our results mean? We do not have evidence that gas prices are going up any
more than they are going down or staying the same.

200
Another Example
Problem Scenario: Do freshmen, sophomores, juniors, or seniors at UW have more fun? In order
to answer this question, a sample of 100 seniors at UW was randomly selected and produced the
following results:

Question asked of each student: Thinking back over your time at UW, which year in school did
you have the most fun? The possible answers to this question were as a freshman, as a
sophomore, as a junior, or as a senior.

Of the 100 students, 35 said freshman, 15 said sophomore, 10 said junior, and 40 said senior.

Which test situation is appropriate? Here are our three preliminary questions.

Question 1: how many variables? There is ONLY one, which year in school did you have the
most fun. There are 4 possible responses to this question: freshman, sophomore, junior, and
senior.

Question 2: what level of measurement is our variable? It is ordinal.

Question 3: how many samples? There is only one sample; it consists of 100 seniors.

Thus, with one sample and only one variable which is ordinal with 3 or more possible values, the
test situation is Multinomial.

From this scenario we know

n = sample size = 100

p = specified proportion if our variable is fair = 1/(number of possible outcomes) = 1/4

E = expected frequency = (n)(p) = (100)(1/4) = 25

ν = degrees of freedom = 4 - 1 = 3

Step 1: H0: pfreshman = psophomore = pjunior = psenior = p = 1/4

HA: not H0            [The proportions are not equal to one another]

Step 2: α = .05 and the test situation is the multinomial

201
Step 3: Calculate the value of the test statistic (TS) using our template

Possible Outcomes

Freshman     Sophomore Junior Senior           Sum

O                 35           15         10       40         100

E                25            25         25      25          100

(O-E)            10           -10         -15     15            0

(O-E)²           100          100         225     225

(O-E)²/E          4             4          9       9         26.00 = χ²₃ = TS

Step 4: The CV is found in the Chi Square table of Unit 21. With ν = 3 and α = .05 the CV =
7.81.

Since the TS (26.00) > CV (7.81), we will reject H0

Step 5: THIS IS AN IMPORTANT DISCUSSION about the interpretation of a significant
multinomial test.

The interpretation of a significant result (reject H0) from the multinomial is found through
the deviations (O-E). I would note that the full answer for Step 5 is found in the last paragraph
of this section below. The rest of this discussion is presented to explain how to arrive at this
answer.

If the null hypothesis is true, then each and every one of the deviations (O-E) should be zero or
very close to zero. If the alternative hypothesis is true, then the deviations (O-E) should be very
different from zero. What does this mean?

Since E is our expected frequency assuming the null hypothesis is true, then if (O-E) is near zero,
then our observed frequency must be very close to that expected (predicted) by the null
hypothesis. If (O-E) is not near zero, then our observed frequency is not very close to that
expected (predicted) by the null hypothesis. The larger the deviations the worse are the
predictions from the null hypothesis.

In our problem scenario what does this mean? Our main question of interest is which year do
UW students have the most fun? If the test is not significant, then our conclusion is that UW
students have essentially the same amount of fun as freshmen as sophomores as juniors as
seniors. One year is not really better or worse than any other.

202
However, if the test is significant, then our conclusion is that UW students experience more fun
in some years than they do in others. Once we know that they experience more fun in some years
than others, then the next logical question is which year or years? The answer to this question is
found in the deviations (O-E). The deviation line from step 3 appears in red. Let's interpret these
four numbers.

Freshman O - E = 10. What does 10 mean? First, the number is positive, which means that O
must have been larger than E. This can be easily seen to be true by looking at the O and E lines
of our template. What does a positive 10 mean? It means that we observed 10 more people
(O - E is positive) than we expected. If the FAIR model was true (null hypothesis, which states that
students think each year in school is equally likely to be the most fun), then we should have had
only 25 students respond that their freshman year was their best, but in fact 35 said the freshman
year was their best (10 more than predicted).

Sophomore O-E = -10. What does -10 mean? It means that we observed 10 fewer people (O - E is
negative) than we expected. If the FAIR model was true (null hypothesis), then we should have
had 25 students respond that their sophomore year was their best, but in fact only 15 said the
sophomore year was their best (10 fewer than predicted).

Junior O-E = -15. What does -15 mean? It means that we observed 15 fewer people (O - E is
negative) than we expected. If the FAIR model was true (null hypothesis), then we should have
had 25 students respond that their junior year was their best, but in fact only 10 said the junior
year was their best.

Senior O-E = 15. What does 15 mean? It means that we observed 15 more people (O - E is
positive) than we expected. If the FAIR model was true (null hypothesis), then we should have
had only 25 students respond that their senior year was their best, but in fact 40 said the senior
year was their best.

Typically, for simplicity, when the result is significant (reject H0) we usually look at only
the largest positive deviation and the largest negative deviation for interpretation rather
than all of the deviations.

In this sense the interpretation of our significant result (Step 5) would be, UW students in
general believe that their senior year is the one which is the most fun (highest positive
deviation) and believe that their junior year is the one which is the least fun (highest
negative deviation).
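Picking out the largest positive and the largest negative deviation is easy to automate; here is a sketch using the (O-E) values from Step 3 of this example:

```python
# (O - E) deviations from the "most fun year" example
deviations = {"freshman": 10, "sophomore": -10, "junior": -15, "senior": 15}

most_over = max(deviations, key=deviations.get)    # largest positive deviation
most_under = min(deviations, key=deviations.get)   # largest negative deviation

print(most_over, most_under)  # senior junior
```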

203
Quiz
A study of 791 heart attack victims produced the following data.

Monday Tuesday Wednesday Thursday Friday Saturday Sunday
145    105      111       120     97      115     98

Are heart attacks equally likely to occur any day of the week? Appropriately test this question
using α = .05.

Our preliminary three questions.

1. How many variables? One; day of the week.

2. What level of measurement is day of the week? It is actually categorical with 7 possible
outcomes. It might seem ordinal, but then what happens from Sunday to Monday?

3. How many samples are there? There is only one sample of 791 (n = 791) heart attack victims.
We know that there is only one sample, because there is no mention of a second, third, etc.
sample.

Since we have only one sample and only one variable which is categorical with 3 or more values
the test statistic is multinomial. If there were only 2 values the test situation would have been the
one sample proportion.

204
Step 1: H0: pmonday = ptuesday = pwednesday = pthursday = pfriday = psaturday = psunday = p = 1/7
: heart attacks are equally likely to occur on any day
HA: not H0
: heart attacks are not equally likely to occur on any day

The proportion identified in the null hypothesis will be 1/(number of possible outcomes)

Step 2: α = .05 and the test situation is the multinomial

Step 3: I will use our template.

p = 1/7

E = (n)(p) = (791)(1/7) = 113

Using our Template for the Multinomial

Monday Tuesday Wednesday Thursday Friday Saturday Sunday                                    Sum
O        145    105      111      120       97     115      98                                      791
E        113    113      113       113    113      113     113                                       791
(O-E)     32      -8      -2        7      -16       2     -15                                        0
(O-E)²   1024    64        4       49     256        4     225
(O-E)²/E 9.06    .57      .04      .43    2.27      .04    1.99                                     14.40

ν = degrees of freedom = the number of possible outcomes – 1 = 7 – 1 = 6

χ² = χ²₆ = value of the test statistic = 14.40

Step 4: The critical value for this problem is found in the Chi Square table in Unit 21. The
degrees of freedom are 6 (row 6 of the table) and the α is .05 (column 3), hence the critical value
is 12.59.

The test statistic (14.40) is larger than the critical value (12.59), thus we Reject H0.

Step 5: This is a very important discussion and must be fully understood. If it is not
completely understandable, then be sure to ask me a question about it. I would note that
nearly all of what is presented in this step is for instruction and is not a part of what needs to be
presented in this step. In fact, the only portion of this discussion that needs to be presented for
the answer to Step 5 appears in blue below (the last paragraph of the answer to this quiz).

What does our result in Step 4 mean? We now have evidence that heart attacks do not occur
equally throughout the week. If they aren't equally likely to occur, then how are they not
equally likely to occur? Once again the (O-E) row in our template provides us with the answer.

If (O-E) = 0 or close to 0, then our observed results are close to our expected results.

205
But remember that our test statistic is the collector of evidence for the alternative. The alternative
is that our observed results will be different than our expected results. Thus anywhere where
(O-E) is 0 or close to zero doesn't provide us with much evidence for the alternative.

However, as (O-E) gets bigger and bigger, then (O-E)² gets bigger and bigger, (O-E)²/E gets
bigger and bigger, and then the SUM of all the (O-E)²/E must get bigger and bigger.

Thus the bigger (O-E) values will eventually be the bigger contributors to the test statistic.

What are the bigger (O-E) values? The biggest is for Monday (+32) and the second biggest is for
Friday (-16).

What does E really mean in our problem? If the null hypothesis is true (heart attacks are equally
likely to occur throughout the week), then E is the number of heart attacks in our sample we
would expect to occur on any day of the week. If heart attacks are equally likely to occur
throughout the week (H0), then we would expect 113 of them to have occurred on Monday,
Tuesday, Wednesday, etc.

What does (O-E) > 0 mean? That more heart attacks did actually occur (observed) than we
expected.

What does (O-E) = +32 for Monday mean? That we actually observed 32 MORE heart attacks
on Monday than we would have expected if H0 were true.

What does (O-E) < 0 mean? That fewer heart attacks did actually occur (observed) than we
expected.

What does (O-E) = -16 for Friday mean? That we actually observed 16 FEWER heart attacks on
Friday than we would have expected if H0 were true.

Now we can answer Step 5. What does our result mean?

The evidence from our sample was that heart attacks are not equally likely to occur
throughout the week. We have evidence that indicates that more occur on Monday (largest
positive deviation) and fewer occur on Friday (largest negative deviation)!

206
Unit 14: Homogeneity and Independence
Terms
Homogeneity Test Situation - a problem in which there are two or more samples, and only one
variable measured at a categorical level of measurement or at an ordinal level of measurement

Independence Test Situation - a problem in which there is only one sample, and there are two
variables each measured at a categorical level of measurement or at an ordinal level of
measurement

Distribution of the Test Statistic - Chi Square with ν degrees of freedom (χ²ᵥ)

ν = degrees of freedom = (number of rows - 1)(number of columns - 1)

Contingency Table - a table with multiple rows (2 or more) and multiple columns (2 or more).
The boxes formed by the table are called cells. The numbers in the cells are the observed
frequencies. The number of cells in a contingency table is equal to the number of rows times the
number of columns. Thus, a contingency table with 2 rows and 3 columns has 6 (2 times 3) cells.

O = observed frequency - there is one observed frequency for each cell

E = expected frequency - there is one expected frequency for each cell

Cell - one of the boxes formed by the intersection of the rows and columns in the contingency
table; the value in each cell is the observed frequency

Marginal Sum - the sum of the observed frequencies for an entire row or column

Row Marginal Sums - are the sums of the observed frequencies for each and every row

Column Marginal Sums - are the sums of the observed frequencies for each and every column

Total Sum - is the sum of all the observed cell frequencies = sum of all the row marginal sums
= sum of all the column marginal sums

Equations

Test Statistic: χ² = SUM (O – E)²/E

E = expected cell frequency = (row marginal sum)(column marginal sum) / (total sum)

207
Problem Scenario. While it might not seem overly exciting, I am going to begin this
unit by expanding upon the gas station illustration of the previous unit. This is for two reasons;
first to show how the homogeneity problem is similar to the multinomial problem and second to
show how it is different.

Problem Scenario: Are gas prices really going up? To answer this question I took a random
sample of 45 gas stations in Wyoming and a random sample of 60 gas stations in Colorado
(Colorado is bigger so I took a bigger sample from them) two weeks ago, and recorded the price
of one gallon of regular unleaded gasoline. Yesterday I went to each of these gas stations for a
second time and once again recorded the price of one gallon of gasoline. In comparing the
second price to the first price I recorded the following for each station.

Did the price go down at this station – Down

Did the price stay the same at this station – Same

Did the price go up at this station – Up

The similarities and differences between this problem and the one at the beginning of the
previous unit should be fairly apparent. Everything is the same, with the exception that
I now have two samples instead of one; a sample from Wyoming and a sample from Colorado.

To identify the test situation from the above problem scenario we once again need to address
our 3 preliminary questions.

1. How many variables are there? One; the price change as measured by down, same, up

2. What is the level of measurement of price measured in this fashion? As in the last unit, this
variable is ordinal with 3 possible values (down, same, up).

3. How many samples are there? The problem clearly identifies two samples (one sample in
Wyoming and another sample in Colorado). This is the only difference between this problem and
the previous one (multinomial).

What is the indicated test situation? If you have a problem in which there are two or more
samples, and only one variable measured at a categorical level of measurement or at an
ordinal level of measurement, then the test situation is Homogeneity.

Knowing that the test situation is homogeneity we can now determine the hypotheses. From the
table in Unit 12, we can see that homogeneity is another of the test situations in which there is
not a specific alternative form. In general, the null hypothesis for the homogeneity test situation
can be expressed as each sample behaves similarly in regard to the variable. For our problem
scenario the null hypothesis would appear as "gas price changes are the same in Wyoming as in Colorado."

208
The alternative hypothesis is simply not the null. What does this mean? Very simply the
alternative hypothesis for the homogeneity test situation can be expressed as each sample
behaves differently in regard to the variable. For our problem scenario the alternative would
appear as, "gas price changes are different in Wyoming than they are in Colorado."

Homogeneity
Homogeneity means that the responses to the variable for each sample are in the same proportion
as the responses for all of the other samples. The samples are homogeneous (the same). For
illustration in our sample, homogeneity would mean

that the proportion of down responses should be the same or very similar between Wyoming and
Colorado,

that the proportion of same responses should be the same or very similar between Wyoming and
Colorado, and

that the proportion of up responses should be the same or very similar between Wyoming and
Colorado.

This is the null hypothesis in the homogeneity situation. It can be stated simply as

H0: The gas price changes are the same in Wyoming and Colorado

In general the null hypothesis can be written in accordance with the following template

H0: The populations (from which the samples were selected) are the same on the variable

Just like in the multinomial situation, the alternative here is simply

HA: not H0

In English this states that gas price changes in Wyoming are different than the gas price changes in Colorado.

The results from a problem such as our example are typically presented in a table like the one
below. This is called a contingency table. The values (numbers) in a contingency table are the
frequencies. The table below is called the observed (O) frequency table and reflects the results of
our sample data.

            __Down____Same_____Up___
Wyoming  |____9____|___15___|___21___|   45
Colorado |___12____|___15___|___33___|   60
            21        30       54       105

209
Here are some terms associated with these tables (matched by color).

Cells – the cells are the boxes in the table formed by combinations of the rows and columns.
There are 6 cells in this table (values in red). Often the cells are referred by two numbers (i,j),
where i = the row and j = column. For instance, the (1,1) cell is the number of gas stations in our
Wyoming sample (row 1) in which the prices went down (column 1); and the (2,3) cell is the
number of gas stations in our Colorado sample (row 2) in which the prices went up (column 3).

Marginal Sums – the marginal sums are the values on the sides (margins) of this table and
reflect the corresponding row or column sums (values in blue). Logically the row sums are the
sizes of our samples.

Total Sum – the total sum is the sum of the column marginal sums = the sum of the row
marginal sums = the sum of the sizes of all the samples (values in green). This is sometimes
called the total sample size.

Now we are ready to test the hypothesis of homogeneity using our 5-step guideline.

Step 1: H0: The gas price changes are the same in Wyoming and Colorado (homogeneity)
HA: not H0 [gas price changes are different in Wyoming and Colorado]

Step 2:  = .05 and the test situation is homogeneity

Step 3: How do we calculate the test statistic for homogeneity? In the answer to this question I
have good news and bad news. The good news is that the test statistic uses the Chi Square
Distribution; the same as in the previous unit. The bad news is that the calculation of the test
statistic is considerably more complicated.

Digression – The Homogeneity Test Statistic
χ²ᵥ = SUM (O – E)²/E

Even though this equation looks identical to the one in the previous unit for the multinomial, it
isn't exactly the same.

The part that is the same.

What looks like a capital X is actually the Greek letter chi and is used to designate the Chi Square
distribution.

The O stands for the observed frequency (these are the values in our contingency table) and the E
stands for the expected frequency.

The part that is different.

210
The lower case Greek letter ν stands for the degrees of freedom. For Homogeneity the degrees of
freedom are equal to

(# of rows in the contingency table – 1) times the (# of columns in the contingency table – 1).

Which in our problem is (# of rows – 1)(# of columns – 1) = (2-1)(3-1) = (1)(2) = 2. Here ν = 2.

Although the E stands for the expected frequency as before, it is calculated entirely differently.
The expected values are calculated ONLY FOR THE CELLS and each cell expected value is
determined by the following equation

E = (row marginal sum for the row the cell is in) times (column marginal sum for the column the
cell is in) all of this divided by the total sum, or in a slightly simplified manner

E = (appropriate row sum) (appropriate column sum) / total sum

For illustration here are the calculations for the expected frequencies for all of the cells in our
problem.

E for the (1,1) cell = (first row, first column) cell = prices went down in Wyoming

(appropriate row sum)(appropriate column sum)/total sum

= (row 1 sum)(column 1 sum)/total sum = (45) (21)/105 = 9.00

E for the (1,2) cell = (first row)(second column) cell = prices stayed the same in Wyoming

= (row 1 sum)(column 2 sum)/total sum = (45)(30)/105 = 12.86

E for the (1,3) cell = (first row)(third column) cell = prices went up in Wyoming

= (row 1 sum)(column 3 sum)/total sum = (45)(54)/105 = 23.14

E for the (2,1) cell = (second row)(first column) cell = prices went down in Colorado

= (row 2 sum)(column 1 sum)/total sum = (60) (21)/105 = 12.00

E for the (2,2) cell = (second row)(second column) cell = prices stayed the same in Colorado

= (row 2 sum)(column 2 sum)/total sum = (60)(30)/105 = 17.14

E for the (2,3) cell = (second row)(third column) cell = prices went up in Colorado

= (row 2 sum)(column 3 sum)/total sum = (60)(54)/105 = 30.86
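All six expected frequencies can be generated from the marginal sums in one pass; here is a minimal sketch of the E = (row sum)(column sum)/(total sum) rule:

```python
# Marginal sums from the gas-price contingency table
row_sums = [45, 60]        # Wyoming sample, Colorado sample
col_sums = [21, 30, 54]    # Down, Same, Up column totals
total = sum(row_sums)      # total sum = 105

# E for each cell = (appropriate row sum)(appropriate column sum) / total sum
E = [[r * c / total for c in col_sums] for r in row_sums]

for row in E:
    print([round(e, 2) for e in row])
# [9.0, 12.86, 23.14]
# [12.0, 17.14, 30.86]
```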

211
Well, this was the worst of the bad news. It really isn't so much complicated as it is time
consuming and calculator intense.

Now all you have to do is calculate (O-E), then (O-E)², and finally (O-E)²/E. To do this I will
introduce a helpful template just like the one in the previous unit. Rather than having a nice
simple single table we will have to construct four separate tables; one for each of E, (O-E),
(O-E)², and (O-E)²/E. You should check my calculations for your own benefit.

Template         [E Table, (O-E) Table, (O-E)² Table, and (O-E)²/E Table]

Expected (E) Table
            __Down_____Same_____Up___
Wyoming  |____9____|__12.86__|__23.14__|   45
Colorado |___12____|__17.14__|__30.86__|   60
            21        30        54        105

In the Expected Table above, I have left the marginal sums and total sum in this table to illustrate
that the cells still add to these very same numbers.

(O-E) Table
            __Down______Same______Up___
Wyoming  |____0____|___2.14___|__-2.14___|   0
Colorado |____0____|__-2.14___|___2.14___|   0
            0          0          0

In the (O - E) Table above, it should not be surprising that the row and column marginal sums
are now zero, since the sum of all deviations must always equal 0. However it is a good check of
our calculations.

(O-E)² Table
            __Down______Same______Up___
Wyoming  |____0____|___4.58___|___4.58___|
Colorado |____0____|___4.58___|___4.58___|

(O-E)²/E Table
            __Down_____Same______Up___
Wyoming  |____0____|___.36____|___.20___|   .56
Colorado |____0____|___.27____|___.15___|   .42
                                           .98 = Value of the Test Statistic

The sum of all the cells = value of the test statistic = χ² = χ²₂ = .98
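Putting the whole Step 3 calculation into code gives 0.97 rather than .98, because the templates round each cell to two decimals along the way. A sketch (the Colorado observed counts follow from the column totals 21, 30, 54):

```python
# Observed contingency table: rows = Wyoming, Colorado; columns = Down, Same, Up
O = [[9, 15, 21],
     [12, 15, 33]]

row_sums = [sum(row) for row in O]           # [45, 60]
col_sums = [sum(col) for col in zip(*O)]     # [21, 30, 54]
total = sum(row_sums)                        # 105

ts = 0.0
for i, row in enumerate(O):
    for j, obs in enumerate(row):
        expected = row_sums[i] * col_sums[j] / total
        ts += (obs - expected) ** 2 / expected

print(round(ts, 2))  # 0.97
```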

212
Step 4: The critical value for the homogeneity is calculated in exactly the same manner as it was
for the multinomial. The degrees of freedom for our problem are 2 and the α = .05. Going to the
Chi Square table in Unit 21 produces the critical value = 5.99.

Possibility 1: the test statistic > critical value, Reject H0 (reject homogeneity)

Possibility 2: the test statistic < critical value, Fail to Reject H0 (homogeneity)

For our problem the test statistic is .98 and the critical value is 5.99.

The value of the test statistic (.98) is less than the critical value (5.99), thus we fail to Reject H0

Step 5: It would appear that the changes in gasoline prices over the past two weeks are similar in Wyoming and Colorado.

Another Problem Scenario
The question I am interested in is the following, “is a person's attitude about gun control related
to her/his attitude about the death penalty?”

To answer this question I randomly sampled 100 residents of the State of Wyoming and asked
them the following two questions.

Question 1: Do you believe that the State has a right to limit your ownership and use of guns?

Possible answers were YES or NO

Question 2: Do you believe that certain crimes should be punishable by the death penalty?

Possible answers were YES or NO

The results from this survey are displayed in the table below.

Death Penalty

__YES____ NO__

Gun Control YES |___5___|__ 35___| 40

NO |__ 50___|__10___| 60

55        45       100

213
Conduct the appropriate test for this problem. Although the table above looks very similar to the
contingency table presented during the Homogeneity portion of this unit, is it really the same?

To begin let's look at our 3 preliminary questions.

1. How many variables are there? Two; the first is attitude about gun control (question 1) and the
second is attitude about the death penalty (question 2).

2. What is the level of measurement of these two variables? Both are measured YES/NO and
hence are categorical variables.

3. How many samples are there? There is one sample of 100 residents.

What is the indicated test situation? If you have a problem in which there is only one
sample, and there are TWO variables each measured at a categorical or at an ordinal level
of measurement, then the test situation is Independence.

Knowing that the test situation is Independence we can now determine the hypotheses. From the
table in Unit 12, we can see that Independence is another of the test situations in which there is
not a specific alternative form. In general, the null hypothesis for the independence test situation
can be expressed as the two variables are independent of one another. The word independent
means not related. So the null hypothesis can also be stated as the two variables are not related to
one another. For our problem scenario the null hypothesis would appear as "people's attitudes
about gun control are not related to their attitudes about the death penalty."

The alternative hypothesis is simply not the null. What does this mean? Very simply the
alternative hypothesis for the independence test situation can be expressed as the two variables
are related to one another. For our problem scenario the alternative hypothesis would appear as,
"people's attitudes about gun control are related to their attitudes about the death penalty."

Please note that it is possible to state the alternative hypothesis as a question, "are people's
attitudes about gun control related to their attitudes about the death penalty?" This question is a
Form 2 question as presented in the Research Questions Section of Unit 1.

214
Independence
What does independence mean? If the two variables are independent, then the response a person
gives to the first variable is not related to the response she/he gives to the second variable. This
would mean that a person's attitude about gun control is not related to her/his attitude about the
death penalty. This will be explained in more detail as we go through this example.

The null hypothesis for this type of problem is simply that the two variables are independent or
not related.

H0: A person's attitude about gun control and the death penalty are independent (not related)
HA: not H0 [Which means that these two attitudes are in some fashion related.]

Let's try to conduct the test of independence.

Step 1: H0: A person's attitude about gun control and the death penalty are not related
HA: not H0    (are related)

Step 2: α = .05 and the test situation is independence

Step 3: How do we calculate the test statistic for independence? In the answer to this question I
have good news and bad news. The test statistic is calculated identically to how it was calculated
for homogeneity. The good news is that this is not a new method. The bad news is that it will be
just as time consuming and calculator intense. Once again I would suggest that you check my calculations.

The Observed (O) Table
Death Penalty
__YES____ NO__
Gun Control YES |__ 5 ___|__ 35 __| 40
NO |__ 50___|_ _10 __| 60
55      45   100

The template

E for cell (1,1) = (row 1 sum)(column 1 sum)/total sum = (40)(55)/100 = 22

E for cell (1,2) = (row 1 sum)(column 2 sum)/total sum = (40)(45)/100 = 18

E for cell (2,1) = (row 2 sum)(column 1 sum)/total sum = (60)(55)/100 = 33

E for cell (2,2) = (row 2 sum)(column 2 sum)/total sum = (60)(45)/100 = 27

215
Expected (E) Table
Death Penalty
__YES____ NO__
Gun Control YES |__ 22___|__ 18__| 40
NO |__ 33___|_ _ 27__| 60
55       45   100

(O-E) Table
Death Penalty
__YES____ NO__
Gun Control      YES |__-17___|__ 17__| 0
NO |__ 17___|_ _-17__| 0
0         0

(O-E)² Table
Death Penalty
__YES____ NO__
Gun Control YES |__289___|__ 289__|
NO |__ 289___|_ 289__|

(O-E)²/E Table
Death Penalty
___YES_____ NO__
Gun Control YES |__ 13.14__|__ 16.06_|              29.20
NO |__ 8.76__|_ _ 10.70_|              19.46
48.66 = value of the test statistic

The degrees of freedom for independence are calculated just like they were for homogeneity:
ν = (# of rows – 1) times (# of columns – 1) = (2 – 1)(2 – 1) = (1)(1) = 1

2= 21 = 48.66

Step 4: Once again to calculate the critical value we will use the Chi Square table in Unit 21.
Here the degrees of freedom are ν = 1 and α = .05. These produce the critical value = 3.84.

Possibility 1: test statistic > critical value, reject H0, the two variables are related

Possibility 2: test statistic < critical value; fail to reject H0, the two variables seem to be not
related

In this problem we have

Test statistic (48.66) > critical value (3.84), thus we Reject H0

Step 5: The evidence from our study indicates that a person's attitude about gun control is
related to their attitude about the death penalty. What does this mean?

216
To interpret the significant finding (reject H0) in any test in which we calculate (O-E), such
as the multinomial, homogeneity, and independence, we will look at the (O-E) part of the
test statistic calculation to interpret the result. As with the multinomial and the homogeneity
situations,

If (O-E) > 0, then we observed more responses than we would have expected if our null
hypothesis were true.

If (O-E) < 0, then we observed fewer responses than we would have expected if our null
hypothesis were true.

Let's look more closely at the specifics of our (O-E) table above. It is presented here again for
our convenience.

(O-E) Table
Death Penalty
__YES____ NO__
Gun Control YES |__-17___|__ 17__| 0
NO |__ 17___|_ _-17__| 0
0         0

As seen in Unit 13, in the multinomial situation we often interpret only the largest positive
deviation and the largest negative deviation.

However, in the homogeneity situation and in the independence situation, we often interpret
only the positive deviations. For instance, in the problem, two of the cells have (O-E) > 0.

In cell (1,2) we have more people than expected saying Yes to gun control at the SAME TIME
as they are saying No to the death penalty. In cell (2,1) we have more people saying No to gun
control at the same time as they are saying YES to the death penalty.

What does this mean? It seems that people who favor gun control (limits) also are against the
death penalty (limits) and people who are against gun control (no limits) also favor the death
penalty (no limits).
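The template arithmetic above (observed table → expected table → sum of (O-E)²/E) can be sketched in a few lines of Python. This is a minimal illustration under my own variable names, not part of the course materials; carrying full precision gives 48.65, a hair under the 48.66 obtained from the rounded cell values above.

```python
# Chi-square test of independence, following the template:
# E for each cell = (row sum)(column sum)/(total sum)

observed = [[5, 35],   # Gun Control YES: Death Penalty YES, NO
            [50, 10]]  # Gun Control NO:  Death Penalty YES, NO

row_sums = [sum(row) for row in observed]        # 40, 60
col_sums = [sum(col) for col in zip(*observed)]  # 55, 45
total = sum(row_sums)                            # 100

# Expected table: E(i,j) = row_sums[i] * col_sums[j] / total
expected = [[r * c / total for c in col_sums] for r in row_sums]

# Test statistic: sum over all cells of (O - E)^2 / E
chi_square = sum((o - e) ** 2 / e
                 for o_row, e_row in zip(observed, expected)
                 for o, e in zip(o_row, e_row))

print(round(chi_square, 2))  # 48.65
```

The expected table computed here is exactly the one above (22, 18, 33, 27), and the statistic comfortably exceeds the critical value 3.84 either way.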

217
Another Example
Problem Scenario: 65 people who attended the Denver Auto Show were asked the following 2
questions.

Question 1: Do you like NASCAR auto racing? Yes          No

Question 2: Do you like country music? Yes No

Based on this information alone, what is the test situation?

Our 3 preliminary questions.

Question 1: how many variables are there? There are two; do you like NASCAR and do you like
country music.

Question 2: what level of measurement are these variables? They are YES/NO, hence
categorical.

Question 3: how many samples are there? There is only one sample mentioned of 65 people.

If we have only one sample and two variables measured at the categorical level, then the test
situation is independence.

Here are the data for our 65 people and 2 variables.

Like Country Music

__YES_____NO__

Like NASCAR         |_ _30_ _|__ 10___| 40

Don't Like NASCAR |__ 5 __|_ _ 20___| 25

35        30       65

Test this situation.

Step 1: H0: Liking NASCAR auto racing is NOT RELATED to liking country music

HA: Liking NASCAR auto racing is RELATED to liking country music

Step 2: α = .05 and the test situation is independence

218
Step 3: Calculate the value of the test statistic using the template.

Observed Table

Like Country Music

__YES_____NO__

Like NASCAR        |_ _30_ _|__ 10___| 40

Don't Like NASCAR |__ 5 __|_ _ 20___| 25

35         30        65

Expected Table

E for cell (1,1) = (40)(35)/65 = 21.5

E for cell (1,2) = (40)(30)/65 = 18.5

E for cell (2,1) = (25)(35)/65 = 13.5

E for cell (2,2) = (25)(30)/65 = 11.5

Like Country Music

__ YES______NO__

Like NASCAR        |_ _21.5_ _|__ 18.5__| 40

Don't Like NASCAR |__ 13.5 __|__ 11.5__| 25

35            30      65

(O-E) Table

Like Country Music

__ YES______NO__

Like NASCAR        |_ _8.5_ _|__ -8.5___| 0

Don't Like NASCAR |__ -8.5 __|__ 8.5__ | 0

219
(O-E)2 Table

Like Country Music

___ YES_______NO__

Like NASCAR       |_ _72.25_ _|__ 72.25__|

Don't Like NASCAR |__ 72.25 __|__ 72.25__|

(O-E)2/E Table

Like Country Music

__ YES______NO__

Like NASCAR       |_ _3.36_ _|__ 3.91__|     7.27

Don't Like NASCAR |__ 5.35 __|__ 6.28__| 11.63

18.90

Step 4:

The degrees of freedom (ν) = (number of rows - 1)(number of columns - 1)

= (2 - 1)(2 - 1) = (1)(1) = 1

α = .05

So the CV is χ²₁ = 3.84

The TS (18.90) > CV (3.84), so we reject H0

Step 5: Our conclusion is that Liking NASCAR auto racing is related to Liking Country Music,
specifically looking at the two positive deviations we see that People who Like NASCAR auto
racing also Like Country Music (cell (1,1)), and People who Do Not Like NASCAR auto racing
also Do Not Like Country Music (cell(2,2)). Thus, if you like one you tend to like the other. And
if you don't like one you tend to not like the other.
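As a check on the hand calculations, the same expected-count template can be run on this table. The sketch below is my own illustration (not part of the course materials); because the expected counts above were rounded to one decimal place before the (O-E) step, the hand total differs slightly from the fully precise value, but the decision against the critical value 3.84 is identical.

```python
# Observed counts: rows = Like NASCAR / Don't Like NASCAR,
# columns = Like Country Music YES / NO
observed = [[30, 10],
            [5, 20]]

row_sums = [sum(row) for row in observed]        # 40, 25
col_sums = [sum(col) for col in zip(*observed)]  # 35, 30
total = sum(row_sums)                            # 65

# Expected counts kept at full precision (21.54, 18.46, 13.46, 11.54)
expected = [[r * c / total for c in col_sums] for r in row_sums]

chi_square = sum((o - e) ** 2 / e
                 for o_row, e_row in zip(observed, expected)
                 for o, e in zip(o_row, e_row))

print(round(chi_square, 2))  # 18.73 with unrounded expected counts
```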

220
Quiz 1
Here are three problem settings. One is a multinomial problem, one is a homogeneity problem,
and one is an independence problem. Identify each problem below, and write the appropriate null
and alternative hypothesis.

Problem 1

Do the people in the legislative branch of the government favor the war in Iraq? To answer this
question I took a random sample of 20 democratic senators and a random sample of 20
republican senators and asked them the simple question, “do you favor the war in Iraq?”

Problem 2

How can we resolve our involvement in the Iraq war? To answer this question I asked a sample
of 40 senators the following question, “how do you think our involvement in Iraq will end?” The
response possibilities were (1) U.N. intercession, (2) establish an Iraqi government, or (3) we
will pull out.

Problem 3

Is the war in Iraq a fight against terrorism? To answer this question I asked a sample of 40
senators if they favored an extended fight against terrorism (Yes or No) and if they favored the
war in Iraq (Yes or No).

221
Problem 1 is a homogeneity problem. There are clearly two identified samples. The only
problem situation we have encountered so far that has more than one sample is homogeneity.

H0: democratic senators and republican senators feel the same about the war in Iraq
HA: not H0, they don't feel the same

Problem 2 is a multinomial problem. There is only one sample of 40 senators and there is only
one variable (involvement in Iraq will end). Since there are 3 possible outcomes this is
multinomial. If there had been only two possible outcomes this would have been the one sample
proportion.

H0: The senators feel that our involvement in Iraq is equally likely to end by U.N. intercession as
by an Iraqi government being established as by our pulling out, which can be stated statistically
(symbolically) as

p(U.N. intercession) = p(Iraqi government) = p(pulling out) = p = 1/3

HA: not H0, they don't feel these are equally likely

Problem 3 is an independence problem. There is one sample and two variables.

H0: The senators' feelings about an extended fight against terrorism are not related to their
feelings about being involved in the war in Iraq

HA: not H0, their feelings about these two issues are related

222
Quiz 2
The following data were collected to answer the question posed in problem 1 of quiz 1.

Observed Table
Favor Iraq War
___YES____NO___
Democratic Senators |____5___|___15___| 20
Republican Senators |___18___|____2___| 20
23        17    40

Appropriately and completely test this homogeneity problem.

Step 1: H0: democratic senators and republican senators feel the same about the war in Iraq
HA: not H0, they don't feel the same

Step 2: α = .05 and the test situation is homogeneity

Step 3: Now for the work.

Observed Table
Favor Iraq War
___YES____NO__
Democratic Senators |__ 5 __|_ _ 15 __| 20
Republican Senators |__ 18__|__ 2___| 20
23       17     40

E for cell (1,1) = (row 1 sum)(column 1 sum)/total sum = (20)(23)/40 = 11.5

E for cell (1,2) = (row 1 sum)(column 2 sum)/total sum = (20)(17)/40 = 8.5

E for cell (2,1) = (row 2 sum)(column 1 sum)/total sum = (20)(23)/40 = 11.5

E for cell (2,2) = (row 2 sum)(column 2 sum)/total sum = (20)(17)/40 = 8.5

223
Expected Table
Favor Iraq War
___YES____NO__
Democratic Senators |__11.5__|__ 8.5___| 20
Republican Senators |__11.5__|__ 8.5___| 20
23       17      40

(O-E) Table
Favor Iraq War
___YES____NO___
Democratic Senators |__ -6.5__|__ 6.5___| 0
Republican Senators |___ 6.5__|__-6.5___| 0
0       0

(O-E)2 Table
Favor Iraq War
___YES_____NO___
Democratic Senators |__ 42.25__|__42.25__|
Republican Senators |___42.25 _|__42.25__|

(O-E)2/E Table
Favor Iraq War
___YES____NO___
Democratic Senators |__ 3.67__|__ 4.97__| 8.64
Republican Senators |__ 3.67__|__ 4.97__| 8.64
17.28 = value of the test statistic

The degrees of freedom for the homogeneity problem are equal to

ν = (# of rows – 1)(# of columns – 1) = (2 – 1)(2 – 1) = (1)(1) = 1

χ² = χ²₁ = 17.28

Step 4: Compare the test statistic to the critical value. The critical value is determined from the
Chi Square Table in Unit 21 for α = .05 and ν = 1. This produces a critical value of 3.84.

Test statistic (17.28) > critical value (3.84), thus we reject H0

Step 5: What do our results mean? Looking at the (O-E) table we see that there are more
democrats who are against the war (cell (1,2)) and more republicans who are in favor of the war
(cell (2,1)).
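Since the multinomial, homogeneity, and independence statistics all share the (O-E)²/E template, the computation can be wrapped in one small function. This is a minimal sketch under my own names, not part of the course materials; exact arithmetic gives 17.29 versus the 17.28 above from rounded cells.

```python
def chi_square_stat(observed):
    """Chi-square statistic via the template: E = (row sum)(column sum)/total."""
    row_sums = [sum(row) for row in observed]
    col_sums = [sum(col) for col in zip(*observed)]
    total = sum(row_sums)
    stat = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_sums[i] * col_sums[j] / total  # expected count for cell (i,j)
            stat += (o - e) ** 2 / e
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return stat, df

# Favor Iraq War: YES, NO counts for Democratic and Republican senators
stat, df = chi_square_stat([[5, 15], [18, 2]])
print(round(stat, 2), df)  # 17.29 1
```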

224
Unit 15: Scatterplot and Correlation

Terms
Scatterplot - is the graphing tool that permits us to visually look for a pattern between 2
variables; one of the variables is placed in the horizontal axis and the other variable is placed in
the vertical axis; it is the graphical method for displaying the data that come from a form 2
question

Correlation - is the method that assigns a numerical value to the pattern observed in the
scatterplot; it is the numerical method for answering a form 2 question

Positive Correlation - is the numerical value associated with the pattern observed in the
scatterplot when high values of one variable are associated (related) with high values in the other
variable (conversely, it is also the pattern observed when low values of one variable are
associated with low values in the other variable)

Negative Correlation - is the numerical value associated with the pattern observed in the
scatterplot when high values of one variable are associated (related) with low values in the other
variable (conversely, it is also the pattern observed when low values of one variable are
associated with high values in the other variable)

Zero Correlation - is the numerical value associated with the pattern observed in the scatterplot
that looks like a circle; this occurs when low values in one variable are related to both low and
high values in the other variable (conversely, it is also the pattern observed when high values in
one variable are also related to both low and high values in the other variable)

Correlation Test Situation - a problem in which there is only one sample, there are two
variables each measured at a ratio or “ratio” level of measurement, and a form 2 question is being asked

Distribution of the Test Statistic - "t distribution" with ν degrees of freedom (tν)

ν = degrees of freedom = sample size - 2 = n - 2

225
Equations
Correlation

              Sum[(X – Xbar)(Y – Ybar)]
r = -------------------------------------------------------
    Square root {[Sum(X – Xbar)2][Sum(Y – Ybar)2]}

Test Statistic

                  r
t = ------------------------------------
    Square root [(1 – r2)/(n – 2)]

Scatterplot
The scatterplot represents a radical departure from the types of graphs we have seen so far. In the
previous graphs (except the pie chart), we have placed the values from the variable of interest in
the horizontal axis and the frequency count, relative frequency, or percent in the vertical axis. In
the case of the scatterplot, the values from the first variable go in the horizontal axis and the
values from the second variable go in the vertical axis. One of the simplest questions involving
two variables is the following: "is variable 1 related to variable 2?"

What does this question in fact mean? If the answer to the question is yes, then we should be able
to discern some pattern in the responses between the two variables. If the answer is no, then we
will not be able to discern a pattern. This explanation is probably more confusing than helpful,
but I have presented it for a purpose that will become clear as we progress through this section.
Suffice it to say at this point that the existence of a relationship between variables means that we
can find some pattern.

The scatterplot is the graphing tool that permits us to visually look for a pattern between 2
variables. Correlation is the numerical tool that corresponds with the scatterplot when our data
has come from questions of Form 2 (the question forms were presented in Unit 1, in the Research
Questions section). At this point, an example will definitely help. In the graphing unit I presented
an example using the scores from all 50 States on the mathematics portion of the SAT
examination. As you are probably aware, the SAT examination is typically broken down into two
scores, the mathematics score and the verbal score. In Figure 1 below, variable 1 (horizontal
axis) is the mathematics score on the SAT and variable 2 (vertical axis) is the verbal score on the
SAT.

226
Figure 1

When this problem was presented in the graphing unit the data points (sample units) were the
mathematics scores obtained from each of the 50 States. Thus, we had 50 Math SAT scores, one
for each state. In this situation, we still have 50 data points, but for each data point (State) we
have two values, its Math SAT score and its Verbal SAT score. In the scatterplot we have a total
of 50 data points (if you count them, there are 50 little squares in the scatterplot above). The
position of each square (data point, State) in the graph is located by being directly above the
Math SAT score and directly to the right of the Verbal SAT score. For the "first" square (lower
left hand corner), this particular State achieved an average Math SAT score of 437 and an
average Verbal SAT score of 394. Each of the 50 States is plotted in the scatterplot in exactly
this manner. The form 2 question of interest for this situation would be, “Does a relationship
exist between the Math SAT score and the Verbal SAT score?” This question can be visually
addressed by looking at the resultant graph (scatterplot) and attempting to see some discernible
pattern in the data points. Do you see a discernible pattern in these data? For Figure 1 above, the
answer should be yes. So how would you describe the pattern that you see?

It would seem that States with high average Math SAT scores also seem to have high average
Verbal SAT scores. And correspondingly, States with low average Math SAT scores seem to
also have low average Verbal SAT scores. Putting these two statements together provides our
description of the pattern. As States generally increase in Math SAT scores, they also increase in
Verbal SAT scores. This is called a positive relationship or pattern. The positive pattern takes the
form of a scatter of points located from the lower left hand corner of the scatterplot moving
upward and rightward to the upper right hand corner. Reading from left to right, Figure 1 looks
like we are going uphill.

227
Aside: If you haven‟t noticed it already, please note that this figure is missing a large portion
of both the horizontal and vertical axes. From 0 to 420 is missing from the horizontal axis and
from 0 to 380 is missing from the vertical axis. Where this was a major issue in our discussion
about bar graphs and histograms, it is not an issue in scatterplots. In scatterplots we are only
attempting to see the pattern of the data points, not how much these points are bigger than 0 for
either variable. Hence, truncating the axes of a scatterplot is common and does not present a
problem for us.

In Figure 2 below, we see an entirely different pattern being portrayed. In this pattern, which
goes from upper left to lower right, increasing values of variable 1 (horizontal axis) are
associated (related) with lower values of variable 2 (vertical axis). This depicts a negative
relationship or pattern. In terms of the two variables presented above (Variable 1 = Math SAT
and Variable 2 = Verbal SAT), the negative pattern would indicate that as the Math SAT scores
increased (going from left to right) that in general the Verbal SAT scores decreased (went from
top to bottom). In this case, reading from left to right Figure 2 looks like we are going downhill.

Figure 2

What would you expect the data points to look like in a scatterplot in which no pattern existed?
The next figure, Figure 3, illustrates this situation. This particular scatter of data points is called a
zero relationship or pattern. It is easy to see why it is called zero: in Figure 3 the data points
approximately form a circle. In describing the data points of this figure, note that values for the
second variable do not generally increase or decrease in association with increasing values of the
first variable. In terms of our two variables, a zero pattern would indicate that as Math SAT
scores increased (going left to right) there was no corresponding increase or decrease in the
Verbal SAT scores. Said another way, as the Math SAT scores increased, some of the Verbal
SAT scores increased, some decreased, and some stayed about the same. NO PATTERN.

228
Figure 3

Now that you are getting to be an expert in scatterplots, how would you describe the pattern in
the next scatterplot (Figure 4)?

Figure 4

Is there a discernible pattern in the data points? Yes

229
Is this pattern positive (as we have talked about a positive pattern)? No

Is this pattern negative? No

Is this pattern zero? No

Figure 4 opens the door to what are called non-linear or curvilinear patterns. This will be the
only place where we discuss this particular possibility. Such scatterplots are rarely discussed
below a third or even fourth level course in statistics.

What would be a good description of the pattern in Figure 4? As values of the first variable
initially increase (from 1 to 5), the values of the second variable also increase (positive
relationship), but as we continue to increase values in the first variable (from 6 to 10) all of a
sudden values in the second variable start and continue to decrease (negative relationship). Can
this type of scatter occur in real problems? Yes. In psychology this particular scatter is called an
inverted U distribution. Here is a problem setting where it frequently appears. Let variable 1 be
perceived stress associated with taking a test and variable 2 be performance (grade in percent) on
the test. It is known by educators that at low to moderate levels of test-related stress, students'
performance actually improves with increasing stress. However, once a student is over-stressed,
additional stress causes performance to decline.

In general when we look at a scatterplot, it is a relatively easy task to visually describe the
pattern of the data points, if one exists. This is not typically the case when we numerically
summarize the data. The numerical summarization associated with scatterplots and form 2
questions is called correlation.

230
Correlation
Correlation is a number that can take on values between –1 and +1, and is calculated from the
data as expressed in the scatterplot. Correlation is a numerical expression that indicates how
strongly the two variables being considered are related to one another, which can be positive or
negative. As you can probably guess, scatterplot patterns that are positive in appearance are
associated with correlations between 0 and +1, and scatterplot patterns that are negative in
appearance are associated with correlations between –1 and 0. Logically, scatterplots with a zero
pattern are associated with a correlation of 0 or very near zero.

Whenever you see correlation being used, a form 2 question is being examined and the magnitude
of the relationship between the variables is being communicated.

Some keys to understanding correlation values:

1. a zero correlation appears as a near circle in a scatterplot

2. a correlation of +1 appears as a straight line going from the lower left hand corner of the
scatterplot to the upper right hand corner

3. a correlation of –1 appears as a straight line going from the upper left hand corner of the
scatterplot to the lower right hand corner

4. a middle value + correlation appears as an ellipse or oval going from the lower left hand
corner to the upper right hand corner. The more positive the correlation the more the oval looks
like a straight line, the closer to zero the correlation the more the oval looks like a circle.

5. a middle value - correlation appears as an ellipse or oval going from the upper left hand
corner to the lower right hand corner. The more negative the correlation the more the oval looks
like a straight line, the closer to zero the correlation the more the oval looks like a circle.

Given this introduction to the concept of correlation, how is it numerically calculated?

Due to problems in the Ecollege equation editor I have written the mean of the x variable in the
equation below as xbar and the mean of the y variable as ybar.

The equation for the correlation is

              Sum[(X – Xbar)(Y – Ybar)]
r = -------------------------------------------------------
    Square root {[Sum(X – Xbar)2][Sum(Y – Ybar)2]}

231
This equation contains several new terms that we have not seen before. These are

r = correlation

The formal name for “r” is the Pearson Product Moment Correlation Coefficient. There are
actually many different ways of numerically calculating the correlation for a particular set of data
(see Unit 16), but the Pearson correlation is by far the most common. In fact it is so common that
we drop the Pearson Product Moment part of the name and simply call it correlation. Thus, when
you see or hear someone talking about correlation it is this form that they are talking about. In
the next unit several other forms of correlation are presented, but in this unit we will only
concern ourselves with the Pearson form.

Symbolically, "x" is used to represent one of the variables and "y" is used to represent the other
variable.

(X – Xbar) = (x – its mean) = deviations for the x variable. If we square each of these deviations
and sum them up and then divide by (n-1), then we will have the sample variance for the x
variable (Unit 7).

(Y – Ybar) = (y – its mean) = deviations for the y variable. If we square each of these deviations
and sum them up and then divide by (n-1), then we will have the sample variance for the y
variable.

(X – Xbar)(Y – Ybar) = the deviation in the x variable times the deviation in the y variable.

Once again we are faced with what seems to be a fairly complex equation to calculate. As with
the previous units using Chi Square a template might help us to organize our thoughts and
ourselves.

Template
X      Y      (X – Xbar)      (Y – Ybar)      (X – Xbar)2      (Y – Ybar)2      (X – Xbar)(Y – Ybar)

These 7 columns should enable us to easily calculate the correlation for any set of data. For
numerical interest only I would like to know the correlation for the following set of data. The
sample size is 5. I would strongly encourage you to check my calculations.

Values
1         2     3        4     5        Sum of the values      Mean

X     3         6     4        8      9              30                6
Y     4         6     8        7     10              35                7

232
X      Y       (X-Xbar)            (Y-Ybar) (X-Xbar)2       (Y-Ybar)2          (X-Xbar)(Y-Ybar)
3       4      (3-6) = -3          (4-7) = -3 (-3)(-3) = 9 (-3)(-3) = 9         (-3)(-3) = 9
6       6      (6-6) = 0           (6-7) = -1 (0)(0) = 0    (-1)(-1) = 1         (0)(-1) = 0
4       8      (4-6) = -2         (8-7) = 1    (-2)(-2) = 4 (1)(1) = 1           (-2)(1) = -2
8       7      (8-6) = 2           (7-7) = 0  (2)(2) = 4    (0)(0) = 0           (2)(0) = 0
9      10      (9-6) = 3           (10-7) = 3 (3)(3) = 9    (3)(3) = 9           (3)(3) = 9

Sum     30     35           0              0                  26           20               16

26 = Sum of the (x-xbar)2

20 = Sum of the (y – ybar)2

16 = Sum of the (x – xbar)(y – ybar)                           This is called the cross product term.

Once again, as a check, notice that the sum of the deviations (columns 3 and 4) add to 0 as they
should.

From these column sums it should now be possible to easily calculate the value of the
correlation.

Sum [(x – xbar)(y-ybar)]
r = -------------------------------------------------------------
Square root {[sum (x – xbar)2][sum (y – ybar)2]}

16                              16                16
= ------------------------------ = ----------------------- = ------- = .702 = correlation
Square root [(26)(20)]             Square root [520]         22.8

Note that this correlation is positive and fairly large (close to 1, the maximum value).

The last remaining piece before we do a problem is the test statistic for correlation, which is
                  r
t = ------------------------------------
    Square root [(1 – r2)/(n – 2)]

The degrees of freedom, ν, associated with the test statistic for correlation are “n – 2.”
Here 5 – 2 = 3

t3 = .702 / Square root [(1 – (.702)2)/(5 – 2)] = .702 / Square root [.507/3] = .702/.411 = 1.708

233
Everything in this equation should be familiar now with the exception of “t.”

“t” is another distribution. So far in this course we have looked at the tabled distributions in Unit
21 for the Standard Normal (Z) and for Chi Square (χ²). Here is a third. The “t” distribution is
very similar in appearance and usage to the Standard Normal table. Just like the Standard
Normal, the “t” is

1. symmetric

2. has mean = 0, centered at 0

3. can have critical values in the right tail, left tail, or both tails

To use the “t” table in Unit 21 to find the CV (critical value) we will need to use the first
column (degrees of freedom), and the third column (α = .05) if we have a one-tailed
alternative and the fourth column (α = .025) if we have a two-tailed alternative.

In the example above, the degrees of freedom (ν) = 3.

If we had a one tailed alternative, what would be the CV? If we hold a ruler under the third row
(degrees of freedom = 3) and go over to the .05 column (the one that we use for one tailed
alternatives), the CV = 2.353

If we had a two tailed alternative, what would be the CV? If we hold a ruler under the third row
(degrees of freedom = 3) and go over to the .025 column (the one that we use for two tailed
alternatives), the CV = 3.182

If we refer back to the table in the beginning of Unit 12 we will see that Correlation is one of the
5 test situations in this class which have a specific and a general alternative hypothesis form. In
light of our discussion so far, what would these hypotheses look like?

Before getting to the alternative hypotheses, it should be noted that there is only one form of
the null hypothesis, which is the statement that the two variables are not related (this can
also be stated as the two variables are not correlated). Note that this is identical to the null
hypothesis for the independence test situation presented in Unit 14.

In general the various forms of the alternative hypothesis all express that the two variables are
related. But as we saw in the scatterplot portion of this unit, there are two basic types of
relationships (positive ones and negative ones). Thus, there are three possible alternative
hypotheses for the correlation test situation. These are

234
First form. The correlation is expected to be positive. This is a specific question since only one
direction for the pattern is specified, positive. This is also called one-tailed (one direction). In
symbolic notation this looks like

HA: ρ > 0       the correlation is speculated to be positive

The letter "ρ" is the Greek letter rho that corresponds to the Roman letter r. We always write
hypotheses in terms of the Greek letters which represent our populations rather than Roman
letters which represent our samples.

Second form. The correlation is expected to be negative. This is a specific question since only
one direction for the pattern is specified, negative. This is also called one-tailed (one direction).
In symbolic notation this looks like

HA: ρ < 0       the correlation is speculated to be negative

Third form. The correlation is expected to be either positive or negative. This is the general
question since both directions for the pattern are specified, positive and negative. This is also
called two-tailed (two directions). In symbolic notation this looks like

HA: ρ ≠ 0      the correlation is speculated to be different from zero (it may be positive or
it may be negative)

Putting all of this information together about the hypotheses, the calculation of the correlation,
the calculation of the test statistic, and the determination of the critical value, we are ready to do
a correlation problem.

Example 1
Problem Scenario: “Is a person's attitude about gun control related to her/his attitude about the
death penalty?”

Note: this is the same question used in the Independence section of Unit 14. To answer this
question in this unit I randomly sampled 10 residents of the State of Wyoming and asked them
the following two questions.

Question 1: “Do you believe that the State has a right to limit your ownership and use of guns?”

Possible answers were on a scale of increasing endorsement from 0 (not at all) to 10 (totally)

In Unit 14 the people could respond only with YES or No.

235
Question 2: “Do you believe that certain crimes should be punishable by the death penalty?”

Possible answers were on a scale of increasing endorsement from 0 (not at all) to 10 (totally)

In Unit 14 the people could respond only with YES or No.

Here are the results of the study.

Resident Selected
1     2        3    4     5     6    7       8       9   10

Gun Control            1     5       3     2    5    7         3       5   4    5
Death Penalty          9     6       4     6    7    2         4       4   5    3

This situation is exactly the same as the Gun Control/Death Penalty problem in the previous unit
with the ONLY exception being the possible answers. Here they are on a scale from 0 to 10 and
in the previous unit they were on a YES/NO scale.

Before jumping into the 5-step testing guideline, let‟s look at our 3 preliminary questions.

1. How many variables are there? Two; the first is attitude about gun control and the second is
attitude about the death penalty (same answer as in the previous unit).

2. What is the level of measurement of these two variables? Both are measured on the same 0 to
10 scale. This is an ordinal level of measurement, but according to our measurement unit (Unit 2)
an ordinal scale with 5 or more possible values can be considered “ratio.”

3. How many samples are there? There is one sample of 10 residents (n=10). A sample size of 10
is truly inadequate to represent the population of residents of the State of Wyoming; however,
the numerical calculations for correlation (as seen earlier in this unit) are time consuming and
calculator intense, so the sample size was intentionally selected to be small in order to fully
illustrate the calculations.

What is the indicated test situation? If you have a problem in which there is only one
sample, there are two variables each measured at a ratio or “ratio” level of measurement,
and a form 2 question is being asked, then the test statistic is Pearson Product Moment
Correlation or simply correlation.

Notice that correlation (this unit) and independence (Unit 14) are essentially the same test
situation. What is the difference? Simply the level of measurement for the variables. If the
level of measurement is categorical, then the test situation is independence. If the level of
measurement is ratio or "ratio," then the test situation is correlation. The situation when
the level of measurement is ordinal is left for Unit 16.

236
Given the problem scenario above, is the indicated alternative specific (one tailed) or general
(two tailed)? In the problem scenario do you see the words positive or negative? No. The
question simply asks are the variables related. Hence, the relationship could be positive or it
could be negative. Thus, the alternative is two tailed.

Step 1: H0: ρ = 0
HA: ρ ≠ 0

Step 2: α = .05 and the test situation is correlation

Step 3: Use the template (Once again, I would strongly encourage you to check my
calculations for yourself)

The sum of the 10 X scores is 1+5+3+2+5+7+3+5+4+5 = 40

Mean of the X variable (Gun Control) = 40/10 = 4

The sum of the 10 Y scores is 9+6+4+6+7+2+4+4+5+3 = 50

Mean of the Y variable (Death Penalty) = 50/10 = 5

X    Y    (X-Xbar)      (Y-Ybar)      (X-Xbar)²        (Y-Ybar)²       (X-Xbar)(Y-Ybar)

1    9    (1-4) = -3    (9-5) = 4     (-3)(-3) = 9     (4)(4) = 16     (-3)(4) = -12
5    6    (5-4) = 1     (6-5) = 1     (1)(1) = 1       (1)(1) = 1      (1)(1) = 1
3    4    (3-4) = -1    (4-5) = -1    (-1)(-1) = 1     (-1)(-1) = 1    (-1)(-1) = 1
2    6    (2-4) = -2    (6-5) = 1     (-2)(-2) = 4     (1)(1) = 1      (-2)(1) = -2
5    7    (5-4) = 1     (7-5) = 2     (1)(1) = 1       (2)(2) = 4      (1)(2) = 2
7    2    (7-4) = 3     (2-5) = -3    (3)(3) = 9       (-3)(-3) = 9    (3)(-3) = -9
3    4    (3-4) = -1    (4-5) = -1    (-1)(-1) = 1     (-1)(-1) = 1    (-1)(-1) = 1
5    4    (5-4) = 1     (4-5) = -1    (1)(1) = 1       (-1)(-1) = 1    (1)(-1) = -1
4    5    (4-4) = 0     (5-5) = 0     (0)(0) = 0       (0)(0) = 0      (0)(0) = 0
5    3    (5-4) = 1     (3-5) = -2    (1)(1) = 1       (-2)(-2) = 4    (1)(-2) = -2

Sum  40   50   0             0        28               38              -21

r = Sum[(X - Xbar)(Y - Ybar)] / Square root{[Sum(X - Xbar)²][Sum(Y - Ybar)²]}

r = -21 / Square root[(28)(38)] = -21 / Square root[1064] = -21 / 32.6 = -.644
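
The template arithmetic above can be checked with a few lines of plain Python (no libraries needed); this is just the hand calculation replayed, not a new method:

```python
# A quick check of the hand calculation above.
# X = gun control scores, Y = death penalty scores for the 10 residents.
X = [1, 5, 3, 2, 5, 7, 3, 5, 4, 5]
Y = [9, 6, 4, 6, 7, 2, 4, 4, 5, 3]

n = len(X)
xbar = sum(X) / n            # 4.0
ybar = sum(Y) / n            # 5.0

sxy = sum((x - xbar) * (y - ybar) for x, y in zip(X, Y))   # -21
sxx = sum((x - xbar) ** 2 for x in X)                      # 28
syy = sum((y - ybar) ** 2 for y in Y)                      # 38

r = sxy / (sxx * syy) ** 0.5
print(round(r, 3))   # -0.644
```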

And

237
t = r / Square root[(1 - r²) / (n - 2)]

The degrees of freedom, ν, associated with this test statistic are "n - 2." Here 10 - 2 = 8

t8 = -.644 / Square root[(1 - (-.644)²) / (10 - 2)] = -.644 / Square root[.585 / 8] = -.644 / .270 = -2.385

Here is a close up of the numerator in the square root.

1 - (-.644)² = 1 - (.415) = .585

Step 4: Using the "t" table in Unit 21 with ν = 8 and α = .025 (note in step 2 that α = .05, but
since we have a two-tailed alternative we need to look up the critical value for α = .025)
produces a critical value in the right hand tail (these are the tabled values) = 2.306. Since the "t"
distribution, just like the Z distribution, is symmetric, our left hand tail critical value is -2.306.

Possibility 1: test statistic < -2.306, we reject H0 and conclude that the correlation is
negative

Possibility 2: test statistic > 2.306, we reject H0 and conclude that the correlation is positive

Possibility 3: test statistic is between -2.306 and 2.306, we fail to reject H0 and conclude that
there isn't enough evidence to indicate that the variables are related; they seem to be
unrelated.

In this situation, we have

Test statistic (-2.385) < critical value (-2.306) which is Possibility 1, hence we reject H0, the
evidence of our study indicates that the two variables are negatively correlated.
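
Steps 3 and 4 can also be replayed in code. The correlation (-.644) and the tabled critical value (2.306 for ν = 8, two-tailed α = .05) are taken from the text; only the arithmetic is redone here:

```python
# Redoing the test statistic and the decision rule.
r = -0.644   # correlation from the template above
n = 10

t = r / ((1 - r ** 2) / (n - 2)) ** 0.5   # about -2.38
cv = 2.306                                # from the t table in Unit 21

reject = (t < -cv) or (t > cv)            # Possibility 1 or Possibility 2
print(round(t, 2), reject)                # -2.38 True
```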

Step 5: What does our conclusion mean? The evidence seems to indicate that a person's attitude
about gun control is negatively related to her/his attitude about the death penalty. What this
means is that as a person endorses a higher degree of support for the State's control of the
ownership and use of guns, s/he will be less likely to support the death penalty. And conversely,
as a person believes that the State should not have the right to control the ownership and use of
guns, s/he will be more likely to support the use of the death penalty. In the interpretation of a
significant correlation, we discuss the pattern of the result in the context of its being
positive or negative. In this case negative.

238
Example 2
Problem Scenario: At the Berkeley Research Center for Happiness, a study was conducted on a
sample of 8 volunteers to determine if a person's level of happiness was positively related to the
amount of sunshine they relaxed in on a typical day.

Level of Happiness was measured on a 0 to 100 scale. 0 = no happiness at all, increasing to 100
= completely happy

Amount of Sunshine was measured by the number of minutes spent outside in the sun on a
typical day

Based on this information, which test situation is appropriate? Looking at our 3 preliminary
questions,

Question 1: how many variables are there? Two, level of happiness and amount of sunshine.

Question 2: what level of measurement are the variables? Level of happiness is probably ordinal
(any time the word "scale" is used, it usually indicates an ordinal level of measurement), but with
5 or more values it can easily be considered "ratio." Amount of sunshine is clearly ratio.

Question 3: how many samples are there? Only one sample of 8 volunteers (n=8) is mentioned.

With one sample and two variables measured at the ratio or "ratio" level, and a form 2 question
(underlined portion of the problem scenario above), the indicated test situation is correlation.

Use the following data from the 8 volunteers to conduct the indicated test.

Volunteer              1     2     3     4     5     6     7     8

Level of Happiness     60    70    75    70    65    85    50    77

Amount of Sunshine     60    85    100   105   80    140   60    74

Prior to doing the test, I will go through the calculation of the correlation.

Mean of Happiness = (60+70+75+70+65+85+50+77)/8 = 552/8 = 69.0

Mean of Sunshine = (60+85+100+105+80+140+60+74)/8 = 704/8 = 88.0

239
The template

X     Y     (X-Xbar)        (Y-Ybar)         (X-Xbar)²          (Y-Ybar)²           (X-Xbar)(Y-Ybar)

60    60    (60-69) = -9    (60-88) = -28    (-9)(-9) = 81      (-28)(-28) = 784    (-9)(-28) = 252

70    85    (70-69) = 1     (85-88) = -3     (1)(1) = 1         (-3)(-3) = 9        (1)(-3) = -3

75    100   (75-69) = 6     (100-88) = 12    (6)(6) = 36        (12)(12) = 144      (6)(12) = 72

70    105   (70-69) = 1     (105-88) = 17    (1)(1) = 1         (17)(17) = 289      (1)(17) = 17

65    80    (65-69) = -4    (80-88) = -8     (-4)(-4) = 16      (-8)(-8) = 64       (-4)(-8) = 32

85    140   (85-69) = 16    (140-88) = 52    (16)(16) = 256     (52)(52) = 2704     (16)(52) = 832

50    60    (50-69) = -19   (60-88) = -28    (-19)(-19) = 361   (-28)(-28) = 784    (-19)(-28) = 532

77    74    (77-69) = 8     (74-88) = -14    (8)(8) = 64        (-14)(-14) = 196    (8)(-14) = -112

Sum   552   704   0              0           816                4974                1622

r = Sum[(X - Xbar)(Y - Ybar)] / Square root{[Sum(X - Xbar)²][Sum(Y - Ybar)²]}

r = 1622 / Square root[(816)(4974)] = 1622 / Square root[4058784] = 1622 / 2014.6 = .805

Now for the test.

Step 1: H0: ρ = 0

HA: ρ > 0             The phrase underlined in the problem scenario indicates positive (ρ > 0)

Step 2: α = .05 and the test situation is correlation

240
Step 3: Calculation of the test statistic

From the calculations above we saw that the correlation was .805.

t = r / Square root[(1 - r²) / (n - 2)]

The degrees of freedom, ν, associated with this test statistic are "n - 2." Here 8 - 2 = 6.

t6 = .805 / Square root[(1 - (.805)²) / (8 - 2)] = .805 / Square root[.352 / 6] = .805 / .242 = 3.326

Step 4: For ν = 6 and α = .05 (since this is a one-tailed alternative, we look up the CV for
.05), the CV = 1.943

Since the TS (3.326) is greater than the CV (1.943), we reject H0

Step 5: The correlation for this situation was found to be significantly positive. What does this
mean? It would seem that as the level of happiness increases a person gets out into the sunshine
for more minutes on a typical day.
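
The whole Example 2 test can be replayed as a short pure-Python script; the critical value 1.943 is read from the t table in Unit 21, not computed:

```python
# Example 2 replayed in code. Data and the one-tailed critical value
# (1.943 for nu = 6, alpha = .05) come from the text above.
happiness = [60, 70, 75, 70, 65, 85, 50, 77]
sunshine = [60, 85, 100, 105, 80, 140, 60, 74]

n = len(happiness)
xbar = sum(happiness) / n     # 69.0
ybar = sum(sunshine) / n      # 88.0

sxy = sum((x - xbar) * (y - ybar) for x, y in zip(happiness, sunshine))
sxx = sum((x - xbar) ** 2 for x in happiness)
syy = sum((y - ybar) ** 2 for y in sunshine)

r = sxy / (sxx * syy) ** 0.5               # about .805
t = r / ((1 - r ** 2) / (n - 2)) ** 0.5    # about 3.33
reject = t > 1.943                         # one-tailed, right-hand tail
```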

Quiz
Do people work more in an environment where they are happy? To examine this question the
following two variables were collected on 6 employees of a local business.

X – Variable 1: On a scale of 0 (not happy at all) to 10 (Extremely happy), “how happy are you
at work?”

Y – Variable 2: During an 8-hour work day, how many hours of work do you get done?

Here are the data for the 6 employees

X |    3    5        8      2    6       3
Y |    4    7        8      4    8       5

Test appropriately using our 5-step guideline.

241
Three preliminary questions.

1. How many variables are there? Two; the first is how happy the employee is and the second is
how much work the employee gets done.

2. What is the level of measurement of these two variables? Happy is measured on the same 11
point ordinal scale that we used in the first example. Hence this variable can be considered to be
measured on a “ratio” level of measurement. The second variable is directly a ratio variable.

3. How many samples are there? There is one sample of 6 employees (n=6).

Is the alternative hypothesis specific (one tailed) or general (two tailed)? In the problem scenario
the following question is asked, "Do people work more in an environment where they are
happy?" The two words "work more" imply that we are only looking for one direction, not two.
Now the question is whether we are looking for a positive or a negative relationship? The
implication is that happier people will work more, thus the direction is positive.

Putting all of this together leads us to the appropriate test situation being correlation with a one
tailed alternative hypothesis indicating a positive correlation.

Step 1: H0: ρ = 0
HA: ρ > 0

Step 2: α = .05 and the test situation is correlation

Step 3: Template

Sum of the 6 X scores are 3+5+8+2+6+3 = 27
Mean of the X variable (Happy) = 27/6 = 4.5

Sum of the 6 Y scores are 4+7+8+4+8+5 = 36
Mean of the Y variable (Work) = 36/6 = 6.0

X     Y    (X-Xbar)      (Y-Ybar)     (X-Xbar)²           (Y-Ybar)²      (X-Xbar)(Y-Ybar)

3     4   (3-4.5)=-1.5   (4-6)=-2   (-1.5)(-1.5)=2.25    (-2)(-2)=4      (-1.5)(-2) = 3.0
5     7   (5-4.5)=.5     (7-6)=1    (.5)(.5)=.25          (1)(1)=1       (.5)(1) = .5
8     8   (8-4.5)=3.5    (8-6)=2    (3.5)(3.5)=12.25      (2)(2)=4       (3.5)(2) = 7.0
2     4   (2-4.5)=-2.5   (4-6)=-2   (-2.5)(-2.5)=6.25     (-2)(-2)=4     (-2.5)(-2) = 5.0
6     8   (6-4.5)=1.5    (8-6)=2    (1.5)(1.5)=2.25       (2)(2)=4       (1.5)(2) = 3.0
3     5   (3-4.5)=-1.5   (5-6)=-1   (-1.5)(-1.5)=2.25     (-1)(-1)=1     (-1.5)(-1) = 1.5

Sum 27      36      0            0             25.50              18.00           20.00

242
r = Sum[(X - Xbar)(Y - Ybar)] / Square root{[Sum(X - Xbar)²][Sum(Y - Ybar)²]}

r = 20.00 / Square root[(25.50)(18.00)] = 20.00 / Square root[459] = 20.00 / 21.42 = .934

And

t = r / Square root[(1 - r²) / (n - 2)]

The degrees of freedom, ν, associated with this test statistic are "n - 2." Here 6 - 2 = 4

t4 = .934 / Square root[(1 - (.934)²) / (6 - 2)] = .934 / Square root[.128 / 4] = .934 / .179 = 5.218

Step 4: Using the "t" table in Unit 21 with ν = 4 and α = .05 (we have a one-tailed alternative)
produces a critical value in the right hand tail = 2.132.

In this situation, we have

Test statistic (5.218) > critical value (2.132) hence we reject H0, the evidence of our study
indicates that the two variables are positively correlated.

Step 5: What does our conclusion mean? The evidence seems to indicate that happier employees
also get more work done.
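
Since every correlation test in this unit follows the same five steps, the procedure can be packaged as a single helper function. A sketch: the caller supplies the tabled critical value from Unit 21, and `one_tailed=True` here means a right-tailed test for a positive correlation:

```python
# The correlation test packaged as a reusable helper. The critical value
# is supplied by the caller (read from the t table in Unit 21).
def correlation_test(x, y, critical_value, one_tailed=False):
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    sxx = sum((a - xbar) ** 2 for a in x)
    syy = sum((b - ybar) ** 2 for b in y)
    r = sxy / (sxx * syy) ** 0.5
    t = r / ((1 - r ** 2) / (n - 2)) ** 0.5
    if one_tailed:
        reject = t > critical_value        # right-tailed (positive) test
    else:
        reject = t < -critical_value or t > critical_value
    return r, t, reject

# The quiz data: nu = 4, one-tailed, tabled critical value 2.132
r, t, reject = correlation_test([3, 5, 8, 2, 6, 3],
                                [4, 7, 8, 4, 8, 5],
                                2.132, one_tailed=True)
```

For a left-tailed (negative) alternative the comparison would flip to t < -critical_value; that case is omitted to keep the sketch short.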

There is a wonderful WEB site that can assist you in developing a sense (intuition) about
associating particular scatterplot (visual) patterns with particular correlation (numerical) levels.
Currently the web site below is under repair and is not available.

http://www.stat.uiuc.edu/~stat100/java/guess/GCApplet.html

243
Unit 16: Other Forms of Correlation
In the previous unit on correlation the answers to our three preliminary questions had to be
similar to or in the form of the following in order to justify the Pearson Product Moment
correlation.

1. How many variables? Two

2. What is the level of measurement? Measured at a ratio or “ratio” level of measurement.

3. How many samples? One

What would happen if our variables were not measured at a ratio or “ratio” level?

This unit was constructed to very briefly answer this question. It is for your information
only and will not appear on any assignment or test.

Example, "Is a person's performance on the final in a course related to her/his grade in the
course?"

Looking at our three preliminary questions we get the following answers.

1. How many variables? Two; grade on the final and grade in the course.

2. What level of measurement? Can't be determined from the information provided.

3. How many samples? One sample of students.

Is the question specific or general? General; neither a positive nor a negative relationship is
indicated. The question, however, is clearly correlation (the key word in the example question is
underlined).

Here are 6 variables that I might use to measure grade on the final and in the course.

Variable 1 – grade on the final measured pass/fail (with only two values this is considered to be a
categorical variable)

Variable 2 – grade in the course measured pass/fail (categorical)

244
Variable 3 – grade on the final measured by letter grade (A, B, C, D, F; this would be considered
to be an ordinal variable)

Variable 4 – grade in the course measured by letter grade (ordinal)

Variable 5 – grade on the final measured in percent (ratio variable)

Variable 6 – grade in the course measured in percent (ratio)

Each combination of the variables in the table below answers our example correlation
question. The only difference is in the method we use to calculate the correlation. Here are some
examples.

First Variable                   Second Variable                       Method of Correlation

Variable 1 (categorical)         Variable 2 (categorical)                     Phi

Variable 3 (ordinal)             Variable 4 (ordinal)                    Kendall's Tau B

Variable 5 (ratio)               Variable 6 (ratio)                    Pearson Product Moment

What can we gain from this simple illustration?

The changing of the way we measure a variable (level of measurement) does what?

1. Does it change the question? No

2. Does it change the way we calculate the statistic of interest (correlation in this example)?
Yes

For reference only, here are some additional forms of the correlation coefficient.
(The numerical methods for their calculation are not provided; they are unnecessary for this
discussion. This unit has been provided for your information only.)

If you have a correlation question in which the variables are (level of measurement), then you
should use the (form) of the correlation.

Both variables are categorical with only two possible outcomes .................................. Phi

Both variables are categorical and at least one has three or more outcomes ........ Cramer's V

245
Both variables are ordinal, and both variables have the same number of possible outcomes
........................................................................................ Kendall's Tau B

Both variables are ordinal, but don't have the same number of possible outcomes
........................................................................................ Kendall's Tau C

Both variables are ordinal and have 10 or more possible outcomes ...................... Spearman

Both variables are ratio or "ratio" .................................................... Pearson

In the table above I have not included some options, especially those in which one variable is at
one level of measurement and the other variable is at a different level. While these correlations
do exist, and are different from the ones above, they are not very common. The alternative
correlation methods presented in this unit are common.
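
One way to build intuition for the table above: Spearman's correlation is just the Pearson correlation computed on ranks. A minimal sketch with hypothetical percent scores (assuming no tied values, which keeps the ranking simple):

```python
# Spearman's correlation computed as Pearson on ranks.
# Hypothetical, tie-free data for the final/course example above.
def ranks(values):
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]   # rank 1 = smallest

def pearson(x, y):
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    sxx = sum((a - xbar) ** 2 for a in x)
    syy = sum((b - ybar) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

final = [55, 78, 62, 91, 70]     # percent on the final (hypothetical)
course = [60, 80, 65, 95, 72]    # percent in the course (hypothetical)

# These two variables rise and fall together in exactly the same rank
# order, so their Spearman correlation is exactly 1.
spearman = pearson(ranks(final), ranks(course))
```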

246
Unit 17: Regression
Terms
Regression Test Situation - a problem in which there is only one sample, there are two
variables each measured at a ratio or “ratio” level of measurement, and a form 3 question is

Independent Variable - the variable used as the predictor (the variable in the X-axis in the
scatterplot; symbolically labeled as X)

Dependent Variable - the variable being predicted (the variable in the Y-axis in the scatterplot;
symbolically labeled as Y)

Regression Line - the straight line that best fits the data presented in a scatterplot

Constant (Y-intercept) - the value of the dependent variable when the independent variable = 0

Slope - how much the regression line increases in the dependent variable for a 1 unit increase in
the independent variable

Ŷ - the predicted score for the dependent variable (read as Y hat)

Distribution of the Test Statistic - "t distribution" with ν degrees of freedom (tν)

ν = degrees of freedom = sample size - 2 = n - 2

Equations
Slope (b) = Sum[(X - Xbar)(Y - Ybar)] / Sum[(X - Xbar)²]

Constant (a) = Ybar - (slope)(Xbar)

Regression Line: Ŷ = a + bX

247
Test Statistic      t = slope / (standard deviation of the slope)

MSE = [Sum (Y - Yhat)²] / (n - 2)

Variance of the Slope = MSE / [Sum (X - Xbar)²]

Standard Deviation of the Slope = square root (variance of the slope)

In Unit 15 the scatterplot and correlation were presented. These were two very powerful methods
for graphically and numerically summarizing the data coming from two ratio variables from a
form 2 research question. Recall that the form 2 question and the form 3 question were very
similar (Basic Research Questions portion of the Research Section in Unit 1). Both have two
variables and examine the relationship between them. The primary difference is that in form
3 questions one of the variables is influencing the other or being used to predict the other. The
variable that is considered the influencer (predictor) is called the independent variable and the
variable being influenced (predicted) is called the dependent variable. Put another way, the
dependent variable depends upon the independent variable. This dependency is what marks form
3 questions as unique from form 2 questions.

Using two variables, height and weight, here are how a form 2 and a form 3 question might
appear.

Form 2 – Is a person's height related to her or his weight?

Form 3 – Can we use a person's height to predict her or his weight?

The graphical summarization method for the data from both a form 2 or a form 3 question is the
scatterplot. The numerical summarization method for a form 2 question is correlation (previous
unit). The numerical summarization method for a form 3 question is regression.

248
Regression
As noted above, Regression is the numerical summarization method appropriate when we have a
Form 3 question and two ratio variables.

The general form of the Form 3 question is, “how does one variable influence or predict another
variable" or "how does one variable depend upon another variable?”

The influencer variable is called the independent variable. The variable being influenced is called
the dependent variable.

Thus, the Form 3 question can be rewritten as, "how does the independent variable influence or
predict the dependent variable" or "how does the dependent variable depend upon the
independent variable?"

In a Form 2 question, either variable can be placed in the horizontal axis or the vertical axis of
the scatterplot. In the example with the Math SAT and Verbal SAT variables, we placed the
Math SAT in the horizontal axis, but it could have gone in the vertical axis equally well.
However, in a Form 3 question, the independent variable must go in the horizontal axis and
the dependent variable must go in the vertical axis. There is no mathematical reason for doing
this, but all textbooks present regression graphically only in this orientation.

In our discussion of correlation, we saw that correlation assessed the linearity of the relationship
between two variables. This was because of key points 2 and 3, which were

2. a correlation of +1 appears as a straight line going from the lower left hand corner of the
scatterplot to the upper right hand corner

3. a correlation of –1 appears as a straight line going from the upper left hand corner of the
scatterplot to the lower right hand corner

Correlation as we were using it was blind to curvilinear relationships (the 4th figure in Unit 15).
This will be our orientation in regression as well. Even though curvilinear regression is possible,
we will not consider it in this course. Regression goes beyond correlation to the actual finding
and fitting of a line to the data points presented in the scatterplot.

Before getting too abstract, let's consider an example. Before a big test most students study.

In this very simple statement, we have presented or at least implied two variables of interest.
They are performance on the test and studying.

What would a form 2 question look like with these two variables?

Is there a relationship between studying and test performance?

249
What would a form 3 question look like with these two variables?

Does the amount of studying done before a test predict the performance on the test?

Which of these two situations seems more logical? Probably the form 3 question.

The data for 30 students from last semester's mid-term are presented in the scatterplot below.

Figure 1

By looking at figure 1, it is easy to see that there is a positive relationship. As the amount of time
spent studying increases there is a corresponding increase in the grade on the test. This type of
description of the scatterplot sounds very much like correlation. How would the description
change if we considered it from the perspective of regression?

First off, in regression we are attempting to fit a line to the data. In the figure above, how might
we be able to place a line through the data points, such that this line would be a fair or reasonable
depiction of the relationship between the two variables?

In Figure 2 below (the same data used in Figure 1 above) three lines have been added. One in
blue, one in red, and one in green. Which line of the three fits the relationship expressed by the
data points the best? [I am sorry, but the link below may not work. If not, then the three lines are
closely approximated by the following description as seen on a clock face: blue (10:00 to 4:00),
red (8:00 to 2:00), and green (7:00 to 1:00)]

250
Figure 2

Clearly the blue does not (10:00 to 4:00). The data points exhibit a positive relationship and the
blue line is indicating a negative relationship (as read from left to right the line is sloping down).

The red line fits the data points somewhat (8:00 to 2:00). At least the line is reflecting a positive
relationship, even though it is not sloping upward (read from left to right) as steeply as the data
points.

The green line clearly fits the data points the best (7:00 to 1:00) of the three lines in the figure.
(1) It is reflecting a positive relationship like the data points. (2) It seems to be sloping upward at
a very similar rate to the data points. (3) It seems to be in the "middle" of the data points.

The statistical analysis method known as regression is designed to numerically determine the line
of best fit (green line) for any set of data points from a form 3 question.

251
The Determination and Testing of the Regression Line
Here is an example from the unit on Pearson Correlation (Unit 15) to assist in the development
of the necessary numerical pieces of the regression problem.

Problem Scenario: Can how happy a person is in her/his work place (on our scale of 0 to 10)
predict how much work they will get done during a typical day (measured in hours of work
completed)?

Notice the difference between the form 3 nature of this question (predict) and the form 2 nature
of the question used in Unit 15. We are now using one of the variables (how happy the person is)
to predict the other (how much work s/he will get done).

The three preliminary questions.

1. How many variables are there? Two; the first (the independent variable) is how happy the
person is and the second (the dependent variable) is how much work the person gets done.

2. What are the levels of measurement for these two variables? Happy is once again ordinal, but
with more than 5 possible outcomes we are going to consider it “ratio.” Work is clearly ratio.

3. How many samples? One. This answer is always one unless it is clearly indicated that multiple
samples were selected.

The direct answers to these questions in this unit are the same as in unit 15. The difference is
seen in the problem scenario with the word predict, influence, or affect rather than the word
associate, relate, or correlate. This is what leads to identifying one of the variables as the
independent variable and the other variable as the dependent variable.

Is the question in the problem scenario specific or general? The question in the scenario asks,
"Can how happy a person is predict how much work will get done?" It doesn't indicate that
happier workers will produce more, or that happier workers will produce less. It simply asks, can
how happy a worker is predict how much work gets done? Thus the question is general.
However, and once again, the critical issue is the nature of the question. If the question simply
asks a relationship (form 2) question, then the appropriate statistic is correlation; however, if the
relationship indicated is predictive (influence, affect, etc.), then the appropriate statistic is
regression.

What is the indicated test situation? If you have a problem in which there is only one
sample, there are two variables each measured at a ratio or “ratio” level of measurement,
and a form 3 question is being asked, then the test situation is regression.

252
Given the data from Unit 15

Table 1

Happy |     3   5    8    2   6     3
Work |      4   7    8    4   8     5

Here are the necessary equations for regression.

In order to calculate the prediction equation, which is

Predicted scores for work = constant + (slope)(happiness scores)

we need to calculate both the constant and the slope. The happiness scores and hours worked
(work) are given in Table 1 above.

Symbolically, the independent variable is represented by the letter "X" and the dependent
variable by the letter "Y." The symbolic regression question is "can we use X to predict Y?" and
the prediction equation is

Ŷ = a + bX

Here Ŷ (read as Y hat) represents the predicted scores for the dependent variable, a = constant,
and b = slope. The constant is also called the Y-intercept in some textbooks; it is the value of Y
when X = 0.

The equation for the slope is very similar to that of the Pearson correlation (notice that the
numerator, the top part, is exactly the same).

Slope = Sum[(X - Xbar)(Y - Ybar)] / Sum[(X - Xbar)²]

And the equation for the constant is

Constant = Ybar - (slope)(Xbar)

The template we will use for the regression is slightly simpler than the one we used for the
correlation.

253
Template 1
X       Y      (X - Xbar)      (Y - Ybar)      (X - Xbar)²      (X - Xbar)(Y - Ybar)

And for the specifics of our problem (this is reproduced from Unit 15).

Sum of the 6 X scores are 3+5+8+2+6+3 = 27

Mean of the X variable (Happy) = 27/6 = 4.5

Sum of the 6 Y scores are 4+7+8+4+8+5 = 36

Mean of the Y variable (Work) = 36/6 = 6.0

Please note that Xbar is the mean of the Happy variable and Ybar is the mean of the Work
variable.

X    Y    (X-Xbar)       (Y-Ybar)         (X-Xbar)²          (X-Xbar)(Y-Ybar)

3    4    (3-4.5)=-1.5   (4-6)=-2       (-1.5)(-1.5)=2.25    (-1.5)(-2) = 3.0
5    7    (5-4.5)=.5     (7-6)=1        (.5)(.5)=.25         (.5)(1) = .5
8    8    (8-4.5)=3.5    (8-6)=2        (3.5)(3.5)=12.25     (3.5)(2) = 7.0
2    4    (2-4.5)=-2.5   (4-6)=-2       (-2.5)(-2.5)=6.25    (-2.5)(-2) = 5.0
6    8    (6-4.5)=1.5    (8-6)=2        (1.5)(1.5)=2.25       (1.5)(2) = 3.0
3    5    (3-4.5)=-1.5   (5-6)=-1       (-1.5)(-1.5)=2.25    (-1.5)(-1) = 1.5

Sum 27      36      0            0                 25.50                  20.00

Slope = Sum[(x - xbar)(y - ybar)] / Sum[(x - xbar)²] = 20.00 / 25.50 = .78

And

Constant = ybar - (slope)(xbar) = 6.0 – (.78)(4.5) = 2.49

So the regression equation (prediction equation) becomes

Predicted Amount of Work Done = 2.49 + (.78)(How Happy the person is)

254
What does this regression say?

1. In general, if you can increase how happy a person is in the workplace, you should be able to
increase the amount of work they do; this is because the sign of the slope is positive.

2. Specifically, for every 1 unit increase in a person's happiness we should expect .78 hours (47
minutes) more work from them; this is the interpretation of the slope.

3. We can predict how much work any person will get done if we know how happy they are.
Specifically we are going to predict that a person will work the number of hours equal to .78
times their happiness score + 2.49 hours. For instance, if a person is 4 units happy on our scale,
then we would predict that this person will get 2.49 + (4) (.78) = 5.61 hours of work done.

Mini-quiz 1: if happiness = 5, then how much work will we predict that this person will get
done?

Answer: 2.49 + (5) (.78) = 6.39 hours of work

Mini-quiz 2: if happiness = 6, then how much work will we predict that this person will get
done?

Answer: 2.49 + (6) (.78) = 7.17 hours of work
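
The slope, constant, and predictions can be verified with a short script. Note that the text rounds the slope to .78 before computing the constant (getting 2.49); carrying full precision gives a constant of about 2.47, but the predictions still agree to two decimals:

```python
# Slope, constant, and predictions for the happiness/work data.
X = [3, 5, 8, 2, 6, 3]    # happiness scores (independent variable)
Y = [4, 7, 8, 4, 8, 5]    # hours of work (dependent variable)

n = len(X)
xbar, ybar = sum(X) / n, sum(Y) / n            # 4.5 and 6.0

slope = (sum((x - xbar) * (y - ybar) for x, y in zip(X, Y))
         / sum((x - xbar) ** 2 for x in X))    # 20.00 / 25.50, about .78
constant = ybar - slope * xbar                 # about 2.47 at full precision

def predict(happy):
    return constant + slope * happy

print(round(predict(4), 2))    # 5.61 hours
print(round(predict(5), 2))    # 6.39 hours
```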

Is our regression equation significant? Can we really predict how much work a person will do
from how happy they are?

The test statistic for the regression situation is

t = slope / (standard deviation of the slope)

The degrees of freedom, , are once again n – 2 (just like in correlation). Here  = 6 – 2 = 4.

The variance of the slope = MSE / [sum (x - xbar)²]                    (MSE = Mean Squared Error)

The standard deviation of the slope = the square root of the variance of the slope.

And the MSE = [sum (y - yhat)²] / (n - 2)                        (the definition of MSE)

There is a major change in the MSE equation above from what we have seen before. In the
MSE equation, we are using yhat, not ybar. Yhat is the predicted value of the dependent
variable (work) from a specific value of the independent variable (happiness).

Yhat (Ŷ) = constant + (slope)(independent variable)

255
This will require another template.

Template 2
x      y   yhat      (y - yhat)     (y - yhat)²

Let's now try to put all of these pieces together, starting with this new template.

x      y   yhat                             (y - yhat)          (y - yhat)²

3      4   2.49 + (.78)(3) = 4.83           (4 – 4.83) = -.83   (-.83)(-.83) = .689
5      7   2.49 + (.78)(5) = 6.39           (7 – 6.39) = .61     (.61)(.61) = .372
8      8   2.49 + (.78)(8) = 8.73           (8 – 8.73) = -.73    (-.73)(-.73) = .533
2      4   2.49 + (.78)(2) = 4.05           (4 – 4.05) = -.05    (-.05)(-.05) = .003
6      8   2.49 + (.78)(6) = 7.17           (8 – 7.17) = .83      (.83)(.83) = .689
3      5   2.49 + (.78)(3) = 4.83           (5 – 4.83) = .17      (.17)(.17) = .029

Sum                                                      0                  2.315

Now MSE = [sum (y - yhat)²] / (n - 2) = 2.315 / (6 - 2) = 2.315 / 4 = .579

And the variance of the slope = MSE / [sum (x - xbar)²] = .579 / (25.50) = .023

Thus, the standard deviation of the slope is the square root of the variance of the slope or the
square root of (.023) = .151

And finally

t = slope / (standard deviation of the slope)

t4 = .78 / (.151) = 5.166
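
All of the pieces of the slope test can be assembled end to end in code (full precision throughout; the hand calculation above rounds along the way, so the resulting t differs slightly from 5.166). The critical value would still come from the t table in Unit 21:

```python
# The regression slope test assembled end to end.
X = [3, 5, 8, 2, 6, 3]
Y = [4, 7, 8, 4, 8, 5]

n = len(X)
xbar, ybar = sum(X) / n, sum(Y) / n
sxx = sum((x - xbar) ** 2 for x in X)            # 25.50

slope = sum((x - xbar) * (y - ybar) for x, y in zip(X, Y)) / sxx
constant = ybar - slope * xbar

# The residual sum of squares uses yhat (the predicted y), not ybar.
sse = sum((y - (constant + slope * x)) ** 2 for x, y in zip(X, Y))
mse = sse / (n - 2)                  # about .58
var_slope = mse / sxx                # about .023
t = slope / var_slope ** 0.5         # about 5.2
```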

Step 1: In the hypothesis for regression, only the slope is involved. The alternative
hypothesis, just like in correlation, can be one-tailed or two tailed. We could have a situation in
which we are predicting that the independent variable is positively predictive of the dependent
variable (slope greater than 0); in which we are predicting that the independent variable is
negatively predictive of the dependent variable (slope less than 0); or in which we are just trying
to predict the dependent variable from the independent variable in any fashion (unequal to 0). In
most introductory textbooks the slope is identified by the symbol “b” and the constant with the
symbol “a.” So the regression equation in general looks like

256
Ŷ = a + bX

If the independent variable X is not predictive (slope = b = 0), then the regression equation
reduces to

Ŷ = a

This equation can be interpreted as saying that we will predict the value “a” for everybody no
matter what their score is on the independent variable. If the independent variable is needed to be
able to make the prediction, then the slope is necessary to adjust the scores on the dependent
variable appropriately in relation to how that person scored on the independent variable.
Remembering that we use the Roman letters for our sample, in this case “b” for the slope, and
that we use the Greek letters for our populations and hypotheses, we will write our hypotheses
for the slope in terms of β (beta, the Greek b).
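The contrast between the full equation Yhat = a + bX and the reduced equation Yhat = a can be sketched in a few lines of Python (a hypothetical helper, not part of the course; the values .78 and 2.49 come from the example above):

```python
def predict(x, a, b=0.0):
    """Regression prediction Yhat = a + b*x; with b = 0 it reduces to Yhat = a."""
    return a + b * x

# With the fitted slope, predictions change with the independent variable...
with_slope = [predict(x, 2.49, 0.78) for x in (2, 5, 8)]
# ...with slope 0, every person gets the same prediction: the constant a.
no_slope = [predict(x, 2.49) for x in (2, 5, 8)]
```

This is exactly the sense in which a zero slope means the independent variable is not predictive: everyone receives the same predicted value no matter their score on X.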

In our problem scenario above we had a general question and need a two-tailed alternative
hypothesis.

H0: β = 0
HA: β ≠ 0

Step 2: α = .05 and the test situation is regression

Step 3: The calculation of the test statistic. Done above, this requires the calculation of the
constant, the slope, the predicted scores on the dependent variable (yhat), and the values for both
templates. Thus, there are a lot of calculations that need to be made in order to get to the value of
the test statistic.

t4 = 5.166

Step 4: The distribution of the test statistic in regression is the same as in correlation: the “t”
distribution, and it is used in exactly the same manner. Using the “t” table in Unit 21 with ν = 4
and α = .025 (we have a two-tailed alternative) produces a critical value in the right hand tail
(these are the tabled values) of 2.776. Since the “t” distribution, just like the Z distribution, is
symmetric, our left hand tail critical value is -2.776.

Possibility 1: test statistic < -2.776, we reject H0 and conclude that the slope is negative

Possibility 2: test statistic > 2.776, we reject H0 and conclude that the slope is positive

Possibility 3: test statistic is between -2.776 and 2.776, we fail to reject H0 and conclude that
there isn't enough evidence to indicate that the independent variable is predictive of the
dependent variable.
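These three possibilities amount to a simple decision rule. Here is a minimal sketch of it as a hypothetical helper function (cv is the positive, right-hand-tail critical value from the table):

```python
def two_tailed_slope_decision(ts, cv):
    """Apply the three possibilities for a two-tailed slope test."""
    if ts < -cv:
        return "reject H0: slope is negative"
    if ts > cv:
        return "reject H0: slope is positive"
    return "fail to reject H0"
```

For this problem, two_tailed_slope_decision(5.166, 2.776) lands in possibility 2.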

In this situation, we have

Test statistic (5.166) > critical value (2.776) which is possibility 2, hence we reject H0. The
evidence of our study indicates that the independent variable is positively predictive of the
dependent variable.

Step 5: What does this mean? The evidence from our study indicates that it is possible to predict
how much a person is going to work if you know how happy they are in the workplace.
Specifically we know that for every unit increase in happiness on our scale, a person will work
.78 hours (47 minutes) more per day.

Another Example
Problem Scenario: I believe that older people have greater common sense. In order to investigate
this belief I collected data (responses to the following 2 questions) on 5 people.

Question 1: How old are you (Age)? This is measured on the scale

1 = 20 and under, 2 = 21 to 25, 3 = 26 to 30, 4 = 31 to 35, and 5 = 36 and over

Question 2: Score on a 10 point common sense test (Common sense). This test is measured such
that

0 = no common sense, ....., 10 = high level of common sense

The question that I am going to investigate is, "Is age a positive predictor of common sense?"

Here are the data from the 5 people in our sample.

Person              1        2     3     4      5

Age                 2        3    1      5     4

Common Sense        5        7    3      9      6

The 3 preliminary questions.

1. How many variables are there? Two; age and common sense.

2. What are the levels of measurement of these 2 variables? Age is measured on a 5-point ordinal
scale, but since there are 5 points, then age can be considered to be measured at a "ratio" level.
Common sense is measured on a 10-point ordinal scale, but since there are 10 points, then
common sense can be considered to be measured at a "ratio" level.

3. How many samples are there? There is only one sample of 5 people (n=5).

Given the nature of the question of interest, do we have a form 2 or a form 3 question? Since the
word predict is used in the question of interest, then we have a form 3 question. Given that we
have an independent variable (measured at a "ratio" level) and a dependent variable (measured at
a "ratio" level), and only one sample, then the test situation is regression.

Lastly, is the question of interest specified in the problem scenario specific or general? The
question directly asks if age is a positive predictor, hence this is specific (one tailed). This is also
seen in the scenario in the phrase, "I believe that older people have greater common sense." This
directly implies that the older the person, the more common sense he or she would be expected to
have.

Prior to conducting the test, let's begin by calculating the variable means, the constant, the slope,
the prediction equation, and the values from our two templates. For simplicity and brevity in the
following tables, let X = Age and Y = Common Sense.

mean of X = Xbar = (2 + 3 + 1 + 5 + 4)/5 = 15/5 = 3.0

mean of Y = Ybar = (5 + 7 + 3 + 9 + 6)/5 = 30/5 = 6.0

Template 1

X    Y     (X - Xbar)      (Y - Ybar)      (X - Xbar)²      (X - Xbar)(Y - Ybar)

2     5     (2-3) = -1    (5-6) = -1    (-1)(-1) = 1    (-1)(-1) = 1

3     7     (3-3) = 0     (7-6) = 1      (0)(0) = 0     (0)(1) = 0

1     3     (1-3) = -2    (3-6) = -3    (-2)(-2) = 4    (-2)(-3) = 6

5     9     (5-3) = 2     (9-6) = 3      (2)(2) = 4     (2)(3) = 6

4     6     (4-3) = 1     (6-6) = 0      (1)(1) = 1     (1)(0) = 0

Sum         0             0             10              13

Slope = {Sum [(X-Xbar)(Y-Ybar)]} / {Sum (X-Xbar)²}

= 13 / 10 = 1.30 = b

Constant = Ybar - (slope)(Xbar) = 6.0 - (1.3)(3.0) = 6.0 - 3.9 = 2.10

Prediction equation

Predicted Common Sense = constant + (slope)(Age)

Yhat = 2.10 + 1.30 (Age)

Predicted values of Common Sense for each person in the study.

Person 1     Yhat = 2.10 + 1.30 (Age) = 2.10 + 1.30 (2) = 2.10 + 2.60 = 4.70

Person 2     Yhat = 2.10 + 1.30 (Age) = 2.10 + 1.30 (3) = 2.10 + 3.90 = 6.00

Person 3     Yhat = 2.10 + 1.30 (Age) = 2.10 + 1.30 (1) = 2.10 + 1.30 = 3.40

Person 4     Yhat = 2.10 + 1.30 (Age) = 2.10 + 1.30 (5) = 2.10 + 6.50 = 8.60

Person 5     Yhat = 2.10 + 1.30 (Age) = 2.10 + 1.30 (4) = 2.10 + 5.20 = 7.30

Template 2

X    Y     Yhat   (Y - Yhat)      (Y - Yhat)²

2    5      4.7   (5-4.7) = .3    (.3)(.3) = .09

3    7      6.0   (7-6.0) = 1     (1)(1) = 1.00

1    3      3.4   (3-3.4) = -.4   (-.4)(-.4) = .16

5    9      8.6   (9-8.6) = .4    (.4)(.4) = .16

4    6      7.3   (6-7.3) = -1.3 (-1.3)(-1.3) = 1.69

Sum         0               3.10

n=5

Mean Squared Error (from Template 2)

MSE = [Sum (Y - Yhat)²] / (n - 2) = 3.10 / 3 = 1.03

Variance of the Slope (from Template 1)

MSE / [Sum (X - Xbar)²] = 1.03 / 10.0 = .103

Standard Error of the Slope

Square root (Variance of the Slope) = square root of (.103) = .321
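Before conducting the test, the whole chain for this example can be verified with a short Python sketch (not part of the course; exact arithmetic, so the final t of about 4.04 differs from the text's 4.050 only through the text's intermediate rounding):

```python
from math import sqrt

age = [2, 3, 1, 5, 4]            # X
sense = [5, 7, 3, 9, 6]          # Y
n = len(age)

xbar = sum(age) / n              # 3.0
ybar = sum(sense) / n            # 6.0
sxx = sum((x - xbar) ** 2 for x in age)                          # 10
sxy = sum((x - xbar) * (y - ybar) for x, y in zip(age, sense))   # 13

b = sxy / sxx                    # slope = 1.30
a = ybar - b * xbar              # constant = 2.10

sse = sum((y - (a + b * x)) ** 2 for x, y in zip(age, sense))    # 3.10
mse = sse / (n - 2)              # about 1.03
se_slope = sqrt(mse / sxx)       # about .321
t = b / se_slope                 # about 4.04 (text: 4.050)
```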

Now we are ready to conduct the appropriate test

Step 1: H0: =0

HA:  > 0     (positively predictive)

Step 2:  = .05 and the test situation is regression

Step 3: Calculate the value of the test statistic

test statistic = t = slope / (standard deviation of the slope)

t3 = 1.30 / .321 = 4.050

Step 4: Compare to the critical value

Critical value for ν = 3 and α = .05 (one-tailed alternative) is 2.353 from the "t" table in Unit 21.

TS (4.050) > CV (2.353), thus we reject H0

Step 5: We have determined that age can positively and significantly predict common sense. In
general we know that as people get older they tend to have a greater degree of common sense.
Specifically we know that for every 1 unit increase in our age scale, a person gets 1.30 points
higher on our common sense scale.

NOTICE that the conclusion is NOT for every year increase. Why? Because we did not measure
the independent variable (age) in this manner. Be sure to make your interpretation of the slope in
the measurement units for the independent variable and the dependent variable.

Quick Guesses for the Constant and Slope in Scatterplots

I would note that you can make a quick guess about the value of the constant and the slope
for any scatterplot. These guesses should only be used with scatterplot data, such as those
presented in the earlier portion of this unit. To guess the constant, simply draw what you
think to be a best fitting line to the data. Where the line crosses the Y-axis (the value of Y
when X = 0), is the estimate for the constant.

For Figure 1, if you draw a line over these data points, you should see the line cross the Y-
axis at about 45. This would mean that a very rough guess of the constant would be about 45.

A quick way of guessing the slope is with the following formula: quick guess for the slope =
(Maximum Y value - Minimum Y value) / (Maximum X value - Minimum X value). For
Figure 1 this would be (100 - 45) / (10 - 1) = 55 / 9 = 6.1.

PLEASE note that these are only quick guesses to be used with scatterplots.
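The quick-guess formula is one line of Python. The sketch below is a hypothetical helper; the example values 45 to 100 on Y and 1 to 10 on X are the approximate Figure 1 extremes assumed above.

```python
def quick_slope_guess(xs, ys):
    """Rough scatterplot slope guess: (max Y - min Y) / (max X - min X)."""
    return (max(ys) - min(ys)) / (max(xs) - min(xs))

# Approximate Figure 1 extremes: Y from 45 to 100, X from 1 to 10.
guess = quick_slope_guess([1, 10], [45, 100])   # 55 / 9, about 6.1
```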

Quiz
It has been hypothesized that the consumption of coffee in the evening before going to bed can
adversely affect the amount of sleep a person gets. Seven graduate students agreed to
participate in our study and were asked two questions.

Question 1: How much coffee (measured in ounces) did you drink last night before going to bed?

Question 2: How many hours of sleep did you get last night?

Appropriately test the indicated hypothesis.

Ounces of Coffee | 0         0       4   6        16     20     24
Hours of Sleep   | 6.50     7.25   7.00 6.50     6.00   6.50   5.75

Three preliminary questions.

1. How many variables? Two; the first is how many ounces of coffee did the subject drink and
the second is amount of sleep.

2. What is the level of measurement of these two variables? Both are ratio.

3. How many samples? One sample of 7 graduate students.

Is the question general or specific? The question identifies a situation in which there is an
independent (ounces of coffee consumed) and a dependent variable (number of hours of sleep).
This is seen in the word “affect.” In the form 3 question, affect, influence, and predict are used
interchangeably. The word “adversely” in the claim indicates that we have a specific concern
which is that the independent variable will negatively predict (adversely affect) the dependent
variable. In essence this means that the more coffee you drink the less you sleep.

Step 1: H0: β = 0
HA: β < 0

Step 2: α = .05 and the test situation is regression

Step 3: The long and winding road of the regression calculations.

Template 1

Sum of the 7 X scores is 0+0+4+6+16+20+24 = 70

Mean of the X variable (Ounces of Coffee) = 70/7 = 10.0

Sum of the 7 Y scores is 6.50+7.25+7.00+6.50+6.00+6.50+5.75 = 45.5

Mean of the Y variable (Hours of Sleep) = 45.5/7 = 6.5

X     Y      (X-Xbar)       (Y-Ybar)          (X-Xbar)²          (X-Xbar)(Y-Ybar)

0    6.50   (0-10) = -10   (6.5-6.5) = 0      (-10)(-10) = 100   (-10)(0) = 0
0    7.25   (0-10) = -10   (7.25-6.5) = .75   (-10)(-10) = 100   (-10)(.75) = -7.5
4    7.00   (4-10) = -6    (7.0-6.5) = .5     (-6)(-6) = 36      (-6)(.5) = -3.0
6    6.50   (6-10) = -4    (6.5-6.5) = 0      (-4)(-4) = 16      (-4)(0) = 0
16   6.00   (16-10) = 6    (6.0-6.5) = -.5    (6)(6) = 36        (6)(-.5) = -3.0
20   6.50   (20-10) = 10   (6.5-6.5) = 0      (10)(10) = 100     (10)(0) = 0
24   5.75   (24-10) = 14   (5.75-6.5) = -.75  (14)(14) = 196     (14)(-.75) = -10.5

Sum  70  45.5       0              0               584                -24.0

Slope = {Sum [(X - Xbar)(Y - Ybar)]} / {Sum (X - Xbar)²} = -24.00 / 584 = -.041

Constant = ybar - (slope)(xbar) = 6.5 – (-.041)(10) = 6.5 + .41 = 6.91

So the regression equation becomes

Predicted Amount of Hours of Sleep = 6.91 + (-.041) (Number of ounces of coffee consumed)

Template 2

x     y      yhat                         (y – yhat)           (y – yhat)²

0    6.50   6.91 – (.041)(0) = 6.91      (6.50-6.91) = -.41   (-.41)(-.41) = .168
0    7.25   6.91 – (.041)(0) = 6.91      (7.25-6.91) = .34    (.34)(.34) = .116
4    7.00   6.91 – (.041)(4) = 6.75      (7.00-6.75) = .25    (.25)(.25) = .063
6    6.50   6.91 – (.041)(6) = 6.66      (6.50-6.66) = -.16   (-.16)(-.16) = .026
16   6.00   6.91 – (.041)(16) = 6.25     (6.00-6.25) = -.25   (-.25)(-.25) = .063
20   6.50   6.91 – (.041)(20) = 6.09     (6.50-6.09) = .41    (.41)(.41) = .168
24   5.75   6.91 – (.041)(24) = 5.93     (5.75-5.93) = -.18   (-.18)(-.18) = .032

Sum                                                   0                    .636

Now MSE = [sum (y – yhat)²] / (n – 2) = .636 / (7 – 2) = .636 / 5 = .127

And the variance of the slope = MSE / [sum (x – xbar)²] = .127 / 584 = .0002

The standard deviation of the slope = square root of (.0002) = .015

And finally the test statistic is

t = slope / (standard deviation of the slope)

t5 = -.041 / .015 = -2.733
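As a check, here is the quiz regression in a Python sketch (not part of the course). It uses the text's rounded slope (-.041) and constant (6.91); because the text also rounds the standard error to .015, its t of -2.733 differs slightly from the roughly -2.77 computed here, and both fall in the rejection region found in Step 4.

```python
from math import sqrt

coffee = [0, 0, 4, 6, 16, 20, 24]                   # ounces, X
sleep = [6.50, 7.25, 7.00, 6.50, 6.00, 6.50, 5.75]  # hours, Y
n = len(coffee)

xbar = sum(coffee) / n                              # 10.0
ybar = sum(sleep) / n                               # 6.5
sxx = sum((x - xbar) ** 2 for x in coffee)          # 584
sxy = sum((x - xbar) * (y - ybar) for x, y in zip(coffee, sleep))  # -24.0

b = round(sxy / sxx, 3)                             # -.041, as in the text
a = round(ybar - b * xbar, 2)                       # 6.91

sse = sum((y - (a + b * x)) ** 2 for x, y in zip(coffee, sleep))   # about .639
mse = sse / (n - 2)                                 # about .128
se_slope = sqrt(mse / sxx)                          # about .0148
t = b / se_slope                                    # about -2.77 (text: -2.733)
```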

Step 4: The critical value for the “t” distribution is obtained from the “t” table in Unit 21 for ν = 5
and α = .05 (one-tailed alternative). The critical value is -2.015 (the table provides the critical
value for the right hand tail and we want the left hand tail).

Since the test statistic (-2.733) is less than the critical value (-2.015), we reject H0. The number
of ounces of coffee a person drinks before going to bed is negatively predictive of the hours of
sleep that person gets.

Step 5: What does this mean? The evidence from our study indicates that the more coffee you
drink before going to bed the fewer hours of sleep you will get. Specifically for each additional
ounce of coffee you drink you can expect to get .041 hours (2.46 minutes) less sleep. Of course
there are many other possible factors that can affect how much a person sleeps, but they are
irrelevant in terms of this example although these other influences would be extremely important
in a real research study.

Causality versus Dependency
At this point of the discussion, it is easy to start thinking about dependency as meaning causality.
This is a very common misconception that many professionals share along with beginners.

What is causality? Causality is the situation where what happens in one variable is caused by
another variable. When a doctor wants to examine your reflexes, one of the tests the doctor might
conduct is the knee jerk reflex. To do this, the doctor uses a small hammer to strike right below
your kneecap and in so doing your lower leg kicks forward. Does the strike of the hammer cause
the movement of your lower leg?

In this situation, the movement of your lower leg does depend upon the strike of the hammer, but
it is not caused by the strike. To see this, let's look at this example more closely. What is actually
going on? The strike of the hammer is stimulating a nerve in your leg. In response to this
stimulation, the nerve sends the appropriate message to the brain, the brain responds by sending
a message back to the lower leg, and the muscles in the lower leg move. What caused the lower
leg to move? The stimulation of the nerve. Can this nerve be stimulated by other things than the
hammer? Yes. Would the lower leg move in response? Yes. Hence, the lower leg can move in
response to other things than the hammer, therefore, it is not the strike of the hammer which
causes the lower leg to move, but the stimulation of the nerve. Thus, the stimulation of the nerve
causes the lower leg to move, and in this experiment, the movement of the lower leg is
dependent upon the strike of the hammer.

It might seem at this point that we are being overly picky, but the causality versus dependency
issue is very important, and as you probably are willing to admit, at least somewhat confusing.

Before a big test most students study.

Once again in this very simple statement, we have presented or at least implied two variables of
interest. They are performance on a test and studying. The regression question looked like,

Does the amount of studying done before a test influence the performance on the test?

From these two variables there is at the very least an implied dependency due to time. Since the
studying can only occur prior to taking the test, then it is only possible for the score on the test to
depend upon the amount of studying. The reverse, the amount of studying depending upon the
test, is not logical or feasible. If this is true, can these two variables be used in a form 2 question
or must they by their very nature (one occurring before the other) create a form 3 question? The
answer to this question depends upon a variety of assumptions that we are consciously and
unconsciously willing to make.

Assumption 1: The studying being done is effective. We would call this validity (Unit 2). Are
we studying the material we should? If we are studying the wrong material, should our studying
have any impact on test performance?

Assumption 2: The amount of studying being done does not relate to any confounding
characteristic of the student. What does this mean? A confounding characteristic is something
that may influence the dependent variable that has nothing to do with our independent variable.
In this context we are looking for something inherent in the students that might influence test
performance, such as IQ or the basic ability of the student. For instance, what would happen if
only the good students studied? If we found a pattern in the relationship between studying and
performance, would it be the result of the amount of studying or the quality of the student or
some combination of the two?

Assumption 3: Test performance does not depend upon anything other than studying. Here the
confounding variable is external and not a characteristic of the student, such as luck.

Based upon our beliefs regarding these assumptions, our two variables might be only related,
might express a dependency, or might even create causality.

If we can not believe in assumption 2, then there is no reason to expect a dependency to occur.
We could very well plot the amount of studying and test performance in a scatterplot and find a
positive relationship, but would it reflect increased performance dependent upon increased
studying? No. Only that those who studied more received higher grades. If we are unwilling or
unable to make assumptions like number 2, then our problems typically degenerate into form 2
questions.

To get to a form 3 question, we are going to have to make assumption 2 and assumption 1. If we
make assumption 2 and not assumption 1, what might you expect to happen? If we aren't
studying the correct material, then what impact will this have on the test? Nothing. Therefore, we
would expect no dependency to occur. In fact, if everyone is studying the wrong material (even
the better students), then we would not even expect a relationship to occur. It is assumption 1,
which creates the dependency we are looking for.

However, if we make assumption 1, have we created a causal situation? Causality means that we
have found THE determinant of the dependent variable. We have found what makes it happen. In
this problem, what would you expect to be the predictors of test performance? More than any
other variable, probably the knowledge of the student. Others would be the experience of the
student, the general intelligence of the student, the maturity of the student, the ability to deal
with stress in the test situation, and studying. Is studying the ONLY possible predictor of
test performance? No. Thus, it can not be the cause of the performance on the test. The test
performance does depend to some degree on the amount of time spent studying, but it also
depends upon a number of other things. Thus, to get to the causality situation, we need to make
an assumption like number 3.

So what is the bottom line of this discussion?

If we have two ratio variables, then

1. We have one method of graphically displaying their results. The scatterplot.

2. We have two different research questions. Form 2 and Form 3.

3. Form 3 questions can be conceptualized as dependency or causality, depending upon the
assumptions we are willing to make.

4. The numerical summarization appropriate for Form 2 questions is correlation. (Units 15 and
16)

5. The numerical summarization appropriate for Form 3 questions, for either conceptualization,
is regression. (This unit)

6. Since causal conceptualizations are so difficult to justify, we will not deal with them any
further in this course.

There is a very good WEB site that addresses regression and guessing regression lines on an
intuitive level.

Unit 18: Two Independent Samples T-Test

Terms
Two Independent Samples T-Test Situation - a problem in which we have two independent
samples, one variable that is measured at a ratio or “ratio” level of measurement, and a question
that asks us to compare one of the samples to the other

Independent Samples - samples are said to be independent when there is no relationship
between the members of one sample and the members of the other sample (typically this simply
means the members of sample 1 are completely different from the members of sample 2)

Distribution of the Test Statistic - "t distribution" with ν degrees of freedom (tν)

ν = degrees of freedom = (sample size of sample 1 - 1) + (sample size of sample 2 - 1)

= (n1 – 1) + (n2 – 1) = n1 + n2 – 2

Equations
Test Statistic

t = (Xbar1 – Xbar2) / [Sp × square root of (1/n1 + 1/n2)]

Pooled Variance

Sp² = [(n1 – 1)s1² + (n2 – 1)s2²] / (n1 + n2 – 2)

Pooled Standard Deviation - the square root of the Pooled Variance: Sp = square root of (Sp²)

In the previous three units we have considered statistics that examine the relationship between
variables. In this unit and the next two we will be considering problems that examine the
differences between samples.

Problem Scenario - Over the past several years there has been discussion and
concern about the health benefits of getting the flu shot prior to the flu season. In order to
examine this problem, a sample of 5 people who received the flu shot in 2003 was compared
to a second sample of 4 people who received a placebo injection. None of the 9 subjects knew if
they had received the flu serum or the placebo (single blind experiment) and all 9 of them
contracted the flu during the 2003-2004 flu season. [Of course in a real study we would have
collected information on many more people, but these greatly reduced sample sizes are intended
for illustration and computational ease.] The question of interest is, “does the flu shot reduce the
severity of flu symptoms for those who get the flu?” Severity of flu symptoms was self reported
by the subject on the following 11-point scale (0=no symptoms to 10=extreme symptoms). Here
are the results of our study.

Severity scores

Flu Serum      3   4    5    5   4

Placebo        5   6    8   7

Before we get to the method of analysis (test situation) and the hypotheses, let's look at our 3
preliminary questions.

1. How many variables are there? One; severity of flu symptoms.

2. What is the level of measurement of this severity variable? It is an 11-point ordinal scale, but
we can treat it as “ratio” in the sense that it has 5 or more possible outcome values.

3. How many samples are there? Two; one sample of 5 subjects who received the flu serum and
another sample of 4 subjects who received the placebo.

Given the problem scenario above, is the indicated question specific or general? Before
answering this question I will take a short digression to address how many different alternative
hypothesis questions can be asked in this problem situation.

Digression – What does independence mean?
In the Flu versus Placebo problem scenario we could have asked 3 different questions. First, does
the flu shot (serum) increase the severity of the flu symptoms when a person contracts the flu?
Second, does the flu shot (serum) decrease the severity of the flu symptoms when a person
contracts the flu? Both of these questions would be specific. Third, does the flu shot (serum)
increase or decrease the severity of the flu symptoms when a person contracts the flu? This last
question would be general; we are looking for any effect. In the problem scenario specified at the
beginning of this unit, the question is specific. Like the second option above, it asks "does the flu
shot reduce the severity of flu symptoms for those who get the flu?"

Although it is not immediately apparent in this scenario, it is important to note two additional
points that are implied.

The first is that a comparison is being implied. How would we know if the flu shot reduces the
severity of the flu symptoms? We answer this question by comparing the severity of symptoms
of the members of the sample getting the flu shot TO the severity of symptoms of the members
of the sample getting the placebo. The word "TO" indicates that a comparison is being made
between the conditions mentioned immediately before and immediately after the word TO.
Recall from Unit 5 that we use the mean to be the most representative value for an entire sample.
Thus, we can simplify the comparison statement above to the following: "we will compare the
mean of the flu shot sample to the mean of the placebo sample." This first implied point tells us
something very important about comparisons; that the key statistic in nearly all comparison
questions is the mean. I would also note that the statistic of maximum interpretational value in
any comparison problem is typically the mean. Lastly, if you see a comparison question, then
nearly 100% of the time, and universally true in this course, the comparison is between samples.
Thus, in comparison situations you should expect to see 2 or more samples presented in the
problem scenario. The only comparison situation that we have had to this point in the course was
Homogeneity presented in Unit 14.

The second implied point is that the two samples are independent of one another. Quite simply
the two samples can only be independent of one another or dependent upon one another. This
second condition (dependent) is called matched or paired samples and is presented in the next
Unit, Unit 19. What does independent mean? It basically means that the people in the first
sample must be completely different than the people in the second sample. In the context of our
flu problem scenario this means that everyone in the study received EITHER the flu shot OR the
placebo, and no one received both the flu shot and the placebo. Thus, if 10 people received the
flu shot and 8 people received the placebo, then we have a total of 18 people in our study; since
the 8 people receiving the placebo are in addition (different people) to the 10 receiving the flu
shot.

End of the Digression

When we have two independent samples, an outcome variable (severity) that is measured at
a ratio or “ratio” level of measurement, and a question that asks us to compare one of the
samples to the other, then the test situation is the two independent samples t-test.

The test statistic for the two independent samples t-test is

t = (Xbar1 – Xbar2) / [Sp × square root of (1/n1 + 1/n2)]

This equation is mostly a combination of terms that we have already seen with the exception of
Sp.

Sp stands for the pooled standard deviation and it is the square root of the pooled variance. The
equation for the pooled variance is

(n1 1) s12  (n2 1) s 2
2
S 
2

n1  n2  2
p

In this equation we are performing a calculation that is called the weighted average of the
variances of the samples. In the average (mean) we add all of the values together and divide by
the total number of things that we have added together. In the weighted average we add some
amount (weight) of the first thing to some amount (weight) of the second thing, etc. and then
divide by the total amount of the weight. In the pooled variance equation above, we are adding
only two things together, the variance of the first sample and the variance of the second sample.
The weight associated with the first variance turns out to be its degrees of freedom (n1 – 1),
which is the amount that the sum of squared deviations is divided by in the equation of the
sample variance (deviation unit). The weight associated with the second variance turns out to be
its degrees of freedom (n2 – 1). Once these two weighted amounts are added together we then
divide by the total amount of the weight, which is (n1 – 1) + (n2 – 1) = n1 + n2 – 2 = ν (for this
two independent samples t-test). While the equation for the test statistic in the two independent
samples t-test situation might at first seem complicated, it really is just a combination of means,
variances, and sample sizes. Lastly, the distribution for the test statistic is “t;” which is the same
distribution that we have been using for the last few units and once again can be found in Unit
21.
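The weighted-average idea in the pooled variance can be written in one line of Python. This is a hypothetical helper (not part of the course); the usage values come from the flu example below (variances .70 on 5 subjects and 5.00/3 on 4 subjects).

```python
def pooled_variance(var1, n1, var2, n2):
    """Weighted average of two sample variances; weights are the degrees of freedom."""
    return ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)

# Flu example: (4)(.70) + (3)(5.00/3), divided by 7, gives about 1.114.
sp2 = pooled_variance(0.70, 5, 5.00 / 3, 4)
```

Notice that when the two sample variances are equal, the weighted average simply returns that common variance, whatever the sample sizes.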

We are now ready to put these pieces together and test the above problem using our 5-step
guideline.

Independent Samples Statistical Test
There are three possible sets of hypotheses for the two independent samples t-test. Recall that we
write hypotheses in terms of the population parameters (Greek letters) rather than in terms of the
sample statistics (Roman letters).

Possibility 1: If we let sample 1 be the flu sample and sample 2 be the placebo sample, then the
first specific question above (flu shot (serum) increases flu symptoms) can be written as follows

H0: 1 = 2
HA: 1 > 2

Note that these two hypotheses can be changed slightly in appearance by subtracting the second
mean from the first mean on both sides of the equation.

H0: 1 - 2 = 2 - 2
HA: 1 - 2 >2 - 2

and this equals

H0: 1 - 2 = 
HA: 1 - 2 > 

Possibility 2: If we let sample 1 be the flu sample and sample 2 be the placebo sample, then the
second specific question above (flu shot (serum) decreases flu symptoms) can be written as
follows

H0: 1 = 2
HA: 1 < 2

and these can similarly be re-written as

H0: 1 - 2 = 
HA: 1 - 2 < 

Possibility 3: If we let sample 1 be the flu sample and sample 2 be the placebo sample, then the
third question above (flu shot (serum) increases or decreases flu symptoms) can be written as
follows

H0: 1 = 2
HA: 1  2

and these can be re-written as

H0: 1 - 2 = 
HA: 1 - 2  

For our problem we have the second situation, thus our hypotheses for this problem are

Step 1: H0: flu = placebo
HA: flu < placebo

Or

H0: flu - placebo = 0
HA: flu - placebo < 0

Step 2: α = .05 and the test situation is the two independent samples t-test

Step 3: The key to simplifying the calculation of the test statistic is to break it into its simpler
pieces, which are the calculations of the means and variances. Although I didn't use a template
back in the previous units where the mean and variance were introduced, let's use one here.

Template         – Calculation of the test statistic

1st - calculate the mean of both samples
2nd - calculate the variance of both samples
3rd - use the variance of both samples to calculate the pooled variance (Sp²)
4th - take the square root of the pooled variance to obtain the pooled standard deviation
5th - calculate the square root of (1/n1 + 1/n2) --- this is probably the hardest of all the
calculations
6th - use the means of both samples to calculate the mean difference
7th - calculate the test statistic by dividing the 6th by (4th times 5th)

1st - Mean of the flu sample = (3+4+5+5+4)/5 = 21 / 5 = 4.2               here n1 = 5

1st - Mean of the placebo sample = (5+6+8+7)/4 = 26 / 4 = 6.5             here n2 = 4

Flu Sample                                       Placebo Sample

value (value – mean) (value – mean)²             value (value – mean) (value – mean)²

3      (3–4.2) = -1.2   (-1.2)(-1.2) = 1.44      5     (5-6.5 ) = -1.5   (-1.5)(-1.5)=2.25
4      (4-4.2) = -.2    (-.2)(-.2) = .04         6     (6-6.5) = -.5     (-.5)(-.5) = .25
5      (5-4.2) = .8     (.8)(.8) = .64           8     (8-6.5) = 1.5     (1.5)(1.5) = 2.25
5      (5-4.2) = .8     (.8)(.8) = .64           7     (7-6.5) = .5      (.5)(.5) = .25
4      (4-4.2) = -.2    (-.2)(-.2) = .04

Sum      21         0                2.80                 26        0                5.00

2nd - The variance for the flu sample is     (2.80) / (5-1) = 2.80 / 4 = .70

2nd - The variance for the placebo sample is      (5.00) / (4-1) = 5.00 / 3 = 1.67

3rd - The pooled variance, Sp², is

(5 – 1) (.70) + (4 – 1) (1.67)
Sp2 = ------------------------------------
5 + 4 - 2

2.80 + 5.00
= -----------------
7

7.80
= ---------- = 1.114
7

and

4th - The pooled standard deviation is

Sp = √1.114 = 1.055

and

5th - square root of (1/n1 + 1/n2) =

√(1/n1 + 1/n2) = √(1/5 + 1/4) = √(.2 + .25) = √.45 = .671

and

6th - mean difference is mean of the flu shot sample - mean of the placebo sample, which is

4.2 - 6.5 = - 2.3

and finally

7th - The test statistic is   6th / (4th times 5th)

t7 = -2.30 / [(1.055)(.671)] = -2.30 / .708 = -3.25

Step 4: As with the previous units using the “t” distribution, the arrow in the alternative
hypothesis points the direction for the critical value. In this case it is to the left. If we look up the
right hand tail critical value in the “t” table from Unit 21, we find that the identified critical value
for α = .05 (since the alternative hypothesis is one tailed) and ν = n1 + n2 - 2 = 5 + 4 - 2 = 7 is
1.895. Using the fact that the “t” is symmetric with its center at zero produces the left hand tail
critical value of -1.895.

Two conclusions are possible in our situation.

First, if the TS < CV, then we reject H0

Second, if the TS > CV, then we fail to reject H0

From the information we have for this problem scenario, we have

TS (-3.25) < CV (-1.895), thus we reject H0

Step 5: What does it mean to reject H0? There is enough evidence from our study to indicate that
those people who contracted the flu and had the flu shot (mean = 4.2) had less severe symptoms
than those who received the placebo (mean = 6.5).
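The template arithmetic above can be checked with a short script. This sketch is not part of the original unit; it uses only Python's standard library, where statistics.variance already divides by n - 1, matching the sample variance used here.

```python
from math import sqrt
from statistics import mean, variance  # variance() already divides by n - 1

def pooled_t(sample1, sample2):
    """Two independent samples t statistic with a pooled variance (template steps 1-7)."""
    n1, n2 = len(sample1), len(sample2)
    sp2 = ((n1 - 1) * variance(sample1) + (n2 - 1) * variance(sample2)) / (n1 + n2 - 2)
    se = sqrt(sp2) * sqrt(1 / n1 + 1 / n2)       # 4th times 5th
    return (mean(sample1) - mean(sample2)) / se  # 6th divided by the product

flu = [3, 4, 5, 5, 4]
placebo = [5, 6, 8, 7]
print(round(pooled_t(flu, placebo), 2))  # -3.25, on 5 + 4 - 2 = 7 degrees of freedom
```

Running this reproduces the hand calculation: t rounds to -3.25, which falls below the left tail critical value of -1.895.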

Another Example
Problem Scenario: Does it take longer to graduate with a bachelor's degree in engineering or
business? In order to answer this question, we conducted a study in which we took a sample of 4
engineering majors and a sample of 6 business majors and recorded, for each of these 10
students, how long it took them to graduate in years. For ease of calculation
each semester was scored as 1/2 year. Thus a person who graduated in 11 semesters would be
scored as 5.5 (5 1/2 years). Here are the data:

Students                              Mean       Variance

Engineering Majors     6.5       6.0   4.5   7.0                   6.00       1.167

Business majors            5.5   5.0   4.5   4.5   5.0   4.0       4.75        .275

Our 3 preliminary questions.

1. How many variables are there? One, years to complete the bachelor's degree.

2. What is the level of measurement of this variable? Ratio.

3. How many samples are there? Two are clearly specified; one sample of engineering majors
and one sample of business majors.

The test statistic is the two independent samples t-test.

What is the alternative hypothesis, specific (one tailed) or general (two tailed)? What does the
question underlined in the problem scenario above indicate? Does the question ask if engineering
majors take longer than business majors (or the reverse)? No. Thus, the question must be general
in nature (two tailed).

Here is a very important note. With the data provided in the problem scenario we clearly
see that the mean for the engineering majors is larger (longer time to graduation) than for
the business majors. If this is what we see in the data, then can we legitimately claim that
the alternative hypothesis must be specific (one tailed), such that the alternative is, "mean
for engineering majors is greater than for business majors." The answer is no. Why?
Hypotheses are theoretical in nature and come from the question presented in the problem
scenario. In contrast, conclusions are applied and come from the data. Thus, we can
NEVER use data (means) in the determination of hypotheses. And we must ALWAYS use
data (means) in the determination of the conclusion.

Therefore, in this problem scenario, even though we can see that the mean for engineering
majors is larger than the mean for the business majors, we must use a two tailed alternative based
on the question (underlined) in the problem scenario.

Now for the test itself.

Step 1: H0: μengineering = μbusiness
HA: μengineering ≠ μbusiness

Step 2: α = .05 and the test situation is the two independent samples t-test

Step 3: Determination of the test statistic, use the template

Let Sample 1 = Engineering Majors (n1 = 4)

Sample 2 = Business Majors (n2 = 6)

ν = n1 + n2 - 2 = 4 + 6 - 2 = 8

The 1st and 2nd calculations are given in the problem scenario.

3rd - Pooled Variance = [(n1-1)(s12) + (n2-1)(s22)]/(n1 + n2 - 2)

= [(3)(1.167) + (5)(.275)]/(4 + 6 - 2)

= [3.50 + 1.375]/(8) = 4.875 / 8 = .609

4th - Pooled Standard Deviation = square root of the pooled variance = √.609 = .781

5th - square root of (1/n1 + 1/n2) = √(1/4 + 1/6) = √(.25 + .167) = √.417 = .645

6th - Mean Difference = mean of engineers - mean of business

= 6.00 - 4.75 = 1.25

Test Statistic = t8 = 6th / (4th times 5th)

= 1.25 / [(.781)(.645)]

= 1.25 / [.504] = 2.48

Step 4: For ν = 8 and α/2 = .025 (since HA is two tailed), the right tail CV is 2.306 (from the t-
table in Unit 21). Since the t-distribution is symmetric, the left tail CV is -2.306.

For the two tailed alternative in general there are 3 possible conclusions,

First, TS > right tail CV, reject H0 and conclude mean of the engineering majors > mean of the
business majors

Second, TS < left tail CV, reject H0 and conclude mean of the engineering majors < mean of the
business majors

Third, the TS is between the left and right tail CV, which is -CV < TS < CV, we fail to reject H0

In our situation we have

TS (2.48) > 2.306, therefore we reject H0

Step 5: Since we reject H0 we know that one of the sample means must be larger than the
other. Which one is larger than the other? Here is where you can now look at the data (the means
presented in the problem scenario). In this situation, we have the mean for engineering majors
being 6.00 years and the mean for business majors being 4.75 years. Since we know that one
must be larger than the other, then the only possible conclusion is that the engineering majors
(mean = 6.00 years) take longer to graduate with a bachelor's degree than the business
majors (mean = 4.75 years). It is even possible to take this a little further. From our results
I specifically know that engineering majors take on average 1.25 years longer to obtain a
bachelor's degree than do business majors.
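When a problem scenario already supplies the sample sizes, means, and variances, the 1st and 2nd template steps are done and only arithmetic remains. Here is an illustrative sketch (not from the course materials) using the summary statistics given above:

```python
from math import sqrt

# Summary statistics exactly as given in the problem scenario (size, mean, sample variance)
n1, m1, v1 = 4, 6.00, 1.167   # engineering majors
n2, m2, v2 = 6, 4.75, 0.275   # business majors

sp2 = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)   # pooled variance
t = (m1 - m2) / (sqrt(sp2) * sqrt(1 / n1 + 1 / n2))     # t8
print(round(t, 2))  # 2.48, to be compared with the two tailed CVs of +/- 2.306
```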

Quiz
A perception in our town is that the speed of traffic is increasing. In order to test this perception I
conducted a small study that compared the speed of traffic in May of 2003 to the speed of traffic
in May of 2004. In an attempt to make the comparison as legitimate as possible the speed of
traffic was measured only on our town's main street (Grand Avenue). Data were collected on 6
randomly selected days in May of 2003 and on another 6 randomly selected days in May of
2004. Unfortunately the machine used to collect the speed data was determined to be
malfunctioning on one of the selected days in May of 2003 and the data from this day were
dropped from the study. As a result our final data set was comprised of 5 days in May 2003 and
6 days in May 2004. Here are the results (the outcome variable, speed, is average speed for the
selected day).

Average Daily Speed
May 2003         34      33      34      36           33
May 2004         34      35      36      36           37       38

Appropriately test the indicated perception.

Three preliminary questions.

1. How many variables are there? One; average daily speed of the traffic.

2. What is the level of measurement of average daily speed? Measured in miles per hour, this
variable is ratio.

3. How many samples are there? Two; the first taken from May of 2003 and the second taken
from May of 2004.

Based on the answers to these three questions, it is clear that the test situation is the two
independent samples t-test.

Is the question of interest specific or general? It is specific. The perception indicates specifically
the speculation that the speed of traffic has increased. Mean speed in May of 2004 is GREATER
THAN the mean speed in May of 2003.

Step 1: H0: μ2004 = μ2003
HA: μ2004 > μ2003

Step 2: α = .05 and the test situation is the two independent samples t-test

Step 3: Here is our template.

1st - Calculation of the means

Mean of the 2003 sample = (34+33+34+36+33)/5 = 170 / 5 = 34.0                     here n1 = 5

Mean of the 2004 sample = (34+35+36+36+37+38)/6 = 216 / 6 = 36.0                  here n2 = 6

2nd - Calculation of the variances

2003 Sample                                        2004 Sample
value (value – mean) (value – mean)2               value (value – mean) (value – mean)2

34    (34-34) = 0       (0)(0) = 0                34      (34-36) = -2     (-2)(-2) = 4
33    (33-34) = -1      (-1)(-1) = 1              35      (35-36) = -1     (-1)(-1) = 1
34    (34-34) = 0       (0)(0) = 0                36      (36-36) = 0      (0)(0) = 0
36    (36-34) = 2       (2)(2) = 4                36      (36-36) = 0      (0)(0) = 0
33    (33-34) = -1      (-1)(-1) = 1              37      (37-36) = 1      (1)(1) = 1
                                                  38      (38-36) = 2       (2)(2) = 4

Sum     170         0                6                    216         0                10

The variance for the 2003 sample is       (6) / (5-1) = 6 / 4 = 1.5

The variance for the 2004 sample is       (10) / (6-1) = 10 / 5 = 2.0

3rd - Calculation of the pooled variance, Sp2, is

Sp2 = [(5-1)(1.50) + (6-1)(2.00)] / (5 + 6 - 2) = [6.00 + 10.00] / 9 = 16.00 / 9 = 1.78

4th - Calculation of the pooled standard deviation is Sp = √1.78 = 1.33

5th - square root of (1/n1 + 1/n2) = √(1/5 + 1/6) = √(.200 + .167) = √.367 = .606

6th - mean difference = mean of 2004 - mean of 2003 = 36.0 - 34.0 = 2.0

7th - test statistic

t9 = 2.00 / [(1.33)(.606)] = 2.00 / .806 = 2.48

Step 4: The arrow in the alternative hypothesis points in the direction of the right tail critical
value (this is directly the one in the “t” table in Unit 21). The critical value for α = .05 and ν = 9
is 1.833.

Test statistic (2.48) is greater than the critical value (1.833), thus we reject H0

Step 5: What does this mean? There is enough evidence from our study to indicate that speed of
the traffic on Grand Avenue in May of 2004 (mean = 36.0) is faster than it was in May of 2003
(mean = 34.0).
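The quiz arithmetic can be verified the same way. This is an illustrative sketch (not part of the original quiz solution), again using only the standard library:

```python
from math import sqrt
from statistics import mean, variance

may_2003 = [34, 33, 34, 36, 33]        # n1 = 5 (one day was dropped)
may_2004 = [34, 35, 36, 36, 37, 38]    # n2 = 6
n1, n2 = len(may_2003), len(may_2004)

sp2 = ((n1 - 1) * variance(may_2003) + (n2 - 1) * variance(may_2004)) / (n1 + n2 - 2)
t = (mean(may_2004) - mean(may_2003)) / (sqrt(sp2) * sqrt(1 / n1 + 1 / n2))
print(round(t, 2))  # 2.48, compared against the right tail CV of 1.833
```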

Unit 19: Matched Samples T-Test

Terms
Matched Samples T-Test Situation - a problem in which we have two matched samples,
one variable that is measured at a ratio or “ratio” level of measurement, and a question that asks
us to compare one of the samples to the other

Matched Samples - samples are said to be matched when there is a relationship between the
members of one sample and the members of the other sample (typically this simply means that
the members of sample 1 are the same as the members of sample 2)

Distribution of the Test Statistic - "t distribution" with ν degrees of freedom (tν)

ν = degrees of freedom = sample size - 1 = n - 1

Equations
Test Statistic

t = dbar / [sd / √n]

Mean difference = dbar = the mean of one of the matched samples - the mean of the
other matched sample = mean of the difference scores

d = difference score = score on the variable in the first matched sample - score on the variable
in the second matched sample for the same person

Variance of the differences = sd2 = the variance of the difference scores

Standard Deviation of the difference scores = sd = √(sd2)

What are matched samples?
The Matched Samples T-test is different from the independent samples T-test only in the nature
of the samples under consideration. What are matched samples and how do they differ from
independent samples? Samples that are independent of one another have no possible way of
logically or reasonably associating one member of one of the samples to any member in the other
sample. In matched samples there is some logical or reasonable means of making such an
association. There are two ways in which this can typically be done.

Matched Samples (Traditional Definition)

In this situation there are two groups of subjects who have been measured on a single dependent
(outcome) variable. In this situation, each member of the first sample is matched with a single
member of the second sample. For instance, we might wish to do a study on twins. One of the twins
could go in the experimental group and the other twin could go in the control group. At first glance
this looks like the two independent samples t-test situation (different subjects in both groups); however, it
would be impossible to consider the subjects in the second sample as being independent of those
in the first. This is because each subject in the first sample can be matched (PAIRED) to a
specific subject (her or his twin) in the second sample. This is called a Matched Pairs, Paired
Samples, or Matched Samples design. [This is actually a better analysis procedure in small
sample size situations than the two independent samples t-test.]

Matched Samples (New Definition)

In this situation there is one group of subjects who have been measured on the dependent
(outcome) variable twice. For instance, I might wish to know if a particular class is teaching the
students anything. In order to assess this I might give the students the final on day 1 (usually
called a pre-test) and then again on the last day (post-test). If the students actually did learn
something from the class, then we might expect that they would perform at a higher level on the
post-test than they did on the pre-test. In fact, the score = post-test minus pre-test = the amount
learned in the class. In this situation there isn't an independent variable (there is only one group
of subjects) and there are two dependent variables (pre-test and post-test). In this form we often
are testing for the effect of some intervention. Between the pre-test and the post-test we apply an
experimental intervention to examine its effectiveness. In the example above the intervention is
the course, which occurs between the pre-test and the post-test (final). In this new definition of
the matched samples design the subjects are perfectly matched; each subject in the pre-sample is
matched to herself or himself in the post-sample.

Problem Scenario
This particular problem isn't terribly interesting, but it is a good problem for illustration. Many
students believe that they can improve their grade at the time of the final. Is this really possible?
For a student to improve her or his grade at the final, they would most likely have to do better
than they did on the mid-term examination. In order to examine the student belief, the following
study was conducted. In my advanced seminar class of 5 students (once again this sample size is
unrealistically small for a real study, but is presented at this size for illustration only and ease of
computation) I recorded the students' grades on the mid-term and then I recorded these same
students' grades on the final.

Here are their results.

Student Number

1       2        3        4         5

Mid-Term Grade            87      88       86       91        93

Grade on Final            87     90        90       89        94

Let's begin by asking our three preliminary questions.

1. How many variables are there? It looks like we have two; grade on the mid-term and grade on
the final.

2. What level of measurement are our variables? Grade on the tests is measured in percent, hence
the level of measurement is ratio.

3. How many samples are there? One; the same 5 students are measured on the mid-term and
then again on the final. This is one of the tricky places in this unit. It can look like our total
sample size is 10, but since the same subjects (students) are measured at the final as were on the
mid-term, then the real sample size is only 5.

The answers to questions 1 and 3 indicate that the matched samples design is appropriate and is
the new definition.

Is the question specific or general? The belief stated is clearly specific. Students “believe that
they can improve their grade at the time of the final.”

As was true in the two independent samples t-test situation, it is once again important to note that
the problem situation above is asking us to compare what appear to be two
samples, the first sample (students at the mid-term) to the second sample (students at the final),
and that these two samples are matched to one another (traditional definition).

When we have two matched samples (using either the traditional or new definition), an
outcome variable that is measured at a ratio or “ratio” level of measurement, and a
question that asks us to compare one of the samples to the other, then the test situation is
the matched samples t-test.

The test statistic for the matched samples t-test is

t = dbar / [sd / √n]

Although this equation uses different notation than we are familiar with, in reality everything in
the equation is old and simple material. The secret to the matched samples situation is to create a
third sample from the first two samples. This third sample is the sample of the differences
between the scores of the first sample and the second sample. This difference can be calculated
in two ways. In reality the matched samples situation is a one sample situation where the one
sample is the sample of differences.

Method 1. Score from 1st sample – score from 2nd sample

Method 2. Score from 2nd sample – score from 1st sample

In our problem, which makes more sense? The belief is that students can improve on the final.
This means that they must score higher on the final than on the mid-term. Thus, if we use the
second way to calculate the difference

difference = score on the final – score on the mid-term

then according to the students' belief this difference should be greater than 0. We use the little
letter "d" to stand for the difference scores. Thus, dbar is the mean of the difference
scores and sd is the standard deviation of the difference scores.

From our two samples above (pre- and post-scores) here is the creation of the third sample (d;
the difference scores)

Student Number

1             2               3             4              5

Mid-Term Grade                  87            88              86            91             93

Grade on Final                  87             90             90             89            94

Difference (final – mid)      (87-87)=0     (90-88)= 2    (90-86)= 4     (89-91)= -2    (94-93)= 1

Once again, let's use a template.

Template
1. calculate the difference scores (d) --- Above

2. calculate the mean of the difference scores (dbar)

(0 + 2 + 4 + (-2) + 1) / 5 = 5 / 5 = 1.0

3. calculate the variance of the difference scores (sd2)

Below

4. calculate the standard deviation of the difference scores (sd)

Below

Score          Score         Difference (d)

On Final     On Mid-term        (Final-Mid)        (d - dbar)      (d – dbar)2

87            87            (87-87) = 0        (0 – 1) = -1    (-1)(-1) = 1

90            88            (90-88) = 2        (2 – 1) = 1     (1)(1) = 1

90            86            (90-86) = 4        (4 – 1) = 3     (3)(3) = 9

89            91            (89-91) = -2       (-2 – 1) = -3   (-3)(-3) = 9

94            93            (94-93) = 1        (1 – 1) = 0     (0)(0) = 0

Sum                                    5.0                   0             20.0

Now the calculation for the variance of the difference scores is

sd2 = sum of the squared difference deviations (last column above) divided by (n - 1)

= 20.0 / (5 - 1) = 20.0 / 4 = 5.0

and

sd = the standard deviation of the difference scores = square root of sd2

= square root of (5.0) = 2.236

and finally the degrees of freedom ν for the matched samples t-test is n - 1 = 4 for our
problem. Once again, the degrees of freedom are n - 1 because in truth we only have a one sample
problem where the variable is the difference score.

We are now ready to put these pieces together and test the above problem using our 5-step
guideline.

Matched Samples Statistical Test
As with the two independent samples t-test there are once again three possible sets of hypotheses
for the matched samples t-test.

Possibility 1: If we calculate the differences as sample 2 (final) - sample 1 (mid-term), then μd is
the population mean associated with this difference calculation. The speculation that there is
improvement can be written as follows

H0: μd = 0
HA: μd > 0

Here μd > 0 indicates that we are expecting the mean on the final to be greater than the mean on
the mid-term or that final mean - mid-term mean must be > 0.

Possibility 2: If we calculate the differences as sample 2 (final) - sample 1 (mid-term), then μd is
the population mean associated with this difference calculation. The speculation that there is a
degradation of performance (scores get worse) can be written as follows

H0: μd = 0
HA: μd < 0

Here μd < 0 indicates that we are expecting the mean on the final to be less than the mean on the
mid-term or that final mean - mid-term mean must be < 0.

Possibility 3: If we calculate the differences as sample 2 (final) - sample 1 (mid-term), then μd is
the population mean associated with this difference calculation. The speculation that there is
either improvement or degradation of performance can be written as follows

H0: μd = 0
HA: μd ≠ 0

For our problem we have the first situation, thus our hypotheses for this problem are

Step 1: H0: μd = 0
HA: μd > 0

Step 2: α = .05 and the test situation is the matched samples t-test

Step 3: From the template above, we have

dbar = 1.0

and

sd = 2.236

thus
1.0
t = t4 = ------------------------------
2.236 / (square root of 5)

1.0
= ------------------
2.236 / 2.236

1.0
= ------- = 1.0
1.0

Note: in the calculation of the test statistic above we use “n” in the denominator (bottom
part) not “n–1.” [Square root of 5, not the square root of 4]

Step 4: As with the previous units using the “t” distribution, the arrow points the direction for the
critical value. In this case it is to the right. If we look up the right hand tail critical value in the
“t” table from Unit 21, we find that the identified critical value for α = .05 and ν = 4 is 2.132.
For our problem,

Test statistic (1.0) is not greater than the critical value (2.132), thus we fail to reject H0

Step 5: What does this mean? There is not enough evidence from our study to indicate that
students score higher on the final than they do on the mid-term. We have not been able to
support the students' belief.

In this problem and in general, the fail to reject H0 result is probably attributable to the small
sample size. The smaller the sample size the harder it is to reject the null hypothesis.

COMPARISON - Perspective Moment
The matched samples situation should look somewhat familiar to you. In fact, we have dealt with
essentially this same problem setting twice already in the course. In the illustration above we
have two variables (score on the mid-term and score on the final) and only one sample. Below
are three questions of interest using these two variables.

1. Is a student's score on the mid-term related to her/his score on the final?

2. Can we predict a student's score on the final from her/his score on the mid-term?

3. Is there a difference between a student's score on the mid-term and her/his score on the
final?

Now, can you identify the test situation for each?

1. The key word in the first question is “related.” This leads us to the appropriate test situation
being correlation.

2. The key word in the second question is “predict.” This leads us to the appropriate test
situation being regression.

3. The key word in the third question is “difference.” This leads us to the appropriate test
situation being the matched samples t-test.

Notice how similar these three questions are and yet they are to be tested in three radically
different ways. I have pointed out this situation as an example of how difficult the
application of statistics to real problems can be, but I will leave the tricky part of distinguishing
between them to real life and will not address them any further in this course.

Another Example
Problem Scenario: As a resident of Laramie Wyoming I am convinced that our summers are
getting warmer. In order to examine this belief, I collected temperature data for 2005 and 2006.
I went into the officially recorded daily high temperature data (measured in degrees Fahrenheit)
and selected the following four dates in 2005 and 2006: May 1, June 1, July 1, and August 1.

Three preliminary questions.

1. How many variables? One, daily high temperature

2. What is the level of measurement for this variable? This particular variable is measured at an
interval level. This is an extremely rare level of measurement and for us can easily be considered
ratio.

3. How many samples are there? Two, 2005 and 2006. Are they independent samples or matched
samples? If I had selected 4 dates in 2005 at random and had selected 4 dates in 2006 at random,
then the samples would have been independent. However, since I have used exactly the same 4
dates in 2005 and in 2006, these two samples are matched (according to the traditional
definition).

The appropriate test situation is the matched samples t-test.

Is the alternative hypothesis specific (one tailed) or general (two tailed)? My belief that is being
tested specifically states a single alternative possibility, getting warmer. Thus, the alternative is
one tailed.

Alternative Hypothesis is that the mean temperature in 2006 is greater than the mean temperature
in 2005. It is important to notice that this alternative can be expressed in two different ways.

First,    mean temperature in 2006 > mean temperature in 2005

or

Second, mean temperature in 2005 < mean temperature in 2006

I much prefer the first method to the second. Why? The simple answer is that I like to keep
things as simple as possible. By selecting the first alternative I can stay in the right tail, think of
things in terms of positive numbers, and I can use the t-table in Unit 21 exactly as presented. For
me this is much simpler (not better) than the second method, which would place me in the left tail
looking for a negative result.

Let's conduct the test.

Here are the data.

              May 1        June 1       July 1      August 1

2006            67           78           91           96

2005            62           67           82           89

2006 - 2005  67-62 = 5    78-67 = 11   91-82 = 9    96-89 = 7

mean difference = dbar = (5 + 11 + 9 + 7) / 4 = 32 / 4 = 8.0

Template

d       (d - dbar)        (d - dbar)2

5       (5-8) = -3        (-3)(-3) = 9

11        (11-8) = 3        (3)(3) = 9

9       (9-8) = 1         (1)(1) = 1

7       (7-8) = -1        (-1)(-1) = 1

Sum               0                 20

Variance of the difference scores = sd2 = sum of (d - dbar)2 / (n-1) = 20 / (4-1) = 20/3 = 6.67

Standard deviation of the difference scores

= square root of sd2

= square root of (6.67) = 2.582 = sd

Step 1: H0: μ2006 = μ2005           which is also μ2006 - μ2005 = 0

HA: μ2006 > μ2005          which is also μ2006 - μ2005 > 0

Step 2: α = .05 and the test situation is the matched samples t-test

Step 3: Determination of the test statistic

degrees of freedom ν = n - 1 = 4 - 1 = 3

t = t3 = dbar / [sd /square root (n)]

= 8.0 / [2.582 / square root (4)]

= 8.0 / [2.582 / 2] = 8.0 / [1.291] = 6.197

Step 4: for = 3 and a = .05 (since HA is one tailed), then CV = 2.353

Since TS (6.197) > CV (2.353), we reject H0

Step 5: Based on the results of this test we know that the mean temperature in 2006 was higher
than the mean temperature in 2005. It does appear that the summers in Laramie Wyoming are
getting warmer. Specifically we know from these data that 2006 was 8.0 degrees Fahrenheit
warmer than 2005 on average (dbar = 8.0).
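As a check on this example, here is an illustrative sketch (not from the course materials); statistics.stdev divides by n - 1, matching the sd above:

```python
from math import sqrt
from statistics import mean, stdev

temps_2005 = [62, 67, 82, 89]   # May 1, June 1, July 1, August 1
temps_2006 = [67, 78, 91, 96]
d = [t06 - t05 for t05, t06 in zip(temps_2005, temps_2006)]  # 2006 minus 2005

t = mean(d) / (stdev(d) / sqrt(len(d)))  # t3
print(d)            # [5, 11, 9, 7]
print(round(t, 1))  # 6.2, slightly off 6.197 because nothing is rounded early
```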

Quiz
Is the price of food changing or is it stable? To answer this question I went to our local grocery
store and found the price for the following 7 grocery items: 1 dozen medium eggs, 1 pound of
regular hamburger, 1 pound of navel oranges, 1 gallon of 2% milk, 1 pound of raw spinach, ½
gallon of the store brand vanilla ice cream, and 1 loaf of store brand bread. I collected the prices
for the 7 items on the 3rd Friday of the month of June and then again on the 3rd Friday of the
month of September. Here are the results of this comparative study.

Item               Price in June         Price in September

Eggs               $1.39                 $1.29
Hamburger          $1.59                 $1.82
Oranges            $ .97                 $1.16
Milk               $2.49                 $2.63
Spinach            $1.42                 $1.48
Ice Cream          $2.97                 $3.29
Bread              $1.49                 $1.56

Appropriately test the indicated question.

Three preliminary questions.

1. How many variables? One; price of the grocery item.

2. What level of measurement is price? Measured in dollars and cents, price is ratio.

3. How many samples are there? Two; but they are matched. The first sample is taken in June
and the second sample is taken in September; however, these two samples measure exactly the
same grocery items (matched). Once again this is the Traditional definition of the matched
samples situation.

Is the question specific or general? Since the question doesn't ask if the prices are going up or
going down, then the question must be general. The question is only "is the price of food
changing?" This will be the two-tailed alternative.

Given the above, the appropriate test situation is the matched samples t-test.

Step 1: H0: μd = 0
HA: μd ≠ 0

Step 2: α = .05 and the test situation is the matched samples t-test

Step 3: The template

1. calculate the difference scores (d) – Here I will take the difference as the price in September
minus the price in June.

2. calculate the mean of the difference scores (dbar) --- see below for these numbers

dbar = (-.10 + .23 + .19 + .14 + .06 + .32 + .07) / 7 = .91 / 7 = .13

3. calculate the variance of the difference scores (sd2)

4. calculate the standard deviation of the difference scores (sd)

Difference
Item             June     September        (Sept – June)        (d – dbar)           (d – dbar)2

Eggs            $1.39      $1.29           (1.29-1.39) = -.10    (-.10-.13) = -.23   (-.23)(-.23) = .0529
Hamburger       $1.59      $1.82           (1.82-1.59) = .23     (.23-.13) = .10     (.10)(.10) = .0100
Oranges         $ .97      $1.16           (1.16-.97) = .19      (.19-.13) = .06     (.06)(.06) = .0036
Milk            $2.49      $2.63           (2.63-2.49) = .14     (.14-.13) = .01     (.01)(.01) = .0001
Spinach         $1.42      $1.48           (1.48-1.42) = .06     (.06-.13) = -.07    (-.07)(-.07) = .0049
Ice Cream       $2.97      $3.29           (3.29-2.97) = .32     (.32-.13) = .19     (.19)(.19) = .0361
Bread           $1.49      $1.56           (1.56-1.49) = .07     (.07-.13) = -.06    (-.06)(-.06) = .0036

Sum                                        .91                         0               .1112

Now the calculation for the variance of the difference scores is

sd2 = sum of the squared difference deviations (last column above) divided by (n - 1)

= .1112 / (7 - 1) = .1112 / 6 = .0185

and

sd = the standard deviation of the difference scores = square root of sd2

= square root of (.0185) = .1360

and finally the degrees of freedom ν for the matched samples t-test is n - 1 = 6 for our problem.

.13
t = t6 = ------------------------------
.1360 / (square root of 7)

.13
= ------------------
.1360 / 2.646

.13
= --------- = 2.529
.0514

Step 4: As with the previous units using the “t” distribution, the arrow points the direction for the
critical value. In this case (unequal) we have the two tailed alternative and need to establish a
critical value in each tail. If we look up the right hand tail critical value in the “t” table from Unit
21, we find that the identified critical value for α/2 = .025 (divide α by two because of the two-tailed
alternative) and ν = 6 is 2.447. Thus, the left hand tail critical value is -2.447. For our problem,

Test statistic (2.529) is greater than the critical value (2.447), thus we reject H0

Step 5: What does this mean? There is enough evidence from our study to indicate that prices of
our 7 grocery items are more expensive in September than they were in June. In fact, on average
we are paying 13 cents more per item (dbar = .13).

294
Unit 20: One-way Analysis of Variance

Terms
One-way Analysis of Variance Test Situation - a problem in which we have three or more
samples, one variable that is ratio or “ratio,” and a question of interest that asks for a comparison
of the samples to one another

MSA - Mean Squared Among

MSW - Mean Squared Within

k = number of samples

Distribution of the Test Statistic - "f distribution" with ν1 and ν2 degrees of freedom (fν1,ν2)

ν1 = first degrees of freedom = k - 1

ν2 = second degrees of freedom = n – k

295
Equations
Test Statistic

fν1,ν2 = MSA / MSW

The test situation known as One-way analysis of variance is exactly the same situation as the two
independent samples t-test with the only exception being that there can be more than two
samples; there can be three or more samples. In this regard everything we learned in Unit 18 is
applicable in this unit as well. Here is an example to consider.

Example
How is it best to treat a person who has heart problems? Often people with heart problems have
reduced blood flow through the heart due to obstructions in the vessels leading into the
heart. Currently there are two popular methods for treating people who are experiencing heart
problems; these are by-pass surgery and drug therapy. In order to see how effective these
treatments are we measured blood flow into the heart in percent of optimal flow. Thus a score of
0 means that there is no blood flow (the person is dead) and a score of 100 means that there is
optimal blood flow (no obstruction). If we took a sample of 10 by-pass surgery patients and a
sample of 15 drug therapy patients, then which test situation would be most appropriate? Given
the nature of the question, that there are two independent groups (in this problem you can't be in
both samples at the same time), and that the dependent (outcome) variable is ratio, then the test
situation is the two independent samples t-test. Suppose for illustration that the researchers at the
National Heart Institute have developed a new procedure for treating heart patients using
ultrasound. The question of interest now is, “how do the three treatment programs compare to
one another?”

Let's dissect this problem using our three preliminary questions.

296
1. How many variables are there? One; blood flow.

2. What level of measurement is blood flow? Blood flow is measured in percent and is thus a
ratio level of measurement.

3. How many samples are there? There will be three; one sample of surgery patients, one sample
of drug patients, and one sample of ultrasound patients.

When you have three or more samples, an outcome (dependent) variable that is ratio or
“ratio,” and a question of interest that asks for a comparison of the samples to one another,
then the test statistic is One-way Analysis of Variance.

Is the question specific or general? In problems involving three or more samples, this question is
irrelevant. For One-way Analysis of Variance the alternative is always that the means of the
populations are not equal. This will be explained in greater detail later in this unit.

The template to use in the calculation of this test statistic is

Template
1. Calculate the mean of each sample

2. Calculate the variance of each sample

3. Calculate the grand mean

4. Calculate MSA using

Sample       ni     xbari       (xbari – xbar)     (xbari – xbar)2           (ni)(xbari – xbar)2

5. Calculate MSW using

Sample       ni     ni – 1        si2           (ni – 1)(si2)

6. Calculate the test statistic

f1,2 = MSA / MSW

with          1 = k – 1       and      2 = n – k

Lastly, the most unique thing for us in the equation for the test statistic is the distribution of the
test statistic which is “f.” Of the distributions that we have seen in this course, the “f”
distribution is most like the Chi-Square distribution in appearance. Compare the graph at the top
of the “f” distribution table in Unit 21 with the one at the top of the Chi-Square distribution table

297
in the same unit. However, the “f” distribution has two degrees of freedom, ν1 and ν2, while the Chi-Square distribution has only one.

To use the table note that the left hand column is labeled “degrees of freedom for denominator.”
In our notation, which is nearly universal in the literature, this left hand column is ν2. The top
row in the table is labeled “degrees of freedom for numerator” and is ν1 in our notation. PLEASE
NOTE that the degrees of freedom ν1 and ν2 are not the same. This means that you must take
particular care to calculate ν1 and ν2 correctly, and to be sure to find the column in the table
associated with ν1 and the row in the table associated with ν2. The entire table is for α = .05. All
of the values in the table are the critical values for the specified ν1 and ν2 degrees of freedom.

Here is an illustration. If ν1 = 3 and ν2 = 14, then the critical value for α = .05 is 3.34.

In the use of the “f” distribution we

Reject H0 if the value of the test statistic is GREATER than the critical value, and we

Fail to reject H0 if the value of the test statistic is LESS than the critical value.

Testing with the "f" distribution is much easier than with the "t" distribution. Why? Because in
every test situation that uses the "t" distribution both one-tail and two-tail alternative hypotheses
are possible. However, with the analysis of variance there is only one possible alternative. Here
are a couple more example problems to help you become familiar with the use of the "f" table.

Example 1: If ν1 = 2 and ν2 = 8, then the critical value for α = .05 is 4.46

Example 2: If ν1 = 2 and ν2 = 25, then the critical value for α = .05 is 3.39

Example 3: If ν1 = 5 and ν2 = 17, then the critical value for α = .05 is 2.81
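The decision rule with the "f" table can be sketched as a simple lookup. Python is used only as an illustration; the dictionary below holds just the α = .05 critical values quoted in this unit's examples (a real application would use the full table from Unit 21), and the function name reject_h0 is my own.

```python
# Critical values for alpha = .05, copied from this unit's examples.
# Keys are (nu1, nu2) = (df for numerator, df for denominator).
F_CRIT_05 = {
    (3, 14): 3.34,
    (2, 8): 4.46,
    (2, 25): 3.39,
    (5, 17): 2.81,
}

def reject_h0(f_stat, nu1, nu2):
    """Reject H0 when the test statistic is GREATER than the tabled critical value."""
    return f_stat > F_CRIT_05[(nu1, nu2)]

print(reject_h0(5.00, 2, 8))   # 5.00 > 4.46, so True (reject H0)
print(reject_h0(3.00, 2, 25))  # 3.00 < 3.39, so False (fail to reject H0)
```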

Before we complete this example using the 5-step guideline, let's take a brief look at the
hypotheses for the One-way Analysis of Variance situation. For all One-way analyses of
variance the

H0 is that all of the population means are equal to one another

HA is that H0 is not true, the population means are somehow different from one another.

298
One-Way Analysis of Variance Test
Now let's look at the data for this example.

Sample              scores on the outcome variable (% of optimal blood flow)

1. Surgery          75     76    77     72
2. Drug             65     67    64     63   66
3. Ultrasound       80     78    79     83

For this problem

k (the number of samples) = 3
n1 (size of sample 1; size of the surgery sample) = 4
n2 (size of sample 2; size of the drug sample) = 5
n3 (size of sample 3; size of the ultrasound sample) = 4
and
n (the total sample size) = 4 + 5 + 4 = 13

Step 1: H0: μ1 = μ2 = μ3      [The 3 treatments are equally effective]
HA: not H0            [The 3 treatments are not equally effective]

Step 2: α = .05 and the test situation is the One-way analysis of variance

Step 3: Determine the value of the test statistic. Use the template

1. Calculate the mean of each sample

Surgery mean (xbar1) = (75 + 76 + 77 + 72) / 4 = 300 / 4 = 75.0

Drug mean (xbar2) = (65 + 67 + 64 + 63 + 66) / 5 = 325 / 5 = 65.0

Ultrasound mean (xbar3) = (80 + 78 + 79 +83) / 4 = 320 / 4 = 80.0

2. Calculate the variance of each sample

Sample 1 – Surgery                     Sample 2 – Drug             Sample 3 – Ultrasound
(score–mean) (score–mean)2            (score–mean) (score–mean)2      (score–mean) (score–mean)2

(75-75)= 0        (0)(0) = 0          (65-65)= 0    (0)(0) = 0        (80-80)= 0    (0)(0) = 0
(76-75)= 1        (1)(1) = 1          (67-65)= 2    (2)(2) = 4        (78-80)= -2   (-2)(-2) = 4
(77-75)= 2        (2)(2) = 4          (64-65)= -1   (-1)(-1) = 1      (79-80)= -1   (-1)(-1) = 1
(72-75)= -3       (-3)(-3) = 9        (63-65)= -2   (-2)(-2) = 4      (83-80)= 3     (3)(3) = 9
(66-65)= 1    (1)(1) = 1
Sums       0             14                 0             10                 0           14

299
s12 = 14 / 3 = 4.67                                     Remember to divide by the (sample size minus 1)

s22 = 10 / 4 = 2.50

s32 = 14 / 3 = 4.67

3. Calculate the grand mean

xbar = sum of all the values divided by the total sample size

= (75+76+77+72+65+67+64+63+66+80+78+79+83) / 13 = 945 / 13 = 72.69

4. Calculate MSA using

Sample        ni       xbari          (xbari – xbar)               (xbari – xbar)2         (ni)(xbari – xbar)2

Surgery    4          75          (75-72.69)= 2.31             (2.31)(2.31)=5.34       (4)(5.34) = 21.36
Drug       5          65          (65-72.69)= -7.69            (-7.69)(-7.69)=59.14   (5)(59.14) = 295.70
Ultrasound 4          80          (80-72.69)=7.31              (7.31)(7.31)=53.44     (4)(53.44) = 213.76

Sum = 530.82

MSA = sum / (k-1) = 530.82 / (3-1) = 530.82 / 2 = 265.41

5. Calculate MSW using

Sample         ni           ni – 1                si2            (ni – 1)(si2)

Surgery    4                 3                4.67            (3)(4.67) = 14.00
Drug       5                 4                2.50            (4)(2.50) = 10.00
Ultrasound 4                 3                4.67            (3)(4.67) = 14.00

Sum = 38.00

MSW = sum / (n-k) = 38.00 / (13-3) = 38.00 /10 = 3.80

6. Calculate the test statistic

ν1 = k - 1 = 3 - 1 = 2
ν2 = n - k = 13 - 3 = 10

test statistic = MSA / MSW = f2,10 = 265.41 / 3.80 = 69.84

300
Step 4: The critical value for ν1 = 2 and ν2 = 10, and α = .05 is 4.10

Since the test statistic (69.84) is Greater Than the critical value (4.10), we reject H0

Step 5: What does this mean? We have enough evidence to indicate that the three heart
treatments are not equally effective. If this is the case, then which is best and which is worst?
To answer this part of the question we need to go back to the calculation of the MSA and
look at the values for (mean – grand mean). In essence this is a special type of deviation
such that the larger the deviation the greater the influence attributable to the specific
sample. If the value is negative then the influence is negative and if the value is positive
then the influence is positive. Thus, we see that the drug therapy is the least effective (the
only treatment with a negative deviation) and that the ultrasound therapy is the most
effective (the treatment with the highest positive deviation).
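The six template steps can be collected into one short routine. The sketch below, in Python purely for illustration (the function name one_way_f is my own), reproduces the blood-flow result up to rounding.

```python
def one_way_f(samples):
    """F statistic for One-way Analysis of Variance, following the unit's template."""
    k = len(samples)                                 # number of samples
    n = sum(len(s) for s in samples)                 # total sample size
    grand = sum(sum(s) for s in samples) / n         # step 3: grand mean
    means = [sum(s) / len(s) for s in samples]       # step 1: sample means
    # Step 4: MSA = sum of ni*(mean_i - grand mean)^2, divided by (k - 1)
    msa = sum(len(s) * (m - grand) ** 2 for s, m in zip(samples, means)) / (k - 1)
    # Steps 2 and 5: MSW = pooled squared deviations, divided by (n - k)
    ssw = sum(sum((x - m) ** 2 for x in s) for s, m in zip(samples, means))
    msw = ssw / (n - k)
    return msa / msw, k - 1, n - k                   # step 6: f, nu1, nu2

surgery = [75, 76, 77, 72]
drug = [65, 67, 64, 63, 66]
ultrasound = [80, 78, 79, 83]

f, nu1, nu2 = one_way_f([surgery, drug, ultrasound])
print(round(f, 2), nu1, nu2)  # 69.84 2 10
```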

Another Example
Problem Scenario: At the present the issue of illegal immigration is of concern to many citizens
of our nation. It is logical to expect that the attitudes of people in States more heavily impacted
by illegal immigration would be different from the attitudes of people in States less heavily
impacted. In order to examine this belief, I conducted the following study. I took a random
sample of 7 people from Arizona, a random sample of 6 people from Utah, and a random sample
of 4 people from West Virginia. My question of interest is, "do the attitudes of the people in
these 3 states about illegal immigration differ?" Here is the attitude question that I asked each of
the people in my study,

Using the scale, 1 = Very much disagree, 2 = disagree, 3 = slightly disagree, 4 = slightly agree, 5
= agree, and 6 = very much agree; How much do you agree with the following statement?

Illegal immigration is hurting our nation's economy.

Here are the data and the answers to the first 3 questions in our template.

State        Sample Size      Mean       Variance

Arizona            7           5.00       .667

Utah              6            4.00       .800

West Virginia      4            2.25      .250

Grand Mean       4.00

301

1. How many variables are there? One; degree of agreement with the statement, "illegal
immigration is hurting our nation's economy."

2. What level of measurement is the attitude variable? This scale is a Likert Scale.
All Likert scales are ordinal levels of measurement. However, since this scale has 5 or more
possible outcomes, it can be considered "ratio."

3. How many samples are there? Three; the Arizona sample, the Utah sample, and the West
Virginia sample.

Based on these answers the appropriate test situation is the One-way Analysis of Variance.

Is the question specific or general? Since the test situation is One-way Analysis of Variance, then
the alternative is always general.

For this problem k = 3, n1 = 7, n2 = 6, n3 = 4, and n (the total sample size) = 7 + 6 + 4 = 17

Step 1: H0: 1 = 2 = 3    [The 3 States have the same attitude about illegal immigration]

HA: not H0         [The 3 States do not have the same attitude about illegal immigration]

Step 2: α = .05 and the test situation is the One-way analysis of variance

Step 3: Determination of the Test Statistic. Use the template

1. Calculate the mean of each sample. This was provided.

Arizona mean (xbar1) = 5.00

Utah mean (xbar2) = 4.00

West Virginia mean (xbar3) = 2.25

2. Calculate the variance of each sample. This was provided.

s12 = .667

s22 = .800

s32 = .250

302
3. Calculate the grand mean. This was provided.

grand mean = xbar = 4.00

4. Calculate MSA using

Sample           ni         xbari          (xbari – xbar)            (xbari – xbar)2           (ni)(xbari – xbar)2

Arizona           7       5.00         (5.00-4.00) = 1.00      (1.00)(1.00) = 1.00      (7)(1.00) = 7.00

Utah              6       4.00         (4.00-4.00) = 0.00      (0)(0) = 0               (6)(0) = 0.00

West Virginia 4           2.25         (2.25-4.00) = -1.75     (-1.75)(-1.75) = 3.063    (4)(3.063) = 12.250

Sum = 19.250

MSA = sum / (k-1) = 19.250 / (3-1) = 19.250 / 2 = 9.625

5. Calculate MSW using

Sample            ni             ni – 1                  si2        (ni – 1)(si2)

Arizona               7            6                    .667    (6)(.667) = 4.00

Utah                  6            5                    .800    (5)(.800) = 4.00

West Virginia      4               3                    .250     (3)(.250) = 0.75

Sum = 8.75

MSW = sum / (n-k) = 8.75 / (17-3) = 8.75 / 14 = .625

6. Calculate the test statistic

ν1 = k - 1 = 3 - 1 = 2

ν2 = n - k = 17 - 3 = 14

Test statistic = MSA / MSW = f2,14 = 9.625 / .625 = 15.40

Step 4: Using the f Table in Unit 21, the critical value for ν1 = 2 and ν2 = 14, and α = .05 is 3.74

Since the test statistic (15.40) is Greater Than the critical value (3.74), we reject H0

303
Step 5: What does this mean? We do have enough evidence to indicate that the 3 States do not
have the same attitude about illegal immigration. If this is the case, then how do the State
attitudes differ from one another? To answer this question we will once again have to look at the
(mean - grand mean) section of MSA. Here are those results.

(mean - grand mean)

Arizona             1.00

Utah                0.00

West Virginia      -1.75

At the level of this course to interpret these values we typically look at only the largest positive
deviation and the largest negative deviation. Thus our conclusion can be stated as, the States
selected for inclusion in this study differ in their attitudes about illegal immigration. Specifically
we have found that the residents of Arizona believe most strongly that illegal immigration is
hurting our nation's economy (largest positive deviation) and that the residents of West Virginia
in general do not believe that illegal immigration is hurting our nation's economy (largest
negative deviation).
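Because MSA and MSW need only each sample's size, mean, and variance, the F statistic can also be computed straight from a summary table like the one above. A Python sketch, illustrative only (the function name f_from_summaries is my own):

```python
def f_from_summaries(ns, means, variances):
    """One-way ANOVA F from per-sample size, mean, and variance (template steps 3-6)."""
    k = len(ns)
    n = sum(ns)
    # Grand mean as the sample-size-weighted average of the sample means.
    grand = sum(ni * m for ni, m in zip(ns, means)) / n
    msa = sum(ni * (m - grand) ** 2 for ni, m in zip(ns, means)) / (k - 1)
    msw = sum((ni - 1) * v for ni, v in zip(ns, variances)) / (n - k)
    return msa / msw

# Arizona, Utah, West Virginia summaries from the table above.
f = f_from_summaries([7, 6, 4], [5.00, 4.00, 2.25], [0.667, 0.800, 0.250])
print(round(f, 2))  # 15.4
```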

Quiz
I would like to compare three diet plans with one another to see if one of them is better than the
others. The first is the Atkins diet, the second is a low-fat diet, and the third is a low-calorie diet.
12 people agreed to be in my study. Four of these people were randomly assigned to each of the
three diet programs (samples). Each person at the start of my study was assessed to be 25 pounds
overweight and the outcome variable of interest was the number of pounds lost after being on the
assigned diet for 4 weeks. The results of this study are listed below.

Sample                 Pounds Lost

1. Atkins              10    11    12 7
2. Low-fat             12    11     8 13
3. Low-calorie          8    10     9 9

Appropriately test the hypothesis of equal diet effectiveness.

304

1. How many variables are there? One; amount of lost weight in 4 weeks.

2. What level of measurement is the variable amount of lost weight? Measured in the number of
pounds lost it is ratio.

3. How many samples are there? Three; the Atkins sample, the low-fat sample, and the low-
calorie sample.

Based on these answers the appropriate test situation is the One-way Analysis of Variance.

Is the question specific or general? Since the test situation is One-way Analysis of Variance, then
the alternative is always general.

For this problem k = 3, n1 = 4, n2 = 4, n3 = 4, and n (the total sample size) = 4 + 4 + 4 = 12

Step 1: H0: μ1 = μ2 = μ3    [The 3 diet programs are equally effective]
HA: not H0          [The 3 diet programs are not equally effective]

Step 2: α = .05 and the test situation is the One-way analysis of variance

Step 3: Use the template

1. Calculate the mean of each sample

Atkins mean (xbar1) = (10 + 11 + 12 + 7) / 4 = 40 / 4 = 10.0

Low Fat mean (xbar2) = (12 + 11 + 8 + 13) / 4 = 44 / 4 = 11.0

Low Calorie mean (xbar3) = (8 + 10 + 9 + 9) / 4 = 36 / 4 = 9.0

2. Calculate the variance of each sample

Sample 1 – Atkins              Sample 2 – Low Fat               Sample 3 – Low Calorie
(score–mean) (score–mean)2        (score–mean) (score–mean)2        (score–mean) (score–mean)2

(10-10)= 0      (0)(0) = 0         (12-11)= 1       (1)(1) = 1       (8-9)= -1     (-1)(-1) = 1
(11-10)= 1      (1)(1) = 1         (11-11)= 0       (0)(0) = 0       (10-9)= 1     (1)(1) = 1
(12-10)= 2      (2)(2) = 4         (8-11)= -3       (-3)(-3) = 9      (9-9)= 0     (0)(0) = 0
(7-10)= -3      (-3)(-3) = 9       (13-11)=2         (2)(2) = 4       (9-9)= 0     (0)(0) = 0

Sums     0             14                  0              14                 0            2

305
s12 = 14 / 3 = 4.67                                      Remember to divide by the sample size minus 1

s22 = 14 / 3 = 4.67

s32 = 2 / 3 = .67

3. Calculate the grand mean

xbar = sum of all the values divided by the total sample size

= (10 + 11 + 12 + 7 + 12 + 11 + 8 + 13 + 8 + 10 + 9 + 9) / 12 = 120 / 12 = 10.0

4. Calculate MSA using

Sample         ni        xbari        (xbari – xbar)               (xbari – xbar)2          (ni)(xbari – xbar)2

Atkins      4           10         (10 – 10) = 0                 (0)(0) = 0            (4)(0) = 0.0
Low Fat     4           11         (11 – 10) = 1                 (1)(1) = 1            (4)(1) = 4.0
Low Calorie 4            9          (9 – 10) = -1                (-1)(-1) = 1          (4)(1) = 4.0

Sum = 8.0

MSA = sum / (k-1) = 8.0 / (3-1) = 8.0 / 2 = 4.0

5. Calculate MSW using

Sample          ni        ni – 1                   si2             (ni – 1)(si2)

Atkins      4                 3                   4.67           (3)(4.67) = 14.00
Low Fat     4                 3                   4.67            (3)(4.67) = 14.00
Low Calorie 4                 3                    .67           (3)(.67) = 2.00

Sum = 30.00

MSW = sum / (n-k) = 30.00 / (12-3) = 30.00 /9 = 3.33

6. Calculate the test statistic

f = MSA / MSW = 4.00 / 3.33 = 1.20

with 1 = 2        and        2 = 9

306
Step 4: Using the f Table in Unit 21, the critical value for ν1 = 2 and ν2 = 9, and α = .05 is 4.26

Since the test statistic (1.20) is NOT Greater Than the critical value (4.26), we fail to reject H0

Step 5: What does this mean? We do not have enough evidence to indicate that the three diet
programs are not equally effective. If this is the case, then at this point it is not possible to
determine which of the diet programs is the best and which is the worst; we either have to
conclude that the diet programs are equally effective or we need to collect more data.
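The quiz arithmetic can be checked the same way. Below is a self-contained Python sketch, illustrative only, that runs the template on the diet data and applies the Step 4 decision against the tabled critical value quoted above.

```python
samples = {
    "Atkins":      [10, 11, 12, 7],
    "Low-fat":     [12, 11, 8, 13],
    "Low-calorie": [8, 10, 9, 9],
}

k = len(samples)                                       # number of samples, 3
n = sum(len(s) for s in samples.values())              # total sample size, 12
grand = sum(sum(s) for s in samples.values()) / n      # grand mean
means = {name: sum(s) / len(s) for name, s in samples.items()}

# MSA and MSW exactly as in template steps 4 and 5.
msa = sum(len(s) * (means[name] - grand) ** 2
          for name, s in samples.items()) / (k - 1)
msw = sum(sum((x - means[name]) ** 2 for x in s)
          for name, s in samples.items()) / (n - k)
f = msa / msw

critical = 4.26  # f table, nu1 = 2, nu2 = 9, alpha = .05 (from Step 4 above)
print(round(f, 2), "reject H0" if f > critical else "fail to reject H0")
```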

At this point you have completed all of the unit material. Congratulations !!!

307
Unit 21: Distribution Tables
Table 1 - The Standard Normal Table (Z)

308
Table 2 - The Chi Square Table

309
Table 3 - The "T" Table

310
Table 4 - The "F" Table

311
Unit 22: Experiments

What is an experiment?

An experiment is a study conducted in a highly controlled manner such that the precise
manipulation of an explanatory variable (independent variable) will result in some measurable
and predictable change in the response variable (dependent variable). In this definition we should
hear the echoes of the extensive discussion of form 3 questions in the regression unit (Unit 17).
The key issues in this definition are the presence of a highly controlled environment, the ability
to manipulate the explanatory variable, and the ability to measure change in the response
variable. Earlier in the semester we dealt with the issues of measurement (Unit 2), types of
research questions (Unit 1), and the use of independent and dependent variables (Unit 17). As an
aside, in experiments the participants are usually referred to as subjects.

Highly Controlled Environment
What do we need in order to take control of the experimental environment? The two essential
elements that need to be controlled are how we conduct the experiment (this is often called
design) and how we attempt to minimize or eliminate potential sources of bias.

Design
Experimental design is a plan for running an experiment. There are many possible plans for
the running of experiments, so many in fact that there are entire courses and even sequences of
courses whose only goal is to present various designs and the nuances accompanying them. I will
simplify things greatly and present two basic designs. The first is a multiple-group design and the
second is a longitudinal design.

Multiple-Group Design
The term multiple-group should be fairly self-explanatory. It means to have two or more groups.
The simplest multiple-group design is one that contains only two groups. Typically in this
design, one of the groups is a control group and the other group is a treatment or experimental
group.

The treatment group is self-explanatory. It is composed of those individuals who have received
the treatment, the intentional manipulation.

312
The control group is composed of those individuals who have not received the treatment, those
who have not been manipulated. In truth, the term control group is a misnomer, since the
treatment and control groups are both controlled.

Ideally, the experimenter should exercise complete control over all of the individuals
participating in the study. Through this control, the experimenter has hopefully equalized the
treatment and control groups. They should be alike in all regards with only one exception. Of
course, that exception is that the treatment group has received the experimental manipulation and
the control group hasn't. Thus, the control group is a reference for comparison. They should look
like what the treatment group would have looked like, if they had not received the treatment.
Hence, any difference between the treatment and control group on the response variable is
directly attributable to the effect of the manipulation. Now you should be able to see where the
notion of causality is introduced into the experimental setting.

In the context of an observational study, the control group is called a comparison group. Since
we cannot typically exercise control over people in an observational study, we attempt to select
a reference group that is as similar to the treatment group as possible. In recognition of this loss
of control, we refer to this group as a comparison group.

Longitudinal Design
In the longitudinal design, a group of individuals usually act as their own control group. How
this is accomplished is by measuring them prior to receiving the experimental manipulation and
then measuring them a second time after they have received the manipulation. This design is also
called a pre-post design. This should be clear in that we measure the individuals prior to (pre) the
manipulation and after (post) the manipulation. Often in this design, the manipulation is called an
intervention and the pre-measurements are called the baseline.

The longitudinal design is generally considered to be superior to the multiple-group design. This
is because of several advanced statistical properties and because it is easier to exercise control.
However, the longitudinal design is typically less popular because it is more expensive to
conduct, takes more time, and has a higher risk of individuals dropping out of the experiment
(this is called mortality among design people). Another weakness of the longitudinal design is
that change does naturally occur over time (time heals all wounds). Hence, the use of a control
group in conjunction with the longitudinal design would take care of this possibility, since the
control group would experience the non-aided healing that time alone would impart.

313
Bias
Bias is basically something that is introduced in our study such that the final result is not
reflective of the actual truth. Bias is not a new topic to this course and we have seen various
discussions about bias throughout most of the previous weeks. Hopefully by this time, it should
be easy to see that one of the primary objectives of statistics is to control for bias. Here are some
of the controls on bias applicable to the experimental setting.

Random assignment. In the multiple-group design the fundamental issue is the equalizing of the
control and treatment groups for all variables with the exception of the explanatory variable. This
is generally accomplished through the process of random assignment, which is very close to the
notion of simple random sampling presented earlier. In random assignment, the individuals are
assigned to be members of the treatment and control group in a completely impartial, totally
random manner. Thus, we should not expect the members of the control group to be different in
any systematic manner from the treatment group. Recall that one useful definition of bias is
systematic prejudice. If the random assignment failed in some fashion, such that the groups were
somehow systematically different in a manner other than the treatment, this is typically due to
the presence of a confounding variable. A confounding variable is a variable that can affect our
variable of interest (the dependent variable) that has nothing whatsoever to do with the
treatment. In a perfect experiment, one in which there was no bias or confounding variables, any
change in the scores of the treatment group should be directly attributable to the treatment itself.
In the situation where a confounding variable exists, the change we see in the scores of the
treatment group could be the result of the treatment and/or the confounding variable. Such a
situation would destroy the meaningfulness of results from our experiment, and essentially waste
our time and money.

For example, suppose we will be using 30 people in the next experiment that we are going to
conduct. The treatment in this study is a new meditation program designed to facilitate short-
term memory. Rather than going through the random process of assignment, we decide instead to
place the first 15 people coming into the laboratory into the treatment group (this is because it
takes longer to test the treatment subjects and we would like to go home early for a party this
afternoon). The second set of 15 people coming in will be placed in the control group. Have we
introduced a confounding variable? Yes, at least one and probably more. Possibility 1: early
risers versus late risers. Possibility 2: those eager to participate versus those less eager.
Possibility 3: those who have something else to do (hence they want to come in early, get it over,
and get onto something else). All three of these possibilities could have an effect on our response
variable (short-term memory ability), but none of them has anything to do with the treatment.
Thus, our results would be confounded by their presence.

A frequently encountered reaction of many people to being in an experiment is the desire to
please the experimenter by providing responses that the subject feels the experimenter wants or
would like. This of course introduces a form of bias, typically labeled respondent bias. This bias
is also called the Hawthorne Effect, which was first reported in a study of factory workers at the
Western Electric Company plant in Hawthorne, Illinois, in 1924. To eliminate or minimize this
form of bias, several actions can be taken.

314
The placebo group. A placebo is something that appears like the treatment in every capacity
with the exception of the active agent (treatment). Thus, a placebo group is another form of a
control group. Here is an example of how a placebo can be used.

Placebo Example
Our question of interest is, "does a new vaccine effectively treat the flu?"

Our design will be a multiple-group design. For the next 30 flu sufferers coming into our office,
we will use an excellent random process to assign these sufferers to either the treatment or
control group.

The control group will receive nothing.

The experimental group will receive an injection with the new vaccine.

Do you see any problems?

In this setting it would be easy for the control subjects to realize that they are in the control group
and it would be similarly easy for the treatment subjects to realize that they are in the treatment
group. This would certainly create a problem (see blinding in the next section). An additional
and probably greater problem in this study is that the treatment group differs from the control
group in two ways. First, they have received the vaccine and second they have received an
injection. Thus, the mere fact that they are receiving something (the injection) could be a
confounding variable. How do we make this design better? Through the introduction of a
placebo group. The placebo group will receive an injection of saline that appears in every way
identical to the vaccine. In this design, it is possible to measure the effect due to the injection
(compare the control group to the placebo group) and the effect due to the treatment (compare
the placebo group to the treatment group). This latter comparison permits us to separate the
treatment effect from the treatment effect confounded with the injection effect.

Blinding is another method to assist in the removal of bias. There are two forms of blinding,
single blind and double blind. In single blind experiments, the subjects in the experiment are
unaware of whether they are receiving the treatment or the control. Obviously, all experiments
should exercise this stipulation as a minimum (as seen above). In a double blind experiment, the
experimenter who interacts with the subjects is also blind to who is receiving the treatment and
the control. We can expand on the placebo example above to illustrate this condition. You, as the
senior experimenter, might come in early to fill up a number of syringes, some will contain the
treatment and the others will contain the placebo. You of course know which ones are which. I as
the junior experimenter (who will be giving the injections) come in later and give the injections
to the subjects as you direct. Since the placebo appears in every way like the treatment, then I
will be completely unaware of who is receiving what. Hence, I will not be able to give any
direct, non-direct, or non-verbal clues (bias) to the subjects.

315
Manipulation of the Explanatory Variable
The scary word in this section is manipulation. Do you want to submit yourself into the hands of
an experimenter and allow her/him to manipulate you? In psychology, several studies have been
conducted post World War II that have called into question the ethics surrounding experiments in
which humans are manipulated. The outgrowth of this concern has been the establishment of a
human subjects board on every academic campus where human research is conducted. In part, it
is the responsibility of this board to scrutinize every study involving humans to be performed
under its jurisdiction and ensure the ethical treatment of all subjects. This task is even more
difficult than it might seem. Let's pretend that you are a member of the human subjects board
and the following two cases appear before you for judgment.

Case 1 is a study to assist adults with severe emotional problems. The treatment group in a
longitudinal study will receive regression hypnosis (not to be confused with the regression
statistical procedure presented in Unit 10) in an effort to uncover abuse as a child that is
currently being repressed. After the regression session, these adults are thanked for their
participation and released. After a week they will be assessed again to see if their problems have
been reduced. How will you vote on this study?

My Regression Hypnosis Vote

I would emphatically reject this study. There is no protection for the subject for the entire week
after receiving the treatment. In fact, if the treatment works and a particular subject does become
aware of being abused as a child, then they may experience increased emotional problems. For
people who already are experiencing severe emotional problems, this treatment could even push
them to the point of becoming suicidal. Definitely this study is unethical.

Case 2 is a study to test a new HIV drug. This will be a classic multiple-group design containing
one treatment group, which will receive the new drug through an injection and one control
group, which will receive a placebo injection. Since this is a new drug, still unapproved by the
FDA, all of the subjects in this study will be those who are considered beyond all other known
medical treatments. It is evident from the material presented to us that great care has been taken
to see that the subjects will be randomly assigned to the two groups. How will you vote on this
study?

316
My New HIV Drug Vote

This is a tough one. By all appearances, the study is certainly ethical and statistically sound.
However, the subjects can all be classified as desperate. If the treatment is in fact a good one, as
we would hope, then how can we justify the deaths of the subjects in the control group who
could have been saved if they had only been in the treatment group? Could we have somehow
placed all of the subjects in the treatment group, and still have been ethically and statistically
sound? I will leave these questions unanswered.

A final word on manipulation is that some things cannot be manipulated. We might have a very
easy question about whether blondes in fact do have more fun. The explanatory variable is hair
color. The response variable is an assessment of how much fun the subject is having. While we
can dye a person's hair, we cannot really change their actual hair color. If our question deals
with apparent hair color, then yes, we can manipulate the explanatory variable. However, if our
question deals with actual hair color, then the question cannot be submitted to an experimental
study. Since actual hair color cannot be changed, to answer the question we would have to
conduct an observational study.

From the two discussions above we can see that an experiment is a great thing, if it can be done.
Unfortunately, many of the most interesting and pressing problems in society do not lend
themselves to being controlled and/or manipulated. Thus, even though we would like to conduct
an experiment, we often have to resort to observational studies instead. Observational studies are
not nearly as powerful as experiments; most commonly only a correlational relationship can be
established (probably not a dependency relationship, and most assuredly not a causal
relationship). However, as is often the case, something is better than nothing.

317
The Biggest Public Health Experiment Ever:
The 1954 Field Trial of the Salk Poliomyelitis
Vaccine
This article was written by Paul Meier from the University of Chicago and appeared in Statistics:
A Guide to the Unknown (2nd Edition).

The largest and most expensive medical experiment in history was carried out in 1954. Well over
a million young children participated, and the immediate direct costs were over 5 million dollars.
The experiment was carried out to assess the effectiveness, if any, of the Salk vaccine as a
protection against paralysis or death from poliomyelitis. The study was elaborate in many
respects, most prominently in the use of placebo controls (children who were inoculated with
simple salt solution) assigned at random (that is, by a carefully applied chance process that gave
each volunteer an equal probability of getting vaccine or salt solution) and subjected to a double-
blind evaluation (that is, an arrangement under which neither the children nor the physicians who
evaluated their subsequent state of health knew who had been given vaccine and who got the salt
solution).

Why was such elaboration necessary? Did it really result in more or better knowledge than could
have been obtained from much simpler studies? These are the questions on which this discussion
is focused.

Here are some summary statistics for this study.
1,349,145          Children participated in this study

422,743        Children were in the Treatment Group

926,402        Children were in the two “Control” Groups below

201,229         Children were in a Placebo Group

725,173         Children were in a Comparison Group

----------

15 of the children in the “control” groups died during the course of the study

None of the children in the treatment group died during the course of the study

445 of the children in the “control” groups contracted paralytic polio during the course of
the study (relative frequency = .000480)

318
71 of the children in the treatment group contracted paralytic polio during the course of the
study (relative frequency = .000168)

BACKGROUND

This part of the article has been deleted. It is not relevant or interesting from the perspective of
this unit.

EVALUATION OF EFFECTIVENESS

In the early fifties the Advisory Committee convened by the National Foundation for Infantile
Paralysis (NFIP) decided that the killed-virus vaccine developed by Jonas Salk at the University
of Pittsburgh had been shown to be both safe and capable of inducing high levels of the antibody
in children on whom it had been tested. This made the vaccine a promising candidate for general
use, but it remained to prove that the vaccine actually would prevent polio in exposed
individuals. It would be unjustified to release such a vaccine for general use without convincing
proof of its effectiveness, so it was determined that a large-scale "field trial" should be
undertaken.

That the trial had to be carried out on a very large scale is clear. For suppose we wanted the trial
to be convincing if indeed the vaccine were 50% effective (for various reasons, 100%
effectiveness could not be expected). Assume that, during the trial, the rate of occurrence of
polio would be about 50 per 100,000 (which was about the average incidence in the United
States during the fifties). With 40,000 in the control group and 40,000 in the vaccinated group,
we would find about 20 control cases and about 10 vaccinated cases, and a difference of this
magnitude could fairly easily be attributed to random variation. It would suggest that the vaccine
might be effective, but it would not be persuasive. With 100,000 in each group, the expected
numbers of polio cases would be 50 and 25, and such a result would be persuasive. In practice, a
much larger study was clearly required, because it was important to get definitive results as soon
as possible, and if there were relatively few cases of polio in the test area, the expected number
of cases might be well under 40. It seemed likely, for reasons we shall discuss later, that paralytic
polio, rather than all polio, would be a better criterion of disease, and only about half the
diagnosed cases are classified "paralytic." Thus the relatively low incidence of the disease, and
its great variability from place to place and time to time, required that the trial involve a huge
number of subjects - as it turned out, over a million.
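
The back-of-the-envelope arithmetic in the paragraph above can be sketched in a few lines of
Python. The figures (50 cases per 100,000 and 50% effectiveness) come from the article; the
helper function itself is our own illustration, not part of the original study.

```python
# Expected polio cases under an assumed incidence and vaccine effectiveness.
# Incidence and effectiveness figures come from the article; the function
# is an illustrative helper of our own.

def expected_cases(group_size, incidence_per_100k, effectiveness=0.0):
    """Expected number of cases in a group of the given size."""
    rate = incidence_per_100k / 100_000
    return group_size * rate * (1 - effectiveness)

# With 40,000 per group: about 20 control cases vs. 10 vaccinated cases.
print(round(expected_cases(40_000, 50), 1))        # 20.0
print(round(expected_cases(40_000, 50, 0.5), 1))   # 10.0

# With 100,000 per group the gap widens to about 50 vs. 25.
print(round(expected_cases(100_000, 50), 1))       # 50.0
print(round(expected_cases(100_000, 50, 0.5), 1))  # 25.0
```

A gap of 20 versus 10 cases is small enough to be plausibly due to chance; 50 versus 25 is
much harder to explain away, which is why the trial had to be so large.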

THE VITAL STATISTICS APPROACH

Many modern therapies and vaccines, including some of the most effective ones, such as
smallpox vaccine, were introduced because preliminary studies suggested their value. Large-
scale use subsequently provided clear evidence of efficacy. A natural and simple approach to the
evaluation of the Salk vaccine would have been to distribute it as widely as possible, through the
schools, to see whether the rate of reported polio was appreciably less than usual during the
subsequent season. Alternatively, distribution might be limited to one or a few areas because
limitations of supply would preclude effective coverage of the entire country. There is even a

319
fairly good chance that were one to try out an effective vaccine against the common cold or
against measles, convincing evidence might be obtained in this way.

In the case of polio--and, indeed, in most cases--so simple an approach would almost surely fail
to produce clear cut evidence. First, and foremost, we must consider how much polio incidence
varies from season to season, even without any attempts to modify it. From Figure 1, which
shows the annual reported incidence from 1930 through 1955, we see that had a trial been
conducted in this way in 1931, the drop in incidence from 1931 to 1932 would have been
strongly suggestive of a highly effective vaccine because the incidence dropped to less than a
third of its previous level. Similar misinterpretations would have been made in 1935, 1937, and
other years--most recently in 1952. (On the general problem of drawing inferences from such
time series data see the essay by Campbell.) One might suppose that such mistakes could be
avoided by using the vaccine in one area, say, New York State, and comparing the rate of
incidence there with that of an unvaccinated area, say, Illinois. Unfortunately, an epidemic of
polio might well occur in Chicago--as it did in 1956--during a season in which New York had a
very low incidence.

Another problem, more subtle, but equally burdensome, relates to the vagaries of diagnosis and
reporting. There is no difficulty, of course, in diagnosing the classic respirator case of polio, but
the overwhelming majority of cases are less clearcut. Fever and weakness are common
symptoms of many illnesses, including polio, and the distinction between weakness and slight
transitory paralysis will be made differently by different observers. Thus the decision to diagnose
a case as non-paralytic polio instead of some other disease may well be influenced by the
physician's general knowledge or feeling about how widespread polio is in his community at the
time.

These difficulties can be mitigated to some extent by setting down very precise criteria for
diagnosis, but it is virtually impossible to obviate them completely when, as would be the case
after the widespread introduction of a new vaccine, there is a marked shift in what the physician
expects to find. This is most especially true when the initial diagnosis must be made by family
physicians who cannot easily be indoctrinated in the use of a special set of criteria, as is the case
with polio. Later evaluation by specialists cannot, of course, bring into the picture those cases
originally diagnosed as something other than polio.

THE OBSERVED CONTROL APPROACH

The difficulties of the vital statistics approach were recognized by all concerned, and the initial
study plan, although not judged entirely satisfactory, got around many of the problems by
introducing a control group similar in characteristics to the vaccinated group. More specifically,
the idea was to offer vaccination to all children in the second grade of participating schools and
to follow the polio experience not only in these children, but in the first- and third-grade children
as well. Thus the vaccinated second-graders would constitute the treated group, and the first- and
third-graders would constitute the control group. This plan follows what we call the observed
control approach.

320
It is clear that this plan avoids many of the difficulties that we listed above. The three grades all
would be drawn from the same geographic location so that an epidemic affecting the second
grade in a given school would certainly affect the first and third grades as well. Of course, all
subjects would be observed concurrently in time. The grades, naturally, would be different ages,
and polio incidence does vary with age. Not much variation from grade to grade was expected,
however, so it seemed reasonable to assume that the average of first and third grades would
provide a good control for the second grade.

Despite the relative attractiveness of this plan and its acceptance by the NFIP advisory
committee, serious objections were raised by certain health departments that were expected to
participate. In their judgment, the results of such a study were likely to be insufficiently
convincing for two important reasons. One is the uncertainty in the diagnostic process mentioned
earlier and its liability to influence by the physician's expectations, and the other is the selective
effect of using volunteers.

Under the proposed study design, physicians in the study areas would have been aware of the
fact that only second-graders were offered vaccine, and in making a diagnosis for any such child,
they would naturally and properly have inquired whether he had or had not been vaccinated. Any
tendency to decide a difficult diagnosis in favor of non-polio when the child was known to have
been vaccinated would have resulted in a spurious piece of evidence favoring the vaccine.
Whether or not such an effect was really operating would have been almost impossible to judge
with assurance, and the results, if favorable, would have been forever clouded by uncertainty.

A less conjectural difficulty lies in the difference between those families who volunteer their
children for participation in such a trial and those who do not. Not at all surprisingly, it was later
found that those who do volunteer tend to be better educated and, generally, more well-to-do
than are those who do not participate. There was also evidence that those who agree to
participate tend to be absent from school with a noticeably higher frequency than others. The
direction of effect of such selection on the incidence of diagnosed polio is by no means clear
before the fact, and this important difference between the treated group and the control group
also would have clouded the interpretation of the results.

RANDOMIZATION AND THE PLACEBO CONTROL APPROACH

The position of critics of the NFIP plan was that the issue of vaccine effectiveness was far too
important to be studied in a manner which would leave uncertainties in the minds of reasonable
observers. No doubt, if the vaccine should appear to have fairly high effectiveness, most public
health officials and the general public would accept it, despite the reservations. If, however, the
observed control scheme were used, a number of qualified public health scientists would have
remained unconvinced, and the value of the vaccine would be uncertain. Therefore, the critics
proposed that the study be run as a scientific experiment with the use of appropriate randomizing
procedures to assign subjects to treatment or to control and with a maximum effort to eliminate
observer bias. This plan follows what we call the placebo control approach.

The chief objection to this plan was that parents of school children could not reasonably be
expected to permit their children to participate in an experiment in which they might be getting

321
only an ineffective salt solution instead of a probably helpful vaccine. It was argued further that
the injection of placebo might not be ethically sound, since a placebo injection carries a small
risk, especially if the child unknowingly is already infected with polio.

The proponents of the placebo control approach maintained that, if properly approached, parents
would consent to their children's participation in such an experiment, and they judged that
because the injections would not be given during the polio season, the risk associated with the
placebo injection itself was vanishingly small. Certain health departments took a firm stand: they
would participate in the trial only if it were such a well-designed experiment. The consequence
was that in approximately half the areas, the randomized placebo control method was used, and
in the remaining areas, the alternating-grade observed control method was used.

A major effort was put forth to eliminate any possibility of the placebo control results being
contaminated by subtle observer biases. The only firm way to accomplish this was to ensure that
neither the subject, nor his parents, nor the diagnostic personnel could know which children had
gotten the vaccine until all diagnostic decisions had been made. The method for achieving this
result was to prepare placebo material that looked just like the vaccine, but was without any
antigenic activity, so that the controls might be inoculated and otherwise treated in just the same
fashion as were the vaccinated.

Each vial of injection fluid was identified only by a code number so that no one involved in the
vaccination or the diagnostic evaluation process could know which children had gotten the
vaccine. Because no one knew, no one could be influenced to diagnose differently for vaccinated
cases and for controls. An experiment in which both the subject getting the treatment and the
diagnosticians who will evaluate the outcome are kept in ignorance of the treatment given each
individual is called a double-blind experiment. Experience in clinical research has shown the
double-blind experiment to be the only satisfactory way to avoid potentially serious observer
bias when the final evaluation is in part a matter of judgment.

For most of us, it is something of a shock to be told that competent and dedicated physicians
must be kept in ignorance lest their judgments be colored by knowledge of treatment status. We
should keep in mind that it is not deliberate distortion of findings by the physician which concerns
the medical experimenter. It is rather the extreme difficulty in many cases of making an
uncertain decision which, experience has shown, leads the best of investigators to be subtly
influenced by information of this kind. For example, in the study of drugs used to relieve
postoperative pain, it has been found that it is quite impossible to get an unbiased judgment of
the quality of pain relief, even from highly qualified investigators, unless the judge is kept in
ignorance of which patients were given which drugs.

The second major feature of the experimental method was the assignment of subjects to
treatments by a careful randomization procedure. As we observed earlier, the chance of coming
down with a diagnosed case of polio varies with a great many factors including age,
socioeconomic status, and the like. If we were to make a deliberate effort to match up the
treatment and control groups as closely as possible, we should have to take care to balance these
and many other factors, and, even so, we might miss some important ones. Therefore, perhaps
surprisingly, we leave the balancing to a carefully applied equivalent of coin tossing: we arrange

322
that each individual has an equal chance of getting vaccine or placebo, but we eliminate our own
judgment entirely from the individual decision and leave the matter to chance.
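
The "carefully applied equivalent of coin tossing" described above can be sketched as follows.
This is a minimal illustration of our own; the actual 1954 trial used a more elaborate coded
procedure, but the principle is the same.

```python
# Randomly assign each subject to vaccine or placebo with equal probability.
# Illustrative sketch only -- the subject names are hypothetical and the real
# trial's randomization machinery was far more carefully controlled.
import random

random.seed(0)  # fixed seed so the example is reproducible

subjects = [f"child_{i}" for i in range(10)]
assignment = {s: random.choice(["vaccine", "placebo"]) for s in subjects}

for subject, group in assignment.items():
    print(subject, group)
```

Note that no human judgment enters the individual assignments; each child's group is decided
entirely by the chance mechanism.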

The gain from doing this is twofold. First, a chance mechanism usually will do a good job of
evening out all the variables - those we didn't recognize in advance, as well as those we did
recognize. Second, if we use a chance mechanism in assigning treatments, we may be confident
about the use of the theory of chance, that is to say, probability theory, to judge the results. We
can then calculate the probability that so large a difference as that observed could reasonably be
due solely to the way in which subjects were assigned to treatments, or whether, on the contrary,
it is really an effect due to a true difference in treatments.

To be sure, there are situations in which a skilled experimenter can balance the groups more
effectively than a random-selection procedure typically would. When some factors may have a
large effect on the outcome of an experiment, it may be desirable, or even necessary, to use a
more complex experimental design that takes account of these factors. However, if we intend to
use probability theory to guide us in our judgment about the results, we can be confident about
the accuracy of our conclusions only if we have used randomization at some appropriate level in
the experimental design.

The final determinations of diagnosed polio proceeded along the following lines. First, all cases
of polio-like illness reported by local physicians were subjected to special examination, and a
report of history, symptoms, and laboratory findings was made. A special diagnostic group then
evaluated each case and classified it as non-polio, doubtful polio, or definite polio. The last
group was subdivided into non-paralytic, paralytic, and fatal polio. Only after this process was
complete was the code broken and identification made for each case as to whether vaccine or
placebo had been given.

RESULTS OF THE TRIAL

The main results are shown in Table 1 [I have deleted this table, it is not necessary to see it for
the purpose of this discussion], which shows the size of the study populations, the number of
cases classified as polio, and the disease rates, that is, the number of cases per 100,000
population. For example, the second line shows that in the placebo control area there were 428
reported cases of which 358 were confirmed as polio, and among these, 270 were classified as
paralytic (including 4 that were fatal). The third and fourth rows show corresponding entries for
those who were vaccinated and those who received placebo, respectively. Beside each of these
numbers is the corresponding rate. Using the simplest measure--all reported cases--the rate in the
vaccinated group is seen to be half that in the control group (compare the boxed rates in Table 1)
for the placebo control areas. This difference is greater than could reasonably be ascribed to
chance, according to the appropriate probability calculation. The apparent effectiveness of the
vaccine is more marked as we move from reported cases to paralytic cases to fatal cases, but the
numbers are small and it would be unwise to make too much of the apparent, very high
effectiveness in protecting against fatal cases. The main point is that the vaccine was a success;
it demonstrated sufficient effectiveness in preventing serious polio to warrant its introduction as
a standard public health procedure.

323
Not surprisingly, the observed control area provided results that were, in general, consistent with
those found in the placebo control area. The volunteer effect discussed earlier, however, is
clearly evident (note that the rates for those not inoculated differ from the rates for controls in
both areas). Were the observed control information alone available, considerable doubt would
have remained about the proper interpretation of the results.

Although there had been wide differences of opinion about the necessity or desirability of the
placebo control design before, there was great satisfaction with the method after the event. The
difference between the two groups, although substantial and definite, was not so large as to
preclude doubts had there been no placebo controls. Indeed, there were many surprises in the
more detailed data. It was known, for example, that some lots of vaccine had greater antigenic
power than did others, and it might be supposed that they should have shown a greater protective
effect. This was not the case; lots judged inferior in antigenic potency did just as well as those
judged superior. Another surprise was the rather high frequency with which apparently typical
cases of paralytic polio were not confirmed by laboratory test. Nonetheless, there were no
surprises of a character to cast serious doubt on the main conclusion. The favorable reaction of
those most expert in research on polio was expressed soon after the results were reported. By
carrying out this kind of study before introducing the vaccine, it was noted, we now have facts
about Salk vaccine that we still lack about typhoid vaccine, after 50 years of use, and about
tuberculosis vaccine, after 30 years of use.

Summary statistics for this study (copied from above)
1,349,145           Children participated in this study

422,743         Children were in the Treatment Group

926,402         Children were in the two “Control” Groups below

201,229         Children were in a Placebo Group

725,173         Children were in a Comparison Group

----------

15 of the children in the “control” groups died during the course of the study

None of the children in the treatment group died during the course of the study

445 of the children in the “control” groups contracted paralytic polio during the course of
the study (relative frequency = .000480)

71 of the children in the treatment group contracted paralytic polio during the course of the
study (relative frequency = .000168)

-------

324
The null hypothesis in this study is that the Salk vaccine does not successfully prevent paralytic
polio.

The alternative hypothesis in this study is that the Salk vaccine does successfully prevent
paralytic polio.

The major question of the study comes down to the following: is the relative frequency for
contracting paralytic polio in the treatment group (.000168) lower than the relative frequency for
contracting paralytic polio in the “control” groups (.000480)?

Statistics is the only way to answer this question.
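
The two relative frequencies quoted above follow directly from the summary counts; here is a
quick check in Python (the counts come from the study, the variable names are our own):

```python
# Relative frequencies of paralytic polio, computed from the summary counts.
control_cases, control_n = 445, 926_402      # the two "control" groups
treatment_cases, treatment_n = 71, 422_743   # the treatment group

rf_control = control_cases / control_n
rf_treatment = treatment_cases / treatment_n

print(round(rf_control, 6))    # 0.00048
print(round(rf_treatment, 6))  # 0.000168
```

The treatment group's rate is roughly a third of the control rate; whether that gap is too large
to be chance is exactly the question statistics answers.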

325
Unit 23: Probability and Some Examples
Probability is one of those concepts that we use in our normal everyday language and therefore
everyone should be used to seeing it and using it. The American College Dictionary defines
probability:

Probability is the likelihood or chance of something.

The statistical use of the word is very similar.

Probability is the likelihood that an event will occur.

There are a couple of key components of this definition. First, an event is some definable
situation. For instance, if I were trying to predict tomorrow's weather, then an event could be
defined as one of the possible outcomes. Possible events could be rain, snow, sunny, partly
cloudy, etc. This use of the word event corresponds with our previous use of the concept of a
level of a variable. The second component in the above definition is the notion of the future.
Probability does not deal with events that have occurred or are occurring, but with those events
yet to come. Statistics seems to have loaded its language with p words that deal with its bigger,
future context, such as prediction, population, parameter, and now probability. In the weather
illustration, will we ever know the true probability that it will rain tomorrow? No. But as we saw
in Unit 3, we can estimate what we expect to be true through the examination of appropriate
statistics obtained from representative samples. Is there something similar we can do to estimate
probabilities? Yes.

In statistics, we use relative frequencies to estimate probabilities.

326
Example 1
An experiment will be conducted in which we are going to flip a coin ten times. After each flip I
am going to record whether the result was a head (H) or a tail (T). I would like to use the
information that I will obtain from this experiment to answer two questions.

Question 1. What is the probability that the flip of this coin will result in a head? In this
question, the event that I am interested in is that a head will occur as the result of flipping a coin.

Question 2. From the results of our experiment, do we have evidence that this coin is "fair?"

Recall that in the beginning of this course I indicated that statistics could be used to make
observations, decisions, and predictions. We will use all three in this example. I have just flipped
a quarter taken from my pocket 10 times. Here are the results of this experiment.

HHTHTHHTHT

An enumeration (above) is one way in which we can present the results of our study
(observations) and we have seen in Unit 7 how such results can be graphically summarized. For
this example however a tabular summary is probably most useful. Recall from Unit 3 that there
are 5 basic elements in statistics. The first three were measurement (what to observe; a coin is
flipped and a head or a tail is recorded), sampling (record the observation from each of the 10
flips; at this point the sample of size 10 can be enumerated), and summarization (the table
immediately below).

x          f          rf

H          6          6/10 = .6

T          4          4/10 = .4

Total     10          1.0

This table very quickly communicates to us several important pieces of information. First, we
know that heads occurred 6 times (frequency). Second, we know that the proportion of our flips
that were heads was .60 (relative frequency). This is the 4th element; the pattern.
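
The same table can be produced mechanically; a short sketch using only Python's standard
library:

```python
# Build the frequency (f) and relative frequency (rf) table from the
# enumerated flips above.
from collections import Counter

flips = "HHTHTHHTHT"          # the enumeration from the experiment
f = Counter(flips)            # tallies each outcome: 6 heads, 4 tails
n = len(flips)

for outcome in ("H", "T"):
    print(outcome, f[outcome], f[outcome] / n)
# H 6 0.6
# T 4 0.4
```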

Some definitions.

   Frequency is the number of times that our event has occurred.
   Relative frequency is the proportion of times that our event has occurred.
   Probability is the proportion of times that our event is expected to occur.

From these definitions, it is easy to see that relative frequency and probability are exactly the
same concept, separated only in time: relative frequency refers to the past, and probability to
the future. Thus, we can answer our first question above through the use of the relative

327
frequency. Using the claim presented in Unit 3, that the past is the best predictor of the future,
then our guess (estimate) for the probability of a head resulting from any flip of this coin is .6
(the relative frequency obtained from our experiment). This is the prediction part; the 5th
element.

Before answering the second question above, we need to address the issue of "fair." What does
"fair" mean in the context of this question?

"Fair" indicates that all of the individual outcomes of an experiment are equally likely to occur.

In the experiment described at the beginning of this example, how many outcomes (events) are
possible for each flip of the coin? Two; with one of the outcomes being a head and the other
outcome being a tail. Thus, "fair" means that the probability of the head should equal the
probability of the tail. The calculation of the "fair" probabilities of events is quite simple. [We
usually use the two letters "Pr" to represent probability.]

Pr (event) = 1 / # of possible events

In the case of this example, there are only two possible events (head, tail). Thus if the coin is
"fair," then the Pr(head) should equal the Pr(tail), which should be 1/(number of outcomes) = ½.

Before going any further, here are some basic rules of probability (there are more, but these are
some of the simpler ones). In mathematical probability based systems such as those in this unit,
these rules must be obeyed.

1. A probability can be zero, but it cannot be less than zero.

2. A probability can be one, but it cannot be greater than one.

3. The probability of any event occurring must be greater than or equal to zero and less than or
equal to 1. [This follows from 1 and 2]

4. The sum of the probabilities of all possible events must equal 1.

5. The sum of the probabilities of an event occurring and the same event not occurring must be 1.
[This follows from 4]

As a note: the 5 rules above apply to relative frequencies as well.
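As a loose sketch (the function name is my own, not from the course), rules 1 through 4 can be checked mechanically for any proposed set of probabilities:

```python
def obeys_rules(probs, tol=1e-9):
    """Rules 1-3: every probability must lie in [0, 1].
    Rule 4: the probabilities of all possible events must sum to 1."""
    if any(p < 0 or p > 1 for p in probs):
        return False
    return abs(sum(probs) - 1.0) < tol

print(obeys_rules([0.5, 0.5]))   # True: a fair coin obeys the rules
print(obeys_rules([0.6, 0.6]))   # False: violates rule 4
```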

From the results of our experiment, do we have evidence that this coin is "fair?"

What do we know so far?

1. If the coin is "fair," then the Pr (head) = Pr (tail) = ½ . Note that these probabilities satisfy
rules 1, 2, 3, and 4.

2. Our estimate of the Pr (head) is .6 (the relative frequency for a head from the 10 repetitions of
our experiment). Since the sum of the relative frequencies of all events must equal 1.00 (Rule 4
above), we have also estimated the Pr (tail) = 1 - .6 = .4.

By comparing 1 to 2, we see that our estimate of the probability of a head (.6) is not the same as
we expected it to be (.5). Does this indicate that our coin is not "fair?" This is the first
illustration in this course of one of the most important principles in all of statistics:
statistical decisions often are made on the basis of comparing what we observed to what we
expected. The logic here is that if what we predicted is very close to what we observed,
then our predictions must be good. On the other hand, if what we predicted is very far
removed from what we observed, then our predictions must be bad. This basic principle is
essentially the difference between what we observed (O) and what we expected (E)
and can be symbolically represented by O – E. This is a principle worth remembering.

Now let's return to the question at hand: does the difference we see between 1 and 2 above
provide evidence for our prediction (coin is fair) being good or bad? The answer to this simple
question is hardly simple. Answering even such simple questions requires us to call upon all of
the complexity, sophistication, and mystery in the field of statistics. This area of statistics
(called Testing or Hypothesis Testing, Unit 18) represents the bulk of the material in most
traditional introductory courses, and nearly all of the pain and struggle experienced by most
students. This area will only be alluded to in this unit and then more formally presented in the
last two units of the course. So rather than approaching the answer to our simple question in a
statistically rigorous manner, let's consider answering the question from a loose conceptual
perspective. See the digression below.

Digression
If we were to repeat this experiment of flipping a coin 10 times, what would you predict the
outcome of this second experiment to be? Specifically, if the coin is "fair," how many heads
would you expect out of the 10 flips? The simplest and most logical answer is five. But if we did
this experiment three times, would you expect to see 5 heads as the result of 10 flips, on all three
occasions? Probably not. We could see 5 heads out of 10 flips on the first occasion, maybe only 4
out of the 10 flips on the second occasion, and maybe 6 out of the 10 flips on the third occasion.

The point here is that in conducting any single experiment, we might not perfectly see the result
that we would expect if the coin were fair, but we should expect to see something close.

Thus, the results of an experiment are typically not a perfect depiction of the truth (probability),
but in fact, a close reflection of it.

This notion represents much of the complexity and mystery of statistics.

Said another way, we would not expect our relative frequency to actually equal the true
probability, but we would expect it to be reasonably close. This is the same concept as sampling.
We don't know the population parameter, so we estimate it through the determination of the
sample statistic. In this situation, we do not know the true probability, but we estimate it through
the relative frequency from our experiment. And as with sampling, there are a variety of ways in
which we can obtain better and better estimates of the truth. In the spirit of this example, the
primary manner in which estimates are improved is through increasing the number of repetitions
in the experiment (increasing the number of flips). Obviously, this is very close to the notion that
a bigger sample size is better.

Now we can rephrase our second question.

Is the relative frequency of our event (obtained from our experiment) sufficiently close to
what we would have expected the probability of this event to be (if the coin was "fair"),
to consider that the coin is actually "fair?"

Here is an abbreviated form of this question,

Is the rf (event) sufficiently close to the expected Pr (event)?

Now we are almost done. We have known for some time in this example that the rf(head) = .6
and that the expected Pr (head) = .5. Now the only question is, "is .6 sufficiently close to .5?"

From a simple perspective, sufficiently close is assessed through the concept of the margin of
error. The margin of error is most commonly seen in TV polls, newspaper polls, survey results,
and the like. In such polls, typically near or at the bottom of the results, a proportion is given
with the notation plus or minus some amount added. The plus or minus is written as a plus sign
directly over a minus sign (±). Sometimes, this is referred to as the level of accuracy. You might
see a statement like the following.

In a survey conducted recently, 65% of the people questioned endorsed capital punishment ±
3%.

What this statement DOES NOT mean. It does not mean that the probability of a person
endorsing capital punishment is .65. Nor does it indicate that 65% of the people in the population
endorse capital punishment.

What this statement DOES mean. Given the evidence from this study (rf = .65), it would be
reasonable to estimate a person's endorsement of capital punishment to be between .62 (.65 -
.03) and .68 (.65 + .03). It does indicate that between 62% and 68% of the population endorse
capital punishment.

It is through the margin of error that we will assess sufficiently close. The margin of error is
calculated from a very simple equation.

Margin of Error = 1 divided by the square root of the sample size

In the experiment we are conducting, the sample size is the number of repetitions (10 flips). And
the margin of error is thus

1 divided by the square root of 10 = 1 / 3.16 = .316 (.3 rounded off)

Is our relative frequency sufficiently close to our expected probability? As in the capital
punishment example, we take the relative frequency plus and minus the margin of error. Then we
use the following rule to answer the question above.

If the expected probability is within the two limits (lower limit = relative frequency
minus the margin of error, upper limit = relative frequency plus the margin of error), then
the relative frequency is sufficiently close.

If the expected probability is not within the two limits, then the relative frequency is not
sufficiently close.

Now let's return to our primary question: does the difference we see between our observation
(rf = .6) and our expectation (Pr(head) = .5) provide evidence for our prediction (coin is fair)
being good or bad?

Is .6 sufficiently close to .5? Using the margin of error,

.6 ± .3 produces the following:

lower limit = .6 - .3 = .3

upper limit = .6 + .3 = .9

Our expectation (Pr(head) = .5) is considered to be close enough to our observation if it is
contained in the interval expressed by the margin of error. In this case, is our expectation (.5)
contained in the interval from .3 to .9? Yes it is. Therefore, our conclusion is that the evidence
from our experiment indicates that the coin is most likely "fair."
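The whole decision rule just illustrated can be sketched in a few lines of Python (the function name is mine; the course materials do not provide code). The sketch keeps the unrounded margin of error, 1/√10 ≈ .316, rather than the .3 used above, which does not change the conclusion:

```python
import math

def sufficiently_close(rf, expected_pr, n):
    """Decision rule: the relative frequency is sufficiently close to the
    expected probability if the expected probability falls between
    rf - margin of error and rf + margin of error."""
    me = 1 / math.sqrt(n)        # margin of error = 1 / sqrt(sample size)
    return (rf - me) <= expected_pr <= (rf + me)

# 6 heads in 10 flips: is rf = .6 sufficiently close to the "fair" value .5?
print(sufficiently_close(0.6, 0.5, 10))   # True → the coin looks fair
```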

Obviously, the decision part of statistics is the most complicated of its five aspects. This
illustration was provided for its simplicity (believe it or not) and conceptual correctness, rather
than for its rigor and statistical correctness.

Example 2
In this example we have the very same experiment being conducted as in example 1, with the
exception that this experiment produced the following results.

HHHHHHHHHH

In the next unit the notion of personal probability will be developed in detail, but I will introduce
it briefly here for the benefit of this example.

Personal probability is the degree to which a given individual believes that an event will
happen (see Unit 24).

If I were to conduct the above experiment in real time before you, the following thinking might
occur. When I tell you that I have taken a coin out of my pocket to conduct this experiment, you
might expect the coin to be normal and fair. Thus, if I were to ask you, "what is the probability of
obtaining a head on the first flip," you would most likely say .5. This would be quite reasonable.
But being based on no information, your guess about the probability of getting a head would be a
personal probability (personal belief).

I have now just flipped the coin for the first time and I tell you that the result was a head. I now
ask the question, "what is the probability of obtaining a head on the second flip?" Most likely
you would still respond .5.

I have now just flipped the coin for the second time and I tell you that the result was a head. I
now ask the question, "what is the probability of obtaining a head on the third flip?"

This is repeated.

At what point would you use the information coming from the experiment to adjust your
estimate about the probability of getting a head? Most likely by the 10th flip, no one would
answer the question with .5. In fact, most people would probably have come to the conclusion
that I was using a two-headed coin. The use of information in this fashion to change our
personal probabilities is called calibrating.

1. Based on the results of this experiment, what would you estimate the probability to be that a
flip of this coin will result in a head?

Relative frequency of getting a head was 10/10 = 1.0

Relative frequency of getting a tail is thus 0.0 (rule 5)

2. From the results of our experiment, do we have evidence that this coin is "fair?"

Margin of error = 1 divided by the square root of 10 = .3

Relative frequency ± margin of error = 1.0 ± .3

Lower limit = 1.0 - .3 = .7

Upper limit = 1.0 + .3 = 1.0 (can't have a probability > 1)

Is the expected probability if the coin were fair (.5), within the two limits?

No.

Thus our answer to the second question above is that we have evidence that the coin is not fair. It
is nice to see that the statistics here support the same conclusion that we reached through
intuition.

Example 3
People in gambling casinos go to great lengths to ensure randomness as much as they can. What
concept have we been using throughout this unit that most closely relates to randomness? FAIR.
If the dice are fair, if the roulette wheel is fair, if the cards are fair, then the laws of probability
(some of which were presented earlier) should prevail and the casino will win its given
percentage of the time. However, if one of the outcomes should in some manner be more likely
to occur than we would expect, and we know what this outcome is, then we should be able to beat
the casino at that particular game.

Aside: In the 19th century the casinos in Monte Carlo were considered the most fabulous in the
world and probably the most honest and fair. In an effort to be seen as above board and fair, they
published monthly the outcomes of their roulette wheels. These results would be seen as vast
enumerations very similar to those that I have presented in examples 1 and 2 (for a very much
smaller problem). Although the principles that have been presented in this unit were not widely
known among the populace, they were known by many mathematicians and statisticians of the
age. It wouldn't take a lot of skill or computing power to summarize these monthly reports, to
figure out the confidence associated with each of the roulette outcomes using the margin of
error, and, when irregularities were found, to make use of this information against the casinos.
While there is no reliable account of anyone ever succeeding in this undertaking, it has been
rumored for decades to have happened. Such a speculation has been the theme of several short
stories and movies over the past century. This third example is an illustration of how this
"breaking the bank at Monte Carlo" theme can be put into practice.

In this example, I would like to conduct a fairly easy experiment to determine if a die is fair. If
we are going to roll a die, how many outcomes are possible? 6. The possible outcomes are 1, 2,
3, 4, 5, and 6. If the die is fair, then each outcome should have probability 1/6 = .167. I am going
to roll this die 1111 times and record the results. What is the margin of error for this problem?

1 divided by the square root of 1111 = 1 / 33.33 = .03 = m.e.

Rather than showing the enumeration, let‟s go directly to the frequency table. Recall, the lower
limit is rf-m.e. and the upper limit is rf+m.e.

X        f      rf     rf - m.e.   rf + m.e.   expected   Fair or Not
1       178    .16       .13         .19         .167        Fair
2       211    .19       .16         .22         .167        Fair
3       200    .18       .15         .21         .167        Fair
4       233    .21       .18         .24         .167        Not Fair
5       133    .12       .09         .15         .167        Not Fair
6       156    .14       .11         .17         .167        Fair
Total  1111   1.00

Given these results we should be able to Break the Bank. We certainly wouldn't want to place a
bet on a 5 coming up, since it occurs less often than we would expect. However, if we constantly
bet on a 4 coming up, we should be able to win, since it is coming up more often than we would
expect if in fact the die were fair. This is also the basic principle behind what are called card
counting systems in Blackjack (also known as 21).
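The fairness column of the table above can be reproduced with a short loop (the counts come from the table; the variable names are mine):

```python
import math

counts = {1: 178, 2: 211, 3: 200, 4: 233, 5: 133, 6: 156}
n = sum(counts.values())        # 1111 rolls
me = 1 / math.sqrt(n)           # margin of error ≈ .03
expected = 1 / 6                # ≈ .167 for a fair die

for face, f in counts.items():
    rf = f / n
    verdict = "Fair" if (rf - me) <= expected <= (rf + me) else "Not Fair"
    print(face, round(rf, 2), verdict)   # faces 4 and 5 come out "Not Fair"
```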

Probability Applications – Expected Value
At the beginning of this unit you were introduced to some very basic notions about probability,
and to some not so basic notions about decision making and testing. It should be clear that
probability represents an ideal; a depiction of what could and should happen in the future,
even though in any particular situation we may find some deviance from this ideal. In addition,
you have most likely come to the conclusion that probability is a pretty abstract concept, is
difficult to understand, is even more difficult to calculate, and that we can easily get along
without it. These conclusions are probably all fairly legitimate. However, these are exactly the
conclusions that every casino owner, every insurance company, and many other organizations
would like for you to believe. Why? Because many situations in which probability applies are
not immediately obvious or intuitive, such as those illustrated in the Personal Probability unit
(Unit 24), and if you do think about them you might just realize how you are being
manipulated.

In this unit we will look at a couple of the more commonly encountered interactions we have
with probability, gambling and insurance. In addition, we will look at a way of determining the
truth behind these situations, thus enabling us, if we choose, to get the upper hand on those who
are trying to get the upper hand on us.

Games of chance are run in nearly every state in the United States. You can even check up on
the most recent lottery results by logging onto the Lottery section of the USA Today web site
(http://www.USAToday.com).

The payoffs in lotteries and casino games are set so that the lottery organization and casinos will
make money if the games are fair, since the payoffs are determined from the fair probabilities of
occurrence. Above we saw how to assess whether the probabilities are fair for some very highly
defined situations by using the margin of error. In this unit we will assume that the probabilities
are known and can be determined. For all games of chance the probabilities are well known;
although most are quite complex to calculate, there are many books available that present them.
In the insurance area the probabilities have been and continue to be determined every day by
actuaries through the use of relative frequencies. This information is known, and most insurance
representatives can produce or obtain the actuarial estimates (relative frequencies) for literally
any situation that you can think of. You just have to ask. Having obtained the probabilities, we
can move to the step of getting the upper hand through the use of expected value.

Before getting into the concept of expected value I will use the following example to present
some preliminary concepts and terms.

Example – An insurance company is selling a health policy that covers three things for children
under the age of 15. These are broken bones, heart problems, and diabetes. To get the policy the
child must currently be 5 years old and cannot have any of these problems at the time of
purchase. The following figures are made up, but for illustration let's say that we know from our
company's actuaries that over the 10 years of the policy

Problem         Probability     Total Cost to the Insurance Company
Broken bone        .7                    $375
Heart problem      .01                   $10,000
Diabetes           .001                  $15,000

What is the anticipated cost to the company for every child it insures? We will never know what
the actual cost will be for any child (the future is never known), but if we know the probabilities
and can insure enough children, then even though we will make mistakes in predicting for any
single child, we should be fairly accurate at predicting what will happen to a lot of children.

Anticipated cost = sum of the probabilities times the total costs

= (.7)($375) + (.01)($10,000) + (.001)($15,000) = $262.50 + $100 + $15 = $377.50
Notice that the majority of the anticipated cost to the insurance company comes from broken
bones and NOT from heart problems or diabetes. Why? Because heart problems and diabetes are
much, much less likely to occur even though they are much more costly when they do.

Can this insurance company make money by selling this policy? Yes. All they have to do is
charge the parents more than $377.50 for the policy. On a monthly payment schedule over the
10 years, this would only be $3.15 a month. Thus, if the company charges more than $377.50 for
the policy it will make money, and if it charges less than $377.50 it will lose money. Here then is
the importance of the actuaries. In order to set the price we need to know the outcomes of
interest (broken bones, heart problems, and diabetes), the probability of these outcomes
occurring (usually obtained by looking at relative frequencies over the past many years), and
the medical costs associated with each of these outcomes. In a nutshell, we once again have
the 5 elements of statistics as presented several times throughout this course.
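The anticipated-cost calculation above is just a probability-weighted sum, as this sketch shows (the figures are the made-up ones from the example):

```python
# (probability, total cost to the company) over the 10-year policy
problems = {
    "broken bone":   (0.7,   375),
    "heart problem": (0.01,  10_000),
    "diabetes":      (0.001, 15_000),
}

anticipated_cost = sum(p * cost for p, cost in problems.values())
print(anticipated_cost)   # 377.5 → charge more than this and the company profits
```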

Expected Value
Some basic terminology.

In the problem or situation that we will be examining there are "k" possible outcomes. These
are outcomes 1, 2, 3, 4, …, k. [The outcomes of interest.] The bracketed statements here refer
back to the key points in the previous example.

The payoff for outcome 1, how much we receive when outcome 1 occurs, is A1. This would be
how much we would receive from the insurance company if our child broke one of his or her
bones from our earlier example.

The payoff for outcome 2, how much we receive when outcome 2 occurs, is A2.

Etc.

In general the payoff for outcome i is Ai

The i is a generic indicator of any one of the possible outcomes from 1 to k.

The probability that outcome 1 will occur is p1.

The probability that outcome 2 will occur is p2.

Etc.

In general the probability that outcome i will occur is pi. [The probabilities of these outcomes.]

The cost of the situation, how much we paid to participate in this situation, is defined as “c.” In
our earlier example this would be the cost to us for the insurance policy. [The costs associated
with each of these outcomes.] In this case the costs for each outcome would be the same.

A gain occurs when Ai – c > 0. We receive more than we paid.

A loss occurs when Ai – c < 0. We receive less than we paid.

Now for the definition. Expected value is the long-term gain or loss associated with any
particular situation or problem that we would like to examine. Said another way, it is the sum,
over all possible outcomes, of the net gain or loss for each outcome times its respective
probability.

Here the equation might actually be easier to read than the English.

Expected Value = EV = sum of all [ (Ai – c ) ( pi ) ]

Expected Value – Example 1
This first example comes from one of the California Lottery games.

In this lottery game, the object is to select 3 numbers from a possible set of 40 and to match as
many of the 10 selected by the computer as possible. The cost of playing this lottery game is \$1.
With the selection of 3 numbers, there are 4 distinct outcomes; match all three, match two of the
three, match one of the three, and match 0 of the 3. I have collapsed outcomes 3 and 4 together to
form a new third outcome which is to match 0 or 1 of the numbers. I have placed them together
since they both have a payoff of \$0. The outcomes, payoffs (Ai), and probabilities (pi) are shown
in the table below.

Outcome    Outcome Description         Payoff    Probability
1          match all 3 numbers          $20         .012
2          match 2 of the 3 numbers     $2          .137
3          0 or 1 matches               $0          .851

What is the expected value of this game?

EV = sum of all [ (Ai – c ) ( pi ) ]

= (A1 – c ) ( p1 ) + (A2 – c ) ( p2 ) + (A3 – c ) ( p3 )

= ($20 - $1) (.012) + ($2 - $1) (.137) + ($0 - $1) (.851)

= ($19) (.012) + ($1) (.137) + (-$1) (.851)

= $.228 + $.137 - $.851

= -$.486

What does an EV = -$.486 mean? A negative EV means that you can expect to lose the indicated
amount, on average, every time you choose to play the game. For this game, you can expect
to lose approximately 49 cents each time you play. Thus, if you played it 100 times,
you could expect to lose a total of $48.60.
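The EV formula, applied to the figures in this example, can be sketched as a general function (the name `expected_value` is mine, not from the text):

```python
def expected_value(payoffs, probs, cost):
    """EV = sum over all outcomes of (payoff - cost) * probability."""
    return sum((a - cost) * p for a, p in zip(payoffs, probs))

# Example 1: the three outcomes of the $1 California lottery game
ev = expected_value(payoffs=[20, 2, 0], probs=[0.012, 0.137, 0.851], cost=1)
print(round(ev, 3))   # -0.486 → lose about 49 cents per play, on average
```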

Expected Value – Example 2
This example comes from a Lotto game in which the player was going to pick 6 numbers from
the 42 numbers that were possible. The object of the game was to pick 6 numbers and match as
many of the computer's numbers as possible. The cost of the game was $1. At the outset of a
new game the prize money (payoffs) was as follows.

Outcome    Outcome Description          Payoff        Probability
1          match all 6 numbers          $1,500,000    .000000191
2          match 5 of the 6 numbers     $100,000      .0000412
3          match 4 of the 6 numbers     $375,000      .0018
4          match 3 of the 6 numbers     $500,000      .027
5          match 0, 1, or 2 of the 6    $0            .97116

The calculation of the expected value of this game is considerably more complex than the game
in example 1. At first glance, the table above should look rather odd. It would seem that the
payoff for matching 3 of the 6 numbers is higher than the payoff for matching 5 of the 6.
However, the payoff as listed is not actually correct. What is being indicated is that all the people
who match 3 of the 6 numbers will together split the $500,000. Hence, in order to calculate the
EV for this game we will have to determine what the payoff will be per person. To determine
that, we will first have to determine how many people are expected to win at each outcome.

From past experience, it is known that for a new Lotto game with the payoffs above, about
5,084,746 individual bets will be made. What does the probability of matching all 6 numbers
(.000000191) mean? First, it means that you have almost quite literally no chance of winning.
Second, in a possibly more meaningful sense, it means that you have 1 chance in 5,245,786 of
winning. With approximately 5,084,746 bets being made per game and with a 1 in 5,245,786
chance of winning, about one and only one bet made during this new game should be
expected to match all 6 numbers. Hence, with one winner, that person should be expected to win
all of the $1,500,000. These corrected payoffs will be collected after this discussion in the next
table.

How many people would be expected to match 5 of the 6 numbers? For ease of calculation, we
will round off the number of bets being made to 5,000,000. With 5,000,000 bets being made and
a probability of matching 5 of 6 numbers being .0000412, we would expect 5,000,000 times
.0000412, or 206 winners (rounded off to 200). These 200 winners would share the $100,000
payoff; hence each winner's payoff would be $100,000/200, or $500.

With 5,000,000 bets being made and a probability of matching 4 of 6 numbers being .0018, we
would expect 5,000,000 times .0018, or 9,000 winners. These 9,000 winners would share the
$375,000 payoff; hence each winner's payoff would be $375,000/9,000, or about $42.

With 5,000,000 bets being made and a probability of matching 3 of 6 numbers being .027, we
would expect 5,000,000 times .027, or 135,000 winners. These 135,000 winners would share
the $500,000 payoff; hence each winner's payoff would be $500,000/135,000, or about $4.

Corrected Payoff Table

Outcome    Outcome Description          Payoff        Probability
1          match all 6 numbers          $1,500,000    .000000191
2          match 5 of the 6 numbers     $500          .0000412
3          match 4 of the 6 numbers     $42           .0018
4          match 3 of the 6 numbers     $4            .027
5          match 0, 1, or 2 of the 6    $0            .97116

Notice how this table isn't nearly as attractive to potential game players as the one presented at
the beginning of this example. But at least now we can calculate our expected value of playing
this game.

EV = sum of all [ (Ai – c ) ( pi ) ]

= (A1 – c) (p1) + (A2 – c) (p2) + (A3 – c) (p3) + (A4 – c) (p4) + (A5 – c) (p5)

= (1,500,000-1)(.000000191) + (500-1)(.0000412) + (42-1)(.0018) + (4-1)(.027) + (0-1)(.97116)

= $.2865 + $.0206 + $.0738 + $.0810 - $.97116

= -$.5093

This of course means that you can expect to lose about 51 cents for every dollar you bet. This
EV is very common for most lottery games.
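Using the corrected per-person payoffs, the same EV computation can be sketched as follows (the variable names are mine):

```python
# Corrected payoffs and probabilities from the table above; each bet costs $1
outcomes = [(1_500_000, 0.000000191), (500, 0.0000412),
            (42, 0.0018), (4, 0.027), (0, 0.97116)]
cost = 1

ev = sum((payoff - cost) * p for payoff, p in outcomes)
print(round(ev, 4))   # -0.5093 → lose about 51 cents per dollar bet
```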

Expected Value – Example 3
Another area, very much related to gambling, in which expected values are fundamental is
insurance. All insurance plans for health, life, dental, house, etc. set their costs and payoffs in
accordance with ascertained probabilities for the occurrence of specific events. Of course no one
knows the true probabilities of getting lung cancer or of your house burning down. But as we
have seen earlier in this unit these true probabilities can be estimated by appropriately measuring
what has gone on in the past through the use of relative frequencies. In insurance, people called
actuaries conduct this activity.

The figures used in this example are made up. They are being used for illustration only and
should not be taken as a precise depiction of any insurance plan in particular.

An insurance agent calls your home and would like to talk with you about adding flood insurance
to your household policy. Since floods in surrounding areas have been on the news lately (this is
the use of the availability heuristic from the previous unit), you decide to have this person over to
hear what they have to say. Cutting through the pitch, the bottom line is the following.

The extra flood insurance will cost you $100/year, only $8.33 a month. If you experience flood
damage, this insurance policy will pay the damages up to $25,000. This agent indicates that
recent claims made for flood damage have not come even close to this $25,000 limit. Do you
want the flood insurance?

What do you need to know in order to make an informed decision?

It would be nice to calculate the EV of this policy and see the policy in its truer light. What do
you need to know in order to determine the EV of this policy?

1. What is the probability of a flood in your area?      p1
2. What is the payoff if you experience a flood?         A1 = up to $25,000
3. What is the probability of not having a flood?        1 - p1
4. What is the payoff if you do not have a flood?        A2 = 0
5. What is the cost of the policy?                       c = $100/year

Could your insurance agent (or at least the company being represented) answer all of these
questions? Do they know the EV of the policy? YES.

If you knew the questions to ask, you probably could get the answers. Let's fill them in for
illustration.

From past records in your area and similar areas, it has been estimated that the probability of a
flood in your area is .015 over the next 10 years. Beyond merely knowing the probability, it is
very important to know the length of time over which this probability applies, because the time
component will impact your cost (which will accrue over time). By knowing this probability we
now have the answers to all of our questions.

1. .015 over 10 years

2. Since it is up to $25,000, let's use the maximum. This will over-inflate the value of the policy.

3. 1-.015 = .985

4. 0

5. To make this problem relatively simple, let's use the maximum as we did in question 2:
$1,000 ($100 for each of the 10 years).

EV = (flood payoff – cost) (flood prob.) + (0 – cost) (no flood prob.)

= ($25,000 - $1,000) (.015) + (0 - $1,000) (.985)

= $360 - $985

= -$625

On average, the insurance company is going to make $625 on every customer it can get to buy
this policy.
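The flood-policy EV is the same two-outcome calculation in code (the figures are the made-up ones from the example):

```python
flood_prob = 0.015      # estimated probability of a flood over the 10 years
payoff = 25_000         # maximum payout, which over-inflates the policy's value
cost = 1_000            # $100/year over the 10-year horizon

ev = (payoff - cost) * flood_prob + (0 - cost) * (1 - flood_prob)
print(round(ev, 2))   # -625.0 → the company averages a $625 gain per policy
```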

Last Probability Example

Let's Make a Deal
The following problem comes from the old TV game show called LET'S MAKE A DEAL that
was hosted by Monty Hall. In the problem, a contestant is given the opportunity of selecting one
of three presented doors. Behind one of the doors is a particularly nice prize and behind the other
two doors are worthless prizes. After the contestant selects one of the doors, Monty would
frequently show the contestant what was behind one of the doors not selected. Of course, the
prize behind this door is a loser. After showing the contestant the contents behind this door,
Monty then asks the contestant if s(he) wants to change her(his) selection. This famous problem
has spawned the creation of many web sites to examine the nuances behind this situation.

Here is another Monty Hall web site. This one talks about the solution in detail.

http://www.wiskit.com/marilyn.gameshow.html
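As a side note, the famous answer (under the standard assumption that Monty always opens a losing, unchosen door) is that switching wins about 2/3 of the time. A quick simulation, my own sketch rather than anything from the course site, bears this out:

```python
import random

def monty_hall(switch, trials=100_000):
    """Simulate the game and return the fraction of games won."""
    wins = 0
    for _ in range(trials):
        prize = random.randrange(3)      # door hiding the nice prize
        pick = random.randrange(3)       # contestant's initial selection
        # Monty opens a door that is neither the pick nor the prize
        opened = next(d for d in range(3) if d != pick and d != prize)
        if switch:
            # switch to the one remaining closed door
            pick = next(d for d in range(3) if d != pick and d != opened)
        wins += (pick == prize)
    return wins / trials

print(round(monty_hall(switch=False), 2))   # ≈ 0.33
print(round(monty_hall(switch=True), 2))    # ≈ 0.67
```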

Unit 24: Personal Probability
The mathematically regulated systems of probability, such as those presented in the previous
unit, require the development of rules, and these rules must be obeyed. For reference, here are
the five rules that were presented.

1. A probability can be zero, but it cannot be less than zero.

2. A probability can be one, but it cannot be greater than one.

3. The probability of any event occurring must be greater than or equal to zero and less than or
equal to 1. [This follows from 1 and 2]

4. The sum of the probabilities of all possible events must equal 1.

5. The sum of the probabilities of an event occurring and the same event not occurring must be 1.
[This follows from 4]

Also from this previous unit we defined probability as the likelihood or chance that something
will occur.

The topic of this unit, personal probability, is a related but distinct notion of probability from that
which we have already seen. Here personal probability is defined as the degree to which a person
believes that something will happen. Notice that this definition is very similar to that already
established with the exception that we have introduced belief. This word removes us from the
world of formalized probability and allows us to enter the world of belief systems. As stated in
the introduction, people are probabilistic creatures; however, we rarely follow the rules of
mathematical probability and thus a new system of rules must be developed. These rules are
unfortunately subjective and often inconsistent, but much more fun and interesting. While it is
relatively easy to live outside the box of mathematical probability, it is impossible to live outside
the box of personal probability.

343
The “Rules” of Personal Probability
There are at least 9 rules that can be placed under this heading. Most, if not all, of them were
developed in the context of psychology and are presented in several resources, including Seeing
Through Statistics by Utts that I have referred to several times in this course. Here are the 9 that I
know of.

Comparative “Rules”
Certainty Effect

Pseudocertainty Effect

Logical Intransitivities

Individual “Rules”
Availability Heuristic (Tversky & Kahneman, 1982)

Detailed Imagination

Anchoring

Representativeness Heuristic (Tversky & Kahneman, 1982)

Optimism

Reluctance to Change

All of these so-called “rules” are influencers on our beliefs. Since personal probabilities are our
beliefs in the likelihood of something happening, all of these “rules” are methods that are used to
increase or decrease our belief that something will happen. Unfortunately all of these “rules” are
used to manipulate us into believing that something is more or less likely to occur than is truly
the case. Thus as noted in the introduction,

“An informed population will be better able to tell the difference between being informed and
being manipulated.”

“Students should analyze statistical information from the media, interpret results from polls and
surveys, and recognize valid and misleading uses of statistics.”

“The secret language of statistics, so appealing in a fact minded culture, is employed to
sensationalize, inflate, confuse, and oversimplify. The crooks already know these tricks; honest
people must learn them in self-defense.”

344
It should be noted that there are many variations on each of the “rules” presented in this unit. I am
not making any effort to present the breadth of these “rules,” but rather I am trying to present at
least one fairly clear illustration of each.

A Closer Look at the Top Nine
The Comparative “Rules”
In the comparative rules you, as the target to be manipulated, are typically presented with two
options. Your task is the supposedly simple job of comparing the two choices and opting for one
or the other of them.

Certainty Effect

The basic premise of the certainty effect “rule” is that we will tend to overrate the value of an
option that expresses certainty (a guarantee) in comparison to an option that does not contain this
aspect.

Example – Let's say you go to your health insurance provider and they give you the choice
between the following two options. This particular example is also known as the Peace of Mind
Paradox.

Plan A – Covers all claims, except well care. The employee pays $200/month.
Plan B – Covers all claims, including well care. The employee pays $300/month.

What is the difference between these two plans? There are two. The first is that Plan B will cost
$100 more per month, or $1,200 more per year. The second is that Plan B covers all claims. This
is the certainty aspect (Peace of Mind).

What do we know from these two plans? Plan B is completely clear and leaves nothing to doubt.
Everything is covered and you know how much you are going to have to spend. Plan A appears
to be fairly clear, but it is probably less clear than you think. First, the way in which you define
well care is probably different than how the insurance carrier defines it. Second, it is impossible
for any family to know in advance how much well care will be needed in any given year.

Most likely in any given year Plan B will be more costly to a family than Plan A. However, the
Peace of Mind aspect (Certainty Effect) is so strong that most people when faced with these two
options will choose Plan B over Plan A.

345
Pseudocertainty Effect

The basic premise of the pseudocertainty effect “rule” is that we will tend to overrate the value
of an option that expresses certainty (a guarantee) in comparison to an option that does not
contain this aspect, even if this certainty is not really complete (pseudo). Obviously this “rule”
is a special case of the previous rule; only the final qualifying phrase differs between this rule
and the previous rule.

Example – Let's say you go to your health insurance provider and they give you the choice
between the following two options.

Plan A – All claims except well care are covered and the patient is responsible for 8% of any
claim.

Plan B – All claims except well care are covered and the patient is responsible for 10% of any
claim, but well care is covered 100%.

What is the difference between these two plans? There are two. On any claim made during the
year except well care Plan B will cost 2% more. The second is that Plan B covers all well care.
This is the certainty aspect, but it is only pseudo since the certainty doesn't cover all claims,
only well care.

What do we know from these two plans? If we opt for Plan A we will pay 8% of any claim
except well care and for all well care costs. If we opt for Plan B we will pay 10% of any claim
except well care. Thus, the comparison is to pay more on any “catastrophic” claim in order to get
well care covered or pay less on “catastrophic” claims by being willing to pay for well care.

It is relatively difficult to make a simple comparison of which of these plans is better financially.
To a great degree it depends on whether something “catastrophic” happens or not. If something
“catastrophic” does happen, then Plan A will be much better than Plan B. However, the certainty
component in the well care has a tendency to appear as a better benefit than it really is and thus
we tend to opt for the probably less cost effective Plan B option. In this particular example we
also see a slight contaminant influence of the individual optimism “rule” that will be discussed
later.

346
Logical Intransitivities

Transitive is a mathematical principle which states that if A is preferred to B, and B is preferred
to C, then A is preferred to C. This is sort of like saying that if you like pizza better than
hamburgers, and that you like hamburgers better than tacos, then you should like pizza better
than tacos. A condition of intransitivity exists when A is preferred to B, and B is preferred to C,
and yet A is not preferred to C. This situation should be impossible; however, it can and does
occur.

Example – Let's say you go into a car dealership to buy a new car. There is always a car on the
showroom floor that is fully loaded, shiny bright, and too expensive. Let's call this car S.
Now the stage is set. From this point we usually exit the showroom to look over the cars in the
lot. Most good salespeople will now show you a car on the opposite extreme from the one on the
showroom floor: the stripped-down version that has been sitting outside, that is dirty from the
weather and doesn't have as nice a color, seats, tires, etc. At this point, even though this car on the
lot (call it L1) is less desirable, the cost difference between L1 and S is sufficiently great that
we would prefer this car on the lot to the car in the showroom. We are next shown a car
which is a step up from the first one we saw on the lot. This car is a little nicer and of course it
costs only a little more money; call it L2. If we were given the choice now between the first car
we were shown on the lot (L1) or this second car (L2), we would probably choose the second
car. It is nicer and it doesn't cost all that much more. Next we are shown a car, call it L3, which
is a step up from the previous one (L2) and of course it costs just a little bit more as well. At this
point, we would probably prefer the third car (L3) to the second (L2), since it is nicer and it
doesn't really cost all that much more. This process might continue for a few more comparisons.
In each case the improvement is slight or modest (tires, seats, electric windows, …) and the cost
increase is relatively slight or modest. The comparison from L2 to L1 was slight, thus we chose
L2 over L1. The comparison of L3 to L2 was slight, thus we chose L3 over L2. At some point
we finally get back to the showroom car, which isn't really all that much more than the previous
car we looked at, and thus we opt for it. This is the car (S) that we didn't prefer at the outset, but
because of the ratcheting up of the choices we now find ourselves preferring that which we
didn't. This is the intransitivity, yet the progression was logical. This is a very commonly used
selling technique in a great variety of areas.
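The intransitivity in the car-lot story can be stated precisely. A small sketch of my own (the preference pairs are hypothetical, matching the story's labels) checks whether a set of pairwise preferences is transitive:

```python
def is_transitive(prefers):
    """Transitivity: whenever a is preferred to b and b is preferred to c,
    a must also be preferred to c."""
    for a, b in prefers:
        for b2, c in prefers:
            if b == b2 and a != c and (a, c) not in prefers:
                return False
    return True

# The dealership ratchet: each step up is preferred to the previous car...
prefers = {("L2", "L1"), ("L3", "L2"), ("S", "L3"),
           ("L1", "S")}  # ...yet at the outset we preferred L1 to S
print(is_transitive(prefers))  # False: the preferences form a cycle
```

A genuinely transitive chain (L1 < L2 < L3 < S with every implied pair included) would return True; the cycle created by the ratchet does not.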

347
The Individual “Rules”
In the individual rules choices are not presented to us. Typically we are presented with only one
story or situation and we tend to overrate the likelihood of this story or situation being true in
accordance with the degree that the rule is being applied.

Availability Heuristic

This is one of the most powerful weapons that the media has to influence us. The basic premise
of the availability heuristic rule is that the more frequently we are exposed to something the more
likely we are to believe this thing to exist. This is the opposite of the story of the boy who cried
wolf, where repeated exposure to the same false claim decreased belief.

Example 1 – One of the classic illustrations of this rule is associated with the greatly increased
attention given to the AIDS problem by the media during the 1990s. It was difficult to turn on a
news program during this period without some story directly or indirectly mentioning AIDS either
in this country or somewhere in the world. As a result of this increased availability (media time)
of the AIDS problem, the general population of the United States started to believe that AIDS
was a far more serious issue facing the country than it actually was. What impact did this have?
As a consequence of this increased attention, we started to pay less attention (money and
research) to problems of greater importance, such as cancer, smoking, etc. than we paid to this
singular issue that in truth was far less problematic.

Example 2 – In this country we are almost always facing an impending election at some level or
another. At the present (when I am writing this material) we are at the outset of a presidential
election. Regardless of the level, one of the principal maxims in politics is that exposure directly
relates to success. This is one of the primary purposes of a campaign: to increase the
amount of exposure (media time) given to your candidate. This is the basic availability heuristic
in action. The more available my candidate (the more visual media time), the more I can inflate
the general population's willingness to vote for her or him.

Detailed Imagination

Detailed imagination is another extremely powerful influencer of your beliefs. The basic premise
here is that greater detail in a description translates into greater believability. How this happens is
the following. When describing any event we are trying to establish a mental image. The longer
the description goes on and the greater the amount of detail presented, the more we establish and
give a life to the image that is being created. It almost doesn't matter if the story is far-fetched or
not, because if the description is detailed enough it can become believable in itself.

348
Example 1 – Headlines in a newspaper are not necessarily designed to be truthful, but to be
sensational. They are the attraction to get a reader to buy a paper or go further into the story. [We
will see that this is an example of the next “rule” called anchoring.] Once we are into the story
the more detailed the story the more believable it becomes. This is one of the primary reasons
that news shows now have in-depth segments, that there are nighttime news soap operas (60
Minutes, 20/20), and the push for investigative reporting.

Example 2 – If you are going to lie, be sure to go all the way.

The devil is in the details

Lie 1 – I was late to the meeting because I picked up a hitchhiker.

Lie 2 – I was late to the meeting because I picked up a hitchhiker whose car had just run out of
gas, we had to go to a gas station for a can and the gas, and then I had to take him back out to his
car.

Lie 3 – I was late to the meeting because I picked up an elderly man who had just run out of gas
who was a mile from the Conoco Station at 30th and Grand; there I had to help him get a can for
the gas, pay for his gas, and help him carry the gas because it was too heavy for him; then I had
to drive him back to his car where I had to put the gas into the tank for him; and finally I
followed him back into town to make sure that he was OK.

Greater detail typically creates a better mental image which once implanted tends to become
more believable.

Example 3 – One of the most common applications of this manipulation principle is found in the
selling of extended warranties. This could be called the Peace of Mind Principle, not to be
confused with the Peace of Mind Paradox presented earlier in this unit. After purchasing some
fairly expensive product, such as a TV, appliance, or car, a salesperson will often ask if you would
like to purchase an extended warranty. At this point the pitch is made. A particularly costly
problem is explained to you in detail (detailed imagination) such that, if it occurred, it would cost
a lot of money, even more than the price of the extended warranty. What is not mentioned, and
probably not known by the salesperson, is how likely this problem is to occur. The field of
actuarial science helps to determine the probability of these problems occurring for almost every
product in existence, including us. The people who write the extended warranty insurance
policies are very aware of the probability of each and every conceivable problem and how much
each would cost to repair. They aren‟t selling you the extended coverage to help you; they are
selling the extended coverage to make money.

Anchoring

It is hard to wander far from a good reference point. The principle here is much like how an
anchor is used for a boat. If you come into a harbor and you want to stay awhile, you can simply
drop anchor. You can now leave your boat, stay awhile on shore, and when you return your boat

349
should be almost exactly in the same spot as when you left. This is kind of like a cowboy tying
his horse to a rail when he comes into town.

Example 1 – Headlines in a newspaper. The headline not only draws your attention to a specific
article to read or newspaper to buy, but gives you some idea about what to expect (anchor) in the
article itself. The National Enquirer is a great example of the use of anchors.

Example 2 – Almost always when there is a debate between two or more candidates on
television, the debate is followed by “professional” commentary. What is the purpose of this
commentary? In part it is provided for the purpose of telling you how to believe. This is the
anchor. When you see and hear the debate much information is presented and it would be very
possible for 10 observers to see and hear 10 totally different things. However, once the
professional commentary is presented, what you will find tomorrow around the water cooler is
10 near carbon copies of this position. This is the reason for having professional
commentators who are seen as credible. The more credible the commentary is perceived to be,
the greater the impact of the anchor, and consequently the greater the power of a
network to mold its constituency to its own opinion.

Representativeness Heuristic

This particular rule is very similar in name to the availability heuristic and is also very similar in
how it operates. The difference however is between an external influencer and an internal
influencer. Availability heuristic is external. It operates as we have seen above by being
essentially bombarded with information about a particular situation or topic. Representativeness
heuristic is internal, and is consequently much harder to control and a much more subtle
influencer. Here is how representativeness works. If you have some already determined pre-
existing belief system (prejudice certainly fits in here), then any situation presented to you that
taps into your pre-existing belief system is much more likely to be believed as true. The power of
this particular influencer should not be underestimated. Quite frequently we can be manipulated
in this manner and not even realize it. This happens every time you have a pre-existing belief
system and are unaware of it. No matter what you might think, we all have such systems. These
pre-existing belief systems can be considered a special case of anchoring, and the more detailed
and extensive they are certainly would relate to detailed imagination.

Example – In the recent war with Iraq, France was seen as a very vocal non-supporter of the war
and of the United States' decision to downplay the role of the United Nations. Here is a statement
to evaluate.

Statement – France is a weak country with little concern for anyone other than itself.

Whether this statement is true or not is of no concern for this discussion. What is of concern is
the pre-existing attitude of a reader of this statement. Let's consider the perspectives of two readers.

350
Reader 1 – is a World War II veteran who served in France and believes that France has been
ungrateful of his involvement and the involvement of the United States in helping to liberate
France.

Reader 2 – is a foreign exchange student who has lived in France for the past three years and has
very much enjoyed his time in France.

Do you think that readers 1 and 2 will have pre-existing belief systems? Yes, definitely. Will
they be the same? No, definitely. The pre-existing belief system for reader 1 is probably fairly
anti-France. The pre-existing belief system for reader 2 is probably fairly pro-France. How will
each respond to the statement? Reader 2 will probably be very little impacted by the statement,
which means that he won‟t give it much credibility and it won‟t change his belief system very
much. However, reader 1 will probably give a very high endorsement to the legitimacy of this
statement since it conforms so well to his pre-existing belief system. Once again his belief
system will be left intact.

Notice that the truth is not an issue here. The principle is simply that we tend to give more
credibility to those things which agree with what we already believe. Anything that seemingly
disagrees with our pre-existing belief system is discounted. We will see that this is related to the
last rule Reluctance to Change.

Optimism

Optimism is the basic psychological belief system that good things will happen and bad things
won't.

Example 1 – Even though there is an incredible amount of undeniable information about the
severe health hazards associated with smoking, many smokers do not believe that these things
will happen to them.

Example 2 – Even though there is an incredible amount of undeniable information about the
futility of playing lottery games, people still play them thinking, “somebody's got to win.”
However, there are millions of these people. One of the amazing things that lottery organizers
depend upon is optimism and people's general inability to understand numbers greater than they
can count on their fingers and toes.

Reluctance to Change

Reluctance to change is another aspect of anchoring and pre-existing belief systems: when faced
with information that conflicts with our beliefs, we tend to discredit its value so
that we can maintain our existing beliefs.

351
Example – It has long been touted by dentists and dental hygienists that flossing is good for your
teeth and gums. If I have never flossed and don't have any problems with my teeth or gums, then
I most likely will be reluctant to change my attitude about flossing when this is brought up by the
dental hygienist. Notice that this is also to some degree optimism. The dental hygienist's horror
stories about not flossing I dismiss as being applicable only to someone else, because that
wouldn't happen to me.

Calibrating
This topic leads us into a discussion of calibrating. Calibrating is the process of changing one's
personal probabilities based on information. Of the personal probability “rules,” which do you
think will be amenable to calibration?

We should be able to calibrate all of the comparative rules. This is because the basic influence of
the comparative rules is misdirection. Information should be helpful to overcome misdirection
and help us to see more truthfully the situations being compared. Thus, an important aspect of
overcoming comparative rule manipulation is information. Ask questions.

One would think that since the availability heuristic and detailed imagination are essentially
external influencers, the presence of real information should counteract them. The same could be
said of the external component of anchoring. However, since the basic functional aspect of the
representativeness heuristic, optimism, reluctance to change, and internal anchoring is internal,
then we should expect these rules to be extremely resistant to calibration. To overcome these
biases, manipulations, and pre-existing belief systems usually takes massive amounts of
information and long periods of time, if they can be overthrown at all.

Conclusion
At the completion of this tour of some of the rules for personal probabilities (notice how much
more complex they are than the 5 basic probability rules on the first page of this unit), it is easy
to see that many of these rules overlap and work in conjunction with one another. While the rules
for mathematical probability are easily escapable, the rules for personal probability are
inescapable. Every time you buy a product in any store, any time you listen to any radio
program, and any time you watch any television program (especially the commercials) you are
being exposed to one, two, or even quite probably all 9 of these rules. Thus I would draw your
attention one more time to the quote from Darrell Huff.

“The secret language of statistics, so appealing in a fact minded culture, is employed to
sensationalize, inflate, confuse, and oversimplify. The crooks already know these tricks; honest
people must learn them in self-defense.”

352
Probability
Terms
Relative Frequency is the likelihood of something occurring in our sample. This is past tense.
We collect data (sample) and then determine the relative frequency of particular outcomes
having occurred.

Probability is the likelihood of something occurring in the future. Probability is a theoretical
likelihood that a particular outcome will occur.

Looking at these two definitions, it is relatively clear that relative frequency and probability are
very similar. In fact, it can be argued that they are identical with the sole difference being past
tense (description of something that HAS OCCURRED; relative frequency) versus future tense
(prediction that something WILL OCCUR; probability). In this the difference between these two
is reminiscent of the distinction between samples and populations, and the distinction between
statistics and parameters. As we saw in Unit 3 we use statistics from samples to make predictions
about parameters from populations. In a completely analogous fashion we are going to use
relative frequencies to make predictions about probabilities.
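The relative-frequency calculation itself is just frequency divided by sample size. A minimal sketch of my own (not from the course materials; the sample of flips is hypothetical):

```python
from collections import Counter

def relative_frequencies(sample):
    """Relative frequency of each outcome = frequency / sample size (n)."""
    n = len(sample)
    return {outcome: f / n for outcome, f in Counter(sample).items()}

# A sample that HAS OCCURRED (past tense): 20 recorded coin flips
flips = ["H"] * 11 + ["T"] * 9
print(relative_frequencies(flips))  # {'H': 0.55, 'T': 0.45}
```

Those relative frequencies then serve as our predictions of the probabilities for future flips.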

There are 3 basic rules governing probabilities which have logical and precise counterparts to the
basic rules for relative frequencies. Since frequency is defined as how often something occurs, a
frequency can only be 0 (never occurs) or positive. It can not be negative. For example, what
would be the interpretation of something occurring less than never? Thus 0 is the smallest
frequency that can be associated with any particular outcome. What is the largest frequency that
is possible for any set of data? The largest frequency would occur when every
person sampled gave the same answer (outcome). In this instance the frequency would be equal
to the sample size (n). Thus, the smallest possible frequency is 0 and the largest possible
frequency is n. We know from Unit 4 that the relative frequency is defined as the frequency
divided by the sample size. Hence, the smallest relative frequency is 0/n = 0 and the largest
relative frequency is n/n = 1. These 2 results provide us with the first two basic rules governing
probabilities.

Rule 1: Any probability must be greater than or equal to 0

Rule 2: Any probability must be less than or equal to 1

The third rule once again comes from the fourth Unit. What is the sum of the frequencies for all
possible outcomes? The answer is the sample size (n). Thus, the sum of the relative frequencies
for all possible outcomes must be one (sum of the frequencies for all possible outcomes divided
by n = n/n = 1.0).

Rule 3: The sum of the probabilities for all possible outcomes must equal 1.0
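These three rules can be checked mechanically for any proposed set of probabilities. A small sketch of my own (the helper name and the floating-point tolerance are assumptions, not part of the course materials):

```python
def obeys_probability_rules(probs):
    """Check Rules 1-3 on a dict mapping each possible outcome to its probability."""
    rule1 = all(p >= 0 for p in probs.values())     # no probability below 0
    rule2 = all(p <= 1 for p in probs.values())     # no probability above 1
    rule3 = abs(sum(probs.values()) - 1.0) < 1e-9   # all outcomes sum to 1
    return rule1 and rule2 and rule3

print(obeys_probability_rules({"Head": 0.55, "Tail": 0.45}))  # True
print(obeys_probability_rules({"Head": 0.55, "Tail": 0.55}))  # False (sum is 1.10)
```

Because relative frequencies are frequencies divided by n, they automatically satisfy the same three rules.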

353
Fair is a way of describing a variable in which all possible individual outcomes are equally
likely to occur. A variable is fair when each and every outcome has the same probability of
occurring. Since the concept of fair is defined in terms of probability, it should be noted that fair
is a theoretical notion.

Example 1: Coin Flip.
If I flip a coin how many outcomes are possible? Two. It is possible for the coin to result in a
Head or a Tail. We will not include the possibility of it landing and staying on its edge.

If I flip this coin 200 times, I could summarize the result of this experiment using the methods of
Unit 4. A fully reported summary table might look like

Table 1: 200 flips of a Coin

Outcome        f        rf
Head         110       .55
Tail          90       .45
Sum          200      1.00

Here are some questions that can be asked about the results in this table and their answers.

What is the interpretation of the relative frequency for the Head outcome? Relative
frequency is a proportion which is often most easily communicated in terms of percentage. Thus,
the interpretation of .55 for the Head outcome would be that the “coin flip resulted in a Head
55% of the time.”

What is the interpretation of the relative frequency for the Tail outcome? Similarly, that the
“coin flip resulted in a Tail 45% of the time.”

What proportion of the time did the result of our coin flip result in a Head or a Tail? This
would be the sum of the relative frequency for the Head outcome plus the relative frequency for
the tail outcome or .55 + .45 = 1.00 (Rule 3).

What would you predict the probability to be for the Head outcome? From above, “we are
going to use relative frequencies to make predictions about probabilities.” Hence, our prediction
for the probability of the Head outcome would be .55 (the relative frequency for the Head
outcome).

What is the interpretation of the probability for the Head outcome? I would predict that the
next flip of the coin would more likely result in a Head than in a Tail. Why, because the
probability of a Head outcome is .55 and the probability of a Tail outcome is .45; and .55 (Head)
> .45 (Tail). Said another way, I would expect that 55% of all future flips of this coin will result
in the Head outcome. Thus, if I flip this coin 100 times, how many Head outcomes would you
expect? 55

354
Given this coin flip example, what would be the probabilities associated with the Head
outcome and the Tail outcome if you were told that this was a fair coin? From above, a
variable is fair when each and every outcome has the same probability of occurring. Hence, each
outcome would have the same probability and since there are only two possible outcomes, each
outcome would have probability = ½. The probability of the Head outcome would be ½ and the
probability of the Tail outcome would be ½. Note that in general if there are k possible
outcomes, then each and every one of the fair probabilities would equal 1/k.

Fair probability = 1 divided by the number of outcomes = 1 / k

For illustration with the coin example k = 2 (2 possible outcomes) and the fair probability = 1/k
= ½ as shown above.

If I were to claim that a die was fair, then what would be the fair probability associated
with each and every one of the outcomes? In this example, k = 6 and the fair probability = 1/k
= 1/6.
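The fair probability 1/k, and the way relative frequencies from a fair process settle near it, can be sketched as follows (my own illustration; the simulated die rolls are hypothetical):

```python
import random
from fractions import Fraction

def fair_probability(k):
    """With k equally likely outcomes, each outcome has probability 1/k."""
    return Fraction(1, k)

print(fair_probability(2))  # coin: 1/2
print(fair_probability(6))  # die:  1/6

# Relative frequencies from a simulated fair die settle near 1/6 = .1667
rng = random.Random(1)
rolls = [rng.randint(1, 6) for _ in range(60_000)]
for face in range(1, 7):
    print(face, round(rolls.count(face) / len(rolls), 4))
```

As with the coin, the simulated relative frequencies are close to, but almost never exactly equal to, the fair probability.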

Now for the most complicated question of this unit. Does the evidence from the 200 flips of my
coin (presented in Table 1) indicate that my coin is fair or not fair (biased)?

Recall that the relative frequencies are used to predict probabilities. If a coin were fair, then we
would expect the relative frequencies (the proportion of our outcomes that we DID OBTAIN) to
be very close to the fair probabilities (the proportion of our outcomes WE WOULD HAVE
EXPECTED TO OBTAIN). It would be highly unusual for the relative frequencies to perfectly
equal the fair probabilities, but we would expect them to be close. From the data above, our last
question boils down to the following.

Is the relative frequency of the Head outcome (.55) sufficiently close to the fair probability of the
Head outcome (.50)? The concept of sufficiently close and its use will not be developed until
after the first test.

Given the information from our previous units can we push this example further? Yes. Here are
some more questions.

What is the variable in this example?      The result from the flip of our coin

What level of measurement is this variable?         Categorical

Is it possible to graphically display these results?    Yes. Which graphical summary is most
appropriate? The Bar Graph. [In the graph below the decimal point is missing in the vertical
axis. This is a computer glitch. For instance, the number that appears as 60 should be .60.]

355
In the graph above the shaded area reflects the relative frequency. If we consider the width of
each rectangle to be 1 unit, then the area = width times height or (1)(.55) = .55 for the Head
outcome. Similarly the area associated with the Tail outcome is (1)(.45) = .45.

The total shaded area is the area for the Head outcome plus the area for the Tail
outcome = .55 + .45 = 1.00 (which is of course Rule 3).

Example 2: I asked 300 people who were in attendance at the most recent meeting of the
Cowboy Joe Club (held before the last basketball game), “How much cash do you have on you at
this time (rounded to the nearest dollar)?”

Results:         Mean = $50.20
                 Standard Deviation = $17.80

What is the variable in this example?      How much cash do you have on you at this time?

What is this variable's level of measurement?        Ratio

How many possible outcomes are there?            Very many

While “how many possible outcomes” is a question we can ask about any variable, it is usually a
question we only ask in regard to categorical or ordinal variables.

Is it possible to determine “fair probabilities” for ratio variables?     No

356
For nearly all ratio variables the number of possible outcomes is usually unknown. Since the
definition of “fair probability” is 1 divided by the number of outcomes, the “fair probability”
would be impossible to calculate and hence is almost never used in conjunction with a ratio
variable.

What are the most common summary statistics used with ratio variables?

The mean and standard deviation.

If I wanted to predict how much money a Cowboy Joe Club member would have at the
next meeting, what would I use for my prediction? From Unit 6 we learned that the best
predictor is the mean.

Interpret the mean.

I would predict that a Cowboy Joe Club member will be carrying $50.20 at the next meeting.

Is it possible to make different kinds of predictions?      Yes

For instance, what would be the probability that a Cowboy Joe Club member will be
carrying more than $75 in cash at the next meeting?

This type of prediction is not possible using the summary methods presented in Units 5 or 6;
however, we can use the summary methods of Units 4 and 5 to help us. When summarizing the
results from a ratio variable it will help us to summarize the data into bins. The biggest question
when summarizing in this manner is “How many bins?”

For illustration I am going to present this portion of the example from the perspective of 4
different summaries; using 4 bins, 5 bins, 10 bins, and 20 bins.

4 Bin Summary
f             rf
Bin 1 (\$0 to \$25)      24        24/300 = .08
Bin 2 (\$26 to \$50)    126        126/300 = .42
Bin 3 (\$51 to \$75)    126        126/300 = .42
Bin 4 (\$76 to \$100)    24        24/300 = .08

Sum         300           1.00

This particular summary will allow me to answer the previous question; which was “what would
be the probability that a Cowboy Joe Club member will be carrying more than \$75 in cash (Bin
4) at the next meeting?”

Recalling that we use relative frequencies to predict probabilities, the answer to our question is
.08 [the relative frequency for Bin 4]. Hence, we would expect about 8% of the Cowboy Joe
Club members at the next meeting to be carrying more than \$75 in cash.
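The table arithmetic above can be sketched in a few lines of hypothetical Python (the bin labels and counts come from the 4 Bin Summary):

```python
# Relative frequency = f / n; a bin's relative frequency is our predicted
# probability for that bin (counts taken from the 4 Bin Summary table).
counts = {"$0-$25": 24, "$26-$50": 126, "$51-$75": 126, "$76-$100": 24}
n = sum(counts.values())  # 300 people surveyed

rel_freq = {label: f / n for label, f in counts.items()}

# Predicted probability of carrying more than $75 (Bin 4):
print(rel_freq["$76-$100"])   # 0.08, i.e. about 8%

# Rule 3: the relative frequencies sum to 1.00.
print(sum(rel_freq.values()))
```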

What would the histogram look like for this 4 Bin Summary? [As indicated above, all of the
relative frequency graphs in this section are missing the decimal points in the vertical axis.]

Once again the shaded areas in this graph can be directly related to the relative frequencies. If we
consider the width of any Bin to be one unit, then the area associated with each bin = (width)
times (height) = (1)(rf) = rf. It is important to note that the width of these bins in any histogram is
equal to 1, not the maximum value – minimum value.

It can also be seen in this graph that the sum of the relative frequencies = 1.00. [rf(bin 1) +
rf(bin2) + rf(bin3) + rf(bin4) = .08 + .42 + .42 + .08 = 1.00]

From the histogram above, what would you predict the probability would be for “a Cowboy Joe
Club member to be carrying \$50 or less at the next meeting?” The phrase \$50 or less corresponds
to bins 1 and 2. It can be seen from the graph that the relative frequency associated with bins 1
and 2 is .50 [.08 from bin 1 + .42 from bin 2]. This would be interpreted as, “we would predict
that it is 50% likely that a Cowboy Joe Club member will be carrying \$50 or less in cash at the
next meeting."

5 Bin Summary
f             rf
Bin 1 (\$0 to \$20)       16        16/300 = .053
Bin 2 (\$21 to \$40)      64        64/300 = .213
Bin 3 (\$41 to \$60)     140        140/300 = .468
Bin 4 (\$61 to \$80)      64        64/300 = .213
Bin 5 (\$81 to \$100)     16        16/300 = .053

Sum          300           1.00

What would the histogram look like for this 5 Bin Summary?

As in the 4 Bin discussion, we could use the 5 Bin Summary Table and 5 Bin Summary
Histogram to answer questions about relative frequencies and the prediction of probabilities.

For instance, what is your prediction for a Cowboy Joe Club member carrying more than
\$40 but \$80 or less in cash at the next meeting?

The answer to this question is the sum of the relative frequencies for Bin 3 (\$41 to \$60) and Bin
4 (\$61 to \$80); which is .468 + .213 = .681 [which is interpreted as a 68.1% likelihood that a
Cowboy Joe Club member will be carrying this amount].

10 Bin Summary
f            rf
Bin 1 (\$0 to \$10)          6         6/300 = .020
Bin 2 (\$11 to \$20)        10        10/300 = .033
Bin 3 (\$21 to \$30)        20        20/300 = .067
Bin 4 (\$31 to \$40)        44        44/300 = .147
Bin 5 (\$41 to \$50)        70        70/300 = .233
Bin 6 (\$51 to \$60)        70        70/300 = .233
Bin 7 (\$61 to \$70)        44        44/300 = .147
Bin 8 (\$71 to \$80)        20        20/300 = .067
Bin 9 (\$81 to \$90)        10        10/300 = .033
Bin 10 (\$91 to \$100)       6         6/300 = .020

Sum           300             1.00

What would the histogram look like for this 10 Bin Summary?

As in the 4 Bin and 5 Bin discussions, we could use the 10 Bin Summary Table and 10 Bin
Summary Histogram to answer questions about relative frequencies and the prediction of
probabilities. It should be relatively clear that the 10 Bin Summary is much more complex than
either the 4 Bin or 5 Bin Summaries. Complexity in this context means that it would take much
more time to generate the table and considerably more time to draw the graph. However, the
benefit from a more complex table is the ability to make more complex and much more precise
predictions.

For instance, from the 10 Bin Summary Table it would be possible to predict the percentage of
Cowboy Joe Club members who would be expected to carry more than \$90 in cash at the next
meeting. This is very simply 100 times the relative frequency of Bin 10 (\$91 to \$100) or 100
times (.02) = 2%.

Can we make this prediction from the 4 Bin Summary? Not really since Bin 4 is too wide (from
\$76 to \$100). We could note that this Bin is 2.5 times larger than the bin we are interested in
(\$10 in our question and \$25 in Bin 4 from the 4 Bin summary or \$25/\$10 = 2.5). Thus we might
be able to guess at the correct answer by dividing the relative frequency (from the 4 Bin
Summary) by 2.5 = .08/2.5 = .032 [Note that this answer (3.2%) is not very close and is too
large].

Logically we might think that if we used the 5 Bin Summary instead (a little more complex),
then perhaps we would obtain a better guess. Using the same process, we observe that Bin 5
from the 5 Bin Summary is only 2.0 times bigger (\$20 dollars wide in Bin 5 of the 5 Bin
Summary versus the \$10 in our question or \$20/\$10 = 2.0). Once again we might be able to
guess at the correct answer by dividing the relative frequency (from the 5 Bin Summary) by 2.0
= .053/2.0 = .0265 [Note that this answer (2.65%) is closer to the correct answer, but still too
large].

The 3 paragraphs above are somewhat complicated and are provided only for your information.
This type of thinking or problem will not be included on any of the assignments or tests. The
important point of this discussion is that greater complexity requires more work and time, but
provides us with better answers to a greater number of possible questions.
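The width-ratio guesses in the paragraphs above can be reproduced with a short hypothetical sketch (the counts come from the 4, 5, and 10 Bin tables in this section):

```python
# Estimating P(more than $90 in cash) three ways.
n = 300

# From the 10 Bin Summary: Bin 10 ($91-$100) has f = 6.
from_10_bins = 6 / n                  # .02, the 2% answer in the text

# Guess from the 4 Bin Summary: Bin 4 ($76-$100, f = 24) is $25 wide,
# 2.5 times the $10 interval we actually care about.
from_4_bins = (24 / n) / 2.5          # .032 -- too large

# Guess from the 5 Bin Summary: Bin 5 ($81-$100, f = 16) is $20 wide,
# 2.0 times the interval of interest.
from_5_bins = (16 / n) / 2.0          # about .027 -- closer, still too large

print(from_10_bins, from_4_bins, from_5_bins)
```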

20 Bin Summary
f            rf
Bin 1 (\$0 to \$5)          3        3/300 = .010
Bin 2 (\$6 to \$10)         3        3/300 = .010
Bin 3 (\$11 to \$15)        4        4/300 = .013
Bin 4 (\$16 to \$20)        6        6/300 = .020
Bin 5 (\$21 to \$25)        8        8/300 = .027
Bin 6 (\$26 to \$30)       12        12/300 = .040
Bin 7 (\$31 to \$35)       19        19/300 = .063
Bin 8 (\$36 to \$40)       25        25/300 = .083
Bin 9 (\$41 to \$45)       32        32/300 = .107
Bin 10 (\$46 to \$50)      38        38/300 = .127
Bin 11 (\$51 to \$55)      38        38/300 = .127
Bin 12 (\$56 to \$60)      32        32/300 = .107
Bin 13 (\$61 to \$65)      25        25/300 = .083
Bin 14 (\$66 to \$70)      19        19/300 = .063
Bin 15 (\$71 to \$75)      12        12/300 = .040
Bin 16 (\$76 to \$80)       8        8/300 = .027
Bin 17 (\$81 to \$85)       6        6/300 = .020
Bin 18 (\$86 to \$90)       4        4/300 = .013
Bin 19 (\$91 to \$95)      3        3/300 = .010
Bin 20 (\$96 to \$100)      3        3/300 = .010

Sum            300            1.00

What would the histogram look like for this 20 Bin Summary?

It could easily be argued that the 20 Bin Summary is overkill. Although it enables us to answer
many more questions and much more precisely than any of the other summaries above, we could
legitimately ask “have we now given ourselves the ability to answer questions which are so
precise that they have lost their meaningfulness?” I believe that the answer to this question is
yes. However, it points out a very difficult issue in statistics: we want to be only as precise as
necessary, but as precise as the question requires. Striking this balance can be difficult and is
beyond the scope of this course.

However, before leaving the 20 Bin Summary there are a couple of additional predictions that I
would like to consider.

For instance, what dollar amount is associated with the top 12% of the histogram?

In the preceding questions I have asked you to provide relative frequencies associated with
predictions about specific dollar amounts. This particular question is the reverse. How might we
answer it? A good place to start is at the extreme right (top) of the histogram and work our way
back to the left.

The largest bin (Bin 20 from \$96 to \$100) has relative frequency .01 [this is most easily
determined from the 20 Bin Summary Table]. The interpretation of this result is the following.
1% of the sample carried more than \$95 in cash. If the question above had been, “what dollar
amount is associated with the top 1% of the histogram,” then the correct answer would have been
\$96. 1% of the sample carried \$96 or more. Since we want the top 12%, not the top 1%, then we
will need to take more bins. What would be the new answer if we took the top 2 bins?

Bin 20 has relative frequency = .01
Bin 19 has relative frequency = .01

Together they account for the top 2%, which is still not enough.

What about the top 3 bins? Add Bin 18 to the result above. The relative frequency of Bin 18 is
.013, which means that the top 3 bins together account for 3.3%. If we add Bin 17 (relative
frequency = .020), then collectively the top 4 bins account for 5.3%; still not enough. If we add
Bin 16 (relative frequency = .027), then collectively the top 5 bins account for 8.0%. If we add
Bin 15 (relative frequency = .04), then collectively the top 6 bins account for exactly 12%. Thus,
we have reached our answer. The top 12% of the histogram is associated with Bins 15, 16, 17,
18, 19, and 20; or stated another way, only 12% of our sample carried \$71 or more in cash. This
is the answer to the question that we started with.
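The bin-by-bin accumulation just performed is mechanical enough to sketch as code (a hypothetical illustration; the bin edges and counts follow the 20 Bin Summary table):

```python
# Work in from the top of the histogram, accumulating relative frequencies
# until the running total reaches the target percentage.
lower_edges = [0, 6, 11, 16, 21, 26, 31, 36, 41, 46,
               51, 56, 61, 66, 71, 76, 81, 86, 91, 96]
freqs = [3, 3, 4, 6, 8, 12, 19, 25, 32, 38,
         38, 32, 25, 19, 12, 8, 6, 4, 3, 3]
n = sum(freqs)  # 300

def dollar_cutoff_for_top(target_rf):
    """Lower edge of the last bin needed to cover target_rf from the top."""
    running = 0.0
    for edge, f in zip(reversed(lower_edges), reversed(freqs)):
        running += f / n
        if running >= target_rf - 1e-9:   # small tolerance for float rounding
            return edge
    return lower_edges[0]

print(dollar_cutoff_for_top(0.01))   # 96: the top 1% carried $96 or more
print(dollar_cutoff_for_top(0.12))   # 71: the top 12% is Bins 15 through 20
```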

Here are a couple more questions.

“What dollar amount is associated with the top 2% of the histogram?”

From the discussion above we saw that 2% was associated with Bins 19 and 20. Hence, the
answer is \$91; 2% of our sample carried \$91 or more in cash.
“What dollar amount is associated with the bottom 3.3% of the histogram?”

Since we are now interested in the bottom instead of the top, we need to begin at the left hand
side of the histogram (smallest bins in the 20 Bin Summary Table) and work our way right.

The relative frequency for Bin 1 is .01 (too small; we need to take at least one more bin)

The relative frequency for Bin 2 is .01, which means that Bins 1 and 2 account for the smallest
2% (still too small; we need to take at least one more bin)

The relative frequency for Bin 3 is .013, which means that Bins 1, 2, and 3 account for the
smallest 3.3% (the exact percentage answer to our question). What dollar amount is associated
with this answer? \$15. This is interpreted as 3.3% of our sample carried \$15 or less in cash.

“What dollar amount is associated with the top 5% of the histogram?” Once again the
question is about the top of the histogram, so let's begin with the largest bins.

The relative frequency for Bin 20 is .01 (too small; we need to take at least one more bin).

The relative frequency for Bin 19 is .01, which means that Bins 20 and 19 account for the largest
2% (still too small; we need to take at least one more bin).

The relative frequency for Bin 18 is .013, which means that Bins 20, 19, and 18 account for the
largest 3.3% (still too small; we need to take at least one more bin).

The relative frequency for Bin 17 is .020, which means that Bins 20, 19, 18, and 17 account for
the largest 5.3% (too large).

What do we do now? The top 3 bins did not account for enough and the top 4 bins accounted for
too much. Although it would seem that we need only part of Bin 17 (not all of it), we are not
going to do that.

What will we do? We have 3 choices; go with the smaller amount in the top 3 bins (3.3%), go
with the larger amount in the top 4 bins (5.3%), or develop a result with more bins, which would
give us greater precision. Remember, more bins always provide us with the ability to make more
precise predictions. For us the best answer is the second possibility (go with the larger amount in
the top 4 bins). Why? Because it is the closest answer to our question. Hence the correct answer
would be \$81.

Before leaving this second example I would like to make one additional point.

Another form of graphical summary for ratio variables is the line chart. The line chart is the
histogram with a line connecting the midpoints of tops of the rectangles and then removing the
rectangles. Please refer to the material in Unit 5 for a more detailed discussion.

Notice what happens with the shapes of the Line Charts as we progress from the 4 Bin to the 5
Bin to 10 Bin and finally to the 20 Bin Summary.

4 Bin Summary – Line Chart

Reading this graph from left to right gives the appearance of an upward straight line followed by
a flat straight line and finally followed by a downward straight line. This graph very much looks
like the connection of 3 straight lines.

5 Bin Summary – Line Chart

Reading this graph from left to right gives the appearance of an upward straight line followed by
an even more upward straight line followed by a downward straight line and finally followed by
a less downward straight line. This graph very much looks like the connection of 4 straight lines.

10 Bin Summary – Line Chart

Reading this graph from left to right gives the appearance of an upward straight line followed by
an even more upward straight line followed by another even more upward straight line
followed by a flat straight line followed by a downward straight line followed by a less
downward straight line and finally followed by an even less downward straight line. This graph
very much looks like the connection of 7 straight lines.

20 Bin Summary – Line Chart

Reading this graph from left to right gives a somewhat different appearance. The first 5 upward
lines appear straight, but the increase in their upward appearance now seems gradual enough to
take on the appearance of a slightly increasing curve rather than 5 starkly different straight lines.
The same can be said for the 5 downward lines.

The trend in the Line Charts from the 5 Bin Summary to the 20 Bin Summary is to provide us
with a smoother and smoother curve. In fact with a little use of imagination it is possible to
envision what the line might look like if we went to 100 Bins or more; a very smooth curve.
Such a curve is displayed over the 20 Bin Histogram below; it is called the Normal Curve.

Throughout this unit we have used relative frequencies to predict probabilities. In example 1
(coin flip; categorical variable) we had a means of determining probabilities theoretically which
was called “fair.” In example 2 (cash; ratio variable) we were confronted with a more complex
problem whose results (Summary Tables and Histograms) were entirely dependent upon the
manner in which we summarized the results (determination of the number of bins). In a
somewhat comparable, but much more complex manner, we can develop theoretical probabilities
for ratio variables by considering what the results would look like if we used an infinite number
of bins. The way in which such probabilities are calculated requires us to know the nature of the
curve (what it looks like; such as the Normal Curve in the figure immediately above) and then
apply integral calculus to calculate areas. Although such knowledge is far beyond the level of
this class, we can still find such theoretical probabilities by using already developed tables; such
as the Standard Normal Table found in Unit 21.
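Although the calculus is beyond this course, software computes these areas directly. As a hedged aside (not part of the course materials), the upper-tail probabilities in the Standard Normal Table can be reproduced from Python's complementary error function:

```python
from math import erfc, sqrt

def upper_tail(z):
    """P(Z > z) for a standard normal Z, via the complementary error function."""
    return 0.5 * erfc(z / sqrt(2))

# Two values to compare against the table in Unit 21:
print(round(upper_tail(1.52), 4))   # 0.0643
print(round(upper_tail(2.18), 4))   # 0.0146
```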

The Use of the Standard Normal Table
If you look at the first table in Unit 21 you will see at the top of the page a graph which looks
very similar to the one above. The biggest difference is found in the horizontal axis which is not
in dollars, but in “z” values [Note the little “z” off to the right]. The value 0 is found in the
middle of this graph.

The shaded area in the graph is the probability. This is the same as our discussion above about
the shaded areas of the bar graph and histogram being relative frequencies. In the table, the
probabilities are the four digit decimal numbers. The “z” values are found by combining the first
column of the table (labeled CUTOFF) and the first row of the table (also labeled CUTOFF).
The “z” values are 3 digit numbers of the form a.bc (one digit to the left of the decimal and two
digits to the right of the decimal). The a.b values are found in the first column and the .0c values
are found in the first row. The “z” value is the combination of these two numbers to identify the
row (a.b) and column (.0c) to locate the probability. The probabilities identified in this table are
the answer to the following question, “what is the probability of getting this “z” value or
higher?” For example,

What is the probability of getting a “z” value of 1.52 or bigger? The “z” value 1.52 is the
combination of 1.5 (a=1, b=5) and .02 (c=2). This means that the answer to our question will be
the 4 digit decimal number found at the intersection of the 1.5 row and .02 column. This number
is .0643 and is interpreted as "the theoretical standard normal probability of getting a 'z' value of
1.52 or bigger is 6.43%."

What is the probability of getting a “z” value of 2.18 or bigger? The “z” value 2.18 is the
combination of 2.1 (a=2, b=1) and .08 (c=8). This means that the answer to our question will be
the 4 digit decimal number found at the intersection of the 2.1 row and .08 column. This number
is .0146 and is interpreted as "the theoretical standard normal probability of getting a 'z' value of
2.18 or bigger is 1.46%."

Quick Quiz: What is the probability associated with each of the following “z” values?

1.   z = 0.04 _______
2.   z = .77 _______
3.   z = 1.96 _______

1.   .4840 (found in the .0 row and .04 column)
2.   .2206
3.   .0250

This Standard Normal Table can be used to answer a second type of question, which is "what 'z'
value is associated with a particular probability?"

For instance, "what 'z' value is associated with a probability of .0708?" [This particular "z"
value can be written symbolically with a subscript as z.0708]

For this type of question we look among the 4 digit decimal numbers to find the specified
probability, and then identify the corresponding row and column. If you look among the 4 digit
probabilities you will notice that in general the bigger numbers are toward the top of the table
and to the left. Hence, to find an identified probability you can start at the bottom and move up to
make big changes and to the left to make small changes. Using this strategy you should be able
to locate the probability .0708 in the 1.4 row and .07 column. Thus the answer to our question is
1.47.
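This reverse lookup can also be sketched numerically (a hypothetical illustration, not a course requirement): rather than scanning the printed table, search for the z whose upper-tail probability matches the target. Because the tail area shrinks steadily as z grows, a simple bisection works.

```python
from math import erfc, sqrt

def upper_tail(z):
    """P(Z > z) for a standard normal Z."""
    return 0.5 * erfc(z / sqrt(2))

def z_for_probability(p, lo=0.0, hi=4.0):
    """Bisection search for the z value with upper-tail probability p."""
    for _ in range(60):
        mid = (lo + hi) / 2
        if upper_tail(mid) > p:
            lo = mid    # tail still too big: z must be larger
        else:
            hi = mid    # tail too small: z must be smaller
    return round((lo + hi) / 2, 2)

print(z_for_probability(0.0708))   # 1.47, as in the worked example above
```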

Another Quick Quiz: What is the “z” value associated with each of the following probabilities?

1.   probability = .2266 ______
2.   probability = .0901 ______
3.   probability = .0028 ______


1.   .75    (the probability was found in the .7 row and .05 column)
2.   1.34
3.   2.77

Using the Standard Normal Table to generate theoretical probabilities, I can now answer one last
problem from example 2.

What is the probability of a Cowboy Joe Club member carrying \$91 or more in cash at the next
meeting?

As we have learned in this unit we can answer this question by using relative frequencies
obtained from collecting a sample. The answer from the 10 Bin Summary table was .020 or 2%.

However, now we have a second way of answering this question based on the theoretical answer
obtained from the Standard Normal Table. This answer can be found by using the following
equation (conceptually accurate given the level of this course) to convert the very first results we
obtained (mean and standard deviation) to a “z” value. This equation is

“z” value = [score – mean]/[standard deviation]

The “score” is the number presented in the question (\$91). What “z” value is associated with a
score of \$91? If you go back to the beginning of this example you will see that the mean = \$50.20
and the standard deviation = \$17.80. Thus the “z” value is

“z” value = [91 – 50.20]/[17.80] = [40.8]/[17.8] = 2.29    rounded to two decimal places

Now all we have to do is find the probability associated with this “z” value in the Standard
Normal Table, which is .0110

As might be expected, the estimate of our probability from the sample (relative frequency) does
not perfectly agree with the theoretically determined probability, but they are close [1.1% from
the Normal Table and 2.0% from the data]
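The whole conversion can be sketched as follows (hypothetical code; the tail probability is computed with math.erfc in place of the printed table):

```python
from math import erfc, sqrt

mean, sd = 50.20, 17.80   # results from the Cowboy Joe Club example

def z_value(score):
    """Convert a raw score to a 'z' value, rounded to two decimal places."""
    return round((score - mean) / sd, 2)

def upper_tail(z):
    """P(Z > z) for a standard normal Z."""
    return 0.5 * erfc(z / sqrt(2))

z = z_value(91)
print(z)                         # 2.29
print(round(upper_tail(z), 4))   # 0.011, the table's .0110
```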

One Last Quick Quiz: If the mean = 50.2 and the standard deviation = 17.8, then what “z” values
(rounded to 2 decimal places) and probabilities are associated with the following scores?

1.    score = \$61      “z” value = ______     and probability = ______
2.    score = \$75      “z” value = ______     and probability = ______


1.   “z” value = [61 – 50.20]/[17.80] = [10.80]/[17.80] = .61 (rounded to 2 decimal places)

Probability = .2709

2.   “z” value = [75 – 50.20]/[17.80] = [24.8]/[17.80] = 1.39 (rounded to 2 decimal places)

Probability = .0823

Unit End Quiz
I asked 90 students in my Stat 2070 course during the Fall of 2008 the following question: “What
is your favorite number between 1 and 9?” The results were

Outcome          f
1              1
2              2
3             20
4              6
5             2
6              3
7             37
8             14
9              5

1. What is the variable?
2. What is the variable's level of measurement?
3. Calculate the relative frequencies.
4. Draw the relative frequency bar graph.
5. What is the “fair probability?”
6. What would you predict for the probability of selecting the number 7 (from the sample
results)?
7. What would you predict for the probability of selecting the number 3 or 7 (from the sample
results)?
8. Do the sample results indicate that people's favorite numbers are random (fair)?


1. Variable: “what is your favorite number between 1 and 9.”

2. Level of Measurement: categorical (the numbers do not represent quantities; they are just
different from one another)

3.

Outcome          f          rf
1              1        1/90 = .01
2              2        2/90 = .02
3             20        20/90 = .22
4              6        6/90 = .07
5              2        2/90 = .02
6              3        3/90 = .03
7             37        37/90 = .41
8             14        14/90 = .16
9              5        5/90 = .06
Sum       90           1.00

4.

5. “fair probability” = 1 / (number of outcomes) = 1/9 = .11

6. predict the probability of selecting a 7 using the sample = .41 (relative frequency)

7. predict the probability of selecting a 3 or 7 using the sample

= relative frequency (3) + relative frequency (7) = .22 + .41 = .63

8. The selection of favorite numbers does not appear random (fair). Why? Because the relative
frequencies are very different from the “fair probabilities” for nearly every outcome, but
especially for 3 and 7. Since the relative frequencies for these outcomes are much greater than the
“fair probabilities,” they seem to be selected at a higher rate than we would expect by chance
(random, fair).
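For readers who like to check by computer, the quiz computations can be sketched as follows (hypothetical code; the counts come from the table above):

```python
# Relative frequencies, the "fair probability," and the question 8 comparison.
counts = {1: 1, 2: 2, 3: 20, 4: 6, 5: 2, 6: 3, 7: 37, 8: 14, 9: 5}
n = sum(counts.values())                      # 90 students

rel_freq = {k: round(f / n, 2) for k, f in counts.items()}
fair = round(1 / len(counts), 2)              # 1/9 rounds to .11

print(rel_freq[7])                            # 0.41: predicted P(select 7)
print(round(rel_freq[3] + rel_freq[7], 2))    # 0.63: predicted P(3 or 7)
print(rel_freq[7] > fair and rel_freq[3] > fair)  # True: far above "fair"
```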

Cowboy Cash on Hand                28.0                                38.0
Example (Data Set)                 29.0                                38.0
29.0                                39.0
1.0                                29.0                                39.0
3.0                                29.0                                39.0
4.0                                29.0                                39.0
6.0                                47.0                                39.0
7.0                                48.0                                39.0
9.0                                49.0                                39.0
11.0                               31.0                                39.0
12.0                               31.0                                39.0
12.0                               31.0                                39.0
13.0                               32.0                                39.0
48.0                               32.0                                41.0
16.0                               32.0                                41.0
17.0                               33.0                                41.0
18.0                               33.0                                41.0
19.0                               33.0                                41.0
19.0                               33.0                                42.0
19.0                               34.0                                42.0
39.0                               34.0                                42.0
41.0                               34.0                                42.0
21.0                               34.0                                42.0
21.0                               34.0                                42.0
22.0                               34.0                                43.0
22.0                               34.0                                43.0
23.0                               34.0                                43.0
23.0                               34.0                                43.0
24.0                               36.0                                43.0
24.0                               36.0                                43.0
42.0                               36.0                                44.0
43.0                               36.0                                44.0
46.0                               37.0                                44.0
26.0                               37.0                                44.0
26.0                               37.0                                44.0
27.0                               37.0                                44.0
27.0                               38.0                                44.0
28.0                               38.0                                44.0
28.0                               38.0                                44.0

44.0   52.0   59.0
44.0   52.0   59.0
44.0   52.0   59.0
46.0   52.0   59.0
46.0   52.0   59.0
46.0   53.0   59.0
46.0   53.0   61.0
46.0   53.0   61.0
47.0   53.0   61.0
47.0   53.0   61.0
47.0   53.0   61.0
47.0   53.0   61.0
48.0   54.0   62.0
48.0   54.0   62.0
48.0   54.0   62.0
48.0   54.0   62.0
48.0   54.0   62.0
48.0   54.0   63.0
48.0   54.0   63.0
48.0   54.0   63.0
49.0   54.0   63.0
49.0   54.0   63.0
49.0   54.0   64.0
49.0   56.0   64.0
49.0   56.0   64.0
49.0   56.0   64.0
49.0   56.0   64.0
49.0   56.0   64.0
49.0   56.0   64.0
49.0   56.0   64.0
49.0   57.0   66.0
49.0   57.0   66.0
49.0   57.0   66.0
49.0   57.0   66.0
49.0   57.0   66.0
49.0   57.0   67.0
51.0   58.0   67.0
51.0   58.0   67.0
51.0   58.0   67.0
51.0   58.0   68.0
51.0   58.0   68.0
51.0   58.0   68.0
51.0   59.0   69.0
51.0   59.0   69.0
52.0   59.0   69.0
52.0   59.0   69.0

69.0   52.0   82.0
69.0   54.0   83.0
69.0   76.0   83.0
71.0   76.0   61.0
71.0   76.0   59.0
71.0   77.0   86.0
71.0   77.0   86.0
72.0   78.0   87.0
72.0   78.0   53.0
72.0   79.0   89.0
73.0   56.0   91.0
73.0   57.0   93.0
73.0   53.0   94.0
74.0   81.0   96.0
74.0   81.0   97.0
51.0   82.0   98.0

Unit 25: Summary Help Section
Probably the most difficult aspect of the second half of the introductory statistics course is the determination of the test situation. From
the problem description you should expect to find all of the necessary information to determine (1) the number of samples, (2) the
number of variables, (3) the level of measurement of each of the variables, and (4) the alternative hypothesis. The first three of these
are all necessary to determine the test statistic.

Summary Table 1 below provides a guide for determining the test situation from the problem description. All of the statistical methods
used in the second half of the course are listed in the 3 summary tables below. For assignment 5, already completed, you could have
used this first table to assist you in selecting between the One Sample Proportion, Multinomial, Independence, and Homogeneity
situations. For assignment 6, you can use this first table to assist you in selecting between Correlation and Regression, although this
information is given in assignment 6. For assignment 7, you can use this first table to assist you in selecting between the Two
Independent Samples t-test, the Matched Samples t-test, and One-way Analysis of Variance. And of course for the final this entire first
table should be useful.

The second and the third tables are extensions of this first table and are linked by test situation. In the second table guidelines are given
for the determination of the appropriate alternative hypotheses (probably the second most challenging aspect of the introductory
course), the critical values, degrees of freedom, and the part of the test situation that should be used for the interpretation (Step 5). The
third table is a brief outline of the various calculations needed to compute the test statistic. Although many students think this is one of
the more difficult aspects of the introductory course, it is actually less of a problem than the first two issues mentioned above. However,
it can still be difficult. This is why most of the calculations are actually done for you on the assignments and will also be done for you
on the final. For instance, rather than having you calculate the MSE for the regression test statistic, this is given to you.

Keys for determining the alternative hypothesis.

Note that if you identify the test situation as multinomial, there is only one possible alternative hypothesis (the proportion of responses
associated with the various outcomes are different). This makes the alternative of the multinomial fairly easy. The same is true for the
Independence, Homogeneity, and One-way Analysis of Variance test situations. (4 of our 9 test situations are relatively easy)

The difficulty in the other test situations (One Sample Proportion, Correlation, Regression, Two Independent Samples t-test, and
Matched Samples t-test) is the determination of whether the alternative is one or two tailed. (5 of the 9 test situations are more
difficult) The simplest way to tell the difference between the one and two tailed alternative situations is the following:

If the problem description SPECIFICALLY indicates (asks, speculates, wants to test) a specific direction of interest, then the
alternative is one tailed. Examples of this specific direction are the following: there should be more democrats than republicans (one
sample proportion), the relationship between the two variables should be positive (correlation), our independent variable should be
negatively predictive of the dependent variable (regression), the first sample mean should be less than the second sample mean (two
independent samples t-test and matched samples t-test). Notice the directional words in red. If a specific direction is not indicated,
then the alternative is two tailed. Here are some key words for two tailed hypotheses: unequal or different (one sample proportion,
two-independent samples t-test, matched samples t-test), related (correlation, independence), and predictive (without the qualifier of
positive or negative, regression). Notice the generic non-specific direction words in blue.

Summary Table 1 – Key to Finding the Appropriate Test Situation

Number of   Number of   Level of Measurement                           Test
Samples     Variables   (Variable(s))                                  Situation                        Unit

One         One         Categorical (2 values)                         One Sample Proportion             12

One         One         Categorical*                                   Multinomial                       13

One         Two         Both Variables are Categorical*                Independence                      14

One         Two         Both Variables are Ratio*                      Correlation                       15

One         Two         One Variable is the IV, the Other is the DV;   Regression                        17
                        Both Variables are Ratio*
------------------------------------------------------------------------------------------------------
Two         One         Categorical                                    Homogeneity                       14

Two         One         Ratio*                                         Two Independent Samples t-test    18

Two         One         Samples are Matched; Variable is Ratio*        Matched Samples t-test            19
------------------------------------------------------------------------------------------------------
Three       One         Ratio*                                         One-way Analysis of Variance      20

Categorical* means that the variable is either categorical or ordinal with 4 or fewer possible response outcomes.

Ratio* means that the variable is either ratio or ordinal with 5 or more possible response outcomes ("ratio").

377
Summary Table 2 – Determining Hypotheses, Distributions for the Critical Value, Degrees of Freedom, and Interpretation

Test              Alternative                                  Distribution of      Degrees of
Situation         Hypothesis                   Probability*    the Critical Value   Freedom                        Interpretation

One Sample        3 forms:                                     Normal (Z)           None                           phat
Proportion          1-tailed: phat > .5
                              phat < .5
                    2-tailed: phat ≠ .5

Multinomial       Response possibilities                       Chi-Square           # of outcomes – 1              O – E
                  are different

Independence      The two variables                            Chi-Square           (# rows – 1)(# columns – 1)    O – E
                  are related

Correlation       3 forms:                                     t                    n – 2                          r (correlation)
                    1-tailed: r > 0 (positive)
                              r < 0 (negative)
                    2-tailed: r ≠ 0

Regression        3 forms:                                     t                    n – 2                          b (slope)
                    1-tailed: slope > 0
                              slope < 0
                    2-tailed: slope ≠ 0

Homogeneity       1st sample is different                      Chi-Square           (# rows – 1)(# columns – 1)    O – E
                  from the 2nd sample

Two Independent   3 forms:                                     t                    n1 + n2 – 2                    mean 1 and mean 2
Samples t-test      1-tailed: µ2 > µ1
                              µ2 < µ1
                    2-tailed: µ2 ≠ µ1

Matched Samples   3 forms:                                     t                    n – 1                          mean 1 and mean 2
t-test              1-tailed: µ2 > µ1
                              µ2 < µ1
                    2-tailed: µ2 ≠ µ1

One-way Analysis  The means are                                F                    df1 = k – 1 & df2 = n – k      the means
of Variance       not the same                                                                                     (mean – grand mean)

Probability* is the probability to be used when looking for the critical value(s) in the table.
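
The degrees-of-freedom column of Summary Table 2 is all simple arithmetic, so each rule can be written as a one-line helper (the function names below are illustrative; n is the sample size, k the number of groups):

```python
# Degrees of freedom from Summary Table 2, one helper per rule.

def df_multinomial(n_outcomes: int) -> int:
    return n_outcomes - 1

def df_contingency(rows: int, cols: int) -> int:
    # Independence and Homogeneity share this formula.
    return (rows - 1) * (cols - 1)

def df_correlation_or_regression(n: int) -> int:
    return n - 2

def df_two_sample_t(n1: int, n2: int) -> int:
    return n1 + n2 - 2

def df_matched_t(n_pairs: int) -> int:
    return n_pairs - 1

def df_anova(n: int, k: int):
    # The F distribution needs two: df1 = k - 1 and df2 = n - k.
    return k - 1, n - k

print(df_contingency(3, 4))  # 6
print(df_anova(30, 3))       # (2, 27)
```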

379
Summary Table 3 – Calculations that Need to be Performed

Test
Situation                        Calculations (from Data)                      Calculations (from Table)

One Sample Proportion            phat, Test Statistic                          Critical Value (2 values if two-tailed)

Multinomial                      E, O – E, (O – E)²/E, Test Statistic          Critical Value (always only one value)

Independence                     E, O – E, (O – E)²/E, Test Statistic          Critical Value (always only one value)

Correlation                      Mean X, Mean Y, (X – Mean X)²,                Critical Value (2 values if two-tailed)
                                 (Y – Mean Y)², (X – Mean X)(Y – Mean Y),
                                 r, Test Statistic

Regression                       Mean X, Mean Y, (X – Mean X)²,                Critical Value (2 values if two-tailed)
                                 (X – Mean X)(Y – Mean Y), a, b, MSE,
                                 Test Statistic

Homogeneity                      E, O – E, (O – E)²/E, Test Statistic          Critical Value (always only one value)

Two Independent Samples t-test   Mean 1, Mean 2, Variance 1, Variance 2,       Critical Value (2 values if two-tailed)
                                 Sp², square root of (1/n1 + 1/n2),
                                 Test Statistic

Matched Samples t-test           Difference Scores (D), Mean (D),              Critical Value (2 values if two-tailed)
                                 Standard Deviation (D), Test Statistic

One-way Analysis of Variance     Mean 1, Mean 2, Mean 3, Grand Mean,           Critical Value (always only one value)
                                 Variance 1, Variance 2, Variance 3,
                                 MSA, MSW, Test Statistic
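
To make one row of Summary Table 3 concrete, here is the Two Independent Samples t-test calculation chain worked end to end: means, variances, pooled variance Sp², and the test statistic. The data and function name are made up for illustration.

```python
import math

def two_sample_t(sample1, sample2):
    """Two Independent Samples t-test statistic and its degrees of freedom."""
    n1, n2 = len(sample1), len(sample2)
    mean1 = sum(sample1) / n1
    mean2 = sum(sample2) / n2
    # Sample variances (divide by n - 1).
    var1 = sum((x - mean1) ** 2 for x in sample1) / (n1 - 1)
    var2 = sum((x - mean2) ** 2 for x in sample2) / (n2 - 1)
    # Pooled variance Sp².
    sp2 = ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)
    # Test statistic: mean difference over Sp² times (1/n1 + 1/n2), square-rooted.
    t = (mean2 - mean1) / math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2

t_stat, df = two_sample_t([4, 5, 6, 7], [6, 7, 8, 9])
print(round(t_stat, 3), df)  # 2.191 6
```

The resulting statistic would then be compared against the t critical value(s) from the table, with n1 + n2 – 2 degrees of freedom, exactly as Summary Table 2 prescribes.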

380

DOCUMENT INFO
Posted: 9/27/2010    Language: English    Pages: 380