DATA: numbers with a context
The number "75" carries no information, but a "75 lb dog" engages our background knowledge and
allows us to make judgments.
STATISTICS: the process of collecting, organizing, and drawing conclusions from data.
1) Data beat anecdotes (an anecdote is a striking story that sticks in our minds)
National Cancer Institute spent 5 years and $5 million and determined there is no connection
between leukemia and exposure to magnetic fields produced by power lines
VS.
1 television interview with a mother whose child has leukemia and happens to live near a power
line.
2) Check the origin of the data
Ann Landers asked her readers whether they would have children again; 70% of almost 10,000
responses said no
Opinion polls have shown that most parents don’t regret having children
3) Beware of hidden variables: "What other information could explain this?"
Lurking variables/confounding
Crime is higher in counties with gambling casinos
Crime is higher in urban & poor counties (where casinos are typically built)
Loras gives students with better grades 1st choice at housing—is this fair?
4) Variation is everywhere
Repeated measurements (height/weight/temperature) on the same person will vary
5) Statistical conclusions are not absolute
Smoking causes lung cancer
6) Data reflect social values—Society influences what and how we measure
Catholic countries typically have fewer suicides than Japan
"Unemployment" rate only considers people who want a job AND have actively looked in the last 2
weeks (4 weeks)
Chapter 3 Getting Trustworthy Data
Does your sample data represent an entire population?
POPULATION: the entire group of individuals about which we want information
SAMPLE: a part of the population from which we actually collect information
used to draw conclusions about the whole
1) OBSERVATIONAL STUDY: Observes individuals and measures variables of interest in order to
describe some group or situation. Does not attempt to influence the responses
Jane Goodall watched chimps to realize that they were not vegetarians
Sample Survey - a sample is selected and observed because it is believed to represent the entire
population
Example: A poll asks voters whom they will elect. If the pollsters only call homes between 9 and 5, then the
population isn’t really the entire US population, but only people in the US who are at home between 9 and 5.
NOTE: whether 50 or 50,000 people are called, the sample still does not
represent the US population.
Example: Nielsen ratings do not call college campuses (how does this affect advertising sales for WB?)
CENSUS: A sample survey that attempts to include the entire population in the sample
--time consuming and expensive (can’t do census with destructive testing)
--a careful sample can produce more accurate data than a census
2) EXPERIMENT: A treatment is imposed on an individual before results are measured in order to
determine whether the treatment caused a change in the result.
Experiments can give evidence for cause and effect relationships (on average, but not every individual)
3.2 Design of Experiments
RESPONSE VARIABLE: measures an outcome/result of a study (dependent)
EXPLANATORY VARIABLE: may explain or cause changes in the response variable
(independent)
Often called factors
May be several levels of each factor
SUBJECT: Individual, Experimental Unit
TREATMENT: experimental condition applied to subjects
Each trt may be a combination of specific levels of several factors
College students at Nova Southeastern University have the option of taking the course on line.
Abecedarian Project: 111 healthy, low-income black infants in 1972. All infants received
nutritional supplements and help from social workers. Half, chosen at random, were placed in an intensive
preschool program. Test scores, college attendance, and employment of these
children were measured over time.
Placebo: a dummy treatment
Example 3.5 p 231 Gastric Freezing to prevent ulcers
Randomization uses chance to assign subjects to the trt.
-- Creates trt groups that are similar before the trt is applied
-- Prevents bias
BIASED: A statistical study that systematically favors certain outcomes
-- typically due to hidden (lurking) variables
SIMPLE RANDOM SAMPLE (SRS): an SRS of size n consists of n individuals from the population chosen
in such a way that every set of n individuals has an equal chance to be the sample actually selected.
--each individual in the population has an equal chance of being selected
Joan's accounting firm serves 30 clients and wants to interview a sample of 5 clients to improve client
satisfaction.
Step 1) Label each client
01 A-1 Plumbing              16 JL Records
02 Accent Printing           17 Johnson Commodities
03 Action Sport Shop         18 Keiser Construction
04 Anderson Construction     19 Liu's Chinese Restaurant
05 Bailey Trucking           20 MagicTan
06 Balloons, Inc             21 Peerless Machine
07 Bennett Hardware          22 Photo Arts
08 Best's Camera Shop        23 River City Books
09 Blue Print Specialties    24 Riverside Tavern
10 Central Tree Service      25 Rustic Boutique
11 Classic Flowers           26 Satellite Services
12 Computer Answers          27 Scotch Wash
13 Darlene's Dolls           28 Sewer's Center
14 Fleisch Realty            29 Tire Specialties
15 Hernandez Electronics     30 Von's Video Store
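Step 2) Enter Table B anywhere and read two-digit groups
line 130 shows: 69051 64817 87174 09517 84534 06489 87201 97245
two-digit groups are:
69 05 16 48 17 87 17 40 95 17 84 53 40 64 89 87 20 19 72
Skip groups that are not labels (00 or above 30) and repeats of a label already chosen
05 16 17 20 19 are the 5 clients selected
(Bailey Trucking, JL Records, Johnson Commodities, MagicTan, Liu's Chinese Restaurant)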
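The table-of-random-digits procedure can be mimicked in a few lines of Python. This is only a sketch, assuming the labels 01-30 and the digits of line 130 quoted above; the function name srs_from_digits is made up for illustration.

```python
def srs_from_digits(digit_line, population_size, sample_size):
    """Read two-digit groups from a line of random digits and keep usable labels."""
    digits = digit_line.replace(" ", "")
    chosen = []
    # Step through the digits two at a time; an unpaired trailing digit is ignored.
    for i in range(0, len(digits) - 1, 2):
        label = int(digits[i:i + 2])
        # Keep the group only if it is a valid label that has not been used yet.
        if 1 <= label <= population_size and label not in chosen:
            chosen.append(label)
        if len(chosen) == sample_size:
            break
    return chosen

line_130 = "69051 64817 87174 09517 84534 06489 87201 97245"
selected = srs_from_digits(line_130, population_size=30, sample_size=5)
print(selected)  # [5, 16, 17, 20, 19]
```

In practice software would draw the sample directly (for example random.sample(range(1, 31), 5)); the version above follows the hand method so the skipped groups are easy to trace.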
Randomized Comparative Experiment
- randomization produces similar groups
- comparative design eliminates confounding
other influences (lurking variables) operate equally on all groups
- use enough subjects to reduce chance variation in results
then we conclude differences in the response variable are due to the effect of the treatments
Control: limit the effects of lurking variables on the response
Randomization: use impersonal chance to assign experimental units to trts
Replication: repeat the experiment on many units to reduce chance variation in the results
COMPLETELY RANDOMIZED experimental design: all subjects are allocated at random among all
the treatments.
Effects of TV advertising (length, repetition)
          1 time   3 times   5 times
30 sec    Trt 1    Trt 2     Trt 3
90 sec    Trt 4    Trt 5     Trt 6
2 explanatory variables, 6 trts (interactions may be negative)
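A completely randomized design for the TV advertising layout can be sketched as follows. The 30 subject labels and the function name completely_randomized are assumptions for illustration; only the two factors and their levels come from the notes.

```python
import itertools
import random

lengths = ["30 sec", "90 sec"]   # factor 1: length of the commercial
repetitions = [1, 3, 5]          # factor 2: number of times it is shown
# Every combination of one level from each factor is a treatment (2 x 3 = 6).
treatments = list(itertools.product(lengths, repetitions))

def completely_randomized(subjects, treatments, seed=None):
    """Shuffle all subjects, then deal them out to the treatments in turn."""
    rng = random.Random(seed)
    shuffled = list(subjects)
    rng.shuffle(shuffled)
    groups = {t: [] for t in treatments}
    for i, subject in enumerate(shuffled):
        groups[treatments[i % len(treatments)]].append(subject)
    return groups

subjects = [f"S{i:02d}" for i in range(1, 31)]  # 30 hypothetical subjects
groups = completely_randomized(subjects, treatments, seed=1)
for number, t in enumerate(treatments, start=1):
    print(f"Trt {number}: {t[0]}, shown {t[1]}x -> {groups[t]}")
```

Dealing out a shuffled list gives equal group sizes (5 per trt here), which is one common way to carry out the randomization, not the only one.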
MATCHED PAIRS DESIGN: compares 2 trts using pairs of subjects that are as closely matched as possible
Examples: right & left hand; Pepsi vs. Coke
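In a matched pairs design, chance still decides which member of each pair gets which trt. A minimal sketch, assuming three hypothetical pairs of tasters; the names and the function matched_pairs_assignment are illustrative only.

```python
import random

def matched_pairs_assignment(pairs, treatments, seed=None):
    """Within each pair, randomly decide which member receives which treatment."""
    rng = random.Random(seed)
    plan = []
    for first, second in pairs:
        order = list(treatments)
        rng.shuffle(order)  # acts like a coin toss for this pair
        plan.append({first: order[0], second: order[1]})
    return plan

pairs = [("Ann", "Beth"), ("Carl", "Dan"), ("Eve", "Fay")]  # hypothetical matched pairs
plan = matched_pairs_assignment(pairs, ["Pepsi", "Coke"], seed=3)
for pair_plan in plan:
    print(pair_plan)
```

When each subject serves as their own pair (both hands, or tasting both colas), the same coin toss can decide the order of the two trts instead.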
BLOCK DESIGN: subjects are randomly assigned to trts separately within each block
- Block: a group of subjects known to be similar before the experiment (e.g., strawberries next to a wall)
- Blocks control the effects of some outside variables
- Result: less variation in the experiment overall
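The block design can be sketched by repeating the randomization separately inside each block. The plot labels, the two trt names, and the function randomized_block_design are assumptions for illustration; only the idea of a block of strawberries next to a wall comes from the notes.

```python
import random

def randomized_block_design(blocks, treatments, seed=None):
    """Randomly assign treatments within each block, never across blocks."""
    rng = random.Random(seed)
    assignment = {}
    for block_name, members in blocks.items():
        shuffled = list(members)
        rng.shuffle(shuffled)
        # Deal the shuffled members of this block out to the treatments in turn.
        for i, subject in enumerate(shuffled):
            assignment[subject] = (block_name, treatments[i % len(treatments)])
    return assignment

blocks = {
    "next to wall": ["W1", "W2", "W3", "W4"],  # hypothetical strawberry plots
    "open field": ["F1", "F2", "F3", "F4"],
}
treatments = ["Trt 1", "Trt 2"]
assignment = randomized_block_design(blocks, treatments, seed=7)
for subject, (block_name, trt) in sorted(assignment.items()):
    print(subject, block_name, trt)
```

Because every block contains both trts in equal numbers, a difference between the wall plots and the field plots cannot be mistaken for a treatment effect.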
STATISTICALLY SIGNIFICANT: an observed effect so large that it would rarely occur by chance
Double blind: neither the subjects nor the evaluators know which trt was applied
Lack of realism: the experiment may not duplicate real-world conditions
3.3 Sampling Design - How to choose a sample from the population
CONVENIENCE SAMPLING: selecting a sample based on what is easiest to reach
Example: conducting a survey at the mall
VOLUNTARY RESPONSE SAMPLE: chooses itself by responding to a general appeal. Such samples attract
responses from those who feel strongly about a topic.
The population actually represented changes with how the sample is chosen
Statisticians use randomness or chance to select a sample in order to avoid bias
SIMPLE RANDOM SAMPLE (SRS): an SRS of size n consists of n individuals from the population chosen
in such a way that every set of n individuals has an equal chance to be the sample actually selected.
STRATIFIED RANDOM SAMPLES:
1) divide into distinct groups of individuals (strata)
2) take an SRS in each stratum
Stratified random samples can have smaller margins of error because each stratum is very uniform
Stratified random samples may not give every individual an equal chance to be chosen
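The two steps of a stratified random sample can be sketched as below. The strata, their sizes, and the function name stratified_sample are hypothetical; the point is only that an SRS is taken separately in each stratum, so selection chances can differ between strata.

```python
import random

def stratified_sample(strata, sizes, seed=None):
    """Take a separate SRS of the requested size within each stratum."""
    rng = random.Random(seed)
    return {name: rng.sample(members, sizes[name]) for name, members in strata.items()}

strata = {
    "large clients": [f"L{i}" for i in range(1, 11)],  # 10 hypothetical individuals
    "small clients": [f"S{i}" for i in range(1, 41)],  # 40 hypothetical individuals
}
sizes = {"large clients": 5, "small clients": 5}
sample = stratified_sample(strata, sizes, seed=11)

# Chance that a given individual is chosen, per stratum.
chance = {name: sizes[name] / len(members) for name, members in strata.items()}
print(sample)
print(chance)  # {'large clients': 0.5, 'small clients': 0.125}
```

Here an individual in the smaller stratum is four times as likely to be chosen, which is why results from unequal strata are usually weighted before they are combined.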
MULTISTAGE SAMPLING DESIGN: Samples within samples
Example: Counties, townships, blocks, households
Who carried out the survey?
What was the population?
How was the sample selected?
How large was the sample (margin of error)?
What was the response rate?
How were the subjects contacted? (phone, mail, etc…)
When was the survey conducted?
What were the exact questions asked?
SAMPLING ERROR: errors caused by the act of taking a sample that cause sample results to
differ from a census result
UNDERCOVERAGE: when some groups in the population are left out of the process of
choosing a sample
-½ of the population in larger cities have unlisted numbers
-7 % of households don’t have a phone
CONVENIENCE SAMPLING, VOLUNTARY RESPONSE SAMPLE
RANDOM SAMPLING ERROR: deviation between the sample statistic and the population parameter.
- caused by chance in selecting a random sample
- margin of error includes only random sampling error
NONSAMPLING ERROR: errors not related to the act of sampling (they can occur in a census)
PROCESSING ERRORS: Computer entry, arithmetic
RESPONSE ERROR: age, income, use of illegal drugs, faulty memory
NONRESPONSE: can’t be contacted or refuse to cooperate
substitute other households for nonresponders (use similar demographics)
weight the responses
WORDING QUESTIONS:
13% think we are spending too much on "assistance to the poor";
44% think we are spending too much on "welfare".
BIAS: consistent, repeated deviation of the sample statistic from the population parameter in the
same direction.
VARIABILITY: the spread of the sample statistic values when taking many samples.
-If variability is large, the result of the sample is not repeatable
[Figure: four targets showing (a) high bias, low variability; (b) small bias, high variability;
(c) high bias, high variability; (d) small bias, low variability]
Random sampling reduces bias (use SRS)
Larger samples reduce variability
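A small simulation can illustrate both statements. It is a sketch under made-up assumptions: a population of 10,000 in which 60% answer "yes", and an undercovered sampling frame that leaves out 2,000 of the "yes" individuals. None of these numbers come from the notes.

```python
import random
import statistics

rng = random.Random(0)

# Hypothetical population: 1 = "yes", 0 = "no"; the true proportion is 0.60.
population = [1] * 6000 + [0] * 4000
# Hypothetical undercoverage: 2,000 "yes" individuals can never be reached,
# so the proportion in this frame is 0.50 no matter how the sample is drawn.
reachable = population[2000:]

def sample_proportions(frame, n, repeats=500):
    """Sample proportion from each of many SRSs of size n drawn from the frame."""
    return [sum(rng.sample(frame, n)) / n for _ in range(repeats)]

results = {}
for n in (25, 400):
    srs = sample_proportions(population, n)
    biased = sample_proportions(reachable, n)
    results[n] = {
        "srs_mean": statistics.mean(srs),        # centered near 0.60 (little bias)
        "srs_spread": statistics.stdev(srs),     # variability shrinks as n grows
        "biased_mean": statistics.mean(biased),  # stays near 0.50 for every n
    }
    print(n, {key: round(value, 3) for key, value in results[n].items()})
```

The larger sample tightens the spread of the results, but it does nothing for the undercovered frame: its proportions stay centered on the wrong value however many people are sampled.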