# Confidence Intervals and Hypothesis Tests

```
Inference
Confidence intervals for means
– Margin of error
– Small populations
Confidence intervals for proportions
Sample size
Introduction to hypothesis testing
Overview of Inference
Select Simple Random Sample

Compute Sample Statistics and
Verify Assumptions

Construct a Confidence Interval
that Includes a Margin of Error

Population Parameter
Metropolitan buses
A simple random sample of 36 buses shows
a sample mean of 225 passengers carried
per day per bus. The sample standard
deviation is 60 passengers.
What’s a 99% confidence interval estimate
of the mean number of passengers carried
per bus during a 1-day period?
Metropolitan buses
Assumptions:
Assumption one is valid because we used simple
random sampling to select the sample.
Assumption two is valid because the sample size (36) is > 30.
So we can construct this confidence interval.
Metropolitan buses: Statpro
Use Statpro function in Excel.
Data file is metrobus.xls from website
Population mean number of passengers carried
per day is between 198 and 253 at 99%
confidence level
Metropolitan buses: by hand
Compute standard error: 60 / 6 = 10
Compute margin of error = 2.58 x 10 = 25.8
99% confidence interval is (roughly)
between 199.2 and 250.8
Why is this different from Statpro?
– Rounding of mean and standard deviation
– Rule of thumb z value not exact
Does it matter? – not usually
Metropolitan buses: template
This problem is also solvable using the
Excel template for Confidence Intervals and
Hypothesis tests.
Enter data for mean, standard deviation,
number in sample and level of confidence
desired.
Interpretation of Confidence Interval
99% confident that the interval $225 \pm 25.8$
contains the unknown population mean
number of passengers.
This means:
If we selected 100 samples of size n = 36
and constructed 100 confidence intervals,
about 99 would contain the unknown
population mean and 1 would not.
CI overview
A sample mean is a point estimate of the
population mean
A confidence interval is an interval estimate
of the population mean
A confidence interval gives information,
not only about where the population mean
is, but also about how accurate the
information is.
CI overview - procedure
1) Assumptions?
2) $\alpha$ / confidence level?
3) Use Statpro on data to get interval or use
Excel template or
- compute standard error
- use z-value rule of thumb to
compute margin of error
- write down confidence interval
Sample Means vs. Proportions
Sample Means
– Are computed from quantitative data.
– Can take any value, positive or negative.
– Estimate population means.
Sample Proportions
– Are computed from yes/no data (binomial)
– Are numbers between 0 and 1 (inclusive).
– Estimate population proportions.
Procedure for drawing conclusions
from sample proportions
Select Simple Random Sample.

Compute Sample Proportion.
Check for Normality

p  z p(1  p) / n
ˆ     ˆ     ˆ

Population Proportion, p
Procedure for computing CIs for proportions.
Check $n\hat{p} \geq 5$ and $n(1 - \hat{p}) \geq 5$
$\alpha$ = 1 - confidence level
Confidence interval is:

$\hat{p} - z \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}} \leq p \leq \hat{p} + z \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$
Mortgage Lending
Last year, of a total of 58,000 customers,
2.7% defaulted on their mortgage. If this
year is like last year, what are optimistic
and pessimistic projections of the
percentage who will default?
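A possible by-hand answer, as a sketch only: it assumes a 95% confidence
level (z = 1.96), which the slide does not state, and treats last year's
58,000 customers as the sample.
Check: $n\hat{p} = 58000 \times 0.027 = 1566$ and $n(1 - \hat{p}) = 56434$, both well above 5.
Standard error: $\sqrt{0.027 \times 0.973 / 58000} \approx 0.00067$
Margin of error: $1.96 \times 0.00067 \approx 0.0013$
Interval: $0.027 \pm 0.0013$, so roughly 2.57% (optimistic) to 2.83% (pessimistic).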
Sample size vs. study cost and width of confidence interval

Sample Size   Study Cost   Width of CI
Large         High         Small and Meaningful
Small         Low          Wide and Less Meaningful
Sample size calculations
For a mean:
$n = \frac{z^2 s^2}{MOE^2}$
For a proportion:
$n = \frac{z^2 p(1 - p)}{MOE^2}$
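Where these come from (a short derivation, assuming the same z-based margin
of error used in the confidence interval slides):
For a mean, $MOE = z \frac{s}{\sqrt{n}}$, so $\sqrt{n} = \frac{z s}{MOE}$ and $n = \frac{z^2 s^2}{MOE^2}$.
For a proportion, $MOE = z \sqrt{\frac{p(1 - p)}{n}}$, so $MOE^2 = \frac{z^2 p(1 - p)}{n}$ and $n = \frac{z^2 p(1 - p)}{MOE^2}$.
Round the result up to the next whole number, since n is a count of sampled units.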
Secondhand cars again
Suppose the dealer wants to know how
many people to survey to be able to get
within 1 year of the average age of the
population of secondhand car buyers with
95% confidence.
Product popularity
A manager has done a small initial study
that shows that 40% of shoppers will buy a
new product. He wants to be able to
estimate the population percentage with a
margin of error of no more than 1% with
calculate the necessary sample size.
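A sketch of the calculation, assuming a 95% confidence level (z = 1.96);
the confidence level is an assumption here, not a figure taken from the slide.
$n = \frac{z^2 p(1 - p)}{MOE^2} = \frac{1.96^2 \times 0.4 \times 0.6}{0.01^2} = \frac{0.921984}{0.0001} = 9219.84$
Round up: about 9,220 shoppers would be needed under these assumptions.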
Three Key Ideas
Sample statistics estimate population
parameters.
As sample size increases, sample
statistics tend to better approximate their
population parameters.
Confidence intervals provide a probable
range within which the population
parameter will fall.
What if...
A bus drivers’ union claimed that on
average more than 280 people per day travel
on buses in the Metropolitan Bus example.
Would you be inclined to believe them?
Why or why not?
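One way to check the claim, sketched with the by-hand numbers from the
Metropolitan buses slides:
Standard error is 10, so 280 lies $(280 - 225) / 10 = 5.5$ standard errors above the sample mean.
The 99% confidence interval (roughly 199.2 to 250.8) does not contain 280.
On this sample, a population mean above 280 looks very unlikely.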
So confidence intervals can be used to test
population parameters. Which leads us to….
Hypothesis Testing
State Hypotheses

Determine Type I, II errors

Set Significance Level

Run study and collect data

Make decision, compute p-value
What is a hypothesis test doing?
Usually there is some default assumption or
common wisdom about the true mean or true
proportion of a population.
Any sample you take from the population will
probably not conform exactly to this default or
common wisdom.
What we want to know is: how unusual does
your sample data have to be before you are
willing to say the common wisdom is wrong?
Testing Approach
Hypothesis testing is decision oriented.
– Is a population parameter less than, equal to, or greater
than a specific value (has decision-making implications)?
Highlights that two different decision
making errors are possible.
– TYPE I OR $\alpha$ ERROR
– TYPE II OR $\beta$ ERROR
p-value (prob. value) aids in interpreting
results.
Hypothesis Testing Process

Population: Assume the population mean income is \$35K (Null Hypothesis)
Sample: The Sample Mean Is \$65K
Does $\bar{x} = 65$ come from a population with $\mu = 35$?
No, not likely! REJECT Null Hypothesis
Example: Customer ages
A random sample of 28 customers were asked
their age. The sample mean and sample
standard deviation are, respectively, 51.2 and
22.8 years.
Is the company’s claim that the mean age of
customers is 40 years credible?
What is the p-value? (If we take issue with the
company, what is the chance, based on this
sample, that their claims are actually correct?)
Hypotheses
The null hypothesis, H0, is usually the “default”
or current situation. Rejecting the null
hypothesis will cause us to take a new course of
action.
The alternative hypothesis, H1, is almost
always a claim made, or challenge to current
situation/wisdom.
Together, the null and alternative hypotheses
take into account all possible values.
Hypothesis Testing
State Hypotheses
Step 1: State Hypotheses
CORRECT      $H_0: \mu = 40$
             $H_1: \mu \neq 40$

INCORRECT    $H_0: \mu \neq 40$    Company’s Claim that average age is 40 is Null
             $H_1: \mu = 40$       Hypothesis. (Null is “default” or current situation)

INCORRECT    $H_0: \mu = 40$    Null and Alternative Hypotheses Must be Mutually
             $H_1: \mu > 40$    Exclusive and Exhaustive.

INCORRECT    $H_0: \bar{x} = 40$       Hypotheses about Unknown Population mean,
             $H_1: \bar{x} \neq 40$    Not Known Sample mean.
One or two-tailed
What if you care about customers being “young” or
not, i.e. the null hypothesis is that the average age of
customers is 40 or less?
Alternative hypothesis is that average age of
customers is more than 40.
This is a one-tailed hypothesis test and hypotheses
would look like:
$H_0: \mu \leq 40$
$H_1: \mu > 40$
Hypothesis Testing
State Hypotheses

Determine Type I, II errors
Type I and II errors
Type I error:
– reject null hypothesis when it should have been
accepted
Type II error:
– accept null hypothesis when it should have
been rejected.
Type I and II Errors

                      H0 is true in population   H0 is false in population
Decision: Accept H0   Correct Decision           Type II error
Decision: Reject H0   Type I error               Correct Decision
Customer ages: Possible Errors
Type I Error     Reject Null Hypothesis When Null is True ($\alpha$ Error)
Type II Error    Do Not Reject Null When Null is False ($\beta$ Error)

Type I ($\alpha$) Error    Reject Null; Null is True           Mean customer age is 40 years but change marketing strategy
Type II ($\beta$) Error    Don’t Reject Null; Null is False    Mean customer age is not 40 years but don’t change strategy
Customer age: Possible Costs
of a Type II Error
Aiming products at the wrong market
Lost opportunities
Marketing spending that is not effective
Customer age: Possible Costs
of a Type I Error
Expensive, unnecessary rethink of marketing strategy.
Loss of profit.

Question: Do we factor the cost of the
study into this decision?
Hypothesis Testing
State Hypotheses

Determine Type I, II errors

Set Significance Level
Customer age: Step 3: Set the Significance Level, $\alpha$
Significance Level, $\alpha$, is the maximum risk of
making a type I error that the decision maker
can “live with.”
Decision maker sets significance level prior
to data collection.
For costly type I error, set $\alpha$ at 0.05 or less.
Guidelines to Selecting a Value
for Alpha
Type I Error Cost   Type II Error Cost   Set Significance Level

High                Low                  .01 or less
Low                 High                 .2 or above
High                High                 .05 or .01
Hypothesis Testing
State Hypotheses

Determine Type I, II errors

Set Significance Level

Run study and collect data
Customer age: Step 4: Run
Study and Collect Data
Put data for each variable in columns in an
Excel spreadsheet with labels at the top of
each column
In this case it is already done for you in
customerages.xls
Hypothesis Testing
State Hypotheses

Determine Type I, II errors

Set Significance Level

Run study and collect data

Make decision, compute p-value
Step 5: The p-value
We want to know how unusual the sample
data is by finding the p-value.
We ask, if the null hypothesis were true, what
proportion of samples would be further away
from the hypothesised value (more unusual)
than this sample we have taken?
This is the p-value.
Finding the p-value
Use Statpro:
– Statistical Inference > One sample analysis…
Enter data range
Choose “Hypothesis test for mean”
Specify null value (in this case 40) and
exact form of alternative hypothesis (in this
case “not equal”)
Statpro will give you the p-value for test
The p-value for Customer age example
p-value is 0.015 (found using Statpro in Excel)
$\alpha$ level = .01
[Figure: null value $\mu_0 = 40$ and sample mean 51.2]
Risk of making a type I error is 1.5%, more than the level
the company decided it was prepared to accept ($\alpha$).
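The Statpro figure can be roughly checked by hand. This sketch assumes
Statpro uses a t distribution with n - 1 = 27 degrees of freedom, which
the slides do not state.
Standard error: $22.8 / \sqrt{28} \approx 4.31$
Test statistic: $(51.2 - 40) / 4.31 \approx 2.60$
From t tables with 27 degrees of freedom, 2.60 lies between 2.473
(two-sided p = 0.02) and 2.771 (two-sided p = 0.01), which is consistent
with a p-value of 0.015.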
Interpreting the p-value
If the null hypothesis (that mean customer age is 40)
were true, only 1.5% of samples would be at least as
unusual as this one.
So if you reject the null hypothesis on this evidence, you
accept a 1.5% risk of making a type I error.
Comparing the p-value and $\alpha$

                 p-value                                        Significance Level $\alpha$
Range            0 < p < 1                                      0 < $\alpha$ < 1
Definition       Actual Probability of Making $\alpha$ Error    Maximum Probability of Making $\alpha$ Error
How Determined   From Sample Data                               Manager Determines
When Known       After Study                                    Before Study
Decision-making using the p-value
If p <  then reject the null hypothesis
If p   then accept the null hypothesis
For customer age example:
p >0.01 and  = 0.01 so accept null
Hypothesis Testing Summary
Set Null and Alternative Hypothesis.
– Alternative is generally the challenge to
default
Consider Type I and II Errors. Set $\alpha$.
Run Study, Collect Data, Make Decision,
Compute p-value.
Key Ideas
Develop the alternative hypothesis first. Base it
on the claim (challenge to default situation) that
is to be tested.
Develop the null hypothesis. Together, the null
and the alternative must be mutually exclusive
and exhaustive.
Determine the correct rejection region.
Confidence Intervals and Hypothesis
Tests: what’s the difference?
Confidence intervals centred around sample mean
Hypothesis tests centred around null value and
use sample to test whether null likely to be true
Otherwise same concept. So to do a two-sided
hypothesis test “by hand” you can do a confidence
interval as usual around the null value and then
check to see whether the sample mean falls in the
interval you’ve constructed. If yes – accept the
null. If no – reject the null.
Hypothesis tests for proportions
Check normality assumptions
Use z-value rules of thumb
Remember to use the hypothesised proportion
value to calculate the standard error and to
compute the interval.
Otherwise same “by hand” procedure as for a
hypothesis test for a mean.
There is a separate worksheet in the Excel
template for hypothesis tests for proportions.
Comparing two sample means
When might this be important?
What kinds of questions might you ask?
Important distinction:
– Unpaired or independent samples
– Paired data
Confidence intervals for difference between
means
Procedure for difference of means
for two independent samples
2 Independent Random Samples.

Compute Sample Means and
Stddevs.
$(\bar{x}_1 - \bar{x}_2) \pm z \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$

Population Difference
Car repairs
RACV has “approved repairers” where the repair
shop fixes the car and sends the bill directly to
the insurance company.
Suppose RACV wanted to check approved
repairers were charging fair prices.
Collected data for shop A: 36 cars, average cost =
\$1840 and stddev = \$370
Data for shop B: 40 cars, average cost = \$1630
and stddev = \$280
What is a 95% confidence interval for the
difference in costs between the two repair shops?
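A by-hand sketch of the answer, using z = 1.96 for 95% confidence and
treating the two shops as independent samples:
Difference in sample means: $1840 - 1630 = 210$
Standard error: $\sqrt{\frac{370^2}{36} + \frac{280^2}{40}} = \sqrt{3802.78 + 1960} \approx 75.91$
Margin of error: $1.96 \times 75.91 \approx 148.8$
Interval: $210 \pm 148.8$, so roughly \$61 to \$359 higher at shop A.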
But then…
But it would make more sense, in a
situation like that, to compare the prices
shop A and shop B were quoting for the
same cars.
This is called a Paired Sample.
Procedure for difference of means
for paired samples
A Paired Random Sample.

Compute Mean Difference, $\bar{d}$, and
Stddev of Differences, $s_d$.

$\bar{d} \pm z \frac{s_d}{\sqrt{n}}$

Population Difference, D
Car repairs with same cars
Send same 36 cars to both Shops A and B
Mean difference in cost (A – B) is \$170
with a standard deviation of difference in
cost of \$39.
What is a 95% confidence interval for the
difference in cost?
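A by-hand sketch of the answer, using z = 1.96 for 95% confidence and the
paired-sample formula (mean difference plus or minus z times the standard
deviation of the differences over the square root of n):
Standard error: $39 / \sqrt{36} = 6.5$
Margin of error: $1.96 \times 6.5 = 12.74$
Interval: $170 \pm 12.74$, so roughly \$157 to \$183.
Pairing gives a much narrower interval than the independent-samples
comparison because car-to-car variation in repair cost cancels out.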
What did we do?
Constructed confidence intervals for means
and proportions
Calculated sample sizes required for particular
levels of confidence and margins of error
Looked at the steps required to set up a
hypothesis test
Managerial applications
What did you learn today that makes a
difference to the way you manage?
What are the three most important things to
remember from today’s lecture?
Next class
+ Hedging and More Correlation.
Read Colmar Brunton Opinion Poll and
“Family car is killing us”
Prepare Home Education case. Assessed for
syndicate groups who chose option 1.