Confidence Intervals and Hypothesis Tests

Confidence intervals for means
  – Margin of error
  – Small populations
Confidence intervals for proportions
Sample size
Introduction to hypothesis testing
Overview of Inference
    Select Simple Random Sample

   Compute Sample Statistics and
       Verify Assumptions

   Construct a Confidence Interval
   that Includes a Margin of Error

      Draw Conclusion about a
       Population Parameter
Metropolitan buses
A simple random sample of 36 buses shows
 a sample mean of 225 passengers carried
 per day per bus. The sample standard
 deviation is 60 passengers.
What’s a 99% confidence interval estimate
 of the mean number of passengers carried
 per bus during a 1-day period?
Metropolitan buses
Assumption one is valid because we used simple
 random sampling to select the sample.
Assumption two is valid because the sample size is > 30.
 So we can construct this confidence interval.
 Metropolitan buses: Statpro
Use Statpro function in Excel.
Data file is metrobus.xls from website
Population mean number of passengers carried
 per day is between 198 and 253 at 99%
 confidence level
Metropolitan buses: by hand
Compute standard error: 60 / 6 = 10
Compute margin of error = 2.58 x 10 = 25.8
99% confidence interval is (roughly)
 between 199.2 and 250.8
Why is this different from Statpro?
  – Rounding of mean and standard deviation
  – Rule of thumb z value not exact
Does it matter? – not usually
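The by-hand steps above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not what Statpro does internally; the function name is made up for this example, and the exact z-value comes from Python's standard library rather than the 2.58 rule of thumb.

```python
from math import sqrt
from statistics import NormalDist

def mean_confidence_interval(sample_mean, sample_sd, n, confidence):
    """Large-sample (z-based) confidence interval for a population mean."""
    standard_error = sample_sd / sqrt(n)          # 60 / 6 = 10 for the buses
    # Two-sided critical value; close to the 2.58 rule of thumb at 99%.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * standard_error
    return sample_mean - margin, sample_mean + margin

# Metropolitan buses: n = 36, mean = 225, standard deviation = 60
low, high = mean_confidence_interval(225, 60, 36, 0.99)
print(f"99% CI: {low:.1f} to {high:.1f}")
```

The result agrees with the by-hand interval to one decimal place, which shows how little the rounded z-value matters here.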
Metropolitan buses: template
This problem is also solvable using the
 Excel template for Confidence Intervals and
 Hypothesis tests.
Enter data for mean, standard deviation,
 number in sample and level of confidence
Interpretation of Confidence
99% confident that interval 225 ± 25.8
 contains the unknown population mean
 number of passengers.
This means:
 If we selected 100 samples of size n = 36
 and constructed 100 confidence intervals,
 about 99 would contain the unknown
 population mean and 1 would not.
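The "about 99 out of 100" reading can be checked with a small simulation. The sketch below assumes a hypothetical normal population with mean 225 and standard deviation 60 (the true population is unknown, so these values are purely illustrative) and uses the 2.58 rule of thumb.

```python
import random
from math import sqrt
from statistics import mean, stdev

random.seed(1)  # fixed seed so the run is repeatable

TRUE_MEAN, TRUE_SD = 225, 60    # hypothetical population, for illustration only
N, Z_99, TRIALS = 36, 2.58, 2000

covered = 0
for _ in range(TRIALS):
    sample = [random.gauss(TRUE_MEAN, TRUE_SD) for _ in range(N)]
    margin = Z_99 * stdev(sample) / sqrt(N)
    # Does this sample's interval contain the true mean?
    if abs(mean(sample) - TRUE_MEAN) <= margin:
        covered += 1

coverage = covered / TRIALS
print(f"{coverage:.1%} of intervals contain the true mean")
```

Expect the share to land near, and often slightly below, 99%, because the sample standard deviation stands in for the unknown population value.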
CI overview
A sample mean is a point estimate of the
 population mean
A confidence interval is an interval estimate
 of the population mean
A confidence interval gives information,
 not only about where the population mean
 is, but also about how accurate the
 information is.
CI overview - procedure
1) Assumptions?
2) α / confidence level?
3) Use Statpro on data to get interval or use
  Excel template or
           - compute standard error
           - use z-value rule of thumb to
             compute margin of error
           - write down confidence interval
 Sample Means vs. Proportions
Sample Means
  – Are computed from quantitative data.
  – Can be all possible values, positive or negative.
  – Estimate population means.
Sample Proportions
  – Are computed from yes/no data (binomial)
  – Are numbers between 0 and 1 (inclusive).
  – Estimate population proportions.
Procedure for drawing conclusions
from sample proportions
        Select Simple Random Sample.

         Compute Sample Proportion.
            Check for Normality

           $\hat{p} \pm z\sqrt{\hat{p}(1 - \hat{p})/n}$

           Draw Conclusion about
           Population Proportion, p
Procedure for computing CIs
for proportions.
Check $n\hat{p} \ge 5$ and $n(1 - \hat{p}) \ge 5$
α = 1 - confidence level
Confidence interval is:

$\hat{p} - z\sqrt{\hat{p}(1 - \hat{p})/n} \;\le\; p \;\le\; \hat{p} + z\sqrt{\hat{p}(1 - \hat{p})/n}$
Mortgage Lending
Last year, of a total of 58,000 customers,
 2.7% defaulted on their mortgage. If this
 year is like last year, what are optimistic
 and pessimistic projections of the
 percentage who will default?
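One way to sketch the optimistic and pessimistic projections is as the two ends of a confidence interval for the default proportion. The slide does not state a confidence level, so the 95% used below is an assumption, and the function name is made up for this example.

```python
from math import sqrt
from statistics import NormalDist

def proportion_confidence_interval(p_hat, n, confidence=0.95):
    """Normal-approximation confidence interval for a population proportion."""
    # Normality check from the procedure: n*p_hat >= 5 and n*(1 - p_hat) >= 5.
    if n * p_hat < 5 or n * (1 - p_hat) < 5:
        raise ValueError("sample too small for the normal approximation")
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

# Mortgage lending: 2.7% of 58,000 customers defaulted (95% level is assumed)
optimistic, pessimistic = proportion_confidence_interval(0.027, 58_000)
print(f"Default rate between {optimistic:.2%} and {pessimistic:.2%}")
```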
Sample size vs. study cost and
width of confidence interval

Sample Size   Study Cost    Width of CI

Large         High          Small and more precise
Small         Low           Wide and less precise
Sample size calculations
For a mean:

$n = \dfrac{z^2 s^2}{(MOE)^2}$

For a proportion:

$n = \dfrac{z^2\, p(1 - p)}{(MOE)^2}$
Secondhand cars again
Suppose the dealer wants to know how
 many people to survey to be able to get
 within 1 year of the average age of the
 population of secondhand car buyers with
 95% confidence.
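The sample size formula for a mean needs an estimate of the standard deviation, which this slide does not give. The sketch below therefore uses a placeholder of 3 years, which is an assumption made only to show the calculation; substitute the standard deviation from the earlier secondhand cars data.

```python
from math import ceil
from statistics import NormalDist

def sample_size_for_mean(sd_estimate, margin_of_error, confidence=0.95):
    """n = z^2 * s^2 / MOE^2, rounded up to the next whole respondent."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil((z * sd_estimate / margin_of_error) ** 2)

ASSUMED_SD_YEARS = 3.0  # hypothetical placeholder, NOT from the slides
n_required = sample_size_for_mean(ASSUMED_SD_YEARS, margin_of_error=1.0)
print(f"Survey at least {n_required} buyers")
```

Note that halving the margin of error roughly quadruples the required sample, because the margin of error is squared in the denominator.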
Product popularity
A manager has done a small initial study
 that shows that 40% of shoppers will buy a
 new product. He wants to be able to
 estimate the population percentage with a
 margin of error of no more than 1% with
 95% confidence and asks your help to
 calculate the necessary sample size.
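A minimal sketch of the manager's calculation, using the proportion formula with p = 0.40 from the initial study, a margin of error of 0.01 and 95% confidence. The function name is made up for this example.

```python
from math import ceil
from statistics import NormalDist

def sample_size_for_proportion(p_estimate, margin_of_error, confidence=0.95):
    """n = z^2 * p * (1 - p) / MOE^2, rounded up."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil(z ** 2 * p_estimate * (1 - p_estimate) / margin_of_error ** 2)

# Product popularity: p = 0.40, margin of error = 1 percentage point
n_required = sample_size_for_proportion(0.40, 0.01)
print(f"Survey at least {n_required} shoppers")
```

A sample of over nine thousand shoppers is expensive, which is the cost versus width trade-off in practice.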
Three Key Ideas
Sample statistics estimate population parameters.
As sample size increases, sample
 statistics tend to better approximate their
 population parameters.
Confidence intervals provide a probable
 range within which the population
 parameter will fall.
What if...
A bus drivers union claimed that on
 average more than 280 people per day travel
 on buses in the Metropolitan Bus example.
 Would you be inclined to believe them?
 Why or why not?
So confidence intervals can be used to test
 out claims made by people about the
 population parameters. Which leads us to….
Hypothesis Testing
           State Hypotheses

       Determine Type I, II errors

         Set Significance Level

       Run study and collect data

     Make decision, compute p-value
 What is a hypothesis test doing?
Usually there is some default assumption or
 common wisdom about the true mean or true
 proportion of a population.
Any sample you take from the population will
 probably not conform exactly to this default or
 common wisdom.
What we want to know is: how unusual does
 your sample data have to be before you are
 willing to say the common wisdom is wrong?
Advantages of Hypothesis
Testing Approach
Hypothesis testing is decision oriented.
  – Is a population parameter less than, equal to, or greater
    than a specific value (has decision-making implications)?
Highlights that two different decision
 making errors are possible.
p-value (prob. value) aids in interpreting
Hypothesis Testing Process
Assume the population mean income is $35K
 (Null Hypothesis).
The sample mean is $65K.
Does X̄ = 65 come from a population with
 μ = 35?
No, not likely! Reject the null hypothesis.
 Example: Customer ages
A random sample of 28 customers were asked
 their age. The sample mean and sample
 standard deviation are, respectively, 51.2 and
 22.8 years.
Is the company’s claim that the mean age of
 customers is 40 years credible?
What is the p-value? (If we take issue with the
 company, what is the chance, based on this
 sample, that their claims are actually correct?)
The null hypothesis, H0, is usually the “default”
 or current situation. Rejecting the null
 hypothesis will cause us to take a new course of
 action.
The alternative hypothesis, H1, is almost
 always a claim made, or challenge to the current
 situation.
Together, the null and alternative hypotheses
 take into account all possible values.
Step 1: State Hypotheses
CORRECT      H0: μ = 40
             H1: μ ≠ 40

INCORRECT    H0: μ ≠ 40
             H1: μ = 40
             Company’s claim that average age is 40 is the null
             hypothesis. (Null is “default” or current situation.)

INCORRECT    H0: μ ≤ 40
             H1: μ ≥ 40
             Null and alternative hypotheses must be mutually
             exclusive and exhaustive.

INCORRECT    H0: x̄ = 40
             H1: x̄ ≠ 40
             Hypotheses are about the unknown population mean,
             not the known sample mean.
One or two-tailed
What if you care about customers being “young” or
 not : i.e. the null hypothesis is that average age of
 customers is 40 or less.
Alternative hypothesis is that average age of
 customers is more than 40.
This is a one-tailed hypothesis test and hypotheses
 would look like:
                        H0: μ ≤ 40
                        H1: μ > 40
Type I and II errors
Type I error:
  – reject null hypothesis when it should have been
    accepted.
Type II error:
  – accept null hypothesis when it should have
    been rejected.
Type I and II Errors

  Decision    H0 is true in population   H0 is false in population
  Accept H0   Correct decision           Type II error
  Reject H0   Type I error               Correct decision
Customer ages: Possible Errors
 Type I Error:   Reject null hypothesis when null is true (α error)
 Type II Error:  Do not reject null when null is false (β error)

 Type I Error    Reject null; null is true         Mean customer age is 40 years but change marketing
 Type II Error   Don’t reject null; null is false  Mean customer age is not 40 years but don’t change strategy
Customer age: Possible Costs
of a Type II Error
 Aiming products at the wrong market
 Lost opportunities
 Marketing spending that is not effective
Customer age: Possible Costs
of a Type I Error
 Expensive rethink of marketing strategy,
  advertising and collateral which is
  unnecessary.
 Loss of profit.

 Question: Do we consider the cost of the
  study into this decision?
Customer age: Step 3: Set the
Significance Level, α
Significance level, α, is the maximum risk of
 making a type I error that the decision maker
 can “live with.”
Decision maker sets significance level prior
 to data collection.
For costly type I error, set α at 0.05 or less.
 Guidelines to Selecting a Value
 for Alpha
Type I Error Cost   Type II Error Cost   Set Significance Level

High                Low                  .01 or less
Low                 High                 .2 or above
High                High                 .05 or .01
Customer age: Step 4: Run
Study and Collect Data
Put data for each variable in columns in an
 Excel spreadsheet with labels at the top of
 each column
In this case it is already done for you and
 the resulting spreadsheet is
Step 5: The p-value
We want to know how unusual the sample
 data is by finding the p-value.
We ask, if the null hypothesis was true what
 proportion of samples would be further away
 from the hypothesised value (more unusual)
 than this sample we have taken?
This is the p-value.
Finding the p-value
Use Statpro:
  – Statistical Inference > One sample analysis…
Enter data range
Choose “Hypothesis test for mean”
Specify null value (in this case 40) and
 exact form of alternative hypothesis (in this
 case “not equal”)
Statpro will give you the p-value for test
The p-value for Customer age

[Figure: p-value is 0.015; α level = .01; μ0 = 40]

Found using Statpro in Excel.
Risk of making a type I error is 1.5%, more than the level
the company decided it was prepared to accept (α = .01).
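For readers without Statpro, the p-value can be approximated with standard-library Python. This sketch assumes a two-sided one-sample t-test with n - 1 degrees of freedom (the slides do not say which test Statpro applies, although a t-test reproduces the 0.015 figure) and integrates the t density numerically with Simpson's rule. The helper names are made up for this example.

```python
from math import gamma, pi, sqrt

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def two_sided_t_p_value(t_stat, df, steps=2000):
    """Two-sided p-value, 2 * P(T > |t|), via Simpson's rule on [0, |t|]."""
    upper = abs(t_stat)
    h = upper / steps                      # steps must be even for Simpson's rule
    total = t_pdf(0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area_0_to_t = total * h / 3
    return 2 * (0.5 - area_0_to_t)

# Customer ages: n = 28, sample mean 51.2, sample sd 22.8, null value 40
n, sample_mean, sample_sd, null_mean = 28, 51.2, 22.8, 40
t_stat = (sample_mean - null_mean) / (sample_sd / sqrt(n))
p_value = two_sided_t_p_value(t_stat, df=n - 1)
print(f"t = {t_stat:.2f}, p-value = {p_value:.3f}")
```

With only 28 customers the t distribution matters: a z-based rule of thumb would give a smaller p-value than the t-test does.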
Interpreting the p-value
If you chose to reject the null hypothesis
 (that mean customer age is 40) then you
 would have a 1.5% chance of being wrong.
P-value is the actual chance of making a
 type I error if you reject the null hypothesis.
 Comparing the p-value and α

                  p-value                    Significance Level
Range             0 < p < 1                  0 < α < 1
Definition        Actual probability of      Probability of making
                  making α error             α error
How determined    From sample data           Manager determines
When              After study                Before study
Decision-making using the p-value
 If p < α then reject the null hypothesis
 If p ≥ α then accept the null hypothesis
 For customer age example:
 p = 0.015 > 0.01 = α, so accept the null
Hypothesis Testing Summary
 Set Null and Alternative Hypothesis.
   – Alternative is generally the challenge to
     the current situation.
 Consider Type I and II Errors. Set α.
 Run Study, Collect Data, Make Decision,
  Compute p-value.
 Key Ideas
Develop the alternative hypothesis first. Base it
 on the claim (challenge to default situation) that
 is to be tested.
Develop the null hypothesis. Together, the null
 and the alternative must be mutually exclusive
 and exhaustive.
Determine the correct rejection region.
  Confidence Intervals and Hypothesis
  Tests: what’s the difference?
Confidence intervals centred around sample mean
Hypothesis tests centred around null value and
 use sample to test whether null likely to be true
Otherwise same concept. So to do a two-sided
 hypothesis test “by hand” you can do a confidence
 interval as usual around the null value and then
 check to see whether the sample mean falls in the
 interval you’ve constructed. If yes – accept the
 null. If no – reject the null.
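The "by hand" procedure can be sketched as follows. For illustration only, it reuses the Metropolitan Bus sample and treats the union's figure of 280 as a two-sided null value at the 99% level; that framing is an assumption made for this example, as is the function name.

```python
from math import sqrt

def two_sided_test_by_hand(sample_mean, sample_sd, n, null_value, z):
    """Build an interval around the null value; check where the sample mean falls."""
    margin = z * sample_sd / sqrt(n)
    low, high = null_value - margin, null_value + margin
    decision = "accept" if low <= sample_mean <= high else "reject"
    return low, high, decision

# Bus sample: mean 225, sd 60, n = 36; illustrative null value 280; z = 2.58
low, high, decision = two_sided_test_by_hand(225, 60, 36, null_value=280, z=2.58)
print(f"Interval around null: {low:.1f} to {high:.1f} -> {decision} the null")
```

The sample mean of 225 lies well below the interval around 280, so the null value is rejected.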
Hypothesis tests for proportions
Check normality assumptions
Use z-value rules of thumb
Remember to use the hypothesised proportion
 value to calculate the standard error and to
 compute the interval.
Otherwise same “by hand” procedure as for a
 hypothesis test for a mean.
There is a separate worksheet in the Excel
 template for hypothesis tests for proportions.
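A minimal sketch of the by-hand test for a proportion. The key difference from the mean case is that the standard error uses the hypothesised proportion, not the sample proportion. The data below (95 buyers out of 200 shoppers, null proportion 0.40) are hypothetical and chosen only to show the mechanics.

```python
from math import sqrt

def proportion_test_by_hand(p_hat, n, p_null, z=1.96):
    """Two-sided test: interval around the hypothesised proportion."""
    # Normality check uses the hypothesised proportion.
    if n * p_null < 5 or n * (1 - p_null) < 5:
        raise ValueError("sample too small for the normal approximation")
    standard_error = sqrt(p_null * (1 - p_null) / n)   # uses p_null, not p_hat
    margin = z * standard_error
    low, high = p_null - margin, p_null + margin
    decision = "accept" if low <= p_hat <= high else "reject"
    return low, high, decision

# Hypothetical data: 95 of 200 shoppers buy; null proportion is 0.40; 95% level
low, high, decision = proportion_test_by_hand(95 / 200, 200, 0.40)
print(f"Interval around null: {low:.3f} to {high:.3f} -> {decision} the null")
```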
Comparing two sample means
When might this be important?
What kinds of questions might you ask?
Important distinction:
  – Unpaired or independent samples
  – Paired data
Confidence intervals for difference between means
Procedure for difference of means
for two independent samples
       2 Independent Random Samples.

        Compute Sample Means and
           $(\bar{x}_1 - \bar{x}_2) \pm z\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}$

           Draw Conclusion about
           Population Difference
Car repairs
RACV has “approved repairers” where the repair
 shop fixes the car and sends the bill directly to
 the insurance company.
Suppose RACV wanted to check approved
 repairers were charging fair prices.
Collected data for shop A: 36 cars, average cost =
 $1840 and stddev = $370
Data for shop B: 40 cars, average cost = $1630
 and stddev = $280
What is a 95% confidence interval for the
 difference in costs between the two repair shops?
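The interval for two independent samples can be sketched as below, using the 1.96 rule of thumb for 95% confidence. The function name is made up for this example.

```python
from math import sqrt

def difference_of_means_ci(mean1, sd1, n1, mean2, sd2, n2, z=1.96):
    """z-based confidence interval for the difference of two independent means."""
    standard_error = sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    diff = mean1 - mean2
    return diff - z * standard_error, diff + z * standard_error

# Shop A: 36 cars, mean $1840, sd $370.  Shop B: 40 cars, mean $1630, sd $280.
low, high = difference_of_means_ci(1840, 370, 36, 1630, 280, 40)
print(f"95% CI for A - B: ${low:.0f} to ${high:.0f}")
```

The whole interval is above zero, which suggests shop A charges more on average, although the two shops did not repair the same cars.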
But then…
But it would make more sense, in a
 situation like that, to compare the prices
 shop A and shop B were quoting for the
 same cars.
This is called a Paired Sample.
Procedure for difference of means
for paired samples
         A Paired Random Sample.

        Compute Mean Difference, d, and
           Stddev of Differences, sd.

                d z

           Draw Conclusion about
          Population Difference, D
Car repairs with same cars
Send same 36 cars to both Shops A and B
Mean difference in cost (A – B) is $170
 with a standard deviation of difference in
 cost of $39.
What is a 95% confidence interval for the
 difference in cost?
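For the paired design, the calculation works on the differences only. A minimal sketch, again with the 1.96 rule of thumb and a made-up function name:

```python
from math import sqrt

def paired_difference_ci(mean_diff, sd_diff, n_pairs, z=1.96):
    """z-based confidence interval for the mean of paired differences."""
    standard_error = sd_diff / sqrt(n_pairs)    # 39 / 6 = 6.5 here
    margin = z * standard_error
    return mean_diff - margin, mean_diff + margin

# Same 36 cars quoted by both shops: mean difference $170, sd of differences $39
low, high = paired_difference_ci(170, 39, 36)
print(f"95% CI for mean difference (A - B): ${low:.2f} to ${high:.2f}")
```

Pairing removes the car-to-car variation, so this interval is far narrower than the one from the independent samples.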
 What did we do?
Constructed confidence intervals for means
 and proportions
Calculated sample sizes required for particular
 levels of confidence and margins of error
Looked at the steps required to set up a
 hypothesis test
Managerial applications
What did you learn today that makes a
 difference to the way you manage?
What are the three most important things to
 remember from today’s lecture?
Next class
Read supplementary material on Correlation
 + Hedging and More Correlation.
Read Colmar Brunton Opinion Poll and
 “Family car is killing us”
Prepare Home Education case. Assessed for
 syndicate groups who chose option 1.
Download data file funding.xls and bring on
