# Hypothesis Testing - PowerPoint 3


```					  Hypothesis Testing

Keller’s powerpnt
modified by Tony WONG

Description of a Single Population

11.1 Introduction
• In this chapter we use the approach developed earlier for making statistical inferences about populations:
– Identify the parameter to be estimated or tested.
– Specify the parameter’s estimator and its sampling distribution.
– Construct an interval estimator or perform a test.

• We will develop techniques to estimate and test three population parameters:
– The expected value μ
– The variance σ²
– The population proportion p (for qualitative data)
• Examples
– A bank conducts a survey to estimate the number of times customers will actually use ATMs.
– A random sample of processing times is taken to test the mean production time and the variance of production time on a production line.
Mean When the Population
Standard Deviation Is Unknown
• Recall that when σ is known, x̄ is normally distributed
– if the sample is drawn from a normal population, or
– if the population is not normal but the sample is sufficiently large.

• When σ is unknown, we use its point estimator s, and the Z statistic is replaced by the t statistic:

$$ Z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} \quad\longrightarrow\quad t = \frac{\bar{x} - \mu}{s / \sqrt{n}} $$

When the sampled population is normally distributed, the statistic t is Student t distributed.
• The t distribution is mound-shaped and symmetrical around zero.
• The “degrees of freedom” (a function of the sample size) determine how spread out the distribution is compared to the normal distribution.
[Figure: two t density curves centered at 0, one with d.f. = n1 and one with d.f. = n2, where n1 < n2]
Probability calculations for the t distribution
– The t table provides critical values for various probabilities of interest.
– The probabilities that appear in Table 4 of Appendix B have the form
P(t > t_{A, d.f.}) = A
– For a given number of degrees of freedom and a predetermined right-hand tail probability A, the entry in the table is the corresponding t_A.
– These values are used in computing interval estimates and performing hypothesis tests.
[Figure: t density with right-tail area A = .05 beyond t_A]

Degrees of freedom   t.100    t.050    t.025     t.010     t.005
1                    3.078    6.314    12.706    31.821    63.657
2                    1.886    2.920    4.303     6.965     9.925
...
20                   1.325    1.725    2.086     2.528     2.845
...
200                  1.286    1.653    1.972     2.345     2.601
∞                    1.282    1.645    1.960     2.326     2.576

Testing the population mean when the
population standard deviation is unknown

• If the population is normally distributed, the test statistic for μ when σ is unknown is t:

$$ t = \frac{\bar{x} - \mu}{s / \sqrt{n}} $$

• This statistic is Student t distributed with n - 1 degrees of freedom.
• Example 11.1 Trainees productivity
– In order to determine the number of workers required
to meet demand, the productivity of newly hired
trainees is studied.

– It is believed that trainees can process and distribute
more than 450 packages per hour within one week
of hiring.

– Can we conclude that this belief is correct, based on productivity observations of 50 trainees? See file XM11-01.
• Solution
– The problem objective is to describe the population
of the number of packages processed in one hour.
– The data are quantitative.
H0: μ = 450
H1: μ > 450
– The t statistic is

$$ t = \frac{\bar{x} - \mu}{s / \sqrt{n}}, \qquad \text{d.f.} = n - 1 = 49 $$

– Solving by hand
• The rejection region is t > t_{α, n-1}.
• t_{α, n-1} = t_{.05, 49}, which is approximately 1.676.
• From the data we have Σx_i = 23,019 and Σx_i² = 10,671,357, thus

$$ \bar{x} = \frac{23{,}019}{50} = 460.38 $$

$$ s^2 = \frac{\sum x_i^2 - \frac{(\sum x_i)^2}{n}}{n - 1} = 1507.55 $$

$$ s = \sqrt{1507.55} = 38.83 $$

• The test statistic is

$$ t = \frac{\bar{x} - \mu}{s / \sqrt{n}} = \frac{460.38 - 450}{38.83 / \sqrt{50}} = 1.89 $$

• Since 1.89 > 1.676, the statistic falls in the rejection region and we reject the null hypothesis in favor of the alternative.
• There is sufficient evidence to infer that the mean productivity of trainees one week after being hired is greater than 450 packages at the .05 significance level.

Test of Hypothesis About MU (SIGMA Unknown)

Data (first values): 505, 400, 499, 415, 418, ...

Test of MU = 450 Vs MU greater than 450
Sample standard deviation = 38.8271
Sample mean = 460.38
Test Statistic: t = 1.8904
P-Value = 0.0323

• Since .0323 < .05, we reject the null hypothesis in favor of the alternative.
• There is sufficient evidence to infer that the mean productivity of trainees one week after being hired is greater than 450 packages at the .05 significance level.
Estimating the population mean when the
population standard deviation is unknown

• Confidence interval estimator of μ when σ is unknown:

$$ \bar{x} \pm t_{\alpha/2} \frac{s}{\sqrt{n}}, \qquad \text{d.f.} = n - 1 $$

• Example 11.2
– An investor is trying to estimate the return on
investment in companies that won quality awards
last year.
– A random sample of 50 such companies is selected, and the return he would have earned by investing in them is calculated.
– Construct a 95% confidence interval for the mean return.

• Solution
– The problem objective is to describe the population
of annual returns from buying shares of quality
award-winners.
– The data are quantitative.
– Solving by hand
• From the data we determine x̄ = 14.75, s² = 66.90, and s = 8.18, so

$$ \bar{x} \pm t_{\alpha/2} \frac{s}{\sqrt{n}} = 14.75 \pm 2.009 \frac{8.18}{\sqrt{50}} = [12.43,\ 17.07] $$

0.95 Confidence Interval Estimate of MU (SIGMA Unknown)

Data (first values): 18.58, 10.35, 22.41, 4.51, ...

Sample mean = 14.7522
Sample standard deviation = 8.1793
Lower confidence limit = 12.4277
Upper confidence limit = 17.0767

The interval is [12.43, 17.07].

Checking the required conditions

• We need to check that the population is normally
distributed, or at least not extremely non-normal.
• There are statistical methods to test for normality
(to be introduced later in the book).
• Currently, we can plot the histogram of the data
set.

[Figure: histogram for XM11-01; horizontal axis: Packages, bins from 400 to 575 and "More"]
[Figure: histogram for XM11-02; horizontal axis: Returns, bins from 5 to 35 and "More"]
Variance
• Sometimes we are interested in making inferences about the variability of a population.
• Examples:
– The consistency of a production process for quality control purposes.
– Investors use variance as a measure of risk.
• To draw inference about variability, the parameter of interest is σ².
• The sample variance s² is an unbiased, consistent and efficient point estimator for σ².
• The statistic (n - 1)s²/σ² has a distribution called chi-squared, if the population is normally distributed:

$$ \chi^2 = \frac{(n-1)s^2}{\sigma^2}, \qquad \text{d.f.} = n - 1 $$

[Figure: chi-squared density curves for d.f. = 1, d.f. = 5 and d.f. = 10]

The χ² table

[Figure: chi-squared density showing the table entries χ²_A (right-tail area A = .01) and χ²_{1-A} (right-tail area 1 - A = .99)]

Example: χ²_{.01,10} = 23.2093

Degrees of freedom   χ².995       χ².990       χ².975       ...   χ².010     χ².005
1                    0.0000393    0.0001571    0.0009821    ...   6.6349     7.87944
...
10                   2.15585      2.55821      3.24697      ...   23.2093    25.1882
...

Estimating the population variance

• From the following probability statement
P(χ²_{1-α/2} < χ² < χ²_{α/2}) = 1 - α
we have (by substituting χ² = [(n - 1)s²]/σ²)

$$ \frac{(n-1)s^2}{\chi^2_{\alpha/2}} < \sigma^2 < \frac{(n-1)s^2}{\chi^2_{1-\alpha/2}} $$

• Example 11.3 (operation management application)
– A container-filling machine is believed to fill 1 liter
containers so consistently, that the variance of the
filling will be less than 1 cc (.001 liter).
– To test this belief a random sample of 25 1-liter fills
was taken, and the results recorded.
– The data are provided in file XM11-03.
– Do these data support the belief that the variance is
less than 1cc at 5% significance level?

• Solution
– The problem objective is to describe the population of
1-liter fills from a filling machine.
– The data are quantitative, and we are interested in the
variability of the fills.
– The complete test is:
H0: σ² = 1
H1: σ² < 1

The test statistic is

$$ \chi^2 = \frac{(n-1)s^2}{\sigma^2} $$

The rejection region is

$$ \chi^2 < \chi^2_{1-\alpha,\,n-1} $$

– Solving by hand
• Note that (n - 1)s² = Σ(x_i - x̄)² = Σx_i² - (Σx_i)²/n.
• From the sample (data are presented in units of cc - 1000 to avoid rounding) we can calculate Σx_i = -3.6 and Σx_i² = 21.3.
• Then (n - 1)s² = 21.3 - (-3.6)²/25 = 20.8.
• The complete test is:

$$ \chi^2 = \frac{(n-1)s^2}{\sigma^2} = \frac{20.8}{1^2} = 20.8, \qquad \chi^2_{1-\alpha,\,n-1} = \chi^2_{.95,\,25-1} = 13.8484 $$

• Since 13.8484 < 20.8, do not reject the null hypothesis.
• There is insufficient evidence to reject the hypothesis that the variance is equal to 1 cc in favor of the hypothesis that it is smaller.
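[Figure: chi-squared density with α = .05 in the left tail and 1 - α = .95 to its right; the rejection region is χ² < χ²_{.95, 25-1} = 13.8484, and the test statistic 20.8 lies outside it, so do not reject the null hypothesis]
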
Proportion
• When the population consists of qualitative or
categorical data, the only inference we can make
is about the proportion of occurrence of a certain
value.
• The parameter “p” was used before to calculate
probabilities using the binomial distribution.

• Statistic and sampling distribution
– The statistic employed is

$$ \hat{p} = \frac{x}{n} $$

where x = the number of successes and n = the sample size.
– Under certain conditions [np > 5 and n(1 - p) > 5], p̂ is approximately normally distributed, with mean μ = p and variance σ² = p(1 - p)/n.

• Test statistic for p

$$ Z = \frac{\hat{p} - p}{\sqrt{p(1-p)/n}} $$

where np ≥ 5 and n(1 - p) ≥ 5.

• Interval estimator for p (1 - α confidence level)

$$ \hat{p} \pm z_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n} $$

provided np̂ ≥ 5 and n(1 - p̂) ≥ 5.
• Example 11.5 (marketing application)
– For a new newspaper to be financially viable, it has
to capture at least 12% of the Toronto market.
– In a survey conducted among 400 randomly selected
prospective readers, 58 participants indicated they
would subscribe to the newspaper if its cost did not
exceed $20 a month.
– Can the publisher conclude that the proposed
newspaper will be financially viable at 10%
significance level?

• Solution
– The problem objective is to describe the population
– The responses to the survey are qualitative.
– The parameter to be tested is “p”.
– The hypotheses are:
H0: p = .12
H1: p > .12 (we want to show that the newspaper is financially viable)

– Solving by hand
• The rejection region is z > z_α = z_{.10} = 1.28.
• The sample proportion is p̂ = 58/400 = .145.
• The value of the test statistic is

$$ Z = \frac{\hat{p} - p}{\sqrt{p(1-p)/n}} = \frac{.145 - .12}{\sqrt{.12(1 - .12)/400}} = 1.54 $$

• The p-value is P(Z > 1.54) = .0618.

Since 1.54 > 1.28 (equivalently, .0618 < .10), there is sufficient evidence to reject the null hypothesis in favor of the alternative hypothesis. At the 10% significance level we can argue that more than 12% of Toronto’s readers will subscribe to the new newspaper.
Test of p = 0.12 Vs p greater than 0.12
Sample Proportion = 0.145
Test Statistic = 1.5386
P-Value = 0.0619

• Example 11.6 (marketing application)
– In a survey of 2000 TV viewers at 11.40 p.m. on a
certain night, 226 indicated they watched “The Tonight
Show”.
– Estimate the number of TVs tuned to the Tonight Show
in a typical night, if there are 100 million potential
television sets. Use a 95% confidence level.
– Solution
The sample proportion is p̂ = 226/2000 = .113, so

$$ \hat{p} \pm z_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n} = .113 \pm 1.96\sqrt{.113(.887)/2000} = .113 \pm .014 $$

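One possible way to finish Example 11.6, since the question asks for a number of television sets while the solution stops at the proportion. This is only a sketch: it assumes the interval for the proportion can be scaled directly by the 100 million potential sets mentioned in the problem.

$$ \text{LCL} = (.113 - .014)(100{,}000{,}000) = 9{,}900{,}000 $$

$$ \text{UCL} = (.113 + .014)(100{,}000{,}000) = 12{,}700{,}000 $$

Under that assumption, between roughly 9.9 million and 12.7 million sets are estimated to be tuned to the show, with 95% confidence.
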
Selecting the Sample Size to Estimate the
Proportion
• The interval estimator for the proportion is

$$ \hat{p} \pm z_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n} $$

• Thus, if we wish to estimate the proportion to within W, we can write

$$ W = z_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n} $$

• Solving for n, the required sample size is

$$ n = \left( \frac{z_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})}}{W} \right)^2 $$

• Example
– Suppose we want to estimate the proportion of customers who prefer our company’s brand to within .03 with 95% confidence.
– Find the sample size needed to guarantee that this requirement is met.
– Solution
W = .03 and 1 - α = .95, therefore α/2 = .025, so z_{.025} = 1.96 and

$$ n = \left( \frac{1.96\sqrt{\hat{p}(1-\hat{p})}}{.03} \right)^2 $$

Since the sample has not yet been taken, the sample proportion is still unknown. We proceed using either one of the following two methods:
• Method 1:
– There is no knowledge about the value of p̂.
• Let p̂ = .5, which results in the largest possible n needed for a p̂ ± W confidence interval at the 1 - α level.
• If the sample proportion does not equal .5, the actual interval will be narrower than ± .03.

$$ n = \left( \frac{1.96\sqrt{.5(1 - .5)}}{.03} \right)^2 = 1{,}068 $$

• Method 2:
– There is some idea about the value of p̂.
• Use that value of p̂ to calculate the sample size. For example, with p̂ = .2:

$$ n = \left( \frac{1.96\sqrt{.2(1 - .2)}}{.03} \right)^2 = 683 $$

Chapter 12

Comparison of
Two Populations
12.1 Introduction
• A variety of techniques are presented whose objective is to compare two populations.

• We are interested in:
– The difference between two means.
– The ratio of two variances.
– The difference between two proportions.

between Two Means: Independent
Samples
• Two random samples are drawn from the two
populations of interest.

• Because we are interested in the difference between the two means, we compute the statistic x̄ for each sample (and support the analysis with the statistic s² as well).
The Sampling Distribution of x̄1 - x̄2

• x̄1 - x̄2 is normally distributed if the (original) population distributions are normal.
• x̄1 - x̄2 is approximately normally distributed if the (original) populations are not normal, but the sample sizes are large.
• The expected value of x̄1 - x̄2 is μ1 - μ2.
• The variance of x̄1 - x̄2 is σ1²/n1 + σ2²/n2.
• If the sampling distribution of x̄1 - x̄2 is normal or approximately normal we can write:

$$ Z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}} $$

• Z can be used to build a test statistic or a confidence interval for μ1 - μ2.

• In practice, the “Z” statistic is rarely used, because the population variances are not known.
• Instead, we construct a “t” statistic in which the unknown σ1² and σ2² are replaced by the sample variances s1² and s2².

• Two cases are considered when producing the
t-statistic.

– The two unknown population variances are equal.

– The two unknown population variances are not equal.

Case I: The two variances are equal
• Calculate the pooled variance estimate by:

$$ s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} $$

[Figure: s_p² falls between s1² (n1 = 10) and s2² (n2 = 15)]

Example: s1² = 25; s2² = 30; n1 = 10; n2 = 15. Then,

$$ s_p^2 = \frac{(10 - 1)(25) + (15 - 1)(30)}{10 + 15 - 2} = 28.04347 $$

• Construct the t-statistic as follows:

$$ t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}, \qquad \text{d.f.} = n_1 + n_2 - 2 $$

• Perform a hypothesis test
H0: μ1 - μ2 = 0
H1: μ1 - μ2 > 0 (or < 0, or ≠ 0)

• Or build an interval estimate

$$ (\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2}\sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)} $$

where 1 - α is the confidence level.
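Case II: The two variances are unequal

$$ t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} $$

$$ \text{d.f.} = \frac{\left(s_1^2/n_1 + s_2^2/n_2\right)^2}{\frac{(s_1^2/n_1)^2}{n_1 - 1} + \frac{(s_2^2/n_2)^2}{n_2 - 1}} $$
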
Run a hypothesis test as needed, or build an interval estimate.

Estimator:

$$ (\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2}\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} $$

where 1 - α is the confidence level.

• Example 12.1
– Do people who eat high-fiber cereal for
breakfast consume, on average, fewer
calories for lunch than people who do not eat
high-fiber cereal for breakfast?
– A sample of 150 people was randomly drawn.
Each person was identified as a consumer or
a non-consumer of high-fiber cereal.
– For each person the number of calories
consumed at lunch was recorded.
Calories consumed at lunch

Consumers   Non-consumers
568         705
498         819
589         706
681         509
540         613
646         582
636         601
739         608
539         787
596         573
607         428
529         754
637         741
617         628
633         537
555         748
.           .
.           .

Solution:
• The data are quantitative.
• The parameter to be tested is the difference between two means.
• The claim to be tested is that the mean caloric intake of consumers (μ1) is less than that of non-consumers (μ2).

• Identifying the technique
– The hypotheses are:

H0: (μ1 - μ2) = 0
H1: (μ1 - μ2) < 0 (that is, μ1 < μ2)

– To check the relationship between the variances, we use computer output to find the sample standard deviations. We have s1 = 64.05 and s2 = 103.29. It appears that the variances are unequal.

– We run the t-test for unequal variances.

t-Test: Two-Sample Assuming Unequal Variances

                               Consumers   Nonconsumers
Mean                           604.023     633.234
Variance                       4102.98     10669.8
Observations                   43          107
Hypothesized Mean Difference   0
df                             123
t Stat                         -2.09107
P(T<=t) one-tail               0.01929
t Critical one-tail            1.65734
P(T<=t) two-tail               0.03858
t Critical two-tail            1.97944

• At the 5% significance level there is sufficient evidence to reject the null hypothesis.

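The df and t Stat values in the Example 12.1 printout can be checked against the Case II (unequal variances) formulas. This is only a sketch of the arithmetic, using the rounded means and variances from the printout; rounding the degrees of freedom to the nearest whole number is an assumption about how the printed value was obtained.

$$ \frac{s_1^2}{n_1} = \frac{4102.98}{43} \approx 95.42, \qquad \frac{s_2^2}{n_2} = \frac{10669.8}{107} \approx 99.72 $$

$$ \text{d.f.} = \frac{(95.42 + 99.72)^2}{\frac{95.42^2}{42} + \frac{99.72^2}{106}} \approx \frac{38{,}080}{310.6} \approx 122.6 \approx 123 $$

$$ t = \frac{604.023 - 633.234}{\sqrt{95.42 + 99.72}} \approx \frac{-29.21}{13.97} \approx -2.09 $$

Both results agree with the printout (df = 123, t Stat = -2.09107).
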
• Solving by hand
– The interval estimator for the difference between two means is

$$ (\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2}\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} = (604.02 - 633.23) \pm 1.9796\sqrt{\frac{64.05^2}{43} + \frac{103.29^2}{107}} = -29.21 \pm 27.65 $$

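Written out as endpoints (a small extra step that the slide leaves implicit), the interval for μ1 - μ2 in Example 12.1 is

$$ -29.21 - 27.65 = -56.86, \qquad -29.21 + 27.65 = -1.56 $$

The whole interval lies below zero, which is consistent with the two-tail p-value of 0.03858 (less than .05) reported in the Excel output for this example.
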
• Example 12.2
– Does job design (referring to worker movements) affect workers’ productivity?

– Two job designs are being considered for the production of a new computer desk.

– Two samples are randomly and independently selected:
• A sample of 25 workers assembled a desk using design A.
• A sample of 25 workers assembled the desk using design B.
• The assembly times were recorded.

– Do the assembly times of the two designs differ?
Assembly times in Minutes

Design-A   Design-B
6.8        5.2
5.0        6.7
7.9        5.7
5.2        6.6
7.6        8.5
5.0        6.5
5.9        5.9
5.2        6.7
6.5        6.6
.          .
.          .

Solution
• The data are quantitative.
• The parameter of interest is the difference between two population means.
• The claim to be tested is whether a difference between the two designs exists.

• Solving by hand
– The hypothesis test is:
H0: (μ1 - μ2) = 0
H1: (μ1 - μ2) ≠ 0

– To check the relationship between the two variances, calculate the values of s1 and s2. We have s1 = 0.92 and s2 = 1.14. We can infer that the two variances are equal to one another.

– To calculate the t-statistic we have x̄1 = 6.288, x̄2 = 6.016, s1² = 0.8481, and s2² = 1.2996, so

$$ s_p^2 = \frac{(25 - 1)(0.8481) + (25 - 1)(1.2996)}{25 + 25 - 2} \approx 1.075 $$

$$ t = \frac{(6.288 - 6.016) - 0}{\sqrt{1.075\left(\frac{1}{25} + \frac{1}{25}\right)}} = 0.93, \qquad \text{d.f.} = 25 + 25 - 2 = 48 $$

– Let us determine the rejection region.
• The rejection region is |t| > t_{α/2, d.f.} = t_{.025, 48} ≈ 2.009 for α = 0.05 (notice the absolute value).
• The test: Since |t| = 0.93 < 2.009, there is insufficient evidence to reject the null hypothesis.

[Figure: t distribution with .025 in each tail; the rejection region begins at 2.009, and the test statistic 0.93 lies outside it]
• Conclusion: From this experiment, it is unclear at the 5% significance level whether the two job designs differ in terms of workers’ productivity.
The Excel printout

t-Test: Two-Sample Assuming Equal Variances

                               Design-A       Design-B
Mean                           6.288          6.016
Variance                       0.847766667    1.3030667      (s1², s2²)
Observations                   25             25
Pooled Variance                1.075416667                   (s_p²)
Hypothesized Mean Difference   0                             (μ1 - μ2)
df                             48                            (degrees of freedom)
t Stat                         0.927332603                   (t statistic)
P(T<=t) one-tail               0.179196744                   (p-value of the one-tail test)
t Critical one-tail            1.677224191
P(T<=t) two-tail               0.358393488                   (p-value of the two-tail test)
t Critical two-tail            2.01063358
A 95% confidence interval for μ1 - μ2 is calculated as follows:

$$ (\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2}\sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)} = (6.288 - 6.016) \pm 2.0106\sqrt{1.075\left(\frac{1}{25} + \frac{1}{25}\right)} = 0.272 \pm 0.5896 = [-0.3176,\ 0.8616] $$

Thus, at the 95% confidence level,
-0.3176 < μ1 - μ2 < 0.8616

Notice: zero is included in the interval.
Checking the required conditions for the equal variances case (Example 12.2)

The distributions are not bell shaped, but they seem to be approximately normal. Since the technique is robust, we can be confident in the results.

[Figure: histograms of assembly times for Design A and Design B]
12.4 Matched Pairs Experiment
• What is a matched pairs experiment?

• Why are matched pairs experiments needed?

• How do we deal with data produced in this way?

The following example demonstrates a situation where a matched pairs experiment is the correct approach to testing the difference between two population means.

Example 12.3
• To determine whether a new steel-belted radial tire lasts longer than a current model, the manufacturer designs the following experiment.
– A pair of newly designed tires is installed on the rear wheels of 20 randomly selected cars.
– A pair of currently used tires is installed on the rear wheels of another 20 cars.
– Drivers drive in their usual way until the tires wear out.
– The number of miles driven by each driver is recorded. See the data next.
Solution
• Compare two populations of quantitative data.
• The parameter is μ1 - μ2, where
μ1 = mean distance driven before wear-out occurs for the new design tires
μ2 = mean distance driven before wear-out occurs for the existing design tires
• The hypotheses are:
H0: (μ1 - μ2) = 0
H1: (μ1 - μ2) > 0

New-Design   Existing-Design
70           47
83           65
78           59
46           61
74           75
56           65
74           73
52           85
99           97
57           84
77           72
84           39
72           72
98           91
81           64
63           63
88           79
69           74
54           76
97           43

• The hypotheses are
H0: μ1 - μ2 = 0
H1: μ1 - μ2 > 0

• The test statistic is

$$ t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} $$

• We run the t test, and obtain the following Excel results.

t-Test: Two-Sample Assuming Equal Variances

                               New Dsgn       Exstng dsgn
Mean                           73.6           69.2
Variance                       243.4105263    226.8
Observations                   20             20
Pooled Variance                235.1052632
Hypothesized Mean Difference   0
df                             38
t Stat                         0.907447484
P(T<=t) one-tail               0.184944575
t Critical one-tail            1.685953066
P(T<=t) two-tail               0.36988915
t Critical two-tail            2.024394234

• We conclude that there is insufficient evidence to reject H0 in favor of H1.
[Figure: histograms of miles driven for the new design and for the existing design; bins 45, 60, 75, 90, 105, More]

While the sample mean of the new design is larger than the sample mean of the existing design, the variability within each sample is large enough for the sample distributions to overlap and cover about the same range. It is therefore difficult to argue that one expected value is different from the other.
• Example 12.4
– To eliminate variability among observations within each sample, the experiment was redone.
– One tire of each type was installed on the rear wheels of 20 randomly selected cars (each car was sampled twice, thus creating a pair of observations).
– The number of miles until wear-out was recorded.

Car   New-Dsn   Exst-Dsn
1     57        48
2     64        50
3     102       89
4     62        56
5     81        78
6     87        75
7     61        50
8     62        49
9     74        70
10    62        66
11    100       98
12    90        86
13    83        78
14    84        90
15    86        98
16    62        58
17    67        58
18    40        41
19    71        61
20    77        82

t-Test: Paired Two Sample for Means

                               New-Dsn    Exst-Dsn
Mean                           73.6       69.05
Variance                       242.779    316.366
Observations                   20         20
Pearson Correlation            0.91468
Hypothesized Mean Difference   0
df                             19
t Stat                         2.81759
P(T<=t) one-tail               0.0055
t Critical one-tail            1.72913
P(T<=t) two-tail               0.01099
t Critical two-tail            2.09302
So what really happened here?

The values each sample consists of might vary markedly...
[Figure: the range of observations in sample A and in sample B]

...but the differences between pairs of observations might be quite close to one another, resulting in a small variability.
[Figure: the range of the differences, located near 0]

Observe the statistic t shown below and notice how a small variability of the differences (small s_D) helps in rejecting the null hypothesis.

• Solving by hand
– Calculate the difference for each pair of observations.
– Calculate the average of the differences and the standard deviation of the differences.
– Build the statistic as follows:

$$ t = \frac{\bar{x}_D - \mu_D}{s_D / \sqrt{n_D}} $$

– Run the hypothesis test using the t distribution with n_D - 1 degrees of freedom.
– The hypothesis test for this problem is
H0: μD = 0
H1: μD > 0

– The rejection region is t > t_α with d.f. = 20 - 1 = 19. If α = .05, t_{.05,19} = 1.729.

– The statistic is

$$ t = \frac{\bar{x}_D - \mu_D}{s_D / \sqrt{n_D}} = \frac{4.55 - 0}{7.22186 / \sqrt{20}} = 2.817 $$

New-Dsn   Exst-Dsn   Difference
57        48         9
64        50         14
102       89         13
62        56         6
81        78         3
87        75         12
61        50         11
62        49         13
74        70         4
62        66         -4
100       98         2
90        86         4
83        78         5
84        90         -6
86        98         -12
62        58         4
67        58         9
40        41         -1
71        61         10
77        82         -5

Average = 4.55
Standard Deviation = 7.2218677

– Since 2.817 > 1.729, there is sufficient evidence in the data to reject the null hypothesis in favor of the alternative hypothesis.
– Conclusion: At the 5% significance level the new type tires last longer than the current type.

Estimating the mean difference

Interval estimator of μD:

$$ \bar{x}_D \pm t_{\alpha/2,\,n_D - 1}\frac{s_D}{\sqrt{n_D}} $$

The 95% confidence interval of the mean difference in Example 12.4 is

$$ 4.55 \pm 2.093\frac{7.22}{\sqrt{20}} = 4.55 \pm 3.38 $$

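As a small worked follow-up (not on the original slide), the interval for the mean difference in Example 12.4 can be written as explicit limits:

$$ \text{LCL} = 4.55 - 3.38 = 1.17, \qquad \text{UCL} = 4.55 + 3.38 = 7.93 $$

Zero is not inside [1.17, 7.93]. This matches the paired t-test output for the same data, where the two-tail p-value of 0.01099 is below .05.
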
Checking the required conditions
for the paired observations case
• The validity of the results depends on the
normality of the differences.
[Figure: histogram of the differences; bins -12, -6, 0, 6, 12, More]

of two variances
• In this section we discuss how to compare the
variability of two populations.
• In particular, we draw inference about the ratio of
two population variances.
• This question is interesting because:
– Variances can be used to evaluate the consistency of
processes.
– The relationships between variances determine the technique
used to test relationships between mean values
• Point estimator of σ1²/σ2²

– Recall that s² is an unbiased estimator of σ².
– Therefore, it is not surprising that we estimate σ1²/σ2² by s1²/s2².

• Sampling distribution for σ1²/σ2²
– The statistic [s1²/σ1²] / [s2²/σ2²] follows the F distribution.
– The test statistic for σ1²/σ2² is derived from this statistic.
• Testing σ1²/σ2²

– Our null hypothesis is always

H0: σ1²/σ2² = 1

– Under this null hypothesis the F statistic

$$ F = \frac{s_1^2/\sigma_1^2}{s_2^2/\sigma_2^2} $$

becomes

$$ F = \frac{s_1^2}{s_2^2} $$

Example 12.5 (see Example 12.1)

In order to perform a test regarding the average consumption of calories at people’s lunch in relation to the inclusion of high-fiber cereal in their breakfast, the variance ratio of the two samples has to be tested first.

The hypotheses are:
H0: σ1²/σ2² = 1
H1: σ1²/σ2² ≠ 1

F-Test Two-Sample for Variances

                      Consumers      Nonconsumers
Mean                  604.0232558    633.2336449
Variance              4102.975637    10669.76565
Observations          43             107
df                    42             106
F                     0.384542245
P(F<=f) one-tail      0.000368433
F Critical one-tail   0.637072617

The data (calories consumed at lunch) are those of Example 12.1.
• Solving by hand
– The rejection region is
F > F_{α/2, ν1, ν2} or F < 1/F_{α/2, ν2, ν1}
which becomes (for α = 0.05)
F > F_{.025,42,106} ≈ F_{.025,40,120} = 1.61
F < 1/F_{.025,106,42} ≈ 1/F_{.025,120,40} = 1/1.72 ≈ .58

– The F statistic value is F = s1²/s2² = .3845.

– Conclusion: Because .3845 < .58 we can reject the null hypothesis in favor of the alternative hypothesis.

– There is sufficient evidence in the data to argue at the 5% significance level that the variances of the two groups differ.
Estimating the Ratio of Two Population
Variances
• From the statistic F = [s1²/σ1²] / [s2²/σ2²] we can isolate σ1²/σ2² and build the following interval estimator:

$$ \left(\frac{s_1^2}{s_2^2}\right)\frac{1}{F_{\alpha/2,\nu_1,\nu_2}} < \frac{\sigma_1^2}{\sigma_2^2} < \left(\frac{s_1^2}{s_2^2}\right)F_{\alpha/2,\nu_2,\nu_1} $$

where ν1 = n1 - 1 and ν2 = n2 - 1.

• Example 12.6
– Determine the 95% confidence interval estimate of
the ratio of the two population variances in example
12.1
– Solution
• We find F_{α/2,ν1,ν2} = F_{.025,40,120} = 1.61 (approximately) and F_{α/2,ν2,ν1} = F_{.025,120,40} = 1.72 (approximately).
• LCL = (s1²/s2²)[1/F_{α/2,ν1,ν2}] = (4102.98/10,669.770)[1/1.61] = .2388
• UCL = (s1²/s2²)[F_{α/2,ν2,ν1}] = (4102.98/10,669.770)[1.72] = .6614
Inference about the difference
between two population proportions
• In this section we deal with two populations
whose data are qualitative.
• When data are qualitative we can (only) ask
questions regarding the proportions of
occurrence of certain outcomes.
• Thus, we hypothesize on the difference p1-p2,
and draw an inference from the hypothesis test.

• Sampling Distribution of the Difference Between Two Sample
Proportions, \hat{p}_1 - \hat{p}_2
– Two random samples are drawn from two populations.
– The number of successes in each sample is recorded.
– The sample proportions are computed.
Sample 1                        Sample 2
Sample size n1                  Sample size n2
Number of successes x1          Number of successes x2
Sample proportion               Sample proportion
\hat{p}_1 = x_1 / n_1           \hat{p}_2 = x_2 / n_2

– The statistic \hat{p}_1 - \hat{p}_2 is approximately normally
distributed if n1p1, n1(1 - p1), n2p2, n2(1 - p2) are all
equal to or greater than 5. Because p1 and p2 are unknown,
we check instead that n_1\hat{p}_1, n_1\hat{q}_1, n_2\hat{p}_2, n_2\hat{q}_2
(with \hat{q} = 1 - \hat{p}) are all equal to or greater than 5.
– The mean of \hat{p}_1 - \hat{p}_2 is p1 - p2.
– The variance of \hat{p}_1 - \hat{p}_2 is p1(1 - p1)/n1 + p2(1 - p2)/n2.

The statistic
Z = \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\frac{p_1(1 - p_1)}{n_1} + \frac{p_2(1 - p_2)}{n_2}}}
is approximately normally distributed.
• Testing the Difference between Two
Population Proportions, p1 - p2
– We hypothesize on the difference between the two
proportions, p1 - p2.
– There are two cases to consider:

Case 1: H0: p1 - p2 = 0
Calculate the pooled proportion
\hat{p} = \frac{x_1 + x_2}{n_1 + n_2}
Then
Z = \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\hat{p}(1 - \hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}

Case 2: H0: p1 - p2 = D (D is not equal to 0)
Do not pool the data
\hat{p}_1 = \frac{x_1}{n_1},   \hat{p}_2 = \frac{x_2}{n_2}
Then
Z = \frac{(\hat{p}_1 - \hat{p}_2) - D}{\sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}}}
• Example 12.7
– A research project employing 22,000 American
physicians was conducted to discover whether aspirin
can prevent heart attacks.
– Half of the participants in the research took aspirin,
and half took a placebo.
– In a three-year period, 104 of those who took aspirin
and 189 of those who took the placebo had heart attacks.
– Is aspirin effective in preventing heart attacks?
• Solution
– Identifying the technique
• The problem objective is to compare the population of
those who take aspirin with those who do not.
• The data are qualitative (Take/do not take aspirin).
• Population 1 - aspirin takers
  Population 2 - placebo takers
• The hypotheses to test are
H0: p1 - p2 = 0
H1: p1 - p2 < 0
• We identify here case 1, so
Z = \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\hat{p}(1 - \hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}
– Solving by hand
• For a 5% significance level the rejection region is
z < -z_\alpha = -z_{.05} = -1.645

• The sample proportions are
\hat{p}_1 = 104 / 11,000 = .009455, and \hat{p}_2 = 189 / 11,000 = .01718

• The pooled proportion is
\hat{p} = (x_1 + x_2) / (n_1 + n_2) = (104 + 189) / (11,000 + 11,000) = .01332

• The z statistic becomes
Z = \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\hat{p}(1 - \hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}
  = \frac{.009455 - .01718}{\sqrt{.01332(.98668)\left(\frac{1}{11,000} + \frac{1}{11,000}\right)}} = -5.02

• -5.02 < -1.645, so reject the null hypothesis.
• Example 12.8 (Marketing application)
– Management needs to decide which of two new
packaging designs to adopt, to help improve sales of a
soap.
– A study is performed in two communities:
• Design A is distributed in Community 1.
• Design B is distributed in Community 2.
• The old design package is still offered in both communities.
– For design A to be financially viable it has to outsell
design B by at least 3%.

– Summary of the experiment results
• Community 1 - 580 packages with new design A sold
324 packages with old design sold
• Community 2 - 604 packages with new design B sold
442 packages with old design sold
– Use 1% significance level and perform a test to find
which type of packaging to use.

• Solution
– Identifying the technique
• The problem objective is to compare two populations,
consisting of the values “purchase of the new design”,
and “purchase of the old design”.
• Data are qualitative. We need to test p1 - p2.
• The hypotheses to test are
H0: p1 - p2 = .03
H1: p1 - p2 > .03
• We have to perform case 2 of the test for difference in
proportions (the difference is not equal to zero).

• Solving by hand
Z = \frac{(\hat{p}_1 - \hat{p}_2) - D}{\sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}}}
  = \frac{\left(\frac{580}{580 + 324} - \frac{604}{604 + 442}\right) - .03}{\sqrt{\frac{.642(1 - .642)}{904} + \frac{.577(1 - .577)}{1046}}} = 1.58

The rejection region is z > z_\alpha = z_{.01} = 2.33.
Conclusion: Do not reject the null hypothesis.
There is insufficient evidence to infer that
packaging with design A will outsell design B
by 3% or more.
• Estimating the Difference Between Two
Population Proportions
(\hat{p}_1 - \hat{p}_2) \pm z_{\alpha/2} \sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}}

• Example 12.9
Estimate with 95% confidence the proportion of men who would avoid
a heart attack if they take aspirin regularly.

(.009455 - .01718) \pm 1.96 \sqrt{\frac{.009455(.990545)}{11,000} + \frac{.01718(.98282)}{11,000}}
= -.007725 \pm .003028
= [-.010753, -.004697]
12.7 Market Segmentation
(Optional)
• Market segmentation is a statistical analysis
aimed at determining the differences that exist
product.
• Statistics plays a major role in market segmentation.
– Surveys are used to gather the relevant data.
– Statistical tests are used to differentiate among segments.
– Sales and profit estimates are derived.
• Example 12.10
– A new company in the market offers no-wait services
for car oil and filter change.

– The company wants to make decisions about where to

– A sample of 1000 car owners was selected. The
drivers were asked to report whether or not they used
a no-wait station, as well as several characteristics of
their lives (including age).

– The research should reveal whether differences in age
exist between customers of no-wait service and
customers of other types of facilities (see file XM12-10)
• Solution
– Identifying the technique
• The problem objective is to compare the population of ages
of no-wait customers, to the population of ages of other
facility users.
• Data are quantitative.
• Samples are independent.
• The parameter to be tested is \mu_1 - \mu_2 (\mu represents mean
age).
– The hypotheses are
H0: \mu_1 - \mu_2 = 0
H1: \mu_1 - \mu_2 \neq 0

– When testing for the relationship between the two
variances we get the following results

F-Test Two-Sample for Variances

                      No-Wait     Other
Mean                  47.78331    44.03448
Variance              77.17323    60.09721
Observations          623         377
df                    622         376
F                     1.28414
P(F<=f) one-tail      0.003822
F Critical one-tail   1.166224

We run the test for \mu_1 - \mu_2 with two equal variances
Tutorial
•   11.1, 11.9, 11.27, 11.33
•   11.41, 11.45, 11.49
•   11.59, 11.63, 11.69
•   12.1, 12.11, 12.25
•   12.39, 12.43, 12.51, 12.71, 12.77


```
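The hand calculations in Examples 12.6 to 12.9 can be checked with a few lines of Python. This is a minimal sketch using only the standard library. The function names (`pooled_z`, `unpooled_z`, `diff_interval`, `variance_ratio_interval`) are hypothetical helpers written for illustration; they do not come from the slides or from any statistics package. The F critical values are the approximate table values quoted on the slides (1.61 and 1.72), not values computed by the code.

```python
from math import sqrt
from statistics import NormalDist


def pooled_z(x1, n1, x2, n2):
    """Case 1 (H0: p1 - p2 = 0): z statistic using the pooled proportion."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    p_pool = (x1 + x2) / (n1 + n2)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    return (p1_hat - p2_hat) / se


def unpooled_z(x1, n1, x2, n2, d):
    """Case 2 (H0: p1 - p2 = D, D != 0): z statistic without pooling."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    se = sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
    return (p1_hat - p2_hat - d) / se


def diff_interval(x1, n1, x2, n2, confidence=0.95):
    """Interval estimate of p1 - p2 at the given confidence level."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    half = z * sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
    return p1_hat - p2_hat - half, p1_hat - p2_hat + half


def variance_ratio_interval(s1_sq, s2_sq, f_v1_v2, f_v2_v1):
    """Interval for sigma1^2/sigma2^2 given table values of F (assumed inputs)."""
    ratio = s1_sq / s2_sq
    return ratio / f_v1_v2, ratio * f_v2_v1


# Example 12.7: aspirin (104 of 11,000) vs. placebo (189 of 11,000), left-tail test
z_aspirin = pooled_z(104, 11_000, 189, 11_000)
crit_left_5 = NormalDist().inv_cdf(0.05)    # lower 5% critical value
print("12.7 z =", round(z_aspirin, 2), "reject H0:", z_aspirin < crit_left_5)

# Example 12.8: design A (580 of 904) vs. design B (604 of 1046), D = .03, right-tail test
z_soap = unpooled_z(580, 904, 604, 1046, 0.03)
crit_right_1 = NormalDist().inv_cdf(0.99)   # upper 1% critical value
print("12.8 z =", round(z_soap, 2), "reject H0:", z_soap > crit_right_1)

# Example 12.9: 95% interval estimate of p1 - p2 for the aspirin data
ci_low, ci_high = diff_interval(104, 11_000, 189, 11_000)
print("12.9 interval =", round(ci_low, 6), round(ci_high, 6))

# Example 12.6: 95% interval for the variance ratio, with the slides' table values
lcl, ucl = variance_ratio_interval(4102.98, 10669.77, 1.61, 1.72)
print("12.6 LCL, UCL =", round(lcl, 4), round(ucl, 4))
```

Computed from the raw counts without intermediate rounding, the z statistics come out at roughly -5.00 and 1.55, slightly different from the -5.02 and 1.58 shown in the hand calculations. The conclusions are unchanged: reject H0 in Example 12.7, do not reject H0 in Example 12.8.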
DOCUMENT INFO
Stats:
 views: 12 posted: 8/13/2011 language: English pages: 104