
12. Test of Goodness of fit


									           Statistics

            Tests of
Goodness of Fit and Independence

     Contents
 • STATISTICS in PRACTICE
 • Goodness of Fit Test: A Multinomial Population
 • Test of Independence
 • Goodness of Fit Test: Poisson and Normal Distributions

       STATISTICS in PRACTICE
   • United Way of Greater Rochester is a nonprofit organization dedicated
     to improving the quality of life for people. Because of enormous
     volunteer involvement, United Way of Greater Rochester is able to hold
     its operating costs at just eight cents of every dollar raised.
   • One of the statistical tests was used to determine whether
     perceptions of administrative expenses were independent of occupation.
   • In this chapter, you will learn how a statistical test of independence
     is conducted.

   Hypothesis (Goodness of Fit)
   Test for Proportions of a
   Multinomial Population
 • Consider the case in which each element of a
   population is assigned to one and only one
   of several classes or categories.
 • Such a population is a multinomial
   population.
 • We want to test whether the population follows a
   multinomial distribution with specified
   probabilities for each of the k categories.

   Hypothesis Test for Proportions
   of a Multinomial Population--
   Procedures
1. Set up the null and alternative hypotheses.
2. Select a random sample and record the
   observed frequency, fi , for each of the k
   categories.
3. Assuming H0 is true, compute the expected
   frequency, ei , in each category by
   multiplying the category probability by the
   sample size.
4. Compute the value of the test statistic.

        $\chi^2 = \sum_{i=1}^{k} \frac{(f_i - e_i)^2}{e_i}$

   where:
        fi = observed frequency for category i
        ei = expected frequency for category i
         k = number of categories

   Note: The test statistic has a chi-square distribution with (k – 1) df
         provided that the expected frequencies are 5 or more for all
         categories.
5. Rejection rule:
    p-value approach:
       Reject H0 if p-value < α

    Critical value approach:
       Reject H0 if $\chi^2 \ge \chi^2_\alpha$
        where α is the significance level and
         there are k - 1 degrees of freedom

   Example:
      • Consider the market share study being conducted by Scott
        Marketing Research. Over the past year, market shares
        stabilized at 30% for company A, 50% for company B,
        and 20% for company C.
      • Recently company C developed a "new and improved"
        product to replace its current entry in the market.
      • Company C retained Scott Marketing Research to
        determine whether the new product will alter market
        shares.
      • Scott Marketing Research will conduct a sample survey
        and compute the proportion preferring each
        company's product.
      • A hypothesis test will then be conducted to see
        whether the new product caused a change in market
        shares.
      • The null and alternative hypotheses are
           H0: pA = .30, pB = .50, and pC = .20
           Ha: The population proportions are not
               pA = .30, pB = .50, and pC = .20
      • Perform a goodness of fit test that will
        determine whether the sample of 200 customer
        purchase preferences is consistent with the null
        hypothesis.
      • Data (table not shown)

      • Expected: the expected frequency for each
        category is found by multiplying the sample
        size of 200 by the hypothesized proportion for
        the category: 200(.30) = 60, 200(.50) = 100, and 200(.20) = 40.
      • Computation of the Chi-square Test Statistic
        for the Scott Marketing Research Market
        Share Study (table not shown)

   p-Value Approach
      • The test statistic is χ2 = 7.34, and 5.991 < 7.34 <
        7.378. Thus, the corresponding upper tail area,
        or p-value, must be between .05 and .025. With
        p-value < .05, we reject H0.
      • Minitab or Excel can be used to show that χ2 = 7.34
        provides a p-value = .0255.

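The table lookup only brackets the p-value. With 2 degrees of freedom, the chi-square upper-tail area has the closed form exp(-χ2/2), so the software figure of .0255 can be checked directly. A minimal Python sketch (the function name is illustrative and not part of the slides):

```python
import math

def chi2_upper_tail_df2(chi2_stat):
    """Upper-tail area of a chi-square distribution with exactly 2 df."""
    # With 2 df the density is (1/2) * exp(-x/2), so the tail area is exp(-x/2).
    return math.exp(-chi2_stat / 2.0)

p_value = chi2_upper_tail_df2(7.34)
print(round(p_value, 4))   # 0.0255
print(p_value < 0.05)      # True, so H0 is rejected at the .05 level
```

This shortcut applies only when there are exactly 2 degrees of freedom, as in the three-category market share study.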
   Critical Value Approach
      • With α = .05 and 2 degrees of freedom, the
        critical value for the test statistic is $\chi^2_{.05}$ =
        5.991. The upper tail rejection rule becomes
                 Reject H0 if χ2 > 5.991
      • With 7.34 > 5.991, we reject H0.
      • Comparisons of the observed and expected
        frequencies for the other two companies
        indicate that company C's gain in market share
        will hurt company A more than company B.

   Note:
      • The chi-square test is an approximate test,
        and the test result may not be valid when
        the expected value for a category is less
        than 5.
      • If one or more categories have expected
        values less than 5, you can combine them
        with adjacent categories to achieve the
        minimum required expected value.

      Multinomial Distribution
      Goodness of Fit Test
   Example: Finger Lakes Homes (A)
      Finger Lakes Homes manufactures four models of
      prefabricated homes: a two-story colonial, a log
      cabin, a split-level, and an A-frame. To help
      in production planning, management would like to
      determine whether previous customer purchases indicate
      that there is a preference in the style selected.
      The number of homes sold of each model for 100 sales
      over the past two years is shown below.

      Model     Colonial   Log   Split-Level   A-Frame
      # Sold       30       20       35           15

Hypotheses
      H0: pC = pL = pS = pA = .25
      Ha: The population proportions are not
          pC = .25, pL = .25, pS = .25, and pA = .25

where:
 pC = population proportion that purchase a colonial
 pL = population proportion that purchase a log cabin
 pS = population proportion that purchase a split-level
 pA = population proportion that purchase an A-frame
Rejection Rule
    With α = .05 and k - 1 = 4 - 1 = 3 degrees of freedom,
    reject H0 if p-value < .05 or χ2 > 7.815.

    [Figure: chi-square distribution with the "Do Not Reject H0" region
    below 7.815 and the "Reject H0" region above it]

   Expected Frequencies
      e1 = .25(100) = 25    e2 = .25(100) = 25
      e3 = .25(100) = 25    e4 = .25(100) = 25

   Test Statistic
      $\chi^2 = \frac{(30 - 25)^2}{25} + \frac{(20 - 25)^2}{25} + \frac{(35 - 25)^2}{25} + \frac{(15 - 25)^2}{25}$
       = 1 + 1 + 4 + 4
       = 10
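
The expected frequencies and test statistic for the Finger Lakes Homes (A) data can be reproduced with a few lines of Python. This is a sketch only; the helper name `chi_square_gof` is an assumption made for illustration, and the critical value 7.815 is taken from the rejection rule on the slides.

```python
def chi_square_gof(observed, probabilities):
    """Return (expected frequencies, chi-square statistic) for a multinomial test."""
    n = sum(observed)
    expected = [p * n for p in probabilities]
    # The chi-square approximation needs every expected frequency to be at least 5.
    if min(expected) < 5:
        raise ValueError("combine categories: an expected frequency is below 5")
    statistic = sum((f - e) ** 2 / e for f, e in zip(observed, expected))
    return expected, statistic

observed = [30, 20, 35, 15]   # colonial, log, split-level, A-frame
expected, chi2_stat = chi_square_gof(observed, [0.25] * 4)
print(expected)            # [25.0, 25.0, 25.0, 25.0]
print(chi2_stat)           # 10.0
print(chi2_stat > 7.815)   # True: reject H0 at the .05 level (3 df)
```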
   Conclusion Using the p-Value Approach
     Area in Upper Tail     .10     .05    .025     .01    .005
     χ2 Value (df = 3)     6.251   7.815   9.348  11.345  12.838

     Because χ2 = 10 is between 9.348 and 11.345, the
     area in the upper tail of the distribution is
     between .025 and .01.
     The p-value < α = .05, so we can reject the null hypothesis.

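When software such as Minitab or Excel is not at hand, the upper-tail area for an integer number of degrees of freedom can be computed from the standard finite-series expressions for the chi-square distribution. The sketch below uses only the Python standard library; the function name `chi2_upper_tail` is hypothetical, and the code is meant as a check on the table lookup, not as a replacement for a statistics package.

```python
import math

def chi2_upper_tail(x, df):
    """Upper-tail area P(X >= x) for a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum over j < df/2 of (x/2)^j / j!
        term, total = 1.0, 1.0
        for j in range(1, df // 2):
            term *= half / j
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) + sqrt(2x/pi) * exp(-x/2) * sum of x^j / (1*3*...*(2j+1))
    term, total = 1.0, 0.0
    for j in range((df - 1) // 2):
        if j > 0:
            term *= x / (2 * j + 1)
        total += term
    return math.erfc(math.sqrt(half)) + math.sqrt(2 * x / math.pi) * math.exp(-half) * total

p_value = chi2_upper_tail(10.0, 3)
print(round(p_value, 4))   # roughly 0.019, inside the (.01, .025) bracket from the table
```

Evaluating the same function at the tabled critical value 7.815 with 3 df returns an area very close to .05, which is a quick sanity check on the series.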
   Conclusion Using the Critical Value Approach

     χ2 = 10 > 7.815

     We reject, at the .05 level of significance, the
     assumption that there is no home style preference.

  Test of Independence:
  Contingency Tables
 • Another important application of the chi-square
   distribution involves using sample
   data to test for the independence of two
   variables.
 • A contingency table is a table that lists all
   possible combinations of the two variables.

   Example:
      • A test of independence addresses the question of
        whether the beer preference (light, regular, or dark)
        is independent of the gender of the beer drinker
        (male, female).
      • If the independence assumption is valid, the fractions
        of drinkers preferring light, regular, and dark beer
        must be the same for male and female beer drinkers.

1. Set up the null and alternative hypotheses.
2. Select a random sample and record the
   observed frequency, fij, for each cell of the
   contingency table.
3. Compute the expected frequency, eij, for
   each cell.

      $e_{ij} = \frac{(\text{Row } i \text{ Total})(\text{Column } j \text{ Total})}{\text{Sample Size}}$

4. Compute the test statistic.

      $\chi^2 = \sum_{i} \sum_{j} \frac{(f_{ij} - e_{ij})^2}{e_{ij}}$

5. Determine the rejection rule.
     Reject H0 if p-value < α or $\chi^2 \ge \chi^2_\alpha$,
     where α is the significance level and,
     with n rows and m columns, there are
     (n - 1)(m - 1) degrees of freedom.

   Example (beer preference):
      • The hypotheses for this test of independence are:
         H0: Beer preference is independent of the gender
             of the beer drinker
         Ha: Beer preference is not independent of the
             gender of the beer drinker
      • Sample Results (table not shown; the surviving row gives the
        column proportions 1/3, 7/15, and 1/5, which sum to 1)
      • Expected Frequencies
        Each expected frequency is a row total times the corresponding
        column proportion, for example 80 × 1/3 = 26.67.
      • Computation of the Chi-square Test Statistic (table not shown)
      • p-value = .0468. At the .05 level of significance,
        p-value < α = .05, so we reject the null hypothesis.

   Example: Finger Lakes Homes (B)
      Each home sold by Finger Lakes Homes can be
      classified according to price and to style. Finger
      Lakes' manager would like to determine whether the
      price of the home and the style of the home are
      independent variables.
      The number of homes sold for each model
      and price for the past two years is shown below.
      For convenience, the price of the home is listed
      as either $99,000 or less or more than $99,000.

        Price     Colonial   Log   Split-Level   A-Frame
      ≤ $99,000      18       6        19           12
      > $99,000      12      14        16            3

   Hypotheses
     H0: Price of the home is independent of the
         style of the home that is purchased
     Ha: Price of the home is not independent of the
         style of the home that is purchased

   Expected Frequencies
     Observed frequencies with row and column totals:

      Price    Colonial   Log   Split-Level   A-Frame   Total
     ≤ $99K      18        6        19          12       55
     > $99K      12       14        16           3       45
      Total      30       20        35          15      100

     Each expected frequency is (row total)(column total)/100,
     for example e11 = (55)(30)/100 = 16.5:

      Price    Colonial   Log   Split-Level   A-Frame
     ≤ $99K     16.5      11      19.25        8.25
     > $99K     13.5       9      15.75        6.75

Contingency Table
(Independence) Test

   Rejection Rule
      With α = .05 and (2 - 1)(4 - 1) = 3 d.f., $\chi^2_{.05} = 7.815$.
      Reject H0 if p-value < .05 or χ2 > 7.815.

   Test Statistic
      $\chi^2 = \frac{(18 - 16.5)^2}{16.5} + \frac{(6 - 11)^2}{11} + \cdots + \frac{(3 - 6.75)^2}{6.75}$
          = .1364 + 2.2727 + . . . + 2.0833 = 9.149

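The expected-frequency formula and the double sum in the test statistic translate directly into code. The following Python sketch applies them to the Finger Lakes Homes (B) table; the function name `independence_test` is an assumption made for illustration.

```python
def independence_test(table):
    """Return (expected frequencies, chi-square statistic, degrees of freedom)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # e_ij = (row i total)(column j total) / sample size
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    statistic = sum(
        (f - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for f, e in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)
    return expected, statistic, df

homes = [
    [18, 6, 19, 12],    # $99,000 or less
    [12, 14, 16, 3],    # more than $99,000
]
expected, chi2_stat, df = independence_test(homes)
print(expected[0])               # [16.5, 11.0, 19.25, 8.25]
print(round(chi2_stat, 3), df)   # 9.149 3
```

Summing all eight cells gives 9.149, which exceeds the critical value 7.815 for 3 degrees of freedom.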
   Conclusion Using the p-Value Approach
     Area in Upper Tail     .10     .05    .025     .01    .005
     χ2 Value (df = 3)     6.251   7.815   9.348  11.345  12.838

     Because χ2 = 9.149 is between 7.815 and 9.348,
     the area in the upper tail of the distribution is
     between .05 and .025.
     The p-value < α = .05, so we can reject the null
     hypothesis.

   Conclusion Using the Critical Value Approach

     χ2 = 9.149 > 7.815

     We reject, at the .05 level of significance,
     the assumption that the price of the home is
     independent of the style of home that is
     purchased.

    Goodness of Fit Test: Poisson
    Distribution

 • In general, the goodness of fit test can be used
   with any hypothesized probability distribution.
 • Here we will demonstrate the cases of the Poisson
   distribution and the normal distribution.

1. Set up the null and alternative hypotheses.
    H0: Population has a Poisson probability
           distribution
    Ha: Population does not have a Poisson
           distribution
2. Select a random sample and
   a. Record the observed frequency fi for each
      value of the Poisson random variable.
   b. Compute the mean number of occurrences μ.
3. Compute the expected frequency of occurrences ei
   for each value of the Poisson random variable.
4. Compute the value of the test statistic.

      $\chi^2 = \sum_{i=1}^{k} \frac{(f_i - e_i)^2}{e_i}$

     where:
        fi = observed frequency for category i
        ei = expected frequency for category i
         k = number of categories
5. Rejection rule:

    p-value approach:
              Reject H0 if p-value < α

    Critical value approach:
              Reject H0 if $\chi^2 \ge \chi^2_\alpha$
     where α is the significance level and
      there are k - 2 degrees of freedom

     Goodness of Fit Test: Poisson
     Distribution--Example
   • Consider the arrival of customers at
     Dubek's Food Market in Tallahassee,
     Florida.
   • A statistical test is conducted to see
     whether an assumption of a Poisson
     distribution for arrivals is reasonable.
   • Hypotheses are
     H0: The number of customers entering
            the store during 5-minute intervals
            has a Poisson probability distribution
     Ha: The number of customers entering the
            store during 5-minute intervals does
            not have a Poisson distribution
   • The Poisson probability function is

          $f(x) = \frac{\mu^x e^{-\mu}}{x!}$

     where
     -- μ represents the expected number of customers
        arriving per 5-minute period,
     -- x is the random variable indicating the number of
        customers arriving during a 5-minute period, and
     -- f(x) is the probability that x customers will arrive in
        a 5-minute interval.
   • Sample Data (table not shown)
   • Because μ is unknown, it is estimated from the sample:
     an estimate of μ is 640/128 = 5 customers per 5-minute period.
   • Expected frequency (table not shown)
   • Computation of the Chi-square Test
     Statistic (table not shown)
   • p-value = .1403. With p-value > α
     = .05, we cannot reject H0.

   Example: Troy Parking Garage
      In studying the need for an additional
      entrance to a city parking garage, a
      consultant has recommended an analysis
      approach that is applicable only in situations
      where the number of cars entering during a
      specified time period follows a Poisson
      distribution.
      A random sample of 100 one-minute time
      intervals resulted in the arrivals
      listed below. A statistical test must be
      conducted to see if the assumption of a
      Poisson distribution is reasonable.

      # Arrivals   0   1   2   3   4   5   6   7   8   9  10  11  12
      Frequency    0   1   4  10  14  20  12  12   9   8   6   3   1

   Hypotheses
    H0: Number of cars entering the garage during
        a one-minute interval is Poisson distributed
    Ha: Number of cars entering the garage during a
        one-minute interval is not Poisson distributed

   Estimate of Poisson Probability Function
    Total Arrivals = 0(0) + 1(1) + 2(4) + . . . + 12(1) = 600
    Total Time Periods = 100
    Estimate of μ = 600/100 = 6

    Hence,

          $f(x) = \frac{6^x e^{-6}}{x!}$

   Expected Frequencies
     x    f(x)    nf(x)       x       f(x)    nf(x)
     0   .0025      .25       7      .1377    13.77
     1   .0149     1.49       8      .1033    10.33
     2   .0446     4.46       9      .0688     6.88
     3   .0892     8.92      10      .0413     4.13
     4   .1339    13.39      11      .0225     2.25
     5   .1606    16.06      12+     .0201     2.01
     6   .1606    16.06     Total   1.0000   100.00

   Observed and Expected Frequencies
          i         fi    ei     fi - ei
      0 or 1 or 2    5    6.20   -1.20
           3        10    8.92    1.08
           4        14   13.39    0.61
           5        20   16.06    3.94
           6        12   16.06   -4.06
           7        12   13.77   -1.77
           8         9   10.33   -1.33
           9         8    6.88    1.12
      10 or more    10    8.39    1.61

   Rejection Rule
      With α = .05 and k - p - 1 = 9 - 1 - 1 = 7 d.f.
      (where k = number of categories and p = number
      of population parameters estimated), $\chi^2_{.05} = 14.067$.
      Reject H0 if p-value < .05 or χ2 > 14.067.

   Test Statistic
      $\chi^2 = \frac{(-1.20)^2}{6.20} + \frac{(1.08)^2}{8.92} + \cdots + \frac{(1.61)^2}{8.39} = 3.268$
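
The whole Troy Parking Garage calculation (estimating μ, computing Poisson expected frequencies, combining sparse categories, and summing the test statistic) can be sketched in Python as follows. Variable and function names are illustrative assumptions; the category grouping follows the table of observed and expected frequencies above.

```python
import math

def poisson_pmf(x, mu):
    """Poisson probability f(x) = mu^x * exp(-mu) / x!"""
    return mu ** x * math.exp(-mu) / math.factorial(x)

frequencies = [0, 1, 4, 10, 14, 20, 12, 12, 9, 8, 6, 3, 1]   # arrivals 0..12
n = sum(frequencies)                                           # 100 intervals
mu = sum(x * f for x, f in enumerate(frequencies)) / n         # 600/100 = 6

# Expected frequencies for x = 0..11; "12 or more" takes the remaining probability.
probs = [poisson_pmf(x, mu) for x in range(12)]
probs.append(1.0 - sum(probs))
expected = [n * p for p in probs]

# Combine categories so each expected frequency is at least 5:
# {0, 1, 2}, 3, 4, ..., 9, {10 or more}.
groups = [range(0, 3)] + [range(x, x + 1) for x in range(3, 10)] + [range(10, 13)]
obs = [sum(frequencies[x] for x in g) for g in groups]
exp = [sum(expected[x] for x in g) for g in groups]

chi2_stat = sum((f - e) ** 2 / e for f, e in zip(obs, exp))
df = len(groups) - 1 - 1    # k - p - 1, with one estimated parameter (mu)
print(obs)                  # [5, 10, 14, 20, 12, 12, 9, 8, 10]
print(round(chi2_stat, 2), df)
```

Because the code keeps unrounded expected frequencies, its statistic agrees with the 3.268 reported on the slide; plugging in the two-decimal table values instead gives a slightly larger figure, about 3.27.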
   Conclusion Using the p-Value Approach
     Area in Upper Tail    .90     .10     .05    .025     .01
     χ2 Value (df = 7)    2.833  12.017  14.067  16.013  18.475

     Because χ2 = 3.268 is between 2.833 and 12.017 in the
     Chi-Square Distribution Table, the area in the upper tail
     of the distribution is between .90 and .10.
     The p-value > α = .05, so we cannot reject the null hypothesis.
     There is no reason to doubt the assumption of a Poisson
     distribution.

      Goodness of Fit Test: Normal
      Distribution
1. Set up the null and alternative hypotheses.
2. Select a random sample and
       a. Compute the mean and standard deviation.
       b. Define intervals of values so that the expected
          frequency is at least 5 for each interval.
       c. For each interval, record the observed frequencies.
3. Compute the expected frequency, ei, for each interval.
4. Compute the value of the test statistic.

       $\chi^2 = \sum_{i=1}^{k} \frac{(f_i - e_i)^2}{e_i}$

5. Reject H0 if $\chi^2 \ge \chi^2_\alpha$ (where α is the significance
   level and there are k - 3 degrees of freedom).

     Goodness of Fit Test: Normal
     Distribution--Example
   • Chemline Employee Aptitude Scores
   • The test examines the null hypothesis that the
     population of test scores has a normal
     distribution.
   • Sample Data (table not shown)
   • Estimates of μ and σ are x̄ = 68.42 and s = 10.41.
   • Hypotheses
      H0: The population of test scores has a
          normal distribution with mean
          68.42 and standard deviation 10.41.
      Ha: The population of test scores does not
          have a normal distribution with mean
          68.42 and standard deviation 10.41.
   • Expected Frequency
     Normal Distribution with 10 Equal-Probability
     Intervals (The Chemline Example) (figure not shown)
   • Note: 55.10 = 68.42 - 1.28(10.41)

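The boundary in the note, 55.10 = 68.42 - 1.28(10.41), is the 10th percentile of the fitted normal distribution. As a sketch, all nine boundaries of the 10 equal-probability intervals can be generated with Python's standard-library `statistics.NormalDist`; the variable names are illustrative, and the results differ slightly from the slide because the slide uses z-values rounded to two decimals.

```python
from statistics import NormalDist

mean, std_dev = 68.42, 10.41     # sample estimates from the slides
k = 10                           # number of equal-probability intervals
scores = NormalDist(mean, std_dev)

# Boundaries are the 10th, 20th, ..., 90th percentiles of the fitted distribution.
boundaries = [scores.inv_cdf(i / k) for i in range(1, k)]
print([round(b, 2) for b in boundaries])
```

The first value printed is about 55.08, against 55.10 on the slide, and the middle boundary equals the mean, 68.42.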
   • Computation of the Chi-square Test
     Statistic (table not shown)
   • p-value = .4084 > α = .10. We cannot reject H0.

    Normal Distribution
    Goodness of Fit Test

   Example: IQ Computers
      IQ Computers (one better than HP?) manufactures and sells a
      general purpose microcomputer. As part of
      a study to evaluate sales personnel, management
      wants to determine, at a .05 significance level, whether the
      annual sales volume (number of units sold by a
      salesperson) follows a normal probability
      distribution.
      A simple random sample of 30 of the
      salespeople was taken, and their
      numbers of units sold are below.
         33 43 44 45 52 52 56 58 63 64
         64 65 66 68 70 72 73 73 74 75
         83 84 85 86 91 92 94 98 102 105
      (mean = 71, standard deviation = 18.54)

   Hypotheses

    H0: The population of number of units sold
        has a normal distribution with mean 71
        and standard deviation 18.54.

    Ha: The population of number of units sold
        does not have a normal distribution with
        mean 71 and standard deviation 18.54.

   Interval Definition
    To satisfy the requirement of an expected
    frequency of at least 5 in each interval we
    will divide the normal distribution into
    30/5 = 6 equal probability intervals, each with
    area 1.00/6 = .1667. The interval boundaries are
        71 - .97(18.54) = 53.02
        71 - .43(18.54) = 63.03
        71
        71 + .43(18.54) = 78.97
        71 + .97(18.54) = 88.98

   Observed and Expected Frequencies
          i            fi   ei   fi - ei
    Less than 53.02    6     5      1
     53.02 to 63.03    3     5     -2
     63.03 to 71.00    6     5      1
     71.00 to 78.97    5     5      0
     78.97 to 88.98    4     5     -1
    More than 88.98    6     5      1
        Total         30    30

   Rejection Rule
      With α = .05 and k - p - 1 = 6 - 2 - 1 = 3 d.f.
      (where k = number of categories and p = number
      of population parameters estimated),
      reject H0 if p-value < .05 or χ2 > 7.815.

   Test Statistic
      $\chi^2 = \frac{(1)^2}{5} + \frac{(-2)^2}{5} + \frac{(1)^2}{5} + \frac{(0)^2}{5} + \frac{(-1)^2}{5} + \frac{(1)^2}{5} = 1.600$

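The IQ Computers test can be traced end to end from the 30 raw observations. The Python sketch below recomputes the sample mean and standard deviation, builds the interval boundaries with the rounded z-values used on the slides (.43 and .97), counts the observations in each interval, and sums the test statistic. Variable names are illustrative assumptions.

```python
from bisect import bisect_left
from statistics import mean, stdev

units_sold = [33, 43, 44, 45, 52, 52, 56, 58, 63, 64,
              64, 65, 66, 68, 70, 72, 73, 73, 74, 75,
              83, 84, 85, 86, 91, 92, 94, 98, 102, 105]

x_bar = mean(units_sold)     # 71
s = stdev(units_sold)        # about 18.54 (sample standard deviation)

# Boundaries of 6 equal-probability intervals, using the slides' rounded z-values.
z_values = [-0.97, -0.43, 0.0, 0.43, 0.97]
boundaries = [x_bar + z * s for z in z_values]

# Count observations per interval; no observation falls exactly on a boundary here.
observed = [0] * (len(boundaries) + 1)
for x in units_sold:
    observed[bisect_left(boundaries, x)] += 1

expected = len(units_sold) / len(observed)    # 30/6 = 5 per interval
chi2_stat = sum((f - expected) ** 2 / expected for f in observed)
print(observed)              # [6, 3, 6, 5, 4, 6]
print(round(chi2_stat, 3))   # 1.6
```

With 6 - 2 - 1 = 3 degrees of freedom, 1.6 is well below the critical value 7.815, in line with the conclusion that the normality assumption cannot be rejected.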
   Conclusion Using the p-Value Approach
     Area in Upper Tail    .90    .10     .05    .025     .01
     χ2 Value (df = 3)    .584   6.251   7.815   9.348  11.345

     Because χ2 = 1.600 is between .584 and 6.251 in the Chi-Square
     Distribution Table, the area in the upper tail of the
     distribution is between .90 and .10. The p-value > α = .05, so we
     cannot reject the null hypothesis.
     There is little evidence to support rejecting the
     assumption that the population is normally distributed with μ =
     71 and σ = 18.54.