A Broad Overview of Key Statistical Concepts

Linear Lack of Fit (LOF) Test

An F test for checking whether a linear regression function is inadequate in describing the trend in the data
     Where does this topic fit in?
•   Model formulation
•   Model estimation
•   Model evaluation
•   Model use
 Example 1
[Figure: Regression plot of Mortality versus Latitude. Fitted line: Mortality = 389.189 - 5.97764 Latitude; S = 19.1150, R-Sq = 68.0%, R-Sq(adj) = 67.3%]

Do the data suggest that a linear function is inadequate in describing
the relationship between skin cancer mortality and latitude?
Example 2
[Figure: Regression plot of Weight versus Length. Fitted line: Weight = -393.264 + 5.90235 Length; S = 54.0115, R-Sq = 83.6%, R-Sq(adj) = 82.9%]

Do the data suggest that a linear function is inadequate in describing
the relationship between the length and weight of an alligator?
Example 3
[Figure: Regression plot of Weight loss versus Iron content. Fitted line: wgtloss = 129.787 - 24.0199 iron; S = 3.05778, R-Sq = 97.0%, R-Sq(adj) = 96.7%]

Do the data suggest that a linear function is inadequate in describing
the relationship between iron content and weight loss due to corrosion?
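Some notation
[Figure: Scatter plot of Number of new accounts versus Size of minimum deposit, with fitted line $\hat{y} = 50.7 + 0.49x$]

Here $y_{ij}$ is the $j$th observed response at the $i$th predictor level, $\bar{y}_i$ is the mean of the observed responses at that level, and $\hat{y}_{ij}$ is the fitted value from the line. The figure labels two of the levels:
• First level: $y_{11} = 28$, $y_{12} = 42$, $\bar{y}_1 = 35$, $\hat{y}_{11} = \hat{y}_{12} = 87.2$
• Sixth level: $y_{61} = 104$, $y_{62} = 124$, $\bar{y}_6 = 114$, $\hat{y}_{61} = \hat{y}_{62} = 148.1$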
Decomposing the error
[Figure: Scatter plot of Number of new accounts versus Size of minimum deposit, with fitted line $\hat{y} = 50.7 + 0.49x$]

$$\sum_i \sum_j (y_{ij} - \hat{y}_{ij})^2 = 14742$$
$$\sum_i \sum_j (\bar{y}_i - \hat{y}_{ij})^2 = 13594$$
$$\sum_i \sum_j (y_{ij} - \bar{y}_i)^2 = 1148$$
Decomposing the error
[Figure: Scatter plot of Number of new accounts versus Size of minimum deposit for a second data set, with fitted line labelled $\hat{y} = 50.7 + 0.49x$]

$$\sum_i \sum_j (y_{ij} - \hat{y}_{ij})^2 = 45.1$$
$$\sum_i \sum_j (\bar{y}_i - \hat{y}_{ij})^2 = 6.6$$
$$\sum_i \sum_j (y_{ij} - \bar{y}_i)^2 = 38.5$$
             The basic idea
• Break down the residual error (“error sum
  of squares” – SSE) into two components:
  – a component that is due to lack of model fit
    (“lack of fit sum of squares” – SSLF)
  – a component that is due to pure random error
    (“pure error sum of squares” – SSPE)
• If the lack of fit sum of squares is a large
  component of the residual error, it suggests
  that a linear function is inadequate.
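
The split can be checked numerically. What follows is a minimal sketch in plain Python, not the slides' own procedure. The slides do not list the raw data, so the eleven (deposit, accounts) pairs below are an assumption: they agree with the four observations labelled on the notation slide and reproduce the sums of squares 14742, 13594 and 1148 shown on the "Decomposing the error" slide. The function name `lack_of_fit_decomposition` is hypothetical.

```python
from collections import defaultdict

# ASSUMED data (not listed in the slides): size of minimum deposit (x)
# and number of new accounts (y) for 11 bank branches.
x = [75, 75, 100, 100, 125, 125, 150, 175, 175, 200, 200]
y = [28, 42, 112, 136, 160, 150, 152, 156, 124, 124, 104]


def lack_of_fit_decomposition(x, y):
    """Return (SSE, SSLF, SSPE) for a simple linear regression of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n

    # Least-squares estimates of the intercept b0 and slope b1.
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar

    # Group the responses by predictor level to get the group means.
    groups = defaultdict(list)
    for xi, yi in zip(x, y):
        groups[xi].append(yi)
    group_mean = {xi: sum(ys) / len(ys) for xi, ys in groups.items()}

    # Residual error, lack of fit, and pure error sums of squares.
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    sslf = sum((group_mean[xi] - (b0 + b1 * xi)) ** 2 for xi in x)
    sspe = sum((yi - group_mean[xi]) ** 2 for xi, yi in zip(x, y))
    return sse, sslf, sspe


sse, sslf, sspe = lack_of_fit_decomposition(x, y)
print(f"SSE = {sse:.1f}, SSLF = {sslf:.1f}, SSPE = {sspe:.1f}")
```

With these assumed data the lack of fit component makes up most of the residual error, which is the pattern the second bullet describes.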
A geometric decomposition

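[Figure: Scatter plot of Number of new accounts versus Size of minimum deposit, marking an observation $y_{ij}$, the group mean $\bar{y}_i$, and the fitted value $\hat{y}_{ij} = b_0 + b_1 x_{ij}$]

$$(y_{ij} - \hat{y}_{ij}) = (\bar{y}_i - \hat{y}_{ij}) + (y_{ij} - \bar{y}_i)$$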
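The decomposition holds for the sum of the squared deviations, too:

$$\sum_{i=1}^{c}\sum_{j=1}^{n_i} (y_{ij} - \hat{y}_{ij})^2 = \sum_{i=1}^{c}\sum_{j=1}^{n_i} (\bar{y}_i - \hat{y}_{ij})^2 + \sum_{i=1}^{c}\sum_{j=1}^{n_i} (y_{ij} - \bar{y}_i)^2$$

The left-hand side is the error sum of squares (SSE); the first term on the right is the lack of fit sum of squares (SSLF) and the second is the pure error sum of squares (SSPE):

$$SSE = SSLF + SSPE$$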
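
It may help to see why squaring does not leave a cross-product term behind. The derivation below is a sketch that uses only the notation already introduced, plus the observation that the fitted value is the same for every replicate at level $i$ (written here as $\hat{y}_{i1}$, a notational convenience not used in the slides):

```latex
\begin{aligned}
\sum_{i=1}^{c}\sum_{j=1}^{n_i} (y_{ij}-\hat{y}_{ij})^2
 &= \sum_{i}\sum_{j} \left[(\bar{y}_i-\hat{y}_{ij}) + (y_{ij}-\bar{y}_i)\right]^2 \\
 &= \sum_{i}\sum_{j} (\bar{y}_i-\hat{y}_{ij})^2
  + \sum_{i}\sum_{j} (y_{ij}-\bar{y}_i)^2
  + 2\sum_{i} (\bar{y}_i-\hat{y}_{i1}) \sum_{j} (y_{ij}-\bar{y}_i) \\
 &= SSLF + SSPE,
 \qquad \text{because } \sum_{j=1}^{n_i} (y_{ij}-\bar{y}_i) = 0 \text{ for every } i .
\end{aligned}
```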
Breakdown of degrees of freedom

$$(n - 2) = (c - 2) + (n - c)$$

• $n - 2$: degrees of freedom associated with SSE
• $c - 2$: degrees of freedom associated with SSLF
• $n - c$: degrees of freedom associated with SSPE
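
As a quick check of the bookkeeping, the degrees of freedom can be computed from the predictor values alone. The sketch below assumes eleven observations at six distinct deposit levels, one of them unreplicated; this layout is an assumption, chosen to be consistent with the bank example, and the variable names are illustrative.

```python
# ASSUMED layout: 11 observations at 6 distinct predictor levels.
x = [75, 75, 100, 100, 125, 125, 150, 175, 175, 200, 200]

n = len(x)        # number of observations
c = len(set(x))   # number of distinct predictor levels

df_sse = n - 2    # residual error
df_sslf = c - 2   # lack of fit
df_sspe = n - c   # pure error

print(df_sse, df_sslf, df_sspe)  # 9 4 5
assert df_sse == df_sslf + df_sspe
```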
  Definitions of Mean Squares
The lack of fit mean square (MSLF) is defined as:

$$MSLF = \frac{\sum_i \sum_j (\bar{y}_i - \hat{y}_{ij})^2}{c - 2} = \frac{SSLF}{c - 2}$$

And, the pure error mean square (MSPE) is defined as:

$$MSPE = \frac{\sum_i \sum_j (y_{ij} - \bar{y}_i)^2}{n - c} = \frac{SSPE}{n - c}$$
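
A small worked example of the two definitions, using the sums of squares from the first "Decomposing the error" slide. The degrees of freedom (4 and 5) are taken from the corresponding Minitab table; the variable names are illustrative.

```python
# Sums of squares and degrees of freedom as reported on the slides.
sslf, df_lf = 13594, 4   # lack of fit: c - 2
sspe, df_pe = 1148, 5    # pure error:  n - c

mslf = sslf / df_lf
mspe = sspe / df_pe
print(f"MSLF = {mslf:.1f}, MSPE = {mspe:.1f}")  # MSLF = 3398.5, MSPE = 229.6
```

These agree, after rounding, with the MS column of the Minitab output (3398 and 230).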
Expected Mean Squares

$$E(MSLF) = \sigma^2 + \frac{\sum_{i=1}^{c} n_i \left[\mu_i - (\beta_0 + \beta_1 X_i)\right]^2}{c - 2}$$

$$E(MSPE) = \sigma^2$$


• If μi = β0+β1Xi, we’d expect the ratio MSLF/MSPE to be …
• If μi ≠ β0+β1Xi, we’d expect the ratio MSLF/MSPE to be …
• Use the ratio MSLF/MSPE to decide whether or not to reject μi = β0+β1Xi.
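
One way to build intuition for these two cases is a small simulation. The sketch below is illustrative only: the predictor levels, the two mean functions, the error standard deviation and the function names are all assumptions, not part of the slides. It generates many data sets under a linear mean and under a curved mean, and prints the median of MSLF/MSPE for each so the two can be compared.

```python
import random
from statistics import median

LEVELS = [1, 2, 3, 4, 5, 6]   # hypothetical predictor levels
REPS = 2                      # replicates per level
SIGMA = 1.0                   # error standard deviation


def lof_ratio(x, y):
    """Return MSLF / MSPE for a simple linear regression of y on x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    b1 = (sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
          / sum((a - x_bar) ** 2 for a in x))
    b0 = y_bar - b1 * x_bar
    levels = sorted(set(x))
    c = len(levels)
    # Mean response at each predictor level.
    means = {lv: sum(b for a, b in zip(x, y) if a == lv) / x.count(lv)
             for lv in levels}
    sslf = sum((means[a] - (b0 + b1 * a)) ** 2 for a in x)
    sspe = sum((b - means[a]) ** 2 for a, b in zip(x, y))
    return (sslf / (c - 2)) / (sspe / (n - c))


def simulate(mean_fn, n_sims=2000, seed=0):
    """Median MSLF/MSPE over simulated data sets with true mean mean_fn(x)."""
    rng = random.Random(seed)
    x = [lv for lv in LEVELS for _ in range(REPS)]
    ratios = []
    for _ in range(n_sims):
        y = [mean_fn(a) + rng.gauss(0, SIGMA) for a in x]
        ratios.append(lof_ratio(x, y))
    return median(ratios)


linear = simulate(lambda a: 2 + 3 * a)            # mean is linear in X
curved = simulate(lambda a: 2 + 3 * a + a ** 2)   # mean is not linear in X
print(f"median ratio, linear mean: {linear:.2f}")
print(f"median ratio, curved mean: {curved:.2f}")
```

The median is used rather than the mean because the ratio has a heavy right tail when the pure error degrees of freedom are small.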
Expanded Analysis of Variance Table

Source           DF    SS                                                    MS                    F
Regression       1     $SSR = \sum_i \sum_j (\hat{y}_{ij} - \bar{y})^2$      $MSR = SSR/1$         $F = MSR/MSE$
Residual error   n-2   $SSE = \sum_i \sum_j (y_{ij} - \hat{y}_{ij})^2$       $MSE = SSE/(n-2)$
  Lack of fit    c-2   $SSLF = \sum_i \sum_j (\bar{y}_i - \hat{y}_{ij})^2$   $MSLF = SSLF/(c-2)$   $F = MSLF/MSPE$
  Pure error     n-c   $SSPE = \sum_i \sum_j (y_{ij} - \bar{y}_i)^2$         $MSPE = SSPE/(n-c)$
Total            n-1   $SSTO = \sum_i \sum_j (y_{ij} - \bar{y})^2$

All double sums run over $i = 1, \dots, c$ and $j = 1, \dots, n_i$.
         The formal lack of fit F-test
Null hypothesis        H0: μi = β0+β1Xi
Alternative hypothesis HA: μi ≠ β0+β1Xi

Test statistic         $F^* = \frac{MSLF}{MSPE}$

P-value = the probability of getting an F* statistic at least as large as the one observed, if the null hypothesis is true.

The P-value is determined by comparing F* to an F distribution with c-2 numerator degrees of freedom and n-c denominator degrees of freedom.
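
The slides obtain the P-value from Minitab. As a rough standard-library alternative, the upper-tail probability of an F distribution can be written in terms of the regularized incomplete beta function, which the sketch below evaluates with a continued fraction. The function names `_beta_cf`, `reg_inc_beta` and `f_upper_tail` are hypothetical, and the code is a sketch rather than a replacement for a tested statistics library.

```python
from math import exp, lgamma, log


def _beta_cf(a, b, x, max_iter=300, eps=1e-12):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    tiny = 1e-300
    c = 1.0
    d = 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # The two coefficients used in this pass of the recurrence.
        even = m * (b - m) * x / ((a + m2 - 1.0) * (a + m2))
        odd = -(a + m) * (a + b + m) * x / ((a + m2) * (a + m2 + 1.0))
        for coef in (even, odd):
            d = 1.0 + coef * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + coef / c
            c = c if abs(c) > tiny else tiny
            delta = d * c
            h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (lgamma(a + b) - lgamma(a) - lgamma(b)
                 + a * log(x) + b * log(1.0 - x))
    front = exp(log_front)
    # Use the symmetry relation where the continued fraction converges faster.
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b


def f_upper_tail(f_star, df_num, df_den):
    """P(F >= f_star) for an F distribution with (df_num, df_den) d.f."""
    x = df_den / (df_den + df_num * f_star)
    return reg_inc_beta(df_den / 2.0, df_num / 2.0, x)


p_first = f_upper_tail(14.80, 4, 5)   # c - 2 = 4, n - c = 5
p_second = f_upper_tail(0.21, 4, 5)
print(f"{p_first:.3f} {p_second:.3f}")
```

The two calls use the F* values and degrees of freedom from the two bank-example tables; the results should land close to the P-values Minitab reports there (0.006 and 0.919), with small differences because the F* values are rounded.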
         LOF Test in Minitab
• Stat >> Regression >> Regression …
• Specify predictor and response.
• Under Options…
  – under Lack of Fit Tests, select the box labeled
    Pure error.
• Select OK. Select OK.
Decomposing the error
[Figure repeated: Number of new accounts versus Size of minimum deposit, with SSE = 14742, SSLF = 13594, and SSPE = 1148]
         Is there lack of linear fit?
Analysis of Variance

Source           DF      SS     MS      F       P
Regression        1     5141   5141   3.14    0.110
Residual Error    9    14742   1638
  Lack of Fit     4    13594   3398   14.80   0.006
  Pure Error      5     1148    230
Total            10    19883

1 rows with no replicates
Decomposing the error
[Figure repeated: Number of new accounts versus Size of minimum deposit for the second data set, with SSE = 45.1, SSLF = 6.6, and SSPE = 38.5]
         Is there lack of linear fit?
Analysis of Variance

Source           DF        SS       MS       F       P
Regression        1    5448.9   5448.9 1087.06   0.000
Residual Error    9      45.1      5.0
  Lack of Fit     4       6.6      1.7    0.21   0.919
  Pure Error      5      38.5      7.7
Total            10    5494.0

1 rows with no replicates
 Example 1
[Figure repeated: regression plot of Mortality versus Latitude]

Do the data suggest that a linear function is not adequate in describing
the relationship between skin cancer mortality and latitude?
  Example 1: Mortality and Latitude
Analysis of Variance

Source           DF      SS      MS       F       P
Regression        1   36464   36464   99.80   0.000
Residual Error   47   17173     365
  Lack of Fit    30   12863     429    1.69   0.128
  Pure Error     17    4310     254
Total            48   53637

19 rows with no replicates
Example 2
[Figure repeated: regression plot of Weight versus Length]

Do the data suggest that a linear function is not adequate in describing
the relationship between the length and weight of an alligator?
Example 2: Alligator length and weight
Analysis of Variance

Source           DF       SS       MS        F       P
Regression        1   342350   342350   117.35   0.000
Residual Error   23    67096     2917
  Lack of Fit    17    66567     3916    44.36   0.000
  Pure Error      6      530       88
Total            24   409446

14 rows with no replicates
Example 3
[Figure repeated: regression plot of Weight loss versus Iron content]

Do the data suggest that a linear function is not adequate in describing
the relationship between iron content and weight loss due to corrosion?
    Example 3: Iron and corrosion
Analysis of Variance

Source           DF       SS       MS        F       P
Regression        1   3293.8   3293.8   352.27   0.000
Residual Error   11    102.9      9.4
  Lack of Fit     5     91.1     18.2     9.28   0.009
  Pure Error      6     11.8      2.0
Total            12   3396.6

2 rows with no replicates
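
The lack of fit F* values in the three example tables can be re-derived from the printed sums of squares and degrees of freedom. The sketch below does only that arithmetic; the tuple layout and variable names are illustrative. Small differences from the reported F values are expected, because the sums of squares in the tables are rounded.

```python
# (name, SSLF, df_LF, SSPE, df_PE, reported F) as printed in the three tables.
examples = [
    ("Mortality and latitude", 12863, 30, 4310, 17, 1.69),
    ("Alligator length and weight", 66567, 17, 530, 6, 44.36),
    ("Iron and corrosion", 91.1, 5, 11.8, 6, 9.28),
]

recomputed = {}
for name, sslf, df_lf, sspe, df_pe, reported in examples:
    # F* = MSLF / MSPE, with each mean square equal to SS / DF.
    f_star = (sslf / df_lf) / (sspe / df_pe)
    recomputed[name] = f_star
    print(f"{name}: F* = {f_star:.2f} (reported {reported})")
```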
Example 4
[Figure: Scatter plot of groove versus mileage]

Do the data suggest that a linear function is not adequate in describing
the relationship between mileage and groove depth?
        Example 4: Tread wear
Analysis of Variance

Source           DF      SS      MS        F       P
Regression        1   50887   50887   140.71   0.000
Residual Error    7    2532     362
Total             8   53419

No replicates. Cannot do pure error test.
When is it okay to perform the LOF Test?
• When the “INE” part of the “LINE” assumptions is met, that is, when the errors are
  independent, normally distributed, and have equal variances.
• The LOF test requires repeat observations,
  called replicates, for at least one of the
  values of the predictor X.
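
The replicate requirement is easy to check before running the test. A minimal sketch follows, with hypothetical function names and made-up predictor values used purely for illustration:

```python
from collections import Counter


def replicate_summary(x):
    """Count predictor values that are replicated and those that are not."""
    counts = Counter(x)
    replicated = sum(1 for k in counts.values() if k > 1)
    unreplicated = sum(1 for k in counts.values() if k == 1)
    return replicated, unreplicated


def can_run_lof_test(x):
    """True if at least one predictor value has repeat observations."""
    replicated, _ = replicate_summary(x)
    return replicated >= 1


# HYPOTHETICAL predictor values, for illustration only.
with_reps = [75, 75, 100, 100, 125, 125, 150, 175, 175, 200, 200]
no_reps = [0, 4, 8, 12, 16, 20, 24, 28, 32]

print(replicate_summary(with_reps), can_run_lof_test(with_reps))  # (5, 1) True
print(replicate_summary(no_reps), can_run_lof_test(no_reps))      # (0, 9) False
```

The second count in each summary plays the same role as Minitab's "rows with no replicates" note, and the second data set mirrors the tread wear situation, where no pure error estimate is available.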

				
posted:1/31/2012