# A Broad Overview of Key Statistical Concepts

```
Linear Lack of Fit (LOF) Test

An F test for checking whether a linear function is inadequate in
describing the trend in the data
Where does this topic fit in?
•   Model formulation
•   Model estimation
•   Model evaluation
•   Model use
Example 1
Regression Plot
Mortality = 389.189 - 5.97764 Latitude
S = 19.1150     R-Sq = 68.0 %     R-Sq(adj) = 67.3 %

[Figure: scatter plot of Mortality versus Latitude, with the fitted regression line]

Do the data suggest that a linear function is inadequate in describing
the relationship between skin cancer mortality and latitude?
Example 2
Regression Plot
Weight = -393.264 + 5.90235 Length
S = 54.0115   R-Sq = 83.6 %     R-Sq(adj) = 82.9 %

[Figure: scatter plot of Weight versus Length, with the fitted regression line]

Do the data suggest that a linear function is inadequate in describing
the relationship between the length and weight of an alligator?
Example 3
Regression Plot
wgtloss = 129.787 - 24.0199 iron
S = 3.05778    R-Sq = 97.0 %     R-Sq(adj) = 96.7 %

[Figure: scatter plot of Weight loss versus Iron content, with the fitted regression line]

Do the data suggest that a linear function is inadequate in describing
the relationship between iron content and weight loss due to corrosion?
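Some notation

[Figure: Number of new accounts versus Size of minimum deposit, with fitted line ŷ = 50.7 + 0.49x]

y_ij = jth observed response at the ith distinct x value
ȳ_i  = mean of the n_i responses at the ith x value
ŷ_ij = fitted value at the ith x value (the same for every j)
c    = number of distinct x values;  n = Σ_i n_i = total number of observations

Smallest deposit (i = 1):  y_11 = 28,   y_12 = 42,   ȳ_1 = 35,   ŷ_11 = ŷ_12 = 87.2
Largest deposit (i = 6):   y_61 = 104,  y_62 = 124,  ȳ_6 = 114,  ŷ_61 = ŷ_62 = 148.1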
Decomposing the error

[Figure: Number of new accounts versus Size of minimum deposit, with fitted line ŷ = 50.7 + 0.49x]

Σ_i Σ_j (y_ij - ŷ_ij)² = 14742
Σ_i Σ_j (ȳ_i - ŷ_ij)²  = 13594
Σ_i Σ_j (y_ij - ȳ_i)²  = 1148
Decomposing the error

[Figure: Number of new accounts versus Size of minimum deposit for a second data set, with fitted line ŷ = 50.7 + 0.49x]

Σ_i Σ_j (y_ij - ŷ_ij)² = 45.1
Σ_i Σ_j (ȳ_i - ŷ_ij)²  = 6.6
Σ_i Σ_j (y_ij - ȳ_i)²  = 38.5
The basic idea
• Break down the residual error (“error sum
of squares” – SSE) into two components:
– a component that is due to lack of model fit
(“lack of fit sum of squares” – SSLF)
– a component that is due to pure random error
(“pure error sum of squares” – SSPE)
• If the lack of fit sum of squares is a large
component of the residual error, it suggests
that a linear function is inadequate.
A geometric decomposition

[Figure: Number of new accounts versus Size of minimum deposit, marking an observation y_ij, its group mean ȳ_i, and its fitted value ŷ_ij on the line ŷ_ij = b0 + b1 x_ij]

(y_ij - ŷ_ij) = (ȳ_i - ŷ_ij) + (y_ij - ȳ_i)

The decomposition holds for the sum of the squared deviations, too:

Σ_{i=1..c} Σ_{j=1..n_i} (y_ij - ŷ_ij)²
    = Σ_{i=1..c} Σ_{j=1..n_i} (ȳ_i - ŷ_ij)² + Σ_{i=1..c} Σ_{j=1..n_i} (y_ij - ȳ_i)²

Error sum of squares (SSE) = Lack of fit sum of squares (SSLF)
                             + Pure error sum of squares (SSPE)

SSE = SSLF + SSPE

The cross-product term vanishes because ŷ_ij is the same for every j
within group i, and the deviations y_ij - ȳ_i sum to zero within each group.
Breakdown of degrees of freedom
Degrees of freedom associated with SSE:  n - 2
Degrees of freedom associated with SSLF: c - 2
Degrees of freedom associated with SSPE: n - c

n - 2 = (c - 2) + (n - c)
Definitions of Mean Squares
The lack of fit mean square (MSLF) is defined as:

MSLF = SSLF / (c - 2) = Σ_i Σ_j (ȳ_i - ŷ_ij)² / (c - 2)

And the pure error mean square (MSPE) is defined as:

MSPE = SSPE / (n - c) = Σ_i Σ_j (y_ij - ȳ_i)² / (n - c)
Expected Mean Squares

E(MSLF) = σ² + Σ_i n_i [μ_i - (β0 + β1X_i)]² / (c - 2)

E(MSPE) = σ²

• If μi = β0+β1Xi, we’d expect the ratio MSLF/MSPE to be …
• If μi ≠ β0+β1Xi, we’d expect the ratio MSLF/MSPE to be …
• Use the ratio MSLF/MSPE to test whether μi = β0+β1Xi.
Expanded Analysis of Variance Table

Source           DF    SS                          MS                  F
Regression       1     SSR  = ΣΣ (ŷ_ij - ȳ)²       MSR  = SSR/1        F* = MSR/MSE
Residual error   n-2   SSE  = ΣΣ (y_ij - ŷ_ij)²    MSE  = SSE/(n-2)
  Lack of fit    c-2   SSLF = ΣΣ (ȳ_i - ŷ_ij)²     MSLF = SSLF/(c-2)   F* = MSLF/MSPE
  Pure error     n-c   SSPE = ΣΣ (y_ij - ȳ_i)²     MSPE = SSPE/(n-c)
Total            n-1   SSTO = ΣΣ (y_ij - ȳ)²

ΣΣ stands for Σ_{i=1..c} Σ_{j=1..n_i}; ȳ with no subscript is the mean of all n responses.
The formal lack of fit F-test
Null hypothesis        H0: μi = β0+β1Xi
Alternative hypothesis HA: μi ≠ β0+β1Xi

Test statistic    F* = MSLF/MSPE

P-value = the probability of getting an F* statistic at least as large
as the one observed, if the null hypothesis is true.

The P-value is determined by comparing F* to an F distribution
with c-2 numerator degrees of freedom and n-c denominator
degrees of freedom.
LOF Test in Minitab
• Stat >> Regression >> Regression …
• Specify predictor and response.
• Under Options…
– under Lack of Fit Tests, select the box labeled
Pure error.
• Select OK. Select OK.
Is there lack of linear fit?
Analysis of Variance

Source            DF     SS     MS      F      P
Regression         1   5141   5141   3.14  0.110
Residual Error     9  14742   1638
  Lack of Fit      4  13594   3398  14.80  0.006
  Pure Error       5   1148    230
Total             10  19883

1 rows with no replicates
Is there lack of linear fit?
Analysis of Variance

Source            DF      SS      MS        F      P
Regression         1  5448.9  5448.9  1087.06  0.000
Residual Error     9    45.1     5.0
  Lack of Fit      4     6.6     1.7     0.21  0.919
  Pure Error       5    38.5     7.7
Total             10  5494.0

1 rows with no replicates
Example 1: Mortality and Latitude
Analysis of Variance

Source            DF     SS     MS      F      P
Regression         1  36464  36464  99.80  0.000
Residual Error    47  17173    365
  Lack of Fit     30  12863    429   1.69  0.128
  Pure Error      17   4310    254
Total             48  53637

19 rows with no replicates
Example 2: Alligator length and weight
Analysis of Variance

Source            DF      SS      MS       F      P
Regression         1  342350  342350  117.35  0.000
Residual Error    23   67096    2917
  Lack of Fit     17   66567    3916   44.36  0.000
  Pure Error       6     530      88
Total             24  409446

14 rows with no replicates
Example 3: Iron and corrosion
Analysis of Variance

Source            DF      SS      MS       F      P
Regression         1  3293.8  3293.8  352.27  0.000
Residual Error    11   102.9     9.4
  Lack of Fit      5    91.1    18.2    9.28  0.009
  Pure Error       6    11.8     2.0
Total             12  3396.6

2 rows with no replicates
Example 4

[Figure: scatter plot of groove depth versus mileage]

Do the data suggest that a linear function is not adequate in describing
the relationship between mileage and groove depth?
Analysis of Variance

Source            DF     SS     MS       F      P
Regression         1  50887  50887  140.71  0.000
Residual Error     7   2532    362
Total              8  53419

No replicates. Cannot do pure error test.
When is it okay to perform the LOF Test?
• When the “INE” part of the “LINE” assumptions holds, that is,
the errors are independent, normally distributed, and have
equal variances.
• The LOF test requires repeat observations, called replicates,
for at least one of the values of the predictor X.

```
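The decomposition on the "Decomposing the error" and "A geometric decomposition" slides can be checked numerically. The Python sketch below is a minimal illustration under stated assumptions, not Minitab's implementation. The function name `lof_decomposition` is hypothetical, the line is fitted by ordinary least squares, responses are grouped by exact equality of their x values, and the demo data are made up.

```python
from collections import defaultdict


def lof_decomposition(x, y):
    """Split SSE from a simple linear fit into SSLF + SSPE.

    Hypothetical helper: x and y are equal-length sequences of numbers,
    and responses that share an exactly equal x value form one group i.
    """
    n = len(x)
    if n != len(y):
        raise ValueError("x and y must have the same length")

    # Group the responses by distinct x value: groups[x_i] = [y_i1, y_i2, ...].
    groups = defaultdict(list)
    for xi, yi in zip(x, y):
        groups[xi].append(yi)
    c = len(groups)
    if c < 3:
        raise ValueError("need at least 3 distinct x values (c - 2 >= 1)")
    if n == c:
        raise ValueError("no replicates: pure error cannot be estimated")

    # Ordinary least squares estimates b0, b1 for y = b0 + b1 x.
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar

    sse = sslf = sspe = 0.0
    for xi, ys in groups.items():
        y_hat = b0 + b1 * xi          # fitted value, identical for every j in group i
        y_bar_i = sum(ys) / len(ys)   # mean of the responses at x_i
        for y_ij in ys:
            sse += (y_ij - y_hat) ** 2       # residual
            sslf += (y_bar_i - y_hat) ** 2   # lack-of-fit piece
            sspe += (y_ij - y_bar_i) ** 2    # pure-error piece

    return {"b0": b0, "b1": b1, "n": n, "c": c,
            "SSE": sse, "SSLF": sslf, "SSPE": sspe,
            "df_LF": c - 2, "df_PE": n - c}


if __name__ == "__main__":
    # Made-up data: replicates at x = 1, 2, 3 and a single point at x = 4.
    res = lof_decomposition([1, 1, 2, 2, 3, 3, 4],
                            [2.0, 3.0, 5.0, 4.0, 9.0, 10.0, 16.0])
    print(res["SSE"], res["SSLF"] + res["SSPE"])  # equal up to rounding
```

Because ŷ_ij is constant within each group, the cross-product term cancels and the printed SSE matches SSLF + SSPE up to floating-point rounding. The early `ValueError` checks play the same role as Minitab's "No replicates. Cannot do pure error test." message in Example 4.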
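The formal lack of fit F-test turns SSLF and SSPE into F* = MSLF/MSPE and compares it with an F distribution with c-2 and n-c degrees of freedom. This standard-library Python sketch is an illustration rather than Minitab's algorithm. Instead of calling a statistics package, it evaluates the F upper tail through the regularized incomplete beta function, using the standard continued-fraction expansion with the modified Lentz method. All function names are hypothetical. The demo feeds in the rounded sums of squares from the first "Is there lack of linear fit?" table, with n = 11 and c = 6 read off its DF column (Total 10, Lack of Fit 4).

```python
from math import exp, lgamma, log

_TINY = 1e-300  # guards against division by zero in the Lentz iteration


def _beta_cf(a, b, x, max_iter=300, eps=1e-14):
    """Continued fraction for I_x(a, b), evaluated with the modified Lentz method."""
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > _TINY else _TINY)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even and odd coefficients of the m-th stage of the fraction.
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > _TINY else _TINY)
            c = 1.0 + aa / c
            c = c if abs(c) > _TINY else _TINY
            delta = d * c
            h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b) for 0 <= x <= 1."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = exp(lgamma(a + b) - lgamma(a) - lgamma(b)
                + a * log(x) + b * log(1.0 - x))
    # The fraction converges fast below this point; otherwise use symmetry.
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b


def f_upper_tail(f_stat, df1, df2):
    """P(F > f_stat) for F ~ F(df1, df2), equal to I_z(df2/2, df1/2), z = df2/(df2 + df1 f)."""
    if f_stat <= 0.0:
        return 1.0
    return reg_inc_beta(df2 / 2.0, df1 / 2.0, df2 / (df2 + df1 * f_stat))


def lof_f_test(sslf, sspe, n, c):
    """Lack of fit F-test: returns (MSLF, MSPE, F*, P-value)."""
    df_lf, df_pe = c - 2, n - c
    if df_lf < 1 or df_pe < 1:
        raise ValueError("need c >= 3 distinct x values and at least one replicate")
    mslf = sslf / df_lf
    mspe = sspe / df_pe
    f_star = mslf / mspe
    return mslf, mspe, f_star, f_upper_tail(f_star, df_lf, df_pe)


if __name__ == "__main__":
    # Rounded SSLF and SSPE from the first "Is there lack of linear fit?" table.
    mslf, mspe, f_star, p = lof_f_test(13594, 1148, n=11, c=6)
    print(f"MSLF={mslf:.1f} MSPE={mspe:.1f} F*={f_star:.2f} P={p:.3f}")
    # prints: MSLF=3398.5 MSPE=229.6 F*=14.80 P=0.006
```

With these inputs the result agrees with the Minitab row (F = 14.80, P = 0.006) up to rounding of the printed sums of squares.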
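Minitab notes such as "1 rows with no replicates" and "No replicates. Cannot do pure error test." reflect the requirement on the last slide. The hypothetical helpers below count, for a list of predictor values, the distinct levels c, the rows whose x value occurs only once, and the pure error degrees of freedom n - c. They also require c >= 3, because the lack of fit degrees of freedom c - 2 must be positive; that check is an addition for illustration. The demo x values are made up to have the same shape as the new-accounts data (n = 11, c = 6, one row with no replicates). They are not taken from the slides.

```python
from collections import Counter


def replicate_summary(x):
    """Return (c, singleton_rows, df_pure_error) for predictor values x.

    c is the number of distinct x values, singleton_rows counts observations
    whose x value appears only once, and df_pure_error is n - c.
    """
    counts = Counter(x)
    n, c = len(x), len(counts)
    singleton_rows = sum(1 for k in counts.values() if k == 1)
    return c, singleton_rows, n - c


def can_do_pure_error_test(x):
    """True when at least one x value is replicated (n - c >= 1) and
    there are at least 3 distinct x values (so c - 2 >= 1)."""
    c, _, df_pure_error = replicate_summary(x)
    return df_pure_error >= 1 and c >= 3


if __name__ == "__main__":
    # Made-up deposit sizes: five levels observed twice, one level observed once.
    x = [75, 75, 100, 100, 125, 125, 150, 150, 175, 200, 200]
    c, singles, df_pe = replicate_summary(x)
    print(f"c = {c}, {singles} rows with no replicates, pure error DF = {df_pe}")
    print(can_do_pure_error_test(x))                                  # True
    print(can_do_pure_error_test([0, 4, 8, 12, 16, 20, 24, 28, 32]))  # False
```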