# Weighted least squares
Heteroscedasticity that is related to the level of the target (y) variable can be handled
using variance–stabilizing transformations. Sometimes, nonconstant variance is driven by
other characteristics in the data. In that case, weighted least squares is the appropriate
solution.
The idea behind weighted least squares (WLS) is that least squares is still a good
thing to do if the target and predicting variables are transformed to give a model with
errors with constant variance. Say $V(\varepsilon_i) = \sigma_i^2$. To keep things simple, consider simple
regression. The regression model is

$$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i.$$

If we divide both sides of this equation by σi we get

$$\frac{y_i}{\sigma_i} = \beta_0\,\frac{1}{\sigma_i} + \beta_1\,\frac{x_i}{\sigma_i} + \frac{\varepsilon_i}{\sigma_i}.$$

This can be rewritten
$$y_i^* = \beta_0 z_{1i} + \beta_1 z_{2i} + \delta_i,$$
where $y_i^*$, $z_{1i}$, $z_{2i}$, and $\delta_i$ are the obvious substitutions from the previous equation and
$V(\delta_i) = 1$ for all $i$. Thus, ordinary least squares (OLS) estimation (without an intercept
term) of $y^*$ on $z_1$ and $z_2$ gives fully efficient estimates of $\beta_0$ and $\beta_1$. Note that using a
constant multiple of $\sigma_i$ works just as well, since the only requirement is that $V(\delta_i)$ be
constant for all $i$.
The value $WT_i = 1/\sigma_i^2$ is the $i$th value of the weighting variable. Ordinary least squares
is a special case of WLS with $WT_i = 1$ for all $i$ (and, in fact, most regression packages only
include code for WLS, with OLS the default special case). The problem is that $\sigma_i^2$ is
unknown, and must be estimated.
It is important to recognize the reasons behind the use of weighted least squares. The
goal is not to improve measures of fit like $R^2$ or $F$; rather, the goal is to analyze the
data in an appropriate fashion. There are several advantages to addressing nonconstant
variance:

© 2009, Jeffrey S. Simonoff
(1) The estimates of β are more efficient. That is, on average, the WLS estimates should
be closer to the true regression coefficients than the OLS estimates are.
(2) Predictions are more sensible. If the underlying variability of a certain type of observation is larger than that for another type of observation, the prediction interval should reflect that. This is not done under OLS, but it is under WLS.
Weighted least squares can be formulated more generally using matrix notation, since
it is just a special case of generalized least squares. The model is

y = Xβ + ε,

with $E(\varepsilon) = 0$ and $\mathrm{Var}(\varepsilon) = V\sigma^2$, where $V$ is the $n \times n$ matrix with $\sigma_i^2/\sigma^2$ on the diagonal
and zeroes elsewhere. Matrix manipulations are then as follows:

$$V^{-1/2} y = V^{-1/2} X\beta + V^{-1/2}\varepsilon \quad \Leftrightarrow \quad y^* = Z\beta + \delta.$$

Thus,
$$\hat{\beta} = (Z'Z)^{-1} Z' y^* = (X' V^{-1} X)^{-1} X' V^{-1} y.$$
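The equivalence above can be checked numerically. Here is an illustrative Python sketch (made-up data, not from the notes): the WLS estimates $\hat\beta = (X'V^{-1}X)^{-1}X'V^{-1}y$ coincide with no-intercept OLS on the transformed variables $y^* = y/\sigma$, $z_1 = 1/\sigma$, $z_2 = x/\sigma$.

```python
# Sketch: WLS for simple regression computed two ways, which must agree.
import math

def wls_simple(x, y, w):
    """WLS estimates (b0, b1) with weights w_i = 1/sigma_i^2."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    b1 = sxy / sxx
    return ybar - b1 * xbar, b1

def ols_transformed(x, y, w):
    """No-intercept OLS of y_i/sigma_i on z1 = 1/sigma_i and z2 = x_i/sigma_i."""
    s = [1 / math.sqrt(wi) for wi in w]              # sigma_i = 1/sqrt(w_i)
    ys = [yi / si for yi, si in zip(y, s)]
    z1 = [1 / si for si in s]
    z2 = [xi / si for xi, si in zip(x, s)]
    # solve the 2x2 normal equations (Z'Z) b = Z'y* by Cramer's rule
    a11 = sum(v * v for v in z1)
    a12 = sum(u * v for u, v in zip(z1, z2))
    a22 = sum(v * v for v in z2)
    c1 = sum(u * v for u, v in zip(z1, ys))
    c2 = sum(u * v for u, v in zip(z2, ys))
    det = a11 * a22 - a12 * a12
    return (a22 * c1 - a12 * c2) / det, (a11 * c2 - a12 * c1) / det

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.1, 2.3, 2.8, 4.5, 4.9]
w = [4.0, 2.0, 1.0, 0.5, 0.25]   # hypothetical weights 1/sigma_i^2
print(wls_simple(x, y, w))
print(ols_transformed(x, y, w))  # identical, up to rounding
```

The transformed-variable version is exactly the "divide both sides by $\sigma_i$" argument made above, so the two routines minimize the same weighted sum of squares.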

Addressing nonconstant variance — practical issues

Consider a regression analysis based on regressing y on x. Examination of residual
plots (including plots of residuals versus each of the predictors xj ) suggests potential
nonconstant variance (heteroscedasticity). What should you do? The following steps
summarize a systematic approach to the problem. As we go along, we’ll examine a real
data set. The data have as a target variable the price of a 2003 model automobile, while the
predicting variables are the horsepower of the auto’s engine and the number of cylinders
in the engine (source: Consumer Reports, April 2003; the values given refer to the top–of–
the–line model).
A scatter plot of the variables immediately suggests heteroscedasticity. Since the
target variable here is long right–tailed and represents dollars, a reasonable correction is a
variance–stabilizing transformation (taking logs, or a semilog model).

The logged plot looks much better, but there are still signs of nonconstant variance,
particularly in the plot on horsepower.

Here are regression results.

Regression Analysis: Logged price versus Horsepower, Cylinders

The regression equation is
Logged price = 3.91 + 0.00184 Horsepower + 0.0218 Cylinders

Predictor           Coef         SE Coef             T         P         VIF
Constant         3.90626         0.03955         98.76     0.000
Horsepow       0.0018361       0.0001590         11.55     0.000         1.8
Cylinder        0.021834        0.008750          2.50     0.015         1.8

S = 0.08007         R-Sq = 79.7%           R-Sq(adj) = 79.2%

Analysis of Variance

Source                DF             SS             MS          F          P
Regression             2         2.0616         1.0308     160.77      0.000
Residual Error        82         0.5258         0.0064
Total                 84         2.5874

The regression fits well ($R^2$ = 79.7%), and each variable is highly statistically significant. Here are residual plots:

These plots constitute step 1 in the analysis of potential heteroscedasticity:

(1) Save the standardized residuals from the regression (SRES, say). Plot them
versus any variables that make sense.

In this case there appears to be heteroscedasticity related to each of the variables.
Note, by the way, that there are potential outliers in the data (standardized residuals
around ±3), but they might not really be outliers — they might be far from the regression
line because of higher variability oﬀ the line.

(2) Now, create two new variables: the absolute values of the standardized
residuals (ABSRES, say), and the natural logarithm of the squares of the
standardized residuals (LGSRESSQ, say). The latter variable can be formed
in MINITAB using the transformation Let ’LGSRESSQ’ = LN(SRES*SRES).

A simple, but reasonably eﬀective, test for nonconstant variance is Levene’s test.

(3) Regress ABSRES on any variables you have that might be related to the non-
constant variance. This could include the xj ’s, as well as other variables;
indicator (0/1) variables are particularly appropriate if nonconstant vari-
ance is linked to group membership. You should avoid including variables

that are not related to the nonconstant variance if possible, since their
inclusion makes it harder to detect nonconstant variance if it exists. A
signiﬁcant F –statistic for Levene’s test suggests the presence of noncon-
stant variance. Minitab includes a version of Levene’s test (under ANOVA / Test
for Equal Variances) if the predictors being used are only categorical; we’ll talk more
about that when we discuss analysis of variance models. It’s a good idea to look
at scatter plots of the absolute residuals versus the Levene’s test predic-
tors, to make sure that you’re using the right predictors, and that unusual
observations aren’t unduly aﬀecting the test (plots using the square root
of the absolute residuals on the vertical axis are sometimes easier to inter-
pret). DON’T include the target (y) variable as a predictor in the Levene’s
test.
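The regression version of this check in step (3) can be sketched in Python. This is an illustrative sketch with made-up data (not the auto price data): regress the absolute standardized residuals on a single candidate predictor and examine the regression F-statistic.

```python
# Levene-type check: does the spread of the residuals change with a predictor?

def levene_regression(x, absres):
    """Simple OLS of |standardized residual| on x; returns (slope, F)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(absres) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, absres))
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    sse = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, absres))
    sst = sum((yi - ybar) ** 2 for yi in absres)
    f_stat = (sst - sse) / (sse / (n - 2))   # regression F with 1 and n-2 df
    return b1, f_stat

hp = [100, 120, 150, 180, 220, 260, 300]        # hypothetical horsepower values
absres = [0.3, 0.5, 0.4, 0.8, 0.9, 1.3, 1.2]    # hypothetical |standardized residuals|
slope, f = levene_regression(hp, absres)
# a large F (relative to an F(1, n-2) critical value) suggests nonconstant variance
```

A positive slope with a significant F says the residuals fan out as the predictor grows, which is exactly the pattern the scatter plots are meant to reveal.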
Here are scatter plots of the absolute standardized residuals versus the two possible
heteroscedasticity predictors. They show that the absolute residuals are larger on aver-
age (more variability oﬀ the regression line) for higher horsepower values, although the
relationship with the number of cylinders is less clear.

Here is the Levene’s test output:

Regression Analysis: absres versus Horsepower, Cylinders

The regression equation is
absres = 0.502 + 0.00256 Horsepower - 0.0529 Cylinders

Predictor          Coef        SE Coef            T         P       VIF
Constant         0.5015         0.2977         1.68     0.096
Horsepow       0.002560       0.001197         2.14     0.035       1.8
Cylinder       -0.05294        0.06587        -0.80     0.424       1.8

S = 0.6028         R-Sq = 6.0%           R-Sq(adj) = 3.8%

Analysis of Variance

Source               DF            SS             MS           F       P
Regression            2        1.9174         0.9587        2.64   0.078
Residual Error       82       29.7957         0.3634
Total                84       31.7130

The test is signiﬁcant at a .10 level; in particular, it seems that horsepower alone
accounts for the heteroscedasticity (p = .035):

Regression Analysis: absres versus Horsepower

The regression equation is
absres = 0.341 + 0.00192 Horsepower

Predictor           Coef         SE Coef               T              P
Constant          0.3413          0.2207            1.55          0.126
Horsepow       0.0019169       0.0008889            2.16          0.034

S = 0.6015           R-Sq = 5.3%           R-Sq(adj) = 4.2%

Analysis of Variance

Source                 DF            SS             MS                  F       P
Regression              1        1.6827         1.6827               4.65   0.034
Residual Error         83       30.0304         0.3618
Total                  84       31.7130

Say nonconstant variance is indicated (this could be from the plots, or the Levene’s
test, or both; don’t refer only to the formal test, as it is not guaranteed to always have
enough power to identify nonconstant variance). The variances need to be estimated. An
exponential/linear model for them is often used, whose parameters can be estimated from
the data. The model is
$$\mathrm{var}(\varepsilon_i) = \sigma_i^2 = \sigma^2 \exp\Big(\sum_j \lambda_j z_{ij}\Big), \qquad (1)$$
where $z_{ij}$ is the value of the $j$th variance predictor for the $i$th case (these would presumably
be the predictors that were used in part (3) for the Levene's test).
(4) The coeﬃcient (λj ) values can be estimated using regression. Perform
a regression of LGSRESSQ on the variance predictor variables, and record
the ﬁtted regression coeﬃcients (don’t worry about measures of ﬁt for this
regression). DON’T include the target (y) variable as a potential predictor
here. Save the ﬁtted values from this regression.

Here are the results for the auto price data, using horsepower as the predictor:

Regression Analysis: lgsressq versus Horsepower

The regression equation is
lgsressq = - 2.50 + 0.00548 Horsepower

Predictor          Coef        SE Coef             T          P
Constant        -2.5020         0.7204         -3.47      0.001
Horsepow       0.005477       0.002901          1.89      0.063

S = 1.963          R-Sq = 4.1%           R-Sq(adj) = 3.0%

Analysis of Variance

Source               DF            SS             MS           F          P
Regression            1        13.734         13.734        3.56      0.063
Residual Error       83       319.915          3.854
Total                84       333.649

(5) Create a weight variable for use in the weighted least squares analysis.
The weights are estimates of the inverse of the variance of the errors for
each observation. They have the form WT = 1/ exp(FITS1), where FITS1 is
the variable with ﬁtted values from the regression in step 4.
So, for the auto price data, the weight variable has the form

WT = 1/ exp(−2.50 + .0055 ∗ Horsepower),

based on the regression output.
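Steps (4) and (5) can be sketched in Python, using the coefficients estimated above (lgsressq = -2.50 + 0.00548 Horsepower); the function name is just illustrative.

```python
# Turn the fitted log-variance regression into WLS weights:
# WT = 1/exp(fitted value), an estimate of 1/sigma_i^2.
import math

b0, b1 = -2.50, 0.00548   # fitted log-variance coefficients from step (4)

def weight(horsepower):
    """Estimated weight for a car with the given horsepower."""
    return 1.0 / math.exp(b0 + b1 * horsepower)

# higher-horsepower cars are noisier off the line, so they get smaller weights
print(round(weight(100), 2), round(weight(300), 2))
```

Each observation's weight is then passed to the WLS fit in step (6).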
(6) Perform a weighted least squares regression, specifying WT as the weighting
variable (click on Options, and enter WT under Weights:). You should redo
steps 1–3 to make sure that the nonconstant variance has been corrected.
Here is the regression output for the WLS regression:

Regression Analysis: Logged price versus Horsepower, Cylinders

Weighted analysis using weights in wt

The regression equation is
Logged price = 3.87 + 0.00205 Horsepower + 0.0202 Cylinders

Predictor         Coef      SE Coef             T        P     VIF
Constant       3.86649      0.03763        102.74    0.000
Horsepow     0.0020550    0.0001834         11.20    0.000     1.8
Cylinder      0.020231     0.008575          2.36    0.021     1.8

S = 0.1412        R-Sq = 78.3%        R-Sq(adj) = 77.7%

Analysis of Variance

Source              DF          SS             MS         F       P
Regression           2      5.8836         2.9418    147.60   0.000
Residual Error      82      1.6343         0.0199
Total               84      7.5179

The nonconstant variance has apparently been addressed. This is supported by the
Levene’s test (note that you should redo the Levene’s test based on the standardized
residuals after doing WLS to check this, and remember that the Levene’s test regression
does not use weights):

Regression Analysis: absres versus Horsepower, Cylinders

The regression equation is
absres = 0.902 + 0.00135 Horsepower - 0.0717 Cylinders

Predictor          Coef        SE Coef            T         P         VIF
Constant         0.9025         0.3010         3.00     0.004
Horsepow       0.001349       0.001210         1.11     0.268         1.8
Cylinder       -0.07170        0.06660        -1.08     0.285         1.8

S = 0.6095           R-Sq = 1.7%         R-Sq(adj) = 0.0%

Analysis of Variance

Source                 DF          SS             MS           F        P
Regression              2      0.5355         0.2677        0.72    0.489
Residual Error         82     30.4600         0.3715
Total                  84     30.9955

There is apparently an outlier (the Saab 9-3, an expensive car with moderate horsepower and only 4 cylinders), but omitting it changes little. Overall, given the number of cylinders, larger-engined cars cost more (each additional horsepower multiplying the expected price by $10^{0.00206} = 1.005$, or a .5% increase), and given horsepower, cars with more cylinders cost more (two additional cylinders multiply the expected price by $10^{0.0404} = 1.098$, or a 9.8% increase).
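These multiplicative effects are just base-10 antilogs of the fitted WLS coefficients, and can be checked directly:

```python
# Multiplicative interpretation of the semilog (base-10) model coefficients.
hp_effect = 10 ** 0.0020550         # one additional horsepower
cyl_effect = 10 ** (2 * 0.020231)   # two additional cylinders
print(round(hp_effect, 3))   # 1.005, a .5% price increase
print(round(cyl_effect, 3))  # 1.098, a 9.8% price increase
```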
The $R^2$ value given by Minitab is not
$$1 - \frac{\text{Residual SS}}{\text{Total SS}}$$
in this case, because of the weighting. In fact, there is no single universally accepted
deﬁnition of R2 for weighted least squares ﬁtting. Proposed measures use relationships
between $R^2$ and other statistics in OLS. One such measure is based on the closeness of the
observed and fitted $y$ values,
$$R^2_{w1} = \mathrm{corr}(\text{Fitted values}, \text{Target variable values})^2.$$

So, the ﬁtted values from the WLS should be saved, and then the correlation coeﬃcient
calculated:

Correlations (Pearson)

Correlation of Logged price and FITS1 = 0.892

Thus, $R^2_{w1} = .892^2 = .796$, which can be compared to the OLS value of .797. Another
common measure (the one Minitab reports) uses the connection between $F$ and $R^2$ in
ordinary least squares regression,
$$R^2_{w2} = \frac{pF}{pF + n - p - 1}.$$
Thus, the $R^2$ for the WLS fit above is
$$R^2_{w2} = \frac{(2)(147.6)}{(2)(147.6) + 82} = .783.$$
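Both weighted $R^2$ measures are one-line computations from the WLS output:

```python
# The two weighted-R^2 measures, from the WLS output above.
r_w1 = 0.892 ** 2                    # corr(fitted, target)^2
p, F, n = 2, 147.6, 85
r_w2 = p * F / (p * F + n - p - 1)   # pF / (pF + n - p - 1)
print(round(r_w1, 3))  # 0.796
print(round(r_w2, 3))  # 0.783
```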

The two measures differ in that $R^2_{w2}$ downweights the effect of observations with higher
variability on the measure of the strength of the fit, while $R^2_{w1}$ does not.
Another possible use of this model is making predictions of prices from
horsepower and cylinder values. For example, what would we have predicted the price for an
Audi A6 to have been (this car is not in the sample)? Its horsepower was 455, with an
8–cylinder engine; here is Minitab output asking for a prediction interval based on these
horsepower and cylinder values:

* WARNING * The prediction interval output assumes a weight of 1.
* An adjustment must be made if a weight other than 1 is used.

New Obs       Fit        SE Fit            95.0% CI                  95.0% PI
1          4.9634        0.0353     (   4.8932, 5.0335)       (   4.6739, 5.2528)

Values of Predictors for New Observations

New Obs    Horsepow    Cylinder
1               455        8.00

Remember that the model was in semilog form, so the intervals given are in terms of logged price.
That is, we must antilog the prediction interval to obtain the version in terms of dollars.
There is another problem, however, that is highlighted by the warning message that
Minitab gives you. Minitab does not allow a weight to be given for the prediction being
made, which means that unless the appropriate weight turns out to be 1, the prediction
interval is not correct (the standard error of the ﬁtted value and conﬁdence interval are
still correct). We wish to use the weight that the Audi A6 would have been assigned if it
had been included in the sample; from the model used to determine the weights, that is
WT = 1/ exp(−2.50 + .0055 ∗ Horsepower), or 1/ exp(−2.50 + .0055 ∗ 455) = 1.025.
The standard error of the ﬁtted value (used for conﬁdence intervals) is given correctly,
but we need to calculate the standard error of the predicted value. This equals

$$\sqrt{(\text{Standard error of fitted value})^2 + (\text{Residual MS})/(\text{Weight})}.$$

Then, the prediction interval is

$$\text{Predicted value} \pm t^{n-p-1}_{\alpha/2}\,(\text{Standard error of predicted value}).$$

The standard error of the predicted value in this case is $\sqrt{.0353^2 + .0199/1.025} = .1437$.
For n = 85 the appropriate critical value for a 95% interval is 1.989, giving prediction
interval
4.9634 ± (1.989)(.1437) = (4.678, 5.249).

Finally, antilogging gives the prediction interval for the Audi A6 price, ($47643, $177509)
[no, that's not a misprint!], with a point estimate of the price being $91918 (the actual
price was $58700, a good price for such a powerful car).
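The whole prediction-interval calculation can be reproduced from the quantities in the output above (including the weight 1.025 from the variance model):

```python
# WLS prediction interval for the Audi A6, assembled by hand.
import math

se_fit = 0.0353     # standard error of the fitted value (correct as reported)
resid_ms = 0.0199   # residual mean square from the WLS fit
wt = 1.025          # estimated weight for the new observation
se_pred = math.sqrt(se_fit ** 2 + resid_ms / wt)

fit = 4.9634        # fitted logged price
t_crit = 1.989      # t critical value, 95%, 82 df
lo, hi = fit - t_crit * se_pred, fit + t_crit * se_pred

# the model is in log10(price) form, so antilog to get dollars
print(round(se_pred, 4))   # 0.1437
print(round(10 ** fit))    # 91918, the point estimate in dollars
print(10 ** lo, 10 ** hi)  # the interval endpoints in dollars
```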
The importance of using the WLS model is obvious when this prediction interval is
compared to the one from the OLS model:

Predicted Values for New Observations

New Obs     Fit    SE Fit         95.0% CI            95.0% PI
1       4.91635   0.02762   ( 4.86141, 4.97129) ( 4.74785, 5.08485) X
X denotes a row with X values away from the center

The point estimate for price is not very different from that from the WLS model ($82480
rather than $91918), but antilogging the interval gives ($55956, $121577), which is slightly
more than half the width of the WLS interval (a better comparison is that it is 40% nar-
rower in the log scale). On the other hand, a prediction interval for a low horsepower auto
would probably be narrower than that from OLS, correctly reﬂecting the lower variability
oﬀ the regression line there. Note also that the Audi A6 is considered very unusual in the
OLS prediction, because of its high horsepower, but this is not true in the WLS prediction,
since it is downweighted so strongly.
Note that this same type of correction is needed even for rough prediction intervals.
The value of $s$ from the WLS output is not in any way comparable to that from an OLS
fit. In an OLS fit, the rough prediction interval for any future observation is $\pm 2\hat{\sigma}$, and $s$ is
used as $\hat{\sigma}$. In a WLS fit, the rough prediction interval for the $i$th future observation is $\pm 2\hat{\sigma}_i$,
requiring a (potentially) different estimate of the standard deviation for each prediction.
This corresponds to $\pm 2s/\sqrt{w_i}$, where $w_i$ is the appropriate weight (this corresponds to
the exact interval constructed above, except that 2 is used instead of the exact t-based
critical value, and the standard error of the fitted value is ignored). Thus, $s$ does not have
any physical interpretation in a WLS model unless it is paired with an appropriate weight
value.
Steps (4) and (5) are more straightforward in the special case where the nonconstant
variance is being driven by the existence of two or more subgroups in the data, as follows:
(4′ ) Create a variable that deﬁnes the subgroups by appropriate codes (e.g.,
1, 2, 3, etc.); call this variable GROUP. Now, construct descriptive statistics
of the standardized residuals variable SRES, entering GROUP under the By
variable option.
(5′ ) The weight variable for the weighted least squares analysis contains esti-
mates of 1 over the variance of the errors for each observation. These are
obtained from the output in (4′ ) by looking under StDev for each group.
So, for example, say there were three groups in the data coded 1, 2 and
3, and for group 1 the entry under StDev is 1.5, for group 2 the entry un-
der StDev is 1.1, and for group 3 the entry under StDev is .8. The weight
variable then equals $1/1.5^2$ for all observations in group 1, $1/1.1^2$ for all
observations in group 2, and $1/.8^2$ for all observations in group 3.
Say we decided to pursue the heteroscedasticity based on the number of cylinders of
the auto. Here are standardized residuals from the OLS regression separated by cylinders:

Descriptive Statistics: SRES3 by Cylinders

Variable      Cylinder          N        Mean        Median        TrMean          StDev
SRES3         4                21      -0.057        -0.331        -0.146          1.088
6                47       0.049        -0.085         0.027          0.860
8                17      -0.070        -0.251        -0.052          1.314

Variable      Cylinder     SE Mean       Minimum       Maximum            Q1            Q3
SRES3         4              0.238        -1.271         2.845        -0.996         0.621
6              0.125        -1.659         2.535        -0.551         0.712
8              0.319        -2.836         2.423        -1.032         0.996

The weight variable would be constructed using Minitab as follows:

Let WT = (Cylinders = 4)/(1.088 ∗ 1.088) + (Cylinders = 6)/(.86 ∗ .86)

+(Cylinders = 8)/(1.314 ∗ 1.314)
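The same construction can be sketched in Python (a hypothetical stand-in for the Minitab Let statement, using the StDev values from the table above):

```python
# Group-based WLS weights: 1 over the squared StDev of the standardized
# residuals for each cylinder group.
group_sd = {4: 1.088, 6: 0.860, 8: 1.314}   # StDev by Cylinders, from the output

def group_weight(cylinders):
    return 1.0 / group_sd[cylinders] ** 2

cylinders_col = [4, 6, 8, 6, 4]                   # example column of cylinder counts
wt = [group_weight(c) for c in cylinders_col]     # the WT variable
```

Note that the 6-cylinder cars, which sit closest to the regression line, get the largest weights.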

Here is WLS output using this weight variable:

Regression Analysis: Logged price versus Horsepower, Cylinders

Weighted analysis using weights in wt

The regression equation is
Logged price = 3.90 + 0.00192 Horsepower + 0.0197 Cylinders

Predictor            Coef          SE Coef             T        P            VIF
Constant          3.90139          0.04519         86.33    0.000
Horsepow        0.0019177        0.0001609         11.92    0.000            1.5
Cylinder         0.019680         0.009389          2.10    0.039            1.5

S = 0.07866          R-Sq = 76.7%            R-Sq(adj) = 76.1%

Analysis of Variance

Source                 DF             SS              MS         F          P
Regression              2        1.66869         0.83434    134.84      0.000
Residual Error         82        0.50740         0.00619
Total                  84        2.17609

The results are very similar to the earlier WLS model, although the heteroscedasticity
isn’t taken care of quite as well:

Note, by the way, that using standard deviations this way avoids the need for the
assumption of a speciﬁc model for the variances, as in equation (1), which is a good thing.
However, this method can only be used if the variable that determines the variances is
categorical (more generally, this method only works if all variables that determine the
variances are categorical). If any of the variables that determine variances are numerical,
you must use the regression–based method of estimating weights, either using the regres-
sion program, or (if any of the variables are categorical) the general linear model program
that we will be discussing later.

There is another mechanism by which variances of errors in a regression model can
be diﬀerent. Say that the response variable at the level of an individual follows the usual
regression model,

$$y_i = \beta_0 + \beta_1 x_{1i} + \cdots + \beta_p x_{pi} + \varepsilon_i,$$

with $\varepsilon_i \sim N(0, \sigma^2)$. Imagine, however, that the $i$th observed response is actually an average
$\bar{y}_i$ for a sample of size $n_i$ with the observed predictor values $\{x_{1i}, \ldots, x_{pi}\}$. The model is

thus
$$\bar{y}_i = \beta_0 + \beta_1 x_{1i} + \cdots + \beta_p x_{pi} + \tilde{\varepsilon}_i,$$
where
$$V(\tilde{\varepsilon}_i) = V(\bar{y}_i \mid \{x_{1i}, \ldots, x_{pi}\}) = \frac{\sigma^2}{n_i}.$$
An example of this kind of situation could be as follows. Say you were interested in
modeling the relationship between student test scores and (among other things) income.
While it might be possible to obtain test scores at the level of individual students, it would
be impossible to get incomes at that level because of privacy issues. On the other hand,
average incomes at the level of census tract or school district might be available, and could
be used to predict average test scores at that same level.
This is just a standard heteroscedasticity model, and WLS is used to fit it. In fact,
this is a particularly simple case, since the weights do not need to be estimated at all;
since $V(\tilde{\varepsilon}_i) = \sigma^2/n_i$, the weight for the $i$th observation is just $n_i$. That is, quite naturally,
observations based on larger samples are weighted more heavily in estimating the regression
coefficients.
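A minimal sketch of this averaged-response case, with hypothetical district-level data echoing the income/test-score scenario (all numbers made up):

```python
# When the i-th row is a group average over n_i individuals,
# V(eps_i) = sigma^2 / n_i, so the WLS weight is simply n_i.

def wls_simple(x, y, w):
    """WLS estimates (b0, b1) for simple regression with weights w."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    b1 = sxy / sxx
    return ybar - b1 * xbar, b1

avg_income = [40.0, 55.0, 70.0, 90.0]   # district average income ($000s)
avg_score = [62.0, 68.0, 71.0, 79.0]    # district average test score
n_i = [120, 300, 80, 200]               # students per district = the weights
b0, b1 = wls_simple(avg_income, avg_score, n_i)
```

Districts with more students pull the fitted line toward themselves, exactly as the known variances $\sigma^2/n_i$ dictate.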
It should be noted, however, that this sort of aggregation is not without problems.
Inferences about individual-level relationships that are drawn from aggregated data are
called ecological inferences, and they can be very misleading. As is always the case, we must be aware of confounding
eﬀects of missing predictors; for example, if school districts with wealthier residents also
have lower proportions of non-native English speakers, a positive slope for income could be
reﬂecting an English speaker eﬀect, rather than an income eﬀect. In addition, ecological
inferences potentially suﬀer from aggregation bias, whereby the information lost when
aggregating (as it is clear that some information will be lost) is diﬀerent for some individuals
than for others (for example, if there is more variability in incomes in some school districts
compared to others, more information is lost in those school districts), resulting in biased
inferences.

