Polynomial Regression and Transformations


          STA 671
          First Summer 2007
Review

   The estimated residuals e1,…,en provide the best
    method for checking the assumptions.
   Remember the true errors εi ~ N(0, σ²). The estimated
    residuals should behave approximately like that.
   In a residual plot, you are looking for outliers,
    curvature, or changing variance.
   In this lecture we will discuss polynomial regression
    and transformations, two separate methods. Both
    are possible solutions to curvature, and
    transformations have the added benefit they
    sometimes address changing variance.
Recall the Hooker data. There appears
to be a small amount of curvature.
This curvature is seen more clearly in
the residual plot.
Polynomial regression – one method
for dealing with curvature.

   To account for curvature, we can perform
    something called “polynomial regression”,
    which consists of fitting a polynomial (a
    quadratic or cubic typically) instead of a line.
   Recall the linear model was Yi = β0 + β1 Xi +
    εi. The quadratic model is Yi = β0 + β1 Xi + β2
    Xi² + εi. The cubic model is Yi = β0 + β1 Xi +
    β2 Xi² + β3 Xi³ + εi.
   The higher the order of the polynomial, the
    more curvature it can account for.
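
As a rough illustration of the three models above, the following sketch fits a line, a quadratic, and a cubic by least squares with NumPy. The data are synthetic (generated from a made-up quadratic, not the Hooker data), and the helper name fit_polynomial is hypothetical.

```python
import numpy as np

# Synthetic data with mild curvature (illustrative only, not the Hooker data).
rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 40)
y = 2.0 + 1.5 * x + 0.3 * x**2 + rng.normal(scale=1.0, size=x.size)


def fit_polynomial(x, y, order):
    """Least-squares polynomial fit; returns (coefficients, residuals).

    Coefficients are ordered beta0, beta1, ..., matching the slide notation.
    """
    coefs = np.polynomial.polynomial.polyfit(x, y, order)
    fitted = np.polynomial.polynomial.polyval(x, coefs)
    return coefs, y - fitted


rss = {}
for order, name in [(1, "linear"), (2, "quadratic"), (3, "cubic")]:
    coefs, resid = fit_polynomial(x, y, order)
    rss[name] = float(np.sum(resid**2))  # residual sum of squares
    print(f"{name:9s} coefficients={np.round(coefs, 3)} RSS={rss[name]:.2f}")
```

The residual sum of squares can never increase when a higher-order term is added, so a smaller RSS alone does not justify the larger model. The residuals returned here are what one would plot against x to judge curvature.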
Quadratic model accounts for the
curvature




 Quadratic equation is Pressure = 88.017 – 1.1295 Temp + 0.004 Temp²
 If the quadratic model is better than the linear model, what about a cubic?
A cubic model produces no visual
improvement




  Cubic equation is Pressure = 124.14 – 1.69 Temp + 0.0069 Temp²
                        – 0.000005 Temp³
Which to choose, quadratic or cubic?

   In general, choose the LOWEST order
    polynomial possible (i.e. prefer linear to
    quadratic, quadratic to cubic, etc.).
   There are two reasons: 1) “Occam’s razor”, meaning
    that simpler models are preferred, and 2) the
    higher the order, the more parameters there are to
    estimate. Statistically, it’s easier to estimate a
    few parameters than many.
P-values for selecting order

   The regression output provides a formal
    method for selecting the order of the
    polynomial. This method typically agrees with
    looking at the residual plot.
   The regression output provides p-values for
    each term in the regression.
   The p-value for the highest order term is the
    ONLY one that is used.
Using p-values to select order

   Begin by fitting the cubic model. If the cubic term is
    significant, use the cubic model (you can consider
    higher order models, but we do not in STA671)
   If the cubic term is NOT significant, remove it and
    RERUN the model (p-values change depending on
    what terms are in the model), then look to see if the
    quadratic term is significant.
   If the quadratic term is not significant, remove it and
    RERUN the model, resulting in a linear regression.
   If none of these models produce a reasonable
    residual plot, you may need another method.
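
The procedure above can be sketched in Python as follows. This is a minimal sketch under several assumptions: the 0.05 cutoff is not stated in the slides, the function names are hypothetical, the data are synthetic, and the t-distribution p-value is obtained by numerically integrating the t density so that only NumPy is needed.

```python
import math
import numpy as np


def t_two_sided_p(t_value, df, grid=20001):
    """Two-sided p-value for a t statistic, by integrating the t density."""
    a = abs(float(t_value))
    u = np.linspace(-a, a, grid)
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    pdf = np.exp(log_c - (df + 1) / 2 * np.log1p(u**2 / df))
    central = np.sum((pdf[1:] + pdf[:-1]) / 2 * np.diff(u))  # trapezoid rule
    return max(0.0, 1.0 - central)


def highest_term_p_value(x, y, order):
    """Fit a polynomial of the given order by OLS; p-value of its top term."""
    X = np.vander(x, order + 1, increasing=True)   # columns 1, x, ..., x^order
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    df = len(y) - X.shape[1]
    sigma2 = resid @ resid / df                    # residual variance estimate
    cov = sigma2 * np.linalg.inv(X.T @ X)
    t_value = beta[-1] / math.sqrt(cov[-1, -1])
    return t_two_sided_p(t_value, df)


def select_order(x, y, max_order=3, alpha=0.05):
    """Start at max_order; drop the top term and REFIT until it is significant."""
    for order in range(max_order, 1, -1):
        if highest_term_p_value(x, y, order) < alpha:
            return order
    return 1  # neither the cubic nor the quadratic term was significant


# Synthetic, truly quadratic data (illustrative only).
rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 40)
y = 2.0 + 1.5 * x + 0.3 * x**2 + rng.normal(scale=1.0, size=x.size)
for order in (3, 2):
    print(f"order {order}: p-value of top term = {highest_term_p_value(x, y, order):.4f}")
print("selected order:", select_order(x, y))
```

Each pass refits the model from scratch, which mirrors the RERUN step: only the p-value of the highest-order term in the current fit is consulted. The residual plot of the selected model still needs to be checked by eye.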
For the boiling point data

   We first run the cubic model and acquire the
    following p-values
     Parameter Estimates

                                                     Parameter     Standard
    Variable        Label                      DF     Estimate        Error t Value    Pr > |t|   Type I SS

    Intercept       Intercept                   1     124.13563    384.83452    0.32    0.7495         12434
    Temperature                                 1      -1.68544      5.92071   -0.28    0.7781     444.16724
    Temperature_2   2nd power of TEMPERATURE    1       0.00688      0.03032    0.23    0.8222       2.98566
    Temperature_3   3rd power of TEMPERATURE    1   -0.00000486   0.00005171   -0.09    0.9259    0.00022757



   The p-value for the cubic term (0.9259) is not
    significant, so remove the cubic term and RERUN
    the model (do NOT also remove the quadratic term
    based on the p-values above).
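
As a quick arithmetic check on the output above, each "t Value" is the parameter estimate divided by its standard error. The sketch below recomputes the t values from the printed estimates and standard errors; the dictionary name is hypothetical and the numbers are copied from the table.

```python
# (estimate, standard error) pairs copied from the cubic-model output above.
cubic_fit = {
    "Intercept":     (124.13563,   384.83452),
    "Temperature":   (-1.68544,    5.92071),
    "Temperature_2": (0.00688,     0.03032),
    "Temperature_3": (-0.00000486, 0.00005171),
}

# t value = estimate / standard error, given the other terms in the model.
t_values = {term: est / se for term, (est, se) in cubic_fit.items()}
for term, t in t_values.items():
    print(f"{term:14s} t = {t:6.2f}")
```

Every standard error here is large relative to its estimate, which is why all four p-values are large at once. These t values are conditional on the other terms being in the model, so they change once the cubic term is dropped and the model is refit.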
Quadratic model for boiling point data

    The quadratic model produces the following
     p-values
    Parameter Estimates

                                                  Parameter     Standard
Variable          Label                      DF    Estimate        Error t Value   Pr > |t|   Type I SS

Intercept         Intercept                   1   88.01662      13.93063    6.32    <.0001        12434
Temperature                                   1   -1.12954       0.14336   -7.88    <.0001    444.16724
Temperature_2     2nd power of TEMPERATURE    1    0.00403    0.00036820   10.95    <.0001      2.98566




    The quadratic term is significant AND we
     observe a reasonable residual plot, so we
     stop here. This is our final model.
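
To see numerically how little the cubic term changes the fit, the sketch below evaluates the quadratic and cubic equations using the estimates printed in the two tables. The three temperatures are illustrative values chosen for this sketch (that they lie within the observed data range is an assumption), and the function names are hypothetical.

```python
def quadratic_fit(temp):
    """Fitted pressure from the quadratic-model estimates in the table."""
    return 88.01662 - 1.12954 * temp + 0.00403 * temp**2


def cubic_fit(temp):
    """Fitted pressure from the cubic-model estimates in the table."""
    return (124.13563 - 1.68544 * temp + 0.00688 * temp**2
            - 0.00000486 * temp**3)


for temp in (180.0, 200.0, 212.0):  # illustrative temperatures
    q, c = quadratic_fit(temp), cubic_fit(temp)
    print(f"Temp={temp:5.1f}  quadratic={q:6.2f}  cubic={c:6.2f}  "
          f"difference={c - q:+.2f}")
```

The printed estimates are rounded, and at these temperatures that rounding alone can shift a fitted value by a few tenths, so the small differences between the two curves should be read as "indistinguishable" rather than as exact figures.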
What if no polynomial model produces
a reasonable residual plot?

   If none of our polynomial models produces a
    reasonable residual plot, we need another
    method.
   Another method to try is to transform the
    response variable.
   Transformations, like polynomial regression,
    can handle curvature, and in addition
    transformations have the potential to handle
    changing spread as well.
Example - the ethanol data

   Data comes from an engine exhaust study.
   NOx is a measure of the exhaust from the
    engine, while E is a measure of the fuel/air
    mixture (high values are almost all fuel, low
    values are almost all air)
   A cubic model does not fit the data. A
    quadratic or linear model would do worse.
A cubic fit to the ethanol data




[Figure: scatterplot and residual plot of the cubic fit. The residual plot shows clear curvature.]
Transformations

   Instead of fitting Y as the response variable, we fit a
    function of Y as the response variable.
   Thus, instead of Yi = β0 + β1 Xi + εi, you can fit
    log(Yi) = β0 + β1 Xi + εi, or
    sqrt(Yi) = β0 + β1 Xi + εi, or
    cbrt(Yi) = β0 + β1 Xi + εi, etc.
   Thus, you greatly expand the possible models you
    can fit.
   You can transform the X variable as well, but in the
    interest of time we do not discuss that in detail in
    STA671.
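
A minimal sketch of fitting these transformed-response models with NumPy follows. The data are synthetic (built so that sqrt(Y) is linear in X), and the names TRANSFORMS and fit_transformed are hypothetical. The log and square root require positive (respectively non-negative) responses.

```python
import numpy as np

# Candidate response transformations from the slide, plus the raw response.
TRANSFORMS = {
    "identity": lambda y: y,
    "log": np.log,     # natural log; needs y > 0
    "sqrt": np.sqrt,   # needs y >= 0
    "cbrt": np.cbrt,
}


def fit_transformed(x, y, transform, order=1):
    """Regress transform(y) on a polynomial in x; return (coefficients, residuals)."""
    z = TRANSFORMS[transform](y)
    coefs = np.polynomial.polynomial.polyfit(x, z, order)
    resid = z - np.polynomial.polynomial.polyval(x, coefs)
    return coefs, resid


# Synthetic data for which sqrt(Y) is linear in X (illustrative only).
rng = np.random.default_rng(2)
x = np.linspace(1.0, 10.0, 50)
y = (1.0 + 0.5 * x + rng.normal(scale=0.2, size=x.size)) ** 2

for name in TRANSFORMS:
    coefs, resid = fit_transformed(x, y, name)
    print(f"{name:8s} beta0, beta1 = {np.round(coefs, 3)}")
```

The residuals are on the transformed scale, so they are what one would plot to compare transformations. Passing order=3 fits a cubic in X to the transformed response, which combines a transformation with polynomial regression.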
Transformations allow different error
structures.

   A quadratic regression looks like Yi = β0 + β1 Xi + β2
    Xi² + εi. At any particular X, the variance is the same.
   Taking the square root transformation sqrt(Yi) = β0 +
    β1 Xi + εi means that
    Yi = [β0 + β1 Xi + εi]² = β0² + β1² Xi² + εi² + 2 β0 β1 Xi +
    2 β0 εi + 2 β1 Xi εi.
   There is a quadratic relationship between X and Y.
   Note the product of Xi and εi in the last term: it allows
    the variance of Yi to change with Xi. Thus, in addition
    to handling curvature, transformations allow you to
    address changing variance.
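
To make the changing variance concrete: writing μi = β0 + β1 Xi, the square root model gives Yi = μi² + 2 μi εi + εi², and for normal errors Var(Yi) = 4 μi² σ² + 2 σ⁴, which grows with μi. The simulation below is a sketch that checks this formula; the parameter values and the function name are hypothetical.

```python
import numpy as np


def simulate_sqrt_model_variance(x, beta0=1.0, beta1=0.5, sigma=0.3,
                                 n_draws=200_000, seed=1):
    """Simulate Y = (beta0 + beta1*x + eps)^2; return (simulated, theoretical) Var(Y)."""
    rng = np.random.default_rng(seed)
    mu = beta0 + beta1 * x
    eps = rng.normal(0.0, sigma, n_draws)
    y = (mu + eps) ** 2                            # response on the original scale
    theory = 4 * mu**2 * sigma**2 + 2 * sigma**4   # valid for normal errors
    return float(np.var(y)), theory


for x in (1.0, 5.0, 10.0):
    sim, theory = simulate_sqrt_model_variance(x)
    print(f"x={x:4.1f}  simulated Var(Y)={sim:7.3f}  formula={theory:7.3f}")
```

The errors have constant variance on the square root scale, yet the variance of Y itself increases with X. That is the pattern a fan-shaped residual plot on the raw scale would show.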
Prototypical Data requiring
transformation
After square root transformation
Which transformation?

   There are no hard and fast rules on which
    transformation to try, and no guaranteed method
    for finding a good transformation (for some
    data, you never seem to find a great fit).
   Usually you have to perform trial and error,
    and remember you can combine polynomial
    regression with transformation. Thus for
    example, you can fit a cubic model in X to
    predict log(Y).
Some “typical” transformations

   If you have area data, a square root transform is
    often useful (converts area to something proportional
    to the radius or length).
   Similarly with volume, a cube root transformation
    may be appropriate.
   With financial data (incomes, etc.), a log transform
    may be appropriate. Logs change percentage
    increases to constant increases, thus if a unit
    increase in X results in a 10% increase in Y, it also
    results in a 0.0953 increase in log(Y), since
    log(1.10) = 0.0953.
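
The 0.0953 figure is the natural log of 1.10. A short check, with hypothetical variable names and an arbitrary starting value:

```python
import math

pct_increase = 0.10                       # a 10% increase in Y
log_increase = math.log(1 + pct_increase)
print(round(log_increase, 4))             # 0.0953

# The additive change in log(Y) does not depend on the starting value of Y.
y_before = 250.0                          # arbitrary illustrative value
y_after = y_before * (1 + pct_increase)
print(round(math.log(y_after) - math.log(y_before), 4))  # 0.0953
```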
A general strategy

   Fit the raw data (X and Y) with a least squares line. See if you
    get a good residual plot. If so, stop and be happy.
   If not, try a polynomial regression (quadratic or cubic). If one of
    these fits, stop and be happy (remember, fit the smallest model
    possible).
   If a polynomial regression does not work, try transforming Y to
    log, sqrt, and cube root (i.e. perform three more regressions).
    Fit a cubic polynomial regression on each of these and
    determine the best outcome. Choose the transformation that
    provides the best residual plot.
   If none of those work, then regression might not be effective
    (there are more advanced techniques) or you may have to start
    transforming X as well. This becomes true trial and error.
    Consult your friendly local statistician.
Back to the ethanol data.

   We can see from the scatterplot that E and NOX are
    not linearly related.
   We tried a cubic regression and that didn’t work.
   Now off to the transformations. We fit cubic
    regressions with log(Y), sqrt(Y), and cbrt(Y) as the
    response variables.
   We may be able to get satisfactory results with
    something less than cubic, but if cubic doesn’t work
    the lower order models won’t either, so we start
    with cubic models.
Square root transformation. Still clear
curvature.




[Figure: scatterplot and residual plot after the square root transformation.]
Cube root transformation. Improved,
but still some curvature.
Log transformation. Still some lack of
fit, but best of the bunch.
  Log transform is not perfect, but best
  we can do right now (I encourage you
  to play with the data on your own)

      After we have chosen the log transformation
       on the basis of the best residual plot (and
       decided the plot is “ok”, though certainly not
       great), we look at the p-value for the cubic
       term (0.5054) to see if we can remove it. We
       can.
 Parameter Estimates

                                  Parameter   Standard
Variable    Label            DF    Estimate      Error   t Value   Pr > |t|   Type I SS

Intercept   Intercept         1   -11.32827   2.48931      -4.55    <.0001     19.62487
E                             1    25.55212   8.69871       2.94    0.0043      0.31320
E_2         2nd power of E    1   -10.74539   9.86507      -1.09    0.2792     34.56387
E_3         3rd power of E    1    -2.43334   3.63758      -0.67    0.5054      0.02403
Quadratic model for log(NOX)

    The quadratic model produces scatter and
     residual plots almost identical to those of the
     cubic model. The quadratic term is significant,
     so this is our final model.
 Parameter Estimates

                                   Parameter   Standard
Variable    Label             DF    Estimate      Error   t Value   Pr > |t|   Type I SS

Intercept   Intercept          1   -12.95199   0.55039     -23.53    <.0001     19.62487
E                              1    31.31051   1.24771      25.09    <.0001      0.31320
E_2          2nd power of E    1   -17.32873   0.68084     -25.45    <.0001     34.56387
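
Because the response in this final model is log(NOx), a prediction has to be exponentiated to return to the NOx scale. The sketch below uses the estimates from the table above and assumes the log is the natural log (consistent with the 0.0953 figure earlier). The value E = 0.9 is an illustrative input chosen for this sketch, and the function names are hypothetical.

```python
import math

# Estimates copied from the quadratic log(NOx) output above.
B0, B1, B2 = -12.95199, 31.31051, -17.32873


def predict_log_nox(e):
    """Predicted log(NOx) from the fitted quadratic in E."""
    return B0 + B1 * e + B2 * e**2


def predict_nox(e):
    """Back-transform the prediction to the original NOx scale."""
    return math.exp(predict_log_nox(e))


e_value = 0.9  # illustrative value, not taken from the slides
print(f"log(NOx) at E={e_value}: {predict_log_nox(e_value):.4f}")
print(f"NOx at E={e_value}:      {predict_nox(e_value):.3f}")
```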

								