# Incorrect Least-Squares Regression Coefficients in Method-Comparison Analysis

CLIN. CHEM. 25/3, 432-438 (1979)

P. Joanne Cornbleet¹ and Nathan Gochman

The least-squares method is frequently used to calculate the slope and intercept of the best line through a set of data points. However, least-squares regression slopes and intercepts may be incorrect if the underlying assumptions of the least-squares model are not met. Two factors in particular that may result in incorrect least-squares regression coefficients are: (a) imprecision in the measurement of the independent (x-axis) variable and (b) inclusion of outliers in the data analysis. We compared the methods of Deming, Mandel, and Bartlett in estimating the known slope of a regression line when the independent variable is measured with imprecision, and found the method of Deming to be the most useful. Significant error in the least-squares slope estimation occurs when the ratio of the standard deviation of measurement of a single x value to the standard deviation of the x-data set exceeds 0.2. Errors in the least-squares coefficients attributable to outliers can be avoided by eliminating data points whose vertical distance from the regression line exceeds four times the standard error of the estimate.

Linear regression analysis is a commonly used technique in analyzing method-comparison data. If a linear relationship between the test and reference method can be defined, then the slope and intercept of this line can provide estimates of the proportional and constant error between the two methods (1). Furthermore, a value for the test method can be predicted from any reference method value within the range of the data set by the regression equation:²

    y = a_y·x + b_y·x x

where x is the independent variable (reference method), y is the dependent variable (test method), b_y·x is the slope, and a_y·x is the intercept of the regression line.

The least-squares method is the most commonly used statistical technique to estimate the slope and intercept of linearly related comparison data. However, if the basic assumptions underlying the least-squares model are not met, the estimated line may be incorrect. It is the purpose of this paper to discuss three criteria for the use of least-squares regression analysis that are frequently violated in analyzing laboratory comparison data, to demonstrate the magnitude of the error in calculating the slope of the line by the least-squares method when these assumptions are not met, and to suggest alternative techniques for calculating the correct linear relationship between the two variables.

The line obtained by least-squares regression minimizes the sum of squares of the distances between the observed data points and the line in a vertical direction (Figure 1). These distances between the y values observed and those predicted by the regression line are called residuals. For the least-squares model to be valid, these residuals should be random (independent of the values of x and y) and have a gaussian distribution with a mean of zero and standard deviation s_y·x. The standard deviation of the residuals (or standard error of the estimate) should be constant at every value of x; i.e., at each value of x, repeated measurements of y would have a standard deviation of s_y·x. If x is a precisely measured reference method, and y an imprecise test method with a constant coefficient of variation rather than a constant measurement error at all values, then s_y·x will increase with increasing values of x, in which case a weighted regression analysis should be used (2). However, we will demonstrate that within the range of measurement error likely to be encountered in the laboratory (coefficient of variation up to 20%), least-squares regression still calculates the correct line when s_y·x is proportional to x.

Spurious data points can be an important source of error in the least-squares estimate. Outlying data points generate large squared residuals, and the calculated line may be shifted toward the errant point(s). Draper and Smith (2) have suggested that data points that generate residuals greater than 4 s_y·x be omitted from the least-squares regression analysis. We will illustrate how analysis of the residuals about the regression line can provide a criterion for rejecting spurious values and eliminating their effect on the least-squares regression slopes.

Least-squares regression analysis is the appropriate technique to use in Model I regression problems, that is, cases in which the independent variable, x, is measured without error, and the dependent variable, y, is a random variable. Method-comparison studies in which the x variable is a precisely measured reference method, the result of which can be

---

Department of Pathology, University of California, San Diego, La Jolla, CA 92093; and Veterans Administration Hospital, 3350 La Jolla Village Drive, San Diego, CA 92161.

¹ Present address: Department of Pathology, Stanford University Medical Center, Stanford, CA 94305. Address to which reprint requests should be sent.

² Nonstandard abbreviations used: a_y·x, y-intercept of the linear relationship between x and y when x is the independent variable; b_y·x, slope of the linear relationship between x and y when x is the independent variable; b_x·y, slope of the linear relationship between x and y when y is the independent variable; s_y·x, standard deviation of the residual error of regression (standard error of the estimate) when x is the independent variable (i.e., standard deviation of the differences between the actual y values and the ŷ values predicted by the regression line); S_y, standard deviation of the y data set; S_x, standard deviation of the x data set; r, product moment correlation coefficient; S_ex, standard deviation of repeated measurement of a single x value; S_ey, standard deviation of repeated measurement of a single y value; λ, the ratio S_ex²/S_ey².

Received March 16, 1978; accepted Dec. 27, 1978.
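As a concrete illustration of the residual-screening rule described in the introduction (omit points whose residual exceeds 4 s_y·x, per Draper and Smith), here is a minimal sketch in Python. The function names are ours, not from the paper, and for simplicity the line is refit only once after screening rather than iteratively.

```python
# Outlier screening by residual analysis: fit a least-squares line,
# discard points whose vertical residual exceeds k * s_yx (the standard
# error of the estimate), and refit. Hypothetical helper names.
import math

def least_squares(x, y):
    """Ordinary least-squares intercept and slope (y regressed on x)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx
    return my - b * mx, b                      # (intercept a, slope b)

def reject_outliers(x, y, k=4.0):
    """Drop points with |residual| > k * s_yx, then refit once."""
    a, b = least_squares(x, y)
    resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    n = len(x)
    s_yx = math.sqrt(sum(r * r for r in resid) / (n - 2))
    keep = [(xi, yi) for xi, yi, r in zip(x, y, resid) if abs(r) <= k * s_yx]
    xs, ys = zip(*keep)
    return least_squares(xs, ys), n - len(keep)  # ((a, b), points removed)
```

On data lying exactly on y = 10 + 0.9x with one grossly errant point added, the errant point is the only one whose residual exceeds 4 s_y·x, and the refit recovers the true line.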

regarded as the "correct" value, are thus Model I regression problems. Furthermore, if the x variable can be set to preassigned values where the values recorded for it are target values (e.g., prepared concentrations of an analyte), least-squares regression can be applied (the so-called Berkson case), even though error may be present in the x-variable (3).

In method-comparison studies where both the x and y variable are measured with error, often there is no reason to assume that one of the two methods is the method of reference. Bias between the two methods will be indicated by the slope of the linear relationship between the absolute values measured by the two methods without error. Model II regression techniques are necessary to find the correct slope of this line. Use of the least-squares method in Model II regression cases will yield two different lines, depending on whether x or y is used as the independent variable; in fact, the line indicating the relationship between the "absolute" values of x and y lies somewhere in between.

Many statisticians have proposed solutions to Model II regression analysis. Deming (4) approaches the problem by minimizing the sum of the squares of the residuals in both the x and y directions simultaneously. This derivation results in the best line to minimize the sum of the squares of the perpendicular distances from the data points to the line (5), as illustrated in Figure 1. To compute the slope by Deming's formula, one must assume gaussian error measurements of x and y with constant imprecision throughout the range of x and y values. If the ratio of the measurement errors of x and y can be estimated, the following formulas are used when x is the independent variable (5, 6):

    Deming b_y·x = U + sqrt(U² + 1/λ)

where

    U = [Σ(y_i − ȳ)² − (1/λ)Σ(x_i − x̄)²] / [2 Σ(y_i − ȳ)(x_i − x̄)] = [S_y² − (1/λ)S_x²] / (2 r S_x S_y)

and

    λ = S_ex²/S_ey² = (error variance of a single x value) / (error variance of a single y value)

As in the least-squares method, the y-intercept is calculated from the slope:

    a_y·x = ȳ − b_y·x x̄

The standard deviation of the residual error of regression in the y direction can be calculated and used as an indication of scatter of the points about the regression line:

    s_y·x = sqrt[ Σ(y_i − ŷ_i)² / (N − 2) ] = S_y sqrt[ (N − 1)(1 − r²) / (N − 2) ]

Unlike the least-squares method, Deming's method always results in one line, whether x or y is used as the independent variable.

Mandel (6) states that an approximate relationship between the least-squares slope and Deming's slope exists as follows:

    Mandel estimate of Deming b_y·x = least-squares b_y·x × [1 + S_ex²/(S_x² − S_ex²)]
                                    = least-squares b_y·x / [1 − (S_ex²/S_x²)]

If such an estimate is valid, one may determine the need for correcting the least-squares slope a priori by noting the ratio of the variance in measuring a single x value to the variance of all the x data obtained.

Bartlett's three-group method has been suggested as a simple approach to the problem of regression when the x variable is subject to imprecision; no knowledge of the error of measurement of x or y is required (3). The data are ranked by the magnitude of x and divided into thirds. The means x̄₁ and ȳ₁ for the first third and x̄₃ and ȳ₃ for the last third are computed. Then:

    Bartlett b_y·x = (ȳ₃ − ȳ₁) / (x̄₃ − x̄₁)

However, Wakkers et al. (5) have compared predicted vs. observed regression slopes with four groups of laboratory data and concluded that Bartlett's method is not as consistent as that of Deming.

More recently, Blomqvist (7) has published a method of calculating the correct regression slope when the x variable is measured with error. However, his formula is applicable only when x is the initial value and y is the change in this initial value, and thus it cannot be used for method-comparison data.

Although these Model II regression techniques are claimed to be uninfluenced by imprecision in the measurement of the x variable, little work has been done to compare their efficacy in this regard. In this paper we use computer-simulated data and random error to study the effect of imprecision in the x variable on the least-squares b_y·x, and compare the ability of the techniques of Deming, Mandel, and Bartlett to correct the resulting error. We will further investigate the effect on b_y·x estimates by these methods when proportional error (i.e., a constant coefficient of variation) exists in the measurement of both x and y.

Fig. 1. Least-squares vs. Deming regression model. In the least-squares analysis, the line is chosen to minimize the residual errors in the y direction; i.e., C² = Σ(y_i − ŷ_i)² for all data points is minimized. However, in the Deming regression model, the sum of the squares of both the x residual, A² = (x_i − x̂_i)², and the y residual, B² = (y_i − ŷ_i)², is minimized. This results in choosing the line that minimizes the sum of the squares of the perpendicular distances from the data points to the line, because geometrically C² = A² + B².

## Materials and Methods

We generated random gaussian data and calculated common statistical parameters (means, standard deviations, standard errors of estimate, correlation coefficients, regression slopes and intercepts, means of first and third groups of data, and plotting of x-y data) by use of a statistical software
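The three slope estimators discussed above can be sketched in Python as follows. This is a minimal illustration, not the authors' Minitab code: `deming_slope` implements b_y·x = U + sqrt(U² + 1/λ) with λ = S_ex²/S_ey², `mandel_slope` applies Mandel's correction to a least-squares slope, and `bartlett_slope` implements the three-group estimate.

```python
# Sketch of the Deming, Mandel, and Bartlett slope estimators.
import math

def deming_slope(x, y, lam):
    """Deming b_yx = U + sqrt(U**2 + 1/lam), lam = S_ex**2 / S_ey**2."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    syy = sum((yi - my) ** 2 for yi in y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    u = (syy - sxx / lam) / (2 * sxy)
    return u + math.sqrt(u * u + 1 / lam)

def mandel_slope(b_ls, s_ex, s_x):
    """Mandel's approximation: least-squares slope / (1 - S_ex**2/S_x**2)."""
    return b_ls / (1 - (s_ex / s_x) ** 2)

def bartlett_slope(x, y):
    """Three-group slope: (mean y of top third - bottom third) / same for x."""
    pairs = sorted(zip(x, y))            # rank by the magnitude of x
    k = len(pairs) // 3
    lo, hi = pairs[:k], pairs[-k:]
    mean = lambda v: sum(v) / len(v)
    return (mean([p[1] for p in hi]) - mean([p[1] for p in lo])) / \
           (mean([p[0] for p in hi]) - mean([p[0] for p in lo]))
```

On error-free data lying exactly on a line, all three estimators return the true slope for any plausible λ, which is a useful sanity check before applying them to noisy comparison data.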

package called "Minitab" (8). One thousand data points were used for all regressions with computer-generated data to minimize differences from random error between the calculated and predicted least-squares slope. Different sets of 1000 randomly generated gaussian-distributed numbers with a mean of 0 and standard deviation of 1 were used for the x-data base, the x errors of measurement, and the y errors of measurement. Because each set was generated by the computer in a random order, all pairing of sets could be done in the order that the members of the sets were generated by the computer.

The "true" values for the x-data were generated from the x-data base as follows: gaussian x-data with a mean of 100 and standard deviations of 5 and 25 were obtained by multiplying each value by the desired standard deviation and adding the mean. A log-gaussian distribution with a mean of 100 and standard deviation of 127 was obtained by multiplying each gaussian x-data base value by 0.443 and adding 1.774 (9); the antilog of each number was then taken. The "true" y-data were obtained from the "true" x-data by multiplying each value by 0.90 and adding 10. Thus the predicted regression equation between y and x is:

    y = 10 + 0.90x

The x error base and y error base were then multiplied by the constant standard deviation desired. Standard deviations for the x error were 5, 10, and 20, and for the y error, 20, for the log-gaussian data and the gaussian data with S_x = 25. For the gaussian x-data with S_x = 5, standard deviations of the x error were 1, 2, and 4, and for the y error, 4. These random errors were added to the "true" x and y data to generate "experimental" x and y data with constant imprecision throughout the range.

Since errors of measurement are frequently not constant in the clinical laboratory, we also generated experimental data whose error of measurement was proportional to the true value (i.e., a constant coefficient of variation). The x error and y error bases were first multiplied by the desired coefficient of variation, and then by the true x or y value to which they would be added.

The regression slopes of Deming, Mandel, and Bartlett were calculated by the formulas presented earlier. The y-intercept for the line generated by any method was found by using the formula:

    a = ȳ − b x̄

When the error of measurement was proportional to the value measured, rather than constant, values for S_ex and S_ey were obtained by two methods. First, the standard deviation of all the x errors and y errors added to the true x and y values was calculated; experimentally, this could be done by measuring x and y in duplicate, where:

    S_error = sqrt[ Σ_{i=1}^{N} (difference between duplicates)² / (2N) ]

Second, the errors of measurement at x̄ and ȳ were used, i.e., the coefficient of variation times either x̄ or ȳ.

Laboratory method-comparison data for sodium [continuous-flow (SMA-6) = x, manual flame photometry = y] and calcium [atomic absorption = x, continuous-flow (SMA-12) = y] were analyzed. Imprecision of measurement of x and y was estimated from repeated measurements of a control serum close to the means x̄ and ȳ of the data.

## Results

### Characteristics of Computer-Generated Data

Means and standard deviations of the gaussian-distributed data randomly generated by the computer agree closely with the expected values. The means of the three data bases are −0.01, 0.01, and −0.01; the standard deviations are 1.03, 1.03, and 0.99. In addition, as required by the least-squares regression model, these three sets of data are independent of each other, as indicated by correlation coefficients of 0.063, 0.012, and 0.013 between the data sets.

The standard deviation of the estimate of the least-squares b_y·x is:

    S_b = s_y·x / [S_x sqrt(N − 1)]

The largest value of s_y·x/S_x present in our experiments was 0.8, for which S_b = 0.025. Thus, absolute differences between the observed and predicted least-squares b_y·x as great as 2 S_b = 0.05 are significant (p < 0.05). It should also be noted from the above formula that low values of N in least-squares regression analysis markedly increase the uncertainty (or 95% confidence interval) of the b_y·x estimate.

### s_y·x Proportional to the Value of x

If x is measured without error and y is linearly related to x, the standard error of estimate, s_y·x, will be equal to S_ey, the standard deviation of repeated measurement of a single value of y. Thus when y is measured with a constant coefficient of variation, as is frequently the case in laboratory analysis, s_y·x will increase with increasing values of both y and x. However, as shown in Table 1, little change from the expected b_y·x of 0.90 is seen, even with a coefficient of variation of 20% in the measurement of y. Thus, although the least-squares model requires s_y·x to be constant for every value of x, the least-squares b_y·x does not appear to be greatly altered when s_y·x is proportional to x. At least at the magnitudes of s_y·x/x encountered in method-comparison studies, weighted regression is not required.

Table 1. Slope of Least-Squares Line When S_ey Is Proportional to y

|                      | S_ey = 0.05 y^a | S_ey = 0.20 y |
|----------------------|-----------------|---------------|
| Gauss (100, 25)^b    | 0.901           | 0.904         |
| Log gauss (100, 127) | 0.893           | 0.896         |

^a y-data measured with constant coefficient of variation. ^b (Mean, standard deviation) of x-data, measured without error.

Fig. 2. Effect of increasing error of measurement of x on least-squares b_y·x when S_ex is constant for all x. Predicted b_y·x (- - - -); gaussian x-data, S_x = 5 (●); gaussian x-data, S_x = 25 (□); log-gaussian x-data, S_x = 127 (▲).
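A minimal re-creation of the simulation design just described can be sketched as follows. This is our own illustration, using Python's `random` module in place of the Minitab data bases: "true" gaussian x-data with mean 100 and standard deviation 25, y = 10 + 0.90x, and constant-standard-deviation errors added to both variables.

```python
# Sketch of the paper's simulation: true gaussian x (mean 100, SD 25),
# y = 10 + 0.9x, then gaussian measurement error of SD s_ex and s_ey
# added to x and y. Helper names and seed are ours, not the paper's.
import random

def simulate(n=10000, s_ex=20.0, s_ey=20.0, seed=1):
    rng = random.Random(seed)
    x_true = [rng.gauss(100, 25) for _ in range(n)]
    y_true = [10 + 0.9 * xt for xt in x_true]
    x_obs = [xt + rng.gauss(0, s_ex) for xt in x_true]
    y_obs = [yt + rng.gauss(0, s_ey) for yt in y_true]
    return x_obs, y_obs

def ls_slope(x, y):
    """Ordinary least-squares slope of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

x, y = simulate()
b = ls_slope(x, y)
# With S_ex/S_x = 20/25 = 0.8 the slope is attenuated to roughly
# 0.9 / (1 + (S_ex/S_x)**2), i.e. near 0.55, consistent with Table 2.
```

With s_ex set to zero, the least-squares slope returns to 0.90, which is the attenuation effect the paper's Figures 2 and 3 display as a function of S_ex/S_x.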

Fig. 3. The least-squares slope as a function of S_ex/S_x when S_ex is constant for all x. Predicted b_y·x (- - - -); gaussian x-data, S_x = 5 (●); gaussian x-data, S_x = 25 (□); log-gaussian x-data, S_x = 127 (▲).

Fig. 4. Calculation of b_y·x by the method of Bartlett. Bartlett's method with gaussian x-data (S_x = 5 gave identical points to S_x = 25) (●); Bartlett's method with log-gaussian data (○).

Fig. 5. Calculation of b_y·x by the methods of Deming and Mandel. The first three points to the left are from log-gaussian x-data regressions, while the three points at far right are from gaussian x-data regressions (points with S_x = 5 identical to points with S_x = 25). Deming b_y·x (●); Mandel b_y·x (○).

Fig. 6. Least-squares, Deming, and Mandel b_y·x as a function of S_ex/S_x when S_ex is proportional to the value of x. Predicted b_y·x (- - -); least-squares b_y·x for gaussian x-data (either S_x = 5 or S_x = 25) (●); least-squares b_y·x for log-gaussian x-data (▲); Deming or Mandel b_y·x for the corresponding least-squares b_y·x vertically below it (×).

### Increasing Imprecision of x Measurement When Imprecision Is Constant for All x Values

Figure 2 shows the effect of increasing the standard error of measurement of x on the least-squares regression slope for gaussian and log-gaussian x-data. The least-squares b_y·x decreases steadily from the predicted value of 0.9 with increasing S_ex. Because this decrease is less prominent as S_x of the "true" x-data increases, the least-squares slope could perhaps be expressed as a function of the ratio S_ex/S_x regardless of the distribution or dispersion of the "true" x-data. This postulate is borne out in Figure 3, where a plot of the least-squares b_y·x vs. S_ex/S_x yields a single curve for the combined gaussian and log-gaussian data. From this graph, significant underestimation (>0.05 from the predicted b_y·x) of the absolute value of the true slope of a regression line may occur with the least-squares method when S_ex/S_x exceeds 0.2.

An attempt to obtain the correct slope by the method of Bartlett is presented in Figure 4. The Bartlett b_y·x decreases with increasing S_ex/S_x to the same extent as the least-squares slope. The relationship between S_ex/S_x and the Bartlett b_y·x also depends on the distribution of the "true" x-data, giving dissimilar curves for log-gaussian and gaussian data. These results suggest that Bartlett's method should not be used in estimating the slope of a line when x is measured with error.

Calculation of the regression line slope by the methods of Deming and Mandel gave identical results, as shown in Figure 5. The slopes remain close to the predicted value of 0.9 for both log-gaussian and gaussian data, even at large values of S_ex/S_x.

Consistency of the calculated regression coefficient can be assessed by the ability of regression with either x or y as the independent variable to produce one line.

    If x = a_x·y + b_x·y y,
    then y = −(a_x·y/b_x·y) + (1/b_x·y) x.
    Since y = a_y·x + b_y·x x,
    then b_y·x = 1/b_x·y

if the reverse regression gives the same line. Results of performing the regression with y as the independent variable are shown in Table 2. While least-squares regression gives two markedly different lines at large errors of measurement, the methods of Deming and Mandel are consistent in producing one regression line.

Table 2. Consistency of Least-Squares, Deming, and Mandel Regression Coefficients When S_ex and S_ey Are Constant^a

| Method        | Gauss (100, 5)^b: b_y·x | 1/b_x·y | Gauss (100, 25): b_y·x | 1/b_x·y | Log gauss (100, 127): b_y·x | 1/b_x·y |
|---------------|------|------|------|------|------|------|
| Least squares | 0.55 | 1.48 | 0.55 | 1.48 | 0.88 | 0.92 |
| Deming        | 0.87 | 0.87 | 0.87 | 0.87 | 0.90 | 0.90 |
| Mandel        | 0.87 | 0.86 | 0.87 | 0.86 | 0.90 | 0.90 |

^a For gauss (100, 5), S_ex = S_ey = 4; for gauss (100, 25) and log gauss (100, 127), S_ex = S_ey = 20. ^b (Mean, standard deviation) of the "true" x-data.

Table 3. Consistency of Least-Squares, Deming, and Mandel Regression Coefficients When S_ex and S_ey Are Proportional to x and y^a

| Method        | Gauss (100, 5)^b: b_y·x | 1/b_x·y | Gauss (100, 25): b_y·x | 1/b_x·y | Log gauss (100, 127): b_y·x | 1/b_x·y |
|---------------|------|------|------|------|------|------|
| Least squares | 0.55 | 1.48 | 0.54 | 1.51 | 0.85 | 0.95 |
| Deming        | 0.87 | 0.87 | 0.86 | 0.86 | 0.90 | 0.90 |
| Mandel        | 0.87 | 0.86 | 0.87 | 0.85 | 0.90 | 0.89 |

^a For gauss (100, 5), a coefficient of variation of 4% was used to compute x and y errors; for gauss (100, 25) and log gauss (100, 127), a coefficient of variation of 20% was used. The average S_ex and S_ey were used to calculate the Deming and Mandel b_y·x. ^b (Mean, standard deviation) of the "true" x-data.

### Increasing Imprecision of x Measurement When Imprecision Is Proportional to Each x Value

The least-squares b_y·x is plotted as a function of S_ex/S_x in Figure 6. The calculated standard deviation of all the computer-generated proportional errors of measurement of x and y is used in computing this ratio, equivalent to an "average" standard deviation of measurement of x and y that would be calculated from measuring x and y in duplicate. For the gaussian x-data, the curve generated is identical to that when S_ex is constant (Figure 3). For the log-gaussian data, a different relationship is evident, although the least-squares

…may give values for the Mandel b_y·x that are too low. On the other hand, the ratio S_ex/S_ey does not change greatly when the measurement errors at the means are used; thus the Deming b_y·x can be calculated if repeated measurements of a control or patient sample are made of a specimen close to the values of x̄ and ȳ.
is not markedly      different    from that for gaussian data at                  Consistency      of the calculated      slopes at large error mea-
the same Sex/Sx. However, the methods of both Deming and                             surements     is shown in Table 3. As in the case when the error
Mandel give identical values of                 close to 0.9 for all x -data         of measurement        is constant,  the reverse regression yields the
at all Sex/Sx.                                                                       same line for the Deming and Mandel methods, while different
Because it may not always be practical                 to perform mea-           lines are obtained       with the least-squares     method.
surements      of x and y in duplicate             to obtain the average
Sex and S when Sex and 5ey are not constant,                  we investigated        Regression Analysis of Laboratory                       Method-
the validity      of using the standard            deviation     of repeated         Comparison Data
measurements        of x and y at the mean of the data, i.e., the
Data with small S,. Regression                    analysis     of clustered
coefficient of variation for x andy times               and 57 Consistently
method comparison data is a well-known                  laboratory     nemesis.
lower values of Sex/Sx are obtained              by this calculation,       par-
Figure 7 shows a plot of, 87 sodium determinations                      by two
ticularly for the log-gaussian         data. Thus the use of Sex about
flame-photometric         methods. The “reference”            method (x -axis)
data has a mean of 139.8 and standard               deviation of 2.67, while
the “test” method           (y-axis) data has a mean of 140.7 and
B                       standard    deviation      of 2.60. Although       a line with a slope close
to 1 is expected,     least-squares       regression     gives a        of 0.66;
ISO                                                                  when regression         is performed        with y as the independent
U-
a
variable, markedly differentvalue of                          = 1/bx.y   = 1.43
-J                                                                        results. Precision estimates from quality-control                samples near
2                                                                         the mean give approximate             measurement          errors of 1.09 and
*   +.;
140                                                                  0.83 for x andy, respectively. Thus Sex/Sx = 0.41, suggesting
necessity for Deming’s method. As seen in Figure 7, Deming’s
0
E
calculation    gives one line with by.x = 1.09, regardless                     of
E                                                                         whether x or y is used as the independent                 variable. This line
130                                                                  is a better estimate       of the relationship        between x and y sug-
gested by the data.
Data with outliers.          Figure 8 shows a plot of 169 samples
assayed forcalcium by atomic absorption (x-axis)                     and SMA
120                                                                  12-SO (y-axis). For these data,              = 9.58,51 = 9.41, Sx = 0.755,
120      130               140             150                 Sy = 0.741, Sex = 0.12, and Sey = 0.14. Since Sex/Sx = 0.16,
SODIUM. mmol/I -SMA-6/60                             imprecision     in the measurement             of the x values does not
Fig.                                 seraanalyzedby
7.Sodium values(mmol/L)forpatients’                                              greatly bias the least-squares            slope estimate.       Yet the least-
two flame-photometric
methods                                                          squares regression        equation    is:
N = 87. Nu’nerals substitute for points where more than one data point occties
a single space. A. Least-squares regression with xas the Independent variable,                                      y   =   1.96 ± 0.78x
y = 47.6 + 0.667x. B. Least-squares regression with y as the Independent
variable. y = -59.2 + 143x. C. Denling regression, using either x or y as the        giving a        value substantially  lower than the expected slope
independent variable, y= -11.7 + 1.09x                                               of 1.0. Inspection    of the data in Figure 8 suggests one or more
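Deming's calculation, which yields the single line described for the sodium comparison, reduces to a closed-form slope once the ratio of the two error variances is known. The sketch below is a minimal illustration, not the authors' program: the data are synthetic, the variable names are ours, and the error-variance ratio lam = (Sey/Sex)^2 is assumed known from precision studies.

```python
import math
import random

def ols_slope(x, y):
    # Ordinary least-squares slope of y on x.
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def deming_fit(x, y, lam):
    # Deming regression of y on x.  lam = (Sey/Sex)^2, the ratio of the
    # y-error variance to the x-error variance (lam = 1 when both methods
    # are equally imprecise).  Returns (intercept, slope).
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x) / (n - 1)
    syy = sum((b - my) ** 2 for b in y) / (n - 1)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)
    slope = ((syy - lam * sxx)
             + math.sqrt((syy - lam * sxx) ** 2 + 4 * lam * sxy ** 2)) / (2 * sxy)
    return my - slope * mx, slope

# Clustered "sodium-like" data: true slope 1, x and y each measured with
# error SD 1.0 while the true values have SD 2.5 (so Sex/Sx is about 0.4).
random.seed(7)
t = [random.gauss(140.0, 2.5) for _ in range(2000)]
x = [v + random.gauss(0.0, 1.0) for v in t]
y = [v + random.gauss(0.0, 1.0) for v in t]
b_ls = ols_slope(x, y)                    # biased toward 0 by the x error
a_dem, b_dem = deming_fit(x, y, lam=1.0)  # single line, slope close to 1
```

As lam grows large (no error in x), the Deming slope collapses to the ordinary least-squares slope; as lam approaches 0, it collapses to the reverse regression 1/bx.y, which is why the method gives one line lying between the two least-squares fits.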

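The shrinkage of the least-squares slope with increasing Sex/Sx, tabulated above, is the classical attenuation effect: the fitted slope falls roughly by the factor Sx_true^2 / (Sx_true^2 + Sex^2). A small Monte-Carlo sketch (our own simulation parameters, not the authors' computer program) reproduces it:

```python
import random

def ls_slope(x, y):
    # Ordinary least-squares slope of y on x.
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return (sum((a - mx) * (b - my) for a, b in zip(x, y))
            / sum((a - mx) ** 2 for a in x))

random.seed(1)
TRUE_SLOPE = 0.9     # assumed slope of the simulated "true" relationship
SD_TRUE = 25.0       # SD of the "true" x values, as in gauss (100,25)
slopes = {}
for s_ex in (0.0, 5.0, 12.5, 25.0):   # increasing x-measurement error
    t = [random.gauss(100.0, SD_TRUE) for _ in range(4000)]
    x = [v + random.gauss(0.0, s_ex) for v in t]            # observed x
    y = [TRUE_SLOPE * v + random.gauss(0.0, 5.0) for v in t]
    slopes[s_ex] = ls_slope(x, y)
# Each slope shrinks roughly by SD_TRUE^2 / (SD_TRUE^2 + s_ex^2):
# no x error recovers about 0.9; s_ex = SD_TRUE halves the slope.
```

Note that error in y alone does not bias the slope; only error in the independent variable produces this attenuation, which is the central point of the comparison in the tables.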
Fig. 8. Calcium values (mg/dL) for patients' sera analyzed by atomic absorption (x) and continuous-flow (y)
N = 169. Numerals substitute for points where more than one data point occupies a single space. A. Least-squares regression line with all data points, showing point no. 1 greater than 4 Sy.x from the line, y = 1.96 + 0.78x. B. Least-squares regression line omitting point no. 1, showing point no. 2 greater than 4 Sy.x from the line, y = 0.39 + 0.95x. C. Least-squares regression line omitting points 1 and 2, y = 0.04 + 0.99x

spurious data points, the most aberrant being at x = 13.8, y = 9.7. The predicted value of y by the regression equation at x = 13.8 is 12.8, differing from the observed value of 9.7 by 3.1. This difference exceeds four times the standard error of estimate (Sy.x = 0.49, 4 Sy.x = 1.95), and thus this point should be rejected from the data set. Recomputation of the regression equation yields:

y = 0.39 + 0.95x

with Sy.x = 0.37. Although outlier point no. 2 (Figure 8) is not greater than 4 Sy.x in the y-direction from the initial regression line, it can be shown to be greater than 4 Sy.x in the vertical direction from the new regression line with point no. 1 omitted. Recalculation with points 1 and 2 deleted gives the least-squares regression line:

y = 0.04 + 0.99x

When this new line is used, no further data points have y values deviating by more than 4 Sy.x from the y value predicted by the regression line. Thus, even though 169 data points were used, two spurious values can still significantly change the least-squares regression slope. These values may easily be identified and omitted by noting that their residual error (observed y value - predicted ŷ value) is greater than 4 Sy.x.

Discussion

Least-squares (Model I) regression analysis may be an inappropriate regression technique to use when x is measured with imprecision. We have compared the Model II regression solutions of Deming, Mandel, and Bartlett for estimating by.x when x is measured with error, and find that only the methods of Deming and Mandel compute the correct slope.

The methods of both Deming and Mandel assume that the error of measurement remains constant throughout the range of values. However, clinical laboratory measurements usually increase in absolute imprecision when larger values are measured. We have approximated this situation by using a model in which the error of measurement has a constant coefficient of variation. If the "average" error is calculated by the method of duplicates, both the Deming and Mandel methods will yield by.x close to the expected value; however, if precision is estimated by repeatedly assaying a sample close to the mean of the data, Deming's method gives better results.

On the basis of the results in this paper, the following guidelines for linear regression analysis are suggested:

• Always plot the data; perform and apply least-squares regression analysis only to the region of linearity. Although not stressed in this paper, curvilinear deviation will markedly alter the regression slope. Furthermore, suspected outliers may be identified from the data plot.

• A rough estimate of the effect of measurement errors in x can be made by looking at the ratio Sex/Sx, where Sex represents the precision of a single x measurement near x̄. If this ratio exceeds 0.2, significant error occurs in the least-squares estimate of slope, and Deming's by.x should be computed. If the data are markedly skewed (e.g., Sx > x̄) and the error of measurement of x is proportional to x, a ratio of 0.15 or greater may be indicative of significant error in the least-squares slope. Alternatively, more data can be selected that will increase the value of Sx, or the error of measurement of the sample by method x may be decreased by averaging N measurements, where:

S(error of average) = S(error of single measurement) / √N

• Calculation of the Deming by.x requires the ratio Sex/Sey. Estimates may be obtained by precision analysis of a single sample close to the mean of the data or, alternatively, by measuring duplicate x and y values, where:

S(error) = √[ Σ (difference between duplicates)² / 2N ],  summed over i = 1 to N

If the average of the duplicates is used for the x and y values in the regression analysis, then it must be remembered that the standard deviation of measurement of an average of two values equals the standard deviation of measurement of a single value divided by √2.

• The standard error of regression should always be calculated. For either the least-squares or Deming by.x, this statistic may be easily computed from parameters likely to be obtained from a calculator intended for scientific use:

Sy.x = √[ ((N - 1)/(N - 2)) (Sy² - by.x² Sx²) ]

and can be interpreted as the standard deviation of the mean value expected for y for a given value of x close to x̄. It is a measure of scatter of the points about the regression line. Although more complex and statistically exact methods are available (10), approximate detection of significant outliers that may bias the least-squares slope may be made by excluding any data point whose y value differs from that predicted by the regression line by more than 4 Sy.x. However, a large number of data points should not be excluded.

When measurements of x and y are both subject to error, as they are in the clinical laboratory, the least-squares regression method may give two very disparate lines, depending on whether x or y is used as the independent variable. Neither line expresses the functional relationship between the true values of x and y; both are altered by errors of measurement in the independent variable. The method of Deming can provide a solution to this dilemma, yielding one regression line between x and y that takes into account the errors of measurement of both variables.

We thank Drs. John Brimm, Lemuel Bowie, and Rupert Miller for assistance with this manuscript.
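The iterative rejection applied to the calcium data (fit the line, flag the worst point more than 4 Sy.x away, refit, and re-test against the new line) can be sketched as below. The data are synthetic with one planted outlier, and the helper names are ours; the sketch also cross-checks the calculator-statistics shortcut for Sy.x against the direct residual calculation.

```python
import math
import random

def ols(x, y):
    # Least-squares fit of y on x: returns (intercept, slope, s_yx),
    # where s_yx is the standard error of estimate.
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    return a, b, math.sqrt(ss_res / (n - 2))

def s_yx_from_summary(n, sy, sx, b):
    # The shortcut formula: Sy.x = sqrt((N-1)/(N-2) * (Sy^2 - b^2 Sx^2)),
    # computable from ordinary calculator summary statistics.
    return math.sqrt((n - 1) / (n - 2) * (sy ** 2 - b ** 2 * sx ** 2))

def screen_outliers(x, y, k=4.0):
    # Repeatedly refit and drop the worst point whose |residual| > k * s_yx,
    # re-testing the remaining points against each new line.
    pts = list(zip(x, y))
    removed = []
    while True:
        a, b, s_yx = ols([p[0] for p in pts], [p[1] for p in pts])
        worst, point = max((abs(yi - (a + b * xi)), (xi, yi)) for xi, yi in pts)
        if worst <= k * s_yx:
            return a, b, s_yx, removed
        pts.remove(point)
        removed.append(point)

# Synthetic calcium-like data (true slope 1) with one planted outlier.
random.seed(3)
t = [random.gauss(9.5, 0.75) for _ in range(167)]
x = [v + random.gauss(0.0, 0.12) for v in t]
y = [v + random.gauss(0.0, 0.14) for v in t]
x.append(13.8)
y.append(9.7)
a_fit, b_fit, s_yx, removed = screen_outliers(x, y)

# Cross-check the shortcut formula on the retained points.
kept = [(xi, yi) for xi, yi in zip(x, y) if (xi, yi) not in removed]
n = len(kept)
kx, ky = [p[0] for p in kept], [p[1] for p in kept]
sx = math.sqrt(sum((v - sum(kx) / n) ** 2 for v in kx) / (n - 1))
sy = math.sqrt(sum((v - sum(ky) / n) ** 2 for v in ky) / (n - 1))
s_yx_short = s_yx_from_summary(n, sy, sx, b_fit)
```

The two Sy.x values agree exactly (up to rounding) because, for a least-squares slope b, (N - 1)(Sy² - b²Sx²) equals the residual sum of squares; the shortcut is an identity, not an approximation.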

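The duplicate-based error estimate and the √N rule for averaged replicates, both used in the guidelines above, can be checked numerically. This is a sketch with invented parameters, not a reanalysis of the paper's data:

```python
import math
import random

def sd_from_duplicates(pairs):
    # Within-pair estimate of the measurement SD from N duplicate
    # determinations: S_error = sqrt( sum (d1 - d2)^2 / (2 N) ).
    n = len(pairs)
    return math.sqrt(sum((d1 - d2) ** 2 for d1, d2 in pairs) / (2 * n))

random.seed(5)
TRUE_SD = 1.2    # invented measurement SD for the simulation
pairs = []
for _ in range(5000):
    true_val = random.gauss(100.0, 10.0)   # specimen-to-specimen spread
    pairs.append((true_val + random.gauss(0.0, TRUE_SD),
                  true_val + random.gauss(0.0, TRUE_SD)))
s_err = sd_from_duplicates(pairs)          # recovers TRUE_SD
s_avg_of_2 = s_err / math.sqrt(2)          # SD of the mean of two replicates
```

The within-pair formula isolates pure measurement error even though the specimens themselves span a wide range, which is why duplicates on patients' samples can substitute for repeated assays of a single control.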
References

1. Westgard, J. O., and Hunt, M. R., Use and interpretation of common statistical tests in method-comparison studies. Clin. Chem. 19, 49 (1973).
2. Draper, N. R., and Smith, H., Applied Regression Analysis. John Wiley and Sons, New York, NY, 1966, pp 44-103.
3. Sokal, R. R., and Rohlf, F. J., Biometry. W. H. Freeman and Co., San Francisco, CA, 1969, pp 481-486.
4. Deming, W. E., Statistical Adjustment of Data. John Wiley and Sons, New York, NY, 1943, p 184.
5. Wakkers, P. J. M., Hellendoorn, H. B. Z., Op De Weegh, G. J., and Herspink, W., Applications of statistics in clinical chemistry. A critical evaluation of regression lines. Clin. Chim. Acta 64, 173 (1975).
6. Mandel, J., The Statistical Analysis of Experimental Data. John Wiley and Sons, New York, NY, 1964, pp 290-291.
7. Blomqvist, N., Cederblad, G., Korsan-Bengtsen, K., and Wallerstedt, S., Application of a method for correcting an observed regression between change and initial value for the bias caused by random errors in the initial value. Clin. Chem. 23, 1845 (1977).
8. Ryan, T. A., Joiner, B. L., and Ryan, B. F., Minitab Student Handbook. Duxbury Press, North Scituate, MA, 1976.
9. Diem, K., and Lentner, C., Eds., Documenta Geigy. Ciba Geigy Limited, Basle, Switzerland, 1970, p 164.
10. Snedecor, G. W., and Cochran, W. G., Statistical Methods. Iowa State University Press, Ames, IA, 1967, p 157.


```