# regression by zhangyun

```
We use regression analysis to examine the relationship between two continuous variables,
such as blood pressure versus drug dose.

We can also perform t-tests and ANOVA using regression.

A special form of regression, logistic regression, is used for classification problems,
where the outcome variable is a category, such as disease vs no disease.

Cox proportional hazards regression is used in survival analysis,
where we examine the effect of a continuous variable (such as gene expression)
on the time to an event (such as recurrence of breast cancer).

Regression analysis: how will my blood pressure change when I take blood pressure pills?

Patient ID   Number of pills   Blood pressure
 1           0                 117
 2           0                 112
 3           0                 139
 4           0                 154
 5           0                 155
 6           0                 159
 7           0                 138
 8           0                 155
 9           0                 139
10           0                 110
11           1                 117
12           1                 114
13           1                 142
14           1                 122
15           1                 129
16           1                 106
17           1                 107
18           1                 141
19           1                 131
20           1                 111

[Figure: Scatterplot of blood pressure versus number of pills. x-axis: Number of pills; y-axis: Blood pressure.]

independent variable (x) is number of pills
dependent variable (y) is blood pressure

Is the mean blood pressure different for patients getting one pill vs. no pill? Use t-test.

(Data: the same 20 patients as in the table above; patients 1-10 took 0 pills, patients 11-20 took 1 pill.)

t-Test: Two-Sample Assuming Equal Variances
                               Variable 1   Variable 2
Mean                                137.8          122
Variance                        353.06667    175.77778
Observations                           10           10
Pooled Variance                 264.42222
Hypothesized Mean Difference            0
df                                     18
t Stat                          2.1726667
P(T<=t) one-tail                0.0217023
t Critical one-tail             1.7340636
P(T<=t) two-tail                0.0434045
t Critical two-tail              2.100922

t-test p-value = 0.043405

Regression analysis: how will my blood pressure change when I take blood pressure pills?

(Data: the same 20 patients as in the table above.)

Add a trend line to the scatterplot of blood pressure versus number of pills.

[Figure: Scatterplot of blood pressure versus number of pills, with trend line. x-axis: Number of pills; y-axis: Blood pressure.]

independent variable (x) is number of pills
dependent variable (y) is blood pressure
t-test p-value = 0.043405

Use the Tools / Data analysis / Regression menu to do regression analysis

Patient ID             Number of pills Blood pressure
1                 0             117
2                 0             112
3                 0             139
4                 0             154
5                 0             155
6                 0             159
7                 0             138
8                 0             155
9                 0             139
10                 0             110
11                 1             117
12                 1             114
13                 1             142
14                 1             122
15                 1             129
16                 1             106
17                 1             107
18                 1             141
19                 1             131
20                 1             111

SUMMARY OUTPUT

Regression Statistics
Multiple R          0.455810532
R Square            0.207763241
Standard Error      16.26106461
Observations                 20

ANOVA
              df         SS         MS          F
Regression     1     1248.2     1248.2   4.720481
Residual      18     4759.6    264.422
Total         19     6007.8

                  Coefficients   Standard Error   t Stat   P-value
Intercept                137.8             5.14    26.80    0.0000
Number of pills          -15.8             7.27    -2.17    0.0434

The p-value for number of pills is p = 0.0434, which is significant at the 0.05 level.
The coefficient for the number of pills is -15.8.
The coefficient is the slope of the regression line.
That is, for every unit increase in the number of pills, the mean blood pressure goes down by 15.8.
In regression analysis, a significant p-value is evidence that the slope is non-zero.
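
A possible cross-check of the menu output with worksheet functions (a sketch only; the
cell ranges are assumptions: number of pills in B2:B21, blood pressure in C2:C21):

=SLOPE(C2:C21,B2:B21)        slope; should agree with the Number of pills coefficient
=INTERCEPT(C2:C21,B2:B21)    intercept; should agree with the Intercept coefficient
=RSQ(C2:C21,B2:B21)          should agree with R Square

Note that these functions take the y range (blood pressure) first and the x range second.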

[Figure: Scatterplot of blood pressure versus number of pills. x-axis: Number of pills; y-axis: Blood pressure.]

Compare the results of a t-test to the results of the regression analysis for a 0/1 variable.

t-Test: Two-Sample Assuming Equal Variances
                               0 pills     1 pill   Difference
Mean                             137.8        122        -15.8
Variance                        353.07     175.78
Observations                     10.00      10.00
Pooled Variance                 264.42
Hypothesized Mean Difference      0.00
df                               18.00
t Stat                            2.17
P(T<=t) one-tail                  0.02
t Critical one-tail               1.73
P(T<=t) two-tail                0.0434
t Critical two-tail               2.10

Regression
                  Coefficients   Standard Error   t Stat   P-value
Intercept                137.8             5.14    26.80    0.0000
Number of pills          -15.8             7.27    -2.17    0.0434

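If the two classes are labelled 0 and 1, the t-test and the regression are
the same analysis and give the same p-value.
The difference between the means equals the coefficient (slope).
The t statistics (2.17 and -2.17) differ only in sign, because the t-test subtracts
the 1-pill mean from the 0-pill mean while the slope measures the change from 0 pills to 1 pill.
So a t-test can be formulated as a regression analysis.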
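
The equivalence can be checked on the worksheet with a few formulas (a sketch; the layout
is an assumption: blood pressure for the 0-pill patients in C2:C11 and for the 1-pill
patients in C12:C21, number of pills in B2:B21):

=TTEST(C2:C11,C12:C21,2,2)            two-tailed, equal-variance t-test p-value
=AVERAGE(C12:C21)-AVERAGE(C2:C11)     difference between the means (1 pill minus 0 pills)
=SLOPE(C2:C21,B2:B21)                 regression slope

If the layout matches, the first formula should reproduce the two-tail p-value of 0.0434,
and the last two should both give -15.8.
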
Extend the analysis to patients taking 0, 1, or 2 pills

Patient ID   Number of pills   Blood pressure
 1           0                 117
 2           0                 112
 3           0                 139
 4           0                 154
 5           0                 155
 6           0                 159
 7           0                 138
 8           0                 155
 9           0                 139
10           0                 110
11           1                 117
12           1                 114
13           1                 142
14           1                 122
15           1                 129
16           1                 106
17           1                 107
18           1                 141
19           1                 131
20           1                 111
21           2                 109
22           2                 137
23           2                 102
24           2                 106
25           2                 109
26           2                 117
27           2                 138
28           2                 104
29           2                 129
30           2                  93

[Figure: Scatterplot of blood pressure versus 0, 1 or 2 pills. x-axis: Number of pills; y-axis: Blood pressure.]

SUMMARY OUTPUT

Regression Statistics
Multiple R          0.53001801
R Square            0.28091909
Standard Error     15.82049605
Observations                30

ANOVA
              df         SS         MS        F   Significance F
Regression     1   2737.800   2737.800   10.939            0.003
Residual      28   7008.067    250.288
Total         29   9745.867

                  Coefficients   Standard Error   t Stat   P-value   Lower 95%   Upper 95%
Intercept               136.43            4.567   29.874     0.000     127.078     145.788
Number of pills          -11.7            3.538   -3.307     0.003     -18.946      -4.454

The p-value for number of pills is p = 0.003, which is significant at the 0.01 level.
The coefficient for the number of pills is -11.7.
The coefficient is the slope of the regression line.
That is, for every unit increase in the number of pills, the mean blood pressure goes down by 11.7.
In regression analysis, a significant p-value is evidence that the slope is non-zero.
How many pills should I take to get my blood pressure to 120?
If I know the number of pills, how accurately can I predict the blood pressure?
How scattered are the points around the regression line?

Data set 1
Patient ID   Number of pills   Blood pressure
 1           0                 159
 2           0                 138
 3           0                 155
 4           0                 139
 5           0                 110
 6           1                 117
 7           1                 112
 8           1                 139
 9           1                 154
10           1                 155
11           2                 117
12           2                 114
13           2                 142
14           2                 122
15           2                 129
16           3                 106
17           3                 107
18           3                 141
19           3                 131
20           3                 111
21           4                 109
22           4                 137
23           4                 102
24           4                 106
25           4                 109
26           5                 117
27           5                 108
28           5                 104
29           5                 129
30           5                  93

[Figure: Dataset 1. Blood pressure versus number of pills. x-axis: Number of pills; y-axis: Blood pressure.]
[Figure: Dataset 2. Blood pressure versus number of pills. x-axis: Number of pills; y-axis: Blood pressure.]

Data set 2
Patient ID   Number of pills   Blood pressure
1        0       143
2        0       142
3        0       144
4        0       140
5        0       143
6        1       143
7        1       135
8        1       141
9        1       136
10        1       139
11   2   133
12   2   134
13   2   138
14   2   130
15   2   133
16   3   132
17   3   131
18   3   135
19   3   134
20   3   127
21   4   122
22   4   126
23   4   127
24   4   124
25   4   129
26   5   117
27   5   121
28   5   122
29   5   123
30   5   124

See Dataset 2 below.
We use correlation to describe the scatter of points around the regression line.
Correlation indicates how accurately the x variable predicts the y variable.
The most common correlation measure is the Pearson linear correlation coefficient, R,
named after Karl Pearson.

[Figure: Dataset 1. Blood pressure versus number of pills. x-axis: Number of pills; y-axis: Blood pressure.]
[Figure: Dataset 2. Blood pressure versus number of pills. x-axis: Number of pills; y-axis: Blood pressure.]

Correlation values
 1 => perfect positive correlation
-1 => perfect negative correlation
 0 => no correlation

Correlation for dataset 1
                  Number of pills   Blood pressure
Number of pills                 1
Blood pressure             -0.604                1

Correlation for dataset 2
                  Number of pills
Number of pills                 1
Blood pressure             -0.940

For data set 1, if you know the number of pills, how well can you predict the blood pressure?
How about for data set 2?
In data set 2, blood pressure is much more strongly correlated with the number of pills
(R = -0.940, versus R = -0.604 for data set 1).
An example of correlation near one
[Figure: Blood pressure versus number of pills: correlation R near 1. x-axis: Number of pills; y-axis: Blood pressure.]

Patient ID   Number of pills   Blood pressure
 1           0                 130.9157
 2           0                 130.4289
 3           0                 130.0122
 4           0                 130.7251
 5           0                 130.1121
 6           1                 135.5542
 7           1                 135.0059
 8           1                 135.4413
 9           1                 135.0049
10           1                 135.5553
11           2                 140.1523
12           2                 140.936
13           2                 140.0939
14           2                 140.5788
15           2                 140.3806
16           3                 145.3203
17           3                 145.9028
18           3                 145.9257
19           3                 145.3606
20           3                 145.4267
21           4                 150.8455
22           4                 150.8062
23           4                 150.7665
24           4                 150.7404
25           4                 150.8482
26           5                 155.9677
27           5                 155.1016
28           5                 155.2945
29           5                 155.4089
30           5                 155.6955

Calculate correlation using the workbook functions Correl or Pearson

=PEARSON(B4:B33,C4:C33)
 0.999401

=Correl(B4:B33,C4:C33)
 0.999401

An example of correlation near zero
[Figure: Blood pressure versus number of pills: correlation R near zero.]

Patient ID   Number of pills   Blood pressure
 1           0                 115
 2           0                 147
 3           0                 150
 4           0                 144
 5           0                 129
 6           1                 116
 7           1                 151
 8           1                 131
 9           1                 152
10           1                 103
11           2                 118
12           2                 100
13           2                 148
14           2                 121
15           2                 137
16           3                 155
17           3                 139
18           3                 145
19           3                 108
20           3                 119
21           4                 105
22           4                 108
23           4                 100
24           4                 110
25           4                 117
26           5                 143
27           5                 135
28           5                 149
29           5                 153
30           5                 158

Calculate correlation using the workbook functions Correl or Pearson

=PEARSON(B4:B33,C4:C33)
 -0.01674

=Correl(B4:B33,C4:C33)
 -0.01674

The Pearson correlation coefficient, R, tells us, given one variable,
how well we can predict a second (correlated) variable.
For example, given the number of pills, how well can we predict
the blood pressure?

R ranges from 1.0, meaning perfect prediction,
to 0.0, meaning no predictive value at all,
to -1.0, meaning that the variables are
perfectly correlated but go in opposite directions,
that is, they are negatively correlated.
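
As an illustration of what these functions compute, Pearson R can also be built from the
covariance and the standard deviations (a sketch using the same B4:B33 and C4:C33 ranges;
COVAR and STDEVP are the population versions, and the population factors cancel in the ratio):

=COVAR(B4:B33,C4:C33)/(STDEVP(B4:B33)*STDEVP(C4:C33))

This should return the same value as =PEARSON(B4:B33,C4:C33) and =Correl(B4:B33,C4:C33).
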
Effect of outliers on Pearson correlation

Without outlier                    With outlier
Drug dose   Blood pressure         Drug dose   Blood pressure
 5          151                     5          151
 5          145                     5          145
 5          136                     5          136
10          137                    10          137
10          124                    10          124
10          124                    10          124
15          111                    15          111
15          105                    15          105
20          110                    20          110
20           98                    20          150
Pearson R = -0.922204443           Pearson R = -0.472650854

[Figure: Blood pressure vs drug dose. x-axis: Drug dose; y-axis: Blood pressure.]
[Figure: Blood pressure vs drug dose, with outlier. x-axis: Drug dose; y-axis: Blood pressure.]

Effect of outliers on Pearson correlation

Without outlier                With outlier
x value   y value              x value   y value
1         4                     1         4
1         1                     1         2
2         3                     2         3
2         3                     2         3
3         1                     3         1
3         4                     3         4
4         3                     4         3
4         2                     4         2
4         3                    10        10
Pearson R = 0.0000             Pearson R = 0.812324

[Figure: scatterplots of y value versus x value for the two data sets.]

Spearman rank correlation is less sensitive to outliers than Pearson linear correlation.
Spearman can detect monotonic but non-linear correlations better than Pearson correlation can.
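
Excel has no built-in Spearman function. One common approach, sketched here under assumed
cell ranges (x values in A2:A10, y values in B2:B10, helper columns C and D), is to rank
each column and take the Pearson correlation of the ranks. RANK.AVG requires Excel 2010 or later.

C2:  =RANK.AVG(A2,A$2:A$10,1)     fill down to C10
D2:  =RANK.AVG(B2,B$2:B$10,1)     fill down to D10
     =CORREL(C2:C10,D2:D10)       Spearman rank correlation
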
Multiple regression

Do other variables, such as age, also affect blood pressure?
Can we predict blood pressure better if we know both age and the number of pills?
We use multiple regression to answer these questions.
It is called "multiple" because we have multiple independent variables.

Data set 1, with Age included
Patient ID   Age   Number of pills   Blood pressure
 1           86    0                 159
 2           54    0                 138
 3           74    0                 155
 4           58    0                 139
 5           67    0                 110
 6           50    1                 117
 7           66    1                 112
 8           81    1                 139
 9           91    1                 154
10           78    1                 155
11           58    2                 117
12           48    2                 114
13           84    2                 142
14           73    2                 122
15           74    2                 129
16           57    3                 106
17           65    3                 107
18           69    3                 141
19           78    3                 131
20           52    3                 111
21           65    4                 109
22           73    4                 137
23           61    4                 102
24           62    4                 106
25           52    4                 109
26           54    5                 117
27           44    5                 108
28           54    5                 104
29           68    5                 129
30           61    5                  93

[Figure: Dataset 1. Blood pressure versus number of pills. x-axis: Number of pills; y-axis: Blood pressure.]
[Figure: Blood pressure versus age. x-axis: Age; y-axis: Blood pressure.]

Regression of blood pressure on age

Regression Statistics
Multiple R        0.696168
R Square           0.48465
Standard Error    13.44389
Observations            30

ANOVA
              df         SS         MS          F   Significance F
Regression     1   4759.196   4759.196   26.33198         1.94E-05
Residual      28   5060.671   180.7383
Total         29   9819.867

            Coefficients   Standard Error   t Stat   P-value   Lower 95%   Upper 95%
Intercept          54.85            13.65     4.02    0.0004       26.90       82.80
Age                 1.06             0.21     5.13    0.0000        0.64        1.48

[Figure: Blood pressure versus age. x-axis: Age; y-axis: Blood pressure.]

Model is BP = 54.8 + 1.06 * Age

RESIDUAL OUTPUT                         Residual = Actual - Predicted

Observation   Actual BP   Predicted BP   Residuals   Squared residual
 1            159         145.55          13.45      180.90
 2            138         112.14          25.86      668.98
 3            155         132.82          22.18      491.92
 4            139         115.85          23.15      536.01
 5            110         125.93         -15.93      253.63
 6            117         107.36           9.64       92.89
 7            112         124.86         -12.86      165.50
 8            139         140.25          -1.25        1.55
 9            154         151.38           2.62        6.84
10            155         137.06          17.94      321.70
11            117         115.85           1.15        1.33
12            114         105.77           8.23       67.72
13            142         143.96          -1.96        3.84
14            122         132.29         -10.29      105.89
15            129         132.82          -3.82       14.60
16            106         115.32          -9.32       86.82
17            107         123.27         -16.27      264.83
18            141         127.52          13.48      181.80
19            131         137.06          -6.06       36.77
20            111         109.48           1.52        2.30
21            109         123.27         -14.27      203.74
22            137         131.76           5.24       27.46
23            102         119.56         -17.56      308.39
24            106         120.62         -14.62      213.79
25            109         109.48          -0.48        0.23
26            117         111.60           5.40       29.11
27            108         101.53           6.47       41.89
28            104         112.14          -8.14       66.18
29            129         126.46           2.54        6.47
30             93         119.03         -26.03      677.59
                                                    5060.671   Sum of squared residuals (SS Residual)

[Figure: Age Residual Plot. x-axis: Age; y-axis: Residuals.]
[Figure: Age Line Fit Plot. x-axis: Age; y-axis: Blood pressure; series: Blood pressure, Predicted Blood pressure.]
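
The residual columns can be reproduced with simple formulas (a sketch; the layout is an
assumption: Age in B2:B31, actual blood pressure in D2:D31, helper columns E to G):

E2:  =FORECAST(B2,D$2:D$31,B$2:B$31)    predicted BP from the fitted line; fill down to E31
F2:  =D2-E2                              residual = actual - predicted; fill down
G2:  =F2^2                               squared residual; fill down
     =SUM(G2:G31)                        sum of squared residuals; should match SS Residual in the ANOVA table
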
Regression of blood pressure on pills

Regression Statistics
Multiple R        0.604131
R Square          0.364974
Standard Error    14.92346
Observations            30

ANOVA
              df         SS         MS          F   Significance F
Regression     1       3584       3584   16.09271         0.000407
Residual      28   6235.867   222.7095
Total         29   9819.867

                  Coefficients   Standard Error     t Stat    P-value   Lower 95%   Upper 95%
Intercept             139.7333         4.830266    28.9287   2.11E-22     129.839    149.6277
Number of pills           -6.4         1.595384   -4.01157   0.000407      -9.668      -3.132

[Figure: Dataset 1. Blood pressure versus number of pills, annotated with R^2 = 0.364974. x-axis: Number of pills; y-axis: Blood pressure.]

Model is BP = 139.7 - 6.4 * pills

RESIDUAL OUTPUT                           Residual = Actual - Predicted

Observation   Actual BP   Predicted BP   Residuals   Squared residual
 1            159         139.7333        19.26667   371.20
 2            138         139.7333        -1.73333     3.00
 3            155         139.7333        15.26667   233.07
 4            139         139.7333        -0.73333     0.54
 5            110         139.7333       -29.7333    884.07
 6            117         133.3333       -16.3333    266.78
 7            112         133.3333       -21.3333    455.11
 8            139         133.3333         5.666667   32.11
 9            154         133.3333        20.66667   427.11
10            155         133.3333        21.66667   469.44
11            117         126.9333        -9.93333    98.67
12            114         126.9333       -12.9333    167.27
13            142         126.9333        15.06667   227.00
14            122         126.9333        -4.93333    24.34
15            129         126.9333         2.066667    4.27
16            106         120.5333       -14.5333    211.22
17            107         120.5333       -13.5333    183.15
18            141         120.5333        20.46667   418.88
19            131         120.5333        10.46667   109.55
20            111         120.5333        -9.53333    90.88
21            109         114.1333        -5.13333    26.35
22            137         114.1333        22.86667   522.88
23            102         114.1333       -12.1333    147.22
24            106         114.1333        -8.13333    66.15
25            109         114.1333        -5.13333    26.35
26            117         107.7333         9.266667   85.87
27            108         107.7333         0.266667    0.07
28            104         107.7333        -3.73333    13.94
29            129         107.7333        21.26667   452.27
30             93         107.7333       -14.7333    217.07
                                                    6235.867   Sum of squared residuals (SS Residual)

[Figure: Number of pills Residual Plot. x-axis: Number of pills; y-axis: Residuals.]
[Figure: Number of pills Line Fit Plot. x-axis: Number of pills; y-axis: Blood pressure; series: Blood pressure, Predicted Blood pressure.]
Multiple regression of blood pressure on age and number of pills

Data set 1, with Age included
(Data: the same 30 patients, with Age, Number of pills and Blood pressure, as in the table above.)

[Figure: Dataset 1. Blood pressure versus number of pills. x-axis: Number of pills; y-axis: Blood pressure.]
[Figure: Blood pressure versus age. x-axis: Age; y-axis: Blood pressure.]

SUMMARY OUTPUT

Regression Statistics
Multiple R        0.786213
R Square          0.61813
Standard Error    11.78497
Observations      30

ANOVA
              df    SS             MS             F          Significance F
Regression     2    6069.957028    3034.978514    21.85237   2.27E-06
Residual      27    3749.909639     138.8855422
Total         29    9819.866667

                   Coefficients   Standard Error    t Stat   P-value   Lower 95%   Upper 95%   Lower 95.0%   Upper 95.0%
Intercept               80.3160          14.5529    5.5189    0.0000     50.4558    110.1761       50.4558      110.1761
Age                      0.8300           0.1962    4.2308    0.0002      0.4274      1.2325        0.4274        1.2325
Number of pills         -4.1899           1.3639   -3.0721    0.0048     -6.9884     -1.3915       -6.9884       -1.3915

Model is BP = 80.3 + 0.83 * Age - 4.19 * Pills

Regression Statistics
Multiple R        0.786213
R Square          0.61813
Standard Error    11.78497
Observations      30

ANOVA
              df    SS        MS        F          Significance F
Regression     2    6070.0    3035.0    21.85237   2.27E-06
Residual      27    3749.9     138.9
Total         29    9819.9

                   Coefficients   Standard Error   t Stat   P-value   Lower 95%   Upper 95%   Lower 95.0%   Upper 95.0%
Intercept                 80.32            14.55     5.52    0.0000       50.46      110.18         50.46        110.18
Age                        0.83             0.20     4.23    0.0002        0.43        1.23          0.43          1.23
Number of pills           -4.19             1.36    -3.07    0.0048       -6.99       -1.39         -6.99         -1.39

Model is BP = 80.3 + 0.83 * Age - 4.19 * Pills

RESIDUAL OUTPUT                                       Residual = Actual - Predicted

Observation   Actual BP   Predicted BP   Residuals   Squared residual
 1            159         151.28            7.72       59.63
 2            138         125.13           12.87      165.54
 3            155         141.32           13.68      187.19
 4            139         128.04           10.96      120.15
 5            110         135.92          -25.92      672.02
 6            117         117.21           -0.21        0.04
 7            112         130.90          -18.90      357.34
 8            139         142.94           -3.94       15.51
 9            154         151.65            2.35        5.51
10            155         140.45           14.55      211.76
11            117         119.66           -2.66        7.07
12            114         111.77            2.23        4.95
13            142         141.65            0.35        0.12
14            122         132.52          -10.52      110.74
15            129         132.94           -3.94       15.51
16            106         115.05           -9.05       81.97
17            107         121.28          -14.28      203.88
18            141         124.60           16.40      269.01
19            131         132.07           -1.07        1.14
20            111         110.49            0.51        0.26
21            109         117.09           -8.09       65.43
22            137         123.73           13.27      176.13
23            102         114.18          -12.18      148.45
24            106         115.01           -9.01       81.25
25            109         106.30            2.70        7.29
26            117         103.77           13.23      175.05
27            108          95.88           12.12      146.78
28            104         104.18           -0.18        0.03
29            129         115.39           13.61      185.27
30             93         109.58          -16.58      274.86

Sum of squared residuals (SS Residual) = 3749.91

[Figure: Number of pills Line Fit Plot. Blood pressure and predicted blood pressure versus number of pills]
[Figure: Age Line Fit Plot. Blood pressure and predicted blood pressure versus age]

Multiple regression

Mean BP = 123.7

Patient ID   Age   Number of pills   Blood pressure   BP deviation from mean   Squared deviation from mean
 1           86    0                 159                35.3                   1243.7
 2           54    0                 138                14.3                    203.5
 3           74    0                 155                31.3                    977.6
 4           58    0                 139                15.3                    233.1
 5           67    0                 110               -13.7                    188.6
 6           50    1                 117                -6.7                     45.3
 7           66    1                 112               -11.7                    137.7
 8           81    1                 139                15.3                    233.1
 9           91    1                 154                30.3                    916.1
10           78    1                 155                31.3                    977.6
11           58    2                 117                -6.7                     45.3
12           48    2                 114                -9.7                     94.7
13           84    2                 142                18.3                    333.7
14           73    2                 122                -1.7                      3.0
15           74    2                 129                 5.3                     27.7
16           57    3                 106               -17.7                    314.5
17           65    3                 107               -16.7                    280.0
18           69    3                 141                17.3                    298.1
19           78    3                 131                 7.3                     52.8
20           52    3                 111               -12.7                    162.1
21           65    4                 109               -14.7                    217.1
22           73    4                 137                13.3                    176.0
23           61    4                 102               -21.7                    472.3
24           62    4                 106               -17.7                    314.5
25           52    4                 109               -14.7                    217.1
26           54    5                 117                -6.7                     45.3
27           44    5                 108               -15.7                    247.5
28           54    5                 104               -19.7                    389.4
29           68    5                 129                 5.3                     27.7
30           61    5                  93               -30.7                    944.5

Sum of squared deviations from mean = 9819.9 (Total SS)

A subtle problem, a common error, and lessons for
designing and interpreting clinical trials

Data set 1, with Age included (the same 30 patients, ages, pill counts, and blood pressures tabulated at the start of this section)

[Figure: Dataset 1. Blood pressure versus number of pills (x-axis: Number of pills; y-axis: Blood pressure)]
[Figure: Blood pressure versus age (x-axis: Age; y-axis: Blood pressure)]

Multiple regression of blood pressure on age and number of pills
                   Coefficients   P-value
Intercept               80.3160    0.0000
Age                      0.8300    0.0002
Number of pills         -4.1899    0.0048

When we regress on age and number of pills:
The coefficient for the number of pills is -4.19.
The coefficient tells you the slope for the number of pills with age held constant.
That is, for every unit increase in the number of pills, among patients of the same age,
the mean blood pressure goes down by 4.19.

Regression of blood pressure on number of pills
                   Coefficients   P-value
Intercept              139.7333    0.0000
Number of pills         -6.4000    0.0004

When we regress on number of pills alone:
The coefficient for the number of pills is -6.4.
The coefficient tells you the slope of the regression line.
That is, for every unit increase in the number of pills,
the mean blood pressure goes down by 6.4.

So does one pill lower blood pressure by 4.19 or by 6.40?
What is going on?
[Figure: Age versus Number of pills. Scatterplot with Age on the x-axis and Number of pills on the y-axis]

It appears that the number of pills may be related to age.
If younger patients have lower blood pressure, and they get more pills,
it would give the impression that more pills cause lower blood pressure.
How can we quantify and test if the number of pills is related to age?

Regression of age on the number of pills.
                   Coefficients   P-value
Intercept               71.5905    0.0000
Number of pills         -2.6629    0.0367

The fitted line is Age = 71.59 - 2.66 * Pills.
The p-value of 0.0367 indicates that the number of pills is significantly associated with age at the 0.05 level.

Correlation between the number of pills and age.
r = -0.38    =PEARSON(B5:B35,C5:C35)

The negative correlation indicates that as patient age increases, the number of pills given tends to decrease.

The apparent effect of the pills (to decrease blood pressure) may be due, in part,
to bias in giving more pills to younger patients, who tend to have lower blood pressure.

This type of bias is common in observational studies, and is a major reason
why we use randomized, controlled clinical trials, in which
patients would be randomly assigned a number of pills,
without regard to their age.

Whenever we read or do an analysis, we should consider:

Are the subjects randomly assigned to treatments?
Are there correlations among the independent variables?
```
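The regression statistics in the summary output can be recomputed from the sums of squares in the ANOVA table. The sketch below does that arithmetic in plain Python, using only the SS values, the number of observations, and the number of predictors quoted in the Excel output; the variable names are illustrative and not part of the original worksheet.

```python
import math

# Values copied from the ANOVA table of the Excel summary output.
ss_regression = 6069.957028
ss_residual = 3749.909639
n_obs = 30         # Observations
n_predictors = 2   # Age and Number of pills

# Total SS is the sum of squared deviations of BP from its mean.
ss_total = ss_regression + ss_residual

# Degrees of freedom as listed in the df column.
df_regression = n_predictors
df_residual = n_obs - n_predictors - 1

# R Square is the share of Total SS explained by the regression.
r_square = ss_regression / ss_total
multiple_r = math.sqrt(r_square)

# Mean squares, F statistic, and the standard error of the regression.
ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual
f_stat = ms_regression / ms_residual
standard_error = math.sqrt(ms_residual)

print(f"Total SS       = {ss_total:.4f}")
print(f"R Square       = {r_square:.5f}")
print(f"Multiple R     = {multiple_r:.6f}")
print(f"F              = {f_stat:.5f}")
print(f"Standard Error = {standard_error:.5f}")
```

Each printed value can be compared with the matching entry under Regression Statistics and ANOVA.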
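The question of whether one pill lowers blood pressure by 4.19 or by 6.40 can be explored numerically. For least-squares fits, the pills-only slope equals the age-adjusted pill slope plus the age slope multiplied by the slope of age on pills; with the Excel figures quoted above, -4.19 + 0.83 * (-2.66) is approximately -6.40. The sketch below refits the models in plain Python on the values as displayed in Data set 1 and checks that identity. It rests on some assumptions: the helper names (`cross`, `fit_one`, `fit_two`) are illustrative, and the ages in the table appear to be rounded for display, so the age-adjusted coefficients should come out close to, but not identical with, the Excel output.

```python
import math

# Values as displayed in Data set 1 (patients 1-30). The ages look rounded in
# the table, so fits that use them only approximate the Excel coefficients.
age = [86, 54, 74, 58, 67, 50, 66, 81, 91, 78, 58, 48, 84, 73, 74,
       57, 65, 69, 78, 52, 65, 73, 61, 62, 52, 54, 44, 54, 68, 61]
pills = [0] * 5 + [1] * 5 + [2] * 5 + [3] * 5 + [4] * 5 + [5] * 5
bp = [159, 138, 155, 139, 110, 117, 112, 139, 154, 155, 117, 114, 142, 122, 129,
      106, 107, 141, 131, 111, 109, 137, 102, 106, 109, 117, 108, 104, 129, 93]


def mean(values):
    return sum(values) / len(values)


def cross(u, v):
    """Sum of products of deviations from the means."""
    mu, mv = mean(u), mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v))


def fit_one(x, y):
    """Least-squares line y = intercept + slope * x."""
    slope = cross(x, y) / cross(x, x)
    return mean(y) - slope * mean(x), slope


def fit_two(x1, x2, y):
    """Least-squares fit y = b0 + b1 * x1 + b2 * x2 from the normal equations."""
    s11, s22, s12 = cross(x1, x1), cross(x2, x2), cross(x1, x2)
    s1y, s2y = cross(x1, y), cross(x2, y)
    det = s11 * s22 - s12 ** 2
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    return mean(y) - b1 * mean(x1) - b2 * mean(x2), b1, b2


_, pills_only = fit_one(pills, bp)               # BP on pills alone
_, age_adj, pills_adj = fit_two(age, pills, bp)  # BP on age and pills
_, age_per_pill = fit_one(pills, age)            # age on pills
r_age_pills = cross(age, pills) / math.sqrt(cross(age, age) * cross(pills, pills))

print(f"Pills-only slope:       {pills_only:.4f}")
print(f"Age-adjusted slopes:    age {age_adj:.4f}, pills {pills_adj:.4f}")
print(f"Slope of age on pills:  {age_per_pill:.4f}")
print(f"Correlation age, pills: {r_age_pills:.2f}")
print(f"Adjusted pill slope + age slope * (age on pills): "
      f"{pills_adj + age_adj * age_per_pill:.4f}")
```

The last printed line should reproduce the pills-only slope, which shows how the extra drop per pill in the pills-only model is carried by age rather than by the pills themselves.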