regression by zhangyun

We use regression analysis to examine the relationship between two continuous variables,
such as blood pressure versus drug dose.

We can also perform t-tests and ANOVA using regression.

A special form of regression, logistic regression, is used for classification problems,
where the outcome variable is a category, such as disease vs no disease.

Cox Proportional Hazards regression analysis is used in survival analysis,
where we examine the effect of a continuous variable (such as gene expression)
on time to an event (such as recurrence of breast cancer).

Regression analysis: how will my blood pressure change when I take blood pressure pills?

Patient ID  Number of pills  Blood pressure
         1                0             117
         2                0             112
         3                0             139
         4                0             154
         5                0             155
         6                0             159
         7                0             138
         8                0             155
         9                0             139
        10                0             110
        11                1             117
        12                1             114
        13                1             142
        14                1             122
        15                1             129
        16                1             106
        17                1             107
        18                1             141
        19                1             131
        20                1             111

The independent variable (x) is the number of pills.
The dependent variable (y) is blood pressure.

[Figure: Scatterplot of blood pressure versus number of pills. x-axis: Number of pills; y-axis: Blood pressure.]

Is the mean blood pressure different for patients getting one pill vs. no pill? Use t-test.

(Data: the same 20 patients as in the table above. Variable 1 = no pill, Variable 2 = one pill.)

t-Test: Two-Sample Assuming Equal Variances

                              Variable 1  Variable 2
Mean                               137.8         122
Variance                       353.06667   175.77778
Observations                          10          10
Pooled Variance                264.42222
Hypothesized Mean Difference           0
df                                    18
t Stat                         2.1726667
P(T<=t) one-tail               0.0217023
t Critical one-tail            1.7340636
P(T<=t) two-tail               0.0434045
t Critical two-tail             2.100922

t-test p-value = 0.043405 (two-tailed)
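
The t statistic in the spreadsheet output can be reproduced by hand. The following is a minimal sketch using only the Python standard library; the function name `pooled_t` is an illustrative choice, and the critical value is copied from the output above rather than computed, because the standard library has no t distribution.

```python
from math import sqrt
from statistics import mean, variance

# Blood pressure for the two groups, copied from the table above.
no_pill = [117, 112, 139, 154, 155, 159, 138, 155, 139, 110]
one_pill = [117, 114, 142, 122, 129, 106, 107, 141, 131, 111]


def pooled_t(a, b):
    """Two-sample t statistic assuming equal variances (hypothetical helper)."""
    na, nb = len(a), len(b)
    # Pooled variance: sample variances weighted by their degrees of freedom.
    sp2 = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    t = (mean(a) - mean(b)) / sqrt(sp2 * (1 / na + 1 / nb))
    return t, sp2, na + nb - 2


t_stat, pooled_var, df = pooled_t(no_pill, one_pill)
print(f"t = {t_stat:.4f}, pooled variance = {pooled_var:.3f}, df = {df}")

# Two-tailed critical value for df = 18 at the 0.05 level, from the output above.
T_CRITICAL_TWO_TAIL = 2.100922
print("significant at 0.05:", abs(t_stat) > T_CRITICAL_TWO_TAIL)
```

Comparing the statistic with the critical value gives the same decision as comparing the p-value with 0.05.
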
Regression analysis: how will my blood pressure change when I take blood pressure pills?

(Data: the same 20 patients as in the table above.)

The independent variable (x) is the number of pills.
The dependent variable (y) is blood pressure.

[Figure: Scatterplot of blood pressure versus number of pills, with a trend line added. x-axis: Number of pills; y-axis: Blood pressure.]

t-test p-value = 0.043405

Use the Tools / Data analysis / Regression menu to do regression analysis

Patient ID             Number of pills Blood pressure
                   1                 0             117
                   2                 0             112
                   3                 0             139
                   4                 0             154
                   5                 0             155
                   6                 0             159
                   7                 0             138
                   8                 0             155
                   9                 0             139
                  10                 0             110
                  11                 1             117
                  12                 1             114
                  13                 1             142
                  14                 1             122
                  15                 1             129
                  16                 1             106
                  17                 1             107
                  18                 1             141
                  19                 1             131
                  20                 1             111

SUMMARY OUTPUT

       Regression Statistics
Multiple R          0.455810532
R Square            0.207763241
Adjusted R Square 0.163750088
Standard Error      16.26106461
Observations                 20

ANOVA
                               df                SS        MS       F
Regression                            1            1248.2 1248.2 4.720481
Residual                             18            4759.6 264.422
Total                                19            6007.8

                         Coefficients Standard Error          t Stat     P-value
Intercept                        137.8          5.14            26.80      0.0000
Number of pills                   -15.8         7.27             -2.17     0.0434

The p-value for number of pills is p = 0.0434, which is significant at the 0.05 level.
The coefficient for the number of pills is -15.8.
The coefficient is the slope of the regression line.
That is, for every unit increase in the number of pills, the mean blood pressure goes down by 15.8.
The intercept, 137.8, is the mean blood pressure at zero pills.
In regression analysis, a significant p-value is evidence that the slope differs from zero.
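
The slope, intercept and standard error reported in the summary output can be recomputed with ordinary least squares. This is a sketch under the assumption that the spreadsheet tool fits a straight line with an intercept; `fit_line` is a hypothetical helper name, not part of any library.

```python
from math import sqrt
from statistics import mean

pills = [0] * 10 + [1] * 10
bp = [117, 112, 139, 154, 155, 159, 138, 155, 139, 110,
      117, 114, 142, 122, 129, 106, 107, 141, 131, 111]


def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # Residual variance uses n - 2 degrees of freedom (two fitted parameters).
    sse = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    se_slope = sqrt(sse / (len(x) - 2) / sxx)
    return intercept, slope, se_slope


intercept, slope, se_slope = fit_line(pills, bp)
print(f"intercept = {intercept:.1f}, slope = {slope:.1f}")
print(f"standard error of slope = {se_slope:.2f}, t = {slope / se_slope:.2f}")
```

With a 0/1 predictor, the intercept is the mean of the 0 group and the slope is the difference between the two group means.
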

[Figure: Scatterplot of blood pressure versus number of pills. x-axis: Number of pills; y-axis: Blood pressure.]

Compare the results of a t-test to the results of the regression analysis for 0,1 variable.

t-Test: Two-Sample Assuming Equal Variances
                              0 pills       1 pill     Difference
Mean                                137.8          122       -15.8
Variance                          353.07      175.78
Observations                        10.00       10.00
Pooled Variance                   264.42
Hypothesized Mean Difference          0.00
df                                  18.00
t Stat                                2.17
P(T<=t) one-tail                      0.02
t Critical one-tail                   1.73
P(T<=t) two-tail                  0.0434
t Critical two-tail                   2.10

Regression
                                   Coefficients Standard Error       t Stat   P-value
Intercept                                  137.8         5.14           26.80   0.0000
Number of pills                             -15.8        7.27           -2.17   0.0434

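If the two classes are labelled 0 and 1, the t-test and the regression perform
the same analysis and give the same p-value (0.0434).
The difference between the group means (-15.8) equals the regression coefficient (slope).
The t statistics have the same magnitude (2.17); the sign differs only because the t-test subtracts the means in the opposite order.
So a t-test can be formulated as a regression analysis.
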
Extend the analysis to patients taking 0, 1, or 2 pills

[Figure: Scatterplot of blood pressure versus 0, 1 or 2 pills. x-axis: Number of pills; y-axis: Blood pressure.]

Patient ID             Number of pills Blood pressure
                   1                 0          117
                   2                 0          112
                   3                 0          139
                   4                 0          154
                   5                 0          155
                   6                 0          159
                   7                 0          138
                   8                 0          155
                   9                 0          139
                  10                 0          110
                  11                 1          117
                  12                 1          114
                  13                 1          142
                  14                 1          122
                  15                 1          129
                  16                 1          106
                  17                 1          107
                  18                 1          141
                  19                 1          131
                  20                 1          111
                  21                 2          109
                  22                 2          137
                  23                 2          102
                  24                 2          106
                  25                 2          109
                  26                 2          117
                  27                 2          138
                  28                 2          104
                  29                 2          129
                  30                 2           93


SUMMARY OUTPUT

        Regression Statistics
Multiple R              0.53001801
R Square                0.28091909
Adjusted R Square     0.255237629
Standard Error        15.82049605
Observations                    30

ANOVA
                             df              SS        MS                         F     Significance F
Regression                          1       2737.800 2737.800                    10.939           0.003
Residual                           28       7008.067  250.288
Total                              29       9745.867

                  Coefficients  Standard Error   t Stat  P-value  Lower 95%  Upper 95%
Intercept               136.43           4.567   29.874    0.000    127.078    145.788
Number of pills          -11.7           3.538   -3.307    0.003    -18.946     -4.454

The p-value for number of pills is p = 0.003, which is significant at the 0.01 level.
The coefficient for the number of pills is -11.7.
The coefficient is the slope of the regression line.
That is, for every unit increase in the number of pills, the mean blood pressure goes down by 11.7.
In regression analysis, a significant p-value is evidence that the slope differs from zero.
How many pills should I take to get my blood pressure to 120?
If I know the number of pills, how accurately can I predict the blood pressure?
How scattered are the points around the regression line?
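
The first question above can be approached by inverting the fitted line. The sketch below assumes the coefficients from the 0, 1 or 2 pill regression output (intercept 136.43, slope -11.7); the function names are illustrative. It gives the dose at which the mean blood pressure is predicted to be 120, and says nothing about how far an individual patient may fall from that mean.

```python
# Coefficients copied from the 0/1/2-pill regression output.
INTERCEPT = 136.43
SLOPE = -11.7


def predict_bp(pills):
    """Predicted mean blood pressure for a given number of pills."""
    return INTERCEPT + SLOPE * pills


def pills_for_target(target_bp):
    """Solve target_bp = INTERCEPT + SLOPE * pills for pills."""
    return (target_bp - INTERCEPT) / SLOPE


dose = pills_for_target(120)
print(f"pills needed for a mean blood pressure of 120: {dose:.2f}")
print(f"check: predicted blood pressure at that dose = {predict_bp(dose):.1f}")
```

How trustworthy the answer is for one patient depends on the scatter around the line, which is what the remaining two questions are about.
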

Data set 1

[Figure: Dataset 1. Blood pressure versus number of pills. x-axis: Number of pills; y-axis: Blood pressure.]
[Figure: Dataset 2. Blood pressure versus number of pills. x-axis: Number of pills; y-axis: Blood pressure.]

Patient ID Number of pills Blood pressure
          1        0       159
          2        0       138
          3        0       155
          4        0       139
          5        0       110
          6        1       117
          7        1       112
          8        1       139
          9        1       154
         10        1       155
         11        2       117
         12        2       114
         13        2       142
         14        2       122
         15        2       129
         16        3       106
         17        3       107
         18        3       141
         19        3       131
         20        3       111
         21        4       109
         22        4       137
         23        4       102
         24        4       106
         25        4       109
         26        5       117
         27        5       108
         28        5       104
         29        5       129
         30        5         93




Data set 2
Patient ID Number of pills Blood pressure
          1        0       143
          2        0       142
          3        0       144
          4        0       140
          5        0       143
          6        1       143
          7        1       135
          8        1       141
          9        1       136
         10        1       139
         11        2       133
         12        2       134
         13        2       138
         14        2       130
         15        2       133
         16        3       132
         17        3       131
         18        3       135
         19        3       134
         20        3       127
         21        4       122
         22        4       126
         23        4       127
         24        4       124
         25        4       129
         26        5       117
         27        5       121
         28        5       122
         29        5       123
         30        5       124

We use correlation to describe the scatter of points around the regression line.
Correlation indicates how accurately a straight-line function of the x variable predicts the y variable.
The correlation coefficient was developed by Karl Pearson, building on earlier work by Francis Galton.
The most common correlation measure is the Pearson linear correlation coefficient, R.

[Figure: Dataset 1. Blood pressure versus number of pills. x-axis: Number of pills; y-axis: Blood pressure.]
[Figure: Dataset 2. Blood pressure versus number of pills. x-axis: Number of pills; y-axis: Blood pressure.]

Correlation values
 1 => perfect positive correlation
-1 => perfect negative correlation
 0 => no correlation

Correlation for dataset 1
                 Number of pills  Blood pressure
Number of pills                1
Blood pressure            -0.604               1

Correlation for dataset 2
                 Number of pills  Blood pressure
Number of pills                1
Blood pressure            -0.940               1

For data set 1, if you know the number of pills, how well can you predict the blood pressure?
How about for data set 2?
In data set 2, blood pressure is much more strongly correlated with the number of pills (R = -0.940 versus R = -0.604).
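
The two correlation values can be recomputed directly from the definition of Pearson's R. The following is a minimal standard-library sketch with the data typed in from data sets 1 and 2; `pearson_r` is an illustrative helper name.

```python
from math import sqrt
from statistics import mean

# Five patients at each dose from 0 to 5 pills.
pills = [p for p in range(6) for _ in range(5)]
bp_1 = [159, 138, 155, 139, 110, 117, 112, 139, 154, 155,
        117, 114, 142, 122, 129, 106, 107, 141, 131, 111,
        109, 137, 102, 106, 109, 117, 108, 104, 129, 93]
bp_2 = [143, 142, 144, 140, 143, 143, 135, 141, 136, 139,
        133, 134, 138, 130, 133, 132, 131, 135, 134, 127,
        122, 126, 127, 124, 129, 117, 121, 122, 123, 124]


def pearson_r(x, y):
    """Pearson linear correlation coefficient of two equal-length sequences."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    syy = sum((b - y_bar) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


r1 = pearson_r(pills, bp_1)
r2 = pearson_r(pills, bp_2)
print(f"data set 1: R = {r1:.3f}")
print(f"data set 2: R = {r2:.3f}")
```

Python 3.10 and later also provide `statistics.correlation`, which computes the same quantity.
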
An example of correlation near one

[Figure: Blood pressure versus number of pills: correlation R near 1. x-axis: Number of pills; y-axis: Blood pressure.]

Calculate correlation using the workbook functions Correl or Pearson:
=PEARSON(B4:B33,C4:C33)   0.999401
=Correl(B4:B33,C4:C33)    0.999401

Patient ID Number of pills Blood pressure
          1        0 130.9157
          2        0 130.4289
          3        0 130.0122
          4        0 130.7251
          5        0 130.1121
          6        1 135.5542
          7        1 135.0059
          8        1 135.4413
          9        1 135.0049
         10        1 135.5553
         11        2 140.1523
         12        2 140.936
         13        2 140.0939
         14        2 140.5788
         15        2 140.3806
         16        3 145.3203
         17        3 145.9028
         18        3 145.9257
         19        3 145.3606
         20        3 145.4267
         21        4 150.8455
         22        4 150.8062
         23        4 150.7665
         24        4 150.7404
         25        4 150.8482
         26        5 155.9677
         27        5 155.1016
         28        5 155.2945
         29        5 155.4089
         30        5 155.6955
An example of correlation near zero

[Figure: Blood pressure versus number of pills: correlation R near zero.]

Calculate correlation using the workbook functions Correl or Pearson:
=PEARSON(B4:B33,C4:C33)   -0.01674
=Correl(B4:B33,C4:C33)    -0.01674

The Pearson correlation coefficient, R, tells us, given one variable,
how well we can predict a second (correlated) variable.
For example, given the number of pills, how well can we predict
the blood pressure?

R ranges from 1.0, meaning perfect prediction,
to 0.0, meaning no linear predictive value at all,
to -1.0, meaning that the variables are
perfectly correlated but go in opposite directions,
that is, they are negatively correlated.

Patient ID Number of pills Blood pressure
          1        0       115
          2        0       147
          3        0       150
          4        0       144
          5        0       129
          6        1       116
          7        1       151
          8        1       131
          9        1       152
         10        1       103
         11        2       118
         12        2       100
         13        2       148
         14        2       121
         15        2       137
         16        3       155
         17        3       139
         18        3       145
         19        3       108
         20        3       119
         21        4       105
         22        4       108
         23        4       100
         24        4       110
         25        4       117
         26        5       143
         27        5       135
         28        5       149
         29        5       153
         30        5       158

Effect of outliers on Pearson correlation

Drug dose Blood pressure                                           Drug dose Blood pressure
         5              151                                                 5            151
         5              145                                                 5            145
         5              136                                                 5            136
        10              137                                               10             137
        10              124                                               10             124
        10              124                                               10             124
        15              111                                               15             111
        15              105                                               15             105
        20              110                                               20             110
        20               98                                               20             150
Pearson R=                             -0.922204443                Pearson R=                     -0.472650854


[Figure: Blood pressure vs drug dose, without the outlier (left) and with the outlier (right). x-axis: Drug dose; y-axis: Blood pressure.]

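
The effect of the single outlying measurement can be checked numerically. This is a small standard-library sketch using the drug dose table above; `pearson_r` is an illustrative helper, and the only difference between the two data sets is the last blood pressure value (98 versus 150).

```python
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    """Pearson linear correlation coefficient of two equal-length sequences."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    syy = sum((b - y_bar) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


dose = [5, 5, 5, 10, 10, 10, 15, 15, 20, 20]
bp_clean = [151, 145, 136, 137, 124, 124, 111, 105, 110, 98]
# Same data, with the last measurement replaced by the outlying value.
bp_outlier = bp_clean[:-1] + [150]

r_clean = pearson_r(dose, bp_clean)
r_outlier = pearson_r(dose, bp_outlier)
print(f"without outlier: R = {r_clean:.4f}")
print(f"with outlier:    R = {r_outlier:.4f}")
```

One changed point out of ten is enough to roughly halve the magnitude of R.
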
Effect of outliers on Pearson correlation

x value            y value                           x value        y value
               1                  4                            1       4
               1                  1                            1       2
               2                  3                            2       3
               2                  3                            2       3
               3                  1                            3       1
               3                  4                            3       4
               4                  3                            4       3
               4                  2                            4       2
               4                  3                           10      10
Pearson R=                   0.0000                  Pearson R= 0.812324


[Figure: Scatterplots of y value versus x value, without the outlier (left) and with the outlier at (10, 10) (right).]

Spearman rank correlation is less sensitive to outliers than Pearson linear correlation.
Spearman can detect monotonic but non-linear correlations better than Pearson correlation can.
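
Spearman rank correlation is Pearson correlation applied to the ranks of the data. The sketch below is a plain standard-library illustration, assuming tied values receive the average of their ranks; the helper names are hypothetical. It uses the x/y table with the outlier at (10, 10) and a made-up monotonic but non-linear example (y = x cubed).

```python
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    """Pearson linear correlation coefficient of two equal-length sequences."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    syy = sum((b - y_bar) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def average_ranks(values):
    """Rank values from 1 to n; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


# x/y data with the outlier at (10, 10), from the table above.
x_out = [1, 1, 2, 2, 3, 3, 4, 4, 10]
y_out = [4, 2, 3, 3, 1, 4, 3, 2, 10]
print(f"outlier data: Pearson = {pearson_r(x_out, y_out):.3f}, "
      f"Spearman = {spearman_rho(x_out, y_out):.3f}")

# Illustrative monotonic but non-linear relationship (not from the source).
x_curve = [1, 2, 3, 4, 5, 6]
y_curve = [v ** 3 for v in x_curve]
print(f"cubic data:   Pearson = {pearson_r(x_curve, y_curve):.3f}, "
      f"Spearman = {spearman_rho(x_curve, y_curve):.3f}")
```

Because the outlier only counts as "the largest value" once the data are ranked, it pulls the Spearman coefficient far less than the Pearson coefficient.
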
Multiple regression

Do other variables, such as age, also affect blood pressure?
Can we predict blood pressure better if we know both age and the number of pills?
We use multiple regression to answer these questions.
It is called "multiple" because we have multiple independent variables.

Data set 1, with Age included

[Figure: Dataset 1. Blood pressure versus number of pills. x-axis: Number of pills; y-axis: Blood pressure.]
[Figure: Blood pressure versus age. x-axis: Age; y-axis: Blood pressure.]

Patient ID       Age         Number of pills Blood pressure
               1          86               0            159
               2          54               0            138
               3          74               0            155
               4          58               0            139
               5          67               0            110
               6          50               1            117
               7          66               1            112
               8          81               1            139
               9          91               1            154
              10          78               1            155
              11          58               2            117
              12          48               2            114
              13          84               2            142
              14          73               2            122
              15          74               2            129
              16          57               3            106
              17          65               3            107
              18          69               3            141
              19          78               3            131
              20          52               3            111
              21          65               4            109
              22          73               4            137
              23          61               4            102
              24          62               4            106
              25          52               4            109
              26          54               5            117
              27          44               5            108
              28          54               5            104
              29          68               5            129
              30          61               5             93
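
A multiple regression with both age and number of pills can be sketched by solving the normal equations. The code below is a standard-library illustration, not the spreadsheet's own procedure; `solve` and `fit_ols` are hypothetical helpers, and the data are typed in from the table with age included. The ages are used as displayed, so results may differ slightly from any printed spreadsheet output.

```python
from statistics import mean

age = [86, 54, 74, 58, 67, 50, 66, 81, 91, 78, 58, 48, 84, 73, 74,
       57, 65, 69, 78, 52, 65, 73, 61, 62, 52, 54, 44, 54, 68, 61]
pills = [p for p in range(6) for _ in range(5)]
bp = [159, 138, 155, 139, 110, 117, 112, 139, 154, 155,
      117, 114, 142, 122, 129, 106, 107, 141, 131, 111,
      109, 137, 102, 106, 109, 117, 108, 104, 129, 93]


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_ols(columns, y):
    """Least squares fit of y on the predictor columns plus an intercept.

    Returns (coefficients, R squared); coefficients[0] is the intercept.
    """
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(rows[0])
    # Normal equations: (X'X) beta = X'y.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    coef = solve(xtx, xty)
    fitted = [sum(c * v for c, v in zip(coef, r)) for r in rows]
    y_bar = mean(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return coef, 1 - ss_res / ss_tot


coef_age, r2_age = fit_ols([age], bp)
coef_both, r2_both = fit_ols([age, pills], bp)
print(f"age only:      R squared = {r2_age:.3f}")
print(f"age and pills: R squared = {r2_both:.3f}")
print("coefficients (intercept, age, pills):",
      [round(c, 2) for c in coef_both])
```

Adding a predictor can never lower R squared, so the interesting question is how much it rises and whether the added coefficient is significant.
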
Regression of blood pressure on age

Regression Statistics
Multiple R          0.696168
R Square            0.48465
Adjusted R Square   0.466244
Standard Error      13.44389
Observations        30

[Figure: Blood pressure versus age. x-axis: Age; y-axis: Blood pressure.]

ANOVA
              df        SS        MS         F  Significance F
Regression     1  4759.196  4759.196  26.33198        1.94E-05
Residual      28  5060.671  180.7383
Total         29  9819.867

            Coefficients  Standard Error  t Stat  P-value  Lower 95%  Upper 95%
Intercept          54.85           13.65    4.02   0.0004      26.90      82.80
Age                 1.06            0.21    5.13   0.0000       0.64       1.48

Model is BP = 54.8 + 1.06 * Age
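
The Regression Statistics and ANOVA entries for this model are linked by simple arithmetic. As a minimal sketch in Python (the variable names are illustrative, and the numbers are copied from the ANOVA table for the regression on age), R Square, Multiple R, F and the Standard Error can be recomputed from the sums of squares:

```python
import math

# Values copied from the ANOVA table for the regression of blood pressure on age.
ss_regression = 4759.196
ss_residual = 5060.671
ss_total = 9819.867
n = 30   # observations
k = 1    # number of predictors (age only)

df_regression = k
df_residual = n - k - 1

# Mean squares are sums of squares divided by their degrees of freedom.
ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual

r_square = ss_regression / ss_total          # share of Total SS explained
multiple_r = math.sqrt(r_square)
f_statistic = ms_regression / ms_residual
standard_error = math.sqrt(ms_residual)      # "Standard Error" in Regression Statistics

print(f"R Square       = {r_square:.5f}")
print(f"Multiple R     = {multiple_r:.6f}")
print(f"F              = {f_statistic:.5f}")
print(f"Standard Error = {standard_error:.5f}")
```

The printed values should agree with the table to the displayed precision; small differences in the last digit can arise because the table entries are themselves rounded.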

RESIDUAL OUTPUT                         Residual = Actual - Predicted

Observation        Actual BP   Predicted BP   Residuals   Squared residual
               1         159         145.55       13.45             180.90
               2         138         112.14       25.86             668.98
               3         155         132.82       22.18             491.92
               4         139         115.85       23.15             536.01
               5         110         125.93      -15.93             253.63
               6         117         107.36        9.64              92.89
               7         112         124.86      -12.86             165.50
               8         139         140.25       -1.25               1.55
               9         154         151.38        2.62               6.84
              10         155         137.06       17.94             321.70
              11         117         115.85        1.15               1.33
              12         114         105.77        8.23              67.72
              13         142         143.96       -1.96               3.84
              14         122         132.29      -10.29             105.89
              15         129         132.82       -3.82              14.60
              16         106         115.32       -9.32              86.82
              17         107         123.27      -16.27             264.83
              18         141         127.52       13.48             181.80
              19         131         137.06       -6.06              36.77
              20         111         109.48        1.52               2.30
              21         109         123.27      -14.27             203.74
              22         137         131.76        5.24              27.46
              23         102         119.56      -17.56             308.39
              24         106         120.62      -14.62             213.79
              25         109         109.48       -0.48               0.23
              26         117         111.60        5.40              29.11
              27         108         101.53        6.47              41.89
              28         104         112.14       -8.14              66.18
              29         129         126.46        2.54               6.47
              30          93         119.03      -26.03             677.59
                                                                  5060.671 Sum of squared residuals (SS Residual)

[Figure: Age Residual Plot (x-axis: Age, y-axis: Residuals)]
[Figure: Age Line Fit Plot (blood pressure and predicted blood pressure versus Age)]

Regression of blood pressure on pills

   Regression Statistics
Multiple R           0.604131
R Square             0.364974
Adjusted R Square    0.342295
Standard Error       14.92346
Observations         30

[Figure: Dataset 1. Blood pressure versus number of pills (scatterplot; x-axis: Number of pills, y-axis: Blood pressure; R^2 = 0.364974)]

ANOVA
                 df         SS         MS          F   Significance F
Regression        1       3584       3584   16.09271         0.000407
Residual         28   6235.867   222.7095
Total            29   9819.867

                  Coefficients   Standard Error     t Stat    P-value   Lower 95%   Upper 95%   Lower 95.0%   Upper 95.0%
Intercept             139.7333         4.830266    28.9287   2.11E-22     129.839    149.6277       129.839      149.6277
Number of pills           -6.4         1.595384   -4.01157   0.000407      -9.668      -3.132        -9.668        -3.132

Model is BP = 139.7 - 6.4 * pills
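
The slope and intercept of this model come from the usual least-squares formulas. The sketch below recomputes them in plain Python from the pill counts and blood pressure values of Data set 1 as listed in this document. Variable names such as sxx and sxy are illustrative and do not come from the original spreadsheet.

```python
import math

# Number of pills and blood pressure for the 30 patients in Data set 1
# (five patients at each dose from 0 to 5 pills, in patient order).
pills = [0] * 5 + [1] * 5 + [2] * 5 + [3] * 5 + [4] * 5 + [5] * 5
bp = [159, 138, 155, 139, 110,
      117, 112, 139, 154, 155,
      117, 114, 142, 122, 129,
      106, 107, 141, 131, 111,
      109, 137, 102, 106, 109,
      117, 108, 104, 129, 93]

n = len(bp)
mean_x = sum(pills) / n
mean_y = sum(bp) / n

# Sums of squares and cross-products about the means.
sxx = sum((x - mean_x) ** 2 for x in pills)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(pills, bp))

slope = sxy / sxx
intercept = mean_y - slope * mean_x

# Standard error and t statistic of the slope, using n - 2 residual degrees of freedom.
ss_residual = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(pills, bp))
ms_residual = ss_residual / (n - 2)
se_slope = math.sqrt(ms_residual / sxx)
t_stat = slope / se_slope

print(f"BP = {intercept:.4f} + ({slope:.4f}) * pills")
print(f"SE(slope) = {se_slope:.6f}, t = {t_stat:.5f}")
```

Because the pill counts and blood pressures are whole numbers, this reproduces the coefficient table to the displayed precision.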

RESIDUAL OUTPUT                           Residual = Actual - Predicted

Observation        Actual BP   Predicted BP   Residuals   Squared residual
               1         159       139.7333    19.26667             371.20
               2         138       139.7333    -1.73333               3.00
               3         155       139.7333    15.26667             233.07
               4         139       139.7333    -0.73333               0.54
               5         110       139.7333    -29.7333             884.07
               6         117       133.3333    -16.3333             266.78
               7         112       133.3333    -21.3333             455.11
               8         139       133.3333    5.666667              32.11
               9         154       133.3333    20.66667             427.11
              10         155       133.3333    21.66667             469.44
              11         117       126.9333    -9.93333              98.67
              12         114       126.9333    -12.9333             167.27
              13         142       126.9333    15.06667             227.00
              14         122       126.9333    -4.93333              24.34
              15         129       126.9333    2.066667               4.27
              16         106       120.5333    -14.5333             211.22
              17         107       120.5333    -13.5333             183.15
              18         141       120.5333    20.46667             418.88
              19         131       120.5333    10.46667             109.55
              20         111       120.5333    -9.53333              90.88
              21         109       114.1333    -5.13333              26.35
              22         137       114.1333    22.86667             522.88
              23         102       114.1333    -12.1333             147.22
              24         106       114.1333    -8.13333              66.15
              25         109       114.1333    -5.13333              26.35
              26         117       107.7333    9.266667              85.87
              27         108       107.7333    0.266667               0.07
              28         104       107.7333    -3.73333              13.94
              29         129       107.7333    21.26667             452.27
              30          93       107.7333    -14.7333             217.07
                                                                  6235.867 Sum of squared residuals (SS Residual)

[Figure: Number of pills Residual Plot (x-axis: Number of pills, y-axis: Residuals)]
[Figure: Number of pills Line Fit Plot (blood pressure and predicted blood pressure versus Number of pills)]
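
The residual output for the pills model can be rebuilt from the fitted line. This is a minimal Python sketch, assuming the coefficients reported for the regression on number of pills (intercept 139.7333, slope -6.4) and the Data set 1 blood pressures. The helper name predict_bp is illustrative only.

```python
# Coefficients copied from the regression of blood pressure on number of pills.
INTERCEPT = 139.7333
SLOPE = -6.4

pills = [0] * 5 + [1] * 5 + [2] * 5 + [3] * 5 + [4] * 5 + [5] * 5
actual_bp = [159, 138, 155, 139, 110,
             117, 112, 139, 154, 155,
             117, 114, 142, 122, 129,
             106, 107, 141, 131, 111,
             109, 137, 102, 106, 109,
             117, 108, 104, 129, 93]

def predict_bp(number_of_pills):
    """Predicted blood pressure from the fitted line."""
    return INTERCEPT + SLOPE * number_of_pills

# Residual = Actual - Predicted, as defined in the residual output.
residuals = [bp - predict_bp(x) for x, bp in zip(pills, actual_bp)]
ss_residual = sum(r ** 2 for r in residuals)

# Total SS is the sum of squared deviations of blood pressure from its mean.
mean_bp = sum(actual_bp) / len(actual_bp)
ss_total = sum((bp - mean_bp) ** 2 for bp in actual_bp)

r_square = 1 - ss_residual / ss_total

print(f"SS Residual = {ss_residual:.3f}")
print(f"R Square    = {r_square:.6f}")
```

The smaller SS Residual is relative to Total SS, the closer R Square is to 1.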
Multiple regression of blood pressure on age and number of pills

Data set 1, with Age included
Patient ID       Age         Number of pills Blood pressure
               1          86               0            159
               2          54               0            138
               3          74               0            155
               4          58               0            139
               5          67               0            110
               6          50               1            117
               7          66               1            112
               8          81               1            139
               9          91               1            154
              10          78               1            155
              11          58               2            117
              12          48               2            114
              13          84               2            142
              14          73               2            122
              15          74               2            129
              16          57               3            106
              17          65               3            107
              18          69               3            141
              19          78               3            131
              20          52               3            111
              21          65               4            109
              22          73               4            137
              23          61               4            102
              24          62               4            106
              25          52               4            109
              26          54               5            117
              27          44               5            108
              28          54               5            104
              29          68               5            129
              30          61               5             93

[Figure: Dataset 1. Blood pressure versus number of pills (scatterplot)]
[Figure: Blood pressure versus age (scatterplot)]

SUMMARY OUTPUT

   Regression Statistics
Multiple R           0.786213
R Square             0.61813
Adjusted R Square    0.589844
Standard Error       11.78497
Observations         30

ANOVA
                 df            SS            MS          F   Significance F
Regression        2   6069.957028   3034.978514   21.85237         2.27E-06
Residual         27   3749.909639   138.8855422
Total            29   9819.866667

                  Coefficients   Standard Error    t Stat   P-value   Lower 95%   Upper 95%   Lower 95.0%   Upper 95.0%
Intercept              80.3160          14.5529    5.5189    0.0000     50.4558    110.1761       50.4558      110.1761
Age                     0.8300           0.1962    4.2308    0.0002      0.4274      1.2325        0.4274        1.2325
Number of pills        -4.1899           1.3639   -3.0721    0.0048     -6.9884     -1.3915       -6.9884       -1.3915

Model is BP = 80.3 + 0.83 * Age - 4.19 * Pills

   Regression Statistics
Multiple R           0.786213
R Square             0.61813
Adjusted R Square    0.589844
Standard Error       11.78497
Observations         30

ANOVA
                 df       SS       MS          F   Significance F
Regression        2   6070.0   3035.0   21.85237         2.27E-06
Residual         27   3749.9    138.9
Total            29   9819.9

                  Coefficients   Standard Error   t Stat   P-value   Lower 95%   Upper 95%   Lower 95.0%   Upper 95.0%
Intercept                80.32            14.55     5.52    0.0000       50.46      110.18         50.46        110.18
Age                       0.83             0.20     4.23    0.0002        0.43        1.23          0.43          1.23
Number of pills          -4.19             1.36    -3.07    0.0048       -6.99       -1.39         -6.99         -1.39

Model is BP = 80.3 + 0.83 * Age - 4.19 * Pills
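
Adding a predictor never lowers R Square, so Adjusted R Square is the fairer yardstick when comparing the one-variable models with the two-variable model. The following Python sketch recomputes both from the residual and total sums of squares reported in the three ANOVA tables of this document. The function name fit_summary and the dictionary layout are illustrative assumptions.

```python
SS_TOTAL = 9819.866667   # Total SS, identical for all three models
N = 30                   # observations

# Residual SS and number of predictors, copied from the three ANOVA tables.
models = {
    "age only": (5060.671, 1),
    "pills only": (6235.867, 1),
    "age and pills": (3749.909639, 2),
}

def fit_summary(ss_residual, k, ss_total=SS_TOTAL, n=N):
    """Return (R Square, Adjusted R Square) for a model with k predictors."""
    r_square = 1 - ss_residual / ss_total
    # The adjustment compares mean squares, penalising each extra predictor.
    adjusted = 1 - (ss_residual / (n - k - 1)) / (ss_total / (n - 1))
    return r_square, adjusted

results = {name: fit_summary(ss_res, k) for name, (ss_res, k) in models.items()}

for name, (r_square, adjusted) in results.items():
    print(f"{name:14s} R Square = {r_square:.5f}  Adjusted R Square = {adjusted:.6f}")
```

On these numbers the two-variable model keeps the highest Adjusted R Square, so the gain from including both age and number of pills is not just an artifact of having one more predictor.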

RESIDUAL OUTPUT                                       Residual = Actual - Predicted

Observation            Actual BP   Predicted BP   Residuals   Squared residual
                   1         159         151.28        7.72              59.63
                   2         138         125.13       12.87             165.54
                   3         155         141.32       13.68             187.19
                   4         139         128.04       10.96             120.15
                   5         110         135.92      -25.92             672.02
                   6         117         117.21       -0.21               0.04
                   7         112         130.90      -18.90             357.34
                   8         139         142.94       -3.94              15.51
                   9         154         151.65        2.35               5.51
                  10         155         140.45       14.55             211.76
                  11         117         119.66       -2.66               7.07
                  12         114         111.77        2.23               4.95
                  13         142         141.65        0.35               0.12
                  14         122         132.52      -10.52             110.74
                  15         129         132.94       -3.94              15.51
                  16         106         115.05       -9.05              81.97
                  17         107         121.28      -14.28             203.88
                  18         141         124.60       16.40             269.01
                  19         131         132.07       -1.07               1.14
                  20         111         110.49        0.51               0.26
                  21         109         117.09       -8.09              65.43
                  22         137         123.73       13.27             176.13
                  23         102         114.18      -12.18             148.45
                  24         106         115.01       -9.01              81.25
                  25         109         106.30        2.70               7.29
                  26         117         103.77       13.23             175.05
                  27         108          95.88       12.12             146.78
                  28         104         104.18       -0.18               0.03
                  29         129         115.39       13.61             185.27
                  30          93         109.58      -16.58             274.86
                                                                        3749.91 Sum of squared residuals (SS Residual)

[Figure: Number of pills Line Fit Plot (blood pressure and predicted blood pressure versus Number of pills)]
[Figure: Age Line Fit Plot (blood pressure and predicted blood pressure versus Age)]

Multiple regression

Mean BP =               123.7

Patient ID   Age   Number of pills   Blood pressure   BP deviation from mean   Squared deviation from mean
         1    86                 0              159                     35.3                        1243.7
         2    54                 0              138                     14.3                         203.5
         3    74                 0              155                     31.3                         977.6
         4    58                 0              139                     15.3                         233.1
         5    67                 0              110                    -13.7                         188.6
         6    50                 1              117                     -6.7                          45.3
         7    66                 1              112                    -11.7                         137.7
         8    81                 1              139                     15.3                         233.1
         9    91                 1              154                     30.3                         916.1
        10    78                 1              155                     31.3                         977.6
        11    58                 2              117                     -6.7                          45.3
        12    48                 2              114                     -9.7                          94.7
        13    84                 2              142                     18.3                         333.7
        14    73                 2              122                     -1.7                           3.0
        15    74                 2              129                      5.3                          27.7
        16    57                 3              106                    -17.7                         314.5
        17    65                 3              107                    -16.7                         280.0
        18    69                 3              141                     17.3                         298.1
        19    78                 3              131                      7.3                          52.8
        20    52                 3              111                    -12.7                         162.1
        21    65                 4              109                    -14.7                         217.1
        22    73                 4              137                     13.3                         176.0
        23    61                 4              102                    -21.7                         472.3
        24    62                 4              106                    -17.7                         314.5
        25    52                 4              109                    -14.7                         217.1
        26    54                 5              117                     -6.7                          45.3
        27    44                 5              108                    -15.7                         247.5
        28    54                 5              104                    -19.7                         389.4
        29    68                 5              129                      5.3                          27.7
        30    61                 5               93                    -30.7                         944.5
                                 Sum of Squared deviation from mean =                                9819.9 (Total SS)
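
The Total SS is the sum of squared deviations of blood pressure from its mean, and it is the same whichever predictors enter the model. A short Python sketch of that calculation, using the Data set 1 blood pressures (variable names are illustrative):

```python
# Blood pressure for patients 1 to 30 in Data set 1.
bp = [159, 138, 155, 139, 110,
      117, 112, 139, 154, 155,
      117, 114, 142, 122, 129,
      106, 107, 141, 131, 111,
      109, 137, 102, 106, 109,
      117, 108, 104, 129, 93]

mean_bp = sum(bp) / len(bp)

# Deviation of each patient's blood pressure from the overall mean.
deviations = [value - mean_bp for value in bp]
squared_deviations = [d ** 2 for d in deviations]

total_ss = sum(squared_deviations)

print(f"Mean BP  = {mean_bp:.1f}")
print(f"Total SS = {total_ss:.1f}")
```

This Total SS is the denominator of R Square in each of the regressions of blood pressure in this document.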
A subtle problem, a common error, and lessons for designing and interpreting clinical trials

Data set 1, with Age included
Patient ID            Age            Number of pills Blood pressure
                   1            86                 0            159
                   2            54                 0            138
                   3            74                 0            155
                   4            58                 0            139
                   5            67                 0            110
                   6            50                 1            117
                   7            66                 1            112
                   8            81                 1            139
                   9            91                 1            154
                  10            78                 1            155
                  11            58                 2            117
                  12            48                 2            114
                  13            84                 2            142
                  14            73                 2            122
                  15            74                 2            129
                  16            57                 3            106
                  17            65                 3            107
                  18            69                 3            141
                  19            78                 3            131
                  20            52                 3            111
                  21            65                 4            109
                  22            73                 4            137
                  23            61                 4            102
                  24            62                 4            106
                  25            52                 4            109
                  26            54                 5            117
                  27            44                 5            108
                  28            54                 5            104
                  29            68                 5            129
                  30            61                 5             93

[Figure: Dataset 1. Blood pressure versus number of pills (scatterplot)]
[Figure: Blood pressure versus age (scatterplot)]

Multiple regression of blood pressure on age and number of pills
                      Coefficients     P-value
Intercept                  80.3160          0.0000
Age                          0.8300         0.0002
Number of pills             -4.1899         0.0048

When we regress on age and number of pills:
The coefficient for the number of pills is -4.19.
The coefficient is the slope of the regression with respect to the number of pills, with age held constant.
That is, for every unit increase in the number of pills, among patients of the same age,
the mean blood pressure goes down by 4.19.

Regression of blood pressure on number of pills
                     Coefficients   P-value
Intercept               139.7333          0.0000
Number of pills           -6.4000         0.0004

When we regress on number of pills alone:
The coefficient for the number of pills is -6.4.
The coefficient is the slope of the regression line.
That is, for every unit increase in the number of pills,
the mean blood pressure goes down by 6.4.

So does one pill lower blood pressure by 4.19 or by 6.40?
What is going on?
[Figure: Age versus Number of pills (scatterplot; x-axis: Age, 40 to 100; y-axis: Number of pills, 0 to 6)]



It appears that the number of pills may be related to age.
If younger patients have lower blood pressure, and they get more pills,
it would give the impression that more pills cause lower blood pressure.
How can we quantify and test if the number of pills is related to age?

Regression of age on the number of pills.
                    Coefficients       P-value
Intercept               71.5905            0.0000
Number of pills          -2.6629           0.0367

Model is Age = 71.6 - 2.66 * pills

The p-value of 0.0367 is below 0.05, indicating that the number of pills is significantly associated with age.

Correlation between the number of pills and age.
                r = -0.38     (Excel formula: =PEARSON(B5:B35,C5:C35))

The negative correlation indicates that older patients tended to be given fewer pills.
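
The correlation can be checked outside Excel. This is a minimal Python sketch of the Pearson formula applied to the ages and pill counts of Data set 1. It assumes the ages as displayed in the table, which appear to be rounded to whole years, so digits beyond the second decimal may differ slightly from the spreadsheet. The function name pearson is illustrative.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Ages as displayed for patients 1 to 30 (assumed rounded to whole years).
age = [86, 54, 74, 58, 67,
       50, 66, 81, 91, 78,
       58, 48, 84, 73, 74,
       57, 65, 69, 78, 52,
       65, 73, 61, 62, 52,
       54, 44, 54, 68, 61]
pills = [0] * 5 + [1] * 5 + [2] * 5 + [3] * 5 + [4] * 5 + [5] * 5

r = pearson(pills, age)
print(f"r = {r:.2f}")
```

A correlation of this size between two predictors is enough to make their separate effects on blood pressure hard to disentangle from a one-variable regression.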

The apparent effect of the pills (to decrease blood pressure) may be due, in part,
to bias in giving younger patients (with low blood pressure) more pills.
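
The gap between -6.40 and -4.19 can be accounted for exactly. For least-squares fits, the pills coefficient from the one-variable regression equals the pills coefficient from the two-variable regression plus the age coefficient multiplied by the slope of age on pills. This identity is a standard property of least squares rather than something stated in the original analysis. The sketch below checks it in Python with the coefficients reported in this document, reading the regression with intercept 71.5905 and coefficient -2.6629 as age regressed on number of pills. Variable names are illustrative.

```python
# Coefficients copied from the regressions reported in this document.
pills_coef_adjusted = -4.1899   # BP on age and pills: coefficient for pills
age_coef_adjusted = 0.8300      # BP on age and pills: coefficient for age
age_on_pills_slope = -2.6629    # age on pills: change in mean age per extra pill
pills_coef_alone = -6.4000      # BP on pills alone: coefficient for pills

# Part of the apparent pill effect that is really carried by age:
# each extra pill goes with younger patients, and younger patients have lower BP.
age_related_part = age_coef_adjusted * age_on_pills_slope

implied_coef_alone = pills_coef_adjusted + age_related_part

print(f"Age-related part    = {age_related_part:.4f}")
print(f"Implied coefficient = {implied_coef_alone:.4f}")
print(f"Reported coefficient = {pills_coef_alone:.4f}")
```

In other words, about 2.2 of the 6.4 drop per pill seen in the one-variable regression reflects the age difference between patients given more and fewer pills, and about 4.2 remains after adjusting for age.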

This type of bias is common in observational studies, and is a major reason
why we use randomized, controlled clinical trials, in which
patients would be randomly assigned a number of pills,
without regard to their age.
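
As an illustration of random assignment, the Python sketch below shuffles a balanced dose schedule over the patient IDs, so that the number of pills cannot depend on age or any other patient characteristic. The function name assign_pills, the equal group sizes, and the seed value are assumptions made for this example, not part of the original study.

```python
import random

def assign_pills(patient_ids, doses=(0, 1, 2, 3, 4, 5), seed=None):
    """Randomly assign one dose to each patient, with equal group sizes."""
    if len(patient_ids) % len(doses) != 0:
        raise ValueError("number of patients must be a multiple of the number of doses")
    per_dose = len(patient_ids) // len(doses)
    # Balanced schedule: each dose appears the same number of times.
    schedule = [dose for dose in doses for _ in range(per_dose)]
    rng = random.Random(seed)   # a seed makes the allocation reproducible
    rng.shuffle(schedule)
    return dict(zip(patient_ids, schedule))

patients = list(range(1, 31))
assignment = assign_pills(patients, seed=2024)

for patient_id in patients[:5]:
    print(f"Patient {patient_id}: {assignment[patient_id]} pills")
```

With assignment made this way, any remaining correlation between age and number of pills arises only by chance and tends to shrink as the number of patients grows.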

Whenever we read or do an analysis, we should consider:

Are the subjects randomly assigned to treatments?
Are there correlations among the independent variables?