The Use of Dummy Variables
• In the examples so far the independent
  variables are continuous numerical
  variables.
• Suppose that some of the independent
  variables are categorical.
• Dummy variables are artificially defined
  variables designed to convert a model
  including categorical independent variables
  to the standard multiple regression model.
Example:
Comparison of Slopes of k
Regression Lines with a Common
Intercept
Situation:
• k treatments or k populations are being compared.
• For each of the k treatments we have measured
   both
        – Y (the response variable) and
        – X (an independent variable)
• Y is assumed to be linearly related to X with
        – the slope dependent on treatment
          (population), while
        – the intercept is the same for each treatment
The Model:
Y = β₀ + β₁⁽ⁱ⁾X + ε   for treatment i (i = 1, 2, …, k)
          Graphical Illustration of the above Model
[Figure: k regression lines (Treat 1, …, Treat k) plotted as y vs x, all passing through a common intercept but with different slopes, the slope increasing from Treat 1 to Treat k.]
• This model can be artificially put into
  the form of the Multiple Regression
  model by the use of dummy variables
  to handle the categorical independent
  variable Treatments.
• Dummy variables are variables that are
  artificially defined
In this case we define a new variable for each
category of the categorical variable.

That is we will define Xi for each category of
treatments as follows:

Xᵢ = X if the subject receives treatment i, and Xᵢ = 0 otherwise.
Then the model can be written as follows:
The Complete Model:

Y = β₀ + β₁⁽¹⁾X₁ + β₁⁽²⁾X₂ + … + β₁⁽ᵏ⁾Xₖ + ε
where
Xᵢ = X if the subject receives treatment i, and Xᵢ = 0 otherwise.
In this case

Dependent Variable: Y
Independent Variables: X1, X2, ... , Xk
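As a concrete sketch of this coding (the function and variable names are illustrative, not from the original), the k slope dummies can be built directly from the treatment labels:

```python
# A minimal sketch (illustrative names): build the slope dummies
# X_1, ..., X_k, where X_i carries the value of X for subjects in
# treatment i and is 0 for everyone else.
def slope_dummies(x_values, treatments, k):
    """Return {i: column X_i} for i = 1..k."""
    return {i: [x if t == i else 0.0 for x, t in zip(x_values, treatments)]
            for i in range(1, k + 1)}

cols = slope_dummies([2.0, 2.0, 4.0], [1, 2, 1], k=3)
# subject 2 is in treatment 2, so only its X_2 entry is nonzero
```

Fitting an ordinary multiple regression of Y on X₁, …, Xₖ (with an intercept) then gives exactly the common-intercept, separate-slopes model above.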
In the above situation we would likely be
interested in testing the equality of the
slopes. Namely the Null Hypothesis

H₀: β₁⁽¹⁾ = β₁⁽²⁾ = … = β₁⁽ᵏ⁾

(q = k – 1)
The Reduced Model:

Y = β₀ + β₁X + ε

Dependent Variable: Y
Independent Variable:
         X = X1+ X2+... + Xk
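A quick check of this identity (a sketch with made-up values): because exactly one dummy is nonzero for each subject, summing the slope dummies recovers the original X.

```python
# Summing the slope dummies recovers X, since each subject has
# exactly one nonzero dummy column (values made up for illustration).
X1 = [2.0, 0.0, 4.0]
X2 = [0.0, 2.0, 0.0]
X3 = [0.0, 0.0, 0.0]
X = [a + b + c for a, b, c in zip(X1, X2, X3)]  # the reduced-model X
```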
Example:
In the following example we are measuring
        – Yield Y
as it depends on
        – the amount (X) of a pesticide.

 Again we will assume that the dependence of Y on
 X will be linear.
 (I should point out that the concepts that are used
 in this discussion can easily be adapted to the non-
 linear situation.)
• Suppose that the experiment is going to be
  repeated for three brands of pesticides:
•     A, B and C.
• The quantity, X, of pesticide in this
  experiment was set at 3 different levels:
      – 2 units/hectare,
      – 4 units/hectare and
      – 8 units/hectare.
• Four test plots were randomly assigned to
  each of the nine combinations of brand
  and level of pesticide.
• Note that we would expect a common
  intercept for each brand of pesticide since
  when the amount of pesticide, X, is zero the
  three brands of pesticide would be
  equivalent.
The data for this experiment is given in the following table:

                Brand    X = 2     X = 4     X = 8
                A        29.63     28.16     28.45
                         31.87     33.48     37.21
                         28.02     28.13     35.06
                         35.24     28.25     33.99
                B        32.95     29.55     44.38
                         24.74     34.97     38.78
                         23.38     36.35     34.92
                         32.08     38.38     27.45
                C        28.68     33.79     46.26
                         28.70     43.95     50.77
                         22.67     36.89     50.21
                         30.02     33.56     44.14
[Figure: scatterplot of yield Y versus amount of pesticide X for brands A, B and C.]
The data as it would appear in a data file.
The variables X1, X2 and X3 are the
“dummy” variables.

                                           Pesticide   X (Amount)   X1   X2   X3    Y
                                           A            2        2    0    0    29.63
                                           A            2        2    0    0    31.87
                                           A            2        2    0    0    28.02
                                           A            2        2    0    0    35.24
                                           B            2        0    2    0    32.95
                                           B            2        0    2    0    24.74
                                           B            2        0    2    0    23.38
                                           B            2        0    2    0    32.08
                                           C            2        0    0    2    28.68
                                           C            2        0    0    2    28.70
                                           C            2        0    0    2    22.67
                                           C            2        0    0    2    30.02
                                           A            4        4    0    0    28.16
                                           A            4        4    0    0    33.48
                                           A            4        4    0    0    28.13
                                           A            4        4    0    0    28.25
                                           B            4        0    4    0    29.55
                                           B            4        0    4    0    34.97
                                           B            4        0    4    0    36.35
                                           B            4        0    4    0    38.38
                                           C            4        0    0    4    33.79
                                           C            4        0    0    4    43.95
                                           C            4        0    0    4    36.89
                                           C            4        0    0    4    33.56
                                           A            8        8    0    0    28.45
                                           A            8        8    0    0    37.21
                                           A            8        8    0    0    35.06
                                           A            8        8    0    0    33.99
                                           B            8        0    8    0    44.38
                                           B            8        0    8    0    38.78
                                           B            8        0    8    0    34.92
                                           B            8        0    8    0    27.45
                                           C            8        0    0    8    46.26
                                           C            8        0    0    8    50.77
                                           C            8        0    0    8    50.21
                                           C            8        0    0    8    44.14
Fitting the complete model:
   ANOVA
               df            SS            MS            F         Significance F
  Regression   3         1095.815813   365.2719378   18.33114788   4.19538E-07
   Residual    32        637.6415754   19.92629923
    Total      35        1733.457389




                                          Coefficients
                    Intercept              26.24166667
                    X1                     0.981388889
                    X2                     1.422638889
                    X3                     2.602400794
Fitting the reduced model:
  ANOVA
               df            SS            MS            F         Significance F
  Regression   1          623.8232508   623.8232508    19.11439978   0.000110172

  Residual     34         1109.634138   32.63629818

  Total        35         1733.457389




                                        Coefficients
                    Intercept            26.24166667

                    X                    1.668809524
The Anova Table for testing the equality of slopes
                    df       SS            MS            F         Significance F

  Common slope zero  1    623.8232508   623.8232508   31.3065283     3.51448E-06

 Slope comparison   2    471.9925627   235.9962813   11.84345766    0.000141367

    Residual        32   637.6415754   19.92629923

      Total         35   1733.457389
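The slope-comparison F in the table above can be reproduced directly from the residual sums of squares of the two fits:

```python
# Reproducing the slope-comparison F statistic from the two ANOVA
# tables above (reduced = common slope, complete = separate slopes).
rss_reduced = 1109.634138     # residual SS, reduced model (df = 34)
rss_complete = 637.6415754    # residual SS, complete model (df = 32)
q = 2                         # k - 1 = 2 restrictions on the slopes
mse_complete = rss_complete / 32

F = ((rss_reduced - rss_complete) / q) / mse_complete  # about 11.84
```

The reduction in residual SS, 471.99, is exactly the “Slope comparison” sum of squares in the table.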
Example:
Comparison of Intercepts of k
Regression Lines with a Common
Slope
(One-way Analysis of Covariance)
Situation:
• k treatments or k populations are being compared.
• For each of the k treatments we have measured
   both Y (the response variable) and X (an
   independent variable)
• Y is assumed to be linearly related to X with the
   intercept dependent on treatment (population),
   while the slope is the same for each treatment.
• Y is called the response variable, while X is called
   the covariate.
The Model:
Y = β₀⁽ⁱ⁾ + β₁X + ε   for treatment i (i = 1, 2, …, k)
               Graphical Illustration of the One-way
               Analysis of Covariance Model
[Figure: k parallel regression lines (Treat 1, …, Treat k) plotted as y vs x, with a common slope but different intercepts.]
 Equivalent Forms of the Model:
1)  Y = μᵢ + β₁(X − X̄) + ε   for treatment i

    μᵢ = adjusted mean for treatment i

2)  Y = μ + τᵢ + β₁(X − X̄) + ε   for treatment i

    μ = overall adjusted mean response
    τᵢ = adjusted effect for treatment i
    τᵢ = μᵢ − μ
• This model can be artificially put into
  the form of the Multiple Regression
  model by the use of dummy variables
  to handle the categorical independent
  variable Treatments.
In this case we define a new variable for each
category of the categorical variable.

That is, we will define Xᵢ for categories
i = 1, 2, …, (k – 1) of treatments as follows:

Xᵢ = 1 if the subject receives treatment i, and Xᵢ = 0 otherwise.
   Then the model can be written as follows:
   The Complete Model:

Y = β₀ + δ₁X₁ + δ₂X₂ + … + δₖ₋₁Xₖ₋₁ + β₁X + ε

   where Xᵢ = 1 if the subject receives treatment i, and Xᵢ = 0 otherwise.
In this case

Dependent Variable: Y
Independent Variables:
X1, X2, ... , Xk-1, X
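As a sketch of this 0/1 coding (hypothetical category labels A–E, with the last level as the baseline), the intercept dummies can be generated from the category labels:

```python
# A minimal sketch: 0/1 intercept dummies for the first k - 1 levels;
# the last level (the baseline) gets no column of its own.
def intercept_dummies(labels, levels):
    base = levels[-1]
    return {lvl: [1 if lab == lvl else 0 for lab in labels]
            for lvl in levels if lvl != base}

d = intercept_dummies(["A", "B", "E"], ["A", "B", "C", "D", "E"])
# the baseline subject ("E") is 0 in every dummy column
```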
In the above situation we would likely be
interested in testing the equality of the
intercepts. Namely the Null Hypothesis

H₀: δ₁ = δ₂ = … = δₖ₋₁ = 0

(q = k – 1)
The Reduced Model:

Y = β₀ + β₁X + ε

Dependent Variable: Y
Independent Variable: X
Example:
In the following example we are interested in
comparing the effects of five workbooks (A,
B, C, D, E) on the performance of students in
Mathematics. For each workbook, 15 students
are selected (Total of n = 15×5 = 75). Each
student is given a pretest (pretest score ≡ X)
and given a final test (final score ≡ Y). The
data is given on the following slide
                        The data
 Workbook A     Workbook B     Workbook C      Workbook D      Workbook E
Pre     Post   Pre     Post   Pre     Post    Pre     Post    Pre   Post
43.0    46.4   43.6    52.5   57.5     61.9   59.9     56.1   43.2    46.0
55.3    43.9   45.2    61.8   49.3     57.5   50.5     49.6   60.7    59.7
59.4    59.7   54.2    69.1   48.0     52.5   45.0     46.1   42.7    45.4
51.7    49.6   45.5    61.7   31.3     42.9   55.0     53.2   46.6    44.3
53.0    49.3   43.4    53.3   65.3     74.5   52.6     50.8   42.6    46.5
48.7    47.1   50.1    57.4   47.1     48.9   62.8     60.1   25.6    38.4
45.4    47.4   36.2    48.7   34.8     47.2   41.4     49.5   52.5    57.7
42.1    33.3   55.1    61.9   53.9     59.8   62.1     58.3   51.2    47.1
60.0    53.2   48.9    55.0   42.7     49.6   56.4     58.1   48.8    50.4
32.4    34.1   52.9    63.3   47.6     55.6   54.2     56.8   44.1    52.7
74.4    66.7   51.7    64.7   56.1     62.4   51.6     46.1   73.8    73.6
43.2    43.2   55.3    66.4   39.7     52.1   63.3     56.0   52.6    50.8
44.5    42.5   45.2    59.4   32.3     49.7   37.3     48.8   67.8    66.8
47.1    51.3   37.6    56.9   59.5     67.1   39.2     45.1   42.9    47.2
57.0    48.9   41.7    51.3   46.2     55.2   62.1     58.0   51.7    57.0



 The Model:
Y = β₀⁽ⁱ⁾ + β₁X + ε   for workbook i (i = A, B, C, D, E)
                   Graphical display of data
[Figure: scatterplot of Final Score versus Pretest Score for workbooks A, B, C, D and E.]
            Some comments
1. The linear relationship between Y (Final
   Score) and X (Pretest Score), models the
   differing aptitudes for mathematics.
2. The shifting up and down of this linear
   relationship measures the effect of
   workbooks on the final score Y.
The Model:
Y = β₀⁽ⁱ⁾ + β₁X + ε   for workbook i (i = A, B, C, D, E)

              Graphical Illustration of the One-way
              Analysis of Covariance Model
[Figure: parallel regression lines for the workbooks plotted as y vs x, with a common slope but different intercepts.]
The data as it would appear in a data
file (only part of the 75-row file is shown).

                                              Pre          Final          Workbook
                                              43.0          46.4             A
                                              55.3          43.9             A
                                              59.4          59.7             A
                                              51.7          49.6             A
                                              53.0          49.3             A
                                              48.7          47.1             A
                                              45.4          47.4             A
                                              42.1          33.3             A
                                              60.0          53.2             A
                                              32.4          34.1             A
                                              74.4          66.7             A
                                              43.2          43.2             A
                                              44.5          42.5             A
                                              47.1          51.3             A
                                              57.0          48.9             A
                                              43.6          52.5             B
                                              45.2          61.8             B
                                              54.2          69.1             B
                                              45.5          61.7             B
                                              43.4          53.3             B
                                              …             …                …
The data as it would appear in a data
file with dummy variables (X1, X2,
X3, X4) added (only part of the file is shown).

Pre          Final          Workbook   X1   X2   X3   X4
43.0          46.4             A       1    0    0    0
55.3          43.9             A       1    0    0    0
59.4          59.7             A       1    0    0    0
51.7          49.6             A       1    0    0    0
53.0          49.3             A       1    0    0    0
48.7          47.1             A       1    0    0    0
45.4          47.4             A       1    0    0    0
42.1          33.3             A       1    0    0    0
60.0          53.2             A       1    0    0    0
32.4          34.1             A       1    0    0    0
74.4          66.7             A       1    0    0    0
43.2          43.2             A       1    0    0    0
44.5          42.5             A       1    0    0    0
47.1          51.3             A       1    0    0    0
57.0          48.9             A       1    0    0    0
43.6          52.5             B       0    1    0    0
45.2          61.8             B       0    1    0    0
…             …                …       …    …    …    …
51.7          57.0             E       0    0    0    0
Here is the data file in SPSS with the dummy variables (X1,
X2, X3, X4) added. They can be added within SPSS.
Fitting the complete model




 The dependent variable is the final score, Y.
 The independent variables are the Pre-score X and the four
 dummy variables X1, X2, X3, X4.
                      The Output

         Variables Entered/Removed(b)
Model 1 — Variables Entered: X4, PRE, X3, X1, X2(a);  Method: Enter
  a. All requested variables entered.
  b. Dependent Variable: FINAL


                      Model Summary
Model       R         R Square    Adjusted R Square    Std. Error of the Estimate
1           .908(a)   .825        .812                 3.594
  a. Predictors: (Constant), X4, PRE, X3, X1, X2
                The Output - continued

                      ANOVA(b)
Model             Sum of Squares    df    Mean Square    F         Sig.
1   Regression    4191.378           5    838.276        64.895    .000(a)
    Residual       891.297          69     12.917
    Total         5082.675          74
  a. Predictors: (Constant), X4, PRE, X3, X1, X2
  b. Dependent Variable: FINAL

                      Coefficients(a)
Model             B        Std. Error   Standardized Beta        t        Sig.
1   (Constant)   16.954        2.441                           6.944     .000
    PRE            .709         .045         .809             15.626     .000
    X1           -4.958        1.313        -.241             -3.777     .000
    X2            8.553        1.318         .416              6.489     .000
    X3            5.231        1.317         .254              3.972     .000
    X4           -1.602        1.320        -.078             -1.214     .229
  a. Dependent Variable: FINAL
        The interpretation of the coefficients

                      Coefficients(a)
Model             B        Std. Error   Standardized Beta        t        Sig.
1   (Constant)   16.954        2.441                           6.944     .000
    PRE            .709         .045         .809             15.626     .000
    X1           -4.958        1.313        -.241             -3.777     .000
    X2            8.553        1.318         .416              6.489     .000
    X3            5.231        1.317         .254              3.972     .000
    X4           -1.602        1.320        -.078             -1.214     .229
  a. Dependent Variable: FINAL

• PRE (.709): the common slope.
• (Constant) (16.954): the intercept for workbook E.
• X1, X2, X3, X4: the changes in the intercept when we
  change from workbook E to the other workbooks.
   The model can be written as follows:
   The Complete Model:
Y = β₀ + δ₁X₁ + δ₂X₂ + δ₃X₃ + δ₄X₄ + β₁X + ε
1. When the workbook is E then X1 = 0,…, X4 = 0
   and
             Y = β₀ + β₁X + ε
2. When the workbook is A then X1 = 1,…, X4 = 0
   and
          Y = (β₀ + δ₁) + β₁X + ε
   hence δ₁ is the change in the intercept when
   we change from workbook E to workbook A.
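Putting the two cases together, each workbook’s fitted intercept is the constant plus its δ (coefficient values taken from the SPSS output above):

```python
# Recovering each workbook's fitted intercept from the coefficients
# (numbers from the SPSS Coefficients table; workbook E is baseline).
b0 = 16.954
delta = {"A": -4.958, "B": 8.553, "C": 5.231, "D": -1.602, "E": 0.0}
intercepts = {w: b0 + d for w, d in delta.items()}
# e.g. workbook B's fitted line is Y = 25.507 + 0.709 X
```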
Testing for the equality of the intercepts

      i.e.   H₀: δ₁ = δ₂ = δ₃ = δ₄ = 0
The reduced model
             Y = β₀ + β₁X + ε

The only independent variable is X (the pre-score)
Fitting the reduced model




 The dependent variable is the final score, Y.
 The only independent variable is the Pre-score X.
The Output for the reduced model
           Variables Entered/Removed(b)
  Model 1 — Variables Entered: PRE(a);  Method: Enter
    a. All requested variables entered.
    b. Dependent Variable: FINAL


                    Model Summary
Model       R         R Square    Adjusted R Square    Std. Error of the Estimate
1           .700(a)   .490        .483                 5.956
  a. Predictors: (Constant), PRE

                                                  (Note the lower R².)
               The Output - continued

                      ANOVA(b)
Model             Sum of Squares    df    Mean Square    F         Sig.
1   Regression    2492.779           1    2492.779       70.263    .000(a)
    Residual      2589.896          73      35.478
    Total         5082.675          74
  a. Predictors: (Constant), PRE
  b. Dependent Variable: FINAL

                                          (Note the increased residual sum of squares.)

                      Coefficients(a)
Model             B        Std. Error   Standardized Beta        t        Sig.
1   (Constant)   23.105        3.692                           6.259     .000
    PRE            .614         .073         .700              8.382     .000
  a. Dependent Variable: FINAL
         The F Test

F = (Reduction in R.S.S. / q) / (MSE for the complete model)
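This is the same formula as a small helper function; the values below come from the workbook example’s intercept-equality test:

```python
# The F statistic above: reduction in residual SS per dropped
# parameter, divided by the complete model's MSE.
def f_statistic(rss_reduced, rss_complete, q, df_complete):
    return ((rss_reduced - rss_complete) / q) / (rss_complete / df_complete)

# Equality of intercepts in the workbook example:
# reduced RSS = 2589.896 (df 73), complete RSS = 891.297 (df 69), q = 4.
F = f_statistic(2589.896, 891.297, q=4, df_complete=69)  # about 32.87
```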
The Reduced model
                                         ANOVAb

                              Sum of                        Mean
      Model                   Squares      df              Square       F          Sig.
      1       Regression     2492.779             1       2492.779     70.263        .000 a
              Residual       2589.896            73         35.478
              Total          5082.675            74
        a. Predictors: (Constant), PRE
        b. Dependent Variable: FINAL




 The Complete model
                                           ANOVAb

                              Sum of                          Mean
      Model                   Squares           df           Square        F           Sig.
      1       Regression     4191.378                 5      838.276      64.895         .000 a
              Residual        891.297                69       12.917
              Total          5082.675                74
        a. Predictors: (Constant), X4, PRE, X3, X1, X2
        b. Dependent Variable: FINAL
                                           The F test
reduced       ANOVA
                                        Sum of Squares df         Mean Square F       Sig.
                          Regression         2492.77885         1   2492.77885 70.2626 4.56272E-13
                          Residual           2589.89635        73 35.47803219
                          Total               5082.6752        74


Complete      ANOVA
                                        Sum of Squares df         Mean Square F       Sig.
                          Regression        4191.377971         5 838.2755942 64.89532 9.99448E-25
                          Residual           891.297229        69 12.91735115
                          Total               5082.6752        74


 Test equality of intercepts              Sum of Squares df      Mean Square F        Sig.
                          common slope         2492.77885      1   2492.77885 192.9791 1.13567E-21
                          equality of int.    1698.599121      4 424.6497803 32.87437 2.46006E-15
                          Residual             891.297229     69 12.91735115
                          Total                 5082.6752     74
Testing for zero slope

            i.e.   H₀: β₁ = 0
The reduced model
 Y = β₀ + δ₁X₁ + δ₂X₂ + δ₃X₃ + δ₄X₄ + ε

The independent variables are X1, X2, X3, X4 (the
dummies)
The Reduced model
                                       ANOVAb

                         Sum of                           Mean
 Model                   Squares         df              Square         F           Sig.
 1       Regression     1037.475                4        259.369        4.488         .003 a
         Residual       4045.200               70         57.789
         Total          5082.675               74
   a. Predictors: (Constant), X4, X3, X2, X1
   b. Dependent Variable: FINAL


 The Complete model
                                                ANOVAb

                                 Sum of                        Mean
         Model                   Squares            df        Square        F            Sig.
         1        Regression    4191.378                  5   838.276      64.895          .000 a
                  Residual       891.297                 69    12.917
                  Total         5082.675                 74
            a. Predictors: (Constant), X4, PRE, X3, X1, X2
            b. Dependent Variable: FINAL
                             The F test
Reduced                   Sum of Squares df      Mean Square     F       Sig.
             Regression        1037.4752       4        259.3688 4.488237 0.002757501
             Residual              4045.2     70     57.78857143
             Total             5082.6752      74

Complete                  Sum of Squares df      Mean Square     F       Sig.
             Regression      4191.377971       5     838.2755942 64.89532 9.99448E-25
             Residual         891.297229      69     12.91735115
             Total             5082.6752      74


Zero slope                Sum of Squares df      Mean Square     F        Sig.
             Regression        1037.4752       4        259.3688 20.0791 5.30755E-11
             zero slope      3153.902771       1     3153.902771 244.1602    2.3422E-24
             Residual         891.297229      69     12.91735115
             Total             5082.6752      74
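These extra-sum-of-squares F statistics can be reproduced directly from the residual sums of squares of the reduced and complete fits. A minimal sketch in Python (the numbers are copied from the tables above; the variable names are my own):

```python
# Extra-sum-of-squares F test for H0: the common slope is zero.
# Reduced model: dummies only.  Complete model: dummies + pretest score.
sse_reduced, df_reduced = 4045.200, 70
sse_complete, df_complete = 891.297, 69

# F = (drop in SSE per dropped parameter) / (MSE of the complete model)
F = ((sse_reduced - sse_complete) / (df_reduced - df_complete)) / \
    (sse_complete / df_complete)
print(round(F, 1))  # → 244.2, matching the "zero slope" row above
```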
    The Analysis of Covariance
• This analysis can also be performed by
  using a package that can perform Analysis
  of Covariance (ANACOVA)
• The package sets up the dummy variables
  automatically
Here is the data file in SPSS. The dummy variables are no
longer needed.
In SPSS, to perform ANACOVA you select from the menu:
Analyze → General Linear Model → Univariate
This dialog box will appear
You now select:
       1. The dependent variable Y (Final Score)
       2. The Fixed Factor (the categorical
           independent variable – workbook)
       3. The covariate (the continuous independent
           variable – pretest score)
  The output: The ANOVA TABLE
                          Tests of Between-Subjects Effects

        Dependent Variable: FINAL
                           Type III
                           Sum of                    Mean
        Source             Squares       df         Square          F       Sig.
        Corrected Model   4191.378 a           5    838.276        64.895     .000
        Intercept          837.590             1    837.590        64.842     .000
        PRE               3153.903             1   3153.903       244.160     .000
        WORKBOOK          1698.599             4    424.650        32.874     .000
        Error              891.297            69     12.917
        Total             219815.6            75
        Corrected Total   5082.675            74
           a. R Squared = .825 (Adjusted R Squared = .812)

Compare this with the previously computed table:
                    Sum of Squares df             Mean Square F        Sig.
   slope                 2492.77885             1   2492.77885 192.9791 1.13567E-21
   equality of int.     1698.599121             4 424.6497803 32.87437 2.46006E-15
   Residual              891.297229            69 12.91735115
   Total                  5082.6752            74
The PRE sum of squares (3153.903) is the numerator sum of squares when we
test whether the slope is zero (while allowing the intercepts to differ).
    Another application of the use of
           dummy variables
• The dependent variable, Y, is linearly
  related to X, but the slope changes at one or
  several known values of X (nodes).
[Figure: Y plotted against X – a piecewise-linear relationship whose slope
changes at the nodes]
The model

[Figure: piecewise-linear Y versus X, with a different slope on each interval
and slope changes at the nodes x₁, x₂, …, xₖ]

     Y = β₀ + β₁X                                        X ≤ x₁
     Y = β₀ + β₁x₁ + β₂(X − x₁)                          x₁ ≤ X ≤ x₂
     Y = β₀ + β₁x₁ + β₂(x₂ − x₁) + β₃(X − x₂)            x₂ ≤ X ≤ x₃
     ⋮
or
     Y = β₀ + β₁X                                        X ≤ x₁
     Y = β₀ + (β₁ − β₂)x₁ + β₂X                          x₁ ≤ X ≤ x₂
     Y = β₀ + (β₁ − β₂)x₁ + (β₂ − β₃)x₂ + β₃X            x₂ ≤ X ≤ x₃
     ⋮
Now define

     X₁ = X    if X ≤ x₁
          x₁   if X ≥ x₁

     X₂ = 0         if X ≤ x₁
          X − x₁    if x₁ ≤ X ≤ x₂
          x₂ − x₁   if x₂ ≤ X

     X₃ = 0         if X ≤ x₂
          X − x₂    if x₂ ≤ X ≤ x₃
          x₃ − x₂   if x₃ ≤ X

Etc.
Then the model

     Y = β₀ + β₁X                                        X ≤ x₁
     Y = β₀ + β₁x₁ + β₂(X − x₁)                          x₁ ≤ X ≤ x₂
     Y = β₀ + β₁x₁ + β₂(x₂ − x₁) + β₃(X − x₂)            x₂ ≤ X ≤ x₃
     ⋮

can be written

     Y = β₀ + β₁X₁ + β₂X₂ + β₃X₃ + ⋯ + ε
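As a sketch of how these dummy variables can be constructed in practice, here is a small Python helper (the name piecewise_dummies is my own; it assumes the nodes are sorted in increasing order):

```python
import numpy as np

def piecewise_dummies(x, nodes):
    """Build the dummy variables X1, X2, ... for a piecewise-linear
    regression with known, sorted nodes x1 < x2 < ...

    X1 = min(X, x1); each later Xj rises linearly from 0 at the
    previous node and is capped at the length of its interval.
    """
    x = np.asarray(x, dtype=float)
    cols = [np.minimum(x, nodes[0])]              # X1
    edges = list(nodes) + [np.inf]                # last interval is unbounded
    for lo, hi in zip(edges[:-1], edges[1:]):     # X2, X3, ...
        cols.append(np.clip(x - lo, 0.0, hi - lo))
    return np.column_stack(cols)
```

By construction the columns sum to X, so regressing Y on them fits the model Y = β₀ + β₁X₁ + β₂X₂ + ⋯ + ε above.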
An Example
In this example we are measuring Y at time X.
Y is growing linearly with time.
At time X = 10, an additive is added to the
process which may change the rate of growth.
The data
   X     0.0    1.0    2.0    3.0    4.0    5.0    6.0
   Y     3.9    5.9    6.4    6.3    7.5    7.9    8.5
   X     7.0    8.0    9.0   10.0   11.0   12.0   13.0
   Y    10.7   10.0   12.4   11.0   11.5   13.9   17.6
   X    14.0   15.0   16.0   17.0   18.0   19.0   20.0
   Y    18.2   16.8   21.8   23.1   22.9   26.2   27.7
Graph

[Figure: scatterplot of Y versus X for the 21 observations; the growth rate
appears to increase after X = 10]
Now define the dummy variables

     X₁ = X    if X ≤ 10
          10   if X ≥ 10

     X₂ = 0        if X ≤ 10
          X − 10   if 10 ≤ X
The data as it appears in SPSS – x1, x2 are the dummy
variables
We now regress y on x1 and x2.
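The same fit can be reproduced outside SPSS. A minimal sketch in Python using ordinary least squares on the data above (numpy only):

```python
import numpy as np

# The 21 observations from the example (Y measured at times X = 0..20)
X = np.arange(21, dtype=float)
Y = np.array([3.9, 5.9, 6.4, 6.3, 7.5, 7.9, 8.5, 10.7, 10.0, 12.4, 11.0,
              11.5, 13.9, 17.6, 18.2, 16.8, 21.8, 23.1, 22.9, 26.2, 27.7])

# Dummy variables: X1 = min(X, 10), X2 = max(X - 10, 0)
X1 = np.minimum(X, 10.0)
X2 = np.maximum(X - 10.0, 0.0)

# Least-squares fit of Y = b0 + b1*X1 + b2*X2
A = np.column_stack([np.ones_like(X), X1, X2])
(b0, b1, b2), *_ = np.linalg.lstsq(A, Y, rcond=None)
print(round(b0, 3), round(b1, 3), round(b2, 3))  # → 4.714 0.673 1.579
```

The printed coefficients agree with the SPSS output shown next.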
                                               Model Summary

The Output                                                                     Std. Error
                                                                 Adjusted        of the
                      Model           R       R Square           R Square      Estimate
                      1                .990 a     .980               .978        1.0626
                          a. Predictors: (Constant), X2, X1

                                                   ANOVAb

                                     Sum of                       Mean
             Model                   Squares         df          Square       F          Sig.
             1       Regression     1015.909                 2   507.954    449.875        .000 a
                     Residual         20.324                18     1.129
                     Total          1036.232                20
               a. Predictors: (Constant), X2, X1
               b. Dependent Variable: Y



                               Coefficients (a)

                    Unstandardized Coefficients    Standardized
Model                    B         Std. Error    Coefficients (Beta)     t        Sig.
1    (Constant)        4.714          .577                              8.175     .000
     X1                 .673          .085              .325            7.886     .000
     X2                1.579          .085              .761           18.485     .000
  a. Dependent Variable: Y
Graph

[Figure: scatterplot of Y versus X with the fitted piecewise-linear line;
the slope changes at X = 10]
Testing for no change in slope

Here we want to test

     H₀: β₁ = β₂   vs   Hₐ: β₁ ≠ β₂

The reduced model is

     Y = β₀ + β₁(X₁ + X₂) + ε
       = β₀ + β₁X + ε
Fitting the reduced model
We now regress y on x.
                                         Model Summary
The Output                                                          Std. Error
                                                       Adjusted       of the
                     Model         R       R Square    R Square     Estimate
                     1                   a
                                    .971       .942        .939       1.7772
                       a. Predictors: (Constant), X


                                                      ANOVAb

                                        Sum of                      Mean
             Model                      Squares         df         Square           F          Sig.
             1         Regression       976.219                1   976.219        309.070        .000 a
                       Residual          60.013               19     3.159
                       Total           1036.232               20
               a. Predictors: (Constant), X
               b. Dependent Variable: Y
                               Coefficients (a)

                    Unstandardized Coefficients    Standardized
Model                    B         Std. Error    Coefficients (Beta)     t        Sig.
1    (Constant)        2.559          .749                              3.418     .003
     X                 1.126          .064              .971           17.580     .000
  a. Dependent Variable: Y
Graph – fitting a common slope

[Figure: scatterplot of Y versus X with the single fitted line
Ŷ = 2.559 + 1.126·X]
The test for the equality of slope
Reduced Model                           Sum of Squares df      Mean Square F       Sig.
                    Regression             976.2194805       1 976.2194805 309.0697 3.27405E-13
                    Residual               60.01290043      19 3.158573707
                    Total                  1036.232381      20




Complete Model                          Sum of Squares df      Mean Square F        Sig.
                    Regression             1015.908579       2 507.9542895 449.8753           0
                    Residual               20.32380204      18 1.129100113
                    Total                  1036.232381      20


equality of slope                       Sum of Squares df      Mean Square F        Sig.
                    slope                  976.2194805       1 976.2194805 864.5996 1.14256E-16
                    equality of slope       39.6890984       1   39.6890984 35.15109 1.30425E-05
                    Residual               20.32380204      18 1.129100113
                    Total                  1036.232381      20
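The equality-of-slope F statistic is again an extra-sum-of-squares comparison of the two fits. A sketch in Python (the sums of squares are copied from the tables above; scipy is used only for the p-value):

```python
from scipy.stats import f as f_dist

# Error sums of squares from the two fits above
sse_reduced, df_reduced = 60.0129, 19    # common slope:  Y = b0 + b1*X
sse_complete, df_complete = 20.3238, 18  # two slopes:    Y = b0 + b1*X1 + b2*X2

# Extra-sum-of-squares F test for H0: beta1 = beta2
F = ((sse_reduced - sse_complete) / (df_reduced - df_complete)) / \
    (sse_complete / df_complete)
p = f_dist.sf(F, df_reduced - df_complete, df_complete)
print(round(F, 2))  # → 35.15, matching the "equality of slope" row
```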

								