```
Spatial Data Analysis and Mining

Tutorial on Regression Analysis

KWAN, Tsun-hei
09116474D

1. Plot PPE against GDP using MINITAB graphic tools

[Figure: Scatterplot of PPE($1000) vs GDP($1000). x-axis: GDP($1000), 10 to 35; y-axis: PPE($1000), 5 to 25.]

2. Comment on the graphical relationship between the two variables

As the scatterplot above shows, PPE and GDP are positively related: in
general, the higher a country's GDP, the higher its PPE. The points cluster in
the lower-left and upper-right regions of the plot.

3. Establish a regression model to further investigate the relationship between
PPE and GDP (note that you need to identify the dependent and independent
variables correctly!)

Regression Analysis: PPE($1000) versus GDP($1000)

The regression equation is
PPE($1000) = - 0.88 + 0.475 GDP($1000)

4. Solve the regression model parameters.

a = -0.88 (intercept)
b = 0.475 (slope)
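
Read as a prediction rule, the fitted line can be evaluated at any GDP inside
the observed range. For a GDP of 20 (in $1000; a value chosen here only for
illustration, not taken from the data):

    PPE = a + b * GDP = -0.88 + 0.475 * 20 = 8.62 (in $1000)

The slope b means that each additional $1000 of GDP is associated with about
$475 more PPE on average.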
5. Calculate the regression statistics

Predictor       Coef     SE Coef      T      P
Constant      -0.883     2.097    -0.42  0.677
GDP($1000)     0.47548   0.08270   5.75  0.000

S = 3.09519      R-Sq = 57.9%         R-Sq(adj) = 56.2%

Analysis of Variance

Source          DF      SS      MS      F      P
Regression       1  316.66  316.66  33.05  0.000
Residual Error  24  229.92    9.58
Total           25  546.59
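
The entries of this table are linked by the usual definitions, so they can be
checked by hand:

    MS Error = SS Error / DF Error        = 229.92 / 24    = 9.58
    F        = MS Regression / MS Error   = 316.66 / 9.58  = 33.05
    S        = sqrt(MS Error)             = sqrt(9.58)     = 3.095
    T (GDP)  = Coef / SE Coef             = 0.47548 / 0.08270 = 5.75

With a single predictor, F equals the square of the slope's T value:
5.75^2 = 33.06, which matches F up to rounding.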

Unusual Observations

Obs  GDP($1000)  PPE($1000)     Fit  SE Fit  Residual  St Resid
 24        30.5      23.714  13.597   0.794    10.117     3.38R

R denotes an observation with a large standardized residual.
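
The flagged row can be checked by hand, assuming the standard definitions
leverage h = (SE Fit / S)^2 and St Resid = Residual / (S * sqrt(1 - h)):

    Residual = PPE - Fit = 23.714 - 13.597 = 10.117
    h        = (0.794 / 3.09519)^2 = 0.0658
    St Resid = 10.117 / (3.09519 * sqrt(1 - 0.0658)) = 3.38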

6. Based on the regression statistics, how much your regression model explains

Since R^2 = 57.9%, the regression line explains 57.9% of the variation in
PPE; the remaining 42.1% is unexplained (residual) variation.
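
These percentages can be recomputed from the Analysis of Variance table:

    R-Sq      = SS Regression / SS Total
              = 316.66 / 546.59 = 0.579 (57.9%)
    R-Sq(adj) = 1 - (SS Error / DF Error) / (SS Total / DF Total)
              = 1 - (229.92 / 24) / (546.59 / 25) = 0.562 (56.2%)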

7. Produce a regression plot superimposed on the graph of your bivariate plot of
data

[Figure: Fitted Line Plot, PPE($1000) = - 0.883 + 0.4755 GDP($1000); S = 3.09519, R-Sq = 57.9%. x-axis: GDP($1000), 10 to 35; y-axis: PPE($1000), 5 to 25.]

8. Produce a plot of your residuals against the predicted (adjusted data) and
comment on it

[Figure: Residual Plots for PPE($1000). Panels: Normal Probability Plot; Versus Fits (Residual vs Fitted Value); Histogram; Versus Order (Residual vs Observation Order).]

In the plot of residuals against the fitted (predicted) values, the points are
scattered randomly and most residuals lie between -5 and 5. This indicates
that the regression model assumptions are probably acceptable.

9. Do you think that the regression model you used is appropriate for this
investigation? What other types of statistical methods can you use to support

The simple linear regression model used above is suitable if we only want to
check whether there is a relationship between the two variables, GDP and PPE.
If other factors, such as the age of citizens or the climate of the country,
are taken into account, we may need a multiple linear regression model.
Sometimes a non-linear model may be needed, for example when
PPE = a(GDP)^2 + b.

```
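
The closing remark about a non-linear form such as PPE = a(GDP)^2 + b can be illustrated with a short sketch. It is not the tutorial's MINITAB workflow and does not use the tutorial's data: the `gdp` and `ppe` lists are synthetic and noise-free, the values 0.02 and 1.0 are arbitrary assumptions, and `simple_regression` is a hypothetical helper written for this example. The point is that this quadratic form is still linear in its parameters, so it can be fitted with the same least-squares formulas by regressing PPE on GDP^2.

```python
from statistics import mean


def simple_regression(x, y):
    """Least-squares intercept, slope and R-squared for y = intercept + slope * x."""
    x_bar, y_bar = mean(x), mean(y)
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    ss_total = sum((yi - y_bar) ** 2 for yi in y)
    ss_error = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    return intercept, slope, 1.0 - ss_error / ss_total


# Synthetic, noise-free data for illustration only; NOT the tutorial's dataset.
# The coefficients 0.02 and 1.0 are arbitrary assumptions.
gdp = [float(g) for g in range(10, 36)]
ppe = [0.02 * g ** 2 + 1.0 for g in gdp]

# Straight line: PPE = intercept + slope * GDP
lin_a, lin_b, lin_r2 = simple_regression(gdp, ppe)

# Quadratic form PPE = a * (GDP)^2 + b, fitted as a straight line in z = GDP^2
gdp_sq = [g ** 2 for g in gdp]
quad_b, quad_a, quad_r2 = simple_regression(gdp_sq, ppe)

print("linear    R-Sq:", round(lin_r2, 4))
print("quadratic R-Sq:", round(quad_r2, 4), "a =", round(quad_a, 4), "b =", round(quad_b, 4))
```

On real data both fits would leave residuals, and comparing their R-Sq values and residual plots is one way to judge whether the extra curvature is worth modelling.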