					                             Spatial Data Analysis and Mining

                             Tutorial on Regression Analysis

                                       KWAN, Tsun-hei
                                         09116474D


Tasks
   1. Plot PPE against GDP using MINITAB graphic tools

        [Figure: Scatterplot of PPE($1000) vs GDP($1000); x-axis GDP($1000), 10 to 35; y-axis PPE($1000), 5 to 25]



   2. Comment on the graphical relationship between the two variables

        As the scatter diagram above shows, PPE and GDP have a positive
        relationship: in general, the higher a country's GDP, the higher its PPE.
        The points are concentrated in the lower-left and upper-right regions of
        the graph.


   3. Establish a regression model to further investigate the relationship between
      PPE and GDP (note that you need to identify the dependent and independent
      variables correctly!)

        Regression Analysis: PPE($1000) versus GDP($1000)

        The regression equation is
        PPE($1000) = - 0.88 + 0.475 GDP($1000)


   4. Solve for the regression model parameters.

        a = -0.88 (intercept)
        b = 0.475 (slope)
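The parameters a and b are the usual least-squares estimates for a straight line. The sketch below shows the closed-form calculation and then uses the coefficients reported by MINITAB to predict PPE. The function names `fit_simple_ols` and `predict_ppe` are hypothetical, and the toy points are made up for illustration; they are not the tutorial's data set, which is not reproduced here.

```python
def fit_simple_ols(x, y):
    """Return (a, b) for the least-squares line y = a + b*x."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    # Sum of squares of x and sum of cross-products about the means.
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    b = sxy / sxx             # slope
    a = y_mean - b * x_mean   # intercept
    return a, b


# Made-up points lying exactly on y = 1 + 2x (NOT the tutorial's data).
toy_a, toy_b = fit_simple_ols([1, 2, 3, 4], [3, 5, 7, 9])

# Coefficients as reported in the MINITAB output of this tutorial.
A_REPORTED = -0.883
B_REPORTED = 0.47548


def predict_ppe(gdp_thousands):
    """Predicted PPE($1000) for a given GDP($1000) from the fitted line."""
    return A_REPORTED + B_REPORTED * gdp_thousands


print(toy_a, toy_b)
print(predict_ppe(20.0))
```

For example, a GDP of 20 ($1000) gives a predicted PPE of about 8.63 ($1000) under the reported equation.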

5. Calculate the regression statistics

   Predictor      Coef     SE Coef      T      P
   Constant      -0.883    2.097    -0.42  0.677
   GDP($1000)     0.47548  0.08270   5.75  0.000


   S = 3.09519      R-Sq = 57.9%         R-Sq(adj) = 56.2%


   Analysis of Variance

   Source          DF      SS      MS      F      P
   Regression       1  316.66  316.66  33.05  0.000
   Residual Error  24  229.92    9.58
   Total           25  546.59


   Unusual Observations

   Obs  GDP($1000)  PPE($1000)     Fit  SE Fit  Residual  St Resid
    24        30.5      23.714  13.597   0.794    10.117     3.38R

   R denotes an observation with a large standardized residual.
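The flagged observation can be checked by hand from the printed values. The sketch below assumes the textbook definitions for simple linear regression: leverage h = (SE Fit / S)^2, and standardized residual = residual / (S * sqrt(1 - h)). The variable names are chosen here for illustration only.

```python
import math

# Values copied from the MINITAB output above (observation 24).
S = 3.09519        # standard error of the regression
PPE_OBS = 23.714   # observed PPE($1000)
FIT = 13.597       # fitted PPE($1000)
SE_FIT = 0.794     # standard error of the fitted value

residual = PPE_OBS - FIT

# Leverage, assuming SE Fit = S * sqrt(h).
leverage = (SE_FIT / S) ** 2

# Standardized residual, assuming the usual internally studentized form.
std_resid = residual / (S * math.sqrt(1 - leverage))

print(round(residual, 3), round(leverage, 4), round(std_resid, 2))
```

Under these assumptions the result agrees with the printed St Resid of 3.38. A value this far from zero suggests that observation 24 has a much higher PPE than the line predicts for its GDP.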

6. Based on the regression statistics, how much of the variability in your data
   does your regression model explain?

   Since R-Sq is 57.9%, the regression line explains 57.9% of the variation in
   PPE; the remaining 42.1% is unexplained (residual) variation.
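The R-Sq figure can be recovered from the Analysis of Variance table. The sketch below recomputes R-Sq, R-Sq(adj), S and F from the printed sums of squares and degrees of freedom, using the standard definitions; small differences from the MINITAB output are expected because the table values are rounded.

```python
import math

# Sums of squares and degrees of freedom from the Analysis of Variance table.
SS_REG, SS_ERR, SS_TOTAL = 316.66, 229.92, 546.59
DF_REG, DF_ERR, DF_TOTAL = 1, 24, 25

# Share of total variation explained by the regression.
r_sq = SS_REG / SS_TOTAL

# Adjusted R-Sq compares mean squares instead of raw sums of squares.
r_sq_adj = 1 - (SS_ERR / DF_ERR) / (SS_TOTAL / DF_TOTAL)

# Mean square error, its square root (S), and the F statistic.
ms_err = SS_ERR / DF_ERR
s = math.sqrt(ms_err)
f_stat = (SS_REG / DF_REG) / ms_err

print(round(r_sq * 100, 1), round(r_sq_adj * 100, 1))
print(round(s, 3), round(f_stat, 2))
```

The unexplained share is 1 - R-Sq, that is SS(Residual Error) / SS(Total) = 229.92 / 546.59, or about 42.1%.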

7. Produce a regression plot superimposed on the graph of your bivariate plot of
   data

   [Figure: Fitted Line Plot, PPE($1000) = - 0.883 + 0.4755 GDP($1000); S = 3.09519, R-Sq = 57.9%, R-Sq(adj) = 56.2%; x-axis GDP($1000), y-axis PPE($1000)]

8. Produce a plot of your residuals against the predicted (adjusted data) and
   comment on it

   [Figure: Residual Plots for PPE($1000); panels: Probability Plot, Versus Fits (Residual vs Fitted Value), Histogram, Versus Order (Residual vs Observation Order)]

   In the plot of residuals against the fitted (predicted) values, the points
   are scattered randomly, with most residuals falling between -5 and 5. This
   indicates that the regression model assumptions are probably acceptable.

9. Do you think that the regression model you used is appropriate for this
   investigation? What other types of statistical methods can you use to support
   your conclusion?

   The linear regression model used above is suitable if we only want to check
   whether there is a relationship between GDP and PPE (two variables only). If
   other factors, such as the age of citizens or the climate of the country, are
   taken into account, we may need a multiple linear regression model. We may
   also need a non-linear model when, for example, PPE = a(GDP)^2 + b.
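A model of the form PPE = a(GDP)^2 + b is non-linear in GDP but still linear in the parameters, so one possible approach is to regress PPE on the squared GDP with ordinary least squares. The sketch below illustrates this. The helper names and the data are hypothetical: the values are synthetic points generated from a = 0.02 and b = 1, and they are not the tutorial's data set.

```python
def fit_line(x, y):
    """Return (intercept, slope) of the least-squares line y on x."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_mean - slope * x_mean, slope


def r_squared(x, y, intercept, slope):
    """Share of the variation in y explained by the fitted line."""
    y_mean = sum(y) / len(y)
    ss_err = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_total = sum((yi - y_mean) ** 2 for yi in y)
    return 1 - ss_err / ss_total


# Synthetic illustration only (NOT the tutorial's data): PPE = 0.02*GDP^2 + 1.
gdp = [10.0, 15.0, 20.0, 25.0, 30.0]
ppe = [0.02 * g ** 2 + 1.0 for g in gdp]

# Straight-line fit of PPE on GDP.
lin_int, lin_slope = fit_line(gdp, ppe)
r2_lin = r_squared(gdp, ppe, lin_int, lin_slope)

# Fit of PPE on GDP squared: the slope estimates a, the intercept estimates b.
gdp_sq = [g ** 2 for g in gdp]
b_const, a_coef = fit_line(gdp_sq, ppe)
r2_quad = r_squared(gdp_sq, ppe, b_const, a_coef)

print(a_coef, b_const)
print(r2_lin, r2_quad)
```

Comparing the two R-Sq values, together with the residual plots of each fit, is one way to judge whether the curved model describes the data better than the straight line.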

				