Regression Analysis

					        Chapter 5 - Hirschey
• Demand Estimation
  – The linchpin of successful agribusiness firms
    • Consumer interviews
    • Market experiments
    • Regression analysis
• Consumer Interviews
  – Requires questioning customers to estimate the relation
    between demand and a variety of underlying factors;
    however, most consumers cannot answer such questions
    reliably, so it is difficult for survey techniques to
    estimate demand relations accurately.

• Market Experiments
  – Demand estimation in a controlled environment; with this
    technique, firms study one or more markets with specific
    prices, packaging, or advertising and then vary controllable
    factors over time or between markets.
  – Shortcomings: (1) expensive; (2) usually undertaken on a
    scale too small to allow high levels of confidence in the
    results; (3) seldom run for sufficiently long periods.
-Simple linear regression
-Multiple regression
-Estimation and interpretation

                   Tintner (1953)
“The Study of the Application of Statistical Methods
     to the Analysis of Economic Phenomena”
Business analysts often need to be
in a position to:

  - Interpret the economic or financial landscape

  - Identify and assess the influence of several exogenous or
  predetermined factors on one or more endogenous variables

  - Provide ex-ante forecasts of one or more endogenous variables

  How does one achieve these objectives?
 Why do Business Analysts Wish
  to Achieve These Objectives?
-To improve decision-making!

Example: Investigate key determinants of demand for Prego
         spaghetti sauce: Price of Prego; price of competitors
        (Ragu, Classico, Hunt’s, Newman’s Own); in-store
         displays; coupons; price of spaghetti

Forecast sales of Prego spaghetti sauce one month, one quarter, or
even one year into the future
 Course of Action —
Development of Formal
 Quantitative Models

  Regression Analysis
    Components of Regression
Regression Analysis involves four phases:

–   Specification – the model building activity
–   Estimation – fitting the model to data
–   Verification – testing the model
–   Prediction – producing ex-ante forecasts and
            conducting ex-post forecast evaluations
[Diagram: the four components of regression analysis arranged in a cycle: Specification, Estimation, Verification, Prediction]
Getting Started
   Regression Analysis Begins
    with Model Specification
- Model specification entails the expression of theoretical
     constructs in mathematical terms
- This phase of regression analysis constitutes the model-building activity
- In essence, model specification is the translation of theoretical
     constructs into mathematical/statistical forms
- Fundamental principles in model building:
     * The principle of parsimony (other things the same, simple
             models generally are preferable to complex models,
             especially in forecasting)
     * The shrinkage principle (imposing restrictions either on
             estimated parameters or on forecasts often improves
             model performance)
     * The KISS principle: “Keep It Sophisticatedly Simple”
  The Simple Linear Regression Model
 Dependent Variable         Independent Variable
 Left-Hand Side Variable    Right-Hand Side Variable
 Explained Variable         Explanatory Variable
 Regressand                 Regressor
 Response Variable          Control Variable
 Endogenous Variable        Exogenous Variable

                    y = β0 + β1x + u
Coefficients                        Error Term
  β0: Intercept                      Disturbance Term
  β1: Slope                          Innovation
The error term u makes explicit that the
relationship between y and x is not an
identity; u arises for two reasons:
  (1) measurement error
  (2) the regression inadvertently omits the effects
  of other variables besides x that could impact y.
                             Graphical Illustration
             y = β0 + β1x + u
                Regression line: E(y|x) = β0 + β1x

[Figure: y (dependent variable) plotted against x (independent variable), with the regression line E(y|x); β0 is the intercept and β1 the slope]
How to interpret coefficients?
Example: Estimation of Demand
- Often in regression analysis, analysts have
  interest in estimating demand relationships,
  particularly for commodities.

- Analysts may wish to estimate the demand
  for cosmetic products, automobiles, various
  food products, or various beverages.
Demand Curve for Lipton Tea

                                QLT = β0 + β1PLT + u
                                 Demand Curve
                                 Q = 2,500 – 500P
                                 P = 5 – .002Q

[Figure: demand curve with average price per package on the vertical axis and packages of Lipton tea (500 to 2,500) on the horizontal axis]
             Demand Curve
    The Demand Curve shows the theoretical
      relation between price and quantity
      demanded, holding all other factors constant.

     Axes: price is on the y-axis, quantity on the x-axis

     Example: Demand curve for Lipton tea,
             Q=2500 – 500P

   Key question: How are these numbers obtained?
                β̂0 = 2,500
                β̂1 = –500
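The demand-curve arithmetic above is easy to verify directly. A minimal sketch in Python (the function names are illustrative; the coefficients come from the Lipton tea example):

```python
# Demand curve from the Lipton tea example: Q = 2,500 - 500P
def quantity_demanded(price):
    """Packages of Lipton tea demanded at a given price."""
    return 2500 - 500 * price

def inverse_demand(quantity):
    """Inverse demand from the example: P = 5 - 0.002Q."""
    return 5 - 0.002 * quantity

# At a price of $4, quantity demanded is 500 packages,
# matching the point shown on the demand curve.
print(quantity_demanded(4))   # 500
print(inverse_demand(500))    # 4.0
```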
          Random Sampling

• Randomly sample n observations from a population
• For each observation i:

  yi = β0 + β1xi + ui
  Qi = β0 + β1Pi + ui

• Goal: Estimate β0 and β1

  Independent variable       Dependent variable
  (Exogenous variable)
  x1 (Price1)                y1 (Quantity1)
  x2 (Price2)                y2 (Quantity2)
  …                          …
  xn (Pricen)                yn (Quantityn)
 Translations of the Theoretical
Construct into a Statistical Model
         1. Q = a – bP
         2. Q = a0 – a1P + a2I + a3A + a4PS

              - a1: own-price effect (–)
              - a2: income effect (+)
              - a3: advertising effect
              - a4: price of substitute product (+)

              -    The coefficients a0, a1, a2, a3, and a4 are labeled the demand
                   parameters; we expect certain signs and magnitudes of the
                   demand parameters according to economic theory.

              - Different versions of the regression model are available for
                   applied analysis.
• Excel file of Prego spaghetti sauce
• Excel file of Keynesian consumption

[Diagram: a sample (data) feeds the regression, which yields sample parameters; inference uses t tests, F tests, and confidence intervals]

     Statistics: OLS assumptions, properties of OLS
           estimates, interpretation of estimates

     Measures of central tendency: mean, median, mode

     Measures of variability or dispersion: range, variance, standard
           deviation, coefficient of variation
          Descriptive Statistics
Measures of central tendency:
       - mean
       - median
       - mode
Measures of dispersion/variability:
       - range
       - variance
       - standard deviation
       - coefficient of variation
Critical Ingredient in all
  Regression Models
  “Sufficiently large” amount of historical data.
 “Ask not what you can do to the data, but rather
          what the data can do for you.”

Data Types:

   - Time-series data
       * daily, weekly, monthly, quarterly, annual
                       DAILY – closing prices of stock prices
                       WEEKLY – measures of money supply
                       MONTHLY – housing starts
                       QUARTERLY – GDP figures
                       ANNUAL – salary figures
   - Cross-Sectional Data
               * Snapshot of activity at a given point in time
               * Survey of household expenditure patterns
               * Sales figures from a number of supermarkets at a
                       given point in time.
     Quote from Lord Kelvin
“I often say that when you can measure what you are speaking
 about, and express it in numbers, you know something about
 it; but when you cannot measure it, when you cannot express
        it in numbers, your knowledge is of a meager and
                       unsatisfactory kind.”
    Get a Feel for the Data
-Plots of key variables
-Scatter plots
-Descriptive statistics
   • mean
   • median
   • minimum
   • maximum
   • standard deviation
   • skewness
   • kurtosis
   • distribution
Figure 5.5 Scatter Diagrams of Various Unit Cost-
                Output Relations
Figure 5.6 Regression Relation Between Units Sold and
   Personal Selling Expenditures for Electronic Data
                Processing (EDP), Inc.
            Descriptive Statistics

Let X = (X1, X2, …, XT)′ correspond to a vector of T observations for the
variable X.

       The mean, a measure of central tendency, corresponds to
the average of the set of observations corresponding to a particular
data series. The mean is given by:

       x̄ = ( Σᵢ₌₁ᵀ xᵢ ) / T

The units associated with the mean are the same as the units of xi,
i = 1, 2, …, T.
                                                               continued . . .
         The median also is a measure of central tendency of a data
series. The median corresponds to the 50th percentile of the data
series. The units associated with the median are the same as the
units of xi, i = 1, 2, …, T. To find the median, arrange the
observations in increasing order. When sample values are arranged
in this fashion, they often are called the 1st, 2nd, 3rd, … order
statistics. In general, if T is an odd number, the median is the
order statistic whose number is (T + 1)/2.

If T is an even number, the median corresponds to the average of
the order statistics whose numbers are T/2 and T/2 + 1.

                                                          continued . . .
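The order-statistic rules above translate directly into code; a sketch (the helper name `median` is illustrative):

```python
# Median via order statistics, as described above.
def median(series):
    ordered = sorted(series)      # the 1st, 2nd, ..., T-th order statistics
    T = len(ordered)
    if T % 2 == 1:
        # Odd T: the order statistic numbered (T + 1)/2 (1-based)
        return ordered[(T + 1) // 2 - 1]
    # Even T: average of the order statistics numbered T/2 and T/2 + 1
    return (ordered[T // 2 - 1] + ordered[T // 2]) / 2

print(median([5, 1, 3]))       # 3
print(median([4, 1, 3, 2]))    # 2.5
```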
Standard Deviation
       The standard deviation is a measure of the spread or
dispersion of a series about the mean. The standard deviation is
given by:

       S = [ Σᵢ₌₁ᵀ (xᵢ – x̄)² / (T – 1) ]^(1/2)

The units associated with the standard deviation are the same as
the units of xi.
        The variance also is a measure of the spread or dispersion of
a series about the mean. The variance is expressed as:

       σ̂² = Σᵢ₌₁ᵀ (xᵢ – x̄)² / (T – 1)          Note that σ̂² = S².

The units associated with the variance are the square of the units of
xi.
                                                           continued . . .
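A sketch of the two formulas, using the T – 1 divisor given above (names are illustrative):

```python
import math

# Sample variance and standard deviation with the T - 1 divisor.
def sample_variance(series):
    T = len(series)
    mean = sum(series) / T
    return sum((x - mean) ** 2 for x in series) / (T - 1)

def sample_std(series):
    return math.sqrt(sample_variance(series))

data = [2, 4, 4, 4, 5, 5, 7, 9]
print(sample_variance(data))   # 32/7, about 4.571
print(sample_std(data))
```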
        The minimum of a series corresponds to the smallest value,
min(x1, x2, …, xT). The units associated with the minimum are the
same as the units of xi.
        The maximum of a series corresponds to the largest value,
max(x1, x2, …, xT). The units associated with the maximum are the
same as the units of xi.
     The range of a series is the difference between the
maximum and the minimum values. The range is expressed as
Range x = max x – min x. The units associated with the range are
the same as the units of xi.
                                                           continued . . .
        Skewness is a measure of the amount of asymmetry
in the distribution of a series. If a distribution is symmetric,
skewness equals zero. If the skewness coefficient is negative
(positive), then the distribution of the series has a left (right)
tail. The greater the absolute value of the skewness statistic,
the more asymmetrical is the distribution. The skewness
coefficient is given by:

       m̂ = (1/T) Σᵢ₌₁ᵀ [ (xᵢ – x̄) / σ̂ ]³

The skewness statistic is a unitless measure.

                                                          continued . . .
        Kurtosis is a measure of the flatness or peakedness of the
distribution of a series relative to that of a normal distribution. A
normal random variable has a kurtosis of 3. A kurtosis statistic
greater than 3 indicates a more peaked distribution than the
normal distribution. A kurtosis statistic less than 3 indicates a
flatter distribution than the normal distribution. The kurtosis
coefficient is given by:

       k̂ = (1/T) Σᵢ₌₁ᵀ [ (xᵢ – x̄) / σ̂ ]⁴

The kurtosis coefficient also is a unitless measure.
Jarque-Bera test statistic (Jarque and Bera, 1980)
       The Jarque-Bera (JB) statistic combines the skewness and
kurtosis coefficients to produce another measure of the
departure from normality of a series. This measure is given by:

       JB = (T/6) [ m̂² + (1/4)(k̂ – 3)² ]

For a normal distribution, m = 0 and k = 3. Thus, the JB statistic
is zero for normal distributions. Values greater than zero indicate
the degree of departure from normality.

                                                          continued . . .
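The three statistics above can be sketched together in Python (an illustration; the standardization uses the T – 1 sample standard deviation defined earlier):

```python
# Skewness, kurtosis, and the Jarque-Bera statistic (a sketch).
def moments(series):
    T = len(series)
    mean = sum(series) / T
    s = (sum((x - mean) ** 2 for x in series) / (T - 1)) ** 0.5
    skew = sum(((x - mean) / s) ** 3 for x in series) / T
    kurt = sum(((x - mean) / s) ** 4 for x in series) / T
    return skew, kurt

def jarque_bera(series):
    T = len(series)
    skew, kurt = moments(series)
    return (T / 6) * (skew ** 2 + 0.25 * (kurt - 3) ** 2)

# A perfectly symmetric series has zero skewness.
skew, kurt = moments([1, 2, 3, 4, 5])
print(skew)   # 0.0
print(jarque_bera([1, 2, 3, 4, 5]))
```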
Coefficient of variation
       The coefficient of variation is the ratio of the standard
deviation of a series to its mean. This measure typically is converted
to a percentage by multiplying the ratio by 100. This statistic
describes how much dispersion exists in a series relative to its
mean. This measure is given by:

       CV = (S / x̄) × 100%

The utility of this information is that in most cases the mean and
the standard deviation change together. As well, this statistic is
not dependent on units of measurement.
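Because the CV is unit-free, rescaling a series leaves it unchanged; a quick sketch:

```python
# Coefficient of variation: standard deviation relative to the mean.
def coefficient_of_variation(series):
    T = len(series)
    mean = sum(series) / T
    s = (sum((x - mean) ** 2 for x in series) / (T - 1)) ** 0.5
    return (s / mean) * 100  # expressed as a percentage

# Doubling every observation (a change of units) leaves the CV unchanged.
data = [10, 12, 14, 16, 18]
print(coefficient_of_variation(data))
print(coefficient_of_variation([2 * x for x in data]))
```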
Correlation Coefficient
        The correlation coefficient is a measure of the degree of
linear association between two variables. The statistic, denoted by r,
is given by:

       r = Σᵢ₌₁ᵀ (xᵢ – x̄)(yᵢ – ȳ) / [ Σᵢ₌₁ᵀ (xᵢ – x̄)² · Σᵢ₌₁ᵀ (yᵢ – ȳ)² ]^(1/2)

While r is a pure number without units, r always lies between -1 and
+1. Positive values of r indicate a tendency of x and y to move
together, that is, large values of x are associated with large values
of y, and small values of x are associated with small values of y.
When r is negative, large values of x are associated with small
values of y, and small values of x are associated with large values of
y. The closer to +1, the greater the degree of direct linear
relationship between x and y. The closer to -1, the greater the
degree of inverse linear relationship between x and y. Finally, when
r = 0, there is no linear association between x and y.
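A sketch of the correlation formula; the exact inverse linear relation below (points taken from the Lipton tea demand curve) yields r of -1:

```python
# Pearson correlation coefficient r between two series.
def correlation(xs, ys):
    T = len(xs)
    xbar = sum(xs) / T
    ybar = sum(ys) / T
    cov = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    sx = sum((x - xbar) ** 2 for x in xs) ** 0.5
    sy = sum((y - ybar) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Points on Q = 2,500 - 500P: a perfect inverse linear relation.
prices = [1, 2, 3, 4]
quantities = [2000, 1500, 1000, 500]
print(correlation(prices, quantities))   # approximately -1
```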
         The mode corresponds to the most frequent
observation in the data series x1, x2, …, xT. The units
associated with the mode are the same as the units of
xi. In empirical applications, often the observations are
non-repetitive. Hence, this measure often is of limited use.
Data Example

        Prices and quantities sold
        of Prego Spaghetti Sauce
        by week.
 Time-Series Plot of the
Volume of Prego Spaghetti
   Sauce Sold by Week
Descriptive Statistics and the
Histogram Associated with the
  Volume of Prego Spaghetti Sauce
Time-Series Plot of the
Price of Prego Spaghetti
    Sauce by Week
Descriptive Statistics of the
 Price of Prego Spaghetti Sauce
         PPRG versus QPRG
Weekly Scatter Plot of Prices and Quantities Sold of Prego
                    Spaghetti Sauce.
       Correlation Matrix

The correlation between the price and quantity sold of
           Prego Spaghetti Sauce is -0.73.
    Another Example: Relationship
    between Real Income and Real Consumption
-Question: What is the effect of real per capita income on real
per capita personal consumption expenditures?
-Known information:
   -Dependent variable: real per capita consumption
   expenditures (c)
   -Explanatory variable: real per capita income (I)
        Regression: C = β0 + β1I + u
     -β1 measures the effect of a change in real income on consumption: the
     marginal propensity to consume (MPC).
     -β0 represents the “autonomous” level of real per capita consumption.
                     Random Sampling

-Randomly sample n observations from a
population (1980:1 to 2010:3):
123 quarterly observations.
-For each observation t:

       Ct = β0 + β1It + ut

  Explanatory variable (I)   Dependent variable (C)
  I1980:1                    C1980:1
  I1980:2                    C1980:2
  …                          …
  I2010:3                    C2010:3

-Goal: Estimate β0 and β1.

Another goal: Forecasts for Ct, 2010:4 and beyond
Estimation of the Simple
   Linear Regression
     Ordinary Least Squares,
     Regression Line, Fitted
     Values, Residuals

               ŷ = β̂0 + β̂1x

[Figure: sample points (x1, y1) through (x4, y4) scattered about the fitted line, with residuals û1, û2, û3 shown as vertical distances; OLS chooses β0 and β1 to minimize the sum of squared prediction errors]
Intuitive Thinking about OLS
-OLS fits a line through the sample points such that the
       sum of squared prediction errors is as small as possible,
       hence the term least squares.

-The residual, û, is an estimate of the error term, u, and is the
       difference between the sample point (actual value) and the
       fitted line (sample regression line):

       ûᵢ = AVᵢ – FVᵢ,    i = 1, 2, …, n

where AVᵢ is the actual value and FVᵢ is the fitted value.
Minimizing the Residual Sum of Squares

       min over b0, b1:   Σᵢ₌₁ⁿ ûᵢ² = Σᵢ₌₁ⁿ (yᵢ – b0 – b1xᵢ)²

First order conditions:

       –2 Σᵢ₌₁ⁿ (yᵢ – β̂0 – β̂1xᵢ) = 0
       –2 Σᵢ₌₁ⁿ xᵢ (yᵢ – β̂0 – β̂1xᵢ) = 0

Solving these two equations yields:

       β̂1 = Σᵢ₌₁ⁿ (xᵢ – x̄)(yᵢ – ȳ) / Σᵢ₌₁ⁿ (xᵢ – x̄)²

       β̂0 = ȳ – β̂1x̄
 Interpretation: The slope estimate is the sample
 covariance between x and y divided by the sample
 variance of x.
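The closed-form OLS solutions above implement directly; a sketch (function and variable names are illustrative):

```python
# OLS slope and intercept from the closed-form solutions above.
def ols(xs, ys):
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    b1 = (sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
          / sum((x - xbar) ** 2 for x in xs))
    b0 = ybar - b1 * xbar
    return b0, b1

# Data generated exactly from Q = 2,500 - 500P is recovered exactly.
prices = [1.0, 2.0, 3.0, 4.0]
quantities = [2000.0, 1500.0, 1000.0, 500.0]
b0, b1 = ols(prices, quantities)
print(b0, b1)   # 2500.0 -500.0
```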
Assumptions Behind the Simple
   Linear Regression Model

       yi = β0 + β1xi + ui

Assumption 1: Zero Mean of the Error Term
E(u) = 0: The average value of u, the error term, is 0.

Assumption 2: Independent Error Terms
Each observed ui is independent of all other uj:

          Corr(ui, uj) = 0 for all i ≠ j
     Assumption 3: Homoscedasticity
 Var(u|x) = σ²: the variance of the regression error is constant.

[Figure: the spread of the errors about the regression line is the same at every value of x]
    Assumption 4: Normality
-The error term u is normally distributed with mean zero and
       variance σ².
-This assumption is essential for inference and forecasting.
-This assumption is not essential to estimate the parameters of
the regression model.
-We only need assumptions 1-3 to derive the OLS estimators
-OLS → ordinary least squares
   Properties of OLS Estimators
Unbiasedness: on average, the OLS estimators equal the
true population parameters:

                 E(β̂1) = β1
                 E(β̂0) = β0
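Unbiasedness can be illustrated with a small Monte Carlo experiment: across many simulated samples, the average OLS slope estimate is close to the true slope. This is a sketch; the true parameters, sample size, and error distribution are arbitrary illustrative choices.

```python
import random

random.seed(42)
TRUE_B0, TRUE_B1 = 1.0, 2.0   # illustrative "population" parameters

def ols_slope(xs, ys):
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    return (sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
            / sum((x - xbar) ** 2 for x in xs))

# Draw 2,000 samples of size 50 from y = b0 + b1*x + u, u ~ N(0, 1)
slopes = []
for _ in range(2000):
    xs = [random.uniform(0, 10) for _ in range(50)]
    ys = [TRUE_B0 + TRUE_B1 * x + random.gauss(0, 1) for x in xs]
    slopes.append(ols_slope(xs, ys))

# The average of the estimates is very close to the true slope.
avg_slope = sum(slopes) / len(slopes)
print(round(avg_slope, 2))
```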
         Variance of OLS Estimators
-We know that the sampling distribution of our estimate
is centered around the true parameter (unbiasedness).

-Unbiasedness is a description of the estimator—in a
given sample we may be “near” or “far” from the true
parameter. But on average, we will cover the population parameter.

-Question: How spread out is the distribution of the OLS
estimator? The answer to this question leads us to
examining the variance of the OLS estimator.
       Estimating the Error Variance
-Variance of the population, σ², vs. the sample variance, σ̂².

-The error variance, σ², is unknown because we don’t
       observe the errors, ui.

-What we observe are the residuals, ûi.

-We can use the residuals to form an estimate of the error variance.
       The Residual Variance
Use the residuals ûi to estimate the error variance. This
variance represents the amount of dispersion about the fitted line.

       ûᵢ = yᵢ – β̂0 – β̂1xᵢ
       ûᵢ = (β0 + β1xᵢ + uᵢ) – β̂0 – β̂1xᵢ
       ûᵢ = uᵢ – (β̂0 – β0) – (β̂1 – β1)xᵢ

       Then, an unbiased estimator of σ² is

       σ̂² = Σ ûᵢ² / (n – 2) = SSE / (n – 2)

   * Note: SSE is the residual or error sum of squares
   and (n – 2) is the degrees of freedom.
Standard Errors of the OLS Estimators

       σ̂ = (σ̂²)^(1/2) is the standard error of the regression.

 The standard error of β̂1 is given by:

       se(β̂1) = σ̂ / [ Σ (xᵢ – x̄)² ]^(1/2)

 The standard error of β̂0 is given by:

       se(β̂0) = σ̂ [ (Σ xᵢ² / n) / Σ (xᵢ – x̄)² ]^(1/2)
            Gauss-Markov Theorem
Under the following assumptions, the OLS procedure produces unbiased estimates
of the regression model population parameters:

                     E(β̂0) = β0 and E(β̂1) = β1

          (1) The model is linear in parameters.
                    yi = β0 + β1xi + ui
                    ln yi = c0 + c1 ln xi + vi
          (2) E(ui) = 0
          (3) Corr(ui, uj) = 0 for i ≠ j
          (4) E(ui²) = σ² for all i (homoscedasticity)
          (5) The sample outcomes on x (xi, i = 1, 2, …, n) are not all the
                    same value.
Also, in the class of linear unbiased estimators, the OLS estimator is best (in the sense of
providing the minimum variance).
OLS Estimators are BLUE! (Best Linear Unbiased Estimators)
Goodness-of-Fit: Some Terminology

       yᵢ = ŷᵢ + ûᵢ

 We then define the following:
       Σ (yᵢ – ȳ)² is the total sum of squares (SST)
       Σ (ŷᵢ – ȳ)² is the regression sum of squares (SSR)
       Σ (yᵢ – ŷᵢ)² = Σ ûᵢ² is the residual or error sum of squares (SSE)

 Thus, SST = SSR + SSE

 -Goodness-of-fit: how well does the simple regression line fit the
       sample data?
 -Calculate R² = SSR/SST = 1 – SSE/SST

                                                                    continued . . .
-Concept: measures the proportion of the variation in the
      dependent variable explained by the regression:

      R² = Explained sample variability / Total sample variability
         = Σ (ŷᵢ – ȳ)² / Σ (yᵢ – ȳ)²
         = SSR/SST = 1 – SSE/SST

 -Range: between zero and one.
 -Example: R² = 0.78, the regression equation explains 78%
      of the variation in y.
              R² and Adjusted R²

-R² = Explained sample variability / Total sample variability
    = Σ (ŷᵢ – ȳ)² / Σ (yᵢ – ȳ)² = 1 – SSE/SST

-Adjusted R²:  R̄² = 1 – [ SSE/(n – k – 1) ] / [ SST/(n – 1) ]

(a) Why do we care about the adjusted R²?
(b) Is adjusted R² always better than R²?
(c) What’s the relationship between R² and adjusted R²?
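Questions (a) through (c) are easier to discuss with the formulas in hand; a sketch computing both measures for the simple regression (k = 1; names and data are illustrative):

```python
# R-squared and adjusted R-squared for a fitted simple regression.
def r_squared(xs, ys, k=1):
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    b1 = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / sxx
    b0 = ybar - b1 * xbar
    sse = sum((y - b0 - b1 * x) ** 2 for x, y in zip(xs, ys))
    sst = sum((y - ybar) ** 2 for y in ys)
    r2 = 1 - sse / sst
    adj_r2 = 1 - (sse / (n - k - 1)) / (sst / (n - 1))
    return r2, adj_r2

# A perfect linear fit (y = 1 + 2x) gives R-squared of exactly one.
r2, adj = r_squared([1, 2, 3, 4], [3, 5, 7, 9])
print(r2, adj)   # 1.0 1.0
```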
• Run SAS programs to demonstrate simple
  linear regression
  – Prego Spaghetti sauce
  – Keynesian consumption function
What Have We Learned About
Regression Analysis Thus Far?
-Population parameters vs. sample parameters
-Getting a feel for the data
-Ordinary least squares (OLS)
       (a) Assumptions
       (b) Estimators
       (c) Unbiasedness
       (d) Interpretation of Estimated Parameters

-Goodness-of-fit
       (a) R²
       (b) adjusted R²
    Coming Attractions
    The Multiple Linear Regression Model
          Estimation and Inference
Use of SAS to Conduct the Regression Analysis
