					     Simple regression
Statistics for dummies




               Statistics

    Gabriel V. Montes-Rojas




Gabriel Montes-Rojas     Statistics


y = β0 + β1x + u
  Much of applied econometrics is concerned with the simple linear
  regression model, which explains the relationship between y and x:

                             y = β0 + β1 x + u

  where

                     y                                  x
            dependent variable               independent variable
             explained variable                explanatory variable
             response variable                   control variable
                regressand                    regressor or covariate

  u is called the error term or disturbance (sometimes, loosely, the
  residual) and represents all factors other than x that affect y.


y = β0 + β1x + u


  Our interest is the effect of x on the variable y in some
  population. The error term u is assumed to have no systematic
  influence on y; therefore, only x matters for the systematic part
  of y. We then model that part as f(x) = β0 + β1x.
  The following definitions will be used extensively during the course:
      $\beta_0$ is the intercept: $f(0) = \beta_0$.
      This represents the value of y when x is set to 0 (and u = 0).
      $\beta_1$ is the slope: $\Delta y / \Delta x = \beta_1$.
      This represents the change in y resulting from a one-unit
      change in x, holding u fixed.
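As a quick numerical check of these two definitions, the Python sketch below uses hypothetical values β0 = 1 and β1 = 2 (assumptions, not from the text). It recovers the intercept as f(0) and the slope as Δy/Δx between two arbitrary points.

```python
# Hypothetical parameter values, chosen only for illustration
beta0 = 1.0
beta1 = 2.0

def f(x: float) -> float:
    """Systematic part of the model: f(x) = beta0 + beta1 * x."""
    return beta0 + beta1 * x

# The intercept is the value of f when x is set to 0
intercept = f(0.0)

# The slope is the change in f divided by the change in x;
# for a linear f it is the same for any pair of points
x_a, x_b = 3.0, 5.0
slope = (f(x_b) - f(x_a)) / (x_b - x_a)

print(intercept, slope)  # 1.0 2.0
```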





Example 2.7 (p.41 in Wooldridge): Returns to education




                      wage = β 0 + β 1 educ + u

      Wages are expected to be an increasing function of education,
      i.e. more education means higher wages on average. Then, in
      this linear model, we expect that β1 > 0.
      What does u represent? Factors other than education that
      affect wages, such as age or ability.
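To illustrate what β1 > 0 implies, here is a minimal simulation sketch. The parameter values (β0 = 1, β1 = 0.5) and the assumption that u ~ N(0, 1) are hypothetical, not estimates from Wooldridge; the point is only that average simulated wages rise with educ.

```python
import random
from statistics import mean

# Hypothetical parameters (not estimates from Wooldridge); note beta1 > 0
BETA0, BETA1 = 1.0, 0.5

rng = random.Random(42)  # fixed seed for reproducibility

def simulate_wage(educ: int) -> float:
    """Draw one wage from wage = beta0 + beta1*educ + u, assuming u ~ N(0, 1)."""
    u = rng.gauss(0.0, 1.0)  # stands in for other factors (age, ability, ...)
    return BETA0 + BETA1 * educ + u

# Average simulated wage for a few education levels
avg_wage = {educ: mean(simulate_wage(educ) for _ in range(10_000))
            for educ in (8, 12, 16)}
for educ, w in avg_wage.items():
    print(educ, round(w, 2))
```

Each extra year of educ raises the average simulated wage by roughly β1, while individual wages still scatter because of u.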








Statistics for dummies






Random variables (RV)
  Why do we need random variables in econometrics?
     We will (almost) never observe the whole population, only a
     small portion of it
     A random sample is a subset of a population
     If we consider the random variable X, a random sample
     $\{x_i\}_{i=1}^{n}$, or $x_1, x_2, \ldots, x_n$, consists of n
     realisations of the variable X, indexed by i.
     Example: If X is the return on an asset, a random sample
     consists of actual market observations of the asset's returns.
     For instance, a sample of three observations:
     x1 = $1000, x2 = −$567, x3 = $0
     Example: Flipping a coin: let X = 0 be HEADS and X = 1
     be TAILS. Then X takes values in {0, 1}, with
     P[X = 0] = P[X = 1] = 0.5. (This is called the Bernoulli
     distribution, here with probability 0.5.)
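The coin example can be mimicked with a short Python sketch that draws a random sample x1, ..., xn from a Bernoulli distribution with probability 0.5. The helper name flip_coins and the seeds are illustrative choices, not part of the slides.

```python
import random

# Bernoulli(0.5) coin: X = 0 for HEADS, X = 1 for TAILS
P_TAILS = 0.5

def flip_coins(n: int, seed: int = 0) -> list[int]:
    """Return a random sample x_1, ..., x_n of n coin flips."""
    rng = random.Random(seed)
    # rng.random() is uniform on [0, 1), so this is 1 with probability P_TAILS
    return [1 if rng.random() < P_TAILS else 0 for _ in range(n)]

sample = flip_coins(10)
print(sample)                      # ten realisations of X, indexed i = 1..10
print(sum(sample) / len(sample))   # share of TAILS in this particular sample
```

Different seeds give different samples from the same distribution, which is exactly why the sample is treated as random.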


Discrete vs Continuous RVs

  A discrete random variable is one that takes on only a finite or
  countably infinite number of values.

  Example: Flipping a coin: let X = 0 be HEADS and X = 1 be
  TAILS. Two possible values: 0 or 1.

  Example: Number of £50 bills in your wallet: X can take any
  value in {0, 1, 2, 3, ...}

  Each outcome of X has an associated probability
  $p_j = P(X = x_j)$, $j = 1, \ldots, k$. This probability measure satisfies:
       $p_j \geq 0$, $j = 1, 2, \ldots, k$
       $\sum_{j=1}^{k} p_j = 1$
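The two conditions can be checked mechanically. This sketch represents a discrete distribution as a Python dict {x_j: p_j} (an assumed representation) and uses exact fractions; the fair die and the invalid example are extra illustrations, not from the slides.

```python
from fractions import Fraction

def is_valid_pmf(pmf: dict) -> bool:
    """Check p_j >= 0 for every outcome and that the p_j sum to 1."""
    return all(p >= 0 for p in pmf.values()) and sum(pmf.values()) == 1

coin = {0: Fraction(1, 2), 1: Fraction(1, 2)}           # HEADS = 0, TAILS = 1
die = {face: Fraction(1, 6) for face in range(1, 7)}    # fair six-sided die
bad = {0: Fraction(1, 2), 1: Fraction(3, 4)}            # sums to 5/4

print(is_valid_pmf(coin), is_valid_pmf(die), is_valid_pmf(bad))  # True True False
```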




Discrete vs Continuous RVs

  A continuous random variable is one that can take on any real
  value in an interval (possibly the whole real line).

  Let X be a continuous random variable. Its probability measure is
  described by a density function f(x) that satisfies
      $f(x) \geq 0$ for all $x \in \mathcal{X}$, where $\mathcal{X}$ is the domain of X, usually
      $\mathcal{X} = \mathbb{R}$
      $\int_{\mathcal{X}} f(x)\,dx = 1$


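  Although the density function plays a role similar to the probability
  of each value of x, it has a tricky interpretation: there are so many
  values in $\mathcal{X}$ that each one individually has probability zero.
  Probabilities are instead obtained by integrating the density over an
  interval, $P(a \leq X \leq b) = \int_a^b f(x)\,dx$.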
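A sketch of both points, assuming X is standard normal (an illustrative choice, not from the slides): the density integrates to approximately 1, while the probability of a shrinking interval [a, a + h] goes to 0. The integral is approximated with the midpoint rule, and interval probabilities come from the normal CDF written with math.erf.

```python
import math

def normal_pdf(x: float) -> float:
    """Standard normal density, used here as an example f(x)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def normal_cdf(x: float) -> float:
    """P(X <= x) for a standard normal, via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

# The density integrates to (approximately) 1: midpoint rule on [-10, 10]
n, lo, hi = 20_000, -10.0, 10.0
dx = (hi - lo) / n
total = sum(normal_pdf(lo + (k + 0.5) * dx) for k in range(n)) * dx

# P(a <= X <= a + h) shrinks towards 0 as the interval shrinks to the point a
a = 1.0
probs = [normal_cdf(a + h) - normal_cdf(a) for h in (1.0, 0.1, 0.001, 1e-6)]
print(round(total, 6), probs)
```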



Expectation of a RV

  Random variables can be described by some of their features:

                             Expectation: E [X ]

  What value should we expect from X? If we had a large number of
  draws of the random variable X, what would their average be?
  For the coin example:
  E [X ] = 0 × P [X = 0] + 1 × P [X = 1] = 0 × 0.5 + 1 × 0.5 = 0.5.
  For discrete RVs: $E[X] = \sum_{j=1}^{k} x_j \times P[X = x_j]$.

  For continuous RVs: $E[X] = \int_{\mathcal{X}} x f(x)\,dx$.
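The two formulas can be evaluated directly. In this sketch the density f(x) = 2x on [0, 1] is a hypothetical example (not from the slides) and the integral is approximated by the midpoint rule; the last lines compare the coin's E[X] with the average of many simulated draws, echoing the question above.

```python
import random
from fractions import Fraction

def expectation_discrete(pmf: dict) -> Fraction:
    """E[X] = sum_j x_j * P[X = x_j] for a discrete RV given as {x_j: p_j}."""
    return sum(x * p for x, p in pmf.items())

def expectation_continuous(f, lo: float, hi: float, n: int = 100_000) -> float:
    """E[X] = integral of x f(x) dx over [lo, hi], approximated by the midpoint rule."""
    dx = (hi - lo) / n
    return sum((lo + (k + 0.5) * dx) * f(lo + (k + 0.5) * dx) for k in range(n)) * dx

coin = {0: Fraction(1, 2), 1: Fraction(1, 2)}
print(expectation_discrete(coin))            # 1/2

# Illustrative density (an assumption): f(x) = 2x on [0, 1], so E[X] = 2/3
print(expectation_continuous(lambda x: 2 * x, 0.0, 1.0))

# The average of many coin draws is close to E[X]
rng = random.Random(0)
draws = [rng.randint(0, 1) for _ in range(100_000)]
print(sum(draws) / len(draws))
```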





Property of expectation: Let A and B be two random variables,
and c and d two constants. Then, E [cA + dB ] = cE [A] + dE [B ].
Property of expectation: Let A and B be two independent
random variables. Then, E [A × B ] = E [A] × E [B ].
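Both properties can be verified exactly by enumerating a joint distribution. The sketch assumes two independent fair dice A and B and arbitrary constants c = 3, d = −2; these are illustrative choices, not from the slides.

```python
from fractions import Fraction
from itertools import product

# Joint distribution of two independent fair dice A and B (illustrative):
# each pair (a, b) has probability 1/6 * 1/6 = 1/36
die = {k: Fraction(1, 6) for k in range(1, 7)}
joint = {(a, b): die[a] * die[b] for a, b in product(die, die)}

def E(g) -> Fraction:
    """Expectation of g(A, B) under the joint distribution."""
    return sum(g(a, b) * p for (a, b), p in joint.items())

c, d = 3, -2  # arbitrary constants
EA, EB = E(lambda a, b: a), E(lambda a, b: b)

# Linearity: E[cA + dB] = cE[A] + dE[B]
print(E(lambda a, b: c * a + d * b), c * EA + d * EB)  # 7/2 7/2
# Independence: E[A x B] = E[A] x E[B]
print(E(lambda a, b: a * b), EA * EB)                  # 49/4 49/4
```

Linearity needs no independence; the product rule does, which is why the joint probabilities are built as products of the marginals here.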








An estimator of the expectation of a random variable X is the
sample average.
Given a random sample $\{x_i\}_{i=1}^{n}$, define $\bar{x} = n^{-1} \sum_{i=1}^{n} x_i$, which is
simply the average.

An estimator $\hat{\mu}$ is unbiased for a given parameter $\mu$ if $E(\hat{\mu}) = \mu$.

In words, if we consider all possible random samples, on average,
we will obtain the parameter we want to estimate.
In our case, we can prove that $E(\bar{x}) = E(X)$.
Proof:...
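Unbiasedness is a statement about the average over repeated samples, which a simulation can make concrete. The sketch below assumes X is a fair die roll (so E[X] = 3.5) and a sample size n = 5; both are illustrative. It is a numerical illustration only, not a proof.

```python
import random
from statistics import mean

def sample_average(xs: list[float]) -> float:
    """x-bar = n^{-1} * sum_i x_i."""
    return sum(xs) / len(xs)

rng = random.Random(123)
n = 5             # small on purpose: unbiasedness does not require large n
true_mean = 3.5   # E[X] for a fair die (illustrative choice of X)

# Draw many independent random samples of size n and record x-bar for each
xbars = [sample_average([rng.randint(1, 6) for _ in range(n)])
         for _ in range(50_000)]

print(xbars[:3])      # individual x-bar values differ from sample to sample
print(mean(xbars))    # but their average is close to E[X] = 3.5
```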





Variance of a RV




  However, a given realisation x of X will in general differ from its
  expectation, i.e. $x \neq E[X]$.
  But how much does this random variable deviate from $E[X]$?

                Variance: $\mathrm{Var}[X] \equiv E[(X - E[X])^2]$
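The definition translates directly into code for a discrete RV. This sketch uses a dict {x_j: p_j} to represent the distribution (an assumed representation) and gives exact results for the coin and, as an extra illustration, a fair die.

```python
from fractions import Fraction

def variance(pmf: dict) -> Fraction:
    """Var[X] = E[(X - E[X])^2] for a discrete RV given as {x_j: p_j}."""
    mu = sum(x * p for x, p in pmf.items())
    return sum((x - mu) ** 2 * p for x, p in pmf.items())

coin = {0: Fraction(1, 2), 1: Fraction(1, 2)}
die = {k: Fraction(1, 6) for k in range(1, 7)}
print(variance(coin), variance(die))   # 1/4 35/12
```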








Prove that $\mathrm{Var}[X] = E[X^2] - (E[X])^2$.
Property of variance: $\mathrm{Var}[aX] = a^2 \times \mathrm{Var}[X]$
Property of variance:
$\mathrm{Var}[aX + bY] = a^2 \times \mathrm{Var}[X] + b^2 \times \mathrm{Var}[Y] + 2ab \times \mathrm{Cov}[X, Y]$,
where $\mathrm{Cov}[X, Y] = E[XY] - E[X]E[Y]$
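The identities above can be checked exactly on a small example. The joint pmf of (X, Y) below is made up for illustration and is deliberately dependent, so the covariance term is not zero; a = 2 and b = −3 are arbitrary.

```python
from fractions import Fraction as F

# Illustrative joint pmf of (X, Y), deliberately dependent: {(x, y): P[X = x, Y = y]}
joint = {(0, 0): F(3, 10), (0, 1): F(1, 10),
         (1, 0): F(1, 10), (1, 1): F(1, 2)}

def E(g) -> F:
    """Expectation of g(X, Y) under the joint pmf."""
    return sum(g(x, y) * p for (x, y), p in joint.items())

def Var(g) -> F:
    """Var[g(X, Y)] computed from the definition E[(Z - E[Z])^2]."""
    mu = E(g)
    return E(lambda x, y: (g(x, y) - mu) ** 2)

EX, EY = E(lambda x, y: x), E(lambda x, y: y)
VarX, VarY = Var(lambda x, y: x), Var(lambda x, y: y)
Cov = E(lambda x, y: x * y) - EX * EY

# Var[X] = E[X^2] - (E[X])^2
print(VarX, E(lambda x, y: x ** 2) - EX ** 2)   # 6/25 6/25

# Var[aX + bY] = a^2 Var[X] + b^2 Var[Y] + 2ab Cov[X, Y]
a, b = 2, -3
print(Var(lambda x, y: a * x + b * y),
      a**2 * VarX + b**2 * VarY + 2 * a * b * Cov)  # 36/25 36/25
```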






Covariance



  The covariance of two random variables Y and X measures how
  much they move together (their linear co-movement).

          Covariance: Cov [Y , X ] ≡ E [YX ] − E [Y ]E [X ]

  Property of covariance: Let A and B be two independent random
  variables. Then, Cov [A, B ] = 0.
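The sketch below computes Cov[Y, X] = E[YX] − E[Y]E[X] for two illustrative joint distributions: one built as the product of its marginals (so Y and X are independent) and one that is dependent. The probabilities are arbitrary choices, not from the slides.

```python
from fractions import Fraction as F
from itertools import product

def cov(joint: dict) -> F:
    """Cov[Y, X] = E[YX] - E[Y]E[X] for a joint pmf {(y, x): P[Y = y, X = x]}."""
    EYX = sum(y * x * p for (y, x), p in joint.items())
    EY = sum(y * p for (y, _), p in joint.items())
    EX = sum(x * p for (_, x), p in joint.items())
    return EYX - EY * EX

# Independent: joint probability = product of the marginals
py = {0: F(1, 3), 1: F(2, 3)}
px = {0: F(1, 4), 1: F(3, 4)}
indep = {(y, x): py[y] * px[x] for y, x in product(py, px)}

# Dependent (illustrative): Y and X tend to move together
dep = {(0, 0): F(3, 10), (0, 1): F(1, 10), (1, 0): F(1, 10), (1, 1): F(1, 2)}

print(cov(indep), cov(dep))   # 0 7/50
```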






In the simple regression model...

  In the simple regression model, Y , X and U are random
  variables. β 0 and β 1 are population parameters, i.e. constants
  that describe the relation between Y and X . Then,

           E [Y ] = E [ β 0 + β 1 X + U ] = β 0 + β 1 E [X ] + E [U ]
  (Since U captures other factors, we will assume that E[U] = 0; as
  long as the model includes the intercept β0, this is not restrictive.)
  However, our main interest is in the conditional expectation that
  defines the population regression model:

  E [Y |X ] = E [ β 0 + β 1 X + U |X ] = β 0 + β 1 X + E [U |X ] = β 0 + β 1 X

  Assumption: U and X are independent, so E[U|X] = E[U] = 0.
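A simulation sketch of the population regression function, under hypothetical values β0 = 1 and β1 = 2, an illustrative discrete X that is uniform on {0, 1, 2}, and U ~ N(0, 1) drawn independently of X. The average of Y within each value of X approximates E[Y|X] = β0 + β1X.

```python
import random
from collections import defaultdict
from statistics import mean

BETA0, BETA1 = 1.0, 2.0   # hypothetical population parameters
rng = random.Random(7)

# Draw (X, Y) pairs with U independent of X and E[U] = 0
ys_by_x = defaultdict(list)
for _ in range(90_000):
    x = rng.choice([0, 1, 2])        # X: an illustrative discrete regressor
    u = rng.gauss(0.0, 1.0)          # U drawn independently of X
    ys_by_x[x].append(BETA0 + BETA1 * x + u)

# The average of Y among draws with X = x approximates E[Y | X = x]
for x in sorted(ys_by_x):
    print(x, round(mean(ys_by_x[x]), 3), BETA0 + BETA1 * x)
```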



Parameters vs Estimators




  Note:
      $\beta_0$ and $\beta_1$ are population parameters to be estimated.
      $\hat{\beta}_0$ and $\hat{\beta}_1$ will be their estimators.
      The parameters are just numbers; they are fixed. However,
      the estimators are random variables, since they depend on the
      random sample.
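To see that estimators are random while parameters are fixed, the sketch below draws several samples from the same population and estimates β0 and β1 in each. The estimator is an assumption for illustration: the sample analogue of β1 = Cov[Y, X]/Var[X] (which holds here because Cov[U, X] = 0 under independence), i.e. the usual OLS slope, with β̂0 = ȳ − β̂1x̄. The population values β0 = 1, β1 = 2 and the normal distributions are hypothetical.

```python
import random

BETA0, BETA1 = 1.0, 2.0   # hypothetical fixed population parameters

def draw_sample(n: int, rng: random.Random) -> tuple[list[float], list[float]]:
    """Draw n pairs from y = beta0 + beta1*x + u, with x, u ~ N(0, 1) independent."""
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    ys = [BETA0 + BETA1 * x + rng.gauss(0.0, 1.0) for x in xs]
    return xs, ys

def estimate(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """b1 = sample Cov(y, x) / sample Var(x); b0 = ybar - b1 * xbar."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    # The 1/(n-1) factors of the sample covariance and variance cancel in the ratio
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    sxx = sum((x - xbar) ** 2 for x in xs)
    b1 = sxy / sxx
    return ybar - b1 * xbar, b1

rng = random.Random(2024)
# Same parameters, different random samples -> different estimates
estimates = [estimate(*draw_sample(50, rng)) for _ in range(3)]
for b0_hat, b1_hat in estimates:
    print(round(b0_hat, 3), round(b1_hat, 3))
```

The printed estimates change from sample to sample, while BETA0 and BETA1 never do.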



