Econ 488
Lecture 2
Cameron Kaplan
Hypothesis Testing

Suppose you want to test whether the
 average person receives a B or higher (3.0)
 in econometrics.
The Null Hypothesis (H0): the claim we are usually
 trying to reject:
  H0: µ = 3.0
Hypothesis Testing
 Alternative Hypothesis (HA or H1): The null
  hypothesis is not true
   HA: µ ≠3.0 (two-sided)
  Or HA: µ >3.0 (one-sided)
 Usually we pick the two-sided test unless we can
  rule out the possibility that µ < 3.0
Hypothesis Testing
 Suppose we draw a sample of 20 former
  econometrics students and find:
    Sample Mean = 3.30
    Sample Standard Deviation = 0.25
 How likely is it that a sample of 20 would give a sample
  average of 3.30 if the population average was really 3.0?
Hypothesis Testing

When we standardize x-bar using an estimated
 standard error, we need to use the t-distribution.
Estimated standard error: $s / \sqrt{N}$
Hypothesis Testing
Test Statistic:
      $t = \frac{\bar{X} - \mu}{s / \sqrt{N}}$
Significance Level - Most common is 5%
 or 1%.
5% significance level:
 If µ really was 3.0, what values of t would give us a
 test that would reject the null when it’s correct only
 5% of the time?
Hypothesis Testing
We have a sample size of 20
Thus we have N-1 = 20-1 = 19 degrees of
 freedom
Look in the t-table (two-sided test, 5% significance level)
t* = 2.093
So if our value of t is greater than 2.093
 OR less than -2.093, we should reject the
 null hypothesis
Hypothesis testing

$t = \frac{\bar{X} - \mu}{s / \sqrt{N}} = \frac{3.3 - 3}{0.25 / \sqrt{20}} = \frac{0.3}{0.25 / 4.472} = \frac{0.3}{0.0559}$
 t = 5.366
 Since 5.366 > 2.093, we should reject the null
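
The calculation above can be sketched in a few lines of Python. This is only an illustration using the numbers from the slides; the function name `one_sample_t` and the constant `T_CRITICAL` are made up for this sketch, and the critical value is simply the table value 2.093 typed in by hand.

```python
import math

def one_sample_t(sample_mean, null_mean, sample_sd, n):
    """Return the t statistic for a one-sample test of H0: mu = null_mean."""
    standard_error = sample_sd / math.sqrt(n)   # estimated standard error s / sqrt(N)
    return (sample_mean - null_mean) / standard_error

# Numbers from the slides: N = 20, sample mean 3.30, s = 0.25, H0: mu = 3.0
t_stat = one_sample_t(3.30, 3.0, 0.25, 20)

# Two-sided 5% critical value for 19 degrees of freedom, read from a t-table
T_CRITICAL = 2.093
reject_null = abs(t_stat) > T_CRITICAL

print(f"t = {t_stat:.3f}, reject H0: {reject_null}")
```

Carrying full precision gives a t of roughly 5.37, slightly different in the last digit from the hand calculation, which rounds the standard error to 0.0559 first.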

Suppose we want to know: if the average
 student really got a 3.0, how likely would it
 be for us to observe a value at least as far
 from 3.0 as we did in our sample?
In other words, if µ = 3.0, how likely is it
 that a sample of 20 would give us a
 sample mean of 3.3 or greater
 (or 2.7 or less)?
 We want to know the probability that |t| > 5.366
  (t > 5.366 or t < -5.366)
 Can’t look up in most tables, but most stats
  software gives it to you.
 In this case, p = 0.000035
 In other words, if the null were true, we would
  only get a value that extreme 0.0035% of the
  time (about 1 out of 29,000 times)
 This is strong evidence that we should reject the
  null hypothesis
If p-value is smaller than the significance
 level, reject null.
P-value is nice, because if you are given
 p-value, you don’t have to look anything
 else up in a table.
Smaller p-values mean stronger evidence
 against the null hypothesis.
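
For readers without stats software at hand, the two-sided p-value can be approximated by integrating the t density numerically. The sketch below uses only the Python standard library; the helper names `t_pdf` and `two_sided_p_value` and the choice of Simpson's rule are assumptions of this example, not something from the lecture.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def two_sided_p_value(t_stat, df, steps=20000):
    """Approximate P(|T| >= |t_stat|) with Simpson's rule on [0, |t_stat|]."""
    upper = abs(t_stat)
    h = upper / steps                      # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for k in range(1, steps):
        weight = 4 if k % 2 == 1 else 2
        total += weight * t_pdf(k * h, df)
    area_0_to_t = total * h / 3            # P(0 <= T <= |t_stat|)
    return 2 * (0.5 - area_0_to_t)         # both tails, by symmetry

# t statistic and degrees of freedom from the grade example
p_value = two_sided_p_value(5.366, df=19)
ALPHA = 0.05                               # 5% significance level
print(f"p = {p_value:.6f}, reject H0 at 5%: {p_value < ALPHA}")
```

As a sanity check, plugging in the critical value 2.093 with 19 degrees of freedom should return a p-value very close to 0.05.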

A biased sample is a sample that differs
 systematically from the population.
Common Types of Bias

Selection Bias
Sample systematically excludes or
 underrepresents certain groups.
e.g. calculating the average height of US
 men using data from Medicare records
We are systematically excluding the
 young, who may be different for many
 reasons
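
A small simulation can make the Medicare example concrete. Everything in this sketch is synthetic: the age range, the assumed 0.1 cm drop in average height per year of age, and the noise level are invented purely to illustrate the mechanism and are not real height data.

```python
import random

random.seed(488)  # fixed seed so the sketch is reproducible

# Purely synthetic population (an assumption for illustration, not real data):
# ages are uniform on 18-90 and average height falls by 0.1 cm per year of age.
population = []
for _ in range(20000):
    age = random.uniform(18, 90)
    height_cm = 178 - 0.1 * (age - 18) + random.gauss(0, 7)
    population.append((age, height_cm))

def mean_height(people):
    """Average height over a list of (age, height_cm) pairs."""
    return sum(h for _, h in people) / len(people)

# Stand-in for "Medicare records": only people aged 65 and over are observed
medicare_sample = [p for p in population if p[0] >= 65]

population_mean = mean_height(population)
biased_mean = mean_height(medicare_sample)
print(f"population: {population_mean:.1f} cm, Medicare-only: {biased_mean:.1f} cm")
```

Because the sample leaves out the young, its mean sits below the population mean no matter how many Medicare records are collected.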
Common Types of Bias

Self-Selection Bias/Non-Response Bias
Bias that occurs when people choose whether
 to give certain information.
e.g. ads to participate in medical studies
e.g. calculating average CSUCI GPA by
 asking students to volunteer to let us look
 at their transcripts.
Common Types of Bias
 Survivor Bias
 Suppose we are looking at the historical average
  performance of companies on the NYSE, and
  wanted to know how that was related to CEO
 One problem that we might have is that we
  might only look at companies that are still
  in business
 We are excluding companies that went out of
  business
Review of Regression
Regression - Attempt to explain movement
 in one variable as a function of a set of
 other variables
Example: Are higher campaign
 expenditures related to more votes in an
 election?
Review of Regression
Dependent Variable - Variable that is
 observed to change in response to the
 independent variable
e.g. share of votes in the election
Independent Variable(s) (AKA explanatory
 variable) - variables that are used to
 explain variation in dependent variable.
e.g. campaign expenditures.
Review of Regression
Example: Demand
Quantity is the dependent variable
Price, Income, Price of complements, Price
 of Substitutes are all independent
 variables
Simple Regression
$Y = \beta_0 + \beta_1 X$
Y: Dependent Variable
X: Independent Variable
$\beta_0$: Intercept (or Constant)
$\beta_1$: Slope Coefficient
Simple Regression
$\beta_1$ is the response of Y to a one-unit
 increase in X
$\beta_1 = \Delta Y / \Delta X$
When we look at real data, the points
 aren’t all on the line
Simple Regression
How do we deal with this?
By adding a stochastic error term to the
 equation:
$Y = \beta_0 + \beta_1 X + \varepsilon$
Deterministic Component: $\beta_0 + \beta_1 X$
Stochastic Component: $\varepsilon$
Simple Regression
[Figure: the line $\beta_0 + \beta_1 X$]
Why do we need $\varepsilon$?

1. Omitted Variables
2. Measurement Error
3. The underlying relationship may have a
   different functional form
4. Human behavior is random
 There are really N equations because there are
  N observations.
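 $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$ (i = 1, 2, …, N)
 E.g.
 $Y_1 = \beta_0 + \beta_1 X_1 + \varepsilon_1$
 $Y_2 = \beta_0 + \beta_1 X_2 + \varepsilon_2$
 …
 $Y_N = \beta_0 + \beta_1 X_N + \varepsilon_N$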
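
To see the "N equations" idea in action, here is a minimal sketch that generates one equation per observation and splits each Y into its deterministic and stochastic parts. The intercept, slope, X values, and error distribution are all hypothetical choices for illustration.

```python
import random

random.seed(2)  # fixed seed for reproducibility

BETA_0 = 1.0   # hypothetical intercept
BETA_1 = 0.5   # hypothetical slope coefficient

x_values = [1.0, 2.0, 3.0, 4.0, 5.0]   # N = 5 hypothetical observations of X

rows = []
for i, x in enumerate(x_values, start=1):
    deterministic = BETA_0 + BETA_1 * x   # beta_0 + beta_1 * X_i
    epsilon = random.gauss(0, 1)          # stochastic error term epsilon_i
    y = deterministic + epsilon           # Y_i = beta_0 + beta_1 * X_i + epsilon_i
    rows.append((i, x, deterministic, epsilon, y))

for i, x, deterministic, epsilon, y in rows:
    print(f"Y_{i} = {deterministic:.2f} + ({epsilon:+.2f}) = {y:.2f}")
```

Every observation shares the same intercept and slope; only X and the error draw differ from one line to the next.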
Multiple Regression
We can have more than one independent
 variable
$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 X_{3i} + \varepsilon_i$
What does $\beta_1$ mean?
It is the impact of a one-unit increase in X1
 on the dependent variable (Y), holding X2
 and X3 constant.
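
One way to see the "holding constant" interpretation is to take the change in Y between two situations. The short derivation below is a sketch that additionally assumes the error term does not change between them.

```latex
\begin{aligned}
Y_i &= \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 X_{3i} + \varepsilon_i \\
\Delta Y &= \beta_1 \Delta X_1 + \beta_2 \Delta X_2 + \beta_3 \Delta X_3 + \Delta\varepsilon \\
\Delta Y &= \beta_1 \Delta X_1
  \quad \text{when } \Delta X_2 = \Delta X_3 = 0 \text{ and } \Delta\varepsilon = 0
\end{aligned}
```

Setting the change in X1 to one unit gives a change in Y of exactly the coefficient on X1.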
Steps in Empirical Economic Analysis
1. Specify an economic model.
2. Specify an econometric model.
3. Gather data.
4. Analyze data according to econometric
   model.
5. Draw conclusions about your economic
   model.
Step 1: Specify an Economic Model
 Example: An Economic Model of Crime
 Gary Becker
 Crimes have clear economic rewards (think of a
  thief), but most criminal behavior has economic
  costs
 Crime has an opportunity cost: time spent on it
  keeps the criminal from participating in other
  activities such as legal employment
 In addition, there are costs associated with the
  possibility of being caught, and then, if
  convicted, there are costs associated with being
  incarcerated
Economic Model of Crime
 y=f(x1, x2, x3, x4, x5, x6, x7)
 y=hours spent in criminal activity
 x1=“wage” for an hour spent in criminal activity
 x2=hourly wage in legal employment
 x3=income from sources other than crime or
  employment
 x4=probability of getting caught
 x5=probability of being convicted if caught
 x6=expected sentence if convicted
 x7=age
Economic Model of Education
What is the effect of education on wages?
wage = f(educ, exper, tenure)
educ=years of education
exper=years of workforce experience
tenure=years at current job
Step 2: Specify an econometric
 model
In the crime example, we can’t reasonably
 observe all of the variables
e.g. the “wage” someone gets as a
 criminal, or even the probability of being
 arrested
We need to specify an econometric model
 based on observable factors.
Econometric Model of Crime
$\text{crime}_i = \beta_0 + \beta_1 \text{wage}_i + \beta_2 \text{othinc}_i + \beta_3 \text{freqarr}_i + \beta_4 \text{freqconv}_i + \beta_5 \text{avgsen}_i + \beta_6 \text{age}_i + \varepsilon_i$
crime = some measure of frequency of
 criminal activity
wage = wage earned in legal employment
othinc = income earned from other
 sources
freqarr = freq. of arrests for prior
 infractions
freqconv = frequency of convictions
avgsen = average length of sentence
age = age in years
$\varepsilon$ = stochastic error term
Econometric Model of Crime

The stochastic error term contains all of
 the unobserved factors, e.g. wage for
 criminal activity, probability of arrest, etc.
We could add variables for family
 background, parental education, etc., but
 we will never get rid of $\varepsilon$
Wage and Education

$\text{wage}_i = \beta_0 + \beta_1 \text{educ}_i + \beta_2 \text{exper}_i + \beta_3 \text{tenure}_i + \varepsilon_i$
What are the signs of the betas?
Run Regression in Gretl! (wage1.gdt)
Step 3: Gathering Data

Types of Data:
Cross-Sectional Data
Time Series Data
Pooled Cross Sections
Panel/Longitudinal Data
Cross-Sectional Data
A sample of individuals, households, firms,
 cities, states, or other units, taken at a
 given point in time
Random Sampling
Mostly used in applied microeconomics
  General Social Survey
  US Census
  Most other surveys
Cross-Sectional Data
Obs   wage   educ   exper   female married
1     3.10   11     2       1      0
2     3.24   12     22      1      1
3     6.00   11     3       0      1
…     …      …      …       …      …
525   3.50   16     4       0      0
526   4.25   14     5       1      0
Time Series Data
Observations on a variable or several
 variables over time
E.g. stock prices, money supply, CPI,
 GDP, annual homicide rates, etc.
Because past events can influence future
 events, and lags in behavior are common
 in economics, time is an important
 dimension of time-series data
Time Series Data
More difficult to analyze than cross-
 sectional data
Observations across time are not
 independent
May also have to control for seasonality
Time Series Data
Obs   year   avgmin avgcov unemp gnp
1     1950   0.20   20.1   15.4   878.7
2     1951   0.21   20.7   16.0   925.0
3     1952   0.23   22.6   14.8   1015.9
…     …      …      …      …      …
37    1986   3.35   58.1   18.9   4281.6
38    1987   3.35   58.2   16.8   4496.7
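
Because past values can influence current ones, time-series work often pairs each observation with its own lag. The sketch below builds a one-period lag from the first three rows of the table above; the column name `unemp_lag1` is a hypothetical label chosen for this example.

```python
# First three rows of the time-series table above: (year, unemp)
series = [(1950, 15.4), (1951, 16.0), (1952, 14.8)]

# Build a one-period lag: each year is paired with the previous year's value.
# The first year has no predecessor, so its lag is None.
lagged = []
previous = None
for year, unemp in series:
    lagged.append({"year": year, "unemp": unemp, "unemp_lag1": previous})
    previous = unemp

for row in lagged:
    print(row)
```

Note that creating a lag costs one observation: the first year cannot be used in any analysis that needs the lagged value.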
Pooled Cross-Sections
 Both time series and cross-sectional features
 Suppose we collect data on households in 1985
  and 1990
 We can combine both of these into one data set
  by creating a pooled cross-section
 Useful for studying the effect of a policy change between the years
 Need to control for time in analysis
Pooled Cross-Sections
Obs     year     hprice    proptax
1       1993     85,500    42
2       1993     67,300    36
…       …        …         …
250     1993     134,000   41
251     1995     65,000    16
252     1995     182,400   20
…       …        …         …
520     1995     57,200    16
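
One common way to control for time in a pooled cross-section is a year indicator (dummy) variable. The sketch below adds one to the six rows visible in the table above; `y95` is a hypothetical variable name, and the means cover only these six rows, not the full 520-observation data set.

```python
# Rows copied from the pooled cross-section table above: (obs, year, hprice, proptax)
pooled = [
    (1, 1993, 85500, 42),
    (2, 1993, 67300, 36),
    (250, 1993, 134000, 41),
    (251, 1995, 65000, 16),
    (252, 1995, 182400, 20),
    (520, 1995, 57200, 16),
]

# y95 is a hypothetical name for a year indicator: 1 if observed in 1995, else 0.
with_dummy = [
    {"obs": obs, "year": year, "hprice": hprice, "proptax": proptax,
     "y95": 1 if year == 1995 else 0}
    for obs, year, hprice, proptax in pooled
]

def mean_proptax(rows, y95):
    """Average proptax among rows whose year indicator equals y95."""
    values = [r["proptax"] for r in rows if r["y95"] == y95]
    return sum(values) / len(values)

print("mean proptax, 1993 rows:", mean_proptax(with_dummy, 0))
print("mean proptax, 1995 rows:", mean_proptax(with_dummy, 1))
```

In a regression, the indicator would enter as one more independent variable, letting the intercept differ between the two years.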
Panel/Longitudinal Data

A panel data set consists of a time series
 for each cross-sectional member
E.g. select a random sample of 500
 people, and follow each for 10 years.
Panel Data
obs   personid   year   wage    dinout
1     1          1990   5.50    2
2     1          1992   6.50    4
3     1          1994   6.75    4
4     2          1990   10.50   6
5     2          1992   10.50   5
6     2          1994   11.25   2
7     3          1990   7.75    5
…     …          …      …       …
900   300        1994   15.00   2
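
What sets panel data apart is that the same unit appears in several rows, so the data can be regrouped into one short time series per person. The sketch below does this for the first six rows of the table above; the names `wages_by_person` and `wage_change` are made up for this example.

```python
from collections import defaultdict

# Rows copied from the panel table above: (obs, personid, year, wage, dinout)
panel = [
    (1, 1, 1990, 5.50, 2),
    (2, 1, 1992, 6.50, 4),
    (3, 1, 1994, 6.75, 4),
    (4, 2, 1990, 10.50, 6),
    (5, 2, 1992, 10.50, 5),
    (6, 2, 1994, 11.25, 2),
]

# Group by personid so that each person has their own short time series.
wages_by_person = defaultdict(list)
for _obs, personid, year, wage, _dinout in panel:
    wages_by_person[personid].append((year, wage))

# Within-person wage change between the first and last observed year
wage_change = {}
for pid, obs_list in wages_by_person.items():
    obs_list.sort()                                   # order each person's rows by year
    wage_change[pid] = obs_list[-1][1] - obs_list[0][1]

print(wage_change)
```

A pooled cross-section cannot support this kind of within-person comparison, because different individuals are sampled in each year.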
Causality & Ceteris Paribus

What we really want to know is: does the
 independent variable have a causal effect
 on the dependent variable
But: Correlation does not imply causation
Suppose we want to know if higher
 education leads to higher worker
 productivity
Causality and Ceteris Paribus

If we find a relationship between education
 and wages, we don’t know much about
 whether one causes the other
Why? What if highly educated people have
 higher IQs, and it’s really high IQ that
 leads to higher wages?
If you give a random person more
 education, will they get higher wages?
Causality and Ceteris Paribus

What we want to know is… Does higher
 education lead to higher wages ceteris
 paribus… holding all else constant
We have to control for IQ, experience,
 gender, job training, etc.
But we can’t control for everything!
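
The IQ story can be illustrated with synthetic data. In the sketch below every number is an assumption: the "true" effect of education is set to 1.0, IQ raises both education and wages, and the coefficients are estimated by ordinary least squares written out by hand. It is a toy simulation, not an analysis of real wage data.

```python
import random

random.seed(0)  # fixed seed so the sketch is reproducible

TRUE_EDUC_EFFECT = 1.0   # assumed causal effect of one more year of education on wage
N = 5000

iq, educ, wage = [], [], []
for _ in range(N):
    q = random.gauss(0, 1)                                   # standardized "IQ" (synthetic)
    e = 12 + 2 * q + random.gauss(0, 1)                      # higher IQ, more education
    w = TRUE_EDUC_EFFECT * e + 3 * q + random.gauss(0, 1)    # IQ also raises wage directly
    iq.append(q)
    educ.append(e)
    wage.append(w)

def demean(values):
    """Subtract the mean so that the intercept drops out of the regression."""
    m = sum(values) / len(values)
    return [v - m for v in values]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

x, z, y = demean(educ), demean(iq), demean(wage)

# Simple regression of wage on educ only (IQ is omitted)
naive_slope = dot(x, y) / dot(x, x)

# Regression of wage on educ and IQ: solve the 2x2 normal equations
# for the educ coefficient using Cramer's rule
det = dot(x, x) * dot(z, z) - dot(x, z) ** 2
controlled_slope = (dot(x, y) * dot(z, z) - dot(x, z) * dot(z, y)) / det

print(f"omitting IQ: {naive_slope:.2f}, holding IQ constant: {controlled_slope:.2f}")
```

When IQ is left out, the education coefficient absorbs part of IQ's effect and overstates the return to schooling; once IQ is held constant, the estimate moves back toward the assumed value of 1.0. In real data, of course, not every such factor can be measured.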
