BUSH 632: Getting Beyond Fear and Loathing of Statistics

            Lecture 1

          Spring, 2007
                  Don’t Panic
• Motivation: this course is about the
  connection between theoretical claims and
  empirical data

• What we’ll cover (after a very brief review):
  – Part 1: bi-variate regression
  – Part 2: multivariate regression
  – Part 3: logit analysis and factor analysis
    The place of statistical analysis
• Programs, policies, legislation typically consist
  of sets of normative claims and a (sketchy?)
  theory about how to achieve objectives
  – Policies typically attempt to map a set of beliefs and
    empirical claims into society, the economy,
    international relations. (E.g., welfare reform)
• Policy analysts need to be able to identify the
  values served, distill the theory, and evaluate its
  empirical claims.
   The place of statistical analysis
• Ingredients of strong empirical research
  – Theory → claims for policy (and counter-claims)
  – Hypotheses → measurement → analysis
  – Findings → back to theory…
  – Implications for policy
•Characterizing data
  –Data Quality: Valid? Reliable? Relevant?
•Appropriate model design and execution
  –Are statistical models appropriate to test hypotheses?
  –Are models appropriately specified?
  –Do data conform to statistical assumptions?
            How to survive this class
• Use the webpage
    – http://www.tamu.edu/classes/bush/hjsmith/courses/bush632.html
•   Lectures and book: as close as possible
•   Readings: Read ‘em or weep.
•   Questions: Bring ‘em to class, office hours
•   Stata: Use it a lot
    – In-class examples and exercises
    – Download exercises and data in advance
    – The place of exercises in Bush 632
• Nothing late; don’t miss class…
                   Class Exams
• Three Take-Home Exams
  – Characteristics and Grading Criteria
     •   Connection to theory
     •   Clear hypotheses
     •   Appropriate statistical analyses
     •   Clear and succinct explanations
• Class Data Will Be Provided
  – From the text
     • www.aw-bc.com/stock_watson
  – From Us
     • On the Class Webpage
 A Brief Refresher on Functions
         and Sampling
• Statistical models involve relationships
  – Relationships imply functions
     • E.g.: Coffee consumption and productivity

• Functions are ubiquitous (or chaos prevails)
  – Most general expression: Y = f(X1, X2, …, Xn, e)
          Linear Functions

X    Y
-5   0
-4   1
-3   2
-2   3
-1   4
0    5
1    6
2    7
3    8
4    9
5    10
           Non-Linear Functions
Y = 3 - X^2

X    Y
-5   -22
-4   -13
-3   -6
-2   -1
-1    2
0     3
1     2
2    -1
3    -6
4    -13
5    -22

[Figure: plot of Y = 3 - X^2; X axis from -6 to 6, Y axis from -25 to 5]
     More Non-Linear Functions


X     Y
-5   -397
-4   -221
-3   -105
-2   -37
-1    -5
0     3
1     -1
2     -5
3     3
4     35
5    103
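
A minimal sketch that regenerates the function tables in this refresher. Only Y = 3 - X^2 is stated on the slides; the linear form Y = X + 5 and the cubic Y = 2X^3 - 6X^2 + 3 are inferred here from the tabulated values and should be read as assumptions.

```python
# Formulas for the linear and cubic tables are inferred from the
# tabulated values, not stated on the slides.
def linear(x):
    return x + 5

def quadratic(x):
    return 3 - x ** 2

def cubic(x):
    return 2 * x ** 3 - 6 * x ** 2 + 3

# One row per X value from -5 to 5, matching the slide tables.
table = [(x, linear(x), quadratic(x), cubic(x)) for x in range(-5, 6)]

print("  X  lin quad cubic")
for row in table:
    print("{:>3} {:>4} {:>4} {:>5}".format(*row))
```

Every row of all three tables is reproduced, which is a quick way to check a guessed functional form against data.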
             Functions in Policy
• Welfare and work incentives
  – Employment = f(welfare programs, …) Pretty complex

• Nuclear deterrence
  – Major power military conflict = f(nuclear capabilities,
    proliferation, …)

• Educational Attainment
  – Test Scores = f(class size, institutional incentives, …)

• Successful Program Implementation
  – Implementation = f(clarity, public support, complexity…)
   Sampling is also ubiquitous
• “Knowing” a person: we sample
• “Knowing” places: we sample
• Samples are necessary to identify functions
  – Samples must cover relevant variables,
     contexts, etc.
• Strategies for sampling
  – Soup and temperature: stir it
  – Stratify sample: observations in appropriate
    “cells”
  – Randomize
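
A minimal sketch of two of the sampling strategies listed above, simple random sampling and proportional stratified sampling. The population (800 phone and 200 web respondents), the 10% sampling fraction, and the helper name `stratified_sample` are all hypothetical.

```python
import random

rng = random.Random(632)  # fixed seed so the sketch is repeatable

# Hypothetical population: 800 phone and 200 web respondents.
population = [("phone", i) for i in range(800)] + [("web", i) for i in range(200)]

# Randomize: every unit has the same chance of selection.
simple = rng.sample(population, 100)

def stratified_sample(units, key, fraction, rng):
    """Draw the same fraction from each cell defined by key(unit)."""
    cells = {}
    for unit in units:
        cells.setdefault(key(unit), []).append(unit)
    sample = []
    for members in cells.values():
        k = round(len(members) * fraction)
        sample.extend(rng.sample(members, k))
    return sample

# Stratify: guarantees each cell is represented in proportion to its size.
stratified = stratified_sample(population, key=lambda u: u[0], fraction=0.1, rng=rng)
web_in_simple = sum(1 for mode, _ in simple if mode == "web")
web_in_stratified = sum(1 for mode, _ in stratified if mode == "web")
print(web_in_simple, web_in_stratified)
```

The simple random sample gets about 20 web respondents on average but varies from draw to draw; the stratified sample gets exactly 20 by construction.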
      Statistics Refresher: Topics
• Central tendency
   – Expected value and means
• Dispersion
   – Population variance, sample variance, standard deviations
• Measures of relations
• Covariation
   – covariance matrices
• Correlations
• Sampling distributions
• Characteristics of sampling distributions
• Class Data
   – 2005 National Security Survey (phone and web)
   – Stata application
• Means, Variance, Standard Deviations
• The Normal Distribution
• Medians and IQRs
• Box Plots and Symmetry Plots
  Measures of Central Tendency

In general: $E[Y] = \mu_Y$

For discrete functions: $E[Y] = \sum_{i=1}^{I} Y_i f(Y_i) = \mu_Y$

For continuous functions: $E[Y] = \int_{-\infty}^{\infty} Y f(Y)\,dY = \mu_Y$

An unbiased estimator of the expected value: $\bar{Y} = \frac{\sum_{i=1}^{n} Y_i}{n}$
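
A minimal sketch of the discrete case, assuming a fair six-sided die as the distribution (the die is an illustration, not part of the lecture). It computes E[Y] exactly from the probability function and then estimates it with a sample mean.

```python
import random
from fractions import Fraction

# Assumed distribution: a fair six-sided die, f(Y_i) = 1/6 for each face.
outcomes = [1, 2, 3, 4, 5, 6]
pmf = {y: Fraction(1, 6) for y in outcomes}

# Discrete expected value: sum of Y_i * f(Y_i).
expected = sum(y * p for y, p in pmf.items())

# Sample mean as an estimator of E[Y].
rng = random.Random(1)
draws = [rng.choice(outcomes) for _ in range(10_000)]
y_bar = sum(draws) / len(draws)

print("E[Y] =", expected, " sample mean =", round(y_bar, 3))
```

The sample mean will not equal 7/2 exactly, but it should sit close to it and get closer as the number of draws grows.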
     Rules for Expected Value

• E[a] = a: the expected value of a constant
  is the constant itself

• E[bX] = bE[X]

• E[X+W] = E[X] + E[W]

• E[a + bX] = E[a] + E[bX] = a + bE[X]
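
The rules above can be checked exactly on a small discrete distribution. This is a sketch only: the probability tables and the constants a = 3, b = 5 are made up, and `expect` is a hypothetical helper.

```python
from fractions import Fraction

def expect(pmf, g=lambda v: v):
    """E[g(V)] for a discrete pmf given as {value: probability}."""
    return sum(g(v) * p for v, p in pmf.items())

# Hypothetical pmf for X.
pmf_x = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}
a, b = 3, 5

e_const = expect(pmf_x, lambda x: a)           # E[a]
e_scaled = expect(pmf_x, lambda x: b * x)      # E[bX]
e_linear = expect(pmf_x, lambda x: a + b * x)  # E[a + bX]

# Hypothetical joint pmf for (X, W); X and W are deliberately dependent.
joint = {(0, 0): Fraction(1, 2), (1, 1): Fraction(1, 4), (2, 1): Fraction(1, 4)}
e_sum = expect(joint, lambda xw: xw[0] + xw[1])  # E[X + W]
e_x = expect(joint, lambda xw: xw[0])
e_w = expect(joint, lambda xw: xw[1])

print(e_const, e_scaled, e_linear, e_sum, e_x + e_w)
```

Note that E[X + W] = E[X] + E[W] holds here even though X and W are not independent.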
    Measures of Dispersion

• $Var[X] = Cov[X,X] = E[(X - E[X])^2]$

• Sample variance: $s_X^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n-1}$

• Standard deviation: $\sigma_X = \sqrt{Var(X)}$

• Sample Std. Dev: $s_X = \sqrt{s_X^2}$
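
A minimal sketch of the sample variance and sample standard deviation, computed step by step with the n - 1 divisor and compared with Python's `statistics` module. The eight data values are hypothetical.

```python
import math
import statistics

# Hypothetical sample.
x = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(x)
x_bar = sum(x) / n

# Sum of squared deviations from the sample mean.
ss = sum((xi - x_bar) ** 2 for xi in x)

s2 = ss / (n - 1)   # sample variance
s = math.sqrt(s2)   # sample standard deviation

print("mean =", x_bar, " s^2 =", round(s2, 4), " s =", round(s, 4))
print("statistics.variance:", round(statistics.variance(x), 4))
```

`statistics.variance` and `statistics.stdev` use the same n - 1 divisor; `statistics.pvariance` divides by n instead.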
 Rules for Variance Manipulation

• Var[a] = 0
• Var[bX] = b^2 Var[X]
• From which we can deduce:
  Var[a+bX] = Var[a] + Var[bX] = b^2 Var[X]
• Var[X + W]
  = Var[X] + Var[W] + 2Cov[X,W]
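
The same identities hold for sample moments, so they can be checked numerically. A sketch with made-up data; `mean`, `cov`, and `var` are small hypothetical helpers using the n - 1 divisor.

```python
import math

def mean(v):
    return sum(v) / len(v)

def cov(x, y):
    """Sample covariance with the n - 1 divisor."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)

def var(x):
    return cov(x, x)  # Var[X] = Cov[X, X]

# Hypothetical data and constants.
x = [1.0, 2.0, 4.0, 7.0]
w = [3.0, 1.0, 5.0, 2.0]
a, b = 10.0, 3.0

var_linear = var([a + b * xi for xi in x])         # Var[a + bX]
var_sum = var([xi + wi for xi, wi in zip(x, w)])   # Var[X + W]

print(var_linear, b ** 2 * var(x))
print(var_sum, var(x) + var(w) + 2 * cov(x, w))
```

Adding the constant a shifts every value but leaves the spread unchanged, which is why it drops out of the variance.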
       Measures of Association

• Cov[X,Y] = E[(X - E[X])(Y - E[Y])]
  = E[XY] - E[X]E[Y]

• Sample Covariance: $\frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{n-1}$

• Correlation: $\rho_{XY} = \frac{Cov[X,Y]}{\sqrt{Var[X]\,Var[Y]}}$

• Correlation is restricted to the range -1 to +1
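
A minimal sketch of the sample covariance and the correlation computed from it, using five hypothetical (X, Y) pairs.

```python
import math

# Hypothetical paired observations.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Sample covariance and variances, all with the n - 1 divisor.
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)
s2_x = sum((xi - x_bar) ** 2 for xi in x) / (n - 1)
s2_y = sum((yi - y_bar) ** 2 for yi in y) / (n - 1)

# Correlation: covariance rescaled by the two standard deviations.
r = s_xy / math.sqrt(s2_x * s2_y)

print("cov =", s_xy, " r =", round(r, 4))
```

Covariance depends on the units of X and Y; dividing by the standard deviations removes the units, which is what confines the correlation to the interval from -1 to +1.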
       Rules of Covariance
          Manipulation

• Cov[a,Y] = 0 (why?)

• Cov[bX,Y] = bCov[X,Y] (why?)

• Cov[X + W,Y] = Cov[X,Y] + Cov[W,Y]
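
One way to answer the "why?" prompts is to expand the definition Cov[X,Y] = E[(X - E[X])(Y - E[Y])] and apply the rules for expected value. A short derivation sketch:

```latex
\begin{align*}
\mathrm{Cov}[a,Y] &= E[(a - E[a])(Y - E[Y])] = E[0 \cdot (Y - E[Y])] = 0 \\
\mathrm{Cov}[bX,Y] &= E[(bX - bE[X])(Y - E[Y])] = b\,E[(X - E[X])(Y - E[Y])] = b\,\mathrm{Cov}[X,Y] \\
\mathrm{Cov}[X+W,Y] &= E[((X - E[X]) + (W - E[W]))(Y - E[Y])] = \mathrm{Cov}[X,Y] + \mathrm{Cov}[W,Y]
\end{align*}
```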
        Covariance Matrices
$$\begin{bmatrix} Var[Y] & Cov[Y,X] & Cov[Y,Z] \\ Cov[X,Y] & Var[X] & Cov[X,Z] \\ Cov[Z,Y] & Cov[Z,X] & Var[Z] \end{bmatrix}$$
  Correlation Matrices (Example)
. correlate     ahe yrseduc
(obs=2950)
             |      ahe yrseduc
-------------+------------------
         ahe |   1.0000
     yrseduc |   0.3610   1.0000
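
A minimal sketch of how a covariance matrix and the corresponding correlation matrix are built from data. The three variables and their values are hypothetical; this is not the Stock & Watson earnings sample and will not reproduce the 0.3610 shown above.

```python
import math

def mean(v):
    return sum(v) / len(v)

def cov(x, y):
    """Sample covariance with the n - 1 divisor."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)

# Hypothetical data for three variables.
data = {
    "Y": [10.0, 12.0, 15.0, 11.0, 18.0],
    "X": [12.0, 12.0, 16.0, 13.0, 18.0],
    "Z": [3.0, 1.0, 4.0, 1.0, 5.0],
}
names = list(data)
k = len(names)

# Variances on the diagonal, covariances off the diagonal.
cov_matrix = [[cov(data[r], data[c]) for c in names] for r in names]

# Rescale each entry by the two standard deviations to get correlations.
corr_matrix = [
    [cov_matrix[i][j] / math.sqrt(cov_matrix[i][i] * cov_matrix[j][j]) for j in range(k)]
    for i in range(k)
]

for name, row in zip(names, corr_matrix):
    print(name, " ".join("{:7.4f}".format(v) for v in row))
```

Both matrices are symmetric, which is why Stata prints only the lower triangle.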
    Figure 5.3: Average Hourly Earnings and Years of
          Education (Stock & Watson p. 165)
            Characterizing Data
• Rolling in the data -- before modeling
           – A Cautionary Tale

• Sample versus population statistics
Concept              Sample Statistic                                          Population Parameter
Mean                 $\bar{Y} = \frac{\sum_{i=1}^{n} Y_i}{n}$                   $E[Y]$
Variance             $s_Y^2 = \frac{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}{n-1}$     $\sigma_Y^2 = Var[Y]$
Standard Deviation   $s_Y = \sqrt{s_Y^2}$                                       $\sigma_Y = \sqrt{Var[Y]}$
  Properties of Standard Normal
     (Gaussian) Distributions
• Can differ dramatically from sample
  frequencies (especially in small samples) [Stata]
• Tails extend to plus and minus infinity
• The density of the distribution is key:
  ±1.96 standard deviations cover 95% of the distribution
  ±2.58 standard deviations cover 99% of the distribution
• Student’s t distribution converges on the Gaussian as the sample size grows
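
A minimal sketch that checks the two coverage figures against the standard normal cumulative distribution, using `statistics.NormalDist` from the Python standard library. The helper name `coverage` is hypothetical.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, standard deviation 1

def coverage(k):
    """Share of the distribution within k standard deviations of the mean."""
    return z.cdf(k) - z.cdf(-k)

for k in (1.0, 1.96, 2.58):
    print("+/- {:.2f} sd covers {:.2%}".format(k, coverage(k)))

# The cutoffs can also be recovered from the coverage level.
cutoff_95 = z.inv_cdf(0.975)
cutoff_99 = z.inv_cdf(0.995)
```

The 95% cutoff uses the 0.975 quantile because the remaining 5% is split evenly between the two tails.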
    Standard Normal (Gaussian)
           Distributions
• So what?
  – Only mean and standard deviation needed to
    characterize data, test simple hypotheses
  – Large-sample characteristics: homing in on normal
[Figure: distribution curves labeled n_i = 20, n_i = 100, and n_i = 300; horizontal axis X]
                Order Statistics
• Medians
  – Order statistic for central tendency
  – The value positioned at the middle or (n+1)/2 rank
  – Robustness compared to mean
     • Basis for “robust estimators”
• Quartiles
  – Q1: 0-25%; Q2: 25-50%; Q3: 50-75%; Q4: 75-100%
• Percentiles
  – List of hundredths (say that fast 20 times)
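
A minimal sketch of the median as an order statistic and of its robustness relative to the mean. The data are hypothetical, with one extreme value included on purpose. Note that `statistics.quantiles` returns the three cut points between quarters, interpolating between ranks where needed.

```python
import statistics

# Hypothetical data with one extreme value.
y = [7, 1, 3, 9, 5, 11, 100]
ordered = sorted(y)
n = len(ordered)

# Median as the value at rank (n + 1) / 2 (n is odd here).
median_by_rank = ordered[(n + 1) // 2 - 1]
mean = sum(y) / n

# Cut points separating the four quarters of the ordered data.
q1, q2, q3 = statistics.quantiles(y, n=4)

# Replace the extreme value and see how far each summary moves.
y_clean = [7, 1, 3, 9, 5, 11, 13]
median_shift = abs(statistics.median(y_clean) - median_by_rank)
mean_shift = abs(sum(y_clean) / len(y_clean) - mean)

print("median =", median_by_rank, " mean =", round(mean, 2))
print("median shift =", median_shift, " mean shift =", round(mean_shift, 2))
```

Changing a single observation moves the mean by more than 12 units and leaves the median where it was, which is the sense in which the median is robust.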
       Distributional Shapes

• Positive Skew: $\bar{Y} > Md_Y$

• Negative Skew: $\bar{Y} < Md_Y$

• Approximate Symmetry: $\bar{Y} \approx Md_Y$
    Using the Interquartile Range
                (IQR)
•   IQR = Q3 - Q1
•   Spans the middle 50% of the data
•   A measure of dispersion (or spread)
•   Robustness of IQR (relative to variance)
•   If Y is normally distributed, then:
    – $s_Y \approx IQR/1.35$
• So: if $Md_Y \approx \bar{Y}$ and $s_Y \approx IQR/1.35$, then
    – Y is approximately normally distributed
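
A minimal sketch of the two checks above applied to simulated data. The simulated samples (a normal variable with mean 50 and standard deviation 10, and a right-skewed exponential variable) and the helper name `normality_checks` are assumptions for illustration.

```python
import random
import statistics

rng = random.Random(2007)  # fixed seed so the sketch is repeatable

def normality_checks(y):
    """Return (mean - median, s / (IQR / 1.35)) for a sample."""
    q1, md, q3 = statistics.quantiles(y, n=4)
    iqr = q3 - q1
    s = statistics.stdev(y)
    return statistics.fmean(y) - md, s / (iqr / 1.35)

# Simulated normal data: both checks should look unremarkable.
normal_y = [rng.gauss(50, 10) for _ in range(5000)]
gap_n, ratio_n = normality_checks(normal_y)

# Simulated right-skewed data: the mean is pulled above the median.
skewed_y = [rng.expovariate(1.0) for _ in range(5000)]
gap_s, ratio_s = normality_checks(skewed_y)

print("normal: mean - median = {:.3f}, s/(IQR/1.35) = {:.3f}".format(gap_n, ratio_n))
print("skewed: mean - median = {:.3f}, s/(IQR/1.35) = {:.3f}".format(gap_s, ratio_s))
```

For the normal sample the ratio should be near 1 and the gap near 0; for the skewed sample the gap is clearly positive, matching the positive-skew pattern of a mean above the median.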
Example: The Observed Distribution
   of Annual Household Income
  (Distribution of income by gender: men=1, women=2)
  Interpreting Box Plots




Median Income = 15.38 (men), 14.34 (women)
         Quantile Normal Plots
• Allow comparison between an empirical
  distribution and the Gaussian distribution
• Plots percentiles against expected normal
• Most intuitive:
   – Normal QQ plots
• Evaluate
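
A minimal sketch of the numbers behind a normal QQ plot: each ordered observation is paired with the quantile a normal distribution with the same mean and standard deviation would put at that position. The plotting position (i - 0.5)/n is one common convention and is an assumption here, as are the data and the helper name `normal_qq_pairs`; Stata's own plot may use a different convention.

```python
from statistics import NormalDist, fmean, stdev

def normal_qq_pairs(data):
    """Pair each ordered value with its expected normal quantile."""
    ordered = sorted(data)
    n = len(ordered)
    reference = NormalDist(fmean(ordered), stdev(ordered))
    # Assumed plotting position: (i - 0.5) / n for the i-th ordered value.
    return [
        (reference.inv_cdf((i - 0.5) / n), y)
        for i, y in enumerate(ordered, start=1)
    ]

# Hypothetical sample.
sample = [2.1, 3.4, 1.9, 5.6, 4.2, 3.8, 2.7]
pairs = normal_qq_pairs(sample)

for expected, observed in pairs:
    print("{:6.2f} {:6.2f}".format(expected, observed))
```

If the data are close to normal, the pairs fall near the 45-degree line; systematic curvature away from it points to skew or heavy tails.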
    Data Exploration in Stata
• Access The Guns dataset from the replication data
  on the Stock and Watson Webpage
• Using Incarceration Rate: univariate analysis Stata
• Using Incarceration Rate : split by Shall Issue
  Laws Stata
• Exercises:
   – Graphing: Produce
       • Histograms
       • Box plots
       • Q-Normal plots
       For Next Week
• Read Stock and Watson
  – Chapter 4
• Homework Assignment on
  Webpage