Causal Inference Using Observational Data


          Sukyeong Pi
        Larry Featherston
Employment and Disability Institute
       Cornell University
          Feb. 21, 2009


                                      www.edi.cornell.edu
Agenda
•   Randomized Controlled Trial
•   Observational Studies
•   Propensity Score Matching
•   Example
•   Limitations of PSM
Randomized Controlled Trial (RCT)
• A research study in which participants are randomly
  assigned to groups to objectively compare different
  interventions

• RCT is recognized as a sound scientific method: the gold
  standard for making causal inferences and informing
  policy decisions

• Controls for subject selection bias: randomization
  minimizes differences in subjects between groups
Limitations of RCT
• Philosophical/ethical issue: random assignment may conflict
  with the obligation to offer each student the optimal treatment

• Strategic issues: requires time and specialized expertise;
  limited generalizability

• Tactical issues: treatment fidelity and integrity

• Logistical issues: difficulty finding adequate numbers
  of subjects; expensive, requiring substantial resources
Advantages of Observational Studies
• Address the chief criticism of RCTs: generalizability

• Availability, Cost, Time

• Serve as a rich source of descriptive information

• Examine exposure in real-life settings, which informs policy decisions

• Large sample sizes permit investigation of exposures with
  smaller effect sizes
Observational Studies
• Selection bias: no control over group assignment
  (ignorability of treatment assignment is not guaranteed)

         Group A  →  Tx   →  DV
   ?  (how were subjects assigned to A or B?)
         Group B  →  Ctl  →  DV

• Baseline characteristics of the comparison groups differ,
  through observed or unobserved confounders, in ways that
  affect the outcome.

• One approach to reducing this bias in nonrandomized
  studies is propensity score matching.
Propensity Score Matching
• Definition: the conditional probability (0 to 1) of
  receiving a given exposure (treatment), conditional on a
  vector of measured (observed) covariates.

• Contrast with an RCT: with 1:1 random allocation, every
  subject's probability of assignment to the treatment group
  is known to be 0.5

• PS reduces baseline information to a single composite
  summary of the covariates, thus minimizing differences
  and improving comparability between two groups in
  observational research
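The sketch below makes the definition concrete. It assumes all covariates are categorical, like the dummy-coded covariates in the example. For each covariate pattern x, the propensity score e(x) = P(T = 1 | X = x) can be estimated as the share of subjects with that pattern who received treatment, which is what a fully saturated logistic model would reproduce. In a 1:1 RCT every pattern's score would be about 0.5. The records and the `cell_propensity` helper are hypothetical.

```python
from collections import defaultdict


def cell_propensity(records):
    """Estimate e(x) = P(T = 1 | X = x) as the treated share within each covariate pattern x."""
    counts = defaultdict(lambda: [0, 0])  # pattern -> [n_treated, n_total]
    for covariates, treated in records:
        cell = counts[covariates]
        cell[0] += treated
        cell[1] += 1
    return {x: n_t / n for x, (n_t, n) in counts.items()}


# Hypothetical records: ((gender, ssi_di), treated 0/1)
records = [
    (("M", "Y"), 1), (("M", "Y"), 1), (("M", "Y"), 0), (("M", "Y"), 0),
    (("F", "N"), 1), (("F", "N"), 0), (("F", "N"), 0), (("F", "N"), 0),
]
ps = cell_propensity(records)
print(ps)  # {('M', 'Y'): 0.5, ('F', 'N'): 0.25}
```

With continuous covariates or many categories, most cells would be sparse or empty. That is why a parametric model such as logistic regression is normally used instead.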
Procedures of Propensity Score Analysis
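1. Estimate the propensity for treatment given the covariates
   using logistic regression, and save the predicted values
   ($\eta_i$ is the log-odds of receiving treatment):
      $\eta_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \dots + \beta_n X_{ni}$
      Propensity Score $= e(X_i) = \dfrac{\exp(\eta_i)}{1 + \exp(\eta_i)}$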

2. Balance check
   Compare propensity scores between Tx and Ctl groups

3. Estimate effect of treatment on outcome using PS
   a. Regression Model
   b. Stratification
   c. Matching
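Step 1 of this procedure can be sketched without a statistics package. This is a minimal illustration that assumes a binary treatment and numeric (e.g., dummy-coded) covariates. It fits the logistic model by Newton-Raphson and returns the predicted probabilities as propensity scores. In practice the model would be fit with standard statistical software, as the stepwise LR output in Step 1 suggests. `fit_logistic`, `propensity_scores` and the toy data are hypothetical.

```python
import math


def solve(a, b):
    """Solve the linear system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_logistic(X, t, iters=25):
    """Newton-Raphson fit of logit P(T=1|x) = b0 + b1*x1 + ... + bn*xn."""
    rows = [[1.0] + list(x) for x in X]  # prepend the intercept column
    p = len(rows[0])
    beta = [0.0] * p
    for _ in range(iters):
        grad = [0.0] * p
        hess = [[0.0] * p for _ in range(p)]
        for z, ti in zip(rows, t):
            eta = sum(b * zj for b, zj in zip(beta, z))
            mu = 1.0 / (1.0 + math.exp(-eta))  # current predicted probability
            w = mu * (1.0 - mu)
            for j in range(p):
                grad[j] += (ti - mu) * z[j]
                for k in range(p):
                    hess[j][k] += w * z[j] * z[k]
        step = solve(hess, grad)  # Newton step: H^-1 * gradient
        beta = [b + s for b, s in zip(beta, step)]
    return beta


def propensity_scores(X, beta):
    """e(x) = exp(eta) / (1 + exp(eta)): the saved predicted values."""
    return [1.0 / (1.0 + math.exp(-(beta[0] + sum(b * xj for b, xj in zip(beta[1:], x)))))
            for x in X]


# Hypothetical data: one dummy covariate; 1 of 4 treated when x=0, 3 of 4 when x=1
X = [[0]] * 4 + [[1]] * 4
t = [1, 0, 0, 0, 1, 1, 1, 0]
beta = fit_logistic(X, t)
print([round(e, 3) for e in propensity_scores(X, beta)])
```

Because this toy model is saturated, the fitted scores equal the observed treated shares (0.25 and 0.75). Those scores would then feed the balance check and the regression, stratification or matching step.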
EXAMPLE
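• Research question: What is the effect of vocational
  rehabilitation (VR) services? (A logistic regression (LR)
  found three services most strongly related to a successful
  VR outcome: on-the-job support, rehab tech (Rh Tech), and
  job placement)

• Data source: 2006 RSA 911 data (consumers whose cases
  were closed after an Individualized Plan for Employment
  (IPE) was developed; N=352,138)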

• IVs: Gender, Race/Ethnicity, Level of Education, Work
  Status at Application, Primary Source of Support, SSI/DI,
  Type of Disability
• Intervention (tx): Types of Services

• Outcome: Type of Closure
Step 0: Data Set-up
• Variable Selection by crosstabulation of covariates and
  type of closure (outcome)
• Covariates for this example (dummy var.)
  - Gender (2)
  - Race/Ethnicity (3): White non-Hispanic, African American, Others
  - Education (3): <12 yrs, 12 yrs (incl. SE cert), >12 yrs
  - Work Status at App (3): Emp wo sup, Other emp,
    No emp
  - Source of Support (2): Personal Income, Others
  - SSI/DI (2): Y/N
  - Disability (5): Sensory, All Mental with SA, LD/ADHD,
    MR/Autism, Others
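The dummy coding above can be sketched as follows. Each categorical covariate with c levels becomes c - 1 indicator variables, and one level is left out as the reference. The reference levels used here are the first entry of each list: Female, Others, <12 yrs, No emp, Others, N, Others. They are inferred from which dummies appear in the Step 1 output, so treat them as an assumption. The `LEVELS` table, field names and `dummy_code` helper are hypothetical.

```python
# Covariate levels from the example; the FIRST level of each list is the
# reference category (inferred from the Step 1 output, i.e., an assumption).
LEVELS = {
    "gender": ["Female", "Male"],
    "race": ["Others", "White non-Hispanic", "African American"],
    "education": ["<12 yrs", "12 yrs", ">12 yrs"],
    "work_status": ["No emp", "Emp wo sup", "Other emp"],
    "support": ["Others", "Personal Income"],
    "ssi_di": ["N", "Y"],
    "disability": ["Others", "Sensory", "Mental with SA", "LD/ADHD", "MR/Autism"],
}


def dummy_code(record, levels=LEVELS):
    """Return a dict of 0/1 indicators, one per non-reference level of each covariate."""
    out = {}
    for var, cats in levels.items():
        value = record[var]
        if value not in cats:
            raise ValueError(f"unknown level {value!r} for {var}")
        for cat in cats[1:]:  # skip the reference level
            out[f"{var}={cat}"] = int(value == cat)
    return out


# Hypothetical consumer record
consumer = {"gender": "Male", "race": "African American", "education": "12 yrs",
            "work_status": "No emp", "support": "Others", "ssi_di": "Y",
            "disability": "LD/ADHD"}
dummies = dummy_code(consumer)
print(len(dummies), sum(dummies.values()))  # 13 5
```

The 13 dummies correspond to the 13 covariate rows in the Step 1 logistic regression output.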
Step 1: Propensity Scores
• Goal: include all variables that play a role in the
  selection process, including interactions, other
  nonlinear terms, and variables that show only weak
  relations to the outcome (e.g., p<.10 or p<.25)
  (Rosenbaum & Rubin, 1984)


  “Unless a variable can be excluded because there is a
  consensus that it is unrelated to outcome or is not a
  proper covariate, it is advisable to include it in the
  propensity score model even if it is not statistically
  significant.” (Rubin & Thomas, 1984)

• In the example, all variables were included for PS
  computation
Step 1: PS by Stepwise LR
Job Placement Services               B      S.E.       Wald     Sig.   Exp(B)
White                            0.084     0.012     53.172   0.0000    1.088
African Am                       0.204     0.013    247.599   0.0000    1.227
HS Diploma                       0.084     0.009     88.717   0.0000    1.087
College+                         0.050     0.011     22.450   0.0000    1.051
Employment wo Support at app    -0.486     0.014   1189.201   0.0000    0.615
All other employment at app     -0.287     0.021    184.183   0.0000    0.751
SSI/DI                           0.160     0.008    362.081   0.0000    1.173
Personal Income at app          -0.182     0.015    153.565   0.0000    0.833
Sensory Disab                   -0.362     0.014    707.399   0.0000    0.696
Mental Disab                     0.323     0.009   1210.793   0.0000    1.381
LD/ADHD                          0.324     0.012    678.563   0.0000    1.383
MR/Autism                        0.560     0.013   1837.416   0.0000    1.751
Gender_Male                      0.066     0.007     81.003   0.0000    1.069
Constant                        -1.011     0.014   4874.301   0.0000    0.364
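The B coefficients in the table above can be turned into a saved predicted value (propensity score) for any covariate profile with e = exp(η) / (1 + exp(η)), where η is the constant plus the B of every dummy equal to 1. This sketch copies the coefficients as printed (three decimals), so results are approximate. The two consumer profiles and the `propensity` helper are hypothetical.

```python
import math

# B coefficients from the Step 1 output (Job Placement Services model)
COEF = {
    "constant": -1.011,
    "white": 0.084, "african_am": 0.204,
    "hs_diploma": 0.084, "college_plus": 0.050,
    "emp_wo_support": -0.486, "other_emp": -0.287,
    "ssi_di": 0.160, "personal_income": -0.182,
    "sensory": -0.362, "mental": 0.323, "ld_adhd": 0.324, "mr_autism": 0.560,
    "male": 0.066,
}


def propensity(dummies):
    """e = exp(eta) / (1 + exp(eta)), eta = constant + sum of B for dummies equal to 1."""
    eta = COEF["constant"] + sum(COEF[name] for name, on in dummies.items() if on)
    return math.exp(eta) / (1.0 + math.exp(eta))


# Hypothetical profile 1: all dummies 0 (every covariate at its reference level)
ps_reference = propensity({})
# Hypothetical profile 2: male, African American, HS diploma, SSI/DI, MR/Autism
ps_example = propensity({"male": 1, "african_am": 1, "hs_diploma": 1,
                         "ssi_di": 1, "mr_autism": 1})
print(round(ps_reference, 3), round(ps_example, 3))  # 0.267 0.516

# The Exp(B) column is exp(B), i.e., the odds ratio for each dummy
print(round(math.exp(COEF["mr_autism"]), 3))  # 1.751
```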
Step 2: Balance Check
• Compare the distributions of the two groups using
  descriptive statistics and t-tests

• Box plots show how much the two groups overlap
  (a shared band of similar propensity scores)

• Little or no overlap suggests that differences in
  outcome may reflect group differences (selection
  bias) rather than the service effect (e.g., rehab
  tech services)
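For a dummy covariate, the t-test comparison can be sketched as a large-sample test of two proportions (unpooled standard error). The sketch uses the sign convention of the Step 2 balance table (Ctl minus Tx). It also computes the standardized mean difference (SMD), a balance measure that is not used in the slides and is added here as an assumption. The inputs are the rounded gender means and group sizes reported in Step 2 for job placement services. Because the means are rounded, the statistics only approximate the reported t values (-13.777 pre, 3.429 post). `balance_binary` is a hypothetical helper.

```python
import math


def balance_binary(p_ctl, n_ctl, p_tx, n_tx):
    """Compare a 0/1 covariate between groups; returns (test statistic, SMD).

    Both are computed as Ctl minus Tx. The statistic uses the unpooled SE;
    the SMD divides by the pooled SD, sqrt((var_ctl + var_tx) / 2).
    """
    var_ctl = p_ctl * (1 - p_ctl)
    var_tx = p_tx * (1 - p_tx)
    se = math.sqrt(var_ctl / n_ctl + var_tx / n_tx)
    stat = (p_ctl - p_tx) / se
    smd = (p_ctl - p_tx) / math.sqrt((var_ctl + var_tx) / 2)
    return stat, smd


# dum_gender means for job placement services, before and after adjustment
pre = balance_binary(0.533, 236731, 0.557, 115407)
post = balance_binary(0.552, 82040, 0.542, 44379)
print(f"pre: stat {pre[0]:.2f}, SMD {pre[1]:.3f}; post: stat {post[0]:.2f}, SMD {post[1]:.3f}")
```

With samples this large, even tiny differences yield significant t values. That is one reason a size-based measure such as the SMD is often reported alongside the test, and |SMD| < 0.1 is a commonly used rule of thumb for acceptable balance.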
Step 2: Check Distribution/Balance
Propensity Score    Ctl      Tx
       N           236731   115407
      Mean          0.316    0.351
     Median         0.724    0.243
      Mode          0.332    0.365
 Std. Deviation     0.315    0.388
    Minimum         0.092    0.076
   Maximum          0.115    0.115
Quartiles     25    0.252    0.301
              50    0.332    0.365
              75    0.388    0.403
Step 2: Check Distribution/Balance
                      Pre-adjustment         After adjustment
 Propensity Score     Ctl       Tx           Ctl       Tx
       N            236731   115407         82040     44379
      Mean           0.316    0.351         0.348     0.350
     Median          0.724    0.243         0.353     0.353
      Mode           0.332    0.365         0.315     0.315
 Std. Deviation      0.315    0.388         0.023     0.023
    Minimum          0.092    0.076         0.301     0.301
    Maximum          0.115    0.115         0.388     0.388
Quartiles     25     0.252    0.301         0.328     0.331
              50     0.332    0.365         0.353     0.353
              75     0.388    0.403         0.369     0.369
Step 2: Balance Check
    Job Placement Services        Pre Means         Means After Adj.          T-test
      Propensity Scores        No Svcs   Svcs     No Svcs     Svcs        Pre         Post
 dum_gender                      0.533   0.557      0.552    0.542     -13.777*      3.429*
 dum_white                       0.664   0.633      0.670    0.658      18.326*      4.096*
 dum_black                       0.209   0.252      0.178    0.194     -28.049*     -6.875*
 dum_hsdiploma_12 year ed        0.428   0.452      0.369    0.363     -13.297*      2.090
 dum_college+                    0.289   0.250      0.302    0.296      24.656*      2.388*
 dum_emp wo support at app       0.217   0.111      0.021    0.028      84.518*     -7.793*
 dum_other employment at app     0.041   0.030      0.022    0.028      16.061*     -6.925*
 dum_ssi or ssdi                 0.266   0.324      0.246    0.246     -35.110*      0.031
 dum_personal income at app      0.201   0.105      0.042    0.051      78.297*     -6.973*
 dummy_sensory disab             0.173   0.085      0.000    0.000      78.064*      N/A
 dummy_mental disab              0.299   0.365      0.357    0.381     -38.424*     -8.447*
 dummy_LD_ADHD                   0.126   0.144      0.204    0.202     -14.897*      0.982
 dummy_MR_Autism                 0.088   0.143      0.019    0.030     -46.195*    -11.147*
Step 2: Check Distribution/Balance
 Pre-adjustment      Employment        Services initiated,
                     outcome           not employed
 Not received        125728 (53.1%)    111003 (46.9%)
 Received             80063 (69.4%)     35344 (30.6%)
 Total               205791 (58.4%)    146347 (41.6%)

 After adjustment    Employment        Services initiated,
                     outcome           not employed
 Not received         37355 (45.5%)     44685 (54.5%)
 Received             30439 (68.6%)     13940 (31.4%)
 Total                67794 (53.6%)     58625 (46.4%)
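The arithmetic behind these tables can be sketched as follows. The code recomputes each row's employment-outcome rate from the counts shown, along with the difference (received minus not received) before and after adjustment. `employment_rate` is a hypothetical helper.

```python
def employment_rate(employed, not_employed):
    """Share of a row with an employment outcome."""
    return employed / (employed + not_employed)


# (employment outcome, services initiated but not employed) counts from the tables
tables = {
    "pre-adjustment": {"not received": (125728, 111003), "received": (80063, 35344)},
    "after adjustment": {"not received": (37355, 44685), "received": (30439, 13940)},
}

diffs = {}
for label, rows in tables.items():
    r0 = employment_rate(*rows["not received"])
    r1 = employment_rate(*rows["received"])
    diffs[label] = r1 - r0
    print(f"{label}: received {r1:.1%} vs not received {r0:.1%}, "
          f"difference {100 * diffs[label]:.1f} points")
```

The gap is about 16.3 percentage points before adjustment and about 23.1 points after. In this example, adjustment lowers the comparison group's rate more than the service group's.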
Step 2: Check Distribution/Balance
Propensity Score    Ctl     Tx
       N           321402   30736
      Mean          0.072   0.243
     Median         0.027   0.256
      Mode          0.063   0.452
 Std. Deviation     0.101   0.157
    Minimum         0.005   0.005
   Maximum          0.630   0.630
Quartiles     25    0.016   0.094
              50    0.027   0.256
              75    0.080   0.363
Step 3: Analysis with PS
• Three techniques are commonly used to
  reduce selection bias and increase
  precision with PS

  - Regression (covariance) adjustment
  - Stratification
  - Matching
Step 3: Analysis I - Regression
• Treat the PS as an additional covariate in
  multivariable regression model

• As a composite of confounders, PS can reduce
  bias in the estimate of the treatment effect by
  adjusting for the pattern of observed
  confounders.

• Estimates of the treatment effect can be more
  efficient when the PS is also used as a covariate
  within strata after stratification
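The following is a minimal simulation sketch of regression adjustment with the PS, using only the Python standard library. Everything in it is hypothetical: a single binary confounder x, a known propensity score e(x) of 0.2 or 0.6, and a linear outcome with a true treatment effect of 0.2. The naive difference in means is biased because x raises both the chance of treatment and the outcome. Regressing the outcome on treatment plus the PS, by ordinary least squares via the normal equations, should give an estimate close to 0.2.

```python
import random


def ols(rows, y):
    """Least-squares coefficients via the normal equations (X'X) b = X'y."""
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    # Gauss-Jordan elimination on the small p x p system
    m = [xtx[i] + [xty[i]] for i in range(p)]
    for col in range(p):
        piv = max(range(col, p), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(p):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [a - f * b for a, b in zip(m[r], m[col])]
    return [m[i][p] / m[i][i] for i in range(p)]


rng = random.Random(2009)
n, true_effect = 20000, 0.2
treat, y, ps = [], [], []
for _ in range(n):
    x = rng.random() < 0.5                  # hypothetical binary confounder
    e = 0.6 if x else 0.2                   # propensity score e(x), assumed known here
    t = 1 if rng.random() < e else 0
    treat.append(t)
    ps.append(e)
    y.append(true_effect * t + 0.5 * x + rng.gauss(0, 1))

n1 = sum(treat)
naive = (sum(yi for yi, ti in zip(y, treat) if ti) / n1
         - sum(yi for yi, ti in zip(y, treat) if not ti) / (n - n1))
b0, b_treat, b_ps = ols([[1.0, ti, ei] for ti, ei in zip(treat, ps)], y)
print(f"naive difference: {naive:.3f}, PS-adjusted effect: {b_treat:.3f}")
```

In real data the PS would be estimated, as in Step 1. The adjustment is only unbiased if the outcome model is correctly specified in the PS, which holds by construction here.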
Step 3: Analysis II - Stratification
• Solution to the problem of dimensionality in making
  the two groups comparable (2^k subclasses are
  needed for k binary covariates)

• Because the PS is a scalar summary of all the
  observed background covariates, stratifying on it
  can balance the distributions of the covariates

• Five strata based on the PS will remove over
  90% of the bias in each of the covariates (Cochran, 1968)
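To see the dimensionality problem in this example, the short calculation below counts the subclasses needed to cross-classify the covariates listed in Step 0, using the level counts from that list. It also evaluates 2^k for k binary covariates. Stratifying on the scalar PS needs only five strata. `binary_subclasses` is a hypothetical helper.

```python
import math


def binary_subclasses(k):
    """Subclasses needed to cross-classify k binary covariates."""
    return 2 ** k


# Levels per covariate, from the Step 0 covariate list
levels = {"gender": 2, "race": 3, "education": 3, "work_status": 3,
          "support": 2, "ssi_di": 2, "disability": 5}
full_cells = math.prod(levels.values())
print(binary_subclasses(10), full_cells)  # 1024 1080
```

Even with these seven coarse covariates, exact subclassification needs 1,080 cells, and many would contain few or no treated or control cases.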
Step 3: Analysis II - Stratification
 Job Placement Assistance Services         WO JOB PLCT          W JOB PLCT
 Quintile   Type of Closure              Freq       %         Freq       %       Totals
    1       Employment outcome          44607    77.2        10075    79.1        70543
            WO Emp outcome              13205    22.8         2656    20.9        20.8%
    2       Employment outcome          23232    51.7        14299    70.9        65084
            WO Emp outcome              21685    48.3         5868    29.1        19.2%
    3       Employment outcome          20131    43.5        16839    67.3        71322
            WO Emp outcome              26177    56.5         8175    32.7        21.1%
    4       Employment outcome          18779    47.2        17216    69.0        64740
            WO Emp outcome              21018    52.8         7727    31.0        19.1%
    5       Employment outcome          14986    38.5        18773    66.7        67079
            WO Emp outcome              23943    61.5         9377    33.3        19.8%
            Totals                     227763 (67.2%)       111005 (32.8%)       338768
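The stratified estimate can be sketched from the table counts. The code computes the difference in employment-outcome rates within each PS quintile, then averages the differences weighted by each quintile's share of all cases. Weighting by stratum size is one common choice and targets the average effect over the whole sample. The slides do not state which weights, if any, were used, so this weighting is an assumption. `rate` is a hypothetical helper.

```python
# (employed, not employed) counts per PS quintile from the table:
# first pair = without job placement, second pair = with job placement
strata = [
    ((44607, 13205), (10075, 2656)),
    ((23232, 21685), (14299, 5868)),
    ((20131, 26177), (16839, 8175)),
    ((18779, 21018), (17216, 7727)),
    ((14986, 23943), (18773, 9377)),
]


def rate(employed, not_employed):
    """Share with an employment outcome."""
    return employed / (employed + not_employed)


total = sum(sum(ctl) + sum(tx) for ctl, tx in strata)
estimate = 0.0
for q, (ctl, tx) in enumerate(strata, start=1):
    diff = rate(*tx) - rate(*ctl)               # within-stratum difference
    weight = (sum(ctl) + sum(tx)) / total       # stratum share of all cases
    estimate += weight * diff
    print(f"Q{q}: difference {100 * diff:.1f} points, weight {weight:.1%}")
print(f"stratified estimate: {100 * estimate:.1f} points (N = {total})")
```

The within-quintile differences range from about 2 points (Q1) to about 28 points (Q5), and the weighted average is roughly 19 points.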
Step 3: Analysis II - Stratification
Step 3: Analysis - Stratification (26 closures)
[Figure: Percentage of successful closure by quintile of propensity score, W/O vs. W/ Job Placement: Q1 77.2 vs. 79.1; Q2 51.7 vs. 70.9; Q3 43.5 vs. 67.3; Q4 47.2 vs. 69.0; Q5 38.5 vs. 66.7]
Step 3: Analysis III - Matching
• Nearest available matching on the estimated PS

• Mahalanobis metric matching including the PS:
  - An equal percent bias reducing technique: it reduces
    the bias (the mean for the treated minus the mean for
    the controls) by the same percentage in each covariate
  - Adds the PS to the other covariates in the calculation
    of the Mahalanobis distance

• Nearest available Mahalanobis metric matching within
  calipers of ¼ of the standard deviation of the
  propensity score
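Nearest available matching within a caliper can be sketched as a greedy algorithm. Treated cases are taken in turn, and each is paired with the unmatched control whose PS is closest. The pair is accepted only if the distance is within 0.25 standard deviations of the PS. Several simplifications are assumptions for illustration: the sketch matches on the PS alone (no Mahalanobis distance), without replacement, in input order, and uses the SD of the pooled scores. The `caliper_match` helper and the toy scores are hypothetical.

```python
import statistics


def caliper_match(ps_treated, ps_control, caliper_sd=0.25):
    """Greedy 1:1 nearest-available matching on the PS without replacement.

    Returns (treated_index, control_index) pairs whose PS distance is within
    caliper_sd * SD of the pooled propensity scores.
    """
    caliper = caliper_sd * statistics.stdev(ps_treated + ps_control)
    available = set(range(len(ps_control)))
    pairs = []
    for i, p in enumerate(ps_treated):
        if not available:
            break
        j = min(available, key=lambda k: abs(ps_control[k] - p))  # nearest unmatched control
        if abs(ps_control[j] - p) <= caliper:
            pairs.append((i, j))
            available.remove(j)
    return pairs


treated = [0.30, 0.35, 0.80]
control = [0.10, 0.31, 0.36, 0.50]
print(caliper_match(treated, control))  # [(0, 1), (1, 2)]
```

The treated case with PS 0.80 stays unmatched because no control lies within the caliper. Greedy results also depend on processing order. For the Mahalanobis variant in the slide, the distance used to choose among controls inside the caliper would be the Mahalanobis distance on the covariates plus the PS.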
Step 3: Analysis III - Matching
Cases were matched on the PS as the key variable (each
treated case was paired with a control case having the
same PS). Matched cases: N = 114,790 per group
                                  Employment       No employment
                                  outcome          outcome           Total
  No Job Placement Services        72543 (63.2%)    42247 (36.8%)    114790 (100.0%)
  Job Placement Services Received  79808 (69.5%)    34982 (30.5%)    114790 (100.0%)
  Total                           152351 (66.4%)    77229 (33.6%)    229580 (100.0%)
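From the matched table, the difference in employment-outcome rates and the risk ratio can be computed directly, as in the sketch below. For the standard error it treats the two matched groups as independent samples, which ignores the pairing. A paired analysis such as McNemar's test would need pair-level counts that are not shown here, so the interval is only an approximation.

```python
import math

employed_ctl, n_ctl = 72543, 114790   # no job placement services
employed_tx, n_tx = 79808, 114790     # job placement services received

p_ctl = employed_ctl / n_ctl
p_tx = employed_tx / n_tx
diff = p_tx - p_ctl
# Unpaired (independent-samples) SE: an approximation for matched data
se = math.sqrt(p_ctl * (1 - p_ctl) / n_ctl + p_tx * (1 - p_tx) / n_tx)
print(f"rates {p_tx:.1%} vs {p_ctl:.1%}; difference {100 * diff:.1f} points "
      f"(approx. 95% CI {100 * (diff - 1.96 * se):.1f} to {100 * (diff + 1.96 * se):.1f}); "
      f"risk ratio {p_tx / p_ctl:.2f}")
```

The matched difference (about 6.3 points) is much smaller than the unadjusted gap between recipients and non-recipients.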
Interpretation


             What do you think?

     Does PS analysis give a better basis
       for making a causal inference?
Limitations of PSM
• Controls only for observed covariates; no control for
  unobserved or omitted ones (e.g., age in this example)

• The overlap between groups must be inspected before
  matching or applying other techniques, and it must be
  substantial (e.g., rehab tech svcs)

• Best with large samples
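A simple overlap (common support) check can be sketched by comparing the PS ranges of the two groups and computing the share of each group that falls inside the shared range. This min/max rule is one simple convention among several, and trimming by percentiles is another. The `common_support` helper and the scores in the usage lines are hypothetical.

```python
def common_support(ps_treated, ps_control):
    """Return the overlap interval and the share of each group inside it (None if no overlap)."""
    low = max(min(ps_treated), min(ps_control))
    high = min(max(ps_treated), max(ps_control))
    if low > high:
        return None  # the groups' PS ranges do not overlap at all

    def inside(scores):
        # share of scores within [low, high]
        return sum(low <= s <= high for s in scores) / len(scores)

    return (low, high), inside(ps_treated), inside(ps_control)


treated = [0.20, 0.35, 0.50, 0.70]
control = [0.05, 0.10, 0.25, 0.40]
print(common_support(treated, control))  # ((0.2, 0.4), 0.5, 0.5)
```

If a large share of either group falls outside the shared range, estimates rest on extrapolation. That is the situation the overlap limitation warns about.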

				