Power Considerations for Educational
Studies with Restricted Samples that Use
State Tests as Pretest and Outcome
Measures
June 2010
Presentation at the Institute for Education Sciences Research
Conference
Russell Cole ● Josh Haimson ● Irma Perez-Johnson ● Henry May
The research reported here was supported
by the National Center for Education
Evaluation and Regional Assistance, U.S.
Department of Education, through contract
ED-04-CO-0112 to Mathematica Policy
Research.
Measuring the Impact of an Education Intervention
Randomized controlled trial (RCT)
– Unbiased estimate of program impact
– Increasingly prevalent in education research
Probability of detecting a true program impact
is based on n, α, effect size (ES)
– Use of pretest can increase power (1 − β)
– Pretest-Posttest correlation shrinks minimum
detectable effect size (MDES)
\[
\mathrm{MDES} = M_{n-k} \sqrt{\frac{1 - R_A^2}{n \, P \, (1 - P)}},
\qquad R_A^2 = \left( r_{Post,Pre} \right)^2
\]
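The MDES formula can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the presenters' code: the multiplier M_{n-k} is approximated with normal quantiles for a two-tailed test at α = .05 and 80 percent power (about 2.80), P is taken to be the proportion of the sample assigned to treatment, and the function and parameter names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def mdes(n, r_pre_post, p_treat=0.5, alpha=0.05, power=0.80):
    """Approximate MDES for a student-level RCT with a pretest covariate."""
    z = NormalDist()
    # Assumption: normal approximation to the t-based multiplier M_{n-k}
    # (two-tailed test), which is close for samples in the hundreds.
    multiplier = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)
    r_squared = r_pre_post ** 2  # R_A^2 = (r_Post,Pre)^2
    return multiplier * sqrt((1 - r_squared) / (n * p_treat * (1 - p_treat)))


# Tabulate MDES at a few pre-post correlations for two sample sizes.
for n in (500, 250):
    print(n, [round(mdes(n, r), 3) for r in (0.0, 0.3, 0.6, 0.9)])
```

Because the correlation enters only through 1 − r², a given change in r matters more when r is already high.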
MDES Increases as Pretest-Posttest
Correlation Decreases
[Figure: MDES (y-axis, 0.000 to 0.400) plotted against pre-post correlation (x-axis, 0 to 0.9), with one curve for N = 500 and one for N = 250.]
State Tests Prevalent, But Appropriate?
State assessments as outcomes
– Used to define proficiency for AYP
– Universal in grades 3–8 (Math and ELA)
– Minimizes burden
– Low(er) cost and scale scores readily available
State tests tend to have lower CSEM at middle
of ability distribution
– Largest CSEM at tails
– Variance (σ²) can be partitioned into explainable and
unexplainable (measurement error) components
– Given increased CSEM at tails, samples of students
selected at tails will have higher proportions of
unexplainable variance
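The variance argument above can be made concrete with a small calculation. The sketch below assumes the classical test theory decomposition (observed variance = true-score variance + error variance) and treats the squared CSEM as the error variance for the selected sample. The scale-score values are hypothetical and chosen only to illustrate the direction of the effect.

```python
def error_share(csem, observed_sd):
    """Share of observed score variance attributable to measurement error."""
    return csem ** 2 / observed_sd ** 2


# Hypothetical values, not taken from any state test:
# a full population with a wide score spread and a small CSEM...
full_population = error_share(csem=12.0, observed_sd=40.0)
# ...versus a sample drawn from the low tail, where the spread is
# narrower and the CSEM is larger.
low_tail_sample = error_share(csem=20.0, observed_sd=25.0)

print(round(full_population, 2), round(low_tail_sample, 2))
```

Both a larger CSEM and a narrower score spread push the unexplainable share upward in a sample selected at the tail.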
General Methodology
If there is greater measurement error for low-
performing students, does this mean that
pretest-posttest correlations will be
attenuated?
To capture variability in correlation coefficients
associated with measurement error, select
samples with different average achievement
levels and calculate r (i.e., rPre,Post|Pre)
Compare pretest-posttest correlations across
different achievement levels (and across
states) to inform power calculations
Research Questions
What is the average pretest-posttest
correlation coefficient for samples of students
selected at different pretest achievement
levels?
Do correlation coefficients differ by state?
Population Data
4 complete states + 2 large districts from 2
additional states
3 years of population data
– 2 sets of pre-post correlations
– (Year1,Year2), (Year2,Year3)
English/Language Arts & Mathematics
Grades 3–8
Analysis Decisions
1. Pretest achievement level used to select the sample
A. Lowest performers
B. Proficiency threshold
C. Average performers
2. Grade grouping (pretest year)
A. Early elementary (grades 3 and 4)
B. Late elementary (grade 5)
C. Middle school (grades 6 and 7)
Analysis Procedure
For each state, year, subject, and grade-group:
1. Pretest standardization
2. Selection of study samples (n = 500)
3. Calculation of pretest-posttest correlation
– 6 states, 2 years of pre-post data, 2 subjects, 3 grade groups
for each achievement level
4. Cross-cutting aggregation (ANOVA)
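Steps 1 through 3 can be sketched on simulated data. Everything here is an assumption made for illustration: the population is generated from a latent ability plus constant test noise, the proficiency cut score of −0.5 standard deviations is hypothetical, and each restricted sample is taken as the 500 students nearest a target pretest score (or the 500 lowest scorers). The slides do not specify the selection rule, and with constant noise the sketch shows attenuation from range restriction only, not from larger measurement error at the tails.

```python
import random
from statistics import mean, pstdev


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def standardize(scores):
    """Step 1: convert scores to z-scores within the population."""
    m, s = mean(scores), pstdev(scores)
    return [(v - m) / s for v in scores]


def nearest(pairs, target, n):
    """The n students whose standardized pretest is closest to target."""
    return sorted(pairs, key=lambda p: abs(p[0] - target))[:n]


rng = random.Random(0)
# Simulated population (hypothetical): latent ability plus test noise.
ability = [rng.gauss(0, 1) for _ in range(20000)]
pre = standardize([a + rng.gauss(0, 0.4) for a in ability])
post = [a + rng.gauss(0, 0.4) for a in ability]
pairs = list(zip(pre, post))

# Step 2: select study samples of n = 500 at each achievement level.
samples = {
    "population": pairs,
    "proficiency threshold": nearest(pairs, -0.5, 500),  # hypothetical cut
    "average performers": nearest(pairs, 0.0, 500),
    "lowest performers": sorted(pairs)[:500],  # 500 lowest pretest scores
}

# Step 3: pretest-posttest correlation within each sample.
correlations = {}
for label, sample in samples.items():
    x, y = zip(*sample)
    correlations[label] = pearson_r(x, y)
    print(label, round(correlations[label], 2))
```

In the study itself this calculation would be repeated for each of the 6 × 2 × 2 × 3 = 72 state, year, subject, and grade-group cells per achievement level before the aggregation in step 4.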
Pretest-Posttest Correlations Attenuated
for Lowest-Performing Samples
[Figure: pretest-posttest correlation (axis from 0.00 to 1.00) for four samples: Population, Proficiency Threshold, Average Performers, Lowest Performers.]
Large Variation in Pretest-Posttest
Correlation Across States
[Figure: pretest-posttest correlation (axis from 0.00 to 1.00) for four samples (Population, Proficiency Threshold, Average Performers, Lowest Performers), shown separately for State A through State F.]
Observed rPre,Post|Pre for Power Analysis
[Figure: MDES (y-axis, 0.000 to 0.400) plotted against pre-post correlation (x-axis, 0 to 0.9) for N = 500 and N = 250, with observed correlations r = .37, r = .60, and r = .89 marked.]
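To read the figure numerically, the MDES can be evaluated at the three observed correlations it marks (r = .37, .60, .89). As a hedged sketch, this assumes a normal-approximation multiplier for a two-tailed test at α = .05 with 80 percent power and equal allocation to treatment and control (P = 0.5); neither setting is stated on the slide.

```python
from math import sqrt
from statistics import NormalDist

# Assumed multiplier: two-tailed alpha = .05, power = .80 (about 2.80).
MULTIPLIER = NormalDist().inv_cdf(0.975) + NormalDist().inv_cdf(0.80)


def mdes(n, r, p=0.5):
    """MDES for sample size n, pre-post correlation r, treatment share p."""
    return MULTIPLIER * sqrt((1 - r ** 2) / (n * p * (1 - p)))


# MDES at each marked correlation, for both sample sizes in the figure.
table = {(n, r): round(mdes(n, r), 3)
         for n in (500, 250)
         for r in (0.37, 0.60, 0.89)}

for (n, r), value in table.items():
    print(f"N = {n}, r = {r:.2f}: MDES = {value:.3f}")
```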
Implications for MDES Might Be Modest
[Figure: MDES (y-axis, 0.000 to 0.400) plotted against pre-post correlation (x-axis, 0 to 0.9) for N = 500 and N = 250, with r = .60 and r = .65 marked.]
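A worked comparison of the two correlations marked on the figure shows why the implication can be modest. The calculation assumes n = 500, P = 0.5, and a multiplier of about 2.80 (two-tailed α = .05, 80 percent power); these values are illustrative assumptions rather than settings stated on the slide.

\[
\mathrm{MDES}_{r = .60} \approx 2.80 \sqrt{\frac{1 - 0.60^2}{500 \times 0.5 \times 0.5}} = 2.80 \sqrt{\frac{0.64}{125}} \approx 0.200
\]
\[
\mathrm{MDES}_{r = .65} \approx 2.80 \sqrt{\frac{1 - 0.65^2}{500 \times 0.5 \times 0.5}} = 2.80 \sqrt{\frac{0.5775}{125}} \approx 0.190
\]

Under these assumptions, a difference of .05 in the correlation changes the MDES by about 0.01 standard deviations.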
Discussion/Summary
Pretest-posttest correlations
– Large attenuation when a homogeneous sample is
selected
– Might be lower than anticipated for low performers
on state assessments
– Similar for ELA/Mathematics and across grade levels
– Affected by other factors (ceiling/floor effects)
Use available administrative records to gauge
rPre,Post|Pre
Thank you
rcole@mathematica-mpr.com
May, Henry, Irma Perez-Johnson, Joshua Haimson, Samina Sattar,
and Phil Gleason (2009). “Using State Tests in Education
Experiments: A Discussion of the Issues.” (NCEE 2009-013).
Washington, DC: National Center for Education Evaluation and
Regional Assistance, Institute of Education Sciences, U.S.
Department of Education.
http://ies.ed.gov/ncee/pdf/2009013.pdf