Comparing Growth in
Student Performance
David Stern, UC Berkeley
Career Academy Support Network
Presentation to Educating for Careers/
California Partnership Academies Conference
Sacramento, March 4, 2011
1
What I’ll explain
• Why “value added” is the most valid way to
compare academy students’ progress with
other students in same school and grade
• How to compute value added
• Example of application to career academies
2
???
What questions
do you have
at the start?
3
What is “value added”?
• Starts with matched data for individual
students at 2 or more points in time
• Uses students’ characteristics and previous
performance to predict most recent
performance
• Positive value added means a student’s
actual performance is better than predicted
• If academy students on average perform
better than predicted, academy has positive
value added
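A minimal sketch of this definition, assuming one prior score per student and a straight-line prediction. The student records, the academy flag, and the names fit_line, academy_va, and other_va are hypothetical and only illustrate the idea; they are not data from any school.

```python
from statistics import mean

# Hypothetical records: (prior score, current score, in_academy)
students = [
    (300, 310, True),
    (320, 335, True),
    (340, 350, True),
    (310, 305, False),
    (330, 325, False),
    (350, 345, False),
]

def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b*x for a single predictor."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b

prior = [s[0] for s in students]
current = [s[1] for s in students]
a, b = fit_line(prior, current)

# A student's value added is actual minus predicted performance.
residuals = [y - (a + b * x) for x, y in zip(prior, current)]

# A group's value added is the average residual of its students.
academy_va = mean(r for r, s in zip(residuals, students) if s[2])
other_va = mean(r for r, s in zip(residuals, students) if not s[2])
```

With these made-up numbers the academy students sit above the prediction line on average and the other students sit below it, so the academy's value added is positive.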
4
.93
Correlation between
academic performance index
and % low-income students
in California school districts
5
Value added is a better measure
than
• Comparing average performance of 2 groups
of students without controlling for their
previous performance – because one group
may have been high performers to start with
• Comparing this year’s 11th graders (for
example) with last year’s 11th graders –
because these are different groups of
students!
6
Creates better incentives
• Reduces incentive for academy to recruit or
select students who are already performing
well
• Recognizes academies for improving
performance of students no matter how they
performed in the past
• Provides a valid basis on which to compare
student progress, and then ask why
7
What NOT to do
• DON’T attach automatic rewards or
punishments to estimates of value added –
use them as evidence for further inquiry
• DON’T rely only on test scores – analyze a
range of student outcomes: e.g., attendance,
credits, GPA, discipline, etc.
• DON’T use just 2 points in time – analyze
multiple years if possible, and do the analysis
every year
8
Recent reports
• National Academies: “Getting
Value out of Value Added”
http://www.nap.edu/catalog.php?record_id=12820
• Economic Policy Institute: “Problems with the
Use of Student Test Scores to Evaluate
Teachers”
http://epi.3cdn.net/b9667271ee6c154195_t9m6iij8k.pdf
9
How it’s done
• Need matched data for each student at 2 or
more points in time
• Accurately identify academy and non-
academy students in each time period
• Use statistical regression model to predict
most recent performance, based on students’
characteristics and previous performance
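The first two steps can be sketched as follows, assuming each year's records are keyed by a student ID. The IDs, scores, field names, and the match_students helper are hypothetical. Students missing at either time point are set aside rather than guessed at.

```python
# Hypothetical records keyed by student ID; all names and values are illustrative.
scores_2009 = {"s01": 310, "s02": 342, "s03": 298, "s04": 365}
scores_2010 = {"s01": 325, "s02": 338, "s04": 371, "s05": 300}
academy_2010 = {"s01", "s04"}  # IDs enrolled in an academy in 2009-10

def match_students(earlier, later, academy_ids):
    """Keep only students observed at both time points and flag academy membership."""
    matched = []
    for student_id in sorted(earlier.keys() & later.keys()):
        matched.append({
            "id": student_id,
            "prior": earlier[student_id],
            "current": later[student_id],
            "academy": student_id in academy_ids,
        })
    return matched

matched = match_students(scores_2009, scores_2010, academy_2010)

# Students seen in only one year cannot be used to estimate value added.
unmatched = sorted(scores_2009.keys() ^ scores_2010.keys())
```

It is worth reporting how many students fall into the unmatched list, since a large number would limit what the analysis can say.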
10
Example: comparing
teachers
• Each point on graph shows one student’s
English Language Arts test score in spring
2003 (horizontal axis) and spring 2004
(vertical axis) for an actual high school
• Regression line shows predicted score in
2004, given score in 2003
• Students who had teacher #30 generally
scored higher than predicted in 2004 – this
teacher had positive value added
11
Scatterplot of 2003 and 2004 English
Language Arts scores at one high
school
Scatterplot of 2003 and 2004 scores,
with regression line
Dots above the line represent students
who scored higher than predicted in 2004.
Dots below the line represent students
who scored lower than predicted.
Most students with teacher 30
scored higher in 2004
than predicted by their 2003 score
This student’s 2004 score
was higher than predicted
This student’s 2004 score
was lower than predicted
Example using academies,
in a high school with
4 career academies
and 4 other programs:
Programs 2, 4, 5, and 8 are
career academies
15
Parents’ education differs across programs
16
Student ethnicity also differs
17
Students in programs 4, 5, and 8 are
• less likely to have college-educated parents
• less likely to be white.
Comparisons of student performance should take
such differences into account.
18
Grade 11 enrollments, 2009-10
Analysis focused on
students in grade 11
who were present in
at least 75% of classes.
19
Mean GPA during junior year, 2009-10
20
Mean 11th grade test scores, spring
2010
21
Mean 8th grade test scores for 2009-10
juniors
22
Juniors in programs 4 and 5 had lower grades and test scores.
But comparing 11th grade test scores is misleading because
students who entered programs 4 and 5 in high school
were already scoring lower at end of 8th grade.
More valid comparison would focus on CHANGE
in performance during 2009-10.
23
Numbers of students by change in English
language arts performance level during 2009-10
Performance levels:
far below basic,
below basic, basic,
proficient, advanced.
Only program 8
had more students
whose performance
level went up than
students whose
performance level
went down.
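A tally like this one can be sketched as below, assuming each student has a performance level recorded before and after the year. The rows and the level_changes helper are hypothetical; only the five level names come from the slide.

```python
from collections import Counter

LEVELS = ["far below basic", "below basic", "basic", "proficient", "advanced"]
RANK = {name: i for i, name in enumerate(LEVELS)}

# Hypothetical rows: (program, level before, level after)
rows = [
    (8, "basic", "proficient"),
    (8, "proficient", "proficient"),
    (8, "below basic", "basic"),
    (4, "basic", "below basic"),
    (4, "proficient", "basic"),
    (4, "basic", "proficient"),
]

def level_changes(rows):
    """Count students per program whose level went up, went down, or stayed the same."""
    counts = {}
    for program, before, after in rows:
        diff = RANK[after] - RANK[before]
        direction = "up" if diff > 0 else "down" if diff < 0 else "same"
        counts.setdefault(program, Counter())[direction] += 1
    return counts

changes = level_changes(rows)

# Programs where more students moved up a level than moved down.
more_up_than_down = [p for p, c in sorted(changes.items()) if c["up"] > c["down"]]
```

Counting level changes is coarser than a regression because it ignores how far a score moved within a level, but it is easy to explain to a non-technical audience.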
24
Change in GPA from grade 8 to 11
Programs 1, 3, and 8
had students with
highest GPAs in
8th grade.
GPA in 11th grade was
lower than in 8th grade
for students in these
3 programs.
25
Predicting 2010 test score based on 2009
score
Dots above the line represent students
who scored higher than predicted in 2010.
Dots below the line represent students
who scored lower than predicted.
26
Predicting 11th grade GPA based on 8th
grade GPA
Dots above the line represent students
whose GPA was higher than predicted in 2010.
Dots below the line represent students
whose GPA was lower than predicted.
27
Regression analysis uses prior performance
along with other student characteristics
to estimate each student’s predicted performance
in 2009-10.
In this analysis, programs 2-8 are compared to
program 1.
A positive regression coefficient means that, on average,
students in that program exceeded their predicted performance
by more than students in program 1 did.
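One way to set up such a comparison is sketched below, assuming a single prior score plus a 0/1 indicator for each program other than program 1. The data are made up so the answer is known in advance, and the solve and ols helpers are a plain textbook least-squares routine, not the software used for the results in this presentation. Other student characteristics would enter as additional columns.

```python
def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [v - factor * p for v, p in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

def ols(X, y):
    """Least-squares coefficients from the normal equations (X'X) beta = X'y."""
    k = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve(XtX, Xty)

# Hypothetical students: (program, prior score, current score)
data = [
    (1, 300, 305), (1, 340, 345), (1, 380, 385),
    (2, 300, 295), (2, 340, 335), (2, 380, 375),
    (8, 300, 315), (8, 340, 355), (8, 380, 395),
]
programs = [2, 8]  # program 1 is the omitted reference group

# Columns: intercept, prior score, one 0/1 indicator per non-reference program.
X = [[1.0, float(prior)] + [1.0 if prog == p else 0.0 for p in programs]
     for prog, prior, _ in data]
y = [float(current) for _, _, current in data]

beta = ols(X, y)

# Each program coefficient is that program's value added relative to program 1.
program_effects = dict(zip(programs, beta[2:]))
```

In this made-up data, program 2 students score 10 points lower and program 8 students 10 points higher than program 1 students with the same prior score, and the coefficients recover those gaps. Real data would also need standard errors to judge whether a gap is statistically significant.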
28
Value added results for test
scores
Only program 8 had positive
value added compared to
program 1.
The only statistically significant
differences from program 1
were in programs 2 and 4, both
negative. In these two programs,
students scored significantly
lower than predicted.
29
Value added results for GPA
Programs 3, 6, and 8 were
significantly different
from program 1.
Average GPA was lower
than predicted
in these three programs.
30
Questions for this school
• Why did juniors’ GPA in 2009-10 fall below
prediction in programs 3, 6, and 8?
• Why did juniors’ test scores in English
language arts fall below prediction in
programs 2 and 4?
• Important to see whether these patterns
persist for more than one year.
31
Conclusion
• Academy National Standards of Practice: “It
is important to gather data that reflects
whether students are showing improvement
and to report these accurately and fairly to
maintain the academy’s integrity.”
• Measuring value added will keep academies
in the forefront of evidence-based practice
32
???
What questions
do you have now?
33