Grades as an Incentive in Introductory Economics
Lester Hadsell
Division of Economics and Business
College at Oneonta
State University of New York
Oneonta, NY 13820
hadsell@oneonta.edu
Phone: (607) 436-2448
Acknowledgments
I thank Scott England, Scott Houser, Pete Schuhmann, and participants at the 2008 AEA meetings in
New Orleans for helpful comments, and Maureen Cashman and Mary Ellen Mack for assistance in
data gathering.
ABSTRACT
This paper reports results of investigations performed over two semesters examining the
effects of deemphasizing grades in Introductory Microeconomics. Each semester the author
taught one class using a typical grading policy in which all papers and exams were graded on
a 100-point scale and one class in which all papers and exams were graded as either
satisfactory or unsatisfactory (with the opportunity to revise and resubmit). Students in the
S/U sections performed no worse and in some respects performed marginally better on
several measures of learning and effort, including exam scores, homework completion, and
attendance. Students in the S/U sections also reported higher satisfaction with several facets of the
course and instruction.
I. INTRODUCTION
Grades in higher education serve several functions. Chief among these, they inform
students of their level of understanding; convey to potential employers or graduate school
admissions officers the academic achievements of applicants; and provide an incentive for
students to exert effort and learn. This last role is particularly salient to economists who tend
to place great value on incentives, in general and in classroom settings (as described in a
recent survey of economics faculty by Hadsell and MacDermott, 2009).1 As one of many
examples of this emphasis, Buckles and Hoyt (2006, p. 76) in their chapter on using active
learning techniques in large lecture classes, encourage "regularly graded activities" during
class to "provide students with very persuasive incentives to attend class, and get actively
involved in learning."
Several recent studies in the economics education literature have reported benefits of
graded work in terms of increasing student achievement. Grove and Wasserman (2006), for
example, find that freshmen whose homework assignments were graded scored better on
1
The survey reports responses from 816 economics faculty from across the U.S. Among the findings: 91
percent agreed or strongly agreed with the statement, “I think it is useful to use grades as incentives to increase
student performance.”
exams than freshmen whose assignments were not graded and Betts and Grogger (2003)
report that more stringent high school grading policies were correlated with higher test scores
on standardized tests. Cherry and Ellis (2005) find that using a competitive grading system
(in which the number of As, Bs, and so on is restricted) led to higher average exam scores in
introductory macroeconomics.
Yet these benefits are not without limits. Grove and Wasserman note that
upperclassmen did not experience the same benefits of graded homework that freshmen did.
Betts and Grogger find that the long-term effect of more stringent grading policies was
negligible – with the exception of minority students, for whom the effects were negative.
And Cherry and Ellis find that only high-achieving students benefit (low achieving students
perform no better under the competitive conditions).2 Further, Dickie (2006) reports that
classroom experiments with grade inducements led to lower learning outcomes and student
interest compared to experiments without the grade incentives.
The limitations of extrinsic rewards such as grades have been studied extensively in
educational psychology. In their meta-analysis of more than 100 empirical studies from the
educational psychology literature, Deci et al. (1999) conclude that grades are helpful in
getting students to do unpleasant work, but are ineffective and even harmful in certain
circumstances – for example, when they are seen as controlling or when they are used as
an incentive to get students to do work they otherwise enjoy. Examples of suboptimal outcomes
include decreased learning, reduced effort, increased anxiety, and preference for less
2
Cherry and Ellis state (p. 9) that “Results indicate rank-ordering may eliminate the incentive for high
performing students to "stop" once they achieve a stated objective….” An alternative approach to improving the
performance of high-achieving students is to simply raise the criterion-referenced requirements for the higher
letter grades (make the minimum level of achievement (grade) necessary for an A be 95 instead of 93, for
example). This alternative would not carry any of the negative baggage associated with competitive grading (as
discussed in the next section).
challenging tasks. In the instances when extrinsic rewards and intrinsic interest coincide, the
extrinsic rewards sometimes crowd out the intrinsic value of the task, reducing the student’s
interest. Motivational crowding out, as it has been called, has received wide empirical
support in psychology and economics (e.g., Frey and Jegen, 2001), and is being applied by
economists in labor market research (Frey, 1998; Murdock, 2002; Falk and Kosfeld, 2006)
and experimental settings (Frey and Oberholzer-Gee, 1997; Frey, Oberholzer-Gee, and
Eichenberger, 1996; Mellström and Johannesson, 2008; Heyman and Ariely, 2004; James,
2005).
In this paper, I further explore the effectiveness of grades as an incentive. I taught two
sections of introductory microeconomics each semester for two semesters, holding all aspects
of the course as similar as possible within semester, except for the grading structure. Each
semester, in one section (the control), a typical grading policy was followed: all assignments
and exams were graded on a 100-point scale, with a weighted average of assignment and
exam grades constituting the course grade. In the other section (the treatment), grades were
deemphasized: assignments and exams were graded either satisfactory (passing on exams,
“C+” or higher on assignments) or unsatisfactory, with the opportunity to retake exams and
resubmit assignments. Course grades were determined by the amount of satisfactory course
work, as explained in detail below.
With the less stringent grading in the treatment section, we might expect students to
reduce their effort, leading to lower scores on exams and lower quality assignments (Cherry
and Ellis, 2005, provide a detailed analysis of such expectations). Conceivably, a student
could barely meet the minimum requirement for all exams and assignments and still earn an
A for the course. In fact, I find that average exam scores are no lower in the treatment
sections and scores in the lower end of the distribution in the treatment sections marginally
exceed those of the control. Furthermore, homework submission rates are higher and more
consistent throughout the semester for the treatment sections, while quality was about the
same. Attendance is also found to be higher for the treatment section for one semester and no
different for the other. In short, deemphasizing grades did not hurt student performance and
in some ways increased it. Further, students’ evaluation of the course indicates that they view
the treatment sections as more organized and fairer compared to the control and they rate
teaching effectiveness higher.
II. EXPERIMENTAL DESIGN
This investigation was undertaken during the Fall 2007 and Fall 2008 semesters at a
small, state-operated liberal arts college in the northeastern U.S. Each semester, the author
taught two sections of Introductory Microeconomics. In one section (the control section), all
papers and exams were graded on a 100-point scale, with course grade determined by a
weighted average of these. In the other section (the treatment section), all papers and exams
were graded as either Satisfactory or Unsatisfactory (with the opportunity for resubmission
and a subsequent Satisfactory). “Satisfactory” meant passing (65% or higher) on exams and
“C+” or higher on assignments. The number of satisfactory items determined the student’s
course grade, as summarized in Table 1 and described in detail below. Unsatisfactory exams
and assignments were not counted against the student (as long as they reached the requisite
number of satisfactory exams/papers) for the semester.
[Table 1 goes here]
Classes met for 50 minutes three times per week (MWF) in the Fall 2007 semester,
with the treatment section (1 pm) immediately preceding the control section (2 pm). In Fall
2008, the sections met twice per week for 75 minutes each day (T, TH) with the order of
control-treatment sections reversed so that the control section (10:30 am) preceded the
treatment section (12:30 pm). Within each semester, both sections used the same textbook,
covered the same topics, had the same lectures, and had the same assignments and exams.
Grading details for the first iteration - Fall 2007
During the Fall 2007 semester, students in both sections were provided the same 13
weekly assignments (all of approximately the same difficulty), four longer essays (of up to 4
pages each), and two exams, each with 15 multiple choice, 10 matching, and several short
answer questions and problems. The weekly assignments were relatively easy, consisting of
basic questions requiring short verbal and quantitative answers. The 4-page essays required
more in-depth understanding and a small amount of research. In the control section, students
were required to submit 8 weekly assignments of their choice, one essay, and two exams,
constituting 30%, 20%, and 50% of the course grade, respectively.
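As an illustration, the control-section weighting just described can be sketched in Python. The function name is hypothetical, and two details are assumptions not stated in the text: items within a category are averaged with equal weight, and any of the 8 required weekly assignments that were not submitted count as zero.

```python
REQUIRED_ASSIGNMENTS = 8  # weekly assignments required in the Fall 2007 control section


def control_course_grade_2007(assignment_scores, essay_score, exam_scores):
    """Weighted course grade (0-100): 30% assignments, 20% essay, 50% exams.

    Assumptions (not stated in the paper): equal weights within each category,
    and missing required assignments are scored as zero.
    """
    if len(assignment_scores) > REQUIRED_ASSIGNMENTS:
        raise ValueError("at most 8 weekly assignments count toward the grade")
    if len(exam_scores) != 2:
        raise ValueError("the Fall 2007 course had exactly two exams")
    # Dividing by the required count treats unsubmitted assignments as zeros.
    assignment_avg = sum(assignment_scores) / REQUIRED_ASSIGNMENTS
    exam_avg = sum(exam_scores) / len(exam_scores)
    return 0.30 * assignment_avg + 0.20 * essay_score + 0.50 * exam_avg
```

Under these assumptions, a student who skips required assignments loses points on every missing item, which is the kind of cost the treatment section was designed to remove.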
In the treatment section, the only A-E grades assigned were course grades. Students
could choose the number and types of assignments and exams to complete, depending on
their desired course letter grade (see Table 1). Likewise, students could choose the level of
difficulty on the exam by successfully completing the entire exam for an “Advanced”
designation (required for an A in the course) or just the multiple choice and matching
sections for a “Basic” designation (required for B or C in the course). Unsatisfactory
assignments and exams could be resubmitted or retaken for credit (with a new set of
questions on the exams).3 Greater effort, as measured by the number of satisfactory
assignments and exams, translated into a higher course grade. To earn an A for the course, a
student in the treatment section had to submit 10 satisfactory weekly assignments, two
satisfactory essays, and two “Advanced” exams. To earn a B the requirements were 7
assignments, 0 essays, and 2 “Advanced” exams (or 7 assignments, 2 essays, and 2 “Basic”
exams), and so on. Unsatisfactory work did not count in the student’s course grade.
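The A and B requirements stated above can be expressed as a simple rule. The following is a minimal sketch with hypothetical names; it covers only the two grade levels spelled out in the text (lower grades are defined in Table 1 and are not reproduced here), and it assumes that a satisfactory "Advanced" exam also satisfies a "Basic" exam requirement.

```python
def treatment_course_grade_2007(assignments, essays, advanced_exams, basic_exams):
    """Return 'A' or 'B' from counts of satisfactory work, or None otherwise.

    None means the counts fall below the B requirements; the rules for lower
    grades are in Table 1 of the paper and are not encoded here.
    Assumption: an Advanced exam also counts toward a Basic exam requirement.
    """
    # A: 10 weekly assignments, 2 essays, and 2 Advanced exams.
    if assignments >= 10 and essays >= 2 and advanced_exams >= 2:
        return "A"
    # B, route 1: 7 assignments, no essays, 2 Advanced exams.
    route_advanced = assignments >= 7 and advanced_exams >= 2
    # B, route 2: 7 assignments, 2 essays, 2 exams at the Basic level or above.
    route_basic = (
        assignments >= 7 and essays >= 2 and advanced_exams + basic_exams >= 2
    )
    if route_advanced or route_basic:
        return "B"
    return None
```

Only counts of satisfactory items enter the rule; unsatisfactory attempts never appear as an argument, which mirrors the statement that they did not count in the course grade.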
Grading details for the second iteration - Fall 2008
During the Fall 2008 semester, students in both sections were provided identical
weekly assignments and four exams consisting exclusively of multiple choice questions. The
first three exams contained either 25 or 30 questions covering material since the previous
exam. The fourth exam was a comprehensive final exam with 37 questions.
In the control section, 25% of the course grade was based on the average grade of six
assignments and 75% was based on the average of the three highest exam scores. The final
exam was optional. If taken, the grade on the final exam replaced all lower prior scores (the
final exam grade was dropped if it was lower than the previous low score).4 In the treatment
section, students were asked to submit either 4 or 8 satisfactory assignments and 2, 3, or 4
satisfactory exams, depending on their desired course grades. This is summarized in Table 1.
The retake for exam 1 was in the week after exam 1; the retakes for exams 2 and 3 occurred
at the same time as exam 4. There was no retake option on exam 4.
Grades were deemphasized in the treatment section in the sense that (a) no 0-100 or
A-E grades were reported to students on individual assignments or exams; (b) poor
3
A student who attempted an “Advanced” exam (i.e., answered the short answer questions) but did not pass that
portion could retake the exam for an “Advanced” designation.
4
Thus, the final exam acted as a de facto retake.
(“unsatisfactory”) work did not count against the student; (c) retakes or resubmissions could
replace poor work; and (d) only a minimum level of quality was necessary to earn full credit.
Participants
Data describing the students are shown in Table 2. In Fall 2007, 88 students
participated: 48 in the control section and 40 in the treatment section. In Fall 2008, 108 students
participated: 54 in each section. The students were of traditional college age (18-22) and
were from all class years, freshman to senior. The most common major was Business
Economics (with concentrations in Marketing, Management, Finance, and Management
Information Systems), for which the course was one of four required economics courses (two
introductory and two intermediate).5 The remaining students were of various majors fulfilling
a one or two course requirement for their major, minor, or general education. The
characteristics of students across sections were similar in terms of major, academic ability,
academic load, and gender. The lower portion of Table 2 shows the sample of students used
in the regression analysis to be discussed later. This sample is restricted by availability of
SAT scores. Table 2 also contains section averages for exams (initial score, not retakes),
number of assignments submitted, and levels of interest and enjoyment (surveyed at the end
of the semester).
[Table 2 goes here]
Students did not know the grading structure for the sections before classes began.6
Further, because all sections were fully subscribed (including two other sections each
semester taught by other instructors), students’ ability to switch sections was severely limited
5
The Economics major is distinct from the Business Economics major.
6
In fact my name was not listed on the master course schedule when most students registered for Fall 2007.
during the add/drop period by the economics department, which approves all adds once a
course is fully registered (although students could drop the course without permission).
Records from the college Registrar covering add/drop week (first week of classes) indicate
that, in Fall 2007, one student in the treatment section dropped the course and one added it
(out of 40 students total registered), while in the control section, three dropped and three
added (out of 48 total). In Fall 2008, no students added or dropped the treatment section,
while in the control, one student added and one dropped (out of 54 total). No student either
semester switched from one section to the other. Thus, there is scant evidence of students
registering for the class that fit their grading preference. In this respect the assignment of
students resembles a natural experiment with random placement. Another key feature of this
study is that the instructor was the same for both control and treatment sections. As such,
differential effects between control and treatment sections due to instructor qualities are
removed.
III. RESULTS
A. Learning Outcomes
The effect on measurable learning outcomes, namely exam scores, is examined first
by looking at the average and then the distribution. Regression analysis of the effects of the
treatment grading policy on exam performance is based on the following specification
(subscripts indicating student are omitted), designed to isolate the effects of the treatment on
exam performance while controlling for differences in ability, demographic characteristics,
academic purpose and load:
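\[
G = \beta_0 + \beta_1\,\text{Section} + \beta_2\,\text{Gender} + \beta_3\,\text{Gender*Section}
  + \beta_4\,\text{Fresh} + \beta_5\,\text{Soph} + \beta_6\,\text{Junior} + \beta_7\,\text{Buseco}
  + \beta_8\,\text{Buseco*Section} + \beta_9\,\text{SATV} + \beta_{10}\,\text{SATM} + \beta_{11}\,\text{HRS},
\]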
where G is the percent correct on multiple choice and matching questions (0-100),
Section equals 0 for the control section, 1 for the treatment section,
Gender equals 0 for females, 1 for males (given findings of gender differences in
economics classes, e.g., Borg and Stranahan, 2002, and references therein),
Gender*Section equals 1 for males in the treatment section, 0 otherwise (given
differential responses by gender to grading policies, as reported, e.g., by
Jensen and Owen, 2001; Kaenzig et al., 2007; and Lammers et al., 2005),
Fresh, Soph, and Junior are class year dummies,
Buseco equals 1 for Business Economics majors, 0 otherwise,
Buseco*Section equals 1 for Business Economics majors in the treatment section, 0
otherwise (assuming that the importance business majors place on grades
may differ from that of non-majors),
SATV is the student’s score on the verbal portion of the SAT,
SATM is the student’s score on the math portion of the SAT, and
HRS is the number of credit hours the student attempted during the semester.
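To make the coding of the dummy and interaction terms concrete, the sketch below builds one row of regressors for a single student, in the order of the coefficients β0 through β11. The function and argument names are hypothetical, the example values are invented for illustration, and seniors are assumed to be the omitted class-year category because only three class-year dummies appear in the specification.

```python
REGRESSORS = [
    "Const", "Section", "Gender", "Gender*Section", "Fresh", "Soph",
    "Junior", "Buseco", "Buseco*Section", "SATV", "SATM", "HRS",
]


def design_row(treatment, male, class_year, buseco, satv, satm, hrs):
    """Return the regressor values for one student, ordered as in REGRESSORS.

    class_year is one of 'freshman', 'sophomore', 'junior', 'senior';
    seniors are assumed to be the omitted (baseline) category.
    """
    if class_year not in ("freshman", "sophomore", "junior", "senior"):
        raise ValueError("unknown class year: %r" % class_year)
    section = 1 if treatment else 0
    gender = 1 if male else 0
    major = 1 if buseco else 0
    return [
        1,                                    # intercept
        section,
        gender,
        gender * section,                     # 1 only for males in the treatment
        1 if class_year == "freshman" else 0,
        1 if class_year == "sophomore" else 0,
        1 if class_year == "junior" else 0,
        major,
        major * section,                      # 1 only for Buseco majors in the treatment
        satv,
        satm,
        hrs,
    ]
```

For a male sophomore in the treatment section who is not a Business Economics major, `design_row(True, True, "sophomore", False, 520, 560, 15)` sets Section, Gender, Gender*Section, and Soph to 1 and both Buseco terms to 0. Stacking such rows for all students gives the matrix on which G would be regressed.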
The regression is run separately for each exam and for the exam average each
semester. Results are shown in Table 3. Students in the treatment sections scored no lower
than students in the control despite the reduced grade incentive. In Fall 2007, exam scores in
the treatment section were 6.2 points to 7.3 points higher on a 100-point scale, of noteworthy
magnitude, although these estimates fall just shy of standard levels of statistical significance.
In Fall 2008, the difference in the overall exam average is positive but not statistically
significant (3.5 points, t = 1.03), while the difference for exam 4 is almost 7.5 points (just
shy of statistical significance, t = 1.62). The results provide support for the notion that a
grading structure of the type used in the treatment section does not lead to reduced student
learning, on average.
[Table 3 goes here]
One might expect the distribution of grades to be altered by a grading policy that
awards full credit for simply meeting a minimum standard. At least two views provide a
rationale for an expectation that the distribution will be compressed in the treatment section.
For one, students on the upper end of the distribution decrease effort as additional effort
much beyond what ensures passing is wasted (in terms of translating into a higher grade),
while students on the lower end of the distribution, near the minimum acceptable grade, will
increase study effort so as to avoid a lower course grade or the additional work and bother
associated with a retake. A second view, based on the psychological phenomena of
self-efficacy (Bandura, 1997) and fear of failure (Conroy, 2001), implies that scores at the bottom
will increase if (1) students believe they can successfully complete the task, or (2) students’
fear of receiving a poor grade is reduced, thereby reducing anxiety. Certainly, the opportunity
to retake the exam if the student does not pass it the first time may reduce the level of anxiety
associated with exam taking. This may be especially true (and beneficial) for students who
have had little prior academic success (i.e., those in the lower end – to middle – of the grade
distribution) (Taylor and Perry, 2005).7
7
This benefit may extend to the middle of the distribution if an acceptable grade on a 0-100 scale for the
student is higher than simply passing.
Figure 1 shows the distribution of scores for each of the exams for each semester.
(The appendix contains the kernel density of the distributions shown in Fig. 1.) Scores at the
lower end of the distribution appear to be generally higher in the treatment section while
scores at the higher end are generally about the same. Comparison of the 10th and 90th
percentiles (not shown) indicates that for every exam except exam 4, Fall 2008, the score for
the treatment section at the 10th percentile exceeds that of the control. Further, at the 90th
percentile, the treatment score exceeds the control in three of seven cases.
[Figure 1 goes about here]
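A comparison of the 10th and 90th percentiles of this kind can be sketched with the Python standard library. The function names are hypothetical, and the interpolation rule is an assumption: `statistics.quantiles` uses its default "exclusive" method here, which may differ from the percentile definition behind the figures reported above.

```python
import statistics


def tail_percentiles(scores):
    """Return the (10th, 90th) percentiles of a list of exam scores."""
    deciles = statistics.quantiles(scores, n=10)  # nine cut points: 10th ... 90th
    return deciles[0], deciles[-1]


def compare_tails(control_scores, treatment_scores):
    """Treatment minus control at the 10th and 90th percentiles."""
    c10, c90 = tail_percentiles(control_scores)
    t10, t90 = tail_percentiles(treatment_scores)
    return {"p10_diff": t10 - c10, "p90_diff": t90 - c90}
```

A positive `p10_diff` corresponds to the pattern described above, in which the treatment section scores higher at the lower end of the distribution.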
A more rigorous analysis is the Wilcoxon (Mann-Whitney) rank-based nonparametric
test of the hypothesis that the subgroups have the same general distribution. Specifically, the
Wilcoxon test examines the comparability of mean ranks across subgroups. Based on results
from applying this test to each exam, as shown in Table 4, we cannot reject the hypothesis
that the distributions are the same. The implication is that scores in the treatment section
were distributed no differently than scores in the control. In other words, there is no evidence
that the alternative grading structure induced students at the upper end of the distribution to
slack off. There also is little evidence that students in the lower end performed better under
the alternative grading structure.8
[Table 4 goes about here]
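For readers unfamiliar with the rank-based test, the following is a minimal sketch of a two-sided Wilcoxon (Mann-Whitney) rank-sum test. It is not the software used for Table 4; it assumes the large-sample normal approximation with a tie correction and no continuity correction, and all names are hypothetical.

```python
import math


def midranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1  # mean of 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks


def rank_sum_test(control, treatment):
    """Return (U, z, two-sided p) for the treatment sample."""
    n1, n2 = len(control), len(treatment)
    pooled = list(control) + list(treatment)
    n = n1 + n2
    ranks = midranks(pooled)
    rank_sum_treatment = sum(ranks[n1:])
    u = rank_sum_treatment - n2 * (n2 + 1) / 2
    mean_u = n1 * n2 / 2
    # Tie correction: subtract sum(t^3 - t) over groups of tied values.
    counts = {}
    for value in pooled:
        counts[value] = counts.get(value, 0) + 1
    tie_term = sum(t ** 3 - t for t in counts.values())
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        raise ValueError("all observations are identical; the test is undefined")
    z = (u - mean_u) / math.sqrt(var_u)
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail area
    return u, z, p_value
```

A large p-value from `rank_sum_test(control_scores, treatment_scores)` corresponds to the conclusion above that the hypothesis of equal distributions cannot be rejected.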
B. Effort – Homework, Attendance, and Hours of Study
The effects on effort are examined by comparing homework, attendance, and hours of
study across sections. With regard to homework, the belief is that the alternative grading
8
The findings here are similar to Grove and Wasserman (2006) who note (p. 451) that, in their study, scores
were not changed for “academically above- or below-average students, or of any other category of students
[besides freshmen].”
system may induce more effort if it reduces anxiety concerning grades, because students
were able to get full (100%) credit more easily and students could resubmit unsatisfactory
work. This view is based on research findings from educational psychologists suggesting that
students’ preference for challenging tasks is inhibited when their work is graded (Harter,
1978; Brooks et al., 1998) and that some students react with reduced effort when they fear
their work will receive a poor grade (Thompson, 1994; Thompson and Perry, 2005).9 Thus, a
policy that deemphasizes grades on the homework should elicit greater effort on the
assignments, translating into more satisfactory assignments (as was seen in Grolnick and
Ryan, 1987; Ames and Ames, 1991).10 In the context of this study, students in the control
section who fear a poor grade will wait until the last possible opportunity to submit work
(given that they have choice over which assignments to submit), or simply will not submit
required work. Substandard work in the treatment group costs nothing since an
‘unsatisfactory’ is not factored into the course grade, and students have the opportunity to
resubmit the work for credit. In the control section, a ‘low’ grade (which could be a 50, 70,
… or 80, depending on the student) is permanent (students were not given the opportunity to
9
Students who fear falling short may simply withhold effort, even if it ensures failure, rather than put forth
effort and potentially fail. In the language of educational psychology, these students are seeking to protect their
self-worth. These students are risk avoidant with the express purpose of avoiding public acknowledgement of
inability (the student would prefer to blame failure on lack of effort rather than be forced to admit lack of
ability).
10
In Grolnick and Ryan (1987) students were asked to complete a reading assignment. One group was told that
they would later be asked questions about it, but the experimenter emphasized that there would be no
evaluations or grades. A second group was told to read the material and learn it because they would be graded
on performance. In a follow-up eight days later, the first group showed greater interest, rote recall, and
conceptual integration. Similar findings are reported by Moeller and Reschke (1993) and Hahn et al. (1989).
Ames and Ames (1991) also report that perceived homework control had a strong effect on completion rates. In
their study, two classes were assigned homework. In one class, work was graded A through F and counted for
30% of their course grade. In the other class, students were told to try their best (complete as much as they
could in a limited time) and questions about the homework would be covered in class the next day. These
assignments were graded either satisfactory or unsatisfactory and returned for students to redo or complete. The
homework counted for 10% of the course grade. The latter class submitted more assignments.
redo assignments).11 In the treatment section, the emphasis is on effort: an “A” for the course
is achievable for all students, regardless of the number of times they receive an
unsatisfactory.
Total number of weekly homework submissions for each student was recorded each
semester (see Table 2). In Fall 2007, students in the treatment section submitted significantly
more assignments on average (7.95) than students in the control section (6.73; p < 0.01).12
This difference could reflect that students in the control section were
required to submit 8 assignments whereas students in the treatment were asked to submit 4,
7, or 10, depending on their desired course grade. Interestingly, though, 19 students (out of
48 total) in the control section submitted fewer than the required 8 assignments (and six
submitted 4 or fewer). In the treatment, only five students submitted fewer than 7
assignments (the number required for at least a B for the course). In Fall 2008, there is no
statistical difference in average number of assignments submitted (p=0.14).
In addition to these end of semester totals, the week-to-week number of submissions
for the class as a whole was recorded for the two classes during Fall 2008 (this was not done
for Fall 2007), shown in Figure 2. Students in the treatment section submitted assignments
much earlier in the semester and more consistently compared to the control section. As
suggested by educational psychologists, expectations of success at a task are influenced, in
part, by past experiences of success and initial feedback on success (Svinicki, 2005). The S/U
grading and the ability to resubmit work until it was satisfactory virtually guaranteed success.
[Figure 2 goes here]
11
Even with the opportunity to resubmit, 0-100 grading places a cost (the number of points deducted) on
anything short of perfection.
12
These are the weekly assignments, not the longer essays.
A second measure of effort, class attendance, was required but was not part of
students’ course grade either semester. The attendance rate in Fall 2007 was significantly
higher (p = 0.03) in the treatment section (87.3%) compared to the control section (81.9%).
The Fall 2008 rate was not significantly different between the sections (p = 0.53), at 76.7% in
the control section and 75.2% in the treatment section.
A third measure of effort, number of hours students studied (during an average week
and during the week prior to an exam) was self-reported by students in an end-of-semester
survey each year. There is no evidence that students in the treatment group, as a whole,
studied less than students in the control. In 2007, students in the control group reported
virtually the same (p=0.64) study time during an average week (2.97 hours) compared to the
treatment group (2.74). Students in the control reported less study time in the week prior to
an exam (5.27 hours versus 6.36 for the treatment), although this difference was not
statistically significant (p = 0.14).
The findings are similar for 2008: students in the control group reported statistically similar
study time during an average week (2.58 hours) and in the week prior to an exam (5.60
hours) compared to the treatment group (3.18, p = 0.23; and 6.35, p = 0.39). Of course, self-
reported data such as these should be viewed with caution, but as long as errors in reporting
are not systematically correlated with treatment, the conclusions regarding section differences
should hold.
In sum, there is no evidence that students in the treatment sections exerted less effort,
even when the extrinsic rewards were reduced.
C. Student affect
Research in educational psychology also finds a strong link between student emotion
and emphasis on performance. For example, Pekrun et al. (2006) find that a strong student
emphasis on grades is positively related to subsequent anxiety, hopelessness, and shame.
Likewise, Karabenick (2004), examining student behavior in 13 college psychology classes,
finds that students in classes with greater perceived emphasis on grades and interpersonal
performance comparisons were less likely to ask for help when needed. Nevertheless, student
interest and enjoyment (surveyed at the end of each semester) are virtually identical across
groups in the present study (see Table 2). Thus, deemphasizing grades did not, by itself,
generate the kinds of emotional returns that one may have expected by reviewing the
educational psychology literature.
[Table 5 goes about here]
On the other hand, students in the treatment section evaluated the course differently
along several metrics in end-of-semester evaluations. Both semesters, students in the
treatment section thought the course was more organized, the grading fairer, and the teaching
more effective (see Table 5). That students would evaluate teaching as more effective does
not necessarily imply that better ratings were the result of higher grades (in fact, the
percentage of As and Bs between sections was similar in Fall 2007, when the difference in
effectiveness rating was the greatest).13 Students may perceive increased teacher
effectiveness when they are able to focus on their studies, with less attention paid to grades;
13
In 2007, 21 percent of students were awarded As in the control vs. 20 percent in the treatment and 26 percent
Ds and Es in the control vs. 0 percent in the treatment. In 2008, 17 percent As awarded in the control vs. 28
percent in the treatment and 22 percent Ds and Es in the control vs. 17 percent in the treatment.
and when faculty are not seen as dispensers of grades, controlling students’ futures, but rather
as guides helping students attain a level of learning.14
Students thought the S/U grading was less demanding in Fall 2008, while they
thought it was equally demanding in Fall 2007. They were probably correct for the former:
too little may have been expected of them, and the higher percentage of As and Bs earned in
the treatment section that semester supports this view (given the equality of exam scores).
Finally, students perceived no difference in course rigor either semester, which supports the
contention that learning expectations were the same, even if the grading structure was not.
IV. CONCLUDING REMARKS
This study set out to examine the effects of deemphasizing grades in Introductory
Microeconomics along three metrics: student learning, effort, and affect. Conducted over two
semesters, with 196 students, the primary observations are:
1. There is no evidence that deemphasizing grades leads to lower exam performance,
either in terms of averages or distribution, although there is weak evidence that
deemphasizing grades is associated with higher mean scores and marginally improved
performance by students at the lower end of the distribution.
2. There is no evidence that deemphasizing grades leads to less effort, in terms of
homework completion, study time, and class attendance. In fact, treatment section
homework submission rates and class attendance were higher in the first trial, and
homework submission rates were more consistent over the semester relative to the
control in the second trial.
14
As reported by Deci et al. (1999), noted earlier.
3. Students in the sections that deemphasized grades evaluated the course as more
organized, with fairer grading and more effective teaching.
The finding that deemphasizing grades did not lead to lower learning outcomes or effort
is noteworthy, given the high value economists generally place on extrinsic incentives.
Decades of research in educational psychology, as reviewed in a meta-analysis by Deci et al.
(1999), has shown that extrinsic incentives such as grades are useful – in the right situation
(non-controlling, unpleasant tasks), with the right students (those with an extrinsic
motivation). But the same research also reports potential harmful effects on student learning,
effort, and interest. Recent work in economics education, and in labor and experimental
economics, also reports ambiguous effects of extrinsic rewards.
Several caveats of the present research are noteworthy:
• Grades were not eliminated; they were deemphasized. This investigation concerns the
extent to which grades are used (i.e., the emphasis). Certainly, this paper is not
arguing for an elimination of evaluation of student work, or standards.
• Exam retakes may not be practical in large classes; thus the grading structure in this
study is not necessarily a prescription for others to follow. Grades can, however, be
deemphasized in other ways (e.g., Hadsell and MacDermott, 2009).
• Although deemphasizing grades did not lead to improved learning outcomes (at least
at statistically significant levels), the potential positive effects (on interest, persistence,
and preference for challenging tasks) must be considered when evaluating the net effects;
all are good opportunities for future research.
• The deemphasis of grades in the treatment sections had several components, primarily
the Satisfactory/Unsatisfactory grading and the opportunity to resubmit unsatisfactory
work. Each of these undoubtedly had an effect on students’ choices, effects that
should be the focus of future research.
In addition to reproducing the basic findings of the present study with a different set of
students and faculty, future research could study the longer-term effects. Alternative grading
criteria could also be investigated (e.g., a hybrid of traditional 0-100 grading and S/U
grading, or varying opportunities for retakes and resubmissions), as could subsets of the
treatment (e.g., isolating the effects of grading only homework S/U).
Effective teaching requires proper technique, interesting content, and appropriate
incentives. The topic of appropriate incentives remains largely under-explored in the
economics education literature. The role of grades as extrinsic incentives is well entrenched
in the psyche of economic educators. But this paper finds that deemphasizing grades did not
lead to a decline in measurable learning outcomes or effort. Given the benefits of
deemphasizing grades reported in the educational psychology literature, it is incumbent on
economic educators to further explore the effects of grades. By relying too much on grades as
a motivator we may be achieving suboptimal outcomes in much the same way that relying on
outdated, ineffective techniques and content leads to poor results.
REFERENCES
Ames, R. and C. Ames (1991) "Motivation and Effective Teaching." In B. F. Jones and L.
Idol (eds.) Educational Values and Cognitive Instruction: Implications for Reform. Hillsdale,
N. J.: Erlbaum.
Bandura, Albert (1997) Self-efficacy: The exercise of control. New York: Freeman.
Becker, William E., Michael Watts, and Suzanne R. Becker (eds.) (2006) Teaching
Economics: More Alternatives to Chalk and Talk. Cheltenham, UK: Edward Elgar.
Betts, Julian R. and Jeffrey Grogger (2003) "The Impact of Grading Standards on Student
Achievement, Educational Attainment, and Entry-Level Earnings." Economics of Education
Review 22(4): 343-352.
Borg, Mary O. and Harriet A. Stranahan (2002) “Personality Type and Student Performance
in Upper-Level Economics Courses: The Importance of Race and Gender.” Journal of
Economic Education 33(1): 3-14.
Brooks, S.R., Freiburger, S.M., & Grotheer, D.R. (1998). Improving elementary student
engagement in the learning process through integrated thematic instruction. Unpublished
master's thesis, Saint Xavier University, Chicago, IL. (ERIC Document Reproduction
Service No. ED 421 274)
Buckles, Stephen and Gail Mitchell Hoyt (2006) “Using Active Learning Techniques in
Large Lecture Classes.” in Becker et al. (2006).
Cherry, Todd L. and Larry V. Ellis (2005) “Does Rank-Order Grading Improve Student
Performance? Evidence from a Classroom Experiment.” International Review of Economics
Education 4(1): 9-19.
Conroy, David E. (2001) “Progress in the Development of a Multidimensional Measure of
Fear of Failure: The Performance Failure Appraisal Inventory (PFAI).” Anxiety, Stress, and
Coping 14: 431-452.
Deci, Edward, Richard Koestner, and Richard Ryan (1999) “A Meta-analytic Review of
Experiments Examining the Effects of Extrinsic Rewards on Intrinsic Motivation.”
Psychological Bulletin 125(6): 627-668.
Dickie, Mark (2006) "Experimenting: Does It Increase Learning in Introductory
Microeconomics?" Journal of Economic Education 37(3): 267-288.
Falk, Armin and Michael Kosfeld (2006) “The Hidden Costs of Control.” American Economic Review 96(5):
1611-1630.
Frey, Bruno (1998) Not Just for the Money: An Economic Theory of Personal Motivation.
Glos: Edward Elgar.
Frey, Bruno and Reto Jegen (2001) “Motivation Crowding Theory.” Journal of Economic
Surveys 15(5): 589-611.
Frey, Bruno and Felix Oberholzer-Gee (1997) “The Cost of Price Incentives: An Empirical
Analysis of Motivation Crowding-Out.” American Economic Review 87(4): 746-755.
Frey, Bruno, Felix Oberholzer-Gee, and Reiner Eichenberger (1996) “The Old Lady Visits
Your Backyard: A Tale of Morals and Markets.” Journal of Political Economy 104(6): 1297-
1313.
Grolnick, W. and R. Ryan (1987) “Autonomy in Children's Learning: An Experimental and
Individual Difference Investigation.” Journal of Personality and Social Psychology 52(5):
890-898.
Grove, Wayne A. and Tim Wasserman (2006) “Incentives and Student Learning: A Natural
Experiment with Economics Problem Sets.” American Economic Review 96(2): 447-452.
Hadsell, Lester and Raymond MacDermott (2009) “Faculty Perceptions of Grades: Results
from a National Survey of Economics Faculty.” Mimeo (accessed at
http://employees.oneonta.edu/hadsell/) on December 1, 2009.
Hahn, Sidney, Tamara Stassen, and Claus Reschke (1989) "Grading Classroom Oral
Activities: Effects on Motivation and Proficiency." Foreign Language Annals 22: 241-252.
Harter, Susan (1978) "Pleasure Derived from Challenge and the Effects of Receiving Grades
on Children's Difficulty Level Choices." Child Development 49: 788-799.
Heyman, James and Dan Ariely (2004) “Effort for Payment: A Tale of Two Markets.”
Psychological Science 15(11): 787-793.
James, Harvey S. Jr. (2005) “Why did you do that? An economic examination of the effect of
extrinsic compensation on intrinsic motivation and performance.” Journal of Economic
Psychology 26: 549-566.
Jensen, Elizabeth J. and Ann L. Owen (2001) “Pedagogy, Gender, and Interest in
Economics.” Journal of Economic Education 32(4): 323-343.
Kaenzig, Rebecca, Eva Hyatt, and Stella Anderson (2007) “Gender Differences in College
of Business Educational Experiences.” Journal of Education for Business 83(2): 95-100.
Karabenick, S. (2004) “Perceived Achievement Goal Structure and College Student Help
Seeking.” Journal of Educational Psychology 96(3): 569-581.
Lammers, H. Bruce, Tina Kiesler, Mary T. Curren, Deborah Cours, and Brian Connett (2005)
“How Hard Do I Have to Work? Student and Faculty Expectations Regarding University
Work.” Journal of Education for Business 81(2): 210-213.
Mellstrom, Carl and Magnus Johannesson (2008) “Crowding Out In Blood Donation: Was
Titmuss Right?” Journal of the European Economic Association 6(4): 845-863.
Moeller, Aleidine J. and Claus Reschke (1993) “A Second Look at Grading and Classroom
Performance: Report of a Research Study.” The Modern Language Journal 77(2): 163-169.
Murdock, Kevin (2002) “Intrinsic motivation and optimal incentive contracts.” RAND
Journal of Economics 33(4): 650-671.
Pekrun, R., A. Elliott, and M. Maier (2006) “Achievement Goals and Discrete Achievement
Emotions: A Theoretical Model and Prospective Test.” Journal of Educational Psychology
98(3): 583–597.
Svinicki, Marilla D. (2005) "Student Goal Orientation, Motivation, and Learning." IDEA
Paper #41, IDEA Center, February.
Thompson, Ted (1994) “Self-worth protection: Review and implications for the classroom.”
Educational Review 46(3): 259-274.
Thompson, Ted and Zoe Perry (2005) “Is the Poor Performance of Self-Worth Protective
Students Linked with Social Comparison Goals?” Educational Psychology 25(5): 471-490.
Table 1. Course grading structure.

                      Control (graded) section    Treatment (S/U) section (number required*)
Fall 2007             Number    Weight            A        B        B        C
Weekly assignments    8         30%               10       7        7        4
Essay assignment      1         20%               2        0        2        0
Exams**               2         50%               2 Adv    2 Adv    2 Bas    2 Bas

Fall 2008             Number    Weight            A        B        B        C        C
Weekly assignments    6         25%               8        4        8        4        8
Exams                 3***      75%               4        4        3        3        2

* Minimum number required for each course letter grade.
** "Bas" (Basic) exam consists of multiple choice and matching questions. "Adv" (Advanced) exam is completion of
the Basic plus completion of short answer/essay questions.
*** Four exams were offered. The lowest exam score was dropped.
Table 2 - Summary Statistics

                        Fall 2007                        Fall 2008                        Grand
                        Control    Treatment  Total      Control    Treatment  Total      Total
Full Sample
No. of students         48         40         88         54         54         108        196
Percent Freshmen        8%         0%         5%         31%        28%        30%        18%
Percent Sophomore       33%        35%        34%        43%        43%        43%        39%
Percent Junior          33%        38%        35%        22%        17%        19%        27%
Percent Senior          26%        27%        26%        4%         12%        8%         16%
Percent Bus Econ        40%        38%        39%        37%        35%        36%        37%
Percent Male            69%        60%        65%        61%        80%        70%        68%
Semester hours          15.6       16.1       15.8       15.4       15.7       15.5       15.7
Exam 1 average          78%        79%        79%        77%        79%        78%        8.19
Exam 2 average          70%        76%        73%        71%        69%        70%        7.96
Exam 3 average          n/a        n/a        n/a        65%        67%        66%        0.66
Exam 4 average          n/a        n/a        n/a        67%        69%        68%        0.68
Assignments submitted   6.7        8.0        n/a        4.9        5.5        5.2        n/a
Interest**              0.52       0.51       0.51       3.9        3.8        3.8        n/a
Enjoyment***            6.1        6.8        6.5        3.7        3.8        3.7        n/a

Regression Sample*
No. of students         35         33         68         45         47         92         160
Percent Freshmen        6%         0%         3%         27%        26%        26%        16%
Percent Sophomore       29%        36%        32%        44%        40%        42%        38%
Percent Junior          37%        36%        37%        24%        19%        22%        28%
Percent Senior          29%        27%        28%        4%         15%        10%        18%
Percent Bus Econ        43%        39%        41%        36%        36%        36%        38%
Percent Male            63%        58%        60%        58%        81%        69%        66%
SAT verbal              540        531        536        526        539        533        534
SAT math                563        571        567        566        568        567        567
Semester hours          15.4       15.6       15.5       15.0       15.7       15.4       15.4
Exam 1 average          78%        80%        79%        77%        79%        78%        78%
Exam 2 average          73%        75%        74%        71%        68%        70%        72%
Exam 3 average          n/a        n/a        n/a        66%        67%        66%        66%
Exam 4 average          n/a        n/a        n/a        68%        70%        69%        69%
Assignments submitted   6.3        8.0        7.1        5.0        5.4        5.2        5.2
Interest**              0.36       0.57       0.47       3.8        3.7        3.8        n/a
Enjoyment***            5.9        6.8        6.4        3.6        3.7        3.7        n/a

* SAT scores were available for the number of students indicated under "Regression Sample."
** Interest is measured on a -1,0,1 (decreased, stayed same, increased) scale for Fall 2007
and a 1-5 scale (decreased a great deal to increased a great deal) for Fall 2008.
*** Enjoyment is measured on a 1-10 scale (decreased a great deal to increased a great deal) for Fall
2007 and a 1-5 scale (decreased a great deal to increased a great deal) for Fall 2008.
All exam data are for first attempts.
Table 3. Regression Results.
(t-statistics in parentheses)

                  Fall 2007                     Fall 2008
Explanatory
variable          Exam 1    Exam 2    Exam Ave  Exam 1    Exam 2    Exam 3    Exam 4    Exam Ave
SECTION           7.32      6.21      6.76      -0.83     0.67      2.18      7.47      3.50
                  (1.30)    (1.16)    (1.47)    (-0.18)   (0.14)    (0.4)     (1.62)    (1.03)
GENDER            13.06     -4.63     4.22      -6.19     -0.30     -9.78     -0.46     -1.83
                  (2.77)    (-1.03)   (1.10)    (-1.86)   (-0.09)   (-2.36)   (-0.13)   (-0.65)
GENDER*SECTION    -11.13    0.24      -5.44     2.75      -2.90     0.05      -5.91     -4.91
                  (-1.61)   (0.04)    (-0.96)   (0.51)    (-0.52)   (0.01)    (-1.07)   (-1.2)
FRESH             -5.49     -20.82    -13.16    -1.72     -0.99     -6.52     4.42      -2.20
                  (-0.51)   (-2.04)   (-1.5)    (-0.38)   (-0.21)   (-1.21)   (0.97)    (-0.66)
SOPH              -1.79     -7.02     -4.40     -2.45     -4.00     -6.12     1.60      -4.40
                  (-0.37)   (-1.54)   (-1.13)   (-0.59)   (-0.94)   (-1.24)   (0.39)    (-1.47)
JUNIOR            1.91      -6.59     -2.34     -3.60     -1.21     -1.02     -0.01     -4.52
                  (0.42)    (-1.51)   (-0.63)   (-0.8)    (-0.26)   (-0.18)   (0.00)    (-1.3)
BUSECO            -0.36     8.22      3.93      -3.66     6.01      5.36      -0.82     -0.03
                  (-0.07)   (1.68)    (0.94)    (-1.03)   (1.65)    (1.26)    (-0.21)   (-0.01)
BUSECO*SECTION    7.58      -13.26    -2.84     3.07      -4.97     2.16      -1.37     2.38
                  (1.16)    (-2.13)   (-0.53)   (0.65)    (-1.01)   (0.38)    (-0.27)   (0.64)
SEMHRS            0.24      0.36      0.30      -0.36     0.47      -1.11     -1.51     -0.75
                  (0.21)    (0.34)    (0.33)    (-0.43)   (0.54)    (-1.13)   (-1.84)   (-1.24)
SATV              0.09      0.07      0.08      0.07      0.04      0.07      0.02      0.05
                  (2.70)    (2.45)    (3.08)    (3.34)    (1.9)     (2.59)    (0.83)    (3.15)
SATM              0.00      0.06      0.03      0.01      0.03      0.05      0.05      0.04
                  (0.04)    (2.31)    (1.37)    (0.37)    (1.6)     (1.93)    (2.2)     (2.51)
Constant          19.16     -3.38     7.89      48.09     23.46     29.27     54.43     37.59
                  (0.67)    (-0.12)   (0.34)    (2.89)    (1.35)    (1.47)    (3.21)    (3.02)
Observations      68        68        68        92        90        87        87        81
Adj R²            0.13      0.22      0.16      0.13      0.05      0.16      0.05      0.19
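
As a reading aid for Table 3, the following minimal Python sketch screens the SECTION t-statistics against a two-sided 5% threshold. The 1.96 cutoff is the large-sample normal value and is an assumption made here for illustration (the exact Student-t critical value for these sample sizes is slightly larger). The dictionary and helper names are hypothetical and are not part of the original analysis.

```python
# t-statistics on the SECTION (treatment) dummy, copied from Table 3.
section_t_stats = {
    "Fall 2007 Exam 1": 1.30,
    "Fall 2007 Exam 2": 1.16,
    "Fall 2007 Exam Ave": 1.47,
    "Fall 2008 Exam 1": -0.18,
    "Fall 2008 Exam 2": 0.14,
    "Fall 2008 Exam 3": 0.40,
    "Fall 2008 Exam 4": 1.62,
    "Fall 2008 Exam Ave": 1.03,
}

# Assumption: large-sample two-sided 5% critical value (normal approximation).
CRITICAL_VALUE = 1.96


def significant_terms(t_stats, critical=CRITICAL_VALUE):
    """Return the labels whose absolute t-statistic reaches the critical value."""
    return [label for label, t in t_stats.items() if abs(t) >= critical]


flagged = significant_terms(section_t_stats)
print(flagged)  # prints []
```

Under this assumption no SECTION coefficient is flagged in either semester, which is in line with the observation that there is no evidence of lower exam performance in the treatment sections.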
Table 4. Wilcoxon (Mann-Whitney) test
for equal distribution across samples.

              Value    Probability
Fall 2007
  Exam 1      0.419    0.68
  Exam 2      0.983    0.33
Fall 2008
  Exam 1      0.679    0.50
  Exam 2      0.544    0.59
  Exam 3      0.903    0.37
  Exam 4      0.985    0.32
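
The Probability column in Table 4 appears consistent with a two-sided p-value from the standard normal distribution, if the Value column is read as the absolute z statistic of the Wilcoxon (Mann-Whitney) test under its large-sample normal approximation. That reading is an assumption, not something stated in the table. The sketch below checks it using only the Python standard library; all names are hypothetical.

```python
import math


def two_sided_normal_p(z):
    """Two-sided p-value for a standard normal z statistic: 2 * (1 - Phi(|z|))."""
    return math.erfc(abs(z) / math.sqrt(2.0))


# (Value, Probability) pairs copied from Table 4.
table4 = {
    "Fall 2007 Exam 1": (0.419, 0.68),
    "Fall 2007 Exam 2": (0.983, 0.33),
    "Fall 2008 Exam 1": (0.679, 0.50),
    "Fall 2008 Exam 2": (0.544, 0.59),
    "Fall 2008 Exam 3": (0.903, 0.37),
    "Fall 2008 Exam 4": (0.985, 0.32),
}

for label, (value, reported) in table4.items():
    computed = two_sided_normal_p(value)
    # Compare the recomputed p-value with the two-decimal figure in the table.
    print(f"{label}: computed {computed:.3f}, reported {reported:.2f}")
```

Every probability is well above conventional significance levels, so the test does not reject equal distributions of exam scores across the control and treatment sections.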
Table 5. End of semester evaluations.

                                    Differences
Measure                             Fall 2007    Fall 2008
Organization of course*             -0.30        -0.26
Fairness in grading**               -0.68        -0.97
Overall teaching effectiveness*     -0.60        -0.22
Demanding grading***                0.11         0.80
Rigor****                           -0.11        0.08

* Rating scale: 1 (excellent) to 5 (poor).
** Rating scale: 1 (very fair) to 5 (unfair).
*** Rating scale: 1 (high expectation) to 5 (low expectation).
**** Rating scale: 1 (rigorous and demanding) to 5 (not rigorous).
Differences measure the rating for treatment minus control.
Thus, a negative number indicates a better rating for the
treatment section.
Figure 1. Distribution of Exam Grades.
[Figure: each panel plots No. of Students against Grade (ranges 41-50 through 91-100) for the
Experimental and Control sections. Panels: Fall 2007, Exam 1 and Exam 2; Fall 2008, Exam 1,
Exam 2, Exam 3, and Exam 4.]
Figure 2. Assignments submitted, Fall 2008.
[Figure: cumulative number submitted (vertical axis, 0 to 400) by week (horizontal axis, weeks 1
to 14) for the Control and Treatment sections.]
APPENDIX
Figure A-1. Kernel density of exam scores, Fall 2007. Section 0 is the control section.
[Figure: two kernel density panels, EX1 by SECTION and EX2 by SECTION; vertical axis Density,
horizontal axis exam score; series SECTION=0 and SECTION=1.]
Figure A-2. Kernel density of exam scores, Fall 2008. Section 0 is the control section.
[Figure: four kernel density panels, EX1 by SECTION, EX2 by SECTION, EX3 by SECTION, and EX4 by
SECTION; vertical axis Density, horizontal axis exam score; series SECTION=0 and SECTION=1.]