

									                  Grades as an Incentive in Introductory Economics

                                        Lester Hadsell

                             Division of Economics and Business
                                      College at Oneonta
                                State University of New York
                                     Oneonta, NY 13820

                                    Phone: (607) 436-2448


I thank Scott England, Scott Houser, Pete Schuhmann, and participants at the 2008 AEA meetings in
New Orleans for helpful comments, and Maureen Cashman and Mary Ellen Mack for assistance in
data gathering.


This paper reports results of investigations performed over two semesters examining the
effects of deemphasizing grades in Introductory Microeconomics. Each semester the author
taught one class using a typical grading policy in which all papers and exams were graded on
a 100 point scale and one class in which all papers and exams were graded as either
satisfactory or unsatisfactory (with the opportunity to revise and resubmit). Students in the
S/U sections performed no worse and in some respects performed marginally better on
several measures of learning and effort, including exam scores, homework completion, and
attendance. Students in the S/U sections also reported higher satisfaction with several facets
of the course and instruction.


         Grades in higher education serve several functions. Chief among these, they inform

students of their level of understanding; convey to potential employers or graduate school

admissions officers the academic achievements of applicants; and provide an incentive for

students to exert effort and learn. This last role is particularly salient to economists who tend

to place great value on incentives, in general and in classroom settings (as described in a

recent survey of economics faculty by Hadsell and MacDermott, 2009).1 As one of many

examples of this emphasis, Buckles and Hoyt (2006, p. 76), in their chapter on using active

learning techniques in large lecture classes, encourage "regularly graded activities" during

class to "provide students with very persuasive incentives to attend class, and get actively

involved in learning."

         Several recent studies in the economics education literature have reported benefits of

graded work in terms of increasing student achievement. Grove and Wasserman (2006), for

example, find that freshmen whose homework assignments were graded scored better on
  1 The survey reports responses from 816 economics faculty from across the U.S. Among the findings: 91
percent agreed or strongly agreed with the statement, “I think it is useful to use grades as incentives to increase
student performance.”

exams than freshmen whose assignments were not graded, and Betts and Grogger (2003)

report that more stringent high school grading policies were correlated with higher test scores

on standardized tests. Cherry and Ellis (2005) find that using a competitive grading system

(in which the number of As, Bs, and so on is restricted) led to higher average exam scores in

introductory macroeconomics.

        Yet these benefits are not without limits. Grove and Wasserman note that

upperclassmen did not experience the same benefits of graded homework that freshmen did.

Betts and Grogger find that the long-term effect of more stringent grading policies was

negligible – with the exception of minority students, for whom the effects were negative.

And Cherry and Ellis find that only high-achieving students benefit (low achieving students

perform no better under the competitive conditions).2 Further, Dickie (2006) reports that

classroom experiments with grade inducements led to lower learning outcomes and student

interest compared to experiments without the grade incentives.

        The limitations of extrinsic rewards such as grades have been studied extensively in

educational psychology. In their meta-analysis of more than 100 empirical studies from the

educational psychology literature, Deci et al. (1999) conclude that grades are helpful in

getting students to do unpleasant work, but are ineffective and even harmful in certain

circumstances – for example, when they are seen as controlling or when they are used as

an incentive to get students to do work they otherwise enjoy. Examples of suboptimal outcomes

include decreased learning, reduced effort, increased anxiety, and preference for less

  2 Cherry and Ellis state (p. 9) that “Results indicate rank-ordering may eliminate the incentive for high
performing students to "stop" once they achieve a stated objective….” An alternative approach to improving the
performance of high-achieving students is to simply raise the criterion-referenced requirements for the higher
letter grades (make the minimum level of achievement (grade) necessary for an A be 95 instead of 93, for
example). This alternative would not carry any of the negative baggage associated with competitive grading (as
discussed in the next section).

challenging tasks. In the instances when extrinsic rewards and intrinsic interest coincide, the

extrinsic rewards sometimes crowd out the intrinsic value of the task, reducing the student’s

interest. Motivational crowding out, as it has been called, has received wide empirical

support in psychology and economics (e.g., Frey and Jegen, 2001), and is being applied by

economists in labor market research (Frey, 1998; Murdock, 2002; Falk and Kosfeld, 2006)

and experimental settings (Frey and Oberholzer-Gee, 1997; Frey, Oberholzer-Gee, and

Eichenberger, 1996; Mellström and Johannesson, 2008; Heyman and Ariely, 2004; James,


         In this paper, I further explore the effectiveness of grades as an incentive. I taught two

sections of introductory microeconomics each semester for two semesters, holding all aspects

of the course as similar as possible within semester, except for the grading structure. Each

semester, in one section (the control), a typical grading policy was followed: all assignments

and exams were graded on a 100-point scale, with a weighted average of assignment and

exam grades constituting the course grade. In the other section (the treatment), grades were

deemphasized: assignments and exams were graded either satisfactory (passing on exams,

“C+” or higher on assignments) or unsatisfactory, with the opportunity to retake exams and

resubmit assignments. Course grades were determined by the amount of satisfactory course

work, as explained in detail below.

         With the less stringent grading in the treatment section, we might expect students to

reduce their effort, leading to lower scores on exams and lower quality assignments (Cherry

and Ellis, 2005, provide a detailed analysis of such expectations). Conceivably, a student

could barely meet the minimum requirement for all exams and assignments and still earn an

A for the course. In fact, I find that average exam scores are no lower in the treatment

sections and scores in the lower end of the distribution in the treatment sections marginally

exceed those of the control. Furthermore, homework submission rates are higher and more

consistent throughout the semester for the treatment sections, while quality was about the

same. Attendance is also found to be higher for the treatment section for one semester and no

different for the other. In short, deemphasizing grades did not hurt student performance and

in some ways increased it. Further, students’ evaluation of the course indicates that they view

the treatment sections as more organized and fairer compared to the control and they rate

teaching effectiveness higher.


       This investigation was undertaken during the Fall 2007 and Fall 2008 semesters at a

small, state-operated liberal arts college in the northeastern U.S. Each semester, the author

taught two sections of Introductory Microeconomics. In one section (the control section), all

papers and exams were graded on a 100 point scale, with course grade determined by a

weighted average of these. In the other section (the treatment section), all papers and exams

were graded as either Satisfactory or Unsatisfactory (with the opportunity for resubmission

and a subsequent Satisfactory). “Satisfactory” meant passing (65% or higher) on exams and

“C+” or higher on assignments. The number of satisfactory items determined the student’s

course grade, as summarized in Table 1 and described in detail below. Unsatisfactory exams

and assignments were not counted against the student (as long as they reached the requisite

number of satisfactory exams/papers) for the semester.

                                 [Table 1 goes here]

       Classes met for 50 minutes three times per week (MWF) in the Fall 2007 semester,

with the treatment section (1 pm) immediately preceding the control section (2 pm). In Fall

2008, the sections met twice per week for 75 minutes each day (T, TH) with the order of

control-treatment sections reversed so that the control section (10:30 am) preceded the

treatment section (12:30 pm). Within each semester, both sections used the same textbook,

covered the same topics, had the same lectures, and had the same assignments and exams.

Grading details for the first iteration - Fall 2007

       During the Fall 2007 semester, students in both sections were provided the same 13

weekly assignments (all of approximately the same difficulty), four longer essays (of up to 4

pages each), and two exams, each with 15 multiple choice, 10 matching, and several short

answer questions and problems. The weekly assignments were relatively easy, consisting of

basic questions requiring short verbal and quantitative answers. The 4-page essays required

more in-depth understanding and a small amount of research. In the control section, students

were required to submit 8 weekly assignments of their choice, one essay, and two exams,

constituting 30%, 20%, and 50% of the course grade, respectively.

       In the treatment section, the only A-E grades assigned were course grades. Students

could choose the number and types of assignments and exams to complete, depending on

their desired course letter grade (see Table 1). Likewise, students could choose the level of

difficulty on the exam by successfully completing the entire exam for an “Advanced”

designation (required for an A in the course) or just the multiple choice and matching

sections for a “Basic” designation (required for B or C in the course). Unsatisfactory

assignments and exams could be resubmitted or retaken for credit (with a new set of

questions on the exams).3 Greater effort, as measured by the number of satisfactory

assignments and exams, translated into a higher course grade. To earn an A for the course, a

student in the treatment section had to submit 10 satisfactory weekly assignments, two

satisfactory essays, and two “Advanced” exams. To earn a B the requirements were 7

assignments, 0 essays, and 2 “Advanced” exams (or 7 assignments, 2 essays, and 2 “Basic”

exams), and so on. Unsatisfactory work did not count in the student’s course grade.
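
As an illustration only, the Fall 2007 treatment-section rule for an A or a B can be sketched in a few lines of Python. The function name and arguments are hypothetical, and the sketch assumes that an “Advanced” exam also satisfies a “Basic” requirement. Thresholds for grades below B appear only in Table 1, so the sketch returns None rather than guess at them.

```python
def treatment_grade_fall2007(assignments, essays, advanced_exams, basic_exams):
    """Map counts of SATISFACTORY work to a course grade (A or B only).

    Unsatisfactory work is simply not counted, so callers pass only the
    number of satisfactory items. Returns None when the work falls short
    of a B, because the lower thresholds are not stated in the text.
    """
    # Assumption: an "Advanced" exam also counts toward a "Basic" requirement.
    exams_at_basic_or_better = advanced_exams + basic_exams

    # A: 10 weekly assignments, 2 essays, 2 "Advanced" exams.
    if assignments >= 10 and essays >= 2 and advanced_exams >= 2:
        return "A"
    # B, first route: 7 assignments, no essays, 2 "Advanced" exams.
    if assignments >= 7 and advanced_exams >= 2:
        return "B"
    # B, second route: 7 assignments, 2 essays, 2 "Basic" exams.
    if assignments >= 7 and essays >= 2 and exams_at_basic_or_better >= 2:
        return "B"
    return None
```

Because only counts of satisfactory items enter the function, an unsatisfactory attempt never lowers the result, which is the sense in which poor work “did not count.”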

Grading details for the second iteration - Fall 2008

        During the Fall 2008 semester, students in both sections were provided identical

weekly assignments and four exams consisting exclusively of multiple choice questions. The

first three exams contained either 25 or 30 questions covering material since the previous

exam. The fourth exam was a comprehensive final exam with 37 questions.

        In the control section, 25% of the course grade was based on the average grade of six

assignments and 75% was based on the average of the three highest exam scores. The final

exam was optional. If taken, the grade on the final exam replaced all lower prior scores (the

final exam grade was dropped if it was lower than the previous low score).4 In the treatment

section, students were asked to submit either 4 or 8 satisfactory assignments and 2, 3, or 4

satisfactory exams, depending on their desired course grades. This is summarized in Table 1.

The retake for exam 1 was in the week after exam 1; the retakes for exams 2 and 3 occurred

at the same time as exam 4. There was no retake option on exam 4.
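
The Fall 2008 control-section rule can be read as the following calculation. This is a sketch of one interpretation, not a record of the actual gradebook: the function name is hypothetical, and it assumes that “replaced all lower prior scores” means each of the first three exam scores below the final exam score is raised to the final exam score, with the final ignored when it is below all three.

```python
def control_course_score_fall2008(assignment_scores, exam_scores, final_exam=None):
    """Weighted course score (0-100) for the Fall 2008 control section.

    assignment_scores: the six counted assignment grades (0-100).
    exam_scores: grades on exams 1-3 (0-100).
    final_exam: grade on the optional final, or None if it was not taken.
    """
    if final_exam is not None:
        # Assumed reading: the final replaces every prior score it beats;
        # if it beats none, nothing changes (the final is effectively dropped).
        exam_scores = [max(score, final_exam) for score in exam_scores]

    assignment_avg = sum(assignment_scores) / len(assignment_scores)
    exam_avg = sum(exam_scores) / len(exam_scores)
    # 25% assignments, 75% exams, as stated in the text.
    return 0.25 * assignment_avg + 0.75 * exam_avg
```

Under this reading, a student with exam scores of 60, 70, and 90 who earns 75 on the final is graded on 75, 75, and 90.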

        Grades were deemphasized in the treatment section in the sense that (a) no 0-100 or

A-E grades were reported to students on individual assignments or exams; (b) poor

  3 A student who attempted an “Advanced” (i.e., answered the short answer questions) but did not pass that
portion could retake the exam for an “Advanced” designation.
  4 Thus, the final exam acted as a de facto retake.

(“unsatisfactory”) work did not count against the student; (c) retakes or resubmissions could

replace poor work; and (d) only a minimum level of quality was necessary to earn full credit.


           Data describing the students are shown in Table 2. In Fall 2007, 88 students

participated: 48 in the control section and 40 in the treatment section. In Fall 2008, 108 students

participated: 54 in each section. The students were of traditional college age (18-22) and

were from all class years, freshman to senior. The most common major was Business

Economics (with concentrations in Marketing, Management, Finance, and Management

Information Systems), for which the course was one of four required economics courses (two

introductory and two intermediate).5 The remaining students were of various majors fulfilling

a one or two course requirement for their major, minor, or general education. The

characteristics of students across sections were similar in terms of major, academic ability,

academic load, and gender. The lower portion of Table 2 shows the sample of students used

in the regression analysis to be discussed later. This sample is restricted by availability of

SAT scores. Table 2 also contains section averages for exams (initial score, not retakes),

number of assignments submitted, and levels of interest and enjoyment (surveyed at the end

of the semester).

                                      [Table 2 goes here]

           Students did not know the grading structure for the sections before classes began.6

Further, because all sections were fully subscribed (including two other sections each

semester taught by other instructors), students’ ability to switch sections was severely limited

    5 The Economics major is distinct from the Business Economics major.
    6 In fact, my name was not listed on the master course schedule when most students registered for Fall 2007.

during the add/drop period by the economics department, which approves all adds once a

course is fully registered (although students could drop the course without permission).

Records from the college Registrar covering add/drop week (first week of classes) indicate

that, in Fall 2007, one student in the treatment section dropped the course and one added it

(out of 40 students total registered), while in the control section, three dropped and three

added (out of 48 total). In Fall 2008, no students added or dropped the treatment section,

while in the control, one student added and one dropped (out of 54 total). No student either

semester switched from one section to the other. Thus, there is scant evidence of students

registering for the class that fit their grading preference. In this respect the assignment of

students resembles a natural experiment with random placement. Another key feature of this

study is that the instructor was the same for both control and treatment sections. As such,

differential effects between control and treatment sections due to instructor qualities are



       A. Learning Outcomes

       The effect on measurable learning outcomes, namely exam scores, is examined first

by looking at the average and then the distribution. Regression analysis of the effects of the

treatment grading policy on exam performance is based on the following specification

(subscripts indicating student are omitted), designed to isolate the effects of the treatment on

exam performance while controlling for differences in ability, demographic characteristics,

academic purpose and load:

       G = \beta_0 + \beta_1 Section + \beta_2 Gender + \beta_3 Gender*Section + \beta_4 Fresh + \beta_5 Soph +

             \beta_6 Junior + \beta_7 Buseco + \beta_8 Buseco*Section + \beta_9 SATV + \beta_{10} SATM + \beta_{11} HRS,

where G is the percent correct on multiple choice and matching questions (0-100),

       Section equals 0 for the control section, 1 for the treatment section,

       Gender equals 0 for females, 1 for males (given findings of gender differences in

               economics classes, e.g., Borg and Stranahan, 2002, and references therein),

       Gender*Section equals 1 for males in the treatment section, 0 otherwise (given

               differential responses by gender to grading policies, as reported, e.g., by

               Jensen and Owen, 2001; Kaenzig et al., 2007; and Lammers et al., 2005),

       Fresh, Soph, and Junior are class year dummies,

       Buseco equals 1 for Business Economics majors, 0 otherwise,

       Buseco*Section equals 1 for Business Economics majors in the treatment section, 0

               otherwise (assuming that importance placed on grades by business majors

               may be different than non-majors),

       SATV is the student’s score on the verbal portion of the SAT,

       SATM is the student’s score on the math portion of the SAT, and

       HRS is the number of credit hours the student attempted during the semester.
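
To make the role of the interaction terms concrete, the sketch below builds one row of regressors in the order of the specification and computes the implied treatment effect for a given type of student. It is illustrative only: the function names and the coefficient values in the usage note are hypothetical, and treating seniors as the omitted class-year category is an inference from the three class-year dummies listed above.

```python
def design_row(section, male, class_year, buseco, satv, satm, hrs):
    """One row of regressors, ordered as in the specification:
    [1, Section, Gender, Gender*Section, Fresh, Soph, Junior,
     Buseco, Buseco*Section, SATV, SATM, HRS].
    """
    return [
        1,                                  # intercept
        section,                            # 1 = treatment section
        male,                               # 1 = male
        male * section,                     # Gender*Section interaction
        int(class_year == "freshman"),
        int(class_year == "sophomore"),
        int(class_year == "junior"),        # seniors assumed to be the omitted group
        buseco,                             # 1 = Business Economics major
        buseco * section,                   # Buseco*Section interaction
        satv,
        satm,
        hrs,
    ]


def predict(beta, row):
    """Fitted value G for one student given coefficients beta[0..11]."""
    return sum(b * x for b, x in zip(beta, row))


def treatment_effect(beta, male, buseco):
    """Change in G from moving a student from control to treatment:
    beta_1 + beta_3 * Gender + beta_8 * Buseco.
    """
    return beta[1] + beta[3] * male + beta[8] * buseco
```

With any coefficient vector `beta`, `predict(beta, design_row(1, ...)) - predict(beta, design_row(0, ...))` equals `treatment_effect(beta, male, buseco)` for the same student, so the coefficient on Section alone is the treatment effect only for female students who are not Business Economics majors.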

       The regression is run separately for each exam and for the exam average each

semester. Results are shown in Table 3. Students in the treatment sections scored no lower

than students in the control despite the reduced grade incentive. In Fall 2007, exam scores in

the treatment section were 6.2 to 7.3 points higher on a 100-point scale, a noteworthy

magnitude, although these estimates fall just shy of standard levels of statistical significance.

In Fall 2008, the overall exam average difference is weakly greater than zero (and not

statistically significant, t = 1.03) at 3.5 points while it is almost 7.5 points for exam 4 (just

shy of statistical significance, t = 1.62). The results provide support for the notion that a

grading structure of the type used in the treatment section does not lead to reduced student

learning, on average.

                                  [Table 3 goes here]

        One might expect the distribution of grades to be altered by a grading policy that

awards full credit for simply meeting a minimum standard. At least two views provide a

rationale for an expectation that the distribution will be compressed in the treatment section.

For one, students on the upper end of the distribution will decrease effort because additional effort

much beyond what ensures passing is wasted (in terms of translating into a higher grade),

while students on the lower end of the distribution, near the minimum acceptable grade, will

increase study effort so as to avoid a lower course grade or the additional work and bother

associated with a retake. A second view, based on the psychological phenomena of

self-efficacy (Bandura, 1997) and fear of failure (Conroy, 2001), implies that scores at the bottom

will increase if (1) students believe they can successfully complete the task, or (2) students’

fear of receiving a poor grade is reduced, thereby reducing anxiety. Certainly, the opportunity

to retake the exam if the student does not pass it the first time may reduce the level of anxiety

associated with exam taking. This may be especially true (and beneficial) for students who

have had little prior academic success (i.e., those in the lower end – to middle – of the grade

distribution) (Taylor and Perry, 2005).7

  7 This benefit may extend to the middle of the distribution if an acceptable grade on a 0-100 scale for the
student is higher than simply passing.

        Figure 1 shows the distribution of scores for each of the exams for each semester.

(The appendix contains the kernel density of the distributions shown in Fig. 1.) Scores at the

lower end of the distribution appear to be generally higher in the treatment section while

scores at the higher end are generally about the same. Comparison of the 10th and 90th

percentiles (not shown) indicates that for every exam except exam 4, Fall 2008, the score for

the treatment section at the 10th percentile exceeds that of the control. Further, at the 90th

percentile, the treatment score exceeds the control in three of seven cases.
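
A tail comparison of this kind can be reproduced with Python's standard library. The sketch below is illustrative: the helper name is hypothetical, and the default interpolation used by `statistics.quantiles` may differ from whatever percentile definition produced the comparison reported above.

```python
from statistics import quantiles


def tail_percentiles(scores):
    """Return the (10th, 90th) percentiles of a list of exam scores."""
    # quantiles(..., n=10) returns nine cut points: 10th, 20th, ..., 90th.
    deciles = quantiles(scores, n=10)
    return deciles[0], deciles[-1]
```

Calling `tail_percentiles` on the control and treatment score lists for one exam and comparing the first elements shows which section has the higher 10th percentile; comparing the second elements does the same for the 90th.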

                                  [Figure 1 goes about here]

        A more rigorous analysis is the Wilcoxon (Mann-Whitney) rank-based nonparametric

test of the hypothesis that the subgroups have the same general distribution. Specifically, the

Wilcoxon test examines the comparability of mean ranks across subgroups. Based on results

from applying this test to each exam, as shown in Table 4, we cannot reject the hypothesis

that the distributions are the same. The implication is that scores in the treatment section

were distributed no differently than scores in the control. In other words, there is no evidence

that the alternative grading structure induced students at the upper end of the distribution to

slack off. There also is little evidence that students in the lower end performed better under

the alternative grading structure.8
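
For readers who want to see the mechanics, a minimal rank-sum test can be written with the standard library alone. This is a sketch under stated assumptions rather than the procedure behind Table 4: it uses the large-sample normal approximation with no tie or continuity correction, and the function names are hypothetical.

```python
import math


def midranks(values):
    """Ranks 1..n, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


def rank_sum_test(x, y):
    """Wilcoxon (Mann-Whitney) test: returns (U for x, z, two-sided p)."""
    n1, n2 = len(x), len(y)
    ranks = midranks(list(x) + list(y))
    r1 = sum(ranks[:n1])                    # rank sum of the first group
    u1 = r1 - n1 * (n1 + 1) / 2             # Mann-Whitney U for the first group
    mean_u = n1 * n2 / 2                    # expected U if distributions match
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)  # no tie correction
    z = (u1 - mean_u) / sd_u
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    return u1, z, p_two_sided
```

A large p-value from `rank_sum_test(control_scores, treatment_scores)` means the hypothesis of equal distributions cannot be rejected. With many tied exam scores or small sections, an exact or tie-corrected version from a statistics package would be preferable.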

                                  [Table 4 goes about here]

        B. Effort – Homework, Attendance, and Hours of Study

        The effects on effort are examined by comparing homework, attendance, and hours of

study across sections. With regard to homework, the belief is that the alternative grading

  8 The findings here are similar to Grove and Wasserman (2006), who note (p. 451) that, in their study, scores
were not changed for “academically above- or below-average students, or of any other category of students
[besides freshmen].”

system may induce more effort if it reduces anxiety concerning grades, because students

were able to get full (100%) credit more easily and students could resubmit unsatisfactory

work. This view is based on research findings from educational psychologists suggesting that

students’ preference for challenging tasks is inhibited when their work is graded (Harter,

1978; Brooks et al., 1998) and that some students react with reduced effort when they fear

their work will receive a poor grade (Thompson, 1994; Thompson and Perry, 2005).9 Thus, a

policy that deemphasizes grades on the homework should elicit greater effort on the

assignments, translating into more satisfactory assignments (as was seen in Grolnick and

Ryan, 1987; Ames and Ames, 1991).10 In the context of this study, students in the control

section who fear a poor grade will wait until the last possible opportunity to submit work

(given that they have choice over which assignments to submit), or simply will not submit

required work. Substandard work in the treatment group costs nothing since an

‘unsatisfactory’ is not factored into the course grade, and students have the opportunity to

resubmit the work for credit. In the control section, a ‘low’ grade (which could be a 50, 70,

… or 80, depending on the student) is permanent (students were not given the opportunity to

   9 Students who fear falling short may simply withhold effort, even if it ensures failure, rather than put forth
effort and potentially fail. In the language of educational psychology, these students are seeking to protect their
self-worth. These students are risk avoidant with the express purpose of avoiding public acknowledgement of
inability (the student would prefer to blame failure on lack of effort rather than be forced to admit lack of
ability).
   10 In Grolnick and Ryan (1987), students were asked to complete a reading assignment. One group was told that
they would later be asked questions about it, but the experimenter emphasized that there would be no
evaluations or grades. A second group was told to read the material and learn it because they would be graded
on performance. In a follow-up eight days later, the first group showed greater interest, rote recall, and
conceptual integration. Similar findings are reported by Moeller and Reschke (1993) and Hahn et al. (1989).
Ames and Ames (1991) also report that perceived homework control had a strong effect on completion rates. In
their study, two classes were assigned homework. In one class, work was graded A through F and counted for
30% of their course grade. In the other class, students were told to try their best (complete as much as they
could in a limited time) and questions about the homework would be covered in class the next day. These
assignments were graded either satisfactory or unsatisfactory and returned for students to redo or complete. The
homework counted for 10% of the course grade. The latter class submitted more assignments.

redo assignments).11 In the treatment section, the emphasis is on effort: an “A” for the course

is achievable for all students, regardless of the number of times they receive an


        Total number of weekly homework submissions for each student was recorded each

semester (see Table 2). In Fall 2007, students in the treatment section submitted significantly

more assignments on average (7.95) than students in the control section (6.73; p < 0.01).12 This

difference could reflect that students in the control section were

required to submit 8 assignments whereas students in the treatment were asked to submit 4,

7, or 10, depending on their desired course grade. Interestingly, though, 19 students (out of

48 total) in the control section submitted fewer than the required 8 assignments (and six

submitted 4 or fewer). In the treatment, only five students submitted fewer than 7

assignments (the number required for at least a B for the course). In Fall 2008, there is no

statistical difference in average number of assignments submitted (p=0.14).

        In addition to these end of semester totals, the week-to-week number of submissions

for the class as a whole was recorded for the two classes during Fall 2008 (this was not done

for Fall 2007), shown in Figure 2. Students in the treatment section submitted assignments

much earlier in the semester and more consistently compared to the control section. As

suggested by educational psychologists, expectations of success at a task are influenced, in

part, by past experiences of success and initial feedback on success (Svinicki, 2005). The S/U

grading and the ability to resubmit work until it was satisfactory virtually guaranteed success.

                                 [Figure 2 goes here]

   11 Even with the opportunity to resubmit, 0-100 grading places a cost (the number of points deducted) on
anything short of perfection.
   12 These are the weekly assignments, not the longer essays.

        A second measure of effort, class attendance, was required but was not part of

students’ course grade either semester. The attendance rate in Fall 2007 was significantly

higher (p = 0.03) in the treatment section (87.3%) compared to the control section (81.9%).

The Fall 2008 rate was not significantly different between the sections (p = 0.53), at 76.7% in

the control section and 75.2% in the treatment section.

        A third measure of effort, number of hours students studied (during an average week

and during the week prior to an exam) was self-reported by students in an end-of-semester

survey each year. There is no evidence that students in the treatment group, as a whole,

studied less than students in the control. In 2007, students in the control group reported

virtually the same (p=0.64) study time during an average week (2.97 hours) compared to the

treatment group (2.74). Students in the control reported less study time in the week prior to

an exam (5.27 hours versus 6.36 for the treatment), although this difference was not

statistically significant (p = 0.14).

        The findings are similar for 2008: students in the control group reported study time

during an average week (2.58 hours) and in the week prior to an exam (5.60 hours) that did

not differ significantly from the treatment group’s (3.18, p = 0.23; and 6.35, p = 0.39). Of

course, self-reported data such as these should be viewed with caution, but as long as errors

in reporting are not systematically correlated with treatment, the conclusions regarding

section differences should hold.

        In sum, there is no evidence that students in the treatment sections exerted less effort,

even when the extrinsic rewards were reduced.

        C. Student affect

        Research in educational psychology also finds a strong link between student emotion

and emphasis on performance. For example, Pekrun et al. (2006) find that a strong student

emphasis on grades is positively related to subsequent anxiety, hopelessness, and shame.

Likewise, Karabenick (2004), examining student behavior in 13 college psychology classes,

finds that students in classes with greater perceived emphasis on grades and interpersonal

performance comparisons were less likely to ask for help when needed. Nevertheless, student

interest and enjoyment (surveyed at the end of each semester) are virtually identical across

groups in the present study (see Table 2). Thus, deemphasizing grades did not, by itself,

generate the kinds of emotional returns that one may have expected by reviewing the

educational psychology literature.

                                   [Table 5 goes about here]

        On the other hand, students in the treatment section evaluated the course differently

along several metrics in end-of-semester evaluations. Both semesters, students in the

treatment section thought the course was more organized, the grading fairer, and the teaching

more effective (see Table 5). That students would evaluate teaching as more effective does

not necessarily imply that better ratings were the result of higher grades (in fact, the

percentage of As and Bs between sections was similar in Fall 2007, when the difference in

effectiveness rating was the greatest).13 Students may perceive increased teacher

effectiveness when they are able to focus on their studies, with less attention paid to grades;

  13 In 2007, 21 percent of students were awarded As in the control vs. 20 percent in the treatment, and 26 percent
were awarded Ds and Es in the control vs. 0 percent in the treatment. In 2008, 17 percent were awarded As in the
control vs. 28 percent in the treatment, and 22 percent were awarded Ds and Es in the control vs. 17 percent in
the treatment.

and when faculty are not seen as dispensers of grades, controlling students’ futures, but rather

as guides helping students attain a level of learning.14

           Students thought the S/U grading was less demanding in Fall 2008, while they

thought it was equally demanding in Fall 2007. They were probably correct for the former:

too little may have been expected of them, and the higher percentage of As and Bs earned in

the treatment section that semester supports this view (given the equality of exam scores).

Finally, students perceived no difference in course rigor either semester, which supports the

contention that learning expectations were the same, even if the grading structure was not.


       This study set out to examine the effects of deemphasizing grades in Introductory

Microeconomics along three metrics: student learning, effort, and affect. Conducted over two

semesters, with 196 students, the primary observations are:

       1. There is no evidence that deemphasizing grades leads to lower exam performance,

           either in terms of averages or distribution, although there is weak evidence that

           deemphasizing grades is associated with higher mean scores and marginally improved

           performance by students at the lower end of the distribution.

       2. There is no evidence that deemphasizing grades leads to less effort, in terms of

           homework completion, study time, and class attendance. In fact, treatment section

           homework submission rates and class attendance were higher in the first trial, and

           homework submission rates were more consistent over the semester relative to the

           control in the second trial.

       3. Students in the sections that deemphasized grades evaluated the course as more

           organized, with fairer grading and more effective teaching.

14 As reported by Deci et al. (1999), noted earlier.

   The finding that deemphasizing grades did not lead to lower learning outcomes or effort

is noteworthy, given the high value economists generally place on extrinsic incentives.

Decades of research in educational psychology, as reviewed in a meta-analysis by Deci et al.

(1999), have shown that extrinsic incentives such as grades are useful in the right situation

(non-controlling, unpleasant tasks) and with the right students (those with an extrinsic

motivation). But the same research also reports potentially harmful effects on student learning,

effort, and interest. Recent work in economics education and in labor and experimental

economics also reports ambiguous effects of extrinsic rewards.

   Several caveats of the present research are noteworthy:

   •   Grades were not eliminated; they were deemphasized. This investigation concerns the

       extent to which grades are used (i.e., the emphasis). Certainly, this paper is not

       arguing for an elimination of evaluation of student work, or standards.

   •   Exam retakes may not be practical in large classes; thus the grading structure in this

       study is not necessarily a prescription for others to follow. Grades can, however, be

       deemphasized in other ways (e.g., Hadsell and MacDermott, 2009).

   •   Although deemphasizing grades did not lead to improved learning outcomes (at least

       at statistically significant levels), the potential positive effects (on interest, persistence,

       and preference for challenging tasks) must be considered when evaluating the net effects;

       all of these are good opportunities for future research.

   •   The deemphasis of grades in the treatment sections had several components, primarily

       the Satisfactory/Unsatisfactory grading and the opportunity to resubmit unsatisfactory

       work. Each of these undoubtedly had an effect on students’ choices, effects that

       should be the focus of future research.

   In addition to reproducing the basic findings of the present study with a different set of

students and faculty, future research could study the longer-term effects. Alternative grading

criteria could also be investigated (e.g., a hybrid of traditional 0-100 grading and S/U

grading, or varying opportunities for retakes and resubmissions), as could subsets of the

treatment (e.g., isolating the effects of grading only homework S/U).

   Effective teaching requires proper technique, interesting content, and appropriate

incentives. The topic of appropriate incentives remains largely under-explored in the

economics education literature. The role of grades as extrinsic incentives is well entrenched

in the psyche of economic educators. But this paper finds that deemphasizing grades did not

lead to a decline in measurable learning outcomes or effort. Given the benefits of

deemphasizing grades reported in the educational psychology literature, it is incumbent on

economic educators to further explore the effects of grades. By relying too much on grades as

a motivator we may be achieving suboptimal outcomes in much the same way that relying on

outdated, ineffective techniques and content leads to poor results.


Ames, R. and C. Ames (1991) "Motivation and Effective Teaching." In B. F. Jones and L.
Idol (eds.) Educational Values and Cognitive Instruction: Implications for Reform. Hillsdale,
N. J.: Erlbaum.

Bandura, Albert (1997) Self-efficacy: The exercise of control. New York: Freeman.

Becker, William E., Michael Watts, and Suzanne R. Becker (eds.) (2006) Teaching
Economics: More Alternatives to Chalk and Talk. Cheltenham, UK: Edward Elgar.

Betts, Julian R. and Jeffrey Grogger (2003) "The Impact of Grading Standards on Student
Achievement, Educational Attainment, and Entry-Level Earnings." Economics of Education
Review 22(4): 343-352.

Borg, Mary O. and Harriet A. Stranahan (2002) “Personality Type and Student Performance
in Upper-Level Economics Courses: The Importance of Race and Gender.” Journal of
Economic Education 33(1): 3-14.

Brooks, S. R., S. M. Freiburger, and D. R. Grotheer (1998) “Improving Elementary Student
Engagement in the Learning Process through Integrated Thematic Instruction.” Unpublished
master's thesis, Saint Xavier University, Chicago, IL. (ERIC Document Reproduction
Service No. ED 421 274)

Buckles, Stephen and Gail Mitchell Hoyt (2006) “Using Active Learning Techniques in
Large Lecture Classes.” In Becker et al. (2006).

Cherry, Todd L. and Larry V. Ellis (2005) “Does Rank-Order Grading Improve Student
Performance? Evidence from a Classroom Experiment.” International Review of Economics
Education 4(1): 9-19.

Conroy, David E. (2001) “Progress in the Development of a Multidimensional Measure of
Fear of Failure: The Performance Failure Appraisal Inventory (PFAI).” Anxiety, Stress, and
Coping 14: 431-452.

Deci, Edward, Richard Koestner, and Richard Ryan (1999) “A Meta-analytic Review of
Experiments Examining the Effects of Extrinsic Rewards on Intrinsic Motivation.”
Psychological Bulletin 125(6): 627-668.

Dickie, Mark (2006) "Experimenting: Does It Increase Learning in Introductory
Microeconomics?" Journal of Economic Education 37(3): 267-288.

Falk and Kosfeld (2006) “The Hidden Costs of Control.” American Economic Review 96(5):

Frey, Bruno (1998) Not Just for the Money: An Economic Theory of Personal Motivation.
Glos: Edward Elgar.

Frey, Bruno and Reto Jegen (2001) “Motivation Crowding Theory.” Journal of Economic
Surveys 15(5): 589-611.

Frey, Bruno and Felix Oberholzer-Gee (1997) “The Cost of Price Incentives: An Empirical
Analysis of Motivation Crowding-Out.” American Economic Review 87(4): 746-755.

Frey, Bruno, Felix Oberholzer-Gee, and Reiner Eichenberger (1996) “The Old Lady Visits
Your Backyard: A Tale of Morals and Markets.” Journal of Political Economy 104(6): 1297-

Grolnick, W. and R. Ryan (1987) “Autonomy in Children's Learning: An Experimental and
Individual Difference Investigation.” Journal of Personality and Social Psychology 52(5):

Grove, Wayne A. and Tim Wasserman (2006) “Incentives and Student Learning: A Natural
Experiment with Economics Problem Sets.” American Economic Review 96(2): 447-452.

Hadsell, Lester and Raymond MacDermott (2009) “Faculty Perceptions of Grades: Results
from a National Survey of Economics Faculty.” Mimeo (accessed at on December 1, 2009).

Hahn, Sidney, Tamara Stassen, and Claus Reschke (1989) "Grading Classroom Oral
Activities: Effects on Motivation and Proficiency." Foreign Language Annals 22: 241-252.

Harter, Susan (1978) "Pleasure Derived from Challenge and the Effects of Receiving Grades
on Children's Difficulty Level Choices." Child Development 49: 788-799.

Heyman, James and Dan Ariely (2004) “Effort for Payment: A Tale of Two Markets.”
Psychological Science 15(11): 787-793.

James, Harvey S. Jr. (2005) “Why did you do that? An economic examination of the effect of
extrinsic compensation on intrinsic motivation and performance.” Journal of Economic
Psychology 26: 549-566.

Jensen, Elizabeth J. and Ann L. Owen (2001) “Pedagogy, Gender, and Interest in
Economics.” Journal of Economic Education 32(4): 323-343.

Kaenzig, Rebecca, Eva Hyatt, and Stella Anderson (2007) “Gender Differences in College
of Business Educational Experiences.” Journal of Education for Business 83(2): 95-100.

Karabenick, S. (2004) “Perceived Achievement Goal Structure and College Student Help
Seeking.” Journal of Educational Psychology 96(3): 569-581.

Lammers, H. Bruce, Tina Kiesler, Mary T. Curren, Deborah Cours, and Brian Connett (2005)
“How Hard Do I Have to Work? Student and Faculty Expectations Regarding University
Work.” Journal of Education for Business 81(2): 210-213.

Mellstrom, Carl and Magnus Johannesson (2008) “Crowding Out In Blood Donation: Was
Titmuss Right?” Journal of the European Economic Association 6(4): 845-863.

Moeller, Aleidine J. and Claus Reschke (1993) “A Second Look at Grading and Classroom
Performance: Report of a Research Study.” The Modern Language Journal 77(2): 163-169.

Murdock, Kevin (2002) “Intrinsic motivation and optimal incentive contracts.” RAND
Journal of Economics 33(4): 650-671.

Pekrun, R., A. Elliott, and M. Maier (2006) “Achievement Goals and Discrete Achievement
Emotions: A Theoretical Model and Prospective Test.” Journal of Educational Psychology
98(3): 583–597.

Svinicki, Marilla D. (2005) "Student Goal Orientation, Motivation, and Learning." IDEA
Paper #41, IDEA Center, February.

Thompson, Ted (1994) “Self-worth protection: Review and implications for the classroom.”
Educational Review 46(3): 259-274.

Thompson, Ted and Zoe Perry (2005) “Is the Poor Performance of Self-Worth Protective
Students Linked with Social Comparison Goals?” Educational Psychology 25(5): 471-490.

                                                    Table 1. Course grading structure.

                               Control (graded) section                          Treatment (S/U) section (number required*)

Fall 2007                                Number Weight                                         A     B     B     C
Weekly assignments                            8  30%                                          10     7     7     4
Essay assignment                              1  20%                                           2     0     2     0
Exams**                                       2  50%                                       2 Adv 2 Adv 2 Bas 2 Bas

Fall 2008                                Number Weight                                 A        B       B          C   C
Weekly assignments                             6 25%                                   8        4       8          4   8
Exams                                       3*** 75%                                   4        4       3          3   2

* Minimum number required for each course letter grade.
** "Bas" (Basic) exam consists of multiple choice and matching questions. "Adv" (Advanced) exam is completion of
   the Basic plus completion of short answer/essay questions.
*** Four exams were offered. The lowest exam score was dropped.

                                           Table 2 - Summary Statistics

                                           Fall 2007                            Fall 2008
                                 Control    Treatment       Total     Control    Treatment       Total           Total
Full Sample
        No. of students               48            40         88          54            54       108             196

        Percent Freshmen             8%           0%         5%          31%          28%        30%             18%
        Percent Sophomore           33%          35%        34%          43%          43%        43%             39%
        Percent Junior              33%          38%        35%          22%          17%        19%             27%
        Percent Senior              26%          27%        26%           4%          12%         8%             16%
        Percent Bus Econ            40%          38%        39%          37%          35%        36%             37%
        Percent Male                69%          60%        65%          61%          80%        70%             68%
        Semester hours              15.6         16.1       15.8         15.4         15.7       15.5            15.7

        Exam 1 average              78%          79%        79%          77%          79%        78%             8.19
        Exam 2 average              70%          76%        73%          71%          69%        70%             7.96
        Exam 3 average               n/a          n/a        n/a         65%          67%        66%              66%
        Exam 4 average               n/a          n/a        n/a         67%          69%        68%              68%
        Asgmnts submitted            6.7          8.0        n/a          4.9          5.5        5.2             n/a
        Interest**                  0.52         0.51       0.51          3.9          3.8        3.8             n/a
        Enjoyment***                 6.1          6.8        6.5          3.7          3.8        3.7             n/a

Regression Sample
        No. of students               35            33         68          45            47         92            160

        Percent Freshmen             6%           0%         3%          27%          26%        26%             16%
        Percent Sophomore           29%          36%        32%          44%          40%        42%             38%
        Percent Junior              37%          36%        37%          24%          19%        22%             28%
        Percent Senior              29%          27%        28%           4%          15%        10%             18%
        Percent Bus Econ            43%          39%        41%          36%          36%        36%             38%
        Percent Male                63%          58%        60%          58%          81%        69%             66%

        SAT verbal*                  540          531        536          526           539       533             534
        SAT math*                    563          571        567          566           568       567             567
        Semester hours              15.4          15.6       15.5        15.0          15.7       15.4           15.4

        Exam 1 average              78%          80%        79%          77%          79%        78%             78%
        Exam 2 average              73%          75%        74%          71%          68%        70%             72%
        Exam 3 average               n/a          n/a        n/a         66%          67%        66%             66%
        Exam 4 average               n/a          n/a        n/a         68%          70%        69%             69%
        Asgmnts submitted            6.3          8.0        7.1          5.0          5.4        5.2             5.2
        Interest**                  0.36         0.57       0.47          3.8          3.7        3.8             n/a
        Enjoyment***                 5.9          6.8        6.4          3.6          3.7        3.7             n/a

        * SAT scores were available for the number of students indicated under "Regression Sample."
        ** Interest is measured on a -1,0,1 (decreased, stayed same, increased) scale for Fall 2007
        and a 1-5 scale (decreased a great deal to increased a great deal) for Fall 2008.
        *** Enjoyment is measured on a 1-10 scale (decreased a great deal to increased a great deal) for Fall
        2007 and a 1-5 scale (decreased a great deal to increased a great deal) for Fall 2008.
        All exam data are for first attempts.

                                        Table 3. Regression Results.
                                          (t-statistics in parentheses)

                           Fall 2007                                                Fall 2008
Variable         Exam 1      Exam 2      Exam Ave          Exam 1     Exam 2          Exam 3     Exam 4 Exam Ave

SECTION             7.32         6.21         6.76           -0.83          0.67         2.18       7.47     3.50
                  (1.30)       (1.16)       (1.47)         (-0.18)        (0.14)         (0.4)    (1.62)   (1.03)
GENDER            13.06         -4.63         4.22           -6.19          -0.30        -9.78     -0.46     -1.83
                  (2.77)      (-1.03)       (1.10)         (-1.86)        (-0.09)      (-2.36)   (-0.13)   (-0.65)
GENDER*SECTION   -11.13          0.24         -5.44           2.75          -2.90         0.05     -5.91    -4.91
                 (-1.61)       (0.04)       (-0.96)         (0.51)        (-0.52)       (0.01)   (-1.07)    (-1.2)
FRESH              -5.49      -20.82        -13.16           -1.72          -0.99        -6.52      4.42     -2.20
                 (-0.51)      (-2.04)        (-1.5)        (-0.38)        (-0.21)      (-1.21)    (0.97)   (-0.66)
SOPH               -1.79        -7.02         -4.40          -2.45          -4.00        -6.12      1.60     -4.40
                 (-0.37)      (-1.54)       (-1.13)        (-0.59)        (-0.94)      (-1.24)    (0.39)   (-1.47)
JUNIOR              1.91        -6.59         -2.34         -3.60           -1.21        -1.02     -0.01    -4.52
                  (0.42)      (-1.51)       (-0.63)         (-0.8)        (-0.26)      (-0.18)    (0.00)    (-1.3)
BUSECO             -0.36         8.22         3.93           -3.66          6.01          5.36     -0.82     -0.03
                 (-0.07)       (1.68)       (0.94)         (-1.03)        (1.65)        (1.26)   (-0.21)   (-0.01)
BUSECO*SECTION      7.58      -13.26          -2.84           3.07          -4.97         2.16     -1.37     2.38
                  (1.16)      (-2.13)       (-0.53)         (0.65)        (-1.01)       (0.38)   (-0.27)   (0.64)
SEMHRS              0.24         0.36         0.30           -0.36          0.47         -1.11     -1.51     -0.75
                  (0.21)       (0.34)       (0.33)         (-0.43)        (0.54)       (-1.13)   (-1.84)   (-1.24)
SATV                0.09         0.07         0.08            0.07          0.04          0.07      0.02     0.05
                  (2.70)       (2.45)       (3.08)          (3.34)          (1.9)       (2.59)    (0.83)   (3.15)
SATM                0.00         0.06         0.03            0.01          0.03          0.05     0.05      0.04
                  (0.04)       (2.31)       (1.37)          (0.37)          (1.6)       (1.93)     (2.2)   (2.51)
Constant          19.16         -3.38         7.89          48.09         23.46         29.27     54.43    37.59
                  (0.67)      (-0.12)       (0.34)          (2.89)        (1.35)        (1.47)    (3.21)   (3.02)

Observations         68            68           68             92             90            87       87        81
Adj R-squared      0.13          0.22         0.16           0.13           0.05          0.16     0.05      0.19
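
Read row by row, each column of Table 3 appears to correspond to a linear regression of an exam score on the listed regressors. The specification below is only a sketch inferred from the variable names in the table; the estimation method, the coding of GENDER and BUSECO, and the choice of seniors as the omitted class level are assumptions, not statements taken from the paper. SECTION is taken to equal 1 for the treatment section, following the note under Figure A-1 that Section 0 is the control section.

```latex
\begin{align*}
\text{Exam}_i = {} & \beta_0 + \beta_1\,\text{SECTION}_i + \beta_2\,\text{GENDER}_i
  + \beta_3\,(\text{GENDER}_i \times \text{SECTION}_i) \\
& + \beta_4\,\text{FRESH}_i + \beta_5\,\text{SOPH}_i + \beta_6\,\text{JUNIOR}_i
  + \beta_7\,\text{BUSECO}_i + \beta_8\,(\text{BUSECO}_i \times \text{SECTION}_i) \\
& + \beta_9\,\text{SEMHRS}_i + \beta_{10}\,\text{SATV}_i + \beta_{11}\,\text{SATM}_i
  + \varepsilon_i
\end{align*}
```

Under this reading, the coefficient on SECTION is the estimated treatment-minus-control difference for students with GENDER = 0 and BUSECO = 0, and the two interaction terms let that difference vary by gender and by major.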

        Table 4. Wilcoxon (Mann-Whitney) test
        for equal distribution across samples.

                               Value Probability

        Fall 2007
         Exam 1                0.419           0.68
         Exam 2                0.983           0.33

        Fall 2008
         Exam 1                0.679           0.50
         Exam 2                0.544           0.59
         Exam 3                0.903           0.37
         Exam 4                0.985           0.32
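
The two columns of Table 4 can be related with a short calculation. The sketch below assumes (the paper does not state this) that "Value" is the large-sample normal-approximation z statistic of the Wilcoxon rank-sum test and that "Probability" is its two-sided p-value. The function and variable names are illustrative only.

```python
import math


def two_sided_p_from_z(z: float) -> float:
    """Two-sided p-value for a standard-normal test statistic z."""
    # For Z ~ N(0, 1), P(|Z| >= |z|) = erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2.0))


# (label, Value, Probability) rows transcribed from Table 4.
TABLE_4 = [
    ("Fall 2007 Exam 1", 0.419, 0.68),
    ("Fall 2007 Exam 2", 0.983, 0.33),
    ("Fall 2008 Exam 1", 0.679, 0.50),
    ("Fall 2008 Exam 2", 0.544, 0.59),
    ("Fall 2008 Exam 3", 0.903, 0.37),
    ("Fall 2008 Exam 4", 0.985, 0.32),
]


def check_table(rows):
    """Return (label, recomputed p, reported p) for each row."""
    return [(label, two_sided_p_from_z(z), p) for label, z, p in rows]


if __name__ == "__main__":
    for label, p_calc, p_reported in check_table(TABLE_4):
        print(f"{label}: recomputed p = {p_calc:.3f}, reported p = {p_reported:.2f}")
```

Under that assumption the recomputed p-values agree with the reported ones up to rounding. None falls below conventional significance levels, which is consistent with the paper's conclusion that there is no evidence of a difference in the distribution of exam scores between sections.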

              Table 5. End of semester evaluations.


Measure                            Fall 2007          Fall 2008

Organization of course*                -0.30              -0.26

Fairness in grading**                  -0.68              -0.97

Overall teaching effectiveness*        -0.60              -0.22

Demanding grading***                    0.11               0.80

Rigor****                              -0.11               0.08

*Rating scale: 1 (excellent) to 5 (poor).
** Rating scale: 1 (very fair) to 5 (unfair)
*** Rating scale: 1 (high expectation) to 5 (low expectation)
**** Rating scale: 1 (rigorous and demanding) to 5 (not rigorous)

Each entry is the mean rating for the treatment section minus the mean rating for the
    control section. Thus, a negative number indicates a better rating for the
    treatment section.

Figure 1. Distribution of Exam Grades.

[Figure not reproduced. Histograms of the number of students (vertical axis) by exam grade range (horizontal axis: 41-50, 51-60, 61-70, 71-80, 81-90, 91-100), comparing the Experimental and Control sections. Panels: Fall 2007, Exam 1 and Exam 2; Fall 2008, Exam 1, Exam 2, Exam 3, and Exam 4.]

Figure 2. Assignments submitted, Fall 2008.

[Figure not reproduced. Vertical axis: cumulative number submitted. Horizontal axis ticks: 1 through 14.]


Figure A-1. Kernel density of exam scores, Fall 2007. Section 0 is the control section.

[Figure not reproduced. Panels: EX1 by SECTION and EX2 by SECTION, each showing densities for SECTION=0 and SECTION=1 against exam score.]

Figure A-2. Kernel density of exam scores, Fall 2008. Section 0 is the control section.

[Figure not reproduced. Panels: EX1 by SECTION, EX2 by SECTION, EX3 by SECTION, and EX4 by SECTION, each showing densities for SECTION=0 and SECTION=1 against exam score.]

