

									                                                Auto Scoring Software and Classroom Practice 1

Brian Fotinakes
December 4, 2009
English 808

       Annotated Bibliography for Automated Scoring Programs and Classroom Practice

Calfee, R. (2000). To grade or not to grade? IEEE Intelligent Systems, 15(5), 35-37.

Calfee, dean of the School of Education at UC Riverside, endorses the use of automated
scoring programs in education but acknowledges that their main purposes should be large-scale
assessment and generating feedback on surface features in writing. These programs attend
mainly to micro-level issues in a text, and Calfee believes this makes the software better suited
for inexperienced writers. Calfee also argues that these programs show a correlation between
students’ reading and writing, meaning these programs have the ability to score how well a
reader can understand and respond to a text.

Chen, C. & Cheng, W. (2008). Beyond the design of automated writing evaluation: Pedagogical

       practices and perceived learning effectiveness in EFL writing classes. Language

       Learning & Technology, 12(2), 94-112.

Chen and Cheng’s study is a “naturalistic classroom-based inquiry that was conducted in three
EFL college writing classroom contexts in a university in Taiwan” (p. 98). The students in these
courses were third-year English majors taught by experienced EFL teachers. The course
emphasized exercising students’ academic writing through a process-writing approach. The
scoring software was implemented to facilitate student writing development. Chen and Cheng
collected data through questionnaires administered to students at the end of the semester, focus
group interviews with students, individual interviews with the course instructors, and analysis of
student writing samples accompanied by software-generated scores and feedback. Chen and
Cheng monitored and studied the integration of automated evaluation software, which was used
to help students revise their essays.

Cheville, J. (2004). Automated scoring technologies and the rising influence of error. The

       English Journal, 93(4), 47-52.

Cheville articulates many of the concerns compositionists have over the use of scoring software.
Cheville’s first critique focuses on the programs favoring formulaic writing; Cheville argues that
focusing on form filters meaning and distracts writers from their purpose. The programs also
present the idea that grammar and style errors are static across any context, which she disputes,
citing the fact that different audiences judge errors differently. Finally, Cheville presents the
argument that writing is social, and these programs cannot account for social contexts. When
used for high-stakes writing assessment or classroom instruction, these programs may hurt
students in Cheville’s opinion.

Clauser, B., Harik, P., & Clyman, S. (2000). The generalizability of scores for a performance

       assessment scored with a computer-automated scoring system. Journal of Educational

       Measurement, 37(3), 245-262.

The authors of this article study the National Board of Medical Examiners’ use of scoring
software to evaluate an exam concerning physicians’ patient management skills. The study
analyzes in depth the algorithms used in these programs and compares the scoring results of
the software to those of human scorers. The study concludes that the scoring software has brought
mechanical accuracy to this test and has also eliminated random human errors.

Herrington, A. & Moran, C. (2001). What happens when machines read our students’ writing?

       College English, 63(4), 480-499.

Herrington and Moran point out the negative implications scoring software presents in terms of
what counts as good writing and what administrators may do with such implications. Like Cheville,
Herrington and Moran contend that these programs ignore all social issues regarding texts. The
authors do a good job describing how the software functions and use this description to point out
the negative issues involved in using such computer programs. They also use two scoring
programs and evaluate their personal results.

Huot, B. (2002). (Re)Articulating writing assessment for teaching and learning. Logan: Utah

       State University Press.

Huot describes the goal of this book as changing the way assessment is thought of by those
involved in composition and educational assessment. Huot advocates for site-based and locally
controlled assessments of writing instead of viewing assessment as a static technology, which
creates problems. He also articulates the difference between assessment, grading, and testing,
and asserts that assessment functions more as a response to a text than as a score. Finally, Huot
discusses how instructors read student writing and how this affects response.

Kukich, K. (2000). Beyond automated scoring. IEEE Intelligent Systems, 15(5), 22-27.

Kukich, representing ETS, first gives a short description of the history of automated scoring
technologies. She claims these programs’ purpose is to reduce grading for teachers and also
to develop more effective writing instruction. She describes the basic functions of these
programs and how the programs evaluate essays. She then describes research being conducted to
improve these systems for future use in educational settings.

Landauer, T. & Laham, D. (2000). The intelligent essay assessor. IEEE Intelligent Systems,

       15(5), 27-31.

Landauer is a professor of psychology and Laham is a professor of cognitive science; both men
are founders of Knowledge Analysis Technologies, which produces scoring software. Their
article explains in depth how semantic analysis functions in scoring software and how it is
calibrated, and argues for the software’s reliability and validity. They contend that these
programs account for content and social situations, but they offer no real evidence of how the
software can account for social context, and they use the term social in a manner that does not
account for contextual differences.

Warschauer, M. & Grimes, D. (2008). Automated writing assessment in the classroom.

       Pedagogies: An International Journal, 3, 22-36.

Conducted by Warschauer and Grimes, this study is a “mixed-methods exploratory case study to
learn how AWE is used in classrooms and how that usage varies by school and social context”
(p. 26). Warschauer and Grimes used a convenience sample from a middle school, two junior
high schools, and a high school in southern California with diverse student populations varying
in academic achievement, socioeconomic status, ethnic makeup, and access to computers. Data
was collected through transcribed semistructured interviews with principals, language arts
teachers, and two focus groups of students. In addition, thirty language arts classes were
observed, and surveys were completed by students and teachers. A sample of student writing
with feedback from the software was also analyzed. Data and observation notes were analyzed
with standard qualitative coding and pattern identification techniques.

Williamson, M. (2004). Validity of automated scoring: Prologue for a continuing discussion of

       machine scoring student writing. Journal of Writing Assessment, 1(2), 85-104.

In this article, Williamson discusses the differences between two camps of educators (college
writing assessment and educational measurement) involved in writing assessment in order to
initiate discussion about the validity of automated scoring. Williamson looks at the history and
development of assessment technologies in order to explore the beliefs and assumptions in each
group. Williamson then explores the concept of validity and how it has been defined in various
ways. Holistic scoring also has a place in this article, as Williamson questions its validity,
ultimately finding that validity exists in each test depending on context.

Wohlpart, A.J., Lindsey, C., & Rademacher, C. (2008). The reliability of computer software to

       score essays: Innovations in a humanities course. Computers and Composition, 25(2),


Wohlpart, Lindsey, and Rademacher describe their implementation of scoring software in a large
online course at Florida Gulf Coast University. The authors found the software performed
excellently for their specific needs. They used the programs to holistically score two short
analytical essays for writing ability and textbook content. They found the software scored as
reliably as human evaluators. For a longer interpretive essay, the online course still used human
