Auto Scoring Software and Classroom Practice

Brian Fotinakes
December 4, 2009
English 808

Annotated Bibliography for Automated Scoring Programs and Classroom Practice

Calfee, R. (2000). To grade or not to grade? IEEE Intelligent Systems, 15(5), 35-37.

Calfee, dean of the School of Education at UC Riverside, endorses the use of automated scoring programs in education but argues that their main purpose should be large-scale assessment and generated feedback on surface features in writing. Because these programs attend mainly to micro-level issues in a text, Calfee believes the software is better suited to inexperienced writers. Calfee also argues that these programs reveal a correlation between students' reading and writing; that is, the programs can score how well a reader understands and responds to a text.

Chen, C., & Cheng, W. (2008). Beyond the design of automated writing evaluation: Pedagogical practices and perceived learning effectiveness in EFL writing classes. Language Learning & Technology, 12(2), 94-112.

Chen and Cheng's study is a "naturalistic classroom-based inquiry that was conducted in three EFL college writing classroom contexts in a university in Taiwan" (p. 98). The students in these courses were third-year English majors taught by experienced EFL teachers, and the courses emphasized academic writing through a process-writing approach. The scoring software was implemented to facilitate student writing development, and Chen and Cheng monitored how the automated evaluation software was integrated to help students revise their essays. They collected data through questionnaires administered to students at the end of the semester, focus group interviews with students, individual interviews with the course instructors, and analysis of student writing samples accompanied by software-generated scores and feedback.

Cheville, J. (2004).
Automated scoring technologies and the rising influence of error. The English Journal, 93(4), 47-52.

Cheville articulates many of the concerns compositionists have about scoring software. Her first critique is that the programs favor formulaic writing; this focus on form, she argues, filters meaning and distracts writers from their purpose. The programs also treat grammar and style errors as static across contexts, which she disputes on the grounds that different audiences judge errors differently. Finally, Cheville argues that writing is social and that these programs cannot account for social contexts. In her view, these programs may hurt students when used for high-stakes writing assessment or classroom instruction.

Clauser, B., Harik, P., & Clyman, S. (2000). The generalizability of scores for a performance assessment scored with a computer-automated scoring system. Journal of Educational Measurement, 37(3), 245-262.

The authors study the National Board of Medical Examiners' use of scoring software to evaluate an exam on physicians' patient management skills. The study analyzes in depth the algorithms used in these programs and compares the software's scores to those of human raters. It concludes that the scoring software has brought mechanical accuracy to the test and eliminated random human error.

Herrington, A., & Moran, C. (2001). What happens when machines read our students' writing? College English, 63(4), 480-499.

Herrington and Moran point out the negative implications scoring software carries for definitions of good writing, and what administrators may do with such definitions. Like Cheville, Herrington and Moran argue that these programs ignore the social dimensions of texts.
The authors describe clearly how the software functions and use this description to identify the problems involved in using such programs. They also test two scoring programs themselves and evaluate their own results.

Huot, B. (2002). (Re)Articulating writing assessment for teaching and learning. Logan: Utah State University Press.

Huot describes the goal of this book as changing the way assessment is understood by those involved in composition and educational assessment. He advocates site-based, locally controlled writing assessment rather than treating assessment as a static technology, which he argues creates problems. He also distinguishes among assessment, grading, and testing, asserting that assessment functions more as a response to a text than as a score. Finally, Huot discusses how instructors read student writing and how this affects their response.

Kukich, K. (2000). Beyond automated scoring. IEEE Intelligent Systems, 15(5), 22-27.

Kukich, representing ETS, first gives a short history of automated scoring technologies. She claims these programs' purpose is to reduce teachers' grading load and to develop more effective writing instruction. She describes the basic functions of these programs and how they evaluate essays, then describes research being conducted to improve the systems for future use in educational settings.

Landauer, T., & Laham, D. (2000). The intelligent essay assessor. IEEE Intelligent Systems, 15(5), 27-31.

Landauer is a professor of psychology and Laham is a professor of cognitive science; both are founders of Knowledge Analysis Technologies, which produces scoring software. Their article explains in depth how semantic analysis functions in scoring software and how it is calibrated, and argues for the software's reliability and validity.
They contend that these programs account for content and social situations, but they offer no real evidence that the software can account for social context, and they use the term "social" in a way that does not account for contextual differences.

Warschauer, M., & Grimes, D. (2008). Automated writing assessment in the classroom. Pedagogies: An International Journal, 3, 22-36.

Warschauer and Grimes's study is a "mixed-methods exploratory case study to learn how AWE is used in classrooms and how that usage varies by school and social context" (p. 26). The authors used a convenience sample of a middle school, two junior high schools, and a high school in southern California, with diverse student populations varying in academic achievement, socioeconomic status, ethnic makeup, and access to computers. Data were collected through transcribed semistructured interviews with principals, language arts teachers, and two focus groups of students. In addition, thirty language arts classes were observed, and surveys were completed by students and teachers. A sample of student writing with software-generated feedback was also analyzed. Data and observation notes were analyzed with standard qualitative coding and pattern-identification techniques.

Williamson, M. (2004). Validity of automated scoring: Prologue for a continuing discussion of machine scoring student writing. Journal of Writing Assessment, 1(2), 85-104.

Williamson discusses the differences between two camps of educators involved in writing assessment (college writing assessment and educational measurement) in order to initiate discussion of validity in automated scoring. He examines the history and development of assessment technologies to explore each group's beliefs and assumptions, and then explores the concept of validity and the various ways it has been defined.
Williamson also questions the validity of holistic scoring, ultimately finding that validity exists for each test depending on its context.

Wohlpart, A. J., Lindsey, C., & Rademacher, C. (2008). The reliability of computer software to score essays: Innovations in a humanities course. Computers and Composition, 25(2), 203-223.

Wohlpart, Lindsey, and Rademacher describe their implementation of scoring software in a large online course at Florida Gulf Coast University. They used the programs to holistically score two short analytical essays for writing ability and textbook content, found the software scored as reliably as human evaluators, and concluded that it performed excellently for their specific needs. For a longer interpretive essay, the online course still used human graders.