Creating Valid and Reliable Classroom Tests
James A. Wollack, PhD
John Siegler, PhD
Taehoon Kang
Craig S. Wells
Testing & Evaluation Services

Session III: Writing and Scoring Essay and Short-Answer Questions
- Recap of Session II
  - Writing Multiple-Choice Questions
  - Sharing of Homework
- Writing and Scoring Essay and Short-Answer Questions
  - Essay Grading Exercise
  - Item Writing Rules
  - Scoring Guidelines
  - Essay Grading Exercise, Revisited
- Question and Answer Session
- Preview of Session IV
- Collect answer sheets from quiz

Recap of Session II
Writing Multiple-Choice Tests
- Types of multiple-choice items
- Rules for writing multiple-choice items
  - Make item as clear as possible
  - Item should have one and only one unambiguously correct answer
  - Distractors should be plausible
  - Avoid irrelevant clues to the right answer or clues that certain distractors are wrong
  - Design items so that common test-taking strategies are ineffective
- Multiple-choice item writing exercise

Sharing of Homework
- In groups of 3, review your constructed-response items and select the one item that you like best.
- As a group, develop the scoring rules/criteria for this item. Include the following:
  - Information for the student on how the item will be scored.
  - Information for instructor or TA on how to grade the item.
- You decide: The item being considered could be either one item on a test or a homework, or perhaps the entire test/homework.

Writing and Scoring Essay and Short-Answer Items
Essay Grading Exercise
Consider the following essay question:

  While taking multiple-choice tests, students often use various test taking strategies to help them select the correct response. The reason that students are able to use test taking strategies effectively is, frequently, that the items are poorly constructed. Provide examples of two different types of item writing problems that provide irrelevant clues that can be used to the advantage of the test-wise student. Also, provide an example of how instructors can use their knowledge of test taking strategies to develop multiple-choice items that will be harder for students who are overly reliant on test-taking strategies. Justify your choices of examples. (10 points)

Writing and Scoring Essay and Short-Answer Items
Essay Grading Exercise
- Initial Score for Student A
- Initial Score for Student B

Writing and Scoring Essay and Short-Answer Items
- Constructed-response (CR) items are ideal for assessing whether students possess a rich understanding of material.
- Useful for learning the processes a student uses to solve problems.
- Students decide
  - How to approach problems
  - How to set up problems
  - What factual information or opinions to use
  - How much emphasis to devote to various parts
  - How to express their answer

Writing and Scoring Essay and Short-Answer Items
- Examples of situations well-suited for CR items.
  - assessing writing ability
  - solving math or science problems
  - comparing and contrasting opposing viewpoints
  - recalling and describing important information
  - developing a plan for solving a problem
  - criticizing or defending an important theory
- CR items are easy to write, but difficult to score
- Grading issues
  - Consistency
  - Fairness

Rules for Writing CR Items
- Use CR items to measure complex objectives only
  - Information that can be easily obtained using MC items or other objective-type items should not be tested in CR format
    - list, define, identify
  - Advantages gained by asking students to produce the answer, rather than recognize it, are more than offset by disadvantages associated with scoring such items.
  - Reserve CR items for situations where supplying answer is essential and where MC items are of limited value
    - why, describe, explain, compare, contrast, criticize, create, relate, interpret, analyze, evaluate

Rules for Writing CR Items
- The shorter the required answer, the better
  - Allows for more items to be administered
  - Reduces influence of verbal fluency, spelling, etc.
  - Easier to grade
- Focus question on a single issue
  - Don't give examinee or grader too much freedom in determining what the correct answer should be.
  - Issue should be directly linked to course objectives
- Keep blueprint in mind (i.e., weighting of objectives) when determining how many points to assign to an item and how much time students will require to answer.
  - Provide students enough time to answer

Rules for Writing CR Items
Models for essay tests
- Take home vs. in class
  - Take home essays allow students to give more thoughtful, complete, and better answers
  - Take home essays afford limited security in terms of item exposure, available resources, and identity of test taker
  - In class essays measure organizational skills and writing and thinking speed in addition to knowledge of the topic.
  - It is possible to require that take home essays be typed, thereby making them easier and quicker to read, eliminating handwriting from consideration, and reducing the scoring impact of spelling errors.
- Inform students ahead of time of the possible topics
  - e.g., Show students 6 questions and tell them that 2 will be on the test.
  - Students don't like surprises
  - Allows students maximum opportunity to prepare their answers
    - Could include memorizing information supplied by a friend.
    - Opens possibility of someone cheating by coming to class with essay already written.
  - Unless the initial set of possible topics covers the entire blueprint very well, this may result in students not bothering to study other areas you regard as important
- Provide students with a number of questions and allow students to select the one(s) on which to write.
  - It is very difficult to compare performances of two individuals who answered different items
    - Not all items are equally difficult or easy to grade
  - If students select items to answer, it is very hard for you to control the content of the exam.
    - Students will choose to answer items on familiar topics
    - Performance will not represent how well they have mastered entire domain of interest.
  - Avoid this model, if at all possible.

Rules for Scoring CR Items
- CR items are very difficult to grade
  - Grading difficulty increases with the number of scale points
    - Item may be easy to grade with 3 points, but hard with 10.
  - Grading difficulty increases with item complexity.
- When grading, focus on consistency and fairness
  - Consistency refers to the extent to which the same points are awarded or subtracted for comparable information across students.
    - Two students making comparable misinterpretations or mistakes should receive the same deductions
    - More consistent grading will produce more reliable scores
  - Fairness refers to the extent to which the points assigned or deducted reflect the weighting of objectives in the test blueprint.
    - In answering related questions where the answer for one question is used as input into another question, getting an intermediate step wrong should only result in losing points once.
    - The second problem likely relates to a different objective.

Rules for Scoring CR Items
- Construct a detailed scoring rubric that identifies the basis for awarding and subtracting points at each phase of each item.
  - May help to develop a model answer and think about essential elements in producing this answer
  - Evaluate answers in terms of the learning outcomes measured
  - Pay careful attention to errors of omission and commission
  - Keep in mind the total number of points for the item
    - Little mistakes may result in deductions on items worth a lot of points, but maybe not on items worth few points.
- Give careful thought to the basis for awarding or subtracting points on essays where examinee is asked for his/her opinion
  - Given some scenario, argue for or against something.
  - Students should be allowed to reach different conclusions, and they shouldn't be graded down for political, ideological, or philosophical differences from grader.

Rubric for Wisconsin Student Assessment System Knowledge & Concepts Examinations
6  Response is complete and superior in development, with fine use of language and mechanics. The writing is clearly focused on a topic and is logical and well developed. There is a clear sense of voice, purpose, and audience. Balance, precise vocabulary, and sophistication set this response apart.
5  Response is clear and well organized. There is a clear sense of purpose and few errors in mechanics or language. There is logical development of topic. Response shows a good command of language, with spelling errors on above grade level words only. This response is balanced and complete.
4  Response is completely organized and developed with adequate use of language and mechanics. The piece follows an organizational plan to closure. Development may be brief with few examples, but it is focused on a topic. Vocabulary is good, and common words are spelled correctly.
3  Response is somewhat developed. Frequent errors in mechanics and language detract from the whole. There is some focus on a topic, though lapses in logic or balance may occur.
2  Response is poor. Errors in language and mechanics may obscure the meaning. There is little evidence of focus on a topic or of an organized plan. Poor vocabulary and spelling inhibit understanding.
1  Response is scarcely coherent. Errors obscure the meaning. There is no balance, little or no logic, or attention to the topic.

Rules for Scoring CR Items
- CR items should be graded anonymously, if possible.
  - Reduces grader subjectivity.
  - Can use a code number to identify examinees
  - Examinees could put their name only on the front sheet, which graders are instructed not to look at.
- Grade all students' responses one question at a time.
  - Grade item 1 for all students before moving on to grade item 2.
  - Helps grader maintain a single set of criteria for awarding points.
  - Reduces influence of examinee's previous performance on other items.
  - If multiple graders are used and it is not possible for all graders to rate all items for all students, it is better to have each grader score a particular problem or two for all students than to have each grader score all problems for only a subset of students.
    - e.g., Don't have TAs grade exams for only the students in their own discussion sections
    - This strategy eliminates effects due to one person grading harder than another.

Rules for Scoring CR Items
- While grading a question, maintain a log of the types of errors observed and their corresponding deductions.
  - It is very difficult to anticipate every error you will see
  - Allows for consistency across exams.
  - May be necessary to re-examine some questions that had already been graded to verify that the point deductions are consistent and fair.
    - Some mistakes may be more common than you had anticipated when you first started grading.
- Use multiple raters, if possible
- Unless writing skill is one of the course objectives, do not take credit off for poor grammar, spelling errors, or failure to punctuate properly.
  - Points can be reduced if quality of writing clearly interferes with your ability to understand whether the student has adequately grasped the material.
- Never grade on the basis of penmanship or length.
  - Can grade down for length if it is clearly outside the length parameters identified on assignment.

Other Considerations with CR Items
- Well-developed CR items can provide a richness of information not available with MC testing.
- HOWEVER,
  - They are harder and more time-consuming to grade
  - Less reliable than MC tests.
  - More likely to result in students contesting their grades.
  - Because they take longer to answer, many more MC items can be administered in the same time period.
    - CR items don't sample the domain of interest as thoroughly.
  - A common criticism of MC testing is that many students aren't good at that type of assessment, so it doesn't allow for them to show what they know.
    - Research shows very clearly that students' writing varies by genre.
    - Students may be good at writing compare-contrast or an objective piece, but may not be good at expressing opinions.

Summary of MC versus CR items

Learning outcomes measured
  MC items: Good for measuring outcomes at lower levels of learning (e.g., knowledge, comprehension, and application); inadequate for organizing and expressing ideas.
  CR items: Inefficient for measuring knowledge outcomes; best for ability to organize, integrate, and express ideas.

Sampling of content
  MC items: The use of a large number of items results in broad coverage which makes representative sampling of content feasible.
  CR items: The use of a small number of items limits coverage which makes representative sampling of content infeasible.

Preparation of items
  MC items: Preparation of good items is difficult and time consuming.
  CR items: Preparation of good items is difficult but easier than MC items.

Scoring
  MC items: Objective, simple, and highly reliable.
  CR items: Subjective, difficult, and less reliable.

Factors distorting scores
  MC items: Reading ability and guessing.
  CR items: Writing ability and bluffing.

Probable effect on learning
  MC items: Encourages students to remember, interpret, and use the ideas of others.
  CR items: Encourages students to organize, integrate, and express their own ideas.
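
Illustration: keeping a deduction log
The scoring rules recommend logging each type of error with its deduction, then re-examining papers already graded. For graders who keep scores electronically, the idea might be sketched as follows. This is only a minimal sketch under stated assumptions: the function names, error labels, point values, and student code numbers are all hypothetical and are not part of the workshop materials.

```python
def record_deduction(log, error_type, points):
    """Log a deduction the first time an error type is seen.

    If the error type is already in the log, the original deduction is
    kept, so comparable mistakes always cost the same number of points.
    """
    return log.setdefault(error_type, points)


def score_response(max_points, errors, log):
    """Subtract the logged deduction for each error, never going below zero."""
    total_deduction = sum(log[error] for error in errors)
    return max(max_points - total_deduction, 0)


def needs_regrade(graded, max_points, log):
    """Return code numbers whose awarded score differs from what the log implies."""
    return [
        code
        for code, (errors, awarded) in graded.items()
        if score_response(max_points, errors, log) != awarded
    ]


# Hypothetical log for the 10-point sample essay question.
log = {}
record_deduction(log, "only one item-writing problem given", 3)
record_deduction(log, "examples not justified", 2)

# Anonymous code numbers mapped to (errors observed, score awarded).
graded = {
    "S01": (["examples not justified"], 8),
    "S02": (["examples not justified"], 9),
    "S03": (["only one item-writing problem given", "examples not justified"], 5),
}

print(needs_regrade(graded, 10, log))  # prints ['S02']
```

Here S02 lost one point for the same mistake that cost S01 two points, so that paper is flagged for a second look. Whether the log or the awarded score should change remains the grader's judgment.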

Scoring Rubric for Sample Essay Question
- Consider scoring rubric for the sample essay question.
- Re-evaluate the responses to the two sample essays.
  - Final score for Student A
  - Final score for Student B
- By show of hands…
  - Who saw their scores stay the same for both essays?
  - Who saw only one of their scores stay the same?
  - Who saw both their scores change?
  - Who saw the difference between the two scores change?
  - Who saw the ranking of the two scores change?
  - Who scored item 2 higher than item 1?

Group Activity
- Re-assemble into groups and discuss any possible revisions to your scoring rubrics.
- Share items and rubrics

Questions from the first three sessions?

Preview of Session IV
- Collect answer sheets from quiz
  - You may hold onto the actual quiz
- Evaluating the Test: The Final Step
  - Overview of item analysis
  - Overview of scanning and reporting options
  - Review item analysis from quiz
    - Evaluate items
    - Revise items
- Final Questions
- Evaluation

The Room for Session IV Has Changed
- Session IV will take place in Union South. Please check the room postings when you arrive to see which room we will be in.