					        MANUAL FOR

TEST WRITERS AND REVIEWERS




  Testing and Assessment Team
       English Department
       Foundation Program
         Qatar University



            Fall 2008




           Appendix A Vol. 4 Test Writers and Reviewers Manual Fall 2008
                               TABLE OF CONTENTS

   I.      INTRODUCTION ……………………………………………………. 3

   II.     DIFFERENCES BETWEEN TESTING AND TEACHING ………. 4

   III.    GENERAL RECOMMENDATIONS FOR EXAM CONTENT …… 5

   IV.     TESTING GRAMMAR ………………………………………………. 6-11
            Recommended item types ……………………………………… 6
            Sample TOC ………………………………………………………6-7
            Guidelines for item construction ………………………………. 7-11

   V.      TESTING LISTENING COMPREHENSION ……………………… 12-17
            Recommended item types ……………………………………… 12-13
            Sample TOC ………………………………………………………13
            Guidelines for item construction ………………………………. 13-17

   VI.     TESTING READING COMPREHENSION AND VOCABULARY 18-31
            Recommended item types for testing reading ……………… 18
            Sample TOC ………………………………………………………19-20
            Guidelines for reading exam item construction ………………. 20-27
            Guidelines for vocabulary exam item construction ……………27-31

   VII.    TESTING WRITING ………………………………………………….32-35
            Recommended item types ……………………………………… 32
            Sample TOC ………………………………………………………32
            Guidelines for item construction ……………………………….. 32-35

   VIII.   EXAM FORMATTING ………………………………………………. 36
   IX.     EXAM PACING PROCEDURES…………………………………….37

           APPENDIX A – TEST CONSTRUCTION…………………………..39-40
            Planning ………………………………………………………….. 39
            Writing ……………………………………………………………. 39-40
            Reviewing ………………………………………………………… 40

           APPENDIX B – ATTRIBUTES OF A GOOD TEST………………….41-42
            Validity …………………………………………………………… 41
            Reliability …………………………………………………………. 41-42
            Practicality ……………………………………………………….. 42

           RECOMMENDED READING ………………………………………. 43




Manual for Test Writers and Reviewers 2008                                 2
       I.      INTRODUCTION

This manual is intended for test writers and test reviewers in the Foundation
Program, Department of English at Qatar University.


The content of this manual is designed to provide guidelines for the production of
improved exams and tests across levels in the English Department.




       II.       DIFFERENCES BETWEEN TESTING AND TEACHING

   When designing an exam, it is important to keep in mind the difference between
   material that is practiced in class and material that is to be tested on exams. Not
   everything practiced in class needs to appear on an exam. In other words, it is
   not necessary to include in an exam every grammar item or every type of vocabulary
   activity studied or practiced in class.


   Decisions about whether or not to include an item in an exam should be based on
   the importance given to that item in the curriculum.


   When making such decisions, always keep in mind that a test writer should only
   test what is taught and that if an item is important, it should be tested.


   Three useful questions to ask yourself before you include an item type in an
   exam are:


                Is this item type better used for teaching and explaining, or for testing a
                 skill?
                Is this item type important?
                Is this item type testing what it aims to test?




       III.      GENERAL RECOMMENDATIONS FOR EXAM CONTENT


             Ensure that all instructions are clear, well written and straightforward
              (instructions should not represent a challenge for students)
             Provide a model immediately after the instructions, so that students can
              see what the activity requires them to do
             Clearly define and indicate what is meant by short answer: give a number
              or words, some limit on length of student response, and an example
             Make sure all exam activities have a clear purpose: i.e. tie in one activity
              with another (without making them dependent on one another)
             Ensure activities on an exam do not represent double jeopardy for
              students (do not write activities that are interdependent where answering
              one question correctly is dependent on answering the previous question
              correctly).
             Make sure blanks are big enough for students to fit their answer comfortably
             Space the test items so that students can read them easily and markers
              can score the answers easily.
             Do not split one item onto two pages. Keep the entire item on one page.
             Each exam draft should be accompanied by an answer key




       IV.       TESTING GRAMMAR
                Recommended item types
The following item types can be found in the exam specifications.
Time: 30 minutes
Source material: content in FOG & North Star reading/writing/listening/speaking or
adapted from ESL books, the internet, etc. (as per exam specs). Do not use or test
meta-language. Focus on main grammar points. Do not test more 'esoteric' or dated
usage.
Section 1
Item type: MCQ – gap fill, completion, substitution etc. (with approval from
supervisor)
1 answer, 3 distracters
Areas Tested
Content as per syllabus – weight according to importance within syllabus
25 Questions x 1 mark
Total marks: 25
                Sample TOC for grammar exams (on next page)




Units 1, 3, 4, 5, 6 North Star and Focus On Grammar (units 22, 23 and 18)
Level 4 Final Exam - Grammar
Section 1                                      Item Tested
Q1              Gerund preceded by 'not'
Q2              Adjectives followed by infinitive
Q3              'where' signifying place in adj. clause
Q4              'had had' in third conditional
Q5              Gerund as object of preposition
Q6              Adverb clauses of time
Q7              Parallelism of gerunds
Q8              'wish' to express present aspiration
Q9              Adverb clauses of reason
Q10             Coordinating conjunctions
Q11             Reduced adverb clauses
Q12             Nouns followed by infinitives
Q13             Obligatory 'that' in subject noun clause
Q14             Gerund in passive
Q15             'On' + '-ing' in adverb phrases
Q16             'The fact that'
Q17             Word order in embedded questions
Q18             Transitions (however)
Q19             Tense changes from adverb clause to adverb phrase
Q20             Transitions showing temporal relationships
Q21             'Whose' as genitive
Q22             Transitions showing contrast
Q23             Subordinating conjunctions
Q24             'Whom' as the object of a preposition
Q25              Reduced adjective clause

                Guidelines for item construction
Please note that examples marked *** indicate samples of what not to do.
1.     Care must be exercised to ensure that each structure point is employed in a
suitable context. The context must be meaningful and as natural as possible. The
lead should be brief, clear, and straightforward, with no vocabulary that is unfamiliar
to the students. A context such as "The professor demanded that his students be on
time to every class" is too formal and bookish. If changed to "The professor asked
his students to be on time to every class", it would provide a better context for a
structure item. As another example, consider a context like:
      ***A: Are you going home?
      ***B: Yes, I am. I am …………. home.

This exchange sounds like a mechanical classroom activity because it does not



represent a likely natural use in casual speech. Never use a stem beginning with a
blank unless it is part of a dialogue, as in this acceptable item:
     A: I can't find my keys.
     B: __________ you looked in your bag?

2.     The stem should provide sufficient context. There is no fixed rule regarding
the length of context. Sometimes a context of a few words would suffice whereas in
other instances a context of 20 words may be insufficient.

3.       The distracters in multiple-choice items should be plausible and
grammatically correct. Experienced teachers usually have a feeling for the kind of
distracters that are suitable for any given context. Inexperienced teachers may look at
the errors that students make on class exercises, responses given to open-ended
questions, and compositions. In addition, distracters can be selected on the basis of
contrastive analysis when the examinees come from one language background.

4.       Each item should have only one acceptable or clearly best answer with
no distracter that is regionally acceptable or appropriate in one variety of English
but not another. For instance, an item such as the following should be avoided
because the first distracter is appropriate in British English though not in American
English. This implies that all distracters should be definitely wrong.
         ***A: "Why are you ringing the doorbell?"
         ***B: "I ……….. my key with me."
        a. haven't     b. don't       c. don't have      d. not have
Yet certainly, no distracter should represent a non-English pattern. Not only are
non-English patterns easy to identify as wrong; they may also have a harmful effect
on the testees' learning.

5.       The alternatives should be brief and to the point. The following item is
pragmatically uneconomical:
***A: "What did your father ask you?"
***B: "My father asked me __________."
     a. if I had taken my younger brother to school
     b. if I have taken my younger brother to school
     c. did you take your younger brother to school
     d. did I take my younger brother to school


However, if changed to the following form, the item would considerably improve:

     A: "What did your father ask you?"
     B: "He asked me if I ___ taken my younger brother to school."
     a. have           b. will               c. had              d. did

6.      The options should be of similar length. If the four choices cannot be of the
same length, they should at least be paired by length. Another possibility is to use
no fixed pattern, with each choice a different length. No single option should be
attractive only because it appears different from the others. A testee who does not
know the answer to the following item would simply choose "a" and would be
correct in so doing:
***A: "Tom still had his lights on at 1 AM."
***B: "So, he might ______ for today's test."
        a. have been preparing
        b. preparing
        c. prepared
        d. prepare
This flaw is often observed in items because test constructors tend to ensure
that the correct choice is definitely right and that the distracters are definitely wrong.
The following modifications can improve the item to some extent:
         a.    have been preparing
         b.    had had prepared
         c.    preparing
         d.    prepare
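
Where exam drafts are handled electronically, the length principle above can be
roughed out as an automatic check. The sketch below is illustrative only; the
"twice as long / half as long" threshold is our assumption, not a rule from this
manual:

```python
# Flag an MCQ option that stands out from the others by length alone.
# Threshold (twice as long or half as long) is an illustrative assumption.

def length_outlier(options):
    """Return the index of an option whose length differs sharply from
    every other option, or None if lengths are reasonably balanced."""
    lengths = [len(o) for o in options]
    for i, n in enumerate(lengths):
        others = lengths[:i] + lengths[i + 1:]
        if all(n > 2 * m for m in others) or all(2 * n < m for m in others):
            return i
    return None

# The flawed item above: the correct answer is much longer than the rest.
flawed = ["have been preparing", "preparing", "prepared", "prepare"]
print(length_outlier(flawed))        # flags option a (index 0)

# The modified set is better balanced, so nothing is flagged.
improved = ["have been preparing", "had had prepared", "preparing", "prepare"]
print(length_outlier(improved))      # prints None
```

Such a check cannot judge plausibility, of course; it only catches the visual
give-away that guideline 6 warns against.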

7.      Each item should test only one point. If another point is also incorporated
into the test item and the examinee fails to choose the correct answer, the examiner
cannot say with confidence that the testee has not mastered the point being tested.
For example, if students fail to select the correct choice in the following example, it is
not fair to say that they do not know how to use the adverb with be. Instead, they are
possibly having problems with the use of "here" and "there".
***A: "Is Mary in class early all of the time?"
***B: "No, ______ early."
         a. she always isn't here
         b. always she isn't here
         c. isn't she always there
         d. she isn't always there



              Multiple-choice Items
The following general principles should be observed when multiple-choice items are
constructed:

1.     Each multiple-choice item should have only one answer. This answer
must be absolutely correct unless the INSTRUCTIONS specify choosing the best
option (as in some vocabulary tests). Although this may seem an easy matter, it is
sometimes extremely difficult to construct an item having only one correct answer.
An example of an item with two answers is:
***I stayed here until John______________
           a. had come        b. came        c. would come      d. has come
The purpose of the distracters is to appear as plausible solutions to the problem for
those students who have not achieved the objective being measured by the test
item. Conversely, the distracters must appear as implausible solutions for those
students who have achieved the objective. Only the correct answer should appear
plausible to these students.

2.     Only one feature at a time should be tested: it is less confusing for the
testee and it helps to reinforce a particular teaching point. Obviously, few would wish
to test both grammar and vocabulary at the same time, but sometimes word order
and sequence of tenses are tested simultaneously. Such items are called HYBRID
TEST ITEMS and are to be avoided.
***I never knew where _______________              .
a. had the boys gone                c. have the boys gone
b. the boys have gone               d. the boys had gone

3.     Each option should be grammatically correct when placed in the stem,
except of course in the case of specific grammar test items. For example, stems
ending with the determiner a, followed by options in the form of nouns or noun
phrases, sometimes trap the test constructor. In the item below, the correct answer
C makes the sentence grammatically incorrect.

***Someone who designs houses is a _____________.
           a. designer     b. builder c. architect d. plumber
The item can be easily recast as follows:



Someone who designs houses is a(n) _____________.
           a. designer                       c. architect
            b. builder                       d. plumber

4.     All multiple-choice items should be at a level appropriate to the
proficiency level of the testees. The context itself should be at a level lower than
the actual problem that the item is testing: a grammar test item should not contain
other grammatical features as difficult as the area being tested, and a vocabulary
item should not contain semantic features in the stem more difficult than the area
being tested.

5.     Avoid the alternatives "all of the above" and "none of the above".

6.     Keep all the alternatives brief (put all common wording in the stem of the
item and avoid wordiness and complex sentence structure.)

7.     Multiple-choice items should be as brief and as clear as possible (though
it is desirable to provide short contexts for grammar items.) In addition, try to avoid
ambiguous terms and statements.

8.     In many tests, items are arranged in rough order of increasing difficulty.
It is generally considered important to have one or two simple items to "lead in" the
testees, especially if they are not very familiar with the kind of test being
administered. However, areas of language which are trivial and not worth testing
should be excluded from the test.




   V.          TESTING LISTENING COMPREHENSION
            Recommended item types for testing listening comprehension:
The following item types are in the exam specifications.
Time: 30 minutes
Source material: spoken transcripts taken/adapted from academic books, ESL books,
the internet, etc. (as per exam specs). Do not 'invent' dialog; use existing material and
adapt it as necessary. Do not 'force' North Star vocabulary into transcripts.
Listening passage: Appropriate to level. Does not have to follow North Star themes, but
must be accessible to the students. North Star vocabulary should not be tested directly
in the questions – focus is on listening skills/comprehension. Use shorter passages
which provide plenty of content for questions. Passages should contain enough
information to give the test writer more questions than are needed (at least three
more), thus enabling the writer to choose the ten best. Test reviewers should
review the listening quiz by having the text read aloud rather than reading it
silently. Please arrange for a reader to read the test aloud while others listen
and review the questions.


Section 1
Listening passage 1: Extended dialog/conversation (Levels 1, 2, 3); lecture style
(Level 4)
Played twice ??
Item type: MCQ
1 answer, 2 distracters
Questions should follow order of answers in passage
Areas Tested (applicable to all levels – weight according to syllabus)
Detail
Inference
Speaker's opinion, attitude, emotion
Main idea (should come at end of questions)
10 Questions x 1 mark
Section 2
Listening passage 1 repeated, or a new listening passage (Listening passage 2):
Lecture style: all levels



Item type: Short answer/table completion/ordering/notes etc. (with approval of
supervisor)
Areas Tested
Level specific – e.g. level 2 might be 'identifying chronology'
10 Questions x 1 mark
Total marks: 20

           Sample TOC for Listening exams
   Level 4 Final Listening Spring 07
   Units 1, 2, 4, 5, 6 North Star

Section 1                                    Item Tested
Q1            Listen for details
Q2            Listen for supporting details
Q3            Listen for details
Q4            Listen for details
Q5            Infer information not explicit in passage
Q6            Interpret speaker's attitude/ feeling
Q7            Listen for details
Q8            Predict outcome
Q9            Listen for main idea
Q10           Explain main idea
Section 2
Q1            Listen for details
Q2            Listen for details
Q3            Listen for details
Q4            Identify details
Q5            Interpret speaker's emotions
Q6            Listen for details
Q7            Listen for main idea
Q8            Listen for details
Q9            Identify main idea
Q10           Summarize main idea

Guidelines for constructing listening comprehension test items:
           Spoken language (lecture) is different from written language (passage),
            and in a listening test we are testing students' listening ability, so a reading
            passage should not be used as a listening transcript.
           Write the questions while you listen to the transcript (not from a written
            version), since that is how non-native students will have to answer them:
            by listening.


          Test reviewers should review the listening quiz by having the text read
           aloud rather than reading it silently. Arrange for a reader to read the text
           aloud while others listen and review the questions.
          Background knowledge needs to be taken into account. Ensure students
           have necessary background knowledge.
          Avoid density of information in one lecture.
           Don't forget that in listening we are not testing the learner's knowledge of
            vocabulary.
           Avoid redundancy and irrelevant sentences. However, as in normal
            speech patterns, repetition is acceptable, especially in a lecture style. For
            example: As I said before.... In other words.... To recap ...
           Experts now suggest that fillers and repetitions (similar to those used in
            normal speech) should, in fact, be incorporated, as this gives students
            time to look for important information without losing the thread.
           The best question types for a listening test in our program are MCQs
            and gap-fill/completion/note-taking.
           Note-taking type questions (Section 2) should have specific instructions on
            the number of words to be given in the answer. This can be done by
            mentioning a number on the answer sheet, e.g. Q.1. __________ (2 words).
            It is also recommended to keep the word limit to no more than four words.


   Examples from a level 4 Listening exam
   Section 2 (10 X 1 mark) Note Taking
   Directions: Write your answers on the answer sheet provided. This passage
   will be played twice.
   Example:
   Q - The tour guide is a __________. (1 word)
   A - student




   Section 2 Chart completion (see next page)




   A: The Guest Speaker
          Job: Botanist

          Workplace: Royal Botanic Gardens

          Supervising research at: 1. Millennium Seed Bank (3 words) in London.

          Future generations can use plants for : 2. food and medicine (3 words)


   B: The Seeds
          Storage of important seeds

          Age of seeds : 3. 200 (number)

          Hidden in : 4. notebook (1 word) in a closet

          Nationality of merchant : 5. Dutch (1 word)

          Seeds originally came from : 6. South Africa (2 words)

          Boarded a German ship going to Holland

          Ship captured by British Navy

          Number of seeds found : 7. 32 (number)

          Eaten by insects


   C: The Research

          Copied effect of fire on seeds

          Number of seeds that grew after preparation : 8. 3 (number)

          Types of seeds that grew: 9. tree / bean (1 word)

          Experiment will take 10. a month (2 words)




       Avoid long passages (aim for roughly 350 words at level 1 up to 700 words at
        level 4).
      In all tests and especially listening, instructions should be clear and
       straightforward; in other words, instructions should be written at one level of
       difficulty lower than the actual test.
       The first question should be an 'easy' one to lower test anxiety.
      Order questions in the way they appear and are heard in the lecture.
      Evenly space questions throughout the talk.
       Don't test content in the first 15-20 seconds of the listening script.
      Questions should address all paragraphs of the lecture, not only one or two.
      Students should never be exposed to a new question format in a testing
       situation.
      MCQs should have 3 response options, not 4
       Proficiency tests usually provide input only once, but achievement tests
        usually repeat input twice. Buck (2001) recommends once if you're assessing
        main idea and twice if you require listening for details (but never three times).
       Research shows that to understand a text students must know 90-95% of its
        words, so avoid using unknown vocabulary as a keyable response.
       Emphasizing point '1' of this guideline: don't transform a reading test into a
        listening one, but if you have no other choice, do the following:
        a. Insert oral markers at the beginning (e.g. "Today I am going to talk about…")
        b. Change complex sentences into shorter ones (use of less complex
        structures is characteristic of spoken text).
        c. Insert devices that help you buy time, like pauses and fillers (i.e. um, err,
        ah), which are more common in natural speech.
        d. Use coordinating conjunctions like 'and', 'but', or 'so' instead of 'although',
        'whereas', etc.
        e. Use built-in pauses and other features of oral texts, like false starts,
        ungrammaticality, and hesitation.
       When writing questions, don't forget to write more than you need (at least
        three more), thereby enabling the reviewers to choose the best ones.
      To accommodate students who have teachers of different dialects or accents,
       each script should be performed twice by native speakers of different




       backgrounds (e.g. once with a female Australian/British reader, the other with
       a male American reader.)
       A sample test similar in style and format should be provided to teachers and
        students in advance of the students' examination.
      For further guidelines on item design, please also consult the Guidelines for
       reading exam item construction in this manual pp. 18-31
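
The 90-95% lexical coverage figure in the guidelines above can be checked
mechanically against a course word list. This is a minimal sketch under our own
assumptions (a simple tokenizer and a stand-in word set), not an official tool of
the program:

```python
import re

def coverage(script, known_words):
    """Share of a script's running words that appear in known_words."""
    tokens = re.findall(r"[a-z']+", script.lower())
    if not tokens:
        return 0.0
    return sum(1 for t in tokens if t in known_words) / len(tokens)

# Stand-in word list; a real check would load the level's vocabulary lists.
known = {"the", "seeds", "were", "found", "in", "an", "old"}
text = "The seeds were found in an old notebook."
print(f"coverage: {coverage(text, known):.1%}")  # 7 of 8 tokens known
```

A script scoring below roughly 90% against the students' known vocabulary is a
signal to simplify the passage or reconsider the source.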




         VI.       TESTING READING COMPREHENSION AND VOCABULARY
                  Recommended item types for testing reading and vocabulary
Reading
Time: 60 minutes
Source material: written texts from academic books, ESL books etc. (as per exam
specs). Do not use/adapt interviews/lecture style talks from the internet.
Reading passage: Appropriate to level. Does not have to follow North Star themes,
but must be accessible to the students.
Section 1
Item type: MCQ
1 answer, 3 distracters
Areas Tested (applicable to all levels – weight according to syllabus)
Main idea
Skim/scan
Detail
Inference
10 Questions x 1 mark
Section 2
Item type: Short answer/completion/ordering etc. (with approval of supervisor)
Areas Tested
Level specific – e.g. level 2 might be 'identifying chronology'
5 Questions x 1 mark
Section 3
Item type: MCQ
Areas Tested
Vocabulary in context (refer to passage)
5 Questions x 1 mark
Section 4
Item type: MCQ
Areas Tested
Vocabulary found in North Star reading/writing
10 Questions x 1 mark
Total marks: 30



               Sample TOC for reading and vocabulary exams

Sample: Testing & Assessment: Table of Contents Common Exams
Level 4: Final Spring 2007

Section 1                                  Item Tested
Q1              Reading for main idea – whole passage
Q2              Reading for main idea – paragraph specific
Q3              Reading for main idea – paragraph specific
Q4              Reading for main idea – identify cause and effect
Q5              Reading for detail
Q6              Reading for detail
Q7              Reading for detail
Q8              Infer meaning from a text – identify point of view
Q9              Infer meaning from a text – interpretation
Q10             Infer meaning from a text – identify themes
Section 2
Q1              Short answer (maximum 3 words) – skimming
Q2              Short answer (maximum 3 words) – scanning
Q3              Short answer (maximum 3 words) – scanning
Q4              Short answer (maximum 3 words) – scanning
Q5              Short answer (maximum 3 words) – main idea
Section 3
Q1              Vocabulary in context – idioms
Q2              Vocabulary in context – prefixes
Q3              Vocabulary in context – suffixes
Q4              Vocabulary in context – synonyms
Q5              Vocabulary in context – synonyms
Section 4
Q1              Vocabulary from North Star – from Units 1-3
Q2              Vocabulary from North Star – from Units 1-3
Q3              Vocabulary from North Star – from Units 1-3
Q4              Vocabulary from North Star – from Units 4, 5 & 7
Q5              Vocabulary from North Star – from Units 4, 5 & 7
Q6              Vocabulary from North Star – from Units 4, 5 & 7
Q7              Vocabulary from North Star – from Units 4, 5 & 7
Q8              Vocabulary from North Star – from Units 4, 5 & 7
Q9              Vocabulary from North Star – from Units 4, 5 & 7
Q10             Vocabulary from North Star – from Units 4, 5 & 7


How to complete the TOC
1. Carefully read the Reading Syllabus as given to you by the level supervisor
    and note down the objectives for the units being tested; these form the basis of
    the TOC.
2. Apportion the TOC according to the objectives. For example, if 'identify main
     idea' is listed three times in the syllabus, you should aim to have three main idea
     questions in the TOC. If the syllabus lists 'inference' twice, you should aim to
     have two inference questions in the TOC. If there is one objective that asks students
     to identify the writer's tone, you should aim to have one question on this in the TOC.
3. In practice, there is not usually a great deal of variation in the TOC from one term
     to the next or from year to year. This doesn't mean that there is no variation, and
     you will have to make small adjustments as the syllabus dictates.
4. The test items do not have to follow the order of the TOC within each section; the
      order depends on the text.
The TOC is a useful tool for a test writer for four main reasons:
1.      It allows the writer to make sure that all of the objectives in the syllabus are covered.
2.      It allows the writer to focus on the objective being tested when writing the test item.
3.      It makes writing the test more efficient because the writer knows how many 'main idea'
        questions, how many 'detail' questions, how many 'scanning' questions, etc. need to be
        written.
4.      The writer might be asked to justify the choice of questions. Using the TOC provides a
        solid theoretical foundation.
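
For teams that keep the TOC and syllabus objectives in a file or spreadsheet, the
apportioning step above can be sketched as a simple tally. The objective labels
below are invented for illustration, not taken from the actual syllabus:

```python
from collections import Counter

# Invented example data: objectives as listed in a syllabus, and the
# objectives targeted by each question in a draft TOC.
syllabus = ["main idea", "main idea", "main idea",
            "inference", "inference", "writer's tone"]
draft_toc = ["main idea", "detail", "main idea", "main idea",
             "inference", "writer's tone"]

wanted = Counter(syllabus)
written = Counter(draft_toc)
for objective, n in wanted.items():
    have = written[objective]
    status = "ok" if have == n else f"have {have}, need {n}"
    print(f"{objective}: {status}")
```

Run against a real syllabus, a report like this shows at a glance which
objectives are under- or over-represented before the draft goes to the reviewer.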

              Guidelines for reading exam item construction
Reading comprehension tests try to determine the examinees' ability to get meaning
from the printed material.
TRADITIONAL READING TESTS
Traditional tests of reading comprehension make the basic assumption that reading
consists of a number of sub-skills. Hence, such tests usually include short prose
passages, each followed by sets of multiple-choice comprehension items. The items
attempt to test the examinees' ability to:
          1 guess the meaning of words from context;
          2 understand the syntactic structure of the passage;
          3 distinguish explicit and implicit ideas;
          4 grasp the main idea of the passage;
          5 recognize the tone, mood, and purpose of the writer;
          6 identify literary techniques of the writer; and
          7 draw inferences about the content of the passage.




Although test items assessing these sub-skills are by and large observed in all reading
tests, their precise number and type vary from test to test. They largely depend on the
individual test writer's judgment. Certainly, not all of these sub-skills are observed in a
single reading comprehension test. Since they are mutually dependent, they may
coincide in various combinations in one single item. For instance, an item asking for the
main idea may require the examinee to not only find the topic sentence but also
understand the explicit and implicit ideas discussed in the passage.

A proficient reader uses these abilities concurrently in any attempt at reading. The
following example illustrates a sample test intended for advanced EFL examinees. It
consists of a paragraph of 130 words followed by seven comprehension items. Of
course, having this many items on such a short passage is beyond routine practice;
they are designed here purely for instructional purposes. Ordinarily, a passage of
about 100 words yields three sound items. Fewer than that makes the test inefficient;
more items, on the other hand, compel the test constructor to test trivial points.

EXAMPLE
INSTRUCTIONS: This is a test to show how well you comprehend written English.
First read the paragraph carefully and then answer the questions that follow. For each
item four choices are given. Circle the letter representing the best answer for each item.

A good education should, among other things, train you to think for yourself. The
examination system does anything but that. What has to be learnt is rigidly laid down by
a syllabus, so the student is encouraged to memorize. Examinations do not motivate a
student to read widely, but to restrict his readings; they do not enable him to seek more
and more knowledge, but induce cramming. They lower the standards of teaching, for
they deprive the teacher of all freedom. Teachers themselves are often judged by
examination results and instead of teaching their subjects, they are reduced to training
their students in exam techniques which they despise. The most successful candidates
are not always the best educated; they are the best trained in the technique of working
under compulsion.




1. What is the main idea of this paragraph?
      a. Examinations are not achieving the goals they should.
      b. Teachers are not teaching their subjects any longer.
      c. Exams are used to evaluate both teachers and students.
      d. The education system trains you to think for yourself.
2. According to the paragraph, a high grade on an exam is a sign of __________
       a. good education
       b. thorough study
       c. dependable exam techniques
       d. training in test-taking
3. Judging teachers on the outcome of exams has ___________
       a. made students look down at examinations
       b. lowered teaching standards at schools
       c. forced students to study the techniques
       d. resulted in a better educational system
4. What does the writer say about the examination system?
       a. It doesn't train students to think.
       b. It motivates a student to read.
       c. It is part of the education system.
       d. It raises standards of education.
5. We can infer from the paragraph that __________ mainly affect the quality of teaching.
       a. teachers
       b. students
       c. candidates
       d. examinations
6. We can infer from the paragraph that exams should __________
       a. be based only on syllabus
       b. train you to think for yourself
       c. show which teachers are more effective
       d. make pupils learn the contents of the syllabus




7. What is the tone of the writer?
       a. He feels no respect for the system
       b. He tries to find fault with the system
       c. He understands the situation and is pleased
       d. He knows the problems but supports the practice


The items above illustrate two formats commonly used in tests of
reading comprehension: the question format and the incomplete-sentence
format. In general, test writers should aim for a balance between the two. A
question should be no longer than a sentence and should avoid complex
grammatical constructions, and the possible answers should be of similar length.


In short, the seven items above are aimed at testing the following points in the text:
1.     Main idea: examinations are not achieving the goals they should.
2.     Distinguish implicit ideas: the examination system has resulted in an emphasis
on teaching to the test; and that judging teachers on the outcome of the examinations
has had an adverse effect on their teaching.
3.     Understand the structure of the passage: examinations are doing
everything except what they really should.
4.     Draw inferences: examinations are responsible for the poor quality of
teaching at the present time, and examinations should train individuals to think for
themselves.
5.     Identify the tone of the writer: he is critical of the existing system of
examinations.
Although experts do not agree on exactly how many reading sub-skills there are, or
whether it is possible to measure these sub-skills at all, traditional reading tests
remain the most widely used means of assessing reading ability. Indeed, this approach
provides a handy frame of reference for designing tests. Furthermore, it may have
some value for reading instruction.




GUIDELINES

The following general guidelines are suggested for preparing reading comprehension
tests:
1.       In the preparation of reading selections, it is advisable to keep certain
points in mind:
        Passages should be realistic in terms of language use. In
achievement tests, sampling should reflect the language learning objectives; in
proficiency tests, on the other hand, sampling should reflect the type of reading
activity the testees are likely to encounter in real life. When choosing a text
for reading comprehension, the primary focus should be on whether the topic is
appropriate. It is unlikely that you will find a perfect text that does not need
modification or, in some cases, rewriting. Most texts can be edited so that difficult
vocabulary is removed and/or adjusted to fit the level while the original
meaning is retained. For example, “Dual Income Families” can be rewritten as
“Two-Income Families”. A phrase such as “We need to nurture relationships in order
to achieve our goals” can be edited to read “We need to develop our relationships
in order to achieve our goals”. The editing process requires the test writer to
think carefully about each and every word in the text.
        Because the content and the rhetorical organization of the stimulus
materials have specific kinds of impact on reading comprehension, reading tests
should comprise passages whose content is suitable, culturally fair, and
representative of the knowledge and interests of the students. However, stimuli
that are common knowledge or familiar to the examinees through outside knowledge
should be avoided. Each reading passage should have enough substance to
allow for the required number of comprehension items. At present, each of the
selections should have enough content to yield about 10 comprehension items, 5
short-answer questions and 5 “vocabulary in context” questions. In general, a Level 4
reading text should be around 850 words, and a Level 1 reading text around
550 words. If the text is too long, the writer risks having paragraphs deleted in
the review process. Make sure that any edited text remains cohesive, and keep the
paragraphs roughly equal in length.
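As a rough screening aid, the word-count targets above can be checked automatically. The sketch below is illustrative only: the per-level targets follow the guideline (Level 4 around 850 words, Level 1 around 550), but the 10% tolerance is an assumption, not stated policy.

```python
# Rough length screen for candidate reading passages.
# Targets follow the manual's guideline (Level 4 ~850 words,
# Level 1 ~550 words); the 10% tolerance is an assumed margin.

TARGET_WORDS = {1: 550, 4: 850}
TOLERANCE = 0.10  # assumed acceptable deviation from the target

def check_passage_length(text: str, level: int) -> tuple[int, bool]:
    """Return (word_count, within_tolerance) for a passage at a given level."""
    target = TARGET_WORDS[level]
    count = len(text.split())
    return count, abs(count - target) <= target * TOLERANCE
```

For example, an 840-word Level 4 text passes, while a 400-word Level 1 text would be flagged for lengthening.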




2.       Items for traditional tests can be of two forms: open-ended and multiple-
choice.
         Open-ended items are easy to construct but difficult to score. They
require the testees to write a response to demonstrate their comprehension of the
reading passage.
        Short-answer questions should be constructed in a manner that requires
students to answer in a limited number of words. The reason is that students tend
to copy large chunks of text, hoping the answer to the question is located in that
chunk, rather than showing clear comprehension of the question and of the reading
passage. It is therefore advisable to state in the instructions the number of words
students are allowed to write (e.g. “Answer the following questions using a
maximum of three words.”). It is also recommended to keep the word limit to no
more than five words.
        Example from a level 4 reading exam:
Section 2 (5 x 1 mark)

Directions: On your answer sheet, answer each of the following questions
using a maximum of 3 words for each question.

Example:       What is the husband's traditional role in the family?
               To earn money
     Q 1 - How do women feel they must behave at work?
     A 1 - Aggressive(ly) and competitive(ly)
     Q 2 - What do men find uncomfortable to do when talking to their wives?
     A 2 - Expressing their feelings
     Q 3 - Who is responsible for household duties and child care in traditional families?
     A 3 - Women/wives        OR       the woman/the wife
     Q 4 - What are household duties based on in role-sharing marriages?
     A 4 -Traditional roles
     Q 5 - How much of the housework do men do in non-traditional marriages?
     A 5 - One third OR       1/3
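Since marking such items begins with checking the word limit, the check itself is trivial to automate. A minimal sketch follows; the three-word maximum mirrors the sample directions above, and tokens are simply split on whitespace, so hyphenated forms count as one word.

```python
# Verify that a short answer respects the stated word limit.
# Splitting on whitespace means hyphenated forms ("one-third") count as one word.

def within_word_limit(answer: str, max_words: int = 3) -> bool:
    """True if the answer uses no more than max_words words."""
    return len(answer.strip().split()) <= max_words

assert within_word_limit("To earn money")                       # 3 words: acceptable
assert not within_word_limit("They copy large chunks of text")  # 6 words: too long
```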




         Multiple-choice items are difficult to prepare, though they assess
reading skills alone, since no writing is required of the testees. In preparing
multiple-choice items, the following considerations are in order:
        Choices should be syntactically correct and semantically
plausible. Moreover, they should be of equal length or be paired by length.
        The lead of the item should clearly illustrate the problem. An item like the
following is distracting. The testees do not have enough information until they read
all the choices:
         ***Teachers ________________
a. encourage students to learn up hastily for exams
b. make students look down at examinations
c. force students to study test techniques
d. help students to learn the art of test-taking
        Items should require the testees to read the reading materials before they
can correctly answer them. Items that can be answered without reading the passage
should be avoided. The following item, for example, is not passage dependent since a
less knowledgeable testee should be able to eliminate choices A, B, and D easily
without even looking at the passage:
         ***One can infer from the paragraph that examinations should __________
a. test what the students have memorized hastily
b. be important to both students and teachers
c. motivate students to study extensively
d. contain only difficult questions
        Items should require the testees' thorough understanding and interpretation
of the reading materials. Items that require mere matching of fragments of the passage
should be avoided. Consider the following example.
         ***Exams lower the standard of teaching because _______________
a. the most successful candidates are not always the best educated
b. the best trained in the technique of working under compulsion are students
c. they are reduced to training their students in exam techniques
d. they deprive the teacher of all freedom
Testees who have not understood the text are able to answer this item correctly
because it is easy to match the words in the item with those in the passage.



       As far as possible, the items in the multiple-choice section should move
from easy to difficult, so that the first question is easier than the last. How
strictly this ordering can be applied will depend on the text.
       Items should provide no clue to the right answer of any other item in the
test. The following item, with its three negative choices, clearly leads the testees
to the right choice. If it were included in the sample test presented above, it
would also give away the correct answers to some of the other items:
        ***According to the paragraph, examinations do anything except __________
a. train people to think for themselves
b. limit pupils to memorize their tasks
c. make students study facts hastily
d. lower the standards of teaching


       Guidelines for vocabulary exam item construction
       After the lexical items have been selected, the second task of the test constructor
is to determine the form of the test items. Ideally, the word to be tested is underlined in
a context and supplemented with four possible meanings. The four choices are
generally paraphrases of the underlined word in simple terms:
Example 1
After discussing the matter for two hours, the committee adjourned without having
reached a decision.
        a. finished   b. continued             c. took off      d. broke off
        This is a good way of testing the subjects' understanding of specific words
and expressions. This form, however, has three disadvantages. First, it limits
testing to only one word per item. Second, some lexical items do not
lend themselves to four sensible paraphrases. Third, it allows the testees to
ignore the context altogether and still arrive at the meaning of the word being
tested. Hence, a variation of this form without these shortcomings is often included
in vocabulary tests. It is constructed by deleting the word to be tested from its
context; the testees read the sentence and then choose the one of the four given
choices that best completes it. This format is at the same time



economical, for it allows testing the examinees' knowledge of four lexical entries
at a time:
Example 2
After discussing the matter for two hours, the committee _____ without having reached
a decision.
           a. debated         b. collapsed          c. adjourned          d. groused
        An item form that is generally very popular in vocabulary tests is the so-called
standard vocabulary form. Such a form presents a definition, usually very brief, and
asks the testees to pick one of the four choices for the given definition:
Example 3
Adjourn means:
                   a. cause to turn aside.
                   b. break into pieces.
                   c. come into use again.
                   d. break off for a time.
Example 4
Break off for a time means:
a. collapse            b. adjourn             c. deflect           d. prostrate
This is a very economical form of vocabulary item, but such a format encourages
testees to memorize lists of words, which is not pedagogically advisable. A good and
meaningful context, however, provides the examinees with an opportunity to guess
the meaning of the word being tested. After all, testing should also offer an
opportunity for learning. The standard format does not fulfill this requirement.
        In constructing vocabulary items, the following very general guidelines are
recommended:

1. The context should be clear enough to convey the meaning of the word being
tested. A short conversation-like context makes a better lead than a brief
statement, especially for less proficient subjects. For instance, to test the word
borrow, a context like “Berta borrowed Kourosh's book” is considered poor, for it
shows only that borrow is a verb; the following context provides a better frame
for this word:
                  A: “Did Berta buy another book?”
                  B: “No, she ______ Kourosh's.”


   a. borrowed                 b. sold                 c. lent             d. returned

2. Do not include in the items any grammatical structures or extraneous
sources of difficulty that the testees may find hard to comprehend. The
following sentence contextualizes the test word, but it is poor because the context
itself is very difficult for examinees at this stage of language proficiency to
understand.
        ***Being unfortunate to have been bereaved of his belongings, Redha
________lucky John's book.
        a. borrowed           b. sold        c. lent             d. returned
        Along the same lines, the following item is also poor because finding the
    correct choice requires the examinees to possess a certain body of knowledge
    which is beyond the mastery of lexical items:
        ***We can raise sugar cane in Qatar because this country has a
______climate.
        a. dry                b. wet         c. warm             d. cold

3. If the item being written is a paraphrase type (Example 1), the choices should
be easier than the word being tested. Moreover, the choices should be of the
same grammatical form as the underlined word. That is, the options must be
readily comprehensible to the testees and syntactically acceptable as
replacements for the word being tested. The following is a defective item
because the first requirement has not been taken into account in its construction:
        ***The child was frightened of being left alone in the dark room.
                  a.   annoyed
                  b.   ashamed
                  c.   terrified
                  d.   dismayed
        This item has four choices of approximately the same level of
difficulty; the problem is that they are probably as difficult as the word being
tested. The third choice is, in fact, less frequent than the underlined word.
Therefore, if a testee fails to give the correct response to this item, it cannot
be said with confidence that he does not know the word frighten; he may, in fact,
have difficulty understanding the word terrify.


       The item below demonstrates how to improve the preceding example, though
it is still defective for another reason. Of the four given choices, only one can be
syntactically substituted for the underlined word in the context. Consequently, the
ability to answer this item correctly is not necessarily related to one's knowledge of
the underlined word; one can choose the correct answer by relying on knowledge of
structure alone. Such being the case, this item does not measure the testee's
understanding of the word frighten.
       ***The child was frightened of being left alone in the dark room.

                  a.   made somewhat angry
                  b.   feeling shame
                  c.   attacked suddenly
                  d.   filled with fear
       Similarly, the following item shows another aspect of item deficiency. Here,
only one of the four choices makes sense in the context, even if the underlined
word were missing entirely. Thus, again, this item does not test the testee's
knowledge of the word being tested.
       ***The detective is investigating a hideous crime.
                  a. a torrential
                  b. a detestable
                  c. an insensible
                  d. an immaculate

4. If the item is a completion type item (Example 2), the distracters and the word
being tested should be of the same level of difficulty. In addition, the four
choices must be syntactically acceptable in the stem. For instance, an item such as
the following is considered poor because the first two choices are much easier than
the others. The examinees can easily identify the first two as inappropriate and get
closer to the correct answer for the wrong reason:
   ***Ahmad says he would write an English course book if he could find a
________ to deal with the less interesting parts.
   a. reader                  b. tailor      c. lithographer     d. collaborator
Also an item such as:
   ***I got up so late that I ______had time for breakfast.



   a. dull                    b. rarity      c. hardly            d. yet
includes poor distracters since the testees can readily eliminate the first two choices
because they are grammatically inappropriate.

5. The choices should be, as much as possible, related to the same general topic
or area:
   ***She came to the party in a _____ dress.
   a. capable         b. sincere       c. lunatic          d. hideous
It is very likely that examinees will easily eliminate the first three choices because they
may not be associated with dresses. The following item, however, would probably
require much greater control of lexical items:
   ***She came to the party in a hideous dress.
   a. a nice          b. an old       c. a very large      d. a very ugly

6. The choices should be approximately of the same length. No choice should
be much shorter or much longer than the others.
   ***Claire says she would write an English Course Book if she could find _____ to
deal with the less interesting parts
   a. a vet           b. an idiot     c. an ace            d. a collaborator
   The last choice of this item is quite different from the others, at least in
appearance, and it is likely that a good proportion of examinees would choose “d”
for that reason alone. Some experts believe that such a flaw stems from a temptation
on the part of test constructors to elaborate the correct choice to make sure it is
unambiguously right. Consequently, a more balanced set is essential: either the four
choices are of equal length or they are paired in length:
   a. an arbitrator b. a proofreader         c. an entertainer    d. a collaborator

   or

   a. a partner       b. an editor           c. an entertainer    d. a collaborator
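The equal-or-paired-length rule lends itself to a quick automated screen. The sketch below is one possible interpretation of “paired in length” (the two shortest and the two longest options each within a small character margin); the margin value is an assumption.

```python
# Screen a four-option set against the balanced-length rule above:
# all options roughly equal in length, or paired (two short, two long).

def options_balanced(options: list[str], margin: int = 3) -> bool:
    """True if the four option lengths are roughly equal or paired."""
    lengths = sorted(len(o) for o in options)
    all_equal = lengths[-1] - lengths[0] <= margin
    paired = (lengths[1] - lengths[0] <= margin and
              lengths[3] - lengths[2] <= margin)
    return all_equal or paired

# The lone long option "a collaborator" makes the first set unbalanced:
flawed   = ["a vet", "an idiot", "an ace", "a collaborator"]
balanced = ["a partner", "an editor", "an entertainer", "a collaborator"]
```

Here `options_balanced(flawed)` is False, while both revised sets given in the manual pass.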




         VII.   TESTING WRITING
      Recommended item types for testing writing
The following item types can be found in the exam specifications.
Time: 45 minutes (Levels 1 &2)               60 minutes (Levels 3 & 4)
Source material: Writing prompts adapted from ESL books, internet sites, TOEFL
IBT type questions etc.
Section 1
Item type: Writing prompt – Students must choose 1 of 4 prompts provided.
Prompt must define what is expected from the student – e.g. paragraph, 5 paragraph
essay, language areas specific to level, number of words etc.
Areas Tested
Content as per syllabus – weighted according to importance within the syllabus.
Ensure the prompt tests functions taught in the particular level, e.g. compare-contrast,
narrative, description, etc.
10 marks
Total marks: 10
        Sample TOC for Writing exams

Testing & Assessment: Table of Contents Common Exams – Level 4 Final, Spring 2007

Section 1:                              Item Tested
Q1              A minimum of three paragraphs (350 – 450 words) cause and effect essay
                on a topic based on students' life, experience and background.
Q2              A minimum of three paragraphs (350 – 450 words) compare and contrast
                essay on a topic based on students' life, experience and background.
Q3              A minimum of three paragraphs (350 – 450 words) discussing the
                advantages and disadvantages of a topic based on students' life,
                experience and background.
Q4              A minimum of three paragraphs (350 – 450 words) expressing an opinion on
                a topic based on students' life, experience and background.


        Guidelines for constructing writing prompts
Before designing writing prompts, have a look at each level's writing syllabus. Make a
list of the various writing skills practiced and taught over the course of the semester.
Note whether some of the skills are practiced more than once.
As you design writing prompts, it will help to consider the following issues:
        What is the discourse aim?
        What is the cognitive complexity?



       What is the topic?
       Who is the audience?


1. Discourse Aim
To simplify, we may refer to this as the PURPOSE: for example, persuasion,
problem-solving, comparison, analysis of an issue, or description.
Writing for these different purposes requires different strategies and content;
however, each general category of purpose permits a variety of types of writing
within it. For example, persuasive writing may involve defending
an argument or position, or exploring a problem and its solution.


2. Writing Prompt Components
       Issue
       Descriptive Set-up
       Knowledge Base
       Writer's Intent and Writing Task
       Audience
       Form
Issue
The issue is the “problem” and should be complex enough to allow for the
expression of diverse viewpoints. It should be relevant to the experiences, interests
and/or capacity of our students. Topics are drawn from the thematic content or
schema as it relates to the students' context and experience.
Descriptive Set-up
   •    Write a sentence or two that gives the current situation of the issue within a
        framework that engages the writer.
   •    Describe the situation in a realistic context for examining the issue, and
        present enough information to familiarize any potentially uninformed writer
        with the nature of the issue. The context may be historical, current, or
        hypothetical.
   •    Key terms that might be unfamiliar are to be defined, paraphrased, or
        illustrated with examples.




Knowledge Base
Writers should be able to produce a complete and competent response using their
experience and/or knowledge gained through their academic work during the
semester. However, the writing prompts must not repeat any essay topics that have
been covered in class.
Writing Task
Cues in the wording of the prompt should make it clear whether the writer is to:
• examine different sides of a controversy
• choose a position and provide support for that position
• analyze a problem and its solution(s).
The organization or structure of the response should be appropriate to the task.
Audience
Audience should be specified. The audience may range from the familiar (fellow
students or family members) to the distant (legislators, school board members,
newspaper subscribers).


3. Understanding Writing Prompts
The following describes the steps students should follow to understand writing
prompts.
      Read the prompt silently.
      Ask what the purpose of the prompt is: persuasion, problem solving, analysis
       of an issue, comparison, or description? How can you tell?
      Who is the audience you are writing to in the prompt?
      In one sentence, explain what your paper is going to be about. (What is the
       issue?)
      What is your task in this paper?


4. Test yourself – examine the following writing prompts and complete the
tasks below
Example 1 -- Read the following writing prompt:

The A/C in your classroom is not working. Write an email to your university
administrator to explain and resolve the problem.
Write a paragraph/essay of x number of words (depending on level)
Please mark P for purpose, A for audience, I for issue, T for task


Example 2 -- Read the following writing prompt:

You are a student at Qatar University. Write your American pen pal a letter
about the advantages and disadvantages of studying in Qatar.
Write a paragraph/essay of x number of words (depending on level)
Please mark P for purpose, A for audience, I for issue, T for task

As you construct writing prompts, keep these issues in mind and remember to
include the following:
        Issue
        Descriptive Set-up
        Knowledge Base
        Writer's Intent and Writing Task
        Audience


5.       Do not include idiomatic or unfamiliar vocabulary in the writing prompts.
The vocabulary used in the writing prompt should be relatively simple and
straightforward. In other words, the student should not have difficulty understanding the
prompt.




       VIII. EXAM FORMATTING
Exam formatting templates will be provided for test writers to follow.
{Please note that writers will be required to format section 2 (in the form of a table) in
the listening exam}
The following formatting rules describe the template that test writers will work with.
This format cannot be altered as the template is password-protected. Please note
that only the testing coordinator and exam writing team leaders have the password.
      Font: Arial
      Font size for cover pages: Heading 1 in Word
      Font size for instructions: 11 (Important instructions/words may be in bold)
      Font size for questions: 10.5
      Examples should be indented and spaced separately from instructions and
       questions. Examples should not be numbered – only questions. Use the word
       “Example:” to indicate its purpose.
      All pages (except cover page) should be numbered with the number in the
       middle of the bottom of the page.
      Margins: Top: 0.5 cm, Bottom: 0.5 cm, Right: 0.5 cm, Left: 2.0 cm
      Line spacing: 1.5
      Reading: Paragraph and line numbers should be indicated for the reading
       passage.
      Please do not use the space bar to format your typing; use the tab key
       instead.
      Please use the formatting functions provided by Word to number your
       questions, and for all other formatting requirements. Please do not use Arabic
       Word settings when typing – it makes changing the format extremely time-consuming.
      Most important: please ask your supervisor, the testing coordinator or any
       other qualified person if you are not sure how to format something – we are
       happy to show you, and it saves time later in the process.




       IX. EXAM PACING PROCEDURES
Since the process of exam writing, reviewing, approving, copying, etc. is of a
delicate nature, and since time is an important factor, the Testing and Assessment
Coordinator always develops an Exam Pacing document. This document is
distributed to all test leaders, writers, reviewers and Level Supervisors well
before the exams are developed. The following is a sample Pacing document:

   Test Writing & Reviewing Procedure: Final Exam                          Fall 2007
   (COB = Close Of Business = 2 pm)

 WEEK 8 (Nov. 18)
      Writers prepare ToS's for the final exam and submit completed ToS's for
       approval to Team Leaders by COB Thursday Nov. 22nd.

 WEEK 9 (Nov. 25)
      Level Supervisors check ToS's and approve or suggest changes by 9 am
       Wednesday Nov. 28.
      Writers receive the relevant Fall 2007 Final exams from Test Leaders.

 WEEK 10 (Dec. 2)
      Team Leaders forward approved/modified ToS's to writers and give the
       piloted tests to the writers.
      Writers write / rewrite / modify questions and submit the first draft of
       modified items to Test Leaders by COB Thursday Dec. 6.

 WEEK 11 (Dec. 9)
      Team Leaders give the first drafts to Reviewers and Level Supervisors:
       Sunday Dec. 9, a.m.
      Level Supervisors review (new questions only) and suggest changes.
       {Please leave the color coding on the document.}

 WEEK 12: Eid Break, Dec. 18 – Dec. 24

 WEEKS 12/13 (Dec. 16/17/26/27)
      All reviewed exams must be returned to Team Leaders by COB on Wed.
       Dec. 12th.
      Team Leaders & Reviewers meet with writers and give feedback: Thu.
       Dec. 13th.
      Writers rewrite and submit second drafts to Team Leaders by 9 am on
       Wednesday Dec. 26th.
      Testing Coordinator plans recording: Dec. 26/27.
      Level Supervisors check recordings for appropriacy of speed & sound
       quality.
      Reviewers receive the second drafts by COB Thursday Dec. 27th.

 WEEK 14 (Dec. 30)
      Team Leaders receive exams from Reviewers by 9 am Dec. 31st.
      Team Leaders make final changes to exams and submit all to the Testing
       Coordinator by 9 am Thurs. Jan. 3rd.
      Testing Coordinator approves final drafts and organizes photocopying &
       boxing.
                     APPENDIX A – TEST CONSTRUCTION
Test construction is the entire process of creating and using a test. Test development is
organized into the following three stages:
              Planning
In the planning stage, we describe in detail the components of the test design that will
enable us to ensure that performance on the test tasks will correspond as closely as
possible to language use, and that the test scores will be maximally useful for their
intended use. In addition, a set of TEST SPECIFICATIONS is written. These include
information on content, function, format and timing, criterial levels of performance,
and scoring procedures.
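By way of illustration, a test specification can be recorded as a simple structured object that a reviewer can check for completeness. The field names and values below are a hypothetical sketch, not an official Foundation Program format:

```python
# Hypothetical sketch of a test specification record.
# Field names and values are illustrative only, not a prescribed format.
grammar_spec = {
    "skill": "grammar",
    "content": ["simple past", "comparatives", "modals of advice"],
    "format": "4-option multiple choice",
    "number_of_items": 30,
    "timing_minutes": 40,
    "criterial_level": "60% of items correct to pass",
    "scoring": "1 point per item, no penalty for wrong answers",
}

# A reviewer can check at a glance that every required field is present.
required = {"content", "format", "number_of_items", "timing_minutes",
            "criterial_level", "scoring"}
missing = required - grammar_spec.keys()
print("Missing fields:", sorted(missing))  # an empty list means the spec is complete
```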
              Writing
The second stage in test development is the actual writing of test items. Although
writing items is very time-consuming, writing good items is an art. Essentially, those
who write the test items should possess four characteristics. They should a) be
experienced in test construction; b) be thoroughly knowledgeable about the content
areas of the test; c) be able to use language clearly and economically; and d) be
willing to devote the necessary time and energy. This stage of test development
includes the following components:
a) Sampling: This is an activity by which the test developer chooses widely from the
   whole area of the course content. One should not concentrate on those elements
   known to be easy to test. Rather, the content of the test should be a
   representative sample of the course material.
b) Item writing: This activity involves the actual writing of the test items. Typically,
   principles can be suggested that apply to almost all kinds of test items; yet, no set
   of rules can be given to ensure the production of good items. However, the
   following are general directions for writing test items of various forms:
    • Write more items for each point than you really need so that if some of the items are
      later found to be defective, there will still be enough items for the refined test.
      It is recommended that the refined test contain two or three items on each point.
    • Ask questions that experts could agree have only one correct response.
    • Write items that ask more specific rather than very general questions.
    • State each item as clearly and accurately as possible.
    • Avoid complex and awkward word arrangements and non-functional words.


    • Make the items and INSTRUCTIONS explicit.
    • Avoid trick and catch questions.
    • Avoid long and involved statements that contain many qualifying phrases.
    • Avoid using sentences directly taken from texts or other sources.
    • Use novel materials in formulating tasks.
    • Avoid giving the testee a choice of tasks to fulfill.
    • Arrange items in a way that progresses from easy to difficult.
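For example, where pilot data are available, each item's facility value (the proportion of pilot testees who answered it correctly) gives a mechanical way to order items from easy to difficult. The sketch below is illustrative only; the item names and facility values are invented:

```python
# Hypothetical facility values from piloting: the proportion of testees
# who answered each item correctly (higher = easier item).
facility = {"item_A": 0.45, "item_B": 0.90, "item_C": 0.70, "item_D": 0.60}

# Arrange items from easiest (highest facility) to most difficult (lowest).
ordered = sorted(facility, key=facility.get, reverse=True)
print(ordered)  # ['item_B', 'item_C', 'item_D', 'item_A']
```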
               Reviewing
In this stage of test construction, the written items are reviewed with respect
to the accuracy and appropriateness of their content. This can be done either by
the test constructor him- or herself or by someone else. During the review stage,
problems unnoticed by the test developer will most likely be spotted by the
reviewers, who then suggest modifications to resolve them. This stage includes
the following components:
a.       MODERATION of the items: This activity refers to submitting the test to a
colleague or, preferably, a number of colleagues to be scrutinized in terms of its
content, clarity of instructions, etc.
b.       MODERATION of the scoring key: Once the items have been agreed upon,
the next task is to write a scoring key. Where there is intended to be only one
correct response, this is a perfectly straightforward matter. Where there are
alternative acceptable responses, which may be awarded different scores, or where
partial credit may be given for incomplete responses, greater care is necessary.
Once again the criticism of colleagues should be sought as a matter of course.
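As an illustration of such a key, the sketch below encodes alternative acceptable responses and partial credit as weights; the items and answers are invented, not taken from an actual exam:

```python
# Hypothetical scoring key: each item maps acceptable responses to the
# credit they earn. Any response not listed scores zero.
scoring_key = {
    "Q1": {"went": 1.0},                       # single correct answer
    "Q2": {"bigger": 1.0, "larger": 1.0},      # alternative acceptable answers
    "Q3": {"has been living": 1.0,
           "has lived": 1.0,
           "is living": 0.5},                  # partial credit for an incomplete form
}

def score(item_id, response):
    """Return the credit for a response; 0 for anything not in the key."""
    return scoring_key[item_id].get(response.strip().lower(), 0.0)

total = score("Q1", "went") + score("Q2", "larger") + score("Q3", "is living")
print(total)  # 2.5
```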
c.       PREPARATION of instructions: After the test items have been compiled,
detailed and crystal-clear directions should be written. The directions should clearly
tell the testees what they are expected to do, and should also inform them whether
or not they will be penalized for wrong answers. Consider giving sample items in the
directions. Time limits must also be plainly stated.




              APPENDIX B – ATTRIBUTES OF A GOOD TEST
For a test to yield dependable results, it must possess certain attributes. Among
the desired attributes, three are essential: validity, reliability, and practicality. Validity
indicates the extent to which the test measures what we actually wish to measure.
Reliability shows how accurately and precisely the test measures what it is intended
to measure. Practicality is concerned with the feasibility of the test in terms of
economy, convenience, and interpretability of the results.
               I.     Validity
Validity is the single most important attribute of a good test. A test is useful if it is
appropriate in terms of the objectives for which the test is utilized. That is, a test is
good or considered valid, if it measures what we want it to measure but nothing else.
It follows that any given test then may be valid for some purposes, but not for others.
The matter of concern in testing is to ensure that any test employed is valid for the
purpose for which it is administered. In other words, do the items in the test
measure the course objectives? Does the test reflect what was taught in class? Are
you measuring knowledge of concepts, or minor pieces of trivia? Validity tells us
what can be inferred from test scores. Validity is a quality of
test interpretation and use. If test scores are affected by abilities other than the one
we want to measure, they will not be meaningful indicators of that particular ability.
If, for example, we ask students to listen to a lecture and then to write a short essay
based on that lecture, the essays they write will be affected by both their writing
ability and their ability to comprehend the lecture. Ratings of their essays, therefore,
might not be valid measures of their writing ability. In test validation we are not
examining the validity of the test content or even of the test scores themselves, but
rather the validity of the way we interpret or use the information gathered through the
testing procedure. To refer to a test or test score as valid, without reference to the
specific ability or abilities the test is designed to measure and the uses for which the
test is intended, is therefore more than a terminological inaccuracy.
               II.    Reliability
Reliability is a quality of test scores which refers to the consistency of measures
across different times, test forms, raters, and other characteristics of the
measurement context. Synonyms for reliability are: dependability, stability,
consistency, predictability, accuracy. A reliable man, for instance, is a man whose



behavior is consistent, dependable, and predictable: what he will do tomorrow
and next week will be consistent with what he does today and what he did last
week. We say he is stable. An unreliable man, on the other hand, is one whose
behavior is much more variable. He lacks stability. We say he is inconsistent. In
language testing, for example, if a student receives a low score on a test one day
and a high score on the same test two days later (the test does not yield consistent
results), the scores cannot be considered reliable indicators of the individual's ability.
Or, if two raters give widely different ratings to the same sample, we say that the
ratings are not reliable.
               III.   Practicality
Practicality is concerned with the practical characteristics of a test such as costs, the
amount of time it takes to construct and to administer, ease of scoring, and ease of
interpreting/reporting the results. Practicality pertains primarily to the way in which
the test will be implemented, and, to a large degree, whether it will be developed and
used at all. That is, for any given situation, if the resources required for implementing
the test exceed the resources available, the test will be impractical, and will not be
used unless resources can be allocated more efficiently, or unless additional
resources can be allocated.




                            RECOMMENDED READING
For a comprehensive bibliography and further reading on Language Testing, please
visit the following websites:




http://eric.ed.gov/ERICWebPortal/custom/portlets/recordDetails/detailmini.jsp?_nfpb=true&_&ERICExtSearch_SearchValue_0=ED189868&ERICExtSearch_SearchType_0=eric_accno&accno=ED189868


http://links.jstor.org/sici?sici=0039-8322(197303)7%3A1%3C86%3AED%3E2.0.CO%3B2-M





				