ILLINOIS STATE BOARD OF EDUCATION:
DIVISION OF ASSESSMENT
PERFORMANCE LEVELS FOR THE ILLINOIS
STANDARDS ACHIEVEMENT TESTS:
Report Date: August 2, 1999
MetriTech, Inc.
4106 Fieldstone Road
Champaign, Illinois 61822
(217) 398-4868
CONTENTS
EXECUTIVE SUMMARY
STANDARD-SETTING PROCEDURES
  Participants
  Group Leaders
  The Mathematics and Reading Process
  The Writing Process
RESULTS
  Reliability of the Ratings
  Cut Scores
  Participant Evaluations
EXECUTIVE SUMMARY
This report summarizes the approach that was used to establish performance
categories for the ISAT mathematics, writing, and reading tests at each of the
four grade levels at which ISAT is administered. The development of cutoffs in
science and social sciences will be completed in 2000 following the first
administration of these new tests.
The development of useful and defensible standards and the execution of the
standard-setting process itself are complex tasks. The State Board of Education
relied on the contributions of many talented educators throughout Illinois to
successfully accomplish the standard setting process.
Prior to the meetings of the standard-setting panels themselves, ISBE convened
committees of curriculum experts to develop descriptions of student knowledge
and skill levels that define the four performance categories: Academic Warning,
Below Standards, Meets Standards, and Exceeds Standards. Educators
throughout Illinois extensively reviewed these descriptions or definitions before
sending them to the standard-setting panelists.
Panels of recognized subject matter experts subsequently convened in
Springfield to translate the verbal definitions into cut scores on the ISAT tests
(i.e., scores that define the boundaries between categories). Panelists were
drawn from a pool of educators who had specific knowledge of student
performance at the grade levels being assessed by ISAT and experience in
assessing students at those grade levels. Panelists were selected to be broadly
representative of the geographic and ethnic diversity of Illinois’ public school
system. A total of 170 educators participated in the standard-setting process.
The distribution across learning areas was as follows: mathematics—56;
writing—62; reading—52.
A procedure originally proposed by Angoff is one of the most frequently used
methods for determining cut scores when multiple-choice test scores are used.
It can be most simply described as a focused, judgmental process by
knowledgeable content experts. The basic Angoff procedure fit the format of the
ISAT reading and mathematics tests. However, certain modifications of the
basic procedure were developed to fit the format of the ISAT writing tests.
In the most frequent application of the Angoff method (e.g., to establish a pass-
fail standard), panelists are asked to examine an item and decide what
proportion of minimally competent individuals will answer the question
correctly. With respect to the ISAT, however, instead of being asked about
minimally competent students, panelists were asked to indicate what percentage
of three groups of students—those who were just above the Academic
Warning/Below Standards boundary, those who were just above the Below
Standards/Meets Standards boundary, and those who were just above the
Meets Standards/Exceeds Standards boundary—would answer the question
correctly. The ratings were made sequentially rather than simultaneously (i.e.,
panelists judged the proportion of correct responses by a criterion group to
every item before moving on to the next criterion group). Item performance
statistics were provided to help panelists anchor their ratings.
When the cut scores that emerged from this panel’s work were applied to score
distributions based on the 1999 ISAT census administrations, the percentages
of Illinois public school students in each mathematics category were as shown
in the chart below.
As this chart shows, the percentage of students not meeting standards in
mathematics averages approximately 46% across grades, with the largest
concentration of such students occurring at 8th grade. The percentage of
students who exceed standards is largest at 3rd grade (21%) but averages only
about 5% across the other three grades.
The cut scores that emerged from the writing panel’s work lead to the
distributions shown in the next chart. The percentage of students not meeting
standards in writing is greatest at 3rd grade and averages about 36% across all
grades. The percentage of students in the Meets Standards category is very
consistent across grades (50%-56%). In contrast, the Exceeds Standards
category varies quite widely, with only 3% at 8th grade and 23% at 5th grade.
The reading cut scores lead to the distributions shown in the third chart. The
percentage of students not meeting standards is greatest at 3rd and 5th grade
and somewhat lower at the higher two grades. About 34% of students do not
meet standards for reading across grade levels. The percentage of students
falling in the exceeds category is higher for reading than for either of the other
two areas.
[Chart: Writing. Percentage of students in each performance category (Academic Warning, Below, Meet, Exceed) at grades 3, 5, 8, and 10.]
[Chart: Reading. Percentage of students in each performance category (Academic Warning, Below, Meet, Exceed) at grades 3, 5, 8, and 10.]
A number of checks were made on the adequacy of the ratings. Agreement
among the panelists was excellent at all four grade levels. Evaluation forms
completed at the end of the session indicated that the overall level of panelist
confidence in the ratings was extremely high.
These cutoffs represent a set of fixed benchmarks against which schools can
measure the success of their improvement efforts. The percentages of students
who fall into each category may shift each year in response to changes in the
student populations tested. However, as school improvement efforts are
effectively implemented throughout the state, the expectation is that these
percentages will systematically spiral upward.
1. STANDARD-SETTING PROCEDURES
Technical procedures by which standards are translated into test scores have
been available for many years. They are most often applied in the case of tests
used for certification or licensure when a cutoff must be defined that separates
qualified from unqualified test takers. A procedure originally described by Angoff
is most frequently used in such cases. This procedure can be most simply
described as a focused, judgmental process by knowledgeable content experts.
A modification of the basic Angoff procedure was used to establish cutoffs for
the ISAT tests. In the most frequent application of this method (e.g., to
establish a pass-fail standard), panelists are asked to decide what proportion of
minimally competent individuals will answer each question correctly. With
respect to the ISAT, however, instead of being asked about minimally competent
students, panelists were asked to indicate what percentage of three groups
of students—those who were just above the Academic Warning/Below
Standards boundary, those who were just above the Below Standards/Meets
Standards boundary, and those who were just above the Meets
Standards/Exceeds Standards boundary—would answer the question
correctly. The ratings were obtained sequentially. Panelists first made the
Below Standards/Meets Standards cut for every item, then the Academic
Warning/Below Standards cut, and, finally, the Meets Standards/Exceeds
Standards¹ cut. Item statistics were provided to help panelists anchor their
ratings.
Participants
Panels of recognized subject matter experts convened for a two-day period
during the last two weeks of April 1999 in Springfield. Panelists were drawn
from a pool of educators who had specific knowledge of students’ capabilities at
the grade levels being assessed by ISAT and experience in assessing students
at those grade levels. A single panel was used at each grade level.
The educators who served as panelists were drawn from schools and districts
throughout the state. They were recruited by staff from ISBE’s Standards
Division. The vast majority of panelists were classroom teachers. Participants
were assigned to each panel on the basis of their background and experience.
That is, third-grade teachers worked with the third-grade test, fifth-grade
teachers with the fifth-grade test, etc. Panelists were selected to be broadly
representative of the geographic and ethnic diversity of Illinois’ public school
system. The distribution of panelists across grades and learning areas is shown
in Table 1. A total of 170 educators established standards for the ISAT reading,
writing, and mathematics tests.
¹ For simplicity, these cutoffs will be referred to subsequently as the “Meets” (Below
Standards/Meets Standards), “Below” (Academic Warning/Below Standards), and “Exceeds”
(Meets Standards/Exceeds Standards) cutoffs.
Table 1
Distribution of Panelists Across Grades and Learning Areas
Grade   Mathematics   Writing   Reading
3       13            14        14
5       15            13        16
8       15            16        12
10      13            19        10
Prior to the meeting, panelists were sent background material to prepare them
for the standard setting process. The material consisted of a description of the
standard-setting process and the definitions that had been developed.
Group Leaders
The overall operation of the standard-setting panels was under the direction of
MetriTech professional staff. In addition, four individuals who had extensive
experience with the ISAT assessments and the content they covered were
selected to serve as table facilitators or group leaders. They participated in a
three-hour pre-session designed to familiarize them with the procedures to be
used. In addition, staff from ISBE’s Standards Division were available
throughout the work days.
The Mathematics and Reading Process
The work of each panel required two full days to complete. The program began
with a two-hour, large-group presentation that oriented participants to the task
and explained the procedures that would be followed. The rating procedure was
described in detail and time was provided to answer all questions.
The panelists then broke up into grade-level groups. Each group was
moderated by facilitators (group leaders) who were not themselves involved in
making ratings. They began by leading panelists in an extended discussion of
the category to be rated. The focus was on the performance definitions and the
knowledge and skills that defined the borderline student.
When panelists had a clear picture of the type of student to be rated in mind,
group leaders proceeded to the ratings themselves. They began with sample
items that were used for training purposes only. In this way, test items to be
rated were not viewed by the panelists prior to the beginning of the rating
process itself.
Item analysis data was made available to help panelists understand the
difficulty level of individual items and anchor their judgments. Random
samples of approximately 16,000 students at each grade who had taken the
tests in February 1999 were used for this purpose. Students were divided into
five ability levels (quintiles) based on their performance on the overall test.
That is, students whose overall scores placed them in the lowest fifth in terms
of the score distribution (1-20%) represented the first group of students,
students whose scores placed them in the next lowest fifth (21-40%)
represented the second group of students, and so forth. For each group,
panelists were shown the percentages of students within that group who had
gotten the item correct.
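The quintile statistics described above can be illustrated with a short Python sketch. It is not the procedure actually used to prepare the panelists' data: the function name `quintile_percent_correct`, the toy data, and the handling of tied scores (ties keep their input order) are all assumptions made for illustration.

```python
from typing import List

def quintile_percent_correct(total_scores: List[int],
                             item_correct: List[int]) -> List[float]:
    """Percent answering one item correctly within each total-score quintile.

    total_scores[i] is student i's overall test score; item_correct[i] is 1
    if student i answered the item correctly, else 0. The first value
    returned is for the lowest fifth. Needs at least five students.
    """
    n = len(total_scores)
    # Rank students from lowest to highest overall score (stable for ties).
    order = sorted(range(n), key=lambda i: total_scores[i])
    percents = []
    for q in range(5):
        # Slice the ranked list into five nearly equal groups.
        group = order[q * n // 5:(q + 1) * n // 5]
        correct = sum(item_correct[i] for i in group)
        percents.append(100.0 * correct / len(group))
    return percents

# Hypothetical data: ten students' overall scores and 0/1 results on one item.
scores = [12, 15, 20, 22, 27, 30, 33, 38, 41, 45]
item = [0, 0, 0, 1, 1, 0, 1, 1, 1, 1]
print(quintile_percent_correct(scores, item))
```

An item whose percent correct rises steadily from the lowest to the highest quintile gives panelists a concrete picture of how difficult the item is for students at different ability levels.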
For each item, panelists examined the item and the associated data, then made
an initial rating. When all panelists had completed their initial rating, they
reported their ratings to the group. If the ratings spanned a range of more than
20 percentage points, the group leader led a discussion of the ratings. The
purpose of this discussion was not to force consensus but rather to allow the
panelists to discuss the reasons for their ratings. This often resulted in one or
more persons becoming aware of some facet of the item that they had not
originally considered in their ratings. When the discussion was completed,
panelists were asked to make a final rating on the item. Final ratings were not
announced to the rest of the group.
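The rule that triggered a discussion can be written as a one-line check. The sketch below assumes that the 20% threshold refers to the spread between the highest and lowest initial rating, measured in percentage points; the function name `needs_discussion` and the sample ratings are hypothetical.

```python
def needs_discussion(ratings, max_range=20.0):
    """Return True when the spread of initial ratings exceeds max_range points."""
    return max(ratings) - min(ratings) > max_range

# Hypothetical initial ratings (estimated percent correct) from four panelists.
initial = [35, 40, 45, 60]
print(needs_discussion(initial))
```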
After panelists completed their ratings of the sample items, group leaders
passed out the actual 1999 ISAT test questions and rating forms. Initially,
groups worked with one item at a time. As group members felt more
comfortable with the process, they began rating several items prior to discussing
them. All groups first completed ratings for the Meets cutoff, followed
by ratings for the Below and then the Exceeds cutoffs.
The process proceeded cautiously and carefully. At the end of the first day of
work, each group had completed ratings related only to the Meets cutoff.
The same procedure was followed on the second day to make the Below and
Exceeds cutoffs. Groups began with a discussion of the level to be rated,
worked with practice items, and then made their ratings of the actual test
questions.² Both sets of ratings were completed by the end of the second day.
When the ratings were finished, all panelists were asked to complete an
evaluation form that was used to obtain their reactions to the procedure.
Panelists also offered suggestions for enhancing the existing definitions and
provided recommendations regarding specific ISAT items.
² In some rating applications, panelists are asked to review an item and make multiple ratings
simultaneously. On the surface this approach appears to be more efficient. However, the difficulty
panelists have in developing and maintaining a clear image in their minds of the kinds of students they
are rating and in shifting between reference groups suggests that this approach is likely to produce less
reliable ratings.
The Writing Process
Because the ISAT writing test does not involve a set of multiple-choice items,
modifications to the previously described basic procedure were required to
establish cutoffs for the ISAT writing test. Three different experimental
procedures were developed for this purpose.
Rating Procedure A. In this approach, the panel of raters focused directly on the
analytic scoring scale that is used to score ISAT essays. Four steps led to the
development of the Procedure A ratings.
• Group leaders conducted a group discussion of each feature to be
rated and the interpretation of each score point.
• Panelists made a preliminary rating for the feature. This is a number
between 1.0 and the upper limit of the scale (6.0, except for
Conventions = 2.0). The rating is intended to estimate the score that
would be obtained by students who are just good enough to have met
the standard.
• Group leaders conducted a group discussion of these ratings. The
purpose of this and all discussions of ratings was to allow panelists to
clarify their ratings and share with others the basis for their
judgment.
• Panelists made a second rating following the discussion. These ratings
were not discussed further.
As was done with reading and mathematics, panelists were provided with data
on student performance to help them anchor their judgments. Students who took
the 1999 ISAT test were divided into five ability groups (quintiles) based on
the overall writing score. That is, students whose scores placed them in the
lowest fifth in terms of the score distribution (1-20%) represented the first
group of students, students whose scores placed them in the
next lowest fifth (21-40%) represented the second group of students, and so
forth. For each group, panelists were shown the average score of students
within that group on each writing feature. Results of Procedure A were based
on the second or final set of ratings.
Rating Procedure B. In this approach, the panel of raters worked again with the
analytic scoring scale but were asked to provide a different set of judgments.
This time the panelists were shown the percentage of students in each ability
quintile who scored at each point on the scale and asked to estimate a similar
set of percentages for students whose performance was just good enough for
them to have met the standard. After the first set of ratings was complete,
panelists discussed them and then proceeded to a final rating. There was no
restriction placed on the percentages assigned to any score point, except that
the total of the ratings was required to sum to 100. Results from Rating
Procedure B were based on the second set of ratings.
Rating Procedure C. The third approach involved direct categorization of
student writing. A sample of 40 essays was selected from those that had been
previously scored by the Writing Validation Committee. Scores for these essays
represent consensus scores by the committee and are used to train the scorers.
Panelists were asked to classify each essay based on the level of student
performance each essay represented. Twelve categories were defined, three for
each of the standard levels. For example, category three (“1+”) represented
writing that was above average for Academic Warning students but not good
enough to be considered Below Standards. Category four (“2-”) represented
writing that was below average for Below Standards students, but good enough
to be above Academic Warning. The writing samples presented to each panel
represented all three writing genres assessed by ISAT (persuasive, expository,
narrative). Panelists were not shown scores along with the essays.
After all ratings were completed, the average score assigned by the Validation
Committee for each feature was obtained for each category across judges. The
average of Category 3 (“1+”) and Category 4 (“2-”) was used to estimate the
Academic Warning/Below Standards cutoff. The average of Category 6 (“2+”)
and Category 7 (“3-”) was used to estimate the Below Standards/Meets
Standards cutoff. The average of Category 9 (“3+”) and Category 10 (“4-”) was
used to estimate the Meets Standards/Exceeds Standards cutoff.
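The Procedure C calculation can be sketched in Python as follows. This is an illustrative sketch only: it treats each classification as a (category, committee score) pair for a single feature, and the function names and sample data are hypothetical rather than taken from the actual analysis.

```python
from collections import defaultdict

def category_means(classifications):
    """Mean committee score of essays placed in each category, pooled across judges.

    classifications is a list of (category, committee_score) pairs, one pair
    for every judge-by-essay classification.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for category, score in classifications:
        totals[category] += score
        counts[category] += 1
    return {c: totals[c] / counts[c] for c in totals}

def cutoff(means, lower_category, upper_category):
    """Midpoint between the mean scores of two adjacent boundary categories."""
    return (means[lower_category] + means[upper_category]) / 2.0

# Hypothetical classifications: (category, committee score on one feature).
data = [(3, 2.0), (3, 3.0), (4, 3.0), (4, 4.0), (6, 4.0), (7, 5.0)]
means = category_means(data)
print(cutoff(means, 3, 4))  # Academic Warning/Below Standards boundary
print(cutoff(means, 6, 7))  # Below Standards/Meets Standards boundary
```

The same two functions would be applied once per writing feature, with categories 9 and 10 supplying the Meets Standards/Exceeds Standards boundary.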
RESULTS³
Reliability of the Ratings
A number of checks were made on the ratings. First, the ratings at each grade
level were analyzed to obtain the variance component estimates necessary to
calculate generalizability coefficients. These coefficients, technically intraclass
correlations, represent the degree of agreement among panelists in their
ratings. Two types of coefficients⁴ are reported in Table 2. The first coefficient
(“interrater”) represents the average level of correlation between the ratings of
any two panelists. The second coefficient (“intergroup”) represents the level of
correlation to be expected between the average ratings of one group of panelists
and a second, similarly sized group of panelists.
On average, the interrater coefficient is .9447 across the three areas and the
intergroup coefficient is .9953. There is no systematic difference in the
reliability of the ratings across learning areas despite the different approaches
used for reading and mathematics, on the one hand, and writing, on the other.
In terms of statistical indexes that may be more familiar, consider the following
illustration. If the ratings were item scores and the items were persons (i.e., the
kind of score matrix that is usually used for analyzing the reliability of a test),
then the first intraclass correlation (interrater) would be interpreted as the
average inter-item correlation, and the second intraclass correlation
(intergroup) would be interpreted as coefficient alpha (α). Thus, the reliability of
these ratings exceeds that of the best individual achievement tests.
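The two intraclass correlation formulas given in the footnote to this section translate directly into code. The sketch below is illustrative only: the mean squares and the number of raters are hypothetical values, not figures from the ISAT analysis, and the function names are assumptions.

```python
def interrater_icc(ms_r, ms_pxr, r):
    """Intraclass correlation among r ratings (the "interrater" coefficient)."""
    return (ms_r - ms_pxr) / (ms_r + (r - 1) * ms_pxr)

def intergroup_icc(ms_r, ms_pxr):
    """Intraclass correlation of an average of r ratings (the "intergroup" coefficient)."""
    return (ms_r - ms_pxr) / ms_r

# Hypothetical mean squares and a hypothetical panel of 13 raters.
ms_r, ms_pxr, r = 50.0, 1.0, 13
single = interrater_icc(ms_r, ms_pxr, r)
average = intergroup_icc(ms_r, ms_pxr)
print(round(single, 4), round(average, 4))

# The second coefficient is the Spearman-Brown step-up of the first for r raters.
stepped_up = r * single / (1 + (r - 1) * single)
print(abs(stepped_up - average) < 1e-12)
```

The final check shows why the intergroup values are always at least as large as the interrater values: averaging over r raters raises reliability in the same way that lengthening a test does.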
As a check on the impact of group discussion on ratings, means and standard
deviations for the average preliminary item ratings and average final item
ratings were calculated. These are presented in Table 3. The table shows
separate values for writing Process A and Process B. Only a single set of ratings
was collected for writing Process C, which involved classification of papers. As
this table shows, mean scores are remarkably consistent from initial to final
rating period, and any differences that do occur are not in any consistent
direction. Standard deviations tend to decrease rather consistently between
preliminary and final ratings. One interpretation of these data is that the
discussion resulted in a tempering of extreme ratings, both high and low,
toward the group mean. Group discussions did not, however, systematically
raise or lower the final ratings.
³ After analysis of the mathematics panels’ rating data, a recommendation was made to ISBE
to convene two additional panels at grades 5 and 8 as a check on the results provided by the
first set of panelists, whose cut scores deviated more, relative to the ability of the student
population, than those produced by the other mathematics groups and by the reading and
writing groups. Following an analysis of the second panels’ data, the
agency requested an adjustment to the final cut scores for mathematics that brought the four
Meets cutoffs evenly in line with respect to percent correct scores. These adjustments affected
the 5th- and 10th-grade Meets cutoffs and left the other two grades at their original levels. The
results shown in this section for mathematics relate to the final cutoffs after adjustment.
⁴ The formulas used to calculate the two coefficients are as follows:
Intraclass correlation among r ratings (interrater) = (MSr - MSpxr)/(MSr + (r-1) MSpxr)
Intraclass correlation of an average of r ratings (intergroup) = (MSr - MSpxr)/MSr
where MSr is the mean square between ratings and MSpxr is the mean square for persons by ratings.
Table 2
Generalizability Coefficients for Ratings
             READING                  MATHEMATICS              WRITING
             Interrater  Intergroup   Interrater  Intergroup   Interrater  Intergroup
Grade 3
  Below      .96         .99          .98         .99          .74         .97
  Meets      .97         .99          .98         .99          .95         .99
  Exceeds    .98         .99          .99         .99          .99         .99
Grade 5
  Below      .95         .99          .94         .99          .90         .99
  Meets      .95         .99          .93         .99          .96         .99
  Exceeds    .95         .99          .83         .99          .99         .99
Grade 8
  Below      .92         .99          .96         .99          .91         .99
  Meets      .89         .98          .93         .99          .96         .99
  Exceeds    .94         .99          .92         .99          .98         .99
Grade 10
  Below      .95         .99          .97         .99          .86         .99
  Meets      .94         .99          .98         .99          .90         .99
  Exceeds    .96         .99          .96         .99          .98         .99
Table 3
Comparison of Initial and Final Ratings
Initial Rating Final Rating
READING
Below Cutoff Mean SD Mean SD
Grade 3 15.90 1.84 15.28 0.76
Grade 5 14.36 1.65 13.96 1.07
Grade 8 15.76 1.32 14.92 1.12
Grade 10 23.68 0.87 23.36 0.69
Meets Cutoff
Grade 3 29.93 0.51 29.52 0.35
Grade 5 37.53 0.97 36.94 0.89
Grade 8 36.74 1.57 36.36 1.18
Grade 10 38.80 1.30 38.38 1.17
Exceeds Cutoff
Grade 3 40.28 0.32 40.20 0.26
Grade 5 46.14 0.63 46.20 0.55
Grade 8 48.85 0.60 48.93 0.44
Grade 10 48.74 0.31 48.70 0.32
MATHEMATICS
Below Cutoff Mean SD Mean SD
Grade 3 20.67 1.06 20.75 0.68
Grade 5 15.94 2.03 16.07 1.92
Grade 8 15.32 1.61 15.08 1.53
Grade 10 14.75 1.27 14.94 0.90
Meets Cutoff
Grade 3 31.09 1.18 30.41 0.77
Grade 5 23.06 2.18 23.58 2.48
Grade 8 31.02 6.56 30.82 6.42
Grade 10 25.85 1.49 25.75 1.24
Exceeds Cutoff
Grade 3 45.23 0.53 45.16 0.50
Grade 5 53.35 2.15 53.44 2.14
Grade 8 49.96 1.95 50.06 1.93
Grade 10 53.85 0.64 53.90 0.53
WRITING (Process A)
Below Cutoff Mean SD Mean SD
Grade 3 13.64 0.86 13.48 0.70
Grade 5 11.08 0.30 11.36 0.34
Grade 8 14.80 0.78 14.37 0.46
Grade 10 12.86 1.73 12.94 1.23
Meets Cutoff
Grade 3 19.11 0.78 19.85 0.69
Grade 5 18.70 1.28 18.64 0.55
Grade 8 19.80 0.73 19.28 0.73
Grade 10 20.38 1.81 19.93 1.19
Exceeds Cutoff
Grade 3 30.51 0.75 30.40 0.51
Grade 5 28.15 0.37 28.20 0.27
Grade 8 26.24 1.40 26.94 1.02
Grade 10 27.88 1.13 27.77 0.95
WRITING (Process B)
Below Cutoff Mean SD Mean SD
Grade 3 14.42 0.70 14.06 0.33
Grade 5 11.87 0.73 11.91 0.63
Grade 8 15.13 0.65 15.22 0.34
Grade 10 15.06 0.77 15.20 0.47
Meets Cutoff
Grade 3 20.96 0.83 21.13 0.49
Grade 5 15.39 0.32 15.56 0.28
Grade 8 20.04 0.78 20.33 0.58
Grade 10 20.01 1.58 20.53 0.88
Exceeds Cutoff
Grade 3 30.47 0.89 30.09 0.64
Grade 5 27.71 0.30 27.32 0.30
Grade 8 27.27 1.01 27.28 0.86
Grade 10 27.14 0.35 27.00 0.15
Checks were also made for potential outliers (i.e., raters who were unusually
high or low relative to the ratings of the group as a whole). A few such ratings
were found and removed from the final calculation of the cutoffs. However,
removal of outliers did not systematically or significantly alter the cutoff scores.
Cut Scores
When panelists’ ratings are averaged and then summed across test items, the
result is a “cut score” on the test that distinguishes two groups of examinees.
For example, if panelists believe that 40% of borderline group students will
answer each of 10 questions correctly, then a raw score of .4 × 10 = 4 is the
minimum score a student must obtain to be judged above the standard on this
hypothetical 10-item test.
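The arithmetic in this example can be expressed as a short Python sketch. It assumes that each panelist's rating is stored as a proportion between 0 and 1 and that the cut score is the sum over items of the mean rating across panelists; the function name `angoff_cut_score` and the three-panelist data are hypothetical.

```python
def angoff_cut_score(ratings):
    """Raw cut score from a panelist-by-item table of Angoff ratings.

    ratings[p][i] is panelist p's estimate of the proportion of borderline
    students who would answer item i correctly.
    """
    n_panelists = len(ratings)
    n_items = len(ratings[0])
    # Average each item's rating across panelists, then sum over items.
    item_means = [sum(row[i] for row in ratings) / n_panelists
                  for i in range(n_items)]
    return sum(item_means)

# The 10-item illustration: three hypothetical panelists all rate every item .4.
ratings = [[0.4] * 10, [0.4] * 10, [0.4] * 10]
print(round(angoff_cut_score(ratings), 2))  # 4.0
```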
In reading and mathematics, none of the panels were able to rate the complete
set of items included in each 1999 ISAT test. Consequently, the cut scores
derived from the rating process required adjustments. First, two calibration
runs were conducted on each test. In the first run, all items in the test were
included. In the second run, only those items rated by the panelists were
included and their difficulties were anchored at the values obtained from the
initial run. This had the effect of equating the full-length and shorter versions
of each test. Then, the proficiency level or theta value corresponding to each
raw score cutoff on the equated short test was identified. Finally, the scaling
constants used to transform each theta value to the ISAT scale were applied to
these values. The result was a scale score that represented the minimum
acceptable scale score for entry into the category. These scale score values are
shown in Table 4.
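The last two steps, from a raw score cutoff to a theta value and from theta to a scale score, can be sketched as follows. This sketch rests on assumptions the report does not state: it assumes a Rasch (one-parameter logistic) model, under which each raw score corresponds to a single theta value, and a linear scale transformation. The item difficulties, the slope, and the intercept are hypothetical and are not the ISAT calibration values.

```python
import math

def expected_raw_score(theta, difficulties):
    """Expected number-correct score at ability theta under the Rasch model."""
    return sum(1.0 / (1.0 + math.exp(-(theta - b))) for b in difficulties)

def theta_for_raw_score(raw, difficulties, lo=-6.0, hi=6.0, tol=1e-6):
    """Find by bisection the theta whose expected raw score equals raw.

    Assumes raw lies strictly between the expected scores at lo and hi.
    """
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if expected_raw_score(mid, difficulties) < raw:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def to_scale_score(theta, slope, intercept):
    """Linear transformation from theta to the reporting scale (assumed form)."""
    return slope * theta + intercept

# Hypothetical anchored difficulties for a five-item "short test".
difficulties = [-1.0, -0.5, 0.0, 0.5, 1.0]
theta = theta_for_raw_score(2.5, difficulties)
scale = to_scale_score(theta, slope=10.0, intercept=160.0)
print(theta, scale)
```

Because the expected raw score increases with theta, bisection is enough to invert it; operational calibration software would normally report this raw-score-to-theta table directly.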
Table 4
ISAT Cutoffs for Each Level
READING
Grade   Academic Warning   Below Standards   Meets Standards   Exceeds Standards
03      120-137            138-155           156-173           174-200
05      120-129            130-155           156-170           171-200
08      120-128            129-151           152-172           173-200
10      120-135            136-152           153-174           175-200
MATHEMATICS
Grade   Academic Warning   Below Standards   Meets Standards   Exceeds Standards
03      120-141            142-152           153-172           173-200
05      120-137            138-157           158-190           191-200
08      120-137            138-161           162-184           185-200
10      120-138            139-157           158-187           188-200
WRITING
Grade   Academic Warning   Below Standards   Meets Standards   Exceeds Standards
03      6-13               14-21             22-29             30-32
05      6-13               14-20             21-27             28-32
08      6-14               15-20             21-27             28-32
10      6-14               15-20             21-27             28-32
In writing, similar adjustments were unnecessary because panelists worked
directly with the rubric scales. However, the use of three rating procedures
resulted in three sets of cut scores.⁵ As the generalizability coefficients
discussed earlier showed, agreement among the three methods was generally
quite high. Overall, Process A produced cutoffs that were somewhat lower than
those of the other two approaches, and Process B produced cutoffs that were
somewhat higher. The differences were generally small. For these reasons, the
decision was made to average the three values.
Table 5 shows the percentages of students meeting each standard based on the
cutoff scores given in Table 4 for the total 1999 ISAT test populations.
⁵ The 5th-grade panel encountered significant difficulty in making the Process B ratings for the Meets cut.
They were unable to complete these ratings, and the 5th-grade Meets cut is based on the average of the
other two processes.
Table 5
Percentages of 1999 Illinois Students at Each Grade Level Who Fall Into Each Category
             Academic Warning   Below Standards   Meets Standards   Exceeds Standards
Reading
  Grade 3    8                  31                44                17
  Grade 5    1                  38                37                24
  Grade 8    1                  27                54                18
  Grade 10   6                  25                55                15
Mathematics
  Grade 3    12                 20                47                21
  Grade 5    5                  40                53                3
  Grade 8    5                  52                36                7
  Grade 10   6                  42                47                5
Writing
  Grade 3    9                  35                50                6
  Grade 5    2                  23                52                23
  Grade 8    5                  36                56                3
  Grade 10   6                  28                54                12
Note: These percentages are based on data obtained from NCS in June 1999.
There may be very slight differences in the percentages shown in the final
school/district reports.
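Percentages of this kind follow mechanically once the cutoffs are fixed: each student's scale score is placed in a category, and the categories are tallied. The sketch below uses the grade 3 reading cutoffs from Table 4; the function names and the four sample scores are hypothetical and are not ISAT data.

```python
import bisect

# Lowest scale score in Below, Meets, and Exceeds Standards for grade 3
# reading, taken from Table 4.
GRADE3_READING_CUTS = [138, 156, 174]
CATEGORIES = ["Academic Warning", "Below Standards",
              "Meets Standards", "Exceeds Standards"]

def classify(scale_score, cuts=GRADE3_READING_CUTS):
    """Return the performance category for one scale score."""
    # bisect_right counts how many cutoffs the score has reached or passed.
    return CATEGORIES[bisect.bisect_right(cuts, scale_score)]

def category_percentages(scores, cuts=GRADE3_READING_CUTS):
    """Percentage of scores falling in each performance category."""
    counts = {c: 0 for c in CATEGORIES}
    for s in scores:
        counts[classify(s, cuts)] += 1
    return {c: 100.0 * k / len(scores) for c, k in counts.items()}

print(classify(155))  # Below Standards
print(classify(156))  # Meets Standards
print(category_percentages([130, 140, 160, 180]))  # hypothetical scores
```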
Participant Evaluations
Evaluation forms were completed by each panelist at the end of the panelist’s
two-day session. Three open-ended questions were asked of each participant:
What were the most positive aspects of the experience? What aspects of the
work caused you the most difficulty? What recommendations would you make
for improving future sessions of this type? In addition, participants were asked
three questions designed to assess their confidence in each of the three cutoffs
and provided a 10-point scale on which to mark their answers. The lower end
of the scale (1) was anchored by the phrase “Not very confident” and the upper
end of the scale (10) was anchored by the phrase “Very confident.”
In terms of participants’ confidence in the Meets ratings, the average rating
across all grades and all areas was 7.92, with a modal rating of 9.00. In terms
of participants’ confidence in the Below ratings, the average rating across all
grades and all areas was 7.88, with a modal rating of 9.00. In terms of
participants’ confidence in the Exceeds ratings, the average rating across all
grades and all areas was 8.30, again with a modal rating of 9.00. Participants
in all grade levels and all areas expressed very high levels of confidence in their
judgments of cutoff scores.