					Interpreting IDEA
Heather McGovern
Director of the Institute for Faculty Development
November 2009

Standard deviation

Standard deviations of about 0.7 are typical. When
these values exceed 1.2, the class exhibits unusual
diversity. Especially in such cases, it is suggested that
the distribution of responses be examined closely,
primarily to detect tendencies toward a bimodal
distribution (one in which class members are about
equally divided between the “high” and “low” ends of the
scale, with few “in-between”). Bimodal distributions
suggest that the class contains two types of students
who are so distinctive that what “works” for one group
will not for the other. For example, one group may have
an appropriate background for the course while the other
may be under-prepared; or one group may learn most
easily through “reading/writing” exercises while another
may learn more through activities requiring motor
performance. In any event, detailed examination of
individual items can suggest possible changes in prerequisites,
sectioning, or versatility in instructional approach.
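A minimal sketch (not IDEA’s algorithm) of how one might flag unusual diversity or a bimodal split in a class’s 1–5 ratings, using the thresholds described above:

```python
# A minimal sketch (not IDEA's algorithm): summarize one class's 1-5
# ratings, flagging unusual diversity (SD > 1.2) and a possible bimodal
# split (many responses at both ends, few in the middle).
from collections import Counter
from statistics import mean, stdev

def summarize_ratings(ratings):
    counts = Counter(ratings)
    low = counts[1] + counts[2]
    mid = counts[3]
    high = counts[4] + counts[5]
    return {
        "mean": round(mean(ratings), 2),
        "sd": round(stdev(ratings), 2),
        "unusual_diversity": stdev(ratings) > 1.2,
        "possibly_bimodal": low > mid and high > mid,
    }

# A class split between "high" and "low" raters:
print(summarize_ratings([1, 2, 1, 2, 5, 5, 4, 5, 2, 5]))
```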
I.     The appropriate role of IDEA in the context of evaluation
       of teaching at Stockton
        A.   Validity
        B.   Reliability and representativeness
II.    Interpreting data
        A.   Mean
        B.   Halo effect
        C.   Error of central tendency
        D.   Faculty selection of objectives

III.   Making comparisons
        A.   IDEA, Stockton, and disciplinary comparisons
        B.   Converted scores
        C.   Adjusted scores
        D.   Norming
        E.   Comparisons to SET data
IV.    Other things to consider
V.     References
The appropriate role of IDEA in
overall evaluation of teaching
The IDEA Center “strongly recommends that
additional sources of evidence be used when
teaching is evaluated and that student ratings
constitute only 30% to 50% of the overall
evaluation of teaching.” Primary reasons:
  o “some components of effective teaching are
    best judged by peers and not students”
  o “it is always useful to triangulate information...”
  o no instrument is fully valid
  o no instrument is fully reliable
What Stockton candidates provide
for evaluation of teaching
Stockton policy states that “evidence of
  teaching performance should be
  demonstrated by a teaching portfolio, as
  outlined below, which should contain the
  following:
 A self-evaluation of teaching
 Student evaluations of teaching and
  preceptorial teaching
 Peer evaluations of teaching
 Other evidence of effectiveness in teaching”
Correlations of multiple measures
of teaching excellence
Student ratings correlate with ratings by
  administrators, colleagues, alumni, and
  trained observers, and with student comments:

Administrator        .39 to .62
Colleagues           .48 to .69
Alumni               .40 to .70
Trained observers    .50 to .76
Student comments     .75 to .93
Student ratings are valid
The validity of student ratings has been
 checked with correlation studies comparing
 instructor ratings in multi-section courses
 to student performance on external tests.
 These indicate validity coefficients that are
 “practically useful” (between .30 and .49),
 as follows:

Achievement of learning   .47
Overall course            .47
Overall instructor        .44
Students cannot validly rate all
important qualities of teaching
Of the 26 factors Cashin (1989) identifies as
 relevant to teaching effectiveness, eleven
 cannot be validly assessed by students.

Keig and Waggoner (1994) grouped these into
 three categories:
 “(1) the goals, content, and organization of
 course design, (2) methods and materials
 used in delivery, and (3) evaluation of
 student work, including grading practices.”
How Stockton defines “excellence
in teaching” and what students rate
 “A thorough and current command of the subject matter, teaching
  techniques and methodologies of the discipline one teaches
 Sound course design and delivery in all teaching assignments…as evident in
  clear learning goals and expectations, content reflecting the best available
  scholarship or artistic practices, and teaching techniques aimed at student…
 The ability to organize course material and to communicate this
  information effectively. The development of a comprehensive syllabus for
  each course taught, including expectations, grading and attendance policies,
  and the timely provision of copies to students.
 …respect for students as members of the Stockton academic community,
  the effective response to student questions, and the timely evaluation of
  and feedback to students.”
“Where appropriate, additional measures of teaching excellence are:
 Ability to use technology in teaching
 The capacity to relate the subject matter to other fields of knowledge
 Seeking opportunities outside the classroom to enhance student learning
  of the subject matter”
False assumptions about IDEA
 Effective teaching = students make
  progress on all 12 learning objectives
 Effective teachers = teachers who employ
  all 20 teaching methods
Reliability and representativeness
A number of classes are needed to
draw accurate conclusions
File reviewers should keep in mind that the
IDEA Center “recommends using six to
eight classes, not necessarily all from the
same academic year, that are representative
of all of an instructor’s teaching
responsibilities.”
The number of student respondents
affects reliability
The number of student respondents affects reliability. In this
context, reliability refers to consistency: interrater reliability.
Data from classes with fewer than ten respondents are
unreliable, and evaluators should pay them scant attention.
IDEA reports the following median reliabilities by number of raters:
  10 raters          .69 reliability
  15 raters          .83 reliability
  20 raters          .83 reliability
  30 raters          .88 reliability
  40 raters          .91 reliability

  Reliability ratings below .70 are highly suspect.
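As a rough illustration (a hypothetical helper, not an IDEA tool), the median reliabilities above can be applied when screening a class’s numerical data:

```python
# A hypothetical screening helper applying the median reliabilities above.
MEDIAN_RELIABILITY = {10: 0.69, 15: 0.83, 20: 0.83, 30: 0.88, 40: 0.91}

def reliability_note(n_respondents):
    if n_respondents < 10:
        return "fewer than 10 respondents: pay scant attention to numbers"
    # Use the largest table entry not exceeding the class size.
    key = max(k for k in MEDIAN_RELIABILITY if k <= n_respondents)
    r = MEDIAN_RELIABILITY[key]
    caveat = " (below .70: highly suspect)" if r < 0.70 else ""
    return f"median reliability ~{r:.2f}{caveat}"

for n in (7, 10, 22, 40):
    print(n, "raters:", reliability_note(n))
```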
The number of student respondents
affects representativeness
 Higher response rates provide more
 representative data; lower response rates
 provide less.

 This is a particular concern for classes using
 the online IDEA, which has a lower response rate.
 Dennis Fotia informs me that in Fall 2008, the
 response rate was 62.9%. Spring 2009 data:
 ◦   Number of Surveys Processed: 50
 ◦   Number of Respondents: 1045
 ◦   Number of Responses: 714
 ◦   Average Response Rate: 71.5%
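Note that 714 responses from 1,045 respondents works out to about 68.3%, so the reported 71.5% average is presumably a mean of per-survey response rates rather than a pooled ratio. A sketch of the difference, with invented per-class numbers:

```python
# Pooled rate vs. mean of per-class rates. The class counts below are
# invented for illustration; they are not Stockton data.
classes = [(25, 22), (30, 18), (15, 12), (40, 24)]  # (respondents enrolled, responses)

pooled = sum(resp for _, resp in classes) / sum(enr for enr, _ in classes)
per_class_avg = sum(resp / enr for enr, resp in classes) / len(classes)

print(f"Pooled rate:         {pooled:.1%}")        # 69.1%
print(f"Mean per-class rate: {per_class_avg:.1%}")  # 72.0%
```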
Interpreting data
Mean scores can be affected by outliers
Note that average scores as provided on
the IDEA form are mean scores. As such,
they can be affected by outliers. Careful
evaluators will check the statistical detail on
page 4 to note the presence of outliers.
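A quick illustration of why this matters (the ratings are invented): a single outlier moves the mean well below the median, which the page-4 distribution would reveal.

```python
# Invented ratings: one disgruntled outlier pulls the mean down while the
# median stays put.
from statistics import mean, median

ratings = [4, 4, 5, 4, 5, 4, 1]
print(round(mean(ratings), 2))  # 3.86
print(median(ratings))          # 4
```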
Scores can be affected by the halo effect
Ranters and Ravers, or the halo effect
    “the tendency of raters to form a general
   opinion of the person being rated and then let
   that opinion color all specific ratings. If the
   general impression is favorable, the "halo effect" is
   positive and the individual receives higher ratings
   on many items than a more objective evaluation
   would justify. The "halo effect" can also be
   negative; an unfavorable general impression will
   lead to low marks "across the board", even in
   areas where performance is strong.”
How can you know?
Look at the student forms themselves. If a
 form gives someone a 5 all the way down,
 suspect a halo effect. In most cases, the
 same is true of a 1, or any other number,
 all the way down.
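One simple screen an evaluator might apply (an illustration, not an IDEA feature) is to flag forms with zero variance across items:

```python
# Flag individual forms where every item received the same rating.
def possible_halo(form):
    return len(set(form)) == 1

forms = [
    [5, 5, 5, 5, 5, 5],  # straight 5s: flagged
    [4, 5, 3, 4, 4, 5],  # varied: not flagged
    [1, 1, 1, 1, 1, 1],  # straight 1s: flagged
]
for i, form in enumerate(forms, 1):
    print(f"Form {i}:", "possible halo" if possible_halo(form) else "ok")
```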
The Error of Central Tendency
can affect scores
 “Most people have a tendency to avoid the
 extremes (very high and very low) in making
 ratings. As a result, ratings tend to pile up
 more toward the middle of the rating scale
 than might be justified. In many cases, ratings
 which are "somewhat below average" or
 "somewhat above average" may represent
 subdued estimates of an individual's status
 because of the "Error of Central Tendency.”
How faculty select objectives can
affect scores
The “Summary Evaluation” provided on
page one of the IDEA report weights
Progress toward Relevant Objectives at
50% and Excellent Teacher and Excellent
Course at 25%.

Therefore, evaluators should pay attention
to the objectives a faculty member selects.
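That weighting, as arithmetic (the component scores below are invented for illustration):

```python
# Page-one weighting: 50% progress on relevant objectives, 25% excellent
# teacher, 25% excellent course. The scores below are invented.
progress = 4.1
excellent_teacher = 4.4
excellent_course = 3.9

summary = 0.50 * progress + 0.25 * excellent_teacher + 0.25 * excellent_course
print(f"Summary evaluation: {summary:.2f}")  # 4.12
```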
Things evaluators should check
 Whether the teacher selected objectives. If
  not, by default, all will be considered
  “Important,” which makes most information
  on the first summary page of the report
  worthless.
 Whether the objectives the teacher chose
  seem reasonable for the course.
 Whether the teacher discusses problematic
  objective choices or irregularities in the
  file.
Consider external factors in
objective selection
 At Stockton, the ultimate choice of
 objectives usually belongs to the teacher.
 However, a growing number of programs
 encourage (or, for all practical purposes,
 especially for untenured faculty, require)
 some objectives to be held in common. This
 is particularly the case for courses with
 multiple sections.
Faculty can help evaluators…
Logically, faculty members creating their files
  should note if they
 forgot to select objectives, which seriously
  impacts the results,
 later see that they chose objectives poorly,
 were using objectives in common with a
  larger group of courses, but those were
  problematic for their class, or
 need to report an unusual situation that
  likely affected student progress towards
  objectives or student perception of the class.
Making comparisons
IDEA compares class results to
three groups
1) Three years of IDEA student ratings at 122
   institutions in 73,722 classes (excluding classes
   with fewer than 10 students, limiting to no more
   than 5% of the database from any one institution,
   and excluding first-time institutions)
2) Classes at your institution in the most recent
   five years (excluding classes with no objectives
   selected, including classes of all sizes)
3) Classes in the same discipline in the most
   recent five years where at least 400 classes with
   the same disciplinary code were rated
   (excluding as in 1, plus courses with no selected
   objectives)
The validity of comparisons varies
The validity of comparisons depends on a number of
factors, including how “typical” a class is compared to
classes at Stockton or to all classes in the IDEA
database, and how well the class aligns with other
classes carrying the same IDEA disciplinary code.

Some classes at Stockton align poorly with “typical”
classes, such as a fieldwork class or a class with a
cutting-edge format.
External factors can affect
comparisons and ratings
 Students in required courses tend to give lower ratings.
 Students in lower-level classes tend to give lower ratings.
 Ratings tend to run arts and humanities > social sciences >
  math (this may reflect differences in teaching quality, the
  quantitative nature of the courses, both, or other factors).
 Gender/age/race/culture/height/physical attractiveness
  and more may be factors, as they are in many other
  areas of life.
 If students are told the evaluation will be used in
  personnel decisions, the scores are higher.
 If the instructor is present during the evaluation, the
  scores are higher.
Some external factors don’t usually
affect ratings
 Time of day of the course
 Time in the term in which evaluations are
  given (after midterm)
 Age of student
 Level of student
 Student GPA
Some disciplinary comparisons are problematic
 Many classes align poorly with disciplinary
 codes: a CRIM statistics course here, which is
 compared either with Criminal Justice or
 with Mathematics; developmental writing
 here, which is higher level than many such
 courses but also carries credit; or most of our
 G courses, perhaps particularly our GIS courses.
We should use converted scores
IDEA states that “Institutions that want to
  make judgments about teaching
  effectiveness on a comparative basis
  should use converted scores.”
Why we should use converted scores
 The 5-point averages of progress ratings on
  “Essential” or “Important” objectives vary across
  objectives. For instance, the average for “gaining factual
  knowledge” is 4.00, while that for “gaining a broader
  understanding and appreciation for
  intellectual/cultural activity” is 3.69.
 Unconverted averages disadvantage “broad liberal
  education” objectives.
 Using converted averages “ensures that instructors
  choosing objectives where average progress ratings
  are relatively low will not be penalized for choosing
  objectives that are particularly challenging or that
  address complex cognitive skills.”
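As a sketch of what conversion accomplishes, assume converted scores use the familiar T-score transformation (mean 50, SD 10); the database SD below is an invented value, and IDEA’s actual parameters may differ. The same raw rating lands on opposite sides of average depending on the objective:

```python
# Converted scores, sketched as T-scores: 50 + 10 * (raw - mean) / sd.
# The objective means (4.00, 3.69) come from the slide above; the SD of
# 0.55 is invented, not IDEA's published parameter.
def t_score(raw, db_mean, db_sd=0.55):
    return 50 + 10 * (raw - db_mean) / db_sd

raw = 3.9  # same raw progress rating, two different objectives
print(round(t_score(raw, db_mean=4.00), 1))  # 48.2: below average, factual knowledge
print(round(t_score(raw, db_mean=3.69), 1))  # 53.8: above average, intellectual/cultural
```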
Why we should use adjusted
averages in most cases
Adjusted scores adjust for “student
 motivation, student work habits, class size,
 course difficulty, and student effort.
 Therefore, in most circumstances, the IDEA
 Center recommends using adjusted scores.”
How are they adjusted?
 “Work Habits (mean of Item 43, As a rule,
 I put forth more effort than other students
 on academic work) is generally the most
 potent predictor…Unless ratings are
 adjusted, the instructors of such classes
 would have an unfair advantage over
 colleagues with less dedicated students.”
How are they adjusted, part II
“Course Motivation (mean of Item 39, I
 really wanted to take this course regardless
 of who taught it) is the second most
 potent predictor. …unless ratings are
 adjusted, the instructors of such classes
 would have an unfair advantage over
 colleagues with less motivated students.”
How are they adjusted, part III
“Size of Class…is not always statistically
  significant; but when it was, it was always
  negative – the larger the class, the lower
  the expected rating.”
How are they adjusted, part IV
“Course Difficulty, as indicated by student ratings of item 35,
   Difficulty of subject matter” is complicated because the
   instructor influences students’ perception of difficulty.
Therefore, “A statistical technique was used to remove the
   instructor’s influence on “Difficulty” ratings in order to
   achieve a measure of a class’s (and often a discipline’s)
   inherent difficulty. Generally, if the class is perceived as
   difficult (after taking into account the impact of the
   instructor on perceived difficulty), an attenuated outcome
   can be expected.”
Notable examples: in “Creative capacities” and
   “Communication skills” “high difficulty is strongly associated
   with low progress ratings.”
In two cases, high difficulty is associated with high ratings on
   progress toward objectives: “Factual knowledge” and “Principles
   and theories.”
How are they adjusted, part V
“Student Effort is measured with responses to item 37, I
  worked harder on this course than on most courses I have
  taken.” Here, because responses reflect both the students’
  general habits and how well the teacher motivated
  students, the latter is statistically removed from the
  ratings, leaving the fifth extraneous factor, “student
  effort not attributable to the instructor.” Usually,
  student effort is negatively related to ratings.
A special case: classes containing “an unusually large
  number of students who worked harder than the
  instructor’s approach required” tend to get low progress
  ratings, perhaps because the students were under-prepared
  for the class, or because they lack self-confidence and so
  under-achieve “or under-estimate their progress in a
  self-abasing manner.”
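Putting parts I through V together: the adjustment is regression-based, predicting the rating a class would be expected to give from the extraneous factors and comparing the actual rating to that expectation. The sketch below uses invented coefficients and an invented baseline purely to show the shape of the calculation; it is not IDEA’s model.

```python
# A schematic of regression-based adjustment. All coefficients and the
# baseline are invented for illustration; this is not IDEA's actual model.
def adjusted_rating(raw, work_habits, motivation, class_size,
                    difficulty, effort_residual):
    # Predict the rating the extraneous factors alone would produce...
    expected = (3.0
                + 0.30 * work_habits       # item 43, most potent predictor
                + 0.20 * motivation        # item 39, second most potent
                - 0.002 * class_size       # negative whenever significant
                - 0.10 * difficulty        # item 35, instructor influence removed
                - 0.05 * effort_residual)  # item 37, teacher's share removed
    baseline = 4.0  # invented national average
    # ...then credit or debit the instructor for the difference.
    return raw - (expected - baseline)

print(round(adjusted_rating(raw=4.2, work_habits=3.8, motivation=3.2,
                            class_size=30, difficulty=3.5,
                            effort_residual=3.0), 2))  # 3.98
```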
A critical exception to using
adjusted scores
“We recommend using the unadjusted score if the
average progress rating is high (for example, 4.2
or higher).”

In these cases, students are so motivated and
hard-working that the teacher has little
opportunity to influence their progress, but
“instructors should not be penalized for having
success with a class of highly motivated
students with good work habits.”
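As a rule of thumb (the 4.2 threshold comes from the quotation above):

```python
# Score selection per the IDEA guidance quoted above.
def score_to_use(adjusted, unadjusted):
    # Use the unadjusted score when raw progress is already high (>= 4.2).
    return unadjusted if unadjusted >= 4.2 else adjusted

print(score_to_use(adjusted=3.9, unadjusted=4.5))  # 4.5: highly motivated class
print(score_to_use(adjusted=3.9, unadjusted=4.0))  # 3.9: adjustment applies
```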
Another exception to using
adjusted scores: assessment of outcomes
“In deciding which ratings to use, it is
  important to consider whether the focus
  is on student outcomes or on instructor
  contributions to those outcomes. For the
  former, “Unadjusted” ratings are most
  relevant; for the latter, “Adjusted” ratings
  are generally more appropriate.”
Do not try to cut the scores more
precisely than IDEA does…
Because the instrument is not perfectly
 valid or reliable, trying to compare scores
 within the five major categories IDEA
 provides is not recommended.
Norming sorts people into broad categories
 Scores are normed. Therefore, it is
 unrealistic to expect most people to
 score above the similar range. Statistically,
 40% of people ALWAYS score in the
 similar range and 30% above and 30%
 below that range.
More thoughts on norming…
   Many teachers teach well. Therefore, the
    comparative standard is relatively high. Being
    “similar” is not bad. It is fine.
   If we made a list of 10 teachers at random at
    Stockton, we’d expect that one would fall
    into the “much lower” range, two into
    “lower,” four into “similar,” two into “higher,”
    and one into “much higher” if we think
    Stockton teachers are basically comparable
    to the teachers in the IDEA database.
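The same 10/20/40/20/10 split, expressed as percentile bands (the cut points below are implied by that split; IDEA’s published score boundaries may differ):

```python
# Mapping a class's percentile standing to the five comparison bands,
# following the 10/20/40/20/10 split described above.
def band(percentile):
    if percentile < 10:
        return "much lower"
    if percentile < 30:
        return "lower"
    if percentile <= 70:
        return "similar"
    if percentile <= 90:
        return "higher"
    return "much higher"

for p in (5, 20, 50, 80, 95):
    print(p, "->", band(p))
```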
Thoughts about comparing to SET
We can’t perform the most accurate
 comparisons of IDEA data to SET data
 because we don’t know the standard error
 for the SET. The SET also did not convert,
 adjust, or norm scores. Its questions were
 not tested for validity and reliability. Most
 of its questions do not map onto the IDEA
 form (and those that do could be influenced
 differently by the other questions on the
 forms).
Mathematical conversion…
All that said, I’ve found what are supposed to
 be fairly valid equations for converting between
 a 5-point and a 7-point scale.

That said, research indicates that scales with
 fewer points (the IDEA’s 5-point scale versus
 the SET’s 7-point scale) allow less precise
 measurement. In practice this mainly means
 that, because raters have less room at the top
 of the scale, scores tend to be lower even
 after conversion.
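For reference, the generic linear rescaling between a 5-point and a 7-point scale is sketched below; Colman, Norris, and Preston (1997) derive empirical equations, which are not reproduced here.

```python
# Generic linear rescaling between 5- and 7-point scales (endpoints map to
# endpoints). This is the naive mapping, not Colman et al.'s equations.
def five_to_seven(x):
    return (x - 1) * 6 / 4 + 1

def seven_to_five(y):
    return (y - 1) * 4 / 6 + 1

print(five_to_seven(4.0))   # 5.5
print(seven_to_five(5.5))   # 4.0
```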
Other things to consider
Other items that relate to our
definition of excellent teaching
Primarily, pages one and two should be
 used for summative evaluation of teaching.
 Page 4, which provides raw data, should
 be at least skimmed to note the distribution
 of scores and responses to any additional
 questions.
Due to our definition of excellence in
 teaching, we should also attend to item 17
 on page 3 (“Provided timely and frequent
 feedback…”) for summative evaluation.
Items to consider for formative evaluation
 In cases where evaluators can or choose to
 provide formative evaluation, page 3 is
 essential. Here, evaluators should note that
 effective teaching methods and styles depend
 upon the learning objectives for the class,
 and IDEA notes these. IDEA also provides
 suggestions for areas of strength, areas of
 weakness, and areas that are ok but have
 room for improvement. Evaluators can point
 to these to see what behaviors to
 recommend or applaud.
Teachers can use page 3
Teachers can look to the information on
  page three to see what steps they might
  take to improve student progress on
  various objectives. See sample report.
References
    Cashin, William. “Student Ratings of Teaching: The Research Revisited.” 1995. IDEA Paper 32.
    Cashin, William. “Student Ratings of Teaching: A Summary of the Research.” 1988. IDEA Paper 20.
    Colman, Andrew, Norris, Claire, and Preston, Carolyn. “Comparing Rating Scales of
     Different Lengths: Equivalence of Scores from 5-Point and 7-Point Scales.” 1997.
     Psychological Reports 80: 355-362.
    Hoyt, Donald and Pallett, William. “Appraising Teaching Effectiveness: Beyond Student
     Ratings.” IDEA Paper 36.
    “Interpreting Adjusted Ratings of Outcomes.” 2002, updated 2008.
    Pallett, William. “IDEA Student Ratings of Instruction.” Stockton College, May 2006.
    “Using IDEA Results for Administrative Decision-making.” 2005.
