The Value in Evaluation

          Erica Friedman
   Assistant Dean of Evaluation
              MSSM
[Figure: assessment methods arranged by level (Knows, Knows how, Shows how, Does). Written: MCQ, MEQ, Essay, Application, Note review. Observed: Practical, Oral, SP, OSCE, Longitudinal clinical observation. Axis labels: Formative, Summative]
Session Goals
• Understand the purposes of assessment
• Understand the framework for selecting and developing assessment
  methods
• Recognize the benefits and limitations of different methods of
  assessment
Conference Objectives
• Review the goals and objectives for your course or clerkship in the
  context of assessment
• Identify the best methods of assessing your goals and objectives
Purpose of Evaluation
• To certify individual competence
• To assure successful completion of goals/objectives
• To provide feedback
  - To students
  - To faculty, course and clerkship directors
• As a statement of values (what is most critical to learn)
• For program evaluation: evaluation of an aggregate, not an individual
  (e.g., the average ability of students to perform a focused history
  and physical)
Consequences of Evaluation
• Steering effect: exams “drive the learning”; students study and learn
  for the exam
• Impetus for change (feedback from students, Executive Curriculum,
  LCME)
Definitions: Reliability
• The consistency of a measurement over time or by different observers
  (e.g., a thermometer always reads 98 degrees C when placed in boiling
  distilled water at sea level)
• The proportion of variability in a score due to the true difference
  between subjects (e.g., the difference between Greenwich time and the
  time on your watch)
  - Inter-rater reliability (correlation between the scores of 2 raters)
  - Internal reliability (correlation between items within an exam)
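Both reliability measures above can be computed directly from score data. The sketch below is illustrative only: the scores are made up, inter-rater reliability is taken as a Pearson correlation, and Cronbach's alpha is used as one common index of internal reliability (the slides do not name a specific statistic).

```python
from statistics import mean, pvariance

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def cronbach_alpha(item_scores):
    """Cronbach's alpha; item_scores[i][j] is student j's score on item i."""
    k = len(item_scores)
    totals = [sum(column) for column in zip(*item_scores)]
    item_variance = sum(pvariance(item) for item in item_scores)
    return k / (k - 1) * (1 - item_variance / pvariance(totals))

# Hypothetical data: two raters scoring the same five students.
rater_a = [7, 5, 9, 6, 8]
rater_b = [6, 5, 9, 7, 8]
inter_rater = pearson_r(rater_a, rater_b)

# Hypothetical data: three exam items (rows) answered by five students (columns).
items = [
    [1, 0, 1, 1, 0],
    [1, 0, 1, 0, 0],
    [1, 1, 1, 1, 0],
]
alpha = cronbach_alpha(items)

print(f"inter-rater r = {inter_rater:.2f}, alpha = {alpha:.2f}")
```

With real exams, the same functions would be fed the actual rater scores and the item-by-student score matrix.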
Definitions: Validity
• The ability to measure what was intended (a thermometer that always
  reads 98 degrees C in boiling water is reliable but not valid, because
  the true value is 100 degrees C)
• Four types:
  - Face/content
  - Criterion
  - Construct/predictive
  - Internal
Types of validity
• Face/content: Would experts agree that it assesses what’s important?
  (e.g., a driver’s test mirroring actual driving situations and
  conditions)
• Criterion: Can an inference be drawn from test scores to actual
  performance? For example, if a simulated driver’s test score predicts
  the road test score, the simulation test is claimed to have a high
  degree of criterion validity.
• Construct/predictive: Does it assess what it was intended to assess?
  (e.g., a driver’s test as a predictor of the likelihood of accidents;
  results of your course exam predicting the student’s performance on
  that section of Step 1)
• Internal: Do other methods assessing the same domain obtain similar
  results? (e.g., similar scores from multiple SPs assessing history
  taking skills)
Types of Evaluations: Formative and Summative Definitions
• Formative evaluation: provides feedback so the learner can modify
  their learning approach. “When the chef tastes the sauce, that’s
  formative evaluation”
• Summative evaluation: done to decide if a student has met the minimum
  course requirements (pass or fail); usually judged against normative
  standards. “When the customer tastes the sauce, that’s summative
  evaluation”
Conclusions about formative assessments
• Stakes are lower (they do not determine passing or failing, so lower
  reliability is tolerated)
• Desire more information, so they may require multiple modalities for
  validity and reliability (it is rare for one assessment method to
  identify all critical domains)
• Use evaluation methods that support and reinforce teaching modalities
  and steer students’ learning
• May only identify deficiencies but not define how to remediate
Conclusions about summative assessments
• Stakes are higher: incompetent students may pass, or students may
  fail and require remediation
• Desire high reliability (>0.8), so they often require multiple
  questions/problems or cases (20-30 stations per OSCE, 15-20 cases for
  oral presentations, 700 questions for an MCQ)
• Desire high content validity (single cases have low content validity
  and are not representative)
• Desire high predictive validity (correlation with future
  performance), which is often hard to achieve
• Consider reliability, validity, benefit and cost (resources, time and
  $) in determining the best assessment tools
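Why more stations or cases raise reliability can be illustrated with the Spearman-Brown prediction formula from classical test theory. This is a sketch, not the method behind the figures on the slide: the formula is a standard textbook result that the slides do not cite, and the starting values (a 5-station exam with reliability 0.5) are hypothetical.

```python
import math

def spearman_brown(reliability, length_factor):
    """Predicted reliability when a test is lengthened by length_factor."""
    return length_factor * reliability / (1 + (length_factor - 1) * reliability)

def required_length_factor(reliability, target):
    """Factor by which a test must be lengthened to reach the target reliability."""
    return target * (1 - reliability) / (reliability * (1 - target))

# Hypothetical starting point: a 5-station exam with reliability 0.5.
current_stations = 5
current_reliability = 0.5

k = required_length_factor(current_reliability, target=0.8)
# Round before taking the ceiling so floating-point noise does not add a station.
stations_needed = math.ceil(round(current_stations * k, 6))

print(f"lengthen by a factor of {k:.1f}: about {stations_needed} stations")
```

The formula assumes the added stations are comparable in quality to the existing ones, which is itself an assumption worth checking for any real exam.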
How to Match Assessment to Goals and Teaching Methods

• Define the type of learning (lecture, small group, computer
  module/self study, etc.)
• Define the domain to be assessed (knowledge, skill, behavior) and the
  level of performance expected (knows, knows how, shows how or does)
• Determine the type of feedback required
Purpose of feedback
• For students: to provide a good platform to support and enhance
  student learning
• For faculty: to determine what works (what facilitated learning and
  who were appropriate role models)
• For students and faculty: to determine areas that require improvement
Types of Feedback
• Quantitative
  - Total score compared to other students, providing the high, low and
    mean scores and the minimum requirement for a passing grade

• Qualitative
  - Written personal feedback identifying areas of strength and weakness
  - Oral feedback, one on one or in a group, to discuss the areas of
    deficiency and help guide further learning
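The quantitative feedback described above (a student's total score alongside the class high, low, mean and the passing minimum) can be assembled as in this minimal sketch. The student names, scores and passing score are hypothetical.

```python
from statistics import mean

def quantitative_feedback(scores, passing_score):
    """Return one feedback line per student, including the class summary."""
    values = list(scores.values())
    high, low, average = max(values), min(values), mean(values)
    report = {}
    for student, score in scores.items():
        status = "pass" if score >= passing_score else "below passing"
        report[student] = (
            f"score {score} ({status}); class high {high}, low {low}, "
            f"mean {average:.1f}, passing {passing_score}"
        )
    return report

# Hypothetical exam results.
scores = {"Student A": 82, "Student B": 67, "Student C": 91, "Student D": 74}
report = quantitative_feedback(scores, passing_score=70)

for student, line in report.items():
    print(f"{student}: {line}")
```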
Evaluation Bias: Pitfalls

• Can occur with any evaluation requiring interpretation by an
  individual (all methods other than MCQ)
• Expectation bias (halo effect): prior knowledge or expectation of the
  outcome influences the ratings (especially a global rating)
• Audience effect: a learner’s performance is influenced by the presence
  of an observer (seen especially with skills and behaviors)
• Rater traits: the training of the rater or the rater’s traits affect
  the reliability of the observation
Types of assessment tools: Written
Do not require an evaluator to be present during the assessment and can
be open or closed book
• Multiple choice question (MCQ)
• Modified short answer essay question (MEQ); the patient management
  problem is a variation of this
• Essay
• Application test
• Medical note/chart review
Types of assessment tools: Observer Dependent Interaction
Usually require active involvement of an assessor and occur as a single
event
• Practical
• Medical record review
• Standardized Patient(s) (SP)
• Objective Structured Clinical Examination (OSCE)
• Oral examination: chart stimulated recall, triple jump or direct
  observation
Types of assessment tools: Observer Dependent Longitudinal Interaction
Continual evaluation over time
• Preceptor evaluation: completion of either a critical incident report
  or a structured rating form based on direct observation over time
• Peer evaluation
• Self evaluation
MCQ
• Definition: A test composed of questions in which each stem is
  followed by several alternative answers. The examinee must select the
  most correct answer.
• Measures: Knows and Knows how
• Pros: Efficient; cheap; samples a large content domain (60
  questions/hour); high reliability; easy, objective scoring; direct
  correlate of knowledge with expertise
• Cons: Often a recall of facts; provides opportunity for guessing
  (favors the good test-taker); unrealistic; doesn’t provide information
  about the thought process; encourages learning to recall
• Suggestions: Create questions that can be answered from the stem
  alone; avoid “always”, “frequently”, “all” or “none”; randomly assign
  correct answers; can correct for guessing (penalty formula)
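The slide does not spell out the penalty formula. One widely used version is the classical "formula score", which subtracts a fraction of the wrong answers so that blind guessing yields an expected score of zero. The sketch below assumes that version; the exam numbers are hypothetical.

```python
def formula_score(num_right, num_wrong, options_per_question):
    """Correct a raw MCQ score for guessing: right - wrong / (options - 1).

    Omitted questions are neither rewarded nor penalized.
    """
    return num_right - num_wrong / (options_per_question - 1)

# Hypothetical 60-question exam with 5 options per question:
# 44 right, 12 wrong, 4 omitted.
corrected = formula_score(num_right=44, num_wrong=12, options_per_question=5)
print(corrected)  # 41.0

# A student guessing blindly on all 60 questions expects 12 right and 48 wrong,
# which the correction brings to zero.
blind_guess = formula_score(num_right=12, num_wrong=48, options_per_question=5)
```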
MEQ

• Definition: A series of sequential questions in a linear format, based
  on an initial limited amount of information. It requires immediate
  short answers, followed by additional information and subsequent
  questions. (The patient management problem is a variation of this
  type.)
• Measures: Knows and Knows how
• Pros: Can assess problem solving, hypothesis generation and data
  interpretation
• Cons: Low inter-case reliability; less content validity; harder to
  administer; time consuming to grade; variable inter-rater reliability
• Suggestions: Use directed (not open ended) questions; provide an
  extensive answer key
Open ended essay question
• Definition: Question allowing a student the freedom to decide the
  topic to address and the position to take; it can be take home
• Measures: Knows and Knows how
• Pros: Assesses ability to think (generate ideas, weigh arguments,
  organize information, build and support conclusions and communicate
  thoughts); high face validity
• Cons: Low reliability; time intensive to grade; narrow coverage of
  content
• Suggestions: Strictly define the response and the rating criteria
Application test

• Definition: Open book problem solving test incorporating a variety of
  MCQs and MEQs. It provides a description of a problem with data. The
  examinee is asked to interpret the data to solve the problem. (ex.
  Quiz item 3)
• Measures: Knows and Knows how
• Pros: Assesses higher learning; good face/content validity; reasonable
  reliability; useful for formative and summative feedback
• Cons: Harder to create and grade
Practical Exam
• Definition: Hands on exam to demonstrate and apply knowledge (e.g.,
  culture and identify the bacteria on the glove of a Sinai cafeteria
  worker, or perform a history and physical on a patient)
• Measures: Knows, Knows how, and possibly Shows how and Does
• Pros: Can test multiple domains; actively involves the learner (good
  steering effect); best suited for procedural/technical skills; higher
  face validity
• Cons: Labor intensive (creation and grading); hard to identify a gold
  standard, so grading is subjective; high rate of item failure
  (unanticipated problems with administration)
• Suggestions: Pilot first; give adequate, specific instructions and
  goals; set specific, defined criteria for grading and train raters;
  for direct observation, require multiple encounters for higher
  reliability
Medical record/note review

• Definition: Examiner reviews learner’s previously created document;
  can be random
• Measures: Knows how and Does
• Pros: Can review multiple records for higher reliability; high face
  validity; less costly than oral (done without the learner and at the
  examiner’s convenience)
• Cons: Lower inter-rater reliability; less immediate feedback; unable
  to determine the basis for decisions
• Suggestions: Create a template with specific ratings for skills
Standardized Patients
• Definition: Simulated patient/actor trained to present a history in a
  reliable, consistent manner and to use a checklist to assess students’
  skills and behaviors
• Measures: Knows, Knows how, Shows how and Does
• Pros: High face validity; can assess multiple domains; can be
  standardized; can give immediate feedback
• Cons: Costly; labor intensive; must use multiple SPs for high
  reliability
OSCE (Objective Structured Clinical Exam)
• Definition: Task oriented, multi-station exam; stations can be 5-30
  minutes and require written answers or observation (e.g., take
  orthostatic VS; perform a cardiac exam; smoking cessation counseling;
  read and interpret CXR or EKG results; communicate lab results and
  advise a patient)
• Measures: Knows, Knows how, Shows how and Does
• Pros: Assesses clinical competency; tests a wide range of knowledge,
  skills and behaviors; can give immediate feedback; good test-retest
  reliability; good content and construct validity; less patient and
  examiner variability than with direct observation
• Cons: Costly (manpower and $); case specific; requires > 20 stations
  for internal consistency; weaker criterion validity
Oral Examination

• Definition: Method of evaluating a learner’s knowledge by asking a
  series of questions. The process is open ended, with the examiner
  directing the questions (e.g., chart stimulated patient recall or a
  triple jump)
• Measures: Knows, Knows how, sometimes Shows how and Does
• Pros: Can measure clinical judgement, interpersonal skills
  (communication) and behavior; high face validity; flexible; can
  provide direct feedback
• Cons: Poor inter-rater reliability (dove vs hawk and observer bias);
  content specific, so low reliability (must use > 6 cases to increase
  reliability); labor intensive
• Suggestions: Multiple short cases; define questions and answers;
  provide simple rating scales and train raters
Triple Jump

• Definition: Three-step written and oral exam: a written part, a
  research part and then an oral part (ex. COMPASS 1)
• Measures: Knows, Knows how, Shows how and Does
• Pros: Assesses hypothesis generation, use of resources, application of
  knowledge to problem solve and self directed learning; provides
  immediate feedback; high face validity
• Cons: Only for formative assessment (poor reliability); time/faculty
  intensive; too content specific; inconsistent rater evaluations
Clinical Observations
• Definition: Assessment of various domains longitudinally by an
  observer, either preceptor, peer or self (small group evaluations
  during the first two years and preceptor ratings during clinical
  exposure)
• Measures: Knows, Knows how, Shows how and Does
• Pros: Simple; efficient; high face validity; formative and summative
• Cons: Low reliability (often only recent encounters influence the
  grade); halo effect (lack of domain discrimination); more often a
  judgement of personality; “Lake Wobegon” effect (all students are
  necessarily above average); unwillingness to document negative
  ratings (fear of failing someone)
• Suggestions: Frequent ratings and feedback; increase the number of
  observations; multiple assessors (with group discussion about specific
  ratings)
Peer/Self Evaluation
• Pros: Useful for formative feedback
• Cons: Lack of correlation with faculty evaluations; same cons as
  others (measure of “nice guy”, low reliability, halo effect); peer
  evaluations have a friend effect, fear of retribution or desire to
  penalize
• Suggestions: Limit the # of behaviors assessed; clarify the difference
  between evaluation of professional and personal aspects; develop
  operationally proven criteria for rating; provide multiple
  opportunities for students to do this and provide feedback from
  faculty
Erica Friedman’s Educational Pyramid
• Does: Direct Observation, Practical
• Shows How: OSCE, Triple Jump, Oral, SP, Practical, Chart review
• Knows How: MEQ, Essay
• Knows: MCQ
Type of assessment                 Content areas tested          Potential reliability  Potential validity

Written
MCQ                                Knows, Knows How              ++                     ++
MEQ                                Knows, Knows How              +                      ++
Application test, Practical        Knows, Knows How, Shows How   +                      +
Essay                              Knows, Knows How              0                      0
Medical Note review                Knows How, Does               +                      ++

Observed
SP                                 Shows How, Does               ++                     ++
OSCE                               Shows How, Does               ++                     ++
Oral                               Shows How, Does               +                      +
Longitudinal clinical experience   Shows How, Does               0                      0
Critical factors for choosing an evaluation tool

• Type of evaluation and feedback desired: formative/summative
• Focus of evaluation: knowledge, skills, behaviors (attitudes)
• Level of evaluation: Knows, Knows how, Shows how, Does
• Pros/cons: validity, reliability, cost (time, $, resources)
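These factors can be applied as a simple filter over the summary table of assessment types. The sketch below transcribes that table; the numeric coding of the ratings ("++" as 2, "+" as 1, "0" as 0) and the function name are assumptions made for illustration, and cost is left out because the table does not rate it.

```python
# Each entry: (levels tested, potential reliability, potential validity).
# Ratings are coded "++" -> 2, "+" -> 1, "0" -> 0 (an assumed coding).
TOOLS = {
    "MCQ": ({"Knows", "Knows How"}, 2, 2),
    "MEQ": ({"Knows", "Knows How"}, 1, 2),
    "Application test / Practical": ({"Knows", "Knows How", "Shows How"}, 1, 1),
    "Essay": ({"Knows", "Knows How"}, 0, 0),
    "Medical note review": ({"Knows How", "Does"}, 1, 2),
    "SP": ({"Shows How", "Does"}, 2, 2),
    "OSCE": ({"Shows How", "Does"}, 2, 2),
    "Oral": ({"Shows How", "Does"}, 1, 1),
    "Longitudinal clinical experience": ({"Shows How", "Does"}, 0, 0),
}

def candidate_tools(level, min_reliability=0, min_validity=0):
    """List tools that test the given level and meet the minimum ratings."""
    return [
        name
        for name, (levels, reliability, validity) in TOOLS.items()
        if level in levels
        and reliability >= min_reliability
        and validity >= min_validity
    ]

# Summative assessment of a "Shows How" skill: ask for the highest ratings.
summative = candidate_tools("Shows How", min_reliability=2, min_validity=2)

# Formative assessment tolerates lower reliability, so more tools qualify.
formative = candidate_tools("Shows How")

print(summative)
print(formative)
```

Lowering the thresholds for formative use mirrors the earlier point that lower reliability is tolerated when the stakes are lower.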
How to be successful
• Students should be clear about the course/clerkship goals, the
  specific types of assessments used and the criteria for passing (and,
  if relevant, for just short of honors and for honors)
• Make sure the choice of assessments is consistent with the values of
  your course and the school
• Final judgments about students’ progress should be based on multiple
  assessments using a variety of methods over a period of time (instead
  of one time point)
Number of courses or clerkships using a specific assessment tool:
assessing our assessment methods
Year (n = courses   MCQ  MEQ  Essay  Practical/   SP  OSCE  Oral  Small group   # using  # using
or clerkships)                       hands-on                     or preceptor  1 tool   > 1 tool
1 (n=11)            9    5    1      3            1   -     -     5             4        7
2 (n=13)            12   4    4      -            1   1     1     3             4        9
3 (n=8)             7         -      1 simulator  2   -     7     8             -        8
4 (n=4)             1    -    -      1 simulator  -   -     -     4             2        2
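One quick way to read the table is the share of courses or clerkships in each year that use more than one assessment tool. The sketch below takes its counts from the "n" and "# using > 1 tool" columns; the variable and function names are illustrative.

```python
# Per year: (number of courses or clerkships, number using more than one tool).
COURSES = {1: (11, 7), 2: (13, 9), 3: (8, 8), 4: (4, 2)}

def multi_tool_share(total, multi):
    """Fraction of courses or clerkships using more than one assessment tool."""
    return multi / total

shares = {year: multi_tool_share(*counts) for year, counts in COURSES.items()}
overall = sum(multi for _, multi in COURSES.values()) / sum(
    total for total, _ in COURSES.values()
)

for year, share in shares.items():
    print(f"Year {year}: {share:.0%} use more than one tool")
print(f"All years: {overall:.0%}")
```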
Why assess ourselves?
• Assure successful completion of our course goals and objectives
• Assure integration with the mission of the school
• Direct our teaching/learning (determine what worked and what needs
  changing)
How we currently assess ourselves
• Student evaluations (quantitative and qualitative), most often
  summative
• Performance of students on our exam and specific sections of USMLE
• Focus and feedback groups (formative and currently done by Dean’s
  office)
• Peer evaluations of course/clerkship by ECC
• Self evaluations: yearly grid completed by course directors and core
  faculty
• Consider peer evaluation of teaching and teaching materials

						