
Assessment of medical competence: Past, present, and future?
Lessons learned in assessment: moving beyond the psychometric discourse

Inaugural address as Honorary Professor in Assessment in Medical Education
Department of Surgery and Internal Medicine, University of Copenhagen
24 September 2009

Cees van der Vleuten
School of Health Professions Education (www.she.unimaas.nl)
Faculty of Health, Medicine & Life Sciences
Maastricht University, The Netherlands

Presentation: www.fdg.unimaas.nl/educ/cees
Overview of presentation
- Where is education going?
- Lessons learned in assessment
- Areas of development and research

Where is education going?
- School-based learning
  - Discipline-based curricula
  - (Systems-)integrated curricula
  - Problem-based curricula
  - Outcome/competency-based curricula

Where is education going?
- Underlying educational principles:
  - Continuous learning of, or practicing with, authentic tasks (in steps of complexity; with constant attention to transfer)
  - Integration of cognitive, behavioural and affective skills
  - Active, self-directed learning, in collaboration with others
  - Fostering domain-independent skills and competencies (e.g. teamwork, communication, presentation, science orientation, leadership, professional behaviour…).

Where is education going?
- The same educational principles, underpinned by:
  - Instructional design theory
  - Cognitive psychology
  - Cognitive load theory
  - Collaborative learning theory
  - Empirical evidence

Where is education going?
- Work-based learning
  - Practice, practice, practice…
  - Optimising learning by:
    - More reflective practice
    - More structure in the haphazard learning process
    - More feedback, monitoring, guiding, reflection, role modelling
    - Fostering a learning culture or climate
    - Fostering domain-independent skills (professional behaviour, team skills, etc.).

Where is education going?
- Work-based learning, underpinned by:
  - Deliberate practice theory
  - Emerging work-based learning theories
  - Empirical evidence

Where is education going?
- Educational reform is on the agenda everywhere
- Education is professionalizing rapidly
- A lot of ‘educational technology’ is available
- How about assessment?

Overview of presentation
- Where is education going?
- Lessons learned in assessment
- Areas of development and research

Miller’s pyramid of competence
[Pyramid, base to apex: Knows, Knows how, Shows how, Does]
Lessons learned while climbing this pyramid with assessment technology.

Miller GE. The assessment of clinical skills/competence/performance. Academic Medicine (Supplement) 1990; 65: S63-S67.

Assessing knowing how
[Miller's pyramid: the 'Knows how' level]
The 1960s: written complex simulations (PMPs)

Key findings on written simulations (Van der Vleuten, 1995)
- Performance on one problem hardly predicted performance on another
- High correlations with simple MCQs
- Experts performed less well than intermediates
- The stimulus format matters more than the response format

Assessing knowing how
Specific lessons learned:
- Simple, short scenario-based formats work best (Case & Swanson, 2002)
- Validity is a matter of good quality assurance around item construction (Verhoeven et al., 1999)
- Generally, medical schools can do a much better job (Jozefowicz et al., 2002)
- Sharing (good) test material across institutions is a smart strategy (Van der Vleuten et al., 2004).

         Moving from assessing knows1
Knows:
What is arterial blood gas analysis most likely to show in patients
with cardiogenic shock?
        A. Hypoxemia with normal pH
        B. Metabolic acidosis
        C. Metabolic alkalosis
        D. Respiratory acidosis
        E. Respiratory alkalosis




1Case,   S. M., & Swanson, D. B. (2002). Constructing written test questions for the basic and
clinical sciences. Philadelphia: National Board of Medical Examiners.
         To assessing knowing how1
Knowing How:
A 74-year-old woman is brought to the emergency department
because of crushing chest pain. She is restless, confused, and
diaphoretic. On admission, temperature is 36.7 C, blood pressure is
148/78 mm Hg, pulse is 90/min, and respirations are 24/min. During the next
hour, she becomes increasingly stuporous, blood pressure decreases to
80/40 mm Hg, pulse increases to 120/min, and respirations increase to
40/min. Her skin is cool and clammy. An ECG shows sinus rhythm and
4 mm of ST segment elevation in leads V2 through V6. Arterial blood
gas analysis is most likely to show:
        A. Hypoxemia with normal pH
        B. Metabolic acidosis
        C. Metabolic alkalosis
        D. Respiratory acidosis
        E. Respiratory alkalosis


1Case,   S. M., & Swanson, D. B. (2002). Constructing written test questions for the basic and
clinical sciences. Philadelphia: National Board of Medical Examiners.
http://www.nbme.org/publications/item-writing-manual.html
Maastricht item review process
[Flow diagram: items written by the disciplines (anatomy, physiology, internal medicine, surgery, psychology) feed an item pool; a review committee screens items before test administration (pre-test review); after administration, item analyses and student comments drive the post-test review; results feed an item bank and information to users.]

Assessing knowing how
General lessons learned:
- Competence is specific, not generic
- Assessment is only as good as what you are prepared to put into it.

Assessing showing how
[Miller's pyramid: the 'Shows how' level]
The 1970s: performance assessment in vitro (the OSCE)

Key findings around OSCEs¹
- Performance on one station poorly predicted performance on another (many OSCEs are unreliable)
- Validity depends on the fidelity of the simulation (many OSCEs test fragmented skills in isolation)
- Global rating scales do well (improved discrimination across expertise groups; better inter-case reliabilities; Hodges, 2003)
- OSCEs impacted on the learning of students

¹Van der Vleuten & Swanson, 1990

Reliabilities across methods, by testing time

Testing time (hours)   MCQ¹   Case-based short essay²   PMP¹   Oral exam³   Long case⁴   OSCE⁵
         1             0.62   0.68                      0.36   0.50         0.60         0.47
         2             0.76   0.73                      0.53   0.69         0.75         0.64
         4             0.93   0.84                      0.69   0.82         0.86         0.78
         8             0.93   0.82                      0.82   0.90         0.90         0.88

¹Norcini et al., 1985   ²Stalenhoef-Halling et al., 1990   ³Swanson, 1987
⁴Wass et al., 2001      ⁵Petrusa, 2002
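
The pattern in this table (reliability rising with testing time for every format) is the behaviour described by the standard Spearman-Brown prophecy formula. The following illustration is added here and is not part of the original slides:

$$\rho_k \;=\; \frac{k\,\rho_1}{1 + (k-1)\,\rho_1}$$

where $\rho_1$ is the reliability of one hour of testing and $\rho_k$ the projected reliability of $k$ hours. For the OSCE column, $\rho_1 = 0.47$ projects to $\rho_4 = (4 \times 0.47)/(1 + 3 \times 0.47) \approx 0.78$, which matches the observed four-hour value. In other words, broad sampling (more testing time, more cases) buys reliability regardless of the format used.
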
Checklist or rating-scale reliability in the OSCE¹

Test length (hours)   Examiners using checklists   Examiners using rating scales
        1                       0.44                          0.45
        2                       0.61                          0.62
        3                       0.71                          0.71
        4                       0.76                          0.76
        5                       0.80                          0.80

¹Van Luijk & Van der Vleuten, 1990

Assessing showing how
Specific lessons learned:
- OSCE-ology (patient training, checklist writing, standard setting, etc.; Petrusa, 2002)
- OSCEs are neither inherently valid nor reliable; that depends on the fidelity of the simulation and the sampling of stations (Van der Vleuten & Swanson, 1990).

Assessing showing how
General lessons learned:
- Objectivity is not the same as reliability (Van der Vleuten, Norman & De Graaff, 1991)
- Subjective expert judgment has incremental value (Van der Vleuten & Schuwirth; Eva, in preparation)
- Sampling across content and judges/examiners is eminently important
- Assessment drives learning.

Assessing does
[Miller's pyramid: the 'Does' level]
The 1990s: performance assessment in vivo, by judging work samples (mini-CEX, CBD, MSF, DOPS, portfolio)

Key findings on assessing does
- Ongoing work; this is where we currently are
- Reliable findings point to feasible sampling (8-10 judgments seems to be the magic number; Williams et al., 2003; see the sketch after this list)
- Scores tend to be inflated (Dudek et al., 2005)
- Qualitative/narrative information is (more) useful (Sargeant et al., 2010)
- Lots of work still needs to be done:
  - How (much) to sample across instruments?
  - How to aggregate information?
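
To make the "8-10 judgments" figure concrete, the same Spearman-Brown relation can be rearranged to ask how many independent judgments are needed to reach a target reliability. A minimal sketch in Python; the helper name and the single-encounter reliabilities below are illustrative assumptions, not figures from these slides:

    import math

    def judgments_needed(single_rel: float, target_rel: float = 0.80) -> int:
        """Spearman-Brown rearranged: number of independent judgments needed
        to reach target_rel, given the reliability of a single judgment."""
        k = (target_rel * (1 - single_rel)) / (single_rel * (1 - target_rel))
        return math.ceil(k)

    # Assumed single-encounter reliabilities, for illustration only
    for r1 in (0.30, 0.35, 0.40):
        print(f"single reliability {r1:.2f} -> {judgments_needed(r1)} judgments for 0.80")

With single-encounter reliabilities in the 0.30-0.40 range this lands at roughly 6-10 judgments, in line with the 8-10 quoted above.
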
Reliabilities across methods, by testing time

Testing time   MCQ¹   Case-based      PMP¹   Oral    Long    OSCE⁵   Mini-   Practice video   Incognito
  (hours)             short essay²           exam³   case⁴           CEX⁶    assessment⁷      SPs⁸
     1         0.62   0.68            0.36   0.50    0.60    0.47    0.73    0.62             0.61
     2         0.76   0.73            0.53   0.69    0.75    0.64    0.84    0.76             0.76
     4         0.93   0.84            0.69   0.82    0.86    0.78    0.92    0.93             0.92
     8         0.93   0.82            0.82   0.90    0.90    0.88    0.96    0.93             0.93

¹Norcini et al., 1985   ²Stalenhoef-Halling et al., 1990   ³Swanson, 1987
⁴Wass et al., 2001      ⁵Petrusa, 2002                     ⁶Norcini et al., 1999
⁷Ram et al., 1999       ⁸Gorter, 2002

Assessing does
Specific lessons learned:
- Reliable sampling is possible
- Qualitative information carries a lot of weight
- Assessment impacts on work-based learning (more feedback, more reflection…)
- Validity strongly depends on the users of these instruments, and therefore on the quality of implementation.

Assessing does
General lessons learned:
- Work-based assessment cannot (yet) replace standardised assessment; or: no single measure can do it all (Tooke report, UK)
- Validity strongly depends on the implementation of the assessment (Govaerts et al., 2007)
- But there is a definite place for (more subjective) expert judgment (Van der Vleuten et al., 2010).

Competency/outcome categorizations

CanMEDS roles:
- Medical expert
- Communicator
- Collaborator
- Manager
- Health advocate
- Scholar
- Professional

ACGME competencies:
- Medical knowledge
- Patient care
- Practice-based learning & improvement
- Interpersonal and communication skills
- Professionalism
- Systems-based practice

Measuring the unmeasurable
[Miller's pyramid (Knows, Knows how, Shows how, Does) covers the "domain-specific" skills; alongside it sit the "domain-independent" skills.]

Measuring the unmeasurable
- Importance of domain-independent skills:
  - If things go wrong in practice, these skills are often involved (Papadakis et al., 2005; 2008)
  - Success in the labour market is associated with these skills (Meng, 2006)
  - Practice performance is related to school performance (Papadakis et al., 2004).

Measuring the unmeasurable
[Same pyramid: for the "domain-independent" skills, assessment is mostly in vivo, relying heavily on expert judgment and qualitative information.]

Measuring the unmeasurable
- Self-assessment
- Peer assessment
- Co-assessment (combined self, peer and teacher assessment)
- Multisource feedback
- Logbook/diary
- Learning process simulations/evaluations
- Product evaluations
- Portfolio assessment

Eva, K. W., & Regehr, G. (2005). Self-assessment in the health professions: a reformulation and research agenda. Acad Med, 80(10 Suppl), S46-54.
Falchikov, N., & Goldfinch, J. (2000). Student peer assessment in higher education: A meta-analysis comparing peer and teacher marks. Review of Educational Research, 70(3), 287-322.
Driessen, E., van Tartwijk, J., van der Vleuten, C., & Wass, V. (2007). Portfolios in medical education: why do they meet with mixed success? A systematic review. Med Educ, 41(12), 1224-1233.

General lessons learned
- Competence is specific, not generic
- Assessment is only as good as what you are prepared to put into it
- Objectivity is not the same as reliability
- Subjective expert judgment has incremental value
- Sampling across content and judges/examiners is eminently important
- Assessment drives learning
- No single measure can do it all
- Validity strongly depends on the implementation of the assessment

Practical implications
- Competence is specific, not generic
  - One measure is no measure
  - Increase sampling (across content, examiners, patients…) within measures
  - Combine information across measures and across time
  - Be aware of (sizable) false-positive and false-negative decisions
  - Build safeguards into examination regulations.

Practical implications
- No single measure can do it all
  - Use a cocktail of methods across the competency pyramid
  - Arrange methods in a programme of assessment
  - Any method may have utility (including ‘old’ assessment methods), depending on its function within the programme
  - Compromises on the quality of individual methods should be made in light of their function in the programme
  - Compare assessment design with curriculum design:
    - Responsible people/committee(s)
    - Use an overarching structure
    - Involve your stakeholders
    - Implement, monitor and change (assessment programmes ‘wear out’)

Practical implications
- Validity strongly depends on the implementation of the assessment
  - Pay special attention to implementation (good educational ideas often fail due to implementation problems)
  - Involve your stakeholders in the design of the assessment
  - Many naive ideas exist around assessment; train and educate your staff and students.

Overview of presentation
- Where is education going?
- Where are we with assessment?
- Where are we going with assessment?
- Conclusions

Areas of development and research
- Understanding expert judgment

Understanding human judgment
- How does the mind of an expert judge work?
- How is it influenced?
- What is the link between clinical expertise and judgment expertise?
- The clash between the psychology literature on expert judgment and psychometric research.

Areas of development and research
- Understanding expert judgment
- Building non-psychometric rigour into assessment
- Qualitative methodology as an inspiration

Criterion       Quantitative approach   Qualitative approach
Truth value     Internal validity       Credibility
Applicability   External validity       Transferability
Consistency     Reliability             Dependability
Neutrality      Objectivity             Confirmability

Strategies for establishing trustworthiness; procedural measures and safeguards:
- Prolonged engagement
- Assessor training & benchmarking
- Appeal procedures
- Triangulation across sources, saturation
- Peer examination
- Member checking
- Assessor panels
- Structural coherence
- Intermediate feedback cycles
- Time sampling
- Decision justification
- Stepwise replication
- Moderation
- Dependability audit
- Scoring rubrics
- Thick description
- Confirmability audit
- ……….

Driessen, E. W., Van der Vleuten, C. P. M., Schuwirth, L. W. T., Van Tartwijk, J., & Vermunt, J. D. (2005).
The use of qualitative research criteria for portfolio assessment as an alternative to reliability evaluation: a case study.
Medical Education, 39(2), 214-220.
Areas of development and research
- Understanding expert judgment
- Building non-psychometric rigour into assessment
- Construction and governance of assessment programmes
  - Design guidelines (Dijkstra et al., 2009)
  - Theoretical model for design (Van der Vleuten et al., under editorial review)

Assessment programmes
- How to design assessment programmes?
- Strategies for governance (implementation, quality assurance)?
- How to aggregate information for decision making? When is enough enough?

Areas of development and research
- Understanding expert judgment
- Building non-psychometric rigour into assessment
- Construction and governance of assessment programmes
- Understanding and using assessment impact on learning

Assessment impact on learning
- Lab studies convincingly show that tests improve retention and performance (Larsen et al., 2008)
- Relatively little empirical research supporting educational practice
- Absence of theoretical insights.

Theoretical model¹
[Figure: Sources of impact (assessment strategy, assessment task, volume of assessable material, sampling, cues, individual assessor) work through determinants of action (impact appraisal: likelihood, severity; response appraisal: efficacy, costs, value; perceived agency; interpersonal factors: normative beliefs, motivation to comply), leading to consequences of impact (metacognitive regulation strategies: choice, effort, persistence) and ultimately to outcomes of learning.]

¹Cilliers, F., et al., 2010.

Areas of development and research
- Understanding expert judgment
- Building non-psychometric rigour into assessment
- Construction and governance of assessment programmes
- Understanding and using assessment impact on learning
- Understanding and using qualitative information.

Understanding and using qualitative information
- Assessment is dominated by the quantitative discourse (Hodges, 2006)
- How to improve the use of qualitative information?
- How to aggregate qualitative information?
- How to combine qualitative and quantitative information?
- How to use expert judgment here?

Finally
- Assessment in medical education has a rich history of research and development with clear practical implications (we’ve covered some ground in 40 years!)
- We are moving beyond the psychometric discourse into an educational design discourse
- We are starting to measure the unmeasurable
- Expert human judgment is reinstated as an indispensable source of information, both at the method level and at the programmatic level
- Lots of exciting developments still lie ahead of us!

“Did you ever feel you’re on the verge of an incredible breakthrough?”

    Literature
   Cilliers, F. (In preparation). Assessment impacts on learning, you say? Please explain how. The impact of summative assessment
    on how medical students learn.
   Driessen, E., van Tartwijk, J., van der Vleuten, C., & Wass, V. (2007). Portfolios in medical education: why do they meet with
    mixed success? A systematic review. Med Educ, 41(12), 1224-1233.
   Driessen, E. W., Van der Vleuten, C. P. M., Schuwirth, L. W. T., Van Tartwijk, J., & Vermunt, J. D. (2005). The use of qualitative
    research criteria for portfolio assessment as an alternative to reliability evaluation: a case study. Medical Education, 39(2), 214-
    220.
   Dijkstra, J., Van der Vleuten, C. P., & Schuwirth, L. W. (2009). A new framework for designing programmes of assessment. Adv
    Health Sci Educ Theory Pract.
   Dudek, N. L., Marks, M. B., & Regehr, G. (2005). Failure to fail: the perspectives of clinical supervisors. Acad Med, 80(10 Suppl),
    S84-87.
   Eva, K. W., & Regehr, G. (2005). Self-assessment in the health professions: a reformulation and research agenda. Acad Med,
    80(10 Suppl), S46-54.
   Gorter, S., Rethans, J. J., Van der Heijde, D., Scherpbier, A., Houben, H., Van der Vleuten, C., et al. (2002). Reproducibility of
    clinical performance assessment in practice using incognito standardized patients. Medical Education, 36(9), 827-832.
   Govaerts, M. J., Van der Vleuten, C. P., Schuwirth, L. W., & Muijtjens, A. M. (2007). Broadening Perspectives on Clinical
    Performance Assessment: Rethinking the Nature of In-training Assessment. Adv Health Sci Educ Theory Pract, 12, 239-260.
   Hodges, B. (2006). Medical education and the maintenance of incompetence. Med Teach, 28(8), 690-696.
   Jozefowicz, R. F., Koeppen, B. M., Case, S. M., Galbraith, R., Swanson, D. B., & Glew, R. H. (2002). The quality of in-house
    medical school examinations. Academic Medicine, 77(2), 156-161.
   Meng, C. (2006). Discipline-specific or academic? Acquisition, role and value of higher education competencies. PhD
     dissertation, Universiteit Maastricht, Maastricht.
   Norcini, J. J., Swanson, D. B., Grosso, L. J., & Webster, G. D. (1985). Reliability, validity and efficiency of multiple choice
    question and patient management problem item formats in assessment of clinical competence. Medical Education, 19(3), 238-
    247.
   Papadakis, M. A., Hodgson, C. S., Teherani, A., & Kohatsu, N. D. (2004). Unprofessional behavior in medical school is associated
    with subsequent disciplinary action by a state medical board. Acad Med, 79(3), 244-249.
   Papadakis, M. A., A. Teherani, et al. (2005). "Disciplinary action by medical boards and prior behavior in medical school." N Engl
    J Med 353(25): 2673-82.
   Papadakis, M. A., G. K. Arnold, et al. (2008). "Performance during internal medicine residency training and subsequent
    disciplinary action by state licensing boards." Annals of Internal Medicine 148: 869-876
   Van der Vleuten, C. P., Schuwirth, L. W., Scheele, F., Driessen, E. W., & Hodges, B. (2010). The assessment of professional
     competence: building blocks for theory development. Best Pract Res Clin Obstet Gynaecol, 24, 703-719.
    Literature
   Petrusa, E. R. (2002). Clinical performance assessments. In G. R. Norman, C. P. M. Van der Vleuten & D. I. Newble (Eds.),
    International Handbook for Research in Medical Education (pp. 673-709). Dordrecht: Kluwer Academic Publisher.
   Ram, P., Grol, R., Rethans, J. J., Schouten, B., Van der Vleuten, C. P. M., & Kester, A. (1999). Assessment of general
    practitioners by video observation of communicative and medical performance in daily practice: issues of validity, reliability and
    feasibility. Medical Education, 33(6), 447-454.
   Stalenhoef-Halling, B. F., Van der Vleuten, C. P. M., Jaspers, T. A. M., & Fiolet, J. B. F. M. (1990). A new approach to assessing
    clinical problem-solving skills by written examination: Conceptual basis and initial pilot test results. Paper presented at the
    Teaching and Assessing Clinical Competence, Groningen.
   Swanson, D. B. (1987). A measurement framework for performance-based tests. In I. Hart & R. Harden (Eds.), Further
    developments in Assessing Clinical Competence (pp. 13 - 45). Montreal: Can-Heal publications.
   van der Vleuten, C. P., Schuwirth, L. W., Muijtjens, A. M., Thoben, A. J., Cohen-Schotanus, J., & van Boven, C. P. (2004). Cross
    institutional collaboration in assessment: a case on progress testing. Med Teach, 26(8), 719-725.
   Van der Vleuten, C. P. M., & Swanson, D. B. (1990). Assessment of Clinical Skills With Standardized Patients: State of the Art.
    Teaching and Learning in Medicine, 2(2), 58 - 76.
   Van der Vleuten, C. P. M., & Newble, D. I. (1995). How can we test clinical reasoning? The Lancet, 345, 1032-1034.
   Van der Vleuten, C. P. M., Norman, G. R., & De Graaff, E. (1991). Pitfalls in the pursuit of objectivity: Issues of reliability. Medical
    Education, 25, 110-118.
   Van der Vleuten, C. P. M., & Schuwirth, L. W. T. (2005). Assessment of professional competence: from methods to programmes.
    Medical Education, 39, 309-317.
   Van der Vleuten, C. P. M., & Schuwirth, L. W. T. (Under editorial review). On the value of (aggregate) human judgment. Med
    Educ.
   Van Luijk, S. J., Van der Vleuten, C. P. M., & Schelven, R. M. (1990). The relation between content and psychometric
     characteristics in performance-based testing. In W. Bender, R. J. Hiemstra, A. J. J. A. Scherpbier & R. P. Zwierstra (Eds.),
    Teaching and Assessing Clinical Competence. (pp. 202-207). Groningen: Boekwerk Publications.
   Wass, V., Jones, R., & Van der Vleuten, C. (2001). Standardized or real patients to test clinical competence? The long case
    revisited. Medical Education, 35, 321-325.
   Williams, R. G., Klamen, D. A., & McGaghie, W. C. (2003). Cognitive, social and environmental sources of bias in clinical
    performance ratings. Teaching and Learning in Medicine, 15(4), 270-292.
