Assessment of medical competence: Past, present, and future?
Lessons learned in assessment: moving beyond the psychometric discourse

Inaugural address as Honorary Professor in Assessment in Medical Education, Department of Surgery and Internal Medicine, University of Copenhagen, 24 September 2009.

Cees van der Vleuten
Maastricht University, School of Health Professions Education (www.she.unimaas.nl)
Faculty of Health, Medicine & Life Sciences, The Netherlands
Presentation: www.fdg.unimaas.nl/educ/cees

Overview of presentation
- Where is education going?
- Lessons learned in assessment
- Areas of development and research

Where is education going?
School-based learning has evolved through:
- Discipline-based curricula
- (Systems-)integrated curricula
- Problem-based curricula
- Outcome/competency-based curricula

Where is education going?
Underlying educational principles:
- Continuous learning of, or practicing with, authentic tasks (in steps of complexity, with constant attention to transfer)
- Integration of cognitive, behavioural and affective skills
- Active, self-directed learning, in collaboration with others
- Fostering of domain-independent skills and competencies (e.g. teamwork, communication, presentation, science orientation, leadership, professional behaviour...)
These principles draw on instructional design theory, cognitive psychology, cognitive load theory, collaborative learning theory, and empirical evidence.

Where is education going?
Work-based learning: practice, practice, practice... Optimising learning by:
- More reflective practice
- More structure in the haphazard learning process
- More feedback, monitoring, guiding, reflection, and role modelling
- Fostering a learning culture or climate
- Fostering domain-independent skills (professional behaviour, team skills, etc.)
These strategies draw on deliberate practice theory, emerging work-based learning theories, and empirical evidence.

Where is education going?
- Educational reform is on the agenda everywhere
- Education is professionalizing rapidly
- A lot of 'educational technology' is available
- How about assessment?

Overview of presentation
- Where is education going?
- Lessons learned in assessment
- Areas of development and research

Miller's pyramid of competence
Lessons learned while climbing this pyramid with assessment technology. From base to apex: Knows, Knows how, Shows how, Does.
Miller GE. The assessment of clinical skills/competence/performance. Academic Medicine (Supplement) 1990; 65: S63-S67.
Assessing 'knowing how'
1960s: written complex simulations (patient management problems, PMPs).

Key findings on written simulations (Van der Vleuten & Newble, 1995):
- Performance on one problem hardly predicted performance on another
- Scores correlated highly with simple MCQs
- Experts performed less well than intermediates
- The stimulus format proved more important than the response format

Assessing 'knowing how': specific lessons learned
- Simple, short, scenario-based formats work best (Case & Swanson, 2002)
- Validity is a matter of good quality assurance around item construction (Verhoeven et al., 1999)
- Generally, medical schools can do a much better job (Jozefowicz et al., 2002)
- Sharing (good) test material across institutions is a smart strategy (Van der Vleuten et al., 2004)

Moving from assessing 'knows':
What is arterial blood gas analysis most likely to show in patients with cardiogenic shock?
A. Hypoxemia with normal pH
B. Metabolic acidosis
C. Metabolic alkalosis
D. Respiratory acidosis
E. Respiratory alkalosis

To assessing 'knowing how':
A 74-year-old woman is brought to the emergency department because of crushing chest pain. She is restless, confused, and diaphoretic. On admission, temperature is 36.7°C, blood pressure is 148/78 mm Hg, pulse is 90/min, and respirations are 24/min. During the next hour, she becomes increasingly stuporous, blood pressure decreases to 80/40 mm Hg, pulse increases to 120/min, and respirations increase to 40/min. Her skin is cool and clammy. An ECG shows sinus rhythm and 4 mm of ST-segment elevation in leads V2 through V6. Arterial blood gas analysis is most likely to show:
A. Hypoxemia with normal pH
B. Metabolic acidosis
C. Metabolic alkalosis
D. Respiratory acidosis
E. Respiratory alkalosis
(Both examples from Case, S. M., & Swanson, D. B. (2002). Constructing written test questions for the basic and clinical sciences. Philadelphia: National Board of Medical Examiners. http://www.nbme.org/publications/item-writing-manual.html)

Maastricht item review process
[Diagram] Items written by the disciplines (anatomy, physiology, internal medicine, surgery, psychology, ...) feed an item pool and receive pre-test review by a review committee before test administration. After administration, item analyses and student comments drive a post-test review, with information going back to users and approved items into the item bank.
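To make the post-test review step concrete, here is a minimal sketch of the two statistics such a review typically starts from: item difficulty (proportion correct) and item-rest correlation. This is an illustrative Python sketch under assumed data, not code from the Maastricht system; the function name, the fake response matrix, and the flagging thresholds are all invented for the example.

```python
# Illustrative post-test item analysis (hypothetical; not the Maastricht code).
import numpy as np

def item_analysis(scores: np.ndarray):
    """scores: (students x items) matrix of 0/1 item scores.
    Returns per-item difficulty (p) and item-rest correlation (Rir)."""
    n_items = scores.shape[1]
    p = scores.mean(axis=0)                       # proportion answering each item correctly
    rir = np.empty(n_items)
    for i in range(n_items):
        rest = scores.sum(axis=1) - scores[:, i]  # total score excluding the item itself
        rir[i] = np.corrcoef(scores[:, i], rest)[0, 1]
    return p, rir

# Fake 200 students x 40 items response data, then flag items for committee review.
rng = np.random.default_rng(seed=1)
scores = (rng.random((200, 40)) < 0.7).astype(int)
p, rir = item_analysis(scores)
flagged = np.where((p < 0.20) | (p > 0.90) | (rir < 0.10))[0]  # thresholds are illustrative
print(f"{flagged.size} items flagged for post-test review")
```

Items that survive review would enter the item bank; flagged items would go back to the discipline that wrote them, mirroring the committee loop sketched above.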
Assessing 'knowing how': general lessons learned
- Competence is specific, not generic
- Assessment is only as good as what you are prepared to put into it

Assessing 'showing how'
1970s: performance assessment in vitro, the OSCE.

Key findings around OSCEs (Van der Vleuten & Swanson, 1990):
- Performance on one station poorly predicted performance on another (many OSCEs are unreliable)
- Validity depends on the fidelity of the simulation (many OSCEs test fragmented skills in isolation)
- Global rating scales do well: improved discrimination across expertise groups and better inter-case reliabilities (Hodges, 2003)
- OSCEs impacted on the learning of students

Reliabilities across methods, as a function of testing time:

Testing time (h)   MCQ¹    Case-based short essay²   PMP¹    Oral exam³   Long case⁴   OSCE⁵
1                  0.62    0.68                      0.36    0.50         0.60         0.47
2                  0.76    0.73                      0.53    0.69         0.75         0.64
4                  0.93    0.84                      0.69    0.82         0.86         0.78
8                  0.93    0.82                      0.82    0.90         0.90         0.88

¹Norcini et al., 1985; ²Stalenhoef-Halling et al., 1990; ³Swanson, 1987; ⁴Wass et al., 2001; ⁵Petrusa, 2002

Checklist versus rating scale reliability in the OSCE (Van Luijk et al., 1990):

Test length (h)   Examiners using checklists   Examiners using rating scales
1                 0.44                         0.45
2                 0.61                         0.62
3                 0.71                         0.71
4                 0.76                         0.76
5                 0.80                         0.80

Assessing 'showing how': specific lessons learned
- An 'OSCE-ology' emerged: patient training, checklist writing, standard setting, etc. (Petrusa, 2002)
- OSCEs are not inherently valid or reliable; that depends on the fidelity of the simulation and the sampling of stations (Van der Vleuten & Swanson, 1990)

Assessing 'showing how': general lessons learned
- Objectivity is not the same as reliability (Van der Vleuten, Norman & De Graaff, 1991)
- Subjective expert judgment has incremental value (Van der Vleuten & Schuwirth; Eva, in preparation)
- Sampling across content and judges/examiners is eminently important
- Assessment drives learning

Assessing 'does'
1990s: performance assessment in vivo by judging work samples (mini-CEX, case-based discussion, multisource feedback, DOPS, portfolio).

Key findings on assessing 'does':
- Ongoing work; this is where we currently are
- Reliability findings point to feasible sampling: 8-10 judgments seems to be the magical number (Williams et al., 2003)
- Scores tend to be inflated (Dudek et al., 2005)
- Qualitative/narrative information is (more) useful (Sargeant et al., 2010)
- Lots of work still needs to be done: how (much) to sample across instruments, and how to aggregate information?

Reliabilities across methods, extended with in vivo methods:

Testing time (h)   MCQ¹    Essay²   PMP¹    Oral³   Long case⁴   OSCE⁵   Mini-CEX⁶   Video assessment⁷   Incognito SPs⁸
1                  0.62    0.68     0.36    0.50    0.60         0.47    0.73        0.62                0.61
2                  0.76    0.73     0.53    0.69    0.75         0.64    0.84        0.76                0.76
4                  0.93    0.84     0.69    0.82    0.86         0.78    0.92        0.93                0.92
8                  0.93    0.82     0.82    0.90    0.90         0.88    0.96        0.93                0.93

¹Norcini et al., 1985; ²Stalenhoef-Halling et al., 1990; ³Swanson, 1987; ⁴Wass et al., 2001; ⁵Petrusa, 2002; ⁶Norcini et al., 1999; ⁷Ram et al., 1999; ⁸Gorter et al., 2002
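The shared pattern in both reliability tables, reliability rising with testing time for every format, is what the Spearman-Brown prophecy formula predicts when a test is lengthened by a factor $k$. As a worked check (the arithmetic is mine, not from the slides), take the PMP column, where $\rho_1 = 0.36$ at one hour:

$$\rho_k = \frac{k\,\rho_1}{1 + (k-1)\,\rho_1}, \qquad \rho_4 = \frac{4 \times 0.36}{1 + 3 \times 0.36} = \frac{1.44}{2.08} \approx 0.69,$$

which matches the tabled four-hour PMP value. The practical reading, consistent with the lessons above: breadth of sampling (testing time), far more than the choice of format, is what drives reliability.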
Assessing 'does': specific lessons learned
- Reliable sampling is possible
- Qualitative information carries a lot of weight
- Assessment impacts work-based learning (more feedback, more reflection...)
- Validity strongly depends on the users of these instruments, and therefore on the quality of implementation

Assessing 'does': general lessons learned
- Work-based assessment cannot (yet) replace standardised assessment; or, no single measure can do it all (Tooke report, UK)
- Validity strongly depends on the implementation of the assessment (Govaerts et al., 2007)
- But there is a definite place for (more subjective) expert judgment (Van der Vleuten et al., 2010)

Competency/outcome categorizations
- CanMEDS roles: medical expert, communicator, collaborator, manager, health advocate, scholar, professional
- ACGME competencies: medical knowledge, patient care, practice-based learning & improvement, interpersonal and communication skills, professionalism, systems-based practice

Measuring the unmeasurable
[Diagram] Miller's pyramid annotated: 'domain-specific' skills run up the pyramid from Knows to Does, while 'domain-independent' skills cut across all levels.

Measuring the unmeasurable: domain-independent skills matter
- If things go wrong in practice, these skills are often involved (Papadakis et al., 2005; 2008)
- Success in the labour market is associated with these skills (Meng, 2006)
- Practice performance is related to school performance (Papadakis et al., 2004)

Measuring the unmeasurable
Assessing these skills means assessment (mostly in vivo) that relies heavily on expert judgment and qualitative information:
- Self-assessment
- Peer assessment
- Co-assessment (combined self, peer, and teacher assessment)
- Multisource feedback
- Log book/diary
- Learning process simulations/evaluations
- Product evaluations
- Portfolio assessment
Eva, K. W., & Regehr, G. (2005). Self-assessment in the health professions: a reformulation and research agenda. Acad Med, 80(10 Suppl), S46-54.
Falchikov, N., & Goldfinch, J. (2000). Student peer assessment in higher education: A meta-analysis comparing peer and teacher marks. Review of Educational Research, 70(3), 287-322.
Driessen, E., van Tartwijk, J., van der Vleuten, C., & Wass, V. (2007). Portfolios in medical education: why do they meet with mixed success? A systematic review. Med Educ, 41(12), 1224-1233.

General lessons learned
- Competence is specific, not generic
- Assessment is only as good as what you are prepared to put into it
- Objectivity is not the same as reliability
- Subjective expert judgment has incremental value
- Sampling across content and judges/examiners is eminently important
- Assessment drives learning
- No single measure can do it all
- Validity strongly depends on the implementation of the assessment

Practical implications
Competence is specific, not generic, and one measure is no measure:
- Increase sampling within measures (across content, examiners, patients, ...); a sketch of why this works follows below
- Combine information across measures and across time
- Be aware of (sizable) false positive and false negative decisions
- Build safeguards into examination regulations
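Why does increased sampling work? A minimal sketch in standard generalizability-theory terms (the decomposition is textbook; the variance figures below are illustrative assumptions, not data from this talk). With candidates $p$ crossed with cases $i$ and judges $j$, the reproducibility of a mean score over $n_i$ cases and $n_j$ judges is approximately

$$E\rho^2 = \frac{\sigma_p^2}{\sigma_p^2 + \sigma_{pi}^2/n_i + \sigma_{pj}^2/n_j}.$$

If content specificity dominates, say $\sigma_p^2 = 1$, $\sigma_{pi}^2 = 4$ and $\sigma_{pj}^2 = 1$, then one case rated by one judge gives $1/(1+4+1) \approx 0.17$, whereas sampling ten cases and ten judges gives $1/(1+0.4+0.1) \approx 0.67$. Broad sampling, not any single 'objective' instrument, buys reproducibility, in line with the 8-10 judgments rule of thumb mentioned earlier.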
Practical implications
No single measure can do it all:
- Use a cocktail of methods across the competency pyramid
- Arrange the methods in a programme of assessment
- Any method may have utility, including 'old' assessment methods, depending on its function within the programme
- Compromises on the quality of a method should be made in light of its function in the programme
Compare assessment design with curriculum design:
- Put responsible people/committee(s) in charge
- Use an overarching structure
- Involve your stakeholders
- Implement, monitor and change (assessment programmes 'wear out')

Practical implications
Validity strongly depends on the implementation of the assessment:
- Pay special attention to implementation (good educational ideas often fail due to implementation problems)
- Involve your stakeholders in the design of the assessment
- Many naive ideas exist around assessment; train and educate your staff and students

Overview of presentation
- Where is education going?
- Lessons learned in assessment
- Areas of development and research

Areas of development and research: understanding expert judgment
- Understanding human judgment: how does the mind of an expert judge work, and how is it influenced?
- What is the link between clinical expertise and judgment expertise?
- There is a clash between the psychology literature on expert judgment and psychometric research

Areas of development and research: building non-psychometric rigour into assessment
Qualitative methodology as an inspiration (Driessen et al., 2005):

Criterion        Quantitative approach   Qualitative approach
Truth value      Internal validity       Credibility
Applicability    External validity       Transferability
Consistency      Reliability             Dependability
Neutrality       Objectivity             Confirmability

Strategies for establishing trustworthiness:
- Prolonged engagement
- Triangulation
- Peer examination
- Member checking
- Structural coherence
- Time sampling
- Stepwise replication
- Dependability audit
- Thick description
- Confirmability audit

Procedural measures and safeguards:
- Assessor training and benchmarking
- Appeal procedures
- Triangulation across sources, saturation
- Assessor panels
- Intermediate feedback cycles
- Decision justification
- Moderation
- Scoring rubrics
- ...

Areas of development and research: construction and governance of assessment programmes
- Design guidelines (Dijkstra et al., 2009)
- A theoretical model for design (Van der Vleuten et al., under editorial review)
- How do we design assessment programmes?
- What are strategies for governance (implementation, quality assurance)?
- How do we aggregate information for decision making, and when is enough enough?

Areas of development and research: understanding and using assessment impacting learning
- Lab studies convincingly show that testing improves retention and performance (Larsen et al., 2008)
- Yet there is relatively little empirical research supporting educational practice
- Theoretical insights are largely absent
A theoretical model of how assessment impacts learning (Cilliers et al., 2010):
[Diagram] Sources of impact (assessment strategy, the assessment task, volume of assessable material, sampling, cues, the individual assessor) work through determinants of impact (impact appraisal: likelihood and severity; response appraisal: efficacy, costs and value; perceived agency; interpersonal factors; normative beliefs; motivation to comply) to shape consequences: metacognitive regulation strategies and learning actions (choice, effort, persistence), and ultimately outcomes of learning.

Areas of development and research: understanding and using qualitative information
- Assessment is dominated by the quantitative discourse (Hodges, 2006)
- How can we improve the use of qualitative information?
- How can we aggregate qualitative information?
- How can we combine qualitative and quantitative information?
- How can we use expert judgment here?

Finally
- Assessment in medical education has a rich history of research and development with clear practical implications: we have covered some ground in 40 years!
- We are moving beyond the psychometric discourse into an educational design discourse
- We are starting to measure the unmeasurable
- Expert human judgment is being reinstated as an indispensable source of information, at the level of individual methods as well as at the programmatic level
- Lots of exciting developments still lie ahead of us!
[Cartoon] "Did you ever feel you're on the verge of an incredible breakthrough?"

Literature
Cilliers, F. (in preparation). Assessment impacts on learning, you say? Please explain how. The impact of summative assessment on how medical students learn.
Driessen, E., van Tartwijk, J., van der Vleuten, C., & Wass, V. (2007). Portfolios in medical education: why do they meet with mixed success? A systematic review. Med Educ, 41(12), 1224-1233.
Driessen, E. W., Van der Vleuten, C. P. M., Schuwirth, L. W. T., Van Tartwijk, J., & Vermunt, J. D. (2005). The use of qualitative research criteria for portfolio assessment as an alternative to reliability evaluation: a case study. Medical Education, 39(2), 214-220.
Dijkstra, J., Van der Vleuten, C. P., & Schuwirth, L. W. (2009). A new framework for designing programmes of assessment. Adv Health Sci Educ Theory Pract.
Dudek, N. L., Marks, M. B., & Regehr, G. (2005). Failure to fail: the perspectives of clinical supervisors. Acad Med, 80(10 Suppl), S84-87.
Eva, K. W., & Regehr, G. (2005). Self-assessment in the health professions: a reformulation and research agenda. Acad Med, 80(10 Suppl), S46-54.
Gorter, S., Rethans, J. J., Van der Heijde, D., Scherpbier, A., Houben, H., Van der Vleuten, C., et al. (2002). Reproducibility of clinical performance assessment in practice using incognito standardized patients. Medical Education, 36(9), 827-832.
Govaerts, M. J., Van der Vleuten, C. P., Schuwirth, L. W., & Muijtjens, A. M. (2007). Broadening perspectives on clinical performance assessment: rethinking the nature of in-training assessment. Adv Health Sci Educ Theory Pract, 12, 239-260.
Hodges, B. (2006). Medical education and the maintenance of incompetence. Med Teach, 28(8), 690-696.
Jozefowicz, R. F., Koeppen, B. M., Case, S. M., Galbraith, R., Swanson, D. B., & Glew, R. H. (2002). The quality of in-house medical school examinations. Academic Medicine, 77(2), 156-161.
Meng, C. (2006). Discipline-specific or academic? Acquisition, role and value of higher education competencies. PhD dissertation, Universiteit Maastricht, Maastricht.
Norcini, J. J., Swanson, D. B., Grosso, L. J., & Webster, G. D. (1985). Reliability, validity and efficiency of multiple choice question and patient management problem item formats in assessment of clinical competence. Medical Education, 19(3), 238-247.
Papadakis, M. A., Hodgson, C. S., Teherani, A., & Kohatsu, N. D. (2004). Unprofessional behavior in medical school is associated with subsequent disciplinary action by a state medical board. Acad Med, 79(3), 244-249.
Papadakis, M. A., Teherani, A., et al. (2005). Disciplinary action by medical boards and prior behavior in medical school. N Engl J Med, 353(25), 2673-2682.
Papadakis, M. A., Arnold, G. K., et al. (2008). Performance during internal medicine residency training and subsequent disciplinary action by state licensing boards. Annals of Internal Medicine, 148, 869-876.
Petrusa, E. R. (2002). Clinical performance assessments. In G. R. Norman, C. P. M. Van der Vleuten & D. I. Newble (Eds.), International Handbook of Research in Medical Education (pp. 673-709). Dordrecht: Kluwer Academic Publishers.
Ram, P., Grol, R., Rethans, J. J., Schouten, B., Van der Vleuten, C. P. M., & Kester, A. (1999). Assessment of general practitioners by video observation of communicative and medical performance in daily practice: issues of validity, reliability and feasibility. Medical Education, 33(6), 447-454.
Stalenhoef-Halling, B. F., Van der Vleuten, C. P. M., Jaspers, T. A. M., & Fiolet, J. B. F. M. (1990). A new approach to assessing clinical problem-solving skills by written examination: conceptual basis and initial pilot test results. Paper presented at the Teaching and Assessing Clinical Competence conference, Groningen.
Swanson, D. B. (1987). A measurement framework for performance-based tests. In I. Hart & R. Harden (Eds.), Further Developments in Assessing Clinical Competence (pp. 13-45). Montreal: Can-Heal Publications.
Van der Vleuten, C. P., Schuwirth, L. W., Muijtjens, A. M., Thoben, A. J., Cohen-Schotanus, J., & van Boven, C. P. (2004). Cross institutional collaboration in assessment: a case on progress testing. Med Teach, 26(8), 719-725.
Van der Vleuten, C. P., Schuwirth, L. W., Scheele, F., Driessen, E. W., & Hodges, B. (2010). The assessment of professional competence: building blocks for theory development. Best Pract Res Clin Obstet Gynaecol, 24, 703-719.
Van der Vleuten, C. P. M., & Swanson, D. B. (1990). Assessment of clinical skills with standardized patients: state of the art. Teaching and Learning in Medicine, 2(2), 58-76.
Van der Vleuten, C. P. M., & Newble, D. I. (1995). How can we test clinical reasoning? The Lancet, 345, 1032-1034.
Van der Vleuten, C. P. M., Norman, G. R., & De Graaff, E. (1991). Pitfalls in the pursuit of objectivity: issues of reliability. Medical Education, 25, 110-118.
Van der Vleuten, C. P. M., & Schuwirth, L. W. T. (2005). Assessment of professional competence: from methods to programmes. Medical Education, 39, 309-317.
Van der Vleuten, C. P. M., & Schuwirth, L. W. T. (under editorial review). On the value of (aggregate) human judgment. Med Educ.
Van Luijk, S. J., Van der Vleuten, C. P. M., & Schelven, R. M. (1990). The relation between content and psychometric characteristics in performance-based testing. In W. Bender, R. J. Hiemstra, A. J. J. A. Scherpbier & R. P. Zwierstra (Eds.), Teaching and Assessing Clinical Competence (pp. 202-207). Groningen: Boekwerk Publications.
Wass, V., Jones, R., & Van der Vleuten, C. (2001). Standardized or real patients to test clinical competence? The long case revisited. Medical Education, 35, 321-325.
Williams, R. G., Klamen, D. A., & McGaghie, W. C. (2003). Cognitive, social and environmental sources of bias in clinical performance ratings. Teaching and Learning in Medicine, 15(4), 270-292.