Designing and Implementing School Level Assessments with District Input

John Sabatini, Kelly Bruce, and Srinivasa (Pavan) Pillarisetti
Educational Testing Service

This research is funded in part by a grant from the Institute of Education Sciences (R305G04065). Any opinions expressed in this article are those of the authors and not necessarily of Educational Testing Service.

I would like to acknowledge:
- Strategic Educational Research Partnership (SERP) Institute
- BPS Design Team (esp. David Francis)
- Boston University (Gloria Waters and David Caplan)
- Harvard University (Catherine Snow, Sky Marrietta, Claire White, Joshua Lawrence)
- Boston Public Schools
- Brockton Public Schools

Design Team
Assessment Subgroup -- Drs. David Francis, Univ. of Houston; Gloria Waters, Boston University; and John Sabatini, ETS
Charge -- advise SERP and BPS on the ways in which the assessment of reading could be made more efficient and productive in the district.

Needs Assessment
In initial Design Team meetings, we learned that:
- District leaders had made significant investments in reading intervention products and in teacher professional development to support literacy
- State test results were available
- Students took lots of tests (mostly mandated)

Problem Definition
However, the district had no consistent reading/literacy instruments for:
- Determining the nature or severity of reading problems
- Identifying the prevalence and profiles of struggling readers
- Receiving timely results
Hence:
- Inefficient placement of students into intervention programs
- Weak/insensitive measures of the effectiveness of interventions that target subskills

Aims
Short term:
- Build a battery of screening/diagnostic assessments for school-wide use
- Estimate the prevalence and nature of student reading difficulties
Long term:
- Replace other, redundant assessments
- Triage students for specialized testing, thus reducing the total time spent on assessment (a decision rule of this kind is sketched below)
- Use the instruments to evaluate intervention programs
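As a rough illustration of the triage aim, here is a minimal sketch of a screening decision rule. The subtest names and cut scores are hypothetical assumptions for illustration only; nothing here comes from the actual battery.

```python
# Hypothetical triage rule: route students whose screener scores fall below
# a cut score to longer, specialized testing. The subtest names and cut
# scores here are illustrative assumptions, not values from the battery.

CUT_SCORES = {"word_recognition": 85, "vocabulary": 80, "comprehension": 75}

def needs_specialized_testing(scores: dict) -> bool:
    """Flag a student if any available screener subtest falls below its cut."""
    return any(scores[subtest] < cut
               for subtest, cut in CUT_SCORES.items()
               if subtest in scores)

# Only flagged students go on to the longer, individually administered tests,
# which reduces total assessment time for everyone else.
student = {"word_recognition": 72, "vocabulary": 88, "comprehension": 90}
print(needs_specialized_testing(student))  # True -> refer for further testing
```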
The challenge was to build a battery that:
a) screened for reading difficulties across a wide range of skills, from decoding through vocabulary;
b) had acceptable psychometric properties;
c) was compact (i.e., could be administered in about one 40-50 minute session);
d) could feasibly be implemented school-wide;
e) provided rapid turnaround of score reports (i.e., within 2-3 weeks); and
f) was useful at multiple stakeholder levels (e.g., teacher, school, district).
Computerized delivery and scoring could make it feasible to meet nearly all of the above design constraints, and was viewed as desirable by BPS.

Rationale & Background Literature
Theoretical perspective grounded in componential approaches to reading assessment (Cain, Bryant, & Oakhill, 2004; Oakhill, Cain, & Bryant, 2003; Perfetti, Landi, & Oakhill, 2005). Although skilled, proficient readers are characterized by the integrative, interactive nature of processing during any reading task, there is nonetheless evidence for subcomponent skills. Component reading measures can be used as indicators of the skill profiles of struggling readers, adding value over and above the types of off-the-shelf comprehension tests the district was using (Sabatini, 2009).

Background Literature & Rationale
As a general principle, the test was designed to align with empirical research on struggling readers' difficulties and on effective instructional programs (e.g., NRC, 1998; NICHD, 2000), as well as with cognitive and linguistic theories of the skills underlying reading development and difficulty (e.g., Kintsch, 2000; Perfetti, Landi, & Oakhill, 2005; Perfetti, Van Dyke, & Hart, 2001; Rayner et al., 2007; Vellutino, Tunmer, Jaccard, & Chen, 2007).

Method/Approach: Typical Test Design Steps
Step 1: Construct definition
Step 2: Design specifications/test blueprint
Step 3: Test construction
Step 4: Conduct pilot
Step 5: Conduct field trial
Step 6: Go operational

Method/Approach: 'Use-inspired' Approach
Step 1: Define an assessment problem and an information need/criteria for success
Step 2: Get research assessment team(s) to 'volunteer' to commit time and resources to the problem (in return for data)
Step 3: Cobble together funding to accomplish initial aims (e.g., SERP foundation support; researcher grants)
Step 4: Get district approval; find some schools willing (and able) to work with you on a pilot implementation
Step 5: Design/adapt items; conduct pilot/field studies; analyze data; report back to the district and SERP
Step 6: Rinse and repeat as necessary

Method/Approach: 'Use-inspired' Process
In sum, the process is variable and complex. It involves multiple, iterative pilots, each designed to investigate different research and practical questions, ultimately moving the team toward an assessment solution that met the needs of both district and research stakeholders.

Pilot 1, June 2006 - September 2006
Participants: two middle schools, Summer 2006, with a follow-up in September 2006.
Objectives/Questions:
1. How prevalent were basic reading skill difficulties -- basic decoding, word recognition, and reading fluency?
2. Can we implement this without schools and districts mutinying?
Results:
1. Yes, at least in these schools, significant numbers.
2. Yes. [We have a great team.]
Conclusion: So far so good; let's try again.

Pilot 2, June 2007
Participants: three middle and three high schools. CORE battery; a random half took either the ETS or the BU subtests.
Objectives/Questions:
1. Confirm the basic reading difficulties finding with externally valid tests.
2. Begin exploring the relationship of the subtests to external test criteria.
Results:
1. Substantial numbers of students showed word reading difficulties on the TOWRE (Torgesen, Wagner, & Rashotte, 1999) and on both the BU and ETS tests.
2. Moderate to strong correlations with MCAS and other external tests.
Conclusion: The evidence supported the directions chosen by the intervention design teams to develop vocabulary and basic skills programs; but how to reduce the battery?

Pilot 3, September 2007 - December 2007
Participants: two middle and two high schools from the previous pilots, plus two new middle schools. CORE battery; a random half took either the ETS or the BU subtests.
Objectives/Questions:
1. Feasible scoring: test multiple-choice vs. oral response measures.
2. How best to combine measures into a feasible, parsimonious mixture that spanned the range of reading skills?
Results:
1. Multiple choice can work.
2. Indeterminate; the total battery was too long, but there was no clear path to simplifying it.
Conclusion: Rinse and repeat.

Pilot 4, Fall and Spring 2008
Participants: two middle schools, with a follow-up at one school in the spring. Six-subtest battery.
Objectives/Questions:
1. Improve the psychometric and scale qualities of the subtests.
2. Gather evidence of added value in subtests over total scores.
Results:
1. Reliability and other test properties showed improvement; cross-grade performance levels were in predicted ranges; the sentence and comprehension tests needed improvement.
2. Evidence that subscores were contributing added value over and above total scores (Sabatini, Bruce, & Sinharay, 2009); see the sketch below.
Conclusion: Given the success of the battery so far, it seems appropriate to implement a larger-scale trial. [Repeat]
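The added-value question can be made concrete. One common approach, sketched below under classical test theory assumptions, is a Haberman-style comparison of proportional reduction in mean squared error (PRMSE): a subscore adds value if it estimates the true subscore more accurately than the total score does. This illustrates the general idea and is not necessarily the exact procedure of Sabatini, Bruce, & Sinharay (2009); all input values are assumptions.

```python
# Haberman-style added-value check (sketch, classical test theory).
# A subscore adds value when its PRMSE for the true subscore exceeds
# the PRMSE obtained by using the total score instead.
# The numeric inputs below are illustrative, not Pilot 4 estimates.

def subscore_adds_value(rho_sub: float, rho_total: float,
                        r_true: float) -> bool:
    """rho_sub:   reliability of the subscore
    rho_total:    reliability of the total score
    r_true:       disattenuated (true-score) correlation between
                  subscore and total score
    """
    prmse_sub = rho_sub                    # PRMSE using the subscore itself
    prmse_total = rho_total * r_true ** 2  # PRMSE using the total score
    return prmse_sub > prmse_total

# A reliable, distinct subscore adds value...
print(subscore_adds_value(rho_sub=0.85, rho_total=0.95, r_true=0.80))  # True
# ...but one nearly collinear with the total score does not.
print(subscore_adds_value(rho_sub=0.75, rho_total=0.95, r_true=0.95))  # False
```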
Pilot 5, Fall 2009
Participants: a field test with over 4,000 students in grades 6-8 (Form 6) and 500 students in grades 4-5 (Form 4, which was new). The forms shared 50% of their content.
Objectives/Questions:
1. Improve item and form psychometrics.
2. Build scales linked to previous-year MCAS scores.
3. Refine the score reporting.
4. Pilot versions designed for grades 4 and 5.
Results:
1. Reliability and other test properties showed improvement.
2. Created a scale for each subtest, aligned with the MCAS Warning, Needs Improvement, and Proficient levels.
3. Presented with SERP in a district meeting so that individual schools could use the data to plan for future literacy needs.
4. Initial data promising, but needs further work.
Conclusion: We now have a scaled test that is functional for operational needs at the 6-8 range.

Summary: Pilot Site Information

                     Pilot 1   Pilot 2   Pilot 3   Pilot 4   Pilot 5
Number of Schools    3         6         6         2         12
Number of Students   373       573       960       785       4908
Grades               6-7       6-11      6-9       6-8       4-8

Summary: Reliability Estimates
Cronbach's alphas (number of items), listed in order for the pilots in which each subtest was administered:
- Real Word Recognition: .93 (40); .87 (20)
- Pseudoword Reading: .89 (20)
- Pseudohomophone Judgment: .84 (56)
- Lexical Decision: .88 (52); .91 (50)
- Semantic Similarity: .70 (28); .82 (35); .88 (38)
- Morphological Awareness: .81 (18); .65 (10); .82 (24); .89 (30); .91 (32)
- Sentence Processing: .73 (25); .82 (26)
- Efficiency of Basic Reading Comp.: .93 (25); .86 (33); .91 (36)
- Reading Comprehension: .75 (22); .78 (21)
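For reference, alphas like those above are computed from a students-by-items matrix of scored responses. A minimal sketch follows, using random stand-in data rather than pilot data.

```python
# Cronbach's alpha from a scored item-response matrix
# (rows = students, columns = items). Sketch only.
import numpy as np

def cronbach_alpha(scores: np.ndarray) -> float:
    k = scores.shape[1]                          # number of items
    item_vars = scores.var(axis=0, ddof=1)       # per-item variances
    total_var = scores.sum(axis=1).var(ddof=1)   # variance of total scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Stand-in data: 500 students x 32 dichotomous items. Because these
# simulated items are independent, alpha will be near 0 here; real,
# correlated items produce values like those in the table above.
rng = np.random.default_rng(0)
sim = rng.integers(0, 2, size=(500, 32))
print(round(cronbach_alpha(sim), 2))
```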
Challenges and Lessons Learned
• Designing for multiple purposes and stakeholders
• Adapting to the fits and starts of district- and school-level decision-making
• Sharing actionable results with stakeholders
• Technological infrastructure of schools and districts
• Collaborating with other research groups

Contact Information
John Sabatini
Jsabatini@ets.org

Method/Approach: Typical Test Design Steps (Detail)
Step 1: Construct definition. Define the target population, the content and constructs to be measured, and the inferences or claims that test scores are intended to support.
Step 2: Design specifications. Define a test blueprint, test administration and scoring logistics, and constraints.
Step 3: Test construction. Generate and review items, develop test forms, and draft administration and scoring guidelines.
Step 4: Conduct pilots. Assess basic administration and scoring assumptions and revise accordingly; identify poorly performing items, which are then revised or replaced.
Step 5: Conduct field trial. Sample the target population; conduct statistical/psychometric analyses; build scales and carry out norming and equating (as needed); conduct validity studies.
Step 6: Go operational. That is, tests are administered (or sold) for use under test conditions such that score reports are used to inform educational decisions.

Background Literature & Rationale
Assessment of component skills is useful in screening struggling readers who may have failed to acquire efficient fundamental skills in the elementary school years; measures of fluency and word reading efficiency are common in research and classrooms across grade levels (e.g., Deno & Marston, 2006; Wayman et al., 2007). Reading component proficiency is typically characterized by increasingly automatic and efficient processing, which is important in the middle grades and beyond for handling the increasing quantity and complexity of texts (ACT Inc., 2009; Adlof, Catts, & Little, 2006; Jenkins et al., 2003; Kuhn et al., 2010; Rayner et al., 2003; Torgesen, Wagner, & Rashotte, 1999).
"Strategy Review Wrap-Up"