Designing and Implementing School Level Assessments with District Input

   John Sabatini, Kelly Bruce, and Srinivasa (Pavan) Pillarisetti
                   Educational Testing Service




This research is funded in part by a grant from the Institute of Education Sciences
(R305G04065). Any opinions expressed in this presentation are those of the authors and
not necessarily those of Educational Testing Service. Email: jsabatini@ets.org
     I would like to acknowledge:
 •  Strategic Educational Research Partnership
    (SERP) Institute

 •  BPS Design Team (esp. David Francis)

 •  Boston University (Gloria Waters and David
    Caplan)

 •  Harvard University (Catherine Snow, Sky
    Marietta, Claire White, Joshua Lawrence)

 •  Boston Public Schools

 •  Brockton Public Schools
                      Design Team

    •  Design Team Assessment Subgroup -- Drs. David
       Francis, Univ. of Houston; Gloria Waters, Boston
       University; and John Sabatini, ETS

    •  Charge -- advise SERP and BPS on ways in which the
       assessment of reading could be made more efficient
       and productive in the district.




3
                   Needs Assessment

    In initial Design Team meetings, we learned that:

    •  District leaders had made significant investments in
       reading intervention products and in teacher
       professional development to support literacy
    •  State test results were the district's main source of
       reading data
    •  Students took many tests (mostly mandated)




4
                   Problem Definition
    However, the district had no consistent reading/literacy
     instruments for
    •  Determining the nature or severity of reading problems
    •  Identifying the prevalence and profiles of struggling
       readers
    •  Delivering timely results

    Hence,
    •  Inefficient placement of students into intervention
       programs
    •  Weak/insensitive measures of the effectiveness of
       interventions that target subskills

5
                         Aims
    Short-term goals
    •  Build a battery of screening/diagnostic
       assessments for school-wide use
    •  Estimate the prevalence and nature of student
       reading difficulties

    Long-term goals
    •  Replace other, redundant assessments
    •  Triage students for specialized testing, thus
       reducing the total time spent on assessment
    •  Use the instruments to evaluate intervention programs

6
     The challenge was to build a battery that
    a) screened for reading difficulties across a wide
       range of skills, from decoding through vocabulary;
    b) had acceptable psychometric properties;
    c) was compact (i.e., could be administered in about
       one 40-50 minute session);
    d) could feasibly be implemented school-wide;
    e) provided rapid turnaround of score reports (i.e.,
       within 2-3 weeks); and
    f) was useful at multiple stakeholder levels (e.g.,
       teacher, school, district).
    Computerized delivery and scoring could make it
       feasible to meet nearly all of the above design
       constraints, and was viewed as desirable by BPS.
7
               Rationale & Background
                      Literature
    Theoretical perspective grounded in componential
      approaches to reading assessment (Cain, Bryant, & Oakhill,
      2004; Oakhill, Cain, & Bryant, 2003; Perfetti, Landi, & Oakhill, 2005).

    Although skilled, proficient readers are characterized by
      the integrative, interactive nature of processing
      during any reading task, there is nonetheless
      evidence for separable subcomponent skills.

    Component reading measures can be used as
      indicators of the skill profiles of struggling readers,
      adding value over and above the types of off-the-shelf
      comprehension tests the district was using (Sabatini, 2009).

8
              Background Literature &
                    Rationale
    As a general principle, the test was designed to align with
      empirical research on struggling readers' difficulties
      and effective instructional programs (e.g., NRC, 1998; NICHD,
      2000),

    as well as with cognitive and linguistic theories of the
      skills underlying reading development and difficulty
      (e.g., Kintsch, 2000; Perfetti, Landi, & Oakhill, 2005; Perfetti, Van Dyke, & Hart,
      2001; Rayner et al., 2007; Vellutino, Tunmer, Jaccard, & Chen, 2007).




9
                 Method/Approach:
               Typical Test Design Steps

     Step 1: Construct definition

     Step 2: Design Specifications/Test Blueprint

     Step 3: Test construction

     Step 4: Conduct pilot

     Step 5: Conduct field trial

     Step 6: Go operational
10
                     Method/Approach:
                    ‘Use-inspired’ approach
     Step 1: Define an assessment problem and an information
        need/criteria for success

     Step 2: Get research assessment team(s) to ‘volunteer’ to
        commit time and resources to the problem (in return for data)

     Step 3: Cobble together funding to accomplish initial aims
        (e.g., SERP foundation support; researcher grants)

     Step 4: Get district approval; find some schools willing (and
        able) to work with you on a pilot implementation

     Step 5: Design/adapt items; conduct pilot/field studies;
        analyze data; report back to district and SERP

     Step 6: Rinse and repeat as necessary.
11
              Method/Approach:
               ‘Use-inspired’ process

     In sum, the process is variable and complex, and it
     involves multiple, iterative pilots.

     Each pilot was designed to investigate different
     research and practical questions, ultimately moving
     the team towards an assessment solution that met the
     needs of both district and research stakeholders.




12
             Pilot 1, June 2006 – September 2006
     Participants: two middle schools, Summer 2006, with a follow-up
       in September 2006.

     Objectives/Questions:
     1. How prevalent were basic reading skill difficulties -- basic
        decoding, word recognition, and reading fluency?
     2. Can we implement this without schools and districts
        mutinying?

     Results:
     1. Yes; at least in these schools, significant numbers of students
        showed such difficulties.
     2. Yes. [We have a great team.]

     Conclusion: So far so good; let’s try again.
13
                           Pilot 2, June 2007
     Participants: three middle and three high schools. CORE battery;
       a random half took either ETS or BU subtests.

     Objectives/Questions:
     1. Confirm the basic reading difficulties finding with externally valid tests.
     2. Begin exploring the relationship of subtests to external test criteria.

     Results:
     1. Substantial numbers of students showed word reading difficulties on
        the TOWRE (Torgesen, Wagner, & Rashotte, 1999) and on both BU and
        ETS tests.
     2. Moderate to strong correlations with MCAS and other external
        tests.

     Conclusion: Evidence supported the directions chosen by the
       intervention design teams to develop vocabulary and basic skills
       programs; but how to reduce the battery?


14
           Pilot 3, September 2007 - December 2007
     Participants: two middle and two high schools from the previous
       pilot; two new middle schools. CORE battery; a random half took
       ETS or BU subtests.

     Objectives/Questions:
     1. Feasible scoring: test multiple-choice vs. oral response measures.
     2. How best to combine measures into a feasible, parsimonious
        mixture that spanned the range of reading skills?

     Results:
     1. Multiple choice can work.
     2. Indeterminate; the total battery was too long, but there was no
        clear path to simplifying it.

     Conclusion: Rinse and repeat.
15
                   Pilot 4, Fall and Spring 2008
     Participants: two middle schools, with a follow-up at one school in
       the spring. Six-subtest battery.

     Objectives/Questions:
     1. Improve the psychometric and scale qualities of the subtests.
     2. Gather evidence of added value in subtests over total scores.

     Results:
     1. Reliability and other test properties showed improvement:
        - cross-grade performance levels were in predicted ranges;
        - sentence and comprehension tests needed improvement.
     2. Evidence that subscores were contributing added value over and
        above total scores (Sabatini, Bruce, & Sinharay, 2009).

     Conclusion: Given the success of the battery so far, it seems
       appropriate to implement a larger-scale trial. [Repeat]
16
                              Pilot 5, Fall 2009
     Participants: field test with over 4000 6th-8th graders (Form 6) and 500
        4th-5th graders (Form 4, which was new). The forms shared 50% of their
        content.

     Objectives/Questions:
     1. Improve item and form psychometrics.
     2. Build scales linked to previous-year MCAS scores.
     3. Refine the score reporting.
     4. Pilot versions designed for grades 4 and 5.

     Results:
     1. Reliability and other test properties showed improvement.
     2. Created a scale for each subtest, aligned with the MCAS performance
        levels: Warning, Needs Improvement, and Proficient.
     3. Presented with SERP in a district meeting so that individual schools could
        use the data to plan for future literacy needs.
     4. Initial grade 4-5 data promising, but needs further work.

     Conclusion: We now have a scaled test that is functional for operational
       needs at the 6-8 range.
17
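To make the MCAS-aligned score reporting in Pilot 5 concrete, here is a minimal sketch of how a subtest scale score might be mapped to the three performance bands; the function name and cut scores are hypothetical placeholders, not the operational reporting logic:

    def mcas_band(scale_score: float,
                  warning_cut: float = 220.0,
                  proficient_cut: float = 240.0) -> str:
        """Map a subtest scale score to an MCAS-style performance band.
        Cut scores here are illustrative placeholders."""
        if scale_score < warning_cut:
            return "Warning"
        if scale_score < proficient_cut:
            return "Needs Improvement"
        return "Proficient"

    print(mcas_band(233.0))  # -> "Needs Improvement"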
Summary: Pilot Site Information

                     Pilot 1   Pilot 2   Pilot 3   Pilot 4   Pilot 5
Number of Schools       3         6         6         2        12
Number of Students     373       573       960       785      4908
Grades                 6-7      6-11       6-9       6-8       4-8
           Summary: Reliability Estimates
                                      Cronbach's Alpha (Number of Items)

Subtest                              Pilot 1    Pilot 2    Pilot 3    Pilot 4    Pilot 5

Real Word Recognition                .93 (40)   .87 (20)
Pseudoword Reading                   .89 (20)
Pseudohomophone Judgment             .84 (56)
Lexical Decision                                                      .88 (52)   .91 (50)
Semantic Similarity                                        .70 (28)   .82 (35)   .88 (38)
Morphological Awareness              .81 (18)   .65 (10)   .82 (24)   .89 (30)   .91 (32)
Sentence Processing                                                   .73 (25)   .82 (26)
Efficiency of Basic Reading Comp.                          .93 (25)   .86 (33)   .91 (36)
Reading Comprehension                                                 .75 (22)   .78 (21)
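For readers who want the computation behind these internal-consistency estimates, a minimal sketch of Cronbach's alpha over an examinees-by-items score matrix follows (illustrative only, with simulated data; not the team's actual analysis code):

    import numpy as np

    def cronbach_alpha(scores: np.ndarray) -> float:
        """Cronbach's alpha for a matrix of item scores
        (rows = examinees, columns = items)."""
        k = scores.shape[1]                          # number of items
        item_vars = scores.var(axis=0, ddof=1)       # per-item variances
        total_var = scores.sum(axis=1).var(ddof=1)   # variance of total scores
        return (k / (k - 1)) * (1.0 - item_vars.sum() / total_var)

    # Example with simulated 0/1 responses: 100 examinees, 20 items,
    # responses driven by a common ability factor
    rng = np.random.default_rng(0)
    ability = rng.normal(size=(100, 1))
    responses = (rng.normal(size=(100, 20)) < ability).astype(float)
    print(round(cronbach_alpha(responses), 2))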
         Challenges and Lessons Learned
     •    Designing for multiple purposes and
          stakeholders

     •    Adapting to the fits and starts of district and
          school level decision-making

     •    Sharing actionable results with stakeholders

     •    Technological infrastructure of schools and
          districts

     •    Collaborating with other research groups
20
     Contact Information

       John Sabatini
       jsabatini@ets.org




21
                      Method/Approach:
                   Typical Test Design Steps
     Step 1: Construct definition
      Defines
      - the target population,
      - the content and constructs to be measured, and
      - the inferences or claims that test scores are intended to be
        used to make.

     Step 2: Design Specifications
      A specification process that includes
      - defining a test blueprint,
      - test administration and scoring logistics, and
      - constraints.

     Step 3: Test construction
      - Generate and review items,
      - develop test forms, and
      - draft administration and scoring guidelines.
22
                      Method/Approach:
                   Typical Test Design Steps
     Step 4: Conduct pilots
      - Assess basic administration and scoring assumptions and
        revise accordingly.
      - Identify poorly performing items, which are then revised or
        replaced.

     Step 5: Conduct field trial
      - Sample the target population.
      - Statistical analysis/psychometrics
      - Scales, norming, equating (as needed)
      - Validity studies

     Step 6: Go operational
      That is, forms are administered (or sold) for use under test
        conditions such that score reports are used to inform
        educational decisions.

23
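As a concrete illustration of the "scales, norming, equating" work in Step 5, here is a minimal sketch of mean-sigma linear equating, one standard way to place a new form's scores on a reference form's scale; the presentation does not say which equating method, if any, the team actually used:

    import numpy as np

    def linear_equate(new_scores: np.ndarray, ref_scores: np.ndarray) -> np.ndarray:
        """Mean-sigma linear equating: rescale new-form scores so their
        mean and standard deviation match the reference form's."""
        slope = ref_scores.std(ddof=1) / new_scores.std(ddof=1)
        intercept = ref_scores.mean() - slope * new_scores.mean()
        return slope * new_scores + intercept

    # Example with simulated scores: equate a new form to a reference scale
    rng = np.random.default_rng(1)
    ref = rng.normal(230, 15, size=1000)   # reference-form scale scores
    new = rng.normal(100, 20, size=1000)   # new-form raw scores
    print(round(linear_equate(new, ref).mean(), 1))  # close to 230 after equating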
                Background Literature &
                      Rationale
     Assessment of component skills is useful in screening
       struggling readers who may have failed to acquire
       efficient fundamental skills in the elementary school
       years.
        – Measures of fluency and word reading efficiency are
          common in research and classrooms across grade levels (e.g.,
          Deno & Marston, 2006; Wayman et al., 2007).

     Reading component proficiency is typically
       characterized by increasingly automatic and efficient
       processing,
        – which is important in the middle grades and beyond in handling
          the increasing quantity and complexity of texts (ACT Inc., 2009;
          Adlof, Catts, & Little, 2006; Jenkins et al., 2003; Kuhn et al., 2010;
          Rayner et al., 2003; Torgesen, Wagner, & Rashotte, 1999).



25

				