THE ENGLISH LANGUAGE DEVELOPMENT ASSESSMENT
A PRODUCT OF THE LIMITED ENGLISH PROFICIENT
STATE COLLABORATIVE ON ASSESSMENT AND STUDENT
STANDARDS (LEP SCASS)
DEVELOPMENT AND VALIDATION OF ELDA ITEMS, TEST FORMS, AND ASSESSMENT PROCEDURES
In the last two years, CCSSO, the eighteen member states of the LEP SCASS and partners at the American
Institutes for Research, the Center for the Study of Assessment Validity and Evaluation at the University of
Maryland, and Measurement Incorporated have developed the English Language Development Assessments
(ELDA). All parts of the ELDA in grades 3-12 have been developed following required practices, described in the
Standards for Educational and Psychological Testing. These are the standards for test development, item and
test psychometric analysis, and validation of test scores followed by state assessments in content areas (e.g.,
reading, mathematics, and science) and by commercial test publishers in the US. Active participation of
assessment and English language acquisition experts from the LEP SCASS states and of CCSSO has been
crucial to the successful implementation of these standards for ELDA.
With Nevada as the lead state, the CCSSO LEP SCASS applied for and received an Enhanced Assessment
Grant under Section 6112(b) of the No Child Left Behind Act. Once Nevada had been notified of the award, SCASS
analyzed the ESL/ELD standards of member states and developed a set of core standards on which to base the
ELDA tests. LEP SCASS and its collaborators also developed 1) the performance level descriptors that define
what a student is able to do from the beginning level to the level of English proficiency needed to succeed in the
mainstream content classroom; and 2) the framework and test specifications for the ELDA, which include tests in
the four domains of listening, speaking, reading, and writing; vertical alignment across grade clusters; and test
items designed around four topic areas for grades 3-12: Math, Science and Technology; English Language Arts;
Social Studies; and School Social.
Item development experts at the American Institutes for Research, with support from English language
assessment experts from the University of Maryland who also have worked in Maryland and Virginia school
systems and from ELL teachers from across the LEP SCASS states, developed items for administration in a pilot
test in May 2003. Twelve states participated in this pilot test, with 320 students in grades 3-12 across 31 schools.
Pilot test students came from more than 20 language backgrounds and more than 30 countries. The pilot test
included items in reading, writing, listening, and speaking. Results from item analyses, student focus group
reports, and teacher reports indicated that students understood test administration procedures and were able to
give their best performances on all four tests. Test score reliabilities ranged between .77 and .92, similar to score
reliabilities achieved in state assessment programs. AIR and SCASS members revised administration procedures
to make administering the test easier for teachers and taking the test easier for students. AIR item developers
revised the pilot test items and used them as models to develop items for field testing in 2004.
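The reliability figures above are typically internal-consistency estimates. As a rough, hypothetical illustration of how such a statistic is computed (invented data and a from-scratch implementation, not the actual ELDA scoring code), Cronbach's alpha can be sketched as:

```python
# Minimal sketch of Cronbach's alpha, the usual internal-consistency
# statistic behind "test score reliability" figures in the .77-.92 range.
# Data and thresholds here are invented for illustration only.

def cronbach_alpha(item_scores):
    """item_scores: list of per-student rows, one score per item."""
    n_items = len(item_scores[0])

    def var(xs):
        # Population variance.
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / len(xs)

    # Sum of per-item variances vs. variance of total scores.
    item_vars = [var([row[i] for row in item_scores]) for i in range(n_items)]
    total_var = var([sum(row) for row in item_scores])
    return (n_items / (n_items - 1)) * (1 - sum(item_vars) / total_var)
```

With perfectly consistent items the statistic reaches 1.0; values near zero indicate items that do not hang together as a single scale.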
Twelve states participated in the spring 2004 field test of ELDA. Approximately 850 items across all four
domains were field tested; approximately 6,000 students in grades 3-12 participated. At the same time, three states
conducted census administrations, which included all ELL students in those states. All items had passed content
alignment, language accessibility, and fairness and sensitivity reviews by AIR item developers and LEP SCASS
members. All items were submitted to classical and IRT analyses and screened for difficulty, discrimination,
differential item functioning (DIF), and fit to the Rasch IRT model. AIR psychometricians calibrated and vertically
linked all items that passed statistical reviews to create item banks in reading, writing, listening, and speaking for
grades 3-12. Horizontal and vertical links across test forms were highly stable. Test score reliabilities for field test
forms (after removal of screened items) ranged from .88 to .92. Measurement Incorporated conducted standard-setting
analyses, based on input from teachers in all participating states, to identify the cut scores associated with the ELDA
proficiency levels (e.g., Full English Proficiency). The three states that conducted census administrations used the
scores from spring 2004 to report percentages of students at AMAO levels to the US Department of Education.
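As a hypothetical sketch of the kinds of screens described above (classical difficulty, discrimination, and the Rasch model's response probability), with invented data and illustrative thresholds rather than the actual ELDA criteria:

```python
import math

def p_value(responses):
    """Classical difficulty: proportion of students answering correctly."""
    return sum(responses) / len(responses)

def point_biserial(item, totals):
    """Discrimination: correlation of item score with total test score."""
    n = len(item)
    mi, mt = sum(item) / n, sum(totals) / n
    cov = sum((i - mi) * (t - mt) for i, t in zip(item, totals)) / n
    sd_i = (sum((i - mi) ** 2 for i in item) / n) ** 0.5
    sd_t = (sum((t - mt) ** 2 for t in totals) / n) ** 0.5
    return cov / (sd_i * sd_t)

def rasch_prob(theta, b):
    """Rasch (1PL) model: P(correct) for ability theta, item difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))
```

Under the Rasch model, a student whose ability equals an item's difficulty has a 50% chance of answering correctly; items flagged for extreme difficulty, weak discrimination, or poor model fit would be screened out before calibration and linking.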
Activities planned from December 2004 to September 2005 include the following:
• Revise a subset of 2004 field test items and develop new items for ELDA grades 3-12;
• Develop, review, revise, and approve test items for ELDA grades K-2;
• Develop and administer field test forms for grades K-12 in spring 2005;
• Train administrators for the 2005 administration of ELDA;
• Score and analyze results from the 2005 ELDA administration;
• Conduct standard setting on operational form #1 for the ELDA grades K-12; and
• Construct operational test forms #2 and #3 based on 2005 field test data.
Five states have indicated that they plan to administer ELDA on a census basis (testing all LEP students).
Operational form #1 in grades 3-12 will be available for states to assess all of their LEP students [census testing].
We encourage states to do so, since it gives us leverage with the operations contractor [the more students, the
lower the per-student price of testing]. By the end of September 2005, states should expect to have the following
products: three operational forms for each of five grade clusters [K, 1-2, 3-5, 6-8, 9-12] and four domains [reading,
writing, speaking, and listening], for a total of 60 individual tests. See the breakdown of test forms in the table below.
Grade cluster   Reading    Writing    Listening   Speaking   Total forms/cluster
Kindergarten    3 forms    3 forms    3 forms     3 forms    12 forms
Grades 1-2      3 forms    3 forms    3 forms     3 forms    12 forms
Grades 3-5      3 forms    3 forms    3 forms     3 forms    12 forms
Grades 6-8      3 forms    3 forms    3 forms     3 forms    12 forms
Grades 9-12     3 forms    3 forms    3 forms     3 forms    12 forms
Total           15 forms   15 forms   15 forms    15 forms   60 forms
INITIAL RESULTS FROM ELDA VALIDITY STUDIES
An important factor in the design of the ELDA was its inclusion of empirical validity studies to inform the
development of items and the test forms, with a focus on the degree to which the assessment measured
proficiency in academic English deemed necessary for success in school. While it is standard practice to use
expert judgment and statistical item review to determine test content, it is unusual for developers of educational
tests to conduct such extensive empirical analyses to guide test design as was done with the ELDA. CCSSO
worked with an independent research group, the Center for the Study of Assessment Validity and Evaluation at
the University of Maryland, to design and analyze two sets of studies investigating the validity of the assessment.
The first set of studies was conducted during a pilot test of the initial items developed for the ELDA. Ninety to 130
students in each grade cluster participated in focus group interviews designed to investigate whether the items
were addressing the intended knowledge and skills, that is, to assess the construct integrity of the items.
Findings from the interviews and from feedback from teachers who administered the pilot tests were used to
refine piloted items and to guide the development of new items.
The second set of studies was conducted during the large-scale field-test (approximately 1,000 students per item)
of the ELDA. These studies investigated the validity of interpretations and decisions based on the ELDA:
! The first study used expert review of items, teacher judgments of students, and a latent class analysis of
student performance to determine whether the items were consistently and reasonably differentiating levels
of proficiency as defined by the five ELDA proficiency levels. Results of these analyses indicated that the
ELDA measures relevant, increasingly complex skill sets in the four domains. The item analyses from
this study were also used to help select the items for the first operational form of the test.
! The second study focused on the relationship of ELDA domain scores to teacher judgments of proficiency
and to student scores on two other standardized English language proficiency tests. Results indicate that
the ELDA better differentiates mid-range and more sophisticated language proficiency by domain than the
other tests, supporting the validity of inferences about students’ academic language proficiency stemming
from the ELDA.
! The third study reviewed the structure of the test, and findings indicate that the language skills assessed are
for the most part cumulative—more complex skills build on simpler skills for most language constructs.
Finally, analyses were broken down by several important student factors, and findings indicate that the ELDA
functions similarly for students who are in different types of programs, for students with different language
backgrounds, and for students at different grade levels within a grade cluster. These findings support the
idea that the ELDA measures language proficiency consistently for important subgroups of English language learners.
As a set, these studies provide support for using the ELDA to appropriately measure the academic English
language proficiency of English language learners in grades 3 through 12.
LEP SCASS member states currently participating in the ELDA development: Georgia, Indiana, Iowa, Kentucky,
Louisiana, Nebraska, Nevada, New Jersey, Ohio, Oklahoma, South Carolina, Virginia and West Virginia.
LEP SCASS member states/agencies currently not part of the ELDA network: Hawaii, North Carolina, Oregon,
Rhode Island, Texas and Department of Defense Education Agencies
For more information, please contact Julia Lara at JuliaL@ccsso.org or at 202-336-7042 or Barbara Carolino at
BarbaraC@ccsso.org or at 202-336-7055.