
Dimensions Affecting the Assessment of Reading Comprehension


David J. Francis – University of Houston
Jack M. Fletcher – University of Texas-Houston
Hugh Catts – University of Kansas
Bruce Tomblin – University of Iowa

Presented to PREL Focus on Comprehension Forum, New York, Sept. 29, 2004
This research was supported in part by funding from NICHD under PO1 HD31952, HD 30995, HD28172, P01HD 21888, and NIDCD under P50 DC 002746

Factors Affecting the Assessment of Reading Comprehension in Adults

Although they limited themselves to one drink at lunch, Jack and David nevertheless scored more poorly on reading assessments in the afternoon.

• Reading is multi-dimensional
• Implications for Assessment
• Factors affecting performance on Comprehension Assessments
• Mitigating the effects of decoding on State Assessments for children with RD
• Conclusions

Reading is Multi-dimensional
• Reading is the process of extracting meaning from printed language
• There are numerous characteristics of the reader and of the printed language (i.e., text) that affect this process of constructing meaning
• Not surprisingly, there are numerous approaches to the assessment of reading comprehension
• These approaches differ not only in inputs and outputs, but also in the purposes of assessment

The 2005 NAEP Reading Framework
Reading is an active and complex process that involves
• understanding written text;
• developing and interpreting meaning; and
• using meaning as appropriate to type of text, purpose, and situation.
Taken from the 2005 NAEP Reading Framework

Meaningful Variations in Reading Assessments
• Common Variations in inputs
– Type of text presented (e.g., expository, narrative)
– Length of text presented
– Linguistic and orthographic complexity of the text
– Semantic complexity of the text
– Demands on background knowledge


Meaningful Variations in Reading Outcomes
• The Rand Reading Research Study Group cited three outcomes of reading:
– Knowledge (critical evaluation and integration of new content with stored information)
– Application (utilization of new content to solve problems)
– Engagement (involvement with ideas, experience, and styles of texts)

• These relate fairly closely to the NAEP aspects of reading

Taken from the 2005 NAEP Reading Framework

Meaningful Variations in Reading Assessments

• General variations in response formats
– Type of response (multiple choice, cloze, constructed response, extended response, retell)
– Length of response
– Speed of response

• NAEP varies type and length of response:
– Constructed response (brief and extended)
– Multiple choice

• State assessments often use similar options

NAEP Grade 4 – Blue Crabs

By George W. Frame

Nearly every day last summer my nephew Keith and I went crabbing in a creek on the New Jersey coast. We used a wire trap baited with scraps of fish and meat. Each time a crab entered the trap to eat, we pulled the doors closed. We cooked and ate the crabs we caught. Blue crabs are very strong. Their big claws can make a painful pinch. When cornered, the crabs boldly defend themselves. They wave their outstretched claws and are fast and ready to fight. Keith and I had to be very careful to avoid having our fingers pinched. Crabs are arthropods, a very large group of animals that have an external skeleton and jointed legs. Other kinds of arthropods are insects, spiders, and centipedes. Blue crabs belong to a particular arthropod group called crustaceans. Crustaceans are abundant in the ocean, just as insects are on land. The blue crab's hard shell is a strong armor. But the armor must be cast off from time to time so the crab can grow bigger. Getting rid of its shell is called molting. Each blue crab molts about twenty times during its life. Just before molting, a new soft shell forms under the hard outer shell. Then the outer shell splits apart, and the crab backs out. This leaves the crab with a soft, wrinkled, outer covering. The body increases in size by absorbing water, stretching the soft shell to a much larger size. The crab hides for a few hours until its new shell has hardened. Keith and I sometimes found these soft-shell crabs clinging to pilings and hiding beneath seaweed.

Sample NAEP Questions – Blue Crabs
1. Do you think it would be fun to catch blue crabs? Using information from the passage, explain why or why not.

2. According to the passage, what do blue crabs have in common with all other arthropods?
A) They have a skeleton on the outside of their bodies.
B) They hatch out of a shell-like pod.
C) They live in the shallow waters of North America.
D) They are delicious to eat.

Sample NAEP Questions – Blue Crabs
3. The growth of a blue crab larva into a full-grown blue crab is most like the development of
A) a human baby into a teen-ager
B) an egg into a chicken
C) a tadpole into a frog
D) a seed into a tree

4. Write a paragraph telling the major things you learned about blue crabs.

Meaningful Variations in the Purposes of Assessment
• Variations in the purposes of assessment are also relevant
– Student evaluation (formal; informal; high stakes; low stakes; diagnostic and prescriptive)
– School evaluation
– Reporting (e.g., NAEP)
– Research (where reading might be an outcome, or a predictor, or both)

• These variations in purpose also affect student motivation, which can impact performance

Reading Comprehension and its Assessment are Multidimensional
• The foregoing makes clear that reading is multidimensional in its presentation to the reader, in what is expected of the reader, in the contexts in which it occurs, and in the purposes which it serves
• For assessment to be successful, we must be clear about its purposes and mindful of its consequences

Reading Comprehension and its Assessment are Multidimensional
• For some purposes, the choice of assessment may have minimal impact on decisions that we reach
• In evaluating students, the choice of which assessment to use appears to have only minimal bearing on final decisions about the relative positions of students (Feinberg, 1990; Campbell, 2002)

• That is not to say that the choice of assessment is inconsequential

Reading Comprehension and its Assessment are Multidimensional
• One consequence stems from the link between assessment and instruction
• Given that different response types tend to engage different thought processes (Campbell, 2002), reliance on a single response format in state assessments may adversely narrow instruction

• But this link between response type and cognitive processes may also bear on research findings in reading comprehension

Reading Comprehension and its Assessment are Multidimensional
• If an assessment engages certain cognitive processes, then research that favors that assessment may be biased in favor of factors related to those processes
– For example, if the assessment fails to engage students in evaluation and integration of information, then research will find negligible effects for the higher order linguistic and cognitive abilities sub-serving these processes, or the instructional practices that develop those abilities and processes

Reading Comprehension and its Assessment are Multidimensional
• Psychometrically motivated research can help to shed light on the extent to which such factors may be operating
• To see how this might work, let’s consider the role of decoding in comprehension

• Let’s do this in the light of several different studies with samples from different populations

Connecticut Longitudinal Study (Shaywitz et al.)
• This first slide shows correlations over time between the Woodcock Reading Mastery Test Passage Comprehension Scores and WRMT Decoding composite (Letter Word and Word Attack) scores
• The CLS sample is an epidemiologic sample from Connecticut, largely white, middle to upper income children (Shaywitz, et al., 1990) with very low attrition (over 90% retention through Grade 9)

Correlation between Decoding and Comprehension on the Woodcock-Johnson from Grades 1-9 (N=395)

Comprehension Grade / Decoding Grade
Grade     1     2     3     4     5     6     7     8     9
1       .89   .75   .70   .64   .58   .59   .53   .49   .52
2       .79   .83   .74   .71   .63   .65   .61   .58   .58
3       .73   .78   .77   .74   .68   .67   .65   .62   .60
4       .69   .74   .74   .73   .67   .68   .65   .62   .62
5       .64   .70   .71   .70   .70   .67   .68   .64   .60
6       .66   .70   .75   .74   .69   .69   .69   .65   .63
7       .66   .71   .72   .72   .67   .67   .69   .65   .63
8       .61   .68   .72   .68   .66   .66   .66   .63   .61
9       .65   .69   .71   .70   .66   .66   .68   .63   .63
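One way to read the same-grade (diagonal) entries of this table is as shared variance, since the square of a correlation gives the proportion of variance two measures have in common. The short sketch below applies that standard conversion to the diagonal values; the variable and function names are illustrative only and are not part of the original analysis.

```python
# Concurrent (same-grade) correlations: the diagonal of the table above.
concurrent_r = {1: 0.89, 2: 0.83, 3: 0.77, 4: 0.73, 5: 0.70,
                6: 0.69, 7: 0.69, 8: 0.63, 9: 0.63}


def shared_variance(r):
    """Proportion of variance two measures share: the squared correlation."""
    return r ** 2


shared = {grade: shared_variance(r) for grade, r in concurrent_r.items()}

for grade, r2 in shared.items():
    print(f"Grade {grade}: r = {concurrent_r[grade]:.2f}, shared variance = {r2:.0%}")
```

On these values, decoding and passage comprehension share roughly four fifths of their variance in Grade 1 and roughly two fifths by Grade 9.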

EARS Sample
• This next slide shows correlations between two reading measures, WJ PC and the Formal Reading Inventory (FRI), at grades 1 and 2 in a large normative sample from three schools in Houston
• The sample is a multi-cohort, longitudinal sample that is balanced for gender and roughly balanced for race/ethnicity.

EARS Sample Demographics
• Total N=945 across 5 cohorts
• 3 schools
• All children in all K, 1, and 2 invited. (Random sample of those consenting - 80+%)
• Free lunch participation ranged from 13% to 30%
• Boys and girls were equally represented
• Caucasian (54%), African American (18%), Hispanic (15%), Asian (12%)
• SES - LC (9%), WC (43%), MC (48%)

Correlations among WJ Passage Comprehension and FRI Silent Reading with Decoding and Vocabulary
Entries are r (N); all p < .0001.

Measure                                             Grade 1 WJ PC   Grade 1 FRI     Grade 2 WJ PC   Grade 2 FRI
WJLW_TC  WJR: Letter-Word ID (22) Standard Score    0.83579 (613)   0.44264 (578)   0.81243 (545)   0.45224 (541)
WJWA_TC  WJR: Word Attack (31) Standard Score       0.75390 (615)   0.42372 (580)   0.69642 (546)   0.39780 (542)
WJRV_TC  WJR: Reading Vocab. (32) Standard Score    0.78872 (614)   0.47603 (580)   0.83472 (545)   0.48679 (541)
WIVO_TX  WISC-R: Vocabulary Scale Score             0.33062 (613)   0.21450 (581)   0.41416 (546)   0.27326 (542)

Early Interventions Sample (Foorman, et al.)
• The following slide shows correlations for two measures of comprehension, WJ PC and the CRAB (Fuchs & Fuchs), with three measures of decoding over four years in a freshened longitudinal sample recruited from 17 high poverty schools in two cities.

• The sample was over 95% African American.
• Children were randomly sampled from Kindergarten and Grade 1 classrooms and followed longitudinally through Grade 4.

Correlations for WJ PC and CRAB with three Decoding Measures from Grades 1 and 4 for Ethnic-minority Children from 17 High-Poverty Schools
Entries are r (N); all p < .0001.

Predictor                   Grade 1 PC_W     Grade 1 CRAB    Grade 4 PC_W    Grade 4 CRAB
WJ Letter Word W score      0.73991 (1432)   0.76959 (504)   0.75412 (712)   0.66656 (706)

[Entries for the remaining predictors, whose row labels are missing, in extraction order: 0.70442 (1423); 0.71900 (1425); 0.68950 (504); 0.62199 (504); 0.78040 (501); 0.64142 (712); 0.62910 (632); 0.71959 (706); 0.59755 (706); 0.62179 (626)]

CLRC Sample of Children with and without Specific Language Impairment (Tomblin and Catts)
• This final sample comes from an epidemiologic study of specific language impairment being directed by Bruce Tomblin and Hugh Catts.

• The children were recruited in Kindergarten and followed longitudinally in Grades 2, 4, and 8. Grade 10 assessment is beginning this year.
• There are four groups of children (N=570): controls (n=268), SLI (n=117), NLI (n=91), and low-cognition (n=94)

Correlations for three comprehension measures with language, decoding, and fluency at Grades 2 and 4 for CLRC Sample (N=570)
All p < .0001.

                     Grade 2                          Grade 4
Predictor            WRMT PC   DABS      GORT C      WRMT PC   DABS      GORT C
Language Grade 2     0.60108   0.64386   0.60623     0.63387   0.61677   0.61972
Decoding Grade 2     0.89082   0.66515   0.61252     0.78720   0.53132   0.45939
Language Grade 4     0.58880   0.62366   0.59077     0.65048   0.63850   0.62654
Decoding Grade 4     0.85356   0.64396   0.57037     0.83721   0.55260   0.49119
Fluency 4            0.76879   0.61776   0.56673     0.72327   0.49994   0.41729

DABS – Diagnostic Assessment Battery Comprehension Score; GORT – Gray Oral Reading

Latent Variable Perspective
• The presence of multiple measures of comprehension across multiple time points allows examination of more precisely formulated hypotheses about the relations among the measures

• The WRMT-PC, DABS, and GORT are all purported to measure reading comprehension.
• They correlate reasonably highly with one another and with factors known to be associated with reading comprehension.

Latent Variable Perspective
• Do these three measures reflect an underlying ability, which no one test measures perfectly, but which all measure somewhat imperfectly?
• A strong version of this idea would say that the three tests share one thing in common, and it is this commonality which reflects the underlying process of reading comprehension.

Latent Variable Perspective
• Such psychometric hypotheses carry with them very specific assertions about
– The relations among the variables
– The relations of each of the variables to other variables that are related to the proposed construct
– As well as relations to variables not related to the proposed construct

• These assertions are falsifiable, which is what makes psychometric models useful for studying the properties of tests

One Factor Model for CLRC Sample – Multiple Group Analysis (factor loadings constrained equal)
[Path diagram: latent factors Reading Comprehension 2 and Reading Comprehension 4, with indicators WJPC2, DABS2, GORT2 and WJPC4, DABS4, GORT4. Values on the diagram, in extraction order: .23, 1, .79, .74, .69, .47, .57?, .09, .17, .85, .84, .70, .20, .71, .67, .56]

χ² = 28.62, df = 32, p = .64, RMSEA = 0.000

Multiple Group Single Factor Model for Comprehension with Language and Decoding at Grade 2 as Predictors
[Path diagram: predictors lang2 and decode2; latent factors Reading Comprehension 2 and Reading Comprehension 4, with indicators WJPC2, DABS2, GORT2 and WJPC4, DABS4, GORT4. Values on the diagram, in extraction order: .16, .11, 1.00, .56, .62, .22, .35, .66, .61, .04, .18, .43, .84, .94, .11 (.61), .91, .30, .71, .18, .61, .63, .53]

Note: Standardized covariance (correlation) is shown in parentheses.

χ² = 327.16, d.f. = 112, p < .001, RMSEA = 0.12

Correlations among factors in multi-group model

          RC2    RC4    lang2  decode2
RC2       1.00
RC4       0.88   1.00
lang2     0.59   0.63   1.00
decode2   0.92   0.78   0.45   1.00


Residual Correlation between RC-2 and RC-4 = 0.61

Problems with the One-Factor Model
• Overall the model fit is not particularly strong, especially in light of the strong support for the one factor model in Grades 2 and 4 without predictors

• Introducing the predictors into the model increases our power for discriminating among the different measures of comprehension, and falsifying the unidimensionality hypothesis
• Lack of fit in the model tends to come from the somewhat stronger relationship of decoding with WJ PC than with the other comprehension measures, and from the somewhat stronger relation of those other measures with language.
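An informal way to see where this lack of fit comes from: if the three tests measured a single factor, and language and decoding were related to each test only through that factor, then each test's correlation with a predictor would equal its loading times the factor's correlation with that predictor. The ratio of a test's decoding correlation to its language correlation would then be the same for every test. The sketch below checks that proportionality on the observed Grade 2 correlations for the CLRC sample; it is a rough illustration on raw correlations, not the multi-group latent variable analysis reported here, and the function name is hypothetical.

```python
# Grade 2 correlations from the CLRC table: measure -> (with language, with decoding).
grade2 = {
    "WRMT PC": (0.60108, 0.89082),
    "DABS": (0.64386, 0.66515),
    "GORT C": (0.60623, 0.61252),
}


def decoding_to_language_ratio(r_language, r_decoding):
    """Under a single factor, this ratio should be the same for every indicator."""
    return r_decoding / r_language


ratios = {name: decoding_to_language_ratio(*rs) for name, rs in grade2.items()}

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}")
```

The ratio for WRMT PC is well above the ratios for DABS and GORT C, which sit close to each other, consistent with decoding being more strongly tied to the passage comprehension measure than to the other two.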

Points of Clarification
• It should be noted that all of the models allow for test specific relations over time for any repeated measure
• The correlation among the comprehension factors is substantial – ranging from .88 to .98

• The correlations over time for all factors are quite high, indicating a high degree of stability in all factors

• Reliance on a single measure of comprehension may diminish our understanding of the importance of different skills to comprehension.
• Inclusion of multiple measures mitigates that bias somewhat, but the comprehension measures in this study do not function as a single factor.
• By formulating and testing an explicit model for the set of observed relations among measures, we obtained considerably more information about how the tests actually function than by “eyeballing” the correlations among different individual tests.

• It is worth considering that comprehension might be better conceptualized in a production indicator framework, akin to the relation of SES to parents’ education and family income.
• This alternate measurement framework is not without challenges, but may better reflect the complementary roles of decoding, language, background knowledge, long-term working memory (Kintsch) and other cognitive processes in the formation of meaning from text.

Accommodations for Children with RD
• The effects of accommodations on the performance of students with disabilities on accountability and other high stakes tests have been the topic of several recent reviews (Chiu & Pearson, 1999; Fuchs, Fuchs, & Capizzi, in press; Sireci, Li, & Scarpati, 2003; Thompson, Blount, & Thurlow, 2000; Tindal & Fuchs, 2000).
• These reviews uniformly lamented the relative dearth of empirical studies of the effects of accommodations, noting that the research base was inconsistent and generally not adequate to support firm conclusions about the effects of specific accommodations.

Accommodations for Children with RD
• The lack of consistency across studies reflected the wide range of accommodations evaluated in research, differences in implementation, and the heterogeneity of the students identified as disabled (Sireci et al., 2003).

Accommodations for Children with RD
• For accommodations to be fair, they must not alter the validity of the test
• In practice, appropriate accommodations will improve the performance of students with disabilities but have negligible impact on the performance of students without disabilities

Accommodations for Children with RD
• One way to think about this notion of differential impact is to think of the accommodation as removing some construct irrelevant variance from the test
• That is, for children with a disability, there are factors which contribute to performance on the test which are not essential elements of the construct of interest and do not affect the performance of children without disabilities

Hypothetical Example
• For example, suppose that for students with RD, reading ability is a source of variance in performance on a math test
• In contrast, for children without RD, reading is not a significant factor in math performance

• Then, reading the directions and word problems on the math test to students who are poor in reading would remove this irrelevant source of variance in the math test for students with RD
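The hypothetical example can be made concrete with a small simulation. Everything in the sketch below is invented for illustration: the score scale, the weight of 6 given to reading skill, and the sample size. It models only students with RD; for students without RD, reading would enter neither score, so the accommodation would change nothing for them.

```python
import random


def pearson_r(xs, ys):
    """Pearson correlation computed from first principles."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


rng = random.Random(2004)  # fixed seed so the sketch is repeatable
n_students = 5000

# Hypothetical students with RD: math skill and reading skill drawn independently.
math_skill = [rng.gauss(50, 10) for _ in range(n_students)]
reading_skill = [rng.gauss(0, 1) for _ in range(n_students)]

# Standard administration: reading skill leaks into the math score (weight 6 is invented).
standard = [m + 6 * r for m, r in zip(math_skill, reading_skill)]
# Read-aloud accommodation: the reading demand is removed, leaving math skill only.
accommodated = list(math_skill)

r_standard = pearson_r(standard, reading_skill)
r_accommodated = pearson_r(accommodated, reading_skill)
print(f"standard: {r_standard:.2f}, accommodated: {r_accommodated:.2f}")
```

Under standard administration the simulated math scores correlate substantially with reading skill. With the read-aloud accommodation that correlation falls to about zero, which is what removing a construct-irrelevant source of variance means in this setting.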

Possible Accommodations on Reading Assessments for Children with RD
• Children with RD struggle with comprehension because of poor decoding skills
• A number of possible accommodations have been proposed and examined
– Increased time to read the test
– Allowing children to read material out loud
– Increasing print size
– Reading passages to children (NOTE: THIS INVALIDATES THE TEST)

Suite of Accommodations in Study
• Extended Time (students allowed to complete assessment on two days)
• Examiner read aloud
– Instructions,
– Proper nouns,
– Item stems

• These were chosen because they could be implemented in practice and because they preserved the validity of the state outcome assessment

Study Design
• 182 Grade 3 children were recruited from 6 districts, 48 schools, and 113 classrooms
– N=91 grade 3 children with RD
– N=91 grade 3 children who were average readers from the same classrooms

• Children in each group were randomly assigned to take the TAKS reading assessment either under standard administration conditions (n=47 in each group), or under the accommodations (n=44 in each group)

Study Design
• All children were tested on a practice version of the Grade 3 Texas Assessment of Knowledge and Skills (TAKS) that was built by the test developer during field testing
• In addition, children were given the Letter Word and Word Attack subtests of the WJ III and the picture vocabulary subtest of the WLPB (Woodcock, 1991)

TAKS Reading
• No modifications were made of the TAKS booklets; the only modifications were in the instructions provided by the examiners.
• The Grade 3 reading assessment of the TAKS involves a practice story and three stories of increasing difficulty.
• Questions are designed to assess the literal meaning of the passage, vocabulary, and different aspects of critical reasoning about the material in the paragraph.

TAKS Reading
• Both expository and narrative materials are included.
• The TAKS is an untimed measure under standard administration guidelines, and students are typically allowed as much time as they need to complete the assessment.

• Like all TAKS tests, the Grade 3 reading comprehension assessment is a criterion referenced assessment that is aligned to state standards.

Study Results
• There was a significant interaction between RD status and Accommodations
• Specifically, access to accommodations significantly improved performance, but only for children in the RD group.

• Accommodations had a negligible effect for children without RD.

Study Results
                          Accommodations
Dyslexia Status           NO        YES       EFFECT    Effect Size
NO        N               47        44
          M               2166.7    2184.8    18.1      0.15
          sd              116.0     122.0
YES       N               47        44
          M               1921.7    2055.3    133.6     0.91
          sd              132.3     162.5

Interaction Effect Size = .86
Difference in Effect Sizes = .76

Test of Interaction: F(1, 155) = 12.04, p = .0007
Note: Model included random effects of school within district.
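The effect sizes can be approximately recovered from the cell means and standard deviations. The sketch below assumes the standardizer is a pooled standard deviation, which is not stated here and may not be what was used. Under that assumption it reproduces the 0.15 effect for children without RD and the 0.86 interaction effect size, and gives about 0.90 for children with RD against the reported 0.91.

```python
import math

# Cell statistics from the table above: (n, mean, sd).
cells = {
    ("no RD", "standard"): (47, 2166.7, 116.0),
    ("no RD", "accommodated"): (44, 2184.8, 122.0),
    ("RD", "standard"): (47, 1921.7, 132.3),
    ("RD", "accommodated"): (44, 2055.3, 162.5),
}


def pooled_sd(groups):
    """Pooled standard deviation across any number of (n, mean, sd) cells."""
    numerator = sum((n - 1) * sd ** 2 for n, _, sd in groups)
    denominator = sum(n - 1 for n, _, _ in groups)
    return math.sqrt(numerator / denominator)


def accommodation_effect(status):
    """Standardized mean gain from accommodations within one RD status."""
    standard = cells[(status, "standard")]
    accommodated = cells[(status, "accommodated")]
    return (accommodated[1] - standard[1]) / pooled_sd([standard, accommodated])


d_no_rd = accommodation_effect("no RD")
d_rd = accommodation_effect("RD")

# Interaction: difference between the two raw gains, scaled by the SD pooled over all cells.
raw_interaction = (2055.3 - 1921.7) - (2184.8 - 2166.7)
d_interaction = raw_interaction / pooled_sd(list(cells.values()))

print(f"{d_no_rd:.2f} {d_rd:.2f} {d_interaction:.2f}")
```

The reported difference in effect sizes (.76) is the difference between the two rounded within-group values, .91 and .15.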

Study Results
• In addition to significantly improving average performance levels, accommodations significantly affected student passing rates.
• Again, improvements were seen only for children with RD.
• For children with RD, accommodations improved passing rates from 9% to 41% (p < .0005)
• Pass rates for children without RD went down slightly from 83% to 77% (n.s.)

• The study showed that an appropriate suite of accommodations could substantially and significantly improve performance for children with RD on the Grade 3 TAKS
• Effects were seen in both the level of performance and in the percentage of children meeting standards
• These same accommodations had virtually no effect on the performance of children without RD
• Given the goal of the TAKS to assess students’ ability to understand text, the suite of accommodations used here is appropriate
• Whether similar accommodations would be successful for older students remains to be determined.
