Valid Uses of Student Testing as Part of Authentic, Comprehensive Student Assessment
A Statement of Concern from Canada’s School Principals
The members of the Canadian Association of Principals (CAP) have noted that several provincial/territorial governments are increasing their use of province- or territory-wide (large-scale) external tests to measure and report on student achievement in selected academic subjects. This trend has been a response to public pressure urging decision-makers to make school systems more transparent and accountable. School Principals believe that schools and educators must be accountable to parents, to their respective education authorities and to the general public. Student tests are a valid part of the student assessment programs that form part of that accountability. School Principals in Canada support reasoned efforts to make schools more accountable and to provide better, more timely information to parents, students and teachers. School Principals also support valid uses of a variety of tests as part of a range of measures that make up an authentic and comprehensive system of student assessment.
However, the implementation of such testing programs has resulted in several unintended consequences for students and schools. Principals are concerned about those consequences, several of which are noted in this statement. The results of these large-scale tests are being published by school and by school district within jurisdictions, with little apparent regard for their valid but limited uses in improving student performance or monitoring the system. Further, such tests are often not situated within a coherent policy and accountability framework based on learning and on the overall assessment of student achievement. As well, the results of these tests are often neither correlated with nor analyzed in relation to context, student and family characteristics and other factors that determine school and student success. Such tests often do not provide information that helps students and educators improve their practices.
These invalid uses of large-scale testing have been further exacerbated by the media and by narrowly focused interest groups or elected officials. This statement has been prepared because school-based administrators throughout Canada are increasingly concerned that current policies and practices on student testing are leading to:
unfair and invalid assessment of student, school and system achievement;
a secretive or unintended shift of priorities towards a narrow range of student knowledge and literacy/numeracy skills, despite continuing pressure on schools to achieve their stated broad mandates related to other knowledge, vocational preparation and social development;
ill-informed and unproductive public debates about schooling that are based on single test results and rankings, without the contextual information that is clearly needed and readily available; and
few actual decisions that explicitly lead to re-allocated or increased resources.
Our concerns as school-based leaders go beyond our frustration with media sensationalism and with elected officials or lobby groups that misconstrue results to support their preconceptions. School Principals around the world believe that many current student testing policies and related accountability practices contain flaws that detract from what should be our primary goal: improving learning. Instead, many jurisdictions appear to be more concerned with providing comparative statistics on a narrow range of student abilities, which presents an incomplete and misleading picture to the general public.
Testing should be based on what we know about learning
“Promoting children’s learning is a principal aim of schools. Assessment lies at the heart of this process. It can provide a framework in which educational objectives may be set and pupils’ progress charted and expressed. It can yield a basis for planning the next steps in response to children’s needs... it should be an integral part of the educational process, continually providing both ‘feedback’ and ‘feed-forward’. It therefore needs to be incorporated systematically in teaching strategies and practices at all levels.” (National Curriculum Task Group on Assessment and Testing (TGAT): A Report, 1998)
Learning Principles
Learning needs to be activity-based.
Learning needs to include co-operative learning opportunities.
Learning is dependent on situations which are meaningful to the child.
Learning needs to address attitudes and values.
Learning needs to encompass the use of literacy, numeracy and technology skills.
Learning needs to develop critical thinking skills.
Learning is affected by developmental stages.
Learning is affected by evaluation strategies.
Learning is dependent on developing communication skills.
Learning is reinforced through integrated experiences.
Learning needs to be promoted without bias to gender.
Some of the concerns that have been identified by principals are:
Most large-scale tests currently in use in Canada do not adequately reflect and report on the variety of forms of intelligence and learning styles.
Large-scale tests in Canada do not cover all required subjects and courses; instead, they examine only a select few subjects. Testing and publishing the results creates pressure on schools to focus on the tested subjects. Are the following subjects, which are currently part of a well-rounded, required school curriculum (reading, writing, literature, math, science, history, geography, health/social development, family studies, physical education, computer skills, career education, the arts, a second language and religion/ethics/moral/spiritual development), indeed less important than literacy and numeracy?
Testing should respect principles of fair assessment.
Such principles include:
Assessment and evaluation are continuous.
Assessment and evaluation are an integral part of the teaching-learning process.
Assessment and evaluation take into account a child's learning profile. The learning profile is defined as including the child's cognitive, affective and psychomotor domains, developmental level and learning style.
Assessment and evaluation are designed specifically to assess particular and clearly stated instructional goals, objectives and educational outcomes in a formative way.
Process skills are assessed and evaluated as well as content knowledge.
Assessment and evaluation methods should be free from language, gender, cultural or racial bias.
Assessment and evaluation methods should be valid.
Reading and writing are viewed as processes during assessment and evaluation.
Students and parents/guardians are active participants in assessment and evaluation.
Evaluation procedures and results are fair and are expressed in clear language to the students and parents/guardians.
Assessments of and for learning are both important. Since we already have many assessments of learning in place, if we are to balance the two, we must make a much stronger investment in assessment for learning. We can realize unprecedented gains in achievement if we turn the current day-to-day classroom assessment process into a more powerful tool for learning. We know that schools will be held accountable for raising test scores. Now we must provide teachers with the assessment tools needed to do the job. It is tempting to equate the idea of assessment for learning with our more common term, "formative assessment." But they are not the same. Assessment for learning is about far more than testing more frequently or providing teachers with evidence so that they can revise instruction, although these steps are part of it. In addition, we now understand that assessment for learning must involve students in the process. When they assess for learning, teachers use the classroom assessment process and the continuous flow of information about student achievement that it provides in order to advance, not merely check on, student learning. 
They do this by:
understanding and articulating in advance of teaching the achievement targets that their students are to hit;
informing their students about those learning goals, in terms that students understand, from the very beginning of the teaching and learning process;
becoming assessment literate and thus able to transform their expectations into assessment exercises and scoring procedures that accurately reflect student achievement;
using classroom assessments to build students' confidence in themselves as learners and help them take responsibility for their own learning, so as to lay a foundation for lifelong learning;
translating classroom assessment results into frequent descriptive (rather than judgmental) feedback for students, providing them with specific insights as to how to improve;
continuously adjusting instruction based on the results of classroom assessments;
engaging students in regular self-assessment, with standards held constant so that students can watch themselves grow over time and thus feel in charge of their own success; and
actively involving students in communicating with their teacher and their families about their achievement status and improvement.
In short, the effect of assessment for learning, as it plays out in the classroom, is that students keep learning and remain confident that they can continue to learn at productive levels if they keep trying. In other words, students don't give up in frustration or hopelessness.
Assessment for Learning
Principle 1: Assessment for learning should be part of effective planning of teaching and learning.
Principle 2: Assessment for learning should focus on how students learn.
Principle 3: Assessment for learning should be recognized as central to classroom practice.
Principle 4: Assessment for learning should be regarded as a key professional skill for teachers.
Principle 5: Assessment for learning should be sensitive and constructive because any assessment has an emotional impact.
Principle 6: Assessment for learning should take account of the importance of learner motivation.
Principle 7: Assessment for learning should promote commitment to learning goals and a shared understanding of the criteria by which they are assessed.
Principle 8: Assessment for learning should ensure that learners receive constructive guidance about how to improve.
Principle 9: Assessment for learning should develop learners' capacity for self-assessment so that they can become reflective and self-managing.
Principle 10: Assessment for learning should recognize the full range of achievements of all learners.
Characteristics of Assessment that Promotes Learning
Applying these principles shapes the assessment process and gives it certain desirable characteristics. Assessment for learning is embedded in a view of teaching and learning of which it is an essential part:
It involves sharing learning goals with pupils, helps pupils to know and recognize the standards they are aiming for, and involves pupils in self-assessment.
It provides feedback that leads pupils to recognize their next steps and how to take them.
It is underpinned by confidence that every student can improve, and it involves both teacher and pupils reviewing and reflecting on assessment data.
Some of the concerns that school principals have described with regard to assessment for learning are:
Many large-scale tests measure only the cognitive domain and often use only multiple-choice or limited-response questions. This reduces the learning being measured to its lowest form and does not truly reflect the wide range of skills, knowledge and attitudes that schools are required to promote among students. The narrow scope of the tests in turn narrows the education being offered, whereas testing should reflect the broad goals of education, not the reverse.
Tests should measure more than simple memorization skills within the cognitive domain. They should measure student cognition such as performing required procedures, communicating understanding and solving routine and non-routine problems, and they should give students an opportunity to generalize, prove and make educated conjectures about the content.
Tests are often based on curricula whose learning outcomes have been developed and communicated only recently or poorly. Teachers often have not had an adequate opportunity to learn the curricula and adapt their teaching practices.
The content of the required curricula used as the basis of the test is rarely reviewed and prioritized to ensure that the stated outcomes can be achieved within the time actually available for instruction.
The learning outcomes for the tested curricula are not always stated clearly and in specific, measurable terms.
Studies in other jurisdictions indicate that large-scale testing can result in teachers choosing to leave schools that repeatedly score lower in the publicized rankings. This means that the more qualified and experienced teachers often leave such schools. These schools often serve disadvantaged students or those with special needs, so these students are increasingly under-served by the public school system.
The purposes of each large-scale test should be clearly stated
The purposes of and rationale for examining the knowledge, attitudes and skills to be measured by a test should be presented and discussed before development of the test begins. That rationale should explicitly describe the purposes of the test, how it will help students, teachers and parents to improve learning, and how the results will be used in decision-making. A clear distinction should be made between tests designed to improve learning through dialogue among individual students, teachers and parents, and tests designed to monitor program or policy effectiveness.
Some of the concerns that school principals have identified regarding the purposes of testing are:
A clear distinction between tests used for accountability and monitoring purposes and tests used to improve student learning is not always being maintained. For example, province-wide tests involving all students are unnecessary if the purpose is to monitor program effectiveness; a random sample of students will suffice.
Because of the size of these tests and the number of students involved, marking takes months and feedback is often provided several months after the test, making the results meaningless to students, teachers and parents.
The participation of disadvantaged groups (cultural differences, disability, language, access to community and family resources, etc.) is not sufficiently taken into account in the preparation, delivery and interpretation of the test.
5. The development, administration and interpretation of the tests should respect well-defined professional practices and standards as described by the Canadian Psychological Association
The government or educational authority considering the development of a large-scale test should also describe, in detail, the consultative processes and independent expert reviews that will be used to develop, implement and interpret the tests and to announce their results. The tests used by educational authorities should be developed, administered, scored, interpreted, reported and acted upon in explicitly defensible ways that are based on solid research evidence.
The concerns identified by school-based administrators include:
Educators are finding that large-scale testing is placing a disproportionate burden on schools and that the balance between program development and program evaluation is not being maintained. For example, in one jurisdiction, one dollar per pupil is being spent on curriculum development while three dollars are being spent on large-scale student testing.
Large-scale tests developed by governments are often not reviewed by independent experts and by others who reflect the diversity of the test takers, parents, constituencies and educators involved with the curricula, programs and schools.
Large-scale test results are often announced and presented to the public as reliable, when research indicates that such tests require a minimum of three years before they should be considered reliable.
Large-scale tests are seldom subjected to scrutiny through independent surveys to determine whether representative samples of teachers, parents and students consider them credible.
Often the estimates of reliability and standard errors of measurement are not well understood by the media and the general public, leading to misinterpretation and false claims.
Confidence intervals should be provided, and the procedures used to obtain samples and the nature of the populations being studied should be described. These concepts and cautions should be clearly articulated in all written documents, emphasized in all announcements and explained clearly at every opportunity to parents, teachers and the media.
Baseline and benchmark levels (that is, the interpretation of “high” standards) are often set after the test is completed, and there are reports that such standards are manipulated by education authorities after the test results are compiled. Benchmarks, or expected standards for achievement, should be based on published scientific evidence that they are achievable and that achieving these student outputs will have a long-term impact on outcomes later in the student’s life and career. These benchmark expectations should be stated in advance of the administration of the test.
The results of the tests are often not interpreted in light of the number of times the participating students have taken such tests. For example, students in Alberta often score among the highest on international and national tests; they are also the students who most often take such tests. Similarly, students in Asian countries often score very well on large-scale tests, and students in those countries are tested far more often than students in other countries.
Often the results of one test are equated with those of similar tests, of earlier versions of the test or of similar tests at different grade levels or ages. This is inappropriate unless the test was specifically designed for such comparisons and they were planned for during its development. Often, such comparisons are done on the basis of “item response theory” and are therefore the product of arcane mathematical and statistical interpretations.
Media representatives are often not fully briefed about the interpretation of the results, and meetings are often not held with media editorial staff to explain the results.
6. There should be a public, planned and inclusive decision-making process to examine the test results in conjunction with other assessment results. Such decisions should include consideration of the potential allocation or re-allocation of human, administrative and financial resources.
Some of the concerns identified by school administrators are:
Remedial and support programs are not readily available for students who fail the test.
Often, a timetable for formal, public review and decision-making based on the results is not published when the test is being administered.
7. The results of student tests should be combined with other data sources into comprehensive, contextual school/community profiles that would be made available to educators and parents for planning purposes.
These data could include factors such as:
the level of education of people living in the community;
the level of income of people living in the community;
the first language(s) of the people living in the community;
descriptive and administrative data on the number and nature of programs offered in the school, covering several aspects of the school environment;
the number and nature of students participating in extra-curricular and community service activities;
the qualifications of teachers assigned to teach the subjects/assignments;
the number of parent volunteers; and
the accessibility and service levels of student health, social services, youth, justice and employment workers from the community that are coordinated with the school programs.
These data sources should be grouped and made accessible so that parents, educators and others can make reasonable comparisons with their own trends over several years and with the profiles of similar schools in other jurisdictions.
8. The results of student tests should be included in an Indicators system that truly monitors all of the relevant factors that affect learning.
School systems should benefit from well-developed and well-implemented Indicators or reporting systems. Unfortunately, Indicators systems have been badly misused by educational authorities. Serious errors and invalid uses of Indicators include:
monitoring only student outputs and not reporting on context (student characteristics), inputs (financial and human resources), processes (program status and implementation) and long-term outcomes (relevance to post-graduation life and career achievement);
poor consultation procedures; and
reluctance to publish results by province or state unless the authorities control the data and the reporting mechanism.
Consequently, such Indicators systems are a source of concern for school-based administrators.
9. We suggest that national, state/provincial and local education authorities undertake the following actions to implement these principles and practices:
1. Testing for monitoring and accountability purposes should be clearly separated from testing for student improvement and progress. For example, to monitor the effectiveness of a system or program, it is neither necessary nor cost-effective to test all students when a random sample can provide the same information.
2. Ensure that the content of all curricula is achievable and that consensus seeking during the curriculum development process has not resulted in inflated or unachievable outcomes. Ensure that all learning outcomes within curricula are stated clearly and that stated learning outcomes are based on scientific evidence wherever it is available.
3. Provide educators with the means to create authentic, alternative assessment tools such as scoring rubrics, student portfolios and locally developed tests and quizzes.
4. Ensure that secondary analyses of the large-scale tests are done to measure the impact of factors such as socio-economic status, family and student characteristics, community resources and program resources such as school facilities, equipment, teacher qualifications and school organization.
5. Allow for at least three years of test development and piloting before test results are announced widely and before they are considered reliable.
6. Seek independent confirmation and evidence that current tests are not biased against certain groups of students or towards certain learning styles.
Also seek independent confirmation that current tests are appropriate for their stated purposes and that the results are leading to meaningful analysis, policy-making, program development and professional development. Submit large-scale tests to independent assessors to ensure that they meet the standards defined by professional authorities and experts, such as the CPA or APA standards.
7. Seek independent confirmation that tests are not leading to negative, unintended consequences such as “teaching to the test”.
8. Provide schools with the means to collect, analyze and monitor local community, student and program data so that comprehensive profiles of school communities can be created, monitored and used in school planning and in coordination with other agencies. Use technology to enable schools to access and analyze their own data.
9. Consult stakeholders on how the results of large-scale tests should be released to the public. Seek to establish a protocol with stakeholders on how test results and other data should be discussed. Hold orientation sessions with journalists prior to the release of such data to ensure that they are aware of the science underlying appropriate interpretations of the data from large-scale tests. In consultation with stakeholders, define a transparent process for the review, analysis and joint interpretation of the results of large-scale assessments. This process should note when and how decisions based on the results will be made.
10. Provide easy access to independent experts to conduct secondary analyses of the results of large-scale tests.
11. Consolidate the data collection procedures for international, national and state/provincial tests and surveys so that the response burden on schools is reduced.