EPSB Work Session
Validation Studies & Cut Scores
January 21, 2007

Purposes of Testing/Cut Scores
To meet statutory requirements and to select individuals who have a minimum level of academic proficiency and content knowledge to be presumed capable of delivering education to children in the public schools.
Testing partners with:
• Admission requirements into teacher preparation programs, including Praxis I tests, college admission exams, grade point averages, and other academic proficiency assessments
• Kentucky Teacher Internship Program (KTIP)

Inferences
• What decision are we making?
• What purpose are we serving?
• What's the best way to make a decision?

Inferences
Two Types of Inferences
1. Select those with the highest levels of qualification
2. Select those who have minimum qualifications
Minimum = low

Negative Consequences
Negative consequences of increasing cut scores:
1. Reduce the number of qualified applicants for certification
2. Increase the number of emergency and conditional certificates
3. Create teacher shortages
4. Reduce institutional QPI scores
5. Disparate impact
In any case, the same people will be teaching.

Positive Consequences
Positive consequences of increasing cut scores:
1. Decrease the number of false positives
2. Perhaps marginally increase the quality of teaching

Research
• When teacher basic skills test scores have been used as predictors of teacher performance, few studies have shown any strong relationship, and some studies have shown no relationship at all.
• Studies that have attempted to relate content knowledge to teacher performance (most of them confined to mathematics and science) have shown modest results at best.

Technical Considerations
• Standard Error of Measurement (SEM)
• Methodologies for establishing cut scores
  – Angoff
  – Contrasting Groups
  – Bookmark
  – Yeager-Mills (Body of Work)
  – Other

Angoff Method
• Results in a determination of whether a test is valid for use in KY and provides a recommended cut score
• Most widely used
• Requires teacher judgments
• Held up in most research studies as the method that produces the most stable results
• Generally accepted by the courts and by professionals in the field

Angoff Method Process
Validation Process/Cut Score Recommendation
• A panel of teachers, representative of the state and of each grade level and content area, is selected to review the items on each test.
• Teachers are asked to estimate the proportion of persons with minimally acceptable skills in the content area who would be expected to get each item right. (At least 70% of the items must be judged job relevant for the test to be deemed a valid measure of performance.)
• After all items have been rated, the teachers' judgments are combined to recommend a cut score for the whole test.

Cut Score Recommendations Based on Decision Rules Applied Since May 1999
Accept the recommendation of the validation panel unless:
a. The recommendation fell below the current passing score, or
b. The recommendation fell below the Southern Regional Education Board (SREB) average, or
c. The recommendation and SREB score fell below the 15th national percentile, or
d.
The recommendation exceeded the 25th national percentile

Tests Validated Since May 1999 & Corresponding Cut Scores
Test                                                              Test Number  Validation Panel Recommended Cut Score  Regulatory Cut Score
Earth Science: Content Knowledge                                  0571         145                                     145
Principles of Learning & Teaching: Grades K-6                     0522         164                                     161
Principles of Learning & Teaching: Grades 5-9                     0523         166                                     161
Principles of Learning & Teaching: Grades 7-12                    0524         158                                     161
Speech Communication                                              0220         570                                     580
Education of Exceptional Students: Core Content Knowledge         0353         157                                     157
Elementary Education: Content Knowledge                           0014         148                                     148
Education of Exceptional Students: Mild to Moderate Disabilities  0542         165                                     172
Biology: Content Knowledge                                        0235         154                                     146
Chemistry: Content Knowledge                                      0245         161                                     147
Physics: Content Knowledge                                        0265         145                                     133

Challenges
Evidence must support that each test chosen:
#1: is valid for the purpose for which it is used
#2: is anchored in reasonable expectations of job performance
#3: is a reliable measure
#4: does not unfairly disadvantage members of demographic groups

ETS Assurance
The Educational Testing Service (ETS) employs many psychometricians and conducts many test studies that influence test development, maintenance, and revision, including bias reviews, DIF analysis, reliability coefficients, and calculation of p-values.

Recommended Framework
• Cut scores between the 15th and 25th percentiles, inclusive
• Greater than or equal to the current cut score
• Comparable to the SREB average cut score
• Use disparate impact estimates, combined with other information, as indicators for possible program performance reviews

Legal Considerations
14th Amendment: Due Process and Equal Protection Clauses
Section 1
“. . .
nor shall any State deprive any person of life, liberty, or property, without due process of law; nor deny to any person within its jurisdiction the equal protection of the laws.”
Section 5
The Congress shall have power to enforce, by appropriate legislation, the provisions of this article.

Legal Considerations
Civil Rights Act of 1964
• Title VII prohibits discrimination in employment on the basis of race, color, religion, national origin, or sex.
• Title VI prohibits discrimination in federally funded programs or activities on the basis of race, color, or national origin.

Legal Considerations
Testing, in and of itself, is usually determined to be discriminatory only if it has a disparate impact on a protected class.

Prima Facie
A prima facie disparate impact case is “. . . established when: (1) plaintiff identifies a specific employment practice to be challenged; and (2) through relevant statistical analysis proves that the challenged practice has an adverse impact on a protected group.” Isabel v. City of Memphis, 404 F.3d 404, 411 (6th Cir. 2005).

Prima Facie
If the plaintiff meets this burden, the employer must show that the protocol in question has “a manifest relationship to the employment” (the so-called “business justification”). Griggs, 401 U.S. at 432, 91 S.Ct. 849.

Prima Facie
If the employer succeeds, the plaintiff must then show that other tests or selection protocols would serve the employer's interest without creating the undesirable discriminatory effect.
“An employer cannot be held liable for disparate impact if a legitimate business policy results in workforce disparities.” Bacon v. Honda of America Mfg., Inc., 370 F.3d 565, 579 (6th Cir. 2004).

Good Example of a Bad Example of “Business Justification”:
So we went in that little room there, and we looked at one another, and we knew we were playing with fire. We had all these pressures.... You knew you were putting both feet, both hands, in the middle of a philosophic war, a media war, a racial war ...
Finally somebody said, well, what can we take to the people? At that point we forgot the university. We forgot everybody.... What kind of argument we can make that the people gon buy? And some soul in there said, well could we make the argument that the teachers ought to be smarter than half the students. And we looked around. We said, them old boys down there in Letohatchee will buy that. Everybody will buy it. We were all Alabamians. We all good old boys. We said, we can sell that. Folks in Lowndes County will buy it. Folks up in Wilburn will buy it. Even sophisticates up there in them Birmingham Newspapers, that'll make sense that the teacher ought to be as smart as at least half the students she's teaching. So [one of the steering committee members] was commissioned to go to his office and find out what the average ACT was for graduates, came back and said, I believe it's 16.4. So our big decision was whether to go to 17 or 16. And the only argument I think I recall them arguing for 16. Then we could go back out and say, looka here. Of course, this is also a fallacious argument because the student – the teacher never is as smart as half the students.... [But] that was the scientific basis of it gentlemen and lady. It was just that scientific.
Groves v. Alabama State Board of Education, 776 F.Supp. 1518, 1530 (M.D. Ala. 1991).

Prior Litigations
• Sharif by Salahuddin v. New York State Education Department, 709 F.Supp. 345 (S.D.N.Y. 1989)
• Fields v. Hallsville Independent School District, 906 F.2d 1017 (5th Cir. 1990)
• Groves v. Alabama State Board of Education, 776 F.Supp. 1518, 1530 (M.D. Ala. 1991)
• Association of Mexican-American Educators v. State of California, 231 F.3d 572 (9th Cir. 2000)
• White v.
Engler, 188 F.Supp.2d 730 (E.D. Mich. 2001)

Summary
• Teacher testing is used to make inferences about future teacher performance.
• Inferences make sense only in the context of a decision.
• The decision of interest with teacher tests is whether an individual has a minimum level of academic proficiency and content knowledge.

Summary
• Cut scores are based on the relevance of test items to performance as a teacher in the appropriate content area, using a modified Angoff procedure.
• Cut scores are recommended through the application of an agreed-upon process but approved by the EPSB Board.

Summary
• A set of Decision Rules has been applied since May 1999.
• A recommended framework suggests that:
  – Cut scores be
    between the 15th and 25th percentiles
    greater than or equal to current cut scores
    comparable to SREB average cut scores
  – Disparate impact be used as a possible indicator of program concerns

Summary
Tests that have been scientifically tailored and vetted to measure a legitimate skill set related to the certificate holder's duties will withstand judicial scrutiny.

Questions?
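
Illustrative sketch
The Angoff aggregation and the Decision Rules summarized above can be made concrete with a short Python sketch. It is an illustration under stated assumptions, not the procedure actually used by the EPSB or ETS: the function names are hypothetical, averaging the judges' summed estimates is only one common way to "combine" judgments, the percentile arguments are assumed to be the scaled scores at the 15th and 25th national percentiles, all numbers are made up, and the conversion from a raw Angoff score to a reported scaled score is not covered.

```python
from statistics import mean


def angoff_raw_cut(ratings):
    """Combine panel judgments into a raw cut score.

    ratings[j][i] is judge j's estimate of the proportion of minimally
    qualified candidates expected to answer item i correctly.
    """
    # Each judge's implied cut score is the sum of that judge's item estimates.
    judge_cuts = [sum(judge) for judge in ratings]
    # Averaging across judges is an assumption; the source says only "combined".
    return mean(judge_cuts)


def is_valid_for_use(job_relevant, threshold=0.70):
    """True when at least `threshold` of the items were judged job relevant."""
    return sum(job_relevant) / len(job_relevant) >= threshold


def triggered_rules(recommended, current, sreb_avg, pct15, pct25):
    """Return the decision rules (a-d) that block accepting a recommendation."""
    rules = []
    if recommended < current:                       # rule a
        rules.append("a")
    if recommended < sreb_avg:                      # rule b
        rules.append("b")
    if recommended < pct15 and sreb_avg < pct15:    # rule c
        rules.append("c")
    if recommended > pct25:                         # rule d
        rules.append("d")
    return rules


# Hypothetical data: two judges rating three items.
ratings = [[0.5, 0.75, 1.0], [0.25, 0.5, 0.75]]
raw_cut = angoff_raw_cut(ratings)

# Hypothetical scaled scores, not taken from the table of validated tests.
blocked_by = triggered_rules(recommended=154, current=146, sreb_avg=150,
                             pct15=140, pct25=150)
```

An empty list from `triggered_rules` corresponds to accepting the panel's recommendation. The slides do not state what score is adopted when a rule is triggered, so the sketch only reports which rules fired.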