Validation Studies _ Cut Scores Powerpoint - EPSB Work Session

Validation Studies
    Cut Scores

    January 21, 2007
                       Purposes of
To meet statutory requirements and to select
individuals who have a minimum level of academic
proficiency & content knowledge to be presumed
capable of delivering education to children in the public

Testing partners with –

Admission requirements into teacher preparation programs,
including Praxis I tests, college admission exams, grade point
averages, and other academic proficiency assessments

Kentucky Teacher Internship Program (KTIP)

    What decision are we making?

    What purpose are we serving?

What’s the best way to make a decision?

       Two Types of Inferences

1. Select those with the highest
   levels of qualification

2. Select those who have minimum
            Minimum = low

Negative Consequences of increasing cut scores:

1. Reduce the number of qualified applicants for
2. Increase the number of emergency and conditional
3. Create teacher shortages
4. Reduce institutional QPI scores
5. Disparate impact

   In any case, the same people will be teaching.

Positive consequences of increasing cut scores:

1. Decrease the number of false positives

2. Perhaps marginally increase the quality of teaching

When teacher basic skills test scores have been
used as predictors of teacher performance, few
studies have shown any strong relationship,
and some studies have shown no relationship
at all.

Studies that have attempted to relate content
knowledge to teacher performance (most of
them confined to mathematics and science)
have shown modest results at best.

Standard Error of Measurement (SEM)
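
For reference, the slide names the SEM without defining it. A standard classical test theory definition, offered here as background rather than as the presenters' own formula, relates the SEM to the standard deviation of observed scores and the reliability coefficient of the test:

```latex
\mathrm{SEM} = \sigma_X \sqrt{1 - \rho_{XX'}}
```

Here \sigma_X is the standard deviation of observed scores and \rho_{XX'} is the test's reliability. With purely illustrative values of \sigma_X = 10 and \rho_{XX'} = 0.91, the SEM is 10 \sqrt{0.09} = 3 score points, so an observed score within a few points of a cut score may not reliably distinguish a candidate just above the cut from one just below it.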

Methodologies for establishing cut scores
  –   Angoff
  –   Contrasting Groups
  –   Bookmark
  –   Yeager-Mills (Body of Work)
  –   Other
                 Angoff Method

• Results in determination of whether a test is valid
  for use in KY & provides a recommended cut score

• Most widely used

• Requires teacher judgments

• Held up in most research studies as the method that
  produces the most stable results

• Generally accepted by the courts & professionals in
  the field
                     Angoff Method

    Validation Process/Cut Score Recommendation

• A panel of teachers, representative of the state and of each
  grade level and content area, is selected to review items on
  each test.

• Teachers are asked to estimate the proportion of persons with
  minimally acceptable skills in the content area who would be
  expected to get each item right. (At least 70% of the items
  must be judged job relevant in order for the test to be deemed
  a valid measure of performance).

• After all items have been rated, the judgments of the teachers
  are combined to recommend a cut score for the whole test.
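
The steps above can be sketched in code. This is a minimal illustration, not the EPSB's actual procedure: the slides do not say how judgments are combined, so averaging each judge's summed item ratings (a common Angoff convention) is an assumption, as are the function names and the sample ratings. The result is a raw-score cut; converting it to the reported score scale is not shown.

```python
from statistics import mean


def angoff_raw_cut(ratings):
    """Combine Angoff ratings into a recommended raw cut score.

    ratings[j][i] is judge j's estimate (0.0 to 1.0) of the proportion of
    minimally qualified candidates expected to answer item i correctly.
    Assumption: each judge's ratings are summed over items, then the
    judges' totals are averaged.
    """
    n_items = len(ratings[0])
    for row in ratings:
        if len(row) != n_items:
            raise ValueError("every judge must rate every item")
        if any(not 0.0 <= p <= 1.0 for p in row):
            raise ValueError("ratings must be proportions between 0 and 1")
    judge_totals = [sum(row) for row in ratings]
    return mean(judge_totals)


def meets_relevance_threshold(item_is_relevant, threshold=0.70):
    """Check whether enough items were judged job relevant.

    item_is_relevant holds one boolean per item (hypothetical input: how the
    panel reaches a per-item verdict is not described in the slides).
    """
    share = sum(item_is_relevant) / len(item_is_relevant)
    return share >= threshold


if __name__ == "__main__":
    # Two hypothetical judges rating a four-item test.
    panel = [
        [0.5, 0.5, 1.0, 0.0],
        [1.0, 0.5, 0.5, 0.0],
    ]
    print(angoff_raw_cut(panel))
    print(meets_relevance_threshold([True, True, True, False]))
```

Summing per judge before averaging makes it easy to inspect how far individual judges diverge from the panel mean, which is often reviewed before a recommendation is finalized.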
                        Cut Score

   Based on Decision Rules applied since May 1999

  Accept the recommendation of the validation panel unless:

a. The recommendation fell below the current passing score, or
b. The recommendation fell below the Southern Regional
   Education Board (SREB) average, or
c. The recommendation and SREB score fell below the 15th
   national percentile, or
d. The recommendation exceeded the 25th national percentile
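
The four exceptions above can be expressed as a simple check. This is a hedged sketch: the function and parameter names are hypothetical, the percentile arguments are assumed to be the scores that correspond to the 15th and 25th national percentiles, and the slides do not state what cut score is adopted when an exception applies, so the code only reports which exceptions are triggered.

```python
from typing import List, Optional


def triggered_exceptions(
    recommended: float,
    current: Optional[float],
    sreb_average: Optional[float],
    p15_score: float,
    p25_score: float,
) -> List[str]:
    """Return the letters of the decision-rule exceptions that apply.

    An empty list means the panel's recommendation would be accepted.
    current or sreb_average may be None when no such score exists
    (an assumption for tests that are new or not used regionally).
    """
    flags = []
    # a. Recommendation fell below the current passing score.
    if current is not None and recommended < current:
        flags.append("a")
    # b. Recommendation fell below the SREB average.
    if sreb_average is not None and recommended < sreb_average:
        flags.append("b")
    # c. Recommendation and SREB score both fell below the 15th percentile.
    if (
        sreb_average is not None
        and recommended < p15_score
        and sreb_average < p15_score
    ):
        flags.append("c")
    # d. Recommendation exceeded the 25th percentile.
    if recommended > p25_score:
        flags.append("d")
    return flags


if __name__ == "__main__":
    # Hypothetical values, not taken from any validation study.
    print(triggered_exceptions(158, 161, 160, 150, 170))
```

Returning the list of triggered rules, rather than a single yes/no, keeps a record of why a panel recommendation was set aside.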
                   Tests Validated Since May 1999 & Corresponding Cut Scores

                                                                    Test  Recommended Regulatory
                              Test                                 Number   Cut Score  Cut Score
Earth Science: Content Knowledge                                    0571       145        145
Principles of Learning & Teaching: Grades K-6                       0522       164        161
Principles of Learning & Teaching: Grades 5-9                       0523       166        161
Principles of Learning & Teaching: Grades 7-12                      0524       158        161
Speech Communication                                                0220       570        580
Education of Exceptional Students: Core Content Knowledge           0353       157        157
Elementary Education: Content Knowledge                             0014       148        148
Education of Exceptional Students: Mild to Moderate Disabilities    0542       165        172
Education of Exceptional Students: Core Content Knowledge           0353       157        157
Biology: Content Knowledge                                          0235       154        146
Chemistry: Content Knowledge                                        0245       161        147
Physics: Content Knowledge                                          0265       145        133

     Evidence must support that each test chosen is:

#1: valid for the purpose for which it is used

#2: anchored in reasonable expectations of job performance

#3: a reliable measure

#4: does not unfairly disadvantage members of demographic

 The Educational Testing Service (ETS)
  employs many psychometricians and
conducts many test studies that influence
   test development, maintenance, and
  revision, including bias reviews, DIF
  analysis, reliability coefficients, and
          calculation of p-values.

• Cut scores between the 15th and 25th
  percentiles, inclusive
• Greater than or equal to current cut scores
• Comparable to SREB average cut score
• Use disparate impact estimates as
  indicators of possible program
  performance reviews, combined with
  other information
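
The slides do not say how a disparate impact estimate is computed. One common screening heuristic, assumed here for illustration only, is the four-fifths guideline from the federal Uniform Guidelines on Employee Selection Procedures: a group's pass rate below 80% of the highest group's pass rate is generally treated as an indicator of adverse impact. The function names and counts below are hypothetical.

```python
def pass_rate(passed: int, tested: int) -> float:
    """Proportion of test takers in a group who met the cut score."""
    if tested <= 0:
        raise ValueError("tested must be positive")
    return passed / tested


def impact_ratio(group_rate: float, reference_rate: float) -> float:
    """Ratio of a group's pass rate to the highest group's pass rate."""
    if reference_rate <= 0:
        raise ValueError("reference_rate must be positive")
    return group_rate / reference_rate


def flags_possible_impact(ratio: float, threshold: float = 0.8) -> bool:
    """True when the ratio falls below the four-fifths threshold.

    This is a screening indicator, not a legal conclusion.
    """
    return ratio < threshold


if __name__ == "__main__":
    # Hypothetical counts: 30 of 50 pass in one group, 80 of 100 in another.
    ratio = impact_ratio(pass_rate(30, 50), pass_rate(80, 100))
    print(round(ratio, 2), flags_possible_impact(ratio))
```

A flagged ratio is a prompt for closer review combined with other information, consistent with the bullet above, rather than proof of discrimination.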

                 14th Amendment
      Due Process and Equal Protection Clause
                            Section 1
 “. . . nor shall any State deprive any person of life,
liberty, or property, without due process of law; nor
deny to any person within its jurisdiction the equal
                 protection of the laws.”

                          Section 5
 The Congress shall have the power to enforce, by
appropriate legislation, the provisions of this article.

       Civil Rights Act of 1964

     Title VII prohibits discrimination
 in employment on the basis of race,
color, religion, national origin, or sex.

     Title VI prohibits discrimination
   in federally funded programs or
activities on the basis of race, color, or
            national origin.

  Testing, in and of itself, is
usually only determined to be
   discriminatory if it has a
    disparate impact on a
        protected class.
            Prima Facie

          Disparate Impact case
“. . . established when: (1) plaintiff
identifies a specific employment
practice to be challenged; and (2)
through relevant statistical analysis
proves that the challenged practice
has an adverse impact on a protected
group.” Isabel v. City of Memphis,
404 F.3d 404, 411 (6th Cir.2005).
            Prima Facie

If the plaintiff meets this burden, the
employer must show that the
protocol in question has “a manifest
relationship to the employment” (the
so-called “business justification”).
Griggs, 401 U.S. at 432, 91 S.Ct. 849.
             Prima Facie

 If the employer succeeds, the plaintiff
must then show that other tests or
selection protocols would serve the
employer's interest without creating the
undesirable discriminatory effect.
“An employer cannot be held liable for
disparate impact if a legitimate business
policy results in workforce disparities.”
Bacon v. Honda of America Mfg., Inc., 370
F.3d 565, 579 (6th Cir.2004)
           Good Example of a Bad Example of “Business Justification:”
So we went in that little room there, and we looked at one another, and we knew we
were playing with fire. We had all these pressures.... You knew you were putting
both feet, both hands, in the middle of a philosophic war, a media war, a racial war
Finally somebody said, well, what can we take to the people? At that point we
forgot the university. We forgot everybody.... What kind of argument we can make
that the people gon buy? And some soul in there said, well could we make the
argument that the teachers ought to be smarter than half the students. And we
looked around. We said, them old boys down there in Letohatchee will buy that.
Everybody will buy it. We were all Alabamians. We all good old boys.
We said, we can sell that. Folks in Lowndes County will buy it. Folks up in
Wilburn will buy it. Even sophisticates up there in them Birmingham Newspapers,
that'll make sense that the teacher ought to be as smart as at least half the students
she's teaching.
So [one of the steering committee members] was commissioned to go to his office
and find out what the average ACT was for graduates, came back and said, I
believe it's 16.4. So our big decision was whether to go to 17 or 16. And the only
argument I think I recall them arguing for 16. Then we could go back out and say,
looka here. Of course, this is also a fallacious argument because the student-the
teacher never is as smart as half the students.... [But] that was the scientific basis
of it gentlemen and lady. It was just that scientific.
Groves v. Alabama State Board of Education, 776 F.Supp. 1518, 1530 (M.D.Ala. 1991)

Sharif by Salahuddin v. New York State Education Department, 709
F.Supp. 345 (S.D.N.Y. 1989)

Fields v. Hallsville Independent School District, 906 F.2d 1017 (5th
Cir. 1990)

Groves v. Alabama State Board of Education, 776 F.Supp. 1518, 1530
(M.D.Ala. 1991)

Association of Mexican-American Educators v. State of California,
231 F.3d 572 (9th Cir. 2000)

White v. Engler, 188 F.Supp.2d 730 (E.D.Michigan 2001)

Teacher testing is used to make inferences
    about future teacher performance

 Inferences make sense only in the context
              of a decision

The decision of interest with teacher tests is
 whether an individual has a minimum level
of academic proficiency and content knowledge

Cut scores are based on the relevance of test
  items to performance as a teacher in the
 appropriate content area, using a modified
             Angoff procedure

 Cut scores are recommended through the
 application of an agreed upon process but
       approved by the EPSB Board

A set of Decision Rules has been applied since
                   May 1999.

A recommended framework suggests that:

 Cut scores be –
     between the 15th & 25th percentiles
     greater than or equal to current cut scores
     comparable to SREB average cut scores
 Disparate impact be used as a possible
  indicator of program concerns

   Tests that have been scientifically
    tailored and vetted to measure a
    legitimate skill set related to the
certificate holder's duties will withstand
             judicial scrutiny.