Unit 7: Validity and Reliability Slides
Validity
Refers to the appropriateness and meaningfulness of the inferences we make from assessment results
Categories of Validity
Content-Related - How adequately does the sample of assessment items represent the task domain to be measured?
Criterion-Related - How accurately does performance on the assessment predict future performance or estimate present performance on an alternative measure?
Categories of Validity (continued)
Construct-Related - How well can performance on the assessment be explained in terms of a psychological construct?
Consequences of Using the Assessment - How well did use of the assessment serve the intended purpose (e.g., improving performance) and avoid adverse effects (e.g., encouraging poor study habits)?
Enhancing Content-Related Validity
Methods:
– 1. Identify the general and specific objectives to be assessed
– 2. Construct an outline of the subject matter
– 3. Construct a table of specifications to ensure adequate sampling of items
– 4. Provide clear assessment directions and proper administration and scoring procedures
Establishing Criterion-Related Validity
Methods:
– Predictive Study (e.g., correlating ACT scores with later college GPA)
– Concurrent Study (e.g., correlating ACT scores with SAT scores obtained at about the same time)
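Both kinds of study come down to a correlation between scores on the assessment and scores on the criterion measure. The sketch below computes the Pearson correlation coefficient by hand so that the formula is visible. The function name `pearson_r` and the ACT/GPA numbers are hypothetical and are only there for illustration.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products of deviations from each mean
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Sums of squared deviations for each variable
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical predictive study: ACT scores and later college GPA for five students
act = [18, 22, 25, 28, 31]
gpa = [2.1, 2.6, 3.0, 3.2, 3.7]
print(round(pearson_r(act, gpa), 2))
```

A concurrent study would use the same function. The only difference is that the second list would hold criterion scores collected at about the same time, such as SAT scores.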
Correlation (Validity) Coefficient
1.00 = Perfect positive relationship
.65 to .99 = Strong positive relationship
.30 to .64 = Moderate positive relationship
.01 to .29 = Weak positive relationship
0.00 = No relationship
-.01 to -.29 = Weak negative relationship
-.30 to -.64 = Moderate negative relationship
-.65 to -.99 = Strong negative relationship
-1.00 = Perfect negative relationship
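The ranges above can be applied mechanically. The sketch below maps a coefficient onto these labels. The function name `describe_r` is hypothetical. Following the ranges in the table, the code rounds the coefficient to two decimal places before classifying it.

```python
def describe_r(r):
    """Label a correlation coefficient using the ranges in the table above."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation coefficient must lie between -1.00 and 1.00")
    r = round(r, 2)  # the table's ranges are given to two decimal places
    size = abs(r)
    if size == 0.0:
        return "No relationship"
    direction = "positive" if r > 0 else "negative"
    if size == 1.0:
        strength = "Perfect"
    elif size >= 0.65:
        strength = "Strong"
    elif size >= 0.30:
        strength = "Moderate"
    else:
        strength = "Weak"
    return f"{strength} {direction} relationship"

print(describe_r(0.48))
print(describe_r(-0.72))
```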
Construct-Related Validity
What is a construct?
– A construct is an individual characteristic that we assume exists in order to explain some aspect of behavior
Therefore,
– Construct-related validity refers to the extent to which an instrument measures a construct
Establishing Construct-Related Validity
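Convergent Validity - Calculate the correlation coefficient between scores on two instruments that measure the same construct (e.g., the Stanford-Binet and the WAIS); a strong positive correlation supports construct-related validity
Discriminant Validity - Calculate the correlation coefficient between scores on two instruments that measure different or opposing constructs (e.g., nAch and nAff); a weak or negative correlation supports construct-related validity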
Consequences of Using Assessment Results
Did use of the assessment:
– Improve motivation?
– Improve performance?
– Improve self-assessment skills?
– Contribute to transfer of learning?
– Encourage independent learning?
– Encourage good study habits?
– Contribute to a positive attitude?
What is Reliability?
Refers to the consistency of assessment results. . .
– Would we obtain similar results if we used a different version of a test?
– Would we obtain similar results if we used the same assessment at a later time?
– If a performance assessment is rated by different observers, would they rate performance the same?
Definitions
Raw Score - the score a student obtains on a test
Systematic Errors - external influences that raise or lower ALL the raw scores of students
Random Errors - internal influences that raise or lower the INDIVIDUAL raw scores of students
Standard Error of Measurement - a measure of the influence of random errors on scores
Reliability Coefficient - a correlation coefficient that indicates the relationship between two sets of measurements obtained from the same procedure
Methods of Calculating Reliability Coefficients
Test-Retest Method
Equivalent Forms Method
Test-Retest with Equivalent Forms Method
Internal Consistency Method
Internal Consistency Method
Procedure:
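– 1. Total each student's scores on the odd-numbered items and on the even-numbered items, then correlate the two sets of half-test totals (the split-half correlation)
– 2. Apply the Spearman-Brown formula, r_full = 2r_half / (1 + r_half), to estimate the reliability of the full-length test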
Standard Error of Measurement
Purpose:
– Used to calculate an estimated range (confidence band) within which a person's score would be expected to fall if he or she were to take the test again and again.
Calculating Standard Error of Measurement
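1. Calculate the standard deviation (SD) of a set of raw scores
2. Calculate the split-half correlation coefficient
3. Calculate the Spearman-Brown reliability coefficient (r)
4. Calculate the standard error of measurement: SEM = SD × √(1 − r)
5. Calculate the confidence band of a person's score if the test were taken again (e.g., score ± 1 SEM for about 68% confidence, score ± 2 SEM for about 95% confidence)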
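The final steps can be sketched as follows. The reliability coefficient is passed in as a parameter; in practice it would be the Spearman-Brown value from the earlier steps. The sketch uses the population standard deviation (`statistics.pstdev`), which is an assumption; `statistics.stdev` would give the sample SD instead. The function names, the class scores, and the reliability of .84 are hypothetical.

```python
from math import sqrt
from statistics import pstdev

def standard_error_of_measurement(raw_scores, reliability):
    """SEM = SD * sqrt(1 - r), where r is the test's reliability coefficient."""
    if not 0 <= reliability <= 1:
        raise ValueError("reliability must be between 0 and 1")
    sd = pstdev(raw_scores)  # population SD of the raw scores
    return sd * sqrt(1 - reliability)

def confidence_band(score, sem, n_sems=1):
    """Score +/- n_sems SEMs (about 68% for 1 SEM, about 95% for 2 SEMs)."""
    return score - n_sems * sem, score + n_sems * sem

raw_scores = [35, 40, 45, 50, 55, 60, 65]  # hypothetical class; SD = 10
sem = standard_error_of_measurement(raw_scores, reliability=0.84)
print(confidence_band(52, sem))            # about 48 to 56
print(confidence_band(52, sem, n_sems=2))  # about 44 to 60
```

With SD = 10 and r = .84, the SEM is 10 × √.16 = 4. For a student who scored 52, the band is therefore roughly 48 to 56 at ±1 SEM.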
Reliability of Criterion-Referenced Tests
Question:
– How consistently does the test classify masters and non-masters? In other words, if we gave an equivalent test to the students again, would the same students be identified as having mastered the material?
Calculating Reliability of Criterion-Referenced Tests
Steps:
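– 1. Construct a two-by-two table with Form A results (masters, non-masters) as the rows and Form B results (non-masters, masters) as the columns
– 2. Place the number of students who mastered both forms in the upper right cell, those who mastered only Form A in the upper left cell, and so on
– 3. Compute the percentage of consistency: the number of students classified the same way on both forms (masters on both or non-masters on both), divided by the total number of students, times 100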
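The sketch below builds the two-by-two table and the percentage of consistency. It assumes each student's result on each form has already been reduced to mastered / not mastered. It lays out the cells so that the upper right cell is "mastered both" and the upper left cell is "mastered only Form A". The function names and the 10-student data are hypothetical.

```python
def mastery_table(form_a, form_b):
    """2x2 table: rows = Form A (masters, non-masters); columns = Form B (non-masters, masters)."""
    if len(form_a) != len(form_b):
        raise ValueError("each student needs a result on both forms")
    table = [[0, 0], [0, 0]]
    for mastered_a, mastered_b in zip(form_a, form_b):
        row = 0 if mastered_a else 1
        col = 1 if mastered_b else 0
        table[row][col] += 1
    return table

def percent_consistency(table):
    """Percentage of students classified the same way on both forms."""
    (a_only, both), (neither, b_only) = table
    total = a_only + both + neither + b_only
    return 100 * (both + neither) / total

# Hypothetical results for 10 students (True = mastered)
form_a_mastered = [True, True, True, True, True, True, False, False, False, False]
form_b_mastered = [True, True, True, True, True, False, False, False, False, True]
table = mastery_table(form_a_mastered, form_b_mastered)
print(table)
print(percent_consistency(table))
```

In this example 5 students mastered both forms and 3 mastered neither. That gives 8 consistent classifications out of 10, or 80%.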
Calculating the Reliability of Performance Assessment
Procedure:
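– 1. Two or more judges rate the performance of students completing a task
– 2. Construct a table consisting of the rating-scale points and the number of students receiving each rating from each judge
– 3. Calculate the percentage of inter-rater agreement: the number of students given the same rating by the judges, divided by the total number of students, times 100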
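Below is a minimal sketch for two judges. It assumes the table cross-tabulates each student's rating from Judge 1 against the rating from Judge 2. Students on the "diagonal" (same rating from both judges) count as agreements. With more than two judges, one common approach is to compute this percentage for each pair of judges. The function names and the ratings on a 1-5 scale are hypothetical.

```python
from collections import Counter

def agreement_table(judge_1, judge_2):
    """Count students for each (Judge 1 rating, Judge 2 rating) combination."""
    if len(judge_1) != len(judge_2):
        raise ValueError("both judges must rate the same students")
    return Counter(zip(judge_1, judge_2))

def percent_agreement(judge_1, judge_2):
    """Percentage of students who received the same rating from both judges."""
    table = agreement_table(judge_1, judge_2)
    agreements = sum(count for (r1, r2), count in table.items() if r1 == r2)
    return 100 * agreements / len(judge_1)

# Hypothetical ratings of 10 students on a 1-5 scale
judge_1 = [4, 3, 5, 2, 4, 3, 5, 1, 2, 4]
judge_2 = [4, 3, 4, 2, 4, 3, 5, 2, 2, 3]
print(percent_agreement(judge_1, judge_2))
```

Here the judges gave the same rating to 7 of the 10 students, so the inter-rater agreement is 70%.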