Validity and Reliability
CSC426 – Values in Computer
• Types of Reliability and Validity
Introduction - Data
• Truth → Primary Data → Secondary Data
• Loss of information should be minimized
from level to level. We should preserve the
validity and reliability of the data.
• Some say: Validity implies Reliability, but
not the converse.
• Reliability (or Precision) is how
dependable/reproducible/repeatable the data is:
how confident are we that it will generate the
same result when the experiment is repeated?
Reliability is the consistency of a set of
measurements or of a measuring instrument.
• Validity (or Accuracy) is how close the measurements
are to the correct value (on average). A valid
measure is one that measures what it is
supposed to measure.
• Reliability = Precision = Low Variance
• Validity = Accuracy = Low Bias
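The two identities above can be illustrated with a minimal sketch, assuming the true value is known (which is rarely the case in practice). The readings, the constant `TRUE_VALUE`, and the helper `bias_and_variance` are hypothetical and chosen only for illustration.

```python
from statistics import mean, pvariance

def bias_and_variance(measurements, true_value):
    """Return (bias, variance) of repeated measurements of a known true value."""
    avg = mean(measurements)
    bias = avg - true_value                      # small |bias| -> valid (accurate)
    variance = pvariance(measurements, mu=avg)   # small variance -> reliable (precise)
    return bias, variance

TRUE_VALUE = 20.0  # hypothetical true temperature

# Hypothetical readings from two instruments
reliable_not_valid = [22.0, 22.1, 21.9, 22.0]   # tightly clustered, but off target
valid_not_reliable = [17.0, 23.0, 19.0, 21.0]   # centered on target, but scattered

for name, data in [("reliable, not valid", reliable_not_valid),
                   ("valid, not reliable", valid_not_reliable)]:
    bias, var = bias_and_variance(data, TRUE_VALUE)
    print(f"{name}: bias={bias:+.2f}, variance={var:.3f}")
```

The first instrument shows a large bias with almost no variance, and the second shows the reverse.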
Types of Reliability
• Inter-Rater or Inter-Observer
Used to assess the degree to which different
raters, observers, or instruments give consistent
estimates of the same phenomenon.
• Test-Retest Reliability
Used to assess the consistency of a
measure from one time to another.
• Parallel-Forms Reliability
Used to assess the consistency of the
results of two tests constructed in the same
way from the same content domain.
• Internal Consistency Reliability
Used to assess the consistency of results
across items within a test.
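As a rough sketch of how two of these reliability types might be quantified: test-retest reliability is often summarized by the correlation between two administrations, and internal consistency by Cronbach's alpha. Neither statistic is named in these slides, so treat both as assumed choices. The scores below are made up.

```python
from math import sqrt
from statistics import mean, pvariance

def pearson(x, y):
    """Pearson correlation between two equally long score lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def cronbach_alpha(items):
    """Cronbach's alpha; `items` holds one score list per test item,
    with subjects in the same order in every list."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]   # total score per subject
    item_variance_sum = sum(pvariance(item) for item in items)
    return k / (k - 1) * (1 - item_variance_sum / pvariance(totals))

# Hypothetical scores of four subjects
first_run  = [1, 2, 3, 4]   # test administered at time 1 (or item 1)
second_run = [2, 1, 4, 3]   # same test at time 2 (or item 2)

print(f"test-retest correlation: {pearson(first_run, second_run):.2f}")
print(f"Cronbach's alpha:        {cronbach_alpha([first_run, second_run]):.2f}")
```

Values near 1 suggest consistent results; what counts as "high enough" depends on the field.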
Types of Validity
• Face Validity
– Looks like it will closely model the system.
– E.g., Heuristics
• Content Validity
– By investigating the internals, it seems to be correct.
– E.g., An approximation model for the system.
• Criterion Validity
– When the data should measure something that is defined by another
– E.g., How stable the system is =
# Crashes / (CPU Load * # processes)
• Construct Validity
– A measuring tool/model is not available. A hypothesis from the structure of
reality is built, or an effect of the phenomenon is measured.
– E.g., Human relations and behaviors studies.
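The criterion-validity example above (# Crashes / (CPU Load * # processes)) can be written as a small function. This is only a sketch: the name `stability_criterion`, the units, and the sample numbers are assumptions, and the slides do not say how CPU load is measured.

```python
def stability_criterion(crashes, cpu_load, processes):
    """Criterion from the slide: # crashes / (CPU load * # processes)."""
    if cpu_load <= 0 or processes <= 0:
        raise ValueError("cpu_load and processes must be positive")
    return crashes / (cpu_load * processes)

# Hypothetical observation: 4 crashes at 50% CPU load with 16 processes
score = stability_criterion(crashes=4, cpu_load=0.5, processes=16)
print(score)
```

Note that, as the formula is written, a larger value corresponds to more crashes per unit of load.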
• If you have three models/instruments to estimate
a parameter in an experiment (e.g.,
temperature), which one do you choose?
– Mercury Thermometer?
– Alcohol Thermometer?
– Electric Thermometer?
• How do we describe the differences between them?
• How do we obtain the best estimates using one (or
more) of them?
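One possible answer to the last question is inverse-variance weighting. The slides do not name this technique, so it is an assumed choice here. The sketch further assumes that the three thermometers have independent, unbiased errors with known variances. The readings and variances are hypothetical.

```python
def combine_estimates(readings, variances):
    """Inverse-variance weighted mean of independent, unbiased readings.

    Returns (combined estimate, variance of the combined estimate)."""
    weights = [1.0 / v for v in variances]
    total = sum(weights)
    estimate = sum(w * r for w, r in zip(weights, readings)) / total
    return estimate, 1.0 / total

# Hypothetical readings (degrees C) and error variances:
# mercury, alcohol, electric
readings  = [20.0, 21.0, 22.0]
variances = [1.0, 2.0, 4.0]

estimate, variance = combine_estimates(readings, variances)
print(f"combined estimate: {estimate:.2f}, variance: {variance:.2f}")
```

The combined variance is lower than that of the best single thermometer, so reliability improves. A bias shared by all three instruments would not be reduced this way.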
• You have a new tunable algorithm.
According to its parameters, it can be
completely reliable but not valid, or
completely valid but not reliable, or
anywhere in between.
• How can you use this in reporting your
research (e.g., in writing a paper)?
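A toy version of such a tunable algorithm, as a sketch only: the parameter `lam`, the constants, and the simulation setup are all hypothetical. With `lam = 1` the estimator always returns the same fixed guess (completely reliable, not valid); with `lam = 0` it returns the noisy sample mean (valid on average, less reliable).

```python
import random
from statistics import mean, pvariance

TRUE_VALUE = 10.0    # hypothetical quantity being estimated
FIXED_GUESS = 7.0    # hypothetical constant the algorithm can fall back on

def tunable_estimate(samples, lam):
    """lam=0: plain sample mean; lam=1: always FIXED_GUESS."""
    return (1 - lam) * mean(samples) + lam * FIXED_GUESS

def simulate(lam, trials=2000, n=5, noise_sd=2.0, seed=0):
    """Repeat the experiment and return (bias, variance) of the estimates."""
    rng = random.Random(seed)
    estimates = [
        tunable_estimate([rng.gauss(TRUE_VALUE, noise_sd) for _ in range(n)], lam)
        for _ in range(trials)
    ]
    return mean(estimates) - TRUE_VALUE, pvariance(estimates)

for lam in (0.0, 0.5, 1.0):
    bias, var = simulate(lam)
    print(f"lam={lam:.1f}: bias={bias:+.3f}, variance={var:.3f}")
```

Reporting both numbers for each parameter setting, rather than a single error figure, lets readers see where on the reliability/validity range a result sits.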
• Measuring the second:
– The unit used has to be defined as a part of
the experiment’s parameters.
– Time cannot be measured with any degree of
accuracy. We do not have a frame of
• The definition of the second is very likely to
keep changing, and this process may never stop.
• Precision can reach an error of 1 second in billions of years.
• What is the Lab’s temperature at noon in
– A single measure at noon on July 10th might have a
– Multiple measures on the same day
– Parallel measures on the same day
– Multiple measures on different days
• Measuring on different days will lower the
variance (increase precision) until it reaches the
actual temperature variance.
• Getting a better (more accurate) thermometer
will improve the accuracy. This can also be
achieved by using more similar thermometers (if
error is not systematic = error in thermometer is