Assessment Reliability and Validity

Reliability is the degree to which a measurement of skill or knowledge yields consistent results under varying conditions, such as different raters, occasions, or test forms. A highly reliable measure produces similar results each time it is used. There are four principal ways to estimate the reliability of a measure:

  1. Inter-observer (inter-rater) reliability: The extent to which different observers or evaluators who examine the same presentation, demonstration, project, paper, or other performance agree on the overall rating on one or more dimensions.
  2. Test-retest reliability: The extent to which the same test items, or the same kind of performance, evaluated at two different times yield similar results.
  3. Parallel-forms reliability: The extent to which two different forms of a measure, designed to assess the same knowledge or skill, yield comparable results.
  4. Split-half reliability: The extent to which one half of a set of test items yields results similar to those of the other half.
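
Each of these estimates comes down to comparing two sets of scores or ratings. The Python sketch below is a minimal illustration, not a prescribed procedure, and the function names and sample data are hypothetical. It assumes test-retest and parallel-forms reliability are computed as a Pearson correlation, split-half reliability uses an odd/even item split stepped up with the Spearman-Brown correction (a common choice the text does not specify), and inter-observer agreement uses Cohen's kappa for two raters assigning categorical ratings.

```python
from collections import Counter
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores.

    Used here for test-retest (same test, two occasions) and
    parallel-forms (two forms, same examinees) reliability.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length score lists with at least 2 entries")
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def split_half_reliability(item_scores):
    """Odd/even split-half reliability with the Spearman-Brown correction.

    item_scores holds one list of item scores per examinee. A half-length
    test is less reliable than the full test, so the half-test correlation
    is stepped up with r_full = 2 * r_half / (1 + r_half).
    """
    odd_totals = [sum(row[0::2]) for row in item_scores]   # items 1, 3, 5, ...
    even_totals = [sum(row[1::2]) for row in item_scores]  # items 2, 4, 6, ...
    r_half = pearson_r(odd_totals, even_totals)
    return 2 * r_half / (1 + r_half)


def cohen_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two raters' categorical ratings."""
    n = len(rater_a)
    if n == 0 or n != len(rater_b):
        raise ValueError("need two equal-length, non-empty rating lists")
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    categories = set(rater_a) | set(rater_b)
    # Agreement expected by chance, given each rater's category frequencies.
    expected = sum(count_a[c] * count_b[c] for c in categories) / n ** 2
    if expected == 1:
        raise ValueError("kappa is undefined when both raters use one identical category")
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    # Hypothetical data for illustration only.
    time1 = [10, 12, 15, 18, 20]
    time2 = [11, 13, 14, 19, 21]
    print("test-retest r:", round(pearson_r(time1, time2), 3))

    items = [[1, 1, 1, 1], [1, 1, 1, 0], [1, 0, 0, 0], [0, 0, 0, 0]]
    print("split-half (Spearman-Brown):", round(split_half_reliability(items), 3))

    a = ["pass", "pass", "fail", "pass", "fail", "pass"]
    b = ["pass", "fail", "fail", "pass", "fail", "pass"]
    print("Cohen's kappa:", round(cohen_kappa(a, b), 3))
```

Kappa is used here rather than raw percent agreement because two raters will agree on some ratings by chance alone. In the sample data the raters agree on 5 of 6 ratings, but kappa discounts the agreement expected from how often each rater used each category.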

Validity is the degree to which an instrument measures what it is supposed to measure. (Face validity is a narrower notion: whether the instrument appears, on its surface, to measure what it claims.) Reliability is a necessary but not sufficient condition for validity. If an instrument is not reliable over time, it cannot be valid, because its results vary depending on when it is administered. An instrument can therefore be neither reliable nor valid, reliable but not valid, or both reliable and valid, but it cannot be valid without being reliable.
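
The claim that reliability limits validity can be made precise in classical test theory. As a sketch, assume the classical model in which each observed score is a true score plus uncorrelated random error. Let r_XY be the observed correlation between a test X and a criterion Y, r_XX' and r_YY' their reliabilities, and rho_TxTy the correlation between their true scores (these symbols are introduced here for illustration, not taken from the sources cited below). Because |rho_TxTy| <= 1 and r_YY' <= 1:

```latex
r_{XY} = \rho_{T_X T_Y}\,\sqrt{r_{XX'}\,r_{YY'}}
\quad\Longrightarrow\quad
|r_{XY}| \le \sqrt{r_{XX'}\,r_{YY'}} \le \sqrt{r_{XX'}}
```

For example, a test with reliability 0.49 cannot correlate with any criterion more strongly than the square root of 0.49, which is 0.7, and a test with zero reliability cannot correlate with anything.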

Mason and Bramble (1989) report three primary approaches to test validity, and Patton (2002) details their associated subtypes of measurement validity:

  1. Content validity: Ensures that the test items adequately represent the full range of content being measured. A panel of content experts is typically consulted to identify the breadth of content the items should cover.
  2. Criterion validity: Concerns how well a measure agrees with an external criterion. It is demonstrated by comparing scores on the selected measure with scores on another measure already known to be valid.
    1. Predictive validity: Shows that the measure predicts a related outcome. Typically, one measure is taken at an earlier time and used to predict a later measure.
    2. Concurrent validity: Exists when the measure correlates positively with a previously validated measure administered at about the same time. The two measures may assess the same construct or different but related constructs.
  3. Construct validity: Ensures that the assessment measures the construct it claims to measure. Construct validity can be demonstrated by comparing the test performance of groups expected to differ on the construct (a differential-groups study) or by testing participants before and after an intervention targeting the construct (an intervention study). This type of validity can also show how the measure relates to other measures as defined by the construct.
    1. Discriminant validity: Shows that measures that should not be related are, in fact, unrelated. A lack of correlation between them is the expected evidence of discriminant validity.
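
The statistical forms of validity evidence above are often summarized with a few standard statistics. The Python sketch below is a minimal illustration under stated assumptions: the function names and sample scores are hypothetical, criterion-related validity is expressed as a correlation with the criterion, the differential-groups comparison uses Cohen's d with a pooled standard deviation, the intervention study is summarized as a mean pre-to-post gain, and the 0.3 cutoff for a "low" discriminant correlation is an illustrative choice rather than a standard. Content validity is left out because it rests on expert judgment rather than a score computation.

```python
from math import sqrt
from statistics import mean, stdev


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length score lists with at least 2 entries")
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def criterion_validity(scores, criterion):
    """Validity coefficient: correlation with a criterion measure.

    Predictive if the criterion is collected later; concurrent if it is
    collected at about the same time. The computation is the same.
    """
    return pearson_r(scores, criterion)


def differential_groups_effect(group_high, group_low):
    """Cohen's d (pooled SD) between groups expected to differ on the construct."""
    n1, n2 = len(group_high), len(group_low)
    pooled_var = ((n1 - 1) * stdev(group_high) ** 2
                  + (n2 - 1) * stdev(group_low) ** 2) / (n1 + n2 - 2)
    return (mean(group_high) - mean(group_low)) / sqrt(pooled_var)


def intervention_gain(pre, post):
    """Mean pre-to-post gain for the same participants (intervention study)."""
    if len(pre) != len(post):
        raise ValueError("pre and post must cover the same participants")
    return mean([b - a for a, b in zip(pre, post)])


def shows_discriminant_validity(scores, unrelated, threshold=0.3):
    """True if the correlation with a theoretically unrelated measure is small.

    The 0.3 cutoff is an illustrative assumption, not a standard.
    """
    return abs(pearson_r(scores, unrelated)) < threshold


if __name__ == "__main__":
    # Hypothetical scores for illustration only.
    new_measure = [55, 60, 65, 70, 75]
    established = [50, 58, 66, 69, 80]  # previously validated measure
    unrelated = [4, 1, 5, 2, 3]         # measure of a theoretically unrelated trait
    print("concurrent validity r:", round(criterion_validity(new_measure, established), 3))
    print("discriminant check passed:", shows_discriminant_validity(new_measure, unrelated))

    experts = [80, 85, 90, 88, 92]
    novices = [60, 65, 70, 62, 68]
    print("differential-groups d:", round(differential_groups_effect(experts, novices), 2))

    pre, post = [40, 45, 50], [55, 60, 58]
    print("intervention gain:", round(intervention_gain(pre, post), 2))
```

A full construct-validity study would also check whether a group difference or gain is statistically significant and whether it could be explained by something other than the construct; this sketch only computes the descriptive statistics.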