Psychometrics

Psychometric Consequences

A classical test theory (CTT) analysis might show that a particular item is
psychometrically unfit (say, because its discrimination index is negative). If
preliminary scores have not been released, then any decision by the instructor
about how to treat the item in the overall scores will likely have little
apparent impact on the students. In fact, a student might conclude that the
instructor was wise to weed out "tricky" items to make the overall scores more
reliable and valid. However, if scores were released before the psychometric
analysis, a student might be surprised by a revised score. Let’s look at an
example to illustrate the point.

Heuristic Example

Say an instructor has a 10-item instrument, with items Q1 through Q10. Amanda
gets all of the items correct, so she receives 10/10 = 100%. Babak misses item
Q1, so he receives 9/10 = 90%. Chanda misses item Q2, so she also receives
9/10 = 90%. Note these are preliminary scores because the instructor has not
yet performed a psychometric analysis.

Assume Q1 is deemed
psychometrically unfit. The instructor has two options: (1) toss out the item,
or (2) give everyone credit for the item. Let’s see how each of these decisions
plays out with our three students.

For Amanda, if the item is
tossed out, she still has a perfect 9/9 = 100%. Giving her credit on an item
for which she has already received credit would not change her 10/10 = 100%.

For Babak, if the item is
tossed out, he now has 9/9 = 100% because Q1 was the item he missed. If he is
given credit for the missed item, he would earn 10/10 = 100%.

For Chanda, if the item is
tossed out, she now has 8/9 = 88.9% because Q1 was not the item she missed.
Giving her credit on an item for which she has already received credit would
not change her 9/10 = 90% score.
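
The arithmetic above can be checked with a short script. This is only a
minimal sketch: the function name `rescore`, the policy labels "toss" and
"credit_all", and the way the responses are stored are all hypothetical names
chosen for illustration, not part of any scoring package.

```python
# Hypothetical representation of the example: the items each student missed.
ITEMS = [f"Q{i}" for i in range(1, 11)]
MISSED = {"Amanda": set(), "Babak": {"Q1"}, "Chanda": {"Q2"}}


def rescore(missed, items, unfit=None, policy=None):
    """Return a percent score (one decimal place) under a rescoring policy.

    policy=None         -> preliminary score, every item counted
    policy="toss"       -> the unfit item is dropped from the test
    policy="credit_all" -> every student is marked correct on the unfit item
    """
    if policy == "toss":
        kept = [q for q in items if q != unfit]
        correct = sum(q not in missed for q in kept)
        total = len(kept)
    elif policy == "credit_all":
        correct = sum(q not in missed or q == unfit for q in items)
        total = len(items)
    else:
        correct = sum(q not in missed for q in items)
        total = len(items)
    return round(100 * correct / total, 1)


for name, missed in MISSED.items():
    print(
        name,
        rescore(missed, ITEMS),                      # preliminary
        rescore(missed, ITEMS, "Q1", "toss"),        # option (1)
        rescore(missed, ITEMS, "Q1", "credit_all"),  # option (2)
    )
```

Each printed row gives a student's preliminary score, the score if Q1 is
tossed out, and the score if everyone is given credit for Q1.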

Discussion

Now, let’s consider the consequences. If the instructor released the
preliminary scores, and then the revised scores based on the "toss out"
decision, Amanda would stay at 100%, Babak would increase from 90% to 100%,
and Chanda would decrease from 90% to 88.9%. Babak has reason to be pleased,
but Chanda might think the instructor’s decision was unfair because she fell
below the 90% cutoff that often separates a grade of A from a grade of B.

If the instructor instead releases revised scores based on the "give everyone
credit" decision, Amanda would stay at 100%, Babak would increase from 90% to
100%, and Chanda would stay at 90%. Again, Babak has reason to be pleased with
the instructor’s decision, but Chanda might think it was unfair because
Babak’s score increased while hers stayed the same.

Conclusion

Tossing out psychometrically
unfit items may negatively impact the scores of some students, whereas giving
everyone credit may artificially elevate the scores of other students. Neither
decision is “correct.” Nevertheless, instructors should carefully weigh their
decision to release preliminary scores, as well as what to do after analyzing
the items.

More importantly, the impact of either decision shrinks as the number of items
increases (left as an exercise for the reader). This property, along with the
increase in internal consistency (e.g., as measured by Cronbach’s alpha) that
typically accompanies an increase in the number of items, should counteract
complaints from students that a test is too long.
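
One way to work the exercise is to compute the score change directly. The
sketch below is an illustration only; the function name `toss_change` and the
chosen test lengths are assumptions made for this example. It holds a student
at 90% correct, tosses out one item that the student answered correctly
(Chanda's situation), and reports the change for longer and longer tests.

```python
from fractions import Fraction


def toss_change(n_items, n_correct, got_tossed_item_right):
    """Change in proportion correct after one item is tossed out."""
    before = Fraction(n_correct, n_items)
    # A student who answered the tossed item correctly loses one correct
    # answer; a student who missed it keeps the same number correct.
    kept_correct = n_correct - 1 if got_tossed_item_right else n_correct
    after = Fraction(kept_correct, n_items - 1)
    return after - before


# A student at 90% who answered the tossed item correctly, on tests of
# increasing length.
for n in (10, 20, 50, 100):
    change = toss_change(n, n * 9 // 10, True)
    print(n, f"{float(change) * 100:+.2f} percentage points")
```

For 10, 20, 50, and 100 items the drop is about 1.11, 0.53, 0.20, and 0.10
percentage points, respectively. In general, a student with k of n items
correct who answered the tossed item correctly loses (n - k) / (n(n - 1)),
while a student who missed the tossed item gains k / (n(n - 1)); both
quantities shrink as n grows.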

Resources

Classical item analysis at the Instructional
Assessment Resources (IAR).

Software

Iteman is stand-alone software designed to
provide detailed item and test analysis reports using classical test theory
(CTT). $

Lertap is an Excel-based CTT item, test, and survey analysis application.

Xcalibre is stand-alone software for item
response theory (IRT) analysis of assessment data.