I recently received this question about standard setting from a certification director:
We’re revising our certification with some new items, a few revised items, and a large number of existing items with data. For the revised and existing items, should we give the Angoff judges the p-values of the items?
What Data Should Judges Get?
The answer is a qualified ‘no’.
Think of it this way.
Suppose you have a cut score of 70 on your existing test, and you’d like to give your Angoff judges the p-values of the revised and existing items. Why is that a problem?
The answer is because for an Angoff rating, you’re asking the judges to give their percent-correct estimate for minimally qualified candidates.
But the p-value data from your test is from all candidates—including failing candidates and highly qualified candidates, not just minimally qualified candidates.
If you want to give your Angoff judges helpful data, give them item percent-correct data computed only from the subset of candidates who scored between 70 and 75 on the test. These candidates, sitting at or just above the cut score, can justifiably be seen as minimally qualified.
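As a rough sketch of that calculation, the snippet below computes conditional p-values from candidate response data. The data layout and function name are hypothetical, chosen just to illustrate the idea of restricting the percent-correct statistic to the near-cut score band.

```python
def conditional_p_values(candidates, items, low=70, high=75):
    """Percent-correct per item, computed only from candidates whose
    total score falls in [low, high] -- a proxy for 'minimally qualified'.

    candidates: list of (total_score, responses) pairs, where responses
                maps item_id -> 1 (correct) or 0 (incorrect).
    items:      list of item_ids to report on.
    """
    # Keep only candidates in the score band around the cut score.
    near_cut = [resp for score, resp in candidates if low <= score <= high]
    if not near_cut:
        raise ValueError("no candidates scored in the requested band")
    return {
        item: sum(resp.get(item, 0) for resp in near_cut) / len(near_cut)
        for item in items
    }
```

For example, with four candidates scoring 72, 74, 90, and 60, only the first two fall in the 70–75 band, so each item's p-value is computed from those two response records alone.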
What Should Judges Review?
The recommended Angoff procedure is for the judges to take the items themselves and reflect on the difficulty they anticipate minimally qualified candidates will have in taking the items.
But there is no single recommended process for presenting the items to judges. The items should show the answer choices in the same order in which candidates see them, with no indication of which choice is correct.
If the item choices are randomized in the test, then randomize the choices as presented to judges.
Some item banking systems have the first choice as the item key, and it’s convenient to send items listed this way to the judges. But the choices should be randomized so the judges aren’t tipped off to the correct answer.
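A minimal sketch of that export step, assuming the banking-system convention described above (the key stored as the first choice): shuffle each item's choices before sending them to judges, so the key's position gives nothing away.

```python
import random

def shuffle_choices_for_judges(item_choices, seed=None):
    """Return a new list of an item's answer choices in random order.

    item_choices: list of choice strings, with the key listed first
                  (per the banking-system convention assumed here).
    seed:         optional seed for a reproducible export.
    """
    rng = random.Random(seed)
    shuffled = item_choices[:]  # copy so the bank record isn't mutated
    rng.shuffle(shuffled)
    return shuffled
```

Shuffling a copy (rather than the bank record itself) keeps the keyed order intact for scoring while the judges' packet shows an order that carries no information about the correct answer.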