Multiple-choice items are scored by machines, but open-ended items are scored by subjective humans who are prone to errors. I know because I was one of them. In 1994, I was a graduate student looking for part-time work. After a five-minute interview I got the job of scoring fourth-grade, statewide reading comprehension tests. The for-profit testing company that hired me paid almost $8 an hour, not bad money for me at the time.
Todd Farley has written a tell-all book about his experiences in the standardized testing business, Making the Grades: My Misadventures in the Standardized Testing Industry. It may be found online at Google Books.
One of the tests I scored had students read a passage about bicycle safety. They were then instructed to draw a poster that illustrated a rule that was indicated in the text. We would award one point for a poster that included a correct rule and zero for a drawing that did not.
The first poster I saw was a drawing of a young cyclist, a helmet tightly attached to his head, flying his bike over a canal filled with flaming oil, his two arms waving wildly in the air. I stared at the response for minutes. Was this a picture of a helmet-wearing child who understood the basic rules of bike safety? Or was it meant to portray a youngster killing himself on two wheels?
I was not the only one who was confused. Soon several of my fellow scorers — pretty much people off the street, like me — were debating my poster, some positing that it clearly showed an understanding of bike safety while others argued that it most certainly did not. I realized then — an epiphany confirmed over a decade and a half of experience in the testing industry — that the score any student would earn mostly depended on which temporary employee viewed his response.
An excerpt published in Rethinking Schools contains these gems:
If test scoring is "scientifically-based research" of any kind, then I'm Dr. Frankenstein. In fact, my time in testing was characterized by overuse of temporary employees; scoring rules that were alternately ambiguous and bizarre; and testing companies so swamped with work and threatened with deadlines that getting any scores on tests appeared to be more important than the right scores. I'd say the process was laughable if not for the fact those test scores have become so important to the landscape of modern American education....
How about a Congressional investigation of ETS?
...A project manager for a test-scoring company addresses the supervisors hired to manage the scoring of a project. The project is not producing the results expected, to the dismay of the test-scoring company and its client, a state department of education. The project manager has been trying to calm the concerned employees, but she's losing patience. She's obviously had enough.
"I don't care if the scores are right," the project manager snarls. "They want lower scores, and we'll give them lower scores."