Both sides of the auto-grading argument miss the point

A recent story in the New York Times covers a software program from the nonprofit EdX that will soon be available for free to any institution that wants to use it. Trained with machine-learning techniques, the software will grade essays and short-answer questions and provide nearly instant feedback. Naturally, the software has strong supporters, who tout it for “freeing professors for other tasks” (like what?). And just as naturally, it has strong critics, who have formed a group called Professors Against Machine Scoring Essays in High-Stakes Assessment. From the group’s petition:

Let’s face the realities of automatic essay scoring. Computers cannot “read.” They cannot measure the essentials of effective written communication: accuracy, reasoning, adequacy of evidence, good sense, ethical stance, convincing argument, meaningful organization, clarity, and veracity, among others.

While criticism is certainly warranted, I find the quote somewhat overconfident. Can these people really claim to understand how they are able to read and measure the essentials of effective written communication well enough to look at a computer and say with confidence, “that cannot do what I am doing, and here’s why”? It may well be that current AI programs cannot comprehend written communication to the degree necessary to assign grades, but to argue that the software shouldn’t be used because “computers cannot ‘read’”, as if that were a self-evident fact, is just poor communication.
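To make concrete what the petitioners might mean, here is a toy sketch in Python (emphatically not EdX’s actual system, which has not been published): a bag-of-words similarity scorer, about the simplest “grader” one could build. Because it only counts words, it is blind to order, so a scrambled word salad receives exactly the same score as the original essay. That is a real limitation, but it is a limitation of bag-of-words models, not of “computers” as such.

```python
from collections import Counter
import math
import random

def bow(text):
    """Bag-of-words vector: word counts only; all word order is discarded."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Hypothetical reference answer the "grader" is calibrated against.
reference = "The French Revolution arose from fiscal crisis and social inequality."

essay = "The French Revolution arose from social inequality and fiscal crisis."
words = essay.split()
random.shuffle(words)
scrambled = " ".join(words)  # unreadable word salad

print(cosine(bow(essay), bow(reference)))      # high score
print(cosine(bow(scrambled), bow(reference)))  # exactly the same high score
```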

Now to be fair, I disagree with the supporters of the software as well.

“There is a huge value in learning with instant feedback,” Dr. Agarwal said. “Students are telling us they learn much better with instant feedback.”

OK, well, not that part; that part I agree with, in principle. But what kind of feedback? Supposedly the software can generate a grade along with comments on whether or not the essay was “on topic”. So a student could get instant feedback, which is great, and then edit and modify, which is great, and resubmit, which is also great… and then what? What would they be learning?

I promise to be highly skeptical of any answer to that question that isn’t “how to write an essay that receives high marks from an automatic grading AI”.
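In case that sounds far-fetched, here is what that “learning” could look like with the toy scorer from above (again, purely illustrative; I’m assuming a similarity threshold stands in for whatever the real model computes, and `game_the_grader` is a name I made up): a revision loop that never engages with an idea at all.

```python
import itertools

def game_the_grader(draft, reference, threshold=0.9):
    """'Revise' a draft by padding it with the reference's own words until
    the score clears the bar. No engagement with ideas is ever required."""
    revised = draft
    for word in itertools.cycle(reference.lower().split()):
        if cosine(bow(revised), bow(reference)) >= threshold:
            return revised
        revised += " " + word

# Reuses bow(), cosine(), and reference from the sketch above.
print(game_the_grader("I think stuff happened in France.", reference))
```

A real scoring model is far more sophisticated than a cosine similarity, but the student’s incentive is identical: optimize the number, not the understanding.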

All this talk about feedback. What about feedback for the professor? I find reading through 60 essays just as tedious and time-consuming as the next out-of-place grad student in a department that doesn’t value teaching, but I also recognize that reading those essays is a valuable way for me to gauge how I’m doing. Are the concepts I think are important showing up? Are there any major communication issues? What about individuals: are some struggling, and what can I do to help? How will I learn my students’ personalities and how they might affect their personal engagement with the material? How will I learn to be a better educator?

Granted, even though 60 feels overwhelming, it’s nowhere near 200 or more. I can’t even imagine trying to read through that many assignments myself. I’m confident that if I were forced to, I would not emerge with my sanity intact. This problem does not go unaddressed.

With increasingly large classes, it is impossible for most teachers to give students meaningful feedback on writing assignments, he said. Plus, he noted, critics of the technology have tended to come from the nation’s best universities, where the level of pedagogy is much better than at most schools.

“Often they come from very prestigious institutions where, in fact, they do a much better job of providing feedback than a machine ever could,” Dr. Shermis said. “There seems to be a lack of appreciation of what is actually going on in the real world.”

An “A” for recognizing the problem. But the proposed solution is nothing more than a patch. In fact, it’s worse, because it is a tool that will enable the continual ballooning of class sizes. And at what expense? Why don’t you rethink your solution and have it on my desk in the morning. I can’t promise instant feedback, but maybe, just maybe, the feedback will be the start of a move in a direction that actually addresses the underlying problems, rather than using technology to hide them.

2 thoughts on “Both sides of the auto-grading argument miss the point”

  1. I don’t know that I agree with your objection to the PAMSEHSA quote. The second sentence makes pretty clear what is meant by the first: that “reading” as we understand it in pedagogy involves much more than scanning, but the leveraging of many comprehension, integration, and balancing skills. It seems like fine communication because it launches from the widely shared perception (which I’d taken to be true?) that such skills are understood to be well beyond the reach of contemporary AI. (You seem to confirm this with your later characterization of a grading AI as something that any college student could game.) Am I misreading you?

    I thoroughly agree with your reciprocal point. An embarrassingly large proportion of my education has been what I’ve learned about a familiar subject through tutoring people in it… and that’s just the subject matter.

  2. I disagree with the quote primarily because it *doesn’t* mention AI at all. They say *computers* cannot read, cannot measure the essentials of communication, etc. I find this argument rather weak and … you probably can think of the word more quickly than I can look it up. It is written as though they expect their readers to immediately see “computers cannot read” and go “well OF COURSE they can’t read, they’re just machines!” when in fact that is a false argument. Our brains are machines too: the reason current AI programs still can’t match the quality of human reading comprehension has more to do with our lack of understanding of how *we* read and comprehend than with any inherent limitation of the technology itself. Yes, the fact that we don’t understand well enough how we comprehend written language is a valid argument that any AI we build won’t be as good as a human grader. But that’s not the argument I see them making.

    But in the end, like my title says, I find the details of that argument somewhat distracting (though, naturally, I think AI reading comprehension is a really exciting area of research, and we should continue pursuing it, as it will teach us more about ourselves).

    And I’m not sure I would say with confidence that students can’t and don’t game human graders. Especially when one human has 100 papers to read through, it’s going to be easy to miss those written as “this is what I think the grader wants to see” rather than “this is my understanding of the topic”.
