Both sides of auto-grading argument miss the point

A recent story in the New York Times covers a software program by nonprofit EdX that will soon be available for free to any institution that wants to use it. Using sophisticated machine learning algorithms to train its artificial intelligence, the software will grade essays and short response questions and provide nearly instant feedback. Naturally there are strong supporters for the new software touting it for “freeing professors for other tasks” (like what?). And just as naturally there are strong critics who have formed a group called Professors Against Machine Scoring Essays in High-Stakes Assessment. From the group’s petition:

Let’s face the realities of automatic essay scoring. Computers cannot “read.” They cannot measure the essentials of effective written communication: accuracy, reasoning, adequacy of evidence, good sense, ethical stance, convincing argument, meaningful organization, clarity, and veracity, among others.

While criticism is certainly warranted, I find the quote to be somewhat bullish. Can these people really claim that they understand how they are able to read and measure the essentials of effective written communication well enough that they can look at a computer and say with confidence, “that can not do what I am doing, and here’s why”? It very well may be that current AI programs do not have the ability to comprehend written communication to a degree necessary to assign grades, but to make the argument that the software shouldn’t be used because “computers cannot ‘read'”, as if that were a self-evident fact is just poor communication.

Now to be fair, I disagree with the supporters of the software as well.

“There is a huge value in learning with instant feedback,” Dr. Agarwal said. “Students are telling us they learn much better with instant feedback.”

Ok, well, not that part, I agree with that part in principle. But what kind of feedback? Supposedly the software can generate a grade and also comments whether or not the essay was “on topic”. So a student could get instant feedback, which is great, and then edit and modify, which is great, and resubmit, which is also great… and then what? What would they be learning?

I promise to be highly skeptical of any answer to that question that isn’t “how to write an essay that receives high marks from an automatic grading AI”.

All this talk about feedback. What about feedback for the professor? I find reading through 60 essays just as tedious and time consuming as the next out-of-place grad student in a department that doesn’t value teaching, but I also recognize that reading those essays is a valuable way for me to gauge how I’m doing. Are the concepts that I think are important showing up? Are there any major communication issues? What about individuals, are some struggling, what can I do to help? How will I learn my students’ personalities and how that might affect their personal engagement with the material? How will I learn to be a better educator?

Granted, even though 60 feels overwhelming, it’s nowhere near 200 or more. I can’t even imagine trying to read through that many assignments myself. I’m confident that if I were force to I would not emerge with my sanity intact. This problem does not go unaddressed.

With increasingly large classes, it is impossible for most teachers to give students meaningful feedback on writing assignments, he said. Plus, he noted, critics of the technology have tended to come from the nation’s best universities, where the level of pedagogy is much better than at most schools.

“Often they come from very prestigious institutions where, in fact, they do a much better job of providing feedback than a machine ever could,” Dr. Shermis said. “There seems to be a lack of appreciation of what is actually going on in the real world.”

An “A” for recognizing the problem. But the proposed solution is nothing more than a patch. In fact, it’s worse, because it is a tool that will enable the continual ballooning of class size. And to what expense? Why don’t you rethink your solution and have it on my desk in the morning. I can’t promise instant feedback, but maybe, just maybe, the feedback provided will be the start to moving in a direction that actually addresses the underlying problems, rather than just using technology to hide them.

How will we build a Third System of education?

I have recently been reading about, as Mike Gancarz puts it in Linux and the Unix Philosophy, “The Three Systems of Man”. This is, to my understanding, a fairly well documented and often-observed concept in software design, possibly first referenced by Frederick Brooks in The Mythical Man-Month when he coined “the second system effect“. Gancarz seems to take the concept further, generalizing it to any system built by humans.

Man has the capacity to build only three systems. No mater how hard he may try, no matter how many hours, months, or years for which he may struggle, he eventually realizes that he is incapable of anything more. He simply cannot build a fourth. To believe otherwise is self-delusion.

The First System

Fueled by need, constricted by deadlines, a first system is born out of a creative spark. It’s quick, often dirty, but gets the job done well. Importantly it inspires others with the possibilities it opens up. The “what if”s elicited by a First System lead to…

The Second System

Encouraged and inspired by the success of the First System more people want to get on bored and offer their own contributions and add features they deem necessary. Committees are formed to organize and delegate. Everyone offers their expertise and everyone believes they have expertise, even when they don’t. The Second System has a marketing team devoted to selling its many features to eagerly awaiting customers, and to appeal to the widest possible customer base nearly any feature that is thought up is added. In reality, most users end up only using a small fraction of available features of The Second System, the rest just get in the way. Despite enjoying commercial success The Second System is usually the worse of the three. By trying to appease everyone (and more often then not, by not understanding , the committees in charge have created a mediocre experience. The unnecessary features add so much complexity that bugs are many and fixes take a considerable amount of effort. After some time, some users (and developers) start to recognize The Second System for what it is: bloatware.

The Third System

The Third System is built by people who have been burned by the Second System

Eventually enough people grow frustrated by the inefficiencies and bloat of The Second System that they rebel against it. They set out to create a new system that contains the essential features and lessons learned in the First and Second Systems, but leave out the crud that accumulated by the Second System. The construction of a Third System comes about either as a result of observed need, or as an act of rebellion against the Second System. Third Systems challenge the status quo set by Second Systems, and as such there is a natural tendency to those invested in The Second System to criticize, distrust and fear The Third System and those who advocate for it.

The Interesting History of Unix

Progression from First to Second to Third system always happens in that order, but sometimes a Third System can reset back to First, as is the case with Unix. While Gancarz argues that current commercial Unix is a Second System, the original Unix created by a handful of people at Bell Labs was a Third System. It grew out of the Multics project which was the Second System solution spun from the excitement of the Compatible Time-Sharing System (CTSS), arguably the first timesharing system ever deployed. Multics suffered so much from second-system syndrome that it collapsed under its own weight.

Linux is both a Third and Second system: while it shares many properties of commercial Unix that are Second System-like, it is under active development by people who came aboard as rebels of Unix and who put every effort into eliminating the Second System cruft associated with its commercial cousin.

Is our current Educational Complex a Second System?

I see many signs of second-system effect in our current educational system. Designed and controlled by committee, constructed to meed the needs of a large audience while failing to meet the individual needs of many (most?). Solutions to visible problems are also determined by committee and patches to solutions serve to cover up symptoms. Addressing the underlying causes would require asking some very difficult questions about the nature of the system itself. Something that those invested in it are not rushing to do.

Building a Third System

What would a Linux-esq approach to education look like? What are the bits that we would like to keep? What are the ugliest pieces that should be discarded first? And how will we weave it all together into a functional, useful system?