AI in Teaching: Grading

Grading can be a tedious task: it demands a lot of time and attention, especially in a class with a high number of students, where the same work repeats over and over. Grading everything in one sitting is also more consistent, especially when applying penalties for various types of errors. Since the process is highly repetitive, a lot of it can be automated, such as certain types of checks in an assignment. When I was grading programming assignments in a class with 150 students, I coded my own test harness in Matlab that opens and runs the submissions sequentially and also gives me some time to eyeball the code. It was a lifesaver: since I was grading 5 assignments per month, it saved a tremendous amount of time. Another aspect is attention. While grading, I sometimes notice that I miss certain parts even when I have a rubric. Canvas solves this problem to a certain extent as long as we define detailed rubrics, but going over a detailed rubric 100+ times is still challenging. And lastly, as researchers, we need to allocate our time efficiently and balance our effort between teaching and research. For all these reasons, I think grading is a challenging task, especially in higher ed institutions.
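My original script was in Matlab, but the same idea can be sketched in Python. This is a minimal illustration, not my actual script: the submissions folder layout, the timeout value, and the run_student_script helper are placeholders I am assuming for the example. It runs each submission in turn, shows its output, and pauses so the grader can eyeball the code.

```python
"""Minimal sketch of a batch-grading harness (a Python analogue of the
Matlab script described above; names and layout are illustrative)."""
import subprocess
from pathlib import Path

SUBMISSIONS = Path("submissions")  # hypothetical folder of student .py files
TIMEOUT_S = 30                     # guard against infinite loops

def run_student_script(script: Path) -> None:
    """Run one submission, show its output, then pause for manual review."""
    print(f"\n=== {script.name} ===")
    try:
        result = subprocess.run(
            ["python", str(script)],
            capture_output=True, text=True, timeout=TIMEOUT_S,
        )
        print(result.stdout)
        if result.stderr:
            print("STDERR:", result.stderr)
    except subprocess.TimeoutExpired:
        print("Timed out: likely an infinite loop.")
    # Pause so the grader can eyeball the code and output before moving on.
    input("Press Enter for the next submission...")

for script in sorted(SUBMISSIONS.glob("*.py")):
    run_student_script(script)
```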

One possible solution I see to this problem is automating the grading process, at least the parts a computer can do. With the help of AI, what a computer can do is almost limitless, and grading platforms have in fact existed since the early 1990s. The grading task I would like to cover is essay grading, or Automated Essay Scoring (AES), as Wikipedia calls it. There are different applications that I will mention, but broadly speaking, the AI runs spelling, grammar, and lexical checks and determines a grade. One application is ETS's e-rater, which is used for the Analytical Writing section of the GRE. Scoring in that section depends heavily on grammar usage, lexical complexity, length, and argument development. E-rater evaluates the essay on all of these items using natural language processing and comes up with a grade. If the human grader agrees with the e-rater grade, the e-rater grade is used; otherwise, a second human grader grades the essay and the average of the two human grades is taken. This way, the cases where e-rater might be unreliable fall back to the human graders, while the e-rater check mitigates possible human-grader bias and reduces the human workload by up to 50%. There are also other applications, such as IntelliMetric (this one even gives personalized feedback to students!), Project Essay Grade, and PaperRater (the last one is free, in case you are interested!). While some argue that these algorithms cannot catch everything a human brain can, others find them more reliable than human graders, and there is an ongoing stream of research in this area. In my opinion, there is probably a lot of room for improvement in grading with AI, but as the tools and technology improve, we will get to use them a lot more in the future.
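To make the two ideas above concrete, here is a toy Python sketch of (1) scoring an essay from simple surface features and (2) the e-rater-style adjudication rule. The features, weights, and agreement threshold are my illustrative assumptions, not ETS's actual model; real systems use far richer NLP features.

```python
"""Toy sketch of feature-based essay scoring plus human/machine
adjudication (illustrative assumptions, not ETS's actual model)."""

def machine_score(essay: str) -> float:
    """Score 0-6 from crude proxies for length and lexical variety."""
    words = essay.split()
    length_feature = min(len(words) / 500, 1.0)  # longer essays score higher, capped
    lexical_feature = len(set(w.lower() for w in words)) / max(len(words), 1)
    return round(6 * (0.5 * length_feature + 0.5 * lexical_feature), 1)

def final_score(machine: float, human1: float, human2: float,
                threshold: float = 1.0) -> float:
    """If the first human agrees with the machine score (within the
    threshold), use the machine score; otherwise average the two human
    scores. (In the real workflow, the second human only grades when a
    disagreement occurs; both are passed here for simplicity.)"""
    if abs(machine - human1) <= threshold:
        return machine
    return (human1 + human2) / 2

essay = "Automated scoring can complement human graders when used carefully."
m = machine_score(essay)
print(final_score(m, human1=4.0, human2=5.0))
```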
