From Automation to Augmentation: Large Language Models Elevating the Essay Scoring Landscape

This repository contains the code for the paper From Automation to Augmentation: Large Language Models Elevating the Essay Scoring Landscape. The dataset and other resources will be released after the anonymity period.

Overview

In this study, we investigate the effectiveness of LLMs, specifically GPT-4 and fine-tuned GPT-3.5, as tools for Automated Essay Scoring (AES). Our experiments on both public and private datasets show that LLM-based AES systems offer superior accuracy, consistency, generalizability, and interpretability, with fine-tuned GPT-3.5 surpassing traditional grading models.
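As a rough illustration of this scoring setup, the sketch below queries a chat LLM with a grading rubric and an essay and returns a multi-dimensional score. This is a minimal sketch, not the repository's actual pipeline: the rubric text, prompt wording, score dimensions, and output format are all assumptions made for illustration.

```python
# Minimal sketch (illustrative only): rubric-based essay scoring with the
# OpenAI Chat Completions API. The rubric, dimensions, and prompt wording
# here are placeholders, not the paper's actual grading criteria.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

RUBRIC = """You are an experienced essay grader. Score the essay from
0 to 10 on each dimension: content, language, and structure.
Reply as JSON, e.g. {"content": 7, "language": 8, "structure": 6,
"feedback": "one short paragraph of feedback"}."""

def score_essay(essay: str, model: str = "gpt-4") -> str:
    """Ask the model for rubric-based scores plus written feedback."""
    response = client.chat.completions.create(
        model=model,
        temperature=0,  # reduce sampling variance for more consistent scores
        messages=[
            {"role": "system", "content": RUBRIC},
            {"role": "user", "content": essay},
        ],
    )
    return response.choices[0].message.content

print(score_essay("My favourite season is autumn, because ..."))
```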

Additionally, we conduct LLM-assisted human evaluation experiments involving both novice and expert graders. A pivotal finding is that LLMs not only automate the grading process but also enhance the performance of human graders: novice graders, when provided with LLM-generated feedback, achieve accuracy on par with experts, while experts become more efficient and more consistent in their assessments.

Contributions:

  • We pioneer the exploration of LLMs' capabilities as AES systems, especially in intricate scenarios with tailored grading criteria.

  • We introduce a substantial essay-scoring dataset, comprising 6,559 essays written by Chinese high school students, along with multi-dimensional scores provided by expert educators.

  • Our findings from the LLM-assisted human evaluation experiments underscore the potential of LLM-generated feedback to elevate individuals with limited domain knowledge to a level of performance comparable to experts.
