This repository contains the code for the paper From Automation to Augmentation: Large Language Models Elevating the Essay Scoring Landscape. The dataset and other resources will be released after the anonymity period.
In this study, we investigate the effectiveness of LLMs, specifically GPT-4 and fine-tuned GPT-3.5, as tools for Automated Essay Scoring (AES). Our comprehensive set of experiments, conducted on both public and private datasets, highlights the remarkable advantages of LLM-based AES systems: superior accuracy, consistency, generalizability, and interpretability, with fine-tuned GPT-3.5 surpassing traditional grading models.
Additionally, we undertake LLM-assisted human evaluation experiments involving both novice and expert graders. One pivotal discovery is that LLMs not only automate the grading process but also enhance the performance of human graders. Novice graders, when provided with LLM-generated feedback, achieve a level of accuracy on par with experts, while experts become more efficient and maintain greater consistency in their assessments.
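To illustrate the general shape of rubric-conditioned LLM grading, here is a minimal sketch of building a multi-dimensional grading prompt and parsing scores from a model reply. The rubric dimensions, prompt wording, and reply format below are hypothetical illustrations, not the paper's actual criteria or prompts, and no API call is made.

```python
import re

# Hypothetical rubric dimensions for illustration only -- the paper's
# actual multi-dimensional grading criteria are defined by expert educators.
RUBRIC = {
    "content": "Relevance and depth of ideas (0-10)",
    "organization": "Logical structure and coherence (0-10)",
    "language": "Grammar, vocabulary, and style (0-10)",
}

def build_grading_prompt(essay: str) -> str:
    """Assemble a rubric-conditioned grading prompt for a chat LLM."""
    criteria = "\n".join(f"- {name}: {desc}" for name, desc in RUBRIC.items())
    return (
        "You are an expert essay grader. Score the essay below on each "
        f"criterion, then give brief feedback.\n\nCriteria:\n{criteria}\n\n"
        "Reply with lines of the form 'criterion: score'.\n\n"
        f"Essay:\n{essay}"
    )

def parse_scores(reply: str) -> dict:
    """Extract 'criterion: score' pairs from a model reply."""
    scores = {}
    for name in RUBRIC:
        m = re.search(rf"{name}\s*:\s*(\d+)", reply, re.IGNORECASE)
        if m:
            scores[name] = int(m.group(1))
    return scores

# Parsing a mocked model reply (no LLM is actually queried here).
reply = "content: 8\norganization: 7\nlanguage: 9\nFeedback: strong thesis."
print(parse_scores(reply))  # {'content': 8, 'organization': 7, 'language': 9}
```

In practice the prompt string would be sent to a chat completion endpoint and the reply parsed as above; per-dimension feedback text could be passed on to human graders.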
Contributions:

- We pioneer the exploration of LLMs' capabilities as AES systems, especially in intricate scenarios with tailored grading criteria.
- We introduce a substantial essay-scoring dataset, comprising 6,559 essays written by Chinese high school students, along with multi-dimensional scores provided by expert educators.
- Our findings from the LLM-assisted human evaluation experiments underscore the potential of LLM-generated feedback to elevate individuals with limited domain knowledge to a level comparable to experts.