
zhangchaodesign/evaluaid


EvaluAId: Human-AI Collaborative Evaluation of Open-Ended Student Essays

Link to CHI 2026 Paper

Open-ended writing assignments are central to higher education, yet heterogeneous submissions and scale make evaluation difficult. Automated writing evaluation (AWE) promises speed but often trades away transparency and sidelines human judgment. This paper repositions the AI as an on-demand collaborator that can provide specific, targeted support. In a formative study, we expose leverage points in three cognitive dimensions: evidence identification, comparative judgment, and feedback composition. Guided by these insights, we build EvaluAId, which supports interactive rubric-content mapping, adaptive benchmarking and self-calibration, and personalized, rubric-aligned feedback synthesis. Through a within-subjects study with 12 TAs, we evaluate how this approach supports grading compared with a rubric+LLM chatbot and an LLM-based AWE; EvaluAId improved alignment with expert ratings and increased graders’ satisfaction. Finally, interviews with TAs, instructors, and students underscored the value of thoughtfulness supported by EvaluAId while surfacing practical considerations for integration into the classroom. Together, our results argue for deliberate, evidence-first, human-in-the-loop evaluation.

Setup
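If the project's dependencies are not yet installed, install them first. This is a minimal sketch assuming a standard Node.js/Next.js layout with a package.json at the repository root (not stated explicitly in this README):

```shell
# Install project dependencies (run once, from the repository root)
npm install
# or
yarn install
# or
pnpm install
```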

First, run the development server:

npm run dev
# or
yarn dev
# or
pnpm dev

Open http://localhost:3000 with your browser to see the result.

CHI 2026 Paper

EvaluAId: Human-AI Collaborative Evaluation of Open-Ended Student Essays
Chao Zhang, Kexin Ju, Xinyi Lu, Yu-Chun Grace Yen, and Jeffrey M. Rzeszotarski

Please cite this paper if you use the code or prompts in this repository.

Chao Zhang, Kexin Ju, Xinyi Lu, Yu-Chun Grace Yen, and Jeffrey M. Rzeszotarski. 2026. EvaluAId: Human-AI Collaborative Evaluation of Open-Ended Student Essays. In CHI Conference on Human Factors in Computing Systems (CHI '26), April 26-May 1, 2026, Yokohama, Japan. ACM, New York, NY, USA, 28 pages. https://doi.org/10.1145/3772318.3790814

@inproceedings{10.1145/3772318.3790814,
  author = {Zhang, Chao and Ju, Kexin Phyllis and Lu, Xinyi and Yen, Yu-Chun Grace and Rzeszotarski, Jeffrey M.},
  title = {EvaluAId: Human-AI Collaborative Evaluation of Open-Ended Student Essays},
  year = {2026},
  isbn = {9798400722783},
  publisher = {Association for Computing Machinery},
  address = {New York, NY, USA},
  url = {https://doi.org/10.1145/3772318.3790814},
  doi = {10.1145/3772318.3790814},
  abstract = {Open-ended writing assignments are central to higher education, yet heterogeneous submissions and scale make evaluation difficult. Automated writing evaluation (AWE) promises speed but often trades away transparency and sidelines human judgment. This paper repositions the AI as an on-demand collaborator that can provide specific, targeted support. In a formative study, we expose leverage points in three cognitive dimensions: evidence identification, comparative judgment, and feedback composition. Guided by these insights, we build EvaluAId, which supports interactive rubric-content mapping, adaptive benchmarking and self-calibration, and personalized, rubric-aligned feedback synthesis. Through a within-subjects study with 12 TAs, we evaluate how this approach supports grading compared with a rubric+LLM chatbot and an LLM-based AWE; EvaluAId improved alignment with expert ratings and increased graders’ satisfaction. Finally, interviews with TAs, instructors, and students underscored the value of thoughtfulness supported by EvaluAId while surfacing practical considerations for integration into the classroom. Together, our results argue for deliberate, evidence-first, human-in-the-loop evaluation.},
  booktitle = {Proceedings of the 2026 CHI Conference on Human Factors in Computing Systems},
  articleno = {867},
  numpages = {20},
  keywords = {Writing evaluation, student essays, human-AI collaboration},
  location = {Yokohama, Japan},
  series = {CHI '26}
}

Acknowledgements

We sincerely thank all TAs, instructors, and students who participated in our study for generously sharing their insights. We also thank the reviewers for their valuable comments and suggestions.

About

This repo open-sources the prototype from the CHI '26 paper "EvaluAId: Human-AI Collaborative Evaluation of Open-Ended Student Essays."
