This repository contains the code for our ACL 2025 paper: TN-Eval: Rubric and Evaluation Protocols for Measuring the Quality of Behavioral Therapy Notes.
Authors: Raj Sanjay Shah, Lei Xu, Qianchu Liu, Jon Burnsky, Drew Bertagnolli, Chaitanya Shivade
TN-Eval provides tools for generating behavioral therapy notes using large language models (LLMs) and evaluating them via automatic, rubric-based protocols.
Download Data
Download AnnoMI data from https://github.com/uccollab/AnnoMI/raw/refs/heads/main/AnnoMI-full.csv and save it as data/AnnoMI-full.csv.
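The download step above can be scripted. This is a minimal sketch using Python's standard library (the paper's codebase does not ship such a helper; the URL and destination path are the ones stated above):

```python
import urllib.request
from pathlib import Path

# Direct raw-file link from the AnnoMI repository (same URL as above)
URL = "https://github.com/uccollab/AnnoMI/raw/refs/heads/main/AnnoMI-full.csv"
dest = Path("data/AnnoMI-full.csv")
dest.parent.mkdir(parents=True, exist_ok=True)  # create data/ if missing

try:
    urllib.request.urlretrieve(URL, dest)
    print(f"saved {dest} ({dest.stat().st_size:,} bytes)")
except OSError as exc:
    # Network errors surface here; re-run or download manually
    print(f"download failed: {exc}")
```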
Generate Notes
python3 src/generate_soap_note.py --input data/AnnoMI-full.csv --output data/llm_notes/
Run Automatic Evaluations
python3 src/run_metrics_reference_free.py \
--note data/llm_notes/outputs_annomi_llama31_70B_high.json \
--output data/llm_notes/outputs_annomi_llama31_70B_high_with_eval.json
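To sanity-check the evaluation output, you can inspect the resulting JSON file. A minimal sketch that makes no assumptions about the schema (it only reports the top-level structure, and falls back to a message if the evaluation step has not been run yet):

```python
import json
from pathlib import Path

# Output path produced by run_metrics_reference_free.py above
path = Path("data/llm_notes/outputs_annomi_llama31_70B_high_with_eval.json")

if path.exists():
    data = json.loads(path.read_text())
    # Report the top-level structure without assuming a particular schema
    if isinstance(data, dict):
        print("keys:", sorted(data))
    else:
        print("records:", len(data))
else:
    print(f"{path} not found -- run the evaluation step first")
```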
You can find all data artifacts in our companion repository: TN-Eval-Data.
This includes:
- Human-written therapy notes
- Human evaluations of human notes and LLM-generated notes
- Automatic evaluations using LLaMA and Mistral models
If you use our data, please cite:
@inproceedings{shah2025tneval,
title={TN-Eval: Rubric and Evaluation Protocols for Measuring the Quality of Behavioral Therapy Notes},
author={Shah, Raj Sanjay and Xu, Lei and Liu, Qianchu and Burnsky, Jon and Bertagnolli, Drew and Shivade, Chaitanya},
booktitle={Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics: Industry Track},
year={2025}
}
See CONTRIBUTING for more information.
This project is licensed under the Apache-2.0 License.