Code for the paper "Language Models can Evaluate Themselves via Probability Discrepancy"
We introduce ProbDiff, a self-evaluation technique applicable to any LLM across tasks. Given a query, ProbDiff compares the probability an LLM assigns to its initial response with the probability it assigns to revised versions of that response, and uses the discrepancy between the two as a capability signal.
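The core probability-discrepancy idea can be sketched as follows. This is a minimal illustration, not the repo's API: a list of per-token probabilities stands in for real LLM scores, and the function names (`sequence_log_prob`, `prob_discrepancy`) are made up for this example.

```python
import math

def sequence_log_prob(token_probs):
    """Sum of log-probabilities a model assigns to each token of a response."""
    return sum(math.log(p) for p in token_probs)

def prob_discrepancy(initial_probs, revised_probs):
    """Toy ProbDiff-style score: log-probability of the revised response
    minus that of the initial response. The sign convention here is
    illustrative; see the paper for the exact definition."""
    return sequence_log_prob(revised_probs) - sequence_log_prob(initial_probs)

# Hypothetical per-token probabilities for an initial and a revised response.
initial = [0.2, 0.1, 0.3]
revised = [0.6, 0.5, 0.7]
print(prob_discrepancy(initial, revised))
```

In practice the per-token probabilities would come from the LLM itself (e.g. via the log-probabilities returned when scoring a sequence with `transformers` or `vllm`).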
Framework Versions:
python=3.10
torch=2.1.2
transformers=4.37.2
vllm=0.3.0
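The versions above can be pinned with pip, for example (package index and CUDA build may vary by environment):

```shell
pip install torch==2.1.2 transformers==4.37.2 vllm==0.3.0
```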
The Xiaohongshu blog writing dataset is stored in generation_task/data/redbook.
We provide xxx.sh scripts in each folder to reproduce the ProbDiff results.
If you find our work interesting or helpful, please cite this repo.
@inproceedings{xia-etal-2024-language,
    title = "Language Models can Evaluate Themselves via Probability Discrepancy",
    author = "Xia, Tingyu and Yu, Bowen and Wu, Yuan and Chang, Yi and Zhou, Chang",
    booktitle = "Findings of the Association for Computational Linguistics: ACL 2024",
    month = aug,
    year = "2024",
    publisher = "Association for Computational Linguistics",
}