This repo contains all the codes for SEScore implementation. SEScore is a reference-based text-generation evaluation metric that requires no pre-human-annotated error data, described in our paper Not All Errors are Equal: Learning Text Generation Metrics using Stratified Error Synthesis. from EMNLP 2022. Reader can refer https://research.google/pubs/pub51897/ for our WMT22 results!
Its effectiveness over prior methods like BLEU, BERTScore, BARTScore, PRISM, COMET and BLEURT has been demonstrated on a diverse set of language generation tasks, including translation, captioning, and web text generation. Readers have even described SEScore as "one unsupervised evaluation to rule them all" and we are very excited to share it with you!
We hosted our SEScore metric and running instructions on HuggingFace: https://huggingface.co/spaces/xu1998hz/sescore
python3 new_xlm_mbart_data.py -num_var 10 -lang en_XX -src case_study_src -ref case_study_ref -save save_file_name
python3 new_xlm_mbart_data.py -num_var 10 -lang de_DE -src src_folder -ref ref_folder -save save_file_name