weijia-xu/hallucinations-in-nmt

Understanding and Detecting Hallucinations in Neural Machine Translation via Model Introspection

Data, model and source code for the TACL paper:

Understanding and Detecting Hallucinations in Neural Machine Translation via Model Introspection

Weijia Xu, Sweta Agrawal, Eleftheria Briakou, Marianna J. Martindale, Marine Carpuat

Prerequisite

Evaluation Data

The corpora in the data directory contain 408 English-Chinese and 423 German-English translations annotated for detached hallucinations ({hallucinated: 1, non-hallucinated: 0}). Each line contains the source sentence, the model translation, and the label, separated by tabs. The annotation guidelines can be found in the paper.
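As a rough illustration of the line format described above, the following sketch parses tab-separated lines into (source, translation, label) records. The names `AnnotatedPair` and `read_annotated_pairs` are hypothetical and not part of this repository, and the sample sentences are made up rather than taken from the released corpora.

```python
import io
from typing import Iterable, List, NamedTuple


class AnnotatedPair(NamedTuple):
    source: str
    translation: str
    hallucinated: bool  # True for label 1, False for label 0


def read_annotated_pairs(lines: Iterable[str]) -> List[AnnotatedPair]:
    """Parse tab-separated lines of the form: source, translation, label."""
    pairs = []
    for line_number, line in enumerate(lines, start=1):
        line = line.rstrip("\r\n")
        if not line:
            continue  # skip blank lines
        fields = line.split("\t")
        if len(fields) != 3:
            raise ValueError(
                f"line {line_number}: expected 3 fields, got {len(fields)}"
            )
        source, translation, label = fields
        if label not in ("0", "1"):
            raise ValueError(f"line {line_number}: unexpected label {label!r}")
        pairs.append(AnnotatedPair(source, translation, label == "1"))
    return pairs


# Made-up example lines, not taken from the released corpora.
sample = io.StringIO(
    "Guten Morgen.\tGood morning.\t0\n"
    "Danke.\tThe hotel is near the station.\t1\n"
)
pairs = read_annotated_pairs(sample)
hallucination_rate = sum(p.hallucinated for p in pairs) / len(pairs)
print(len(pairs), hallucination_rate)
```

To read one of the released files, an open file handle (for example `open(path, encoding="utf-8")`) can be passed in place of the `StringIO` object, assuming the files are UTF-8 encoded.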

Models

We release the models used to produce the translations: German-English, English-Chinese. The models are trained based on this codebase. We also release the BPE files for data preprocessing here.

Hallucination Detector

  • code/extract-token-contributions.py extracts the relative token contributions for an input file of tab-separated source and translation pairs and dumps the results into a pickle file.
  • code/classifier.py trains and tests the classifier on features extracted from the relative token contributions.

You may need to change the paths at the top of each script before running it.
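If the extraction script expects only the two tab-separated columns (source and translation), an input file could be derived from the annotated data by dropping the label column. The sketch below shows one way to do this. The helper name `to_pair_lines` is hypothetical, and whether the script also accepts the three-column annotated files directly is not stated here, so treat this as an assumption.

```python
from typing import Iterable, Iterator


def to_pair_lines(annotated_lines: Iterable[str]) -> Iterator[str]:
    """Yield 'source<TAB>translation' lines, dropping any trailing label column."""
    for line in annotated_lines:
        line = line.rstrip("\r\n")
        if not line:
            continue  # skip blank lines
        fields = line.split("\t")
        if len(fields) < 2:
            raise ValueError(f"expected at least 2 tab-separated fields: {line!r}")
        yield "\t".join(fields[:2])


# Made-up example line, not taken from the released corpora.
example = ["Guten Morgen.\tGood morning.\t0\n"]
pair_lines = list(to_pair_lines(example))
print(pair_lines)
```

The resulting lines can be written to a file, one per line, and that file's path set at the top of the extraction script.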
