Understanding and Detecting Hallucinations in Neural Machine Translation via Model Introspection

Data, models, and source code for the TACL paper:

Understanding and Detecting Hallucinations in Neural Machine Translation via Model Introspection

Weijia Xu, Sweta Agrawal, Eleftheria Briakou, Marianna J. Martindale, Marine Carpuat

Prerequisite

Clone the the-story-of-heads repository and follow its installation instructions.
Then put the code files from this release in the the-story-of-heads directory.

Evaluation Data

The corpora in data contain 408 English-Chinese and 423 German-English translations annotated for detached hallucinations ({hallucinated: 1, non-hallucinated: 0}). Each line contains the source sentence, model translation, and the label separated by tabs. The guidelines for annotation can be found in the paper.
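The three-column layout described above can be read with a few lines of Python. This is a minimal sketch, not part of the released code: the function names `read_annotations` and `label_counts` are illustrative, the file path in the usage comment is a placeholder, and the sketch assumes UTF-8 files with exactly three tab-separated fields per non-empty line.

```python
from collections import Counter


def read_annotations(path):
    """Read (source, translation, label) triples from a tab-separated file.

    Assumes each non-empty line has exactly three fields and that the
    label is 1 (hallucinated) or 0 (non-hallucinated).
    """
    examples = []
    with open(path, encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            line = line.rstrip("\r\n")
            if not line:
                continue  # tolerate blank lines
            fields = line.split("\t")
            if len(fields) != 3:
                raise ValueError(
                    f"{path}:{line_no}: expected 3 fields, got {len(fields)}"
                )
            source, translation, label = fields
            if label not in ("0", "1"):
                raise ValueError(f"{path}:{line_no}: unexpected label {label!r}")
            examples.append((source, translation, int(label)))
    return examples


def label_counts(examples):
    """Count how many examples carry each label."""
    return Counter(label for _, _, label in examples)


# Usage (the path is a placeholder, not an actual file name):
# examples = read_annotations("data/your-annotated-file.tsv")
# print(label_counts(examples))
```

Checking the label counts first is a quick way to see how imbalanced the two classes are before training or evaluating a detector.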

Models

We release the models used to produce the translations: German-English, English-Chinese. The models were trained using this codebase. We also release the BPE files for data preprocessing here.

Hallucination Detector

code/extract-token-contributions.py extracts the relative token contributions for each source-translation pair in a tab-separated input file and dumps the results into a pickle file.
code/classifier.py trains and tests the classifier on features extracted from the relative token contributions.

You may need to change the paths at the top of each script before running.
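The data flow around the extraction script can be sketched as follows. This is an illustrative helper, not part of the released code: `strip_labels` and `load_contributions` are hypothetical names, and the sketch assumes that the extraction script expects exactly two tab-separated columns (source and translation), so the label column of the annotated data is dropped first. The internal structure of the pickle file is not described here, so the loader only returns the object for inspection.

```python
import pickle


def strip_labels(labelled_path, pairs_path):
    """Copy source/translation pairs from a 3-column file to a 2-column file.

    Assumption: the input has source, translation, and label separated by
    tabs; the output keeps only the first two fields. Returns the number
    of pairs written.
    """
    count = 0
    with open(labelled_path, encoding="utf-8") as src, \
            open(pairs_path, "w", encoding="utf-8") as dst:
        for line in src:
            line = line.rstrip("\r\n")
            if not line:
                continue
            source, translation, _label = line.split("\t")
            dst.write(f"{source}\t{translation}\n")
            count += 1
    return count


def load_contributions(pickle_path):
    """Load the object dumped by the extraction script.

    Only unpickle files you produced yourself: pickle can execute
    arbitrary code when loading untrusted data.
    """
    with open(pickle_path, "rb") as f:
        return pickle.load(f)


# Usage (all paths are placeholders):
# strip_labels("data/your-annotated-file.tsv", "pairs.tsv")
# ... run code/extract-token-contributions.py on pairs.tsv ...
# contributions = load_contributions("contributions.pkl")
# print(type(contributions))
```

Printing the type of the loaded object is a reasonable first step before wiring it into code/classifier.py, since the exact layout depends on the script version you run.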
