Vicomtech/tando

TANDO: A Corpus for Document-level Machine Translation

This repository contains the TANDO corpus for Document-level Machine Translation in Basque-Spanish.

Table of Contents

  1. Description
  2. Citation
  3. License
  4. Contact

Description

TANDO is a corpus for training and evaluating document-level machine translation models for the Basque-Spanish language pair. The corpus was prepared within the ELKARTEK project TANDO (2020-2021: www.tando.eus) by members of the project consortium.

The TANDO corpus includes both parallel and contrastive datasets, in text format, and covers different domains (literature, news, subtitles, talks, politics). It can be downloaded via the following link: https://datasets.vicomtech.org/v2-tando/tando-corpus_v1.0.tar.gz
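As a minimal sketch of fetching and unpacking the archive, the Python snippet below uses only the standard library. The function names `download` and `extract` and the local paths are illustrative assumptions, not part of the corpus distribution, and the internal layout of the archive is not assumed here: the sketch simply returns whatever member names it finds.

```python
import tarfile
import urllib.request
from pathlib import Path

# URL taken from the description above.
TANDO_URL = "https://datasets.vicomtech.org/v2-tando/tando-corpus_v1.0.tar.gz"


def download(url: str, dest: Path) -> Path:
    """Download `url` to `dest`, skipping the transfer if `dest` already exists."""
    if not dest.exists():
        urllib.request.urlretrieve(url, dest)
    return dest


def extract(archive: Path, out_dir: Path) -> list[str]:
    """Extract a .tar.gz archive into `out_dir` and return the member names."""
    out_dir.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, "r:gz") as tar:
        names = tar.getnames()
        # Use the safer "data" extraction filter where this Python version supports it.
        kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
        tar.extractall(out_dir, **kwargs)
    return names
```

For example, `extract(download(TANDO_URL, Path("tando-corpus_v1.0.tar.gz")), Path("tando-corpus"))` would download the archive into the current directory, unpack it under a hypothetical `tando-corpus` folder, and return the list of extracted paths for inspection.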

Citation

If you use any part of the corpus in your own work, please cite the following paper:

@inproceedings{gete-et-al2022tando-corpus,
  title={TANDO: A Corpus for Document-level Machine Translation},
  author={Gete, Harritxu and Etchegoyhen, Thierry and Ponce, David and Labaka, Gorka and
     Aranberri, Nora and Corral, Ander and Saralegi, Xabier
     and Ellakuria Santos, Igor and Martin, Maite},
  booktitle={Proceedings of the 13th Edition of the Language Resources and Evaluation Conference (LREC 2022)},
  location={Marseille, France},
  year={2022},
  pages={TBD}
}

License

The TANDO corpus is distributed under the Creative Commons BY-NC-SA 4.0 license.
To view a copy of this license, visit https://creativecommons.org/licenses/by-nc-sa/4.0/ or send a letter to Creative Commons, PO Box 1866, Mountain View, CA 94042, USA.

Contact

If you have any questions or suggestions, do not hesitate to contact us at the following addresses:

  • Thierry Etchegoyhen: tetchegoyhen [AT] vicomtech [DOT] org
  • Harritxu Gete: hgete [AT] vicomtech [DOT] org
