MTPEdocs-MQM

Introduction

To encourage studies on machine translation quality estimation (MTQE) for the Japanese-to-English translation direction, we manually annotated erroneous spans in the MT outputs of MTPEdocs.

  • MT-MQM: Span-based issue annotations based on MQM-like manual quality assessment of MT
    • Outputs of two MT systems in MTPEdocs have so far been covered.
      • JaEn_01_TexTra
      • JaEn_02_Google
    • A set of workers different from those who performed post-editing (PE) was asked to annotate problematic spans within the MT output and to label each span with an issue type and a severity.
      • The issue typology is slightly adapted from Table 2 in Freitag et al. (2021), with Terminology issues prioritized as in Fujita et al. (2017).
      • All identified issues were annotated (cf. Freitag et al. (2021), where only the five most severe issues were selected).
      • Neither PE (in MTPEdocs) nor human translation (in Translation Resources of Nagoya City) was referred to.
    • Segment-level MQM scores were computed from the annotated issues, following Table 4 in Freitag et al. (2021).
      • "Critical" issues were weighted equivalently to "Major" issues.
      • Because the number of issues per segment is not capped, the score is sometimes very large.
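
The scoring described above can be sketched as follows. This is a minimal illustration, not the script used to build the dataset. The weights are an assumption taken from the scheme in Table 4 of Freitag et al. (2021): 25 for Major Non-translation, 5 for other Major issues, 0.1 for Minor Fluency/Punctuation, 1 for other Minor issues, and 0 for Neutral. The `Issue` class, the function names, and the category and severity label strings are hypothetical and may not match the labels in the released files.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Issue:
    """One annotated span (hypothetical structure, not the dataset schema)."""
    category: str  # e.g. "Terminology", "Fluency/Punctuation" (assumed labels)
    severity: str  # "Critical", "Major", "Minor", or "Neutral" (assumed labels)


def issue_weight(issue: Issue) -> float:
    """Return the penalty for a single issue under the assumed weighting."""
    severity = issue.severity.lower()
    # "Critical" issues are weighted the same as "Major" issues.
    if severity == "critical":
        severity = "major"
    if severity == "major":
        return 25.0 if issue.category == "Non-translation" else 5.0
    if severity == "minor":
        return 0.1 if issue.category == "Fluency/Punctuation" else 1.0
    if severity == "neutral":
        return 0.0
    raise ValueError(f"unknown severity: {issue.severity!r}")


def segment_mqm_score(issues: Iterable[Issue]) -> float:
    """Sum the penalties of all issues annotated in one segment.

    No cap is applied to the number of issues, so the score is unbounded.
    """
    return sum(issue_weight(issue) for issue in issues)


if __name__ == "__main__":
    example = [
        Issue("Accuracy/Mistranslation", "Critical"),
        Issue("Terminology", "Major"),
        Issue("Fluency/Punctuation", "Minor"),
    ]
    print(segment_mqm_score(example))
```

Because every annotated issue contributes to the sum, a segment with many issues accumulates a correspondingly large score, which is consistent with the note above about very large values.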

References

Developer

The dataset in this directory is credited to the National Institute of Information and Communications Technology (henceforth, NICT). NICT has made the dataset publicly available under the conditions of the license specified below.

License

Creative Commons License

Precautions

  • NICT bears no responsibility for the contents of the dataset and assumes no liability for any direct or indirect damage or loss whatsoever that may be incurred as a result of using the dataset.
  • If any copyright infringement or other problems are found in the dataset, please contact us at atsushi.fujita[at]nict[dot]go[dot]jp. We will review the issue and undertake appropriate measures as necessary.

Acknowledgments

We are grateful to Rei Miyata for his helpful advice on the MQM-like annotation instructions and the issue typology. This dataset was developed with partial support from JSPS KAKENHI Grant-in-Aid for Scientific Research (B) 23K28378, Developing Shared Translation Resources for Local Governments to Facilitate Multilingual Information Dissemination, as part of the work at the Advanced Translation Technology Laboratory, Advanced Speech Translation Research and Development Promotion Center, National Institute of Information and Communications Technology.
