Tokyo Metropolitan University Paraphrase Corpus (TMUP)

TMUP is an evaluation corpus for Japanese paraphrase identification. It consists of 655 sentence pairs in total.

  • 363 paraphrase sentence pairs
  • 292 non-paraphrase sentence pairs

Candidate Acquisition Method

To acquire both paraphrase and non-paraphrase instances, we

  • generated sentence pairs using Google's phrase-based (PBMT) and neural (NMT) machine translation systems to acquire paraphrases
  • extracted sentence pairs from Japanese Wikipedia to acquire non-paraphrases

To acquire both trivial and non-trivial instances, we

  • calculated the word overlap rate (Jaccard score) of each sentence pair and sampled candidates uniformly with respect to that score
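
The overlap-based sampling step can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the number of bins, and the per-bin sample size are hypothetical, and the sentences are assumed to be already tokenized into word lists (the tokenizer used for the corpus is not specified here).

```python
import random
from collections import defaultdict


def jaccard(tokens_a, tokens_b):
    """Word overlap rate: |A & B| / |A | B| over the two token sets."""
    set_a, set_b = set(tokens_a), set(tokens_b)
    union = set_a | set_b
    if not union:
        return 0.0  # two empty sentences: define the overlap as 0
    return len(set_a & set_b) / len(union)


def sample_uniformly_by_overlap(pairs, n_bins=10, per_bin=2, seed=0):
    """Group pairs into equal-width Jaccard bins and draw up to
    `per_bin` pairs from each bin (assumed binning scheme)."""
    rng = random.Random(seed)
    bins = defaultdict(list)
    for tokens_a, tokens_b in pairs:
        score = jaccard(tokens_a, tokens_b)
        # A score of exactly 1.0 goes into the last bin.
        index = min(int(score * n_bins), n_bins - 1)
        bins[index].append((score, tokens_a, tokens_b))

    sampled = []
    for index in sorted(bins):
        bucket = bins[index]
        sampled.extend(rng.sample(bucket, min(per_bin, len(bucket))))
    return sampled
```

Drawing the same number of candidates from every bin keeps both high-overlap (trivial) and low-overlap (non-trivial) pairs in the candidate set, rather than letting the most common overlap range dominate.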


Two annotators judged whether each candidate pair is a paraphrase.

*For more details, please refer to the paper.

Data Format

Each line of tmup.tsv contains the following tab-separated fields:

label <TAB> sentence_A_ja <TAB> sentence_B_ja <TAB> source_sentence_en (if applicable)


  • 1: Paraphrase
  • 0: Non-paraphrase
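
A minimal Python sketch for reading this format follows. It assumes the file is UTF-8 encoded and has no header row; neither is stated here. The `Pair` type and the function names are hypothetical helpers, not part of the corpus.

```python
from typing import Iterable, List, NamedTuple, Optional


class Pair(NamedTuple):
    label: int                # 1 = paraphrase, 0 = non-paraphrase
    sentence_a: str
    sentence_b: str
    source_en: Optional[str]  # English source sentence, if present


def parse_tmup(lines: Iterable[str]) -> List[Pair]:
    """Parse tab-separated lines into Pair records."""
    pairs = []
    for line in lines:
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or incomplete lines
        label = int(fields[0])
        if label not in (0, 1):
            raise ValueError(f"unexpected label: {fields[0]!r}")
        # The fourth column is optional ("if applicable").
        source = fields[3] if len(fields) > 3 and fields[3] else None
        pairs.append(Pair(label, fields[1], fields[2], source))
    return pairs


def load_tmup(path: str = "tmup.tsv") -> List[Pair]:
    """Read the corpus file (encoding assumed to be UTF-8)."""
    with open(path, encoding="utf-8") as handle:
        return parse_tmup(handle)
```

As a sanity check, counting the labels of the loaded pairs (for example with `collections.Counter(p.label for p in load_tmup())`) should match the totals given above, 363 paraphrase and 292 non-paraphrase pairs, if the assumptions about the file hold.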


If you make use of this corpus, please cite the following publication:

Yui Suzuki, Tomoyuki Kajiwara and Mamoru Komachi. Building a Non-Trivial Paraphrase Corpus using Multiple Machine Translation Systems. In Proceedings of ACL 2017 Student Research Workshop, Vancouver, Canada. July 2017 (to appear).

    author      = {Suzuki, Yui and Kajiwara, Tomoyuki and Komachi, Mamoru},
    title       = {Building a Non-Trivial Paraphrase Corpus
                   using Multiple Machine Translation Systems},
    booktitle   = {Proceedings of ACL 2017 Student Research Workshop},
    month       = {July},
    year        = {2017},
    address     = {Vancouver, Canada},
    publisher   = {Association for Computational Linguistics},
    pages       = {(to appear)},
    url         = {}


Creative Commons License

This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.

Copyright (c) 2017 TMU-NLP


For inquiries and feedback, please contact the authors below:

  • Yui Suzuki <suzuki-yui at>
  • Tomoyuki Kajiwara <kajiwara-tomoyuki at>
  • Mamoru Komachi <komachi at>