Tokyo Metropolitan University Paraphrase Corpus (TMUP)

TMUP is an evaluation corpus for Japanese paraphrase identification. It consists of 655 sentence pairs in total.

  • 363 paraphrase sentence pairs
  • 292 non-paraphrase sentence pairs

Candidate Acquisition Method

To acquire both paraphrase and non-paraphrase instances, we

  • generated sentence pairs using Google PBMT and NMT to acquire paraphrases
  • extracted sentence pairs from Japanese Wikipedia to acquire non-paraphrases

To acquire both trivial and non-trivial instances, we

  • calculated word overlap rate (Jaccard score) of each sentence pair and uniformly sampled candidates
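The overlap-based sampling above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes the sentences are already segmented into word lists (the tokenizer is not specified here), and the function names, the number of bins, and the per-bin sample size are all hypothetical choices made for the example.

```python
import random
from collections import defaultdict


def jaccard(tokens_a, tokens_b):
    """Word overlap rate: |A ∩ B| / |A ∪ B| over the sets of words."""
    set_a, set_b = set(tokens_a), set(tokens_b)
    union = set_a | set_b
    if not union:
        return 0.0  # two empty sentences: define the overlap as 0
    return len(set_a & set_b) / len(union)


def sample_uniform_by_overlap(pairs, n_bins=10, per_bin=2, seed=0):
    """Draw up to `per_bin` pairs from each Jaccard-score bin.

    `pairs` is an iterable of (tokens_a, tokens_b) tuples. Binning the
    scores and sampling the same number from each bin is one assumed way
    to make the sample roughly uniform over the overlap rate.
    """
    rng = random.Random(seed)
    bins = defaultdict(list)
    for tokens_a, tokens_b in pairs:
        score = jaccard(tokens_a, tokens_b)
        # A score of exactly 1.0 is placed in the last bin.
        index = min(int(score * n_bins), n_bins - 1)
        bins[index].append((score, tokens_a, tokens_b))

    sampled = []
    for index in sorted(bins):
        bucket = bins[index]
        sampled.extend(rng.sample(bucket, min(per_bin, len(bucket))))
    return sampled
```

Sampling per bin keeps both high-overlap (trivial) and low-overlap (non-trivial) candidates in the pool, rather than letting whichever range is most frequent dominate.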

Annotation

Two annotators judged whether the candidates are paraphrases.

For more details, please refer to the paper listed under Citing.

Data Format

label <TAB> sentence_A_ja <TAB> sentence_B_ja <TAB> source_sentence_en (if applicable)

Labels

  • 1: Paraphrase
  • 0: Non-paraphrase
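As a rough sketch of how the format above might be read, the following parses tab-separated lines into dictionaries. The function name `read_tmup` and the dictionary keys are hypothetical, and the code makes two assumptions that are not stated here: that the file is UTF-8 encoded, and that any line whose first field is not `0` or `1` (for example a header row, if one exists) can be skipped.

```python
def read_tmup(lines):
    """Parse TMUP-style lines: label, sentence A, sentence B, optional English source."""
    pairs = []
    for line in lines:
        fields = line.rstrip("\r\n").split("\t")
        # Skip blank lines, short rows, and anything without a 0/1 label
        # (assumption: such rows are headers or noise, not data).
        if len(fields) < 3 or fields[0] not in ("0", "1"):
            continue
        # The fourth column is only present where applicable.
        source_en = fields[3] if len(fields) > 3 and fields[3] else None
        pairs.append(
            {
                "label": int(fields[0]),  # 1: paraphrase, 0: non-paraphrase
                "sentence_a": fields[1],
                "sentence_b": fields[2],
                "source_en": source_en,
            }
        )
    return pairs
```

A typical call would be `with open("tmup.tsv", encoding="utf-8") as f: pairs = read_tmup(f)`, after which `sum(p["label"] for p in pairs)` gives the number of paraphrase pairs.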

Citing

If you make use of this corpus, please cite the following publication:

Yui Suzuki, Tomoyuki Kajiwara and Mamoru Komachi. Building a Non-Trivial Paraphrase Corpus using Multiple Machine Translation Systems. In Proceedings of ACL 2017 Student Research Workshop, Vancouver, Canada. July 2017 (to appear).

@inproceedings{,
    author      = {Suzuki, Yui and Kajiwara, Tomoyuki and Komachi, Mamoru},
    title       = {Building a Non-Trivial Paraphrase Corpus
                  using Multiple Machine Translation Systems},
    booktitle   = {Proceedings of ACL 2017 Student Research Workshop},
    month       = {July},
    year        = {2017},
    address     = {Vancouver, Canada},
    publisher   = {Association for Computational Linguistics},
    pages     = {(to appear)},
    url       = {http://www.aclweb.org/anthology/}
}

License

This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.

Copyright (c) 2017 TMU-NLP

Contact

For inquiries and feedback, please contact the authors below:

  • Yui Suzuki <suzuki-yui at ed.tmu.ac.jp>
  • Tomoyuki Kajiwara <kajiwara-tomoyuki at ed.tmu.ac.jp>
  • Mamoru Komachi <komachi at tmu.ac.jp>