tmtmaj/Korean-Chinese-parallel-dataset-for-machine-translation-task

Korean-Chinese corpus for machine translation (bilingual dataset)

This is a Korean-Chinese parallel corpus for machine translation tasks.

This dataset was used in https://arxiv.org/abs/1911.11008. See the paper for more information.

Dong-A dataset (250K)

Links:

  1. https://pan.baidu.com/s/1YfH-PM0YwU_GOZdkNGNKQQ (Baidu Cloud, key code: vvem)

  2. https://drive.google.com/file/d/1Gd3xbdjQjd8l85Ugu1gZFaY0D-TbCEcF/view?usp=sharing (Google Drive)

SWRC dataset (50K)

We removed some erroneous sentences from the original parallel dataset available at http://semanticweb.kaist.ac.kr/home/index.php/KAIST_Corpus.

Links:

  1. https://pan.baidu.com/s/130ZSTjmfTZL_xTi4Jdp75A (Baidu Cloud, key code: gw5s)

  2. https://drive.google.com/file/d/1wIbUW2APRPx-P7sRc6nZi3AZ9rBPUv4u/view?usp=sharing (Google Drive)
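For readers who want to run their own sanity checks after downloading, here is a minimal Python sketch of reading a parallel corpus and dropping obviously broken pairs. It is not the procedure used to clean the SWRC data, which this README does not describe. The file layout (two line-aligned UTF-8 text files), the function names `read_parallel` and `filter_pairs`, and the `max_len_ratio` threshold are all assumptions made for illustration.

```python
from typing import Iterable, Iterator, Tuple


def read_parallel(ko_path: str, zh_path: str) -> Iterator[Tuple[str, str]]:
    """Read two line-aligned UTF-8 files as (korean, chinese) pairs.

    Assumed layout: line i of one file translates line i of the other.
    """
    with open(ko_path, encoding="utf-8") as f_ko, \
         open(zh_path, encoding="utf-8") as f_zh:
        for ko, zh in zip(f_ko, f_zh):
            yield ko, zh


def filter_pairs(
    pairs: Iterable[Tuple[str, str]],
    max_len_ratio: float = 4.0,  # hypothetical threshold, not from this README
) -> Iterator[Tuple[str, str]]:
    """Yield stripped pairs that pass two simple sanity checks."""
    for ko, zh in pairs:
        ko, zh = ko.strip(), zh.strip()
        # Drop pairs where either side is empty.
        if not ko or not zh:
            continue
        # Drop pairs whose character lengths differ implausibly.
        longer = max(len(ko), len(zh))
        shorter = min(len(ko), len(zh))
        if longer / shorter > max_len_ratio:
            continue
        yield ko, zh
```

With hypothetical file names, usage would look like `clean = list(filter_pairs(read_parallel("train.ko", "train.zh")))`. Adjust the reader to match whatever format the downloaded archives actually contain.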

@article{DBLP:journals/corr/abs-1911-11008,
  author    = {Jeonghyeok Park and
               Hai Zhao},
  title     = {Korean-to-Chinese Machine Translation using Chinese Character as Pivot
               Clue},
  journal   = {CoRR},
  volume    = {abs/1911.11008},
  year      = {2019},
  url       = {http://arxiv.org/abs/1911.11008},
  archivePrefix = {arXiv},
  eprint    = {1911.11008},
  timestamp = {Tue, 03 Dec 2019 14:15:54 +0100},
  biburl    = {https://dblp.org/rec/journals/corr/abs-1911-11008.bib},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}
