Transformer-Network-on-CN-EN-Translation

Apply state-of-the-art machine translation techniques to Chinese-English translation.

Download the [MultiUN dataset](http://opus.nlpl.eu/download.php?f=MultiUN/en-zh.txt.zip), for example with `wget`.
Unzip the file and put 'MultiUN.en-zh.zh' and 'MultiUN.en-zh.en' in the 'corpra' directory.
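
If you prefer to script the unzip step, the following is a minimal sketch using only the Python standard library. The helper name `extract_corpus` is hypothetical and not part of this repository, and the archive path you pass in depends on how you saved the download.

```python
import os
import shutil
import zipfile

# The two corpus file names come from the step above.
CORPUS_FILES = ("MultiUN.en-zh.zh", "MultiUN.en-zh.en")


def extract_corpus(zip_path, dest_dir="corpra"):
    """Copy the two parallel corpus files from the archive into dest_dir.

    Hypothetical helper: it ignores every other archive member and
    flattens any directory structure inside the zip.
    """
    os.makedirs(dest_dir, exist_ok=True)
    extracted = []
    with zipfile.ZipFile(zip_path) as archive:
        for member in archive.namelist():
            name = os.path.basename(member)
            if name in CORPUS_FILES:
                target = os.path.join(dest_dir, name)
                # Stream to disk so a large corpus is never fully in memory.
                with archive.open(member) as src, open(target, "wb") as dst:
                    shutil.copyfileobj(src, dst)
                extracted.append(target)
    found = {os.path.basename(path) for path in extracted}
    missing = set(CORPUS_FILES) - found
    if missing:
        raise FileNotFoundError(f"not found in archive: {sorted(missing)}")
    return extracted
```

Calling `extract_corpus("en-zh.txt.zip")` would then populate 'corpra', assuming the download was saved under that name.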

Install the Chinese tokenization library jieba by running 'pip install jieba'.
Run token_zh.py to tokenize the Chinese corpus in the MultiUN dataset (MultiUN.en-zh.zh).
Rename the tokenized result to 'MultiUN.en-zh.zh'.
Run split_data.py to split the dataset into a training set and a test set. Hyperparameters are in hyperparams.py.
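
For orientation, here is a rough sketch of what the tokenize and split steps amount to. It is not the code in token_zh.py or split_data.py: the function names, `test_ratio`, and `seed` are hypothetical, and the real settings live in hyperparams.py. The only jieba call it relies on is `jieba.cut`.

```python
import random


def tokenize_line(line, cut=None):
    """Return the line as space-separated tokens.

    `cut` is any callable mapping a string to an iterable of tokens.
    When omitted, jieba.cut is used (requires `pip install jieba`).
    """
    if cut is None:
        import jieba  # lazy import so split_parallel works without jieba
        cut = jieba.cut
    tokens = (tok.strip() for tok in cut(line.strip()))
    # Drop whitespace-only tokens so the output has single spaces only.
    return " ".join(tok for tok in tokens if tok)


def split_parallel(src_lines, tgt_lines, test_ratio=0.1, seed=0):
    """Shuffle aligned sentence pairs and return (train_pairs, test_pairs).

    test_ratio and seed are illustrative defaults, not the project's values.
    """
    if len(src_lines) != len(tgt_lines):
        raise ValueError("source and target must have the same number of lines")
    pairs = list(zip(src_lines, tgt_lines))
    # Shuffle pairs together so each sentence stays aligned with its translation.
    random.Random(seed).shuffle(pairs)
    n_test = int(len(pairs) * test_ratio)
    return pairs[n_test:], pairs[:n_test]
```

Shuffling the zipped pairs, rather than each file separately, is what keeps the Chinese and English sides aligned after the split.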

TODO List:
