Code for "Unsupervised Cross-lingual Transfer of Word Embedding Spaces" in EMNLP 2018 [pdf]
This software runs on Python 3.6 with the following libraries:
- tensorflow r1.6 (with CUDA 9.0)
- numpy
- tqdm
wget https://repo.continuum.io/archive/Anaconda3-5.0.1-Linux-x86_64.sh
bash Anaconda3-5.0.1-Linux-x86_64.sh # Follow the instructions
conda create -n <name of your environment> python=3.6 anaconda
source activate <name of your environment>
pip install --ignore-installed --upgrade https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow_gpu-1.6.0-cp36-cp36m-linux_x86_64.whl
pip install tqdm
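Before running anything, it can help to confirm that the environment matches the requirements listed above. The following is a minimal sketch and not part of this repository: the helper names `missing_packages` and `python_matches` are hypothetical, and the check only verifies that the packages can be found for import, not that their versions (for example tensorflow r1.6) are the expected ones.

```python
import importlib.util
import sys


def missing_packages(names):
    """Return the subset of `names` that cannot be found for import.

    Hypothetical helper for illustration; only top-level package names
    are supported.
    """
    return [name for name in names if importlib.util.find_spec(name) is None]


def python_matches(required=(3, 6)):
    """Return True if the running interpreter has the given major.minor version."""
    return tuple(sys.version_info[:2]) == tuple(required)


if __name__ == "__main__":
    # Package names taken from the requirements list of this README.
    missing = missing_packages(["tensorflow", "numpy", "tqdm"])
    if not python_matches():
        print("Warning: this code was written for Python 3.6")
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages were found")
```

Run the script inside the activated conda environment; any package it reports as missing can be installed with pip as shown above.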
Run python src/runner.py --help to see the available arguments.
example.sh gives an example run of our model. It runs the "bg-en" experiment of "LEX-C" and then evaluates accuracy@1. You need to download the data before running it:
cd data
./download.sh
Note that this data is a subset of the release from MUSE.
Then run the following command to start training:
cd .. # back to root repo directory
./example.sh
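For readers unfamiliar with the accuracy@1 metric that the example run reports, the sketch below shows one common way to compute it for word translation. It is an illustration only and not the evaluation code of this repository: it assumes plain cosine nearest-neighbor retrieval, non-zero vectors, and a gold dictionary keyed by row index, and the function name `accuracy_at_1` and the toy data are hypothetical. It is written without dependencies to keep it self-contained; a real evaluation would typically be vectorized with numpy.

```python
import math


def _cosine(u, v):
    """Cosine similarity of two non-zero vectors given as sequences of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def accuracy_at_1(mapped_src, tgt, gold):
    """Fraction of query words whose nearest target neighbor is a gold translation.

    mapped_src: source embeddings already mapped into the target space
    tgt:        target embeddings
    gold:       dict from a source row index to a set of acceptable target row indices
    """
    hits = 0
    for src_idx, answers in gold.items():
        sims = [_cosine(mapped_src[src_idx], t) for t in tgt]
        best = max(range(len(tgt)), key=sims.__getitem__)  # top-1 retrieval
        hits += best in answers
    return hits / len(gold)


# Toy data (made up for illustration): three target words, three source queries.
toy_tgt = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
toy_src = [[0.9, 0.1, 0.0], [0.1, 0.2, 0.8], [0.7, 0.3, 0.1]]
toy_gold = {0: {0}, 1: {2}, 2: {1}}

# Queries 0 and 1 retrieve a gold translation, query 2 does not.
toy_score = accuracy_at_1(toy_src, toy_tgt, toy_gold)
```

A source word counts as correct if its single nearest target word is any of its gold translations, which is why `gold` maps each query to a set rather than to one index.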
Please consider citing our paper if you find this repo useful in your research.
@inproceedings{xu2018unsupervised,
title={Unsupervised Cross-lingual Transfer of Word Embedding Spaces},
author={Xu, Ruochen and Yang, Yiming and Otani, Naoki and Wu, Yuexin},
booktitle={Conference on Empirical Methods in Natural Language Processing},
year={2018}
}