This repo contains the PyTorch code for the paper "Memory-efficient Stochastic Methods for Memory-based Transformers".
To download the data, you can use the Transformer-XL repo.
We used PyTorch 1.5 for our experiments.
If you just want to look at the changes proposed in this work, please refer to models.py.
To train the models, run run_wt15_phase_1.sh first and then run_wt15_phase_2.sh.
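The two training phases can be chained in a small wrapper; this is an illustrative sketch (only the script names come from this repo, the wrapper itself is not part of the original code), assuming you run it from the repo root:

```shell
# Sketch: run phase-1 training, then phase-2, in order.
# Phase 2 is expected to continue from the phase-1 run, so we stop on failure.
for script in run_wt15_phase_1.sh run_wt15_phase_2.sh; do
  if [ -f "$script" ]; then
    bash "$script" || { echo "$script failed"; break; }
  else
    echo "skipping $script (not found; run this from the repo root)"
  fi
done
```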
Additionally, all the hyperparameters are listed in the Appendix of the paper.
If you find this work useful, please cite it. We thank the great minds who have helped in the improvement of science.