This repo contains the PyTorch code for the paper "Memory-efficient Stochastic Methods for Memory-based Transformers".
To download the data, you can use the Transformer-XL repo.
We used PyTorch 1.5 for our experiments.
If you just want to look at the changes proposed in this work, please refer to models.py.
To train the models, run run_wt15_phase_1.sh first and then run_wt15_phase_2.sh.
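The two training phases can be chained in a small wrapper; this is an illustrative sketch (only the script names come from this repo, the wrapper itself is not part of the original code), assuming you run it from the repo root:

```shell
# Sketch: run phase-1 training, then phase-2, in order.
# Phase 2 is expected to continue from the phase-1 run, so we stop on failure.
for script in run_wt15_phase_1.sh run_wt15_phase_2.sh; do
  if [ -f "$script" ]; then
    bash "$script" || { echo "$script failed"; break; }
  else
    echo "skipping $script (not found; run this from the repo root)"
  fi
done
```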
Additionally, all the hyperparameters are listed in the Appendix of the paper.
If you find this work useful, please cite it. We thank the great minds who have helped in the improvement of science.