g2-lstm

Codes for "Towards Binary-Valued Gates for Robust LSTM Training".

The language modeling code is based on awd-lstm-lm and is implemented in PyTorch.

The machine translation code is implemented in Theano.

Implementation of the Gumbel-Gate LSTM (G2-LSTM): a PyTorch version (in `language-modeling`) and a Theano version (in `machine-translation`).
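
For reference, a minimal PyTorch sketch of the Gumbel-sigmoid gate used by G2-LSTM: logistic noise (the difference of two Gumbel samples) is added to the gate pre-activations before a temperature-scaled sigmoid. The function name, default temperature, and clamping below are illustrative choices, not taken from this repository.

```python
import torch

def gumbel_sigmoid(logits, tau=0.9, training=True):
    """Temperature-scaled sigmoid gate with Gumbel (logistic) noise.

    During training, logistic noise log(u) - log(1 - u), i.e. the difference
    of two Gumbel samples, is added to the gate pre-activations before a
    sharpened sigmoid, pushing gate values towards 0 or 1.
    At evaluation time the plain temperature-scaled sigmoid is used.
    """
    if training:
        u = torch.rand_like(logits).clamp(1e-6, 1 - 1e-6)
        noise = torch.log(u) - torch.log1p(-u)  # logistic noise
        logits = logits + noise
    return torch.sigmoid(logits / tau)
```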

We also apply dropout to the Gumbel noise added to the gates. In particular, given a fixed probability p, each gate is independently perturbed by the Gumbel noise with probability p and stays unperturbed otherwise. We find that the trained G2-LSTM performs better regardless of the value of p: when p is small, the model generalizes better, and when p is large, the model suffers a smaller performance drop under compression. We set p=0.2 in all experiments in the paper.
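
A minimal sketch of this noise dropout, again with illustrative names and defaults: a Bernoulli(p) mask decides, independently per gate element, whether the noise is applied.

```python
import torch

def gumbel_sigmoid_with_noise_dropout(logits, tau=0.9, p=0.2, training=True):
    """Gate in which the Gumbel noise itself is dropped out.

    A Bernoulli(p) mask decides independently, per gate element, whether the
    logistic noise is added (probability p) or the gate stays unperturbed
    (probability 1 - p). p = 0.2 matches the setting reported in the paper.
    """
    if training:
        u = torch.rand_like(logits).clamp(1e-6, 1 - 1e-6)
        noise = torch.log(u) - torch.log1p(-u)           # logistic noise
        perturb = torch.bernoulli(torch.full_like(logits, p))  # 1 -> perturb
        logits = logits + perturb * noise
    return torch.sigmoid(logits / tau)
```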
