
Add different rnn implementation modes to ptb tutorial #2276

Merged
merged 5 commits into master from ptb on Aug 29, 2017

Conversation

bignamehyp
Member

Add different rnn implementation modes to ptb tutorial using CudnnLSTM and LSTMBlockCell.

rnn_mode == CUDNN

| config | epochs | train | valid  | test   | wps   |
|--------|--------|-------|--------|--------|-------|
| small  | 13     | 40.50 | 116.60 | 112.13 | 49k   |
| medium | 39     | 22.58 | 123.83 | 118.69 | 24.2k |
| large  | 55     | 8.03  | 129.78 | 126.03 | 10.5k |

In fact, the params tensor layout for cudnn_lstm is so different from the canonical LSTM that we need a completely different set of init_scale and learning-rate parameters.

rnn_mode == BLOCK

| config | epochs | train | valid  | test   | wps   |
|--------|--------|-------|--------|--------|-------|
| small  | 13     | 40.55 | 120.70 | 115.52 | 17.2k |
| medium | 39     | 45.68 | 86.97  | 83.47  | 13.6k |
| large  | 55     | 37.94 | 82.75  | 78.49  | 5.0k  |

WPS (words per second) before this CL for the small, medium, and large models was 15.6k, 12.6k, and 5.0k, respectively.

Benchmarking platform: E5-2690 v4, Titan X.
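
For readers following along, here is a minimal sketch (not the code in this PR) of how an rnn_mode switch between the basic and block cells might look with the TF 1.x contrib APIs; the CUDNN mode does not use a drop-in cell and instead goes through the separate tf.contrib.cudnn_rnn.CudnnLSTM layer API.

```python
import tensorflow as tf

BASIC, BLOCK, CUDNN = "basic", "block", "cudnn"

def get_lstm_cell(rnn_mode, hidden_size):
  """Return an LSTM cell implementation for the requested rnn_mode."""
  if rnn_mode == BASIC:
    # Canonical, platform-independent LSTM cell.
    return tf.contrib.rnn.BasicLSTMCell(
        hidden_size, forget_bias=0.0, state_is_tuple=True)
  if rnn_mode == BLOCK:
    # Fused-kernel LSTM cell; faster, but mathematically the same as BASIC.
    return tf.contrib.rnn.LSTMBlockCell(hidden_size, forget_bias=0.0)
  # CUDNN is handled by a separate code path built on tf.contrib.cudnn_rnn.
  raise ValueError("rnn_mode %s is not a cell-based mode" % rnn_mode)
```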

@bignamehyp
Member Author

I just signed the CLA.

@bignamehyp bignamehyp self-assigned this Aug 23, 2017
@bignamehyp
Member Author

Run googlebot again.

Add different rnn implementation modes to ptb tutorial using CudnnLSTM and LSTMBlockCell.
Sync with sequence_loss change.
import reader
from google3.third_party.tensorflow_models.tutorials.rnn.ptb import reader
from google3.third_party.tensorflow_models.tutorials.rnn.ptb import util
from google3.third_party.tensorflow.python.client import device_lib
Contributor

Can you remove the google3 imports? Also, we don't have a util.py file in this folder, and I'm not sure whether we can import device_lib.
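
For context, device_lib is typically imported here to check whether a GPU is present before choosing the CUDNN path; a small sketch using only the standard TensorFlow Python client:

```python
from tensorflow.python.client import device_lib

def gpu_available():
  # list_local_devices() enumerates every device visible to TensorFlow.
  return any(d.device_type == "GPU" for d in device_lib.list_local_devices())
```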

Member Author

Done. PTAL.

@tensorflow tensorflow deleted a comment from googlebot Aug 24, 2017
@tensorflow tensorflow deleted a comment from googlebot Aug 24, 2017
@bignamehyp
Member Author

PTAL.

@nealwu
Contributor

nealwu commented Aug 29, 2017

Looks good at a glance. I'll try to run it on my machine tomorrow and see how it does.

@nealwu
Contributor

nealwu commented Aug 29, 2017

👍 Got the following results with the medium model:

Epoch: 39 Train Perplexity: 45.733
Epoch: 39 Valid Perplexity: 87.722
Test Perplexity: 83.632

@nealwu nealwu merged commit c705568 into master Aug 29, 2017
@nealwu nealwu deleted the ptb branch August 29, 2017 22:02
mark86092 added a commit to mark86092/tensorflow-models that referenced this pull request Sep 1, 2017
@janchorowski

This CL is buggy and should be reverted:

First, the CUDNN backend doesn't make sense:

  1. it doesn't do dropout between LSTM layers, which is required to reach the perplexities reported in the header of this file.
  2. the code doesn't handle initial states properly; it is mandatory that the initial state of the RNN is taken from the last state of the previous iteration.

Second, the CL pulled in old code for the basic LSTM, making it incompatible with the new TensorFlow; see e.g. #8191.

@protoget
Member

Re janchorowski@
First

  1. Cudnn takes the dropout rate as an argument and applies dropout to each layer's input (see the sketch after this comment). I think we tested the convergence before submitting the CL.
  2. This isn't true. For each example, the initial state is always 0 to begin with. The state is carried over to the next step (not the next example); after finishing all the steps you get the final state (for this example).

Second:
Not sure which part is problematic; could you point it out more explicitly?
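
For reference, a rough sketch of what the CUDNN path described above can look like with the TF 1.3-era tf.contrib.cudnn_rnn.CudnnLSTM API (an illustration, not the exact code in this PR; argument names may differ slightly between versions):

```python
import tensorflow as tf

def build_cudnn_lstm(inputs, num_layers, hidden_size, batch_size,
                     keep_prob, is_training):
  """Builds a multi-layer CUDNN LSTM; inputs is time-major:
  [num_steps, batch_size, hidden_size]."""
  # The dropout rate is a constructor argument, applied to each layer's input.
  lstm = tf.contrib.cudnn_rnn.CudnnLSTM(
      num_layers=num_layers,
      num_units=hidden_size,
      input_size=hidden_size,
      dropout=(1.0 - keep_prob) if is_training else 0.0)
  # All CUDNN weights live in one opaque, flat parameter buffer, which is why
  # init_scale behaves differently than with the canonical cells.
  params = tf.get_variable(
      "lstm_params",
      initializer=tf.random_uniform([lstm.params_size()], -0.05, 0.05),
      validate_shape=False)
  # Per example, the initial state starts at zero and is carried across the
  # num_steps of that example; the final (h, c) is returned by the call.
  h = tf.zeros([num_layers, batch_size, hidden_size])
  c = tf.zeros([num_layers, batch_size, hidden_size])
  outputs, h, c = lstm(inputs, h, c, params, is_training)
  return outputs, (h, c)
```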

@janchorowski

Hi,

I apologize for speaking too soon. I had two problems with running the new code and blamed them on this change:

  • the one I described as second, in which this line:
    [attn_cell() for _ in range(config.num_layers)], state_is_tuple=True)
    got changed into: https://github.com/tensorflow/models/blob/master/tutorials/rnn/ptb/ptb_word_lm.py#L222 Note that before the change a new cell object was created for each layer, while after the change the same cell object is reused (see the sketch after this comment). This triggered the error "ValueError: Attempt to reuse RNNCell". I then upgraded TensorFlow to v1.3 and can't reproduce this error; I could install some older versions and see at which versions the error occurs, but unfortunately I didn't record it.
  • the bad perplexities achieved with the cudnn backend. The model still overfits (see the table in the OP message). I blamed it on the lack of dropout in the cudnn wrapper and on incorrect recurrent state preservation. I reread the code and you are right about the hidden states being properly preserved from minibatch to minibatch. If dropout also works for all layers, then why does changing the LSTM implementation have such a drastic impact on the perplexity? The cudnn backend shouldn't require a completely new set of hyperparameters. It is unfortunate to have a fast implementation that gives bad perplexities and a slow one that gives good perplexities.
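
For anyone hitting the same error, a minimal illustrative sketch of the difference being discussed (using the contrib cell API; not the exact tutorial code):

```python
import tensorflow as tf

num_layers, hidden_size = 2, 200

def make_cell():
  # A fresh cell object each time this is called.
  return tf.contrib.rnn.BasicLSTMCell(
      hidden_size, forget_bias=0.0, state_is_tuple=True)

# Before the change: one new cell object per layer.
stacked_fresh = tf.contrib.rnn.MultiRNNCell(
    [make_cell() for _ in range(num_layers)], state_is_tuple=True)

# After the change: the same cell object is reused for every layer. On some
# older TensorFlow versions, calling this stacked cell raised
# "ValueError: Attempt to reuse RNNCell ...".
shared = make_cell()
stacked_reused = tf.contrib.rnn.MultiRNNCell(
    [shared] * num_layers, state_is_tuple=True)
```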

@protoget
Member

Re janchorowski@

Thanks for the details.

You're right that the cell is reused; thanks for catching that. Would you create a PR, since you've invested a lot of time in it? We can review it.
We have no control over the underlying implementation of Cudnn RNN, since it's a closed-source library from Nvidia. I agree it is unfortunate. That said, for the time being it will likely be common for users to use different hyperparameters depending on the implementation; e.g. Nvidia has a tutorial on training with mixed precision for speedup, where you might need to adjust your learning rate in order to achieve convergence.
We have a way to initialize cudnn variables with the same distributions as those used for the platform-independent RNN cells. The change is pending review and will probably make the 1.4 release. We will update the example with the new cudnn APIs and check the perplexity again.

Thank you.

Utumno added a commit to Utumno/models that referenced this pull request Oct 9, 2017
@Utumno

Utumno commented Oct 9, 2017

Does the commit linked above fix the problem with reusing the same cell? If so, I could submit a PR.

@nealwu
Contributor

nealwu commented Oct 9, 2017

@bignamehyp @protoget Does that change by @Utumno seem good to you? See #934 for some context.

@protoget
Member

protoget commented Oct 9, 2017

LGTM if it passes quality tests.
FYI, the new Cudnn RNN layer API has been submitted, so this example will be further refactored later.

@nealwu
Contributor

nealwu commented Oct 11, 2017

@Utumno could you submit your PR and @-mention us?

@Utumno

Utumno commented Oct 11, 2017

Thanks, will do ASAP, but please merge #2403 first - it's needed for this to even run on Python 3, and it would end up in my pull request otherwise.

@nealwu
Contributor

nealwu commented Oct 11, 2017

@Utumno Sure thing, merged.

Utumno added a commit to Utumno/models that referenced this pull request Oct 11, 2017
ajakash pushed a commit to ajakash/models that referenced this pull request Oct 24, 2017
@cocosci

cocosci commented Nov 12, 2017

Can anyone tell me the difference between "canonical LSTM cells" and "block or CUDNN cells"?
My TF version is 1.2.1, which leaves me with canonical LSTM cells only. The keyword "BASIC" doesn't sound so fancy... :)

Correct me if I am wrong:
"All three backend cells are LSTM units that do not support clipping, projection, etc. The major difference lies in the hardware optimization. It's okay to choose any of the three."
