tf.nn.rnn_cell.DropoutWrapper() drops out across time-steps #7927
Comments
@ebrevdo might have thoughts on this.
This would be a useful option. We'll need to identify the best way to do it. What do you expect to happen if the dropout wrapper is used in two different calls to dynamic_rnn? The same mask is used across both rnns?
I think ideally each call to dynamic_rnn would generate a new random mask. Hence stacked RNNs using the DropoutWrapper would have recurrent dropout, but different masks for each layer. Although I guess you could add a …
Any thoughts on this moving forward? As a workaround right now I have to statically unroll my LSTM so I can manually apply my own randomly generated dropout mask to successive time steps.
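A rough sketch of that kind of workaround (TF 1.x; all sizes, names, and the mask construction below are illustrative, not from this thread):

```python
import tensorflow as tf

# Statically unroll the LSTM and reuse one randomly generated dropout mask
# at every time step. Sizes here are illustrative.
batch_size, num_steps, input_depth, num_units = 32, 20, 200, 128
keep_prob = 0.7

inputs = tf.placeholder(tf.float32, [batch_size, num_steps, input_depth])
cell = tf.nn.rnn_cell.BasicLSTMCell(num_units)
state = cell.zero_state(batch_size, tf.float32)

# One binary mask per batch, sampled once and reused at every step
# (inverted dropout: kept units are scaled by 1 / keep_prob).
output_mask = tf.floor(tf.random_uniform([batch_size, num_units]) + keep_prob) / keep_prob

outputs = []
with tf.variable_scope("rnn"):
    for t in range(num_steps):
        if t > 0:
            tf.get_variable_scope().reuse_variables()
        output, state = cell(inputs[:, t, :], state)
        outputs.append(output * output_mask)  # same mask at every time step
```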
I have a partial solution. Will try to get it in early next week.
Scratch that, it's now a full implementation. Going through review now. Hope to push it tomorrow.
How'd the review go?
Code is in. Use the argument variational_recurrent=True.
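A minimal usage sketch, assuming the TF 1.x tf.nn.rnn_cell API (sizes and keep probabilities are illustrative); input_size and dtype appear to be needed here because variational_recurrent is on and the inputs are dropped out as well:

```python
import tensorflow as tf

# With variational_recurrent=True the same dropout pattern is reused at every
# time step within a dynamic_rnn call. Sizes are illustrative.
num_units, input_depth = 128, 200

cell = tf.nn.rnn_cell.LSTMCell(num_units)
cell = tf.nn.rnn_cell.DropoutWrapper(
    cell,
    input_keep_prob=0.8,
    output_keep_prob=0.8,
    state_keep_prob=0.8,
    variational_recurrent=True,
    input_size=tf.TensorShape([input_depth]),  # depth of a single time step
    dtype=tf.float32)

inputs = tf.placeholder(tf.float32, [None, None, input_depth])  # [batch, time, depth]
outputs, final_state = tf.nn.dynamic_rnn(cell, inputs, dtype=tf.float32)
```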
Thanks a lot for this implementation. We can't use it for variable length inputs though. What should be given as an input_size? input_size=tf.TensorShape([100, None, 200]) gives an error:
TypeError: Tensors in list passed to 'values' of 'ConcatV2' Op have types [int32, <NOT CONVERTIBLE TO TENSOR>] that don't all match.
The error is on this line.
Have you tried tf.shape(inputs)[1:]? I think that may work.
Now it says TypeError: int() argument must be a string or a number, not 'Tensor'.
What version of tensorflow are you using?
0.12.head
1.2.1
Hi @Hrant-Khachatrian, when you say "variable length inputs", do you mean the sequence length is variable? If so, I think the input_size for …
For a variable length input (but padded batches) of shape …, LSTMCell was instantiated with 128 num_units, on tf-nightly-gpu (1.4.0.dev20171013).
From your error, it seems you have at least 2 layers of LSTM? Is your dropout wrapper applied to the 2nd layer's cell? If so, the input size to your second layer should be your first layer's output, which is 128 instead of 132.
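A sketch of what that might look like for a two-layer stack (assuming the tf.nn.rnn_cell API; the 132/128 numbers follow the ones mentioned above, everything else is illustrative):

```python
import tensorflow as tf

# In a stacked RNN the wrapper on the second cell sees the first cell's
# *output* as its input, so input_size there is the first layer's num_units
# (128), not the raw feature depth (132).
feature_depth, num_units = 132, 128

cell1 = tf.nn.rnn_cell.LSTMCell(num_units)
cell2 = tf.nn.rnn_cell.DropoutWrapper(
    tf.nn.rnn_cell.LSTMCell(num_units),
    input_keep_prob=0.8,
    variational_recurrent=True,
    input_size=tf.TensorShape([num_units]),  # output depth of cell1, not feature_depth
    dtype=tf.float32)

stacked = tf.nn.rnn_cell.MultiRNNCell([cell1, cell2])
inputs = tf.placeholder(tf.float32, [None, None, feature_depth])
outputs, state = tf.nn.dynamic_rnn(stacked, inputs, dtype=tf.float32)
```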
Thanks, @oahziur, you are right. … and it appears to work. However, I get a new error on the decoder side in a seq2seq model: … I did make sure that for the decoder cells, the input size is still num_units (128), and not …
I have some questions regarding the implementation of variational_recurrent. When setting …
After checking out the implementation of variational_recurrent in the DropoutWrapper, I'm a little confused. When variational_recurrent=True, the mask is generated when the wrapper is initialized, before the training process, and is applied across every batch. Would it be better if a new mask were generated every batch, as @beeCwright first requested? Since the wrapper implementation is to build an rnncell object, how can it be aware that a new batch is coming? From my limited understanding, only dynamic_rnn knows when a batch is coming, and rnncell only deals with single-timestep input and state. Correct me if I'm wrong :P
You get a new batch every time you call session.run(), independent of the graph, and a new dropout pattern is created with every call to session.run(). Therefore a new dropout pattern is applied to every batch.
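A tiny sketch of that general point, independent of the wrapper itself (any random op in the graph is re-evaluated on each run):

```python
import tensorflow as tf

# Random ops are re-sampled on every session.run() call, so a fresh
# dropout pattern is drawn per batch even though the graph (and the
# wrapper object) is built only once.
keep_prob = 0.5
mask = tf.floor(tf.random_uniform([1, 8]) + keep_prob)  # dropout-style binary mask

with tf.Session() as sess:
    print(sess.run(mask))  # one pattern for this run / batch ...
    print(sess.run(mask))  # ... and a different pattern for the next
```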
At class DropoutWrapper(RNNCell), it seems dropout is applied across all inputs and outputs, without any option to do otherwise. I would like to have the option for RNN dropout where one 'dropout mask' is generated and then applied at each time step; on the next batch, a new mask is generated. This method is described in:
(https://arxiv.org/pdf/1512.05287.pdf)
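For illustration only, a toy NumPy sketch of the requested behaviour versus ordinary per-step dropout (all shapes arbitrary):

```python
import numpy as np

# "Variational" dropout: one mask sampled per batch, reused at every time step.
# Ordinary dropout: an independent mask sampled at every time step.
rng = np.random.RandomState(0)
batch, time, units, keep_prob = 4, 5, 8, 0.7

hidden = rng.randn(batch, time, units)

# Requested behaviour: one mask per batch, broadcast over the time axis.
variational_mask = (rng.rand(batch, 1, units) < keep_prob) / keep_prob
variational_out = hidden * variational_mask   # identical pattern at every step

# Ordinary dropout: a new mask for every time step.
per_step_mask = (rng.rand(batch, time, units) < keep_prob) / keep_prob
standard_out = hidden * per_step_mask         # pattern changes step to step
```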
There are a few other dropout methods that have been published recently, including one from Google:
(https://www.aclweb.org/anthology/C/C16/C16-1165.pdf)
There is also 'zoneout', which is mentioned in issue #2789.
Are there any plans to incorporate these advances?
-Brad