tf.nn.rnn_cell.DropoutWrapper() drops out across time-steps #7927

Closed
beeCwright opened this issue Feb 27, 2017 · 21 comments
Labels
stat:awaiting tensorflower Status - Awaiting response from tensorflower

Comments

@beeCwright

beeCwright commented Feb 27, 2017

Looking at class DropoutWrapper(RNNCell), it seems dropout is applied to all inputs and outputs at every time step, without any implementation options.

I would like to have an option for RNN dropout where one 'dropout mask' is generated and then applied at each time step; on the next batch, a new mask is generated. This method is described in:

(https://arxiv.org/pdf/1512.05287.pdf)

There are a few other dropout methods that have been published recently, including one from Google:

(https://www.aclweb.org/anthology/C/C16/C16-1165.pdf)

And also 'zoneout', which is mentioned in issue #2789.

Are there any plans to incorporate these advances?

-Brad
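
As a quick illustration of the requested behaviour (a plain NumPy sketch, purely illustrative, not proposed API):

import numpy as np

def per_batch_dropout_mask(batch_size, depth, keep_prob, rng=np.random):
    # One Bernoulli mask per sequence in the batch, scaled by 1/keep_prob.
    return (rng.random_sample((batch_size, depth)) < keep_prob) / keep_prob

# mask = per_batch_dropout_mask(batch_size=32, depth=200, keep_prob=0.7)
# for t in range(num_steps):
#     dropped[:, t, :] = inputs[:, t, :] * mask   # same mask at every step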

@tatatodd
Contributor

tatatodd commented Mar 1, 2017

@ebrevdo might have thoughts on this.

@tatatodd tatatodd added the stat:awaiting tensorflower Status - Awaiting response from tensorflower label Mar 1, 2017
@ebrevdo
Contributor

ebrevdo commented Mar 1, 2017

This would be a useful option. We'll need to identify the best way to do it. What do you expect to happen if the dropout wrapper is used in two different calls to dynamic_rnn? Would the same mask be used across both RNNs?

@beeCwright
Author

beeCwright commented Mar 1, 2017

I think ideally each call to dynamic_rnn would generate a new random mask. Hence stacked RNNs using the DropoutWrapper would have recurrent dropout, but with a different mask for each layer.

Although I guess you could add a seed option if someone wanted to make them the same?

DropoutWrapper(recurrent=True, seed=None)

Args:

recurrent: if True, applies the same dropout mask to the input and output at each time step (proposed). If False, applies dropout randomly to the input and output at every time step (current implementation).
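
A usage sketch of this proposal (hypothetical; recurrent and seed here are the suggested arguments, not the existing API):

cell = tf.nn.rnn_cell.LSTMCell(num_units=128)
cell = tf.nn.rnn_cell.DropoutWrapper(
    cell,
    input_keep_prob=0.7,
    output_keep_prob=0.7,
    recurrent=True,   # proposed: sample one mask per batch, reuse it at every time step
    seed=None)        # proposed: set a seed to share the mask across wrappers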

@beeCwright
Author

Any thoughts on this moving forward? As a workaround right now I have to statically unroll my LSTM so I can manually apply my own randomly generated dropout mask to successive time steps.
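
For reference, a rough sketch of that workaround (assuming TF 1.2+ where tf.nn.static_rnn is available, inputs of shape [batch_size, num_steps, depth], and batch_size/depth known as Python ints):

keep_prob = 0.7
cell = tf.nn.rnn_cell.LSTMCell(num_units=128)

# Sample one mask per batch at run time; reuse it at every time step.
mask = tf.floor(keep_prob + tf.random_uniform([batch_size, depth])) / keep_prob

step_inputs = tf.unstack(inputs, axis=1)         # list of [batch_size, depth] tensors
masked_inputs = [x * mask for x in step_inputs]  # same mask applied at each step
outputs, state = tf.nn.static_rnn(cell, masked_inputs, dtype=tf.float32)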

@ebrevdo
Contributor

ebrevdo commented Mar 10, 2017 via email

@ebrevdo
Contributor

ebrevdo commented Mar 15, 2017

Scratch that, it's now a full implementation. Going through review now. Hope to push it tomorrow.

@beeCwright
Author

How'd the review go?

@ebrevdo
Contributor

ebrevdo commented Mar 30, 2017 via email

@ebrevdo ebrevdo closed this as completed May 17, 2017
@Hrant-Khachatrian

Hrant-Khachatrian commented Jun 30, 2017

Thanks a lot for this implementation.

We can't use it for variable length inputs though. What should be given as input_size? input_size=tf.TensorShape([100, None, 200]) gives an error:

TypeError: Tensors in list passed to 'values' of 'ConcatV2' Op have types [int32, <NOT CONVERTIBLE TO TENSOR>] that don't all match.

The error is on this line.

@ebrevdo
Contributor

ebrevdo commented Jun 30, 2017 via email

@Hrant-Khachatrian

Now it says TypeError: int() argument must be a string or a number, not 'Tensor'

@ebrevdo
Contributor

ebrevdo commented Jul 3, 2017 via email

@Hrant-Khachatrian

0.12.head

@dengoswei

1.2.1
Same problem: TypeError: int() argument must be a string or a number, not 'Tensor'

@oahziur
Contributor

oahziur commented Aug 8, 2017

Hi @Hrant-Khachatrian, when you say "variable length inputs", do you mean the sequence length is variable? If so, I think the input_size for tf.TensorShape([100, None, 200]) is just 200 (assuming batch_size=100 and variable sequence length), since input_size is the depth of the RNN cell's input.

For TypeError: int() argument must be a string or a number, not 'Tensor', can you try inputs.get_shape()[1:]? tf.shape() returns a Tensor instead of a TensorShape.
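
A minimal sketch of that suggestion (assuming a [batch, time, depth] inputs tensor, TF 1.2+, seq_len holding the per-example lengths, and using the static last dimension as the depth):

cell = tf.nn.rnn_cell.LSTMCell(num_units=128)
cell = tf.nn.rnn_cell.DropoutWrapper(
    cell,
    input_keep_prob=0.7,
    output_keep_prob=0.7,
    state_keep_prob=0.7,
    variational_recurrent=True,
    input_size=inputs.get_shape()[-1],  # static depth, e.g. Dimension(200)
    dtype=tf.float32)
outputs, state = tf.nn.dynamic_rnn(cell, inputs,
                                   sequence_length=seq_len,
                                   dtype=tf.float32)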

@georgesterpu
Contributor

For variable length inputs (but padded batches) of shape [batch, timesteps, 132],
inputs.get_shape()[1:] or [2:] does not work in my case. Here is the error:

ValueError: Dimensions must be equal, but are 128 and 132 for 'Encoder/Encoder/while/Encoder/multi_rnn_cell/cell_1/mul' (op: 'Mul') with input shapes: [?,128], [1,132].

The LSTMCell was instantiated with num_units=128.

tf-nightly-gpu (1.4.0.dev20171013)

@oahziur
Contributor

oahziur commented Oct 23, 2017

@georgesterpu

From your error, it seems you have at least 2 layers of LSTM. Is your DropoutWrapper applied to the 2nd layer's cell? If so, the input size of your second layer should be your first layer's output size, which is 128 instead of 132.
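
A hedged sketch of that rule for a two-layer stack (make_dropout_lstm is a hypothetical helper; layer 0 sees the raw input depth, layer 1 sees the first layer's 128-unit output):

def make_dropout_lstm(layer, num_units, inputs, keep_prob=0.7):
    cell = tf.nn.rnn_cell.LSTMCell(num_units)
    # input_size must match what this layer actually receives.
    input_size = (inputs.get_shape()[-1] if layer == 0
                  else tf.TensorShape([num_units]))
    return tf.nn.rnn_cell.DropoutWrapper(
        cell,
        input_keep_prob=keep_prob,
        output_keep_prob=keep_prob,
        variational_recurrent=True,
        input_size=input_size,
        dtype=tf.float32)

stack = tf.nn.rnn_cell.MultiRNNCell(
    [make_dropout_lstm(l, 128, inputs) for l in range(2)])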

@georgesterpu
Contributor

Thanks, @oahziur, you are right.
I have added an extra parameter, the layer number, to my build_cell function, and I instantiate the DropoutWrapper like this:

cells = DropoutWrapper(cells,
                       ..............
                       variational_recurrent=True,
                       dtype=tf.float32,
                       input_size=self._inputs.get_shape()[-1] if layer == 0
                                  else tf.TensorShape(num_units),
                       )

and it appears to work.

However, I get a new error on the decoder side in a seq2seq model:
ValueError: Dimensions must be equal, but are 143 and 128 for 'Decoder/decoder/while/BasicDecoderStep/decoder/attention_wrapper/attention_wrapper/multi_rnn_cell/cell_0/mul' (op: 'Mul') with input shapes: [?,143], [1,128].

I did make sure that for the decoder cells the input size is still num_units (128), and not self._inputs.get_shape()[-1], because the decoder is initialised from the encoder's final state. Still, I get the error above. Do you have any suggestions here?

@zotroneneis

zotroneneis commented Nov 17, 2017

I have some questions regarding the implementation of variational_recurrent in the DropoutWrapper. In the paper it says: 'Implementing our approximate inference is identical to implementing dropout in RNNs with the same network units dropped at each time step, randomly dropping inputs, outputs, and recurrent connections. This is in contrast to existing techniques, where different network units would be dropped at different time steps, and no dropout would be applied to the recurrent connections.'

When setting variational_recurrent=True, do I get the full functionality of variational dropout presented in the paper (Y. Gal, Z. Ghahramani)? I have looked at the source code, but I'm still not sure whether it includes all aspects of variational dropout presented in the paper. Recent papers like this one use intra-layer dropout in addition to variational dropout. But when variational_recurrent already drops output units at each time step, I don't see the need for dropout between RNN layers as well.

@ruoruoliu

After checking out the implementation of variational_recurrent in the DropoutWrapper, I'm a little confused. When variational_recurrent=True, the mask is generated when the wrapper is initialized, before the training process, and is applied across every batch. Would it be better if a new mask were generated for every batch, as @beeCwright first requested?

Since the wrapper is implemented as an RNNCell object, how can it be aware that a new batch is coming? From my limited understanding, only dynamic_rnn knows when a batch is coming, and an RNNCell only deals with a single time step's input and state. Correct me if I'm wrong :P

@ebrevdo
Contributor

ebrevdo commented Nov 22, 2017 via email
