tf.nn.rnn_cell.DropoutWrapper() drops out across time-steps #7927

Closed
beeCwright opened this issue Feb 27, 2017 · 21 comments
Labels
stat:awaiting tensorflower Status - Awaiting response from tensorflower

Comments

@beeCwright

beeCwright commented Feb 27, 2017

Looking at class DropoutWrapper(RNNCell), it seems dropout is applied to all inputs and outputs at every time step, without any implementation options.

I would like to have an option for RNN dropout where one 'dropout mask' is generated and then applied at each time step; on the next batch, a new mask is generated. This method is described in:

(https://arxiv.org/pdf/1512.05287.pdf)

There are a few other dropout methods that have been published recently, including one from Google:

(https://www.aclweb.org/anthology/C/C16/C16-1165.pdf)

And also 'zoneout', which is mentioned in issue #2789.

Are there any plans to incorporate these advances?

-Brad
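
As a quick illustration of the requested behaviour (a plain NumPy sketch, purely illustrative, not proposed API):

import numpy as np

def per_batch_dropout_mask(batch_size, depth, keep_prob, rng=np.random):
    # One Bernoulli mask per sequence in the batch, scaled by 1/keep_prob.
    return (rng.random_sample((batch_size, depth)) < keep_prob) / keep_prob

# mask = per_batch_dropout_mask(batch_size=32, depth=200, keep_prob=0.7)
# for t in range(num_steps):
#     dropped[:, t, :] = inputs[:, t, :] * mask   # same mask at every step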

@tatatodd
Contributor

tatatodd commented Mar 1, 2017

@ebrevdo might have thoughts on this.

@tatatodd tatatodd added the stat:awaiting tensorflower Status - Awaiting response from tensorflower label Mar 1, 2017
@ebrevdo
Contributor

ebrevdo commented Mar 1, 2017

This would be a useful option. We'll need to identify the best way to do it. What do you expect to happen if the dropout wrapper is used in two different calls to dynamic_rnn? Would the same mask be used across both RNNs?

@beeCwright
Author

beeCwright commented Mar 1, 2017

I think ideally each call to dynamic_rnn would generate a new random mask. Hence stacked RNNs using the DropoutWrapper would have recurrent dropout, but with a different mask for each layer.

Although I guess you could add a seed option if someone wanted to make them the same?

DropoutWrapper(recurrent=True, seed=None)

Args:

recurrent: if True, applies the same dropout mask to the input and output at each time step (proposed). If False, applies dropout randomly to the input and output at every time step (current implementation).
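
A usage sketch of this proposal (hypothetical; recurrent and seed here are the suggested arguments, not the existing API):

cell = tf.nn.rnn_cell.LSTMCell(num_units=128)
cell = tf.nn.rnn_cell.DropoutWrapper(
    cell,
    input_keep_prob=0.7,
    output_keep_prob=0.7,
    recurrent=True,   # proposed: sample one mask per batch, reuse it at every time step
    seed=None)        # proposed: set a seed to share the mask across wrappers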

@beeCwright
Author

Any thoughts on this moving forward? As a workaround right now I have to statically unroll my LSTM so I can manually apply my own randomly generated dropout mask to successive time steps.
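
For reference, a rough sketch of that workaround (assuming TF 1.2+ where tf.nn.static_rnn is available, inputs of shape [batch_size, num_steps, depth], and batch_size/depth known as Python ints):

keep_prob = 0.7
cell = tf.nn.rnn_cell.LSTMCell(num_units=128)

# Sample one mask per batch at run time; reuse it at every time step.
mask = tf.floor(keep_prob + tf.random_uniform([batch_size, depth])) / keep_prob

step_inputs = tf.unstack(inputs, axis=1)         # list of [batch_size, depth] tensors
masked_inputs = [x * mask for x in step_inputs]  # same mask applied at each step
outputs, state = tf.nn.static_rnn(cell, masked_inputs, dtype=tf.float32)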

@ebrevdo
Contributor

ebrevdo commented Mar 10, 2017 via email

@ebrevdo
Contributor

ebrevdo commented Mar 15, 2017

Scratch that, it's now a full implementation. Going through review now. Hope to push it tomorrow.

@beeCwright
Author

How'd the review go?

@ebrevdo
Contributor

ebrevdo commented Mar 30, 2017 via email

@ebrevdo ebrevdo closed this as completed May 17, 2017
@Hrant-Khachatrian

Hrant-Khachatrian commented Jun 30, 2017

Thanks a lot for this implementation.

We can't use it for variable length inputs though. What should be given as input_size? input_size=tf.TensorShape([100, None, 200]) gives an error:

TypeError: Tensors in list passed to 'values' of 'ConcatV2' Op have types [int32, <NOT CONVERTIBLE TO TENSOR>] that don't all match.

The error is on this line.

@ebrevdo
Contributor

ebrevdo commented Jun 30, 2017 via email

@Hrant-Khachatrian

Now it says TypeError: int() argument must be a string or a number, not 'Tensor'

@ebrevdo
Contributor

ebrevdo commented Jul 3, 2017 via email

@Hrant-Khachatrian

0.12.head

@dengoswei

1.2.1
Same problem: TypeError: int() argument must be a string or a number, not 'Tensor'

@oahziur
Contributor

oahziur commented Aug 8, 2017

Hi @Hrant-Khachatrian, when you say "variable length inputs", do you mean the sequence length is variable? If so, I think the input_size for tf.TensorShape([100, None, 200]) is just 200 (assuming batch_size=100 and variable sequence length), since input_size is the depth of the RNN cell's input.

For TypeError: int() argument must be a string or a number, not 'Tensor', can you try inputs.get_shape()[1:]? tf.shape() returns a Tensor instead of a TensorShape.
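
A minimal sketch of that suggestion (assuming a [batch, time, depth] inputs tensor, TF 1.2+, seq_len holding the per-example lengths, and using the static last dimension as the depth):

cell = tf.nn.rnn_cell.LSTMCell(num_units=128)
cell = tf.nn.rnn_cell.DropoutWrapper(
    cell,
    input_keep_prob=0.7,
    output_keep_prob=0.7,
    state_keep_prob=0.7,
    variational_recurrent=True,
    input_size=inputs.get_shape()[-1],  # static depth, e.g. Dimension(200)
    dtype=tf.float32)
outputs, state = tf.nn.dynamic_rnn(cell, inputs,
                                   sequence_length=seq_len,
                                   dtype=tf.float32)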

@georgesterpu
Contributor

For variable length inputs (but padded batches) of shape [batch, timesteps, 132],
inputs.get_shape()[1:] or [2:] does not work in my case. Here is the error:

ValueError: Dimensions must be equal, but are 128 and 132 for 'Encoder/Encoder/while/Encoder/multi_rnn_cell/cell_1/mul' (op: 'Mul') with input shapes: [?,128], [1,132].

The LSTMCell was instantiated with num_units=128.

tf-nightly-gpu (1.4.0.dev20171013)

@oahziur
Contributor

oahziur commented Oct 23, 2017

@georgesterpu

From your error, it seems you have at least 2 layers of LSTM. Is your DropoutWrapper applied to the 2nd layer's cell? If so, the input size of your second layer should be your first layer's output size, which is 128 instead of 132.
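
A hedged sketch of that rule for a two-layer stack (make_dropout_lstm is a hypothetical helper; layer 0 sees the raw input depth, layer 1 sees the first layer's 128-unit output):

def make_dropout_lstm(layer, num_units, inputs, keep_prob=0.7):
    cell = tf.nn.rnn_cell.LSTMCell(num_units)
    # input_size must match what this layer actually receives.
    input_size = (inputs.get_shape()[-1] if layer == 0
                  else tf.TensorShape([num_units]))
    return tf.nn.rnn_cell.DropoutWrapper(
        cell,
        input_keep_prob=keep_prob,
        output_keep_prob=keep_prob,
        variational_recurrent=True,
        input_size=input_size,
        dtype=tf.float32)

stack = tf.nn.rnn_cell.MultiRNNCell(
    [make_dropout_lstm(l, 128, inputs) for l in range(2)])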

@georgesterpu
Contributor

Thanks, @oahziur, you are right.
I have added an extra parameter, the layer number, to my build_cell function, and I instantiate the DropoutWrapper like this:

cells = DropoutWrapper(cells,
                       ..............
                       variational_recurrent=True,
                       dtype=tf.float32,
                       input_size=self._inputs.get_shape()[-1] if layer == 0
                                  else tf.TensorShape(num_units),
                       )

and it appears to work.

However, I get a new error on the decoder side in a seq2seq model:
ValueError: Dimensions must be equal, but are 143 and 128 for 'Decoder/decoder/while/BasicDecoderStep/decoder/attention_wrapper/attention_wrapper/multi_rnn_cell/cell_0/mul' (op: 'Mul') with input shapes: [?,143], [1,128].

I did make sure that for the decoder cells the input size is still num_units (128), and not self._inputs.get_shape()[-1], because the decoder is initialised from the encoder's final state. Still, I get the error above. Do you have any suggestions here?

@zotroneneis

zotroneneis commented Nov 17, 2017

I have some questions regarding the implementation of variational_recurrent in the DropoutWrapper. In the paper it says: 'Implementing our approximate inference is identical to implementing dropout in RNNs with the same network units dropped at each time step, randomly dropping inputs, outputs, and recurrent connections. This is in contrast to existing techniques, where different network units would be dropped at different time steps, and no dropout would be applied to the recurrent connections.'

When setting variational_recurrent=True, do I get the full functionality of variational dropout presented in the paper (Y. Gal, Z. Ghahramani)? I have looked at the source code, but I'm still not sure whether it includes all aspects of variational dropout presented in the paper. Recent papers like this one use intra-layer dropout in addition to variational dropout. But when variational_recurrent already drops output units at each time step, I don't see the need for dropout between RNN layers as well.

@ruoruoliu

After checking out the implementation of variational_recurrent in the DropoutWrapper, I'm a little confused. When variational_recurrent=True, the mask is generated when the wrapper is initialized, before the training process, and is applied across every batch. Would it be better if a new mask were generated for every batch, as @beeCwright first requested?

Since the wrapper is implemented as an RNNCell object, how can it be aware that a new batch is coming? From my limited understanding, only dynamic_rnn knows when a batch is coming, and an RNNCell only deals with a single time step's input and state. Correct me if I'm wrong :P

@ebrevdo
Contributor

ebrevdo commented Nov 22, 2017 via email
