Got errors when calling get_initial_state with attention mechanism #673

CryMasK opened this issue Nov 5, 2019 · 5 comments

CryMasK commented Nov 5, 2019

System information
Running tfa-nightly (0.7.0.dev20191105) and TF 2.0 on Colab.

Describe the bug
I want to implement an encoder-decoder structure with an attention mechanism.

But when I call decoder_initial_state = decoder_cell.get_initial_state(masked_input), I get the error ValueError: Duplicate node name in graph: 'AttentionWrapperZeroState/zeros/packed'

If I switch to the other calling convention, decoder_initial_state = decoder_cell.get_initial_state(batch_size=tf.constant(batch_size), dtype=tf.float32), I get a different error: OperatorNotAllowedInGraphError: using a `tf.Tensor` as a Python `bool` is not allowed in Graph execution. Use Eager execution or decorate this function with @tf.function.

Did I do something wrong?

Code to reproduce the issue

import tensorflow as tf
import tensorflow_addons as tfa
from tensorflow.keras.layers import Input, Concatenate, LSTMCell, Lambda, Masking, LSTM, Dense
from tensorflow.keras.models import Model

num_units = 2
batch_size = 2

data_input = Input(shape=(None, num_units))
masked_input = Masking(mask_value=DUMMY_VALUE)(data_input)
encoder_outputs, state_h, state_c = LSTM(num_units, return_sequences=True, return_state=True, implementation=1)(masked_input)
encoder_state = [state_h, state_c]

sampler = tfa.seq2seq.sampler.TrainingSampler()

decoder_cell = LSTMCell(num_units, implementation=1)
attention_mechanism = tfa.seq2seq.LuongAttention(
decoder_cell = tfa.seq2seq.AttentionWrapper(
    decoder_cell, attention_mechanism)
projection_layer = Dense(1, activation='sigmoid')
decoder = tfa.seq2seq.BasicDecoder(
    decoder_cell, sampler, output_layer=projection_layer)

mask = Lambda(lambda x: x._keras_mask)(masked_input)
attention_mechanism(encoder_outputs, memory_mask=mask, setup_memory=True)
decoder_initial_state = decoder_cell.get_initial_state(masked_input)
#decoder_initial_state = decoder_cell.get_initial_state(batch_size=tf.constant(batch_size), dtype=tf.float32)
decoder_initial_state = decoder_initial_state.clone(cell_state=encoder_state)

outputs, _, _ = decoder(
logits = outputs.rnn_output

model = Model(inputs=[data_input], outputs=[logits])

guillaumekln commented Nov 6, 2019

@kazemnejad Do you spot an obvious error in the above code?

kazemnejad commented Nov 6, 2019

I don't see any obvious errors. However, this line:

decoder_initial_state = decoder_cell.get_initial_state(batch_size=tf.constant(batch_size), dtype=tf.float32)

should be changed to:

decoder_initial_state = decoder_cell.get_initial_state(batch_size=tf.shape(masked_input)[0], dtype=tf.float32)

Nonetheless, even the new version still produces the same error (ValueError: Duplicate node name in graph). I'm not sure how compatible the seq2seq module is with the Keras functional API, especially since I didn't hit this error when using model subclassing.

It's also worth mentioning that the above code seems to raise another error:

TypeError: int() argument must be a string, a bytes-like object or a number, not 'Tensor'

seanpmorgan commented Nov 6, 2019

Thanks for looking into this!
cc @qlzh727 to see if he has any thoughts as well

guillaumekln commented Dec 3, 2019

@CryMasK Do you have any updates on your issue?

CryMasK commented Dec 12, 2019

I didn't keep trying with tfa.
Instead, I implemented the attention mechanism using TensorFlow's built-in RNN layer.
For anyone else who wants to implement the attention mechanism, the concept is illustrated below:

class FakeRNNCell(Layer):
    # do attention here

encoder_outputs = LSTM(...)
decoder_outputs = LSTM(...)

cell = FakeRNNCell(...)
attention_layer = RNN(cell, return_sequences=True)
attention_outputs = attention_layer(decoder_outputs, constants=[encoder_outputs])
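
For readers wondering what the "do attention here" step might compute, here is a minimal sketch of Luong-style dot-product attention with masking for a single decoder step. It is not the code from this thread: the function name luong_dot_attention and the toy data are made up for illustration, and it uses plain Python lists instead of tensors so the arithmetic stays visible. It assumes at least one source position is unmasked.

```python
import math

def luong_dot_attention(query, memory, mask):
    """Return (context, weights) for one decoder step.

    query:  decoder output at this step, a list of floats.
    memory: encoder outputs, one list of floats per source position.
    mask:   one bool per source position; False marks padding.
    Assumes at least one position is unmasked.
    """
    # Dot-product score between the query and every encoder output.
    scores = [sum(q * m for q, m in zip(query, row)) for row in memory]
    # Softmax over the unmasked positions only (max-shifted for stability).
    top = max(s for s, keep in zip(scores, mask) if keep)
    exps = [math.exp(s - top) if keep else 0.0 for s, keep in zip(scores, mask)]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Context vector: attention-weighted sum of the encoder outputs.
    context = [sum(w * row[i] for w, row in zip(weights, memory))
               for i in range(len(query))]
    return context, weights

# Toy data: three source positions, the last one is padding.
encoder_outputs = [[1.0, 0.0], [0.0, 1.0], [9.0, 9.0]]
mask = [True, True, False]
context, weights = luong_dot_attention([1.0, 0.0], encoder_outputs, mask)
```

In the RNN-with-constants layout above, the query would be the per-step input to the cell and the memory would arrive through constants. The padded position gets zero weight regardless of its score, which is how masking fits into the same step function.
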

This RNN-based workaround works well even with long sequences, and local attention and masking can be implemented with the same approach.
However, this approach doesn't concatenate the input with the previous step's output (input feeding).
To address this, I also tried overriding TF's LSTMCell to implement input feeding.
Unfortunately, perhaps because I'm not very familiar with TensorFlow, that version causes an OOM error if the sequence is too long.
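
To make the input-feeding idea concrete, here is a small sketch of the data flow only, again in plain Python rather than TensorFlow. It is not the overridden LSTMCell described above: toy_cell is a hypothetical stand-in for a real LSTM step, and averaging the context with the state is a placeholder for the learned projection a real model would normally apply.

```python
import math

def attend(query, memory):
    """Dot-product attention: softmax-weighted sum of the memory rows."""
    scores = [sum(q * m for q, m in zip(query, row)) for row in memory]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [sum(e / total * row[i] for e, row in zip(exps, memory))
            for i in range(len(query))]

def toy_cell(x, h):
    """Hypothetical stand-in for an LSTM step: mixes the input into the state."""
    drive = sum(x) / len(x)
    return [math.tanh(h_i + drive) for h_i in h]

def decode_with_input_feeding(inputs, memory, units):
    h = [0.0] * units
    prev_attn = [0.0] * units  # nothing to feed back at the first step
    outputs = []
    for x in inputs:
        # Input feeding: concatenate this step's input with the previous
        # step's attention output before running the cell.
        h = toy_cell(x + prev_attn, h)
        context = attend(h, memory)
        # Placeholder combination of context and state; a real model would
        # usually use a learned projection here.
        prev_attn = [(c + h_i) / 2 for c, h_i in zip(context, h)]
        outputs.append(prev_attn)
    return outputs

memory = [[1.0, 0.0], [0.0, 1.0]]
inputs = [[0.5, -0.5], [1.0, 1.0], [0.0, 2.0]]
outputs = decode_with_input_feeding(inputs, memory, units=2)
```

Because each step depends on the previous step's attention output, the attention can no longer be computed for all decoder steps at once after the LSTM has run; it has to live inside the recurrent loop.
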
