BasicDecoder fails with masked input tensor #534

Closed
kazemnejad opened this issue Sep 20, 2019 · 0 comments · Fixed by #546

System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Linux Ubuntu 18.04
  • TensorFlow version and how it was installed (source or binary): binary (2.0.0-dev20190920)
  • TensorFlow-Addons version and how it was installed (source or binary): binary (0.6.0-dev)
  • Python version: 3.6
  • Is GPU used? (yes/no): yes

Describe the bug

When BasicDecoder receives an input tensor that carries Keras masking metadata, it fails because of the unhandled mask argument in TrainingSampler.initialize(...). This is a common scenario, since such a tensor is typically produced by an Embedding layer with mask_zero=True.

Code to reproduce the issue

import tensorflow as tf
import tensorflow_addons as tfa

units = 32
vocab_size = 1000
embedding = tf.keras.layers.Embedding(
    input_dim=vocab_size, output_dim=units, mask_zero=True)
attention_mechanism = tfa.seq2seq.LuongAttention(units)
cell = tf.keras.layers.LSTMCell(units)
cell = tfa.seq2seq.AttentionWrapper(cell, attention_mechanism)
output_layer = tf.keras.layers.Dense(vocab_size)
sampler = tfa.seq2seq.sampler.TrainingSampler()
decoder = tfa.seq2seq.BasicDecoder(
    cell, sampler, output_layer=output_layer)

# Set up the attention mechanism's memory with a random tensor.
fake_mem = tf.random.normal((2, 3, units))
fake_mem_mask = tf.ones((2, 3), dtype=tf.bool)
attention_mechanism(fake_mem, mask=fake_mem_mask, setup_memory=True)

# Zero out some trailing word ids so the embedding layer produces a mask.
word_ids = tf.random.uniform((2, 5), 0, vocab_size, dtype=tf.int64) \
            * tf.constant([[1, 1, 1, 0, 0], [1, 1, 1, 1, 1]], dtype=tf.int64)
word_embeds = embedding(word_ids)  # carries `_keras_mask` metadata
mask = embedding.compute_mask(word_ids)

initial_state = cell.get_initial_state(batch_size=2, dtype=tf.float32)
outputs = decoder(
    inputs=word_embeds,
    initial_state=initial_state,
    sequence_length=tf.math.reduce_sum(tf.cast(mask, tf.int32), axis=1),
)

It will raise the following error:

  File "tensorflow_addons/seq2seq/basic_decoder.py", line 72, in initialize
    return self.sampler.initialize(inputs, **kwargs) + (initial_state,)
TypeError: initialize() got an unexpected keyword argument 'mask'

Other info / logs

This happens because the mask argument is captured in BaseDecoder.call(..., **kwargs) and propagated all the way down to TrainingSampler.initialize(...), which does not accept a mask argument.
A temporary workaround is to remove the _keras_mask attribute from the input tensor just before calling the decoder instance.
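A minimal sketch of that workaround, reusing the variables from the reproduction script above:

# Temporary workaround: strip the Keras mask metadata so it is never
# forwarded to TrainingSampler.initialize().
if hasattr(word_embeds, "_keras_mask"):
    delattr(word_embeds, "_keras_mask")

outputs = decoder(
    inputs=word_embeds,
    initial_state=initial_state,
    sequence_length=tf.math.reduce_sum(tf.cast(mask, tf.int32), axis=1),
)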

I encountered this issue while working on #335.

Ideas to fix this issue:

  • Fix BasicDecoder so that it only passes the arguments the sampler actually accepts, or alternatively make TrainingSampler absorb unexpected arguments via **kwargs.
  • Add proper support for a mask argument to TrainingSampler, in addition to sequence_length (similar to AttentionMechanism); see the sketch below.

I think there is no harm in the latter option, as it provides a more Keras-friendly way of working with the framework.
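A rough sketch of what the latter could look like; sequence_length_from_mask is a hypothetical helper for illustration, not part of the TF-Addons API:

import tensorflow as tf

def sequence_length_from_mask(mask, sequence_length=None):
    """Hypothetical helper: derive sequence_length from a Keras-style
    boolean mask of shape (batch, time) when none is given explicitly."""
    if sequence_length is not None:
        return sequence_length
    # Count the unmasked steps in each batch entry.
    return tf.math.reduce_sum(tf.cast(mask, tf.int32), axis=1)

# With the mask from the reproduction script, this yields [3 5].
mask = tf.constant([[True, True, True, False, False],
                    [True, True, True, True, True]])
print(sequence_length_from_mask(mask).numpy())  # [3 5]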
