
# Masking and padding with Keras



**Masking** is a way to tell sequence-processing layers that certain timesteps
in an input are missing, and thus should be skipped when processing the data.

**Padding** is a special form of masking where the masked steps are at the start or at
the end of a sequence. Padding comes from the need to encode sequence data into
contiguous batches: in order to make all sequences in a batch fit a given standard
length, it is necessary to pad or truncate some sequences.


In [None]:
import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

## Padding sequence data

When processing sequence data, it is very common for individual samples to have different lengths. Consider the following example (text tokenized as words):

In [None]:
sentences = [
  ["Hello", "world", "!"],
  ["How", "are", "you", "doing", "today"],
  ["The", "weather", "will", "be", "nice", "tomorrow"],
]

Let's vectorize the data as integers:

In [None]:
tokenizer = keras.preprocessing.text.Tokenizer()
tokenizer.fit_on_texts(sentences)
sentences_encoded = tokenizer.texts_to_sequences(sentences)
sentences_encoded
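
To make the encoding step more concrete, here is a simplified pure-Python sketch of a word-to-integer mapping. The helpers `build_word_index` and `encode` are hypothetical names and are not the Keras implementation; the sketch only assumes, as the Keras `Tokenizer` does, that tokens are lowercased and that index 0 is never assigned to a word, so it stays free to act as the padding value later.

```python
from collections import Counter


def build_word_index(tokenized_sentences):
    """Map each lowercased token to an integer >= 1 (0 is kept free for padding)."""
    counts = Counter(
        token.lower() for sentence in tokenized_sentences for token in sentence
    )
    # Most frequent tokens get the smallest indices; ties keep first-seen order
    # because `sorted` is stable and Counter preserves insertion order.
    ranked = sorted(counts, key=lambda token: -counts[token])
    return {token: index for index, token in enumerate(ranked, start=1)}


def encode(tokenized_sentences, word_index):
    """Replace every token with its integer index."""
    return [
        [word_index[token.lower()] for token in sentence]
        for sentence in tokenized_sentences
    ]


example_sentences = [
    ["Hello", "world", "!"],
    ["How", "are", "you", "doing", "today"],
]
example_index = build_word_index(example_sentences)
example_encoded = encode(example_sentences, example_index)
print(example_encoded)
```

Because no word ever receives index 0, a 0 in a padded batch can only mean "padding", which is what makes the `mask_zero=True` option of the `Embedding` layer workable.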

The data is a nested list whose samples have lengths 3, 5, and 6, respectively. Since the input data for a deep learning model must be a single tensor (of shape e.g. `(batch_size, 6, vocab_size)` in this case), samples that are shorter than the longest item need to be padded with some placeholder value (alternatively, one might also truncate long samples before padding short samples).

Keras provides a utility function to truncate and pad Python lists to a common length:
`tf.keras.preprocessing.sequence.pad_sequences`. By default, it pads with 0s, although the padding value is configurable via the `value` parameter. The `padding` parameter controls whether padding is added at the beginning of each sequence ("pre", the default) or at the end ("post"). Keras recommends using "post" padding when working with RNN layers
(in order to be able to use the CuDNN implementation of the layers).

In [None]:
padded_inputs = tf.keras.preprocessing.sequence.pad_sequences(
    sentences_encoded, padding="post")
print(padded_inputs)
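
The pad-or-truncate logic described above can be sketched in plain Python. The function `pad_or_truncate` is a hypothetical helper written for illustration, not the Keras implementation; its parameter names and defaults (`padding="pre"`, `truncating="pre"`, `value=0`) are chosen to mirror those of `pad_sequences`.

```python
def pad_or_truncate(sequences, maxlen=None, padding="pre", truncating="pre", value=0):
    """Return equal-length lists, truncating long sequences and padding short ones."""
    if maxlen is None:
        maxlen = max(len(seq) for seq in sequences)
    result = []
    for seq in sequences:
        seq = list(seq)
        # Truncate first: drop items from the front ("pre") or the back ("post").
        if len(seq) > maxlen:
            seq = seq[len(seq) - maxlen:] if truncating == "pre" else seq[:maxlen]
        fill = [value] * (maxlen - len(seq))
        # Then pad on the chosen side.
        result.append(fill + seq if padding == "pre" else seq + fill)
    return result


toy_sequences = [[1, 2, 3], [4, 5, 6, 7, 8], [9, 10, 11, 12, 13, 14]]
post_padded = pad_or_truncate(toy_sequences, maxlen=5, padding="post", truncating="post")
pre_padded = pad_or_truncate(toy_sequences, maxlen=5)
print(post_padded)
print(pre_padded)
```

With `maxlen=5`, the first sequence gains two padding values and the third loses one token; which side is affected depends on the `padding` and `truncating` choices.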


## Masking

Now that all samples have a uniform length, the model must be informed that some part of the data is actually padding and should be ignored. That mechanism is **masking**.

There are three ways to introduce input masks in Keras models:

- Add a `keras.layers.Masking` layer.
- Configure a `keras.layers.Embedding` layer with `mask_zero=True`.
- Pass a `mask` argument manually when calling layers that support this argument (e.g.
RNN layers).

## Mask-generating layers: `Embedding` and `Masking`

Under the hood, these layers will create a mask tensor (2D tensor with shape `(batch_size, sequence_length)`), and attach it to the tensor output returned by the `Masking` or `Embedding` layer.

In [None]:
vocab_size = 5000
embedding_dim = 16
embedding = layers.Embedding(input_dim=vocab_size, output_dim=embedding_dim, mask_zero=True)
masked_output = embedding(padded_inputs)
print(masked_output._keras_mask)

In [None]:
masking_layer = layers.Masking()
# Simulate the embedding lookup by expanding the 2D input to 3D, with embedding dimension of 10.
unmasked_embedding = tf.cast(
    tf.tile(tf.expand_dims(padded_inputs, axis=-1), [1, 1, 10]), tf.float32)
masked_embedding = masking_layer(unmasked_embedding)
print(masked_embedding._keras_mask)

As we can see from the printed result, the mask is a 2D boolean tensor with shape `(batch_size, sequence_length)`, where each individual `False` entry indicates that the corresponding timestep should be ignored during processing.
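
As a rough NumPy sketch of how such masks can be derived (an illustration of the idea with made-up toy arrays, not the Keras source code): an `Embedding`-style mask keeps every timestep whose token index is not 0, while a `Masking`-style mask keeps every timestep in which at least one feature differs from the mask value, assumed here to be `0.0`.

```python
import numpy as np

# Toy batch of two post-padded integer sequences (0 is the padding value).
toy_padded = np.array([[1, 2, 3, 0, 0],
                       [4, 5, 6, 7, 8]])

# Embedding-style mask: a timestep is kept when its token index is not 0.
index_mask = toy_padded != 0

# Masking-style mask on 3D features: a timestep is kept when at least one
# of its features differs from the mask value (assumed to be 0.0 here).
toy_features = np.repeat(toy_padded[..., np.newaxis], 4, axis=-1).astype("float32")
feature_mask = np.any(toy_features != 0.0, axis=-1)

print(index_mask)
print(feature_mask)
```

Both masks have shape `(batch_size, sequence_length)`, and in this toy case they agree because the features were built directly from the padded indices.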

## Mask propagation in the Functional API and Sequential API

When using the Functional API or the Sequential API, a mask generated by an `Embedding` or `Masking` layer will be propagated through the network to any layer that is capable of using it (for example, RNN layers). Keras will automatically fetch the
mask corresponding to an input and pass it to any layer that knows how to use it.

For instance, in the following Sequential model, the `LSTM` layer will automatically receive a mask, which means it will ignore padded values:

In [None]:
model = keras.Sequential([
    layers.Embedding(input_dim=vocab_size, output_dim=embedding_dim, mask_zero=True),
    layers.LSTM(32),
])

This is also the case for the following Functional API model:

In [None]:
inputs = keras.Input(shape=(None,), dtype="int32")
x = layers.Embedding(input_dim=vocab_size, output_dim=embedding_dim, mask_zero=True)(inputs)
outputs = layers.LSTM(32)(x)

model = keras.Model(inputs, outputs)
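
To give an intuition for what "ignoring padded values" means inside a recurrent layer, the following NumPy sketch uses a toy recurrence (a running sum) in place of a real LSTM cell. It is only a conceptual illustration under that assumption: on a masked timestep the state is carried over unchanged instead of being updated. The function `masked_running_sum` is a hypothetical name.

```python
import numpy as np


def masked_running_sum(features, mask):
    """Toy recurrence: state += x_t on valid steps, state unchanged on masked steps."""
    batch_size, timesteps, dim = features.shape
    state = np.zeros((batch_size, dim), dtype=features.dtype)
    for t in range(timesteps):
        candidate = state + features[:, t, :]
        keep = mask[:, t][:, np.newaxis]  # shape (batch_size, 1), broadcasts over dim
        state = np.where(keep, candidate, state)
    return state


# The last step of the first sample is padding and holds a large junk value.
toy_steps = np.array([[[1.0], [2.0], [100.0]],
                      [[1.0], [2.0], [3.0]]])
toy_step_mask = np.array([[True, True, False],
                          [True, True, True]])
final_state = masked_running_sum(toy_steps, toy_step_mask)
print(final_state)
```

The junk value `100.0` never reaches the final state of the first sample, which is the effect the mask is meant to have on the padded positions.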

## Passing mask tensors directly to layers

Layers that can handle masks (such as the `LSTM` layer) have a `mask` argument in their `__call__` method.

Meanwhile, layers that produce a mask (e.g. `Embedding`) expose a `compute_mask(input, previous_mask)` method which we can call.

Thus, we can pass the output of the `compute_mask()` method of a mask-producing layer to the `__call__` method of a mask-consuming layer, like this:

In [None]:
class MyLayer(layers.Layer):
    def __init__(self, **kwargs):
        super(MyLayer, self).__init__(**kwargs)
        self.embedding = layers.Embedding(input_dim=vocab_size, output_dim=embedding_dim, mask_zero=True)
        self.lstm = layers.LSTM(32)

    def call(self, inputs):
        x = self.embedding(inputs)
        # Note that you could also prepare a `mask` tensor manually.
        # It only needs to be a boolean tensor
        # with the right shape, i.e. (batch_size, timesteps).
        mask = self.embedding.compute_mask(inputs)
        output = self.lstm(x, mask=mask)  # The layer will ignore the masked values
        return output


layer = MyLayer()
x = np.random.random((32, 10)) * 100
x = x.astype("int32")
layer(x)
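
As the comment in `call` notes, a mask can also be prepared by hand. A minimal NumPy sketch, assuming the sequences were post-padded and that their true lengths are known (`lengths_to_mask` is a hypothetical helper; in TensorFlow, `tf.sequence_mask` builds the same kind of mask):

```python
import numpy as np


def lengths_to_mask(lengths, maxlen):
    """Boolean mask of shape (batch_size, maxlen); True marks real (unpadded) steps."""
    positions = np.arange(maxlen)[np.newaxis, :]            # shape (1, maxlen)
    return positions < np.asarray(lengths)[:, np.newaxis]   # shape (batch_size, maxlen)


# Lengths of the three example sentences, padded to the longest one.
manual_mask = lengths_to_mask([3, 5, 6], maxlen=6)
print(manual_mask)
```

A boolean array like this has the `(batch_size, timesteps)` shape that the `mask` argument of a mask-consuming layer expects.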

## References

- [pad_sequences method in Keras](https://www.tensorflow.org/api_docs/python/tf/keras/preprocessing/sequence/pad_sequences)

- [Masking layer in Keras](https://keras.io/api/layers/core_layers/masking/)

- [Masking and padding with Keras](https://www.tensorflow.org/guide/keras/masking_and_padding)

- [Padding and masking sequence data Coursera](https://www.coursera.org/lecture/customising-models-tensorflow2/coding-tutorial-padding-and-masking-sequence-data-4cbXR)

- [Python keras.layers.Masking() Examples](https://www.programcreek.com/python/example/89671/keras.layers.Masking)