### Masking

Making the model ignore padding tokens is trivial using Keras: simply add
`mask_zero=True` when creating the Embedding layer. This means that
padding tokens (whose ID is 0) will be ignored by all downstream layers.
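
To make the mask concrete, here is a minimal sketch that calls the Embedding layer's `compute_mask()` method directly on a small batch of token IDs. The vocabulary size, embedding size, and token IDs are made up for illustration; they are not the ones used by the sentiment analysis model.

```python
import numpy as np
import tensorflow as tf

# Hypothetical toy batch: two sequences padded with 0s to length 5
token_ids = np.array([[3, 7, 2, 0, 0],
                      [5, 1, 0, 0, 0]])

# Illustrative sizes only (assumptions, not taken from the chapter's model)
embedding = tf.keras.layers.Embedding(input_dim=10, output_dim=4,
                                      mask_zero=True)

# The mask is True for real tokens and False for padding tokens (ID 0)
mask = np.array(embedding.compute_mask(token_ids))
print(mask)
```

The resulting Boolean array has the same shape as the batch of token IDs, with `False` wherever the ID is 0.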

If a layer’s `supports_masking` attribute is True, then the mask is
automatically propagated to the next layer. It keeps propagating this way for
as long as the layers have `supports_masking=True`. As an example, a
recurrent layer propagates the mask when `return_sequences=True`, but not
when `return_sequences=False`, since there’s no need for a mask anymore in
this case. So if you have a model with
several recurrent layers with `return_sequences=True`, followed by a recurrent
layer with `return_sequences=False`, then the mask will automatically
propagate up to the last recurrent layer: that layer will use the mask to ignore
masked steps, but it will not propagate the mask any further. Similarly, if you
set `mask_zero=True` when creating the Embedding layer in the sentiment
analysis model we just built, then the GRU layer will receive and use the
mask automatically, but it will not propagate it any further, since
`return_sequences` is not set to True.
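
One way to observe this behavior is to ask a recurrent layer which mask it would pass downstream, by calling its `compute_mask()` method directly. The following is a rough sketch rather than a definitive test: the layer sizes, the dummy inputs, and the mask values are assumptions chosen for illustration.

```python
import tensorflow as tf

# Hypothetical mask for a batch of 2 sequences of 4 time steps
mask = tf.constant([[True, True, True, False],
                    [True, True, False, False]])
# Dummy inputs with shape [batch size, time steps, features]
inputs = tf.zeros([2, 4, 8])

gru_seq = tf.keras.layers.GRU(16, return_sequences=True)
gru_last = tf.keras.layers.GRU(16)  # return_sequences=False by default

# compute_mask() returns the mask that the layer hands to the next layer
mask_after_seq = gru_seq.compute_mask(inputs, mask)
mask_after_last = gru_last.compute_mask(inputs, mask)

print(mask_after_seq)   # the input mask, passed along unchanged
print(mask_after_last)  # None: there is nothing left to propagate
```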

#### **TIP**

Some layers need to update the mask before propagating it to the next layer: they do so by
implementing the `compute_mask()` method, which takes two arguments: the inputs and the
previous mask. It then computes the updated mask and returns it. The default
implementation of `compute_mask()` just returns the previous mask unchanged.
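
As an illustration, here is a sketch of a custom layer that changes the number of time steps and therefore has to update the mask to match. The `EveryOtherStep` layer is hypothetical and exists only for this example; it is not part of Keras.

```python
import tensorflow as tf

class EveryOtherStep(tf.keras.layers.Layer):
    """Hypothetical layer that keeps every other time step."""

    def call(self, inputs, mask=None):
        # Keep time steps 0, 2, 4, ...
        return inputs[:, ::2]

    def compute_mask(self, inputs, mask=None):
        # The outputs have fewer time steps, so the mask must shrink too
        if mask is None:
            return None
        return mask[:, ::2]

layer = EveryOtherStep()
inputs = tf.zeros([1, 6, 3])  # [batch size, time steps, features]
prev_mask = tf.constant([[True, True, True, True, False, False]])

outputs = layer(inputs)
new_mask = layer.compute_mask(inputs, prev_mask)
print(outputs.shape)  # 3 time steps remain
print(new_mask)       # one mask value per remaining time step
```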

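```python
import tensorflow as tf

# text_vec_layer, vocab_size, train_set, and valid_set come from the
# sentiment analysis model built earlier
embed_size = 128
tf.random.set_seed(42)
model = tf.keras.Sequential([
    text_vec_layer,
    # mask_zero=True makes the Embedding layer create a mask for padding (ID 0)
    tf.keras.layers.Embedding(vocab_size, embed_size, mask_zero=True),
    tf.keras.layers.GRU(128),  # receives and uses the mask automatically
    tf.keras.layers.Dense(1, activation="sigmoid")
])
model.compile(loss="binary_crossentropy", optimizer="nadam",
              metrics=["accuracy"])
history = model.fit(train_set, validation_data=valid_set, epochs=5)
```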


Using masking layers and automatic mask propagation works best for simple
models. It will not always work for more complex models, such as when you
need to mix Conv1D layers with recurrent layers. In such cases, you will
need to *explicitly compute the mask and pass it* to the appropriate layers,
using either the functional API or the subclassing API. For example, the
following model is equivalent to the previous model, except it is built using
the functional API and handles masking manually. It also adds a bit of
dropout since the previous model was overfitting slightly:

```python
inputs = tf.keras.layers.Input(shape=[], dtype=tf.string)
token_ids = text_vec_layer(inputs)
# Compute the mask manually: True for real tokens, False for padding (ID 0)
mask = tf.math.not_equal(token_ids, 0)
# No mask_zero=True here, since the mask is passed explicitly to the GRU layer
Z = tf.keras.layers.Embedding(vocab_size, embed_size)(token_ids)
Z = tf.keras.layers.GRU(128, dropout=0.2)(Z, mask=mask)
outputs = tf.keras.layers.Dense(1, activation="sigmoid")(Z)
model = tf.keras.models.Model(inputs=[inputs], outputs=[outputs])
```
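
As a quick sanity check of what passing `mask=` to a recurrent layer does, the sketch below feeds a small GRU layer the same sequence three ways: without padding, with padding and a mask, and with padding but no mask. The layer size, the random inputs, and the tolerance are assumptions made for this illustration only.

```python
import numpy as np
import tensorflow as tf

tf.random.set_seed(42)
gru = tf.keras.layers.GRU(4)

# Hypothetical inputs: one sequence of 3 real steps with 5 features each
real_steps = tf.random.normal([1, 3, 5])
# The same sequence followed by 2 padding steps holding arbitrary values
padding = tf.random.normal([1, 2, 5])
padded = tf.concat([real_steps, padding], axis=1)
mask = tf.constant([[True, True, True, False, False]])

out_real = gru(real_steps).numpy()
out_masked = gru(padded, mask=mask).numpy()  # padding steps are skipped
out_unmasked = gru(padded).numpy()           # padding steps are processed

same_with_mask = bool(np.allclose(out_real, out_masked, atol=1e-5))
same_without_mask = bool(np.allclose(out_real, out_unmasked, atol=1e-5))
print(same_with_mask, same_without_mask)
```

With the mask, the layer's output should match the output for the unpadded sequence; without it, the padding steps alter the final state.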

