Loss calculated incorrectly in networks_seq2seq_nmt.ipynb #2637

@martingoodson

Description

System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): N/A
  • TensorFlow version and how it was installed (source or binary): N/A
  • TensorFlow-Addons version and how it was installed (source or binary): N/A
  • Python version: N/A
  • Is GPU used? (yes/no): N/A

Describe the bug
This bug is in https://colab.research.google.com/github/tensorflow/addons/blob/master/docs/tutorials/networks_seq2seq_nmt.ipynb

The loss function is not computed correctly: the mean should be taken only over the non-masked (non-padding) elements. This line should be replaced:

loss = tf.reduce_mean(loss)

with this:

loss = tf.math.reduce_sum(loss) / tf.math.reduce_sum(mask)

This now gives the same results as tf.keras.metrics.SparseCategoricalCrossentropy(from_logits=True), as expected. For context, the original loss_function from the tutorial:

def loss_function(real, pred):
  # real shape = (BATCH_SIZE, max_length_output)
  # pred shape = (BATCH_SIZE, max_length_output, tar_vocab_size)
  cross_entropy = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True, reduction='none')
  loss = cross_entropy(y_true=real, y_pred=pred)
  mask = tf.logical_not(tf.math.equal(real, 0))  # 0 where real == 0 (padding), 1 elsewhere
  mask = tf.cast(mask, dtype=loss.dtype)
  loss = mask * loss
  loss = tf.reduce_mean(loss)  # bug: averages over padded positions as well
  return loss
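
For reference, a self-contained sketch of the function with the fix applied (variable names follow the tutorial; this is a suggested correction, not the tutorial's current code):

```python
import tensorflow as tf

def loss_function(real, pred):
  # real shape = (BATCH_SIZE, max_length_output)
  # pred shape = (BATCH_SIZE, max_length_output, tar_vocab_size)
  cross_entropy = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True, reduction='none')
  loss = cross_entropy(y_true=real, y_pred=pred)
  # mask out padding positions (token id 0)
  mask = tf.cast(tf.logical_not(tf.math.equal(real, 0)), dtype=loss.dtype)
  loss = mask * loss
  # average over the non-masked elements only
  return tf.math.reduce_sum(loss) / tf.math.reduce_sum(mask)
```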

Code to reproduce the issue

Provide a reproducible test case that is the bare minimum necessary to generate the problem.
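
A minimal repro under stated assumptions (random logits, one batch where half the positions are padding): reduce_mean divides the masked sum by the full element count, so it under-reports the loss whenever padding is present.

```python
import tensorflow as tf

tf.random.set_seed(0)
real = tf.constant([[3, 1, 4, 0, 0, 0]])  # half the positions are padding (id 0)
pred = tf.random.normal([1, 6, 8])        # (batch, time, vocab) logits

ce = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True, reduction='none')
per_token = ce(real, pred)
mask = tf.cast(real != 0, per_token.dtype)

buggy = tf.reduce_mean(mask * per_token)                       # divides by 6
fixed = tf.reduce_sum(mask * per_token) / tf.reduce_sum(mask)  # divides by 3

print(float(buggy), float(fixed))  # buggy is exactly half of fixed here
```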

Other info / logs

Include any logs or source code that would be helpful to diagnose the problem. If including tracebacks, please include the full traceback. Large logs and files should be attached.
