
Support RaggedTensors in Keras losses (at least MSE, CE, SCE) #45403

Closed
foxik opened this issue Dec 4, 2020 · 7 comments
Labels: comp:keras (Keras related issues), stat:awaiting tensorflower (Status - Awaiting response from tensorflower), type:feature (Feature requests)

foxik (Contributor) commented Dec 4, 2020

System information

  • TensorFlow version (you are using): TF 2.4.0rc3
  • Are you willing to contribute it (Yes/No): Yes

Describe the feature and the current behavior/state.

Currently, RaggedTensors can be passed as Keras model inputs. Soon, it will be possible to use them as targets (#45060 and #45015). Therefore, it would be great if they could also be passed to standard losses.

I propose extending the losses in tf.keras.losses to support RaggedTensors as inputs. That is currently (TF 2.4.0rc3) not possible:

tf.keras.losses.mean_squared_error(tf.ragged.constant([[1.],[2.,3.]]), tf.ragged.constant([[1.], [2., 3.]]))

fails with TypeError: object of type 'RaggedTensor' has no len(), because the implementation calls y_pred = ops.convert_to_tensor_v2_with_dispatch(y_pred). The same holds for other losses such as (sparse_)categorical_crossentropy, binary_crossentropy, etc.

Will this change the current API? How?

RaggedTensors will be supported as arguments of tf.keras.losses methods and classes. The change is backward compatible.

Who will benefit from this feature?

Anyone wanting to use RaggedTensors as outputs. There seems to be demand for it; see for example #44988, #44112, #43591, #43093, #42320, #41810.

Any other info.

Note that RaggedTensors can already be passed to tf.keras.metrics, so

tf.keras.metrics.MeanSquaredError()(tf.ragged.constant([[1.],[2.,3.]]), tf.ragged.constant([[1.], [2., 4.]]))

correctly returns 1/3 (the squared errors are 0, 0, and 1, averaged over the three elements).

pedro-r-marques (Contributor) commented:

@foxik

If you are looking for a workaround, the following seems to work for me for 2D tensors of shape (batch, [var len]).

import tensorflow as tf
from tensorflow import keras

x1 = tf.ragged.constant([[1.], [2., 3.]])
x2 = tf.ragged.constant([[1.], [2., 4.]])

# Compute the MSE of each (variable-length) row, then average across rows.
losses = []
for i in tf.range(x1.nrows()):
    losses.append(keras.losses.mean_squared_error(x1[i], x2[i]))
tf.reduce_mean(losses)
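Note that this averages per row first: with the values above it yields 0.25 (the mean of the per-row losses 0 and 0.5), whereas tf.keras.metrics.MeanSquaredError averages over all three elements and returns 1/3; this distinction is discussed further below.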

For 3D tensors where the ragged dimension is the sequence length and the feature dimension is constant, I use a user-defined loss such as:

class RaggedBinaryCrossentropy(keras.losses.Loss):
    @tf.function
    def call(self, y_true, y_pred):
        # Apply the loss to the dense flat values (timesteps x features),
        # producing a ragged tensor of per-timestep losses.
        losses = tf.ragged.map_flat_values(
            keras.losses.binary_crossentropy, y_true, y_pred)
        return tf.reduce_mean(losses)
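A hypothetical usage sketch of the class above (the tensor values are made up for illustration): a batch of two sequences of ragged length, each timestep carrying a fixed feature dimension of 2.

y_true = tf.ragged.constant([[[1., 0.]], [[0., 1.], [1., 1.]]], ragged_rank=1)
y_pred = tf.ragged.constant([[[0.9, 0.1]], [[0.2, 0.8], [0.7, 0.6]]], ragged_rank=1)
# Scalar: mean binary cross-entropy over all three timesteps.
loss = RaggedBinaryCrossentropy()(y_true, y_pred)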

foxik (Contributor, Author) commented Dec 4, 2020

@pedro-r-marques Thanks! Once your PRs go through, I plan to use something like lambda y_true, y_pred: tf.losses.SparseCategoricalCrossentropy()(y_true.values, y_pred.values) as a loss in .compile (I usually work with sequences of variable length on input, so a single .values is fine for me).
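For illustration, a minimal sketch of that flattening trick evaluated directly on ragged constants (the tensor values and the ragged_sce name are hypothetical; targets are class indices, predictions are per-class probabilities):

import tensorflow as tf

ragged_sce = lambda y_true, y_pred: tf.losses.SparseCategoricalCrossentropy()(
    y_true.values, y_pred.values)

y_true = tf.ragged.constant([[0], [1, 0]], ragged_rank=1)
y_pred = tf.ragged.constant([[[0.9, 0.1]], [[0.2, 0.8], [0.7, 0.3]]], ragged_rank=1)
ragged_sce(y_true, y_pred)  # mean cross-entropy over the three flattened timesteps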

But I think basic losses like MSE or CE should support RaggedTensors anyway :-)

pedro-r-marques (Contributor) commented:

@foxik I agree that it is desirable to add support in the Keras library itself.
Just as a note: if I understand correctly, using .values would yield slightly different results than first taking the mean of the per-example losses and then the mean of those. For instance, if you have both very long and very short sequences (which is where ragged tensors really help), the choice between averaging all element losses independently of the batch entries, or averaging per batch entry first, could matter.

In terms of the feature request, it would be nice if keras.losses.Loss could understand 2-D RaggedTensors and compute the mean per batch entry and then across the batch, and also understand N-D RaggedTensors where the 2nd dimension is the ragged one, using map_flat_values to compute the per-batch losses.
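To make the difference concrete, here is a small sketch using the MSE values from earlier in this thread:

import tensorflow as tf

y_true = tf.ragged.constant([[1.], [2., 3.]])
y_pred = tf.ragged.constant([[1.], [2., 4.]])

# Mean over all elements, ignoring row boundaries: (0 + 0 + 1) / 3 = 1/3.
tf.reduce_mean((y_true.flat_values - y_pred.flat_values) ** 2)

# Mean per row first, then across rows: mean(0, 1/2) = 1/4.
tf.reduce_mean(tf.reduce_mean((y_true - y_pred) ** 2, axis=1))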

foxik (Contributor, Author) commented Dec 5, 2020

@pedro-r-marques Personally, I would prefer the other behaviour, i.e., averaging across all "words" (assuming the batch examples are sequences of words). If you consider that the gradients in the network are generated "by every word" (because the individual losses are cross-entropies of individual words), then to estimate the first and second moments correctly, I believe you should not average across sentences first.

If you look at it from the metrics point of view: in NLP you usually want to compute accuracy "per word", not an average of "sentence-level" accuracies (BTW, this seems to be exactly what the current implementation of tf.metrics does), and I would expect the same from the losses.

foxik (Contributor, Author) commented Apr 4, 2021

Thanks @pedro-r-marques for implementing those! This covers all the losses I mentioned in the original issue.

Do you think, @pedro-r-marques, that we should keep this open until all tf.keras.losses support ragged tensors, or should we close it now that all the frequently used ones do? For the record, the ones still missing are hinge, squared_hinge, categorical_hinge, log_cosh, kl_divergence, poisson, and cosine_similarity (huber seems to work without explicit support, but I am not sure whether it works correctly). Also for the record, I do not miss any of the unimplemented ones.

pedro-r-marques (Contributor) commented:

@foxik I believe it would make sense to close this issue. Other, less common losses can be added as the need arises.

foxik (Contributor, Author) commented Apr 5, 2021

@pedro-r-marques Thanks, closing.

foxik closed this as completed on Apr 5, 2021.