
Support RaggedTensors in Keras losses (at least MSE, CE, SCE) #45403

Closed
foxik opened this issue Dec 4, 2020 · 7 comments
Labels: comp:keras (Keras related issues), stat:awaiting tensorflower (Status - Awaiting response from tensorflower), type:feature (Feature requests)

foxik (Contributor) commented Dec 4, 2020

System information

  • TensorFlow version (you are using): TF 2.4.0rc3
  • Are you willing to contribute it (Yes/No): Yes

Describe the feature and the current behavior/state.

Currently, RaggedTensors can be passed as Keras model inputs. Soon, it will be possible to use them as targets (#45060 and #45015). Therefore, it would be great if they could also be passed to standard losses.

I propose extending the losses in tf.keras.losses to support RaggedTensors as inputs. That is currently (TF 2.4.0rc3) not possible:

tf.keras.losses.mean_squared_error(tf.ragged.constant([[1.],[2.,3.]]), tf.ragged.constant([[1.], [2., 3.]]))

fails with TypeError: object of type 'RaggedTensor' has no len(), because the implementation calls y_pred = ops.convert_to_tensor_v2_with_dispatch(y_pred). The same holds for other losses such as (sparse_)categorical_crossentropy, binary_crossentropy, etc.

Will this change the current API? How?

RaggedTensors will be supported as arguments of tf.keras.losses methods and classes. The change is backward compatible.

Who will benefit from this feature?

Anyone wanting to use RaggedTensors as outputs. There seems to be demand for it; see for example #44988, #44112, #43591, #43093, #42320, #41810.

Any other info.

Note that RaggedTensors can already be passed to tf.keras.metrics, so

tf.keras.metrics.MeanSquaredError()(tf.ragged.constant([[1.],[2.,3.]]), tf.ragged.constant([[1.], [2., 4.]]))

correctly returns 1/3 (the squared errors are 0, 0, and 1, averaged over the three elements).

pedro-r-marques (Contributor) commented:

@foxik

If you are looking for a workaround, the following seems to work for me for 2D tensors of shape (batch, [var len]).

import tensorflow as tf
from tensorflow import keras

x1 = tf.ragged.constant([[1.], [2., 3.]])
x2 = tf.ragged.constant([[1.], [2., 4.]])

# Compute the MSE of each (variable-length) row, then average across rows.
losses = []
for i in tf.range(x1.nrows()):
    losses.append(keras.losses.mean_squared_error(x1[i], x2[i]))
tf.reduce_mean(losses)
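Note that this averages per row first: with the values above it yields 0.25 (the mean of the per-row losses 0 and 0.5), whereas tf.keras.metrics.MeanSquaredError averages over all three elements and returns 1/3; this distinction is discussed further below.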

For 3D tensors where the ragged dimension is the sequence length and the feature dimension is constant, I use a user-defined loss such as:

class RaggedBinaryCrossentropy(keras.losses.Loss):
    @tf.function
    def call(self, y_true, y_pred):
        # Apply the loss to the dense flat values (timesteps x features),
        # producing a ragged tensor of per-timestep losses.
        losses = tf.ragged.map_flat_values(
            keras.losses.binary_crossentropy, y_true, y_pred)
        return tf.reduce_mean(losses)
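A hypothetical usage sketch of the class above (the tensor values are made up for illustration): a batch of two sequences of ragged length, each timestep carrying a fixed feature dimension of 2.

y_true = tf.ragged.constant([[[1., 0.]], [[0., 1.], [1., 1.]]], ragged_rank=1)
y_pred = tf.ragged.constant([[[0.9, 0.1]], [[0.2, 0.8], [0.7, 0.6]]], ragged_rank=1)
# Scalar: mean binary cross-entropy over all three timesteps.
loss = RaggedBinaryCrossentropy()(y_true, y_pred)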

foxik (Contributor, Author) commented Dec 4, 2020

@pedro-r-marques Thanks! Once your PRs go through, I plan to use something like lambda y_true, y_pred: tf.losses.SparseCategoricalCrossentropy()(y_true.values, y_pred.values) as a loss in .compile (I usually work with sequences of variable length on input, so a single .values is fine for me).
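For illustration, a minimal sketch of that flattening trick evaluated directly on ragged constants (the tensor values and the ragged_sce name are hypothetical; targets are class indices, predictions are per-class probabilities):

import tensorflow as tf

ragged_sce = lambda y_true, y_pred: tf.losses.SparseCategoricalCrossentropy()(
    y_true.values, y_pred.values)

y_true = tf.ragged.constant([[0], [1, 0]], ragged_rank=1)
y_pred = tf.ragged.constant([[[0.9, 0.1]], [[0.2, 0.8], [0.7, 0.3]]], ragged_rank=1)
ragged_sce(y_true, y_pred)  # mean cross-entropy over the three flattened timesteps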

But I think basic losses like MSE or CE should support RaggedTensors anyway :-)

pedro-r-marques (Contributor) commented:

@foxik I agree that it is desirable to add support in the Keras library itself.
Just as a note: if I understand correctly, using .values would yield slightly different results than first taking the mean of the per-example losses and then the mean of those. For instance, if you have both very long and very short sequences (which is where ragged tensors really help), the choice between averaging all element losses independently of the batch entries, or averaging per batch entry first, could matter.

In terms of the feature request, it would be nice if keras.losses.Loss could understand 2-D RaggedTensors and compute the mean per batch entry and then across the batch, and also understand N-D RaggedTensors where the 2nd dimension is the ragged one, using map_flat_values to compute the per-batch losses.
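To make the difference concrete, here is a small sketch using the MSE values from earlier in this thread:

import tensorflow as tf

y_true = tf.ragged.constant([[1.], [2., 3.]])
y_pred = tf.ragged.constant([[1.], [2., 4.]])

# Mean over all elements, ignoring row boundaries: (0 + 0 + 1) / 3 = 1/3.
tf.reduce_mean((y_true.flat_values - y_pred.flat_values) ** 2)

# Mean per row first, then across rows: mean(0, 1/2) = 1/4.
tf.reduce_mean(tf.reduce_mean((y_true - y_pred) ** 2, axis=1))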

foxik (Contributor, Author) commented Dec 5, 2020

@pedro-r-marques Personally, I would prefer the other behaviour, i.e., averaging across all "words" (assuming the batch examples are sequences of words). If you consider that the gradients in the network are generated "by every word" (because the individual losses are cross-entropies of individual words), then to estimate the first and second moments correctly, I believe you should not average across sentences first.

If you look at it from the metrics point of view: in NLP you usually want to compute accuracy "per word", not an average of "sentence-level" accuracies (BTW, this seems to be exactly what the current implementation of tf.metrics does), and I would expect the same from the losses.

foxik (Contributor, Author) commented Apr 4, 2021

Thanks @pedro-r-marques for implementing those! This covers all the losses I mentioned in the original issue.

Do you think, @pedro-r-marques, that we should keep this open until all tf.keras.losses support ragged tensors, or should we close it now that all the frequently used ones do? For the record, the ones still missing are hinge, squared_hinge, categorical_hinge, log_cosh, kl_divergence, poisson, and cosine_similarity (huber seems to work without explicit support, but I am not sure whether it works correctly). Also for the record, I do not miss any of the unimplemented ones.

pedro-r-marques (Contributor) commented:

@foxik I believe it would make sense to close this issue. Other, less common losses can be added as the need arises.

foxik (Contributor, Author) commented Apr 5, 2021

@pedro-r-marques Thanks, closing.

foxik closed this as completed on Apr 5, 2021.