Add support to cuDNN CTC loss #32302

kaixih · 2019-09-06T23:05:54Z

This PR adds cuDNN CTC loss as a backend of ctc_loss_v2().
To enable it, users need to set the environment variable TF_CUDNN_CTC_LOSS=1.
Why can't we make it the default (i.e. without TF_CUDNN_CTC_LOSS)?
- ctc_loss_v2 accepts an arbitrary blank index, but transposes the blank to the last index before calling the actual implementation. The cuDNN implementation, however, only supports a blank index of 0. This means ctc_loss_v2 has to select the correct implementation based on the platform, which cannot be done inside the operation definition.
What is the logic in the new ctc_loss_v2()?
- _ctc_loss_impl() calls the actual implementation based on the given use_cudnn parameter:
  - If true, it calls the cuDNN implementation.
  - If false, it calls the original implementation.
- If TF_CUDNN_CTC_LOSS=1, ctc_loss_v2() transposes the blank index to 0 and then calls _ctc_loss_impl(use_cudnn=true).
- If TF_CUDNN_CTC_LOSS=0, ctc_loss_v2() transposes the blank index to the last index and then calls _ctc_loss_impl(use_cudnn=false).
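
To make the blank-index transposition above concrete, here is a minimal pure-Python sketch. It is not the code from this PR: the helper names `move_blank_row` and `remap_label` are hypothetical, and it works on a single row of per-class scores rather than a full logits tensor. It assumes the non-blank classes keep their relative order, so labels have to be shifted to match the new class positions.

```python
def move_blank_row(row, blank_index, to_front):
    """Reorder one row of class scores so the blank class is first or last.

    `row` is a list of per-class scores; the other classes keep their order.
    """
    blank = [row[blank_index]]
    rest = row[:blank_index] + row[blank_index + 1:]
    return blank + rest if to_front else rest + blank


def remap_label(label, blank_index, to_front):
    """Map a non-blank label to its position after the blank has moved."""
    if to_front:
        # Classes that sat before the blank shift up by one slot.
        return label + 1 if label < blank_index else label
    # Classes that sat after the blank shift down by one slot.
    return label - 1 if label > blank_index else label


scores = [0.0, 1.0, 2.0, 3.0]   # class 2 plays the role of the blank
labels = [0, 1, 3]

# Blank at index 0, the layout the cuDNN backend is described as requiring.
front_scores = move_blank_row(scores, 2, to_front=True)
front_labels = [remap_label(l, 2, to_front=True) for l in labels]

# Blank at the last index, the layout used by the original implementation.
last_scores = move_blank_row(scores, 2, to_front=False)
last_labels = [remap_label(l, 2, to_front=False) for l in labels]
```

In both layouts, each remapped label still points at the score that belonged to its original class.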

fyi @nluehr

aaroey · 2019-09-12T20:41:02Z

@pkanwar23 would you please help to find someone to review this?

aaroey · 2019-09-23T17:48:19Z

@chsigg could you help to take a look at this? Thanks

rmlarsen · 2019-11-06T19:27:43Z

tensorflow/core/kernels/ctc_loss_op.cc

+                 std::vector<int> *labels_lengths) {
+  const T* h_in = labels_indices->flat<T>().data();
+  for(int i = 0; i < num_indices; i++) {
+    T key = h_in[i * 2];


rmlarsen · 2019-11-06T19:31:51Z

tensorflow/core/kernels/ctc_loss_op.cc

+// takes the ownership of the underlying memory. The expectation is that the
+// memory should be alive for the span of the cudnnCTCLoss itself.
+template <typename T>
+class CudnnCtcLossAllocatorInTemp : public ScratchAllocator {


This code is identical to e.g. CudnnBatchNormAllocatorInTemp in fused_batch_norm_op.cc. Can you consolidate them to a single location instead of duplicating code, please?

rmlarsen · 2019-11-06T19:41:23Z

tensorflow/core/ops/ctc_ops.cc

@@ -62,6 +62,43 @@ REGISTER_OP("CTCLoss")
      return Status::OK();
    });

+REGISTER_OP("CTCLossV2")
+    .Input("inputs: float")


Does the CuDNN implementation support types other than float? If so, we should also support them here.

#31164 added support for double for CTCLossOp, for example.

No, cuDNN only supports float for CTC loss.

sanjoy · 2019-11-06T19:55:10Z

Adding Tim Shen to review the stream executor bits.

kaixih · 2020-01-09T22:32:19Z

@sanjoy @alextp, I have replaced the previous environment variable with the implementation selector (thanks @qlzh727 for helping me out with some test cases). We no longer need the env var to control whether cuDNN is used: the runtime automatically determines whether a GPU is available.

I added another Python function to hold this new implementation (i.e. ctc_loss_v3), which is only available in TF2.

Please help me find the reviewers to review this part. Thx.
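
The decision described in the comment above can be pictured with a small sketch. This is an illustration only, not TensorFlow's implementation selector: `select_ctc_impl`, `REGISTRY`, and the dictionary keys are hypothetical names, and GPU availability is passed in as a plain boolean instead of being detected by the runtime.

```python
# Hypothetical registry: which backend runs on which device, and where that
# backend expects the blank class to sit along the class axis.
REGISTRY = {
    "cpu": {"backend": "original", "blank_position": "last"},
    "gpu": {"backend": "cudnn", "blank_position": "first"},
}


def select_ctc_impl(gpu_available, registry=REGISTRY):
    """Pick the CTC loss backend for the current device.

    Falls back to the CPU entry when no GPU is available or when no GPU
    implementation has been registered.
    """
    key = "gpu" if gpu_available and "gpu" in registry else "cpu"
    return registry[key]


impl = select_ctc_impl(gpu_available=True)
# The caller would then move the blank to impl["blank_position"] before
# invoking impl["backend"].
```

Tying the blank position to the chosen backend keeps the layout decision in one place, which is what the earlier environment variable did by hand.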

alextp

Looks good! Just a couple of minor nits and we can approve.

Thanks!

alextp · 2020-01-09T22:34:31Z

tensorflow/python/ops/ctc_ops.py

@@ -42,6 +46,28 @@
 from tensorflow.python.util import nest
 from tensorflow.python.util.tf_export import tf_export

+import os


The linter will complain; standard python imports need to be above all others

Sure. Removed this unused import.

alextp · 2020-01-09T22:36:24Z

tensorflow/core/api_def/base_api/api_def_CTCLossV2.pbtxt

@@ -0,0 +1,71 @@
+op {
+  graph_op_name: "CTCLossV2"


Can you add a line here saying "visibility: HIDDEN"; this will prevent the generation of a tf.ctc_loss_v2 API

Sure. Done.

alextp · 2020-01-09T22:45:39Z

tensorflow/tools/api/golden/v2/tensorflow.pbtxt

@@ -576,6 +576,10 @@ tf_module {
    name: "cosh"


Now you can revert this file to make the API tests pass again

alextp · 2020-01-09T22:45:44Z

tensorflow/tools/api/golden/v1/tensorflow.pbtxt

@@ -1068,6 +1068,10 @@ tf_module {
    name: "cross"


Now you can revert this file to make the API tests pass again

Done. PTAL.

kaixih · 2020-01-17T01:17:08Z

Anything else I can do? Thx.

pkanwar23 · 2020-01-17T01:26:13Z

Thanks for checking. We should be good. I'm looking at why it hasn't merged.

sanjoy · 2020-01-17T01:27:00Z

> Thanks for checking. We should be good. I'm looking at why it hasn't merged.

It was waiting for an approval from me for some reason. Should be good to go now.

kaixih · 2020-01-17T18:27:30Z

Thx for the update.

PiperOrigin-RevId: 290387603 Change-Id: I28491f42a4559a9f79bd6a7b73d8e6b670f55368

Add changes to support cuDNN CTC loss

a98e8ca

tensorflow-bot bot added the size:L CL Change Size: Large label Sep 6, 2019

googlebot added the cla: yes label Sep 6, 2019

kaixih requested a review from aaroey September 6, 2019 23:06

gbaned self-assigned this Sep 9, 2019

gbaned added this to Assigned Reviewer in PR Queue via automation Sep 9, 2019

gbaned added the awaiting review Pull request awaiting review label Sep 9, 2019

kaixih added 3 commits September 9, 2019 11:30

CPU CTC tests without V2 and update goldens

5a07e2c

Added pbtxt for ctc loss v2

6ff2298

Changed some positions of macros for cuDNN CTC loss

c3e8f6f

tensorflowbutler removed the awaiting review Pull request awaiting review label Sep 13, 2019

Switch to non-deterministic algo which allow larger label size

1ab863f

gbaned added the awaiting review Pull request awaiting review label Sep 20, 2019

aaroey requested a review from chsigg September 23, 2019 17:47

aaroey removed their request for review September 23, 2019 17:48

aaroey requested review from aaroey and removed request for chsigg October 9, 2019 21:00

tensorflowbutler removed the awaiting review Pull request awaiting review label Oct 10, 2019

gbaned added the awaiting review Pull request awaiting review label Oct 11, 2019

aaroey removed their request for review October 25, 2019 16:54

rmlarsen self-requested a review November 1, 2019 17:27

rmlarsen self-assigned this Nov 1, 2019

rmlarsen added the API review API Review label Nov 6, 2019

rmlarsen reviewed Nov 6, 2019

rmlarsen requested review from ebrevdo and sanjoy November 6, 2019 19:50

sanjoy requested a review from timshen91 November 6, 2019 19:54

Add impl selector to remove the env var about the CUDNN CTC Loss

7ee06aa

remove unused import

f9e38a4

alextp suggested changes Jan 9, 2020

Set CTCLossV2 to visibility:HIDDEN

bb87219

alextp previously approved these changes Jan 9, 2020

tensorflow-bot bot added kokoro:force-run Tests on submitted change ready to pull PR ready for merge process labels Jan 9, 2020

kokoro-team removed the kokoro:force-run Tests on submitted change label Jan 9, 2020

update goldens

195729d

kaixih dismissed alextp’s stale review via 195729d January 10, 2020 02:30

alextp approved these changes Jan 10, 2020

tensorflow-bot bot added the kokoro:force-run Tests on submitted change label Jan 10, 2020

kokoro-team removed the kokoro:force-run Tests on submitted change label Jan 10, 2020

tensorflowbutler removed the awaiting review Pull request awaiting review label Jan 11, 2020

gbaned added ready to pull PR ready for merge process and removed ready to pull PR ready for merge process labels Jan 13, 2020

sanjoy approved these changes Jan 17, 2020

PR Queue automation moved this from Reviewer Requested Changes to Approved by Reviewer Jan 17, 2020

tensorflow-bot bot added the kokoro:force-run Tests on submitted change label Jan 17, 2020

kokoro-team removed the kokoro:force-run Tests on submitted change label Jan 17, 2020

tensorflow-copybara pushed a commit that referenced this pull request Jan 18, 2020

Merge pull request #32302 from houtoms:pr_cudnn_ctc_loss

bd4c38b

PiperOrigin-RevId: 290387603 Change-Id: I28491f42a4559a9f79bd6a7b73d8e6b670f55368

tensorflow-copybara merged commit 195729d into tensorflow:master Jan 18, 2020

PR Queue automation moved this from Approved by Reviewer to Merged Jan 18, 2020

This was referenced Apr 14, 2021

Incorrect gradient for ctc_loss on GPU when using logit_length #41280

Closed

For long labels / logits, torch.nn.CTCloss is 30-100x faster than tf.nn.ctc_loss #32335

Closed
