Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix an issue when fusing BatchNorm in backprop #48893

Merged

Conversation

kaixih
Copy link
Contributor

@kaixih kaixih commented May 3, 2021

Previous PR #48063 might hit CUDNN_STATUS_NOT_SUPPORTED error if the Add node has both inputs to be the BatchNorm nodes, in which case we will mistakenly mark both BatchNormGrad as valid for fusion. Though it is not possible that both BatchNormGrad in the backward pass will be fused, there is chance that the forward pass and backward pass select different branch to fuse, causing the unexpected reserve space sizes.

This PR fixes this issue by checking if the BatchNormGrad's directly connected forward BatchNorm matches the to-be-fused forward BatchNorm. In addition, the unit tests are updated to test the double BN inputs of add op.

cc. @nluehr

@google-ml-butler google-ml-butler bot added the size:S CL Change Size: Small label May 3, 2021
@google-cla google-cla bot added the cla: yes label May 3, 2021
@kaixih kaixih force-pushed the bn_act_grad_double_bn_upstream branch 2 times, most recently from 71c16ee to 289dd53 Compare May 4, 2021 00:57
@kaixih kaixih requested review from sanjoy and ezhulenev May 4, 2021 03:18
@gbaned gbaned self-assigned this May 4, 2021
@gbaned gbaned added this to Assigned Reviewer in PR Queue via automation May 4, 2021
@google-ml-butler google-ml-butler bot added kokoro:force-run Tests on submitted change ready to pull PR ready for merge process labels May 4, 2021
PR Queue automation moved this from Assigned Reviewer to Approved by Reviewer May 4, 2021
@kokoro-team kokoro-team removed the kokoro:force-run Tests on submitted change label May 4, 2021
@gbaned gbaned added ready to pull PR ready for merge process and removed ready to pull PR ready for merge process labels May 5, 2021
@sanjoy sanjoy removed the request for review from ezhulenev May 5, 2021 23:22
@copybara-service copybara-service bot merged commit 97e025a into tensorflow:master May 6, 2021
PR Queue automation moved this from Approved by Reviewer to Merged May 6, 2021
copybara-service bot pushed a commit that referenced this pull request May 6, 2021
Still hits "unsupported cudnn config" error

PiperOrigin-RevId: 372364589
Change-Id: Ice2e722179b46090d28a26fb1d3346bfcc131ebf
kaixih referenced this pull request May 14, 2021
…ctivation

Still hits "unsupported cudnn" error

To repro, run (from tensorflow_models repository):

```
//tensorflow_models/official/benchmark:keras_imagenet_benchmark
 -- --logtostderr --benchmarks=Resnet50KerasBenchmarkSynth.benchmark_xla_1_gpu_fp16\$
```

PiperOrigin-RevId: 372380247
Change-Id: Iae61fa8095b0c60716cfdf147544d8f7b3e3f493
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cla: yes ready to pull PR ready for merge process size:S CL Change Size: Small
Projects
PR Queue
  
Merged
Development

Successfully merging this pull request may close these issues.

None yet

4 participants