Fix gradient for min and max #4136

Merged
merged 2 commits into master on Oct 26, 2020
Conversation

@tafsiri (Contributor) commented Oct 26, 2020

Closes #4135
Adds a regression test to guard against #4130 (which appears to already be fixed at HEAD as of when that issue was reported).
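For context, a minimal sketch of what such a regression check could look like (the tensor values, shape, axis, and expected output below are illustrative assumptions, not the PR's actual test case):

```ts
import * as tf from '@tensorflow/tfjs';

// Illustrative regression check: the gradient of a max reduction should route
// the upstream gradient (ones by default) to the positions that attained the max.
const x = tf.tensor2d([[1, 5], [3, 2]]);
const dx = tf.grad((t: tf.Tensor) => tf.max(t, 1))(x);
dx.print(); // expected: [[0, 1], [1, 0]]
```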




@lina128 (Collaborator) left a comment


Thank you Yannick! Can you provide some context here on why we don't need the transpose any more?


@lina128 merged commit 12c4bbf into master on Oct 26, 2020
@tafsiri (Contributor, Author) commented Oct 27, 2020

Good question. There were a few things I noticed.

  • The grad function would itself try to transpose the output if permutedAxes was not null. However, the gradForMinAndMax helper function would also do this internally, so the two transposes effectively cancelled each other out (and the outer transpose was called without permutedAxes, which just seemed odd).
  • The min and max grad functions seemed to be written against the earlier definition of the kernel in the forward func. For example, looking at how the code was at 1.7.4, you can see that the op would sometimes do an internal transpose before passing the input to the backend implementation, yet the original input is what gets passed to the gradient. So I believe the gradient was trying to match the shape of what gets passed to backend.max internally. However, in the new modular kernels there is no op-level transformation and thus no transpose to account for (any internal transposition done within the forward kernel is completely hidden), so the extra 'transposition handling' code becomes unnecessary (see the sketch after this list).
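To make the second point concrete, here is a minimal sketch of a transpose-free gradient written directly against the original input and the forward output. This illustrates the idea rather than the PR's actual gradient code: maxGrad and its parameters are hypothetical names, and axes is assumed to already be normalized to non-negative values. The min gradient would be the same sketch with the mask built from the min output.

```ts
import * as tf from '@tensorflow/tfjs';

// Sketch: gradient of max(x, axes) given the upstream gradient dy and the
// forward output y. No permutedAxes / transpose bookkeeping is needed because
// everything is expressed in terms of the unmodified input x.
function maxGrad(dy: tf.Tensor, x: tf.Tensor, y: tf.Tensor, axes: number[]): tf.Tensor {
  // Re-insert the reduced dimensions as size 1 so y and dy broadcast against x.
  const keptShape = x.shape.map((dim, i) => (axes.includes(i) ? 1 : dim));
  const yKept = tf.reshape(y, keptShape);
  const dyKept = tf.reshape(dy, keptShape);
  // Route the upstream gradient only to the positions that attained the max.
  const mask = tf.cast(tf.equal(x, yKept), 'float32');
  return tf.mul(dyKept, mask);
}
```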

@lina128 (Collaborator) commented Oct 27, 2020

Ah, makes sense!
