Derivative of pow(x, y) can give nan when x=0 for higher order derivatives #35011
Comments
The issue replicates with the given code; please find the Colab gist. Thanks!
Thank you for the report. I'm checking in a change that defines a gradient for 0 ** 0 with respect to the base. The higher-order gradient examples will still have NaNs. We've had an interesting discussion in the past about whether to "fix" backprop in cases like this by not propagating NaNs when multiplying by a zero derivative (so x ** 0 could "hide" NaNs from x ** -1). The conclusion so far has been that we should leave backprop dumb.
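To make the trade-off concrete, here is a minimal sketch in plain Python. It is not TensorFlow's gradient code; the helper names `dumb_chain` and `masked_chain` are hypothetical and only illustrate the two policies described above for a single chain-rule product.

```python
import math


def dumb_chain(upstream, local):
    # Plain chain-rule product. Under IEEE 754 arithmetic, 0 * inf and
    # 0 * nan both evaluate to nan, so a non-finite local derivative
    # survives even when the upstream gradient is exactly zero.
    return upstream * local


def masked_chain(upstream, local):
    # Hypothetical alternative policy: a zero upstream gradient "hides"
    # whatever the local derivative is, including inf and nan.
    return 0.0 if upstream == 0.0 else upstream * local


# Stand-in for a local derivative that blows up at x = 0,
# such as a term proportional to x ** -1.
local = math.inf

print(dumb_chain(0.0, local))    # nan
print(masked_chain(0.0, local))  # 0.0
```

The masked variant returns the mathematically expected zero here, but it also hides non-finite values that would otherwise signal a real numerical problem upstream, which is one reason a framework might prefer the plain product.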
I realize that it's preferable to avoid handling too many special cases, but it seems like some value/utility is lost here by not producing the correct higher-order gradients. The actual higher-order gradients are defined and are continuous, but those computed by TF have a discontinuity (at x = 0). It may be unlikely that an exact value of 0 will show up during typical training, but I ran into it and it sounds like it's happened before. What's the rationale on leaving backprop "dumb"?
@rmlarsen do you have more background on why we decided not to hide NaNs in backprop?
Reopening since I'm rolling back the fix (needs investigation).
I was able to replicate the issue in TF v2.5. Please find the gist here. Thanks!
I could reproduce the issue with TF 2.6. Please, find the
I was able to replicate the issue in tf-nightly 2.13.0-dev20230417. Please find the gist for reference. Thank you. |
System information
Describe the current behavior
The output of pow(x, y) is not correct for higher-order derivatives when x is 0 and when y is other than a single value. For case 2 below I get:
[0 0 4]
[[-2 1 10]]
[[8 -nan 20]]
[[-18 -nan 30]]
[[24 -nan 24]]
[[0 -nan 0]]
For case 1, the nan shows up only in the 5th derivative, which is still not great, but not as bad:
[1 0 1]
[[-4 0 4]]
[[12 0 12]]
[[-24 0 24]]
[[24 24 24]]
[[-0 -nan 0]]
Describe the expected behavior
The output of this example for case 1 should be:
[1 0 1]
[[-4 0 4]]
[[12 0 12]]
[[-24 0 24]]
[[24 24 24]]
[[-0 0 0]]
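As a sanity check on the expected case 1 values, here is a small closed-form sketch in plain Python. It assumes x = [-1, 0, 1] and y = 4, which is an inference from the numbers above rather than something taken from the reproduction code, and the helper `pow_derivative` is hypothetical.

```python
from math import perm


def pow_derivative(x, n, k):
    """k-th derivative of x**n for a non-negative integer n, in closed form."""
    if k > n:
        # Differentiating more than n times leaves the zero polynomial,
        # so no negative power of x is ever evaluated.
        return 0.0
    # n * (n - 1) * ... * (n - k + 1) * x**(n - k)
    return float(perm(n, k)) * x ** (n - k)


xs = [-1.0, 0.0, 1.0]  # assumed inputs
table = [[pow_derivative(x, 4, k) for x in xs] for k in range(6)]

for row in table:
    print(row)
```

Under these assumptions the rows match the expected output, including a 5th derivative of 0 at x = 0, because the closed form stops before any x ** -1 term appears.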
The output of this example for case 2 should be:
[0 0 4]
[[-2 1 10]]
[[8 0 20]]
[[-18 0 30]]
[[24 0 24]]
[[0 0 0]]
Code to reproduce the issue