Hessian (calling tf.gradients twice) of tf.scan fails #2598
We don't support taking gradients of nonscalars, so I wouldn't expect this to work. However, this particular error message is pretty confusing. @yuanbyu: Is there a way we could improve this error message if someone tries for a Hessian in the naive way and control flow is involved?

Yes, the error message should be better. Let me see what I can do.

@girving Here theta is just a scalar value though. What would be the workaround here?

@dementrock: That's true in this case, but we don't want to do extra work on the control flow ops if all it provides is higher-order derivatives w.r.t. scalars. What is your intended use case?

@girving I was doing some Hessian-vector product computations and had the same error, but the code snippet above is simpler and highlights the issue. So is there no way to get higher-order derivatives w.r.t. scan right now?

@dementrock: We don't support higher-order gradients even ignoring control flow.
@girving No support as in no official support, or will it not work at all? Seems like the following code at least compiles:

```python
import tensorflow as tf

theta = tf.Variable(initial_value=1.)

def fn(x, prev):
    return prev + x * theta

result = fn(fn(1.0, 2.0), 3.0)
grad_theta = tf.gradients(result, theta)
tf.gradients(grad_theta, theta)
```

Any plan to support it in the future?
@dementrock The problem is that the registered gradient routines would get significantly more complicated if both sides were nonscalar, and we don't want to support that kind of complexity. As discussed in #675, it's possible one could implement registered gradients with some sort of automatic machinery to map scalar gradient routines to nonscalar gradient routines, but this is a lot of work and we don't have any plans to do it. The other problem is that the applications I know of for nonscalar gradients aren't that compelling yet, since they tend to be impractically huge.

However, there are cases where higher-order gradient information arises while you're still differentiating a scalar, specifically Hessian-free, Krylov-ish methods where one evaluates the gradient dotted with a suitably chosen vector. If that last bit is what you're trying to do, or something similar, we'd be happy to accept pull requests to make control flow not interfere. It might be pretty complicated, though.
…while loop. Fixes tensorflow#2598 Change: 124304732
Environment info
Operating System: Mac OS X 10.11.2
Installed version of CUDA and cuDNN: None
Commit hash (installed from source): 4455f81
Steps to reproduce
Run the following script:
Running it results in the following error:
What have you tried?
Nothing beyond creating this minimal reproducible example
Logs or other output that would be helpful