Implement Hessian for sparse softmax cross entropy #31700
Conversation
PiperOrigin-RevId: 260802377
Can one of the admins verify this patch?
So this was actually Alex's suggestion originally: use @tf.custom_gradient so we don't run into the need to check for zeros. It's a bit complicated to do at the moment; I have a CL out for review which will hopefully make defining second-order gradients with custom_gradient easier. But something like this passes the Hessian test you've written:
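(The original inline example was not preserved in this thread; the following is a hedged reconstruction of the general shape, using public ops and a hypothetical helper name rather than the actual code that was posted. The idea is that the backward pass is written with ordinary differentiable ops, so a second tape can differentiate through it.)

```python
import tensorflow as tf

def sparse_xent_with_differentiable_grad(labels, logits):
  """Hypothetical helper: same forward value as the fused op, but with a
  backward pass written in ordinary ops so second-order gradients work."""

  @tf.custom_gradient
  def _loss(logits):
    # Forward value from the existing public op.
    loss = tf.nn.sparse_softmax_cross_entropy_with_logits(
        labels=labels, logits=logits)

    def grad(upstream):
      # d(loss)/d(logits) = softmax(logits) - one_hot(labels).
      # Because this is expressed with differentiable ops, a second
      # GradientTape (or tf.hessians) can differentiate through it.
      probs = tf.nn.softmax(logits)
      one_hot = tf.one_hot(labels, tf.shape(logits)[-1], dtype=logits.dtype)
      return upstream[..., tf.newaxis] * (probs - one_hot)

    return loss, grad

  return _loss(logits)
```

With something along these lines, nesting tf.GradientTape (or calling tf.hessians in graph mode) over the returned loss should reproduce the per-example diag(p) - p pᵀ Hessian.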
That'd be a helper called instead of gen_nn_ops.sparse_softmax_cross_entropy_with_logits. I think we can do something similar for the non-sparse softmax_cross_entropy_with_logits if we split up its labels and logits into separate custom_gradients (since there we need gradients for labels too). The trick for second-order gradients with nested custom_gradients … Any objections to this approach rather than the Tensor-tagging approach? Happy to clean the example up or explain further if it's helpful.
@michaelkonobeev gentle ping to check the latest comments, thank you
I will work on it soon.
@allenlavoie has the CL which makes defining second-order gradients with custom_gradient easier been submitted? If not, I could implement it as in the example you provided. Also, are there any ideas why the previous PR broke the convergence of the NCF Keras model with run_eagerly=True? Maybe I could add a test for it? The Hessian computation itself seems correct after I checked it multiple times.
Yes, see the documentation for "primals" in custom_gradient now:
https://github.com/tensorflow/tensorflow/blob/27d8d592ce2ca9be52690de734c799eb8731cc5d/tensorflow/python/ops/custom_gradient.py#L86
@allenlavoie in the example you provided, why do we call … inside the inner custom_gradient? If multiplication by … happens there, shouldn't it be outside?
You're right, it needs to be outside the inner custom_gradient. And I need to fix that in the custom_gradient docstring example.
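For context, the nested pattern under discussion looks roughly like the sketch below. This is a hedged illustration using a toy cube function in place of the fused cross-entropy kernel; cube_with_fused_backprop is an illustrative name, not a TensorFlow API. The point is that the multiplication by dy stays outside the inner custom_gradient.

```python
import tensorflow as tf

@tf.custom_gradient
def cube_with_fused_backprop(x):
  # Stand-in for a fused op: forward value and first-order gradient
  # produced together (here simply x**3 and 3*x**2 for illustration).
  y, y_grad = x ** 3, 3.0 * x ** 2

  def first_order_gradient(dy):
    @tf.custom_gradient
    def first_order_custom(unused_x):
      def second_order_gradient(ddy):
        return ddy * 6.0 * x  # d/dx of 3*x**2
      return y_grad, second_order_gradient
    # The multiplication by dy happens *outside* the inner custom_gradient
    # (the fix discussed above), so the tape applies the product rule
    # instead of it being baked into the inner gradient.
    return dy * first_order_custom(x)

  return y, first_order_gradient
```

With x = tf.constant(2.0) and two nested GradientTapes, the inner gradient evaluates to 3 * x**2 = 12 and the outer gradient to 6 * x = 12, matching d²(x³)/dx².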
A second issue that will stop us from being able to solve this problem using tf.custom_gradient is that custom_gradient is not serialized with the graph. If you serialize a SavedModel containing a cross-entropy loss that uses tf.custom_gradient, then after deserialization you can get either errors or incorrect behavior when taking gradients of the resulting tensor. So there are two major issues here.
I don't think the SavedModel issue needs to block this. We will need a workaround, possibly some indication that this custom_gradient is safe to ignore (meaning we'll still only be able to take first-order gradients from SavedModels).
custom_gradient with primals set should implement a vector-Jacobian product, right? Assuming this works in the general case, the downside is that it will prevent some optimizations like the one done in the cross entropy gradient function, which computes only the product between the dependent parts of the gradient and the Jacobian. I could implement it using the variant with a wrapper function to keep this optimization. @allenlavoie could you point me to a way of ignoring custom_gradient for serialization, or maybe there are other workarounds?
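As a concrete illustration of that optimization (a sketch with assumed names, not code from this PR): because the per-example Hessian of softmax cross entropy is diag(p) - p pᵀ, a Hessian-vector product only needs the probabilities and never has to materialize the full Jacobian.

```python
import tensorflow as tf

def xent_hessian_vector_product(logits, v):
  # Sketch: (diag(p) - p p^T) v = p * v - p * (p . v), computed per example.
  p = tf.nn.softmax(logits)                               # [batch, classes]
  p_dot_v = tf.reduce_sum(p * v, axis=-1, keepdims=True)  # p . v
  return p * v - p * p_dot_v
```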
Yeah, the primals= argument has been removed; I don't see a way to support that API (and it was never in a stable release). I've updated the example in the custom_gradient docstring with your suggested fix. The easiest way to ignore these custom_gradients in SavedModels is probably to set an attribute. I'd make an internal-only version with the extra argument, then add an attribute to this node. Then you can delete the _gradient_op_type attribute here.
@michaelkonobeev Could you please check the reviewer comments and keep us posted? Thanks!
I implemented Hessian computation through tf.custom_gradient but still need to implement the workaround for SavedModels. Hope to work on this next week.
I noticed that when …, this will lead to a LookupError from here. If the tape is not persistent or …
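For reference, the kind of nested-tape usage at stake looks roughly like this (illustrative values; whether it raises a LookupError, returns None, or succeeds depends on which second-order gradient definitions are registered):

```python
import tensorflow as tf

x = tf.Variable([[0.5, 1.0, 2.0]])
labels = tf.constant([1])

with tf.GradientTape() as outer_tape:
  with tf.GradientTape() as inner_tape:
    loss = tf.nn.sparse_softmax_cross_entropy_with_logits(
        labels=labels, logits=x)
  grad = inner_tape.gradient(loss, x)
# Differentiating the backward pass a second time is where the
# second-order gradient added by this PR comes into play.
hessian = outer_tape.jacobian(grad, x)
```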
@michaelkonobeev can you please resolve conflicts?
@michaelkonobeev Can you please resolve conflicts? Thanks!
PiperOrigin-RevId: 294219774 Change-Id: I6a5324599b192a080fc78ce715b28107fabbc236
This reapplies #22231 with an implementation of _IsZero in eager mode.
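The _IsZero check has to work differently eagerly than in graph mode, since there is no symbolic op to inspect. A minimal sketch of the idea (an assumption about the approach, not the actual implementation from #22231):

```python
import tensorflow as tf
from tensorflow.python.framework import tensor_util

def _is_zero_sketch(grad):
  # Eager mode: there is no op graph to inspect, so look at the value.
  if tf.executing_eagerly():
    return bool(tf.reduce_all(tf.equal(grad, 0)).numpy())
  # Graph mode: a symbolically-zero gradient usually comes from ZerosLike
  # or a constant, so inspect the producing op / constant value instead.
  if grad.op.type == "ZerosLike":
    return True
  value = tensor_util.constant_value(grad)
  return value is not None and not value.any()
```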