
autograd/tf.gradients - is there support? #46

Open
qwer1304 opened this issue Feb 6, 2019 · 3 comments

Comments


qwer1304 commented Feb 6, 2019

Hello,
I'm trying to implement in MatConvNet/autonn the network implemented here in PyTorch and here in TensorFlow. I need to define a network that uses gradients on the fly to calculate some other gradients and updates.
Note that access to the 2nd-order derivative (of the loss) is needed at construction time, since

Loss(S; Θ) = F(S_bar; Θ) + β * G(S_breve; Θ') = F(S_bar; Θ) + β * G(S_breve; Θ - α * ∂F(S_bar; Θ)/∂Θ)
∂Loss(S; Θ)/∂Θ = ∂F(S_bar; Θ)/∂Θ + β * ∂G(S_breve; Θ')/∂Θ
∂G(S_breve; Θ')/∂Θ = ∂G(S_breve; Θ')/∂Θ' * ∂Θ'/∂Θ, and
∂Θ'/∂Θ = 1 - α * ∂²F(Θ)/∂Θ²
∂Loss(S; Θ)/∂Θ = ∂F(S_bar; Θ)/∂Θ + β * ∂G(S_breve; Θ')/∂Θ' - α*β * ∂G(S_breve; Θ')/∂Θ' * ∂²F(Θ)/∂Θ²

Calculating ∂F(S_bar; Θ)/∂Θ and ∂G(S_breve; Θ')/∂Θ' is simple: just run F(S_bar; Θ) and G(S_breve; Θ') backwards. The problem is how to obtain ∂²F(Θ)/∂Θ²?

In PyTorch, the crucial step is implemented here using autograd.grad during network assembly (before compile!), so that proper differentiation occurs during backpropagation. Here is a TensorFlow implementation using tf.gradients.
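For intuition, here is a minimal PyTorch sketch of the computation above, outside of autonn; the linear model, random data, and the values of α and β are placeholders, not the actual network in question. The key point is create_graph=True, which keeps ∂F/∂Θ itself differentiable so that the ∂²F/∂Θ² term is included when the combined loss is backpropagated.

```python
# Minimal PyTorch sketch (not autonn); model, data and hyper-parameters are placeholders.
import torch

torch.manual_seed(0)
model = torch.nn.Linear(4, 1)          # stands in for the real network
theta = list(model.parameters())
alpha, beta = 0.1, 0.5

x_bar, y_bar = torch.randn(8, 4), torch.randn(8, 1)      # S_bar
x_breve, y_breve = torch.randn(8, 4), torch.randn(8, 1)  # S_breve
mse = torch.nn.functional.mse_loss

# F(S_bar; Θ) and its gradient, kept differentiable with create_graph=True
# so that ∂²F/∂Θ² is available to the outer backward pass.
F = mse(model(x_bar), y_bar)
grads = torch.autograd.grad(F, theta, create_graph=True)

# Θ' = Θ - α ∂F/∂Θ, applied functionally so Θ' stays a function of Θ.
theta_prime = [p - alpha * g for p, g in zip(theta, grads)]

# G(S_breve; Θ') evaluated with the updated parameters.
w, b = theta_prime
G = mse(torch.nn.functional.linear(x_breve, w, b), y_breve)

# Backpropagating the total loss now includes the -αβ ∂G/∂Θ' ∂²F/∂Θ² term.
loss = F + beta * G
loss.backward()
print([p.grad.shape for p in theta])
```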

Can this be done in autonn, and if so, how?
Thx

PS From what I understand, getDer and setDer are run-time methods that provide access to the numeric value of a derivative. I need to use a 2nd-order derivative in network construction, so access to the 1st-order derivative at build time is needed.

qwer1304 changed the title from "autograd - is there support?" to "autograd/tf.gradients - is there support?" on Feb 7, 2019

jotaf98 commented Feb 8, 2019

Hi, I wish there was support for 2nd-order derivatives, but unfortunately there isn't at the moment.

Essentially, the gradients of the operations need to be written as compositions of differentiable operations too. This is already the case for all math operators, and for the layers that are defined in pure MATLAB, like ReLUs, sigmoids, etc.

However, monolithic layers like convolution or batch-norm are made up of custom C++/CUDA code and do not enjoy the benefits of automatic differentiation for free. I think it's possible to define their 2nd (and Nth) order derivatives without writing custom CUDA code but I didn't follow up on it.

If you only care about MLPs and not CNNs, that is not a concern, and implementing 2nd-order derivatives will be easier; but I don't have any ready-made code for it, unfortunately.


qwer1304 commented Feb 8, 2019

Thx for the response.
Practically, I'd like to try a network built of LSTM, FC, and softmax layers, and later ReLU and maybe dropout.
How should I approach getting the 2nd-order derivative of such a network within autonn?
Thx
PS As I understand it, there is currently NO gradient operator which, for a given network, would return another network that is the gradient of the first, even with some structural limitations imposed on it (e.g., no CNNs, etc.).


jotaf98 commented Feb 11, 2019

Exactly, currently there is no such operator.

The way it would work in practice is to walk through the layers in backward order (this list can be obtained by running layer.find() with no arguments and reversing the list), and call autonn_der with each layer's forward function to get the corresponding backward function.

It would then call each backward function with Layer objects as inputs, so that instead of computing a gradient numerically, the call gives back the Layers that implement that gradient computation.

After doing this for all layers in the list you have the composition of Layer objects that expresses the gradient computation, and you can backpropagate through it -- this would be the double gradient. The process can be iterated for Nth order gradients.
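To make the idea concrete outside of MATLAB, here is a small PyTorch sketch of the same principle (the net, data, and names are illustrative, not autonn code): the backward pass of a tiny sigmoid layer is written out explicitly as ordinary differentiable operations, and that expression is then differentiated again to obtain a second-order gradient.

```python
# Illustrative PyTorch sketch of the idea (autonn itself is MATLAB):
# the gradient of a tiny net is written out as ordinary differentiable ops,
# and that expression is then differentiated again.
import torch

x = torch.randn(5, 3)
w = torch.randn(3, 1, requires_grad=True)

# Forward pass: y = sigmoid(x @ w), scalar loss F = sum(y)
y = torch.sigmoid(x @ w)
F = y.sum()

# "Backward as a composition": dF/dw written explicitly from each layer's
# backward rule (sigmoid'(z) = y * (1 - y); matmul backward is x^T @ ...).
dF_dy = torch.ones_like(y)
dF_dz = dF_dy * y * (1 - y)      # backward through the sigmoid
dF_dw = x.t() @ dF_dz            # backward through the matmul

# Because dF_dw is itself built from differentiable ops, we can
# backpropagate through it; this is the "double gradient".
second = torch.autograd.grad(dF_dw.sum(), w)[0]
print(second.shape)  # (3, 1)
```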

This is a fun coding challenge, if you have time to do it step by step (e.g. starting with simple nets and only one or two layers, and building from there). I wish I had time to implement it myself, but I'm happy to assist if someone wants to give it a try.
