
Conditionally trainable variables and stochastic depth neural networks #8817

Closed
awav opened this issue Mar 29, 2017 · 7 comments
Labels
type:feature Feature requests

Comments

@awav

awav commented Mar 29, 2017

I came across a task where I would like to apply the stochastic depth regularization technique using TensorFlow (https://arxiv.org/pdf/1603.09382.pdf). TensorFlow doesn't provide enough settings to implement it. I found the closed issue #1784, which is similar to this request; the discussion there ended with the claim that the [ tf.cond | tf.select ] primitives are enough for this task. But if you read the paper carefully, it says that during training the depth changes in both directions, i.e. in both the forward and backward propagation steps. Therefore the number of trainable W parameters of the network changes too. The core concept of TensorFlow is building the computation graph before the training session is run. Currently, I cannot create a dynamic computation graph such that, depending on a boolean value, the W parameters of a layer are not engaged in the optimisation process.

If tf.Variable accepted the trainable parameter as a boolean tensor, in addition to a built-in Python boolean, it would solve the problem. It would also mean that TensorFlow natively supports dynamic computation graphs, which would in fact be a very powerful tool.

I would appreciate any suggestions and ideas, so that this question can be settled once and for all.

@vrv, @martinwicke, @aselle

awav changed the title from "Conditional training variables and stochastic depth neural networks" to "Conditionally trainable variables and stochastic depth neural networks" on Mar 30, 2017
@aselle
Contributor

aselle commented Mar 31, 2017

I am adding this to our list of models that we would like to make easier in TensorFlow. I don't have any personal knowledge of the paper, but regarding your comment about having a trainable flag: it seems like you could multiply by a vector of 0's or 1's to mask the variable dynamically and achieve the same effect. Let me know whether that would be sufficient. Thanks!
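
For what it's worth, here is a minimal sketch of that masking idea applied to a stochastic-depth residual block (TF 1.x API; the layer sizes and survival probability are placeholders, not taken from the paper):

import tensorflow as tf

def residual_block(x, units, survival_prob=0.8):
    # Illustrative block; the real architecture would differ.
    branch = tf.layers.dense(x, units, activation=tf.nn.relu)
    # Bernoulli gate: 1.0 keeps the branch, 0.0 drops it for this step.
    gate = tf.cast(tf.random_uniform([]) < survival_prob, tf.float32)
    # When the gate is 0 the branch contributes nothing and its gradients are
    # zero, so plain SGD leaves its weights untouched on that step.
    return x + gate * branch

inputs = tf.placeholder(tf.float32, [None, 64])
outputs = residual_block(inputs, 64)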

aselle added the stat:awaiting response and type:feature labels on Mar 31, 2017
@awav
Author

awav commented Apr 4, 2017

Thank you @aselle,
I implemented cancelling a residual block using multiplication by zero, and with tf.stop_gradient as well, but it only prevents that particular residual block's gradient from contributing to the parent layers. As far as I understood, parameters which are marked as trainable will always be updated regardless of the procedures mentioned above.
To clarify the idea a bit: let's say we have a graph a -> b -> c, where the trainable layers (tensors) a, b, c are included in the TRAINABLE_VARIABLES collection. With stochastic depth nets, some layers should randomly disappear from the trainable list during training, but TensorFlow does not allow that in any case right now.
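
For reference, a tiny sketch of the situation described above (the names are made up): even with the branch multiplied by zero, its weights stay in the trainable collection, so an optimizer that keeps state (momentum, Adam) or a weight-decay term can still move them on a "dropped" step.

import tensorflow as tf

x = tf.placeholder(tf.float32, [None, 64])
keep = tf.placeholder(tf.float32, [])   # feed 1.0 to keep the block, 0.0 to drop it
with tf.variable_scope("res_block"):
    branch = tf.layers.dense(x, 64, activation=tf.nn.relu)
out = x + keep * branch                 # multiplication-by-zero masking

# The block's weights are still listed as trainable, whatever `keep` is fed:
print([v.name for v in tf.trainable_variables()])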

aselle removed the stat:awaiting response label on Apr 4, 2017
@aselle
Contributor

aselle commented Apr 4, 2017

I see. I think you could probably implement this using a custom optimizer that controls the update vector and disables it using knowledge of variables and their position in layers. This may not be easy, but it may be possible.
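
One way to approximate that without writing a full custom optimizer (a sketch with made-up scope names) is to scale each (gradient, variable) pair by a per-layer 0/1 gate before apply_gradients:

import tensorflow as tf

# Illustrative two-block model.
x = tf.placeholder(tf.float32, [None, 4])
with tf.variable_scope("block1"):
    h = tf.layers.dense(x, 4, activation=tf.nn.relu)
with tf.variable_scope("block2"):
    y = tf.layers.dense(h, 1)
loss = tf.reduce_mean(tf.square(y))

# Per-layer gates: feed 0.0 to freeze that block's update on a given step.
gates = {"block1": tf.placeholder_with_default(1.0, []),
         "block2": tf.placeholder_with_default(1.0, [])}

def gate_for(var):
    for scope, gate in gates.items():
        if var.op.name.startswith(scope):
            return gate
    return tf.constant(1.0)   # variables outside the gated blocks always train

opt = tf.train.GradientDescentOptimizer(0.05)
grads_and_vars = opt.compute_gradients(loss)
gated = [(g * gate_for(v), v) for g, v in grads_and_vars if g is not None]
train_op = opt.apply_gradients(gated)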

@jart
Contributor

jart commented Apr 14, 2017

Closing due to inactivity. I'll reopen this issue if @awav indicates the previous suggestion was not sufficient.

Note: We have an on-call rotation for triaging issues. When filing issues, please let us take care of tagging team members for you.

jart closed this as completed on Apr 14, 2017
@samjabrahams
Contributor

@awav - correct me if I'm missing something, but is the goal to simply not update Variables that aren't used due to a conditional? TensorFlow already zeros out these gradients. Here's some sample code:

import tensorflow as tf

tf.reset_default_graph()

a = tf.Variable(10.0)
b = tf.Variable(10.0)
switch = tf.placeholder(tf.bool)
# Only the taken branch of the cond contributes a non-zero gradient.
res = tf.cond(switch, lambda: tf.multiply(2.0, a), lambda: tf.square(b))
opt = tf.train.GradientDescentOptimizer(0.05)
grads = opt.compute_gradients(res)
train = opt.apply_gradients(grads)
init = tf.global_variables_initializer()

with tf.Session() as sess:
    sess.run(init)
    print(sess.run(grads, {switch: True}))

If you change the {switch: True} feed dict to {switch: False}, you'll see that the gradient values flip depending on which path is taken.

When {switch: True}:

[(2.0, 10.0), (0.0, 10.0)]

When {switch: False}:

[(0.0, 10.0), (20.0, 10.0)]

For completeness, if you apply the gradients with different switches set, you only update one or the other:

{switch: True}:

with tf.Session() as sess:
    sess.run(init)
    sess.run(train, {switch: True})
    print(sess.run([a, b]))

>>> [9.8999996, 10.0]

{switch: False}:

with tf.Session() as sess:
    sess.run(init)
    sess.run(train, {switch: False})
    print(sess.run([a, b]))

>>> [10.0, 9.0]

I think the most likely problem when trying to implement stochastic depth is that you may not see the reduced computation, due to the less-lazy way tf.cond executes: ops created outside the branch functions still run regardless of the condition (see the last paragraph of the tf.cond documentation before the "Args" section).

@stanislavfort

The conditional statement does not seem to cut it for me. In my case, I have a model of the form data -> encoder -> intermediate result -> decoder -> result. I would like to be able to toggle whether the encoder and decoder variables are trainable during training by passing a boolean tensor. Is it possible to do this using tf.cond? When I pass a boolean tensor as tf.get_variable(..., trainable=boolTensor) I get a TypeError.
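
For what it's worth: trainable is consumed at graph-construction time (it only decides whether the variable is added to the TRAINABLE_VARIABLES collection), so a tensor cannot be passed there, hence the TypeError. A common workaround, sketched below under the assumption that the encoder and decoder are built inside "encoder"/"decoder" variable scopes, is to create one training op per variable subset and choose in Python which one to run each step:

import tensorflow as tf

# Illustrative encoder/decoder; the real model would differ.
x = tf.placeholder(tf.float32, [None, 8])
with tf.variable_scope("encoder"):
    code = tf.layers.dense(x, 4, activation=tf.nn.relu)
with tf.variable_scope("decoder"):
    recon = tf.layers.dense(code, 8)
loss = tf.reduce_mean(tf.square(recon - x))

enc_vars = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope="encoder")
dec_vars = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope="decoder")

# One train op per subset; the other subset's weights stay untouched.
train_enc = tf.train.GradientDescentOptimizer(0.05).minimize(loss, var_list=enc_vars)
train_dec = tf.train.GradientDescentOptimizer(0.05).minimize(loss, var_list=dec_vars)

# Per step: sess.run(train_enc, {x: batch})  # decoder frozen
#           sess.run(train_dec, {x: batch})  # encoder frozen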

@shengcheng

shengcheng commented Mar 11, 2018

I have the same problem. In my case, I have the model

input -> features -> decode1 -> loss1
                 |-> decode2 -> loss2

loss1 and loss2 are different loss functions. When I minimize loss1, I want to fix the weights in decode2; when I minimize loss2, I want to fix the weights in features and decode1. In the training process I need to train the two losses alternately, so I need a trainable flag to determine which part to train for whichever loss is being minimized. One solution found online is to save and restore the weights every time, but that is not efficient at all.
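
The same var_list idea sketched above fits this case too (the scope names "features", "decode1", "decode2" and the losses here are only placeholders), with no saving and restoring of weights between phases:

import tensorflow as tf

x = tf.placeholder(tf.float32, [None, 16])
with tf.variable_scope("features"):
    feats = tf.layers.dense(x, 8, activation=tf.nn.relu)
with tf.variable_scope("decode1"):
    out1 = tf.layers.dense(feats, 16)
with tf.variable_scope("decode2"):
    out2 = tf.layers.dense(feats, 16)
loss1 = tf.reduce_mean(tf.square(out1 - x))
loss2 = tf.reduce_mean(tf.abs(out2 - x))

def vars_in(scope):
    return tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope=scope)

# loss1 updates features + decode1 only (decode2 stays fixed);
# loss2 updates decode2 only (features and decode1 stay fixed).
train1 = tf.train.GradientDescentOptimizer(0.05).minimize(
    loss1, var_list=vars_in("features") + vars_in("decode1"))
train2 = tf.train.GradientDescentOptimizer(0.05).minimize(
    loss2, var_list=vars_in("decode2"))

# Alternate sess.run(train1, ...) and sess.run(train2, ...) during training.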
