
Conditionally trainable variables and stochastic depth neural networks #8817

Closed
awav opened this issue Mar 29, 2017 · 7 comments
Labels
type:feature Feature requests

Comments

@awav

awav commented Mar 29, 2017

I came across a task where I would like to apply the stochastic depth regularization technique using TensorFlow (https://arxiv.org/pdf/1603.09382.pdf). TensorFlow doesn't provide enough settings to implement it. I found the closed issue #1784, which is similar to this request; the discussion there ended with the claim that the [ tf.cond | tf.select ] primitives are enough for this task. But if you read the paper carefully, it says that during training the depth changes in both directions, i.e. in both the forward and backward propagation steps. Therefore the number of trainable W parameters of the network changes too. The core concept of TensorFlow is building the computation graph before the training session is run. Currently, I cannot create a dynamic computation graph such that, depending on a boolean value, the W parameters of a layer are not engaged in the optimisation process.

If tf.Variable accepted the trainable parameter as a boolean tensor, in addition to a built-in Python boolean, it would solve the problem. It would also mean that TensorFlow natively supports dynamic computation graphs, which would in fact be a very powerful tool.

I would appreciate any suggestions and ideas, so that this question can be settled once and for all.

@vrv, @martinwicke, @aselle

awav changed the title from "Conditional training variables and stochastic depth neural networks" to "Conditionally trainable variables and stochastic depth neural networks" on Mar 30, 2017
@aselle
Contributor

aselle commented Mar 31, 2017

I am adding this to our list of models that we would like to make easier in TensorFlow. I don't have any personal knowledge of the paper, but regarding your comment about having a trainable flag: it seems like you could multiply by a vector of 0's or 1's to mask the variable dynamically and achieve the same effect. Let me know whether that would be sufficient. Thanks!
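
For what it's worth, here is a minimal sketch of that masking idea applied to a stochastic-depth residual block (TF 1.x API; the layer sizes and survival probability are placeholders, not taken from the paper):

import tensorflow as tf

def residual_block(x, units, survival_prob=0.8):
    # Illustrative block; the real architecture would differ.
    branch = tf.layers.dense(x, units, activation=tf.nn.relu)
    # Bernoulli gate: 1.0 keeps the branch, 0.0 drops it for this step.
    gate = tf.cast(tf.random_uniform([]) < survival_prob, tf.float32)
    # When the gate is 0 the branch contributes nothing and its gradients are
    # zero, so plain SGD leaves its weights untouched on that step.
    return x + gate * branch

inputs = tf.placeholder(tf.float32, [None, 64])
outputs = residual_block(inputs, 64)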

aselle added the stat:awaiting response and type:feature labels on Mar 31, 2017
@awav
Author

awav commented Apr 4, 2017

Thank you @aselle,
I implemented cancelling a residual block using multiplication by zero, and with tf.stop_gradient as well, but it only prevents that particular residual block's gradient from contributing to the parent layers. As far as I understood, parameters which are marked as trainable will always be updated regardless of the procedures mentioned above.
To clarify the idea a bit: let's say we have a graph a -> b -> c, where the trainable layers (tensors) a, b, c are included in the TRAINABLE_VARIABLES collection. With stochastic depth nets, some layers should randomly disappear from the trainable list during training, but TensorFlow does not allow that in any case right now.
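
For reference, a tiny sketch of the situation described above (the names are made up): even with the branch multiplied by zero, its weights stay in the trainable collection, so an optimizer that keeps state (momentum, Adam) or a weight-decay term can still move them on a "dropped" step.

import tensorflow as tf

x = tf.placeholder(tf.float32, [None, 64])
keep = tf.placeholder(tf.float32, [])   # feed 1.0 to keep the block, 0.0 to drop it
with tf.variable_scope("res_block"):
    branch = tf.layers.dense(x, 64, activation=tf.nn.relu)
out = x + keep * branch                 # multiplication-by-zero masking

# The block's weights are still listed as trainable, whatever `keep` is fed:
print([v.name for v in tf.trainable_variables()])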

aselle removed the stat:awaiting response label on Apr 4, 2017
@aselle
Contributor

aselle commented Apr 4, 2017

I see. I think you could probably implement this using a custom optimizer that controls the update vector and disables it using knowledge of variables and their position in layers. This may not be easy, but it may be possible.
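
One way to approximate that without writing a full custom optimizer (a sketch with made-up scope names) is to scale each (gradient, variable) pair by a per-layer 0/1 gate before apply_gradients:

import tensorflow as tf

# Illustrative two-block model.
x = tf.placeholder(tf.float32, [None, 4])
with tf.variable_scope("block1"):
    h = tf.layers.dense(x, 4, activation=tf.nn.relu)
with tf.variable_scope("block2"):
    y = tf.layers.dense(h, 1)
loss = tf.reduce_mean(tf.square(y))

# Per-layer gates: feed 0.0 to freeze that block's update on a given step.
gates = {"block1": tf.placeholder_with_default(1.0, []),
         "block2": tf.placeholder_with_default(1.0, [])}

def gate_for(var):
    for scope, gate in gates.items():
        if var.op.name.startswith(scope):
            return gate
    return tf.constant(1.0)   # variables outside the gated blocks always train

opt = tf.train.GradientDescentOptimizer(0.05)
grads_and_vars = opt.compute_gradients(loss)
gated = [(g * gate_for(v), v) for g, v in grads_and_vars if g is not None]
train_op = opt.apply_gradients(gated)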

@jart
Contributor

jart commented Apr 14, 2017

Closing due to inactivity. I'll reopen this issue if @awav indicates the previous suggestion was not sufficient.

Note: We have an on-call rotation for triaging issues. When filing issues, please let us take care of tagging team members for you.

jart closed this as completed on Apr 14, 2017
@samjabrahams
Contributor

@awav - correct me if I'm missing something, but is the goal to simply not update Variables that aren't used due to a conditional? TensorFlow already zeros out these gradients. Here's some sample code:

import tensorflow as tf

tf.reset_default_graph()

a = tf.Variable(10.0)
b = tf.Variable(10.0)
switch = tf.placeholder(tf.bool)
# Only the taken branch of the cond contributes a non-zero gradient.
res = tf.cond(switch, lambda: tf.multiply(2.0, a), lambda: tf.square(b))
opt = tf.train.GradientDescentOptimizer(0.05)
grads = opt.compute_gradients(res)
train = opt.apply_gradients(grads)
init = tf.global_variables_initializer()

with tf.Session() as sess:
    sess.run(init)
    print(sess.run(grads, {switch: True}))

If you change the {switch: True} feed dict to {switch: False}, you'll see that the gradient values flip depending on which path is taken.

When {switch: True}:

[(2.0, 10.0), (0.0, 10.0)]

When {switch: False}:

[(0.0, 10.0), (20.0, 10.0)]

For completeness, if you apply the gradients with different switches set, you only update one or the other:

{switch: True}:

with tf.Session() as sess:
    sess.run(init)
    sess.run(train, {switch: True})
    print(sess.run([a, b]))

>>> [9.8999996, 10.0]

{switch: False}:

with tf.Session() as sess:
    sess.run(init)
    sess.run(train, {switch: False})
    print(sess.run([a, b]))

>>> [10.0, 9.0]

I think the most likely problem when trying to implement stochastic depth is that you may not see the reduced computation, due to the less-lazy way tf.cond executes: ops created outside the branch functions still run regardless of the condition (see the last paragraph of the tf.cond documentation before the "Args" section).

@stanislavfort

The conditional statement does not seem to cut it for me. In my case, I have a model of the form data -> encoder -> intermediate result -> decoder -> result. I would like to be able to toggle whether the encoder and decoder variables are trainable during training by passing a boolean tensor. Is it possible to do this using tf.cond? When I pass a boolean tensor as tf.get_variable(..., trainable=boolTensor) I get a TypeError.
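
For what it's worth: trainable is consumed at graph-construction time (it only decides whether the variable is added to the TRAINABLE_VARIABLES collection), so a tensor cannot be passed there, hence the TypeError. A common workaround, sketched below under the assumption that the encoder and decoder are built inside "encoder"/"decoder" variable scopes, is to create one training op per variable subset and choose in Python which one to run each step:

import tensorflow as tf

# Illustrative encoder/decoder; the real model would differ.
x = tf.placeholder(tf.float32, [None, 8])
with tf.variable_scope("encoder"):
    code = tf.layers.dense(x, 4, activation=tf.nn.relu)
with tf.variable_scope("decoder"):
    recon = tf.layers.dense(code, 8)
loss = tf.reduce_mean(tf.square(recon - x))

enc_vars = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope="encoder")
dec_vars = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope="decoder")

# One train op per subset; the other subset's weights stay untouched.
train_enc = tf.train.GradientDescentOptimizer(0.05).minimize(loss, var_list=enc_vars)
train_dec = tf.train.GradientDescentOptimizer(0.05).minimize(loss, var_list=dec_vars)

# Per step: sess.run(train_enc, {x: batch})  # decoder frozen
#           sess.run(train_dec, {x: batch})  # encoder frozen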

@shengcheng

shengcheng commented Mar 11, 2018

I have the same problem. In my case, I have the model

input -> features -> decode1 -> loss1
                 |-> decode2 -> loss2

loss1 and loss2 are different loss functions. When I minimize loss1, I want to fix the weights in decode2; when I minimize loss2, I want to fix the weights in features and decode1. In the training process I need to train the two losses alternately, so I need a trainable flag to determine which part to train for whichever loss is being minimized. One solution found online is to save and restore the weights every time, but that is not efficient at all.
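
The same var_list idea sketched above fits this case too (the scope names "features", "decode1", "decode2" and the losses here are only placeholders), with no saving and restoring of weights between phases:

import tensorflow as tf

x = tf.placeholder(tf.float32, [None, 16])
with tf.variable_scope("features"):
    feats = tf.layers.dense(x, 8, activation=tf.nn.relu)
with tf.variable_scope("decode1"):
    out1 = tf.layers.dense(feats, 16)
with tf.variable_scope("decode2"):
    out2 = tf.layers.dense(feats, 16)
loss1 = tf.reduce_mean(tf.square(out1 - x))
loss2 = tf.reduce_mean(tf.abs(out2 - x))

def vars_in(scope):
    return tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope=scope)

# loss1 updates features + decode1 only (decode2 stays fixed);
# loss2 updates decode2 only (features and decode1 stay fixed).
train1 = tf.train.GradientDescentOptimizer(0.05).minimize(
    loss1, var_list=vars_in("features") + vars_in("decode1"))
train2 = tf.train.GradientDescentOptimizer(0.05).minimize(
    loss2, var_list=vars_in("decode2"))

# Alternate sess.run(train1, ...) and sess.run(train2, ...) during training.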
