
[TF 2.0] Nested Gradient Tape - unconnected graphs #34335

Closed
janbolle opened this issue Nov 16, 2019 · 20 comments
Labels
2.6.0 comp:keras Keras related issues type:bug Bug

Comments

@janbolle

System information

  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow): yes
  • OS Platform and Distribution: MacOS 10.15.1
  • TensorFlow installed from binary (pip 19.3.1)
  • TensorFlow version: v2.0.0-rc2-26-g64c3d382ca 2.0.0
  • Python version: Python 3.6.5

Describe the current behavior
A copy of my model (model_copy) should be trained for one step, then my meta_model needs to be trained with the loss of model_copy. It seems that the two graphs are unconnected.
It only works if I use the meta_model for the training step.

Describe the expected behavior
I would expect that model_copy is known to both gradient tapes and can be used without going through meta_model.

Code to reproduce the issue

import tensorflow as tf
import tensorflow.keras.backend as keras_backend
import tensorflow.keras as keras

class MetaModel(keras.Model):
    def __init__(self):
        super().__init__()
        self.hidden1 = keras.layers.Dense(5, input_shape=(1,))
        self.out = keras.layers.Dense(1)
    def forward(self, x):
        x = keras.activations.relu(self.hidden1(x))
        x = self.out(x)
        return x

def copy_model(model, x):
    copied_model = MetaModel()
    copied_model.forward(x)
    copied_model.set_weights(model.get_weights())
    return copied_model

def compute_loss(model, x, y):
    logits = model.forward(x)  # prediction of my model
    mse = keras_backend.mean(keras.losses.mean_squared_error(y, logits))  # compute loss between prediction and label/truth
    return mse, logits

optimizer_outer = keras.optimizers.Adam()
alpha = 0.01
with tf.GradientTape() as g:
    # meta_model to learn in outer gradient tape
    meta_model = MetaModel()
    # inputs for training
    x = tf.constant(3.0, shape=(1, 1, 1))
    y = tf.constant(3.0, shape=(1, 1, 1))

    meta_model.forward(x)
    model_copy = copy_model(meta_model, x)
    with tf.GradientTape() as gg:
        loss, _ = compute_loss(model_copy, x, y)
        gradients = gg.gradient(loss, model_copy.trainable_variables)
        k = 0
        for layer in range(len(model_copy.layers)):
            """ If I use meta-model for updating, this works """
            # model_copy.layers[layer].kernel = tf.subtract(meta_model.layers[layer].kernel,
            #                                               tf.multiply(alpha, gradients[k]))
            # model_copy.layers[layer].bias = tf.subtract(meta_model.layers[layer].bias,
            #                                             tf.multiply(alpha, gradients[k + 1]))

            """ If I use model-copy for updating instead, gradients_meta always will be [None,None,...]"""
            model_copy.layers[layer].kernel = tf.subtract(model_copy.layers[layer].kernel,
                                                          tf.multiply(alpha, gradients[k]))
            model_copy.layers[layer].bias = tf.subtract(model_copy.layers[layer].bias,
                                                        tf.multiply(alpha, gradients[k + 1]))

            k += 2

    # calculate loss of model_copy
    test_loss, _ = compute_loss(model_copy, x, y)
    # build gradients for meta_model update
    gradients_meta = g.gradient(test_loss, meta_model.trainable_variables)
    """ gradients always None !?!!11 elf """
    optimizer_outer.apply_gradients(zip(gradients_meta, meta_model.trainable_variables))

Other info / logs
Is it intended to work as above? If so, that would prevent me from using a different optimizer in the inner loop, since the networks somehow need to stay connected.
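
For context on why the gradients come back as None: the inner update overwrites model_copy.layers[i].kernel with a plain tensor computed only from model_copy's own, freshly copied variables, so the outer tape has no path from test_loss back to meta_model.trainable_variables. A common MAML-style workaround is to keep the adapted weights as tensors derived from meta_model's variables and thread them through a functional forward pass. The sketch below assumes that restructuring; forward_with_weights is a hypothetical helper (not part of the report), and the weight ordering is assumed to follow meta_model.trainable_variables.

import tensorflow as tf
import tensorflow.keras as keras

def forward_with_weights(x, w):
    # hypothetical helper: same computation as MetaModel.forward, but with the weights
    # passed in explicitly as [hidden1 kernel, hidden1 bias, out kernel, out bias]
    h = tf.nn.relu(tf.matmul(x, w[0]) + w[1])
    return tf.matmul(h, w[2]) + w[3]

alpha = 0.01
optimizer_outer = keras.optimizers.Adam()

meta_model = MetaModel()               # class from the report above
meta_model.forward(tf.zeros((1, 1)))   # build the variables once
x = tf.constant(3.0, shape=(1, 1))
y = tf.constant(3.0, shape=(1, 1))

with tf.GradientTape() as g:
    with tf.GradientTape() as gg:
        loss = tf.reduce_mean(keras.losses.mean_squared_error(
            y, forward_with_weights(x, meta_model.trainable_variables)))
    grads = gg.gradient(loss, meta_model.trainable_variables)
    # the adapted ("fast") weights stay tensors that depend on meta_model's variables
    fast_weights = [w - alpha * dw for w, dw in zip(meta_model.trainable_variables, grads)]
    test_loss = tf.reduce_mean(keras.losses.mean_squared_error(
        y, forward_with_weights(x, fast_weights)))

gradients_meta = g.gradient(test_loss, meta_model.trainable_variables)  # no longer all None
optimizer_outer.apply_gradients(zip(gradients_meta, meta_model.trainable_variables))

In this form a different inner optimizer is still possible in principle, as long as its update can be written as differentiable tensor operations on meta_model's variables.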

@rmothukuru rmothukuru self-assigned this Nov 18, 2019
@rmothukuru rmothukuru added the TF 2.0 Issues relating to TensorFlow 2.0 label Nov 18, 2019
@rmothukuru
Contributor

@janbolle,
When trying to reproduce your issue, I encountered the error ValueError: No gradients provided for any variable: ['dense_8/kernel:0', 'dense_8/bias:0', 'dense_9/kernel:0', 'dense_9/bias:0']. Can you please help us reproduce the issue? Here is the Gist. Thanks!

@rmothukuru rmothukuru added stat:awaiting response Status - Awaiting response from author comp:keras Keras related issues type:support Support issues labels Nov 18, 2019
@janbolle
Author

@rmothukuru , thanks for your reply.
That is exactly my problem: gradients_meta is always [None, None, ...],
so TF tells me that there are no gradients provided.

@rmothukuru
Contributor

@janbolle,
So do you mean you are encountering the same error as I am? Please confirm.

@janbolle
Author

@rmothukuru , yes, same error on my side.

@Rahulmishra07

Rahulmishra07 commented Nov 18, 2019 via email

@rmothukuru
Contributor

Could reproduce the error with TF Version 2.0. Here is the Gist. Thanks!

@janbolle
Author

janbolle commented Nov 18, 2019

Also, what is very odd: if I print the weights of the layers before and after training, they are available. But if I call model_copy.get_weights(), it returns an empty array.

The following code:

        k = 0
        for layer in range(len(model_copy.layers)):
            # calculate adapted parameters w/ gradient descent
            # \theta_i' = \theta - \alpha * gradients
            print("pre: ", model_copy.layers[layer].kernel.shape, model_copy.layers[layer].kernel)
            model_copy.layers[layer].kernel = tf.subtract(model_copy.layers[layer].kernel,
                                                          tf.multiply(alpha, gradients[k]))
            model_copy.layers[layer].bias = tf.subtract(model_copy.layers[layer].bias,
                                                        tf.multiply(alpha, gradients[k + 1]))
            print("post: ", model_copy.layers[layer].kernel.shape, model_copy.layers[layer].kernel)
            k += 2
    print(model_copy.get_weights())  # results in empty array
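
The empty get_weights() result fits the assignment pattern above: after the layer is built, layer.kernel holds a tf.Variable, and tf.subtract produces a plain tensor, so the assignment replaces the tracked variable with an ordinary tensor. A minimal illustration of the type change (a hypothetical snippet, not from the thread):

import tensorflow as tf
import tensorflow.keras as keras

layer = keras.layers.Dense(5)
layer.build((None, 1))              # creates the kernel and bias variables
print(type(layer.kernel))           # a tf.Variable (ResourceVariable)
print(len(layer.get_weights()))     # 2 -> kernel and bias are tracked

layer.kernel = layer.kernel - 0.01  # the subtraction result is a plain tf.Tensor
print(type(layer.kernel))           # EagerTensor, no longer a tf.Variable

Once the attribute holds a tensor instead of a variable, Keras's weight tracking no longer has a variable to report there, which is consistent with the empty get_weights(); and nothing ties those tensors back to meta_model's variables, which is consistent with gradients_meta being [None, None, ...].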

@tensorflowbutler tensorflowbutler removed the stat:awaiting response Status - Awaiting response from author label Nov 19, 2019
@janbolle
Author

janbolle commented Nov 25, 2019

@jvishnuvardhan do you need further information?
@rmothukuru did you connect the right person?

Also, I think this is a bug, not a support case :-/

Maybe related to #29535

@jvishnuvardhan jvishnuvardhan added type:bug Bug comp:keras Keras related issues and removed comp:keras Keras related issues type:support Support issues labels Nov 26, 2019
@jvishnuvardhan jvishnuvardhan added the stat:awaiting tensorflower Status - Awaiting response from tensorflower label Nov 26, 2019
@gadagashwini-zz gadagashwini-zz added the TF 2.1 for tracking issues in 2.1 release label Mar 19, 2020
@gadagashwini-zz gadagashwini-zz self-assigned this Mar 19, 2020
@gadagashwini-zz
Contributor

I was able to replicate the issue with Tf-nightly==2.2.0.dev20200318.
Please find the gist here. Thanks!

@tensorflowbutler tensorflowbutler removed the stat:awaiting tensorflower Status - Awaiting response from tensorflower label Mar 21, 2020
@ravikyram ravikyram added the TF 2.2 Issues related to TF 2.2 label Jun 12, 2020
@ravikyram
Contributor

I was able to replicate the issue with tf-nightly==2.3.0-dev20200612. Please find the gist here. Thanks!

@Saduf2019
Contributor

I was able to replicate the issue with tf-nightly==2.4.0-dev20200806. Please find the gist here.

@velocirabbit

velocirabbit commented Sep 2, 2020

Have there been any updates on this issue? I'm running into a similar None-gradients case when using nested tf.GradientTapes.

@interactivetech

Looking for updates on this as well! I am able to get my U-Net model to do an inner update, but this issue means it's only possible to do one inner update.

@sushreebarsa
Contributor

Was able to replicate the issue in TF v2.5. Please find the gist here. Thanks!

@sushreebarsa
Contributor

Was able to replicate the issue with TF 2.6.0-dev20210606. Please find the gist here. Thanks!

@kumariko

I could reproduce the issue with TF 2.6. Please find the gist here. Thanks!

@kumariko kumariko added 2.6.0 and removed TF 2.0 Issues relating to TensorFlow 2.0 TF 2.1 for tracking issues in 2.1 release TF 2.2 Issues related to TF 2.2 labels Aug 26, 2021
@tensorflowbutler
Member

Hi There,

This is a stale issue. As you are using an older version of TensorFlow, we are checking to see if you still need help on this issue. Please test the issue with the latest TensorFlow (TF 2.7 and tf-nightly). If the issue still persists with the newer versions of TF, please feel free to open it in the keras-team/keras repository, providing details about the issue and standalone code to reproduce it. Thanks!

Please note that Keras development has moved to the separate keras-team/keras repository to focus entirely on Keras. Thanks!

@andrewemendez

Still an issue with TF 2.7. Gist here


@evanfwelch

Was there ever a resolution here? Curious why the issue was closed after the last poster confirmed it was still an issue in TF 2.7.
