
Plotting Gradients to TensorBoard and Console #31542

Closed · SPP3000 opened this issue Aug 12, 2019 · 11 comments

Labels: comp:tensorboard (TensorBoard related issues), stale (to be closed automatically if no activity), stat:awaiting response (awaiting response from author), TF 2.0 (issues relating to TensorFlow 2.0), type:bug

@SPP3000 commented Aug 12, 2019

System information

  • Windows 10 Pro Version 1903
  • TensorFlow installed from: pip in Anaconda
  • TensorFlow version: 2.0.0-beta1 (GPU)
  • Python version: 3.7.3 (default, Apr 24 2019, 15:29:51) [MSC v.1915 64 bit (AMD64)]
  • CUDA/cuDNN version: Cuda compilation tools, release 10.0, V10.0.130
  • GPU model and memory: GeForce GTX 980 Ti, major: 5, minor: 2, memoryClockRate(GHz): 1.2785

Describe the current behavior
The program ends with an unclear error while trying to retrieve the bias gradients of the two dense layers in the model.

  • Writing to TensorBoard (console parameter = False)
Train on 60000 samples
Epoch 1/5
2019-08-12 15:24:48.713962: I tensorflow/core/profiler/lib/profiler_session.cc:174] Profiler session started.
2019-08-12 15:24:48.718362: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library cupti64_100.dll
   32/60000 [..............................] - ETA: 12:48 - loss: 2.2374 - accuracy: 0.18752019-08-12 15:24:48.840661: I tensorflow/core/platform/default/device_tracer.cc:641] Collecting 81 kernel records, 14 memcpy records.
59744/60000 [============================>.] - ETA: 0s - loss: 0.2971 - accuracy: 0.9123Traceback (most recent call last):
  File "C:/Users/Harald Schweiger/PycharmProjects/Gradients/gradient_test.py", line 42, in <module>
    model.fit(x_train, y_train, epochs=5, callbacks=[gradient_cb, tensorboard_cb])
  File "C:\Users\Harald Schweiger\Anaconda3\lib\site-packages\tensorflow\python\keras\engine\training.py", line 643, in fit
    use_multiprocessing=use_multiprocessing)
  File "C:\Users\Harald Schweiger\Anaconda3\lib\site-packages\tensorflow\python\keras\engine\training_arrays.py", line 664, in fit
    steps_name='steps_per_epoch')
  File "C:\Users\Harald Schweiger\Anaconda3\lib\site-packages\tensorflow\python\keras\engine\training_arrays.py", line 439, in model_iteration
    callbacks.on_epoch_end(epoch, epoch_logs)
  File "C:\Users\Harald Schweiger\Anaconda3\lib\site-packages\tensorflow\python\keras\callbacks.py", line 295, in on_epoch_end
    callback.on_epoch_end(epoch, logs)
  File "C:/Users/Harald Schweiger/PycharmProjects/Gradients/gradient_test.py", line 32, in on_epoch_end
    tf.summary.histogram(t.name, data=t)
  File "C:\Users\Harald Schweiger\Anaconda3\lib\site-packages\tensorboard\plugins\histogram\summary_v2.py", line 77, in histogram
    tensor = _buckets(data, bucket_count=buckets)
  File "C:\Users\Harald Schweiger\Anaconda3\lib\site-packages\tensorboard\plugins\histogram\summary_v2.py", line 139, in _buckets
    return tf.cond(is_empty, when_empty, when_nonempty)
  File "C:\Users\Harald Schweiger\Anaconda3\lib\site-packages\tensorflow\python\ops\control_flow_ops.py", line 1382, in cond_for_tf_v2
    return cond(pred, true_fn=true_fn, false_fn=false_fn, strict=True, name=name)
  File "C:\Users\Harald Schweiger\Anaconda3\lib\site-packages\tensorflow\python\util\deprecation.py", line 507, in new_func
    return func(*args, **kwargs)
  File "C:\Users\Harald Schweiger\Anaconda3\lib\site-packages\tensorflow\python\ops\control_flow_ops.py", line 1177, in cond
    result = false_fn()
  File "C:\Users\Harald Schweiger\Anaconda3\lib\site-packages\tensorboard\plugins\histogram\summary_v2.py", line 137, in when_nonempty
    return tf.cond(is_singular, when_singular, when_nonsingular)
  File "C:\Users\Harald Schweiger\Anaconda3\lib\site-packages\tensorflow\python\ops\control_flow_ops.py", line 1382, in cond_for_tf_v2
    return cond(pred, true_fn=true_fn, false_fn=false_fn, strict=True, name=name)
  File "C:\Users\Harald Schweiger\Anaconda3\lib\site-packages\tensorflow\python\util\deprecation.py", line 507, in new_func
    return func(*args, **kwargs)
  File "C:\Users\Harald Schweiger\Anaconda3\lib\site-packages\tensorflow\python\ops\control_flow_ops.py", line 1174, in cond
    if pred:
  File "C:\Users\Harald Schweiger\Anaconda3\lib\site-packages\tensorflow\python\framework\ops.py", line 698, in __bool__
    raise TypeError("Using a `tf.Tensor` as a Python `bool` is not allowed. "
TypeError: Using a `tf.Tensor` as a Python `bool` is not allowed. Use `if t is not None:` instead of `if t:` to test if a tensor is defined, and use TensorFlow ops such as tf.cond to execute subgraphs conditioned on the
 value of a tensor.
  • Printing bias gradients to console (console parameter = True)
Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
Train on 60000 samples
Epoch 1/5
2019-08-12 15:26:01.265400: I tensorflow/core/profiler/lib/profiler_session.cc:174] Profiler session started.
2019-08-12 15:26:01.268877: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library cupti64_100.dll
   32/60000 [..............................] - ETA: 12:53 - loss: 2.4094 - accuracy: 0.09382019-08-12 15:26:01.391432: I tensorflow/core/platform/default/device_tracer.cc:641] Collecting 81 kernel records, 14 memcpy records.
59776/60000 [============================>.] - ETA: 0s - loss: 0.3015 - accuracy: 0.9121Tensor: Adam/gradients_1/dense128/BiasAdd_grad/BiasAddGrad:0
Traceback (most recent call last):
  File "C:/Users/Harald Schweiger/PycharmProjects/Gradients/gradient_test.py", line 42, in <module>
    model.fit(x_train, y_train, epochs=5, callbacks=[gradient_cb, tensorboard_cb])
  File "C:\Users\Harald Schweiger\Anaconda3\lib\site-packages\tensorflow\python\keras\engine\training.py", line 643, in fit
    use_multiprocessing=use_multiprocessing)
  File "C:\Users\Harald Schweiger\Anaconda3\lib\site-packages\tensorflow\python\keras\engine\training_arrays.py", line 664, in fit
    steps_name='steps_per_epoch')
  File "C:\Users\Harald Schweiger\Anaconda3\lib\site-packages\tensorflow\python\keras\engine\training_arrays.py", line 439, in model_iteration
    callbacks.on_epoch_end(epoch, epoch_logs)
  File "C:\Users\Harald Schweiger\Anaconda3\lib\site-packages\tensorflow\python\keras\callbacks.py", line 295, in on_epoch_end
    callback.on_epoch_end(epoch, logs)
  File "C:/Users/Harald Schweiger/PycharmProjects/Gradients/gradient_test.py", line 30, in on_epoch_end
    print('{}\n'.format(K.get_value(t)[:10]))
  File "C:\Users\Harald Schweiger\Anaconda3\lib\site-packages\tensorflow\python\keras\backend.py", line 2981, in get_value
    return x.numpy()
AttributeError: 'Tensor' object has no attribute 'numpy'

Describe the expected behavior

  • Writing to TensorBoard (console parameter = False)
    A TensorBoard event file containing the distributions and histograms of the gradients
    derived from the total loss accumulated over the last epoch.

  • Printing to console (console parameter = True)
    The program should print the first 10 bias gradient values of each of the two dense layers
    to the console.

If the exceptions produced here are the expected behavior due to errors in the developer's code, a more meaningful error message would be appreciated.
In that case a corrected version of the code would also be useful, for me and for anyone else who has had to update their code since the write_grads parameter was removed from the TensorBoard callback in version 2.0. (A sketch of a tf.GradientTape alternative follows after the reproduction code below.)

Code to reproduce the issue

import tensorflow as tf
from tensorflow.python.keras import backend as K

mnist = tf.keras.datasets.mnist

(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

model = tf.keras.models.Sequential([
  tf.keras.layers.Flatten(input_shape=(28, 28)),
  tf.keras.layers.Dense(128, activation='relu', name='dense128'),
  tf.keras.layers.Dropout(0.2),
  tf.keras.layers.Dense(10, activation='softmax', name='dense10')
])

model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])


class GradientCallback(tf.keras.callbacks.Callback):
    console = True

    def on_epoch_end(self, epoch, logs=None):
        weights = [w for w in self.model.trainable_weights if 'dense' in w.name and 'bias' in w.name]
        loss = self.model.total_loss
        optimizer = self.model.optimizer
        gradients = optimizer.get_gradients(loss, weights)  # symbolic (graph) tensors
        for t in gradients:
            if self.console:
                print('Tensor: {}'.format(t.name))
                print('{}\n'.format(K.get_value(t)[:10]))  # fails: symbolic tensor has no .numpy()
            else:
                tf.summary.histogram(t.name, data=t)  # fails: symbolic tensor used as a Python bool


file_writer = tf.summary.create_file_writer("./metrics")
file_writer.set_as_default()

# write_grads has been removed
tensorboard_cb = tf.keras.callbacks.TensorBoard(histogram_freq=1, write_grads=True)
gradient_cb = GradientCallback()

model.fit(x_train, y_train, epochs=5, callbacks=[gradient_cb, tensorboard_cb])
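
Since write_grads is gone, the TF 2.x route (which the workarounds further down this thread also take) is to recompute the gradients eagerly with tf.GradientTape and write them with tf.summary.histogram. A minimal sketch against the reproduction code above; GradientHistogramCallback and its constructor parameters are illustrative names, not TensorFlow API:

class GradientHistogramCallback(tf.keras.callbacks.Callback):
    """Logs a histogram of each trainable weight's gradient once per epoch."""

    def __init__(self, writer, loss_fn, x_batch, y_batch):
        super().__init__()
        self.writer = writer      # a tf.summary file writer
        self.loss_fn = loss_fn    # e.g. tf.keras.losses.SparseCategoricalCrossentropy()
        self.x_batch = x_batch    # small batch used only for gradient logging
        self.y_batch = y_batch

    def on_epoch_end(self, epoch, logs=None):
        # Recompute the gradients eagerly; trainable weights are watched
        # by the tape automatically, so no tape.watch() call is needed.
        with tf.GradientTape() as tape:
            y_pred = self.model(self.x_batch, training=False)
            loss = self.loss_fn(self.y_batch, y_pred)
        grads = tape.gradient(loss, self.model.trainable_weights)
        with self.writer.as_default():
            for weight, grad in zip(self.model.trainable_weights, grads):
                # Eager gradient tensors carry no name, so reuse the weight's.
                tf.summary.histogram(
                    weight.name.replace(':', '_') + '_grads',
                    data=grad, step=epoch)
        self.writer.flush()

gradient_cb = GradientHistogramCallback(
    file_writer, tf.keras.losses.SparseCategoricalCrossentropy(),
    x_train[:100], y_train[:100])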
@oanush commented Aug 13, 2019

Issue replicated for TF version 2.0 beta; please find the gist of the Colab. Thanks!

@richardwth commented

I also find it challenging to plot gradients to TensorBoard in TF 2.2.0-rc3 on Colab. My case is different from @SPP3000's in that instead of

if self.console:
    print('Tensor: {}'.format(t.name))
    print('{}\n'.format(K.get_value(t)[:10]))
else:
    tf.summary.histogram(t.name, data=t)

I simply have tf.summary.histogram(t.name, data=t). The error I came across is:

AttributeError: 'Sequential' object has no attribute 'total_loss'

Using tf.keras.Model, the error became 'Model' object has no attribute 'total_loss'.

@teodor440 commented

I'm also affected by this issue.

@richardwth commented

Here is a workaround where the gradients are explicitly calculated. It avoids the total_loss error I mentioned above.

class ExtendedTensorBoard(tf.keras.callbacks.TensorBoard):
  def _log_gradients(self, epoch):
    step = tf.cast(tf.math.floor((epoch+1)*num_instance/batch_size), dtype=tf.int64)
    writer = self._get_writer(self._train_run_name)

    with writer.as_default(), tf.GradientTape() as g:
      # here we use test data to calculate the gradients
      _x_batch = x_te[:100]
      _y_batch = y_te[:100]

      g.watch(_x_batch)
      _y_pred = self.model(_x_batch)  # forward-propagation
      loss = self.model.loss(y_true=_y_batch, y_pred=_y_pred)  # calculate loss
      gradients = g.gradient(loss, self.model.trainable_weights)  # back-propagation

      # In eager mode, grads does not have name, so we get names from model.trainable_weights
      for weights, grads in zip(self.model.trainable_weights, gradients):
        tf.summary.histogram(
            weights.name.replace(':', '_')+'_grads', data=grads, step=step)
    
    writer.flush()

  def on_epoch_end(self, epoch, logs=None):
    # This function overrides on_epoch_end in tf.keras.callbacks.TensorBoard,
    # but we still need to run the original on_epoch_end, so we call the super method.
    super(ExtendedTensorBoard, self).on_epoch_end(epoch, logs=logs)

    if self.histogram_freq and epoch % self.histogram_freq == 0:
      self._log_gradients(epoch)

ExtendedTensorBoard can then be used in place of tf.keras.callbacks.TensorBoard.
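
A usage sketch for the callback above: num_instance, batch_size, x_te, and y_te are free variables inside _log_gradients, so they must already exist in the enclosing scope. Assuming the MNIST data from the original report (note that self.model.loss is only callable if the model was compiled with a loss object rather than a string name, and that g.watch may need tf.convert_to_tensor around raw NumPy input, as the next workaround does):

num_instance = len(x_train)              # number of training examples
batch_size = 32                          # Keras' default batch size for fit()
x_te, y_te = x_test[:100], y_test[:100]  # held-out batch for gradient logging

tensorboard_cb = ExtendedTensorBoard(log_dir='./logs', histogram_freq=1)
model.fit(x_train, y_train, batch_size=batch_size, epochs=5,
          callbacks=[tensorboard_cb])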

@mmehedin commented Nov 12, 2020

Thanks for posting the code. I am getting an error saying:
NameError: name 'num_instance' is not defined
The same goes for batch_size, x_te, and y_te. How can I fix this? Thanks in advance.

@Alwaysproblem commented

Thanks for posting the code. I am getting an error saying:
NameError: name 'num_instance' is not defined
The same goes for batch_size, x_te, and y_te. How can I fix this? Thanks in advance.

You can try this code:

import tensorflow as tf
from tensorflow.python.keras import backend as K

mnist = tf.keras.datasets.mnist

(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

model = tf.keras.models.Sequential([
  tf.keras.layers.Flatten(input_shape=(28, 28)),
  tf.keras.layers.Dense(128, activation='relu', name='l_1st'),
  tf.keras.layers.Dense(128, activation='relu', name='l_2nd'),
  tf.keras.layers.Dense(128, activation='relu', name='l_3rd'),
  tf.keras.layers.Dense(128, activation='relu', name='l_4th'),
  tf.keras.layers.Dense(128, activation='relu', name='l_5th'),
  tf.keras.layers.Dense(128, activation='relu', name='l_6th'),
  tf.keras.layers.Dense(128, activation='relu', name='l_7th'),
  tf.keras.layers.Dense(128, activation='relu', name='l_8th'),
  tf.keras.layers.Dense(128, activation='relu', name='l_9th'),
  tf.keras.layers.Dropout(0.2),
  tf.keras.layers.Dense(10, activation='softmax', name='dense10')
])

l = tf.keras.losses.SparseCategoricalCrossentropy()
opt = tf.keras.optimizers.Adam(0.001)

model.compile(optimizer=opt, loss=l, metrics=['accuracy'])

class ExtendedTensorBoard(tf.keras.callbacks.TensorBoard):

  def _log_gradients(self, epoch):
    step = tf.cast(epoch, dtype=tf.int64)
    writer = self._train_writer
    # writer = self._get_writer(self._train_run_name)

    with writer.as_default(), tf.GradientTape() as g:
      # here we use test data to calculate the gradients
      _x_batch = x_train[:100]
      _y_batch = y_train[:100]

      g.watch(tf.convert_to_tensor(_x_batch))
      _y_pred = self.model(_x_batch)  # forward-propagation
      loss = self.model.loss(y_true=_y_batch, y_pred=_y_pred)  # calculate loss
      gradients = g.gradient(loss, self.model.trainable_weights)  # back-propagation

      # In eager mode, grads does not have name, so we get names from model.trainable_weights
      for weights, grads in zip(self.model.trainable_weights, gradients):
        tf.summary.histogram(
            weights.name.replace(':', '_')+'_grads', data=grads, step=step)

    writer.flush()

  def on_epoch_end(self, epoch, logs=None):
  # def on_train_batch_end(self, batch, logs=None):
    # This function overrides on_epoch_end in tf.keras.callbacks.TensorBoard,
    # but we still need to run the original on_epoch_end, so we call the super method.
    super(ExtendedTensorBoard, self).on_epoch_end(epoch, logs=logs)
    # super(ExtendedTensorBoard, self).on_train_batch_end(batch, logs=logs)
    if self.histogram_freq and epoch % self.histogram_freq == 0:
      self._log_gradients(epoch)

ee = ExtendedTensorBoard(histogram_freq=1, write_images=True, update_freq='batch')
model.fit(x_train, y_train, epochs=10, callbacks=[ee], validation_data=(x_test, y_test), )
# model.fit(x_train, y_train, epochs=5, callbacks=[gradient_cb, tensorboard_cb])
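
With this version the gradient histograms appear under TensorBoard's Histograms tab. Since no log_dir is passed, tf.keras.callbacks.TensorBoard writes to its default logs directory, so tensorboard --logdir logs should pick them up.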

@sushreebarsa (Contributor) commented May 27, 2021

I tried to run the code on Colab using TF v2.5 and faced an attribute error; please find the gist here. Thanks!

@sachinprasadhs (Contributor) commented

Now I'm able to get a specific error message in the recent TensorFlow version; please find the gist here and confirm the same. Thanks!

@google-ml-butler commented

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you.

@google-ml-butler commented

Closing as stale. Please reopen if you'd like to work on this further.

