
Implement gradient norm logging #63

Merged
merged 3 commits into tensorflow:master from log_gradient_norms on Feb 7, 2018

Conversation

@brilee (Contributor) commented Feb 6, 2018

No description provided.

@brilee requested a review from @amj, February 6, 2018 15:19
@jmgilmer (Contributor) commented Feb 6, 2018

Added a few comments. George recommended (and I agree) that it can be useful to also track the norm of the gradient with respect to each of the weights. Currently implemented is the norm of the weight update, which is a little different.

dual_net.py Outdated
@@ -301,61 +312,39 @@ def logging_ops():
name="weight_summaries")


def compute_gradient_ratio(weight_tensors, before_weights, after_weights):
Contributor:
This actually doesn't compute anything about the gradient; it computes ||delta w|| / ||w|| for every weight tensor w. The gradient is related, but some optimizers have complicated weight updates. I would call this function compute_update_ratio instead.
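
For reference, a minimal sketch of the renamed helper, assuming the deltas are simply after-minus-before and that the weight lists are NumPy arrays in the same order as weight_tensors (illustrative, not necessarily the final implementation):

import numpy as np

def compute_update_ratio(weight_tensors, before_weights, after_weights):
    """Return ||delta w|| / ||w|| for each weight tensor."""
    deltas = [after - before
              for before, after in zip(before_weights, after_weights)]
    delta_norms = [np.linalg.norm(d.ravel()) for d in deltas]
    weight_norms = [np.linalg.norm(w.ravel()) for w in before_weights]
    return [d / w for d, w in zip(delta_norms, weight_norms)]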

dual_net.py Outdated
    delta_norms = [np.linalg.norm(d.ravel()) for d in deltas]
    weight_norms = [np.linalg.norm(w.ravel()) for w in before_weights]
    ratios = [d / w for d, w in zip(delta_norms, weight_norms)]
    all_summaries = [tf.Summary.Value(tag=tensor.name, simple_value=ratio)
Contributor:
So the tensor name is the summary tag? Maybe append something like "update_ratio" to be more clear?
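
For instance, something like the following (a sketch; the zip over weight_tensors and ratios is assumed from the surrounding code):

    all_summaries = [tf.Summary.Value(tag=tensor.name + '/update_ratio',
                                      simple_value=ratio)
                     for tensor, ratio in zip(weight_tensors, ratios)]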

dual_net.py Outdated
'combined_cost')})
if should_log(i):
after_weights = self.sess.run(weight_tensors)
gradient_summaries = compute_gradient_ratio(
Contributor:
I would call these weight_update summaries. It would be useful to also track gradient summaries, which can be computed with the following line:
grads = tf.gradients(loss, weight_tensors)
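
For example, those gradients could be turned into per-variable norm summaries roughly like this (a sketch in TF 1.x; it assumes the module's existing import tensorflow as tf, that loss and weight_tensors are in scope, and the tag naming is illustrative):

grads = tf.gradients(loss, weight_tensors)
gradient_summaries = [
    tf.summary.scalar(w.name.split(':')[0] + '/grad_norm',
                      tf.sqrt(tf.reduce_sum(tf.square(g))))
    for w, g in zip(weight_tensors, grads)
    if g is not None]  # tf.gradients returns None for variables the loss doesn't use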

dual_net.py Outdated
@@ -81,6 +83,8 @@ def bootstrap(self):
        tf.train.Saver().save(sess, self.save_file)

    def train(self, tf_records, init_from=None, logdir=None, num_steps=None):
        def should_log(i):
Contributor:
but why?

Contributor:
I guess my point is, it seems like we should be exporting metrics rather than dumping a bunch of logs.

Contributor:
we don't want to actually do these computations every step for millions of steps -- it'd be a lot of wasted work. So what we 'log' are actually the metrics.

Contributor (Author):
It's a bit badly named; the "logger" is really a TFSummaryWriter that dumps summary protos to a log file that tensorboard can then parse and visualize.
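
Roughly this pattern, in other words (a sketch in TF 1.x; logdir, the tag, the value, and step here are illustrative):

writer = tf.summary.FileWriter(logdir)
summary = tf.Summary(value=[
    tf.Summary.Value(tag='conv1/update_ratio', simple_value=1e-3)])
writer.add_summary(summary, global_step=step)
writer.flush()  # TensorBoard later parses the event file written under logdir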

@amj (Contributor) left a comment:
+1 to jmgilmer's comments, one nit.


def report(self, values):
    """Take a dict of scalar names to scalars, and aggregate by name."""
    for key, val in values.items():
Contributor:
can this be a comprehension? [self.accums[key].append(val) for key,val in values.items()] ?

Contributor (Author):
a comprehension that has side effects like appending feels weird to me.
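
For context, the plain-loop form under discussion looks roughly like this (a sketch; the loop body is taken from the comprehension above, the defaultdict and the class name are assumptions):

from collections import defaultdict

class StatisticsCollector(object):  # illustrative wrapper
    def __init__(self):
        self.accums = defaultdict(list)

    def report(self, values):
        """Take a dict of scalar names to scalars, and aggregate by name."""
        for key, val in values.items():
            self.accums[key].append(val)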

Contributor:
no problem. LGTM then.

@amj (Contributor) commented Feb 6, 2018

LGTM :)

dual_net.py Outdated
@@ -81,6 +83,8 @@ def bootstrap(self):
        tf.train.Saver().save(sess, self.save_file)

    def train(self, tf_records, init_from=None, logdir=None, num_steps=None):
        def should_log(i):
            return logdir is not None and i % 100 == 0
Contributor:
Make log sample a constant?

Contributor:
+1 to this; extracting it to a constant would also make it monkeypatchable for local_rl_loop etc.

Contributor (Author):
are we talking about making 100 a parameter log_every or something like that?

@amj (Contributor), Feb 6, 2018:
yeah, SUMMARY_FREQUENCY or whatever. not really needed but might be nice
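
Concretely, something like this (a sketch; the constant name follows the suggestion above, and the monkeypatching usage is an assumption):

SUMMARY_FREQUENCY = 100  # module-level in dual_net.py

# inside train(), the nested helper keeps its closure over logdir:
#     def should_log(i):
#         return logdir is not None and i % SUMMARY_FREQUENCY == 0
#
# local_rl_loop (or a test) could then shorten the interval with:
#     dual_net.SUMMARY_FREQUENCY = 10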

dual_net.py Outdated
@@ -81,6 +83,8 @@ def bootstrap(self):
        tf.train.Saver().save(sess, self.save_file)

    def train(self, tf_records, init_from=None, logdir=None, num_steps=None):
        def should_log(i):
Contributor:
How about making this a top-level function

log_sample(i):

Or do you need all the closure vars?

Contributor:
the closure var is used. This way is both DRY and readable.

@brilee (Contributor, Author) commented Feb 6, 2018

@jmgilmer - "it can be useful to also track the norm of the gradient with respect to each of the weights. Currently implemented is the norm of the weight update, which is a little different."

I think I do have what you're asking for - the norm for each individual trainable variable (where a trainable variable might be a shape [3, 3, IN_CHANNELS, OUT_CHANNELS] conv kernel or a shape [OUT_CHANNELS] bias term).

[screenshot: TensorBoard, 2018-02-06 5:57 PM]

Here's what I'm seeing in tensorboard.

@brilee merged commit 360e056 into tensorflow:master on Feb 7, 2018
@brilee deleted the log_gradient_norms branch, February 7, 2018 01:07