
Implement gradient norm logging #63

Merged
merged 3 commits into tensorflow:master from log_gradient_norms on Feb 7, 2018

Conversation

@brilee (Contributor) commented Feb 6, 2018

No description provided.

@brilee requested a review from @amj, February 6, 2018 15:19
@jmgilmer (Contributor) commented Feb 6, 2018

Added a few comments. George recommended (and I agree) that it can be useful to also track the norm of the gradient with respect to each of the weights. Currently implemented is the norm of the weight update, which is a little different.

dual_net.py Outdated
@@ -301,61 +312,39 @@ def logging_ops():
name="weight_summaries")


def compute_gradient_ratio(weight_tensors, before_weights, after_weights):
Contributor:
This actually doesn't compute anything about the gradient; it computes ||delta w|| / ||w|| for every weight tensor w. The gradient is related, but some optimizers have complicated weight updates. I would call this function compute_update_ratio instead.
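
For reference, a minimal sketch of the renamed helper, assuming the deltas are simply after-minus-before and that the weight lists are NumPy arrays in the same order as weight_tensors (illustrative, not necessarily the final implementation):

import numpy as np

def compute_update_ratio(weight_tensors, before_weights, after_weights):
    """Return ||delta w|| / ||w|| for each weight tensor."""
    deltas = [after - before
              for before, after in zip(before_weights, after_weights)]
    delta_norms = [np.linalg.norm(d.ravel()) for d in deltas]
    weight_norms = [np.linalg.norm(w.ravel()) for w in before_weights]
    return [d / w for d, w in zip(delta_norms, weight_norms)]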

dual_net.py Outdated
    delta_norms = [np.linalg.norm(d.ravel()) for d in deltas]
    weight_norms = [np.linalg.norm(w.ravel()) for w in before_weights]
    ratios = [d / w for d, w in zip(delta_norms, weight_norms)]
    all_summaries = [tf.Summary.Value(tag=tensor.name, simple_value=ratio)
Contributor:
So the tensor name is the summary tag? Maybe append something like "update_ratio" to be more clear?
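
For instance, something like the following (a sketch; the zip over weight_tensors and ratios is assumed from the surrounding code):

    all_summaries = [tf.Summary.Value(tag=tensor.name + '/update_ratio',
                                      simple_value=ratio)
                     for tensor, ratio in zip(weight_tensors, ratios)]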

dual_net.py Outdated
'combined_cost')})
if should_log(i):
after_weights = self.sess.run(weight_tensors)
gradient_summaries = compute_gradient_ratio(
Contributor:
I would call these weight_update summaries. It would be useful to also track gradient summaries, which can be computed with the following line:
grads = tf.gradients(loss, weight_tensors)
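
For example, those gradients could be turned into per-variable norm summaries roughly like this (a sketch in TF 1.x; it assumes the module's existing import tensorflow as tf, that loss and weight_tensors are in scope, and the tag naming is illustrative):

grads = tf.gradients(loss, weight_tensors)
gradient_summaries = [
    tf.summary.scalar(w.name.split(':')[0] + '/grad_norm',
                      tf.sqrt(tf.reduce_sum(tf.square(g))))
    for w, g in zip(weight_tensors, grads)
    if g is not None]  # tf.gradients returns None for variables the loss doesn't use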

dual_net.py Outdated
@@ -81,6 +83,8 @@ def bootstrap(self):
        tf.train.Saver().save(sess, self.save_file)

    def train(self, tf_records, init_from=None, logdir=None, num_steps=None):
        def should_log(i):
Contributor:
but why?

Contributor:
I guess my point is, it seems like we should be exporting metrics rather than dumping a bunch of logs.

Contributor:
we don't want to actually do these computations every step for millions of steps -- it'd be a lot of wasted work. So what we 'log' are actually the metrics.

Contributor (Author):
It's a bit badly named; the "logger" is really a TFSummaryWriter that dumps summary protos to a log file that tensorboard can then parse and visualize.
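
Roughly this pattern, in other words (a sketch in TF 1.x; logdir, the tag, the value, and step here are illustrative):

writer = tf.summary.FileWriter(logdir)
summary = tf.Summary(value=[
    tf.Summary.Value(tag='conv1/update_ratio', simple_value=1e-3)])
writer.add_summary(summary, global_step=step)
writer.flush()  # TensorBoard later parses the event file written under logdir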

@amj (Contributor) left a comment:
+1 to jmgilmer's comments, one nit.


def report(self, values):
    """Take a dict of scalar names to scalars, and aggregate by name."""
    for key, val in values.items():
Contributor:
can this be a comprehension? [self.accums[key].append(val) for key,val in values.items()] ?

Contributor (Author):
a comprehension that has side effects like appending feels weird to me.
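
For context, the plain-loop form under discussion looks roughly like this (a sketch; the loop body is taken from the comprehension above, the defaultdict and the class name are assumptions):

from collections import defaultdict

class StatisticsCollector(object):  # illustrative wrapper
    def __init__(self):
        self.accums = defaultdict(list)

    def report(self, values):
        """Take a dict of scalar names to scalars, and aggregate by name."""
        for key, val in values.items():
            self.accums[key].append(val)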

Contributor:
no problem. LGTM then.

@amj (Contributor) commented Feb 6, 2018

LGTM :)

dual_net.py Outdated
@@ -81,6 +83,8 @@ def bootstrap(self):
        tf.train.Saver().save(sess, self.save_file)

    def train(self, tf_records, init_from=None, logdir=None, num_steps=None):
        def should_log(i):
            return logdir is not None and i % 100 == 0
Contributor:
Make log sample a constant?

Contributor:
+1 to this; extracting it to a constant would also make it monkeypatchable for local_rl_loop etc.

Contributor (Author):
are we talking about making 100 a parameter log_every or something like that?

@amj (Contributor), Feb 6, 2018:
yeah, SUMMARY_FREQUENCY or whatever. not really needed but might be nice
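
Concretely, something like this (a sketch; the constant name follows the suggestion above, and the monkeypatching usage is an assumption):

SUMMARY_FREQUENCY = 100  # module-level in dual_net.py

# inside train(), the nested helper keeps its closure over logdir:
#     def should_log(i):
#         return logdir is not None and i % SUMMARY_FREQUENCY == 0
#
# local_rl_loop (or a test) could then shorten the interval with:
#     dual_net.SUMMARY_FREQUENCY = 10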

dual_net.py Outdated
@@ -81,6 +83,8 @@ def bootstrap(self):
        tf.train.Saver().save(sess, self.save_file)

    def train(self, tf_records, init_from=None, logdir=None, num_steps=None):
        def should_log(i):
Contributor:
How about making this a top-level function

log_sample(i):

Or do you need all the closure vars?

Contributor:
the closure var is used. This way is both DRY and readable.

@brilee (Contributor, Author) commented Feb 6, 2018

@jmgilmer - "it can be useful to also track the norm of the gradient with respect to each of the weights. Currently implemented is the norm of the weight update, which is a little different."

I think I do have what you're asking for - the norm for each individual trainable variable (where a trainable variable might be a shape [3, 3, IN_CHANNELS, OUT_CHANNELS] conv kernel or a shape [OUT_CHANNELS] bias term).

[screenshot: TensorBoard, 2018-02-06 5:57 PM]

Here's what I'm seeing in tensorboard.

@brilee merged commit 360e056 into tensorflow:master on Feb 7, 2018
@brilee deleted the log_gradient_norms branch, February 7, 2018 01:07