Conversation
Added a few comments. George recommended (and I agree) that it can be useful to also track the norm of the gradient with respect to each of the weights. Currently implemented is the norm of the weight update, which is a little different.
dual_net.py (Outdated)
@@ -301,61 +312,39 @@ def logging_ops():
        name="weight_summaries")

def compute_gradient_ratio(weight_tensors, before_weights, after_weights):
This actually doesn't compute anything about the gradient; it computes ||delta w|| / ||w|| for all weight tensors w. The gradient is related, but some optimizers have complicated weight updates. I would call this function compute_update_ratio instead.
dual_net.py (Outdated)
    delta_norms = [np.linalg.norm(d.ravel()) for d in deltas]
    weight_norms = [np.linalg.norm(w.ravel()) for w in before_weights]
    ratios = [d / w for d, w in zip(delta_norms, weight_norms)]
    all_summaries = [tf.Summary.Value(tag=tensor.name, simple_value=ratio)
So the tensor name is the summary tag? Maybe append something like "update_ratio" to be more clear?
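Putting the two suggestions together (renaming the function and suffixing the summary tag), a minimal sketch of what the helper might look like; the `deltas` line and the final `tf.Summary` wrapping are assumptions filled in around the diff lines quoted above:

```python
import numpy as np
import tensorflow as tf

def compute_update_ratio(weight_tensors, before_weights, after_weights):
    """Compute ||delta w|| / ||w|| for each weight tensor and emit summaries."""
    # The deltas computation is assumed; only the four lines below it appear in the diff.
    deltas = [after - before
              for after, before in zip(after_weights, before_weights)]
    delta_norms = [np.linalg.norm(d.ravel()) for d in deltas]
    weight_norms = [np.linalg.norm(w.ravel()) for w in before_weights]
    ratios = [d / w for d, w in zip(delta_norms, weight_norms)]
    all_summaries = [
        tf.Summary.Value(tag=tensor.name + '/update_ratio', simple_value=ratio)
        for tensor, ratio in zip(weight_tensors, ratios)]
    return tf.Summary(value=all_summaries)
```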
dual_net.py (Outdated)
            'combined_cost')})
        if should_log(i):
            after_weights = self.sess.run(weight_tensors)
            gradient_summaries = compute_gradient_ratio(
I would call these weight_update summaries. It would be useful to also track gradient summaries, which can be computed with the following line:
grads = tf.gradients(loss, weight_tensors)
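A hedged sketch of what such gradient summaries could look like in TF 1.x; the toy `loss` and variables below are stand-ins, since the real ones live in the training graph built by dual_net.py:

```python
import tensorflow as tf

# Toy stand-ins for the real combined cost and trainable variables.
w = tf.Variable(tf.random_normal([3, 3]), name='conv/kernel')
b = tf.Variable(tf.zeros([3]), name='conv/bias')
weight_tensors = [w, b]
loss = tf.reduce_sum(tf.matmul(tf.ones([1, 3]), w) + b)

# One scalar summary per variable: the L2 norm of its gradient.
grads = tf.gradients(loss, weight_tensors)
gradient_summaries = [
    tf.summary.scalar(var.op.name + '/gradient_norm', tf.norm(grad))
    for var, grad in zip(weight_tensors, grads)
    if grad is not None]  # variables the loss doesn't use get a None gradient
gradient_summary_op = tf.summary.merge(gradient_summaries)
```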
dual_net.py (Outdated)
@@ -81,6 +83,8 @@ def bootstrap(self):
        tf.train.Saver().save(sess, self.save_file)

    def train(self, tf_records, init_from=None, logdir=None, num_steps=None):
        def should_log(i):
but why?
I guess my point is, it seems like we should be exporting metrics rather than dumping a bunch of logs.
we don't want to actually do these computations every step for millions of steps -- it'd be a lot of wasted work. So what we 'log' are actually the metrics.
It's a bit badly named; the "logger" is really a TFSummaryWriter that dumps summary protos to a log file that tensorboard can then parse and visualize.
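For readers unfamiliar with that setup, a small sketch of the pattern being described (not the PR's actual code): a `tf.summary.FileWriter` appends summary protos to an event file under `logdir`, which TensorBoard then parses and plots. The directory path and the value are made up; the 'combined_cost' tag comes from the diff above.

```python
import tensorflow as tf

logdir = '/tmp/minigo_logs'             # hypothetical log directory
writer = tf.summary.FileWriter(logdir)  # the "logger" in the discussion

# Write one metric value for step 100; TensorBoard plots these over time.
summary = tf.Summary(value=[
    tf.Summary.Value(tag='combined_cost', simple_value=1.23)])
writer.add_summary(summary, global_step=100)
writer.flush()
```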
+1 to jmgilmer's comments, one nit.
    def report(self, values):
        """Take a dict of scalar names to scalars, and aggregate by name."""
        for key, val in values.items():
can this be a comprehension? [self.accums[key].append(val) for key,val in values.items()] ?
a comprehension that has side effects like appending feels weird to me.
no problem. LGTM then.
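For reference, a self-contained sketch of the loop version that was kept; the class name and the `defaultdict` initialization are assumptions, since only the `report()` body appears in the diff:

```python
from collections import defaultdict

class StatsAggregator:  # hypothetical container; only report() is shown in the PR
    def __init__(self):
        self.accums = defaultdict(list)

    def report(self, values):
        """Take a dict of scalar names to scalars, and aggregate by name."""
        # A plain loop rather than a side-effecting comprehension, per the
        # discussion above.
        for key, val in values.items():
            self.accums[key].append(val)
```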
LGTM :)
dual_net.py (Outdated)
@@ -81,6 +83,8 @@ def bootstrap(self):
        tf.train.Saver().save(sess, self.save_file)

    def train(self, tf_records, init_from=None, logdir=None, num_steps=None):
        def should_log(i):
            return logdir is not None and i % 100 == 0
Make the log sampling interval a constant?
+1 to this; extracting it to a constant would also make it monkeypatchable for local_rl_loop, etc.
are we talking about making 100 a parameter log_every, or something like that?
yeah, SUMMARY_FREQUENCY or whatever. not really needed but might be nice
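A sketch of what that extraction might look like; `SUMMARY_FREQUENCY` is the name floated above, the class name is a hypothetical stand-in, and the rest of `train()` is omitted:

```python
SUMMARY_FREQUENCY = 100  # module-level, so local_rl_loop or tests can monkeypatch it

class DualNetworkTrainer:  # hypothetical stand-in for the class in dual_net.py
    def train(self, tf_records, init_from=None, logdir=None, num_steps=None):
        def should_log(i):
            # Still a closure over logdir, as in the diff above.
            return logdir is not None and i % SUMMARY_FREQUENCY == 0
        # ... the rest of the training loop would call should_log(step) ...
```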
dual_net.py (Outdated)
@@ -81,6 +83,8 @@ def bootstrap(self):
        tf.train.Saver().save(sess, self.save_file)

    def train(self, tf_records, init_from=None, logdir=None, num_steps=None):
        def should_log(i):
How about making this a top-level function, log_sample(i)? Or do you need all the closure vars?
the closure var is used. This way is both DRY and readable.
@jmgilmer - "it can be useful to also track the norm of the gradient with respect to each of the weights. Currently implemented is the norm of the weight update, which is a little different." I think I do have what you're asking for - the norm of each individual trainable variable (where a trainable variable might be a shape [3, 3, IN_CHANNELS, OUT_CHANNELS] conv kernel or a shape [OUT_CHANNELS] bias term). Here's what I'm seeing in tensorboard.
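To make that reply concrete, a sketch of the per-variable tracking being described: one scalar summary per trainable variable, merged under the "weight_summaries" name that appears in the diff. The toy variable (and its shape) is made up so the snippet stands alone.

```python
import tensorflow as tf

# Toy variable; in dual_net.py these would be the conv kernels and bias terms.
_ = tf.get_variable('conv1/kernel', shape=[3, 3, 16, 32])

weight_tensors = tf.trainable_variables()
weight_summaries = tf.summary.merge(
    [tf.summary.scalar(var.op.name + '/weight_norm', tf.norm(var))
     for var in weight_tensors],
    name="weight_summaries")
```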