[tfjs-node] Fix bug in node.tensorBoard() callback re initialEpoch #3714
Fixes tensorflow#3705

Previously, the `tensorBoard` callback in tfjs-node did not honor the `initialEpoch` arg passed to the `fit()` call that uses the callback. It always incorrectly started from 0. This CL fixes the bug by using the `epoch` arg passed to `onEpochEnd()` instead of an `epochsSeen` counter maintained by the callback object itself.
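To illustrate the idea behind the fix, here is a minimal, self-contained sketch. It is not the tfjs-node source: `EpochStepLogger` and `simulateFit` are hypothetical names, the loop only imitates how `fit()` would invoke `onEpochEnd()` when `initialEpoch` is set, and logging `epoch + 1` as the step is an assumption made to keep steps 1-based.

```typescript
// Hypothetical, simplified stand-in for the TensorBoard callback.
// It records steps in memory instead of writing TensorBoard summaries.
type Logs = {[key: string]: number};

class EpochStepLogger {
  readonly records: Array<{tag: string, value: number, step: number}> = [];

  onEpochEnd(epoch: number, logs: Logs): void {
    // Use the `epoch` argument (0-based, already offset by initialEpoch)
    // rather than a counter owned by the callback.
    // Assumption: the logged step is epoch + 1.
    for (const key of Object.keys(logs)) {
      this.records.push({tag: `epoch_${key}`, value: logs[key], step: epoch + 1});
    }
  }
}

// Imitates a fit({epochs, initialEpoch}) loop driving the callback.
function simulateFit(
    logger: EpochStepLogger, epochs: number, initialEpoch: number): void {
  for (let epoch = initialEpoch; epoch < epochs; ++epoch) {
    logger.onEpochEnd(epoch, {loss: 1 / (epoch + 1)});
  }
}

const logger = new EpochStepLogger();
simulateFit(logger, 6, 3);
const epochSteps = logger.records.map(r => r.step);
console.log(epochSteps);  // [4, 5, 6]
```

With an internal `epochsSeen` counter, the same resumed run would have logged steps 1, 2, 3 and overlapped the steps of the earlier run.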
pyu10055
left a comment
Reviewable status: 0 of 1 approvals obtained (waiting on @caisq and @lina128)
tfjs-node/src/callbacks.ts, line 215 at r1 (raw file):
```typescript
this.batchesSeen++;
if (this.args.updateFreq !== 'epoch') {
  this.logMetrics(logs, 'batch_', this.batchesSeen);
}
```

Should the batch number be updated the same way as the epoch?
caisq
left a comment
Reviewable status: 0 of 1 approvals obtained (waiting on @lina128 and @pyu10055)
tfjs-node/src/callbacks.ts, line 215 at r1 (raw file):
Previously, pyu10055 (Ping Yu) wrote…
Should the batch number be updated the same way as the epoch?
Good question. In Python's tf.keras, the batch numbers logged by the TensorBoard callback with update_freq='batch' don't reflect the initial_epoch arg, even though the epoch numbers do. For instance, I tested the following code:
```python
import numpy as np
import tensorflow as tf

model = tf.keras.Sequential()
model.add(tf.keras.layers.Dense(units=1, input_shape=(4,)))
model.compile(loss="mse", optimizer="sgd")

xs = np.ones([8000, 4])
ys = np.zeros([8000, 1])

# First fit() call: epochs 0-2, no TensorBoard callback attached.
model.fit(xs, ys, epochs=3)

callback = tf.keras.callbacks.TensorBoard(
    "/tmp/initial_epochs_logdir", update_freq="batch")

# Second fit() call: resumes at epoch 3 and logs per-batch metrics.
model.fit(xs,
          ys,
          batch_size=40,
          epochs=6,
          initial_epoch=3,
          callbacks=[callback])
```

Here the TensorBoard scalar log "batch_loss" starts at step 1, instead of a larger number that reflects the batches that have already happened in the previous (first) model.fit() call. Therefore the behavior in the tfjs-node code here is correct: it keeps track of the batch number by itself.
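The contrast between the two counters can be sketched in TypeScript as follows. This is a hypothetical model, not the tfjs-node implementation: `StepCounterSketch` and `simulateFitLoop` are made-up names, the steps are collected in arrays rather than written to a log directory, and the epoch step of `epoch + 1` is an assumption. The numbers mirror the second `fit()` call above (8000 samples, batch size 40, epochs 3 through 5).

```typescript
// Hypothetical model of the callback's two step counters.
type UpdateFreq = 'batch' | 'epoch';

class StepCounterSketch {
  private batchesSeen = 0;
  private readonly updateFreq: UpdateFreq;
  readonly batchSteps: number[] = [];
  readonly epochSteps: number[] = [];

  constructor(updateFreq: UpdateFreq) {
    this.updateFreq = updateFreq;
  }

  onBatchEnd(_batch: number): void {
    // The batch step comes from the callback's own counter, so it starts
    // at 1 for a fresh callback regardless of initialEpoch.
    this.batchesSeen++;
    if (this.updateFreq !== 'epoch') {
      this.batchSteps.push(this.batchesSeen);
    }
  }

  onEpochEnd(epoch: number): void {
    // The epoch step comes from the argument, so it reflects initialEpoch.
    // Assumption: the logged step is epoch + 1.
    this.epochSteps.push(epoch + 1);
  }
}

// Imitates the order in which a fit() loop would invoke the callback.
function simulateFitLoop(
    cb: StepCounterSketch, epochs: number, initialEpoch: number,
    batchesPerEpoch: number): void {
  for (let epoch = initialEpoch; epoch < epochs; ++epoch) {
    for (let batch = 0; batch < batchesPerEpoch; ++batch) {
      cb.onBatchEnd(batch);
    }
    cb.onEpochEnd(epoch);
  }
}

const sketch = new StepCounterSketch('batch');
simulateFitLoop(sketch, 6, 3, 8000 / 40);
const firstBatchStep = sketch.batchSteps[0];
const lastBatchStep = sketch.batchSteps[sketch.batchSteps.length - 1];
const resumedEpochSteps = sketch.epochSteps;
console.log(firstBatchStep, lastBatchStep, resumedEpochSteps);
```

In this sketch the batch steps run from 1 to 600 while the epoch steps are 4, 5, 6, which matches the tf.keras behavior described above.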
pyu10055
left a comment
Reviewed 2 of 2 files at r1.
Reviewable status: complete! 1 of 1 approvals obtained (waiting on @lina128 and @pyu10055)
pyu10055
left a comment
The error seems to be related to the lint errors in the test file:
ERROR: (no-any) /workspace/tfjs-node/src/tensorboard_test.ts[309, 18]: Type declaration of 'any' loses type-safety. Consider replacing it with a more precise type.
ERROR: (no-any) /workspace/tfjs-node/src/tensorboard_test.ts[310, 18]: Type declaration of 'any' loses type-safety. Consider replacing it with a more precise type.
ERROR: (no-any) /workspace/tfjs-node/src/tensorboard_test.ts[311, 53]: Type declaration of 'any' loses type-safety. Consider replacing it with a more precise type.
ERROR: (no-any) /workspace/tfjs-node/src/tensorboard_test.ts[312, 51]: Type declaration of 'any' loses type-safety. Consider replacing it with a more precise type.
Hi @caisq, the linter complains about the newly added test. Maybe skip lint for it?
…com:caisq/tfjs-1 into node-tensorboard-callback-initial-epoch-fix
