AdamOptimizer + checkpoints #634

Closed
wchan opened this issue Dec 28, 2015 · 8 comments

Comments

@wchan

wchan commented Dec 28, 2015

When using the AdamOptimizer, does the checkpoint save the optimizer's state? If yes, how do you clear that state (i.e., restart the AdamOptimizer but keep the model weights from the checkpoint)? If no, how do you maintain the Adam state across restarts? The same question applies to AdaGrad as well.

@girving
Contributor

girving commented Dec 28, 2015

This kind of question should be asked on Stack Overflow, since it is not about a bug or feature request for TensorFlow. However, the answer is yes: the state of optimizers is stored in tf.Variable objects just like normal state, and these are saved to checkpoints.
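
As an illustration of what that stored state looks like, here is a minimal sketch (TF 1.x-era API; the checkpoint path is a placeholder) that lists the variables in a checkpoint, where Adam's slot variables typically appear alongside the model weights:

import tensorflow as tf

# List every variable stored in the checkpoint. Adam's per-weight slots usually
# show up as "<var_name>/Adam" and "<var_name>/Adam_1", plus beta1_power/beta2_power.
reader = tf.train.NewCheckpointReader("my_model.ckpt")
for name, shape in reader.get_variable_to_shape_map().items():
    print(name, shape)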

@girving girving closed this as completed Dec 28, 2015
@xksteven

Can this be reopened as a feature request to create an API call to reset the state of an optimizer?

There doesn't seem to be a nice way to do it as far as I know.

@yaroslavvb
Contributor

You could "reset" state of optimizer by creating new optimizer over the
same parameter variable


@xksteven

xksteven commented Sep 21, 2016

That seems more like a "hack": when I save my new model, I'll need a way to find and exclude the old optimizer's variables, saving everything except the old and new optimizer state. This starts to break the code. I was interested in experimenting with this technique, but with other optimization schemes. However, I don't see a clean way to do this once the model has been saved and I need to restart it.

Unless I'm misunderstanding how you intend to code it up?

I was imagining I have my graph:

def train():
  with tf.Session() as sess:
    inputs, targets = get_inputs()
    model_out = infer(inputs)
    loss = loss_fn(model_out, targets)
    train_op = tf.train.AdamOptimizer().minimize(loss)
    saver = tf.train.Saver()
    sess.run(tf.initialize_all_variables())
    saver.restore(sess, checkpoint)

That's the old model; then I imagine the change you're suggesting is to add the snippet below?

    .... (same as above)
    saver.restore(sess, checkpoint)
    train_op = tf.train.AdamOptimizer().minimize(loss)  # new optimizer => fresh Adam state
    uninitialized_names = set(sess.run(tf.report_uninitialized_variables()))
    new_vars = [v for v in tf.all_variables()
                if v.name.split(':')[0].encode() in uninitialized_names]
    sess.run(tf.initialize_variables(new_vars))

@yaroslavvb
Contributor

yaroslavvb commented Sep 21, 2016

You could also load the checkpoint in the regular way, and then run the initializer on all the variables you want to reset: sess.run([v.initializer for v in variables_to_reset])
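
A minimal sketch of that approach, using the TF 1.x-era names seen elsewhere in this thread; the name filter is an assumption based on AdamOptimizer's default slot and accumulator naming:

# Collect the Adam-related variables by name and re-run their initializers.
adam_vars = [v for v in tf.all_variables()
             if "Adam" in v.name
             or "beta1_power" in v.name or "beta2_power" in v.name]
sess.run([v.initializer for v in adam_vars])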

@dylanbfox

@xksteven did that method work for you? Simply overwriting the train_op variable after restoring the checkpoint?

@xksteven

@dylanbfox

If you're using the TensorFlow built-in saver, then "resetting" the Adam optimizer by simply creating a new AdamOptimizer will create an entire extra set of slot parameters for each variable in the graph while keeping the old, unused ones around. This will not work as a long-term solution, as you nearly double the size of the graph every time you reset it this way.

It is not a viable solution.

I honestly still don't have a good way to do this, as one would need to iterate through all of the variables and reset all of the "slot" variables, and I don't know of a way that reliably accomplishes this.

I would suggest keeping this issue open.
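
For what it's worth, a hedged sketch of that slot-variable iteration, assuming `opt` is the tf.train.AdamOptimizer instance used to build the train op (so no second optimizer is created):

# Reset only the optimizer's slot variables ('m' and 'v' for Adam).
slot_vars = []
for slot_name in opt.get_slot_names():
    for var in tf.trainable_variables():
        slot = opt.get_slot(var, slot_name)
        if slot is not None:
            slot_vars.append(slot)
# Note: Adam also keeps beta1_power/beta2_power accumulators, which are not slots.
sess.run(tf.initialize_variables(slot_vars))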

@ghost

ghost commented Jan 11, 2018

To solve this problem, you simply need to pass the list of variables to restore to the Saver object. I had this problem before, and after experimenting, I posted my findings on saving and restoring variables, along with the code, in a Stack Overflow answer. Here is the link:

https://stackoverflow.com/questions/48161147/error-restoring-model-in-tensorflow-after-changing-the-optimizer-paramter/48212514#48212514
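
In short, the idea is to build the Saver over the model variables only, so the optimizer's slot variables are neither saved nor expected at restore time. A minimal sketch, where checkpoint_path is a placeholder:

# Save/restore only the model weights; Adam's slot variables are excluded.
model_vars = tf.trainable_variables()
saver = tf.train.Saver(var_list=model_vars)
saver.restore(sess, checkpoint_path)
# Variables created by a new optimizer still need their initializers run afterwards.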
