Recover callback history when training is interrupted #50516

dennymarcels · 2021-06-29T14:26:21Z

I could not find an existing solution for my need, so I believe this should be implemented as a feature.

System information

TensorFlow version (you are using): 2
Are you willing to contribute it (Yes/No): Yes

Describe the feature and the current behavior/state.
I was wondering if it would be possible to checkpoint and recover the callback history, so that my callbacks can continue whatever they were tracking when training is interrupted for any reason.

Will this change the current api? How?
I don't know.

Who will benefit with this feature?
Whoever performs long training sessions.

Any Other info.
NA

jvishnuvardhan · 2021-07-01T16:23:15Z

@dennymarcels Thanks for creating the issue. Keras has moved to a new repository, https://github.com/keras-team/keras/issues, which is dedicated to Keras development. The Keras team announced the move earlier in the TF forum; the link is given below.

https://discuss.tensorflow.org/t/keras-project-moved-to-new-repository-in-https-github-com-keras-team-keras/1999

Regarding this feature, you could write a custom callback to checkpoint and recover at any time. For example, you can use on_train_begin to checkpoint at the start of training, and you can use the other available methods to checkpoint at different points during model training/testing, as described here: https://keras.io/guides/writing_your_own_callbacks/.
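
A minimal sketch of that idea, not an official Keras feature: the class name `ResumableBestLoss`, the JSON file layout, and the `state_path` parameter are all hypothetical. It keeps the best monitored value and its epoch in a small JSON file, writes the file in `on_epoch_end`, and reloads it in `on_train_begin`, so the state is stored independently of the model weights. The fallback base class is only there so the sketch runs without TensorFlow installed.

```python
import json
import os

try:
    # Use the real Keras base class when TensorFlow is installed.
    from tensorflow import keras
    _Base = keras.callbacks.Callback
except Exception:
    # Fallback so the sketch still runs without TensorFlow.
    _Base = object


class ResumableBestLoss(_Base):
    """Hypothetical callback: tracks the best loss and persists it to JSON."""

    def __init__(self, state_path, monitor="loss"):
        super().__init__()
        self.state_path = state_path
        self.monitor = monitor
        self.best = float("inf")
        self.best_epoch = None
        self.last_epoch = -1

    def on_train_begin(self, logs=None):
        # Restore the tracked state if a previous run left a state file.
        if os.path.exists(self.state_path):
            with open(self.state_path) as f:
                state = json.load(f)
            self.best = state["best"]
            self.best_epoch = state["best_epoch"]
            self.last_epoch = state["last_epoch"]

    def on_epoch_end(self, epoch, logs=None):
        value = (logs or {}).get(self.monitor)
        if value is None:
            return
        value = float(value)
        if value < self.best:
            self.best = value
            self.best_epoch = epoch
        self.last_epoch = epoch
        self._save()

    def _save(self):
        # Write to a temporary file first so an interruption mid-write
        # cannot leave a truncated state file behind.
        tmp_path = self.state_path + ".tmp"
        state = {
            "best": self.best,
            "best_epoch": self.best_epoch,
            "last_epoch": self.last_epoch,
        }
        with open(tmp_path, "w") as f:
            json.dump(state, f)
        os.replace(tmp_path, self.state_path)
```

With Keras, an instance would be passed in the `callbacks` list of `model.fit`. Epoch numbers continue from the previous run only if `initial_epoch` is also passed to `fit`, for example using the `last_epoch` value stored in the state file plus one.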

Please let me know what you think. Also, please provide a use case for further discussion. Thanks!

dennymarcels · 2021-07-02T12:09:03Z

Hey @jvishnuvardhan thank you for replying!

I don't think writing a custom callback would work because my model is itself customized, and I can only save its weights. I can certainly load the weights if training is interrupted, but all callbacks will be reset, meaning none of them knows what the best loss so far was, nor in which epoch it occurred.

Also, if you feel that is more appropriate and you have the power to do so, would you mind moving this request to the Keras repository?

jvishnuvardhan · 2021-07-03T00:17:53Z

@dennymarcels When you write a custom callback, you can inherit from the ModelCheckpoint callback and write the weights based on the performance of a metric. Please check this example. When you want to load weights, you can choose the weights of the last iteration/batch/epoch or the best weights, as mentioned in this TF tutorial.
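
The linked example and tutorial are not reproduced in this thread, so the following is only a sketch of the "last or best weights" choice. It assumes checkpoints were written with a filename template such as `weights.{epoch:02d}-{val_loss:.2f}.h5`; that template, the helper name `find_checkpoints`, and the assumption of non-negative loss values are illustrative, not taken from the thread.

```python
import re
from pathlib import Path

# Assumed naming scheme: "weights.<epoch>-<val_loss>.h5",
# e.g. "weights.03-0.55.h5". Negative losses are not handled.
_PATTERN = re.compile(r"weights\.(\d+)-(\d+(?:\.\d+)?)\.h5$")


def find_checkpoints(directory):
    """Return (last, best) checkpoint paths, or (None, None) if none match."""
    found = []
    for path in Path(directory).iterdir():
        match = _PATTERN.match(path.name)
        if match:
            epoch = int(match.group(1))
            loss = float(match.group(2))
            found.append((epoch, loss, path))
    if not found:
        return None, None
    # Last checkpoint: highest epoch number.
    last = max(found, key=lambda item: item[0])[2]
    # Best checkpoint: lowest loss, earliest epoch on ties.
    best = min(found, key=lambda item: (item[1], item[0]))[2]
    return last, best
```

Either returned path could then be passed to `model.load_weights` before resuming. Note that the loss in the filename is rounded, so this recovers the best value only to the precision of the template.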

If this is still an issue, please open it in keras-team/keras. I am not able to move the issue to that repo because it is not part of the tensorflow/tensorflow repository and I don't have permission. It is easy for you to open it there and reference this issue. Thanks!

As the entire Keras team is focused on that repo, you would get a faster response and resolution. Thanks!

dennymarcels · 2021-07-03T02:03:36Z

I did post an issue there, and while writing it I realized I had not been that clear here. Thank you nonetheless. Closing.

dennymarcels added the type:feature Feature requests label Jun 29, 2021

google-ml-butler bot assigned saikumarchalla Jun 29, 2021

saikumarchalla added the comp:keras Keras related issues label Jul 1, 2021

saikumarchalla assigned jvishnuvardhan and unassigned saikumarchalla Jul 1, 2021

jvishnuvardhan added the stat:awaiting response Status - Awaiting response from author label Jul 1, 2021

dennymarcels closed this as completed Jul 3, 2021

haifeng-jin mentioned this issue Jul 6, 2021

Recover callback state when training is interrupted keras-team/keras#14862

Closed
