How can I continue my predictor training when interrupted? #63

hcdeng6 · 2020-05-22T16:29:04Z

Hi,
I am training a predictor on a very large corpus, with 6 epochs in total. Each epoch takes more than 24 hours because of the size of the corpus. My machine apparently cannot sustain such a heavy workload, and the program was interrupted twice during the 4th epoch. Restarting kiwi from scratch would waste the epochs already completed, so I would like to know how to obtain a checkpoint or resume predictor training from where it was interrupted. Could you tell me what I should do? Thank you.

kepler · 2020-05-25T08:17:02Z

Hi @hcdeng6,

You should use the --resume flag and specify either --output-dir or --run-uuid to point to your partially trained model (https://unbabel.github.io/OpenKiwi/cli/train.html#training-save-load).
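
As a rough sketch of what that could look like on the command line: the snippet below only assembles and prints the command rather than running it. The `kiwi train --config` invocation and both paths are assumptions for illustration (the paths are hypothetical placeholders); only `--resume`, `--output-dir`, and `--run-uuid` come from the answer above, so check the linked documentation for the exact syntax in your version.

```shell
# Hypothetical paths: replace with your own config file and the output
# directory of the interrupted run.
CONFIG="experiments/train_predictor.yaml"
OUTPUT_DIR="runs/predictor"

# Assumed invocation: the same training command as before, plus --resume
# and --output-dir pointing at the partially trained model.
RESUME_CMD="kiwi train --config $CONFIG --resume --output-dir $OUTPUT_DIR"

# Print the command for inspection instead of executing it.
echo "$RESUME_CMD"
```

If the original run was tracked by run UUID rather than by output directory, `--run-uuid` would take the place of `--output-dir` here.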

captainvera · 2020-09-08T14:40:04Z

Hey @hcdeng6, I'm going to assume this issue has been solved.

Feel free to re-open if you still have problems.

captainvera closed this as completed Sep 8, 2020
