How can I continue my predictor training when interrupted? #63

Closed
hcdeng6 opened this issue May 22, 2020 · 2 comments

Comments


hcdeng6 commented May 22, 2020

Hi,
I am training a predictor on a very large corpus, with 6 epochs configured in total. Each epoch takes more than 24 hours because of the size of the corpus. My machine does not seem able to handle such a heavy workload, and the program has been interrupted twice during the 4th epoch. Restarting the kiwi program from scratch would waste the epochs already completed, so I would like to know how to get a checkpoint or continue predictor training from the point where it was interrupted. Could you tell me what I should do? Thank you.

Collaborator

kepler commented May 25, 2020

Hi @hcdeng6,

You should use the --resume flag and specify either --output-dir or --run-uuid to point to your partially trained model (https://unbabel.github.io/OpenKiwi/cli/train.html#training-save-load).
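
For illustration, here is a minimal sketch of how the resume invocation could be assembled. Only `--resume`, `--output-dir`, and `--run-uuid` come from the reply above. The `kiwi train --config ...` prefix, the helper name `build_resume_command`, and the example paths are assumptions made for this sketch, so check them against the linked documentation for your OpenKiwi version. Requiring exactly one of the two locators is also a choice made here, not something the reply states.

```python
import shlex
from typing import List, Optional


def build_resume_command(
    config_path: str,
    output_dir: Optional[str] = None,
    run_uuid: Optional[str] = None,
) -> List[str]:
    """Build an argument list that resumes an interrupted training run.

    Hypothetical helper: it only assembles the command, it does not run it.
    """
    # Sketch-level choice: identify the partial run by exactly one locator.
    if (output_dir is None) == (run_uuid is None):
        raise ValueError("pass exactly one of output_dir or run_uuid")

    # Assumption: the original run was started with `kiwi train --config ...`.
    cmd = ["kiwi", "train", "--config", config_path, "--resume"]

    if output_dir is not None:
        cmd += ["--output-dir", output_dir]
    else:
        cmd += ["--run-uuid", run_uuid]
    return cmd


if __name__ == "__main__":
    # Hypothetical paths; substitute the ones from the interrupted run.
    cmd = build_resume_command("predictor.yaml", output_dir="runs/predictor")
    print(shlex.join(cmd))
```

The printed line can be pasted into a shell, or the list can be passed to `subprocess.run` from a wrapper script that restarts training after a crash.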

@captainvera
Contributor

Hey @hcdeng6, I'm going to assume this issue has been solved.

Feel free to re-open if you still have problems.
