How to resume train.py from checkpoint files #8
Original post:
I ran "python train.py", but the training job did not finish normally. How can I resume the training process from a checkpoint file? Many thanks!

Reply (Zaixi):
Hi, could you show the training error? You can reload the model from the latest checkpoint.

Follow-up (original poster):
Hi Zaixi, thanks for your reply. I have a lot of checkpoint files in the ".log/train_model_2023_xx_xx__xx_xx_xx/checkpoints" directory. If I want to reload the model from the latest checkpoint, should I modify the checkpoint parameter (under "model:") in the ./config/train_model.yml file, or do other parameters need to change as well? Also, what output file is produced when a training job finishes normally? As for the training error, the last few lines of log.txt are as follows [log excerpt not included]; it looks like an accidental interruption. By the way, can you give an estimated training time and the hardware it was measured on? I would also really appreciate it if you could provide the pre-trained model. Many thanks!
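Since the thread does not spell out the resume mechanics, here is a minimal sketch of what reloading the latest checkpoint could look like. It assumes the training script saves PyTorch checkpoints as .pt files under the run's checkpoints directory and that each file is a torch.save() dict with "model", "optimizer", and "iteration" entries; the function names find_latest_checkpoint and resume_from_checkpoint are hypothetical and not part of this repository, so the actual keys and paths should be taken from whatever train.py writes (and from the checkpoint field in ./config/train_model.yml, if the script reads one).

```python
# Minimal sketch of resuming from the most recent checkpoint.
# Assumptions (not confirmed by the repository): checkpoints are *.pt files
# under something like .log/<run_name>/checkpoints, and each file contains
# "model", "optimizer", and "iteration" entries. Adjust keys and paths to
# match what train.py actually saves.
import glob
import os

import torch


def find_latest_checkpoint(ckpt_dir):
    """Return the path of the most recently modified *.pt file, or None."""
    candidates = glob.glob(os.path.join(ckpt_dir, "*.pt"))
    if not candidates:
        return None
    return max(candidates, key=os.path.getmtime)


def resume_from_checkpoint(model, optimizer, ckpt_dir, device="cpu"):
    """Restore model/optimizer state from the latest checkpoint, if any.

    Returns the iteration to resume from (0 if no checkpoint was found).
    """
    ckpt_path = find_latest_checkpoint(ckpt_dir)
    if ckpt_path is None:
        print("No checkpoint found; starting training from scratch.")
        return 0
    ckpt = torch.load(ckpt_path, map_location=device)
    model.load_state_dict(ckpt["model"])
    optimizer.load_state_dict(ckpt["optimizer"])
    start_iter = ckpt.get("iteration", 0)
    print(f"Resumed from {ckpt_path} at iteration {start_iter}.")
    return start_iter
```

With the states restored, the training loop can continue counting from the returned iteration instead of 0. If train.py instead exposes a checkpoint field in the config file, pointing that field at the chosen .pt file is probably the intended way to resume.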