Problem resuming training in Google Colab (Continued) #674
Comments
Hello! 👋 It seems the issue you're encountering is related to the download or loading of the model's checkpoint file. The error message you're seeing ( As you've already tried the recommended steps (re-running the cell, checking internet connection, and clearing the Colab environment), you could try the following additional step to ensure the
If the problem persists even after these steps, it's possible there may be an issue with the Remember to check the Ultralytics HUB Docs at https://docs.ultralytics.com/hub for more detailed instructions and troubleshooting tips. Your feedback is valuable, and the Ultralytics team appreciates your community involvement. Let's work together to solve this issue! 🚀
Hey there! 👋 It looks like you're encountering access issues due to permission settings or an expired token for the For downloading model checkpoints from the Ultralytics HUB, I recommend ensuring that you're logged into the hub using the Regarding the expired token in the second URL, this usually occurs because URLs with embedded credentials have a short validity period for security reasons. To resolve this, it's best to generate a fresh download URL by re-initiating your session or request immediately before you plan to download the file. If these steps don't resolve the issue, I'd suggest reaching out through our support channels with specifics (while avoiding sharing sensitive information like API keys publicly), so we can ensure proper access on your account. Let's get this sorted! 🚀
@sebasmej We just checked and everything is working fine when starting/resuming training in Google Colab. Do you still have the issues above?
Yes, the problem persists. I have not been able to resume training; I am encountering the same errors I mentioned before.
I’ve reviewed your model, and it appears there was indeed a hiccup with uploading the checkpoint for epoch 32. As a temporary measure, I’ve reverted the checkpoint to epoch 31 (the previous successful checkpoint upload), which should allow you to resume training immediately. Could you please confirm if everything is back on track on your end? Additionally, I’ve documented this incident with our development team to investigate further and ensure a permanent fix is implemented. This will help prevent such issues from recurring in the future. P.S. If the error still occurs, consider starting the training again with a new model.
Thank you for your prompt reply. Yes, everything is working fine now. I was able to continue the model training from epoch 31 without any problem.
@sebasmej I am glad your issue was solved. Thank you for your patience!
@sergiuwaxmann I am having the same issue. I tried to manually download the checkpoint and its size is just 1 KB. Could you revert my checkpoint to a previous successful checkpoint? Thanks in advance. My model ID: https://hub.ultralytics.com/models/ung87rRVHYHU5Wrhmq8p?tab=train
@vwyLss Can you check now? Last checkpoint should be epoch 125.
@sergiuwaxmann It is working now, thanks!
@vwyLss You're welcome! 🚀
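For anyone hitting the same symptom reported in the comments above (a manually downloaded checkpoint that is only about 1 KB), a quick local size check can help confirm that the file is truncated before asking for a revert. This is only a minimal sketch and not part of the Ultralytics HUB tooling: the function name `checkpoint_looks_valid`, the 10 KB threshold, and the example file name are all assumptions made for illustration.

```python
import os

# Hypothetical threshold (an assumption, not an Ultralytics value): trained
# model checkpoints are normally far larger than this, so a file below
# 10 KB is treated as suspect.
MIN_CHECKPOINT_BYTES = 10 * 1024


def checkpoint_looks_valid(path: str, min_bytes: int = MIN_CHECKPOINT_BYTES) -> bool:
    """Return True if `path` is an existing file of at least `min_bytes` bytes."""
    return os.path.isfile(path) and os.path.getsize(path) >= min_bytes


if __name__ == "__main__":
    # "last.pt" is a placeholder name for a manually downloaded checkpoint.
    candidate = "last.pt"
    if checkpoint_looks_valid(candidate):
        print(f"{candidate}: size looks plausible")
    else:
        print(f"{candidate}: missing or suspiciously small; the upload may have failed")
```

A size check like this only catches truncated or empty files. It does not verify that the checkpoint actually loads, so a file that passes may still be corrupt.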
Search before asking
HUB Component
Training
Bug
I am training a model using Google Colab, and when I try to resume by executing the commands:
the following error message appears:
I followed your recommendations to solve this issue: I reran the training cell, checked the internet connection, and cleared the Colab environment, but the issue persists. For further investigation, I am appending details of the error after the rerun:
Environment
Google Colab
Minimal Reproducible Example
Additional
No response