Stuck at 100% Optimizing weights #630

sdy623 · 2024-04-09T06:11:43Z

Search before asking

I have searched the HUB issues and found no similar bug report.

HUB Component

Training

Bug

I trained a model using my own agent, but I can't find the trained model on the HUB webpage.
Someone else reported a similar problem, but in their case there was no results.csv file; in my case I can find my results.csv file.

Here is my training log

Ultralytics HUB: Uploading checkpoint https://hub.ultralytics.com/models/GxzAXECMi65QweQlpTjs

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size
     97/100      17.4G      1.365     0.5464      1.235        101        640: 100%|██████████| 264/264 [01:42<00:00,  2
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 29/29 [00:11<
                   all        747      49603      0.948      0.942      0.976      0.667

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size
     98/100      17.3G      1.363     0.5444      1.235        139        640: 100%|██████████| 264/264 [01:40<00:00,  2
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 29/29 [00:11<
                   all        747      49603      0.949      0.941      0.977      0.667

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size
     99/100      17.3G       1.36     0.5425      1.232        137        640: 100%|██████████| 264/264 [01:40<00:00,  2
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 29/29 [00:11<
                   all        747      49603      0.949      0.941      0.977      0.668

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size
    100/100      17.3G      1.349     0.5367       1.22        156        640: 100%|██████████| 264/264 [01:41<00:00,  2
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 29/29 [00:11<
                   all        747      49603       0.95      0.941      0.977      0.668

100 epochs completed in 3.364 hours.
Optimizer stripped from runs/detect/train12/weights/last.pt, 311.6MB
Optimizer stripped from runs/detect/train12/weights/best.pt, 311.6MB

Validating runs/detect/train12/weights/best.pt...
Ultralytics YOLOv8.1.45 🚀 Python-3.10.12 torch-2.1.0a0+32f93b1 CUDA:0 (B1.gpu.large, 24118MiB)
YOLOv5x6u summary (fused): 463 layers, 155375236 parameters, 0 gradients, 250.3 GFLOPs
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 29/29 [00:42<
                   all        747      49603       0.95      0.941      0.977      0.668
Speed: 0.1ms preprocess, 8.5ms inference, 0.0ms loss, 1.8ms postprocess per image
Results saved to runs/detect/train12
Ultralytics HUB: Syncing final model...
100%|██████████| 297M/297M [02:44<00:00, 1.89MB/s]
Ultralytics HUB: Done ✅
Ultralytics HUB: View model at https://hub.ultralytics.com/models/GxzAXECMi65QweQlpTjs 🚀
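The two sizes in this log ("311.6MB" after stripping the optimizer, "297M" on the HUB upload bar) look different but may describe the same file. A quick arithmetic check, assuming the "Optimizer stripped" line reports decimal megabytes (10^6 bytes) and the progress bar reports binary mebibytes (2^20 bytes):

```python
# Assumption: "311.6MB" is decimal (1 MB = 10**6 bytes) and the upload
# progress bar's "297M" is binary (1 MiB = 2**20 bytes).
checkpoint_bytes = 311.6e6
uploaded_mib = checkpoint_bytes / 2**20
print(f"{uploaded_mib:.1f} MiB")  # → 297.2 MiB
```

If those unit assumptions hold, the full best.pt was sent, which suggests the byte transfer itself ran to completion.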

After "View model at https://hub.ultralytics.com/models/GxzAXECMi65QweQlpTjs 🚀" is printed, training exits, but the status still shows "Optimizing weights" at 100%.
After a while the agent quits and becomes disconnected, and I can't find the trained model in HUB.
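Even though the final HUB sync did not show up, the log above shows the checkpoints were written locally (runs/detect/train12/weights/best.pt). A minimal stdlib sketch for locating the newest local best.pt on the training agent, assuming the default runs/detect/train* layout seen in the log; the helper name find_latest_best is hypothetical:

```python
from pathlib import Path
from typing import Optional


def find_latest_best(runs_dir: str = "runs/detect") -> Optional[Path]:
    """Return the most recently modified train*/weights/best.pt, or None."""
    # Matches e.g. runs/detect/train12/weights/best.pt; yields nothing if runs_dir is missing.
    candidates = list(Path(runs_dir).glob("train*/weights/best.pt"))
    if not candidates:
        return None
    # Pick by modification time rather than by the numeric suffix in the folder name.
    return max(candidates, key=lambda p: p.stat().st_mtime)


if __name__ == "__main__":
    best = find_latest_best()
    print(best if best else "no local best.pt found")
```

A path found this way can then be loaded locally with ultralytics' YOLO(path).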

Could someone help me? I would appreciate it.

Environment

Training agent: Docker container
Kernel version: 5.15.146
Memory: 24GB
GPU Memory: 24GB
Python: 3.10.12
CUDA: 12.2.140
torch: 2.1.0a0+32f93b1
ultralytics: 8.1.45
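Most of the fields above can be gathered automatically on the agent. A small stdlib sketch, assuming the packages of interest are installed under the distribution names "ultralytics" and "torch"; the helper name collect_environment is hypothetical:

```python
import platform
from importlib.metadata import PackageNotFoundError, version


def collect_environment(packages=("ultralytics", "torch")) -> dict:
    """Collect Python, kernel and package versions for a bug report."""
    # platform.release() returns the kernel release string on Linux.
    info = {"python": platform.python_version(), "kernel": platform.release()}
    for name in packages:
        try:
            info[name] = version(name)
        except PackageNotFoundError:
            info[name] = "not installed"
    return info


for key, value in collect_environment().items():
    print(f"{key}: {value}")
```

CUDA and GPU memory are not covered here, since reading them needs torch or nvidia-smi rather than the standard library.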

Minimal Reproducible Example

Log in to HUB.
Choose the "Bring your own agent" option to train the model.
Run the model's training code on my training agent.
Wait for training to finish.
The problem occurs.
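For reference, the "Bring your own agent" step typically amounts to a short Python snippet run on the agent. A sketch of that pattern, assuming the ultralytics package's hub.login and YOLO entry points; the function name train_on_own_agent is hypothetical, and the API key and model URL are placeholders:

```python
def train_on_own_agent(api_key: str, model_url: str):
    """Log in to Ultralytics HUB and train a HUB model on this machine."""
    # Imported lazily so this file can be loaded without ultralytics installed.
    from ultralytics import YOLO, hub

    hub.login(api_key)       # authenticate this agent with HUB
    model = YOLO(model_url)  # e.g. https://hub.ultralytics.com/models/<MODEL_ID>
    # For a HUB model, training arguments are expected to come from the HUB session.
    return model.train()


# Usage (placeholders, not real credentials):
# train_on_own_agent("API_KEY", "https://hub.ultralytics.com/models/MODEL_ID")
```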

Additional

No response


github-actions · 2024-04-09T06:12:05Z

👋 Hello @sdy623, thank you for raising an issue about Ultralytics HUB 🚀! Please visit our HUB Docs to learn more:

Quickstart. Start training and deploying YOLO models with HUB in seconds.
Datasets: Preparing and Uploading. Learn how to prepare and upload your datasets to HUB in YOLO format.
Projects: Creating and Managing. Group your models into projects for improved organization.
Models: Training and Exporting. Train YOLOv5 and YOLOv8 models on your custom datasets and export them to various formats for deployment.
Integrations. Explore different integration options for your trained models, such as TensorFlow, ONNX, OpenVINO, CoreML, and PaddlePaddle.
Ultralytics HUB App. Learn about the Ultralytics App for iOS and Android, which allows you to run models directly on your mobile device.
- iOS. Learn about YOLO CoreML models accelerated on Apple's Neural Engine on iPhones and iPads.
- Android. Explore TFLite acceleration on mobile devices.
Inference API. Understand how to use the Inference API for running your trained models in the cloud to generate predictions.

If this is a 🐛 Bug Report, please provide screenshots and steps to reproduce your problem to help us get started working on a fix.

If this is a ❓ Question, please provide as much information as possible, including dataset, model, environment details etc. so that we might provide the most helpful response.

We try to respond to all issues as promptly as possible. Thank you for your patience!

sdy623 · 2024-04-09T07:10:38Z

sergiuwaxmann · 2024-04-10T08:45:14Z

@sdy623 Thank you for bringing this to our attention.
It appears that the upload of the final weights failed. Our team is currently investigating to identify and resolve the underlying cause. I will keep you updated on our progress.
Your patience and understanding are greatly appreciated.

sergiuwaxmann · 2024-04-22T10:55:21Z

Hello @sdy623!
Great news! Our team has released a fix for the issue you reported. You should no longer experience this problem in new Cloud Training sessions.
Thanks for your patience!

sdy623 · 2024-04-22T10:57:08Z

Thank you very much. How can I deploy the fixed version?

sergiuwaxmann · 2024-04-22T11:52:35Z

@sdy623 When using Ultralytics HUB, the system automatically utilizes the latest version. For local training, please ensure you are using the most recent ultralytics version (8.2.2).
Unfortunately, the recent fix does not apply to models trained on earlier versions, so you will need to retrain your model with the latest version. We sincerely apologize for the inconvenience this causes.
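Before retraining on a local agent, it can help to confirm the installed version is at least 8.2.2. A stdlib sketch, assuming plain release version strings such as "8.1.45" (pre-release suffixes are ignored by this simple comparison); the helper names parse and has_fix are hypothetical:

```python
import re
from importlib.metadata import PackageNotFoundError, version

MIN_VERSION = "8.2.2"  # version named in this thread as containing the fix


def parse(v: str) -> tuple:
    """Turn '8.1.45' into (8, 1, 45); extra parts and suffixes are ignored."""
    return tuple(int(n) for n in re.findall(r"\d+", v)[:3])


def has_fix(installed: str, minimum: str = MIN_VERSION) -> bool:
    # Tuple comparison handles multi-digit parts, e.g. (8, 10, 0) > (8, 2, 2).
    return parse(installed) >= parse(minimum)


try:
    installed = version("ultralytics")
except PackageNotFoundError:
    installed = None

if installed is None:
    print("ultralytics is not installed")
elif has_fix(installed):
    print(f"ultralytics {installed}: includes the fix")
else:
    print(f"ultralytics {installed}: upgrade with `pip install -U ultralytics`")
```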

sdy623 added the bug Something isn't working label Apr 9, 2024

ultralytics deleted a comment from pderrenger Apr 10, 2024

sergiuwaxmann self-assigned this Apr 10, 2024

sergiuwaxmann added the fixed Bug is resolved label Apr 22, 2024

sdy623 closed this as completed Apr 22, 2024
