Multiple problems with YOLOv8 (Export .engine) - (Slow predict with best.pt or last.pt) #9556

Closed
1 task done
pnthai88 opened this issue Apr 4, 2024 · 4 comments
Labels
question Further information is requested Stale


pnthai88 commented Apr 4, 2024

Search before asking

Question

I have multiple problems with YOLOv8:

- Prediction is very slow with the best.pt and last.pt weights produced by custom training

TRAIN (all images are the same size)

from ultralytics import YOLO

train = True  # set to False to skip training and export

if __name__ == '__main__':
    if train:
        # Load the last checkpoint as the starting weights
        model = YOLO('yolo_models/yolo_models/last.pt')

        # Define the paths used for saving the weights
        weights_path = 'yolo_models/yolo_models/weights'
        save_dir = 'yolo_models/yolo_models/'

        # Train the model
        # Optimizers to choose from: SGD, Adam, AdamW, NAdam, RAdam, RMSProp
        results = model.train(data='yolo_models/yolo_models/coco8.yaml', epochs=500, imgsz=640, batch=-1, device=[0], pretrained=False, model='yolov8n.pt', save_dir=save_dir, project=weights_path)

        # Save the trained model
        model.cuda()
        model.save('yolo_models/yolo_models/trained_model.pt')

        # Export to the default format (TorchScript), then ONNX, then TensorRT
        model.cuda()
        model.export(opset=12, int8=True, device=0)#, imgsz=[1080, 1920])
        model.cuda()
        model.export(format='onnx', opset=12, int8=True, device=0)#, imgsz=[1080, 1920])
        model.cuda()
        model.export(format='engine', opset=12, int8=True, device=0)#, imgsz=[1080, 1920])

PREDICT

model_path = "yolo_models/yolo_models/race.pt" #(this is best.pt)
# Initialize YOLO object with the trained model
MODEL = YOLO(model_path, task='detect')
results = MODEL.predict(resized_frame, stream=True, device=0)#, imgsz=[1080, 1920])

RESULT

0: 1088x1088 (no detections), 51.9ms
Speed: 8.0ms preprocess, 51.9ms inference, 15.7ms postprocess per image at shape (1, 3, 1088, 1088)
WARNING ⚠️ imgsz=[1080, 1920] must be multiple of max stride 32, updating to [1088, 1920]
0: 1088x1088 (no detections), 66.1ms
Speed: 4.0ms preprocess, 66.1ms inference, 0.0ms postprocess per image at shape (1, 3, 1088, 1088)
WARNING ⚠️ imgsz=[1080, 1920] must be multiple of max stride 32, updating to [1088, 1920]
0: 1088x1088 (no detections), 50.4ms
Speed: 8.5ms preprocess, 50.4ms inference, 0.0ms postprocess per image at shape (1, 3, 1088, 1088)
WARNING ⚠️ imgsz=[1080, 1920] must be multiple of max stride 32, updating to [1088, 1920]
0: 1088x1088 (no detections), 64.1ms
Speed: 1.5ms preprocess, 64.1ms inference, 0.0ms postprocess per image at shape (1, 3, 1088, 1088)
WARNING ⚠️ imgsz=[1080, 1920] must be multiple of max stride 32, updating to [1088, 1920]
0: 1088x1088 (no detections), 71.1ms
Speed: 0.0ms preprocess, 71.1ms inference, 0.0ms postprocess per image at shape (1, 3, 1088, 1088)
WARNING ⚠️ imgsz=[1080, 1920] must be multiple of max stride 32, updating to [1088, 1920]
0: 1088x1088 (no detections), 77.8ms
Speed: 15.6ms preprocess, 77.8ms inference, 0.0ms postprocess per image at shape (1, 3, 1088, 1088)
WARNING ⚠️ imgsz=[1080, 1920] must be multiple of max stride 32, updating to [1088, 1920]
0: 1088x1088 (no detections), 76.8ms
Speed: 15.6ms preprocess, 76.8ms inference, 0.5ms postprocess per image at shape (1, 3, 1088, 1088)
WARNING ⚠️ imgsz=[1080, 1920] must be multiple of max stride 32, updating to [1088, 1920]
0: 1088x1088 (no detections), 68.9ms
Speed: 8.6ms preprocess, 68.9ms inference, 0.0ms postprocess per image at shape (1, 3, 1088, 1088)
WARNING ⚠️ imgsz=[1080, 1920] must be multiple of max stride 32, updating to [1088, 1920]
0: 1088x1088 (no detections), 52.9ms
Speed: 15.6ms preprocess, 52.9ms inference, 0.0ms postprocess per image at shape (1, 3, 1088, 1088)
WARNING ⚠️ imgsz=[1080, 1920] must be multiple of max stride 32, updating to [1088, 1920]
0: 1088x1088 (no detections), 52.5ms
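
The warning repeated in the log above reflects each requested dimension being rounded up to a multiple of the model's maximum stride (32). A minimal sketch of that arithmetic follows; `round_up_to_stride` is a hypothetical helper written for illustration, not the Ultralytics function that emits the warning.

```python
import math


def round_up_to_stride(imgsz, stride=32):
    """Round each image dimension up to the nearest multiple of stride."""
    return [math.ceil(x / stride) * stride for x in imgsz]


# 1080 is not a multiple of 32, so it becomes 1088; 1920 already is.
print(round_up_to_stride([1080, 1920]))  # [1088, 1920]
```

Passing a size that is already a multiple of 32 (for example 1088 instead of 1080) should avoid the warning.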

I also tried the exported ONNX file, which is much faster but still slower than I expected given the speed reported during training.
best.onnx predict speed:

0: 1088x1920 (no detections), 15.6ms
Speed: 7.5ms preprocess, 15.6ms inference, 0.0ms postprocess per image at shape (1, 3, 1088, 1920)
WARNING ⚠️ imgsz=[1080, 1920] must be multiple of max stride 32, updating to [1088, 1920]
0: 1088x1920 (no detections), 15.6ms
Speed: 21.1ms preprocess, 15.6ms inference, 0.0ms postprocess per image at shape (1, 3, 1088, 1920)
WARNING ⚠️ imgsz=[1080, 1920] must be multiple of max stride 32, updating to [1088, 1920]
0: 1088x1920 (no detections), 22.1ms
Speed: 15.7ms preprocess, 22.1ms inference, 0.0ms postprocess per image at shape (1, 3, 1088, 1920)
WARNING ⚠️ imgsz=[1080, 1920] must be multiple of max stride 32, updating to [1088, 1920]
0: 1088x1920 (no detections), 22.1ms
Speed: 0.0ms preprocess, 22.1ms inference, 0.0ms postprocess per image at shape (1, 3, 1088, 1920)
WARNING ⚠️ imgsz=[1080, 1920] must be multiple of max stride 32, updating to [1088, 1920]
0: 1088x1920 (no detections), 25.1ms
Speed: 15.6ms preprocess, 25.1ms inference, 0.0ms postprocess per image at shape (1, 3, 1088, 1920)
WARNING ⚠️ imgsz=[1080, 1920] must be multiple of max stride 32, updating to [1088, 1920]
0: 1088x1920 (no detections), 15.6ms
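
To compare the two runs by more than eyeballing, the `Speed:` lines can be parsed and averaged. This is a minimal sketch that assumes the predict log format shown above; `mean_speeds` and `SPEED_RE` are hypothetical names, and the pattern does not match the training summary line, which has an extra `loss` field.

```python
import re
from statistics import mean

# Matches the per-image "Speed:" lines printed by predict in the logs above.
SPEED_RE = re.compile(
    r"Speed: ([\d.]+)ms preprocess, ([\d.]+)ms inference, ([\d.]+)ms postprocess"
)


def mean_speeds(log_text):
    """Return mean (preprocess, inference, postprocess) times in ms."""
    rows = [tuple(map(float, m.groups())) for m in SPEED_RE.finditer(log_text)]
    if not rows:
        raise ValueError("no 'Speed:' lines found")
    pre, infer, post = zip(*rows)
    return mean(pre), mean(infer), mean(post)


# Two lines copied from the best.onnx log above.
sample = """\
Speed: 7.5ms preprocess, 15.6ms inference, 0.0ms postprocess per image at shape (1, 3, 1088, 1920)
Speed: 21.1ms preprocess, 15.6ms inference, 0.0ms postprocess per image at shape (1, 3, 1088, 1920)
"""
pre, infer, post = mean_speeds(sample)
print(f"{pre:.1f} ms preprocess, {infer:.1f} ms inference, {post:.1f} ms postprocess")
```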

Speed reported by training:

Ultralytics YOLOv8.1.36 🚀 Python-3.11.0 torch-2.2.1+cu118 CUDA:0 (NVIDIA GeForce RTX 2080 Ti, 11264MiB)
Model summary (fused): 168 layers, 3005843 parameters, 0 gradients, 8.1 GFLOPs
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 3/3 [00:01<00:00,  2.43it/s]
                   all        243        233      0.871      0.897      0.897      0.744
Speed: 0.1ms preprocess, 0.7ms inference, 0.0ms loss, 1.3ms postprocess per image
Results saved to yolo_models\yolo_models\weights\train34
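
One thing worth checking when comparing these numbers: training used imgsz=640, while the predict logs show inputs of 1088x1088 and 1088x1920. The sketch below only computes how many more pixels each predict input has than a 640x640 image. Treating inference time as roughly proportional to pixel count is an assumption, and it would not by itself account for the whole gap; `pixel_ratio` is a hypothetical helper.

```python
def pixel_ratio(shape, base=(640, 640)):
    """Ratio of pixel counts between an input shape and a base shape (H, W)."""
    return (shape[0] * shape[1]) / (base[0] * base[1])


# Shapes taken from the predict logs above.
for shape in [(1088, 1088), (1088, 1920)]:
    print(shape, round(pixel_ratio(shape), 2))
```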

- Unable to export to the .engine format to try it

Optimizer stripped from yolo_models\yolo_models\weights\train34\weights\last.pt, 6.2MB
Optimizer stripped from yolo_models\yolo_models\weights\train34\weights\best.pt, 6.2MB

Validating yolo_models\yolo_models\weights\train34\weights\best.pt...
Ultralytics YOLOv8.1.36 🚀 Python-3.11.0 torch-2.2.1+cu118 CUDA:0 (NVIDIA GeForce RTX 2080 Ti, 11264MiB)
Model summary (fused): 168 layers, 3005843 parameters, 0 gradients, 8.1 GFLOPs
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 3/3 [00:01<00:00,  2.43it/s]
                   all        243        233      0.871      0.897      0.897      0.744
Speed: 0.1ms preprocess, 0.7ms inference, 0.0ms loss, 1.3ms postprocess per image
Results saved to yolo_models\yolo_models\weights\train34
Ultralytics YOLOv8.1.36 🚀 Python-3.11.0 torch-2.2.1+cu118 CUDA:0 (NVIDIA GeForce RTX 2080 Ti, 11264MiB)
WARNING ⚠️ imgsz=[1080, 1920] must be multiple of max stride 32, updating to [1088, 1920]
Model summary (fused): 168 layers, 3005843 parameters, 0 gradients, 8.1 GFLOPs

PyTorch: starting from 'yolo_models\yolo_models\weights\train34\weights\best.pt' with input shape (1, 3, 1088, 1920) BCHW and output shape(s) (1, 5, 42840) (5.9 MB)

TorchScript: starting export with torch 2.2.1+cu118...
TorchScript: export success ✅ 1.0s, saved as 'yolo_models\yolo_models\weights\train34\weights\best.torchscript' (12.3 MB)

Export complete (1.3s)
Results saved to E:\livetracker\yolo_models\yolo_models\weights\train34\weights
Predict:         yolo predict task=detect model=yolo_models\yolo_models\weights\train34\weights\best.torchscript imgsz=1088,1920 int8
Validate:        yolo val task=detect model=yolo_models\yolo_models\weights\train34\weights\best.torchscript imgsz=1088,1920 data=yolo_models/yolo_models/coco8.yaml int8 WARNING ⚠️ non-PyTorch val requires square images, 'imgsz=[1088, 1920]' will not work. Use export 'imgsz=1920' if val is required.
Visualize:       https://netron.app
Ultralytics YOLOv8.1.36 🚀 Python-3.11.0 torch-2.2.1+cu118 CUDA:0 (NVIDIA GeForce RTX 2080 Ti, 11264MiB)
WARNING ⚠️ imgsz=[1080, 1920] must be multiple of max stride 32, updating to [1088, 1920]
Model summary (fused): 168 layers, 3005843 parameters, 0 gradients, 8.1 GFLOPs

PyTorch: starting from 'yolo_models\yolo_models\weights\train34\weights\best.pt' with input shape (1, 3, 1088, 1920) BCHW and output shape(s) (1, 5, 42840) (5.9 MB)

ONNX: starting export with onnx 1.16.0 opset 12...
ONNX: export success ✅ 0.8s, saved as 'yolo_models\yolo_models\weights\train34\weights\best.onnx' (12.3 MB)

Export complete (1.0s)
Results saved to E:\livetracker\yolo_models\yolo_models\weights\train34\weights
Predict:         yolo predict task=detect model=yolo_models\yolo_models\weights\train34\weights\best.onnx imgsz=1088,1920 int8
Validate:        yolo val task=detect model=yolo_models\yolo_models\weights\train34\weights\best.onnx imgsz=1088,1920 data=yolo_models/yolo_models/coco8.yaml int8 WARNING ⚠️ non-PyTorch val requires square images, 'imgsz=[1088, 1920]' will not work. Use export 'imgsz=1920' if val is required.
Visualize:       https://netron.app
Ultralytics YOLOv8.1.36 🚀 Python-3.11.0 torch-2.2.1+cu118 CUDA:0 (NVIDIA GeForce RTX 2080 Ti, 11264MiB)
WARNING ⚠️ imgsz=[1080, 1920] must be multiple of max stride 32, updating to [1088, 1920]
Model summary (fused): 168 layers, 3005843 parameters, 0 gradients, 8.1 GFLOPs

PyTorch: starting from 'yolo_models\yolo_models\weights\train34\weights\best.pt' with input shape (1, 3, 1088, 1920) BCHW and output shape(s) (1, 5, 42840) (5.9 MB)

ONNX: starting export with onnx 1.16.0 opset 12...
ONNX: export success ✅ 0.7s, saved as 'yolo_models\yolo_models\weights\train34\weights\best.onnx' (12.3 MB)

TensorRT: starting export with TensorRT 10.0.0b6...
[04/04/2024-22:50:27] [TRT] [I] [MemUsageChange] Init CUDA: CPU +1, GPU +0, now: CPU 29515, GPU 1844 (MiB)
[04/04/2024-22:50:30] [TRT] [I] [MemUsageChange] Init builder kernel library: CPU +1398, GPU +178, now: CPU 31161, GPU 2022 (MiB)
TensorRT: export failure ❌ 4.3s: 'tensorrt_bindings.tensorrt.IBuilderConfig' object has no attribute 'max_workspace_size'
Traceback (most recent call last):
  File "E:\livetracker\train.py", line 52, in <module>

  File "C:\Users\thaip\AppData\Local\Programs\Python\Python311\Lib\site-packages\ultralytics\engine\model.py", line 590, in export
    return Exporter(overrides=args, _callbacks=self.callbacks)(model=self.model)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\thaip\AppData\Local\Programs\Python\Python311\Lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\thaip\AppData\Local\Programs\Python\Python311\Lib\site-packages\ultralytics\engine\exporter.py", line 285, in __call__
    f[1], _ = self.export_engine()
              ^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\thaip\AppData\Local\Programs\Python\Python311\Lib\site-packages\ultralytics\engine\exporter.py", line 138, in outer_func
    raise e
  File "C:\Users\thaip\AppData\Local\Programs\Python\Python311\Lib\site-packages\ultralytics\engine\exporter.py", line 133, in outer_func
    f, model = inner_func(*args, **kwargs)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\thaip\AppData\Local\Programs\Python\Python311\Lib\site-packages\ultralytics\engine\exporter.py", line 678, in export_engine
    config.max_workspace_size = self.args.workspace * 1 << 30
    ^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'tensorrt_bindings.tensorrt.IBuilderConfig' object has no attribute 'max_workspace_size'
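
The AttributeError above arises because `IBuilderConfig.max_workspace_size` is not available in the TensorRT 10 Python bindings. The replacement, available since TensorRT 8.4, is `set_memory_pool_limit` with `MemoryPoolType.WORKSPACE`. The following is a minimal version-tolerant sketch, not the actual Ultralytics fix: `set_workspace` is a hypothetical helper, and `_FakeNewConfig` is a stand-in used only so the example runs without TensorRT installed.

```python
def set_workspace(config, workspace_gib, workspace_pool=None):
    """Set the builder workspace limit on either old or new TensorRT configs.

    config: a tensorrt.IBuilderConfig (or any object with the same attributes).
    workspace_pool: pass tensorrt.MemoryPoolType.WORKSPACE on TensorRT >= 8.4.
    """
    nbytes = int(workspace_gib * (1 << 30))  # GiB -> bytes
    if hasattr(config, "set_memory_pool_limit"):
        # Newer API; the only one present in TensorRT 10.
        config.set_memory_pool_limit(workspace_pool, nbytes)
    else:
        # Older API, as used in the traceback above.
        config.max_workspace_size = nbytes
    return nbytes


class _FakeNewConfig:
    """Stand-in for a TensorRT 10 config, used only to exercise the helper."""

    def __init__(self):
        self.limits = {}

    def set_memory_pool_limit(self, pool, nbytes):
        self.limits[pool] = nbytes


cfg = _FakeNewConfig()
set_workspace(cfg, 4, "WORKSPACE")
print(cfg.limits)  # {'WORKSPACE': 4294967296}
```

With real TensorRT, the call would look like `set_workspace(config, 4, trt.MemoryPoolType.WORKSPACE)` after `import tensorrt as trt`.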

Additional

Thanks for the TensorRT 10 support. I'm now able to export the .engine file and run it at amazing speed.

0: 640x640 1 MUC_TIEU, 0.0ms
Speed: 0.0ms preprocess, 0.0ms inference, 0.0ms postprocess per image at shape (1, 3, 640, 640)

0: 640x640 1 MUC_TIEU, 0.0ms
Speed: 0.0ms preprocess, 0.0ms inference, 0.0ms postprocess per image at shape (1, 3, 640, 640)

0: 640x640 1 MUC_TIEU, 0.0ms
Speed: 0.0ms preprocess, 0.0ms inference, 0.0ms postprocess per image at shape (1, 3, 640, 640)

0: 640x640 1 MUC_TIEU, 0.0ms
Speed: 0.0ms preprocess, 0.0ms inference, 0.0ms postprocess per image at shape (1, 3, 640, 640)

0: 640x640 1 MUC_TIEU, 0.0ms
Speed: 0.0ms preprocess, 0.0ms inference, 0.0ms postprocess per image at shape (1, 3, 640, 640)

0: 640x640 1 MUC_TIEU, 0.0ms
Speed: 0.0ms preprocess, 0.0ms inference, 0.0ms postprocess per image at shape (1, 3, 640, 640)

0: 640x640 1 MUC_TIEU, 0.0ms
Speed: 0.0ms preprocess, 0.0ms inference, 0.0ms postprocess per image at shape (1, 3, 640, 640)

0: 640x640 1 MUC_TIEU, 0.0ms
Speed: 0.0ms preprocess, 0.0ms inference, 0.0ms postprocess per image at shape (1, 3, 640, 640)

0: 640x640 1 MUC_TIEU, 0.0ms
Speed: 0.0ms preprocess, 0.0ms inference, 0.0ms postprocess per image at shape (1, 3, 640, 640)

0: 640x640 1 MUC_TIEU, 0.0ms
Speed: 0.0ms preprocess, 0.0ms inference, 0.0ms postprocess per image at shape (1, 3, 640, 640)

0: 640x640 1 MUC_TIEU, 0.0ms
Speed: 0.0ms preprocess, 0.0ms inference, 0.0ms postprocess per image at shape (1, 3, 640, 640)

0: 640x640 1 MUC_TIEU, 0.0ms
Speed: 0.0ms preprocess, 0.0ms inference, 0.0ms postprocess per image at shape (1, 3, 640, 640)

0: 640x640 1 MUC_TIEU, 0.0ms
Speed: 0.0ms preprocess, 0.0ms inference, 0.0ms postprocess per image at shape (1, 3, 640, 640)

From pull request: #9516

@pnthai88 pnthai88 added the question Further information is requested label Apr 4, 2024

github-actions bot commented Apr 4, 2024

👋 Hello @pnthai88, thank you for your interest in Ultralytics YOLOv8 🚀! We recommend a visit to the Docs for new users where you can find many Python and CLI usage examples and where many of the most common questions may already be answered.

If this is a 🐛 Bug Report, please provide a minimum reproducible example to help us debug it.

If this is a custom training ❓ Question, please provide as much information as possible, including dataset image examples and training logs, and verify you are following our Tips for Best Training Results.

Join the vibrant Ultralytics Discord 🎧 community for real-time conversations and collaborations. This platform offers a perfect space to inquire, showcase your work, and connect with fellow Ultralytics users.

Install

Pip install the ultralytics package including all requirements in a Python>=3.8 environment with PyTorch>=1.8.

pip install ultralytics

Environments

YOLOv8 may be run in any of the following up-to-date verified environments (with all dependencies including CUDA/CUDNN, Python and PyTorch preinstalled):

Status

Ultralytics CI

If this badge is green, all Ultralytics CI tests are currently passing. CI tests verify correct operation of all YOLOv8 Modes and Tasks on macOS, Windows, and Ubuntu every 24 hours and on every commit.


pnthai88 commented Apr 4, 2024

Thanks for the TensorRT 10 support. I'm now able to export the .engine file and run it at amazing speed.

From pull request: #9516

glenn-jocher (Member) commented

@pnthai88 thrilled to hear you're loving the TensorRT 10 support and seeing such incredible speed with the .engine files! 🚀 It's fantastic to know the enhancements are making a significant impact. If you have any more feedback or need assistance with anything else, feel free to reach out. Happy detecting!


github-actions bot commented May 6, 2024

👋 Hello there! We wanted to give you a friendly reminder that this issue has not had any recent activity and may be closed soon, but don't worry - you can always reopen it if needed. If you still have any questions or concerns, please feel free to let us know how we can help.

For additional resources and information, please see the links below:

Feel free to inform us of any other issues you discover or feature requests that come to mind in the future. Pull Requests (PRs) are also always welcomed!

Thank you for your contributions to YOLO 🚀 and Vision AI ⭐

@github-actions github-actions bot added the Stale label May 6, 2024
@github-actions github-actions bot closed this as not planned May 18, 2024