Multiple problem with YOLOv8 (Export .engine) - (Slow predict with best.pt or last.pt) #9556

pnthai88 · 2024-04-04T16:05:13Z

Search before asking

I have searched the YOLOv8 issues and discussions and found no similar questions.

Question

I have multiple problems with YOLOv8:

- Very slow prediction with best.pt or last.pt produced by custom training

TRAIN (all images are same size)

from ultralytics import YOLO

train = True  # set to False to skip training and export

if __name__ == '__main__':
    if train:
        # Load the checkpoint to start from
        model = YOLO('yolo_models/yolo_models/last.pt')

        # Define the paths for the run outputs and weights
        weights_path = 'yolo_models/yolo_models/weights'
        save_dir = 'yolo_models/yolo_models/'

        # Train the model
        # Optimizer options: SGD, Adam, AdamW, NAdam, RAdam, RMSProp
        results = model.train(data='yolo_models/yolo_models/coco8.yaml', epochs=500, imgsz=640, batch=-1, device=[0], pretrained=False, model='yolov8n.pt', save_dir=save_dir, project=weights_path)

        # Save the trained model
        model.cuda()
        model.save('yolo_models/yolo_models/trained_model.pt')
        # Export with the default format first (TorchScript in the log below)
        model.cuda()
        model.export(opset=12, int8=True, device=0)#, imgsz=[1080, 1920])
        # Export to ONNX
        model.cuda()
        model.export(format='onnx', opset=12, int8=True, device=0)#, imgsz=[1080, 1920])
        # Export to a TensorRT engine (this is the step that fails)
        model.cuda()
        model.export(format='engine', opset=12, int8=True, device=0)#, imgsz=[1080, 1920])

PREDICT

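from ultralytics import YOLO

model_path = "yolo_models/yolo_models/race.pt"  # this is best.pt
# Initialize the YOLO object with the trained model
MODEL = YOLO(model_path, task='detect')
# resized_frame is defined elsewhere (not shown); stream=True returns a generator of results
results = MODEL.predict(resized_frame, stream=True, device=0)#, imgsz=[1080, 1920])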

RESULT

0: 1088x1088 (no detections), 51.9ms
Speed: 8.0ms preprocess, 51.9ms inference, 15.7ms postprocess per image at shape (1, 3, 1088, 1088)
WARNING ⚠️ imgsz=[1080, 1920] must be multiple of max stride 32, updating to [1088, 1920]
0: 1088x1088 (no detections), 66.1ms
Speed: 4.0ms preprocess, 66.1ms inference, 0.0ms postprocess per image at shape (1, 3, 1088, 1088)
WARNING ⚠️ imgsz=[1080, 1920] must be multiple of max stride 32, updating to [1088, 1920]
0: 1088x1088 (no detections), 50.4ms
Speed: 8.5ms preprocess, 50.4ms inference, 0.0ms postprocess per image at shape (1, 3, 1088, 1088)
WARNING ⚠️ imgsz=[1080, 1920] must be multiple of max stride 32, updating to [1088, 1920]
0: 1088x1088 (no detections), 64.1ms
Speed: 1.5ms preprocess, 64.1ms inference, 0.0ms postprocess per image at shape (1, 3, 1088, 1088)
WARNING ⚠️ imgsz=[1080, 1920] must be multiple of max stride 32, updating to [1088, 1920]
0: 1088x1088 (no detections), 71.1ms
Speed: 0.0ms preprocess, 71.1ms inference, 0.0ms postprocess per image at shape (1, 3, 1088, 1088)
WARNING ⚠️ imgsz=[1080, 1920] must be multiple of max stride 32, updating to [1088, 1920]
0: 1088x1088 (no detections), 77.8ms
Speed: 15.6ms preprocess, 77.8ms inference, 0.0ms postprocess per image at shape (1, 3, 1088, 1088)
WARNING ⚠️ imgsz=[1080, 1920] must be multiple of max stride 32, updating to [1088, 1920]
0: 1088x1088 (no detections), 76.8ms
Speed: 15.6ms preprocess, 76.8ms inference, 0.5ms postprocess per image at shape (1, 3, 1088, 1088)
WARNING ⚠️ imgsz=[1080, 1920] must be multiple of max stride 32, updating to [1088, 1920]
0: 1088x1088 (no detections), 68.9ms
Speed: 8.6ms preprocess, 68.9ms inference, 0.0ms postprocess per image at shape (1, 3, 1088, 1088)
WARNING ⚠️ imgsz=[1080, 1920] must be multiple of max stride 32, updating to [1088, 1920]
0: 1088x1088 (no detections), 52.9ms
Speed: 15.6ms preprocess, 52.9ms inference, 0.0ms postprocess per image at shape (1, 3, 1088, 1088)
WARNING ⚠️ imgsz=[1080, 1920] must be multiple of max stride 32, updating to [1088, 1920]
0: 1088x1088 (no detections), 52.5ms
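
The repeated warning in the log above says that each image dimension is adjusted to a multiple of the model's maximum stride (32 here). A minimal sketch of that rounding, assuming a plain round-up-to-multiple rule; `round_up_to_stride` is a hypothetical helper written for illustration, not the Ultralytics function:

```python
import math


def round_up_to_stride(imgsz, stride=32):
    """Round each dimension up to the nearest multiple of stride."""
    return [math.ceil(dim / stride) * stride for dim in imgsz]


# 1080 is not a multiple of 32, so it becomes 1088; 1920 already is one.
print(round_up_to_stride([1080, 1920]))  # [1088, 1920]
```

Passing a size that is already a multiple of 32, such as `imgsz=[1088, 1920]`, should avoid the warning.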

I tried the exported ONNX file, which is much faster but still not what I expected given the speed reported during training.
Prediction speed with best.onnx:

0: 1088x1920 (no detections), 15.6ms
Speed: 7.5ms preprocess, 15.6ms inference, 0.0ms postprocess per image at shape (1, 3, 1088, 1920)
WARNING ⚠️ imgsz=[1080, 1920] must be multiple of max stride 32, updating to [1088, 1920]
0: 1088x1920 (no detections), 15.6ms
Speed: 21.1ms preprocess, 15.6ms inference, 0.0ms postprocess per image at shape (1, 3, 1088, 1920)
WARNING ⚠️ imgsz=[1080, 1920] must be multiple of max stride 32, updating to [1088, 1920]
0: 1088x1920 (no detections), 22.1ms
Speed: 15.7ms preprocess, 22.1ms inference, 0.0ms postprocess per image at shape (1, 3, 1088, 1920)
WARNING ⚠️ imgsz=[1080, 1920] must be multiple of max stride 32, updating to [1088, 1920]
0: 1088x1920 (no detections), 22.1ms
Speed: 0.0ms preprocess, 22.1ms inference, 0.0ms postprocess per image at shape (1, 3, 1088, 1920)
WARNING ⚠️ imgsz=[1080, 1920] must be multiple of max stride 32, updating to [1088, 1920]
0: 1088x1920 (no detections), 25.1ms
Speed: 15.6ms preprocess, 25.1ms inference, 0.0ms postprocess per image at shape (1, 3, 1088, 1920)
WARNING ⚠️ imgsz=[1080, 1920] must be multiple of max stride 32, updating to [1088, 1920]
0: 1088x1920 (no detections), 15.6ms
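
Per-frame numbers like these fluctuate, so averaging them makes the .pt and ONNX runs easier to compare. A small sketch that parses predict-style `Speed:` lines and averages each stage; `average_speeds` is a hypothetical helper, and the regular expression assumes the exact line format shown in these logs (it does not match the training summary line, which has an extra `loss` field):

```python
import re
from statistics import mean

# Matches predict-style lines: "Speed: Xms preprocess, Yms inference, Zms postprocess ..."
SPEED_RE = re.compile(
    r"Speed: ([\d.]+)ms preprocess, ([\d.]+)ms inference, ([\d.]+)ms postprocess"
)


def average_speeds(log_text):
    """Return the mean preprocess, inference and postprocess times in ms."""
    rows = [tuple(map(float, m.groups())) for m in SPEED_RE.finditer(log_text)]
    if not rows:
        raise ValueError("no 'Speed:' lines found")
    pre, infer, post = zip(*rows)
    return {
        "preprocess": mean(pre),
        "inference": mean(infer),
        "postprocess": mean(post),
    }


# Three lines copied from the ONNX log above.
sample = """\
Speed: 7.5ms preprocess, 15.6ms inference, 0.0ms postprocess per image at shape (1, 3, 1088, 1920)
Speed: 21.1ms preprocess, 15.6ms inference, 0.0ms postprocess per image at shape (1, 3, 1088, 1920)
Speed: 15.7ms preprocess, 22.1ms inference, 0.0ms postprocess per image at shape (1, 3, 1088, 1920)
"""
print(average_speeds(sample))
```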

Speed reported by training:

Ultralytics YOLOv8.1.36 🚀 Python-3.11.0 torch-2.2.1+cu118 CUDA:0 (NVIDIA GeForce RTX 2080 Ti, 11264MiB)
Model summary (fused): 168 layers, 3005843 parameters, 0 gradients, 8.1 GFLOPs
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 3/3 [00:01<00:00,  2.43it/s]
                   all        243        233      0.871      0.897      0.897      0.744
Speed: 0.1ms preprocess, 0.7ms inference, 0.0ms loss, 1.3ms postprocess per image
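
One reason this figure is hard to compare with the predict logs: the training script above uses imgsz=640, while the ONNX predict log shows inputs of shape (1, 3, 1088, 1920). As a rough estimate only, assuming the cost of a convolutional model grows about linearly with pixel count and ignoring batching and other overheads, the input size ratio works out as:

```python
# Pixels per image at the training size (from the train call) and at the
# predict size (from the logged input shape).
train_pixels = 640 * 640
predict_pixels = 1088 * 1920

ratio = predict_pixels / train_pixels
print(f"{ratio:.1f}x more pixels per image")  # 5.1x more pixels per image
```

The training summary is also a per-image average over batched validation, whereas the predict logs time one frame at a time, so the two numbers are not measured the same way.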
Results saved to yolo_models\yolo_models\weights\train34

- Unable to export to .engine format to try it

Optimizer stripped from yolo_models\yolo_models\weights\train34\weights\last.pt, 6.2MB
Optimizer stripped from yolo_models\yolo_models\weights\train34\weights\best.pt, 6.2MB

Validating yolo_models\yolo_models\weights\train34\weights\best.pt...
Ultralytics YOLOv8.1.36 🚀 Python-3.11.0 torch-2.2.1+cu118 CUDA:0 (NVIDIA GeForce RTX 2080 Ti, 11264MiB)
Model summary (fused): 168 layers, 3005843 parameters, 0 gradients, 8.1 GFLOPs
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 3/3 [00:01<00:00,  2.43it/s]
                   all        243        233      0.871      0.897      0.897      0.744
Speed: 0.1ms preprocess, 0.7ms inference, 0.0ms loss, 1.3ms postprocess per image
Results saved to yolo_models\yolo_models\weights\train34
Ultralytics YOLOv8.1.36 🚀 Python-3.11.0 torch-2.2.1+cu118 CUDA:0 (NVIDIA GeForce RTX 2080 Ti, 11264MiB)
WARNING ⚠️ imgsz=[1080, 1920] must be multiple of max stride 32, updating to [1088, 1920]
Model summary (fused): 168 layers, 3005843 parameters, 0 gradients, 8.1 GFLOPs

PyTorch: starting from 'yolo_models\yolo_models\weights\train34\weights\best.pt' with input shape (1, 3, 1088, 1920) BCHW and output shape(s) (1, 5, 42840) (5.9 MB)

TorchScript: starting export with torch 2.2.1+cu118...
TorchScript: export success ✅ 1.0s, saved as 'yolo_models\yolo_models\weights\train34\weights\best.torchscript' (12.3 MB)

Export complete (1.3s)
Results saved to E:\livetracker\yolo_models\yolo_models\weights\train34\weights
Predict:         yolo predict task=detect model=yolo_models\yolo_models\weights\train34\weights\best.torchscript imgsz=1088,1920 int8
Validate:        yolo val task=detect model=yolo_models\yolo_models\weights\train34\weights\best.torchscript imgsz=1088,1920 data=yolo_models/yolo_models/coco8.yaml int8 WARNING ⚠️ non-PyTorch val requires square images, 'imgsz=[1088, 1920]' will not work. Use export 'imgsz=1920' if val is required.
Visualize:       https://netron.app
Ultralytics YOLOv8.1.36 🚀 Python-3.11.0 torch-2.2.1+cu118 CUDA:0 (NVIDIA GeForce RTX 2080 Ti, 11264MiB)
WARNING ⚠️ imgsz=[1080, 1920] must be multiple of max stride 32, updating to [1088, 1920]
Model summary (fused): 168 layers, 3005843 parameters, 0 gradients, 8.1 GFLOPs

PyTorch: starting from 'yolo_models\yolo_models\weights\train34\weights\best.pt' with input shape (1, 3, 1088, 1920) BCHW and output shape(s) (1, 5, 42840) (5.9 MB)

ONNX: starting export with onnx 1.16.0 opset 12...
ONNX: export success ✅ 0.8s, saved as 'yolo_models\yolo_models\weights\train34\weights\best.onnx' (12.3 MB)

Export complete (1.0s)
Results saved to E:\livetracker\yolo_models\yolo_models\weights\train34\weights
Predict:         yolo predict task=detect model=yolo_models\yolo_models\weights\train34\weights\best.onnx imgsz=1088,1920 int8
Validate:        yolo val task=detect model=yolo_models\yolo_models\weights\train34\weights\best.onnx imgsz=1088,1920 data=yolo_models/yolo_models/coco8.yaml int8 WARNING ⚠️ non-PyTorch val requires square images, 'imgsz=[1088, 1920]' will not work. Use export 'imgsz=1920' if val is required.
Visualize:       https://netron.app
Ultralytics YOLOv8.1.36 🚀 Python-3.11.0 torch-2.2.1+cu118 CUDA:0 (NVIDIA GeForce RTX 2080 Ti, 11264MiB)
WARNING ⚠️ imgsz=[1080, 1920] must be multiple of max stride 32, updating to [1088, 1920]
Model summary (fused): 168 layers, 3005843 parameters, 0 gradients, 8.1 GFLOPs

PyTorch: starting from 'yolo_models\yolo_models\weights\train34\weights\best.pt' with input shape (1, 3, 1088, 1920) BCHW and output shape(s) (1, 5, 42840) (5.9 MB)

ONNX: starting export with onnx 1.16.0 opset 12...
ONNX: export success ✅ 0.7s, saved as 'yolo_models\yolo_models\weights\train34\weights\best.onnx' (12.3 MB)

TensorRT: starting export with TensorRT 10.0.0b6...
[04/04/2024-22:50:27] [TRT] [I] [MemUsageChange] Init CUDA: CPU +1, GPU +0, now: CPU 29515, GPU 1844 (MiB)
[04/04/2024-22:50:30] [TRT] [I] [MemUsageChange] Init builder kernel library: CPU +1398, GPU +178, now: CPU 31161, GPU 2022 (MiB)
TensorRT: export failure ❌ 4.3s: 'tensorrt_bindings.tensorrt.IBuilderConfig' object has no attribute 'max_workspace_size'
Traceback (most recent call last):
  File "E:\livetracker\train.py", line 52, in <module>

  File "C:\Users\thaip\AppData\Local\Programs\Python\Python311\Lib\site-packages\ultralytics\engine\model.py", line 590, in export
    return Exporter(overrides=args, _callbacks=self.callbacks)(model=self.model)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\thaip\AppData\Local\Programs\Python\Python311\Lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\thaip\AppData\Local\Programs\Python\Python311\Lib\site-packages\ultralytics\engine\exporter.py", line 285, in __call__
    f[1], _ = self.export_engine()
              ^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\thaip\AppData\Local\Programs\Python\Python311\Lib\site-packages\ultralytics\engine\exporter.py", line 138, in outer_func
    raise e
  File "C:\Users\thaip\AppData\Local\Programs\Python\Python311\Lib\site-packages\ultralytics\engine\exporter.py", line 133, in outer_func
    f, model = inner_func(*args, **kwargs)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\thaip\AppData\Local\Programs\Python\Python311\Lib\site-packages\ultralytics\engine\exporter.py", line 678, in export_engine
    config.max_workspace_size = self.args.workspace * 1 << 30
    ^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'tensorrt_bindings.tensorrt.IBuilderConfig' object has no attribute 'max_workspace_size'
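
The attribute named in the traceback, `max_workspace_size`, was deprecated during the TensorRT 8.x series in favor of `IBuilderConfig.set_memory_pool_limit` and is absent from the TensorRT 10 build shown in the log, which matches the error. A minimal version-tolerant sketch follows. `set_workspace_limit` is a hypothetical helper and is not presented as the change made in Ultralytics; with a real TensorRT install, `workspace_pool` would be `tensorrt.MemoryPoolType.WORKSPACE`. The stand-in class exists only so the example runs without TensorRT.

```python
def set_workspace_limit(config, workspace_gib, workspace_pool=None):
    """Set the builder workspace limit on either old or new TensorRT configs.

    config: a tensorrt.IBuilderConfig, or any object with the same attributes.
    workspace_pool: tensorrt.MemoryPoolType.WORKSPACE when using the newer API.
    """
    nbytes = int(workspace_gib * (1 << 30))  # GiB -> bytes
    if hasattr(config, "set_memory_pool_limit"):
        # Newer API: limits are set per memory pool.
        config.set_memory_pool_limit(workspace_pool, nbytes)
    else:
        # Legacy attribute, the one missing in the traceback above.
        config.max_workspace_size = nbytes
    return nbytes


class FakeNewConfig:
    """Stand-in for a config that only offers set_memory_pool_limit."""

    def __init__(self):
        self.limits = {}

    def set_memory_pool_limit(self, pool, nbytes):
        self.limits[pool] = nbytes


cfg = FakeNewConfig()
set_workspace_limit(cfg, 4, workspace_pool="WORKSPACE")
print(cfg.limits)  # {'WORKSPACE': 4294967296}
```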

Additional

Thanks for the TensorRT 10 support. I am now able to export and work with the .engine file at amazing speed:

0: 640x640 1 MUC_TIEU, 0.0ms
Speed: 0.0ms preprocess, 0.0ms inference, 0.0ms postprocess per image at shape (1, 3, 640, 640)

0: 640x640 1 MUC_TIEU, 0.0ms
Speed: 0.0ms preprocess, 0.0ms inference, 0.0ms postprocess per image at shape (1, 3, 640, 640)

0: 640x640 1 MUC_TIEU, 0.0ms
Speed: 0.0ms preprocess, 0.0ms inference, 0.0ms postprocess per image at shape (1, 3, 640, 640)

0: 640x640 1 MUC_TIEU, 0.0ms
Speed: 0.0ms preprocess, 0.0ms inference, 0.0ms postprocess per image at shape (1, 3, 640, 640)

0: 640x640 1 MUC_TIEU, 0.0ms
Speed: 0.0ms preprocess, 0.0ms inference, 0.0ms postprocess per image at shape (1, 3, 640, 640)

0: 640x640 1 MUC_TIEU, 0.0ms
Speed: 0.0ms preprocess, 0.0ms inference, 0.0ms postprocess per image at shape (1, 3, 640, 640)

0: 640x640 1 MUC_TIEU, 0.0ms
Speed: 0.0ms preprocess, 0.0ms inference, 0.0ms postprocess per image at shape (1, 3, 640, 640)

0: 640x640 1 MUC_TIEU, 0.0ms
Speed: 0.0ms preprocess, 0.0ms inference, 0.0ms postprocess per image at shape (1, 3, 640, 640)

0: 640x640 1 MUC_TIEU, 0.0ms
Speed: 0.0ms preprocess, 0.0ms inference, 0.0ms postprocess per image at shape (1, 3, 640, 640)

0: 640x640 1 MUC_TIEU, 0.0ms
Speed: 0.0ms preprocess, 0.0ms inference, 0.0ms postprocess per image at shape (1, 3, 640, 640)

0: 640x640 1 MUC_TIEU, 0.0ms
Speed: 0.0ms preprocess, 0.0ms inference, 0.0ms postprocess per image at shape (1, 3, 640, 640)

0: 640x640 1 MUC_TIEU, 0.0ms
Speed: 0.0ms preprocess, 0.0ms inference, 0.0ms postprocess per image at shape (1, 3, 640, 640)

0: 640x640 1 MUC_TIEU, 0.0ms
Speed: 0.0ms preprocess, 0.0ms inference, 0.0ms postprocess per image at shape (1, 3, 640, 640)

From pull request: #9516

github-actions · 2024-04-04T16:05:40Z

👋 Hello @pnthai88, thank you for your interest in Ultralytics YOLOv8 🚀! We recommend a visit to the Docs for new users where you can find many Python and CLI usage examples and where many of the most common questions may already be answered.

If this is a 🐛 Bug Report, please provide a minimum reproducible example to help us debug it.

If this is a custom training ❓ Question, please provide as much information as possible, including dataset image examples and training logs, and verify you are following our Tips for Best Training Results.

Join the vibrant Ultralytics Discord 🎧 community for real-time conversations and collaborations. This platform offers a perfect space to inquire, showcase your work, and connect with fellow Ultralytics users.

Install

Pip install the ultralytics package including all requirements in a Python>=3.8 environment with PyTorch>=1.8.

pip install ultralytics

Environments

YOLOv8 may be run in any of the following up-to-date verified environments (with all dependencies including CUDA/CUDNN, Python and PyTorch preinstalled):

Notebooks with free GPU:
Google Cloud Deep Learning VM. See GCP Quickstart Guide
Amazon Deep Learning AMI. See AWS Quickstart Guide
Docker Image. See Docker Quickstart Guide

Status

If this badge is green, all Ultralytics CI tests are currently passing. CI tests verify correct operation of all YOLOv8 Modes and Tasks on macOS, Windows, and Ubuntu every 24 hours and on every commit.

pnthai88 · 2024-04-04T17:08:45Z

Thanks for the TensorRT 10 support. I am now able to export and work with the .engine file at amazing speed:

0: 640x640 1 MUC_TIEU, 0.0ms
Speed: 0.0ms preprocess, 0.0ms inference, 0.0ms postprocess per image at shape (1, 3, 640, 640)

From pull request: #9516

glenn-jocher · 2024-04-04T19:37:54Z

@pnthai88 thrilled to hear you're loving the TensorRT 10 support and seeing such incredible speed with the .engine files! 🚀 It's fantastic to know the enhancements are making a significant impact. If you have any more feedback or need assistance with anything else, feel free to reach out. Happy detecting!

github-actions · 2024-05-06T00:16:04Z

👋 Hello there! We wanted to give you a friendly reminder that this issue has not had any recent activity and may be closed soon, but don't worry - you can always reopen it if needed. If you still have any questions or concerns, please feel free to let us know how we can help.

For additional resources and information, please see the links below:

Docs: https://docs.ultralytics.com
HUB: https://hub.ultralytics.com
Community: https://community.ultralytics.com

Feel free to inform us of any other issues you discover or feature requests that come to mind in the future. Pull Requests (PRs) are also always welcomed!

Thank you for your contributions to YOLO 🚀 and Vision AI ⭐

pnthai88 added the "question" label (Further information is requested) on Apr 4, 2024

github-actions bot added the "Stale" label on May 6, 2024

github-actions bot closed this as not planned (stale) on May 18, 2024
