# OnnxSlim Python Package: 10–15% Faster ONNX Loads 🚀
OnnxSlim takes your ONNX models and pushes them even further, streamlining the architecture and trimming excess to deliver maximum speed without sacrificing performance.

![OnnxSlim vs Onnx](https://github.com/user-attachments/assets/083a4118-b359-4cc0-8686-8f9a5dcfa36d)




## Setup

| Project       | Downloads                                                                 |
|---------------|---------------------------------------------------------------------------|
| Ultralytics   | [![Ultralytics Downloads](https://static.pepy.tech/badge/ultralytics)](https://pepy.tech/projects/ultralytics) |
| OnnxSlim      | [![OnnxSlim Downloads](https://static.pepy.tech/badge/onnxslim)](https://pepy.tech/projects/onnxslim)         |




In [None]:
!pip install ultralytics  # OnnxSlim will be automatically installed during model export with Ultralytics Package

## Export the Ultralytics YOLO11 Model


### Without OnnxSlim: `simplify=False`.

Exporting the YOLO models is relatively simple, requiring a single CLI command and you're good to go.

In [5]:
!yolo export format=onnx model=yolo11n.pt simplify=False
!mv yolo11n.onnx yolo11n_simplify_false.onnx  # Rename exported onnx file for usage in next steps.

Ultralytics 8.3.160 🚀 Python-3.11.13 torch-2.6.0+cu124 CPU (Intel Xeon 2.20GHz)
YOLO11n summary (fused): 100 layers, 2,616,248 parameters, 0 gradients, 6.5 GFLOPs

[34m[1mPyTorch:[0m starting from 'yolo11n.pt' with input shape (1, 3, 640, 640) BCHW and output shape(s) (1, 84, 8400) (5.4 MB)
[31m[1mrequirements:[0m Ultralytics requirement ['onnx>=1.12.0,<1.18.0'] not found, attempting AutoUpdate...

[31m[1mrequirements:[0m AutoUpdate success ✅ 2.5s


[34m[1mONNX:[0m starting export with onnx 1.17.0 opset 19...
[34m[1mONNX:[0m export success ✅ 4.0s, saved as 'yolo11n.onnx' (10.2 MB)

Export complete (5.0s)
Results saved to [1m/content[0m
Predict:         yolo predict task=detect model=yolo11n.onnx imgsz=640  
Validate:        yolo val task=detect model=yolo11n.onnx imgsz=640 data=/usr/src/ultralytics/ultralytics/cfg/datasets/coco.yaml  
Visualize:       https://netron.app
💡 Learn more at https://docs.ultralytics.com/modes/export


### With OnnxSlim: `simplify=True`

You don't need any extra code to export YOLO11 with OnnxSlim. Simply set simplify=True In the export command below.

In [7]:
!yolo export format=onnx model=yolo11n.pt simplify=True
!mv yolo11n.onnx yolo11n_simplify_true.onnx  # Rename exported onnx file for usage in next steps.

Ultralytics 8.3.160 🚀 Python-3.11.13 torch-2.6.0+cu124 CPU (Intel Xeon 2.20GHz)
YOLO11n summary (fused): 100 layers, 2,616,248 parameters, 0 gradients, 6.5 GFLOPs

[34m[1mPyTorch:[0m starting from 'yolo11n.pt' with input shape (1, 3, 640, 640) BCHW and output shape(s) (1, 84, 8400) (5.4 MB)
[31m[1mrequirements:[0m Ultralytics requirements ['onnxslim>=0.1.56', 'onnxruntime'] not found, attempting AutoUpdate...

[31m[1mrequirements:[0m AutoUpdate success ✅ 1.3s


[34m[1mONNX:[0m starting export with onnx 1.17.0 opset 19...
[34m[1mONNX:[0m slimming with onnxslim 0.1.58...
[34m[1mONNX:[0m export success ✅ 6.8s, saved as 'yolo11n.onnx' (10.2 MB)

Export complete (7.4s)
Results saved to [1m/content[0m
Predict:         yolo predict task=detect model=yolo11n.onnx imgsz=640  
Validate:        yolo val task=detect model=yolo11n.onnx imgsz=640 data=/usr/src/ultralytics/ultralytics/cfg/datasets/coco.yaml  
Visualize:       https://netron.app
💡 Learn more at https://docs.ultraly

## Visualize OnnxSlim Modifications

It's time to compare the changes OnnxSlim made to the YOLO11n model during export. To visualize these modifications, you can use the mentioned command below.

In [8]:
!onnxslim yolo11n_simplify_false.onnx yolo11n_simplify_true.onnx

+--------------+-----------------------------+----------------------------+
|  Model Name  | yolo11n_simplify_false.onnx | yolo11n_simplify_true.onnx |
+--------------+-----------------------------+----------------------------+
|  Model Info  | Op Set: 19 / IR Version: 9  | Op Set: 19 / IR Version: 9 |
+--------------+-----------------------------+----------------------------+
|  IN: images  |  float32: (1, 3, 640, 640)  | float32: (1, 3, 640, 640)  |
| OUT: output0 |   float32: (1, 84, 8400)    |   float32: (1, 84, 8400)   |
+--------------+-----------------------------+----------------------------+
|     Add      |             17              |             [32m16[37m[0m             |
|    Concat    |             23              |             23             |
|   Constant   |             27              |             [32m0[37m[0m              |
|     Conv     |             88              |             88             |
|     Div      |              2              |             

## Model Load Time Benchmarks

You can use the provided code to measure the model load time. The best part? It calculates the average over five runs, giving you a more reliable metric than a single load measurement.

In [18]:
# pip install onnxruntime

import time

import onnxruntime as ort


def test_load_time(model_path, name, runs=5):
    times = []
    for i in range(runs):
        start = time.perf_counter()
        ort.InferenceSession(model_path, providers=["CPUExecutionProvider"])
        times.append((time.perf_counter() - start) * 1000)
        print(f"{name} load {i + 1}: {times[-1]:.1f}ms")

    avg = sum(times) / len(times)
    print(f"{name} average: {avg:.1f}ms\n")
    return avg


# Test both models
model1, model2 = ("yolo11n_simplify_false.onnx", "yolo11n_simplify_true.onnx")
print("Testing model load times (5 runs each):\n")

avg1, avg2 = (test_load_time(model1, "Original", 5), test_load_time(model2, "Simplified", 5))
diff = avg1 - avg2
percent = (diff / avg1) * 100

print("=" * 40)
print(f"Original:   {avg1:.1f}ms")
print(f"Simplified: {avg2:.1f}ms")
print(f"Difference: {diff:+.1f}ms ({percent:+.1f}%)")

if diff > 0:
    print(f"✅ Simplified is {abs(percent):.1f}% faster")
else:
    print(f"❌ Original is {abs(percent):.1f}% faster")

Testing model load times (5 runs each):

Original load 1: 68.7ms
Original load 2: 75.1ms
Original load 3: 74.7ms
Original load 4: 74.0ms
Original load 5: 78.6ms
Original average: 74.2ms

Simplified load 1: 64.6ms
Simplified load 2: 66.9ms
Simplified load 3: 64.4ms
Simplified load 4: 63.8ms
Simplified load 5: 69.8ms
Simplified average: 65.9ms

Original:   74.2ms
Simplified: 65.9ms
Difference: +8.3ms (+11.2%)
✅ Simplified is 11.2% faster


## Speed Comparison (Secondary Feature)

You can also check how it increases the FPS and per-frame processing, using the mentioned code below. It will load the model and perform inference on the first 10 frames of the video file.

😎 OnnxSlim isn't primarily intended to accelerate inference speed. Its main purpose is to streamline and clean up your ONNX model, making it more efficient in structure.

In [21]:
import cv2
import numpy as np


def test_model(model_path, name, video_path):
    print(f"{name}: ", end="")

    session = ort.InferenceSession(model_path, providers=["CPUExecutionProvider"])
    input_name = session.get_inputs()[0].name

    # Dummy test
    dummy_input = np.random.rand(1, 3, 640, 640).astype(np.float32)
    [session.run(None, {input_name: dummy_input}) for _ in range(10)]  # Warmup

    dummy_times = [
        (time.perf_counter(), session.run(None, {input_name: dummy_input}), time.perf_counter())[2]
        - (time.perf_counter(), session.run(None, {input_name: dummy_input}), time.perf_counter())[0]
        for _ in range(100)
    ]
    dummy_avg = sum(dummy_times) * 10  # Convert to ms

    # Video test
    cap = cv2.VideoCapture(video_path)
    video_times = []

    for _ in range(10):  # Test first 100 frames
        ret, frame = cap.read()
        if not ret:
            cap.set(cv2.CAP_PROP_POS_FRAMES, 0)
            ret, frame = cap.read()

        img = np.expand_dims((cv2.resize(frame, (640, 640)).astype(np.float32) / 255.0).transpose(2, 0, 1), 0)
        start = time.perf_counter()
        session.run(None, {input_name: img})
        video_times.append((time.perf_counter() - start) * 1000)

    cap.release()
    video_avg = sum(video_times) / len(video_times)

    print(
        f"Dummy: {dummy_avg:.1f}ms ({1000 / dummy_avg:.0f}fps) | Video: {video_avg:.1f}ms ({1000 / video_avg:.0f}fps)"
    )
    return dummy_avg, video_avg


def compare_models(model1, model2, video):
    print("ONNX COMPARISON")
    print("=" * 45)

    d1, v1 = test_model(model1, "Original", video)
    d2, v2 = test_model(model2, "Simplified", video)

    print("\nRESULTS:")
    print(f"Dummy: {'Simplified' if d2 < d1 else 'Original'} wins by {abs((d1 - d2) / d1 * 100):.1f}%")
    print(f"Video: {'Simplified' if v2 < v1 else 'Original'} wins by {abs((v1 - v2) / v1 * 100):.1f}%")
    print(f"WINNER: {'✅ SIMPLIFIED' if v2 < v1 else '✅ ORIGINAL'}")


if __name__ == "__main__":
    from ultralytics.utils.downloads import safe_download

    safe_download("https://github.com/ultralytics/assets/releases/download/v0.0.0/solutions_ci_demo.mp4")

    compare_models("yolo11n_simplify_false.onnx", "yolo11n_simplify_true.onnx", video="solutions_ci_demo.mp4")

ONNX COMPARISON
Original: Dummy: -0.0ms (-121767fps) | Video: 177.3ms (6fps)
Simplified: Dummy: -0.0ms (-120902fps) | Video: 163.7ms (6fps)

RESULTS:
Dummy: Simplified wins by 0.7%
Video: Simplified wins by 7.6%
WINNER: ✅ SIMPLIFIED


There is also a side by side comparison highlighted in our blog:

[Boost ONNX Load Speed by 10–15% with OnnxSlim's Python Package 🤩](https://muhammadrizwanmunawar.medium.com/boost-onnx-load-speed-by-10-15-with-onnxslims-python-package-d401eb8c2e69)