<a href="https://colab.research.google.com/github/Lednik7/CLIP-ONNX/blob/main/examples/clip_onnx_example.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Restart the Colab session after installation
If something does not work after the installation cell, restart the session and run the notebook again.

In [1]:
%%capture
!pip install git+https://github.com/Lednik7/CLIP-ONNX.git
!pip install git+https://github.com/openai/CLIP.git
!pip install onnxruntime-gpu

In [2]:
%%capture
!wget -c -O CLIP.png https://github.com/openai/CLIP/blob/main/CLIP.png?raw=true

In [1]:
!nvidia-smi

Thu Oct 10 09:19:11 2024       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 545.36                 Driver Version: 546.33       CUDA Version: 12.3     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|   0  NVIDIA GeForce RTX 3070 ...    On  | 00000000:01:00.0  On |                  N/A |
| N/A   44C    P8              18W / 115W |   1432MiB /  8192MiB |     27%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                                                    

In [8]:
import onnxruntime
print(onnxruntime.get_device())

CPU
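
The `CPU` result above suggests that the installed `onnxruntime` build does not expose a GPU device in this session. As a minimal sketch (not part of CLIP-ONNX), the available execution providers can be listed with `onnxruntime.get_available_providers()` and filtered before a session is started. The helper name `pick_providers` and the fallback list used when `onnxruntime` is not installed are assumptions made for illustration.

```python
def pick_providers(available,
                   preferred=("CUDAExecutionProvider", "CPUExecutionProvider")):
    """Return the preferred providers that are actually available, in order.

    `pick_providers` is a hypothetical helper, not a CLIP-ONNX or
    onnxruntime function.
    """
    chosen = [p for p in preferred if p in available]
    if not chosen:
        raise RuntimeError(
            f"none of {list(preferred)} found in {list(available)}"
        )
    return chosen


try:
    import onnxruntime
    # Real onnxruntime call: lists the providers this build can use.
    available = onnxruntime.get_available_providers()
except ImportError:
    # Assumption for illustration only: pretend a CPU-only build.
    available = ["CPUExecutionProvider"]

providers = pick_providers(available)
print(providers)
```

The resulting list could then be passed as `providers=` to `onnx_model.start_sessions(...)`, so that a CUDA session is only requested when the CUDA provider is actually present.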


## CPU inference mode

### Torch CLIP

In [12]:
import clip
from PIL import Image
import numpy as np

# keep the model on CPU: the ONNX conversion does not work with a CUDA model
model, preprocess = clip.load("ViT-B/32", device="cpu", jit=False)

# batch first
image = preprocess(Image.open("CLIP.png")).unsqueeze(0).cpu() # [1, 3, 224, 224]
image_onnx = image.detach().cpu().numpy().astype(np.float32)

# batch first
text = clip.tokenize(["a diagram", "a dog", "a cat"]).cpu() # [3, 77]
text_onnx = text.detach().cpu().numpy().astype(np.int32)

In [4]:
%timeit model(image, text)

162 ms ± 13 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


### CLIP-ONNX

In [10]:
from clip_onnx import clip_onnx, attention
clip.model.ResidualAttentionBlock.attention = attention

onnx_model = clip_onnx(model)
onnx_model.convert2onnx(image, text, verbose=True)
# ['TensorrtExecutionProvider', 'CUDAExecutionProvider', 'CPUExecutionProvider']
onnx_model.start_sessions(providers=["CPUExecutionProvider"]) # cpu mode

[CLIP ONNX] Start convert visual model
verbose: False, log level: Level.ERROR

[CLIP ONNX] Start check visual model
[CLIP ONNX] Start convert textual model
verbose: False, log level: Level.ERROR

[CLIP ONNX] Start check textual model
[CLIP ONNX] Models converts successfully


In [13]:
%timeit onnx_model(image_onnx, text_onnx)

216 ms ± 9.78 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


## GPU inference mode
Select a GPU runtime to continue:

Click Runtime -> Change runtime type, set "Hardware accelerator" to GPU, and save. The session may then connect to a GPU.

### CLIP-ONNX

In [14]:
onnx_model.start_sessions(providers=["CUDAExecutionProvider"]) # GPU mode

  if not isinstance(provider_options, collections.abc.Sequence):


In [15]:
onnx_model.visual_session.get_providers() # optional: check which providers the session actually uses

['CPUExecutionProvider']

In [16]:
%timeit onnx_model(image_onnx, text_onnx)

230 ms ± 17.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


### Torch CLIP

In [1]:
import clip
from PIL import Image

device = "cuda"
# PyTorch baseline: load the model directly on the GPU
model, preprocess = clip.load("ViT-B/32", device=device, jit=False)
# batch first
image = preprocess(Image.open("CLIP.png")).unsqueeze(0).to(device) # [1, 3, 224, 224]
text = clip.tokenize(["a diagram", "a dog", "a cat"]).to(device) # [3, 77]

In [11]:
%timeit model(image, text)

10 loops, best of 5: 72.2 ms per loop
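
The `%timeit` magic used throughout is only available in IPython. As a rough sketch for plain Python scripts, a comparable mean and standard deviation can be collected with the standard library. The helper `time_call` and its `sync` parameter are hypothetical names introduced here. CUDA kernels launch asynchronously, so a GPU measurement generally needs an explicit synchronization point around the timed call.

```python
import statistics
import time


def time_call(fn, repeats=7, sync=None):
    """Time fn() `repeats` times and return (mean_ms, stdev_ms).

    `sync` is an optional zero-argument callable run before and after each
    timed call, e.g. a device synchronization function (assumption: the
    caller supplies one when timing GPU work).
    """
    if repeats < 2:
        raise ValueError("repeats must be at least 2 to compute a std. dev.")
    samples = []
    for _ in range(repeats):
        if sync is not None:
            sync()  # make sure earlier queued work is finished
        start = time.perf_counter()
        fn()
        if sync is not None:
            sync()  # wait for the timed work itself to finish
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.mean(samples), statistics.stdev(samples)


# Stand-in workload; in the notebook this would be
# lambda: model(image, text) or lambda: onnx_model(image_onnx, text_onnx).
mean_ms, std_ms = time_call(lambda: sum(range(10_000)))
print(f"{mean_ms:.3f} ms ± {std_ms:.3f} ms per loop")
```

For the CUDA PyTorch model, passing `sync=torch.cuda.synchronize` would be one way to include the kernel execution time in each sample.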
