## Exporting PyTorch Models to ONNX and Running Inference

### Overview

ONNX (Open Neural Network Exchange) is an open-source format designed for the efficient exchange of machine learning models across different frameworks. It can be seen as a standardized language for neural networks, enabling models trained in one framework (such as PyTorch) to be used in another environment, simplifying deployment and optimizing performance. For more details, refer to the [ONNX documentation](https://onnx.ai/onnx/intro/concepts.html#onnx-concepts).

By converting models to the ONNX format, machine learning practitioners can avoid framework-specific dependencies at inference time and deploy the same model from languages such as C++, Python, and Java, or in the browser through WebAssembly.

### Advantages of Using ONNX

- **Framework Interoperability**: ONNX provides a unified representation of models that can be shared and deployed irrespective of the development environment.
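- **Optimized Inference**: Runtimes such as ONNX Runtime apply graph-level optimizations and hardware-specific execution providers, which can reduce inference latency compared to running the model in its training framework. The actual gain depends on the model and the hardware.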
- **Flexible Deployment**: ONNX models can be executed in production environments using ONNX-compatible runtimes, making it easier to integrate with applications across different platforms and languages.

### How It Works

Once converted, an ONNX model defines a computational graph consisting of various ONNX operators. During inference, the production environment requires an ONNX-compatible runtime (such as ONNX Runtime) to interpret and execute this graph. The graph and the trained weights are serialized together into a single Protocol Buffers file (`.onnx`); serialization makes the model portable, but it does not by itself compress it.

### PyTorch to ONNX Conversion and Inference

In this tutorial, we'll walk through the steps of exporting a PyTorch model to ONNX format and demonstrate how to perform inference using the ONNX Runtime in Python.


#### Example for a ConvNeXt-Tiny model


1. Basic imports

In [None]:
import torch
import torch.nn as nn
from torchvision import models

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

2. Define the same model architecture that was used for training

In [None]:
class ConvNext_tiny_pretrained(nn.Module): # simple implementation of the ConvNext_tiny
    def __init__(self, num_classes=2):
        super(ConvNext_tiny_pretrained, self).__init__()
        self.model = models.convnext.convnext_tiny(weights='DEFAULT') # importing the model with pretrained weights
        self.model.classifier[-1] = nn.Linear(768, num_classes) # changing the last layer to output num_classes

    def forward(self, x):
        return self.model(x) # forward pass 

3. Load the trained weights from disk into the model

In [None]:
model_path = 'path/to/your/model.pth' # path to the model
model = ConvNext_tiny_pretrained() # creating the model
model.to(device) # moving the model to the device
model.load_state_dict(torch.load(model_path, map_location=device)) # loading the model weights onto the selected device
model.eval() # setting the model to evaluation mode


4. Set the input size used for training

In [None]:
input_size = (3, 224, 224) # input size of the model
input_tensor = torch.rand(1, *input_size) # creating a random input tensor
input_tensor = input_tensor.to(device) # moving the input tensor to the device (same as the model)

5. Export the model

In [None]:
torch.onnx.export(model, input_tensor, 'ConvNext_tiny_pretrained.onnx', verbose=True) # exporting the model to ONNX format

### Running inference 

1. Imports

In [None]:
import onnx, onnxruntime
import onnx.numpy_helper
import numpy as np
import cv2
import glob

In [None]:
# onnx model path
model_onnx_path = 'ConvNext_tiny_pretrained.onnx'
# runtime initialization
ort_session = onnxruntime.InferenceSession(model_onnx_path, providers=['CPUExecutionProvider']) # use 'CUDAExecutionProvider' to run on an NVIDIA GPU
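
Hard-coding the provider works, but a session can also be given a priority-ordered list of providers. The sketch below shows one possible way to build that list from whatever is available; `pick_providers` is a hypothetical helper, not part of ONNX Runtime, and the available-provider lists are hard-coded so the example runs without ONNX Runtime installed.

```python
def pick_providers(available, preferred=("CUDAExecutionProvider", "CPUExecutionProvider")):
    """Return the preferred providers that are actually available, in priority order."""
    chosen = [name for name in preferred if name in available]
    if not chosen:
        raise RuntimeError(f"None of {preferred} found in {available}")
    return chosen

# In a real script the list would come from onnxruntime.get_available_providers().
cpu_only = pick_providers(["CPUExecutionProvider"])
with_gpu = pick_providers(
    ["TensorrtExecutionProvider", "CUDAExecutionProvider", "CPUExecutionProvider"]
)
print(cpu_only)
print(with_gpu)
```

With ONNX Runtime installed, the result could be passed as `providers=pick_providers(onnxruntime.get_available_providers())` when creating the `InferenceSession`. Note that the CUDA provider is only available in the GPU build of ONNX Runtime (the `onnxruntime-gpu` package).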



2. Prepare the images for inference:
* The model accepts 3x224x224 RGB images

In [None]:
def preprocess_images(path): # path to folder containing images
    images = []
    for image_path in sorted(glob.glob(path + '/*.jpg')): # read all jpg images in the folder, in a stable order
        image = cv2.imread(image_path) # read the image (returns None if the file cannot be decoded)
        if image is None:
            continue # skip unreadable files
        image = cv2.resize(image, (224, 224)) # resize the image
        image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB) # convert the image to RGB, because OpenCV reads images in BGR format
        # the exported model expects float32 input; scaling to [0, 1] is an assumption, so apply the same scaling/normalization used during training
        image = image.astype(np.float32) / 255.0
        image = np.transpose(image, (2, 0, 1)) # change the image format to CxHxW (channels first)
        image = np.expand_dims(image, axis=0) # add a batch dimension to the image (1, C, H, W), i.e. one image per batch
        images.append(image) # append the image to the list
    return np.array(images) # return the images as a numpy array of shape (N, 1, C, H, W)
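
Preprocessing at inference time has to match what the model saw during training. If training used the ImageNet channel statistics that torchvision's pretrained weights are built around (an assumption; check your own training pipeline), the normalization step could look like the sketch below. `normalize_rgb` is a hypothetical helper name, and the dummy image stands in for a resized RGB frame.

```python
import numpy as np

# ImageNet per-channel mean and standard deviation, in RGB order.
IMAGENET_MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
IMAGENET_STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)

def normalize_rgb(image_uint8):
    """Convert an HxWx3 uint8 RGB image into a 1x3xHxW float32 array."""
    image = image_uint8.astype(np.float32) / 255.0    # scale pixels to [0, 1]
    image = (image - IMAGENET_MEAN) / IMAGENET_STD    # broadcasts over the channel axis
    image = np.transpose(image, (2, 0, 1))            # HxWxC -> CxHxW
    return np.expand_dims(image, axis=0)              # add the batch dimension

# A uniform gray image stands in for a real, already resized RGB image.
dummy = np.full((224, 224, 3), 128, dtype=np.uint8)
batch = normalize_rgb(dummy)
print(batch.shape, batch.dtype)
```

The function takes the image after the resize and BGR-to-RGB conversion, so it would replace the transpose and batch-dimension steps rather than run in addition to them.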

3. Add a softmax function, since the exported model outputs raw logits rather than probabilities

In [None]:
def softmax(x):
    e_x = np.exp(x - np.max(x))
    return e_x / e_x.sum()
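
The function above normalizes over the whole array, which is correct for a single vector of logits. If the outputs of several images are stacked into an array of shape (N, num_classes), the normalization has to run per row instead. A minimal sketch of that variant follows; the name `softmax_batch` and the example logits are assumptions for illustration.

```python
import numpy as np

def softmax_batch(logits, axis=-1):
    """Numerically stable softmax computed independently along `axis`."""
    # Subtracting the per-row maximum avoids overflow in np.exp.
    shifted = logits - np.max(logits, axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=axis, keepdims=True)

# Two rows of logits; the first would overflow without the max subtraction.
logits = np.array([[1000.0, 1001.0], [0.0, 0.0]])
probs = softmax_batch(logits)
print(probs)
```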

4. Define a function to run inference

In [None]:
def run_inference(image, ort_session):
    input_name = ort_session.get_inputs()[0].name # get the input name of the model 
    output_name = ort_session.get_outputs()[0].name # get the output name of the model
    result = ort_session.run([output_name], {input_name: image}) # run the inference on the image 
    probabilities = softmax(result[0][0]) # apply softmax to the output to get the probabilities
    return probabilities

5. Define the output class names in the same order used during training

In [None]:
class_names = ['class_1', 'class_2', '...'] # class names

6. Run the inference

In [None]:
images = preprocess_images('path/to/your/images') # preprocess the images
for image in images:
    probabilities = run_inference(image, ort_session) # run the inference on the image
    class_idx = np.argmax(probabilities) # get the index of the class with the highest probability
    class_name = class_names[class_idx] # get the class name
    print(f'Class: {class_name}, Probability: {probabilities[class_idx]}') # print the class name and probability
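
When a model has more than two classes, it can be useful to report several candidates instead of only the top one. The sketch below ranks classes by probability; `top_k` is a hypothetical helper, and the probabilities and class names are made-up placeholders rather than real model output.

```python
import numpy as np

def top_k(probabilities, class_names, k=3):
    """Return the k most probable (class_name, probability) pairs, best first."""
    k = min(k, len(probabilities))
    order = np.argsort(probabilities)[::-1][:k]  # indices sorted by descending probability
    return [(class_names[i], float(probabilities[i])) for i in order]

# Placeholder values standing in for the output of run_inference.
example_probs = np.array([0.1, 0.7, 0.2])
example_names = ["cat", "dog", "bird"]
ranked = top_k(example_probs, example_names, k=2)
print(ranked)
```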
