# ONNX Runtime

In this notebook, we show how to use ONNX Runtime to speed up inference for a model trained in PyTorch. We also use ONNX Runtime's quantization tools to quantize the model to int8 precision, which reduces its memory footprint and can further improve performance. The steps are: train a simple model on the MNIST dataset, convert it to ONNX format, run inference with ONNX Runtime, and finally quantize the model to int8 precision.

## Set Up ONNX Runtime

First, install torch, torchvision, onnx, and onnxruntime. Then import the required modules.

In [1]:
%pip install torch torchvision
%pip install onnx onnxruntime

Collecting onnx
  Downloading onnx-1.17.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (16 kB)
Collecting onnxruntime
  Downloading onnxruntime-1.20.1-cp310-cp310-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl.metadata (4.5 kB)
Collecting coloredlogs (from onnxruntime)
  Downloading coloredlogs-15.0.1-py2.py3-none-any.whl.metadata (12 kB)
Collecting humanfriendly>=9.1 (from coloredlogs->onnxruntime)
  Downloading humanfriendly-10.0-py2.py3-none-any.whl.metadata (9.2 kB)
Downloading onnx-1.17.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (16.0 MB)
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m16.0/16.0 MB[0m [31m23.0 MB/s[0m eta [36m0:00:00[0m
[?25hDownloading onnxruntime-1.20.1-cp310-cp310-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl (13.3 MB)
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m13.3/13.3 MB[0m [31m13.8 MB/s[0m eta [36m0:00:00[0m
[?25hDownloading coloredlogs-15.0.1-py2.py3-none-any.whl (46 k

In [2]:
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torchvision import datasets, transforms
import torch.quantization
import pathlib
import numpy as np
import torch.onnx
import onnx
import onnxruntime
from onnxruntime.quantization import quantize_dynamic, quantize_static, CalibrationDataReader, QuantType

## Train the Model

We will train a simple CNN model on the MNIST dataset.
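
The network defined in the next cell flattens the pooled feature map before the linear layer, which is why `nn.Linear` takes `12 * 13 * 13` inputs. The sketch below checks that arithmetic with the standard output-size formula for convolution and pooling layers. The helper name `conv_out` is hypothetical and introduced only for illustration.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1

side = conv_out(28, kernel=3)               # 3x3 convolution, no padding: 28 -> 26
side = conv_out(side, kernel=2, stride=2)   # 2x2 max pooling, stride 2: 26 -> 13
flat_features = 12 * side * side            # 12 output channels after conv1

print(side, flat_features)
```

If the kernel size, padding, or number of channels changes, the input size of the `fc` layer has to change with it.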

In [3]:
transform=transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize((0.1307,), (0.3081,))
        ])

train_dataset = datasets.MNIST('./data', train=True, download=True,transform=transform)
test_dataset = datasets.MNIST('./data', train=False,transform=transform)

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(in_channels=1, out_channels=12, kernel_size=3)
        self.pool = nn.MaxPool2d(kernel_size=2, stride=2)
        self.fc = nn.Linear(12 * 13 * 13, 10)

    def forward(self, x):
        x = x.view(-1, 1, 28, 28)
        x = F.relu(self.conv1(x))
        x = self.pool(x)
        x = x.view(x.size(0), -1)
        x = self.fc(x)
        output = F.log_softmax(x, dim=1)
        return output


train_loader = torch.utils.data.DataLoader(train_dataset, 32)
test_loader = torch.utils.data.DataLoader(test_dataset, 32)

device = "cpu"

epochs = 1

model = Net().to(device)
optimizer = optim.Adam(model.parameters())

model.train()

for epoch in range(1, epochs+1):
    for batch_idx, (data, target) in enumerate(train_loader):
        data, target = data.to(device), target.to(device)
        optimizer.zero_grad()
        output = model(data)
        loss = F.nll_loss(output, target)
        loss.backward()
        optimizer.step()
        print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
            epoch, batch_idx * len(data), len(train_loader.dataset),
            100. * batch_idx / len(train_loader), loss.item()))

MODEL_DIR = pathlib.Path("./onnx_models")
MODEL_DIR.mkdir(exist_ok=True)
torch.save(model.state_dict(), MODEL_DIR / "original_model.p")

Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz
Failed to download (trying next):
HTTP Error 403: Forbidden

Downloading https://ossci-datasets.s3.amazonaws.com/mnist/train-images-idx3-ubyte.gz
Downloading https://ossci-datasets.s3.amazonaws.com/mnist/train-images-idx3-ubyte.gz to ./data/MNIST/raw/train-images-idx3-ubyte.gz
100%|██████████| 9.91M/9.91M [00:00<00:00, 57.1MB/s]
Extracting ./data/MNIST/raw/train-images-idx3-ubyte.gz to ./data/MNIST/raw

Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz
Failed to download (trying next):
HTTP Error 403: Forbidden

Downloading https://ossci-datasets.s3.amazonaws.com/mnist/train-labels-idx1-ubyte.gz
Downloading https://ossci-datasets.s3.amazonaws.com/mnist/train-labels-idx1-ubyte.gz to ./data/MNIST/raw/train-labels-idx1-ubyte.gz
100%|██████████| 28.9k/28.9k [00:00<00:00, 2.04MB/s]
Extracting ./data/MNIST/raw/train-labels-idx1-ubyte.gz to ./data/MNIST/raw

Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz
Failed to download (trying next):
HTTP Error 403: Forbidden

Downloading https://ossci-datasets.s3.amazonaws.com/mnist/t10k-images-idx3-ubyte.gz
Downloading https://ossci-datasets.s3.amazonaws.com/mnist/t10k-images-idx3-ubyte.gz to ./data/MNIST/raw/t10k-images-idx3-ubyte.gz
100%|██████████| 1.65M/1.65M [00:00<00:00, 15.0MB/s]
Extracting ./data/MNIST/raw/t10k-images-idx3-ubyte.gz to ./data/MNIST/raw

Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz
Failed to download (trying next):
HTTP Error 403: Forbidden

Downloading https://ossci-datasets.s3.amazonaws.com/mnist/t10k-labels-idx1-ubyte.gz
Downloading https://ossci-datasets.s3.amazonaws.com/mnist/t10k-labels-idx1-ubyte.gz to ./data/MNIST/raw/t10k-labels-idx1-ubyte.gz
100%|██████████| 4.54k/4.54k [00:00<00:00, 7.79MB/s]
Extracting ./data/MNIST/raw/t10k-labels-idx1-ubyte.gz to ./data/MNIST/raw

## Export to ONNX

After training, export the model to ONNX format.

In [4]:
x, _ = next(iter(train_loader))
torch.onnx.export(model,
                  x,
                  MODEL_DIR / "mnist_model.onnx",
                  export_params=True,
                  opset_version=10,
                  do_constant_folding=True,
                  input_names = ['input'],
                  output_names = ['output'],
                  dynamic_axes={'input' : {0 : 'batch_size'},
                                'output' : {0 : 'batch_size'}})

## Run Inference and Check Similarity

Next, validate the converted model by running inference and comparing its output with that of the PyTorch model.

In [5]:
torch_out = model(x)

onnx_model = onnx.load(MODEL_DIR / "mnist_model.onnx")
onnx.checker.check_model(onnx_model)

ort_session = onnxruntime.InferenceSession(MODEL_DIR / "mnist_model.onnx", providers=["CPUExecutionProvider"])

def to_numpy(tensor):
    return tensor.detach().cpu().numpy() if tensor.requires_grad else tensor.cpu().numpy()

# compute ONNX Runtime output prediction
ort_inputs = {ort_session.get_inputs()[0].name: to_numpy(x)}
ort_outs = ort_session.run(None, ort_inputs)

# compare ONNX Runtime and PyTorch results
np.testing.assert_allclose(to_numpy(torch_out), ort_outs[0], rtol=1e-03, atol=1e-05)

print("Exported model has been tested with ONNXRuntime, and the result looks good!")

Exported model has been tested with ONNXRuntime, and the result looks good!
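
The comparison in the cell above relies on `np.testing.assert_allclose`, which passes when every element satisfies `|actual - desired| <= atol + rtol * |desired|`. The following sketch uses made-up values (not outputs of the model) to show how the tolerances `rtol=1e-03` and `atol=1e-05` behave.

```python
import numpy as np

rtol, atol = 1e-3, 1e-5

# Made-up reference values and slightly perturbed copies.
desired = np.array([-2.0, -0.5, -10.0])
actual = desired + np.array([1e-3, 4e-4, 5e-3])

# Per-element tolerance and the element-wise check that assert_allclose performs.
allowed = atol + rtol * np.abs(desired)
within = np.abs(actual - desired) <= allowed
np.testing.assert_allclose(actual, desired, rtol=rtol, atol=atol)  # passes

# A larger deviation on a small value exceeds the tolerance.
too_far = desired + np.array([0.0, 1e-2, 0.0])
try:
    np.testing.assert_allclose(too_far, desired, rtol=rtol, atol=atol)
    failed = False
except AssertionError:
    failed = True

print(within, failed)
```

Because the tolerance scales with the magnitude of the reference value, outputs close to zero are effectively held to the absolute tolerance `atol`.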


## Quantization

### Dynamic Quantization

Dynamic quantization computes the quantization parameters (scale and zero point) for activations on the fly at inference time. These extra calculations increase the cost of inference, but they usually yield higher accuracy than static quantization.
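
To make "scale" and "zero point" concrete, here is a minimal NumPy sketch of asymmetric 8-bit quantization computed from the values of a single tensor, which is the kind of per-input calculation a dynamic scheme performs. It is a generic illustration under simplifying assumptions, not ONNX Runtime's internal implementation; the function names `quantize_uint8` and `dequantize` are hypothetical.

```python
import numpy as np

def quantize_uint8(x):
    """Asymmetric (affine) quantization of a float array to uint8."""
    # Extend the range to include zero so that 0.0 is represented exactly.
    x_min = min(float(x.min()), 0.0)
    x_max = max(float(x.max()), 0.0)
    scale = (x_max - x_min) / 255.0 or 1.0  # fall back to 1.0 for all-zero input
    zero_point = int(round(-x_min / scale))
    q = np.clip(np.round(x / scale) + zero_point, 0, 255).astype(np.uint8)
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    """Map uint8 codes back to approximate float values."""
    return (q.astype(np.float32) - zero_point) * scale

activations = np.array([-1.0, 0.0, 0.5, 3.0], dtype=np.float32)
q, scale, zero_point = quantize_uint8(activations)
recovered = dequantize(q, scale, zero_point)
max_error = float(np.abs(recovered - activations).max())

print(q, scale, zero_point, max_error)
```

The round-trip error stays within one quantization step (`scale`), which is why a tensor with a narrow value range loses less precision than one with a few large outliers.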

In [6]:
!python -m onnxruntime.quantization.preprocess --input {MODEL_DIR / "mnist_model.onnx"} --output {MODEL_DIR / "mnist_model_processed.onnx"}

In [7]:
model_fp32 = MODEL_DIR / "mnist_model_processed.onnx"
model_quant = MODEL_DIR / "mnist_model_quant.onnx"
quantized_model = quantize_dynamic(model_fp32, model_quant, weight_type=QuantType.QUInt8)



### Compare Size

Let's compare the sizes of the original model and the quantized model.

In [8]:
%ls -lh {MODEL_DIR}

total 280K
-rw-r--r-- 1 root root 82K Jan  3 15:14 mnist_model.onnx
-rw-r--r-- 1 root root 82K Jan  3 15:14 mnist_model_processed.onnx
-rw-r--r-- 1 root root 26K Jan  3 15:14 mnist_model_quant.onnx
-rw-r--r-- 1 root root 82K Jan  3 15:14 original_model.p
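
The listing shows about 82K for the float32 files and 26K for the quantized model. A back-of-the-envelope estimate from the parameter count of the network defined earlier is roughly consistent with those numbers. The sketch below only does the arithmetic; attributing the remaining bytes to graph metadata and to tensors kept in higher precision is an assumption that is not verified here.

```python
# Parameter counts for the network defined earlier (weights plus biases).
conv_params = 12 * 1 * 3 * 3 + 12       # conv1: 12 filters of shape 1x3x3, plus 12 biases
fc_params = (12 * 13 * 13) * 10 + 10    # fc: 2028 inputs x 10 outputs, plus 10 biases
total_params = conv_params + fc_params

fp32_kib = total_params * 4 / 1024      # 4 bytes per float32 value
int8_kib = total_params * 1 / 1024      # 1 byte per 8-bit value

print(total_params, round(fp32_kib, 1), round(int8_kib, 1))
```

With 20,410 parameters, the raw weights take roughly 79.7 KiB in float32 and 19.9 KiB in 8-bit form, so storing weights in 8 bits accounts for most of the size reduction.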


### Compare Accuracy

Let's compare the accuracy of the converted ONNX model and the quantized model. The accuracy of the quantized model should be close to that of the original model.

In [9]:
def test_onnx(model_name, data_loader):
    onnx_model = onnx.load(model_name)
    onnx.checker.check_model(onnx_model)
    ort_session = onnxruntime.InferenceSession(model_name)
    test_loss = 0
    correct = 0
    for data, target in data_loader:
        ort_inputs = {ort_session.get_inputs()[0].name: to_numpy(data)}
        output = ort_session.run(None, ort_inputs)[0]
        output = torch.from_numpy(output)
        test_loss += F.nll_loss(output, target, reduction='sum').item()  # sum up batch loss
        pred = output.argmax(dim=1, keepdim=True)  # get the index of the max log-probability
        correct += pred.eq(target.view_as(pred)).sum().item()

    test_loss /= len(data_loader.dataset)

    return 100. * correct / len(data_loader.dataset)

acc = test_onnx(MODEL_DIR / "mnist_model.onnx", test_loader)
print(f"Accuracy of the original model is {acc}%")

qacc = test_onnx(MODEL_DIR / "mnist_model_quant.onnx", test_loader)
print(f"Accuracy of the quantized model is {qacc}%")


Accuracy of the original model is 96.47%
Accuracy of the quantized model is 96.53%


## Static Quantization

Static quantization computes the quantization parameters ahead of time using a calibration dataset. Because the parameters are fixed before inference, this method is typically faster than dynamic quantization, but its accuracy may be lower. The calibration data is supplied through a subclass of `CalibrationDataReader`.
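
The idea of calibration can be sketched without ONNX Runtime: run over a set of representative batches, record the observed value range, and derive one fixed scale and zero point that is then reused for every input. The example below uses a simple min/max scheme on random data as an assumption for illustration; it does not reproduce what `quantize_static` does internally, and the names `calibrate_min_max` and `affine_params` are hypothetical.

```python
import numpy as np

def calibrate_min_max(batches):
    """Track the global minimum and maximum over all calibration batches."""
    lo, hi = 0.0, 0.0  # start at zero so the range always contains 0.0
    for batch in batches:
        lo = min(lo, float(batch.min()))
        hi = max(hi, float(batch.max()))
    return lo, hi

def affine_params(lo, hi, qmin=0, qmax=255):
    """Derive a fixed scale and zero point from a calibrated range."""
    scale = (hi - lo) / (qmax - qmin) or 1.0
    zero_point = int(round(qmin - lo / scale))
    return scale, zero_point

# Random stand-in for activations collected on calibration data.
rng = np.random.default_rng(0)
calibration_batches = [rng.normal(size=(32, 10)).astype(np.float32) for _ in range(5)]

lo, hi = calibrate_min_max(calibration_batches)
scale, zero_point = affine_params(lo, hi)
print(lo, hi, scale, zero_point)
```

Since the range is frozen after calibration, inputs that fall outside it at inference time get clipped, which is one reason the calibration set should resemble the data seen in production.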

In [10]:
class QuantDR(CalibrationDataReader):
    def __init__(self, torch_data_loader, input_name):
        self.torch_data_loader = torch_data_loader
        self.input_name = input_name
        self.datasize = len(torch_data_loader)
        self.enum_data = iter(torch_data_loader)

    def to_numpy(self, tensor):
        return tensor.detach().cpu().numpy() if tensor.requires_grad else tensor.cpu().numpy()

    def get_next(self):
        batch = next(self.enum_data, None)
        if batch is not None:
            return {self.input_name: self.to_numpy(batch[0])}
        else:
            return None

    def rewind(self):
        self.enum_data = iter(self.torch_data_loader)

calibration_data = QuantDR(train_loader, ort_session.get_inputs()[0].name)
model__static_quant = MODEL_DIR / "mnist_model_static_quant.onnx"
static_quant_model = quantize_static(model_fp32, model__static_quant, calibration_data, weight_type=QuantType.QInt8)



### Compare Size

Let's compare the sizes of the original model and the quantized models.

In [11]:
%ls -lh {MODEL_DIR}

total 308K
-rw-r--r-- 1 root root 82K Jan  3 15:14 mnist_model.onnx
-rw-r--r-- 1 root root 82K Jan  3 15:14 mnist_model_processed.onnx
-rw-r--r-- 1 root root 26K Jan  3 15:14 mnist_model_quant.onnx
-rw-r--r-- 1 root root 25K Jan  3 15:15 mnist_model_static_quant.onnx
-rw-r--r-- 1 root root 82K Jan  3 15:14 original_model.p


### Compare Accuracy

Let's compare the accuracy of the converted ONNX model and the statically quantized model. The accuracy of the quantized model should be close to that of the original model.

In [12]:
static_qacc = test_onnx(model__static_quant, test_loader)
print(f"Accuracy of the static quantized model is {static_qacc}%")

Accuracy of the static quantized model is 96.51%
