In [1]:
%matplotlib inline



Quickstart
===================
This section runs through the API for common tasks in machine learning. Refer to the links in each section to dive deeper.

Working with data
-----------------
PyTorch has two `primitives to work with data <https://pytorch.org/docs/stable/data.html>`_:
``torch.utils.data.DataLoader`` and ``torch.utils.data.Dataset``.
``Dataset`` stores the samples and their corresponding labels, and ``DataLoader`` wraps an iterable around
the ``Dataset``.
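
As a minimal sketch of how the two primitives fit together, the hypothetical ``SquaresDataset`` below (not part of PyTorch or of this tutorial) implements the two methods a map-style ``Dataset`` needs, ``__len__`` and ``__getitem__``. ``DataLoader`` then groups its samples into batches.

```python
import torch
from torch.utils.data import DataLoader, Dataset


class SquaresDataset(Dataset):
    """Hypothetical toy dataset: sample i is the feature [i] with label i * i."""

    def __init__(self, n):
        self.n = n

    def __len__(self):
        # Number of samples in the dataset.
        return self.n

    def __getitem__(self, idx):
        # Return one (feature, label) pair.
        feature = torch.tensor([float(idx)])
        label = torch.tensor(idx * idx)
        return feature, label


toy_data = SquaresDataset(10)
toy_loader = DataLoader(toy_data, batch_size=4)

# Each iteration yields one batch of stacked features and labels.
batch_shapes = [(tuple(X.shape), tuple(y.shape)) for X, y in toy_loader]
print(batch_shapes)
```

With 10 samples and a batch size of 4, the loader yields two full batches and one final batch of 2.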


In [8]:
import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets
from torchvision.transforms import ToTensor

PyTorch offers domain-specific libraries such as `TorchText <https://pytorch.org/text/stable/index.html>`_,
`TorchVision <https://pytorch.org/vision/stable/index.html>`_, and `TorchAudio <https://pytorch.org/audio/stable/index.html>`_,
all of which include datasets. For this tutorial, we will be using a TorchVision dataset.

The ``torchvision.datasets`` module contains ``Dataset`` objects for many real-world vision datasets such as
CIFAR and COCO (`full list here <https://pytorch.org/vision/stable/datasets.html>`_). In this tutorial, we
use the FashionMNIST dataset. Every TorchVision ``Dataset`` accepts two arguments, ``transform`` and
``target_transform``, to modify the samples and labels respectively.



In [9]:
# Download training data from open datasets.
training_data = datasets.FashionMNIST(
    root="./project/data",
    train=True,
    download=True,
    transform=ToTensor(),
)

# Download test data from open datasets.
test_data = datasets.FashionMNIST(
    root="./project/data",
    train=False,
    download=True,
    transform=ToTensor(),
)

We pass the ``Dataset`` as an argument to ``DataLoader``. This wraps an iterable over our dataset, and supports
automatic batching, sampling, shuffling and multiprocess data loading. Here we define a batch size of 64, i.e. each element
in the dataloader iterable will return a batch of 64 features and labels.



In [4]:
batch_size = 64

# Create data loaders.
train_dataloader = DataLoader(training_data, batch_size=batch_size)
test_dataloader = DataLoader(test_data, batch_size=batch_size)

for X, y in test_dataloader:
    print(f"Shape of X [N, C, H, W]: {X.shape}")
    print(f"Shape of y: {y.shape} {y.dtype}")
    break

Shape of X [N, C, H, W]: torch.Size([64, 1, 28, 28])
Shape of y: torch.Size([64]) torch.int64
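
Because the dataset size need not be a multiple of the batch size, the last batch can be smaller than the others (``DataLoader`` keeps it unless ``drop_last=True`` is passed). A quick arithmetic sketch, assuming the standard FashionMNIST split sizes of 60,000 training and 10,000 test images; the ``batch_layout`` helper is illustrative only:

```python
import math


def batch_layout(num_samples, batch_size):
    """Return (number of batches, size of the last batch) when the last batch is kept."""
    num_batches = math.ceil(num_samples / batch_size)
    last_batch = num_samples - (num_batches - 1) * batch_size
    return num_batches, last_batch


print(batch_layout(60000, 64))  # training split: (938, 32)
print(batch_layout(10000, 64))  # test split: (157, 16)
```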


Read more about `loading data in PyTorch <data_tutorial.html>`_.




--------------




Creating Models
------------------
To define a neural network in PyTorch, we create a class that inherits
from `nn.Module <https://pytorch.org/docs/stable/generated/torch.nn.Module.html>`_. We define the layers of the network
in the ``__init__`` function and specify how data will pass through the network in the ``forward`` function. To accelerate
operations in the neural network, we move it to the GPU if available.



In [12]:
# Get cpu or gpu device for training.
device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Using {device} device")

# Define model
class NeuralNetwork(nn.Module):
    def __init__(self):
        super().__init__()
        self.flatten = nn.Flatten()
        self.linear_relu_stack = nn.Sequential(
            nn.Linear(28*28, 512),
            nn.ReLU(),
            nn.Linear(512, 512),
            nn.ReLU(),
            nn.Linear(512, 10)
        )

    def forward(self, x):
        x = self.flatten(x)
        logits = self.linear_relu_stack(x)
        return logits

model = NeuralNetwork().to(device)
print(model)

Using cpu device
NeuralNetwork(
  (flatten): Flatten(start_dim=1, end_dim=-1)
  (linear_relu_stack): Sequential(
    (0): Linear(in_features=784, out_features=512, bias=True)
    (1): ReLU()
    (2): Linear(in_features=512, out_features=512, bias=True)
    (3): ReLU()
    (4): Linear(in_features=512, out_features=10, bias=True)
  )
)


Read more about `building neural networks in PyTorch <buildmodel_tutorial.html>`_.




--------------




Optimizing the Model Parameters
----------------------------------------
To train a model, we need a `loss function <https://pytorch.org/docs/stable/nn.html#loss-functions>`_
and an `optimizer <https://pytorch.org/docs/stable/optim.html>`_.



In [6]:
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
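
``nn.CrossEntropyLoss`` expects raw, unnormalized scores (logits) and integer class labels; it applies log-softmax internally, which is why the model defined earlier has no softmax layer. A pure-Python sketch of the per-sample computation, with made-up logits for illustration (the ``cross_entropy`` helper is not a PyTorch function):

```python
import math


def cross_entropy(logits, target):
    """Negative log-softmax probability of the target class for one sample."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


# Hypothetical logits for a 3-class problem.
uniform = cross_entropy([0.0, 0.0, 0.0], target=1)    # no preference: ln(3)
confident = cross_entropy([0.0, 5.0, 0.0], target=1)  # strong, correct preference
ten_way = cross_entropy([0.0] * 10, target=0)         # ten equally likely classes: ln(10)
print(f"{uniform:.4f} {confident:.4f} {ten_way:.4f}")
```

With ten equally likely classes the loss is ln 10 ≈ 2.303, which is roughly where an untrained ten-class classifier starts.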

In a single training loop, the model makes predictions on the training dataset (fed to it in batches), and
backpropagates the prediction error to adjust the model's parameters.



In [7]:
def train(dataloader, model, loss_fn, optimizer):
    size = len(dataloader.dataset)
    model.train()
    for batch, (X, y) in enumerate(dataloader):
        X, y = X.to(device), y.to(device)

        # Compute prediction error
        pred = model(X)
        loss = loss_fn(pred, y)

        # Backpropagation
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        if batch % 100 == 0:
            loss, current = loss.item(), batch * len(X)
            print(f"loss: {loss:>7f}  [{current:>5d}/{size:>5d}]")
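
For plain SGD without momentum or weight decay, as configured above, ``optimizer.step()`` replaces each parameter ``p`` with ``p - lr * p.grad``. A pure-Python sketch of that update on a hypothetical one-parameter model ``y = w * x`` with squared-error loss (the data, names, and learning rate are illustrative only):

```python
def sgd_fit(xs, ys, lr=0.05, epochs=50):
    """Fit y = w * x by stochastic gradient descent on squared error."""
    w = 0.0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            pred = w * x               # forward pass
            grad = 2 * (pred - y) * x  # d/dw of (pred - y) ** 2
            w = w - lr * grad          # the SGD update rule
    return w


# Data generated by the rule y = 2 * x, so w should approach 2.
w_fit = sgd_fit([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
print(round(w_fit, 4))
```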

We also check the model's performance against the test dataset to ensure it is learning.



In [8]:
def test(dataloader, model, loss_fn):
    size = len(dataloader.dataset)
    num_batches = len(dataloader)
    model.eval()
    test_loss, correct = 0, 0
    with torch.no_grad():
        for X, y in dataloader:
            X, y = X.to(device), y.to(device)
            pred = model(X)
            test_loss += loss_fn(pred, y).item()
            correct += (pred.argmax(1) == y).type(torch.float).sum().item()
    test_loss /= num_batches
    correct /= size
    print(f"Test Error: \n Accuracy: {(100*correct):>0.1f}%, Avg loss: {test_loss:>8f} \n")
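
The accuracy bookkeeping in ``test`` takes, for each sample, the index of the largest logit (``argmax`` over dimension 1) as the predicted class and counts the matches with the labels. The same logic in plain Python, on made-up logits (the ``accuracy`` helper is illustrative only):

```python
def accuracy(batch_logits, labels):
    """Fraction of samples whose largest logit is at the labeled class index."""
    predicted = [row.index(max(row)) for row in batch_logits]  # argmax per sample
    correct = sum(1 for p, t in zip(predicted, labels) if p == t)
    return correct / len(labels)


# Made-up logits for 4 samples and 3 classes.
toy_logits = [
    [2.0, 0.1, -1.0],  # argmax 0
    [0.3, 0.2, 1.5],   # argmax 2
    [-0.5, 0.9, 0.0],  # argmax 1
    [1.2, 1.1, 0.4],   # argmax 0
]
toy_labels = [0, 2, 2, 0]
print(accuracy(toy_logits, toy_labels))  # 0.75
```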

The training process is conducted over several iterations (*epochs*). During each epoch, the model learns
parameters to make better predictions. We print the model's accuracy and loss at each epoch; we'd like to see the
accuracy increase and the loss decrease with every epoch.



In [9]:
epochs = 9
for t in range(epochs):
    print(f"Epoch {t+1}\n-------------------------------")
    train(train_dataloader, model, loss_fn, optimizer)
    test(test_dataloader, model, loss_fn)
print("Done!")

Epoch 1
-------------------------------
loss: 2.299248  [    0/60000]
loss: 2.285987  [ 6400/60000]
loss: 2.268440  [12800/60000]
loss: 2.267603  [19200/60000]
loss: 2.261528  [25600/60000]
loss: 2.229390  [32000/60000]
loss: 2.233319  [38400/60000]
loss: 2.198051  [44800/60000]
loss: 2.186997  [51200/60000]
loss: 2.170012  [57600/60000]
Test Error: 
 Accuracy: 46.3%, Avg loss: 2.155484 

Epoch 2
-------------------------------
loss: 2.161245  [    0/60000]
loss: 2.144951  [ 6400/60000]
loss: 2.090120  [12800/60000]
loss: 2.112195  [19200/60000]
loss: 2.064187  [25600/60000]
loss: 2.010152  [32000/60000]
loss: 2.031390  [38400/60000]
loss: 1.952070  [44800/60000]
loss: 1.954746  [51200/60000]
loss: 1.886197  [57600/60000]
Test Error: 
 Accuracy: 59.3%, Avg loss: 1.881937 

Epoch 3
-------------------------------
loss: 1.913361  [    0/60000]
loss: 1.871713  [ 6400/60000]
loss: 1.766361  [12800/60000]
loss: 1.807992  [19200/60000]
loss: 1.690934  [25600/60000]
loss: 1.662497  [32000/600

Read more about `Training your model <optimization_tutorial.html>`_.




--------------




Saving Models
-------------
A common way to save a model is to serialize the internal state dictionary (containing the model parameters).



In [10]:
torch.save(model.state_dict(), "model.pth")
print("Saved PyTorch Model State to model.pth")

Saved PyTorch Model State to model.pth
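
A state dictionary is an ordered mapping from parameter names to tensors. A small sketch, using a hypothetical single-layer model and an in-memory buffer instead of a file so that nothing on disk is touched:

```python
import io

import torch
from torch import nn

tiny = nn.Linear(3, 2)  # hypothetical stand-in for a trained model

# Keys are parameter names; values are the parameter tensors.
state = tiny.state_dict()
print({name: tuple(t.shape) for name, t in state.items()})

# Round-trip through an in-memory buffer, as torch.save/torch.load do with files.
buffer = io.BytesIO()
torch.save(state, buffer)
buffer.seek(0)

restored = nn.Linear(3, 2)  # re-create the structure, then load the weights
restored.load_state_dict(torch.load(buffer))
print(torch.equal(tiny.weight, restored.weight))
```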


Loading Models
----------------------------

The process for loading a model includes re-creating the model structure and loading
the state dictionary into it.



In [13]:
model = NeuralNetwork()
model.load_state_dict(torch.load("model.pth"))

<All keys matched successfully>

This model can now be used to make predictions.



In [16]:
classes = [
    "T-shirt/top",
    "Trouser",
    "Pullover",
    "Dress",
    "Coat",
    "Sandal",
    "Shirt",
    "Sneaker",
    "Bag",
    "Ankle boot",
]

model.eval()
x, y = test_data[0][0], test_data[0][1]
with torch.no_grad():
    pred = model(x)
    predicted, actual = classes[pred[0].argmax(0)], classes[y]
    print(f'Predicted: "{predicted}", Actual: "{actual}"')

Predicted: "Ankle boot", Actual: "Ankle boot"


Read more about `Saving & Loading your model <saveloadrun_tutorial.html>`_.




# Exporting model
### by Neu.ro MLOps

[ONNX](https://onnx.ai/) is an open standard for representing machine learning models.
This format is supported by the majority of inference engines.

Later, we will deploy the model to Triton, so let's install the ONNX dependencies and save the model in ONNX format.

In [13]:
!sudo pip3 install onnx onnxruntime



In [14]:
# Export the PyTorch model to ONNX; the file is written to disk as a side effect.
torch.onnx.export(
    model.eval(),              # model being run
    x,                         # model input (or a tuple for multiple inputs)
    "model.onnx",              # where to save the model (can be a file or file-like object)
    export_params=True,        # store the trained parameter weights inside the model file
    opset_version=10,          # the ONNX opset version to export the model to
    do_constant_folding=True,  # whether to execute constant folding for optimization
    input_names=["input"],     # the model's input names
    output_names=["output"],   # the model's output names
    dynamic_axes={             # variable-length axes
        "input": {0: "batch_size"},
        "output": {0: "batch_size"},
    },
)

In [17]:
import onnx
import mlflow

In [18]:
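# Note: this rebinds `model` from the PyTorch module to the loaded ONNX ModelProto.
model = onnx.load("model.onnx")
onnx.checker.check_model(model)  # raises an exception if the model is malformed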

# Local inference
Let's verify that the exported model runs inference correctly.

In [23]:
import onnxruntime as ort
ort_session = ort.InferenceSession("model.onnx")

onnx_outputs = ort_session.run(
    None,
    {"input": x.numpy()},
)

In [44]:
print(f"Model outputs: {onnx_outputs}")

predicted, actual = classes[onnx_outputs[0][0].argmax(0)], classes[y]
print(f'Predicted: "{predicted}", Actual: "{actual}"')

Model outputs: [array([[-3.3324342, -4.699691 , -1.9844813, -2.676971 , -1.4370284,
         4.440385 , -2.0312605,  4.229054 ,  2.4164305,  4.86053  ]],
      dtype=float32)]
Predicted: "Ankle boot", Actual: "Ankle boot"
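
The values returned by the session are raw logits, not probabilities. If probabilities are wanted, a softmax can be applied afterwards. A plain-Python sketch using the logits printed above (the `softmax` helper is illustrative and not part of ONNX Runtime):

```python
import math


def softmax(logits):
    """Convert logits to probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Logits copied from the model output shown above.
logits = [-3.3324342, -4.699691, -1.9844813, -2.676971, -1.4370284,
          4.440385, -2.0312605, 4.229054, 2.4164305, 4.86053]
probs = softmax(logits)
best = max(range(len(probs)), key=probs.__getitem__)  # index of the most likely class
print(best, round(sum(probs), 6))
```

Softmax preserves the ordering of the logits, so the predicted class index is the same as with `argmax` on the raw outputs.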


## Model repository

If you are reading this article, it should be clear that model lineage is _one of_ the crucial notions in ML products: it helps you understand where the results came from.

Since this is only a brief tutorial focused on deployment, we simply save the model as an artifact to a running MLflow server.

In production workloads, you should also consider code and data lineage.

In [19]:
model_name = "demo_model"
mlflow.set_tracking_uri('sqlite:///mymlflow.db')
with mlflow.start_run() as run:
    mlflow.onnx.log_model(model, "model", registered_model_name=model_name)

Registered model 'demo_model' already exists. Creating a new version of this model...
2022/08/11 06:11:27 INFO mlflow.tracking._model_registry.client: Waiting up to 300 seconds for model version to finish creation.                     Model name: demo_model, version 3
Created version '3' of model 'demo_model'.


In [41]:
# load model from MLflow Model Registry
import onnx
import mlflow
import numpy as np
import onnxruntime 

model_path = "models:/demo_model/None"
mlflow.set_tracking_uri('sqlite:///mymlflow.db')

print("\n**** mlflow.onnx.load_model\n")
model = mlflow.onnx.load_model(model_path)
session = onnxruntime.InferenceSession(model.SerializeToString())
input_name = session.get_inputs()[0].name

print( session.get_inputs()[0], input_name, type(x), type(x.numpy()))


**** mlflow.onnx.load_model

NodeArg(name='input', type='tensor(float)', shape=['batch_size', 28, 28]) input <class 'torch.Tensor'> <class 'numpy.ndarray'>


In [49]:
print("model.type:", type(model))
predictions = session.run(None, {'input': x.numpy()})
print(f"Model outputs: {predictions}")

print(predictions[0][0].argmax(0))
predicted, actual = classes[predictions[0][0].argmax(0)], classes[y]
print(f'Predicted: "{predicted}", Actual: "{actual}"')

model.type: <class 'onnx.onnx_ml_pb2.ModelProto'>
Model outputs: [array([[-3.3324342, -4.699691 , -1.9844813, -2.676971 , -1.4370284,
         4.440385 , -2.0312605,  4.229054 ,  2.4164305,  4.86053  ]],
      dtype=float32)]
9
Predicted: "Ankle boot", Actual: "Ankle boot"


# Triton inference

### Deploying to Triton
Install the MLflow plugin for the Triton Inference Server and use it to deploy the model.

Here we also install the Triton client to perform a test inference call.

In [1]:
!sudo pip3 install tritonclient[http]
!git clone https://github.com/triton-inference-server/server --depth=1 /tmp/triton_server
!cd /tmp/triton_server/deploy/mlflow-triton-plugin/ && sudo python setup.py install

Cloning into '/tmp/triton_server'...
remote: Enumerating objects: 1751, done.
remote: Counting objects: 100% (1751/1751), done.
remote: Compressing objects: 100% (1142/1142), done.
remote: Total 1751 (delta 704), reused 1177 (delta 419), pack-reused 0
Receiving objects: 100% (1751/1751), 7.75 MiB | 20.04 MiB/s, done.
Resolving deltas: 100% (704/704), done.
Traceback (most recent call last):
  File "setup.py", line 26, in <module>
    from setuptools import setup, find_packages
ImportError: No module named setuptools


In [None]:
version = mlflow.tracking.MlflowClient().get_registered_model(model_name).latest_versions[0].version

!mlflow deployments create -t triton --flavor onnx --name $model_name -m models:/$model_name/$version

## Test inference call

Usually, you need a Triton inference client to communicate with models deployed to a Triton server.

In [3]:
import os
import tritonclient.http as httpclient
from tritonclient.utils import InferenceServerException
from tritonclient.utils import triton_to_np_dtype

In [None]:
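import re

# The client expects "host:port"; str.strip() removes a character set, not a prefix.
url = re.sub(r"^https?://", "", os.environ["TRITON_URL"])
triton_client = httpclient.InferenceServerClient(url=url)
deployed_model_meta = triton_client.get_model_metadata(model_name)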

In [None]:
model_input = httpclient.InferInput(
    deployed_model_meta["inputs"][0]["name"],
    list(x.shape),  # pass the shape as a plain list of ints
    deployed_model_meta["inputs"][0]["datatype"],
)
model_input.set_data_from_numpy(x.numpy(), binary_data=True)


model_output = httpclient.InferRequestedOutput(
    deployed_model_meta["outputs"][0]["name"],
    binary_data=True
)

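# Pass the requested output so that `model_output` defined above is actually used.
request = triton_client.async_infer(model_name=model_name, inputs=[model_input], outputs=[model_output])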
result = request.get_result()
triton_outputs = result.as_numpy(deployed_model_meta["outputs"][0]["name"])

In [None]:
print(triton_outputs)
predicted, actual = classes[triton_outputs[0].argmax(0)], classes[y]
print(f'Predicted: "{predicted}", Actual: "{actual}"')

### Optionally, compare the outputs of the local ONNX Runtime session and the Triton deployment

In [None]:
import numpy as np
# onnx_outputs is a list with one array, so compare its first element.
print(np.allclose(onnx_outputs[0], triton_outputs, rtol=10e-6, atol=10e-7))
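
`np.allclose(a, b, rtol, atol)` checks `|a - b| <= atol + rtol * |b|` element-wise, so the tolerance scales with the second argument. Note that `10e-6` equals `1e-5` and `10e-7` equals `1e-6`. A plain-Python sketch of the same test for scalars (the `is_close` helper is hypothetical and the sample values are made up):

```python
def is_close(a, b, rtol=1e-5, atol=1e-6):
    """Scalar version of the element-wise check performed by np.allclose."""
    return abs(a - b) <= atol + rtol * abs(b)


# The literals used in the comparison above are just another spelling of 1e-5 and 1e-6.
print(10e-6 == 1e-5, 10e-7 == 1e-6)

print(is_close(4.86053, 4.86054))  # difference of about 1e-5: within tolerance
print(is_close(4.86053, 4.8615))   # difference of about 1e-3: outside tolerance
```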