# Python SDK Tutorial
This notebook introduces the ModelBox Python SDK: its major concepts and how to use the API independently of any deep learning framework. To see how the SDK can be integrated with a PyTorch trainer, follow the PyTorch notebook.



## Install ModelBox SDK

In [17]:
pip install modelbox==0.0.5 grpcio

Collecting modelbox==0.0.5
  Downloading modelbox-0.0.5-py3-none-any.whl (24 kB)
Collecting protobuf>=3.20.1
  Using cached protobuf-4.21.7-cp37-abi3-manylinux2014_x86_64.whl (408 kB)
Installing collected packages: protobuf, modelbox
  Attempting uninstall: protobuf
    Found existing installation: protobuf 3.19.6
    Uninstalling protobuf-3.19.6:
      Successfully uninstalled protobuf-3.19.6
  Attempting uninstall: modelbox
    Found existing installation: modelbox 0.0.4
    Uninstalling modelbox-0.0.4:
      Successfully uninstalled modelbox-0.0.4
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
tensorboard 2.10.1 requires protobuf<3.20,>=3.9.2, but you have protobuf 4.21.7 which is incompatible.
Successfully installed modelbox-0.0.5 protobuf-4.21.7
Note: you may need to restart the kernel to use updated packages.


## Initialize the ModelBox Client
First, we initialize the client by pointing it to the address of the ModelBox server.

In [6]:
from modelbox.modelbox import ModelBox, MLFramework, Artifact, ArtifactMime, MetricValue, LocalFile

mbox = ModelBox(addr="127.0.0.1:8085")
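
When the same script runs in more than one environment, it can be convenient to read the server address from the environment instead of hard-coding it. The following is a minimal sketch under one assumption: the variable name `MODELBOX_ADDR` is hypothetical and is not defined by the SDK. It falls back to the local address used above.

```python
import os

DEFAULT_ADDR = "127.0.0.1:8085"  # local server address used in this tutorial


def resolve_server_addr(env=None):
    """Return the address from MODELBOX_ADDR (hypothetical name), or the default."""
    env = os.environ if env is None else env
    addr = env.get("MODELBOX_ADDR", "").strip()
    return addr or DEFAULT_ADDR


print(resolve_server_addr({}))
# The result can then be passed to the client: ModelBox(addr=resolve_server_addr())
```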

## Create an Experiment 
Once we have a client, we can use it to create a new Experiment to train a model, or to track an existing pre-trained model. Let us first see how to create an experiment. We are going to create an experiment to train a Wav2Vec model with PyTorch and store it in a namespace called *langtech*. If you are using an experiment management service such as Weights and Biases or Neptune, you can associate the ID from that service with ModelBox to create a lineage.

In [2]:
experiment = mbox.new_experiment("wav2vec", "owner@pytorch.org", "langtech", "extern123", MLFramework.PYTORCH)
experiment.id

'8649388612666864736'

The code above creates a new experiment and gives us its ID. You can list the experiments in a namespace as follows:

In [3]:
mbox.experiments(namespace="langtech")

ListExperimentsResponse(experiments=[Experiment(id='8649388612666864736', name='wav2vec', owner='owner@pytorch.org', namespace='langtech', external_id='extern123', created_at=seconds: 41
nanos: 674434000
, updated_at=seconds: 41
nanos: 674440000
, framework=<MLFramework.PYTORCH: 2>)])

#### Adding metadata
Metadata can be added to any object in ModelBox after it has been created. For example, once an experiment is created, metadata can be added and listed at any stage:

In [4]:
experiment.update_metadata("foo/bar", 12)
experiment.metadata()

ListMetadataResponse(metadata={'foo/bar': 12})
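
The key `foo/bar` above suggests a slash-separated naming scheme, although the tutorial does not state that ModelBox requires one. Assuming that convention, a nested configuration dictionary could be flattened into such keys before being stored. The helper `flatten_metadata` below is a hypothetical illustration and is not part of the SDK.

```python
def flatten_metadata(nested, prefix=""):
    """Flatten a nested dict into a single-level dict with slash-separated keys."""
    flat = {}
    for key, value in nested.items():
        path = f"{prefix}/{key}" if prefix else str(key)
        if isinstance(value, dict):
            # Recurse into sub-dictionaries, extending the key path
            flat.update(flatten_metadata(value, path))
        else:
            flat[path] = value
    return flat


# Illustrative configuration; the values are made up for this example
config = {"foo": {"bar": 12}, "optimizer": {"name": "adam", "lr": 0.001}}
flat = flatten_metadata(config)
print(flat)
# Each pair could then be stored with experiment.update_metadata(key, value)
```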

## Working with Checkpoints
Once we have an experiment, we can create model checkpoints from the trainers. Let's assume the file stored in `artifacts/mnist_cnn.pt` is a checkpoint created by the trainer. We will now associate this checkpoint with ModelBox.

We can either track the path of the checkpoint, or upload the blob and let ModelBox store it in the configured blob store. The benefit of letting ModelBox store the checkpoint is that the trainer doesn't need direct access to the blob store. However, when the trainer has a much faster I/O path to the blob store, it can be more efficient for the trainer to store the blob directly and only track its path in ModelBox.

In [7]:
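# Wrap the local checkpoint file so the SDK can track it
file = LocalFile.from_path('artifacts/mnist_cnn.pt')
metrics = {'val_accu': 98.5, 'train_accu': 99.2}
# Track the checkpoint's path as an artifact of the experiment
experiment.track_file("checkpoint-0", file)
artifacts = experiment.artifacts
# Attach the metrics to the tracked artifact
artifacts[0].log_metrics(metrics=metrics, step=1)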

LogMetricsResponse(updated_at=1678663308)

In [8]:
artifacts[0].all_metrics()

{'train_accu': [MetricValue(step=1, wallclock_time=64, value=99.19999694824219),
  MetricValue(step=1, wallclock_time=64, value=99.19999694824219),
  MetricValue(step=1, wallclock_time=64, value=99.19999694824219),
  MetricValue(step=1, wallclock_time=64, value=99.19999694824219),
  MetricValue(step=1, wallclock_time=64, value=99.19999694824219)],
 'val_accu': [MetricValue(step=1, wallclock_time=64, value=98.5),
  MetricValue(step=1, wallclock_time=64, value=98.5),
  MetricValue(step=1, wallclock_time=64, value=98.5),
  MetricValue(step=1, wallclock_time=64, value=98.5),
  MetricValue(step=1, wallclock_time=64, value=98.5)]}

The `track_file` call above returns the artifact ID and tracks the path of the checkpoint created by the trainer.

Now let's say that we also want ModelBox to store the checkpoint. In that case we call `upload_file` instead of `track_file`:

In [10]:
resp = experiment.upload_file(artifact_name="checkpoint-0", f=file)
resp

Once checkpoints are created, they can be listed through the experiment's `artifacts` attribute:

In [11]:
experiment.artifacts

[Artifact(name='checkpoint-0', id='10820184364444362974', parent_id='8649388612666864736', assets=[ArtifactAsset(parent='8649388612666864736', src_path='/home/diptanuc/Projects/modelbox/tutorials/artifacts/mnist_cnn.pt', upload_path='', mime_type=<ArtifactMime.Unknown: 0>, checksum='', id='2639677390021709401'), ArtifactAsset(parent='8649388612666864736', src_path='/home/diptanuc/Projects/modelbox/tutorials/artifacts/mnist_cnn.pt', upload_path='', mime_type=<ArtifactMime.Unknown: 0>, checksum='31c437e7d87fa749a7e049f0ccc46dd0', id='12632304970294682093')]),
 Artifact(name='checkpoint1', id='16406521829230001821', parent_id='8649388612666864736', assets=[ArtifactAsset(parent='8649388612666864736', src_path='/home/diptanuc/Projects/modelbox/tutorials/artifacts/mnist_cnn.pt', upload_path='modelbox/artifacts/8649388612666864736/155603561208062398', mime_type=<ArtifactMime.Checkpoint: 2>, checksum='31c437e7d87fa749a7e049f0ccc46dd0', id='155603561208062398')])]

## Working with Models and ModelVersions

Model objects describe the task performed, metadata, which datasets were used for training, how to use the model during inference, and so on. ModelVersions are trained instances of a model. For example, an English ASR (speech-to-text) model can accumulate multiple model versions over time as it is trained with different datasets.

We don't prescribe the granularity of Models and ModelVersions. If it's easier to create a different Model every time a new model is trained with different hyperparameters, with a single ModelVersion pointing to the model artifacts and all the metrics, that is fine.

In [12]:
model = mbox.new_model(name='asr_en', owner='owner@owner.org', namespace='langtech', task='asr', description='ASR for english')
model.update_metadata(key='x', value='y')
model.id

'14142242426467866576'

A ModelVersion can be created by the client in the same way, and it tracks the associated artifacts and metadata.


In [13]:
tags = ["test"]
model_version = model.new_model_version(name="asr_en_july", version="1", description='ASR for english', unique_tags=tags, artifacts=[], framework=MLFramework.PYTORCH)
model_version.id

'16818571058352993853'

Once a ModelVersion is created, we can upload the model file and associate it with the ModelVersion object.

In [15]:
model_path = LocalFile.from_path('artifacts/mnist_cnn.pt')
model_version.upload_file(artifact_name="model1", f=model_path)
model_version.artifacts

[Artifact(name='model1', id='6984394584362292612', parent_id='16818571058352993853', assets=[ArtifactAsset(parent='16818571058352993853', src_path='/home/diptanuc/Projects/modelbox/tutorials/artifacts/mnist_cnn.pt', upload_path='modelbox/artifacts/16818571058352993853/5402444647884978184', mime_type=<ArtifactMime.Checkpoint: 2>, checksum='31c437e7d87fa749a7e049f0ccc46dd0', id='5402444647884978184'), ArtifactAsset(parent='16818571058352993853', src_path='/home/diptanuc/Projects/modelbox/tutorials/artifacts/mnist_cnn.pt', upload_path='modelbox/artifacts/16818571058352993853/9243652877292513617', mime_type=<ArtifactMime.Unknown: 0>, checksum='31c437e7d87fa749a7e049f0ccc46dd0', id='9243652877292513617')])]

The model file can now be served to inference servers by the file server built into ModelBox. Inference services can either use the language-specific SDKs in Python, Rust or Go, or call the gRPC `DownloadFile` API directly, which streams the files.

> **_NOTE: Converting Checkpoints to ModelVersions_** 
Usually in production, engineers look at the checkpoints and models created during training and select the version with the best metrics. Once we have the worker infrastructure in place, we will create APIs that automatically convert checkpoints to ModelVersions.
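
Until such an API exists, the selection step can be done by hand. The sketch below shows one way to pick the checkpoint with the highest logged value of a metric. It is self-contained and does not call the SDK: `MetricPoint` is a local stand-in that mirrors the `step` and `value` fields of the `MetricValue` output shown earlier, and the checkpoint names and numbers are illustrative.

```python
from typing import Dict, List, NamedTuple


class MetricPoint(NamedTuple):
    """Local stand-in mirroring the step/value fields of a logged metric."""
    step: int
    value: float


def best_checkpoint(metrics_by_checkpoint: Dict[str, List[MetricPoint]]) -> str:
    """Return the checkpoint name whose highest logged value is the largest."""
    return max(
        metrics_by_checkpoint,
        key=lambda name: max(p.value for p in metrics_by_checkpoint[name]),
    )


# Illustrative validation accuracies per checkpoint
val_accu = {
    "checkpoint-0": [MetricPoint(step=1, value=0.73)],
    "checkpoint-1": [MetricPoint(step=2, value=0.77)],
}
print(best_checkpoint(val_accu))
```

For a metric where lower is better, such as a loss, the same idea applies with `min` in place of `max`.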

## Tracking Artifacts and Working with Files
ModelBox can track artifacts used in training, and users can also upload files and associate them with experiments, models and model versions. For example, a user can track dataset files used for training that are stored in S3, or upload them to ModelBox. A trained model can be uploaded and later streamed to applications for inference.


In [17]:
resp = experiment.track_file(artifact_name="dataset", f=LocalFile.from_path("artifacts/test_artifact.txt"))
resp

ModelBox is now tracking the artifact and has information about its checksum, the local path of the file, and so on.
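
The checksums in the outputs of this notebook are 32 hexadecimal characters long, which is consistent with an MD5 digest. The tutorial does not name the algorithm, so treat that as an assumption. Under that assumption, a client could compute the same kind of digest locally to verify a file after downloading it. The helper `file_md5` is a hypothetical illustration and is not part of the SDK.

```python
import hashlib
import os
import tempfile


def file_md5(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # Read until an empty bytes object signals end of file
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Demonstrate on a throwaway file so the example runs anywhere
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"hello modelbox")
checksum = file_md5(tmp.name)
os.remove(tmp.name)
print(checksum)
```

Reading in chunks keeps memory use constant, which matters for large checkpoints.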

In [18]:
experiment.artifacts

[Artifact(name='checkpoint-0', id='10820184364444362974', parent_id='8649388612666864736', assets=[ArtifactAsset(parent='8649388612666864736', src_path='/home/diptanuc/Projects/modelbox/tutorials/artifacts/mnist_cnn.pt', upload_path='', mime_type=<ArtifactMime.Unknown: 0>, checksum='', id='2639677390021709401'), ArtifactAsset(parent='8649388612666864736', src_path='/home/diptanuc/Projects/modelbox/tutorials/artifacts/mnist_cnn.pt', upload_path='', mime_type=<ArtifactMime.Unknown: 0>, checksum='31c437e7d87fa749a7e049f0ccc46dd0', id='12632304970294682093')]),
 Artifact(name='dataset', id='2562978676191970736', parent_id='8649388612666864736', assets=[ArtifactAsset(parent='8649388612666864736', src_path='/home/diptanuc/Projects/modelbox/tutorials/artifacts/test_artifact.txt', upload_path='', mime_type=<ArtifactMime.Unknown: 0>, checksum='0019d23bef56a136a1891211d7007f6f', id='15017724009377637244')]),
 Artifact(name='checkpoint1', id='16406521829230001821', parent_id='8649388612666864736'

## Metrics 
ModelBox supports adding Metrics to experiments. Metrics can be logged to a key with values being a float, string or bytes. Metric values are associated with a step unit, and wallclock time when the metric was emitted by the application.

In [19]:
import time
experiment.log_metrics(metrics={'val_accu': 0.73, 'loss': 0.15}, step=1, wallclock=int(time.time()))
experiment.log_metrics(metrics={'val_accu': 0.77, 'loss': 0.12}, step=2, wallclock=int(time.time()))

experiment.all_metrics()

{'val_accu': [MetricValue(step=1, wallclock_time=64, value=0.7300000190734863),
  MetricValue(step=2, wallclock_time=64, value=0.7699999809265137)],
 'loss': [MetricValue(step=1, wallclock_time=64, value=0.15000000596046448),
  MetricValue(step=2, wallclock_time=64, value=0.11999999731779099)]}
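
The returned values differ slightly from the logged ones (0.73 comes back as 0.7300000190734863). This is consistent with the values being stored as 32-bit floats, although that is an inference from the output and not something the tutorial states. The sketch below reproduces the effect with the standard library alone by round-tripping each value through a 32-bit float.

```python
import struct


def as_float32(x):
    """Round a Python float to the nearest 32-bit float, returned as a Python float."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


# Show how each logged value looks after a 32-bit round trip
for logged in (0.73, 0.77, 0.15, 0.12):
    print(logged, "->", as_float32(logged))
```

When comparing retrieved metrics against expected values, a tolerance-based comparison such as `math.isclose` is therefore safer than `==`.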

Metrics can be added to any ModelBox object, including Models and ModelVersions.

## Events
Events can be logged into the system against an experiment or a model to aid observability of the processes involved with training models or deployment of models in production.

In [20]:
from modelbox.modelbox import Event, EventSource

# Log events that mark the start and end of a training stage
experiment.log_event(Event(name="data_download_start", source=EventSource(name="trainer"), wallclock_time=int(time.time()), metadata={}))
# ... load training data from the network into memory or onto the trainer's disk
experiment.log_event(Event(name="data_download_finish", source=EventSource(name="trainer"), wallclock_time=int(time.time()), metadata={}))

experiment.events() # same events can be logged repeatedly as long as wallclock time is incremented

[Event(name='data_download_start', source=EventSource(name='trainer'), wallclock_time=seconds: 19
 nanos: 895433000
 , metadata={}),
 Event(name='data_download_finish', source=EventSource(name='trainer'), wallclock_time=seconds: 19
 nanos: 908435000
 , metadata={})]

Events generally help in troubleshooting training processes or any other MLOps processes when a stage in the pipeline takes a long time to execute or the pipeline doesn't work as expected.
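
As an illustration of that kind of troubleshooting, the sketch below pairs `*_start` and `*_finish` events and reports how long each stage took. It is self-contained and does not call the SDK: `LoggedEvent` is a local stand-in with only a name and a wallclock time in seconds, the naming convention is taken from the example above, and the timestamps are made up.

```python
from typing import Dict, List, NamedTuple


class LoggedEvent(NamedTuple):
    """Local stand-in with just a name and a wallclock time in seconds."""
    name: str
    wallclock_time: float


def stage_durations(events: List[LoggedEvent]) -> Dict[str, float]:
    """Pair `<stage>_start` with the next `<stage>_finish`; return seconds per stage."""
    started: Dict[str, float] = {}
    durations: Dict[str, float] = {}
    for event in sorted(events, key=lambda e: e.wallclock_time):
        if event.name.endswith("_start"):
            started[event.name[: -len("_start")]] = event.wallclock_time
        elif event.name.endswith("_finish"):
            stage = event.name[: -len("_finish")]
            # Ignore a finish event that has no matching start
            if stage in started:
                durations[stage] = event.wallclock_time - started.pop(stage)
    return durations


# Illustrative timestamps, in seconds
events = [
    LoggedEvent("data_download_start", 100.0),
    LoggedEvent("data_download_finish", 142.5),
]
print(stage_durations(events))
```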

In [21]:
model_version.log_event(Event(name="prod_deployment", source=EventSource(name="CD_Service_name"), wallclock_time=int(time.time()), metadata={}))
model_version.events()

[Event(name='prod_deployment', source=EventSource(name='CD_Service_name'), wallclock_time=seconds: 22
 nanos: 411304000
 , metadata={})]