# A Practical Guide to Running CV Models: ResNet Use Case

This notebook serves as a practical guide to getting started running Computer Vision (CV) models on Tenstorrent hardware devices using the TT-BUDA compiler stack. *For detailed information on model compatibility, please refer to the [models support table](../model_demos/README.md#models-support-table) to check which model works with which Tenstorrent device(s).*

The tutorial will walk through an example of running the [ResNet](https://en.wikipedia.org/wiki/Residual_neural_network) model on Tenstorrent AI accelerator hardware. The model weights will be directly downloaded from the [HuggingFace library](https://huggingface.co/docs/transformers/model_doc/resnet) and executed through the PyBUDA SDK.

**Note on terminology:**

While TT-BUDA is the official Tenstorrent AI/ML compiler stack, PyBUDA is the Python interface for TT-BUDA. TT-BUDA is the core technology; however, PyBUDA allows users to access and utilize TT-BUDA's features directly from Python. This includes directly importing model architectures and weights from PyTorch, TensorFlow, ONNX, and TFLite.

## Guide Overview

In this guide, we will talk through the steps for running the ResNet model trained on [ImageNet](https://www.image-net.org/) data for the **Image Classification** task.

You will learn how to import the appropriate libraries, download model weights from a popular site such as HuggingFace, use the PyBUDA API to initiate an inference experiment, and observe the results from running on Tenstorrent hardware.

## Step 1: Import libraries

Make sure that you have an active Python environment with the latest version of PyBUDA installed.

In [None]:
# Start by importing the pybuda library, modules from HuggingFace's transformers library, and the requests, PIL, & matplotlib libraries for downloading and viewing a sample image
import matplotlib.pyplot as plt
import pybuda
import requests
from PIL import Image
from transformers import AutoFeatureExtractor, ResNetForImageClassification

## Step 2: Download the model weights from HuggingFace

In [None]:
# Load ResNet feature extractor and model from HuggingFace
model_ckpt = "microsoft/resnet-50"
feature_extractor = AutoFeatureExtractor.from_pretrained(model_ckpt)
model = ResNetForImageClassification.from_pretrained(model_ckpt)

## Step 3: Set example input

We will use a real image sample from the web. Let's stream in an image of a tiger from the ImageNet-1k dataset and view the sample.

In [None]:
# Load data sample from ImageNet-1k
url = "https://images.rawpixel.com/image_1300/cHJpdmF0ZS9sci9pbWFnZXMvd2Vic2l0ZS8yMDIyLTA1L3BkMTA2LTA0Ny1jaGltXzEuanBn.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# View the image
plt.imshow(image); plt.axis("off");

## Step 4: Data Preprocessing

Data preprocessing is an important step in the AI inference pipeline. For CV models, we apply transformations to the input image such as centering, cropping, padding, resizing, scaling, and normalizing. Some libraries, such as HuggingFace's transformers and PyTorch Image Models (timm), have transform classes to handle this for you.

In [None]:
# Data preprocessing
pixel_values = feature_extractor(image, return_tensors="pt")["pixel_values"]
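
To make the scaling and normalizing steps concrete, here is a minimal pure-Python sketch of what a transform class typically does per pixel. It assumes the commonly used ImageNet channel statistics; the values actually applied by `feature_extractor` come from the checkpoint's preprocessing configuration and may differ. The toy image and the helper names `normalize_pixel` and `normalize_image` are hypothetical, and the sketch keeps pixels in channel-last order, whereas `pixel_values` above is a channel-first tensor.

```python
# Illustrative sketch of per-channel scaling and normalization.
# ASSUMPTION: the mean/std below are the commonly used ImageNet statistics;
# the real values come from the checkpoint's preprocessing config.
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)


def normalize_pixel(rgb):
    """Scale an 8-bit RGB pixel to [0, 1], then standardize each channel."""
    return tuple(
        (value / 255.0 - mean) / std
        for value, mean, std in zip(rgb, IMAGENET_MEAN, IMAGENET_STD)
    )


def normalize_image(image_rows):
    """Apply normalize_pixel to every pixel of a row-major list of rows."""
    return [[normalize_pixel(px) for px in row] for row in image_rows]


# Made-up 1x2 image: one black pixel and one white pixel.
toy_image = [[(0, 0, 0), (255, 255, 255)]]
normalized = normalize_image(toy_image)
print(normalized[0][0])  # normalized black pixel
print(normalized[0][1])  # normalized white pixel
```

Resizing and cropping are omitted here; in practice the library's transform class handles all of these steps in one call, as in the cell above.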

## Step 5: Configure PyBUDA Parameters

There are optional configurations that can be adjusted before compiling and running a model on Tenstorrent hardware. Some configurations are required for the model to compile; others are tunable parameters that can be adjusted to improve performance.

For the ResNet model, two key parameters are required for compilation:

* `balancer_policy`
* `enable_t_streaming`

In [None]:
# Set PyBUDA configuration parameters
compiler_cfg = pybuda.config._get_global_compiler_config()  # get global configuration object
compiler_cfg.balancer_policy = "Ribbon"  # set balancer policy
compiler_cfg.enable_t_streaming = True  # enable tensor streaming

## Step 6: Instantiate Tenstorrent device

The first time we use PyBUDA, we must initialize a `TTDevice` object which serves as the abstraction over the target hardware.

In [None]:
tt0 = pybuda.TTDevice(
    name="tt_device_0",  # here we can give our device any name we wish, for tracking purposes
)

## Step 7: Create a PyBUDA module from PyTorch model

Next, we must abstract the PyTorch model loaded from HuggingFace into a `pybuda.PyTorchModule` object. This will let the BUDA compiler know which model architecture and AI framework it has to compile.

We then "place" this module onto the previously initialized `TTDevice`.

In [None]:
# Create module
pybuda_module = pybuda.PyTorchModule(
    name="pt_resnet50",  # give the module a name, this will be used for tracking purposes
    module=model  # specify the model that is being targeted for compilation
)

# Place module on device
tt0.place_module(module=pybuda_module)

## Step 8: Push the inputs into the model input queue

In [None]:
# Push inputs
tt0.push_to_inputs((pixel_values,))

## Step 9: Run inference on the targeted device

Running a model on a Tenstorrent device involves two parts: compilation and runtime.

Compilation -- TT-BUDA is a compiler, meaning that it takes a model architecture graph and compiles it for the target hardware. Compilation can take anywhere from a few seconds to a few minutes, depending on the model, and only needs to happen once. When you execute the following block of code, the compilation logs will be displayed.

Runtime -- once the model has been compiled and loaded onto the device, the user can push new inputs which will execute immediately.

The `run_inference` API can achieve both steps in a single call. If it's the first call, the model will compile. Any subsequent calls will execute runtime only.

Please refer to the documentation for alternative APIs such as `initialize_pipeline` and `run_forward`.

In [None]:
# Run inference on Tenstorrent device
output_q = pybuda.run_inference()  # executes compilation (if first time) + runtime
output = output_q.get()  # get last value from output queue
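
The compile-once, run-many behavior described above can be illustrated with a toy stand-in. This is only a sketch of the pattern, not PyBUDA code: `LazyCompiledModel` and `timed` are hypothetical names, and the "compilation" is simulated with a short sleep.

```python
import time


class LazyCompiledModel:
    """Toy stand-in (NOT a PyBUDA class) that 'compiles' on first call only."""

    def __init__(self, compile_seconds=0.05):
        self.compile_seconds = compile_seconds
        self.compiled = False
        self.compile_count = 0

    def run(self, x):
        if not self.compiled:
            time.sleep(self.compile_seconds)  # simulate the one-time compile cost
            self.compiled = True
            self.compile_count += 1
        return x * 2  # simulate cheap runtime work


def timed(fn, *args):
    """Call fn(*args) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


toy_model = LazyCompiledModel()
_, first_call = timed(toy_model.run, 1)
_, second_call = timed(toy_model.run, 2)
print(f"first call:  {first_call:.4f} s (includes simulated compile)")
print(f"second call: {second_call:.4f} s (runtime only)")
```

A helper like `timed` could plausibly wrap the notebook's own inference call to compare the first run against later ones, provided new inputs are pushed to the device before each run.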

## Step 10: Data Postprocessing

Data postprocessing is done to convert the model outputs into a readable / useful format. For image classification tasks, this usually means receiving the logit outputs from the model, extracting the top predicted class, and matching this with an entry from the label dictionary.

In [None]:
# Data postprocessing
predicted_value = output[0].value().argmax(-1).item()
predicted_label = model.config.id2label[predicted_value]
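
The cell above keeps only the single most likely class. As a hedged illustration of a slightly richer postprocessing step, the sketch below converts logits into probabilities with a softmax and lists the top-k labels. The logits and the label dictionary are made-up values, and `softmax` and `top_k` are hypothetical helpers written in plain Python rather than part of any library used in this guide.

```python
import math


def softmax(logits):
    """Convert raw logits into probabilities that sum to 1."""
    largest = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - largest) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(logits, id2label, k=3):
    """Return the k most probable (label, probability) pairs, best first."""
    probs = softmax(logits)
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(id2label[i], probs[i]) for i in ranked[:k]]


# Made-up logits and labels, for demonstration only.
toy_logits = [1.0, 3.0, 0.5, 2.0]
toy_id2label = {0: "tabby", 1: "tiger", 2: "lion", 3: "leopard"}

for label, prob in top_k(toy_logits, toy_id2label, k=3):
    print(f"{label}: {prob:.3f}")
```

With the real model, the same idea would apply to the logits tensor from the output queue and to `model.config.id2label`, after converting the tensor to a flat list of floats.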

## Step 11: Print and evaluate outputs

In [None]:
# Print outputs
print(f"Predicted_label: {predicted_label}")

## Step 12: Shut down PyBUDA

In [None]:
pybuda.shutdown()