# Working with video on GPU

**Notebook 3 of 3**<br><br>
This tutorial provides a high-level overview of working with Intel GPUs in OpenVINO. It shows how to use Query Device to list system GPUs and check their properties, and it explains some of the key properties. It shows how to compile a model on GPU with performance hints. 

The tutorial also shows example commands for benchmark_app that can be run to compare GPU performance in different configurations. It also provides the code for a basic end-to-end application that compiles a model on GPU and uses it to run inference.

# Learning objectives
Average time to complete: 25 min

By the end of this tutorial you should be able to:
* Get information about the GPUs we want to run on.
* Highlight the difference between running inference on a GPU and on a CPU.
* Convert a model from TensorFlow and compile it on the GPU.
* Run a simple video inference application on the GPU.

## What you will need for this tutorial

* See the [introduction document](https://uottawa-it-research-teaching.github.io/machinelearning/) for general requirements and how Jupyter notebooks work.
* We'll need Pandas for convenient data handling. It's a very powerful Python package that can read CSV and Excel files. It also has very good data manipulation capabilities, which come in useful for data cleaning.
* OpenVINO 2023 or later.
* We will use scikit-learn as our machine learning package. Scikit-learn provides simple and efficient tools for data mining and analysis.
* numpy. Numpy provides support for large, multi-dimensional arrays and matrices.
* seaborn. Provides an intuitive and attractive interface for creating informative and visually appealing statistical graphics.
* matplotlib. Allows the generation of plots and charts.
* requests. Handles HTTP requests and responses when downloading datasets or pre-trained models.
* ipywidgets. Allows the creation of interactive plots, graphs and other visualizations, as well as control over the execution of code.
* The data files that should have come with this notebook.

## RDM best practices

Good data handling for machine learning begins with good Research Data Management (RDM). The quality of your source data will impact the outcome of your results, just as the reproducibility of your results will depend on the quality of your data sources and on how you organize the data so that other people (and machines!) can understand and reuse it.

We also need to respect a few research data management best practices along the way. These best practices are recommended by the Digital Research Alliance of Canada. In the first tutorial we encouraged you to respect two RDM best practices:

* SAVE YOUR RAW DATA IN ORIGINAL FORMAT<br>
* BACKUP YOUR DATA (3-2-1 rule)<br>

These practices should apply in this tutorial as well, but we will also look at best practices of data description, documentation and file naming that will streamline your data processing and project management. 

DESCRIBE YOUR DATA

* Machine Friendly: Describe your dataset with a metadata standard for discovery.
* Human Friendly: Describe your variables, so your colleagues will understand what you meant. Data without good metadata is useless. Give your variables clear names.
* Do not leave cells blank. Use numeric values clearly out of range to define missing (e.g. '99999') or not applicable (e.g. '88888') data, and describe these in your data dictionary.
* Convert your data to open, non-proprietary formats 
* Name your files well with basic meta-data in the file names
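
As a small illustration of the missing-value convention above, the sketch below reads CSV text and turns the sentinel codes back into explicit states. The column names, the sample rows and the `MISSING_CODES` mapping are hypothetical; in a real project they would come from your own data dictionary.

```python
import csv
import io

# Hypothetical data dictionary entry: sentinel codes and what they mean.
MISSING_CODES = {"99999": "missing", "88888": "not applicable"}

# Hypothetical sample data; a real project would open a file instead.
SAMPLE_CSV = """frame_id,detections,confidence
0,2,0.91
1,99999,99999
2,88888,88888
"""


def read_rows(text):
    """Parse CSV text, replacing sentinel codes with None and recording why."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        cleaned = {}
        for key, value in row.items():
            if value in MISSING_CODES:
                cleaned[key] = None
                cleaned[f"{key}_status"] = MISSING_CODES[value]
            else:
                cleaned[key] = value
        rows.append(cleaned)
    return rows


rows = read_rows(SAMPLE_CSV)
print(rows[1])
```

Keeping the reason next to the value means that "missing" and "not applicable" stay distinguishable after cleaning.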

## Introduction

Originally, graphic processing units (GPUs) began as specialized chips, developed to accelerate the rendering of computer graphics. In contrast to CPUs, which have few but powerful cores, GPUs have many more specialized cores, making them ideal for workloads that can be parallelized into simpler tasks. Nowadays, one such workload is deep learning, where GPUs can easily accelerate inference of neural networks by splitting operations across multiple cores.

In this tutorial, we will use the **MobileNet V2** model due to its light weight and ease of use. This model and others like it tend to have relatively simple architectures and fewer parameters compared to larger, more complex models. Because of its light weight and simplicity, it is actually best suited for CPU devices. Complex deep learning models that require more parallelism will run more efficiently on GPU devices.

## Checking GPUs with Query Device

In this section, we will see how to list the available GPUs and check their properties. Some of the key properties will also be defined.

### List GPUs with core.available_devices


OpenVINO Runtime provides the `available_devices` method for checking which devices are available for inference. The following code will output a list of compatible OpenVINO devices, in which Intel GPUs should appear.

In [1]:
from openvino.runtime import Core

core = Core()
core.available_devices

['CPU', 'GPU.0', 'GPU.1', 'GPU.2']

Note that GPU devices are numbered starting at 0, where the integrated GPU always takes the id `0` if the system has one. For instance, if the system has a CPU, an integrated GPU and a discrete GPU, we should expect to see a list like this: `['CPU', 'GPU.0', 'GPU.1']`. To simplify its use, "GPU.0" can also be addressed as just "GPU". For more details, see the [Device Naming Convention](https://docs.openvino.ai/latest/openvino_docs_OV_UG_supported_plugins_GPU.html#device-naming-convention) section.

If the GPUs are installed correctly on the system and still do not appear in the list, follow the steps described [here](https://docs.openvino.ai/latest/openvino_docs_install_guides_configurations_for_intel_gpu.html) to configure your GPU drivers to work with OpenVINO. Once we have the GPUs working with OpenVINO, we can proceed with the next sections.
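
To make the naming convention concrete, here is a minimal sketch that works on a plain list of device names such as the one returned by `available_devices`. The helper names `gpu_ids` and `normalize` are hypothetical and are not part of OpenVINO; the sketch only does string handling, so it runs without a GPU.

```python
def gpu_ids(available_devices):
    """Return the numeric ids of all GPU entries in a device list."""
    ids = []
    for name in available_devices:
        if name == "GPU":
            ids.append(0)  # Bare "GPU" is an alias for "GPU.0".
        elif name.startswith("GPU."):
            suffix = name.split(".", 1)[1]
            if suffix.isdigit():
                ids.append(int(suffix))
    return sorted(set(ids))


def normalize(device):
    """Expand the "GPU" alias to its explicit "GPU.0" form."""
    return "GPU.0" if device == "GPU" else device


# Device list copied from the output above.
devices = ['CPU', 'GPU.0', 'GPU.1', 'GPU.2']
print(gpu_ids(devices))
print(normalize("GPU"))
```

An empty result from `gpu_ids` is a quick signal that the driver configuration steps linked above are still needed.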

### Check Properties with core.get_property

To get information about the GPUs, we can use device properties. In OpenVINO, devices have properties that describe their characteristics and configuration. Each property has a name and associated value that can be queried with the `get_property` method.

To get the value of a property, such as the device name, we can use the `get_property` method as follows:

In [2]:
device = "GPU.1"
core.get_property(device, "FULL_DEVICE_NAME")

'gfx90a:sramecc+:xnack- (dGPU)'

Each device also has a specific property called `SUPPORTED_PROPERTIES`, that enables viewing all the available properties in the device. We can check the value for each property by simply looping through the dictionary returned by `core.get_property("GPU", "SUPPORTED_PROPERTIES")` and then querying for that property.

In [3]:
print(f"{device} SUPPORTED_PROPERTIES:\n")
supported_properties = core.get_property(device, "SUPPORTED_PROPERTIES")
indent = len(max(supported_properties, key=len))

for property_key in supported_properties:
    if property_key not in ('SUPPORTED_METRICS', 'SUPPORTED_CONFIG_KEYS', 'SUPPORTED_PROPERTIES'):
        try:
            property_val = core.get_property(device, property_key)
        except TypeError:
            property_val = 'UNSUPPORTED TYPE'
        print(f"{property_key:<{indent}}: {property_val}")

GPU.1 SUPPORTED_PROPERTIES:

AVAILABLE_DEVICES               : ['0', '1', '2']
RANGE_FOR_ASYNC_INFER_REQUESTS  : (1, 2, 1)
RANGE_FOR_STREAMS               : (1, 2)
OPTIMAL_BATCH_SIZE              : 1
MAX_BATCH_SIZE                  : 1
DEVICE_ARCHITECTURE             : GPU: vendor=0x1002 arch=gfx90a:sramecc+:xnack-
FULL_DEVICE_NAME                : gfx90a:sramecc+:xnack- (dGPU)
DEVICE_UUID                     : 00000000000000000000000000000000
DEVICE_LUID                     : 0000000000000000
DEVICE_TYPE                     : Type.DISCRETE
DEVICE_GOPS                     : {<Type: 'float16'>: 0.0, <Type: 'float32'>: 0.0, <Type: 'int8_t'>: 0.0, <Type: 'uint8_t'>: 0.0}
OPTIMIZATION_CAPABILITIES       : ['FP32', 'BIN', 'FP16', 'INT8', 'EXPORT_IMPORT']
GPU_DEVICE_TOTAL_MEM_SIZE       : 68702699520
GPU_UARCH_VERSION               : unknown
GPU_EXECUTION_UNITS_COUNT       : 104
GPU_MEMORY_STATISTICS           : {}
PERF_COUNT                      : False
MODEL_PRIORITY                  : Pri

### Brief Descriptions of Key Properties

Each device has several properties as seen in the last command. Some of the key properties are:

* `FULL_DEVICE_NAME` - The product name of the GPU and whether it is an integrated or discrete GPU (iGPU or dGPU).
* `OPTIMIZATION_CAPABILITIES` - The model data types (INT8, FP16, FP32, etc) that are supported by this GPU.
* `GPU_EXECUTION_UNITS_COUNT` - The execution cores available in the GPU's architecture, which is a relative measure of the GPU's processing power.
* `RANGE_FOR_STREAMS` - The number of processing streams available on the GPU that can be used to execute parallel inference requests. When compiling a model in LATENCY or THROUGHPUT mode, OpenVINO will automatically select the best number of streams for low latency or high throughput.
* `PERFORMANCE_HINT` - A high-level way to tune the device for a specific performance metric, such as latency or throughput, without worrying about device-specific settings.
* `CACHE_DIR` - The directory where the model cache data is stored to speed up compilation time.


To learn more about devices and properties, see the [Query Device Properties](https://docs.openvino.ai/latest/openvino_docs_OV_UG_query_api.html) page.
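
The raw property values are easier to read after a little post-processing. The sketch below uses a hand-copied subset of the listing above rather than a live `get_property` call, so it runs without a GPU; the helper names are hypothetical.

```python
# Values copied from the SUPPORTED_PROPERTIES listing above.
properties = {
    "GPU_DEVICE_TOTAL_MEM_SIZE": 68702699520,
    "GPU_EXECUTION_UNITS_COUNT": 104,
    "RANGE_FOR_STREAMS": (1, 2),
    "OPTIMIZATION_CAPABILITIES": ['FP32', 'BIN', 'FP16', 'INT8', 'EXPORT_IMPORT'],
}


def bytes_to_gib(num_bytes):
    """Convert a size in bytes to GiB (2**30 bytes)."""
    return num_bytes / 2**30


def supports_precision(props, precision):
    """Check whether a model data type appears in OPTIMIZATION_CAPABILITIES."""
    return precision in props["OPTIMIZATION_CAPABILITIES"]


mem_gib = bytes_to_gib(properties["GPU_DEVICE_TOTAL_MEM_SIZE"])
min_streams, max_streams = properties["RANGE_FOR_STREAMS"]

print(f"Total memory: {mem_gib:.2f} GiB")
print(f"Streams: {min_streams} to {max_streams}")
print(f"FP16 supported: {supports_precision(properties, 'FP16')}")
```

A check like `supports_precision` is useful before converting a model to `FP16`, as we do in the next section.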

## Compiling a Model on GPU

Now, we know how to list the GPUs in the system and check their properties. We can easily use one for compiling and running models with OpenVINO [GPU plugin](https://docs.openvino.ai/latest/openvino_docs_OV_UG_supported_plugins_GPU.html).

### Download and Convert a Model

This tutorial uses the `ssdlite_mobilenet_v2` model, which is used for object detection. The model was trained on the version of the [Common Objects in Context (COCO)](https://cocodataset.org/#home) dataset with 91 categories of objects. For details, see the [paper](https://arxiv.org/abs/1801.04381).

#### Download and unpack the Model

Use the `download_file` function from the `notebook_utils` to download an archive with the model. It automatically creates a directory structure and downloads the selected model. This step is skipped if the package is already downloaded.

In [4]:
import sys
import tarfile
from pathlib import Path

sys.path.append("../utils")

import notebook_utils as utils

# A directory where the model will be downloaded.
base_model_dir = Path("./model").expanduser()

model_name = "ssdlite_mobilenet_v2"
archive_name = Path(f"{model_name}_coco_2018_05_09.tar.gz")

# Download the archive
downloaded_model_path = base_model_dir / archive_name
if not downloaded_model_path.exists():
    model_url = f"http://download.tensorflow.org/models/object_detection/{archive_name}"
    utils.download_file(model_url, downloaded_model_path.name, downloaded_model_path.parent)

# Unpack the model
tf_model_path = base_model_dir / archive_name.with_suffix("").stem / "frozen_inference_graph.pb"
if not tf_model_path.exists():
    with tarfile.open(downloaded_model_path) as file:
        file.extractall(base_model_dir)
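
The expression `archive_name.with_suffix("").stem` in the cell above may look odd at first. A `.tar.gz` file has two suffixes, so one has to be removed by `with_suffix("")` and the other by `.stem`. The following standalone sketch shows each step using only `pathlib`:

```python
from pathlib import Path

archive_name = Path("ssdlite_mobilenet_v2_coco_2018_05_09.tar.gz")

without_gz = archive_name.with_suffix("")  # drops the final ".gz"
folder_name = without_gz.stem              # drops the remaining ".tar"

print(archive_name.suffixes)
print(without_gz)
print(folder_name)
```

The resulting `folder_name` is the directory that the archive unpacks into, which is why the cell uses it to locate `frozen_inference_graph.pb`.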

#### Convert the Model to OpenVINO IR format

Use Model Optimizer to convert the model to OpenVINO IR with `FP16` precision. The model is saved to the `model/ir_model/` directory. For more information about Model Optimizer, see the [Model Optimizer Developer Guide](https://docs.openvino.ai/latest/openvino_docs_MO_DG_Deep_Learning_Model_Optimizer_DevGuide.html).

In [5]:
from openvino.tools import mo
from openvino.runtime import serialize
from openvino.tools.mo.front import tf as ov_tf_front

precision = 'FP16'

# The output path for the conversion.
model_path = base_model_dir / 'ir_model' / f'{model_name}_{precision.lower()}.xml'

trans_config_path = Path(ov_tf_front.__file__).parent / "ssd_v2_support.json"
pipeline_config = base_model_dir / archive_name.with_suffix("").stem / "pipeline.config"

model = None
if not model_path.exists():
    model = mo.convert_model(input_model=tf_model_path,
                             output_dir=base_model_dir / 'ir_model',
                             model_name=f'{model_name}_{precision.lower()}',
                             input_shape=[1, 300, 300, 3],
                             layout='NHWC',
                             compress_to_fp16=(precision == 'FP16'),
                             transformations_config=trans_config_path,
                             tensorflow_object_detection_api_pipeline_config=pipeline_config,
                             reverse_input_channels=True)
    serialize(model, str(model_path))
    print("IR model saved to {}".format(model_path))
else:
    print("Read IR model from {}".format(model_path))
    model = core.read_model(model_path)

Read IR model from model/ir_model/ssdlite_mobilenet_v2_fp16.xml


### Compile with Default Configuration

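Once the model is ready (it was either converted or read with the `read_model` method in the previous cell), we can use the `compile_model` method and specify the name of the device we want to compile the model on. Here we pass the `device` variable defined earlier ("GPU.1" in this example); passing "GPU" would select the default GPU.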

In [6]:
compiled_model = core.compile_model(model, device)

If you have multiple GPUs in the system, you can specify which one to use by using "GPU.0", "GPU.1", etc. Any of the device names returned by the `available_devices` method are valid device specifiers. You may also use "AUTO", which will automatically select the best device for inference (which is often the GPU).

Depending on the model used, device-specific optimizations and network compilations can make the compile step time-consuming, especially with larger models. This may lead to a bad user experience in the application that uses the model. To solve this, OpenVINO can cache the model once it is compiled on supported devices and reuse it in later `compile_model` calls by simply setting a cache folder beforehand. For instance, to cache the same model we compiled above, we can do the following:

In [7]:
import time
from pathlib import Path

# Create cache folder
cache_folder = Path("cache")
cache_folder.mkdir(exist_ok=True)

start = time.time()
core = Core()

# Set cache folder
core.set_property({'CACHE_DIR': str(cache_folder)})  # pass the folder path as a string
#core.set_property({'CACHE_MODE': ''})

# Compile the model as before
model = core.read_model(model=model_path)
compiled_model = core.compile_model(model, device)
# compiled_model = core.compile_model(model, "CPU")
print(f"Cache enabled (first time) - compile time: {time.time() - start}s")

Cache enabled (first time) - compile time: 6.006357908248901s
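
The benefit of caching only shows up on later compilations, so it helps to time the same call more than once and compare. Below is a minimal timing helper written with the standard library only. The helper names are hypothetical, and a stand-in workload is used so the sketch runs without a GPU.

```python
import time


def time_call(fn, *args, **kwargs):
    """Run fn(*args, **kwargs) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


def speedup(first_seconds, second_seconds):
    """How many times faster the second run was compared to the first."""
    return first_seconds / second_seconds


# Stand-in workload. With OpenVINO available, the call would instead be
# time_call(core.compile_model, model, device), run once to fill the cache
# and once more to read from it.
result, elapsed = time_call(sum, range(1000))
print(f"result={result}, elapsed={elapsed:.6f}s")
```

Running the compile step twice with `CACHE_DIR` set and passing both timings to `speedup` gives a rough idea of how much the cache helps on your system.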


## Basic Application Using GPUs

We will now show an end-to-end object detection example using GPUs in OpenVINO. The application compiles a model on GPU (the "THROUGHPUT" hint can be enabled at compile time), then loads a video and preprocesses every frame to convert it to the shape expected by the model. Once the frames are loaded, it sets up an asynchronous pipeline, performs inference and saves the detections found in each frame. The detections are then drawn on their corresponding frame and saved as a video, which is displayed at the end of the application.

### Import Necessary Packages

In [8]:
import time
from pathlib import Path

import cv2
import numpy as np
from IPython.display import Video
from openvino.runtime import AsyncInferQueue, Core, InferRequest

# Instantiate OpenVINO Runtime
core = Core()
#core.available_devices

### Compile the Model

In [9]:
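# Read the model and compile it on a GPU.
# Adjust device_name to one of the devices listed by core.available_devices.
model = core.read_model(model=model_path)
device_name = "GPU.2"

# The default configuration is used here. To compile in THROUGHPUT mode, use the commented-out line instead.
# compiled_model = core.compile_model(model=model, device_name=device_name, config={"PERFORMANCE_HINT": "THROUGHPUT"})
compiled_model = core.compile_model(model=model, device_name=device_name)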

# Get the input and output nodes
input_layer = compiled_model.input(0)
output_layer = compiled_model.output(0)

# Get the input size
num, height, width, channels = input_layer.shape
print('Model input shape:', num, height, width, channels)

Model input shape: 1 300 300 3
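
The `config` argument of `compile_model` takes a plain dictionary of property names and values, such as the `PERFORMANCE_HINT` and `CACHE_DIR` properties described earlier. One way to assemble that dictionary is sketched below. The `build_config` helper is hypothetical, and it only accepts the two hints discussed in this tutorial.

```python
# The two performance hints discussed in this tutorial.
VALID_HINTS = ("LATENCY", "THROUGHPUT")


def build_config(hint=None, cache_dir=None):
    """Build a config dictionary for compile_model from optional settings."""
    config = {}
    if hint is not None:
        hint = hint.upper()
        if hint not in VALID_HINTS:
            raise ValueError(f"Unknown performance hint: {hint}")
        config["PERFORMANCE_HINT"] = hint
    if cache_dir is not None:
        config["CACHE_DIR"] = str(cache_dir)
    return config


print(build_config("throughput"))
print(build_config("LATENCY", cache_dir="cache"))
```

The result can then be passed along as `core.compile_model(model=model, device_name=device_name, config=build_config("THROUGHPUT"))`.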


### Load and Preprocess Video Frames 

In [10]:
# Load video
video_file = "../data/video/Coco Walking in Berkeley.mp4"
video = cv2.VideoCapture(video_file)
framebuf = []

# Go through every frame of video and resize it
print('Loading video...')
while video.isOpened():
    ret, frame = video.read()
    if not ret:
        print('Video loaded!')
        video.release()
        break
    
    # Preprocess frames - convert them to shape expected by model
    input_frame = cv2.resize(src=frame, dsize=(width, height), interpolation=cv2.INTER_AREA)
    input_frame = np.expand_dims(input_frame, axis=0)

    # Append frame to framebuffer
    framebuf.append(input_frame)
    

print('Frame shape: ', framebuf[0].shape)
print('Number of frames: ', len(framebuf))

# Show original video file
# If the video does not display correctly inside the notebook, please open it with your favorite media player
Video(video_file)

Loading video...
Video loaded!
Frame shape:  (1, 300, 300, 3)
Number of frames:  288


### Define Model Output Classes

In [11]:
# Define the model's labelmap (this model uses COCO classes)
classes = [
    "background", "person", "bicycle", "car", "motorcycle", "airplane", "bus", "train",
    "truck", "boat", "traffic light", "fire hydrant", "street sign", "stop sign",
    "parking meter", "bench", "bird", "cat", "dog", "horse", "sheep", "cow", "elephant",
    "bear", "zebra", "giraffe", "hat", "backpack", "umbrella", "shoe", "eye glasses",
    "handbag", "tie", "suitcase", "frisbee", "skis", "snowboard", "sports ball", "kite",
    "baseball bat", "baseball glove", "skateboard", "surfboard", "tennis racket", "bottle",
    "plate", "wine glass", "cup", "fork", "knife", "spoon", "bowl", "banana", "apple",
    "sandwich", "orange", "broccoli", "carrot", "hot dog", "pizza", "donut", "cake", "chair",
    "couch", "potted plant", "bed", "mirror", "dining table", "window", "desk", "toilet",
    "door", "tv", "laptop", "mouse", "remote", "keyboard", "cell phone", "microwave", "oven",
    "toaster", "sink", "refrigerator", "blender", "book", "clock", "vase", "scissors",
    "teddy bear", "hair drier", "toothbrush", "hair brush"
]
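
The model reports each detection's class as a number, which is used as an index into `classes`. A small lookup with a bounds check avoids an `IndexError` if an unexpected id appears. This is a sketch: the `label_name` helper is hypothetical, and only the first 19 entries of the list above are repeated to keep it short.

```python
# First 19 entries of the labelmap above (index 18 is "dog").
classes = [
    "background", "person", "bicycle", "car", "motorcycle", "airplane", "bus", "train",
    "truck", "boat", "traffic light", "fire hydrant", "street sign", "stop sign",
    "parking meter", "bench", "bird", "cat", "dog",
]


def label_name(class_id, classes):
    """Map a (possibly float) class id to its name, tolerating unknown ids."""
    class_id = int(class_id)
    if 0 <= class_id < len(classes):
        return classes[class_id]
    return f"unknown({class_id})"


print(label_name(18.0, classes))
print(label_name(500, classes))
```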

### Set up Pipeline

#### Callback Definition

In [12]:
# Define a callback function that runs every time the asynchronous pipeline completes inference on a frame
def completion_callback(infer_request: InferRequest, frame_id: int) -> None:
    global frame_number
    stop_time = time.time()
    frame_number += 1

    predictions = next(iter(infer_request.results.values()))
    results[frame_id] = predictions[:10]  # Grab first 10 predictions for this frame
    
    total_time = stop_time - start_time
    frame_fps[frame_id] = frame_number / total_time

#### Create Pipeline

In [13]:
# Create asynchronous inference queue with optimal number of infer requests
infer_queue = AsyncInferQueue(compiled_model)
infer_queue.set_callback(completion_callback)
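
Because the queue runs several requests at once, callbacks can fire in a different order from the one in which the frames were submitted. This is why the callback stores its output in a dictionary keyed by `frame_id`. The sketch below illustrates the same pattern with Python's standard `concurrent.futures` module. It is only an analogy and does not use the OpenVINO API; `fake_infer` is a stand-in for inference.

```python
import time
from concurrent.futures import ThreadPoolExecutor

results = {}


def fake_infer(frame):
    """Stand-in for inference: later frames finish sooner to force reordering."""
    time.sleep(0.01 * (3 - frame))
    return frame * 10


def make_callback(frame_id):
    """Create a callback that files the result under its frame id."""
    def on_done(future):
        results[frame_id] = future.result()
    return on_done


with ThreadPoolExecutor(max_workers=4) as pool:
    for frame_id in range(4):
        future = pool.submit(fake_infer, frame_id)
        future.add_done_callback(make_callback(frame_id))
# Leaving the "with" block waits for all tasks, similar in spirit to wait_all().

ordered = [results[i] for i in sorted(results)]
print(ordered)
```

Even though the tasks complete out of order, reading the dictionary by sorted key restores the original frame order.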

### Perform Inference

In [14]:
# Perform inference on every frame in the frame buffer
results = {}
frame_fps = {}
frame_number = 0
start_time = time.time()
for i, input_frame in enumerate(framebuf):
    infer_queue.start_async({0: input_frame}, i)

infer_queue.wait_all()  # Wait until all inference requests in the AsyncInferQueue are completed
stop_time = time.time()

# Calculate total inference time and FPS
total_time = stop_time - start_time
fps = len(framebuf) / total_time
time_per_frame = 1 / fps 
print("With model caching:")
print(f'Total time to infer all frames: {total_time:.3f}s')
print(f'Time per frame: {time_per_frame:.6f}s ({fps:.2f} FPS)')

With model caching:
Total time to infer all frames: 1.207s
Time per frame: 0.004190s (238.65 FPS)
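
The two figures printed above are related by simple arithmetic, shown in the sketch below with the rounded numbers from the output (so the result differs slightly from the printed FPS). Note that with an asynchronous pipeline several frames are in flight at once, so the time per frame is an average over the whole run and not the latency of a single request. The `throughput` helper is hypothetical.

```python
def throughput(num_frames, total_seconds):
    """Return (frames per second, average seconds per frame)."""
    if total_seconds <= 0:
        raise ValueError("total_seconds must be positive")
    fps = num_frames / total_seconds
    return fps, 1.0 / fps


# Numbers copied from the outputs above: 288 frames in about 1.207 s.
fps, seconds_per_frame = throughput(288, 1.207)
print(f"{fps:.1f} FPS, {seconds_per_frame * 1000:.2f} ms per frame on average")
```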


### Process Results

In [15]:
# Set minimum detection threshold
min_thresh = .6

# Load video
video = cv2.VideoCapture(video_file)

# Get video parameters
frame_width = int(video.get(cv2.CAP_PROP_FRAME_WIDTH))
frame_height = int(video.get(cv2.CAP_PROP_FRAME_HEIGHT))
fps = int(video.get(cv2.CAP_PROP_FPS))
fourcc = int(video.get(cv2.CAP_PROP_FOURCC))
# Create the output folder and a VideoWriter to save the annotated video.
# The codec (fourcc) is reused from the source file. If your OpenCV build cannot
# encode it, substitute another codec such as cv2.VideoWriter_fourcc(*"mp4v").
Path('./output').mkdir(exist_ok=True)
output_file = 'output/output.mp4'
output = cv2.VideoWriter(output_file, fourcc, fps, (frame_width, frame_height))

label = None  # Last label drawn; stays None if no detection passes the threshold

# Draw detection results on every frame of video and save as a new video file
while video.isOpened():
    current_frame = int(video.get(cv2.CAP_PROP_POS_FRAMES))
    ret, frame = video.read()
    if not ret:
        print('Video loaded!')
        output.release()
        video.release()
        break

    # Draw info at the top left such as current fps, the devices and the performance hint being used
    cv2.putText(frame, f"fps {str(round(frame_fps[current_frame], 2))}", (5, 20), cv2.FONT_ITALIC, 0.6, (0, 0, 0), 1, cv2.LINE_AA)
    cv2.putText(frame, f"device {device_name}", (5, 40), cv2.FONT_ITALIC, 0.6, (0, 0, 0), 1, cv2.LINE_AA)
    cv2.putText(frame, f"hint {compiled_model.get_property('PERFORMANCE_HINT').name}", (5, 60), cv2.FONT_ITALIC, 0.6, (0, 0, 0), 1, cv2.LINE_AA)

    # prediction contains [image_id, label, conf, x_min, y_min, x_max, y_max] according to model
    for prediction in np.squeeze(results[current_frame]):
        if prediction[2] > min_thresh:
            x_min = int(prediction[3] * frame_width)
            y_min = int(prediction[4] * frame_height)
            x_max = int(prediction[5] * frame_width)
            y_max = int(prediction[6] * frame_height)
            label = classes[int(prediction[1])]

            # Draw a bounding box with its label above it
            cv2.rectangle(frame, (x_min, y_min), (x_max, y_max), (0,255,0), 1, cv2.LINE_AA)
            cv2.putText(frame, label, (x_min, y_min - 10), cv2.FONT_ITALIC, 1, (255,0,0), 1, cv2.LINE_AA)

    # Save the annotated frame to the output video
    output.write(frame)

print("object detected:", label)

# Show the annotated video file
# If the video does not display correctly inside the notebook, please open it with your favorite media player
Video(output_file)

Video loaded!
object detected: dog
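
The model returns box coordinates normalized to the range 0 to 1, which is why the loop above multiplies them by the frame width and height. Values slightly outside that range can occur near the frame border, so clamping can be a sensible extra step. The sketch below is a standalone variant of that conversion. The `to_pixel_box` helper and the sample prediction row are hypothetical.

```python
def to_pixel_box(prediction, frame_width, frame_height):
    """Convert a prediction row to a pixel box clamped to the frame.

    prediction is [image_id, label, conf, x_min, y_min, x_max, y_max]
    with coordinates normalized to the frame size.
    """
    def clamp(value, size):
        return max(0, min(int(value), size - 1))

    x_min = clamp(prediction[3] * frame_width, frame_width)
    y_min = clamp(prediction[4] * frame_height, frame_height)
    x_max = clamp(prediction[5] * frame_width, frame_width)
    y_max = clamp(prediction[6] * frame_height, frame_height)
    return x_min, y_min, x_max, y_max


# Hypothetical prediction whose right edge falls just outside the frame.
prediction = [0, 18, 0.87, 0.25, 0.5, 1.02, 0.75]
box = to_pixel_box(prediction, 640, 480)
print(box)
```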


## Conclusion

This tutorial demonstrates how easy it is to use one or more GPUs in OpenVINO, check their properties, and even tailor the model performance through the different performance hints. It also provides a walk-through of a basic object detection application that uses a GPU and displays the detected bounding boxes.
