# DPU example: Resnet50
----

## Aim/s
* This notebook shows an example of a DPU application. The application, as well as the DPU IP, is pulled from the official 
[Vitis AI Github Repository](https://github.com/Xilinx/Vitis-AI).

## References
* [Vitis AI product page](https://www.xilinx.com/products/design-tools/vitis/vitis-ai.html).

## Last revised
* Mar 3, 2021
    * Initial revision
----

## 1. Prepare the overlay
We will download the overlay onto the board. 

The `load_model()` method will automatically prepare the `graph`
which is used by VART.

In [1]:
from pynq_dpu import DpuOverlay
overlay = DpuOverlay("dpu.bit")
overlay.load_model("mnist_classifier_TFM_v5.xmodel")

<div class="alert alert-heading alert-info">
Starting from Vitis AI 1.3, xmodel files will be used as the models
instead of elf files.
</div>

## 2. Utility functions

In this section, we will prepare a few functions for later use.

In [2]:
import os
import time
import numpy as np
import cv2
import matplotlib.pyplot as plt
%matplotlib inline

Let's first define a few useful preprocessing functions. These functions
will make sure the DPU can take input images with arbitrary sizes.

In [3]:
_R_MEAN = 123.68
_G_MEAN = 116.78
_B_MEAN = 103.94

MEANS = [_B_MEAN,_G_MEAN,_R_MEAN]

def resize_shortest_edge(image, size):
    H, W = image.shape[:2]
    if H >= W:
        nW = size
        nH = int(float(H)/W * size)
    else:
        nH = size
        nW = int(float(W)/H * size)
    return cv2.resize(image,(nW,nH))

def mean_image_subtraction(image, means):
    B, G, R = cv2.split(image)
    B = B - means[0]
    G = G - means[1]
    R = R - means[2]
    image = cv2.merge([R, G, B])
    return image

def BGR2RGB(image):
    B, G, R = cv2.split(image)
    image = cv2.merge([R, G, B])
    return image

def central_crop(image, crop_height, crop_width):
    image_height = image.shape[0]
    image_width = image.shape[1]
    offset_height = (image_height - crop_height) // 2
    offset_width = (image_width - crop_width) // 2
    return image[offset_height:offset_height + crop_height, offset_width:
                 offset_width + crop_width, :]

def normalize(image):
    image=image/256.0
    image=image-0.5
    image=image*2
    return image

def preprocess_fn(image, crop_height = 32, crop_width = 32):
    image = resize_shortest_edge(image, 32)
    image = mean_image_subtraction(image, MEANS)
    image = central_crop(image, crop_height, crop_width)
    return image
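
To make the resize-then-crop arithmetic concrete, here is a minimal sketch that reproduces only the shape calculations of `resize_shortest_edge` and `central_crop`, without touching pixel data. The helper names `resized_shape` and `crop_offsets` are hypothetical, and the input of 480 rows by 640 columns is an assumed example.

```python
def resized_shape(height, width, size):
    """Return (new_height, new_width) so the shortest edge equals `size`."""
    if height >= width:
        return int(float(height) / width * size), size
    return size, int(float(width) / height * size)


def crop_offsets(height, width, crop_height, crop_width):
    """Return the (row, column) offsets of a centered crop window."""
    return (height - crop_height) // 2, (width - crop_width) // 2


# Assumed example: a frame with 480 rows and 640 columns.
new_h, new_w = resized_shape(480, 640, 32)
offsets = crop_offsets(new_h, new_w, 32, 32)
print(new_h, new_w, offsets)  # 32 42 (0, 5)
```

In this example the shorter edge (the height) is scaled to 32, the width becomes 42, and the central crop then discards 5 columns on each side.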

We will also define a few functions to calculate softmax and provide 
the output class after running a DPU task.

In [4]:
softmax = [1, 0, 0, 0.55, 0]
lines = ['A','B','C','D','E']
print(np.argmax(softmax))
print(lines[np.argmax(softmax)-1])
print(lines[np.argmax(softmax)])


0
E
A


In [5]:
def calculate_softmax(data):
    # Unnormalized exponentials: enough for argmax, but not probabilities.
    result = np.exp(data)
    return result

# NOTE: the -1 is wrong. It shifts the lookup by one position,
# so index 0 wraps around to the last element of the list.

#softmax = [1, 0, 0, 0.55, 0]
#lines = ['A','B','C','D','E']
#print(np.argmax(softmax))
#print(lines[np.argmax(softmax)-1])
#print(lines[np.argmax(softmax)])

def predict_label(softmax):
    with open("imagenes_test_DPU_2904/words.txt", "r") as f:
        lines = f.readlines()
    #return lines[np.argmax(softmax)-1]
    return lines[np.argmax(softmax)]
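
Note that `calculate_softmax` above returns plain exponentials without dividing by their sum. That does not change the result of `np.argmax`, but the values are not probabilities. If probabilities are needed, a numerically stable softmax could look like the sketch below; `stable_softmax` is a hypothetical name and the input scores are made up for illustration.

```python
import math


def stable_softmax(scores):
    """Return softmax probabilities for a sequence of raw scores."""
    # Subtracting the maximum avoids overflow in exp() and leaves the result unchanged.
    shift = max(scores)
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


probs = stable_softmax([2.0, 1.0, 0.1, -1.0, 0.5])
print(probs.index(max(probs)))  # 0: the largest score keeps the largest probability
```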

Keep in mind that our original images are 640x480, so we need to preprocess them
later to make sure they fit our model.

In [6]:
import re
def sorted_alphanumeric(data):
    convert = lambda text: int(text) if text.isdigit() else text.lower()
    alphanum_key = lambda key: [ convert(c) for c in re.split('([0-9]+)', key) ] 
    return sorted(data, key=alphanum_key)
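
The reason for `sorted_alphanumeric` is that a plain lexicographic sort puts `10.png` before `2.png`. That presumably matters here because the predictions are later compared line by line with a label file. A small self-contained sketch of the same idea, with made-up file names and a hypothetical helper `natural_key`:

```python
import re


def natural_key(name):
    """Split a name into text and integer chunks so digits compare numerically."""
    return [int(part) if part.isdigit() else part.lower()
            for part in re.split(r"([0-9]+)", name)]


names = ["10.png", "2.png", "1.png"]
print(sorted(names))                   # ['1.png', '10.png', '2.png']
print(sorted(names, key=natural_key))  # ['1.png', '2.png', '10.png']
```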

In [7]:
image_folder = 'imagenes_test_DPU_2904'
original_images = [i for i in sorted_alphanumeric(os.listdir(image_folder)) if i.endswith("png")]
total_images = len(original_images)
print(len(original_images))

33626


## 3. Use VART
Now we should be able to use VART to do image classification.

In [8]:
dpu = overlay.runner

inputTensors = dpu.get_input_tensors()
outputTensors = dpu.get_output_tensors()

shapeIn = tuple(inputTensors[0].dims)
shapeOut = tuple(outputTensors[0].dims)
outputSize = int(outputTensors[0].get_data_size() / shapeIn[0])
softmax = np.empty(outputSize)

In [9]:
# Check that this matches nClasses
outputTensors[0].dims

[1, 5]

We can define a few buffers to store input and output data. They will be reused
during multiple runs.

In [10]:
output_data = [np.empty(shapeOut, dtype=np.float32, order="C")]
input_data = [np.empty(shapeIn, dtype=np.float32, order="C")]
image = input_data[0]
print(shapeIn)

(1, 32, 32, 1)
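
Because `image = input_data[0]` binds a second name to the same array rather than copying it, writing into `image` fills the input buffer that is later handed to the DPU. The sketch below shows this with plain NumPy and no DPU involved; the shape is taken from the output above and the pixel values are random placeholders.

```python
import numpy as np

shape_in = (1, 32, 32, 1)  # batch, height, width, channels, as printed above
input_data = [np.empty(shape_in, dtype=np.float32, order="C")]
image = input_data[0]      # same array object, not a copy

# Placeholder for a 32x32 grayscale image.
fake_image = np.random.rand(32, 32).astype(np.float32)
image[0, ...] = fake_image.reshape(shape_in[1:])

print(image is input_data[0])                                  # True
print(np.array_equal(input_data[0][0, :, :, 0], fake_image))   # True
```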


Remember that we have a list of `original_images`. 
We can now define a new function `run()` which takes the image index as 
the input and calculates the softmax as the classification result.
With the argument `display` set to `True`, the predicted label is printed.

`run()` relies on the globals `dir_img` and `file`, which are defined in later
cells, so those cells must be executed before `run()` is called.

The range of `image_index` should be [0, `total_images`-1].

In [11]:
def run(image_index, display=False):
    #preprocessed = preprocess_fn(cv2.imread(os.path.join(image_folder, original_images[image_index])))
    #preprocessed = plt.imread(os.path.join(image_folder, original_images[image_index]))
    preprocessed = plt.imread(dir_img[image_index])
    image[0,...] = preprocessed.reshape(shapeIn[1:])
    job_id = dpu.execute_async(input_data, output_data)
    dpu.wait(job_id)
    temp = [j.reshape(1, outputSize) for j in output_data]
    softmax = calculate_softmax(temp[0][0])
    
    # Comment this out when measuring latency
    file.write(predict_label(softmax))
    
    if display:
        #preprocessed = cv2.imread(os.path.join(image_folder, original_images[image_index]))
        #display_image = cv2.imread(os.path.join(image_folder, original_images[image_index]))
        #_, ax = plt.subplots(1)
        #_ = ax.imshow(cv2.cvtColor(display_image, cv2.COLOR_BGR2RGB))
        print(predict_label(softmax))
        #print("Classification: {}".format(predict_label(softmax)))
        #print(image_index)
        #print(original_images[image_index])

Let's first check the file ordering by printing the first 10 image names.

In [12]:
print(original_images[0:10])

['1.png', '2.png', '3.png', '4.png', '5.png', '6.png', '7.png', '8.png', '9.png', '10.png']


We can also run it for multiple images as shown below. In this example
we have only used 1 thread; in principle, users should be able to boost
the performance by employing more threads.

In [13]:
file = open("labels_predict_DPU_2904.txt","w")

In [14]:
# Preload the list of image paths

dir_img = []
for i in range(total_images):
    dir_img.append(os.path.join(image_folder, original_images[i]))
    
#print(dir_img[5])

In [15]:
time1 = time.time()
for i in range(total_images):
    run(i)
#[run(i,display=True) for i in range(5)]
time2 = time.time()
fps = total_images/(time2-time1)
print("Performance: {} FPS".format(fps))
print("Total time: {} [s]".format(time2-time1))

Performance: 658.3805460416836 FPS
Total time: 51.07380557060242 [s]
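
The multi-threading idea mentioned above can be sketched with `concurrent.futures`. The sketch uses a stand-in `classify` function rather than the DPU, because whether one runner can be shared across threads, or each thread needs its own runner and buffers, is not covered in this notebook and is an assumption to verify. Both `classify` and `classify_all` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor


def classify(index):
    """Stand-in for run(index); a real version would call the DPU."""
    return index % 5  # pretend class id in [0, 4]


def classify_all(indices, workers=4):
    """Classify all indices with a thread pool, preserving input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(classify, indices))


predictions = classify_all(range(10))
print(predictions)  # [0, 1, 2, 3, 4, 0, 1, 2, 3, 4]
```

Note that `run()` as written fills a single shared input buffer and writes to a single open file, so it could not be handed to a thread pool unchanged.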


In [16]:
file.close()

In [17]:
# Load the test labels and compute the accuracy
datos_labels_test = []
with open('labels_test_2904.txt') as f:
    for linea in f.readlines():
        datos_labels_test.append(linea.strip())
        

datos_labels_DPU = []
with open('labels_predict_DPU_2904.txt') as f:
    for linea in f.readlines():
        datos_labels_DPU.append(linea.strip())
        
suma = 0
for i in range(len(datos_labels_DPU)):
    if datos_labels_test[i] == datos_labels_DPU[i]:
        suma = suma + 1

total_test = suma/len(datos_labels_DPU) 

print(total_test)
        

0.6280259323142806
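
The accuracy loop above can be written more compactly. The following sketch, with made-up labels, also checks that both lists have the same length before comparing them; `accuracy` is a hypothetical helper name.

```python
def accuracy(expected, predicted):
    """Return the fraction of positions where the two label lists agree."""
    if len(expected) != len(predicted):
        raise ValueError("label lists differ in length")
    if not expected:
        return 0.0
    matches = sum(e == p for e, p in zip(expected, predicted))
    return matches / len(expected)


print(accuracy(["A", "B", "C", "A"], ["A", "B", "A", "A"]))  # 0.75
```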


In [26]:
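# Test the run() code:

def run_sin_carga_img():
    # NOTE: dpu.wait(job_id) is not called here, so the timing may cover only
    # job submission and output_data may not hold the finished result yet.
    job_id = dpu.execute_async(input_data, output_data)
    temp = [j.reshape(1, outputSize) for j in output_data]
    softmax = calculate_softmax(temp[0][0])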


image_index = 1
#preprocessed = preprocess_fn(cv2.imread(os.path.join(image_folder, original_images[image_index])))
time1 = time.time()
preprocessed = plt.imread(dir_img[image_index])
image[0,...] = preprocessed.reshape(shapeIn[1:])
time2 = time.time()
job_id = dpu.execute_async(input_data, output_data)
time3 = time.time()
#dpu.wait(job_id)
temp = [j.reshape(1, outputSize) for j in output_data]
softmax = calculate_softmax(temp[0][0])
time4 = time.time()
print(predict_label(softmax))

# Execute the run without loading images (imread() is not called)
time5 = time.time()
[run_sin_carga_img() for i in range(total_images)]
time6 = time.time()

print(total_images)

print("Image load time:", time2-time1)
print("CNN processing time for one image:", time3-time2)
print("CNN processing + softmax time:", time4-time2)
print("Softmax computation time:", time4-time3)
print("Total time:",time4-time1)
print("Total time without preloading images:",time6-time5)
print("FPS without preloading images:",total_images/(time6-time5))


Chat

33626
Image load time: 0.0014162063598632812
CNN processing time for one image: 0.0007245540618896484
CNN processing + softmax time: 0.0013904571533203125
Softmax computation time: 0.0006659030914306641
Total time: 0.0028066635131835938
Total time without preloading images: 7.986032724380493
FPS without preloading images: 4210.601328660157
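
For sub-millisecond measurements like the ones above, `time.perf_counter()` is generally a better fit than `time.time()`, since it is a monotonic clock with the highest available resolution. A minimal timing helper might look as follows; `timed` is a hypothetical name and the workload is a placeholder.

```python
import time


def timed(func, *args, repeats=100):
    """Return the mean wall-clock seconds per call of func(*args)."""
    start = time.perf_counter()
    for _ in range(repeats):
        func(*args)
    return (time.perf_counter() - start) / repeats


mean_seconds = timed(sum, range(1000))
print("Mean time per call: {:.6f} s".format(mean_seconds))
```

When timing the DPU this way, the timed function should presumably include `dpu.wait(job_id)`, otherwise the measurement may cover only the job submission.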


In [None]:
print(len(datos_labels_test))