## Upload the .rbf

Upload the .rbf file of the ORNN accelerator using the provided makefile. Make sure that all 8 LEDs go high, which indicates that the accelerator .rbf has been loaded.

In [1]:
!make install

cp DE10_NANO.rbf /lib/firmware/
dtbocfg.rb --install soc_system --dts soc_system.dts


## Necessary Imports

In [2]:
import time
import numpy as np
from pycy import *
from PIL import Image

## Quantizing the Input

Next, we define a function that quantizes the input image to signed 8-bit values (Q1.7 format) and packs them into 64-bit words. The reason for the 64-bit packing is discussed later, where the IO buffers are defined.

In [10]:
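def preprocess_image(inp):
    # Clip to the representable Q1.7 range [-1, 1 - 2**-7]
    inp = np.clip(inp, -1, 1-2**-7)
    # Scale by 2**7 and round to the nearest integer code
    inp = np.round(inp*2**7)
    # Store as int8, then reinterpret every 8 bytes as one uint64 word
    inp = inp.astype(np.int8).view(np.uint64)
    return inp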
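As a quick check of the Q1.7 mapping, the sketch below applies the same clip, scale, and round steps to a few hand-picked values. The helper name `quantize_q1_7` and the sample values are illustrative assumptions and are not part of the notebook.

```python
import numpy as np

def quantize_q1_7(x):
    """Clip to the Q1.7 range, scale by 2**7, round, and store as int8."""
    x = np.clip(x, -1, 1 - 2**-7)
    return np.round(x * 2**7).astype(np.int8)

# Eight illustrative samples, so they fill exactly one 64-bit word.
samples = np.array([-2.0, -1.0, -0.5, 0.0, 0.25, 0.5, 0.99, 1.5])
q = quantize_q1_7(samples)
print(q.tolist())  # [-128, -128, -64, 0, 32, 64, 127, 127]

# Reinterpret the 8 bytes as a single uint64 word, as preprocess_image does.
packed = q.view(np.uint64)
print(packed.shape)  # (1,)

# Viewing the word as int8 again recovers the original codes.
unpacked = packed.view(np.int8)
```

Values outside the range saturate at -128 or 127 rather than wrapping around, which is why the clip comes before the cast. The `view` call only works when the number of int8 values is a multiple of 8.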

## Register Address of the Accelerator

Here we will define the necessary register addresses for controlling the ORNN accelerator. These addresses are taken directly from the files `hps_0.h` and `ACCL_TOP_csr.h`.

In [11]:
accl_offset = 0xff200000
start_reg = 0x8
interrupt_enable_reg = 0x10
interrupt_status_reg = 0x18
input_buffer_reg = 0x20
output_buffer_reg = 0x28
seq_len_reg = 0x30

## Defining IO buffer

We will now define contiguous memory buffers for reading from and writing to the accelerator. These buffers hold the input that is passed to the accelerator and the output scores produced by the accelerator. They are 64-bit buffers because the Avalon bridge is 64 bits wide. The `cma_buffer` class is defined in the `pycy.py` file.

In [12]:
SEQ_LEN = 35
INPUT_BW = 8
OUTPUT_BW = 16
ELEMENTS_IN = 16*SEQ_LEN*INPUT_BW//64 
ELEMENTS_OUT = 16*SEQ_LEN*OUTPUT_BW//64
array_a = cma_buffer(ELEMENTS_IN, np.uint64)
array_b = cma_buffer(ELEMENTS_OUT, np.uint64)
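The buffer sizes follow from simple arithmetic: 16 values per time step, times the sequence length, times the bit width of each value, divided by the 64-bit word size. The sketch below repeats that calculation on its own; the names `VALUES_PER_STEP` and `WORD_BITS` are assumptions introduced here for readability.

```python
SEQ_LEN = 35
INPUT_BW = 8          # bits per input value (Q1.7)
OUTPUT_BW = 16        # bits per output value (Q6.10)
VALUES_PER_STEP = 16  # assumed name for the factor 16 in the notebook's formula
WORD_BITS = 64        # assumed name for the 64-bit bridge width

# Number of 64-bit words needed for each buffer.
elements_in = VALUES_PER_STEP * SEQ_LEN * INPUT_BW // WORD_BITS
elements_out = VALUES_PER_STEP * SEQ_LEN * OUTPUT_BW // WORD_BITS
print(elements_in, elements_out)  # 70 140

# Equivalent sizes in bytes.
bytes_in = elements_in * WORD_BITS // 8
bytes_out = elements_out * WORD_BITS // 8
print(bytes_in, bytes_out)  # 560 1120
```

With these parameters both bit counts divide evenly by 64, so the integer division does not drop any data.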

## Instantiating the driver

Now we will use the `Device_Driver` class defined in the `pycy.py` file to create a Python driver that lets us interact with the accelerator easily. This driver is used to control the accelerator, pass it the IO buffers, and set the other input arguments.

We also pass the physical addresses of the input and output CMA buffers to the accelerator.

In [13]:
ACCL_TOP_csr = Device_Driver(accl_offset, 64)
ACCL_TOP_csr.write(input_buffer_reg,  array_a.physical_address)
ACCL_TOP_csr.write(output_buffer_reg, array_b.physical_address)
ACCL_TOP_csr.write(seq_len_reg, SEQ_LEN)
ACCL_TOP_csr.write(interrupt_enable_reg, 0)
ACCL_TOP_csr.write(interrupt_status_reg, 0x00)


# Single File Test

Now we are ready to pass the input. We will use the same input file which was used for the accelerator csim verification.

In [14]:
file = np.loadtxt("../Training/export/QORNN_W4R4/test_image.txt")
q_file = preprocess_image(file)
array_a.write(q_file)

## Start the accelerator

Let's now start the accelerator by writing 1 to the start register, then wait until it completes.

In [15]:
ACCL_TOP_csr.write(start_reg, 1)
start_time = time.time()
while(not (ACCL_TOP_csr.read(interrupt_status_reg) & 0x2)):
    pass
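The busy-wait above spins forever if the done bit never gets set. A minimal sketch of the same polling pattern with a timeout is shown below. `FakeDevice` is a hypothetical stand-in used only so the example runs without hardware; the sketch assumes nothing about `Device_Driver` beyond a `read(offset)` method that returns an integer.

```python
import time

DONE_MASK = 0x2  # the bit tested in the notebook's polling loop

class FakeDevice:
    """Hypothetical stand-in for the driver: reports done after a few reads."""
    def __init__(self, reads_until_done):
        self.remaining = reads_until_done

    def read(self, offset):
        self.remaining -= 1
        return DONE_MASK if self.remaining <= 0 else 0

def wait_until_done(dev, status_reg, timeout_s=1.0):
    """Poll status_reg until DONE_MASK is set and return the elapsed seconds."""
    start = time.perf_counter()
    while not (dev.read(status_reg) & DONE_MASK):
        if time.perf_counter() - start > timeout_s:
            raise TimeoutError("accelerator did not signal completion")
    return time.perf_counter() - start

elapsed = wait_until_done(FakeDevice(reads_until_done=5), 0x18)
print("done after {:.6f} s".format(elapsed))
```

To time the accelerator itself, the clock would have to start just before the write to the start register, since the run begins with that write.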

Let's now look at the predicted value, which the accelerator has written to the output buffer. Remember that the accelerator outputs 16-bit values in Q6.10 format.

In [16]:
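pred = array_b.read().view(np.int16)  # reinterpret the 64-bit words as int16 scores
pred = pred/2**10                     # Q6.10 -> float
max_val = 0
for i in range(SEQ_LEN):
    # Arg-max over the first 10 of the 16 values in time step i
    max_val = 0
    max_score = pred[i*16]
    for j in range(10):
        if pred[i*16+j] >= max_score:
            max_score = pred[i*16+j]
            max_val = j

# max_val now holds the result for the last time step
print("Prediction = {}".format(max_val))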

Prediction = 4
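The decoding step can also be written with NumPy array operations. The sketch below runs on synthetic data rather than on the accelerator output. The shortened sequence length, the score values, and the name `NUM_CLASSES` are assumptions made for illustration. It keeps the loop's convention of using the first 10 of the 16 values per time step and reporting the last time step.

```python
import numpy as np

SEQ_LEN = 3        # shortened sequence, for illustration only
NUM_CLASSES = 10   # assumed name: the loop above uses the first 10 of 16 values

# Synthetic raw output: 16 int16 values per time step, in Q6.10 format.
raw = np.zeros((SEQ_LEN, 16), dtype=np.int16)
raw[-1, 4] = 2560    # 2.5 in Q6.10 (2560 / 2**10)
raw[-1, 7] = -1024   # -1.0 in Q6.10

# Pack into 64-bit words, mimicking what the output buffer would return.
packed = raw.reshape(-1).view(np.uint64)

# Decode: reinterpret as int16, regroup per time step, convert to float.
scores = packed.view(np.int16).reshape(SEQ_LEN, 16) / 2**10

# Arg-max over the first NUM_CLASSES values of the last time step.
prediction = int(np.argmax(scores[-1, :NUM_CLASSES]))
print("Prediction = {}".format(prediction))  # Prediction = 4
```

On ties, `np.argmax` returns the first maximal index, whereas the loop's `>=` comparison keeps the last one, so the two can differ when several scores are exactly equal.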


## Free the Accelerator

In [17]:
!make uninstall

dtbocfg.rb --remove soc_system
rm -f /lib/firmware/DE10_NANO.rbf
