# Deployment on the Pynq Z2 SoC

Finally, we will deploy the two models and measure the inference latency on a Pynq Z2 System-on-Chip. You can buy this board yourself for under 200 euros!

<img src="images/pynq.png" alt="pynq" width="500" align="center"/>

The first thing you have to do is connect the board following [these instructions](https://pynq.readthedocs.io/en/latest/getting_started/pynq_z2_setup.html). I always connect it to a power source and then directly to my router, connect my computer to the WiFi on the same router, check which IP address the Pynq gets, and then connect to the board in my browser at `http://<board IP address>`. You can also connect the board directly to your computer, but that honestly never worked for me on a Mac. Still, please give it a try by following the instructions linked above! You will be prompted for a password, which is *xilinx*.

You're in!

Now we need to copy a few things over to the board. Press the `Upload` button and select:
- The two tarballs we made in the previous exercise, `baseline_ae_pynq_package.tar.gz` and `qkeras_ae_pynq_package.tar.gz`
- The `part3_pynqz2.ipynb` notebook

That's it!
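
Before extracting anything, it can help to confirm that the uploads actually landed in the notebook's working directory. A minimal sketch, assuming the tarballs were uploaded next to this notebook; `missing_files` is a hypothetical helper, not part of PYNQ or hls4ml:

```python
import os

def missing_files(expected, directory="."):
    """Return the names in `expected` that are not present in `directory`."""
    present = set(os.listdir(directory))
    return [name for name in expected if name not in present]

# File names taken from the upload list above
expected = [
    "baseline_ae_pynq_package.tar.gz",
    "qkeras_ae_pynq_package.tar.gz",
]

missing = missing_files(expected)
if missing:
    print("Still missing, upload these first:", missing)
else:
    print("Both tarballs are in place.")
```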

Let's load our model onto the FPGA and check the inference latency!

In [None]:
import numpy as np
import os
import tarfile
import shutil

In [None]:
# Extract the bitfile, driver and some test data
with tarfile.open("baseline_ae_pynq_package.tar.gz") as tar:
    tar.extractall()

# Copy the driver to a fixed file name so it can be imported as hls4mlruntime
driver = [f for f in os.listdir('./baseline_ae_pynq_package/') if 'driver' in f][0]
shutil.copy(f'baseline_ae_pynq_package/{driver}', 'baseline_ae_pynq_package/hls4mlruntime.py')
os.chdir('baseline_ae_pynq_package')

With the hls4ml driver, loading the network onto the Pynq Z2 is super easy! We can also check the inference latency:

In [None]:
from hls4mlruntime import NeuralNetworkOverlay

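# Locate the bitfile shipped in the package
bitfile = [f for f in os.listdir() if '.bit' in f][0]

# Load the test inputs and the reference outputs
X = np.load('X.npy').astype(np.float32)
y_ref = np.load('y.npy').astype(np.float32)

# Load the overlay; the autoencoder output has the same shape as its input
nn = NeuralNetworkOverlay(bitfile, X.shape, X.shape)
# Run inference on the board with profiling enabled
y_hw, _, _ = nn.predict(X, X.shape, profile=True)
# Check the hardware output against the reference
np.testing.assert_allclose(y_hw, y_ref)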

Cool! The total latency per inference is roughly 0.5 s / 55969 ≈ 9 microseconds. However, this includes data transfer and software overhead, so the "real" latency is still the one we saw in the reports (3 microseconds).
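
The arithmetic behind that estimate, as a small sketch. The 0.5 s total, the 55969 samples and the 3 microsecond report figure are the numbers quoted above; `per_inference_latency_us` is a hypothetical helper name:

```python
def per_inference_latency_us(total_time_s, n_samples):
    """Average wall-clock time per inference, in microseconds."""
    return total_time_s / n_samples * 1e6

measured_us = per_inference_latency_us(0.5, 55969)
report_us = 3.0  # latency quoted from the reports

print(f"Measured on the board: {measured_us:.1f} us per inference")
print(f"Ratio to the reported latency: {measured_us / report_us:.1f}x")
```

The ratio gives a rough feel for how much of the measured time is data transfer and software overhead rather than the network itself.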

In the L1T, the data arrives over optical fibers, whereas on the Pynq it is stored in memory. We also don't have the same software overhead in the L1 trigger.


Let's verify that the latency is roughly the same for the quantized model on the board as well:

In [None]:
# Extract the bitfile, driver and some test data
os.chdir('../')
with tarfile.open("qkeras_ae_pynq_package.tar.gz") as tar:
    tar.extractall()

driver = [f for f in os.listdir('./qkeras_ae_pynq_package/') if 'driver' in f][0]
shutil.copy(f'qkeras_ae_pynq_package/{driver}', 'qkeras_ae_pynq_package/hls4mlruntime.py')
# Move into the quantized model's package, not the baseline one
os.chdir('qkeras_ae_pynq_package')
files = os.listdir('.')
print(files)

In [None]:
bitfile = [f for f in os.listdir() if '.bit' in f][0]

X = np.load('X.npy').astype(np.float32)
y_ref = np.load('y.npy').astype(np.float32)
q_nn = NeuralNetworkOverlay(bitfile, X.shape, X.shape)
y_hw, _, _ = q_nn.predict(X, X.shape, profile=True)
np.testing.assert_allclose(y_hw, y_ref)
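
`np.testing.assert_allclose` uses a tight default relative tolerance (`rtol=1e-07`), so it raises on even tiny numerical differences. If that check fails, a looser comparison can show how far apart the two outputs really are. A minimal sketch on synthetic arrays; `summarize_difference` and the tolerance value are illustrative assumptions, not part of the hls4ml driver:

```python
import numpy as np

def summarize_difference(y_a, y_b, atol=1e-3):
    """Return the largest absolute difference and the fraction of entries within `atol`."""
    diff = np.abs(np.asarray(y_a, dtype=np.float64) - np.asarray(y_b, dtype=np.float64))
    return float(diff.max()), float(np.mean(diff <= atol))

# Synthetic stand-ins for y_hw and y_ref
rng = np.random.default_rng(0)
y_ref_demo = rng.normal(size=(100, 4)).astype(np.float32)
y_hw_demo = y_ref_demo + np.float32(1e-4)

max_diff, frac_ok = summarize_difference(y_hw_demo, y_ref_demo)
print(f"max |difference| = {max_diff:.2e}, fraction within tolerance = {frac_ok:.3f}")
```

On the board, the same helper could be called as `summarize_difference(y_hw, y_ref)` with whatever tolerance suits the fixed-point precision of the model.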

For fun, let's also look at the true and reconstructed electron and muon $p_{T}$ to convince ourselves that the model is on the board and doing... something? Note that we do not expect it to do anything amazing, considering the latent dimension is 3!

In [None]:
import matplotlib.pyplot as plt

fig, axs = plt.subplots(1,2,figsize=(8,5))
fig.suptitle('Kinematic distributions in test data versus reconstructed data')

axs[0].hist(X[:,3],bins=100,label=r'Truth',histtype='step', linewidth=2, facecolor='none', edgecolor='green',fill=True,density=True)
axs[0].hist(y_hw[:,3],bins=100,label=r'AE reco',histtype='step', linewidth=2, facecolor='none', edgecolor='orchid',fill=True,density=True)
# axs[0].semilogy()
axs[0].set(xlabel=u'Leading electron $p_{T}$ (GeV)', ylabel='A.U')
axs[0].legend(loc='best',frameon=False, ncol=1,fontsize='large')

axs[1].hist(X[:,15],bins=100,label=r'Truth',histtype='step', linewidth=2, facecolor='none', edgecolor='green',fill=True,density=True)
axs[1].hist(y_hw[:,15],bins=100,label=r'AE reco',histtype='step', linewidth=2, facecolor='none', edgecolor='orchid',fill=True,density=True)
axs[1].set(xlabel=u'Leading muon $p_{T}$ (GeV)', ylabel='A.U')
# axs[1].semilogy()
axs[1].legend(loc='best',frameon=False, ncol=1,fontsize='large')

Gross! This is definitely not a good generator. But we've also seen that the algorithms that reconstruct the data better are not necessarily the best anomaly detection algorithms. I'll leave it up to you to make it better :)
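
As a possible starting point, one common way to use an autoencoder for anomaly detection is to score each event by its reconstruction error. A minimal sketch on synthetic data, assuming a per-event mean squared error as the score; `reconstruction_mse` is a hypothetical helper, and this metric is an assumption, not necessarily the one used elsewhere in this tutorial:

```python
import numpy as np

def reconstruction_mse(x, x_reco):
    """Per-event mean squared error between inputs and reconstructions."""
    x = np.asarray(x, dtype=np.float64)
    x_reco = np.asarray(x_reco, dtype=np.float64)
    return np.mean((x - x_reco) ** 2, axis=1)

# Synthetic stand-ins for X and y_hw: three events with four features each
x_demo = np.array([[1.0, 2.0, 3.0, 4.0],
                   [0.0, 0.0, 0.0, 0.0],
                   [1.0, 1.0, 1.0, 1.0]])
reco_demo = np.array([[1.0, 2.0, 3.0, 4.0],   # perfect reconstruction
                      [1.0, 1.0, 1.0, 1.0],   # every feature off by 1
                      [1.0, 1.0, 1.0, 3.0]])  # one feature off by 2

scores = reconstruction_mse(x_demo, reco_demo)
print(scores)  # a higher score means more anomalous under this assumption
```

With the arrays from this notebook, `reconstruction_mse(X, y_hw)` would give one score per test event, assuming `X` and `y_hw` are two-dimensional with matching shapes.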