## 1. Brevitas Export, FINN Import and Tidy-Up

Similar to what we did in the TFC-w1a1 end-to-end notebook, we will start by exporting the [pretrained CNV-w1a1 network](https://github.com/maltanar/brevitas_cnv_lfc) to ONNX, importing that into FINN and running the "tidy-up" transformations to have a first look at the topology.

In [1]:
from finn.util.basic import make_build_dir
from finn.util.visualization import showInNetron

#TODO: Make RPN to load config from xyres16.proto

build_dir = "/workspace/finn"
base_file_name = "rpn"
config_path = "/workspace/finn/pointpillars/second/configs/pointpillars/car/xyres_16.proto"
#in_shape = (1,64,496,432) - this is the real input shape, but FINN only handles symmetric (square, H == W) inputs here
#in_shape = (1,64,432,432)
in_shape = (1,64,320,320)

import onnx
from finn.util.test import get_test_model_trained
import brevitas.onnx as bo
from finn.core.modelwrapper import ModelWrapper
from finn.transformation.double_to_single_float import DoubleToSingleFloat
from finn.transformation.infer_shapes import InferShapes
from finn.transformation.fold_constants import FoldConstants
from finn.transformation.general import GiveReadableTensorNames, GiveUniqueNodeNames

import torch
import torch.nn.functional as F
import torch.nn as nn
from brevitas.quant_tensor import pack_quant_tensor
import brevitas.nn as qnn
from brevitas.core.quant import QuantType
from brevitas.core.restrict_val import RestrictValueType
from brevitas.core.scaling import ScalingImplType
from brevitas.core.stats import StatsOp


from second.pytorch.builder import second_builder
from second.pytorch.models.quantization import QuantConfig
from torchplus.tools import change_default_args
from second.pytorch.models.quantization import MyQuantReLU
import torchplus
from second.protos import pipeline_pb2

QuantConfig.BACKBONE_CONV_QUANT_TYPE = QuantType.BINARY
QuantConfig.BACKBONE_CONV_BIT_WIDTH  = 1

QuantConfig.LAST_LAYER_QUANT_TYPE = QuantType.INT
QuantConfig.LAST_LAYER_BIT_WIDTH  = 8

QuantConfig.ACTIVATION_QUANT_TYPE = QuantType.INT
QuantConfig.ACTIVATION_BIT_WIDTH  = 2
QuantConfig.ACTIVATION_FUNCTION   = change_default_args(
    max_val           = 6,
    quant_type        = QuantConfig.ACTIVATION_QUANT_TYPE, 
    bit_width         = QuantConfig.ACTIVATION_BIT_WIDTH, 
    scaling_impl_type = ScalingImplType.CONST)(MyQuantReLU)

    
from google.protobuf import text_format

config = pipeline_pb2.TrainEvalPipelineConfig()
with open(config_path, "r") as f:
    proto_str = f.read()
    text_format.Merge(proto_str, config)
input_cfg = config.train_input_reader
eval_input_cfg = config.eval_input_reader
model_cfg = config.model.second
train_cfg = config.train_config

from second.pytorch.models.voxelnet import RPN

rpn = RPN(
     use_norm                   = True,
     num_class                  = 1,
     layer_nums                 = [3, 3, 3],
     layer_strides              = [2, 1, 1],
     num_filters                = [64, 128, 256],
     upsample_strides           = [1, 1, 1],
     num_upsample_filters       = [128, 128, 128],
     num_input_filters          = 64,
     num_anchor_per_loc         = 2,
     encode_background_as_zeros = True,
     use_direction_classifier   = True,
     use_groupnorm              = False,
     num_groups                 = 32,
     use_bev                    = False,
     box_code_size              = 7,
)
checkpoint_loc = "/workspace/finn/pp_net_params/rpn_weights"
checkpoint = torch.load(checkpoint_loc, map_location="cpu")
rpn.load_state_dict(checkpoint)
rpn = rpn.eval()
bo.export_finn_onnx(rpn, in_shape, build_dir + "/{}.onnx".format(base_file_name))

model = ModelWrapper(build_dir + "/{}.onnx".format(base_file_name))
model = model.transform(DoubleToSingleFloat())
model = model.transform(InferShapes())
model = model.transform(FoldConstants())
model = model.transform(GiveUniqueNodeNames())
model = model.transform(GiveReadableTensorNames())
model.save(build_dir + "/{}_tidy.onnx".format(base_file_name))

Now that the model is exported, let's have a look at its layer structure with Netron. Remember that the visualization below is interactive: you can click on the individual nodes and view the layer attributes, trained weights and so on.

In [2]:
showInNetron(build_dir + "/{}_tidy.onnx".format(base_file_name))

Serving '/workspace/finn/rpn_tidy.onnx' at http://0.0.0.0:8081


You can see that the network is composed of a repeating convolution-convolution-maxpool layer pattern to extract features using 3x3 convolution kernels (with weights binarized) and `Sign` activations, followed by fully connected layers acting as the classifier. Also notice the initial `MultiThreshold` layer at the beginning of the network, which is quantizing float inputs to 8-bit ones.

## 2. How FINN Implements Convolutions: Lowering and Streamlining

In FINN, we implement convolutions with the *lowering* approach: we convert them to matrix-matrix multiply operations, where one of the matrices is generated by sliding a window over the input image. You can read more about the sliding window operator and how convolution lowering works [in this notebook](https://github.com/maltanar/qnn-inference-examples/blob/master/3-convolutional-binarized-gtsrb.ipynb). The streaming dataflow architecture we will end up with is going to look something like this figure from the [FINN-R paper](https://arxiv.org/abs/1809.04570):

![](cnv-mp-fc.png)

Note how the convolution layer looks very similar to the fully connected one in terms of the matrix-vector-threshold unit (MVTU), but now the MVTU is preceded by a sliding window unit that produces the matrix from the input image. All of these building blocks, including the `MaxPool` layer you see in this figure, exist as templated Vivado HLS C++ functions in [finn-hlslib](https://github.com/Xilinx/finn-hlslib).
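To make the lowering idea concrete, here is a minimal sketch in plain Python (nested lists, no dependencies) that computes the same small convolution twice: once directly, and once as a sliding window (`im2col`) followed by a matrix multiply. The function names and the tiny example tensors are illustrative assumptions, not FINN code. The actual `Im2Col` node orders the patch elements for the NHWC layout, whereas this sketch only requires that patches and flattened weights use the same ordering.

```python
def conv2d_direct(image, kernels):
    """Direct 'valid' convolution (cross-correlation), stride 1.

    image:   [C][H][W] nested lists
    kernels: [K][C][kh][kw] nested lists
    returns: [K][H-kh+1][W-kw+1]
    """
    C, H, W = len(image), len(image[0]), len(image[0][0])
    kh, kw = len(kernels[0][0]), len(kernels[0][0][0])
    out = []
    for k in kernels:
        plane = []
        for y in range(H - kh + 1):
            row = []
            for x in range(W - kw + 1):
                acc = 0
                for c in range(C):
                    for dy in range(kh):
                        for dx in range(kw):
                            acc += image[c][y + dy][x + dx] * k[c][dy][dx]
                row.append(acc)
            plane.append(row)
        out.append(plane)
    return out


def im2col(image, kh, kw):
    """Sliding window: one row per output pixel, each row a flattened patch."""
    C, H, W = len(image), len(image[0]), len(image[0][0])
    rows = []
    for y in range(H - kh + 1):
        for x in range(W - kw + 1):
            rows.append([image[c][y + dy][x + dx]
                         for c in range(C)
                         for dy in range(kh)
                         for dx in range(kw)])
    return rows


def conv2d_lowered(image, kernels):
    """Convolution as a matrix multiply: patches times flattened weights."""
    kh, kw = len(kernels[0][0]), len(kernels[0][0][0])
    patches = im2col(image, kh, kw)  # shape (out_pixels, C*kh*kw)
    # Flatten each kernel in the same (c, dy, dx) order used by im2col.
    weights = [[w for ch in k for r in ch for w in r] for k in kernels]
    # result[p][k] is the dot product of patch p with kernel k.
    return [[sum(a * b for a, b in zip(p, w)) for w in weights]
            for p in patches]


# Illustrative data: one 3x3 input channel, two 2x2 kernels.
image = [[[1, 2, 0], [0, 1, 3], [2, 1, 1]]]
kernels = [[[[1, -1], [-1, 1]]], [[[1, 1], [1, 1]]]]
direct = conv2d_direct(image, kernels)    # indexed [k][y][x]
lowered = conv2d_lowered(image, kernels)  # indexed [pixel][k]
print(direct)
print(lowered)
```

Each row of `lowered` corresponds to one output pixel, which is the form the matrix-vector-threshold unit consumes: one patch vector in, one vector of per-kernel results out.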


To target this kind of hardware architecture with our network we'll apply a convolution lowering transformation, in addition to streamlining. You may recall the *streamlining transformation* that we applied to the TFC-w1a1 network, which is a series of mathematical simplifications that allow us to get rid of floating point scaling operations by implementing few-bit activations as thresholding operations. **The current implementation of streamlining is highly network-specific and may not work for your network if its topology is very different from the example network here. We hope to rectify this in future releases.**
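As a rough illustration of why a scaling operation can disappear into a thresholding activation, the sketch below models a few-bit activation as "count how many thresholds the input has reached" and shows that a positive scale in front of it can be folded into the thresholds. The function names, threshold values and scale are made up for illustration; FINN's own `MultiThreshold` node and absorb transformations handle per-channel tensors and more cases than this.

```python
def multithreshold(x, thresholds):
    """Return how many thresholds x has reached (an integer activation level)."""
    return sum(1 for t in thresholds if x >= t)


def absorb_scale(thresholds, scale):
    """Fold a positive scale applied before the activation into the thresholds.

    For scale > 0: multithreshold(scale * x, T) == multithreshold(x, T / scale).
    """
    if scale <= 0:
        raise ValueError("this sketch only handles positive scales")
    return [t / scale for t in thresholds]


# Illustrative values only: a 2-bit activation has 3 thresholds (levels 0..3).
thresholds = [1.0, 3.0, 5.0]
scale = 0.5  # hypothetical floating point scale in front of the activation

inputs = [-4.0, 1.0, 2.0, 5.5, 6.0, 9.9, 10.0, 12.0]
before = [multithreshold(scale * x, thresholds) for x in inputs]
after = [multithreshold(x, absorb_scale(thresholds, scale)) for x in inputs]
print(before)
print(after)
```

Both lists are identical, so the multiplication can be dropped from the graph once the thresholds are divided by the scale. A negative scale would flip the comparison direction, which is why the sketch rejects it rather than guessing at a rule.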

In [3]:
from finn.transformation.streamline import Streamline
from finn.transformation.lower_convs_to_matmul import LowerConvsToMatMul
from finn.transformation.bipolar_to_xnor import ConvertBipolarMatMulToXnorPopcount
import finn.transformation.streamline.absorb as absorb
from finn.transformation.streamline.reorder import MakeMaxPoolNHWC, MoveAddMulPastIm2Col

model = ModelWrapper(build_dir + "/{}_tidy.onnx".format(base_file_name))
model = model.transform(Streamline())
model.save(build_dir + "/{}_streamline_tmp_1.onnx".format(base_file_name))
model = model.transform(LowerConvsToMatMul())
model.save(build_dir + "/{}_streamline_tmp_2.onnx".format(base_file_name))
model = model.transform(MakeMaxPoolNHWC())
model = model.transform(MoveAddMulPastIm2Col())
model.save(build_dir + "/{}_streamline_tmp_3.onnx".format(base_file_name))
model = model.transform(absorb.AbsorbTransposeIntoMultiThreshold())
model.save(build_dir + "/{}_streamline_tmp_4.onnx".format(base_file_name))
model = model.transform(ConvertBipolarMatMulToXnorPopcount())
model.save(build_dir + "/{}_streamline_tmp_5.onnx".format(base_file_name))
model = model.transform(Streamline())
model.save(build_dir + "/{}_streamlined.onnx".format(base_file_name))




We won't go into too much detail about what happens in each transformation and why they are called in this particular order (feel free to visualize the intermediate steps using Netron yourself if you are curious), but here is a brief summary:

* `Streamline` moves floating point scaling and addition operations closer to the input of the nearest thresholding activation and absorbs them into thresholds
* `LowerConvsToMatMul` converts ONNX `Conv` nodes into sequences of `Im2Col, MatMul` nodes as discussed above. `Im2Col` is a custom FINN ONNX high-level node type that implements the sliding window operator.
* `MakeMaxPoolNHWC` and `AbsorbTransposeIntoMultiThreshold` convert the *data layout* of the network into the NHWC data layout that finn-hlslib primitives use. NHWC means the tensor dimensions are ordered as `(N : batch, H : height, W : width, C : channels)` (assuming 2D images). The ONNX standard ops normally use the NCHW layout, but the ONNX intermediate representation itself does not dictate any data layout.
* You may recall `ConvertBipolarMatMulToXnorPopcount` from the TFC-w1a1 example, which is needed to implement bipolar-by-bipolar (w1a1) networks correctly using finn-hlslib.
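The XNOR-popcount step in the last bullet rests on a simple identity: if bipolar values are encoded as bits (+1 as 1, -1 as 0), the dot product of two bipolar vectors of length n equals `2 * popcount(xnor(a, b)) - n`. A minimal sketch with made-up vectors and hypothetical helper names:

```python
def to_bits(bipolar):
    """Encode a bipolar vector (+1 / -1) as bits (1 / 0)."""
    return [1 if v == 1 else 0 for v in bipolar]


def bipolar_dot(a, b):
    """Reference dot product on the bipolar (+1 / -1) values."""
    return sum(x * y for x, y in zip(a, b))


def xnor_popcount_dot(a_bits, b_bits):
    """Same dot product computed on the bit encoding."""
    n = len(a_bits)
    # XNOR is 1 exactly where the two bits agree; popcount counts those positions.
    popcount = sum(1 for x, y in zip(a_bits, b_bits) if x == y)
    return 2 * popcount - n


# Illustrative activation and weight vectors.
a = [1, -1, -1, 1, 1, -1]
w = [1, 1, -1, -1, 1, -1]
reference = bipolar_dot(a, w)
via_bits = xnor_popcount_dot(to_bits(a), to_bits(w))
print(reference, via_bits)
```

Agreeing positions contribute +1 and disagreeing ones -1, so with p agreements the sum is p - (n - p) = 2p - n, which is what the bit-level version computes.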

Let's visualize the streamlined and lowered network with Netron. Observe how all the `Conv` nodes have turned into pairs of `Im2Col, MatMul` nodes, and many nodes including `BatchNorm, Mul, Add` nodes have disappeared and been replaced with `MultiThreshold` nodes.

In [4]:
showInNetron(build_dir + "/{}_streamline_tmp_1.onnx".format(base_file_name))


Stopping http://0.0.0.0:8081
Serving '/workspace/finn/rpn_streamline_tmp_1.onnx' at http://0.0.0.0:8081


In [None]:
showInNetron(build_dir + "/{}_streamlined.onnx".format(base_file_name))

## 3. Partitioning, Conversion to HLS Layers and Folding

The next steps will be (again) very similar to what we did for the TFC-w1a1 network. We'll first convert the layers that we can put into the FPGA into their HLS equivalents and separate them out into a *dataflow partition*:


In [5]:
import finn.transformation.fpgadataflow.convert_to_hls_layers as to_hls
from finn.transformation.fpgadataflow.create_dataflow_partition import (
    CreateDataflowPartition,
)
from finn.transformation.move_reshape import RemoveCNVtoFCFlatten
from finn.custom_op.registry import getCustomOp

# choose the memory mode for the MVTU units, decoupled or const
mem_mode = "decoupled"
#mem_mode = "const"

model = ModelWrapper(build_dir + "/{}_streamlined.onnx".format(base_file_name))
model = model.transform(to_hls.InferBinaryStreamingFCLayer(mem_mode))
model = model.transform(to_hls.InferQuantizedStreamingFCLayer(mem_mode))
#model.save(build_dir + "/{}_dataflow_tmp_1.onnx".format(base_file_name))
#model = ModelWrapper(build_dir + "/{}_dataflow_tmp_1.onnx".format(base_file_name))
model = model.transform(to_hls.InferConvInpGen())
model = model.transform(to_hls.InferStreamingMaxPool())
# get rid of Reshape(-1, 1) operation between hlslib nodes
model = model.transform(RemoveCNVtoFCFlatten())
parent_model = model.transform(CreateDataflowPartition())
parent_model.save(build_dir + "/{}_dataflow_parent.onnx".format(base_file_name))
sdp_node = parent_model.get_nodes_by_op_type("StreamingDataflowPartition")[0]
sdp_node = getCustomOp(sdp_node)
dataflow_model_filename = sdp_node.get_nodeattr("model")
# save the dataflow partition with a different name for easier access
dataflow_model = ModelWrapper(dataflow_model_filename)
dataflow_model.save(build_dir + "/{}_dataflow_model.onnx".format(base_file_name))
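The parent/child split produced above can be pictured with a toy model. The sketch below assumes a purely linear chain of op type names and a hypothetical set of HLS-capable op types; it moves the first contiguous run of HLS ops into a child list and leaves a single `StreamingDataflowPartition` placeholder in the parent. This is a simplification for intuition only: the real `CreateDataflowPartition` transformation works on the ONNX graph itself and is not reproduced here.

```python
def partition_linear_graph(op_types, hls_ops):
    """Split a linear chain of ops into a parent chain and one dataflow child.

    The first contiguous run of HLS-capable ops becomes the child; in the
    parent it is replaced by a single placeholder node.
    """
    parent, child = [], []
    state = "before"
    for op in op_types:
        if state == "before" and op in hls_ops:
            state = "inside"
            parent.append("StreamingDataflowPartition")
        elif state == "inside" and op not in hls_ops:
            state = "after"
        if state == "inside":
            child.append(op)
        else:
            parent.append(op)
    return parent, child


# Illustrative op chain; names other than those used in this notebook are assumptions.
hls_ops = {"ConvolutionInputGenerator", "StreamingFCLayer_Batch", "StreamingMaxPool_Batch"}
chain = [
    "Transpose",
    "ConvolutionInputGenerator",
    "StreamingFCLayer_Batch",
    "StreamingMaxPool_Batch",
    "Mul",
    "Add",
]
parent_ops, child_ops = partition_linear_graph(chain, hls_ops)
print(parent_ops)
print(child_ops)
```

Nodes left in the parent, such as the trailing `Mul` and `Add` in this toy chain, are the ones that get executed in software around the accelerator.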

In [2]:
showInNetron(build_dir + "/{}_dataflow_parent.onnx".format(base_file_name))

Serving '/workspace/finn/rpn_dataflow_parent.onnx' at http://0.0.0.0:8081


In [4]:
dbg = True
if dbg:
    import finn.custom_op.registry as registry
    import numpy as np
    op_type = "Mul"
    
    model = ModelWrapper(build_dir + "/{}_dataflow_parent.onnx".format(base_file_name))
    nodes = model.get_nodes_by_op_type(op_type)
    #print(nodes)
    node = nodes[0]
    
    if op_type == "MultiThreshold":
        inst = registry.custom_op[op_type](node)
        thresholds = model.get_initializer(node.input[1])
        out_scale  = inst.get_nodeattr("out_scale")
        out_bias   = inst.get_nodeattr("out_bias")
        data_layout = inst.get_nodeattr("data_layout")
        print("data layout (if other than NCHW, then check MultiThreshold class code): {}".format(data_layout))
        print("out_scale: {} {}".format(type(out_scale), out_scale))
        print("out_bias: {} {}".format(type(out_bias), out_bias))
        np.save("{}/pp_net_params/thresholds.npy".format(build_dir), thresholds)
    elif op_type == "Add":
        tensor = model.get_initializer(node.input[1])
        print(tensor.shape)
        np.save("{}/pp_net_params/add_params.npy".format(build_dir), tensor)
    elif op_type == "Mul":
        tensor = model.get_initializer(node.input[1])
        print(tensor.shape)
        np.save("{}/pp_net_params/mul_params.npy".format(build_dir), tensor)


(1, 20, 1, 1)


In [None]:
showInNetron(build_dir + "/{}_dataflow_model.onnx".format(base_file_name))

Now we have to set the *folding factors* for certain layers to adjust the performance of our accelerator, similar to the TFC-w1a1 example. We'll also set the desired FIFO depths around those layers, which are important to achieve full throughput in the accelerator.
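To get a feel for what the folding factors trade off, the following sketch estimates the cycle count of a folded matrix-vector unit as `(MH / PE) * (MW / SIMD)` per input vector, where MH is the number of output channels and MW the number of inputs per output. The helper name, the layer dimensions and the number of vectors are illustrative assumptions; the 5 ns clock period matches the `target_clk_ns` used later in this notebook. Treat the result as a back-of-the-envelope estimate, not a synthesis result.

```python
def mvtu_cycles(mh, mw, pe, simd, num_vectors=1):
    """Rough cycle estimate for a folded matrix-vector unit.

    mh: matrix height (output channels), mw: matrix width (inputs per output).
    Assumes PE must divide mh and SIMD must divide mw.
    """
    if mh % pe != 0:
        raise ValueError("PE must divide MH")
    if mw % simd != 0:
        raise ValueError("SIMD must divide MW")
    return (mh // pe) * (mw // simd) * num_vectors


# Hypothetical layer: 3x3 convolution, 64 input and 64 output channels,
# producing a 160x160 output feature map (one matrix-vector product per pixel).
mh = 64
mw = 3 * 3 * 64
num_vectors = 160 * 160

cycles = mvtu_cycles(mh, mw, pe=16, simd=16, num_vectors=num_vectors)
clk_ns = 5  # clock period in nanoseconds
latency_ms = cycles * clk_ns / 1e6
print(cycles, latency_ms)
```

Doubling either `PE` or `SIMD` halves the estimate, at the cost of more parallel hardware, which is the trade-off behind the Slice utilization remark in the cell below.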

In [6]:
from finn.transformation.fpgadataflow.insert_dwc import InsertDWC
from finn.transformation.fpgadataflow.insert_tlastmarker import InsertTLastMarker
from finn.transformation.fpgadataflow.insert_fifo import InsertFIFO
from finn.transformation.fpgadataflow.annotate_resources import AnnotateResources

model = ModelWrapper(build_dir + "/{}_dataflow_model.onnx".format(base_file_name))
fc_layers = model.get_nodes_by_op_type("StreamingFCLayer_Batch")
print(len(fc_layers))

# each tuple is (PE, SIMD, in_fifo_depth, ram_style) for a layer
# this build reports 13 StreamingFCLayer_Batch nodes (see the printed count), so 13 tuples follow
PEs   = 16
SIMDs = 16
FIFOs = 256
#At 16, 32, 64 we had 988% Slice utilization
folding = [
    (PEs, SIMDs, FIFOs, "block"), #0
    (PEs, SIMDs, FIFOs, "block"),
    (PEs, SIMDs, FIFOs, "block"),
    (PEs, SIMDs, FIFOs, "block"),
    (PEs, SIMDs, FIFOs, "block"), #4
    
    (PEs, SIMDs, FIFOs, "block"), #5
    (PEs, SIMDs, FIFOs, "block"),
    (PEs, SIMDs, FIFOs, "block"),
    (PEs, SIMDs, FIFOs, "block"),
    (PEs, SIMDs, FIFOs, "block"), #9
    
    (PEs, SIMDs, FIFOs, "block"), #10
#     (PEs, SIMDs, FIFOs, "block"),
#     (PEs, SIMDs, FIFOs, "block"),
#     (PEs, SIMDs, FIFOs, "block"),
#     (PEs, SIMDs, FIFOs, "block"), #14
    
    (PEs, SIMDs, FIFOs, "block"), #15
    (1,   SIMDs, FIFOs, "block"), #16
]
for fcl, (pe, simd, ififodepth, ram_style) in zip(fc_layers, folding):
    fcl_inst = getCustomOp(fcl)
    fcl_inst.set_nodeattr("PE", pe)
    fcl_inst.set_nodeattr("SIMD", simd)
    fcl_inst.set_nodeattr("inFIFODepth", ififodepth)
    fcl_inst.set_nodeattr("ram_style", ram_style) #default is auto
    

# use same SIMD values for the sliding window operators
swg_layers = model.get_nodes_by_op_type("ConvolutionInputGenerator")
for i in range(len(swg_layers)):
    swg_inst = getCustomOp(swg_layers[i])
    simd = folding[i][1]
    swg_inst.set_nodeattr("SIMD", simd)

model = model.transform(InsertDWC())
model = model.transform(InsertFIFO())
model = model.transform(InsertTLastMarker())
model = model.transform(GiveUniqueNodeNames())
model = model.transform(AnnotateResources("estimate"))
model.save(build_dir + "/{}_folded.onnx".format(base_file_name))
print("Estimation of used resources: {}".format(model.get_metadata_prop("res_total_estimate")))

13
Estimation of used resources: {'BRAM_18K': 196.0, 'LUT': 12629.600000000002}


Below we visualize in Netron to observe the `StreamingDataWidthConverter` and `StreamingFIFO` nodes that have been inserted into graph, as well as the folding factors in the `PE` and `SIMD` attributes of each `StreamingFCLayer_Batch`.

In [None]:
showInNetron(build_dir + "/{}_folded.onnx".format(base_file_name))
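A data width converter is needed wherever a producer emits words of a different width than its consumer expects, for example when the folding factors of neighbouring layers differ. The sketch below repacks a stream of fixed-width words into another width; the function names and the LSB-first bit order are assumptions made for illustration and do not describe how `StreamingDataWidthConverter` is implemented in hardware.

```python
def needs_dwc(producer_width_bits, consumer_width_bits):
    """A converter is only needed when the two stream widths differ."""
    return producer_width_bits != consumer_width_bits


def repack(words, in_width, out_width):
    """Repack in_width-bit words into out_width-bit words (LSB first)."""
    total_bits = len(words) * in_width
    if total_bits % out_width != 0:
        raise ValueError("total number of bits must be a multiple of out_width")
    # Concatenate all input words into one large integer, word 0 in the low bits.
    packed = 0
    for i, w in enumerate(words):
        packed |= (w & ((1 << in_width) - 1)) << (i * in_width)
    # Slice the large integer back into words of the new width.
    mask = (1 << out_width) - 1
    return [(packed >> (i * out_width)) & mask
            for i in range(total_bits // out_width)]


# Illustrative stream of two 8-bit words.
stream = [0xAB, 0xCD]
narrow = repack(stream, 8, 4)
wide = repack(stream, 8, 16)
print([hex(v) for v in narrow], [hex(v) for v in wide])
```

Repacking to a narrower width and back returns the original stream, so the converter changes only how the data is chunked, not its content.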

Our network is now ready and we can start with the hardware generation.

## 4. Hardware Generation

From this point onward, the steps we have to follow do not depend on the particular network and will be exactly the same as the TFC-w1a1 example. We first proceed with HLS synthesis, **which may take 10-20 minutes depending on your host computer**.

In [7]:
from finn.transformation.fpgadataflow.prepare_ip import PrepareIP
from finn.transformation.fpgadataflow.hlssynth_ip import HLSSynthIP
from finn.util.basic import pynq_part_map
import time
import traceback

test_pynq_board = "ZCU104"
test_fpga_part = pynq_part_map[test_pynq_board]
target_clk_ns = 5

time_start = time.time()
try:
    model = ModelWrapper(build_dir + "/{}_folded.onnx".format(base_file_name))
    print("PrepareIP started!")
    model = model.transform(PrepareIP(test_fpga_part, target_clk_ns))
    print("Preparing IP took {:.2f}".format(time.time() - time_start))
    time_start = time.time()
    print("HLSSynth started!")
    model = model.transform(HLSSynthIP(0))
    model.save(build_dir + "/{}_ipgen.onnx".format(base_file_name))
except Exception as e:
    print("Exception: {}\n {}".format(e, traceback.format_exc()))
print("HLS Synthesis took {:.2f} seconds".format(time.time() - time_start))

PrepareIP started!
Preparing IP took 67.09
HLSSynth started!
HLS Synthesis took 378.66 seconds


In [8]:
model = ModelWrapper(build_dir + "/{}_ipgen.onnx".format(base_file_name))
model = model.transform(AnnotateResources("hls"))
print("Estimation of used resources (HLS): {}".format(model.get_metadata_prop("res_total_hls")))

Estimation of used resources (HLS): {'BRAM_18K': 0.0, 'FF': 60681.0, 'LUT': 149612.0, 'DSP48E': 0.0, 'URAM': 0.0}
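To put these numbers in context, the reported resources can be compared against the capacity of the target device. The sketch below assumes the metadata property is stored as the string form of a Python dict (as the printed output suggests) and parses it with `ast.literal_eval`. The capacity figures are assumptions for the ZCU104's device and should be checked against the vendor datasheet before relying on them; the helper name is hypothetical.

```python
import ast


def utilization(report_str, capacity):
    """Return percentage utilization for every resource present in both dicts."""
    used = ast.literal_eval(report_str)
    return {
        name: 100.0 * used[name] / capacity[name]
        for name in used
        if name in capacity and capacity[name] > 0
    }


# String copied from the HLS estimate printed above.
report = "{'BRAM_18K': 0.0, 'FF': 60681.0, 'LUT': 149612.0, 'DSP48E': 0.0, 'URAM': 0.0}"

# Assumed device capacity (verify against the datasheet for your board).
capacity = {"LUT": 230400, "FF": 460800}

util = utilization(report, capacity)
for name, pct in sorted(util.items()):
    print("{}: {:.1f}%".format(name, pct))
```

In a live session the string would come from `model.get_metadata_prop("res_total_hls")` instead of being pasted in.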


... for this node. Please run "PrepareIP" transformation and "HLSSynthIP" first to generate the report files


Once the HLS synthesis is complete, we can stitch together the generated IP blocks into a larger IP that is the implementation of our network:

In [9]:
from finn.transformation.fpgadataflow.replace_verilog_relpaths import (
    ReplaceVerilogRelPaths,
)
from finn.transformation.fpgadataflow.create_stitched_ip import CreateStitchedIP

model = ModelWrapper(build_dir + "/{}_ipgen.onnx".format(base_file_name))
model = model.transform(ReplaceVerilogRelPaths())
model = model.transform(CreateStitchedIP(test_fpga_part))
model.save(build_dir + "/{}_ipstitch.onnx".format(base_file_name))

Finally, we create a PYNQ project that includes the hardware "shell" that will support our accelerator, including the data movers, and run Vivado synthesis, **which may take around 30 minutes depending on your host computer.**

*If you'd like to watch the progress, you can open the generated project file (printed below) with the Vivado GUI.*

In [10]:
from finn.transformation.fpgadataflow.make_pynq_proj import MakePYNQProject
from finn.transformation.fpgadataflow.synth_pynq_proj import SynthPYNQProject

model = ModelWrapper(build_dir + "/{}_ipstitch.onnx".format(base_file_name))
model = model.transform(MakePYNQProject(test_pynq_board))
vivado_proj = model.get_metadata_prop("vivado_pynq_proj")
print("Vivado synthesis project is at %s/resizer.xpr" % vivado_proj)
model.save(build_dir + "/{}_pynqproj.onnx".format(base_file_name))

Vivado synthesis project is at /tmp/finn_dev_konradl/vivado_pynq_proj_nqo2_yv6/resizer.xpr


In [11]:
model = ModelWrapper(build_dir + "/{}_pynqproj.onnx".format(base_file_name))
time_start = time.time()
try:
    model = model.transform(SynthPYNQProject())
    model.save(build_dir + "/{}_synth.onnx".format(base_file_name))
except Exception as e:
    print("Exception: {}\n {}".format(e, traceback.format_exc()))
print("Vivado project Synthesis took {:.2f} seconds".format(time.time() - time_start))

Vivado project Synthesis took 212.03 seconds


## 5. Deployment and Remote Execution

Now that we're done with the hardware generation, we can generate a Python driver for the accelerator and copy the necessary files onto our PYNQ board.

In [1]:
import os
from finn.transformation.fpgadataflow.make_pynq_driver import MakePYNQDriver
from finn.transformation.fpgadataflow.make_deployment import DeployToPYNQ
from finn.util.basic import make_build_dir
from finn.util.visualization import showInNetron
from finn.core.modelwrapper import ModelWrapper
from finn.custom_op.registry import getCustomOp
build_dir = "/workspace/finn"
base_file_name = "rpn"

# set up the following values according to your own environment
# FINN will use ssh to deploy and run the generated accelerator
# ip = os.getenv("PYNQ_IP", "192.168.1.99")
ip = "192.168.2.99"
username = os.getenv("PYNQ_USERNAME", "xilinx")
password = os.getenv("PYNQ_PASSWORD", "xilinx")
port = os.getenv("PYNQ_PORT", 22)
target_dir = os.getenv("PYNQ_TARGET_DIR", "/home/xilinx/finn")

model = ModelWrapper(build_dir + "/{}_synth.onnx".format(base_file_name))
print("1")
model = model.transform(MakePYNQDriver())
print("2")
model = model.transform(DeployToPYNQ(ip, port, username, password, target_dir))
print("3")
deploy_dir = model.get_metadata_prop("pynq_deploy_dir")
print("4")
model.save(build_dir + "/{}_pynq_deploy.onnx".format(base_file_name))

1
2
3
4


In [None]:
! sshpass -p {password} ssh {username}@{ip} -p {port} 'ls -l {target_dir}/*'
print(deploy_dir)

We only have two more steps to be able to remotely execute the deployed bitfile. The original CNV example loads a CIFAR-10 test image bundled with FINN at this point; that code is left commented out below because it does not match the RPN input shape, and a dummy all-ones tensor with the export shape `(1, 64, 320, 320)` is used instead.

In [None]:
import pkg_resources as pk
import matplotlib.pyplot as plt
import numpy as np

# fn = pk.resource_filename("finn", "data/cifar10/cifar10-test-data-class3.npz")
# x = np.load(fn)["arr_0"].astype(np.float32)
# x = x / 255
# plt.imshow(x.reshape(3, 32,32).transpose(1, 2, 0))

x = np.ones((1,64,320,320)).astype(np.float32)

Recall that we partitioned our original network into a parent graph that contained the non-synthesizable nodes and a child graph that contained the bulk of the network, which we turned into a bitfile. We'll load up the parent graph and modify the `StreamingDataflowPartition` node so that it points to the deployed ONNX graph.

In [None]:
# point to the PYNQ-deployed model as the StreamingDataflowPartition in the parent
parent_model = ModelWrapper(build_dir+"/{}_dataflow_parent.onnx".format(base_file_name))
sdp_node = parent_model.get_nodes_by_op_type("StreamingDataflowPartition")[0]
sdp_node = getCustomOp(sdp_node)
sdp_node.set_nodeattr("model", build_dir + "/rpn_pynq_deploy.onnx")
parent_model.save(build_dir+"/rpn_dataflow_parent_with_remote_bitfile_exec.onnx")

Finally, we can call `execute_onnx` on the parent graph, which will internally call remote execution with the bitfile once the `StreamingDataflowPartition` node is reached, grab the results, then continue executing the last portion of the network. 

In [None]:
import numpy as np
from finn.core.onnx_exec import execute_onnx
iname = parent_model.graph.input[0].name
oname = parent_model.graph.output[0].name
ishape = parent_model.get_tensor_shape(iname)
input_dict = {iname: x.reshape(ishape)}
parent_model.set_metadata_prop("pynq_ip", ip)
parent_model.set_metadata_prop("pynq_port", str(port))
parent_model.set_metadata_prop("pynq_username", username)
parent_model.set_metadata_prop("pynq_password", password)
parent_model.set_metadata_prop("pynq_target_dir", target_dir)
parent_model.set_metadata_prop("pynq_deploy_dir", deploy_dir)
print(parent_model.get_metadata_prop("pynq_ip"))
print(parent_model.get_metadata_prop("pynq_port"))
print(parent_model.get_metadata_prop("pynq_username"))
print(parent_model.get_metadata_prop("pynq_password"))
print(parent_model.get_metadata_prop("pynq_target_dir"))
print(parent_model.get_metadata_prop("pynq_deploy_dir"))
ret = execute_onnx(parent_model, input_dict, True)

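In the CNV example, the output of the network is passed through a softmax function to interpret it as probabilities, and the per-class probabilities are plotted as a bar chart. That code is kept below, commented out; for the RPN we only print the shape of the output tensor.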

In [None]:
print(ret[oname].shape)  # ret is the full execution context (a dict keyed by tensor name)

# def softmax(x):
#     """Compute softmax values for each sets of scores in x."""
#     e_x = np.exp(x - np.max(x))
#     return e_x / e_x.sum()

# logits = ret[oname].flatten()
# prob = softmax(logits)

# classes = ["airplane", "automobile", "bird", "cat", "deer", "dog", "frog", "horse", "ship", "truck"]

# plt.figure(figsize=(20, 3)) 
# plt.bar(classes, prob)
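For reference, the softmax step from the commented-out code can be sketched without NumPy as follows. The logits are made-up numbers used only to show the mechanics; they are not outputs of this network.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a flat list of scores."""
    m = max(logits)  # subtracting the maximum avoids overflow in exp()
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Made-up logits for four hypothetical classes.
logits = [0.5, 2.0, -1.0, 4.0]
probs = softmax(logits)
best = probs.index(max(probs))
print(best, probs)
```

Subtracting the maximum before exponentiating does not change the result, since the common factor cancels in the normalization, but it keeps `math.exp` from overflowing on large scores.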

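With the CNV network and the CIFAR-10 test image, this plot shows class 3 ("cat") with high probability; for the RPN fed with the dummy input, the check is only that remote execution completes and returns an output tensor. This concludes our tutorial on how to take a convolutional BNN all the way down to hardware with FINN, and execute it remotely on a PYNQ board.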