# Introduction to AMD Xilinx Vitis AI

AMD Xilinx Vitis AI is a development stack that covers the whole process of embedded implementation of neural networks on FPGAs.

The environment consists of:
- Vitis AI Quantizer - converts a floating-point model into a quantized model.

    Both Post-Training Quantization (PTQ) and Quantization-Aware Training (QAT) are available.

    PTQ determines the INT8 quantization parameters of weights, biases, inputs and feature maps.

    Quantization uses a small part of the training dataset.

    Vitis AI supports popular frameworks such as PyTorch, TensorFlow and Caffe.
    
- Vitis AI Optimizer - provides tools for network optimization, e.g. pruning.

- Vitis AI Compiler - compiles the quantized model into a representation that the accelerator can execute.

- Vitis AI RunTime (VART) - driver library for communication with the accelerator.

- Vitis AI Deep learning Processor Unit (DPU) - a sequential, general-purpose accelerator implemented in reconfigurable logic (FPGA).

    The DPU can execute the following operations:
    
    - convolution 1D to 3D: standard, depthwise, transposed
    - upsampling: bilinear, nearest neighbor
    - max / average pooling
    - elementwise addition and multiplication
    - activations: ReLU, ReLU6, LeakyReLU, softmax
    - sigmoid and hyperbolic tangent (available only on some hardware platforms)
    
    The DPU is generated with the appropriate software (Vitis HLS / Vivado).
    Its final configuration can be adjusted at generation time:
    - available operations (depthwise, elementwise mul., LeakyReLU, softmax, average pooling)
    - resource usage: DSP, dRAM
    - energy saving mode
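
To make the quantizer's role more concrete, the sketch below applies symmetric INT8 quantization to a few floating-point values in plain Python. It is a simplified illustration and not the Vitis AI Quantizer's implementation: the helper names (`choose_scale`, `quantize`, `dequantize`) and the max-abs rule for choosing the scale are assumptions made for this example only.

```python
def choose_scale(values):
    """Pick a symmetric scale so the largest magnitude maps to 127 (assumed rule)."""
    max_abs = max(abs(v) for v in values)
    return max_abs / 127 if max_abs > 0 else 1.0


def quantize(values, scale):
    """Map floats to integers clipped to the INT8 range [-128, 127]."""
    return [max(-128, min(127, round(v / scale))) for v in values]


def dequantize(q_values, scale):
    """Map INT8 integers back to approximate floats."""
    return [q * scale for q in q_values]


weights = [0.5, -1.27, 0.031, 1.0]          # example floating-point weights
scale = choose_scale(weights)
q_weights = quantize(weights, scale)
restored = dequantize(q_weights, scale)

# The rounding error of each value is bounded by half of the scale.
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print("quantized:", q_weights)
print("max error:", max_error, "scale / 2:", scale / 2)
```

In this simplified picture, the role of the calibration data is to estimate the value ranges of inputs and feature maps, so that a scale can be chosen for them in the same way as for the weights.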

This laboratory covers the quantization and compilation of a PyTorch NN model with Vitis AI.

## Part 1 - FLOATING-POINT TRAINING

1. Instantiate an evaluation loader (batch size = 1) with the test data.

Instantiate the `MiniResNet` model.

Print the number of model parameters.

Use functions from the `local_utils` module.

In [7]:
import torch
import matplotlib.pyplot as plt
import local_utils


eval_loader = local_utils.get_test_dataset(batch_size=1)
print("len(eval_loader) =", len(eval_loader))

net = local_utils.MiniResNet()

len(eval_loader) = 10000
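
The cell above does not print the parameter count requested in this step. A common way to obtain it in PyTorch is to sum `numel()` over `parameters()`. The sketch below uses a small stand-in model (`toy_net`, hypothetical) because `MiniResNet` is defined in `local_utils`; the helper `count_parameters` is also an illustrative name, and the same expression should apply to `net`.

```python
import torch

# Stand-in model for illustration only; in the notebook, pass `net` instead.
toy_net = torch.nn.Sequential(
    torch.nn.Conv2d(1, 8, kernel_size=3),  # 1*8*3*3 weights + 8 biases = 80
    torch.nn.ReLU(),
    torch.nn.Flatten(),
    torch.nn.Linear(8 * 26 * 26, 10),      # 5408*10 weights + 10 biases = 54090
)


def count_parameters(model: torch.nn.Module) -> int:
    """Return the number of trainable parameters of the model."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad)


print("number of parameters =", count_parameters(toy_net))
```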


2. Instantiate train and test loaders with batch size = 64.

Use functions from `local_utils`

In [8]:
BATCH_SIZE = 64

train_loader = local_utils.get_train_dataset(batch_size=BATCH_SIZE)
test_loader = local_utils.get_test_dataset(batch_size=BATCH_SIZE)

print("len(train_loader) =", len(train_loader))
print("len(test_loader) =", len(test_loader))

loader = train_loader
for X, y in loader:
    print(X.shape)
    print(y.shape)
    break

len(train_loader) = 938
len(test_loader) = 157
torch.Size([64, 1, 28, 28])
torch.Size([64])
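
The loader lengths printed above are numbers of batches, not numbers of samples. The short check below reproduces them, assuming a training split of 60000 samples (an assumption, the notebook does not state it), a test split of 10000 samples (consistent with `len(eval_loader)` at batch size 1), and loaders that keep the last, incomplete batch.

```python
import math

BATCH_SIZE = 64
TRAIN_SAMPLES = 60000  # assumption: size of the training split
TEST_SAMPLES = 10000   # consistent with len(eval_loader) at batch size 1

# A loader that keeps the last partial batch yields ceil(samples / batch) batches.
train_batches = math.ceil(TRAIN_SAMPLES / BATCH_SIZE)
test_batches = math.ceil(TEST_SAMPLES / BATCH_SIZE)

# Size of the final, incomplete test batch.
last_test_batch = TEST_SAMPLES - (test_batches - 1) * BATCH_SIZE

print("train batches:", train_batches)
print("test batches:", test_batches)
print("last test batch size:", last_test_batch)
```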


3. Train the network with:
- SGD optimizer
- learning rate 0.1
- update period of 5
- 5 epochs
- accuracy metric

Plot history.

In [None]:
metric = local_utils.AccuracyMetic()
criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(params=net.parameters(), lr=0.1)

device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')
print(device)

net.to(device)
net, history = local_utils.training(model=net,
                                    train_loader=train_loader,
                                    test_loader=test_loader,
                                    loss_fcn=criterion,
                                    metric=metric,
                                    optimizer=optimizer,
                                    update_period=5,
                                    epoch_max=5,
                                    device=device)

local_utils.plot_history(history=history)


cuda
Epoch 1 / 5: STARTED
TRAINING
Running on platform: Linux-5.10.147+-x86_64-with-glibc2.29, machine: x86_64, python_version: 3.8.10, processor: x86_64, system: Linux, 


706it [00:12, 73.77it/s]
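
For reference, the accuracy reported during training is conceptually the fraction of samples whose highest-scoring class matches the label. The plain-Python sketch below shows that computation on a hand-made batch; it is not the implementation of the metric in `local_utils`, and the function name `accuracy` and the example values are assumptions.

```python
def accuracy(logits, labels):
    """Fraction of samples whose highest-scoring class equals the label."""
    correct = 0
    for scores, label in zip(logits, labels):
        # Index of the largest score plays the role of argmax.
        predicted = max(range(len(scores)), key=scores.__getitem__)
        correct += int(predicted == label)
    return correct / len(labels)


# Hand-made batch: 4 samples, 3 classes.
batch_logits = [
    [0.1, 2.0, -1.0],
    [1.5, 0.2, 0.3],
    [0.0, 0.1, 0.9],
    [0.4, 0.3, 0.2],
]
batch_labels = [1, 0, 2, 1]

batch_accuracy = accuracy(batch_logits, batch_labels)
print("accuracy:", batch_accuracy)
```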

4. Extract the model state dict and save it to the file `weights.pth`.

Note: `_use_new_zipfile_serialization=False` is needed for backward compatibility with the older version of PyTorch in the Vitis AI docker environment.

In [None]:

torch.save(net.state_dict(), "weights.pth", _use_new_zipfile_serialization=False)
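
As a quick sanity check, the save and load round trip can be tried on a small stand-in model. The sketch below is only an illustration under stated assumptions: `torch.nn.Linear(4, 2)` replaces the trained network, and a temporary directory is used so that the real `weights.pth` is not overwritten.

```python
import os
import tempfile

import torch

model = torch.nn.Linear(4, 2)  # stand-in for the trained network

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "weights.pth")
    # Legacy (non-zip) serialization format, readable by older PyTorch releases.
    torch.save(model.state_dict(), path, _use_new_zipfile_serialization=False)
    # map_location="cpu" lets the file load on a machine without a GPU.
    state_dict = torch.load(path, map_location="cpu")

restored = torch.nn.Linear(4, 2)
restored.load_state_dict(state_dict)

# Compare every tensor of the original and the restored model.
same = all(
    torch.equal(a, b)
    for a, b in zip(model.state_dict().values(), restored.state_dict().values())
)
print("round trip ok:", same)
```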

## Part 2 - EVALUATION - host device

5. Instantiate the `MiniResNet` network with the same input shape.

Load the state dict from the `weights.pth` file and initialize the network with it (`load_state_dict`, with `map_location=device`).

Evaluate the model on the `eval_loader` dataset with `local_utils.train_test_pass`.

Print the loss, accuracy, execution time, number of processed images and throughput (fps).

Run the experiment for both the 'cpu' and 'cuda' devices.

In [None]:
# CUDA - GPU
device = "cuda" if torch.cuda.is_available() else "cpu"
newModel = local_utils.MiniResNet()
newModel.load_state_dict(torch.load("weights.pth", map_location=device))

tm = local_utils.TimeMeasurement("Host-GPU", len(eval_loader))
with tm:
    net, loss, acc = local_utils.train_test_pass(model=newModel,
                                                 data_generator=eval_loader,
                                                 criterion=criterion,
                                                 metric=metric,
                                                 optimizer=optimizer,
                                                 update_period=1,
                                                 mode="test",
                                                 device=device)

print(repr(tm))
print("loss:", loss)
print("acc:", acc)

In [None]:
# CPU
device = torch.device("cpu")

tm = local_utils.TimeMeasurement("Host-CPU", len(eval_loader))
with tm:
    net, loss, acc = local_utils.train_test_pass(model=newModel,
                                                 data_generator=eval_loader,
                                                 criterion=criterion,
                                                 metric=metric,
                                                 optimizer=optimizer,
                                                 update_period=1,
                                                 mode="test",
                                                 device=device)

print(repr(tm))
print("loss:", loss)
print("acc:", acc)
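
Throughput in frames per second is the number of processed images divided by the elapsed time. The sketch below shows one way such a timing context manager could work. The `Timer` class is a hypothetical stand-in written for this example; it does not reproduce `local_utils.TimeMeasurement`, and the workload is a placeholder.

```python
import time


class Timer:
    """Hypothetical timing helper: measures elapsed time and derives throughput."""

    def __init__(self, name, n_items):
        self.name = name
        self.n_items = n_items
        self.elapsed = 0.0

    def __enter__(self):
        self._start = time.perf_counter()
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.elapsed = time.perf_counter() - self._start
        return False  # do not suppress exceptions raised inside the block

    @property
    def fps(self):
        """Processed items per second."""
        return self.n_items / self.elapsed if self.elapsed > 0 else float("inf")

    def __repr__(self):
        return (f"{self.name}: {self.n_items} items in "
                f"{self.elapsed:.4f} s ({self.fps:.1f} fps)")


timer = Timer("demo", n_items=1000)
with timer:
    total = sum(i * i for i in range(1000))  # placeholder workload

print(repr(timer))
```

When timing a GPU this way, keep in mind that CUDA kernels run asynchronously, so calling `torch.cuda.synchronize()` before reading the clock generally gives more reliable numbers.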

## Part 3 - AMD Xilinx Vitis AI environment

This part is a guide on how to run AMD Xilinx Vitis AI (Vitis AI, or VAI for short):

1. Open a console in the directory with the `lab_11` files

(or create a new cell and write commands with `!` at the beginning of the line, e.g. `!ls`).

2. Run the docker container with the script below (you will first be asked to give some Xilinx consents for VAI usage):

`./docker_run.sh xilinx/vitis-ai:1.4.916`

The script pulls the docker image of the Vitis AI environment, version 1.4.916, if necessary and then starts a bash terminal.

This operation may take some time...

3. Your terminal is now inside the VAI container.

The current directory in the container (`/workspace`) is mapped to the directory from which you started the VAI container (the `lab_11` directory).

4. Activate the VAI conda environment dedicated to the PyTorch library:

`conda activate vitis-ai-pytorch`

5. Run the Jupyter server inside the container:

`jupyter notebook --no-browser --ip=0.0.0.0 --NotebookApp.token='' --NotebookApp.password=''`
 
6. Save this file.

7. Open the link to Jupyter's browser interface and run `notebook_quntize.ipynb`.