# Accuracy Analysis Tool Tutorial

This is an advanced tutorial; if the accuracy results obtained were satisfactory, it can be skipped.
Before using the tool, make sure that your native (pre-quantization) results are satisfactory.
For more details, refer to the `Debugging Accuracy` section of the Dataflow Compiler User Guide.

---

This tutorial shows how model quantization analysis breaks down the quantization noise per layer. It guides the user through the Hailo `analyze_noise` tool by applying it to the classification model MobileNet-v3-Large-Minimalistic.

The flow is mainly comprised of:

* Paths definitions: Defining the paths to the model and data for analysis.
* Preparing the model: Initial Parse and Optimize of the model.
* Accuracy analysis: This step is the heart of the tool, and computes the quantization noise of each layer output.  
For each layer, the layer under analysis is the **only** quantized layer, while the rest of the model is kept in full precision.  
This highlights the quantization sensitivity of the model to the noise of that specific layer.
* Visualizing the results: Walk through the results of the accuracy analysis and explain the different graphs and information.
* Re-optimizing the model: After debugging the noise we repeat the optimization process to improve the results.

**Requirements:**

* Run this code in a Jupyter notebook; see the Introduction tutorial for more details.
* Verify that you've completed the Parsing tutorial and the Model Optimization tutorial, or generated analysis data in another way.

In [None]:
import os

import matplotlib.pyplot as plt
import numpy as np
import tensorflow as tf

from hailo_sdk_client import ClientRunner

%matplotlib inline

### Input Definitions
* model_path: path to the model to be used in this tutorial
* data_path: path to preprocessed .npy image files for optimization and analysis

In [None]:
model_name = "v3-large-minimalistic_224_1.0_float"
model_path = "../models/" + model_name + ".tflite"
assert os.path.isfile(model_path), "Please provide valid path for the model"

data_path = "./calib_set.npy"
assert os.path.isfile(data_path), "Please provide valid path for a dataset"
har_path = model_name + ".har"
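Before spending time on optimization and analysis, it can help to check that the dataset array has the layout the model expects. The following is a minimal sketch, not part of the Hailo API: `check_calib_array` is a hypothetical helper, and the expected 224x224x3 channels-last shape is an assumption inferred from the model name.

```python
import numpy as np


def check_calib_array(images, expected_hw=(224, 224), channels=3):
    """Validate an image array and return the number of images (hypothetical helper).

    Assumes a channels-last layout (N, H, W, C); adjust if your data differs.
    """
    if images.ndim != 4:
        raise ValueError(f"Expected a 4-D array (N, H, W, C), got shape {images.shape}")
    n, h, w, c = images.shape
    if (h, w) != tuple(expected_hw) or c != channels:
        raise ValueError(f"Unexpected per-image shape {(h, w, c)}")
    return n


# Usage with the tutorial's file (assumes it holds a single stacked array):
# num_images = check_calib_array(np.load(data_path))
```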

It is highly recommended to use a GPU when running the analysis tool.  
If the machine has no GPU, the code will be executed on the CPU and will take longer to run.

In [None]:
if len(tf.config.list_physical_devices("GPU")) == 0:
    print("Warning: you are running the accuracy analysis tool without a GPU, expect long running time.")

### Preparing the Model
In this step, the model will be parsed and optimized to prepare it for analysis.
For more details, check out the Parsing tutorial and the Model Optimization tutorial.

In [None]:
runner = ClientRunner(hw_arch="hailo8")
runner.translate_tf_model(model_path, model_name)

model_script = "normalization1 = normalization([127.5, 127.5, 127.5], [127.5, 127.5, 127.5])\n"
runner.load_model_script(model_script)

runner.optimize(data_path)

### Accuracy Analysis
Though most models work well with our default optimization, some suffer from high quantization noise that induces substantial accuracy degradation. As an example, we choose the MobileNet-v3-Large-Minimalistic neural network model which, due to its structural characteristics, suffers a high degradation of 6% in Top-1 accuracy on the ImageNet-1K validation dataset.

To analyze the source of degradation, the Hailo `analyze_noise` API will be used. The analysis tool uses a given dataset to measure the noise level in each layer, which makes it possible to pinpoint problematic layers that should be handled. The analysis tool uses the entire dataset by default; use the `data_count` argument to limit the number of images.  
It is recommended to use at least 64 images, preferably images that were not part of the calibration set. However, to keep the tool's processing time reasonable, it is also recommended not to use more than 100-200 images.
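One way to follow this recommendation is to draw a fixed-size random subset from a separate pool of preprocessed images and save it as its own `.npy` file. This is only a sketch: `pick_analysis_subset` is a hypothetical helper, and the existence of a held-out image array is an assumption, not something this tutorial provides.

```python
import numpy as np


def pick_analysis_subset(images, count=64, seed=0):
    """Randomly pick up to `count` images without replacement (hypothetical helper)."""
    count = min(count, len(images))
    rng = np.random.default_rng(seed)  # fixed seed keeps the subset reproducible
    indices = np.sort(rng.choice(len(images), size=count, replace=False))
    return images[indices]


# Usage (file names are placeholders):
# subset = pick_analysis_subset(np.load("./heldout_set.npy"), count=64)
# np.save("./analysis_set.npy", subset)
```

The saved file can then be passed to the analysis step in place of `data_path`, in the same way the tutorial passes a `.npy` path.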

The notebook cell below is equivalent to running the following CLI command:

`hailo analyze-noise quantized_model_har_path --data-path data_path --batch-size 2 --data-count 16`

The output is saved inside the HAR, to be visualized later on by the Profiler.

In [None]:
runner.analyze_noise(data_path, batch_size=2, data_count=16)  # Batch size is 1 by default
runner.save_har(har_path)

### Visualizing the Results
In this section, a general explanation for the noise analysis report will be provided.  

To visualize the accuracy analysis results and debug the quantization noise, the Hailo Model Profiler will be used.  
The Hailo Model Profiler will generate an HTML report with all the information for the model.  
In the Optimization Details tab of the report, all the relevant information for this tutorial can be found:



In [None]:
!hailo profiler {har_path}
# Note: When working on a remote computer, manual opening of the HTML file may be required

##### SNR Chart
Displayed on the top ribbon, only if the profiled HAR contains the analyze-noise data.

This chart shows the sensitivity of each layer to quantization, measured separately for each output layer. To measure the quantization noise contributed by each layer, the tool iterates over all layers; in each iteration the given layer is the **only** quantized layer while the rest are kept in full precision, and the SNR is measured at each output layer. The number of SNR values per layer equals the number of output layers affected by the quantized layer.
The graph shows the SNR values in decibels (dB); any value higher than 10 should be fine (higher is better).
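To give a feel for the dB scale, the sketch below computes SNR with the common definition: ten times the base-10 logarithm of signal power over noise power. Whether the analysis tool uses exactly this formula is an assumption; `snr_db` is a hypothetical helper for illustration only.

```python
import numpy as np


def snr_db(reference, noisy):
    """SNR in dB between a full-precision output and its noisy (quantized) version."""
    reference = np.asarray(reference, dtype=np.float64)
    noise = np.asarray(noisy, dtype=np.float64) - reference
    signal_power = np.sum(reference**2)
    noise_power = np.sum(noise**2)
    if noise_power == 0:
        return float("inf")  # identical outputs: no noise at all
    return 10.0 * np.log10(signal_power / noise_power)
```

Under this definition, 10 dB means the signal power is ten times the noise power, and every additional 10 dB is another factor of ten.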

If an output layer is sensitive (low SNR) across many layers, it is recommended to re-quantize with one of the following model script commands (not in the scope of this tutorial):

  * Configure the output layer to 16-bit output. For example, using the model script command: `quantization_param(output_layer1, precision_mode=a16_w16)`.
  * When possible, offload output activation to the accelerator. For example, the following command adds sigmoid activation to the output layer conv51: `change_output_activation(conv51, sigmoid)` and should be used to offload sigmoid from post-processing code to the accelerator.
  * Use fine-tuning, which is enabled by default at optimization_level=2 but can be customized. For example, a specific fine-tune command: `post_quantization_optimization(finetune, policy=enabled, learning_rate=0.0001, epochs=8, batch_size=4, dataset_size=4000)`. Other useful attributes of this command are loss_layer_names, loss_factors and loss_types, which allow the user to manually edit the loss function of the fine-tune training. If fine-tuning fails due to insufficient GPU memory, try a lower batch_size.
  * Increase the optimization level. For example, `model_optimization_flavor(optimization_level=4)` will set the highest optimization level (default is 2).
  * Decrease the compression level. For example, `model_optimization_flavor(compression_level=0)` will disable compression (default value is 1).
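Commands like these are passed to the runner as one model script string, as the rest of this tutorial does with `runner.load_model_script`. The sketch below only assembles such a string; `build_model_script` is a hypothetical helper, and the layer name `output_layer1` is copied from the example above and may differ in your model.

```python
def build_model_script(commands):
    """Join model script commands into one newline-terminated string (hypothetical helper)."""
    return "".join(cmd.strip() + "\n" for cmd in commands if cmd.strip())


commands = [
    # 16-bit precision for a sensitive output layer (layer name is an example)
    "quantization_param(output_layer1, precision_mode=a16_w16)",
    # Highest optimization level, compression disabled
    "model_optimization_flavor(optimization_level=4, compression_level=0)",
]
model_script = build_model_script(commands)

# Then, on a freshly parsed runner as in the re-optimization step of this tutorial:
# runner.load_model_script(model_script)
# runner.optimize(data_path)
```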


##### Layers Information

Displayed on the right when a layer is selected.

This section provides detailed per-layer information that helps debug local quantization errors in the model, for example, a specific layer that is very sensitive to quantization. Note that quantization noise may stem from the layers' weights, activations or both.

* **Weight Histogram**: this graph shows the weights distribution and can help to identify outliers. If outliers exist in the weight distribution, the following command can be used to clip them, for example, to clip the kernel values of conv27:
    `pre_quantization_optimization(weights_clipping, layers=[conv27], mode=percentile, clipping_values=[0.01, 99.99])`

* **Activation Histogram**: this graph shows the activation distribution as collected by the layer noise analysis tool. A wide activation distribution is a major source of degradation, and in general it is strongly recommended to use a model with batch normalization after each layer to limit the layer's extreme activation values. Another important parameter that affects the activation distribution is the calibration set size used during quantization. To raise it, use the following command: `model_optimization_config(calibration, calibset_size=512)` (the default calibration set size is 64). If there are outliers in the layers' activation distribution, we recommend using the activation clipping command, for example:
  `pre_quantization_optimization(activation_clipping, layers={*}, mode=percentile, clipping_values=[0.01, 99.99])`

* **Scatter Plot**: this graph compares the full-precision and quantized values of the layers' activations. The X-coordinate of each point is its value in full precision and the Y-coordinate is its value after quantization. With zero quantization noise, all points would lie on a line with a slope of exactly one. In case of bias noise, many points are expected to lie above or below that line, indicating imperfect quantization. If this is the case, use the following commands:
  `post_quantization_optimization(bias_correction, policy=enabled)` and
  `post_quantization_optimization(finetune, policy=disabled)`
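To illustrate why percentile clipping helps, the sketch below clips a tensor to a percentile range and compares the step size of a uniform 8-bit quantizer before and after. It is an illustration only, not Hailo's implementation: both helpers are hypothetical, and computing the percentiles over the whole tensor is an assumption.

```python
import numpy as np


def clip_to_percentiles(values, low=0.01, high=99.99):
    """Clip values to the [low, high] percentile range of the tensor."""
    lo, hi = np.percentile(values, [low, high])
    return np.clip(values, lo, hi)


def uniform_step(values, bits=8):
    """Step size of a uniform quantizer that covers the full value range."""
    return (values.max() - values.min()) / (2**bits - 1)


rng = np.random.default_rng(0)
activations = np.append(rng.normal(size=10_000), 100.0)  # one extreme outlier
clipped = clip_to_percentiles(activations)

print(f"step before clipping: {uniform_step(activations):.4f}")
print(f"step after clipping:  {uniform_step(clipped):.4f}")
```

A single outlier stretches the quantization range, so every other value is represented with a coarser step; clipping trades a small error on the outlier for finer resolution everywhere else.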

To examine these results, first plot the SNR graph for this specific model. Note that in general the profiler report should be used but here an alternative visualization will be used.

In [None]:
def get_snr_results():
    # SNR results are saved in the params statistics object
    params_statistics = runner.get_params_statistics()
    out_layer = "v3-large-minimalistic_224_1_0_float/output_layer1"
    layers = []
    snr = []
    for layer in runner.get_hn_model():
        # We get the SNR for each analyzed layer for a specific output layer (there is only one in this case)
        layer_snr = params_statistics.get(f"{layer.name}/layer_noise_analysis/noise_results/{out_layer}")
        if layer_snr is not None:
            layers.append(layer.name_without_scope)
            snr.append(layer_snr[0].tolist())
    return layers, snr


def get_worst_snr_layers(layers, snr, count=3):
    # Sort ascending so the lowest-SNR (most sensitive) layers come first
    worst_snr_layers = [(layers[i], snr[i]) for i in np.argsort(snr)[:count]]
    print(f"Worst SNR is obtained in the following layers:\n{worst_snr_layers}")
    return worst_snr_layers


def plot_snr_graph(layers, snr):
    plt.figure(figsize=(12, 3))
    plt.plot(layers, snr)
    plt.title(f"Per-Layer Logits SNR ({model_name}), higher is better.")
    plt.xlabel("Layer")
    plt.xticks(rotation=75, fontsize="x-small")
    plt.ylabel("SNR (dB)")
    plt.grid()
    plt.show()


layers, snr = get_snr_results()
get_worst_snr_layers(layers, snr)
plot_snr_graph(layers, snr)

### Re-Optimizing the Model
Next, we will try to improve the model accuracy by using specific model script commands. Specifically, we will use the `activation_clipping` command on the problematic layers to clip outliers from the layers' outputs, together with `weights_clipping` on one layer, a larger calibration set, `optimization_level=2` and `compression_level=0`. For further information, refer to the full accuracy report in the profiler HTML.

In [None]:
runner = ClientRunner(hw_arch="hailo8")
runner.translate_tf_model(model_path, model_name)

model_script_commands = [
    "normalization1 = normalization([127.5, 127.5, 127.5], [127.5, 127.5, 127.5])\n",
    "model_optimization_config(calibration, calibset_size=128)\n",
    "pre_quantization_optimization(activation_clipping, layers=[dw1, conv2, conv3], mode=percentile, clipping_values=[0.5, 99.5])\n",
    "pre_quantization_optimization(weights_clipping, layers=[dw1], mode=percentile, clipping_values=[0.0, 99.99])\n",
    "model_optimization_flavor(optimization_level=2, compression_level=0)\n",
]
runner.load_model_script("".join(model_script_commands))

runner.optimize(data_path)

runner.analyze_noise(data_path, batch_size=2, data_count=16)  # Batch size is 1 by default
runner.save_har(har_path)

!hailo profiler {har_path}
# Note: When working on a remote computer, manual opening of the HTML file may be required

After fixing the optimization process, it should be possible to reduce the model degradation to 1% (Top-1 accuracy on the ImageNet-1K validation dataset),
which is usually the target for classification models.

The improvement can also be seen from the new SNR graph:

In [None]:
layers, snr = get_snr_results()
get_worst_snr_layers(layers, snr)
plot_snr_graph(layers, snr)