This tutorial demonstrates how to apply INT8 quantization to speech recognition models using post-training quantization with NNCF (Neural Network Compression Framework).
The code of the tutorial is designed to be extendable to custom models and datasets.
The tutorial consists of the following steps:
- Downloading and preparing the model and dataset.
- Defining data loading and accuracy validation functionality.
- Preparing the model for quantization.
- Running quantization (a minimal code sketch is shown after this list).
- Comparing performance of the original and quantized models.
- Comparing accuracy of the original and quantized models.
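The quantization step itself boils down to a couple of NNCF calls. Below is a minimal sketch of the NNCF post-training API applied to an OpenVINO model; the model path (`model.xml`), the `input_values` key, and the random calibration samples are placeholders for the artifacts that the tutorial prepares in the earlier steps.

```python
import numpy as np
import nncf
import openvino as ov

# Hypothetical calibration data: a few random audio-like samples standing in
# for the real dataset downloaded and prepared earlier in the tutorial.
calibration_loader = [
    {"input_values": np.random.randn(1, 16000).astype(np.float32)}
    for _ in range(10)
]

def transform_fn(data_item):
    # Extract the model input (raw audio here) from a single dataset sample.
    return data_item["input_values"]

core = ov.Core()
ov_model = core.read_model("model.xml")  # hypothetical path to the FP32 OpenVINO IR

# Wrap the data source so NNCF can iterate over calibration samples.
calibration_dataset = nncf.Dataset(calibration_loader, transform_fn)

# Run post-training INT8 quantization.
quantized_model = nncf.quantize(ov_model, calibration_dataset)

# Save the quantized model for later benchmarking and accuracy comparison.
ov.save_model(quantized_model, "quantized_model.xml")
```

In the actual notebook, the calibration data comes from the real speech dataset and the transform function mirrors the preprocessing used during validation, so the quantizer sees representative inputs.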
This is a self-contained example that relies solely on its own code.
We recommend running the notebook in a virtual environment. You only need a Jupyter server to start.
For details, please refer to the Installation Guide.