This tutorial demonstrates how to use NNCF 8-bit quantization in post-training mode (without the fine-tuning pipeline) to optimize a PyTorch model for high-speed inference via OpenVINO Toolkit. For more advanced NNCF usage, refer to these examples.
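As a rough illustration of the post-training flow, the core call is `nncf.quantize()` fed with a small calibration dataset. The sketch below is not the notebook's exact code: it uses a plain torchvision ResNet-50 and random stand-in calibration data, assuming NNCF 2.x and torchvision 0.13+.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from torchvision.models import resnet50

import nncf  # Neural Network Compression Framework

model = resnet50(weights=None)  # the notebook loads weights trained on Tiny ImageNet instead
model.eval()

# Stand-in calibration data shaped like Tiny ImageNet inputs (RGB, 64x64);
# the notebook uses a real validation DataLoader here.
images = torch.randn(32, 3, 64, 64)
labels = torch.zeros(32, dtype=torch.long)
calibration_loader = DataLoader(TensorDataset(images, labels), batch_size=8)

# Tell NNCF how to extract the model input from a DataLoader item.
def transform_fn(data_item):
    batch, _ = data_item
    return batch

calibration_dataset = nncf.Dataset(calibration_loader, transform_fn)

# Post-training 8-bit quantization: a short calibration pass, no fine-tuning.
quantized_model = nncf.quantize(model, calibration_dataset)
```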
To speed up download and validation, this tutorial uses a ResNet-50 model pre-trained on the Tiny ImageNet dataset.
The tutorial consists of the following steps:
- Evaluating the original model.
- Transforming the original `FP32` model to `INT8`.
- Exporting the optimized and original models to ONNX and then to OpenVINO IR (see the sketch after this list).
- Comparing performance of the obtained `FP32` and `INT8` models.
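The export and comparison steps can be sketched in a similarly hedged way. The snippet below assumes `model` and `quantized_model` from the previous sketch, the `openvino` Python package (2023.1 or newer, which provides `ov.convert_model` and `ov.save_model`), and placeholder file names; the notebook's actual paths and input resolution may differ.

```python
import torch
import openvino as ov

# `model` and `quantized_model` come from the quantization sketch above.
dummy_input = torch.randn(1, 3, 64, 64)  # placeholder shape matching Tiny ImageNet images

# Export both models to ONNX.
torch.onnx.export(model, dummy_input, "resnet50_fp32.onnx")
torch.onnx.export(quantized_model, dummy_input, "resnet50_int8.onnx")

# Convert the ONNX files to OpenVINO IR (.xml + .bin) and save them to disk.
# compress_to_fp16=False keeps the baseline IR in true FP32 precision.
ov.save_model(ov.convert_model("resnet50_fp32.onnx"), "resnet50_fp32.xml", compress_to_fp16=False)
ov.save_model(ov.convert_model("resnet50_int8.onnx"), "resnet50_int8.xml")
```

The two IR files can then be benchmarked with OpenVINO's `benchmark_app` tool, or by timing compiled models directly, to compare `FP32` and `INT8` throughput.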
This is a self-contained example that relies solely on its own code.
We recommend running the notebook in a virtual environment. You only need a Jupyter server to start.
For details, please refer to the Installation Guide.