
Model optimization

TensorFlow Lite and the TensorFlow Model Optimization Toolkit provide tools that minimize the complexity of optimizing models for inference.

Inference efficiency is particularly important for edge devices, such as mobile and Internet of Things (IoT) devices, which have tight constraints on processing, memory, power consumption, and model storage. Furthermore, model optimization unlocks the processing power of fixed-point hardware and next-generation hardware accelerators.

Model quantization

Quantizing deep neural networks uses techniques that allow for reduced-precision representations of weights and, optionally, activations for both storage and computation; a numeric sketch of the idea follows the list below. Quantization provides several benefits:

  • Support on existing CPU platforms.
  • Quantization of activations reduces memory access costs for reading and storing intermediate activations.
  • Many CPU and hardware accelerator implementations provide SIMD instruction capabilities, which are especially beneficial for quantization.
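As a minimal sketch of what reduced precision means in practice, the NumPy snippet below applies 8-bit affine (asymmetric) quantization to a float tensor and measures the reconstruction error. The function names and the per-tensor min/max scheme are illustrative, not the exact scheme TensorFlow Lite uses internally.

```python
import numpy as np

def quantize(x, num_bits=8):
    """Map float values to uint8 using a scale and zero point (assumes x is not constant)."""
    qmin, qmax = 0, 2 ** num_bits - 1
    scale = (x.max() - x.min()) / (qmax - qmin)
    zero_point = int(round(qmin - x.min() / scale))
    q = np.clip(np.round(x / scale) + zero_point, qmin, qmax).astype(np.uint8)
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    """Recover approximate float values from the quantized tensor."""
    return scale * (q.astype(np.float32) - zero_point)

weights = np.random.randn(4, 4).astype(np.float32)
q, scale, zp = quantize(weights)
# Storage drops from 32 to 8 bits per value; the error is bounded by roughly scale/2.
print(np.abs(weights - dequantize(q, scale, zp)).max())
```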

TensorFlow Lite provides several levels of support for quantization.

  • TensorFlow Lite post-training quantization makes it easy to quantize weights and activations after training; a converter sketch follows this list.
  • Quantization-aware training allows networks to be trained so that they can be quantized afterwards with minimal accuracy drop; it is available only for a subset of convolutional neural network architectures.
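As a rough sketch of the post-training path, assuming a TensorFlow 2.x-style SavedModel (the model path and output filename are placeholders):

```python
import tensorflow as tf

# Placeholder path to your own SavedModel.
saved_model_dir = "/path/to/saved_model"

# Convert with the default optimization, which quantizes weights post training.
converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_quant_model = converter.convert()

with open("model_quant.tflite", "wb") as f:
    f.write(tflite_quant_model)
```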

Latency and accuracy results

Below are the latency and accuracy results for post-training quantization and quantization-aware training on a few models. All latency numbers are measured on Pixel 2 devices using a single big core. As the toolkit improves, so will the numbers here:

| Model | Top-1 Accuracy (Original) | Top-1 Accuracy (Post-Training Quantized) | Top-1 Accuracy (Quantization-Aware Training) | Latency (Original) (ms) | Latency (Post-Training Quantized) (ms) | Latency (Quantization-Aware Training) (ms) | Size (Original) (MB) | Size (Optimized) (MB) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Mobilenet-v1-1-224 | 0.709 | 0.657 | 0.70 | 124 | 112 | 64 | 16.9 | 4.3 |
| Mobilenet-v2-1-224 | 0.719 | 0.637 | 0.709 | 89 | 98 | 54 | 14 | 3.6 |
| Inception_v3 | 0.78 | 0.772 | 0.775 | 1130 | 845 | 543 | 95.7 | 23.9 |
| Resnet_v2_101 | 0.770 | 0.768 | N/A | 3973 | 2868 | N/A | 178.3 | 44.9 |

Table 1: Benefits of model quantization for select CNN models

Choice of tool

As a starting point, check whether the models in hosted models can work for your application. If not, we recommend that users start with the post-training quantization tool, since it is broadly applicable and does not require training data.

For cases where the accuracy and latency targets are not met, or where hardware accelerator support is important, quantization-aware training is the better option. See additional optimization techniques under the TensorFlow Model Optimization Toolkit.

Note: Quantization-aware training supports a subset of convolutional neural network architectures.
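As a rough sketch of the quantization-aware training workflow, assuming a Keras model and the tensorflow_model_optimization package (the toy model and the commented-out training call are illustrative):

```python
import tensorflow as tf
import tensorflow_model_optimization as tfmot

# Illustrative toy model; substitute your own supported CNN architecture.
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(16, 3, activation="relu", input_shape=(28, 28, 1)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10),
])

# Wrap the model so fake-quantization ops are inserted during training.
qat_model = tfmot.quantization.keras.quantize_model(model)
qat_model.compile(
    optimizer="adam",
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=["accuracy"],
)
# qat_model.fit(train_images, train_labels, epochs=1)  # train as usual

# Convert the trained model to a quantized TensorFlow Lite model.
converter = tf.lite.TFLiteConverter.from_keras_model(qat_model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_model = converter.convert()
```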