
Quantization TOCO implementation #42231

@lorenz0890

Description

Hi,

I am trying to understand how exactly TOCO quantizes an annotated model. Assume we have a Keras model:

Conv2D
MaxPool
Conv2D
MaxPool
Flatten
Dense
Dense(SoftMax)

where the Conv2D layers have been annotated for post-training quantization using a QuantizeConfig that quantizes weights and activations based on the following Quantizer:

LastValueQuantizer(num_bits=8, symmetric=True,
                   narrow_range=False, per_axis=False)

As far as I understand, converting such an annotated Keras model to TFLite using TOCO should yield a model where the Conv2D layers have been quantized to 8-bit integers while the rest stays at 32-bit float. Is that correct? If so, how is the rounding from the 32-bit float Conv2D inputs to int8 accomplished? Is the situation the same when using float16 instead of int8? What about the situation where I set num_bits=4?
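For what it's worth, here is my understanding of the rounding, as a minimal pure-Python sketch of the standard symmetric per-tensor scheme (scale derived from the maximum absolute value, round-to-nearest, clamp). The function name and the exact rounding/clamping details are my own illustration, not taken from the TOCO source, so I may be wrong about what the converter actually does:

```python
def quantize_symmetric(values, num_bits=8, narrow_range=False):
    """Quantize floats to signed ints symmetrically, then dequantize."""
    qmax = 2 ** (num_bits - 1) - 1                # e.g. 127 for 8 bits
    qmin = -qmax if narrow_range else -qmax - 1   # e.g. -127 or -128
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs else 1.0
    # Round-to-nearest integer code, clamped to the representable range.
    quantized = [max(qmin, min(qmax, round(v / scale))) for v in values]
    # Dequantization multiplies the codes back by the scale.
    dequantized = [q * scale for q in quantized]
    return quantized, dequantized, scale

q, dq, scale = quantize_symmetric([-1.0, -0.5, 0.0, 0.25, 1.0])
```

With num_bits=4 the same formula gives qmax = 7, so the reconstruction error per weight grows roughly by a factor of 16 compared to 8 bits. Is that the mechanism TOCO uses, or does it do something different for sub-8-bit widths?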

Thanks for your help!
