Quantization TOCO implementation #42231
Hi,
I am trying to understand how exactly TOCO quantizes an annotated model. Assume we have a Keras model:
```
Conv2D
MaxPool
Conv2D
MaxPool
Flatten
Dense
Dense(SoftMax)
```
where the Conv2D layers have been annotated for post-training quantization using a QuantizeConfig that quantizes weights and activations with the following quantizer:
```python
LastValueQuantizer(num_bits=8, symmetric=True,
                   narrow_range=False,
                   per_axis=False)
```
As far as I understand, converting such an annotated Keras model to TFLite using TOCO should yield a model where the Conv2D layers have been quantized to 8-bit integers while the rest stays at 32-bit float. Is that correct? If yes, how is the rounding from the input Keras Conv2D layers in 32-bit float to int8 accomplished? Is the situation the same when using float16 instead of int8? What about the situation where I set num_bits=4?
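For what it's worth, the usual symmetric per-tensor scheme (which `LastValueQuantizer(symmetric=True, per_axis=False)` describes) maps a float value `w` to an integer via `q = round(w / scale)` with a single scale derived from the largest observed magnitude. The sketch below is only an illustration of that general scheme, not TOCO's exact implementation; the function names and the `scale = max(|w|) / (2^(num_bits-1) - 1)` choice are my assumptions:

```python
def quantize_symmetric(weights, num_bits=8):
    """Quantize a list of floats to signed integers with one shared scale.

    Illustrative sketch of symmetric per-tensor quantization; not the
    actual TOCO/TFLite converter code.
    """
    qmax = 2 ** (num_bits - 1) - 1        # 127 for 8 bits, 7 for 4 bits
    qmin = -qmax - 1                      # -128 for 8 bits (narrow_range=False)
    scale = max(abs(w) for w in weights) / qmax or 1.0
    # Round to the nearest integer and clamp into the representable range.
    q = [max(qmin, min(qmax, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Map the integers back to approximate floats (w ≈ q * scale)."""
    return [v * scale for v in q]

weights = [-0.5, 0.0, 0.25, 0.5]
q, scale = quantize_symmetric(weights)
print(q)          # integers in [-128, 127]
print(dequantize(q, scale))
```

With `num_bits=4` the same formula simply shrinks the integer range to [-8, 7], so the rounding error per weight grows accordingly.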
Thanks for your help!