
Quantization TOCO implementation #42231

@lorenz0890

Description

Hi,

I am trying to understand how exactly TOCO quantizes an annotated model. Assume we have a Keras model:

Conv2D
MaxPool
Conv2D
MaxPool
Flatten
Dense
Dense(SoftMax)

where the Conv2D layers have been annotated for post-training quantization using a QuantizeConfig that quantizes weights and activations based on the following Quantizer:

LastValueQuantizer(num_bits=8, symmetric=True,
                   narrow_range=False, per_axis=False)

As far as I understand, converting such an annotated Keras model to TFLite using TOCO should yield a model where the Conv2D layers have been quantized to 8-bit integers while the rest stays at 32-bit float. Is that correct? If so, how is the rounding from the 32-bit float Conv2D inputs to int8 accomplished? Is the situation the same when using float16 instead of int8? What about the situation where I set num_bits=4?
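For what it's worth, here is my understanding of the rounding, as a minimal pure-Python sketch of the standard symmetric per-tensor scheme (scale derived from the maximum absolute value, round-to-nearest, clamp). The function name and the exact rounding/clamping details are my own illustration, not taken from the TOCO source, so I may be wrong about what the converter actually does:

```python
def quantize_symmetric(values, num_bits=8, narrow_range=False):
    """Quantize floats to signed ints symmetrically, then dequantize."""
    qmax = 2 ** (num_bits - 1) - 1                # e.g. 127 for 8 bits
    qmin = -qmax if narrow_range else -qmax - 1   # e.g. -127 or -128
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs else 1.0
    # Round-to-nearest integer code, clamped to the representable range.
    quantized = [max(qmin, min(qmax, round(v / scale))) for v in values]
    # Dequantization multiplies the codes back by the scale.
    dequantized = [q * scale for q in quantized]
    return quantized, dequantized, scale

q, dq, scale = quantize_symmetric([-1.0, -0.5, 0.0, 0.25, 1.0])
```

With num_bits=4 the same formula gives qmax = 7, so the reconstruction error per weight grows roughly by a factor of 16 compared to 8 bits. Is that the mechanism TOCO uses, or does it do something different for sub-8-bit widths?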

Thanks for your help!
