
After quantization aware training, the inference time of the int8 tflite model is slower than float32 in the CPU #599

@pl-ang

Description


Describe the bug
For the int8 tflite model created by quantization aware training, inference is very slow. I produced the int8 and float32 tflite models by following the quantization aware training example, and ran both with the TFLite Model Benchmark Tool. The int8 tflite model's average inference time is 258.413 microseconds, while the float32 tflite model's average inference time is 41.7952 microseconds.
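For reference, a minimal sketch of how the two models were presumably produced, following the tfmot quantization aware training example; the model architecture and the skipped training step are placeholders, not the exact code I ran:

```python
import tensorflow as tf
import tensorflow_model_optimization as tfmot

# Placeholder Keras model (the QAT example uses a small MNIST convnet).
model = tf.keras.Sequential([
    tf.keras.layers.InputLayer(input_shape=(28, 28)),
    tf.keras.layers.Reshape(target_shape=(28, 28, 1)),
    tf.keras.layers.Conv2D(filters=12, kernel_size=(3, 3), activation='relu'),
    tf.keras.layers.MaxPooling2D(pool_size=(2, 2)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10),
])

# Wrap the model for quantization aware training.
q_aware_model = tfmot.quantization.keras.quantize_model(model)
q_aware_model.compile(
    optimizer='adam',
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=['accuracy'])
# ... fine-tune q_aware_model on the training data here ...

# Convert the QAT model to an int8 tflite flatbuffer.
converter = tf.lite.TFLiteConverter.from_keras_model(q_aware_model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
quantized_tflite_model = converter.convert()

# Baseline float32 tflite model for comparison.
float_converter = tf.lite.TFLiteConverter.from_keras_model(model)
float_tflite_model = float_converter.convert()
```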

System information

TensorFlow version (installed from source or binary): tf-nightly (2.5.0-dev20201123)

TensorFlow Model Optimization version (installed from source or binary): 0.5.0

Describe the expected behavior
The int8 tflite model should run faster than the float32 model on the CPU.

Describe the current behavior
The int8 tflite model runs much slower than the float32 model on the CPU.

Screenshots
Here are the benchmark results.

For the float32 model:
(screenshot: float32 benchmark output)

For the int8 model:
(screenshot: int8 benchmark output)

Additional context
I also observed a similar situation with MobileNetV1. The system I am using is macOS 10.15.7 with an Intel Core i9-8950HK.
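As a cross-check independent of the C++ benchmark binary, a minimal sketch of measuring the same latencies with the Python TFLite interpreter; the file names are assumptions:

```python
import time
import numpy as np
import tensorflow as tf

def benchmark(model_path, num_runs=100):
    """Return average invoke() latency in microseconds for a .tflite model."""
    interpreter = tf.lite.Interpreter(model_path=model_path)
    interpreter.allocate_tensors()
    input_details = interpreter.get_input_details()[0]
    dummy = np.zeros(input_details['shape'], dtype=input_details['dtype'])
    interpreter.set_tensor(input_details['index'], dummy)
    interpreter.invoke()  # warm-up run
    start = time.perf_counter()
    for _ in range(num_runs):
        interpreter.invoke()
    return (time.perf_counter() - start) / num_runs * 1e6

print('float32 avg (us):', benchmark('float_model.tflite'))
print('int8 avg (us):', benchmark('quantized_model.tflite'))
```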
