
After quantization aware training, the inference time of the int8 tflite model is slower than float32 in the CPU #599

@pl-ang

Description


Describe the bug
For the int8 tflite model created by quantization aware training, inference is very slow. I produced the int8 and float32 tflite models by following the quantization aware training example, and ran both with the TFLite Model Benchmark Tool. The int8 tflite model's average inference time is 258.413 microseconds, while the float32 tflite model's average inference time is 41.7952 microseconds.
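For reference, a minimal sketch of how the two models were presumably produced, following the tfmot quantization aware training example; the model architecture and the skipped training step are placeholders, not the exact code I ran:

```python
import tensorflow as tf
import tensorflow_model_optimization as tfmot

# Placeholder Keras model (the QAT example uses a small MNIST convnet).
model = tf.keras.Sequential([
    tf.keras.layers.InputLayer(input_shape=(28, 28)),
    tf.keras.layers.Reshape(target_shape=(28, 28, 1)),
    tf.keras.layers.Conv2D(filters=12, kernel_size=(3, 3), activation='relu'),
    tf.keras.layers.MaxPooling2D(pool_size=(2, 2)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10),
])

# Wrap the model for quantization aware training.
q_aware_model = tfmot.quantization.keras.quantize_model(model)
q_aware_model.compile(
    optimizer='adam',
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=['accuracy'])
# ... fine-tune q_aware_model on the training data here ...

# Convert the QAT model to an int8 tflite flatbuffer.
converter = tf.lite.TFLiteConverter.from_keras_model(q_aware_model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
quantized_tflite_model = converter.convert()

# Baseline float32 tflite model for comparison.
float_converter = tf.lite.TFLiteConverter.from_keras_model(model)
float_tflite_model = float_converter.convert()
```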

System information

TensorFlow version (installed from source or binary): tf-nightly (2.5.0-dev20201123)

TensorFlow Model Optimization version (installed from source or binary): 0.5.0

Describe the expected behavior
The int8 tflite model should run faster than the float32 model on the CPU.

Describe the current behavior
The int8 tflite model runs much slower than the float32 model on the CPU.

Screenshots
Here are the benchmark results.

For the float32 model:
(screenshot: float32 benchmark output)

For the int8 model:
(screenshot: int8 benchmark output)

Additional context
I also observed a similar situation with MobileNetV1. The system I am using is macOS 10.15.7 with an Intel Core i9-8950HK.
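As a cross-check independent of the C++ benchmark binary, a minimal sketch of measuring the same latencies with the Python TFLite interpreter; the file names are assumptions:

```python
import time
import numpy as np
import tensorflow as tf

def benchmark(model_path, num_runs=100):
    """Return average invoke() latency in microseconds for a .tflite model."""
    interpreter = tf.lite.Interpreter(model_path=model_path)
    interpreter.allocate_tensors()
    input_details = interpreter.get_input_details()[0]
    dummy = np.zeros(input_details['shape'], dtype=input_details['dtype'])
    interpreter.set_tensor(input_details['index'], dummy)
    interpreter.invoke()  # warm-up run
    start = time.perf_counter()
    for _ in range(num_runs):
        interpreter.invoke()
    return (time.perf_counter() - start) / num_runs * 1e6

print('float32 avg (us):', benchmark('float_model.tflite'))
print('int8 avg (us):', benchmark('quantized_model.tflite'))
```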
