Some conv2d operations remain float32 after post-training full integer quantization

### 1. System information

- OS Platform and Distribution (e.g., Linux Ubuntu 16.04): macOS Monterey 12.5.1, Android 12
- TensorFlow installation (pip package or built from source): pip package
- TensorFlow library (version, if pip package or github SHA, if built from source): 2.5.3

### 2. Code

#### Option B: Paste your code here or provide a link to a custom end-to-end colab

The relevant part of the conversion code is as follows.  

```
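import tensorflow as tf

# `converter` is a tf.lite.TFLiteConverter created earlier (construction not shown).
# Use the MLIR-based quantizer and per-tensor (not per-channel) quantization.
converter.experimental_new_quantizer = True
converter._experimental_disable_per_channel = True
converter.optimizations = [tf.lite.Optimize.DEFAULT]
# Allow int8 builtin ops, plus select TF ops for anything without a builtin kernel.
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8, tf.lite.OpsSet.SELECT_TF_OPS]
# Make the model's input and output tensors int8 as well.
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8
# Generator that yields calibration samples (defined elsewhere).
converter.representative_dataset = representative_dataset_gen
tflite_model = converter.convert()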
```

### 3. Failure after conversion

- The conversion succeeds, but some conv2d operations remain float32 after post-training full integer quantization. These conv2d operations are sandwiched between dequantize and quantize operations and have float32 weights and biases. I expected all conv2d operations to have int8 weights and int32 biases.
- I ran this model on a Pixel 6 with the NNAPI delegate using the TFLite model benchmark tool. Some operations fall back to the XNNPACK delegate. I suspect that the fallback is caused by the float32 conv2d layers and that it slightly degrades inference speed.
- How can I quantize all operations to int8 weights and int32 biases? Also, will quantizing all operations to int8 improve inference speed?
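
For anyone who wants to reproduce the observation without a graph viewer, the following is a minimal sketch of how the remaining float32 tensors could be listed. It assumes the standard `tf.lite.Interpreter` API; the helper names `float32_tensor_names` and `list_float32_tensors` are hypothetical and not part of TensorFlow.

```python
def float32_tensor_names(tensor_details):
    """Return the names of all tensors whose dtype is float32.

    `tensor_details` is a list of dicts shaped like the output of
    tf.lite.Interpreter.get_tensor_details(): each dict has at least
    a "name" and a "dtype" entry.
    """
    names = []
    for detail in tensor_details:
        dtype = detail["dtype"]
        # The dtype is normally a NumPy type such as numpy.float32;
        # fall back to str() so plain strings also work.
        dtype_name = getattr(dtype, "__name__", str(dtype))
        if dtype_name == "float32":
            names.append(detail["name"])
    return names


def list_float32_tensors(model_path):
    """Load a .tflite file and list its float32 tensors (hypothetical helper)."""
    import tensorflow as tf  # imported lazily so the filter above works without TF

    interpreter = tf.lite.Interpreter(model_path=model_path)
    return float32_tensor_names(interpreter.get_tensor_details())
```

Calling `list_float32_tensors("model_full_integer_quant.tflite")` should, under these assumptions, print the weight, bias, and activation tensors that belong to the float32 conv2d operations described above.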

The quantized TFLite model is attached here.  

[model_full_integer_quant.tflite.zip](https://github.com/tensorflow/tensorflow/files/9603395/model_full_integer_quant.tflite.zip)

### 4. (optional) RNN conversion support  

None.  

### 5. (optional) Any other info / logs
Include any logs or source code that would be helpful to diagnose the problem. If including tracebacks, please include the full traceback. Large logs and files should be attached.  

The command used to run the model benchmark tool on the Pixel 6 is as follows. I found that specifying `nnapi_accelerator_name="google-edgetpu"` gives the fastest inference.  

`android_aarch64_benchmark_model --graph=./model_full_integer_quant.tflite --use_nnapi=true --nnapi_accelerator_name="google-edgetpu" --enable_op_profiling=true`  

The log is below.  

<details>
<summary>Click here to expand the log.</summary>

```
STARTING!
Log parameter values verbosely: [0]
Graph: [./exp_2/model_full_integer_quant.tflite]
Enable op profiling: [1]
Use NNAPI: [1]
NNAPI accelerator name: [google-edgetpu]
NNAPI accelerators available: [google-edgetpu,google-armnn,nnapi-reference]
Loaded model ./exp_2/model_full_integer_quant.tflite
INFO: Initialized TensorFlow Lite runtime.
INFO: Created TensorFlow Lite delegate for NNAPI.
NNAPI delegate created.
WARNING: NNAPI SL driver did not implement SL_ANeuralNetworksDiagnostic_registerCallbacks!
VERBOSE: Replacing 198 node(s) with delegate (TfLiteNnapiDelegate) node, yielding 7 partitions.
WARNING: NNAPI SL driver did not implement SL_ANeuralNetworksDiagnostic_registerCallbacks!
WARNING: NNAPI SL driver did not implement SL_ANeuralNetworksDiagnostic_registerCallbacks!
WARNING: NNAPI SL driver did not implement SL_ANeuralNetworksDiagnostic_registerCallbacks!
WARNING: NNAPI SL driver did not implement SL_ANeuralNetworksDiagnostic_registerCallbacks!
Explicitly applied NNAPI delegate, and the model graph will be partially executed by the delegate w/ 4 delegate kernels.
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
VERBOSE: Replacing 18 node(s) with delegate (TfLiteXNNPackDelegate) node, yielding 7 partitions.
The input model file size (MB): 3.56052
Initialized session in 1620.78ms.
Running benchmark for at least 1 iterations and at least 0.5 seconds but terminate if exceeding 150 seconds.
count=7 first=114164 curr=68436 min=67413 max=114164 avg=75216.3 std=15972

Running benchmark for at least 50 iterations and at least 1 seconds but terminate if exceeding 150 seconds.
count=50 first=68293 curr=66811 min=66170 max=70650 avg=68201.1 std=1021

Inference timings in us: Init: 1620775, First inference: 114164, Warmup (avg): 75216.3, Inference (avg): 68201.1
Note: as the benchmark tool itself affects memory footprint, the following is only APPROXIMATE to the actual memory footprint of the model at runtime. Take the information at your discretion.
Memory footprint delta from the start of the tool (MB): init=9.42969 overall=51.1094
Profiling Info for Benchmark Initialization:
============================== Run Order ==============================
	             [node type]	  [first]	 [avg ms]	     [%]	  [cdf%]	  [mem KB]	[times called]	[Name]
	 ModifyGraphWithDelegate	 1614.218	  808.024	 50.129%	 50.129%	  1252.000	        2	ModifyGraphWithDelegate/0
	         AllocateTensors	 1607.685	  803.867	 49.871%	100.000%	     0.000	        2	AllocateTensors/0

============================== Top by Computation Time ==============================
	             [node type]	  [first]	 [avg ms]	     [%]	  [cdf%]	  [mem KB]	[times called]	[Name]
	 ModifyGraphWithDelegate	 1614.218	  808.024	 50.129%	 50.129%	  1252.000	        2	ModifyGraphWithDelegate/0
	         AllocateTensors	 1607.685	  803.867	 49.871%	100.000%	     0.000	        2	AllocateTensors/0

Number of nodes executed: 2
============================== Summary by node type ==============================
	             [Node type]	  [count]	  [avg ms]	    [avg %]	    [cdf %]	  [mem KB]	[times called]
	 ModifyGraphWithDelegate	        1	  1616.047	    50.129%	    50.129%	  1252.000	        2
	         AllocateTensors	        1	  1607.735	    49.871%	   100.000%	     0.000	        2

Timings (microseconds): count=1 curr=3223782
Memory (bytes): count=0
2 nodes observed



Operator-wise Profiling Info for Regular Benchmark Runs:
============================== Run Order ==============================
	             [node type]	  [first]	 [avg ms]	     [%]	  [cdf%]	  [mem KB]	[times called]	[Name]
	     TfLiteNnapiDelegate	    6.458	    6.441	  9.446%	  9.446%	     0.000	        1	[Identity]:219
	   TfLiteXNNPackDelegate	    5.023	    5.078	  7.448%	 16.894%	     0.000	        1	[tfl.quantize4]:222
	     TfLiteNnapiDelegate	   10.697	   10.688	 15.675%	 32.569%	     0.000	        1	[model/tf.math.add_33/Add;model/conv2d_12/Conv2D;model/conv2d_32/Conv2D;model/tf.math.add_33/Add/y1, model/tf.nn.relu_24/Relu;model/tf.math.add_35/Add;model/conv2d_12/Conv2D;model/conv2d_34/Conv2D;model/tf.math.add_35/Add/y, model/tf.math.add_71/Add;model/conv2d_9/Conv2D;model/conv2d_66/Conv2D;model/tf.math.add_71/Add/y1, model/tf.math.add_5/Add;model/conv2d_12/Conv2D;model/conv2d_5/Conv2D;model/tf.math.add_5/Add/y11, model/tf.math.add_56/Add;model/conv2d_12/Conv2D;model/conv2d_53/Conv2D;model/tf.math.add_56/Add/y1]:218
	   TfLiteXNNPackDelegate	   10.060	    9.159	 13.432%	 46.001%	     0.000	        1	[tfl.quantize1, tfl.quantize3]:221
	     TfLiteNnapiDelegate	   15.087	   15.096	 22.140%	 68.141%	     0.000	        1	[model/tf.math.add_14/Add;model/conv2d_12/Conv2D;model/conv2d_14/Conv2D;model/tf.math.add_14/Add/y11, model/tf.math.add_29/Add;model/conv2d_12/Conv2D;model/conv2d_28/Conv2D;model/tf.math.add_29/Add/y1, model/tf.math.add_99/Add;model/conv2d_transpose/stack;model/conv2d_transpose_7/conv2d_transpose;model/tf.math.add_99/Add/y1, model/tf.math.add_2/Add;model/conv2d_9/Conv2D;model/conv2d_2/Conv2D;model/tf.math.add_2/Add/y1, model/tf.math.add_2/Add;model/conv2d_9/Conv2D;model/conv2d_2/Conv2D;model/tf.math.add_2/Add/y11, model/tf.math.add_75/Add;model/conv2d_12/Conv2D;model/conv2d_70/Conv2D;model/tf.math.add_75/Add/y1]:217
	     TfLiteNnapiDelegate	    8.404	    8.342	 12.235%	 80.376%	     0.000	        1	[tfl.dequantize, model/tf.strided_slice/StridedSlice31, model/tf.math.add_11/Add;model/conv2d_9/Conv2D;model/conv2d_11/Conv2D;model/tf.math.add_11/Add/y11, model/tf.math.add_47/Add;model/conv2d_9/Conv2D;model/conv2d_44/Conv2D;model/tf.math.add_47/Add/y1]:216
	   TfLiteXNNPackDelegate	   12.543	   13.381	 19.624%	100.000%	     0.000	        1	[tfl.quantize, model/tf.math.add_94/Add;model/conv2d_9/Conv2D;model/conv2d_79/Conv2D;model/tf.math.add_94/Add/y11, tfl.quantize2]:220

============================== Top by Computation Time ==============================
	             [node type]	  [first]	 [avg ms]	     [%]	  [cdf%]	  [mem KB]	[times called]	[Name]
	     TfLiteNnapiDelegate	   15.087	   15.096	 22.140%	 22.140%	     0.000	        1	[model/tf.math.add_14/Add;model/conv2d_12/Conv2D;model/conv2d_14/Conv2D;model/tf.math.add_14/Add/y11, model/tf.math.add_29/Add;model/conv2d_12/Conv2D;model/conv2d_28/Conv2D;model/tf.math.add_29/Add/y1, model/tf.math.add_99/Add;model/conv2d_transpose/stack;model/conv2d_transpose_7/conv2d_transpose;model/tf.math.add_99/Add/y1, model/tf.math.add_2/Add;model/conv2d_9/Conv2D;model/conv2d_2/Conv2D;model/tf.math.add_2/Add/y1, model/tf.math.add_2/Add;model/conv2d_9/Conv2D;model/conv2d_2/Conv2D;model/tf.math.add_2/Add/y11, model/tf.math.add_75/Add;model/conv2d_12/Conv2D;model/conv2d_70/Conv2D;model/tf.math.add_75/Add/y1]:217
	   TfLiteXNNPackDelegate	   12.543	   13.381	 19.624%	 41.764%	     0.000	        1	[tfl.quantize, model/tf.math.add_94/Add;model/conv2d_9/Conv2D;model/conv2d_79/Conv2D;model/tf.math.add_94/Add/y11, tfl.quantize2]:220
	     TfLiteNnapiDelegate	   10.697	   10.688	 15.675%	 57.439%	     0.000	        1	[model/tf.math.add_33/Add;model/conv2d_12/Conv2D;model/conv2d_32/Conv2D;model/tf.math.add_33/Add/y1, model/tf.nn.relu_24/Relu;model/tf.math.add_35/Add;model/conv2d_12/Conv2D;model/conv2d_34/Conv2D;model/tf.math.add_35/Add/y, model/tf.math.add_71/Add;model/conv2d_9/Conv2D;model/conv2d_66/Conv2D;model/tf.math.add_71/Add/y1, model/tf.math.add_5/Add;model/conv2d_12/Conv2D;model/conv2d_5/Conv2D;model/tf.math.add_5/Add/y11, model/tf.math.add_56/Add;model/conv2d_12/Conv2D;model/conv2d_53/Conv2D;model/tf.math.add_56/Add/y1]:218
	   TfLiteXNNPackDelegate	   10.060	    9.159	 13.432%	 70.871%	     0.000	        1	[tfl.quantize1, tfl.quantize3]:221
	     TfLiteNnapiDelegate	    8.404	    8.342	 12.235%	 83.106%	     0.000	        1	[tfl.dequantize, model/tf.strided_slice/StridedSlice31, model/tf.math.add_11/Add;model/conv2d_9/Conv2D;model/conv2d_11/Conv2D;model/tf.math.add_11/Add/y11, model/tf.math.add_47/Add;model/conv2d_9/Conv2D;model/conv2d_44/Conv2D;model/tf.math.add_47/Add/y1]:216
	     TfLiteNnapiDelegate	    6.458	    6.441	  9.446%	 92.552%	     0.000	        1	[Identity]:219
	   TfLiteXNNPackDelegate	    5.023	    5.078	  7.448%	100.000%	     0.000	        1	[tfl.quantize4]:222

Number of nodes executed: 7
============================== Summary by node type ==============================
	             [Node type]	  [count]	  [avg ms]	    [avg %]	    [cdf %]	  [mem KB]	[times called]
	     TfLiteNnapiDelegate	        4	    40.564	    59.495%	    59.495%	     0.000	        4
	   TfLiteXNNPackDelegate	        3	    27.616	    40.505%	   100.000%	     0.000	        3

Timings (microseconds): count=50 first=68272 curr=66794 min=66156 max=70634 avg=68184.7 std=1022
Memory (bytes): count=0
7 nodes observed

Delegate internal: 
============================== Run Order ==============================
	             [node type]	  [first]	 [avg ms]	     [%]	  [cdf%]	  [mem KB]	[times called]	[Name]
	        DelegateOpInvoke	    0.265	    0.283	  1.025%	  1.025%	     0.000	        1	Delegate/Convert (NC, F32, QS8):5
	        DelegateOpInvoke	    1.737	    2.685	 29.222%	 30.247%	     0.000	        3	Delegate/Convolution (NHWC, F32) IGEMM:0
	        DelegateOpInvoke	    0.035	    0.035	  0.382%	 30.629%	     0.000	        3	Delegate/Convert (NC, F32, QS8):1
	        DelegateOpInvoke	    0.978	    4.353	 31.581%	 62.210%	     0.000	        2	Delegate/Convolution (NHWC, F32) IGEMM:2
	        DelegateOpInvoke	    0.133	    0.149	  1.085%	 63.295%	     0.000	        2	Delegate/Convert (NC, F32, QS8):3
	        DelegateOpInvoke	    9.371	   10.118	 36.705%	100.000%	     0.000	        1	Delegate/Convolution (NHWC, F32) IGEMM:4

============================== Top by Computation Time ==============================
	             [node type]	  [first]	 [avg ms]	     [%]	  [cdf%]	  [mem KB]	[times called]	[Name]
	        DelegateOpInvoke	    9.371	   10.118	 36.705%	 36.705%	     0.000	        1	Delegate/Convolution (NHWC, F32) IGEMM:4
	        DelegateOpInvoke	    0.978	    4.353	 31.581%	 68.286%	     0.000	        2	Delegate/Convolution (NHWC, F32) IGEMM:2
	        DelegateOpInvoke	    1.737	    2.685	 29.222%	 97.508%	     0.000	        3	Delegate/Convolution (NHWC, F32) IGEMM:0
	        DelegateOpInvoke	    0.265	    0.283	  1.025%	 98.533%	     0.000	        1	Delegate/Convert (NC, F32, QS8):5
	        DelegateOpInvoke	    0.133	    0.149	  1.085%	 99.618%	     0.000	        2	Delegate/Convert (NC, F32, QS8):3
	        DelegateOpInvoke	    0.035	    0.035	  0.382%	100.000%	     0.000	        3	Delegate/Convert (NC, F32, QS8):1

Number of nodes executed: 6
============================== Summary by node type ==============================
	             [Node type]	  [count]	  [avg ms]	    [avg %]	    [cdf %]	  [mem KB]	[times called]
	        DelegateOpInvoke	        6	    27.562	   100.000%	   100.000%	     0.000	       12

Timings (microseconds): count=50 first=27574 curr=26304 min=26187 max=29665 avg=27565.3 std=973
Memory (bytes): count=0
6 nodes observed
```
</details>
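
To quantify how much of the inference time goes to the XNNPACK fallback, the "Summary by node type" rows of the log can be parsed. This is only a sketch: the regular expression and the function name `delegate_time_ms` are my own assumptions about the log layout, not part of the benchmark tool.

```python
import re

# Matches summary rows such as:
#   TfLiteNnapiDelegate   4   40.564   59.495%   ...
# The second column must be an integer count, so the per-node
# "Run Order" rows (whose second column is a decimal) are skipped.
ROW = re.compile(r"^\s*(TfLite\w+Delegate)\s+(\d+)\s+([\d.]+)\s+[\d.]+%")


def delegate_time_ms(log_text):
    """Sum the 'avg ms' column per delegate over all summary rows."""
    totals = {}
    for line in log_text.splitlines():
        match = ROW.match(line)
        if match:
            name, _count, avg_ms = match.groups()
            totals[name] = totals.get(name, 0.0) + float(avg_ms)
    return totals


# The two summary rows copied from the log above.
sample_log = """
     TfLiteNnapiDelegate        4     40.564     59.495%     59.495%      0.000        4
   TfLiteXNNPackDelegate        3     27.616     40.505%    100.000%      0.000        3
"""

totals = delegate_time_ms(sample_log)
xnnpack_share = totals["TfLiteXNNPackDelegate"] / sum(totals.values())
print(f"XNNPACK share of inference time: {xnnpack_share:.1%}")
```

In this run the three XNNPACK partitions account for roughly 40% of the average inference time, and the "Delegate internal" section shows that almost all of that is spent in F32 convolutions.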


