STARTING!
Log parameter values verbosely: [0]
Graph: [./exp_2/model_full_integer_quant.tflite]
Enable op profiling: [1]
Use NNAPI: [1]
NNAPI accelerator name: [google-edgetpu]
NNAPI accelerators available: [google-edgetpu,google-armnn,nnapi-reference]
Loaded model ./exp_2/model_full_integer_quant.tflite
INFO: Initialized TensorFlow Lite runtime.
INFO: Created TensorFlow Lite delegate for NNAPI.
NNAPI delegate created.
WARNING: NNAPI SL driver did not implement SL_ANeuralNetworksDiagnostic_registerCallbacks!
VERBOSE: Replacing 198 node(s) with delegate (TfLiteNnapiDelegate) node, yielding 7 partitions.
WARNING: NNAPI SL driver did not implement SL_ANeuralNetworksDiagnostic_registerCallbacks!
WARNING: NNAPI SL driver did not implement SL_ANeuralNetworksDiagnostic_registerCallbacks!
WARNING: NNAPI SL driver did not implement SL_ANeuralNetworksDiagnostic_registerCallbacks!
WARNING: NNAPI SL driver did not implement SL_ANeuralNetworksDiagnostic_registerCallbacks!
Explicitly applied NNAPI delegate, and the model graph will be partially executed by the delegate w/ 4 delegate kernels.
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
VERBOSE: Replacing 18 node(s) with delegate (TfLiteXNNPackDelegate) node, yielding 7 partitions.
The input model file size (MB): 3.56052
Initialized session in 1620.78ms.
Running benchmark for at least 1 iterations and at least 0.5 seconds but terminate if exceeding 150 seconds.
count=7 first=114164 curr=68436 min=67413 max=114164 avg=75216.3 std=15972
Running benchmark for at least 50 iterations and at least 1 seconds but terminate if exceeding 150 seconds.
count=50 first=68293 curr=66811 min=66170 max=70650 avg=68201.1 std=1021
Inference timings in us: Init: 1620775, First inference: 114164, Warmup (avg): 75216.3, Inference (avg): 68201.1
Note: as the benchmark tool itself affects memory footprint, the following is only APPROXIMATE to the actual memory footprint of the model at runtime. Take the information at your discretion.
Memory footprint delta from the start of the tool (MB): init=9.42969 overall=51.1094
Profiling Info for Benchmark Initialization:
============================== Run Order ==============================
[node type] [first] [avg ms] [%] [cdf%] [mem KB] [times called] [Name]
ModifyGraphWithDelegate 1614.218 808.024 50.129% 50.129% 1252.000 2 ModifyGraphWithDelegate/0
AllocateTensors 1607.685 803.867 49.871% 100.000% 0.000 2 AllocateTensors/0
============================== Top by Computation Time ==============================
[node type] [first] [avg ms] [%] [cdf%] [mem KB] [times called] [Name]
ModifyGraphWithDelegate 1614.218 808.024 50.129% 50.129% 1252.000 2 ModifyGraphWithDelegate/0
AllocateTensors 1607.685 803.867 49.871% 100.000% 0.000 2 AllocateTensors/0
Number of nodes executed: 2
============================== Summary by node type ==============================
[Node type] [count] [avg ms] [avg %] [cdf %] [mem KB] [times called]
ModifyGraphWithDelegate 1 1616.047 50.129% 50.129% 1252.000 2
AllocateTensors 1 1607.735 49.871% 100.000% 0.000 2
Timings (microseconds): count=1 curr=3223782
Memory (bytes): count=0
2 nodes observed
Operator-wise Profiling Info for Regular Benchmark Runs:
============================== Run Order ==============================
[node type] [first] [avg ms] [%] [cdf%] [mem KB] [times called] [Name]
TfLiteNnapiDelegate 6.458 6.441 9.446% 9.446% 0.000 1 [Identity]:219
TfLiteXNNPackDelegate 5.023 5.078 7.448% 16.894% 0.000 1 [tfl.quantize4]:222
TfLiteNnapiDelegate 10.697 10.688 15.675% 32.569% 0.000 1 [model/tf.math.add_33/Add;model/conv2d_12/Conv2D;model/conv2d_32/Conv2D;model/tf.math.add_33/Add/y1, model/tf.nn.relu_24/Relu;model/tf.math.add_35/Add;model/conv2d_12/Conv2D;model/conv2d_34/Conv2D;model/tf.math.add_35/Add/y, model/tf.math.add_71/Add;model/conv2d_9/Conv2D;model/conv2d_66/Conv2D;model/tf.math.add_71/Add/y1, model/tf.math.add_5/Add;model/conv2d_12/Conv2D;model/conv2d_5/Conv2D;model/tf.math.add_5/Add/y11, model/tf.math.add_56/Add;model/conv2d_12/Conv2D;model/conv2d_53/Conv2D;model/tf.math.add_56/Add/y1]:218
TfLiteXNNPackDelegate 10.060 9.159 13.432% 46.001% 0.000 1 [tfl.quantize1, tfl.quantize3]:221
TfLiteNnapiDelegate 15.087 15.096 22.140% 68.141% 0.000 1 [model/tf.math.add_14/Add;model/conv2d_12/Conv2D;model/conv2d_14/Conv2D;model/tf.math.add_14/Add/y11, model/tf.math.add_29/Add;model/conv2d_12/Conv2D;model/conv2d_28/Conv2D;model/tf.math.add_29/Add/y1, model/tf.math.add_99/Add;model/conv2d_transpose/stack;model/conv2d_transpose_7/conv2d_transpose;model/tf.math.add_99/Add/y1, model/tf.math.add_2/Add;model/conv2d_9/Conv2D;model/conv2d_2/Conv2D;model/tf.math.add_2/Add/y1, model/tf.math.add_2/Add;model/conv2d_9/Conv2D;model/conv2d_2/Conv2D;model/tf.math.add_2/Add/y11, model/tf.math.add_75/Add;model/conv2d_12/Conv2D;model/conv2d_70/Conv2D;model/tf.math.add_75/Add/y1]:217
TfLiteNnapiDelegate 8.404 8.342 12.235% 80.376% 0.000 1 [tfl.dequantize, model/tf.strided_slice/StridedSlice31, model/tf.math.add_11/Add;model/conv2d_9/Conv2D;model/conv2d_11/Conv2D;model/tf.math.add_11/Add/y11, model/tf.math.add_47/Add;model/conv2d_9/Conv2D;model/conv2d_44/Conv2D;model/tf.math.add_47/Add/y1]:216
TfLiteXNNPackDelegate 12.543 13.381 19.624% 100.000% 0.000 1 [tfl.quantize, model/tf.math.add_94/Add;model/conv2d_9/Conv2D;model/conv2d_79/Conv2D;model/tf.math.add_94/Add/y11, tfl.quantize2]:220
============================== Top by Computation Time ==============================
[node type] [first] [avg ms] [%] [cdf%] [mem KB] [times called] [Name]
TfLiteNnapiDelegate 15.087 15.096 22.140% 22.140% 0.000 1 [model/tf.math.add_14/Add;model/conv2d_12/Conv2D;model/conv2d_14/Conv2D;model/tf.math.add_14/Add/y11, model/tf.math.add_29/Add;model/conv2d_12/Conv2D;model/conv2d_28/Conv2D;model/tf.math.add_29/Add/y1, model/tf.math.add_99/Add;model/conv2d_transpose/stack;model/conv2d_transpose_7/conv2d_transpose;model/tf.math.add_99/Add/y1, model/tf.math.add_2/Add;model/conv2d_9/Conv2D;model/conv2d_2/Conv2D;model/tf.math.add_2/Add/y1, model/tf.math.add_2/Add;model/conv2d_9/Conv2D;model/conv2d_2/Conv2D;model/tf.math.add_2/Add/y11, model/tf.math.add_75/Add;model/conv2d_12/Conv2D;model/conv2d_70/Conv2D;model/tf.math.add_75/Add/y1]:217
TfLiteXNNPackDelegate 12.543 13.381 19.624% 41.764% 0.000 1 [tfl.quantize, model/tf.math.add_94/Add;model/conv2d_9/Conv2D;model/conv2d_79/Conv2D;model/tf.math.add_94/Add/y11, tfl.quantize2]:220
TfLiteNnapiDelegate 10.697 10.688 15.675% 57.439% 0.000 1 [model/tf.math.add_33/Add;model/conv2d_12/Conv2D;model/conv2d_32/Conv2D;model/tf.math.add_33/Add/y1, model/tf.nn.relu_24/Relu;model/tf.math.add_35/Add;model/conv2d_12/Conv2D;model/conv2d_34/Conv2D;model/tf.math.add_35/Add/y, model/tf.math.add_71/Add;model/conv2d_9/Conv2D;model/conv2d_66/Conv2D;model/tf.math.add_71/Add/y1, model/tf.math.add_5/Add;model/conv2d_12/Conv2D;model/conv2d_5/Conv2D;model/tf.math.add_5/Add/y11, model/tf.math.add_56/Add;model/conv2d_12/Conv2D;model/conv2d_53/Conv2D;model/tf.math.add_56/Add/y1]:218
TfLiteXNNPackDelegate 10.060 9.159 13.432% 70.871% 0.000 1 [tfl.quantize1, tfl.quantize3]:221
TfLiteNnapiDelegate 8.404 8.342 12.235% 83.106% 0.000 1 [tfl.dequantize, model/tf.strided_slice/StridedSlice31, model/tf.math.add_11/Add;model/conv2d_9/Conv2D;model/conv2d_11/Conv2D;model/tf.math.add_11/Add/y11, model/tf.math.add_47/Add;model/conv2d_9/Conv2D;model/conv2d_44/Conv2D;model/tf.math.add_47/Add/y1]:216
TfLiteNnapiDelegate 6.458 6.441 9.446% 92.552% 0.000 1 [Identity]:219
TfLiteXNNPackDelegate 5.023 5.078 7.448% 100.000% 0.000 1 [tfl.quantize4]:222
Number of nodes executed: 7
============================== Summary by node type ==============================
[Node type] [count] [avg ms] [avg %] [cdf %] [mem KB] [times called]
TfLiteNnapiDelegate 4 40.564 59.495% 59.495% 0.000 4
TfLiteXNNPackDelegate 3 27.616 40.505% 100.000% 0.000 3
Timings (microseconds): count=50 first=68272 curr=66794 min=66156 max=70634 avg=68184.7 std=1022
Memory (bytes): count=0
7 nodes observed
Delegate internal:
============================== Run Order ==============================
[node type] [first] [avg ms] [%] [cdf%] [mem KB] [times called] [Name]
DelegateOpInvoke 0.265 0.283 1.025% 1.025% 0.000 1 Delegate/Convert (NC, F32, QS8):5
DelegateOpInvoke 1.737 2.685 29.222% 30.247% 0.000 3 Delegate/Convolution (NHWC, F32) IGEMM:0
DelegateOpInvoke 0.035 0.035 0.382% 30.629% 0.000 3 Delegate/Convert (NC, F32, QS8):1
DelegateOpInvoke 0.978 4.353 31.581% 62.210% 0.000 2 Delegate/Convolution (NHWC, F32) IGEMM:2
DelegateOpInvoke 0.133 0.149 1.085% 63.295% 0.000 2 Delegate/Convert (NC, F32, QS8):3
DelegateOpInvoke 9.371 10.118 36.705% 100.000% 0.000 1 Delegate/Convolution (NHWC, F32) IGEMM:4
============================== Top by Computation Time ==============================
[node type] [first] [avg ms] [%] [cdf%] [mem KB] [times called] [Name]
DelegateOpInvoke 9.371 10.118 36.705% 36.705% 0.000 1 Delegate/Convolution (NHWC, F32) IGEMM:4
DelegateOpInvoke 0.978 4.353 31.581% 68.286% 0.000 2 Delegate/Convolution (NHWC, F32) IGEMM:2
DelegateOpInvoke 1.737 2.685 29.222% 97.508% 0.000 3 Delegate/Convolution (NHWC, F32) IGEMM:0
DelegateOpInvoke 0.265 0.283 1.025% 98.533% 0.000 1 Delegate/Convert (NC, F32, QS8):5
DelegateOpInvoke 0.133 0.149 1.085% 99.618% 0.000 2 Delegate/Convert (NC, F32, QS8):3
DelegateOpInvoke 0.035 0.035 0.382% 100.000% 0.000 3 Delegate/Convert (NC, F32, QS8):1
Number of nodes executed: 6
============================== Summary by node type ==============================
[Node type] [count] [avg ms] [avg %] [cdf %] [mem KB] [times called]
DelegateOpInvoke 6 27.562 100.000% 100.000% 0.000 12
Timings (microseconds): count=50 first=27574 curr=26304 min=26187 max=29665 avg=27565.3 std=973
Memory (bytes): count=0
6 nodes observed
1. System information
2. Code
Part of the conversion code is as follows.
3. Failure after conversion
The quantized TFLite model is attached here.
model_full_integer_quant.tflite.zip
4. (optional) RNN conversion support
None.
5. (optional) Any other info / logs
The command used to run the model benchmark tool on a Pixel 6 is shown below. I found that specifying nnapi_accelerator_name="google-edgetpu" gives the largest speedup in inference.

android_aarch64_benchmark_model --graph=./model_full_integer_quant.tflite --use_nnapi=true --nnapi_accelerator_name="google-edgetpu" --enable_op_profiling=true

The resulting log is shown above.
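To make the delegate split in the log easier to compare across runs, the "Summary by node type" rows can be pulled out with a small script. The following is a minimal sketch, not part of the benchmark tool: the function name `summarize_by_node_type` and the regular expression are my own assumptions, based only on the row layout in the log above (`[Node type] [count] [avg ms] [avg %] [cdf %] [mem KB] [times called]`).

```python
import re

# Matches rows of a "Summary by node type" table, e.g.
# "TfLiteNnapiDelegate 4 40.564 59.495% 59.495% 0.000 4".
# The pattern is an assumption derived from the log layout above.
SUMMARY_ROW = re.compile(
    r"^(?P<node>\S+)\s+(?P<count>\d+)\s+(?P<avg_ms>\d+\.\d+)\s+"
    r"(?P<pct>\d+\.\d+)%\s+(?P<cdf>\d+\.\d+)%\s+"
    r"(?P<mem_kb>\d+\.\d+)\s+(?P<calls>\d+)$"
)


def summarize_by_node_type(log_text):
    """Return {node type: (avg ms, avg %)} for every summary row in log_text."""
    result = {}
    for line in log_text.splitlines():
        match = SUMMARY_ROW.match(line.strip())
        if match:
            result[match["node"]] = (
                float(match["avg_ms"]),
                float(match["pct"]),
            )
    return result


# Two rows copied from the regular-run summary in the log above.
sample = """\
[Node type] [count] [avg ms] [avg %] [cdf %] [mem KB] [times called]
TfLiteNnapiDelegate 4 40.564 59.495% 59.495% 0.000 4
TfLiteXNNPackDelegate 3 27.616 40.505% 100.000% 0.000 3
"""

summary = summarize_by_node_type(sample)
for node, (avg_ms, pct) in summary.items():
    print(f"{node}: {avg_ms:.3f} ms ({pct:.3f}%)")
```

On the rows above, this reports about 40.6 ms for the NNAPI partitions and about 27.6 ms for the XNNPACK (CPU) partitions per inference. If the whole log is passed in, the initialization and delegate-internal summaries match the same pattern, so their rows are returned as well.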