System information
- Have I written custom code (as opposed to using a stock example script provided in TensorFlow): No
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Android
- Mobile device (e.g. iPhone 8, Pixel 2, Samsung Galaxy) if the issue happens on mobile device: Huawei P30 Lite
- TensorFlow installed from (source or binary): source
- TensorFlow version (use command below): 2.4.0
- Python version: -
- Bazel version (if compiling from source): 3.1.0
- GCC/Compiler version (if compiling from source): Android NDK 21.3.6528147
- CUDA/cuDNN version: -
- GPU model and memory: Mali-G51 MP4 as per gsmarena on an Android smartphone
I have been trying to run inference of a CNN model using TFLite 2.4.0 with the OpenCL GPU delegate enabled and found that the Conv2D operator may produce NaNs, Infs, and other invalid values when running on the Mali-G51 MP4 GPU if precision loss is allowed (I assume that getting NaNs is not considered a reasonable precision loss) and Conv2D padding is set to `same`. With `valid` padding the model produces valid results.
I've created a simple Conv2D-only model (simple_conv.zip, shown in the illustration below) to test via the inference_diff util:
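For reference, a model with the relevant properties can be generated in a few lines of TF/Keras. This is a hedged sketch, not necessarily the exact contents of the attached simple_conv.tflite: the filter count and input shape below are placeholders. The key properties are a 3x3 kernel, stride 1, and `same` padding, which make the node eligible for the Winograd path:

```python
import tensorflow as tf

# Sketch of a Conv2D-only model with the properties that trigger the issue:
# 3x3 kernel, stride 1, `same` padding. The filter count and input shape are
# placeholders, not necessarily those of the attached simple_conv.tflite.
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, kernel_size=3, strides=1, padding="same",
                           input_shape=(64, 64, 3)),
])

converter = tf.lite.TFLiteConverter.from_keras_model(model)
tflite_model = converter.convert()

with open("simple_conv.tflite", "wb") as f:
    f.write(tflite_model)
```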

Here are some sample outputs of the inference_diff/run_eval util obtained using the described model on the Huawei P30 Lite (Mali-G51 MP4 GPU) smartphone:
```
$ adb shell /data/local/tmp/run_eval --model_file=/data/local/tmp/simple_conv.tflite --num_runs=1 --delegate=gpu
INFO: Initialized TensorFlow Lite runtime.
INFO: Created TensorFlow Lite delegate for GPU.
GPU delegate is created.
INFO: Initialized OpenCL-based API.
INFO: Created 1 GPU delegate kernels.
native : inference_profiler_stage.cc:77 Test interpreter has been initialized.
native : tflite_inference_stage.cc:128
native : inference_profiler_stage.cc:91 Reference interpreter (1 thread on CPU) has been initialized.
Num evaluation runs: 1
Reference run latency: avg=55112(us), std_dev=17548(us)
Test run latency: avg=11990(us), std_dev=1488(us)
OutputDiff[0]: avg_error=inf, std_dev=nan
```
After manually disabling precision loss, the model produced correct results, though much slower, as expected:
```
$ adb shell /data/local/tmp/run_eval --model_file=/data/local/tmp/simple_conv.tflite --num_runs=1 --delegate=gpu --gpu_precision_loss_allowed=false
INFO: Initialized TensorFlow Lite runtime.
INFO: Created TensorFlow Lite delegate for GPU.
GPU delegate is created.
INFO: Initialized OpenCL-based API.
INFO: Created 1 GPU delegate kernels.
native : inference_profiler_stage.cc:77 Test interpreter has been initialized.
native : tflite_inference_stage.cc:128
native : inference_profiler_stage.cc:91 Reference interpreter (1 thread on CPU) has been initialized.
Num evaluation runs: 1
Reference run latency: avg=121364(us), std_dev=20829(us)
Test run latency: avg=28716(us), std_dev=1158(us)
OutputDiff[0]: avg_error=0.000120298, std_dev=0
```
After further investigation I found that this behavior can be fixed by commenting out the piece of code responsible for selecting the Winograd kernel as the Conv2D node implementation (i.e. so that the SelectConvolution branch is always used). After this change, the model appeared to work:
```
$ adb shell /data/local/tmp/run_eval --model_file=/data/local/tmp/simple_conv.tflite --num_runs=1 --delegate=gpu
GPU delegate is created.
INFO: Initialized TensorFlow Lite runtime.
INFO: Created TensorFlow Lite delegate for GPU.
INFO: Initialized OpenCL-based API.
INFO: Created 1 GPU delegate kernels.
native : inference_profiler_stage.cc:77 Test interpreter has been initialized.
native : tflite_inference_stage.cc:128
native : inference_profiler_stage.cc:91 Reference interpreter (1 thread on CPU) has been initialized.
Num evaluation runs: 1
Reference run latency: avg=113876(us), std_dev=17465(us)
Test run latency: avg=30590(us), std_dev=3084(us)
OutputDiff[0]: avg_error=0.304206, std_dev=0
```
Thus I assume that the Winograd algorithm implementation in the OpenCL delegate is the root cause of the issue. To sum up, here is the list of conditions to reproduce the bug, at least on the Mali-G51 MP4 GPU:
- Create a Conv2D node that is suitable for the Winograd algorithm as per the check.
- Use `same` padding in the Conv2D node.
- Use OpenCL backend.
- Allow precision loss.
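To illustrate why the Winograd path is a plausible suspect under reduced precision: the Winograd input transform adds neighbouring input values before the element-wise multiply, so intermediates can overflow half precision even when the true convolution result is well within fp16 range. Below is a minimal 1-D numpy sketch of my own (the textbook F(2,3) transform, not the actual TFLite 4x4 kernel), showing the overflow:

```python
import numpy as np

def winograd_f23(d, g):
    """1-D Winograd F(2,3): two outputs of a 3-tap correlation over 4 inputs.
    All arithmetic stays in the dtype of d/g, mimicking an fp16 kernel."""
    d0, d1, d2, d3 = d
    g0, g1, g2 = g
    # Input transform B^T d -- note the d1 + d2 term, which can reach
    # twice the input magnitude before the element-wise multiply.
    v0, v1, v2, v3 = d0 - d2, d1 + d2, d2 - d1, d1 - d3
    # Filter transform G g.
    u0, u1, u2, u3 = g0, (g0 + g1 + g2) / 2, (g0 - g1 + g2) / 2, g2
    # Element-wise multiply in the transformed domain.
    m0, m1, m2, m3 = u0 * v0, u1 * v1, u2 * v2, u3 * v3
    # Output transform A^T m.
    return np.array([m0 + m1 + m2, m1 - m2 - m3])

def direct(d, g):
    """Reference direct 3-tap correlation."""
    d0, d1, d2, d3 = d
    g0, g1, g2 = g
    return np.array([d0*g0 + d1*g1 + d2*g2, d1*g0 + d2*g1 + d3*g2])

# Inputs well inside the fp16 range (max ~65504), small filter values.
d16 = np.array([40000, 40000, 40000, 40000], dtype=np.float16)
g16 = np.array([0.1, 0.1, 0.1], dtype=np.float16)

print(direct(d16, g16))        # finite, ~12000 per output
print(winograd_f23(d16, g16))  # d1 + d2 = 80000 overflows fp16 -> inf
print(winograd_f23(d16.astype(np.float32),
                   g16.astype(np.float32)))  # fine in fp32
```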
The same behavior was also observed when running on a Samsung Galaxy M31 (Mali-G72 MP3 GPU) and a Huawei P20 (Mali-G72 MP12 GPU). However, the default build (i.e. without disabling Winograd manually) ran successfully on a Samsung Galaxy S20+ (Mali-G77 MP11 GPU).
Please let me know if you need more details/logs/code, etc.