TensorflowLite Android OpenCL delegate may produce invalid Conv2D result #45974

@dev0x13

Description

System information

  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow): No
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Android
  • Mobile device (e.g. iPhone 8, Pixel 2, Samsung Galaxy) if the issue happens on mobile device: Huawei P30 Lite
  • TensorFlow installed from (source or binary): source
  • TensorFlow version (use command below): 2.4.0
  • Python version: -
  • Bazel version (if compiling from source): 3.1.0
  • GCC/Compiler version (if compiling from source): Android NDK 21.3.6528147
  • CUDA/cuDNN version: -
  • GPU model and memory: Mali-G51 MP4 (as per GSMArena) on an Android smartphone

I have been trying to run inference of a CNN model using TFLite 2.4.0 with the OpenCL GPU delegate enabled, and found that the Conv2D operator may produce NaNs, Infs, and other invalid values when running on the Mali-G51 MP4 GPU if precision loss is allowed (I assume that getting NaNs does not count as reasonable precision loss) and the Conv2D padding is set to same. With valid padding, the model produces valid results.
I've created a simple Conv2D-only model (simple_conv.zip, shown in the illustration below) to test via the inference_diff utility:

[Model graph illustration: a single Conv2D node]
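
For reference, here is a minimal sketch of how such a model can be generated (the exact input shape and filter count of the attached simple_conv.tflite are my assumptions; any 3x3, stride-1, dilation-1 Conv2D should qualify for the Winograd path):

import tensorflow as tf

# Single-Conv2D model with SAME padding; the 3x3 kernel with stride 1
# and dilation 1 makes it eligible for the delegate's Winograd kernel.
inp = tf.keras.Input(shape=(64, 64, 3), batch_size=1)
out = tf.keras.layers.Conv2D(filters=16, kernel_size=3, strides=1,
                             padding="same")(inp)  # "valid" produces correct results
model = tf.keras.Model(inp, out)

converter = tf.lite.TFLiteConverter.from_keras_model(model)
with open("simple_conv.tflite", "wb") as f:
    f.write(converter.convert())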

Here are some sample outputs of the inference_diff (run_eval) utility obtained with the described model on the Huawei P30 Lite (Mali-G51 MP4 GPU) smartphone:

$ adb shell /data/local/tmp/run_eval --model_file=/data/local/tmp/simple_conv.tflite --num_runs=1 --delegate=gpu
INFO: Initialized TensorFlow Lite runtime.
INFO: Created TensorFlow Lite delegate for GPU.
GPU delegate is created.
INFO: Initialized OpenCL-based API.
INFO: Created 1 GPU delegate kernels.
native : inference_profiler_stage.cc:77 Test interpreter has been initialized.
native : tflite_inference_stage.cc:128 
native : inference_profiler_stage.cc:91 Reference interpreter (1 thread on CPU) has been initialized.
Num evaluation runs: 1
Reference run latency: avg=55112(us), std_dev=17548(us)
Test run latency: avg=11990(us), std_dev=1488(us)
OutputDiff[0]: avg_error=inf, std_dev=nan

After disabling precision loss manually, the model produced valid results, though obviously much slower:

$ adb shell /data/local/tmp/run_eval --model_file=/data/local/tmp/simple_conv.tflite --num_runs=1 --delegate=gpu --gpu_precision_loss_allowed=false
INFO: Initialized TensorFlow Lite runtime.
INFO: Created TensorFlow Lite delegate for GPU.
GPU delegate is created.
INFO: Initialized OpenCL-based API.
INFO: Created 1 GPU delegate kernels.
native : inference_profiler_stage.cc:77 Test interpreter has been initialized.
native : tflite_inference_stage.cc:128 
native : inference_profiler_stage.cc:91 Reference interpreter (1 thread on CPU) has been initialized.
Num evaluation runs: 1
Reference run latency: avg=121364(us), std_dev=20829(us)
Test run latency: avg=28716(us), std_dev=1158(us)
OutputDiff[0]: avg_error=0.000120298, std_dev=0
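
For a sense of what error magnitude fp16 execution should reasonably introduce here, one can run the same kind of convolution in float16 on the host with TensorFlow (a rough sketch; shapes and value ranges are assumptions, and host fp16 arithmetic only approximates the GPU's):

import numpy as np
import tensorflow as tf

x = np.random.uniform(-1, 1, size=(1, 32, 32, 3)).astype(np.float32)
w = np.random.uniform(-1, 1, size=(3, 3, 3, 16)).astype(np.float32)

y32 = tf.nn.conv2d(x, w, strides=1, padding="SAME")
y16 = tf.nn.conv2d(tf.cast(x, tf.float16), tf.cast(w, tf.float16),
                   strides=1, padding="SAME")
err = tf.reduce_max(tf.abs(y32 - tf.cast(y16, tf.float32)))
print("max abs fp16-vs-fp32 error:", float(err))

The result is on the order of 1e-3, consistent with the avg_error=0.000120298 above; avg_error=inf is clearly a different failure mode, not accumulated rounding.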

After further investigation, I found that this behavior can be fixed by manually commenting out the piece of code responsible for selecting the Winograd kernel as the Conv2D node implementation (so that the SelectConvolution branch is always taken). With this change, the model produced valid results:

$ adb shell /data/local/tmp/run_eval --model_file=/data/local/tmp/simple_conv.tflite --num_runs=1 --delegate=gpu
GPU delegate is created.
INFO: Initialized TensorFlow Lite runtime.
INFO: Created TensorFlow Lite delegate for GPU.
INFO: Initialized OpenCL-based API.
INFO: Created 1 GPU delegate kernels.
native : inference_profiler_stage.cc:77 Test interpreter has been initialized.
native : tflite_inference_stage.cc:128 
native : inference_profiler_stage.cc:91 Reference interpreter (1 thread on CPU) has been initialized.
Num evaluation runs: 1
Reference run latency: avg=113876(us), std_dev=17465(us)
Test run latency: avg=30590(us), std_dev=3084(us)
OutputDiff[0]: avg_error=0.304206, std_dev=0

Thus I assume that the Winograd implementation in the OpenCL delegate is the root cause of the issue. To sum up, here is the list of conditions required to reproduce the bug, at least on the Mali-G51 MP4 GPU:

  1. Create a Conv2D node that is suitable for the Winograd algorithm as per the check.
  2. Use same padding in Conv2D node.
  3. Use OpenCL backend.
  4. Allow precision loss (a toy illustration of fp16 Winograd numerics follows this list).
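
To illustrate how conditions 1 and 4 can interact, here is a toy NumPy sketch of a single F(2x2, 3x3) Winograd tile versus direct convolution, evaluated in float16 and float32 against a float64 reference. This is not the delegate's actual 4x4-to-6x6 kernel, and it only models fp16 rounding per transform step, but it shows that the Winograd input/output transforms combine several neighboring values before the element-wise multiply, so low-precision intermediates behave differently than on the direct path:

import numpy as np

# Standard F(2x2, 3x3) Winograd transform matrices (Lavin & Gray, 2015).
Bt = np.array([[1,  0, -1,  0],
               [0,  1,  1,  0],
               [0, -1,  1,  0],
               [0,  1,  0, -1]], dtype=np.float64)
G  = np.array([[1.0,  0.0, 0.0],
               [0.5,  0.5, 0.5],
               [0.5, -0.5, 0.5],
               [0.0,  0.0, 1.0]], dtype=np.float64)
At = np.array([[1, 1,  1,  0],
               [0, 1, -1, -1]], dtype=np.float64)

def winograd_f2x2_3x3(tile, filt, dtype):
    # Y = At @ [(G g Gt) * (Bt d B)] @ A, with every step rounded to dtype.
    d, g = tile.astype(dtype), filt.astype(dtype)
    bt, gm, at = Bt.astype(dtype), G.astype(dtype), At.astype(dtype)
    U = gm @ g @ gm.T       # transformed 4x4 filter
    V = bt @ d @ bt.T       # transformed 4x4 input tile (sums of neighbors)
    M = U * V               # element-wise multiply, done in dtype
    return at @ M @ at.T    # inverse transform -> 2x2 output tile

def direct_3x3(tile, filt, dtype):
    d, g = tile.astype(dtype), filt.astype(dtype)
    out = np.empty((2, 2), dtype=dtype)
    for i in range(2):
        for j in range(2):
            out[i, j] = (d[i:i+3, j:j+3] * g).sum(dtype=dtype)
    return out

rng = np.random.default_rng(0)
tile = rng.uniform(-100.0, 100.0, size=(4, 4))
filt = rng.uniform(-1.0, 1.0, size=(3, 3))
ref = direct_3x3(tile, filt, np.float64)
for dt in (np.float32, np.float16):
    w_err = np.abs(winograd_f2x2_3x3(tile, filt, dt) - ref).max()
    d_err = np.abs(direct_3x3(tile, filt, dt) - ref).max()
    print(np.dtype(dt).name, "winograd_err=", w_err, " direct_err=", d_err)

This does not by itself explain the inf/NaN results (the SAME-padding condition suggests tile or border handling may also be involved), but it narrows down where fp16 and Winograd can legitimately diverge.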

The same behavior was also observed on a Samsung Galaxy M31 (Mali-G72 MP3 GPU) and a Huawei P20 (Mali-G72 MP12 GPU). However, the default build (i.e. without Winograd disabled manually) ran successfully on a Samsung Galaxy S20+ (Mali-G77 MP11 GPU).

Please let me know if you need more details, logs, code, etc.

Labels

TF 2.4 (issues related to TF 2.4), comp:lite (TF Lite related issues), type:bug
