
Android Tflite model fails to load on GPU Delegate: CL_OUT_OF_HOST_MEMORY #68470

Open
filip-halt opened this issue May 22, 2024 · 6 comments
Assignees: sawantkumar
Labels: comp:lite, TF 2.16, TFLiteGpuDelegate, type:bug

Comments

@filip-halt

Issue type

Bug

Have you reproduced the bug with TensorFlow Nightly?

Yes

Source

source

TensorFlow version

org.tensorflow:tensorflow-lite:2.16.1

Custom code

Yes

OS platform and distribution

Android

Mobile device

Samsung S23

Python version

No response

Bazel version

No response

GCC/compiler version

No response

CUDA/cuDNN version

No response

GPU model and memory

No response

Current behavior?

I am currently trying to get a larger model to load on an S23, but I am running into OOM errors. When initializing an Interpreter using a GpuDelegate with the factory options returned by CompatibilityList.getBestOptionsForThisDevice(), the Interpreter crashes with "Failed to apply delegate: Failed to build program executable - Out of host memoryError: Program not built!". This appears to come from an OpenCL error that is handled with:

case CL_OUT_OF_HOST_MEMORY:

My best guess is that this is due to hitting the 512 MB Dalvik heap limit that Runtime.maxMemory() reports on my device. I profiled the memory usage and it seems to crash around the 450 MB mark. Does TFLite on Android not use native memory to get around this? I seem to recall people getting 1 GB+ models running on their devices. Perhaps it is a build step that goes over the limit, and once built the model would be offloaded to native memory?

Note: I am using pyjnius to do this, which might be causing problems, but I doubt that is the cause.
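
For reference, here is a minimal pyjnius sketch of the initialization path described above. This is not the actual repro code; the class names follow the TFLite Java API, and the model path is illustrative.

from jnius import autoclass

File = autoclass('java.io.File')
Runtime = autoclass('java.lang.Runtime')
Interpreter = autoclass('org.tensorflow.lite.Interpreter')
InterpreterOptions = autoclass('org.tensorflow.lite.Interpreter$Options')
GpuDelegate = autoclass('org.tensorflow.lite.gpu.GpuDelegate')
CompatibilityList = autoclass('org.tensorflow.lite.gpu.CompatibilityList')

# The Dalvik heap limit discussed above (reported as 512 MB on this device).
print('Runtime.maxMemory():', Runtime.getRuntime().maxMemory())

compat_list = CompatibilityList()
options = InterpreterOptions()
if compat_list.isDelegateSupportedOnThisDevice():
    # Factory options recommended for this device.
    delegate_options = compat_list.getBestOptionsForThisDevice()
    options.addDelegate(GpuDelegate(delegate_options))

# The "Failed to apply delegate ... Out of host memory" error is raised while
# this constructor applies the GPU delegate to the graph.
interpreter = Interpreter(File('/sdcard/model.tflite'), options)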

Standalone code to reproduce the issue

Not sure how useful.

Relevant log output

05-22 16:31:08.869 23806 23859 I python  :  jnius.jnius.JavaException: JVM exception occurred: Internal error: Failed to apply delegate: Failed to build program executable - Out of host memoryError: Program not built!
05-22 16:31:08.869 23806 23859 I python  :  Falling back to OpenGL
05-22 16:31:08.869 23806 23859 I python  :  TfLiteGpuDelegate Init: No shader implementation for transpose
05-22 16:31:08.869 23806 23859 I python  :  TfLiteGpuDelegate Prepare: delegate is not initialized
05-22 16:31:08.869 23806 23859 I python  :  Node number 2612 (TfLiteGpuDelegateV2) failed to prepare.
google-ml-butler bot added the type:bug label on May 22, 2024
tilakrayal added the TF 2.16, comp:lite, and TFLiteGpuDelegate labels on May 23, 2024
tilakrayal assigned sawantkumar and unassigned tilakrayal on May 23, 2024
@sawantkumar

Hi @filip-halt,

Can you please provide the tflite model file so that I can replicate the issue?

sawantkumar added the stat:awaiting response label on May 23, 2024
@filip-halt
Author

> Hi @filip-halt,
>
> Can you please provide the tflite model file so that I can replicate the issue?

It was too large to attach directly to this issue, so you can find a copy here: https://github.com/filip-halt/tflite_bug

google-ml-butler bot removed the stat:awaiting response label on May 23, 2024
@filip-halt
Author

filip-halt commented May 24, 2024

It turns out that this is most likely due to a Conv2DTranspose layer in the model. I was under the impression that Conv2DTranspose was supported, but I could be wrong.

Another interesting thing that happens is that when you convert the model with:

converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.target_spec.supported_types = [tf.float16]

the resulting binary is twice as large as the float32 version and about 20% slower on mobile. When I inspected the graph with Netron, it looks like nothing was converted to float16, not even the Conv2Ds.
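
For context, the two lines above sit inside the usual converter flow. A minimal end-to-end float16 conversion sketch (the saved-model path is illustrative) looks roughly like this:

import tensorflow as tf

# Illustrative path; the actual model for this issue is linked above.
converter = tf.lite.TFLiteConverter.from_saved_model('/path/to/saved_model')

# With these two settings the converter is expected to store weights as float16.
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.target_spec.supported_types = [tf.float16]

tflite_model = converter.convert()
with open('model_fp16.tflite', 'wb') as f:
    f.write(tflite_model)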

@sawantkumar

Hi @filip-halt,

I ran your model with the GPU delegate on a Dimensity 9000 and it ran fine without any issues. Can you please try it on a different device and let me know whether it works there? For reference, the list of supported TFLite operators is here, and TRANSPOSE_CONV is on it.

sawantkumar added the stat:awaiting response label on May 27, 2024

github-actions bot commented Jun 4, 2024

This issue is stale because it has been open for 7 days with no activity. It will be closed if no further activity occurs. Thank you.

github-actions bot added the stale label on Jun 4, 2024
@filip-halt
Author

I believe this is a grouped TRANSPOSE_CONV conversion problem. TensorFlow seems to barely support it, and it is what breaks when converting from ONNX to TF. The default conversion creates a large number of layers that ultimately cause an OOM on the phone when the model loads.

[attached image]
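
For illustration, when there is no native grouped TRANSPOSE_CONV, the lowering typically becomes a Split, one conv2d_transpose per group, and a Concat, which is where the explosion in layer count comes from. The helper below is a hypothetical sketch of that expansion, not the converter's actual code.

import tensorflow as tf

def grouped_conv2d_transpose(x, kernels, stride=2):
    # x: [N, H, W, C] input; kernels: one [kh, kw, out_ch_per_group, C/groups]
    # filter per group. Emulates a grouped transposed conv by splitting the
    # channels, running one conv2d_transpose per group, and concatenating.
    groups = len(kernels)
    xs = tf.split(x, groups, axis=-1)
    outs = []
    for xi, ki in zip(xs, kernels):
        n, h, w, _ = xi.shape
        out_shape = [n, h * stride, w * stride, ki.shape[2]]
        outs.append(tf.nn.conv2d_transpose(
            xi, ki, out_shape, strides=[1, stride, stride, 1], padding='SAME'))
    return tf.concat(outs, axis=-1)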

google-ml-butler bot removed the stale and stat:awaiting response labels on Jun 6, 2024