Android Tflite model fails to load on GPU Delegate: CL_OUT_OF_HOST_MEMORY #68470
Comments
Hi @filip-halt, can you please provide the TFLite model file so that I can replicate the issue?
You can find a copy here: https://github.com/filip-halt/tflite_bug. It was too large to attach directly to this issue.
Turns out this is most likely due to a Conv2DTranspose layer in the model. I was under the impression that Conv2DTranspose was supported, but I could be wrong. Another interesting thing: when you convert the model with

```python
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.target_spec.supported_types = [tf.float16]
```

the resulting binary is twice as large as float32 and 20% slower on mobile. When I inspected the graph with Netron, it looks like nothing was converted to float16, not even the Conv2D layers.
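For reference, here is a minimal, self-contained sketch of the float16 conversion recipe quoted above, applied to a tiny stand-in graph (not the issue's actual model) containing a `conv2d_transpose`. It also checks the resulting tensor dtypes programmatically, as an alternative to eyeballing the graph in Netron; the shapes and filter are arbitrary placeholders.

```python
import numpy as np
import tensorflow as tf

# Stand-in weights for a conv2d_transpose; filter layout is
# [height, width, output_channels, input_channels]. Large enough that
# the converter should store them as a float16 constant.
w = tf.constant(np.random.rand(3, 3, 64, 64).astype(np.float32))

@tf.function(input_signature=[tf.TensorSpec([1, 8, 8, 64], tf.float32)])
def f(x):
    return tf.nn.conv2d_transpose(
        x, w, output_shape=[1, 16, 16, 64],
        strides=[1, 2, 2, 1], padding="SAME")

# Same two converter settings as in the comment above.
converter = tf.lite.TFLiteConverter.from_concrete_functions(
    [f.get_concrete_function()])
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.target_spec.supported_types = [tf.float16]
fp16_model = converter.convert()

# Inspect tensor dtypes in the converted flatbuffer: if float16
# quantization took effect, at least one tensor should be float16.
interp = tf.lite.Interpreter(model_content=fp16_model)
interp.allocate_tensors()
dtypes = [t["dtype"] for t in interp.get_tensor_details()]
print(np.float16 in dtypes)
```

Comparing `len(fp16_model)` against a conversion without the two flags is a quick way to verify the reported size regression.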
Hi @filip-halt, I ran your model with the GPU delegate on a Dimensity 9000 and it worked without any issues. Can you please try it on a different device and let me know if it works there? Also, the list of supported operators for TFLite is here, and TRANSPOSE_CONV is on it.
This issue is stale because it has been open for 7 days with no activity. It will be closed if no further activity occurs. Thank you.
Issue type
Bug
Have you reproduced the bug with TensorFlow Nightly?
Yes
Source
source
TensorFlow version
org.tensorflow:tensorflow-lite:2.16.1
Custom code
Yes
OS platform and distribution
Android
Mobile device
Samsung S23
Python version
No response
Bazel version
No response
GCC/compiler version
No response
CUDA/cuDNN version
No response
GPU model and memory
No response
Current behavior?
Currently trying to get a larger model to load on an S23, but I am running into OOM errors. When initializing an Interpreter using a GpuDelegate with factory options obtained from CompatibilityList.getBestOptionsForThisDevice(), the Interpreter crashes with:

Failed to apply delegate: Failed to build program executable - Out of host memory
Error: Program not built!

This seems to come from an OpenCL error that is parsed in tensorflow/tensorflow/lite/delegates/gpu/cl/util.cc (line 42 at commit dd5c426).
My best guess is that I am hitting the Dalvik heap limit, which Runtime.maxMemory() reports as 512 MB on my device. I profiled the memory usage and the crash happens around the 450 MB mark. Does TFLite on Android not use native memory to get around this? I seem to recall people running 1 GB+ models on their devices. Perhaps the delegate's build step is what exceeds the limit, and once built the model would be offloaded to native memory?
Note: I am driving this through pyjnius, which might be causing problems, but I don't think that is the cause.
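The initialization path described above looks roughly like the following pyjnius sketch. This is a hedged reconstruction, not the reporter's actual code: the class names are the standard TFLite Java API, the model ByteBuffer is elided, and on a host without pyjnius and the Android classes the import simply fails.

```python
# Hedged sketch of Interpreter + GpuDelegate setup driven from Python via
# pyjnius. Requires an Android runtime with the tensorflow-lite and
# tensorflow-lite-gpu AARs on the classpath; falls through otherwise.
try:
    from jnius import autoclass

    CompatibilityList = autoclass("org.tensorflow.lite.gpu.CompatibilityList")
    GpuDelegate = autoclass("org.tensorflow.lite.gpu.GpuDelegate")
    InterpreterOptions = autoclass("org.tensorflow.lite.Interpreter$Options")

    compat = CompatibilityList()
    options = InterpreterOptions()
    if compat.isDelegateSupportedOnThisDevice():
        # Same call the issue uses to pick delegate options for this device.
        delegate = GpuDelegate(compat.getBestOptionsForThisDevice())
        options.addDelegate(delegate)
        status = "gpu"
    else:
        status = "cpu"
    # Interpreter construction (model_buffer is a MappedByteBuffer, not shown):
    # interpreter = autoclass("org.tensorflow.lite.Interpreter")(model_buffer, options)
except ImportError:
    # Running on a host without pyjnius / the Android classes.
    status = "no-jnius"

print(status)
```

The "Out of host memory" error above is raised during addDelegate/Interpreter construction, when the delegate compiles its OpenCL programs, which is consistent with the build-step theory.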
Standalone code to reproduce the issue
Relevant log output