
BatchToSpaceND and SpaceToBatchND ERROR_GPU_NOT_COMPATIBLE #59870

Open
ShashmurinSergey opened this issue Mar 2, 2023 · 10 comments
Assignees
Labels
comp:lite TF Lite related issues stat:awaiting tensorflower Status - Awaiting response from tensorflower TF 2.10 TFLiteConverter For issues related to TFLite converter type:feature Feature requests

Comments

ShashmurinSergey commented Mar 2, 2023

1. System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 20.04):
  • TensorFlow installation (pip package or built from source): pip
  • TensorFlow library (version, if pip package or github SHA, if built from source): 2.10.0

2. Code

converter = tf.lite.TFLiteConverter.from_saved_model(in_keras_path)
converter.target_spec.supported_ops = [
    tf.lite.OpsSet.TFLITE_BUILTINS,  # enable TensorFlow Lite ops.
    tf.lite.OpsSet.SELECT_TF_OPS  # enable TensorFlow ops.
]
converter.target_spec.experimental_supported_backends = ["GPU"] # if empty, GPU is not enabled

converter.experimental_new_converter = True

converter.optimizations = [tf.lite.Optimize.DEFAULT] #8-bit quantization
converter.allow_custom_ops = True

tflite_quant_model = converter.convert()

3. Failure after conversion

'BatchToSpaceND' ERROR_GPU_NOT_COMPATIBLE
'SpaceToBatchND' ERROR_GPU_NOT_COMPATIBLE

5. (optional) Any other info / logs

Hi everyone! I am having issues converting the U-2-Net model from PyTorch to TFLite for running on Android. My conversion path is PyTorch (pth) -> ONNX -> TensorFlow -> TFLite. An important requirement is that the TFLite model must support GPU execution and must be quantized. The main conversion path works fine, but I encountered an error during the TFLite conversion. After some investigation, I found that if I change the Conv2D layers that have dilation > 1 and padding > 1 to dilation=1 and padding=1, the conversion works without any issues; however, this reduces the model's quality. If GPU support is disabled, the model also converts fine. I have tried using QuantizationDebugOptions and QuantizationDebugger, but that did not yield any results; the error remains the same. Can you please suggest a way to perform this conversion without compromising the model's quality?

@ShashmurinSergey ShashmurinSergey added the TFLiteConverter For issues related to TFLite converter label Mar 2, 2023
ShashmurinSergey (Author) commented

This is my model. You can run the conversion with my code and reproduce my result.
https://drive.google.com/file/d/1nTVfurH8gEjSU8WOx7XHsCqMJwT5g2IT/view?usp=sharing

@synandi synandi added comp:lite TF Lite related issues TF 2.10 labels Mar 6, 2023
synandi (Contributor) commented Mar 6, 2023

Hi @ShashmurinSergey, I was able to replicate the issue in Colab using TF v2.10, TF v2.11 and tf-nightly (2.13.0.dev20230305). Please find the gists here (2.10), here (2.11) and here (tf-nightly). Thank you!

@synandi synandi assigned pjpratik and unassigned synandi Mar 6, 2023
ShashmurinSergey (Author) commented

@synandi Yes, the replication is correct, thank you!
What can you suggest to solve the problem?

pjpratik (Contributor) commented

Hi @ShashmurinSergey

Sorry for the delayed response.

The BATCH_TO_SPACE_ND and SPACE_TO_BATCH_ND ops are not compatible with the GPU delegate. Please find the list of supported ops on GPU here.

Also, as given in the documentation,

Reshape operations - Some operations that are quick on a CPU may have a high cost for the GPU on mobile devices. Reshape operations are particularly expensive to run, including BATCH_TO_SPACE, SPACE_TO_BATCH, SPACE_TO_DEPTH, and so forth. You should closely examine the use of reshape operations, and consider that they may have been applied only for exploring data or for early iterations of your model. Removing them can significantly improve performance.

The list of ops in the model compatible with GPU can be found using Analyzer. Please find the gist on the usage of the tool for your use case.
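As a minimal sketch of that Analyzer workflow (assuming TF >= 2.9, where tf.lite.experimental.Analyzer is available; the toy model and its layer parameters here are made up for demonstration, not taken from the reporter's model):

```python
import tensorflow as tf

# Toy model with a dilated convolution (illustrative parameters only).
inputs = tf.keras.Input(shape=(32, 32, 3))
outputs = tf.keras.layers.Conv2D(4, 3, dilation_rate=2, padding="same")(inputs)
model = tf.keras.Model(inputs, outputs)

# Convert to TFLite.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
tflite_bytes = converter.convert()

# Print per-op GPU delegate compatibility for the converted model.
tf.lite.experimental.Analyzer.analyze(
    model_content=tflite_bytes, gpu_compatibility=True
)
```

Running this against your own converted model (via model_path= or model_content=) should flag any ops the GPU delegate cannot run.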

Thanks.

@pjpratik pjpratik added the stat:awaiting response Status - Awaiting response from author label Mar 10, 2023
ShashmurinSergey (Author) commented

Hi @pjpratik!
Thank you for the detailed answer!
Why do these layers appear? After all, everything works correctly when we use dilation=1.

Is there any way for me to convert this model to TFLite with GPU support? Can I somehow remove these layers?

@google-ml-butler google-ml-butler bot removed the stat:awaiting response Status - Awaiting response from author label Mar 12, 2023
pjpratik (Contributor) commented

Hi @ShashmurinSergey

They are commonly used in the conv2d transpose operation, AFAIK.

Have you tried the latest TF 2.11 and tf-nightly with dilation != 1 to see if the issue still exists?

Thanks.

@pjpratik pjpratik added the stat:awaiting response Status - Awaiting response from author label Mar 13, 2023
ShashmurinSergey (Author) commented Mar 13, 2023

@pjpratik But I am not using Conv2D transpose; the error occurs on Conv2D layers with dilation = 2, 4, 8.
I am using a Docker container with the latest version of TensorFlow, and I have also tried a container with the tf-nightly image, but I still get the same error.
Is there any other way to convert a Conv2D layer with dilation != 1 without adding the BatchToSpaceND and SpaceToBatchND layers?

@google-ml-butler google-ml-butler bot removed the stat:awaiting response Status - Awaiting response from author label Mar 13, 2023
pjpratik (Contributor) commented

Hi @ShashmurinSergey

Thanks for the information. I have created a toy model with dilation = 1 and dilation = 4. Please find the gist here.

Conv2D adds the BatchToSpaceND and SpaceToBatchND ops when the dilation rate != 1, to apply the appropriate padding and avoid holes in the output.
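For intuition on why this lowering works, here is a conceptual 1-D NumPy sketch (not TensorFlow's actual implementation): splitting the input into `rate` interleaved phases (the "space to batch" step) lets a dense kernel reproduce the dilated taps, and re-interleaving the phase outputs (the "batch to space" step) recovers the dilated-convolution result.

```python
import numpy as np

def conv1d_valid(x, k):
    """Plain dense 1-D valid convolution (cross-correlation)."""
    n, m = len(x), len(k)
    return np.array([np.dot(x[i:i + m], k) for i in range(n - m + 1)])

def dilated_conv1d(x, k, rate):
    """Dilated conv: insert rate-1 zeros between kernel taps."""
    kd = np.zeros((len(k) - 1) * rate + 1)
    kd[::rate] = k
    return conv1d_valid(x, kd)

def space_to_batch_conv(x, k, rate):
    """Same result via SpaceToBatch-style phase splitting:
    split x into `rate` interleaved phases (stacked as a "batch"),
    run the dense kernel on each phase, then re-interleave the
    outputs (BatchToSpace)."""
    phases = [x[p::rate] for p in range(rate)]
    outs = [conv1d_valid(ph, k) for ph in phases]
    out = np.empty(len(x) - (len(k) - 1) * rate)
    for p in range(rate):
        out[p::rate] = outs[p][:len(out[p::rate])]
    return out
```

In 2-D with `same` padding, TensorFlow additionally pads before the split, which is exactly what SpaceToBatchND/BatchToSpaceND handle in the converted graph; the GPU delegate, however, does not implement those two ops, as noted above.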

Model with dilation = 4: [model (2).tflite visualization attachment]

Model with dilation = 1: [model (1).tflite visualization attachment]

This can also be observed in the TensorFlow source, where the convolution is wrapped via

self.conv_op = _WithSpaceToBatch(

and as a result those ops are added, which are not compatible with TFLite GPU support.

Thanks.

@pjpratik pjpratik added the stat:awaiting response Status - Awaiting response from author label Mar 14, 2023
ShashmurinSergey (Author) commented

Hi @pjpratik!
Thank you for your answer!
Am I correct in understanding that there is no workaround for converting Conv2d with dilation != 1 with GPU support? Could this be possible in the future?

@google-ml-butler google-ml-butler bot removed the stat:awaiting response Status - Awaiting response from author label Mar 15, 2023
pjpratik (Contributor) commented Mar 15, 2023

@ShashmurinSergey That could be on the roadmap; I am not sure about it.

@sachinprasadhs Could you please look into this? Thanks.

@pjpratik pjpratik assigned sachinprasadhs and unassigned pjpratik Mar 15, 2023
@sachinprasadhs sachinprasadhs added the stat:awaiting tensorflower Status - Awaiting response from tensorflower label Mar 24, 2023
@pkgoogle pkgoogle added type:bug Bug type:feature Feature requests and removed type:bug Bug labels Jul 19, 2023
@pkgoogle pkgoogle self-assigned this Jul 24, 2023