
Problems with converted 8-bit TFLite models of CycleGAN and running inference (especially allocating tensors) #59922

Open
judeharis opened this issue Mar 7, 2023 · 3 comments
Labels: comp:lite (TF Lite related issues) · stat:awaiting tensorflower (Status - Awaiting response from tensorflower) · TF 2.11 (Issues related to TF 2.11) · TFLiteConverter (For issues related to TFLite converter) · type:bug (Bug)

Comments

judeharis commented Mar 7, 2023

System information

  • Have I written custom code (as opposed to using a stock example script
    provided in TensorFlow): No, the problems occur with standard TF code.
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): WSL2 Ubuntu 22.04
  • TensorFlow installed from (source or binary): python version is installed using pip, and the benchmark binary is built from TF branch "v2.11.0"
  • TensorFlow version (use command below): Tested initially on TF2.7 but also on TF2.11
  • Python version: 3.9.16
  • Bazel version (if compiling from source): 5.3.0
  • GCC/Compiler version (if compiling from source): gcc (Ubuntu 11.3.0-1ubuntu1~22.04) 11.3.0

Description of task

  • I train the CycleGAN models and save the generators and discriminators as SavedModels:
generator_g.save('saved_model/cycle_gan_g')
generator_f.save('saved_model/cycle_gan_f')

discriminator_x.save('saved_model/cycle_disc_x')
discriminator_y.save('saved_model/cycle_disc_y')

  • Following this, I convert each model using the TFLiteConverter:
import tensorflow as tf

# mdir is the SavedModel directory, e.g. 'saved_model/cycle_gan_g'
def representative_dataset():
    for _ in range(1):
        data = tf.random.normal([1, 256, 256, 3])
        yield [(tf.cast(data, tf.float32) / 127.5) - 1.0]

converter = tf.lite.TFLiteConverter.from_saved_model(mdir)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8,
                                       tf.lite.OpsSet.SELECT_TF_OPS]
tflite_model = converter.convert()
with open(mdir + 'cycle_gan_g.tflite', 'wb') as f:
    f.write(tflite_model)
  • During conversion I get the following warning, and I don't know whether it matters for the problems below (the absl warning was interleaved into the converter's log line; untangled here):
fully_quantize: 0, inference_type: 6, input_inference_type: 0, output_inference_type: 0
WARNING:absl:Buffer deduplication procedure will be skipped when flatbuffer library is not properly loaded
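As a side note on the calibration step: post-training int8 quantization derives its scales from the representative dataset, so the preprocessing there must match what the model saw in training. The following plain-Python sketch (no TF required; `preprocess` is a hypothetical mirror of the scaling in the snippet above) just checks that `x / 127.5 - 1.0` maps the uint8 pixel range into [-1, 1]:

```python
# Hypothetical sanity check of the representative-dataset preprocessing.
# A model trained on a different input range would yield wrong calibration
# statistics and hence wrong quantization scales.

def preprocess(pixel):
    """Mirror of the scaling used in representative_dataset() above."""
    return pixel / 127.5 - 1.0

print(preprocess(0))      # -1.0 (darkest uint8 pixel)
print(preprocess(255))    # 1.0 (brightest uint8 pixel)
print(preprocess(127.5))  # 0.0 (midpoint)
```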

Description of problems

Quick note: problems 2 and 3 occur for the generator TFLite models of CycleGAN, and problem 4 for the discriminator TFLite models.

1. When simply trying to allocate tensors for a converted TFLite model in the Python interpreter:

interpreter = tf.lite.Interpreter(mdir + 'cycle_gan_g.tflite')
interpreter.allocate_tensors()

the process crashes with only:
Aborted (core dumped)

2. Since the error message was not useful, I wanted to run the model from C++ to understand the problem better. So I built the "benchmark_model" tool from https://github.com/tensorflow/tensorflow/tree/master/tensorflow/lite/tools/benchmark with Bazel and ran the same TFLite model with the following command:

./benchmark_model --enable_op_profiling=true --graph=./models/cycle_gan_f.tflite

Running this gave me a better idea of the problem:

  • While allocating tensors, node 71 (a SQUARED_DIFFERENCE operation) triggers an assertion while computing its quantization parameters.
  • Link to the specific assert statement:
    TFLITE_CHECK_LT(double_multiplier, 1.);
  • So for node 71 the "double multiplier" is greater than or equal to 1, which the current kernel code cannot handle.
  • I am not sure whether this is a fault of the conversion step or of the backend; if the converter is producing bad quantization parameters, is there a way to fix this?
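For context on why that assert exists: TFLite's quantized kernels decompose a real-valued multiplier into a 32-bit fixed-point significand plus a right shift, and the code path behind that check (along the lines of `QuantizeMultiplierSmallerThanOneExp` in `quantization_util`) only handles multipliers in (0, 1). The following is a rough plain-Python sketch of that decomposition, not the actual TFLite implementation:

```python
import math

def quantize_multiplier_smaller_than_one(double_multiplier):
    """Rough sketch of TFLite-style fixed-point decomposition for a
    multiplier in (0, 1): returns (quantized_multiplier, right_shift) with
    double_multiplier ~= quantized_multiplier * 2**-31 * 2**-right_shift.
    Illustrative only; mirrors the idea, not the exact TFLite code."""
    # The kernel asserts the multiplier is strictly below 1, as in
    # TFLITE_CHECK_LT(double_multiplier, 1.);
    assert 0.0 < double_multiplier < 1.0
    q, shift = math.frexp(double_multiplier)  # double_multiplier = q * 2**shift, q in [0.5, 1)
    quantized = round(q * (1 << 31))          # 31-bit fixed-point significand
    return quantized, -shift

q31, right_shift = quantize_multiplier_smaller_than_one(0.25)
approx = q31 / (1 << 31) * 2 ** (-right_shift)  # reconstruct the multiplier
print(approx)  # ~0.25
```

A multiplier >= 1 (as the converter apparently produced for node 71) cannot be represented in this "significand times a right shift" form, which is presumably why the check aborts instead of continuing.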

3. Since I was able to access and rebuild the code, I made a small adjustment to artificially clamp the "double multiplier" below 1 for this node, to see whether the model could then run without errors.

  • This allowed allocate_tensors() to complete, but I get a new error during inference:
ERROR: tensorflow/lite/kernels/concatenation.cc:158 t->dims->data[d] != t0->dims->data[d] (1 != 2)
ERROR: Node number 97 (CONCATENATION) failed to prepare.
  • It seems there is a mismatch between the dimensions of the input tensors and the output tensor, but a quick check in Netron shows the dimensions are what I believe to be correct:
    (Netron screenshot of the CONCATENATION node's tensor shapes)
  • I have yet to come up with any temporary fix for this issue.
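The check that fails in `concatenation.cc` is the usual concatenation shape rule: every input must agree on all dimensions except the concatenation axis. The following is an illustrative plain-Python re-implementation of that rule (not the TFLite source), reproducing the `(1 != 2)` style of mismatch the error reports:

```python
def check_concat_shapes(shapes, axis):
    """Illustrative version of the check quoted from concatenation.cc:
    all inputs to CONCATENATION must match on every dimension except
    `axis`. Returns the output shape, or raises on a mismatch."""
    t0 = shapes[0]
    for t in shapes[1:]:
        for d, (a, b) in enumerate(zip(t0, t)):
            if d != axis and a != b:
                raise ValueError(f"dim {d} mismatch: {b} != {a}")
    out = list(t0)
    out[axis] = sum(s[axis] for s in shapes)  # concat axis sizes add up
    return out

print(check_concat_shapes([(1, 2, 2, 64), (1, 2, 2, 64)], axis=3))  # [1, 2, 2, 128]

# A mismatch on a non-concat dimension, like the one node 97 reports:
try:
    check_concat_shapes([(1, 2, 2, 64), (1, 1, 2, 64)], axis=3)
except ValueError as e:
    print(e)  # dim 1 mismatch: 1 != 2
```

If Netron shows matching shapes but the runtime disagrees, one plausible (unconfirmed) explanation is that an earlier node's output shape is mis-propagated during Prepare rather than the stored shapes being wrong.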

4. When I tried to run inference with the benchmark tool on the discriminator models, I get the following error:

  • The same adjustment lets tensor allocation complete, but I get a new error during inference:
ERROR: tensorflow/lite/core/subgraph.cc BytesRequired number of elements overflowed.
ERROR: Node number 38 (CONV_2D) failed to prepare.
ERROR: Failed to apply the default TensorFlow Lite delegate indexed at 0.
Failed to allocate tensors!
  • Again, I have yet to come up with any temporary fix for this issue.
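For reference, the failing check multiplies out the tensor's dimensions and verifies the byte count fits in the platform integer type; a corrupted dimension (e.g. from a bad conversion) makes the product blow up. A plain-Python sketch of that idea, with a hypothetical 32-bit limit standing in for the platform check (illustrative, not the `subgraph.cc` source):

```python
def bytes_required(dims, bytes_per_element, max_bytes=2**31 - 1):
    """Illustrative version of TFLite's BytesRequired: multiply out the
    tensor dims and fail if the running byte count overflows the
    (here hypothetical) 32-bit signed limit."""
    count = 1
    for d in dims:
        count *= d
        if count * bytes_per_element > max_bytes:
            raise OverflowError("BytesRequired number of elements overflowed.")
    return count * bytes_per_element

# A sane int8 input tensor for this model:
print(bytes_required([1, 256, 256, 3], 1))  # 196608

# A corrupted dimension trips the overflow check, as for node 38 (CONV_2D):
try:
    bytes_required([1, 2**30, 2**30, 3], 1)
except OverflowError as e:
    print(e)  # BytesRequired number of elements overflowed.
```

So the CONV_2D error likely means the converter assigned that node an output shape whose element count is wildly wrong, rather than the layer itself being unsupported.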

Overall, there seem to be problems converting some fairly trivial operations from TF to TFLite. I am not sure whether this is because of the way CycleGAN is defined in TF or whether I am performing the conversion steps incorrectly. Any help in this matter would be great; I want to convert CycleGAN to an int8 TFLite model and run it.

@pjpratik pjpratik added type:bug Bug comp:lite TF Lite related issues TF 2.11 Issues related to TF 2.11 labels Mar 8, 2023

pjpratik commented Mar 10, 2023

Hi @judeharis, thanks for reporting this issue.

Sorry for the delayed response.

I was able to reproduce the issue with the pix2pix generator used in this tutorial. Please find the gist here.

However, I was able to successfully convert the ResNet generator used in the Keras CycleGAN tutorial into an 8-bit TFLite model and invoke the interpreter. Please find the gist here.

Thanks.

@pjpratik pjpratik added the stat:awaiting response Status - Awaiting response from author label Mar 10, 2023

judeharis commented Mar 16, 2023

Hi @pjpratik, I was able to verify your findings. The CycleGAN model from Keras converts properly.

That being said, it doesn't really address the underlying issues. Also, CycleGAN with a pix2pix generator and with a ResNet generator are effectively two different models.

I would have thought the conversion should produce a valid model, especially when the model is valid in TF?
Also, why does the discriminator model produce the "BytesRequired number of elements overflowed." error for basic CONV_2D layers?

Regarding the quantization errors, it seems similar issues have been found previously for other operations (and apparently fixed): (#43661 (comment)).
Would it not be possible to create a similar fix for the SQUARED_DIFFERENCE operation?

@google-ml-butler google-ml-butler bot removed the stat:awaiting response Status - Awaiting response from author label Mar 16, 2023
@pjpratik
Contributor

Hi @judeharis, thanks for the clarification.

I did observe that using BatchNorm instead of InstanceNorm in the UNet generator doesn't cause any problem in allocating the tensors. Please find the same in this gist.

@sachinprasadhs Could you please look into this issue? Thanks.

@pjpratik pjpratik assigned sachinprasadhs and unassigned pjpratik Mar 17, 2023
@sachinprasadhs sachinprasadhs added the TFLiteConverter For issues related to TFLite converter label Mar 28, 2023
@sachinprasadhs sachinprasadhs added the stat:awaiting tensorflower Status - Awaiting response from tensorflower label Mar 28, 2023