
Problems with converted 8-bit TFLite models of CycleGAN and running inference (especially allocating tensors) #59922

Open
judeharis opened this issue Mar 7, 2023 · 3 comments
Labels: comp:lite (TF Lite related issues) · stat:awaiting tensorflower (Status - Awaiting response from tensorflower) · TF 2.11 (Issues related to TF 2.11) · TFLiteConverter (For issues related to TFLite converter) · type:bug (Bug)

Comments

judeharis commented Mar 7, 2023

System information

  • Have I written custom code (as opposed to using a stock example script
    provided in TensorFlow): No, the problems occur with standard TF code.
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): WSL2 Ubuntu 22.04
  • TensorFlow installed from (source or binary): python version is installed using pip, and the benchmark binary is built from TF branch "v2.11.0"
  • TensorFlow version (use command below): Tested initially on TF2.7 but also on TF2.11
  • Python version: 3.9.16
  • Bazel version (if compiling from source): 5.3.0
  • GCC/Compiler version (if compiling from source): gcc (Ubuntu 11.3.0-1ubuntu1~22.04) 11.3.0

Description of task

  • I train the CycleGAN models and save the generators and discriminators as SavedModels:
generator_g.save('saved_model/cycle_gan_g')
generator_f.save('saved_model/cycle_gan_f')

discriminator_x.save('saved_model/cycle_disc_x')
discriminator_y.save('saved_model/cycle_disc_y')

  • Following this, I convert each model using the TFLiteConverter:
import tensorflow as tf

# mdir is the SavedModel directory, e.g. 'saved_model/cycle_gan_g'
def representative_dataset():
    for _ in range(1):
        data = tf.random.normal([1, 256, 256, 3])
        yield [(tf.cast(data, tf.float32) / 127.5) - 1.0]

converter = tf.lite.TFLiteConverter.from_saved_model(mdir)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8,
                                       tf.lite.OpsSet.SELECT_TF_OPS]
tflite_model = converter.convert()
with open(mdir + 'cycle_gan_g.tflite', 'wb') as f:
    f.write(tflite_model)
  • During conversion I get the following warning, and I don't know whether it matters for the problems below (the absl warning was interleaved into the converter's log line; untangled here):
fully_quantize: 0, inference_type: 6, input_inference_type: 0, output_inference_type: 0
WARNING:absl:Buffer deduplication procedure will be skipped when flatbuffer library is not properly loaded
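As a side note on the calibration step: post-training int8 quantization derives its scales from the representative dataset, so the preprocessing there must match what the model saw in training. The following plain-Python sketch (no TF required; `preprocess` is a hypothetical mirror of the scaling in the snippet above) just checks that `x / 127.5 - 1.0` maps the uint8 pixel range into [-1, 1]:

```python
# Hypothetical sanity check of the representative-dataset preprocessing.
# A model trained on a different input range would yield wrong calibration
# statistics and hence wrong quantization scales.

def preprocess(pixel):
    """Mirror of the scaling used in representative_dataset() above."""
    return pixel / 127.5 - 1.0

print(preprocess(0))      # -1.0 (darkest uint8 pixel)
print(preprocess(255))    # 1.0 (brightest uint8 pixel)
print(preprocess(127.5))  # 0.0 (midpoint)
```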

Description of problems

Quick note: problems 2 and 3 occur for the generator TFLite models of CycleGAN, and problem 4 for the discriminator TFLite models.

1. When simply trying to allocate tensors for a converted TFLite model in the Python interpreter:

interpreter = tf.lite.Interpreter(mdir + 'cycle_gan_g.tflite')
interpreter.allocate_tensors()

the process crashes with only:
Aborted (core dumped)

2. Since the error message was not useful, I wanted to run the model from C++ to understand the problem better. So I built the "benchmark_model" tool from https://github.com/tensorflow/tensorflow/tree/master/tensorflow/lite/tools/benchmark with Bazel and ran the same TFLite model with the following command:

./benchmark_model --enable_op_profiling=true --graph=./models/cycle_gan_f.tflite

Running this gave me a better idea of the problem:

  • While allocating tensors, node 71 (a SQUARED_DIFFERENCE operation) triggers an assertion while computing its quantization parameters.
  • Link to the specific assert statement:
    TFLITE_CHECK_LT(double_multiplier, 1.);
  • So for node 71 the "double multiplier" is greater than or equal to 1, which the current kernel code cannot handle.
  • I am not sure whether this is a fault of the conversion step or of the backend; if the converter is producing bad quantization parameters, is there a way to fix this?
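For context on why that assert exists: TFLite's quantized kernels decompose a real-valued multiplier into a 32-bit fixed-point significand plus a right shift, and the code path behind that check (along the lines of `QuantizeMultiplierSmallerThanOneExp` in `quantization_util`) only handles multipliers in (0, 1). The following is a rough plain-Python sketch of that decomposition, not the actual TFLite implementation:

```python
import math

def quantize_multiplier_smaller_than_one(double_multiplier):
    """Rough sketch of TFLite-style fixed-point decomposition for a
    multiplier in (0, 1): returns (quantized_multiplier, right_shift) with
    double_multiplier ~= quantized_multiplier * 2**-31 * 2**-right_shift.
    Illustrative only; mirrors the idea, not the exact TFLite code."""
    # The kernel asserts the multiplier is strictly below 1, as in
    # TFLITE_CHECK_LT(double_multiplier, 1.);
    assert 0.0 < double_multiplier < 1.0
    q, shift = math.frexp(double_multiplier)  # double_multiplier = q * 2**shift, q in [0.5, 1)
    quantized = round(q * (1 << 31))          # 31-bit fixed-point significand
    return quantized, -shift

q31, right_shift = quantize_multiplier_smaller_than_one(0.25)
approx = q31 / (1 << 31) * 2 ** (-right_shift)  # reconstruct the multiplier
print(approx)  # ~0.25
```

A multiplier >= 1 (as the converter apparently produced for node 71) cannot be represented in this "significand times a right shift" form, which is presumably why the check aborts instead of continuing.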

3. Since I was able to access and rebuild the code, I made a small adjustment to artificially clamp the "double multiplier" below 1 for this node, to see whether the model could then run without errors.

  • This allowed allocate_tensors() to complete, but I get a new error during inference:
ERROR: tensorflow/lite/kernels/concatenation.cc:158 t->dims->data[d] != t0->dims->data[d] (1 != 2)
ERROR: Node number 97 (CONCATENATION) failed to prepare.
  • It seems there is a mismatch between the dimensions of the input tensors and the output tensor, but a quick check in Netron shows the dimensions are what I believe to be correct:
    (Netron screenshot of the CONCATENATION node's tensor shapes)
  • I have yet to come up with any temporary fix for this issue.
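The check that fails in `concatenation.cc` is the usual concatenation shape rule: every input must agree on all dimensions except the concatenation axis. The following is an illustrative plain-Python re-implementation of that rule (not the TFLite source), reproducing the `(1 != 2)` style of mismatch the error reports:

```python
def check_concat_shapes(shapes, axis):
    """Illustrative version of the check quoted from concatenation.cc:
    all inputs to CONCATENATION must match on every dimension except
    `axis`. Returns the output shape, or raises on a mismatch."""
    t0 = shapes[0]
    for t in shapes[1:]:
        for d, (a, b) in enumerate(zip(t0, t)):
            if d != axis and a != b:
                raise ValueError(f"dim {d} mismatch: {b} != {a}")
    out = list(t0)
    out[axis] = sum(s[axis] for s in shapes)  # concat axis sizes add up
    return out

print(check_concat_shapes([(1, 2, 2, 64), (1, 2, 2, 64)], axis=3))  # [1, 2, 2, 128]

# A mismatch on a non-concat dimension, like the one node 97 reports:
try:
    check_concat_shapes([(1, 2, 2, 64), (1, 1, 2, 64)], axis=3)
except ValueError as e:
    print(e)  # dim 1 mismatch: 1 != 2
```

If Netron shows matching shapes but the runtime disagrees, one plausible (unconfirmed) explanation is that an earlier node's output shape is mis-propagated during Prepare rather than the stored shapes being wrong.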

4. When I tried to run inference with the benchmark tool on the discriminator models, I get the following error:

  • The same adjustment lets tensor allocation complete, but I get a new error during inference:
ERROR: tensorflow/lite/core/subgraph.cc BytesRequired number of elements overflowed.
ERROR: Node number 38 (CONV_2D) failed to prepare.
ERROR: Failed to apply the default TensorFlow Lite delegate indexed at 0.
Failed to allocate tensors!
  • Again, I have yet to come up with any temporary fix for this issue.
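For reference, the failing check multiplies out the tensor's dimensions and verifies the byte count fits in the platform integer type; a corrupted dimension (e.g. from a bad conversion) makes the product blow up. A plain-Python sketch of that idea, with a hypothetical 32-bit limit standing in for the platform check (illustrative, not the `subgraph.cc` source):

```python
def bytes_required(dims, bytes_per_element, max_bytes=2**31 - 1):
    """Illustrative version of TFLite's BytesRequired: multiply out the
    tensor dims and fail if the running byte count overflows the
    (here hypothetical) 32-bit signed limit."""
    count = 1
    for d in dims:
        count *= d
        if count * bytes_per_element > max_bytes:
            raise OverflowError("BytesRequired number of elements overflowed.")
    return count * bytes_per_element

# A sane int8 input tensor for this model:
print(bytes_required([1, 256, 256, 3], 1))  # 196608

# A corrupted dimension trips the overflow check, as for node 38 (CONV_2D):
try:
    bytes_required([1, 2**30, 2**30, 3], 1)
except OverflowError as e:
    print(e)  # BytesRequired number of elements overflowed.
```

So the CONV_2D error likely means the converter assigned that node an output shape whose element count is wildly wrong, rather than the layer itself being unsupported.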

Overall, there seem to be problems converting some fairly trivial operations from TF to TFLite. I am not sure whether this is because of the way CycleGAN is defined in TF or whether I am performing the conversion steps incorrectly. Any help in this matter would be great; I want to convert CycleGAN to an int8 TFLite model and run it.

@pjpratik pjpratik added type:bug Bug comp:lite TF Lite related issues TF 2.11 Issues related to TF 2.11 labels Mar 8, 2023

pjpratik commented Mar 10, 2023

Hi @judeharis, thanks for reporting this issue.

Sorry for the delayed response.

I was able to reproduce the issue with the pix2pix generator used in this tutorial. Please find the gist here.

However, I was able to successfully convert the ResNet generator used in the Keras CycleGAN tutorial into an 8-bit TFLite model and invoke the interpreter. Please find the gist here.

Thanks.

@pjpratik pjpratik added the stat:awaiting response Status - Awaiting response from author label Mar 10, 2023

judeharis commented Mar 16, 2023

Hi @pjpratik, I was able to verify your findings. The CycleGAN model from Keras converts properly.

That being said, it doesn't really address the underlying issues. Also, CycleGAN with a pix2pix generator and with a ResNet generator are effectively two different models.

I would have thought the conversion should produce a valid model, especially when the model is valid in TF?
Also, why does the discriminator model produce the "BytesRequired number of elements overflowed." error for basic CONV_2D layers?

Regarding the quantization errors, it seems similar issues have been found previously for other operations (and apparently fixed): (#43661 (comment)).
Would it not be possible to create a similar fix for the SQUARED_DIFFERENCE operation?

@google-ml-butler google-ml-butler bot removed the stat:awaiting response Status - Awaiting response from author label Mar 16, 2023
@pjpratik
Contributor

Hi @judeharis, thanks for the clarification.

I did observe that using BatchNorm instead of InstanceNorm in the UNet generator doesn't cause any problem in allocating the tensors. Please find the same in this gist.

@sachinprasadhs Could you please look into this issue? Thanks.

@pjpratik pjpratik assigned sachinprasadhs and unassigned pjpratik Mar 17, 2023
@sachinprasadhs sachinprasadhs added the TFLiteConverter For issues related to TFLite converter label Mar 28, 2023
@sachinprasadhs sachinprasadhs added the stat:awaiting tensorflower Status - Awaiting response from tensorflower label Mar 28, 2023