
Missing intermediate input node in TF Lite convert #39276

Closed · Wheest opened this issue on May 7, 2020 · 11 comments
Labels: comp:lite (TF Lite related issues) · stat:awaiting tensorflower (Status - Awaiting response from tensorflower) · TF 2.2 (Issues related to TF 2.2) · TFLiteConverter (For issues related to TFLite converter) · type:support (Support issues)

Comments


Wheest commented May 7, 2020

System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Debian 9.12 stretch
  • TensorFlow installed from (source or binary): Binary
  • TensorFlow version (or github SHA if from source): 2.1.0

Command used to run the converter or code if you’re using the Python API

```python
converter = tf.compat.v1.lite.TFLiteConverter.from_frozen_graph(
    model_file,
    input_arrays=[input_name],
    output_arrays=[output_name],
)
```

The output from the converter invocation

```
---------------------------------------------------------------------------
InvalidArgumentError                      Traceback (most recent call last)
~/.virtualenvs/tf-lite/lib/python3.7/site-packages/tensorflow_core/python/framework/importer.py in _import_graph_def_internal(graph_def, input_map, return_elements, validate_colocation_constraints, name, producer_op_list)
    496       try:
--> 497         results = c_api.TF_GraphImportGraphDefWithResults(
    498             graph._c_graph, serialized, options)  # pylint: disable=protected-access

InvalidArgumentError: Node 'batchnorm_13/mul_1': Unknown input node 'Add_2'
```

Also, please include a link to the saved model or GraphDef

Saved Model GDRIVE link

Failure details

In the graph, there are batch normalisation operations that cannot be removed, since they follow an Add operation. This part of the graph looks like:


```
A --
    --> Add --> BatchNorm --> ...
B --                ^
                    |
                 BN params
```

This failure might suggest that BatchNorm is not supported.

However, a very similar model I'm using features Add layers followed by BatchNorm, and it exports successfully.

I'm trying to figure out the source of this issue. Is there anything I should be looking at that might help me pin down the cause?

Wheest added the TFLiteConverter label on May 7, 2020
amahendrakar (Contributor) commented:

@Wheest, in order to expedite the troubleshooting process, could you please provide the complete code needed to reproduce the issue reported here? Thanks!

amahendrakar added the comp:lite, stat:awaiting response, and TF 2.1 labels on May 8, 2020
Wheest (Author) commented May 8, 2020

Hi @amahendrakar, thanks for responding.

Here is a Jupyter notebook gist that converts first the original model (resnet34, successfully) and then the altered one (resnet34-alt, unsuccessfully).

Both saved_model files are available at this GDrive link:
https://drive.google.com/drive/folders/19Q8YGi6RZd6BpadcwqS7eiRuvjqsQnye?usp=sharing

tensorflowbutler removed the stat:awaiting response label on May 10, 2020
Wheest (Author) commented May 12, 2020

Digging further, I have tried to find the point at which the Add_2 node is lost in resnet34-alt.

To that end, I dumped the list of nodes in the graph def during export and checked whether the node is present.

In resnet34, Add_2 is present; in resnet34-alt it is not.

If the node is not in the graph def, why is the exporter trying to find it? And given that resnet34 and resnet34-alt are very similar architectures, we would expect almost all of their nodes to be the same.

For completeness, here is the print debugging I added to check the nodes at this point.

```diff
diff --git a/tensorflow/lite/python/lite.py b/tensorflow/lite/python/lite.py
index 7241024..2547271 100644
--- a/tensorflow/lite/python/lite.py
+++ b/tensorflow/lite/python/lite.py
@@ -713,6 +713,9 @@ class TFLiteConverter(TFLiteConverterBase):
         # Handles models with custom TFLite ops that cannot be resolved in
         # TensorFlow.
         load_model_in_session = True
+        nodes = [n.name for n in graph_def.node]
+        print('nodes:', nodes)
+        print('Add_2 in nodes?', 'Add_2' in nodes)
         try:
           _import_graph_def(graph_def, name="")
         except _NotFoundError:
```
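As a standalone cross-check, the same inspection can be run without patching lite.py. A minimal sketch, assuming `model_file` is the frozen-graph path from the converter call above; the dangling-input scan looks for exactly the kind of reference the importer complains about:

```python
import tensorflow as tf

# Load the frozen graph into a GraphDef (model_file is an assumption here).
graph_def = tf.compat.v1.GraphDef()
with tf.io.gfile.GFile(model_file, 'rb') as f:
    graph_def.ParseFromString(f.read())

nodes = {n.name for n in graph_def.node}
print('Add_2 in nodes?', 'Add_2' in nodes)

# An input name that resolves to no node is a dangling reference; strip the
# control-dependency prefix '^' and any ':output_index' suffix first.
dangling = sorted({
    inp.lstrip('^').split(':')[0]
    for n in graph_def.node for inp in n.input
    if inp.lstrip('^').split(':')[0] not in nodes
})
print('dangling inputs:', dangling)
```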

amahendrakar (Contributor) commented:

I was able to reproduce the issue with TF v2.2 and TF-nightly. Please find the attached gist. Thanks!

amahendrakar added the TF 2.2 and type:support labels and removed the TF 2.1 label on May 14, 2020
jvishnuvardhan (Contributor) commented May 15, 2020

@Wheest BatchNorm is supported, but a TFLite model is used only for inference, so we need to pass training=False so that training-only ops are not included. Can you please check whether you have BatchNorm in resnet34? Here is a gist for our reference. Thanks!
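A minimal sketch of that suggestion, using ResNet50 as a stand-in (tf.keras.applications has no built-in resnet34) and the TF2 Keras converter rather than the frozen-graph path above:

```python
import tensorflow as tf

# Stand-in model; resnet34 is not available in tf.keras.applications.
base = tf.keras.applications.ResNet50(weights=None)

inputs = tf.keras.Input(shape=(224, 224, 3))
# training=False makes BatchNorm use its moving statistics, so only
# inference-mode ops end up in the exported graph.
outputs = base(inputs, training=False)
inference_model = tf.keras.Model(inputs, outputs)

converter = tf.lite.TFLiteConverter.from_keras_model(inference_model)
tflite_model = converter.convert()
```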

jvishnuvardhan added the stat:awaiting response label on May 15, 2020
Wheest (Author) commented May 15, 2020

@jvishnuvardhan thanks for looking at this. The batch norm layer which fails takes the missing Add_2 tensor as its input.

The Add_2 tensor takes a few convolutional layer outputs as input, so the batch norm parameters can't be folded into the convolution parameters for inference, since the normalisation is applied to the output of the Add layer.

So even in inference mode, I believe this batch norm layer is needed, and from my check of the working resnet34 model this seems to be the case there too. Is BatchNorm supported in TF-Lite?
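For context, standard conv+BN folding rewrites the convolution weights as W' = W * gamma / sigma and the bias as b' = gamma * (b - mu) / sigma + beta, which presupposes a single preceding convolution. A toy sketch with made-up parameters illustrates why a BatchNorm fed by an Add cannot be folded:

```python
import numpy as np

# Made-up BN parameters (sigma = sqrt(var + eps)) and conv weights,
# purely illustrative; real parameters are per-channel vectors.
gamma, beta, mu, sigma = 1.5, 0.1, 0.2, 0.9
W = np.random.randn(3, 3, 8, 16)  # HWIO conv kernel
b = np.zeros(16)

# Folding: BN(conv(x, W) + b) == conv(x, W_folded) + b_folded
W_folded = W * (gamma / sigma)
b_folded = gamma * (b - mu) / sigma + beta

# When the BN input is Add(conv_a, conv_b), there is no single (W, b)
# pair to fold into, so the BatchNorm op must survive in the graph.
```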

jvishnuvardhan added the stat:awaiting tensorflower label on May 15, 2020
jvishnuvardhan (Contributor) commented:

@Wheest BatchNormalization is supported. Please check the gist shown here in another TFLite issue. Thanks!

tensorflowbutler removed the stat:awaiting response label on May 18, 2020
Wheest (Author) commented May 18, 2020

@jvishnuvardhan so batch normalisation being removed can be ruled out as the cause of the issue, given that TF-Lite supports BatchNormalization?

In that case, it seems that the node Add_2 is lost in one representation of the graph but not in another. I'm unsure how to identify where this happens. My git diff above confirms that the node is not present in the graph_def; however, when we pass this to c_api.TF_GraphImportGraphDefWithResults in tensorflow/python/framework/importer.py, some part of that process still looks for the node.

I've not been able to query these SWIG objects to figure out which of them contains the reference to the node.
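The importer step can also be reproduced in isolation; a small sketch, again assuming the `model_file` path from earlier:

```python
import tensorflow as tf

graph_def = tf.compat.v1.GraphDef()
with tf.io.gfile.GFile(model_file, 'rb') as f:
    graph_def.ParseFromString(f.read())

# This is the call the converter makes internally; it raises
# InvalidArgumentError ("Unknown input node 'Add_2'") if any node
# references an input that is missing from the GraphDef.
with tf.Graph().as_default():
    tf.compat.v1.import_graph_def(graph_def, name='')
```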

jvishnuvardhan (Contributor) commented:

@Wheest Can you please inspect your graph with netron and see which node is missing? When I checked, the .pb of resnet34 looks simple and all nodes are connected, whereas the .pb of resnet34-alt is complex and shows some missing connections (I am not sure, maybe that was intentional). Thanks!
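For reference, netron can be driven from Python as well as the command line; a sketch with a hypothetical file name:

```python
# pip install netron
import netron

# Opens the graph in a browser tab; 'resnet34-alt.pb' is a placeholder
# for the actual frozen-graph path.
netron.start('resnet34-alt.pb')
```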

Wheest (Author) commented May 22, 2020

I've examined the model in netron, and it does look strange. The alt model carried additional output tensors that were used in the training process; normally these are not used for inference.

It seems that keeping these output tensors interfered with the export process: removing them manually allowed the export to work. I'll see if I can put together a minimal working example to reproduce this issue.
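A sketch of one way to do that pruning programmatically, assuming `graph_def` and `output_name` from the earlier snippets; extract_sub_graph keeps only the nodes needed to compute the listed outputs:

```python
import tensorflow as tf

# Drop training-only outputs: keep just the subgraph that feeds the
# inference output (output_name is an assumption from earlier snippets).
pruned = tf.compat.v1.graph_util.extract_sub_graph(graph_def, [output_name])

# 'resnet34-alt-pruned.pb' is a placeholder output path.
with tf.io.gfile.GFile('resnet34-alt-pruned.pb', 'wb') as f:
    f.write(pruned.SerializeToString())
```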

MeghnaNatraj (Member) commented:

Marking this as resolved due to inactivity. @Wheest Feel free to re-open this issue if it is still blocking you.
