
Missing intermediate input node in TF Lite convert #39276

Closed · Wheest opened this issue on May 7, 2020 · 11 comments
Labels: comp:lite (TF Lite related issues) · stat:awaiting tensorflower (Status - Awaiting response from tensorflower) · TF 2.2 (Issues related to TF 2.2) · TFLiteConverter (For issues related to TFLite converter) · type:support (Support issues)

Comments


Wheest commented May 7, 2020

System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Debian 9.12 stretch
  • TensorFlow installed from (source or binary): Binary
  • TensorFlow version (or github SHA if from source): 2.1.0

Command used to run the converter or code if you’re using the Python API

```python
converter = tf.compat.v1.lite.TFLiteConverter.from_frozen_graph(
    model_file,
    input_arrays=[input_name],
    output_arrays=[output_name],
)
```

The output from the converter invocation

```
---------------------------------------------------------------------------
InvalidArgumentError                      Traceback (most recent call last)
~/.virtualenvs/tf-lite/lib/python3.7/site-packages/tensorflow_core/python/framework/importer.py in _import_graph_def_internal(graph_def, input_map, return_elements, validate_colocation_constraints, name, producer_op_list)
    496       try:
--> 497         results = c_api.TF_GraphImportGraphDefWithResults(
    498             graph._c_graph, serialized, options)  # pylint: disable=protected-access

InvalidArgumentError: Node 'batchnorm_13/mul_1': Unknown input node 'Add_2'
```

Also, please include a link to the saved model or GraphDef

Saved Model GDRIVE link

Failure details

In the graph, there are batch normalisation operations that cannot be removed, since they follow an Add operation. This part of the graph looks like:


```
A --
    --> Add --> BatchNorm --> ...
B --                ^
                    |
                 BN params
```

This failure might suggest that BatchNorm is not supported.

However, a very similar model I'm using features Add layers followed by BatchNorm, and it exports successfully.

I'm trying to figure out the source of this issue. Is there anything I should be looking at that might help me pin down the cause?

Wheest added the TFLiteConverter label on May 7, 2020
amahendrakar (Contributor) commented:

@Wheest, in order to expedite the troubleshooting process, could you please provide the complete code needed to reproduce the issue reported here? Thanks!

amahendrakar added the comp:lite, stat:awaiting response, and TF 2.1 labels on May 8, 2020
Wheest (Author) commented May 8, 2020

Hi @amahendrakar, thanks for responding.

Here is a Jupyter notebook gist that converts first the original model (resnet34, successfully) and then the altered one (resnet34-alt, unsuccessfully).

Both saved_model files are available at this GDrive link:
https://drive.google.com/drive/folders/19Q8YGi6RZd6BpadcwqS7eiRuvjqsQnye?usp=sharing

tensorflowbutler removed the stat:awaiting response label on May 10, 2020
Wheest (Author) commented May 12, 2020

Digging further, I have tried to find the point at which the Add_2 node is lost in resnet34-alt.

To that end, I dumped the list of nodes in the graph def during export and checked whether the node is present.

In resnet34, Add_2 is present; in resnet34-alt it is not.

If the node is not in the graph def, why is the exporter trying to find it? And given that resnet34 and resnet34-alt are very similar architectures, we would expect almost all of their nodes to be the same.

For completeness, here is the print debugging I added to check the nodes at this point.

```diff
diff --git a/tensorflow/lite/python/lite.py b/tensorflow/lite/python/lite.py
index 7241024..2547271 100644
--- a/tensorflow/lite/python/lite.py
+++ b/tensorflow/lite/python/lite.py
@@ -713,6 +713,9 @@ class TFLiteConverter(TFLiteConverterBase):
         # Handles models with custom TFLite ops that cannot be resolved in
         # TensorFlow.
         load_model_in_session = True
+        nodes = [n.name for n in graph_def.node]
+        print('nodes:', nodes)
+        print('Add_2 in nodes?', 'Add_2' in nodes)
         try:
           _import_graph_def(graph_def, name="")
         except _NotFoundError:
```
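As a standalone cross-check, the same inspection can be run without patching lite.py. A minimal sketch, assuming `model_file` is the frozen-graph path from the converter call above; the dangling-input scan looks for exactly the kind of reference the importer complains about:

```python
import tensorflow as tf

# Load the frozen graph into a GraphDef (model_file is an assumption here).
graph_def = tf.compat.v1.GraphDef()
with tf.io.gfile.GFile(model_file, 'rb') as f:
    graph_def.ParseFromString(f.read())

nodes = {n.name for n in graph_def.node}
print('Add_2 in nodes?', 'Add_2' in nodes)

# An input name that resolves to no node is a dangling reference; strip the
# control-dependency prefix '^' and any ':output_index' suffix first.
dangling = sorted({
    inp.lstrip('^').split(':')[0]
    for n in graph_def.node for inp in n.input
    if inp.lstrip('^').split(':')[0] not in nodes
})
print('dangling inputs:', dangling)
```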

amahendrakar (Contributor) commented:

I was able to reproduce the issue with TF v2.2 and TF-nightly. Please find the attached gist. Thanks!

amahendrakar added the TF 2.2 and type:support labels and removed the TF 2.1 label on May 14, 2020
jvishnuvardhan (Contributor) commented May 15, 2020

@Wheest BatchNorm is supported, but a TFLite model is used only for inference, so we need to pass training=False so that training-only ops are not included. Can you please check whether you have BatchNorm in resnet34? Here is a gist for our reference. Thanks!
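A minimal sketch of that suggestion, using ResNet50 as a stand-in (tf.keras.applications has no built-in resnet34) and the TF2 Keras converter rather than the frozen-graph path above:

```python
import tensorflow as tf

# Stand-in model; resnet34 is not available in tf.keras.applications.
base = tf.keras.applications.ResNet50(weights=None)

inputs = tf.keras.Input(shape=(224, 224, 3))
# training=False makes BatchNorm use its moving statistics, so only
# inference-mode ops end up in the exported graph.
outputs = base(inputs, training=False)
inference_model = tf.keras.Model(inputs, outputs)

converter = tf.lite.TFLiteConverter.from_keras_model(inference_model)
tflite_model = converter.convert()
```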

jvishnuvardhan added the stat:awaiting response label on May 15, 2020
Wheest (Author) commented May 15, 2020

@jvishnuvardhan thanks for looking at this. The batch norm layer which fails takes the missing Add_2 tensor as its input.

The Add_2 tensor takes a few convolutional layer outputs as input, so the batch norm parameters can't be folded into the convolution parameters for inference, since the normalisation is applied to the output of the Add layer.

So even in inference mode, I believe this batch norm layer is needed, and from my check of the working resnet34 model this seems to be the case there too. Is BatchNorm supported in TF-Lite?
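For context, standard conv+BN folding rewrites the convolution weights as W' = W * gamma / sigma and the bias as b' = gamma * (b - mu) / sigma + beta, which presupposes a single preceding convolution. A toy sketch with made-up parameters illustrates why a BatchNorm fed by an Add cannot be folded:

```python
import numpy as np

# Made-up BN parameters (sigma = sqrt(var + eps)) and conv weights,
# purely illustrative; real parameters are per-channel vectors.
gamma, beta, mu, sigma = 1.5, 0.1, 0.2, 0.9
W = np.random.randn(3, 3, 8, 16)  # HWIO conv kernel
b = np.zeros(16)

# Folding: BN(conv(x, W) + b) == conv(x, W_folded) + b_folded
W_folded = W * (gamma / sigma)
b_folded = gamma * (b - mu) / sigma + beta

# When the BN input is Add(conv_a, conv_b), there is no single (W, b)
# pair to fold into, so the BatchNorm op must survive in the graph.
```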

jvishnuvardhan added the stat:awaiting tensorflower label on May 15, 2020
jvishnuvardhan (Contributor) commented:

@Wheest BatchNormalization is supported. Please check the gist shown here in another TFLite issue. Thanks!

tensorflowbutler removed the stat:awaiting response label on May 18, 2020
Wheest (Author) commented May 18, 2020

@jvishnuvardhan so batch normalisation being removed can be ruled out as the cause of the issue, given that TF-Lite supports BatchNormalization?

In that case, it seems that the node Add_2 is lost in one representation of the graph but not in another. I'm unsure how to identify where this happens. My git diff above confirms that the node is not present in the graph_def; however, when we pass this to c_api.TF_GraphImportGraphDefWithResults in tensorflow/python/framework/importer.py, some part of that process still looks for the node.

I've not been able to query these SWIG objects to figure out which of them contains the reference to the node.
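The importer step can also be reproduced in isolation; a small sketch, again assuming the `model_file` path from earlier:

```python
import tensorflow as tf

graph_def = tf.compat.v1.GraphDef()
with tf.io.gfile.GFile(model_file, 'rb') as f:
    graph_def.ParseFromString(f.read())

# This is the call the converter makes internally; it raises
# InvalidArgumentError ("Unknown input node 'Add_2'") if any node
# references an input that is missing from the GraphDef.
with tf.Graph().as_default():
    tf.compat.v1.import_graph_def(graph_def, name='')
```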

jvishnuvardhan (Contributor) commented:

@Wheest Can you please inspect your graph with netron and see which node is missing? When I checked, the .pb of resnet34 looks simple and all nodes are connected, whereas the .pb of resnet34-alt is complex and shows some missing connections (I am not sure, maybe that was intentional). Thanks!
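For reference, netron can be driven from Python as well as the command line; a sketch with a hypothetical file name:

```python
# pip install netron
import netron

# Opens the graph in a browser tab; 'resnet34-alt.pb' is a placeholder
# for the actual frozen-graph path.
netron.start('resnet34-alt.pb')
```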

Wheest (Author) commented May 22, 2020

I've examined the model in netron, and it does look strange. The alt model carried additional output tensors that were used in the training process; normally these are not used for inference.

It seems that keeping these output tensors interfered with the export process: removing them manually allowed the export to work. I'll see if I can put together a minimal working example to reproduce this issue.
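A sketch of one way to do that pruning programmatically, assuming `graph_def` and `output_name` from the earlier snippets; extract_sub_graph keeps only the nodes needed to compute the listed outputs:

```python
import tensorflow as tf

# Drop training-only outputs: keep just the subgraph that feeds the
# inference output (output_name is an assumption from earlier snippets).
pruned = tf.compat.v1.graph_util.extract_sub_graph(graph_def, [output_name])

# 'resnet34-alt-pruned.pb' is a placeholder output path.
with tf.io.gfile.GFile('resnet34-alt-pruned.pb', 'wb') as f:
    f.write(pruned.SerializeToString())
```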

MeghnaNatraj (Member) commented:

Marking this as resolved due to inactivity. @Wheest Feel free to re-open this issue if it is still blocking you.
