
Using the function signatures loaded from a SavedModel requires the original trackable object be kept in scope. #46708

Open
psobot opened this issue Jan 26, 2021 · 5 comments
Labels: comp:keras, stat:awaiting tensorflower, TF 2.8, type:bug

Comments

psobot (Contributor) commented Jan 26, 2021

System information

  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow): Yes
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): macOS 10.15.7
  • TensorFlow installed from (source or binary): binary
  • TensorFlow version: v2.4.0-rc4-71-g582c8d236cb 2.4.0
  • Python version: 3.8.5

Describe the current behavior

Loading a SavedModel from disk and extracting a signature from it works only while the result of tf.saved_model.load has not been collected by the Python garbage collector. For example, the following code fails:

def load_model(path_to_saved_model):
    saved_model = tf.saved_model.load(path_to_saved_model)
    return saved_model.signatures["serving_default"]

load_model('./my_model.ckpt')(...) # throws "Error while reading resource variable..."

but this code snippet runs successfully:

saved_model = tf.saved_model.load('./my_model.ckpt')
model = saved_model.signatures["serving_default"]
model(...) # No exception is thrown
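
In CPython, the SavedModel object in the failing example is freed by reference counting as soon as load_model returns, so no reference to it survives the call. To rule out collector timing entirely, an explicit gc.collect() makes the collection deterministic; a minimal sketch, assuming a model saved at the hypothetical path ./my_model.ckpt:

import gc

import tensorflow as tf

fn = tf.saved_model.load('./my_model.ckpt').signatures["serving_default"]
# Only `fn` is still referenced; the loaded SavedModel object is unreachable.
gc.collect()  # force collection, ruling out interpreter timing
fn(...)  # throws: the captured resource variables have been destroyed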

The issue can be worked around by manually attaching the original trackable object to the return value of the function, preventing the Python garbage collector from collecting the object:

def load_model_with_backref(path_to_saved_model):
    saved_model = tf.saved_model.load(path_to_saved_model)
    model = saved_model.signatures["serving_default"]
    model._backref_to_saved_model = saved_model
    return model

load_model_with_backref('./my_model.ckpt')(...) # No exception is thrown
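
An equivalent workaround (a hypothetical sketch of the same idea, not from the original report) is to wrap the SavedModel and its signature in a small class, so the trackable object stays alive exactly as long as the callable does:

class LoadedSignature:
    def __init__(self, path_to_saved_model):
        # Holding the SavedModel on self keeps its variables alive
        # for as long as this wrapper is referenced.
        self._saved_model = tf.saved_model.load(path_to_saved_model)
        self._fn = self._saved_model.signatures["serving_default"]

    def __call__(self, *args, **kwargs):
        return self._fn(*args, **kwargs)

model = LoadedSignature('./my_model.ckpt')
model(...) # No exception is thrown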

Depending on the model and environment, the exception thrown is sometimes:

AssertionError: Called a function referencing variables which have been deleted. This likely means that function-local variables were created and not referenced elsewhere in the program. This is generally a mistake; consider storing variables in an object attribute on first call.

The exception can also manifest as:

FailedPreconditionError: Error while reading resource variable _AnonymousVar30 from Container: localhost. This could mean that the variable was uninitialized. Not found: Resource localhost/_AnonymousVar30/N10tensorflow3VarE does not exist.
[[{{node StatefulPartitionedCall/model/layer/batch_normalization_3/FusedBatchNormV3/ReadVariableOp_1}}]] [Op:__inference_signature_wrapper_11671]

This may be a similar issue to #37615.

Standalone code to reproduce the issue

import os

# Disable ultra-verbose TF logging
os.environ["TF_CPP_MIN_LOG_LEVEL"] = "4"  # noqa
import tensorflow as tf


def load_model(path_to_saved_model):
    saved_model = tf.saved_model.load(path_to_saved_model)
    return saved_model.signatures["serving_default"]


def load_model_with_backref(path_to_saved_model):
    saved_model = tf.saved_model.load(path_to_saved_model)
    model = saved_model.signatures["serving_default"]
    model._backref_to_saved_model = saved_model
    return model


def use_model(model):
    # Skip the captured resource-variable inputs; only the real data
    # inputs of the signature need to be fed.
    input_tensors = [
        tensor for tensor in model.inputs if tensor.dtype != tf.resource
    ]
    # Feed random data, substituting 1 for any unknown (None) dimension.
    return model(
        *[
            tf.random.uniform(shape=[dim or 1 for dim in _input.shape])
            for _input in input_tensors
        ]
    )


if __name__ == "__main__":
    import sys

    model_path = sys.argv[-1]
    try:
        use_model(load_model(model_path))
        print(
            f"✅ TensorFlow {tf.__version__} loaded model {model_path} without "
            "needing to set a back-reference to the SavedModel on the "
            "ConcreteFunction."
        )
    except Exception as e:
        print(
            f"❌ TensorFlow {tf.__version__} failed to load model without"
            f" setting back-reference {model_path}:\n{e}"
        )

    try:
        use_model(load_model_with_backref(model_path))
        print(
            f"✅ TensorFlow {tf.__version__} loaded model {model_path} by"
            " setting a back-reference to the SavedModel on the"
            " ConcreteFunction."
        )
    except Exception as e:
        print(
            f"❌ TensorFlow {tf.__version__} failed to load model when setting"
            f" back-reference {model_path}:\n{e}"
        )
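
The script takes the path to a SavedModel as its last command-line argument; a hypothetical invocation, assuming the script is saved as repro.py and the model lives at ../model.ckpt:

python repro.py ../model.ckpt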

When run on TensorFlow 2.4.0, the above script prints:

❌ TensorFlow 2.4.0 failed to load model without setting back-reference ../model.ckpt:
 Error while reading resource variable _AnonymousVar46 from Container: localhost. This could mean that the variable was uninitialized. Not found: Resource localhost/_AnonymousVar46/N10tensorflow3VarE does not exist.
	 [[{{node StatefulPartitionedCall/model/layer_name_here/batch_normalization_12/ReadVariableOp_1}}]] [Op:__inference_signature_wrapper_11671]

Function call stack:
signature_wrapper

✅ TensorFlow 2.4.0 loaded model ../model.ckpt by setting a back-reference to the SavedModel on the ConcreteFunction.
psobot added the type:bug label on Jan 26, 2021
Saduf2019 added the TF 2.4 label on Jan 27, 2021
Saduf2019 (Contributor) commented:

I am able to replicate the reported issue; please find the gist here.

Saduf2019 added the comp:keras label on Jan 28, 2021
nikitamaia added the stat:awaiting tensorflower label on Feb 1, 2021
brychcy (Contributor) commented Jul 27, 2021

Interestingly, I'm observing this only since TensorFlow 2.5.0 (and not in 2.4.0).
(But maybe this is related to the fact that my TF 2.4.0 installation uses Python 3.7.6 while TF 2.5.0 uses Python 3.8.10?)

sushreebarsa (Contributor) commented:

@psobot Could you please have a look at this gist and let us know if it helps? Thanks!

sushreebarsa added the stat:awaiting response label and removed the stat:awaiting tensorflower label on Mar 7, 2022
sushreebarsa self-assigned this on Mar 7, 2022
psobot (Contributor, Author) commented Mar 7, 2022

Hi @sushreebarsa - it looks like in that gist, loading a SavedModel now works as expected in TensorFlow 2.8.0. Any idea what changed to fix this issue?

sushreebarsa removed the stat:awaiting response label on Mar 7, 2022
sushreebarsa (Contributor) commented:

@psobot I have provided the SavedModel path in the updated gist, and the code works fine for the MNIST model in TF v2.8.0. Thanks!

sushreebarsa added the TF 2.8 and stat:awaiting tensorflower labels and removed the TF 2.4 label on Mar 7, 2022
sushreebarsa removed their assignment on Mar 8, 2022