
Using the function signatures loaded from a SavedModel requires the original trackable object be kept in scope. #46708

Open
psobot opened this issue Jan 26, 2021 · 5 comments
Labels: comp:keras, stat:awaiting tensorflower, TF 2.8, type:bug

Comments

psobot (Contributor) commented Jan 26, 2021

System information

  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow): Yes
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): macOS 10.15.7
  • TensorFlow installed from (source or binary): binary
  • TensorFlow version: v2.4.0-rc4-71-g582c8d236cb 2.4.0
  • Python version: 3.8.5

Describe the current behavior

Loading a SavedModel from disk and extracting a signature from it works only while the result of tf.saved_model.load has not been collected by the Python garbage collector. For example, the following code fails:

def load_model(path_to_saved_model):
    saved_model = tf.saved_model.load(path_to_saved_model)
    return saved_model.signatures["serving_default"]

load_model('./my_model.ckpt')(...) # throws "Error while reading resource variable..."

but this code snippet runs successfully:

saved_model = tf.saved_model.load('./my_model.ckpt')
model = saved_model.signatures["serving_default"]
model(...) # No exception is thrown
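
In CPython, the SavedModel object in the failing example is freed by reference counting as soon as load_model returns, so no reference to it survives the call. To rule out collector timing entirely, an explicit gc.collect() makes the collection deterministic; a minimal sketch, assuming a model saved at the hypothetical path ./my_model.ckpt:

import gc

import tensorflow as tf

fn = tf.saved_model.load('./my_model.ckpt').signatures["serving_default"]
# Only `fn` is still referenced; the loaded SavedModel object is unreachable.
gc.collect()  # force collection, ruling out interpreter timing
fn(...)  # throws: the captured resource variables have been destroyed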

The issue can be worked around by manually attaching the original trackable object to the return value of the function, preventing the Python garbage collector from collecting the object:

def load_model_with_backref(path_to_saved_model):
    saved_model = tf.saved_model.load(path_to_saved_model)
    model = saved_model.signatures["serving_default"]
    model._backref_to_saved_model = saved_model
    return model

load_model_with_backref('./my_model.ckpt')(...) # No exception is thrown
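
An equivalent workaround (a hypothetical sketch of the same idea, not from the original report) is to wrap the SavedModel and its signature in a small class, so the trackable object stays alive exactly as long as the callable does:

class LoadedSignature:
    def __init__(self, path_to_saved_model):
        # Holding the SavedModel on self keeps its variables alive
        # for as long as this wrapper is referenced.
        self._saved_model = tf.saved_model.load(path_to_saved_model)
        self._fn = self._saved_model.signatures["serving_default"]

    def __call__(self, *args, **kwargs):
        return self._fn(*args, **kwargs)

model = LoadedSignature('./my_model.ckpt')
model(...) # No exception is thrown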

Depending on the model and environment, the exception thrown is sometimes:

AssertionError: Called a function referencing variables which have been deleted. This likely means that function-local variables were created and not referenced elsewhere in the program. This is generally a mistake; consider storing variables in an object attribute on first call.

The exception can also manifest as:

FailedPreconditionError: Error while reading resource variable _AnonymousVar30 from Container: localhost. This could mean that the variable was uninitialized. Not found: Resource localhost/_AnonymousVar30/N10tensorflow3VarE does not exist.
[[{{node StatefulPartitionedCall/model/layer/batch_normalization_3/FusedBatchNormV3/ReadVariableOp_1}}]] [Op:__inference_signature_wrapper_11671]

This may be a similar issue to #37615.

Standalone code to reproduce the issue

import os

# Disable ultra-verbose TF logging
os.environ["TF_CPP_MIN_LOG_LEVEL"] = "4"  # noqa
import tensorflow as tf


def load_model(path_to_saved_model):
    saved_model = tf.saved_model.load(path_to_saved_model)
    return saved_model.signatures["serving_default"]


def load_model_with_backref(path_to_saved_model):
    saved_model = tf.saved_model.load(path_to_saved_model)
    model = saved_model.signatures["serving_default"]
    model._backref_to_saved_model = saved_model
    return model


def use_model(model):
    # Skip the captured resource-variable inputs; only the real data
    # inputs of the signature need to be fed.
    input_tensors = [
        tensor for tensor in model.inputs if tensor.dtype != tf.resource
    ]
    # Feed random data, substituting 1 for any unknown (None) dimension.
    return model(
        *[
            tf.random.uniform(shape=[dim or 1 for dim in _input.shape])
            for _input in input_tensors
        ]
    )


if __name__ == "__main__":
    import sys

    model_path = sys.argv[-1]
    try:
        use_model(load_model(model_path))
        print(
            f"✅ TensorFlow {tf.__version__} loaded model {model_path} without "
            "needing to set a back-reference to the SavedModel on the "
            "ConcreteFunction."
        )
    except Exception as e:
        print(
            f"❌ TensorFlow {tf.__version__} failed to load model without"
            f" setting back-reference {model_path}:\n{e}"
        )

    try:
        use_model(load_model_with_backref(model_path))
        print(
            f"✅ TensorFlow {tf.__version__} loaded model {model_path} by"
            " setting a back-reference to the SavedModel on the"
            " ConcreteFunction."
        )
    except Exception as e:
        print(
            f"❌ TensorFlow {tf.__version__} failed to load model when setting"
            f" back-reference {model_path}:\n{e}"
        )
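
The script takes the path to a SavedModel as its last command-line argument; a hypothetical invocation, assuming the script is saved as repro.py and the model lives at ../model.ckpt:

python repro.py ../model.ckpt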

When run on TensorFlow 2.4.0, the above script prints:

❌ TensorFlow 2.4.0 failed to load model without setting back-reference ../model.ckpt:
 Error while reading resource variable _AnonymousVar46 from Container: localhost. This could mean that the variable was uninitialized. Not found: Resource localhost/_AnonymousVar46/N10tensorflow3VarE does not exist.
	 [[{{node StatefulPartitionedCall/model/layer_name_here/batch_normalization_12/ReadVariableOp_1}}]] [Op:__inference_signature_wrapper_11671]

Function call stack:
signature_wrapper

✅ TensorFlow 2.4.0 loaded model ../model.ckpt by setting a back-reference to the SavedModel on the ConcreteFunction.
psobot added the type:bug label on Jan 26, 2021
Saduf2019 added the TF 2.4 label on Jan 27, 2021
Saduf2019 (Contributor) commented:

I am able to replicate the reported issue; please find the gist here.

Saduf2019 added the comp:keras label on Jan 28, 2021
nikitamaia added the stat:awaiting tensorflower label on Feb 1, 2021
brychcy (Contributor) commented Jul 27, 2021

Interestingly, I'm observing this only since TensorFlow 2.5.0 (and not in 2.4.0).
(But maybe this is related to the fact that my TF 2.4.0 installation uses Python 3.7.6 while TF 2.5.0 uses Python 3.8.10?)

sushreebarsa (Contributor) commented:

@psobot Could you please have a look at this gist and let us know if it helps? Thanks!

sushreebarsa added the stat:awaiting response label and removed the stat:awaiting tensorflower label on Mar 7, 2022
sushreebarsa self-assigned this on Mar 7, 2022
psobot (Contributor, Author) commented Mar 7, 2022

Hi @sushreebarsa - it looks like in that gist, loading a SavedModel now works as expected in TensorFlow 2.8.0. Any idea what changed to fix this issue?

sushreebarsa removed the stat:awaiting response label on Mar 7, 2022
sushreebarsa (Contributor) commented:

@psobot I have provided the SavedModel path in the updated gist, and the code works fine for the MNIST model in TF v2.8.0. Thanks!

sushreebarsa added the TF 2.8 and stat:awaiting tensorflower labels and removed the TF 2.4 label on Mar 7, 2022
sushreebarsa removed their assignment on Mar 8, 2022