Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Start serving failed with uninitialized value since path variable and lookup table can't initialize in order #1437

Closed
BenZstory opened this issue Sep 8, 2019 · 6 comments

Comments

@BenZstory
Copy link

@BenZstory BenZstory commented Sep 8, 2019

Bug Report

Env information

  • Linux Ubuntu 16.10
  • python 3.5
  • tensorflow 1.14.0
  • serving with docker, tensorflow/serving:latest d4b221a3f345

Describe the problem

When using lookup table(index_to_string_table_from_file) combined with assets(path assigned with a variable), since the serving initializer param main_op won't accept sequential initializers, the tables_initializer may run before path assigner when start serving, and failed with uninitialized value.

Log like this when start serving:

E tensorflow_serving/util/retrier.cc:37] Loading servable: {name: test_index_model version: 1567928718} failed: Failed precondition: Attempting to use uninitialized value index_asset_path
         [[{{node index_asset_path/read}}]]

How to reproduce

Suppose we have a "test_index.assets" file with the following content:

alpha
bravo
charlie

In jupyter env, build graph:

graph = tf.Graph()
with graph.as_default():
    original_index_asset_path = tf.constant(os.path.join('test_index.assets'))
    index_asset_path = tf.Variable(original_index_asset_path, name='index_asset_path', trainable=False, collections=[])
    assign_index_asset_path = index_asset_path.assign(original_index_asset_path)
    
    ph_inputs = tf.placeholder(tf.int32, shape=[None])
    
    table = tf.contrib.lookup.index_to_string_table_from_file(vocabulary_file=index_asset_path, default_value="UNKNOWN")
    results = table.lookup(tf.cast(ph_inputs, tf.int64))
    
    tf.add_to_collection(tf.GraphKeys.ASSET_FILEPATHS, original_index_asset_path)

Start session, run initilizers in order and test results:

sess = tf.InteractiveSession(graph=graph)

init_op = tf.group(tf.global_variables_initializer(), assign_index_asset_path, tf.tables_initializer())

sess.run(tf.global_variables_initializer())
sess.run(assign_index_asset_path)
sess.run(tf.tables_initializer())
# sess.run(init_op)

test_results = sess.run(results, feed_dict={ph_inputs:[1, 2]})
print(test_results)

This should print [b'bravo' b'charlie']. If tried initilizing with grouped initilizer init_op, it would throw exception.

Export SavedModel:

builder = tf.saved_model.builder.SavedModelBuilder(os.path.join('test_index_model', str(int(time.time()))))

model_signature = tf.saved_model.signature_def_utils.build_signature_def(
    inputs={
        "index": tf.saved_model.utils.build_tensor_info(ph_inputs),
    },
    outputs={
        "results": tf.saved_model.utils.build_tensor_info(results),
    },
    method_name=tf.saved_model.signature_constants.PREDICT_METHOD_NAME)

builder.add_meta_graph_and_variables(
    sess, [tf.saved_model.tag_constants.SERVING],
    signature_def_map={
        tf.saved_model.signature_constants.DEFAULT_SERVING_SIGNATURE_DEF_KEY: model_signature,
    },
    assets_collection=tf.get_collection(tf.GraphKeys.ASSET_FILEPATHS),
    main_op=init_op
)
builder.save()

Then in terminal, start serving the model with docker:

docker run -p 8501:8501 -p 8500:8500 --mount type=bind,source=/path_to_test_dir/test_index_model,target=/models/test_index_model -e MODEL_NAME=test_index_model -t tensorflow/serving

Serving would probably fail with log as follows:

E tensorflow_serving/util/retrier.cc:37] Loading servable: {name: test_index_model version: 1567928718} failed: Failed precondition: Attempting to use uninitialized value index_asset_path
         [[{{node index_asset_path/read}}]]

I also tried using tf.control_dependencies, but that didn't work. I'm now serving my models with luck (sometimes the initializers just run in expect).

Is there any workaround for this? Should the main_op support initilizers in order?Or maybe the table can initialize in a better way with asset path.

@rmothukuru

This comment has been minimized.

Copy link

@rmothukuru rmothukuru commented Sep 11, 2019

@BenZstory ,
I have reproduced your issue. I have uncommented the line, sess.run(init_op) and I didn't get any exception. Can you please provide more information about that exception.

Yes, I get the Failed precondition error mentioned by you. But if we wait for 30-60 seconds, Server will retry and it will load the Servable.

Log for the same is shown below. Can you please wait for some time and confirm if Servable is loaded properly. Thanks!

2019-09-11 17:10:25.085200: E tensorflow_serving/util/retrier.cc:37] Loading servable: {name: test_index_model version: 1568221700} failed: Failed precondition: Attempting to use uninitialized value index_asset_path
	 [[{{node index_asset_path/read}}]]
2019-09-11 17:11:25.085415: I tensorflow_serving/util/retrier.cc:33] Retrying of Loading servable: {name: test_index_model version: 1568221700} retry: 1
2019-09-11 17:11:25.086018: I external/org_tensorflow/tensorflow/contrib/session_bundle/bundle_shim.cc:363] Attempting to load native SavedModelBundle in bundle-shim from: /models/test_index_model/1568221700
2019-09-11 17:11:25.086126: I external/org_tensorflow/tensorflow/cc/saved_model/reader.cc:31] Reading SavedModel from: /models/test_index_model/1568221700
2019-09-11 17:11:25.087095: I external/org_tensorflow/tensorflow/cc/saved_model/reader.cc:54] Reading meta graph with tags { serve }
2019-09-11 17:11:25.088402: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:202] Restoring SavedModel bundle.
2019-09-11 17:11:25.088504: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:212] The specified SavedModel has no variables; no checkpoints were restored. File does not exist: /models/test_index_model/1568221700/variables/variables.index
2019-09-11 17:11:25.088572: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:151] Running initialization op on SavedModel bundle at path: /models/test_index_model/1568221700
2019-09-11 17:11:25.096868: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:311] SavedModel load for tags { serve }; Status: success. Took 10750 microseconds.
2019-09-11 17:11:25.097012: I tensorflow_serving/servables/tensorflow/saved_model_warmup.cc:103] No warmup data file found at /models/test_index_model/1568221700/assets.extra/tf_serving_warmup_requests
2019-09-11 17:11:25.097694: I tensorflow_serving/core/loader_harness.cc:86] Successfully loaded servable version {name: test_index_model version: 1568221700}
2019-09-11 17:11:25.103558: I tensorflow_serving/model_servers/server.cc:324] Running gRPC ModelServer at 0.0.0.0:8500 ...
[warn] getaddrinfo: address family for nodename not supported
[evhttp_server.cc : 239] RAW: Entering the event loop ...
2019-09-11 17:11:25.108056: I tensorflow_serving/model_servers/server.cc:344] Exporting HTTP/REST API at:localhost:8501 ...

(PY_36) mothukuru@mothukuru-glaptop:~$ 
@BenZstory

This comment has been minimized.

Copy link
Author

@BenZstory BenZstory commented Sep 12, 2019

@rmothukuru
As for the exception, not just uncomment the line but also comment the other three lines of initilization. In this way, it simulated how serving actually runs initializers. If you do that, a similar FailedPreconditionError would probably be thrown, just like the one when start serving.


I tried waiting retrying, though it may success,but that was just because of the luck (these initilizers incidentally run in an order we hoped). This log shows it served successfully on the 2nd retry:

2019-09-12 08:04:53.218936: E tensorflow_serving/util/retrier.cc:37] Loading servable: {name: test_index_model version: 1568274556} failed: Failed precondition: Attempting to use uninitialized value index_asset_path
         [[{{node index_asset_path/read}}]]
2019-09-12 08:05:53.219117: I tensorflow_serving/util/retrier.cc:33] Retrying of Loading servable: {name: test_index_model version: 1568274556} retry: 1
2019-09-12 08:05:53.223480: E tensorflow_serving/util/retrier.cc:37] Loading servable: {name: test_index_model version: 1568274556} failed: Failed precondition: Attempting to use uninitialized value index_asset_path
         [[{{node index_asset_path/read}}]]
2019-09-12 08:06:53.223637: I tensorflow_serving/util/retrier.cc:33] Retrying of Loading servable: {name: test_index_model version: 1568274556} retry: 2
2019-09-12 08:06:53.229647: I tensorflow_serving/core/loader_harness.cc:86] Successfully loaded servable version {name: test_index_model version: 1568274556}
2019-09-12 08:06:53.251497: I tensorflow_serving/model_servers/server.cc:324] Running gRPC ModelServer at 0.0.0.0:8500 ...
[warn] getaddrinfo: address family for nodename not supported
[evhttp_server.cc : 239] RAW: Entering the event loop ...

It may success on the 2nd retry, or fail for all the 5 retries (max retry setting).

Starting multiple models at the same time could give a better understanding of the uncertainty of initialization.

Suppose we duplicate the SavedModel in test_index_model for several times, rename the versions to "1", "2", "3" ..., and prepare a serving config file multi_serving.config as follow:

model_config_list {
  config: {
    name: 'test_index_model'
    base_path: '/models/test_index_model/'
	model_platform: "tensorflow"
	model_version_policy { 
	  specific {
	    versions: 1
	    versions: 2
	    versions: 3
	    versions: 4
	    versions: 5
	    versions: 6
	  }
	}
  }
}

Put this multi_serving.config inside test_index_model dir, then we start serving these models at the same time with cmd:

sudo docker run -p 8520:8500 -p 8521:8501 \
--mount type=bind,source=/path_to_serving_dir/test_index_model,target=/models/test_index_model \
-t tensorflow/serving --model_config_file=/models/test_index_model/multi_serving.config

Some of the models may success, but some will fail, and the failed ones have to retry for the next time untill all the models loaded. This could take a long time.


The retry can't guarantee a successful serving. We need a way to initialize path assigner and lookup table in the given order.

@rmothukuru

This comment has been minimized.

Copy link

@rmothukuru rmothukuru commented Sep 19, 2019

@BenZstory ,
Can you please provide more details on how did you use tf.control_dependencies and what error did you get (if it is not same as the one mentioned above). Thanks!

@rmothukuru rmothukuru assigned christisg and unassigned rmothukuru Sep 19, 2019
@BenZstory

This comment has been minimized.

Copy link
Author

@BenZstory BenZstory commented Sep 29, 2019

Sorry for the late response.

The errors are the same when using tf.control_dependencies. It seems that tf.control_dependencies have no affect on those initializers.

Since I'm not familiar with tf.control_dependencies, the codes may not be totally right.

  • Tried pointing out table's dependencies while building the graph:
graph = tf.Graph()
with graph.as_default():
    original_index_asset_path = tf.constant(os.path.join('test_index.assets'))
    index_asset_path = tf.Variable(original_index_asset_path, name='index_asset_path', trainable=False, collections=[])
    assign_index_asset_path = index_asset_path.assign(original_index_asset_path)
    
    ph_inputs = tf.placeholder(tf.int32, shape=[None])
    
    with tf.control_dependencies([index_asset_path, assign_index_asset_path]):
        table = tf.contrib.lookup.index_to_string_table_from_file(vocabulary_file=index_asset_path, default_value="UNKNOWN")
        results = table.lookup(tf.cast(ph_inputs, tf.int64))
    
    tf.add_to_collection(tf.GraphKeys.ASSET_FILEPATHS, original_index_asset_path)
  • Tried stack init ops with dependencies before running them:
sess = tf.InteractiveSession(graph=graph)

init_var = tf.group(tf.global_variables_initializer())
with tf.control_dependencies([init_var]):
    init_assign = tf.group(assign_index_asset_path)
    with tf.control_dependencies([init_assign]):
        init_table = tf.group(tf.tables_initializer())

init_op = tf.group(init_var, init_assign, init_table)

sess.run(init_op)

test_results = sess.run(results, feed_dict={ph_inputs:[1, 2]})

These attempts failed. Actually, the error will just be thrown immediately after running init_op.

@saxenasaurabh

This comment has been minimized.

Copy link
Member

@saxenasaurabh saxenasaurabh commented Oct 4, 2019

Could you try changing your code to the following

with tf.control_dependencies([assign_index_asset_path]):
  vocabulary_file = index_asset_path.read_value()
table = tf.contrib.lookup.index_to_string_table_from_file(vocabulary_file=vocabulary_file, default_value="UNKNOWN")

Then you can just use the following initializer:

init_op = tf.group(tf.global_variables_initializer(), tf.tables_initializer())

The problem was that when we create a variable we create a default "read" op that has no control dependency. When you try to run the table's initializer TensorFlow's executor, which executes greedily, tries to run that "read" op first and fails because the Assign op has not been run yet. In my suggestion above you explicitly call read_value which create a new "read" and ensure it runs after the Assign so things should work.

@BenZstory

This comment has been minimized.

Copy link
Author

@BenZstory BenZstory commented Oct 8, 2019

I tried the read_value solution, it firstly gave the same error, which means tf.control_dependencies is still not working on the variable and assigner.

Then I did some search, find out that the ResourceVariable can do control dependency as expected.

So I replace the index_asset_path to code like this:

from tensorflow.python.ops import resource_variable_ops as rr

index_asset_path = rr.ResourceVariable(original_index_asset_path, name='index_asset_path', trainable=False, collections=[])
(or pass `use_resource=True` to the Variable)

Also with control dependency and read_value mentioned above, it finally work and serving runs good.

Hope things would be better in tf 2.0 since this post said that ResourceVariable would be the default one.


In conclusion, we have to use ResourceVariable and explicitly called read_value to build dependencies, then serving can accept one init_op to do initializations in order.

I think the problem is solved for now, and we're good to close this issue.

@BenZstory BenZstory closed this Oct 8, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
5 participants
You can’t perform that action at this time.