Fail to export model with exporter with distributed session #5439

Closed
jinhou opened this issue Nov 7, 2016 · 12 comments

@jinhou

jinhou commented Nov 7, 2016

Hi,

I ran into the following issue while trying to export the model with a distributed session.

Do you have any idea how to fix it? Thanks.

Exporting trained model to ./model/
Traceback (most recent call last):
File "dnn_train.py", line 399, in
tf.app.run()
File "/gruntdata/DL_dataset/anaconda2/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 32, in run
sys.exit(main(sys.argv[:1] + flags_passthrough))
File "dnn_train.py", line 396, in main
run_training(server.target, cluster_spec)
File "dnn_train.py", line 318, in run_training
default_graph_signature=signature)
File "/gruntdata/DL_dataset/anaconda2/lib/python2.7/site-packages/tensorflow/contrib/session_bundle/exporter.py", line 198, in init
ops.add_to_collection(constants.GRAPH_KEY, graph_any_buf)
File "/gruntdata/DL_dataset/anaconda2/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 4073, in add_to_collection
get_default_graph().add_to_collection(name, value)
File "/gruntdata/DL_dataset/anaconda2/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 2689, in add_to_collection
self._check_not_finalized()
File "/gruntdata/DL_dataset/anaconda2/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 2088, in _check_not_finalized
raise RuntimeError("Graph is finalized and cannot be modified.")
RuntimeError: Graph is finalized and cannot be modified.

@aselle
Contributor

aselle commented Nov 7, 2016

It looks like you finalized your model at some point and then tried to modify it afterwards. Please provide a simple test case, or else you will likely need to look into it on your own.

@jinhou jinhou added the stat:awaiting response Status - Awaiting response from author label Nov 7, 2016
@aselle aselle removed the stat:awaiting response Status - Awaiting response from author label Nov 7, 2016
@jinhou
Author

jinhou commented Nov 9, 2016

@aselle I used gRPC for distributed training, and it seems the graph gets finalized when the tf.train.Supervisor is created; see https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/training/supervisor.py for details.
When I then tried to export the model with the following code, I got the error above.
    # Export inference model.
    model_exporter = exporter.Exporter(saver)
    signature = exporter.generic_signature({"Reshape_2:0": i_indices_reshape,
                                            "Reshape_3:0": i_values,
                                            "Cast_1:0": i_shape,
                                            "Reshape_1:0": indices_shape,
                                            "softmax_linear/add:0": logits})
    model_exporter.init(init_op=init_op,
                        graph_def=sess.graph.as_graph_def(),
                        default_graph_signature=signature)
    model_exporter.export(FLAGS.export_dir, tf.constant(FLAGS.export_version), sess)
    print('Successfully exported model to %s' % FLAGS.export_dir)
I found a workaround that exports the model successfully. First, I saved checkpoint files during the (gRPC distributed) training. Then I restored the checkpoint in another, single-node session that does not use gRPC, and in that session I was able to export the model.
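A minimal sketch of this workaround (build_inference_graph, the paths, and the signature names are placeholders rather than the original code):

    import tensorflow as tf
    from tensorflow.contrib.session_bundle import exporter

    # Rebuild the same inference graph in a fresh, single-machine process.
    # build_inference_graph() stands in for the original model-construction code.
    inputs, logits = build_inference_graph()

    saver = tf.train.Saver()
    with tf.Session() as sess:
        # Restore the weights written by the distributed training job.
        saver.restore(sess, tf.train.latest_checkpoint("./checkpoints"))

        # The graph is not finalized here, so the exporter can add its collections.
        model_exporter = exporter.Exporter(saver)
        signature = exporter.generic_signature({"inputs": inputs,
                                                "outputs": logits})
        model_exporter.init(sess.graph.as_graph_def(),
                            default_graph_signature=signature)
        model_exporter.export("./model", tf.constant(1), sess)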

However, I am still wondering how to save the model directly from the distributed version of the code.

@aselle
Contributor

aselle commented Nov 10, 2016

@mrry, do you have any comments on what the proper way to do this is?

@aselle aselle added stat:awaiting tensorflower Status - Awaiting response from tensorflower type:support Support issues labels Nov 10, 2016
@mrry
Contributor

mrry commented Nov 10, 2016

I'm not sure what your code looks like, but I'd suggest that you construct the Exporter object before starting training (which I assume is using a Supervisor and therefore calling Graph.finalize()...) to avoid this error. You can construct it immediately after constructing a tf.train.Saver.
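A rough sketch of that ordering (a sketch only; build_model, FLAGS.*, and server.target stand in for the user's own model, flags, and cluster setup):

    import tensorflow as tf
    from tensorflow.contrib.session_bundle import exporter

    # Build the model first (build_model is a placeholder for the real code).
    inputs, logits = build_model()

    # Create the Saver and the Exporter while the graph is still mutable.
    # The Exporter constructor only stores the saver; the collection-modifying
    # work happens later, in init() and export() (see the traceback above).
    saver = tf.train.Saver()
    model_exporter = exporter.Exporter(saver)

    # Only now create the Supervisor, which finalizes the default graph.
    sv = tf.train.Supervisor(is_chief=(FLAGS.task_index == 0),
                             logdir=FLAGS.train_dir)
    with sv.prepare_or_wait_for_session(server.target) as sess:
        ...  # training loop; the init()/export() calls are discussed below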

@mrry mrry added stat:awaiting response Status - Awaiting response from author and removed stat:awaiting tensorflower Status - Awaiting response from tensorflower labels Nov 10, 2016
@jinhou
Author

jinhou commented Nov 11, 2016

@mrry Thanks for the suggestion. I tried your method: I can put model_exporter = exporter.Exporter(saver) before the Supervisor as you suggested, but I can't put the model_exporter.init call before the Supervisor, because it needs sess = sv.prepare_or_wait_for_session(target, config=sess_config) as one of its inputs. So I put that call after the Supervisor, and it still says I can't modify the graph at the model_exporter.init line. Also, I don't understand why this call modifies the graph at all.

Below are the details of my model export initialization:

  signature = exporter.generic_signature({"Reshape_2:0":i_indices_reshape,
                                          "Reshape_3:0":i_values,
                                          "Cast_1:0":i_shape,
                                          "Reshape_1:0":indices_shape,
                                          "softmax_linear/add:0":logits})
  model_exporter.init(sess.graph.as_graph_def(),
                      default_graph_signature=signature)

@aselle aselle removed the stat:awaiting response Status - Awaiting response from author label Nov 11, 2016
@mrry
Contributor

mrry commented Nov 11, 2016

Ah, you don't actually need sess in the call to model_exporter.init(). Instead you could do something like:

model_exporter.init(tf.get_default_graph().as_graph_def(), ...)

@mrry mrry added the stat:awaiting response Status - Awaiting response from author label Nov 11, 2016
@firewu

firewu commented Nov 13, 2016

I hit the same error (RuntimeError: Graph is finalized...) when calling model_exporter.export(export_path, tf.constant(FLAGS.export_version), sess), which takes sess as an argument, after creating the Supervisor.

@mrry
Contributor

mrry commented Nov 14, 2016

@firewu I suspect the tf.constant() is causing an error in this case. This tensor should be created before you define the supervisor.
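That is, create the version tensor while the graph is still mutable and pass it to export() later (a sketch, using the names from the snippet above):

    # During graph construction, before tf.train.Supervisor is created:
    export_version = tf.constant(FLAGS.export_version)

    # ...create the Supervisor, obtain the session, run training...

    # Inside the supervised session, after training (no new ops created here):
    model_exporter.export(export_path, export_version, sess)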

@firewu

firewu commented Nov 15, 2016

@mrry Thanks! That was it! I defined the tensor before the Supervisor, and it works. @jinhou Maybe the same solution will work for you. I should point out that my code follows inception_export.py, line 131: model_exporter.export(FLAGS.export_dir, tf.constant(global_step), sess).

@prb12
Member

prb12 commented Nov 27, 2016

@jinhou Please reopen if this didn't solve the original issue.

@shafy

shafy commented Sep 18, 2017

@jinhou I was facing the same issue and solved it by manually unfinalizing -> exporting -> re-finalizing the graph like they do here: https://www.bountysource.com/issues/43253488-is-there-anything-example-about-how-to-apply-model-saved-by-distribution-tensorflow-in-tensorflow-serving

Not sure if this is considered bad practice or what the unintended consequences might be. However, it works, and I didn't find another way to make it work with Supervisor.
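For completeness, a sketch of what that workaround looks like, assuming the private Graph._unsafe_unfinalize() method in TF 1.x ops.py is available (it is a private API and may change; saver, inputs, logits, and the FLAGS values are placeholders):

    # Temporarily lift the finalization so the exporter can add to collections.
    sess.graph._unsafe_unfinalize()

    model_exporter = exporter.Exporter(saver)
    signature = exporter.generic_signature({"inputs": inputs, "outputs": logits})
    model_exporter.init(sess.graph.as_graph_def(),
                        default_graph_signature=signature)
    model_exporter.export(FLAGS.export_dir, tf.constant(FLAGS.export_version), sess)

    # Put the graph back into its read-only state.
    sess.graph.finalize()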

@tobegit3hub
Contributor

tobegit3hub commented Jan 24, 2018

Thanks @shafy. Unfinalizing and then re-finalizing works, but I don't think it's the best approach. If we use tf.train.MonitoredTrainingSession, it is not a plain Session, so this trick can't be used there.

I think the APIs for distributed training, which needs the graph to be finalized, and for saving the model, which needs to modify the graph, are somewhat in conflict.
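One way to sidestep the conflict with tf.train.MonitoredTrainingSession is to let the distributed job only write checkpoints and to export afterwards from a fresh, non-distributed graph, for example with the tf.saved_model API (a sketch under that assumption; build_inference_graph, train_op, and the paths are placeholders):

    import tensorflow as tf

    # Phase 1 (distributed): MonitoredTrainingSession finalizes the graph, so no
    # export-related ops are created here; it only writes checkpoints.
    # with tf.train.MonitoredTrainingSession(master=server.target,
    #                                        is_chief=(task_index == 0),
    #                                        checkpoint_dir="/tmp/train") as mon_sess:
    #     while not mon_sess.should_stop():
    #         mon_sess.run(train_op)

    # Phase 2 (single process): rebuild the inference graph, restore the
    # checkpoint, and export with a plain tf.Session.
    with tf.Graph().as_default():
        inputs, logits = build_inference_graph()   # placeholder for model code
        saver = tf.train.Saver()
        with tf.Session() as sess:
            saver.restore(sess, tf.train.latest_checkpoint("/tmp/train"))
            builder = tf.saved_model.builder.SavedModelBuilder("/tmp/export/1")
            signature = tf.saved_model.signature_def_utils.predict_signature_def(
                inputs={"inputs": inputs}, outputs={"scores": logits})
            builder.add_meta_graph_and_variables(
                sess, [tf.saved_model.tag_constants.SERVING],
                signature_def_map={"predict": signature})
            builder.save()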
