Fail to export model with exporter with distributed session #5439

Closed
jinhou opened this issue Nov 7, 2016 · 12 comments

@jinhou

jinhou commented Nov 7, 2016

Hi,

I ran into the following issue while trying to export the model with a distributed session.

Do you have any idea how to fix it? Thanks.

Exporting trained model to ./model/
Traceback (most recent call last):
File "dnn_train.py", line 399, in
tf.app.run()
File "/gruntdata/DL_dataset/anaconda2/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 32, in run
sys.exit(main(sys.argv[:1] + flags_passthrough))
File "dnn_train.py", line 396, in main
run_training(server.target, cluster_spec)
File "dnn_train.py", line 318, in run_training
default_graph_signature=signature)
File "/gruntdata/DL_dataset/anaconda2/lib/python2.7/site-packages/tensorflow/contrib/session_bundle/exporter.py", line 198, in init
ops.add_to_collection(constants.GRAPH_KEY, graph_any_buf)
File "/gruntdata/DL_dataset/anaconda2/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 4073, in add_to_collection
get_default_graph().add_to_collection(name, value)
File "/gruntdata/DL_dataset/anaconda2/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 2689, in add_to_collection
self._check_not_finalized()
File "/gruntdata/DL_dataset/anaconda2/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 2088, in _check_not_finalized
raise RuntimeError("Graph is finalized and cannot be modified.")
RuntimeError: Graph is finalized and cannot be modified.

@aselle
Contributor

aselle commented Nov 7, 2016

It looks like you finalized your model at some point and then tried to modify it afterwards. Please provide a simple test case, or else you will likely need to look into it on your own.

@jinhou jinhou added the stat:awaiting response Status - Awaiting response from author label Nov 7, 2016
@aselle aselle removed the stat:awaiting response Status - Awaiting response from author label Nov 7, 2016
@jinhou
Author

jinhou commented Nov 9, 2016

@aselle I used gRPC for distributed training, and it seems the graph gets finalized when the tf.train.Supervisor is created; see https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/training/supervisor.py for details.
When I then tried to export the model with the following code, I got the error above.
    # Export inference model.
    model_exporter = exporter.Exporter(saver)
    signature = exporter.generic_signature({"Reshape_2:0": i_indices_reshape,
                                            "Reshape_3:0": i_values,
                                            "Cast_1:0": i_shape,
                                            "Reshape_1:0": indices_shape,
                                            "softmax_linear/add:0": logits})
    model_exporter.init(init_op=init_op,
                        graph_def=sess.graph.as_graph_def(),
                        default_graph_signature=signature)
    model_exporter.export(FLAGS.export_dir, tf.constant(FLAGS.export_version), sess)
    print('Successfully exported model to %s' % FLAGS.export_dir)
I found a workaround that exports the model successfully. First, I saved checkpoint files during the (gRPC distributed) training. Then I restored the checkpoint in another, single-node session that does not use gRPC, and in that session I was able to export the model.
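A minimal sketch of this workaround (build_inference_graph, the paths, and the signature names are placeholders rather than the original code):

    import tensorflow as tf
    from tensorflow.contrib.session_bundle import exporter

    # Rebuild the same inference graph in a fresh, single-machine process.
    # build_inference_graph() stands in for the original model-construction code.
    inputs, logits = build_inference_graph()

    saver = tf.train.Saver()
    with tf.Session() as sess:
        # Restore the weights written by the distributed training job.
        saver.restore(sess, tf.train.latest_checkpoint("./checkpoints"))

        # The graph is not finalized here, so the exporter can add its collections.
        model_exporter = exporter.Exporter(saver)
        signature = exporter.generic_signature({"inputs": inputs,
                                                "outputs": logits})
        model_exporter.init(sess.graph.as_graph_def(),
                            default_graph_signature=signature)
        model_exporter.export("./model", tf.constant(1), sess)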

However, I am still wondering how to save the model directly from the distributed version of the code.

@aselle
Contributor

aselle commented Nov 10, 2016

@mrry, do you have any comments on what the proper way to do this is?

@aselle aselle added stat:awaiting tensorflower Status - Awaiting response from tensorflower type:support Support issues labels Nov 10, 2016
@mrry
Contributor

mrry commented Nov 10, 2016

I'm not sure what your code looks like, but I'd suggest that you construct the Exporter object before starting training (which I assume is using a Supervisor and therefore calling Graph.finalize()...) to avoid this error. You can construct it immediately after constructing a tf.train.Saver.
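A rough sketch of that ordering (a sketch only; build_model, FLAGS.*, and server.target stand in for the user's own model, flags, and cluster setup):

    import tensorflow as tf
    from tensorflow.contrib.session_bundle import exporter

    # Build the model first (build_model is a placeholder for the real code).
    inputs, logits = build_model()

    # Create the Saver and the Exporter while the graph is still mutable.
    # The Exporter constructor only stores the saver; the collection-modifying
    # work happens later, in init() and export() (see the traceback above).
    saver = tf.train.Saver()
    model_exporter = exporter.Exporter(saver)

    # Only now create the Supervisor, which finalizes the default graph.
    sv = tf.train.Supervisor(is_chief=(FLAGS.task_index == 0),
                             logdir=FLAGS.train_dir)
    with sv.prepare_or_wait_for_session(server.target) as sess:
        ...  # training loop; the init()/export() calls are discussed below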

@mrry mrry added stat:awaiting response Status - Awaiting response from author and removed stat:awaiting tensorflower Status - Awaiting response from tensorflower labels Nov 10, 2016
@jinhou
Author

jinhou commented Nov 11, 2016

@mrry Thanks for the suggestion. I tried your method: I can put model_exporter = exporter.Exporter(saver) before the Supervisor as you suggested, but I can't put the model_exporter.init call before the Supervisor, because it needs sess = sv.prepare_or_wait_for_session(target, config=sess_config) as one of its inputs. So I put that call after the Supervisor, and it still says I can't modify the graph at the model_exporter.init line. Also, I don't understand why this call modifies the graph at all.

Below are the details of my model export initialization:

  signature = exporter.generic_signature({"Reshape_2:0":i_indices_reshape,
                                          "Reshape_3:0":i_values,
                                          "Cast_1:0":i_shape,
                                          "Reshape_1:0":indices_shape,
                                          "softmax_linear/add:0":logits})
  model_exporter.init(sess.graph.as_graph_def(),
                      default_graph_signature=signature)

@aselle aselle removed the stat:awaiting response Status - Awaiting response from author label Nov 11, 2016
@mrry
Contributor

mrry commented Nov 11, 2016

Ah, you don't actually need sess in the call to model_exporter.init(). Instead you could do something like:

model_exporter.init(tf.get_default_graph().as_graph_def(), ...)

@mrry mrry added the stat:awaiting response Status - Awaiting response from author label Nov 11, 2016
@firewu

firewu commented Nov 13, 2016

I hit the same error (RuntimeError: Graph is finalized...) when calling model_exporter.export(export_path, tf.constant(FLAGS.export_version), sess), which takes sess as an argument, after creating the Supervisor.

@mrry
Contributor

mrry commented Nov 14, 2016

@firewu I suspect the tf.constant() is causing an error in this case. This tensor should be created before you define the supervisor.
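That is, create the version tensor while the graph is still mutable and pass it to export() later (a sketch, using the names from the snippet above):

    # During graph construction, before tf.train.Supervisor is created:
    export_version = tf.constant(FLAGS.export_version)

    # ...create the Supervisor, obtain the session, run training...

    # Inside the supervised session, after training (no new ops created here):
    model_exporter.export(export_path, export_version, sess)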

@firewu

firewu commented Nov 15, 2016

@mrry Thanks! That was it! I defined the tensor before the Supervisor, and it works. @jinhou Maybe the same solution will work for you. I should point out that my code follows inception_export.py, line 131: model_exporter.export(FLAGS.export_dir, tf.constant(global_step), sess).

@prb12
Member

prb12 commented Nov 27, 2016

@jinhou Please reopen if this didn't solve the original issue.

@shafy

shafy commented Sep 18, 2017

@jinhou I was facing the same issue and solved it by manually unfinalizing -> exporting -> re-finalizing the graph like they do here: https://www.bountysource.com/issues/43253488-is-there-anything-example-about-how-to-apply-model-saved-by-distribution-tensorflow-in-tensorflow-serving

Not sure if this is considered bad practice or what the unintended consequences might be. However, it works, and I didn't find another way to make it work with Supervisor.
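For completeness, a sketch of what that workaround looks like, assuming the private Graph._unsafe_unfinalize() method in TF 1.x ops.py is available (it is a private API and may change; saver, inputs, logits, and the FLAGS values are placeholders):

    # Temporarily lift the finalization so the exporter can add to collections.
    sess.graph._unsafe_unfinalize()

    model_exporter = exporter.Exporter(saver)
    signature = exporter.generic_signature({"inputs": inputs, "outputs": logits})
    model_exporter.init(sess.graph.as_graph_def(),
                        default_graph_signature=signature)
    model_exporter.export(FLAGS.export_dir, tf.constant(FLAGS.export_version), sess)

    # Put the graph back into its read-only state.
    sess.graph.finalize()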

@tobegit3hub
Contributor

tobegit3hub commented Jan 24, 2018

Thanks @shafy. Unfinalizing and then re-finalizing works, but I don't think it's the best approach. If we use tf.train.MonitoredTrainingSession, it is not a plain Session, so this trick can't be used there.

I think the APIs for distributed training, which needs the graph to be finalized, and for saving the model, which needs to modify the graph, are somewhat in conflict.
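One way to sidestep the conflict with tf.train.MonitoredTrainingSession is to let the distributed job only write checkpoints and to export afterwards from a fresh, non-distributed graph, for example with the tf.saved_model API (a sketch under that assumption; build_inference_graph, train_op, and the paths are placeholders):

    import tensorflow as tf

    # Phase 1 (distributed): MonitoredTrainingSession finalizes the graph, so no
    # export-related ops are created here; it only writes checkpoints.
    # with tf.train.MonitoredTrainingSession(master=server.target,
    #                                        is_chief=(task_index == 0),
    #                                        checkpoint_dir="/tmp/train") as mon_sess:
    #     while not mon_sess.should_stop():
    #         mon_sess.run(train_op)

    # Phase 2 (single process): rebuild the inference graph, restore the
    # checkpoint, and export with a plain tf.Session.
    with tf.Graph().as_default():
        inputs, logits = build_inference_graph()   # placeholder for model code
        saver = tf.train.Saver()
        with tf.Session() as sess:
            saver.restore(sess, tf.train.latest_checkpoint("/tmp/train"))
            builder = tf.saved_model.builder.SavedModelBuilder("/tmp/export/1")
            signature = tf.saved_model.signature_def_utils.predict_signature_def(
                inputs={"inputs": inputs}, outputs={"scores": logits})
            builder.add_meta_graph_and_variables(
                sess, [tf.saved_model.tag_constants.SERVING],
                signature_def_map={"predict": signature})
            builder.save()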
