Fail to export model with exporter with distributed session #5439
Comments
It looks like you finalized your model at some point and then tried to modify it afterwards. Please provide a simple test case, or else you will likely need to look into it on your own.
@aselle I used gRPC for distributed training, and it seems that the graph is finalized when I create the Supervisor; see the details in https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/training/supervisor.py. model_exporter = exporter.Exporter(saver) However, I am still wondering how to save the model in the distributed version of the code?
@mrry, do you have any comments on what the proper way to do this is?
I'm not sure what your code looks like, but I'd suggest that you construct the
@mrry Thanks for the suggestion. I tried your method; I could put the clause
Below are the details of my model export initialization:
Ah, you don't actually need model_exporter.init(tf.get_default_graph().as_graph_def(), ...)
I met the same error (RuntimeError: Graph is finalized...) when I call model_exporter.export(export_path, tf.constant(FLAGS.export_version), sess), which takes sess as an argument, after creating the Supervisor.
@firewu I suspect the
@mrry Thanks! That's it! I define the tensor before the Supervisor, and it works. @jinhou Maybe the same solution will work for you. But I should point out that I wrote my code with reference to inception_export.py, line 131: model_exporter.export(FLAGS.export_dir, tf.constant(global_step), sess).
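The ordering fix described above can be illustrated without TensorFlow. `MiniGraph` below is a hypothetical stand-in (not TensorFlow's API) that mimics the finalize check shown in the traceback (`ops.Graph._check_not_finalized`): any tensor the exporter needs, such as the `tf.constant(global_step)` version tensor, has to be created before the Supervisor finalizes the graph.

```python
class MiniGraph:
    """Hypothetical stand-in for tf.Graph's finalize behaviour (illustration only)."""

    def __init__(self):
        self.nodes = []
        self.finalized = False

    def add_node(self, name):
        # Mirrors ops.Graph._check_not_finalized from the traceback.
        if self.finalized:
            raise RuntimeError("Graph is finalized and cannot be modified.")
        self.nodes.append(name)
        return name


g = MiniGraph()

# Works: create the export-version tensor BEFORE the Supervisor
# (which finalizes the graph in its constructor) exists.
g.add_node("export_version_constant")
g.finalized = True  # what Supervisor does internally

# Fails: creating a tensor afterwards, e.g. calling
# tf.constant(FLAGS.export_version) inside export().
try:
    g.add_node("late_constant")
except RuntimeError as err:
    print(err)  # -> Graph is finalized and cannot be modified.
```

The point is purely about ordering: the export machinery adds nodes and collection entries to the graph, so it must run its graph-building part before anything finalizes the graph.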
@jinhou Please reopen if this didn't solve the original issue.
@jinhou I was facing the same issue and solved it by manually unfinalizing -> exporting -> re-finalizing the graph, like they do here: https://www.bountysource.com/issues/43253488-is-there-anything-example-about-how-to-apply-model-saved-by-distribution-tensorflow-in-tensorflow-serving Not sure if this is considered bad practice or what unintended consequences it could have. However, it works, and I didn't find another way to make it work with Supervisor.
Thanks @shafy. Unfinalizing then re-finalizing will work, but I don't think it's the best way if we use
I think the APIs for distributed training, which need the graph to be finalized, and for saving the model, which need to modify the graph, conflict a little.
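The unfinalize -> export -> re-finalize workaround from the link above can be sketched the same TF-free way. In real TF 1.x code this amounted to flipping the graph's internal finalized flag back, an internal and unsupported operation, which is why it may be bad practice; `MiniGraph` here is again a hypothetical stand-in, not TensorFlow's API.

```python
class MiniGraph:
    """Hypothetical stand-in for tf.Graph (illustration only, not the TF API)."""

    def __init__(self):
        self.nodes = []
        self.finalized = False

    def add_node(self, name):
        if self.finalized:
            raise RuntimeError("Graph is finalized and cannot be modified.")
        self.nodes.append(name)


g = MiniGraph()
g.finalized = True       # Supervisor finalized the graph for training.

# Workaround: temporarily unfinalize, add the export ops, re-finalize.
g.finalized = False      # "unfinalize" (an internal flag flip in real TF)
g.add_node("export_version_constant")  # now allowed
g.finalized = True       # restore the safety check for training

print(g.nodes)  # -> ['export_version_constant']
```

The re-finalize step matters: the graph is finalized in the first place so that multiple replicas in distributed training cannot mutate it concurrently, and the workaround should leave that guarantee in place once the export ops exist.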
Hi,
I met the following issue when I was trying to export the model with a distributed session.
Do you have any idea how to fix it? Thanks.
Exporting trained model to ./model/
Traceback (most recent call last):
File "dnn_train.py", line 399, in
tf.app.run()
File "/gruntdata/DL_dataset/anaconda2/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 32, in run
sys.exit(main(sys.argv[:1] + flags_passthrough))
File "dnn_train.py", line 396, in main
run_training(server.target, cluster_spec)
File "dnn_train.py", line 318, in run_training
default_graph_signature=signature)
File "/gruntdata/DL_dataset/anaconda2/lib/python2.7/site-packages/tensorflow/contrib/session_bundle/exporter.py", line 198, in init
ops.add_to_collection(constants.GRAPH_KEY, graph_any_buf)
File "/gruntdata/DL_dataset/anaconda2/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 4073, in add_to_collection
get_default_graph().add_to_collection(name, value)
File "/gruntdata/DL_dataset/anaconda2/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 2689, in add_to_collection
self._check_not_finalized()
File "/gruntdata/DL_dataset/anaconda2/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 2088, in _check_not_finalized
raise RuntimeError("Graph is finalized and cannot be modified.")
RuntimeError: Graph is finalized and cannot be modified.