Document that tf.train.Supervisor is deprecated #6263
`tf.Variable` adds the variable to `GLOBAL_VARIABLES` but not to `MODEL_VARIABLES`. It seems that Supervisor only saves `MODEL_VARIABLES`. I fixed this by calling `tf.contrib.framework.add_model_variable` for all variables defined with `tf.Variable`.
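To make the collection bookkeeping concrete, here is a pure-Python toy, not TensorFlow code. It models only what the comment states: creating a variable registers it in a global collection but not in the model collection, and the workaround (in the spirit of `tf.contrib.framework.add_model_variable`) tags it as a model variable too. `ToyGraph`, `make_variable`, and the collection key strings are hypothetical names for this sketch, and whether Supervisor's saver actually reads `MODEL_VARIABLES` is the commenter's hypothesis, which is not modeled here.

```python
# Toy model (not TensorFlow): stand-in collection keys for this sketch only.
GLOBAL_VARIABLES = "global_variables"
MODEL_VARIABLES = "model_variables"


class ToyGraph:
    """Holds named collections of variable names, like a graph's collections."""

    def __init__(self):
        self.collections = {}

    def add_to_collection(self, key, value):
        self.collections.setdefault(key, []).append(value)

    def get_collection(self, key):
        # Return a copy so callers cannot mutate the stored list.
        return list(self.collections.get(key, []))


def make_variable(graph, name):
    """Toy analogue of tf.Variable: registers in GLOBAL_VARIABLES only."""
    graph.add_to_collection(GLOBAL_VARIABLES, name)
    return name


def add_model_variable(graph, name):
    """Toy analogue of the workaround: tag as a model variable, only once."""
    if name not in graph.get_collection(MODEL_VARIABLES):
        graph.add_to_collection(MODEL_VARIABLES, name)


g = ToyGraph()
make_variable(g, "weights/Variable")    # W1
make_variable(g, "weights/Variable_1")  # W2
print(g.get_collection(MODEL_VARIABLES))  # []

# The workaround loop: tag every global variable as a model variable too.
for v in g.get_collection(GLOBAL_VARIABLES):
    add_model_variable(g, v)
print(g.get_collection(MODEL_VARIABLES))  # ['weights/Variable', 'weights/Variable_1']
```

The membership check in `add_model_variable` means running the tagging loop twice does not list a variable twice.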
@shiyemin, do you mean adding the following lines?

```python
from __future__ import print_function
import tensorflow as tf

server = tf.train.Server.create_local_server()
logs_path = "mnist/logs"

global_step = tf.get_variable('global_step', [],
                              initializer=tf.constant_initializer(0),
                              trainable=False)

with tf.name_scope("weights"):
    W1 = tf.Variable(tf.random_normal([784, 100]))
    W2 = tf.Variable(tf.random_normal([784, 100]))

# Tag every global variable as a model variable as well.
for variable in tf.global_variables():
    tf.contrib.framework.add_model_variable(variable)

init_op = tf.global_variables_initializer()
print("Variables initialized ...")

sv = tf.train.Supervisor(is_chief=True,
                         logdir=logs_path,
                         global_step=global_step,
                         init_op=init_op,
                         save_model_secs=600)

with sv.managed_session(server.target) as sess:
    while not sv.should_stop():
        print('==============')
        sv.stop()
```

I tried this, but still no success.
I have the same issue - using …
I encountered a similar error using TensorFlow r0.12 and r1.0 as well. For me, the code breaks when I start a new session using `with sv.managed_session(...) as sess`.
I actually store the graph before I initialize a session. I tried with and without a session present, and it doesn't make a difference.
--Hannes
On Mon, Jan 30, 2017 at 5:49 PM, Kevin Chen wrote:
> I'm encountering a similar error using Tensorflow r0.12 and r1.0 as well.
> For me, the code breaks when I start a new session using with
> sv.managed_session(...) as sess.
For anyone looking for more information about the deprecation of `tf.train.Supervisor` and the upcoming update to the guide, there are some newer comments here: #6604
Can anyone update this example to use `tf.train.MonitoredSession`?
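As a rough illustration of the control flow such an update would use, here is a pure-Python toy, not TensorFlow code. It mimics the hook pattern of TF 1.x's `tf.train.MonitoredTrainingSession` combined with a `tf.train.StopAtStepHook`: the session consults its hooks after every step, and a hook can request a stop, so the loop becomes `while not sess.should_stop(): sess.run(...)` instead of calling `sv.stop()` by hand. `ToySession`, `ToyStopAtStepHook`, and their methods are hypothetical names for this sketch and are not TensorFlow's API.

```python
class ToyStopAtStepHook:
    """Requests a stop once the step counter reaches last_step (toy only)."""

    def __init__(self, last_step):
        self.last_step = last_step

    def after_run(self, session, step):
        if step >= self.last_step:
            session.request_stop()


class ToySession:
    """Minimal stand-in for a monitored session: runs a step function and
    consults its hooks after every run (not TensorFlow's API)."""

    def __init__(self, hooks=()):
        self.hooks = list(hooks)
        self.global_step = 0
        self._stop_requested = False

    def request_stop(self):
        self._stop_requested = True

    def should_stop(self):
        return self._stop_requested

    def run(self, step_fn):
        result = step_fn(self.global_step)
        self.global_step += 1
        # Hooks see the updated step and may ask the session to stop.
        for hook in self.hooks:
            hook.after_run(self, self.global_step)
        return result

    # Context-manager support, mirroring `with ... as sess:` usage.
    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        return False  # do not swallow exceptions


outputs = []
with ToySession(hooks=[ToyStopAtStepHook(last_step=3)]) as sess:
    while not sess.should_stop():
        outputs.append(sess.run(lambda step: "step %d" % step))

print(outputs)  # ['step 0', 'step 1', 'step 2']
```

Putting the stopping rule in a hook keeps the training loop itself free of bookkeeping; checkpointing and summaries follow the same pattern in the real API.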
Could someone update the documentation below with a sentence that clearly states that Supervisor is deprecated? Please give the TensorFlow community a clear, unified direction on how to use its API. https://www.tensorflow.org/api_docs/python/tf/train/Supervisor
Fixes tensorflow#6263. PiperOrigin-RevId: 177230053
I am using CUDA 8.0, cuDNN 5.1, Ubuntu 16.04, GPU: Titan X, and TensorFlow r0.12.
I ran into some problems when using `tf.train.Supervisor` for distributed training. I have simplified my code as shown below:
The problem is that if I set `logdir` explicitly in `tf.train.Supervisor`, the code above fails with this error: `NotFoundError (see above for traceback): Key weights/Variable not found in checkpoint`. But if I comment out the lines that define W1 and W2, the code works. So I assume there might be some issue with saving and restoring checkpoint files in `tf.train.Supervisor`, or maybe I am not using `tf.train.Supervisor` correctly.
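The error message comes from the restore step. In TF 1.x, when `logdir` is set and already contains a checkpoint, the Supervisor's session manager restores the graph's variables from that checkpoint instead of running `init_op`, and every variable is looked up by name. The pure-Python toy below models only that lookup; `prepare_session`, the dict used as a "checkpoint", and its values are hypothetical. One plausible trigger, an assumption rather than something confirmed in this thread, is a checkpoint left in `mnist/logs` by an earlier run of a graph that did not yet define W1 and W2.

```python
class NotFoundError(Exception):
    """Toy stand-in for the error TensorFlow raises on a missing key."""


def prepare_session(graph_vars, checkpoint):
    """Restore every graph variable from `checkpoint` if one exists,
    otherwise initialize them (toy model of restore-or-init)."""
    if checkpoint is None:
        # No checkpoint in logdir: behave like running init_op.
        return {name: 0.0 for name in graph_vars}
    values = {}
    for name in graph_vars:
        if name not in checkpoint:
            raise NotFoundError("Key %s not found in checkpoint" % name)
        values[name] = checkpoint[name]
    return values


# Made-up checkpoint written by an earlier graph that only had global_step.
old_checkpoint = {"global_step": 600}

# Under tf.name_scope("weights"), W1 and W2 get the default names
# "weights/Variable" and "weights/Variable_1".
graph_vars = ["global_step", "weights/Variable", "weights/Variable_1"]

try:
    prepare_session(graph_vars, old_checkpoint)
except NotFoundError as e:
    print(e)  # Key weights/Variable not found in checkpoint

# With an empty logdir there is nothing to restore, so initialization runs.
print(prepare_session(graph_vars, None))
```

In this toy, dropping W1 and W2 from `graph_vars` makes the restore succeed, which matches the behavior described above; with a fresh `logdir`, the full graph initializes normally.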