Closed
Labels: TF 1.12 (Issues related to TF 1.12) · comp:dist-strat (Distribution Strategy related issues) · stale (to be closed automatically if no activity) · stat:awaiting response (Status - Awaiting response from author)
Description
System information
Have I written custom code: N/A
OS Platform and Distribution: CentOS Linux release 7.3.1611
TensorFlow installed from: (pip install tf-nightly-gpu)
TensorFlow version: Tensorflow('v1.9.0-rc2-5345-g57d31aa599', '1.12.0-dev20181005')
Bazel version: N/A
GPU model and memory: Tesla P40 24G
Exact command to reproduce: N/A
Mobile device: N/A
CUDA/cuDNN version: CUDA 9.0 with cuDNN 7.1.4
I am training a model on multiple GPUs with MirroredStrategy and Estimator, and I ran into a problem:
When I enable the distribution strategy with the following code, training gets stuck after running for some number of training steps:
distribution = tf.contrib.distribute.MirroredStrategy()
config = tf.estimator.RunConfig(train_distribute=distribution)
estimator = tf.estimator.Estimator(model_fn=mymodel_fn, model_dir='logs',
                                   config=config)
But when I run without the distribution strategy, like this:
distribution = tf.contrib.distribute.MirroredStrategy()
config = tf.estimator.RunConfig()
estimator = tf.estimator.Estimator(model_fn=mymodel_fn, model_dir='logs',
                                   config=config)
it runs fine. Why does this happen?
Is this a bug in MirroredStrategy?
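For reference, here is a minimal, self-contained sketch of the kind of setup that reproduces the hang for me; the model_fn body and input_fn below are simplified, hypothetical stand-ins for my real code, included only to show the call pattern:

import tensorflow as tf

def mymodel_fn(features, labels, mode):
    # Hypothetical toy model, standing in for my real model_fn.
    logits = tf.layers.dense(features['x'], 2)
    loss = tf.losses.sparse_softmax_cross_entropy(labels=labels, logits=logits)
    optimizer = tf.train.GradientDescentOptimizer(0.01)
    train_op = optimizer.minimize(
        loss, global_step=tf.train.get_or_create_global_step())
    return tf.estimator.EstimatorSpec(mode=mode, loss=loss, train_op=train_op)

def input_fn():
    # Hypothetical input pipeline; my real one also returns a tf.data.Dataset.
    features = {'x': tf.random_uniform([1024, 10])}
    labels = tf.random_uniform([1024], maxval=2, dtype=tf.int32)
    return tf.data.Dataset.from_tensor_slices((features, labels)).repeat().batch(32)

distribution = tf.contrib.distribute.MirroredStrategy()
config = tf.estimator.RunConfig(train_distribute=distribution)
estimator = tf.estimator.Estimator(model_fn=mymodel_fn, model_dir='logs',
                                   config=config)
# With train_distribute set, this gets stuck after some training steps;
# with a plain RunConfig() it trains without problems.
estimator.train(input_fn=input_fn, max_steps=100000)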