
'InceptionV3/Predictions/Softmax': Could not satisfy explicit device specification '/device:GPU:0' #3118

Closed
HeXu1 opened this issue Jan 7, 2018 · 5 comments

HeXu1 commented Jan 7, 2018

System information

What is the top-level directory of the model you are using: slim
Have I written custom code (as opposed to using a stock example script provided in TensorFlow): no
OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Windows 10
TensorFlow installed from (source or binary): binary using pip install
TensorFlow version (use command below): 1.4.0 / 1.5.0 dev GPU
Bazel version (if compiling from source):
CUDA/cuDNN version: 8.0 / 6.1
GPU model and memory: GTX960 4G
Exact command to reproduce:

When I run train_image_classifier.py, the problem is:
Caused by op 'InceptionV3/Predictions/Softmax', defined at:
File "train_image_classifier.py", line 577, in
tf.app.run()
File "C:\Anaconda3\lib\site-packages\tensorflow\python\platform\app.py", line 48, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "train_image_classifier.py", line 477, in main
clones = model_deploy.create_clones(deploy_config, clone_fn, [batch_queue])
File "D:\python3.5.2\Model\tensorflow_models\models-master\research\slim\deployment\model_deploy.py", line 193, in create_clones
outputs = model_fn(*args, **kwargs)
File "train_image_classifier.py", line 460, in clone_fn
logits, end_points = network_fn(images)
File "D:\python3.5.2\Model\tensorflow_models\models-master\research\slim\nets\nets_factory.py", line 135, in network_fn
return func(images, num_classes, is_training=is_training, **kwargs)
File "D:\python3.5.2\Model\tensorflow_models\models-master\research\slim\nets\inception_v3.py", line 543, in inception_v3
end_points['Predictions'] = prediction_fn(logits, scope='Predictions')
File "C:\Anaconda3\lib\site-packages\tensorflow\contrib\framework\python\ops\arg_scope.py", line 181, in func_with_args
return func(*args, **current_args)
File "C:\Anaconda3\lib\site-packages\tensorflow\contrib\layers\python\layers\layers.py", line 2582, in softmax
predictions = nn.softmax(logits_2d)
File "C:\Anaconda3\lib\site-packages\tensorflow\python\ops\nn_ops.py", line 1667, in softmax
return _softmax(logits, gen_nn_ops._softmax, dim, name)
File "C:\Anaconda3\lib\site-packages\tensorflow\python\ops\nn_ops.py", line 1610, in _softmax
return compute_op(logits, name=name)
File "C:\Anaconda3\lib\site-packages\tensorflow\python\ops\gen_nn_ops.py", line 4316, in _softmax
"Softmax", logits=logits, name=name)
File "C:\Anaconda3\lib\site-packages\tensorflow\python\framework\op_def_library.py", line 787, in _apply_op_helper
op_def=op_def)
File "C:\Anaconda3\lib\site-packages\tensorflow\python\framework\ops.py", line 2956, in create_op
op_def=op_def)
File "C:\Anaconda3\lib\site-packages\tensorflow\python\framework\ops.py", line 1470, in init
self._traceback = self._graph._extract_stack() # pylint: disable=protected-access

InvalidArgumentError (see above for traceback): Cannot assign a device for operation 'InceptionV3/Predictions/Softmax': Could not satisfy explicit device specification '/device:GPU:0' because no supported kernel for GPU devices is available.
[[Node: InceptionV3/Predictions/Softmax = Softmax[T=DT_FLOAT, _device="/device:GPU:0"]]]

skye commented Jan 8, 2018

@sguada @nathansilberman

@cy89 added the stat:awaiting model gardener (Waiting on input from TensorFlow model gardener) label on Jan 8, 2018

Ao-Lee commented Jan 26, 2018

I had the same problem. It can be solved by changing the last few lines of code in train_image_classifier.py:

  ###########################
  # Kicks off the training. #
  ###########################

  session_config = tf.ConfigProto(allow_soft_placement=True)

  slim.learning.train(
      train_tensor,
      logdir=FLAGS.train_dir,
      master=FLAGS.master,
      is_chief=(FLAGS.task == 0),
      init_fn=_get_init_fn(),
      summary_op=summary_op,
      number_of_steps=FLAGS.max_number_of_steps,
      log_every_n_steps=FLAGS.log_every_n_steps,
      save_summaries_secs=FLAGS.save_summaries_secs,
      save_interval_secs=FLAGS.save_interval_secs,
      sync_optimizer=optimizer if FLAGS.sync_replicas else None,
      session_config=session_config)
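For anyone curious what this changes: allow_soft_placement=True tells TensorFlow to fall back to another device (here the CPU) when an op has been pinned to a device that has no kernel for it, instead of raising the InvalidArgumentError above. A minimal standalone sketch, separate from the slim scripts and only for illustration (log_device_placement simply prints where each op actually ran):

    import tensorflow as tf

    # Toy graph pinned to the GPU; with allow_soft_placement=True TensorFlow may
    # quietly move any op without a suitable GPU kernel back to the CPU.
    config = tf.ConfigProto(allow_soft_placement=True,
                            log_device_placement=True)

    with tf.device('/device:GPU:0'):
        logits = tf.constant([[1.0, 2.0, 3.0]])
        probs = tf.nn.softmax(logits)

    with tf.Session(config=config) as sess:
        print(sess.run(probs))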


HeXu1 commented Jan 26, 2018

@Ao-Lee Much appreciated! That problem has been solved, but when I run it again there is a new problem.
ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape[32,192,17,17]
[[Node: InceptionV3/InceptionV3/Mixed_7a/Branch_1/Conv2d_0b_1x7/BatchNorm/FusedBatchNorm = FusedBatchNorm[T=DT_FLOAT, data_format="NHWC", epsilon=0.001, is_training=true, _device="/job:localhost/replica:0/task:0/device:GPU:0"](InceptionV3/InceptionV3/Mixed_7a/Branch_1/Conv2d_0b_1x7/Conv2D, InceptionV3/InceptionV3/Mixed_7a/Branch_0/Conv2d_0a_1x1/BatchNorm/Const, InceptionV3/Mixed_7a/Branch_1/Conv2d_0b_1x7/BatchNorm/beta/read/_317, InceptionV3/AuxLogits/Conv2d_1b_1x1/BatchNorm/Const_1, InceptionV3/AuxLogits/Conv2d_1b_1x1/BatchNorm/Const_1)]]
[[Node: zero_fraction_12/Mean/_631 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_2709_zero_fraction_12/Mean", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]]]

There's no obvious reason for it. I'm wondering whether it's a channels issue?


Ao-Lee commented Jan 26, 2018

Well, I guess your problem is an out-of-memory error. Try lowering the batch size and running it again.
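The shape in the error hints at this: the leading 32 in [32, 192, 17, 17] is the batch size, and activation memory grows roughly linearly with it. A rough back-of-the-envelope check (illustrative arithmetic only; in slim the batch size is set with the script's --batch_size flag, 32 by default):

    # Approximate float32 memory for the single activation named in the OOM error.
    def activation_mb(batch, channels=192, height=17, width=17, bytes_per_elem=4):
        return batch * channels * height * width * bytes_per_elem / 1024 ** 2

    print(activation_mb(32))  # ~6.8 MB at the default batch size of 32
    print(activation_mb(16))  # ~3.4 MB at batch size 16; every activation shrinks the same way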


HeXu1 commented Jan 26, 2018

@Ao-Lee Wow, it really works. Thank you very much.
