CUDA cannot create more than one session #19482

Closed
Davidnet opened this issue May 22, 2018 · 6 comments
Comments

@Davidnet

Please go to Stack Overflow for help and support:

https://stackoverflow.com/questions/tagged/tensorflow

If you open a GitHub issue, here is our policy:

  1. It must be a bug, a feature request, or a significant problem with documentation (for small docs fixes please send a PR instead).
  2. The form below must be filled out.
  3. It shouldn't be a TensorBoard issue. Those go here.

Here's why we have that policy: TensorFlow developers respond to issues. We want to focus on work that benefits the whole community, e.g., fixing bugs and adding features. Support only helps individuals. GitHub also notifies thousands of people when issues are filed. We want them to see you communicating an interesting problem, rather than being redirected to Stack Overflow.


System information

  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow): No
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Xenial
  • TensorFlow installed from (source or binary):
    source
  • TensorFlow version (use command below):
    1.8
  • Python version:
    2.7
  • Bazel version (if compiling from source):
    0.13
  • GCC/Compiler version (if compiling from source):
    5.4.0
  • CUDA/cuDNN version:
    9.0/7.0
  • GPU model and memory:
    Tegra x2
  • Exact command to reproduce:
    import tensorflow as tf
    config = tf.ConfigProto()
    config.gpu_options.per_process_gpu_memory_fraction = 0.4  # I have also tried with allow_growth
    sess1 = tf.Session(config=config)
    sess2 = tf.Session(config=config)  # Fails: cannot create the second session

You can collect some of this information using our environment capture script:

https://github.com/tensorflow/tensorflow/tree/master/tools/tf_env_collect.sh

You can obtain the TensorFlow version with

python -c "import tensorflow as tf; print(tf.GIT_VERSION, tf.VERSION)"

Describe the problem

Describe the problem clearly here. Be sure to convey here why it's a bug in TensorFlow or a feature request.

I have a Jetson TX2 with updated drivers and the latest JetPack provided by Nvidia. I have built TensorFlow from source (r1.5 and r1.8) and I am not able to create more than one session. With a single session I can execute operations without problems, but as soon as I create a second session, TensorFlow fails to create it. I suspect this is Nvidia's fault, but the error is not very informative:

  File "object_detection.py", line 183, in detection
    with tf.Session(graph=detection_graph,config=config) as sess:
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1509, in __init__
    super(Session, self).__init__(target, graph, config=config)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 628, in __init__
    self._session = tf_session.TF_NewDeprecatedSession(opts, status)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/errors_impl.py", line 473, in __exit__
    c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.InternalError: Failed to create session.

Is there any way I can get more information about the CUDA error or status, so that I can complete my bug report?

Thanks

Source code / logs

Include any logs or source code that would be helpful to diagnose the problem. If including tracebacks, please include the full traceback. Large logs and files should be attached. Try to provide a reproducible test case that is the bare minimum necessary to generate the problem.

@skye
Member

skye commented Jun 4, 2018

@tfboyd do we have a way to reproduce this?

@Davidnet just to clarify, do you not experience this problem with a CPU-only build?

@Davidnet
Author

Davidnet commented Jun 4, 2018

I use a Jetson TX2, so it's quite difficult to try a CPU-only build (building takes a lot of time). I'm concerned because I do not know how to file a proper bug report. How do I debug programs that use CUDA and cuDNN?

@tfboyd
Member

tfboyd commented Jun 5, 2018

I had not used or seen multiple sessions in a single script, but it does happen, and there was an issue about it that was resolved a long time ago. I would test the code on a regular GPU just to rule the Jetson in or out as the cause. We do not have a Jetson TX2 or TX setup in our area. I would also wrap the session in a with tf.device(...) block just to make sure everything is placed where you want, though that is likely not needed. If there were a CUDA issue I would expect to see a CUDA error, but expectation does not always match reality.

Here is the multi-session example I found while looking for issues. They call run before starting the next session but I do not see why that would matter.
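
The two ideas above (pinning placement with tf.device, and running something in the first session before opening the second) could be combined into a small script for trying on a desktop GPU. This is only a sketch, assuming the TensorFlow 1.x API used in the report (tf.ConfigProto, tf.Session). The function name run_two_sessions, the device string, and the toy graph are made up for illustration and are not taken from the linked example.

```python
def run_two_sessions(memory_fraction=0.4, device="/device:GPU:0"):
    """Open two sessions on one graph, running an op in the first before
    creating the second. Assumes a TensorFlow 1.x GPU build."""
    import tensorflow as tf  # imported lazily so the file loads without TF

    # log_device_placement prints where each op actually ends up.
    config = tf.ConfigProto(log_device_placement=True)
    config.gpu_options.per_process_gpu_memory_fraction = memory_fraction

    graph = tf.Graph()
    with graph.as_default(), tf.device(device):
        a = tf.constant([1.0, 2.0])
        b = a * 2.0

    sess1 = tf.Session(graph=graph, config=config)
    try:
        first = sess1.run(b)  # touch the GPU before opening session 2
        sess2 = tf.Session(graph=graph, config=config)  # the call that failed in the report
        try:
            second = sess2.run(b)
        finally:
            sess2.close()
    finally:
        sess1.close()
    return first.tolist(), second.tolist()


if __name__ == "__main__":
    print(run_two_sessions())
```

If this runs cleanly on a desktop GPU but fails on the TX2 with the same TensorFlow version, that would point toward a Jetson-specific cause rather than a general multi-session problem.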

For debugging CUDA, I do not have direct knowledge. You could try the TensorFlow debugger (tfdbg) as a starting point: https://www.tensorflow.org/programmers_guide/debugger (I do not have experience with it either). Huge help, right?
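
A lower-effort step before reaching for a debugger might be to turn up TensorFlow's C++ logging through environment variables. The following is a minimal sketch, assuming TF_CPP_MIN_LOG_LEVEL and TF_CPP_MIN_VLOG_LEVEL are honored by the build in use (they are read by TensorFlow 1.x's C++ logging code). The helper name is hypothetical, and there is no guarantee the extra output explains the CUDA failure.

```python
import os


def enable_verbose_tf_logging(vlog_level=1, environ=None):
    """Set TensorFlow's C++ logging variables in `environ` (default: os.environ).

    Call this before `import tensorflow`, since the C++ runtime may cache
    the values the first time it logs.
    """
    if environ is None:
        environ = os.environ
    environ["TF_CPP_MIN_LOG_LEVEL"] = "0"            # 0 = show INFO and above
    environ["TF_CPP_MIN_VLOG_LEVEL"] = str(vlog_level)  # higher = more VLOG detail
    return environ


# Typical use at the very top of the script being debugged:
#   enable_verbose_tf_logging(vlog_level=1)
#   import tensorflow as tf
```

Higher VLOG levels produce a lot of output, so it is probably worth starting at 1 and redirecting stderr to a file.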

That is where I would start. My first step would be finding out whether it is a Jetson-specific issue, then going from there.

@Davidnet
Author

Davidnet commented Jun 5, 2018

I'm now getting an error about CUDA runtime implicit initialization on the GPU:

2018-05-31 01:23:28.378527: I tensorflow/core/common_runtime/gpu/gpu_device.cc:942] 0:   N 
2018-05-31 01:23:28.378766: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 4744 MB memory) -> physical GPU (device: 0, name: NVIDIA Tegra X2, pci bus id: 0000:00:00.0, compute capability: 6.2)
Building Graph
2018-05-31 01:23:40.001179: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1435] Adding visible gpu devices: 0
2018-05-31 01:23:40.001338: E tensorflow/core/common_runtime/direct_session.cc:154] Internal: CUDA runtime implicit initialization on GPU:0 failed. Status: unknown error
Traceback (most recent call last):
  File "real_time_detection.py", line 174, in <module>
    main()

@tfboyd
Member

tfboyd commented Jun 5, 2018

@Davidnet There is likely not much I can debug. I would suggest the following:

  • Share the code you are running. All of it so someone else could cut/paste and run it. If someone has a few minutes to look at the issue they will get a lot farther, before giving up, if they have code to run and a lot of details stated in a concise manner.

  • Be clear about what you tried. In this case, without the code it is hard to know why you saw no initialization message before but you do now. What did you change?

  • Share the full log and command-line that was run. Link to a file or just paste it in.
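
One way to collect the command line and the complete log in a single file for sharing is a small wrapper like the one below. It is a sketch using only the Python standard library; the function name capture_run and the report layout are arbitrary choices, not a TensorFlow convention.

```python
import subprocess
import sys


def capture_run(cmd, report_path):
    """Run `cmd` (a list of arguments), saving the command line, exit code,
    and combined stdout/stderr to `report_path`. Returns the exit code."""
    proc = subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # TensorFlow logs to stderr; merge it in order
        universal_newlines=True,
    )
    output, _ = proc.communicate()
    with open(report_path, "w") as fh:
        fh.write("command: " + " ".join(cmd) + "\n")
        fh.write("exit code: %d\n" % proc.returncode)
        fh.write("---- combined stdout/stderr ----\n")
        fh.write(output)
    return proc.returncode


# Example (script name taken from the traceback in this thread):
#   capture_run([sys.executable, "real_time_detection.py"], "tf_report.txt")
```

The resulting file can be attached to the issue or pasted in full, which avoids truncated tracebacks.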

Finally, keep in mind this is not a help desk; mostly it is about giving you ideas. In the case of the Jetson even more so, as we do not have those sitting around like we do GPUs or CPUs on our local machines.

Do not read this as not wanting to help. Everyone wants to help and the biggest frustration is not having enough information.

@tensorflowbutler
Member

Nagging Assignee @skye: It has been 14 days with no activity and this issue has an assignee. Please update the label and/or status accordingly.
