Context in use? #526
Comments
The complete log is here: http://pastebin.com/as0fWvYv |
And the same issue with tutorials_example_trainer. Log here: http://pastebin.com/vPkFfete |
@noisychannel, could you provide a bit more information about your running environment? |
Tried both scenarios; same issue in both cases. |
@noisychannel I have the same issue. Did you solve it? |
No, the issue remains. |
@zheng-xq Is the bug fixed in the recently released tensorflow 0.7? |
Any updates here? |
I've started an offline conversation with the stream-executor team, since the error originates from stream-executor. Still waiting for their response. @leary-google, @eliben, anything from the stream-executor side? |
I am still seeing the same CUDA error with TensorFlow 0.7. |
Indeed, I am seeing the same error sometimes on a shared K40. It seems to happen when someone else has completed a job, and somehow the context is not cleared? I am sure that no job is actually executing on the GPU at the time. |
I am having the same issue with GPU:1. I can run without problems on GPU:0, but when I try to force the graph onto GPU:1 using Graph.device(), I get the following: http://pastebin.com/ekTgqJ0U |
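For context, a minimal sketch of the kind of device pinning described in the comment above, using the TF 0.x/1.x tf.device context manager (the commenter's actual model and session setup are not shown, so the graph here is purely illustrative):

```python
import tensorflow as tf

# Illustrative only: pin a tiny graph to the second GPU, the way
# Graph.device()/tf.device is used in the comment above.
with tf.device('/gpu:1'):
    a = tf.constant([[1.0, 2.0], [3.0, 4.0]])
    b = tf.constant([[1.0, 1.0], [0.0, 1.0]])
    c = tf.matmul(a, b)

# log_device_placement makes the chosen devices visible; allow_soft_placement
# lets TF fall back to another device if an op has no kernel for /gpu:1.
config = tf.ConfigProto(allow_soft_placement=True, log_device_placement=True)
with tf.Session(config=config) as sess:
    print(sess.run(c))
```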
I encountered the error "Check failed: CUDA_SUCCESS == dynload::cuCtxSetCurrent(context) (0 vs. 216)" recently, which turned out to be caused by someone accidentally setting the compute mode of the GPU to EXCLUSIVE_THREAD. Reverting it back to DEFAULT solved my error. |
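A quick way to check which compute mode each GPU is in. This is a sketch, not part of the original comment; it assumes nvidia-smi is on the PATH and supports the --query-gpu fields shown:

```python
import subprocess

# Print index, name and compute mode for every visible GPU.
out = subprocess.check_output(
    ["nvidia-smi", "--query-gpu=index,name,compute_mode", "--format=csv,noheader"]
).decode()
print(out)
# Modes other than "Default" or "Exclusive_Process" (e.g. "Exclusive_Thread",
# "Prohibited") are the ones implicated in this thread.
```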
@zheng-xq: Should we contact the StreamExecutor folks offline? It looks like they might not have GitHub notifications turned on. |
Issue still exists in 0.9. |
Out of curiosity, how is this not a bigger issue? Is there a specific condition under which this failure occurs? The process seems to crash whether all or only some of the machine's GPUs are available. How do other people get around this? |
Regarding @zxvix's comment that the error was caused by the GPU compute mode being set to EXCLUSIVE_THREAD and fixed by reverting it to DEFAULT: sometimes on shared clusters there are valid reasons for setting exclusive mode for GPUs. Does TensorFlow require particular modes? Is EXCLUSIVE_PROCESS a possibility? |
Adding @henline, who is the owner of stream-executor. |
I am seeing a similar issue on 0.9 compiled from source at HEAD. I am on a shared cluster with a scheduler, so I should have exclusive access to the node during my time slice. It looks like exclusive mode is set, but there are no running processes at the time I try to use it. |
TensorFlow folks, if I were you I would change the assert at cuda/cuda_driver.cc:395 ("Check failed: CUDA_SUCCESS == ...") to code that prints out the text form of the CUDA exit code, and maybe a more helpful message. |
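A sketch of the kind of translation being suggested: mapping the raw driver-API status code to its symbolic name. This is not StreamExecutor code; it assumes libcuda.so.1 is loadable and uses the driver's cuGetErrorName call purely for illustration.

```python
import ctypes

libcuda = ctypes.CDLL("libcuda.so.1")

def cuda_error_name(code):
    """Return the symbolic name for a CUDA driver status code, e.g. 216."""
    name = ctypes.c_char_p()
    if libcuda.cuGetErrorName(code, ctypes.byref(name)) == 0 and name.value:
        return name.value.decode()
    return "unknown CUDA error %d" % code

# The code reported in this issue:
print(cuda_error_name(216))  # expected: CUDA_ERROR_CONTEXT_ALREADY_IN_USE
```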
Hi, I'm the owner of StreamExecutor. Sorry for arriving late to this discussion. I believe this problem is caused in all cases by GPUs with their compute mode set to EXCLUSIVE_THREAD (just as mentioned by @zxvix). The solution is to set the compute mode to DEFAULT or EXCLUSIVE_PROCESS, which can be done via nvidia-smi's compute-mode option (a sketch follows below).
If anyone is seeing this error when the device compute mode is either DEFAULT or EXCLUSIVE_PROCESS, please let me know, because I don't think that should be possible. StreamExecutor will not work in either EXCLUSIVE_THREAD or PROHIBITED compute mode, but in response to @danpovey's question about shared clusters, EXCLUSIVE_PROCESS mode should be fine. There are no plans in StreamExecutor to support EXCLUSIVE_THREAD mode because it is listed as deprecated in the nvidia-smi help message. There are also no plans to support PROHIBITED mode because I think that mode prevents the creation of contexts, and StreamExecutor cannot function with that restriction.
In response to @danpovey's suggestion about adding a better error message for this case, I think that's a good idea. I will work on getting a patch up to warn about the device compute mode if it is set to an unsupported value. |
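A sketch, assuming nvidia-smi's -c/--compute-mode option (typically requires root), of switching a GPU to one of the supported modes; these invocations are illustrative rather than taken from the comment above:

```python
import subprocess

def set_compute_mode(mode="DEFAULT", gpu_index=None):
    """Set the GPU compute mode via nvidia-smi, e.g. "DEFAULT" or "EXCLUSIVE_PROCESS"."""
    cmd = ["nvidia-smi", "-c", mode]
    if gpu_index is not None:
        # Restrict the change to one GPU instead of all of them.
        cmd = ["nvidia-smi", "-i", str(gpu_index), "-c", mode]
    subprocess.check_call(cmd)

# set_compute_mode("DEFAULT")                          # all GPUs
# set_compute_mode("EXCLUSIVE_PROCESS", gpu_index=0)   # a single GPU
```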
Thanks, good to know it should work in EXCLUSIVE_PROCESS. |
So it sounds like this is working as intended, since StreamExecutor doesn't plan on supporting modes other than EXCLUSIVE_PROCESS. |
Just to confirm, StreamExecutor works with EXCLUSIVE_PROCESS. Hopefully, @danpovey's suggestion about better error messages will be added in soon. It may be hard for people to search for this issue. |
For anyone using GPU based tensorflow on Compute Canada resources, submitting the job by specifying EXCLUSIVE_PROCESS worked for me. |
@MidoAssran Thanks for that! Will try it out. |
I meet this error (screenshot: https://user-images.githubusercontent.com/7299296/31105041-dfe35b10-a795-11e7-9d64-22c460ebaefd.png); changing compute_mode does not solve the issue, any guidance? |
I can't really see what it's saying (next time paste as text), but in any case I don't know what this might be. Maybe all the GPUs were already in use. |
The GPU might be in use. This happened in a TF session followed immediately by another TF session. |
Changing the compute mode didn't solve the problem; I am getting the same error. My cards are two Tesla K80s. |
I'm getting the same error (or a similar one). [output table omitted] I'm running TF version 1.12 on Ubuntu 18.04 with CUDA 10.0 and cuDNN 7.3.1. |
@henline Sorry to trouble you. I have set the mode to DEFAULT (nvidia-smi --compute-mode=0), and nvidia-smi -q confirms that it has taken effect. But I still get the error "tensorflow/stream_executor/cuda/cuda_driver.cc:225] Check failed: CUDA_SUCCESS == cuCtxSetCurrent(cuda_context->context()) (0 vs. 3)". |
Original issue:
Running the following with GPU support:
python convolutional.py
throws the error:
F tensorflow/stream_executor/cuda/cuda_driver.cc:383] Check failed: CUDA_SUCCESS == dynload::cuCtxSetCurrent(context) (0 vs. 216)
Aborted
It seems like error 216 from cuCtxSetCurrent (which I'm assuming assigns the context to the calling CPU thread) corresponds to CUDA_ERROR_CONTEXT_ALREADY_IN_USE.
What may be causing this error? It seems like the script successfully transfers data to the GPU and fails when initialize_all_variables() is called.
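A minimal sketch of the failure point described above, using the TF 0.x-era API. This is not the actual convolutional.py from the TensorFlow models; the variable shapes are made up for illustration:

```python
import tensorflow as tf

# A couple of variables standing in for the convolutional model's parameters.
w = tf.Variable(tf.truncated_normal([5, 5, 1, 32], stddev=0.1))
b = tf.Variable(tf.zeros([32]))

with tf.Session() as sess:
    # In the reported setup, the crash happens here: the CHECK in
    # cuda_driver.cc fires with "(0 vs. 216)" when the variables are initialized.
    sess.run(tf.initialize_all_variables())
```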