
Context in use? #526

Closed

noisychannel opened this issue Dec 16, 2015 · 32 comments

Comments

@noisychannel

Running the following with GPU support:

python convolutional.py

throws the error:
F tensorflow/stream_executor/cuda/cuda_driver.cc:383] Check failed: CUDA_SUCCESS == dynload::cuCtxSetCurrent(context) (0 vs. 216)

Aborted

It seems that error 216 from cuCtxSetCurrent (which, I'm assuming, binds the context to the calling CPU thread) corresponds to CUDA_ERROR_CONTEXT_ALREADY_IN_USE.

What may be causing this error? The script appears to transfer data to the GPU successfully and then fails when initialize_all_variables() is called.
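For reference, here is a quick way to turn the numeric driver code into its symbolic name (a rough sketch; it assumes libcuda.so.1 is loadable and that the installed driver exposes cuGetErrorName):

import ctypes

# Rough sketch: translate a numeric CUDA driver error code into its symbolic
# name via the driver API's cuGetErrorName (no cuInit is needed for this call).
cuda = ctypes.CDLL("libcuda.so.1")

def cuda_error_name(code):
    name = ctypes.c_char_p()
    if cuda.cuGetErrorName(ctypes.c_int(code), ctypes.byref(name)) == 0 and name.value:
        return name.value.decode()
    return "unknown CUDA error %d" % code

print(cuda_error_name(216))  # expected: CUDA_ERROR_CONTEXT_ALREADY_IN_USE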

@noisychannel
Author

The complete log is here: http://pastebin.com/as0fWvYv

@noisychannel
Author

The same issue occurs with tutorials_example_trainer. Log here: http://pastebin.com/vPkFfete

@zheng-xq
Contributor

@noisychannel, could you provide a bit more information about your running environment? I see that you have two K20m cards in your machine. Is this a dedicated machine, or a shared one?


@noisychannel
Author

Tried both scenarios:

  1. Dedicated with 2 K20s.
  2. Shared with 2 K20s and 1 available and selected for use.

Same issue in both cases.

@digitalsword

@noisychannel I have the same issue. Did you solve it?

@noisychannel
Author

No, the issue remains.

@digitalsword

@zheng-xq Is the bug fixed in the recently released TensorFlow 0.7?

@noisychannel
Author

Any updates here?

@zheng-xq
Contributor

I've started an offline conversation with the stream-executor team, since the error originates from stream-executor. Still waiting for their response.

@leary-google, @eliben, anything from the stream-executor side?

@digitalsword

I am still seeing the same CUDA error with TensorFlow 0.7. The error is Check failed: CUDA_SUCCESS == dynload::cuCtxSetCurrent(context) (0 vs. 216)

version:

commit b88971051fbc49fa1e0b91ec1b0b60defa11697e
Merge: 5a30c8f 00986d4
Author: Derek Murray <mrry@google.com>
Date:   Fri Feb 26 05:08:35 2016 -0800

error:

++ python cifar10_train.py
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcurand.so.7.5 locally
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcufft.so.7.5 locally
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcudnn.so.4 locally
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcublas.so.7.5 locally
Filling queue with 20000 CIFAR images before starting to train. This will take a few minutes.
I tensorflow/core/common_runtime/gpu/gpu_init.cc:102] Found device 0 with properties: 
name: Tesla K40m
major: 3 minor: 5 memoryClockRate (GHz) 0.745
pciBusID 0000:08:00.0
Total memory: 11.25GiB
Free memory: 11.15GiB
I tensorflow/core/common_runtime/gpu/gpu_init.cc:126] DMA: 0 
I tensorflow/core/common_runtime/gpu/gpu_init.cc:136] 0:   Y 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:718] Creating TensorFlow device (/gpu:0) -> (device: 0, name: Tesla K40m, pci bus id: 0000:08:00.0)
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:53] Creating bin of max chunk size 256B
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:53] Creating bin of max chunk size 512B
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:53] Creating bin of max chunk size 1.0KiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:53] Creating bin of max chunk size 2.0KiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:53] Creating bin of max chunk size 4.0KiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:53] Creating bin of max chunk size 8.0KiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:53] Creating bin of max chunk size 16.0KiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:53] Creating bin of max chunk size 32.0KiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:53] Creating bin of max chunk size 64.0KiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:53] Creating bin of max chunk size 128.0KiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:53] Creating bin of max chunk size 256.0KiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:53] Creating bin of max chunk size 512.0KiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:53] Creating bin of max chunk size 1.00MiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:53] Creating bin of max chunk size 2.00MiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:53] Creating bin of max chunk size 4.00MiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:53] Creating bin of max chunk size 8.00MiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:53] Creating bin of max chunk size 16.00MiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:53] Creating bin of max chunk size 32.00MiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:53] Creating bin of max chunk size 64.00MiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:53] Creating bin of max chunk size 128.00MiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:53] Creating bin of max chunk size 256.00MiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:107] Allocating 10.60GiB bytes.
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:118] GPU 0 memory begins at 0x13047a0000 extends to 0x15aaa4019a
F tensorflow/stream_executor/cuda/cuda_driver.cc:383] Check failed: CUDA_SUCCESS == dynload::cuCtxSetCurrent(context) (0 vs. 216)
./run.sh: line 4: 33880 Aborted                 python cifar10_train.py

@rlrs

rlrs commented Mar 9, 2016

Indeed, I sometimes see the same error on a shared K40. It seems to happen after someone else's job has completed, as if the context is somehow not cleared. I am sure that no job is actually executing on the GPU at the time.

@crscardellino

I am having the same issue with GPU:1. I can run without problems on GPU:0, but when I try to force the graph onto GPU:1 using Graph.device(), I get the following: http://pastebin.com/ekTgqJ0U
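For reference, the placement pattern in question looks roughly like this (a simplified sketch, not the actual code from the pastebin):

import tensorflow as tf

# Simplified sketch: pin the graph to the second GPU. allow_soft_placement
# lets ops without a GPU kernel fall back to the CPU instead of failing.
with tf.device('/gpu:1'):
    a = tf.constant([1.0, 2.0], name='a')
    b = tf.constant([3.0, 4.0], name='b')
    c = a + b

config = tf.ConfigProto(allow_soft_placement=True, log_device_placement=True)
with tf.Session(config=config) as sess:
    print(sess.run(c))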

@zxvix

zxvix commented Apr 3, 2016

I encountered the error Check failed: CUDA_SUCCESS == dynload::cuCtxSetCurrent(context) (0 vs. 216) recently, which turned out to be caused by someone accidentally setting the compute mode of the GPU to EXCLUSIVE_THREAD. Reverting it to DEFAULT solved my error.
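A quick way to catch this before launching a job is to query the compute mode first, e.g. (a rough sketch; it assumes nvidia-smi is on the PATH and that the installed driver supports the compute_mode query property):

import subprocess

# Rough sketch: warn if any GPU is in a compute mode TensorFlow cannot use.
# The exact value strings can vary slightly between driver versions, so they
# are normalized before comparison.
out = subprocess.check_output(
    ['nvidia-smi', '--query-gpu=index,name,compute_mode', '--format=csv,noheader'])
for line in out.decode().splitlines():
    idx, name, mode = [field.strip() for field in line.split(',')]
    normalized = mode.lower().replace(' ', '_').replace('.', '')
    if normalized not in ('default', 'exclusive_process'):
        print('GPU %s (%s) is in compute mode %r; set it to DEFAULT or '
              'EXCLUSIVE_PROCESS before running TensorFlow' % (idx, name, mode))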

@girving
Contributor

girving commented Jun 6, 2016

@zheng-xq: Should we contact the Stream Executor folks offline? It looks like they might not have GitHub notifications turned on.

girving added the triaged label Jun 6, 2016
@noisychannel
Author

Issue still exists in 0.9.

@noisychannel
Author

Out of curiosity, how is this not a bigger issue? Is there a specific condition under which this failure occurs? The process seems to crash regardless of whether all of the machine's GPUs, or only some of them, are available.

How do other people get around this?

@danpovey

Regarding @zxvix's comment that the Check failed: CUDA_SUCCESS == dynload::cuCtxSetCurrent(context) (0 vs. 216) error turned out to be caused by the GPU compute mode being set to EXCLUSIVE_THREAD, and that reverting it to DEFAULT fixed it:

Sometimes on shared clusters there are valid reasons for setting exclusive mode on GPUs. Does TensorFlow require particular compute modes? Is EXCLUSIVE_PROCESS a possibility?

zheng-xq assigned zheng-xq and unassigned zheng-xq Jun 16, 2016
@zheng-xq
Contributor

Adding @henline, who is the owner of stream-executor.

aselle removed the triaged label Jul 28, 2016
@kramimus

I am seeing a similar issue on 0.9 compiled from source, HEAD:

commit 554ddd9ad2d4abad5a9a31f2d245f0b1012f0d10
Merge: 89e1cc5 a0745a7
Author: yifeif <fengyifei2026@gmail.com>
Date:   Tue Jul 26 16:17:21 2016 -0700

I am on a shared cluster with a scheduler, so I should have exclusive access to the node during my time slice. It looks like exclusive mode is set, but there are no running processes at the time I try to use it:

[2016-07-28T17:59:19Z]: +------------------------------------------------------+                       
[2016-07-28T17:59:19Z]: | NVIDIA-SMI 352.39     Driver Version: 352.39         |                       
[2016-07-28T17:59:19Z]: |-------------------------------+----------------------+----------------------+
[2016-07-28T17:59:19Z]: | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
[2016-07-28T17:59:19Z]: | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
[2016-07-28T17:59:19Z]: |===============================+======================+======================|
[2016-07-28T17:59:19Z]: |   0  Tesla K40m          Off  | 0000:08:00.0     Off |                    0 |
[2016-07-28T17:59:19Z]: | N/A   25C    P8    19W / 235W |     23MiB / 11519MiB |      0%    E. Thread |
[2016-07-28T17:59:19Z]: +-------------------------------+----------------------+----------------------+
[2016-07-28T17:59:19Z]:                                                                                
[2016-07-28T17:59:19Z]: +-----------------------------------------------------------------------------+
[2016-07-28T17:59:19Z]: | Processes:                                                       GPU Memory |
[2016-07-28T17:59:19Z]: |  GPU       PID  Type  Process name                               Usage      |
[2016-07-28T17:59:19Z]: |=============================================================================|
[2016-07-28T17:59:19Z]: |  No running processes found                                                 |
[2016-07-28T17:59:19Z]: +-----------------------------------------------------------------------------+
[2016-07-28T17:59:21Z]: Using TensorFlow backend.
[2016-07-28T17:59:22Z]: I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcurand.so.7.5 locally
[2016-07-28T17:59:22Z]: I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcuda.so.1 locally
[2016-07-28T17:59:22Z]: I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcufft.so.7.5 locally
[2016-07-28T17:59:23Z]: I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcudnn.so.4.0.7 locally
[2016-07-28T17:59:23Z]: I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcublas.so.7.5 locally
[2016-07-28T17:59:34Z]: I tensorflow/core/common_runtime/gpu/gpu_init.cc:102] Found device 0 with properties: 
[2016-07-28T17:59:34Z]: name: Tesla K40m
[2016-07-28T17:59:34Z]: major: 3 minor: 5 memoryClockRate (GHz) 0.745
[2016-07-28T17:59:34Z]: pciBusID 0000:08:00.0
[2016-07-28T17:59:34Z]: Total memory: 11.25GiB
[2016-07-28T17:59:34Z]: Free memory: 11.15GiB
[2016-07-28T17:59:34Z]: I tensorflow/core/common_runtime/gpu/gpu_init.cc:126] DMA: 0 
[2016-07-28T17:59:34Z]: I tensorflow/core/common_runtime/gpu/gpu_init.cc:136] 0:   Y 
[2016-07-28T17:59:34Z]: I tensorflow/core/common_runtime/gpu/gpu_device.cc:839] Creating TensorFlow device (/gpu:0) -> (device: 0, name: Tesla K40m, pci bus id: 0000:08:00.0)
[2016-07-28T17:59:34Z]: F tensorflow/stream_executor/cuda/cuda_driver.cc:395] Check failed: CUDA_SUCCESS == dynload::cuCtxSetCurrent(cuda_context->context()) (0 vs. 216)
[2016-07-28T17:59:34Z]: X_train shape: (60000, 1, 28, 28)
[2016-07-28T17:59:34Z]: 60000 train samples
[2016-07-28T17:59:34Z]: 10000 test samples
[2016-07-28T17:59:41Z]: /tmp/wrapper5271742824235601482.sh: line 12: 61655 Aborted                 (core dumped) python mnist_cnn.py
[2016-07-28T17:59:41Z]: Exited with code 0

@danpovey

TensorFlow guys, if I were you I would change the assert statement that fails here:

cuda/cuda_driver.cc:395] Check failed: CUDA_SUCCESS == dynload::cuCtxSetCurrent(cuda_context->context()) (0 vs. 216)

to code that prints out the text form of the CUDA error code, and maybe, for good measure, invokes nvidia-smi to get extra information. We found this necessary in Kaldi to ensure that when there are problems, all the information needed is in the log.
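In the meantime, a wrapper along these lines at the job level (a rough sketch; the training-script name is just a placeholder) at least gets that extra information into the same log when a run aborts:

import subprocess
import sys

# Rough sketch: run the training script, and if it dies, dump nvidia-smi so
# the compute mode and any competing processes end up in the same log file.
ret = subprocess.call([sys.executable, 'cifar10_train.py'])  # placeholder script
if ret != 0:
    print('Training script exited with code %d; nvidia-smi output follows:' % ret)
    sys.stdout.flush()
    subprocess.call(['nvidia-smi'])
    subprocess.call(['nvidia-smi', '-q', '-d', 'COMPUTE'])
sys.exit(ret)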
Dan


@henline

henline commented Jul 29, 2016

Hi, I'm the owner of StreamExecutor. Sorry for arriving late to this discussion.

I believe this problem is caused in all cases by GPUs with their compute mode set to EXCLUSIVE_THREAD (just as mentioned by @zxvix). The solution is to set the compute mode to DEFAULT or EXCLUSIVE_PROCESS, which can be done via one of the following commands:

$ nvidia-smi --compute-mode=0 # for DEFAULT
$ nvidia-smi --compute-mode=3 # for EXCLUSIVE_PROCESS

The nvidia-smi -q command can also be used to query the current compute mode of the device.

If anyone is seeing this error when the device compute mode is either DEFAULT or EXCLUSIVE_PROCESS, please let me know, because I don't think that should be possible.

StreamExecutor will not work in either EXCLUSIVE_THREAD or PROHIBITED compute mode, but in response to @danpovey's question about shared clusters, EXCLUSIVE_PROCESS mode should be fine.

There are no plans in StreamExecutor to support EXCLUSIVE_THREAD mode because it is listed as deprecated in the nvidia-smi help message. There are also no plans to support PROHIBITED mode because I think that mode prevents the creation of contexts, and StreamExecutor cannot function with that restriction.

In response to @danpovey's suggestion about adding a better error message for this case, I think that's a good idea. I will work on getting a patch up to warn about the device compute mode if cuCtxSetCurrent fails and the compute mode is set to an unsupported setting.

@kramimus

Thanks, good to know it should work in EXCLUSIVE_PROCESS.

@alextp
Contributor

alextp commented Aug 15, 2016

So it sounds like this is working as intended, since StreamExecutor doesn't plan on supporting compute modes other than DEFAULT and EXCLUSIVE_PROCESS.

alextp closed this as completed Aug 15, 2016
@noisychannel
Author

Just to confirm, StreamExecutor works with EXCLUSIVE_PROCESS. Hopefully @danpovey's suggestion about better error messages will be added soon; otherwise it may be hard for people to find this issue by searching.

@MidoAssran

For anyone using GPU-based TensorFlow on Compute Canada resources, submitting the job with EXCLUSIVE_PROCESS specified worked for me.

@zafarali
Contributor

@MidoAssran Thanks for that! Will try it out.

@201power

201power commented Oct 3, 2017

I am hitting this error:
[screenshot of the error]
Changing the compute mode does not solve the issue. Any guidance?

@danpovey

danpovey commented Oct 3, 2017 via email

@201power

201power commented Oct 4, 2017

The GPU might have been in use. This happened when one TF session was followed immediately by another TF session.
I am not sure how to tell whether the GPU is in use, since TF won't release its memory after the first session finishes.
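One workaround (a rough sketch, not tied to any particular script above) is to run each session in its own process, so all of its GPU memory is released when that process exits, before the next session starts:

import multiprocessing

def train_once(job_name):
    # Import TF inside the child: each session then lives in its own process,
    # and its GPU memory is fully released when that process exits.
    import tensorflow as tf
    with tf.Session() as sess:
        # ... build the graph and run the training loop for this job ...
        print(job_name, sess.run(tf.constant(0)))

if __name__ == '__main__':
    for job in ('first_run', 'second_run'):
        p = multiprocessing.Process(target=train_once, args=(job,))
        p.start()
        p.join()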

@dee6600

dee6600 commented Feb 27, 2018

Changing the compute mode didn't solve the problem; I am getting the same error. My cards are two Tesla K80s, with driver version 375, CUDA version 8.0, and TensorFlow version 1.1.

@tumusudheer

I'm getting the same error (or a similar one):
2019-01-26 01:22:06.728780: F tensorflow/stream_executor/cuda/cuda_driver.cc:206] Check failed: CUDA_SUCCESS == cuCtxSetCurrent(cuda_context->context()) (0 vs. 4)
Aborted (core dumped)

A sample of my nvidia-smi output:

Sat Jan 26 01:22:29 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 410.48                 Driver Version: 410.48                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce RTX 2080    Off  | 00000000:17:00.0 Off |                  N/A |
|  0%   42C    P8     1W / 225W |      0MiB /  7952MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce RTX 2080    Off  | 00000000:65:00.0  On |                  N/A |
|  0%   47C    P8    13W / 225W |    166MiB /  7951MiB |      6%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    1      1101    G   /usr/lib/xorg/Xorg                             165MiB |
+-----------------------------------------------------------------------------+

I'm running TF version 1.12 on Ubuntu 18.04 with CUDA 10.0 and cuDNN 7.3.1.

@zhouchaopku

zhouchaopku commented Jan 15, 2020

@henline Sorry to trouble you. I have set the compute mode to DEFAULT (nvidia-smi --compute-mode=0), and nvidia-smi -q confirms that it took effect, but I still get the error "tensorflow/stream_executor/cuda/cuda_driver.cc:225] Check failed: CUDA_SUCCESS == cuCtxSetCurrent(cuda_context->context()) (0 vs. 3)".

darkbuck pushed a commit to darkbuck/tensorflow that referenced this issue Jan 23, 2020
…int8_pooling

Disable qint8 forward pooling on ROCm properly.