
os.environ["CUDA_DEVICE_ORDER"] not work #25152

Closed
xiongma opened this issue Jan 24, 2019 · 32 comments

@xiongma
xiongma commented Jan 24, 2019

I have two GPUs and want to create two graphs, one per GPU: the first graph on the first GPU and the second graph on the second GPU.

1. When I create the first graph, I set os.environ["CUDA_DEVICE_ORDER"] = '0'.
2. When I create the second graph, I set os.environ["CUDA_DEVICE_ORDER"] = '1', but the second graph is still created on the first GPU. I have tried many different ways, but it still does not work.

Is this a bug?

@xiongma
xiongma commented Jan 24, 2019

Can anyone help me, please?

@DavidWiesner
DavidWiesner commented Jan 25, 2019

You want to specify the visible devices, not the device order. Set
os.environ["CUDA_VISIBLE_DEVICES"] = "0" before the tensorflow import.
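A minimal sketch of this fix. Both environment variable names are real CUDA runtime variables; the TensorFlow import is shown commented out because the only requirement this illustrates is the ordering:

```python
import os

# These variables are read once, when the CUDA runtime initializes, so they
# must be set before TensorFlow (or anything that imports it) is loaded.
os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"   # enumerate GPUs by PCI bus ID
os.environ["CUDA_VISIBLE_DEVICES"] = "0"         # expose only the first GPU

# import tensorflow as tf   # only now is it safe to import TensorFlow
```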

@xiongma
xiongma commented Jan 25, 2019

I will try, thank you.

@xiongma
xiongma commented Jan 25, 2019

@DavidWiesner This is my code, but it still doesn't work; the same thing happens. Can you help me?
[screenshots of code]

@DavidWiesner

Bert also imports tensorflow, so set your environment variables before import bert.

@xiongma
xiongma commented Jan 25, 2019

@DavidWiesner I changed my code, but the same thing still happens. This is my code after the change:
[screenshots of code]

@xiongma
xiongma commented Jan 25, 2019

@DavidWiesner This is the first page; it was missing above.
[screenshot]

@xiongma
xiongma commented Jan 25, 2019

This is the runtime log:
[screenshot]

@xiongma
xiongma commented Jan 25, 2019

@DavidWiesner Please help me; this is very important to me. This problem has perplexed me for two days.
Thank you very much!

@DavidWiesner

It is working: in the log you can see that only one GPU will be used.

@xiongma
xiongma commented Jan 26, 2019

@DavidWiesner Sorry it took me so long to get back to you. In my code I set a different GPU number for each class instance I create. I create four instances and want the first instance to run on the first GPU, the second on the second GPU, the third on the third, and the fourth on the fourth, but in my log all four instances still run on the first GPU. I have tried many times, and it is always the same.

@xiongma

xiongma commented Jan 26, 2019

@omalleyt12 Can you help me?

@xiongma

xiongma commented Jan 27, 2019

@guptapriya Can you help me?

@guptapriya

Hi @policeme - have you tried creating your graph under a with tf.device('/device:GPU:1'): context?
Since you're trying to use different GPUs in the same process, that is likely a better way to specify GPUs than the environment variables (which you can then leave at their defaults and let TF detect all of them).
This page has more information on how to use this API: https://www.tensorflow.org/guide/using_gpu

Hope this helps.
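A minimal sketch of this approach for the multi-instance case described above. The `device_for` helper is hypothetical, and the TF 1.x usage (current when this thread was opened) is shown in comments, since only the device-string convention is being illustrated:

```python
# One process, several GPUs: pin each model instance's ops explicitly
# instead of changing CUDA_VISIBLE_DEVICES between instances.
def device_for(instance_index):
    """Return the TF device string for the i-th instance (hypothetical helper)."""
    return "/device:GPU:%d" % instance_index

# TF 1.x usage (sketch):
# import tensorflow as tf
# with tf.device(device_for(1)):
#     logits = build_graph(features)   # ops created here are pinned to GPU:1
```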

@xiongma
xiongma commented Jan 28, 2019

Hi @guptapriya, that didn't work either. Here are my code and runtime log; please help me.
[screenshots of code and log]

@guptapriya

Can you try a simple example first, such as the one from the guide, and see whether you can get ops onto different GPUs there? I wonder if TF is not detecting the multiple GPUs at all for some reason. It would be easier to debug with a small, simple code snippet.

@xiongma
xiongma commented Jan 28, 2019

@guptapriya In this program it works. I have 1 GPU; when I set GPU:1, an exception is raised. But I don't know why my first program doesn't work. Can you tell me why?
[screenshot]

@xiongma
xiongma commented Jan 28, 2019

[screenshots of code]

@xiongma
xiongma commented Jan 28, 2019

@guptapriya This is the runtime log.

@guptapriya

Do you only have 1 GPU? If yes, do you mean to ask why your original program didn't throw an exception when you tried to use other GPUs?

@xiongma
xiongma commented Jan 29, 2019

Yes.

@xiongma
xiongma commented Jan 29, 2019

If you want the original code, I can put it on GitHub.

@guptapriya

Can you check the value of allow_soft_placement in both cases? If it is true, then everything will be placed on GPU:0 (when you have 1 GPU). See more details on the same page I mentioned before: https://www.tensorflow.org/guide/using_gpu
and here:
https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/protobuf/config.proto#L379
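For reference, a sketch of how this flag is set in the TF 1.x session config (assuming the 1.x `ConfigProto`/`Session` API current at the time of this thread). With soft placement enabled, an op pinned to a missing device silently falls back to an available device instead of raising:

```python
import tensorflow as tf  # TF 1.x

config = tf.ConfigProto(
    allow_soft_placement=False,  # fail loudly if the requested device is absent
    log_device_placement=True,   # log the device each op is actually placed on
)
sess = tf.Session(config=config)
```

Enabling log_device_placement is a quick way to confirm where each op really landed, which is what this thread is trying to diagnose.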

@xiongma
xiongma commented Jan 29, 2019

But when I removed allow_soft_placement from my code and placed my graph on the first GPU, this exception appeared:
[screenshot]

@guptapriya

That's a different error: you're trying to place the saver ops on the GPU, which will not work.

Also, it doesn't look like a bug is being reported here, so I will close this ticket now. I believe Stack Overflow is a better venue for usage questions such as this.

@elexira
elexira commented May 12, 2019

Please do not close this issue. How is this not a bug? With all these people having the same issue, the least you can do is help or leave it open.

@guptapriya

@jaingaurav Can you help look into this, see if there is a bug, and maybe provide the recommended APIs?

@YiLing28

@policeme Hi, have you solved this problem? I have also encountered it.

@xiongma

xiongma commented Sep 30, 2019 via email

@shamangary
shamangary commented May 15, 2020

I am facing a similar issue using PyTorch.
os.environ['CUDA_VISIBLE_DEVICES'] = args.gpu somehow works in some code and fails in other code.

You may change your run command from

# this might fail with os.environ['CUDA_VISIBLE_DEVICES'] = args.gpu
python main.py

to

# this works
CUDA_VISIBLE_DEVICES="3,4,5" python main.py

However, I don't know why some codebases work and some don't.

@xiongma xiongma closed this as completed May 16, 2020
@fengz63
fengz63 commented Jun 29, 2020

@shamangary You need to set it before the first use of CUDA, not after. The following works well for me:
[screenshot]
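The timing constraint above can be sketched as follows. The torch lines are commented out because they only mark where the cutoff is; everything else is standard library:

```python
import os

# CUDA_VISIBLE_DEVICES is captured when the CUDA context is created, which
# happens on the first CUDA call (e.g. torch.cuda.is_available() or
# tensor.cuda()), not when torch is imported. Setting it any later than
# that first call has no effect, which explains the mixed results above.
os.environ["CUDA_VISIBLE_DEVICES"] = "3,4,5"

# import torch
# torch.cuda.device_count()   # now sees 3 devices, renumbered 0, 1, 2
```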
