BindToDevice() binds a graph to the specified (GPU) device, which forces all its operations to be processed on that device. #20412
Conversation
all its operations to be processed on that device. Export to golang as well.
Thanks for the PR @bioothod, but I'd like to understand more of your use case first.

Firstly, could you describe the issue you were running into with using the GPU? By default, if linked with the GPU-enabled version of the TensorFlow C library, the Go program should automatically select a GPU for kernels where this is appropriate. If that is not happening, it seems like something we should fix.

Secondly, the graph construction API as it stands intentionally does not allow nodes to be mutated after they are added to the graph, as this may lead to confusing behavior (e.g., if two sessions are created from the same graph, a mutation made for one would affect the other).

Looking forward to your response.
It should, but it does not. I have a pretty simple golang application which uses default session options, and while I can list all GPU devices on the board, the session is never bound to any of them. It changed probably around 1.6, but I cannot say for sure. Literally the same code binds to random GPUs on 1.4.1 and does not on 1.6.0 or recent master. Previously I used …

But that is only part of the problem; I also want to run a particular session on a particular GPU device among several available. Previously I could tune VisibleDeviceList in the GPU part of the session options, and things worked great. But after 1.6.0 you decided that multiple virtual mappings into the same physical device might confuse some operations (I do not know how that is ever possible, but still), so golang code now panics if you ever touch VisibleDeviceList with anything other than CUDA_VISIBLE_DEVICES (or the empty string).

Hence this patch: now I can create multiple graphs, each of them bound to its own GPU device, and select among them when creating a new session according to my policies. The GraphDef solution is quite heavy for many sessions, and in any case the C++ code has the ability to access graph nodes.
Thanks for the note. Let's separate the two issues. The GPU should work fine in 1.6+. I just tried with 1.9 and it does seem to work, particularly judging by these lines in the output: …
If the same is not happening in the setup you're running, it's worth investigating. So this PR shouldn't be required to enable use of the GPU.

That said, I appreciate that running the same model on multiple devices isn't as smooth as it should be, and that experience can be improved. However, I don't think this approach is the best one. As I mentioned earlier, I'm wary of mutations to the graph, since that can lead to confusing behavior with multiple sessions. Furthermore, the implementation here forces every node to run on GPU, which will be problematic if the graph has nodes that only have CPU kernels (unless the user set allow_soft_placement).

In both cases (whether via the approach in this PR, or via having the process that writes out the graph assign devices before import) we're creating multiple sessions, each with their own copy of the graph. One somewhat ugly workaround is to have the program that creates the graph create a single saved model, with one tag per GPU. Then the Go program can create one session per GPU by providing the right tag to LoadSavedModel.

Thanks for your understanding.
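The one-tag-per-GPU workaround could look roughly like the sketch below. The tag naming scheme (`gpu-0`, `gpu-1`, …) is an invented convention, and the commented-out `tf.LoadSavedModel` call assumes the `github.com/tensorflow/tensorflow/tensorflow/go` package; this is an illustration of the idea, not the actual implementation.

```go
package main

import "fmt"

// tagForGPU maps a GPU index to a SavedModel tag under a hypothetical
// naming convention ("gpu-0", "gpu-1", ...). The exporting program would
// save one MetaGraph per GPU, each pinned to its device, under the
// matching tag.
func tagForGPU(i int) string {
	return fmt.Sprintf("gpu-%d", i)
}

func main() {
	const numGPUs = 2
	for i := 0; i < numGPUs; i++ {
		tag := tagForGPU(i)
		fmt.Println("would load session for tag:", tag)
		// With the real binding this would be something like:
		//   model, err := tf.LoadSavedModel("/path/to/model", []string{tag}, nil)
		// yielding one session per GPU, each with its own copy of the graph.
	}
}
```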
That's not quite what I'm working with; the problem with GPU placement happens when you load a new graph from protobuf and run it in Go. I've made a simple example repo to highlight the problem: https://github.com/bioothod/golang_gpu_example If you clone it into …
This does not happen all the time, though; with this particular graph it is always on CPU. Forcing a graph to run on GPU with CPU-only kernels should not be a problem: it is not a hard constraint, but only a hint, and it would be great if TF emitted some kind of warning in this case. Yet it is MUCH better than running on CPU when all the kernels do have GPU implementations.
Are there any questions you might have concerning this issue? Does my code highlight the problem in your environment?

@bioothod - sorry, I missed your last update. Will take a look at your example soon.

@asimshankar, still no progress on this?

@bioothod - sorry for the delay. I'm traveling right now, but will definitely respond by Tuesday. That said, one quick observation about your example - it seems the graph is operating on …
@bioothod: Took a look at your example and had some comments/observations. As mentioned above, the story will be much different if you're using types other than …

Long story short, the …

Furthermore, going back to my original reservation about adding this mutation to the graph: we've consciously avoided C APIs that mutate existing nodes in the graph, as it can be hard to determine whether or not the mutations apply correctly. For example, consider the following:

```c
TF_Graph* graph = MakeMyGraph();
TF_BindToDevice(graph, "/cpu:0");
TF_Session* session = TF_NewSession(graph, ...);
TF_SessionRun(session, ...);
TF_BindToDevice(graph, "/gpu:0");
TF_SessionRun(session, ...);
```

In this snippet, it is unclear what the second call to TF_BindToDevice does to the already-created session. Alternatives would be the following: …
Sound reasonable?
Thank you for the response @asimshankar! I have to disagree - using …

With this patch, nodes (all, or only float32 for example, I cannot say for sure) are executed on GPU. This is confirmed not only by the device placement log, which you say is misleading (and that's a bug too imho), but also by HW monitoring tools (like …). It could be interesting to check float32 operations (I'm currently away from the servers and cannot run a similar float32 ops test), but what's the point? We have a graph which is supposed to be executed on GPU according to the documentation, and it is not. Hence the patch. As you said, the second call for …

The session config already had this virtual-physical device mapping, and TF developers decided that it must not be used at all - at least the Go bindings crash if the visible device list does not match the per-process device list (CUDA_VISIBLE_DEVICES for instance) or is not empty. The logic that multiple virtual devices will point to the same physical device remains, although I personally never saw operations that require changing physical device properties.
So, let the solution be to extend …
@bioothod: You say that "So, basically, currently TF c/c++/go bindings are broken, all graphs are always executed on CPU.", but this is not true :). One can certainly execute graphs on GPUs from Go, as demonstrated in the gist I previously linked to, and when I change the placeholders to be …

Regarding …

I'm not opposed to adding to … Can you provide an example of such a graph?
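One possible shape for the API addition being discussed - selecting a default device at session creation instead of mutating the shared graph - is sketched below. The types here are local stubs standing in for the real ones in the tensorflow/go package, and the Device field is hypothetical; this shows only the proposed signature, not the binding's actual behavior.

```go
package main

import "fmt"

// Stub stand-ins for types from the tensorflow/go package, used only to
// sketch a possible API shape; they are not the real binding.
type Graph struct{}

type SessionOptions struct {
	Config []byte
	// Device is a hypothetical addition: a default device such as
	// "/device:GPU:1" applied when the session is created, instead of
	// rewriting nodes in the graph after construction.
	Device string
}

type Session struct {
	device string
}

// NewSession sketches session creation that records the requested default
// device on the session itself, leaving the (possibly shared) graph
// untouched.
func NewSession(g *Graph, opts *SessionOptions) (*Session, error) {
	dev := ""
	if opts != nil {
		dev = opts.Device
	}
	return &Session{device: dev}, nil
}

func main() {
	g := &Graph{}
	s, _ := NewSession(g, &SessionOptions{Device: "/device:GPU:1"})
	fmt.Println("session default device:", s.device)
}
```

Because the device lives on the session rather than the graph, two sessions over the same graph can target different GPUs without the mutation hazard described above.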
It has been 14 days with no activity and the …
Sorry for the long delay; I will send an updated patch for review soon. Some other things have distracted me from this task, but I haven't given up on it.
…o BindToDevice_implementation
…s to bind newly created graph, drop TF_BindToDevice(), since (re)binding at runtime is frowned upon upstream
One more `std::string`, but otherwise looks good. Thanks!
Updated patch, but it looks like this reset the merge :) Sorry. And thank you!
PiperOrigin-RevId: 218565776
I intend to temporarily disable this to resolve #23257. (That way, "go get" works by default and some code tweaks are needed to pin devices from Go, instead of the other way around, where "go get" fails by default and a "git checkout" is needed to make it work.)
Yeah, I read that issue. I do not really know how to solve this kind of problem; maybe only by manually splitting the C and other parts, and only merging golang/python/whatever after the C part has been released.
Once the dust settles on …

This hasn't happened often enough to be troublesome, so I'm okay with some manual work for now.
@asimshankar The default execution device is still disabled in the golang API; should it be uncommented now that 1.13.0-rc0 is out?
@bioothod: May be best to wait till 1.13.0 final is out instead of relying on the RC?
Fair enough, let's wait for the 1.13 release to roll out.
@asimshankar hi, is it time to merge the Go changes that were commented out in 6f09a09? Although the documentation for the C library build still references 1.12.
Yes, happy to review and help merge a PR.
This is it: #27891
Export to golang as well.
If you want to implement strict processing of a graph on a specified device (for example, one GPU among multiple processors), one must bind the graph or its separate operations to that device. Only the C++ API somewhat supports that; this patch makes it easier to use and adds helper functions for C and golang.
N.B. No matter which settings you use, currently (master as of Jun 27 and the 1.6.0 release) all golang inference happens on CPU. Before 1.6 one could play with GPUConfig.VisibleDeviceList, but it crashes currently (there are some reasons for that), and in any case always binds to CPU. So, a side effect of this patch is that one can not only tune GPU execution but turn it on again.