Fix duplicate kernel registration errors when using a Pluggable Device with the GPU type #56707
Conversation
@gbaned @rohan100jain Could we get more eyes on this? This is an issue many users have faced since we released the preview of our plugin, since most people use the `GPU` type.
@PatriceVignola Sure, thank you!
@gbaned @rohan100jain @cheshire Can we get one more look at this PR? We were hoping to get it in the 2.10 release.
@penpornk To add to Pat's comments above, here's the excerpt from the RFC that we're referring to for overriding the `GPU` device name.
Hi @PatriceVignola and @rjdyk, sorry for the delay! The RFC states that there could be more than one device with the same device type. However, the implementation PR sparked concerns about having two keys for kernel lookups. What this PR proposes (allowing the plug-in to override the native `GPU` device) … We will discuss this internally and get back to you. What is your motivation for setting the device type to `GPU`?
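For readers following along, the priority mechanism being discussed can be sketched with a toy model (the class and names below are hypothetical, not TensorFlow's actual `DeviceFactory` code): devices register under a type string with a numeric priority, and placement picks the highest-priority registration, which is how a pluggable `GPU` device could shadow the native one.

```python
# Toy sketch of priority-based device selection (illustrative only;
# not TensorFlow's real device registration API).
class DeviceRegistry:
    def __init__(self):
        # device type -> list of (priority, device name)
        self._devices = {}

    def register(self, device_type, name, priority):
        self._devices.setdefault(device_type, []).append((priority, name))

    def select(self, device_type):
        # The highest-priority registration wins device placement.
        return max(self._devices[device_type])[1]

registry = DeviceRegistry()
registry.register("GPU", "native_gpu", priority=210)
# A pluggable device claiming the same "GPU" type with a higher
# priority shadows the native device for placement purposes.
registry.register("GPU", "pluggable_gpu", priority=220)
print(registry.select("GPU"))  # pluggable_gpu
```

Under this model, user scripts that say `with tf.device("/GPU:0")` keep working unchanged, because the type string they reference stays `GPU`.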
@penpornk Thank you for the details. Our motivation for using the `GPU` device type is that we didn't see any drawbacks, and it makes life easier for the users by not having to change their scripts. It also allows us to leverage a lot of the existing grappler optimizations (for example, like cuDNN, DML is faster with NCHW than NHWC). We were using the … Our assumption was that the … Of course we can always add … At the very least, if this behavior should really be disallowed, I feel like an error should be thrown at device registration time.
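For context on the layout point above: grappler's layout optimizer rewrites graphs between channels-last (NHWC) and channels-first (NCHW). A minimal illustration of the two layouts, using NumPy rather than TensorFlow:

```python
import numpy as np

# A batch of 1 image, 2x2 pixels, 3 channels, in NHWC
# (channels-last) layout.
nhwc = np.zeros((1, 2, 2, 3))

# Converting to NCHW (channels-first), the layout cuDNN and DML
# prefer, is a transpose that moves the channel axis to just
# after the batch axis.
nchw = np.transpose(nhwc, (0, 3, 1, 2))
print(nchw.shape)  # (1, 3, 2, 2)
```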
In response to the behavior of `tensorflow-directml-plugin` not overriding the `GPU` device name in `tensorflow` (tensorflow/tensorflow#56707), we are setting `tensorflow-cpu` as a hard requirement for the time being.
Hi @penpornk, can you please review this PR? Thank you!
Closing stale PR.
One of the supported scenarios outlined in the Pluggable Device RFC is to be able to have a pluggable device with the type `GPU`, and have it completely override the built-in TensorFlow `GPU` device. However, since pluggable device kernels are registered at runtime, here's roughly what is happening:

1. The plugin registers a `GPU` device with the higher priority
2. TensorFlow picks the `GPU` device from the pluggable device for placement
3. The built-in kernels are statically registered for the `GPU` type
4. The plugin's kernel registrations fail because `GPU` kernels with the same priority have already been registered in step 3

Overriding the `GPU` device is not enough. Since the `GPU` kernels are statically registered, we need to unregister them as soon as a pluggable device with the same type comes online.
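The collision above, and the fix this PR proposes (unregistering the statically registered kernels once a pluggable device of the same type comes online), can be sketched with a toy kernel registry. The class below is hypothetical; TensorFlow's real kernel registry is keyed and implemented differently.

```python
# Toy sketch of duplicate kernel registration and the proposed
# unregister-on-plugin-load fix (illustrative only).
class KernelRegistry:
    def __init__(self):
        # (op name, device type, priority) -> kernel label
        self._kernels = {}

    def register(self, op, device_type, priority, label):
        key = (op, device_type, priority)
        if key in self._kernels:
            # The duplicate kernel registration error this PR fixes.
            raise ValueError(f"duplicate registration for {key}")
        self._kernels[key] = label

    def unregister_device_type(self, device_type):
        # Proposed fix: drop the statically registered kernels as soon
        # as a pluggable device with the same type comes online.
        self._kernels = {k: v for k, v in self._kernels.items()
                         if k[1] != device_type}

registry = KernelRegistry()
# Built-in GPU kernels are registered statically at load time.
registry.register("MatMul", "GPU", 1, "builtin")

# Without the fix, the plugin's runtime registration collides:
try:
    registry.register("MatMul", "GPU", 1, "plugin")
except ValueError:
    pass  # duplicate kernel registration error

# With the fix: unregister the built-ins first, then the plugin's
# kernels for the "GPU" type register cleanly.
registry.unregister_device_type("GPU")
registry.register("MatMul", "GPU", 1, "plugin")
```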