-
Couldn't load subscription status.
- Fork 74.9k
[NextPluggableDevice]fix NextPluggableDevice allocator(use pjrt allocator) and Sync() #59898
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[NextPluggableDevice]fix NextPluggableDevice allocator(use pjrt allocator) and Sync() #59898
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the fix!
|
@jyingl3 I saw the PR is stuck with some internal checks Failed, if there is something need to be fixed in my PR, please let me know, thanks. |
|
Thanks Zhoulong! Could you update tensorflow/tensorflow/core/common_runtime/next_pluggable_device/BUILD to include tensorflow/core/tfrt/common:async_value_tensor? Sorry not noticing it earlier. Thank you! |
@jyingl3 done, please have a look. thanks |
|
Hi Zhoulong, we are actively testing some TPU implementation that uses NPD to meet deadline and worry this change may cause breakage. Hope you don't mind if we withhold merging this change temporarily. |
sure, we will wait. Thanks |
|
hi, @jyingl3 , is there any update on this pr? thanks |
| device_ordinal_(options.device_ordinal), | ||
| compilation_device_type_(options.compilation_device_name) { | ||
| allocator_ = std::make_unique<NextPluggableDeviceAllocator>(device_ordinal_); | ||
| if (absl::GetFlag(FLAGS_next_pluggable_device_use_pjrt)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mind adding a new flag FLAGS_next_pluggable_device_use_pjrt_allocator to guard this behavior?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jyingl3 done, please have a review. thanks very much.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approved the change. Thank you so much!
|
Thank you Zhoulong and sorry for the delay. Could you add a new flag so that the allocator can be switched? Left a comment about that. Thank you!! |
use AsyncValueAllocator if context is using PJRT runtime. And fix Sync() to call done callback to avoid hang issue in direct session.