Conversation

@jzhoulon
Contributor

@jzhoulon jzhoulon commented Mar 5, 2023

Use AsyncValueAllocator when the context uses the PJRT runtime. Also fix Sync() to invoke the done callback, which avoids a hang in direct session.
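
For readers following the Sync() part of the description, here is a minimal, self-contained sketch of the callback contract involved. It is not the code from this PR: `SketchStatus`, `SketchDoneCallback`, `SketchDevice`, and `RunStepAndWait` are hypothetical names made up for illustration. The point is only that a callback-style Sync() must invoke `done` on every path, because a caller that waits on that callback would otherwise never resume.

```cpp
#include <functional>
#include <string>

// Hypothetical stand-in for a status type; not TensorFlow's Status.
struct SketchStatus {
  bool ok = true;
  std::string message;
};

// Hypothetical callback type, invoked once synchronization has finished.
using SketchDoneCallback = std::function<void(const SketchStatus&)>;

// Hypothetical device used only to illustrate the callback contract.
class SketchDevice {
 public:
  // Blocking variant: returns once outstanding work has finished.
  SketchStatus Sync() {
    ++sync_calls_;
    return SketchStatus{};
  }

  // Callback variant: forwards the blocking result to `done`.
  // Omitting the call to `done` is the kind of bug that leaves a caller waiting forever.
  void Sync(const SketchDoneCallback& done) { done(Sync()); }

  int sync_calls() const { return sync_calls_; }

 private:
  int sync_calls_ = 0;
};

// A caller in the style of a session step that cannot finish until `done` fires.
// A real caller would block on a notification; this sketch just reports
// whether the callback ran and carried an OK status.
bool RunStepAndWait(SketchDevice& device) {
  bool finished = false;
  device.Sync([&finished](const SketchStatus& s) { finished = s.ok; });
  return finished;
}
```

In this sketch `RunStepAndWait` returns true only because `Sync(done)` actually calls `done`. If the body of that overload were empty, `finished` would stay false, which is the synchronous analogue of the hang described above.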

@google-ml-butler google-ml-butler bot added the size:S CL Change Size: Small label Mar 5, 2023
@jzhoulon jzhoulon changed the title from "fix NextPluggableDevice allocator(use pjrt allocator) and Sync()" to "[NextPluggableDevice]fix NextPluggableDevice allocator(use pjrt allocator) and Sync()" Mar 5, 2023
@github-actions github-actions bot added the kokoro:force-run Tests on submitted change label Mar 5, 2023
@kokoro-team kokoro-team removed the kokoro:force-run Tests on submitted change label Mar 5, 2023
@jzhoulon
Contributor Author

jzhoulon commented Mar 5, 2023

@jyingl3 @penpornk could you take a look? Thanks.

@gbaned gbaned added the comp:core issues related to core part of tensorflow label Mar 6, 2023
@gbaned gbaned requested a review from cantonios March 6, 2023 18:34
@google-ml-butler google-ml-butler bot added the awaiting review Pull request awaiting review label Mar 6, 2023
@gbaned gbaned requested review from jyingl3 and removed request for cantonios March 6, 2023 18:34

@jyingl3 jyingl3 left a comment

Thank you for the fix!

@google-ml-butler google-ml-butler bot added kokoro:force-run Tests on submitted change ready to pull PR ready for merge process labels Mar 6, 2023
@kokoro-team kokoro-team removed the kokoro:force-run Tests on submitted change label Mar 6, 2023
@jzhoulon
Contributor Author

jzhoulon commented Mar 9, 2023

@jyingl3 I see the PR is stuck because some internal checks failed. If anything in my PR needs to be fixed, please let me know. Thanks.

@jyingl3

jyingl3 commented Mar 9, 2023

Thanks Zhoulong! Could you update tensorflow/tensorflow/core/common_runtime/next_pluggable_device/BUILD to include tensorflow/core/tfrt/common:async_value_tensor? Sorry for not noticing it earlier. Thank you!

@google-ml-butler google-ml-butler bot removed the ready to pull PR ready for merge process label Mar 10, 2023
@github-actions github-actions bot added the kokoro:force-run Tests on submitted change label Mar 10, 2023
@kokoro-team kokoro-team removed kokoro:force-run Tests on submitted change labels Mar 10, 2023
@jzhoulon
Contributor Author

> Thanks Zhoulong! Could you update tensorflow/tensorflow/core/common_runtime/next_pluggable_device/BUILD to include tensorflow/core/tfrt/common:async_value_tensor? Sorry not noticing it earlier. Thank you!

@jyingl3 Done, please take a look. Thanks.

@github-actions github-actions bot added the kokoro:force-run Tests on submitted change label Mar 14, 2023
@kokoro-team kokoro-team removed the kokoro:force-run Tests on submitted change label Mar 14, 2023
@github-actions github-actions bot added the kokoro:force-run Tests on submitted change label Mar 14, 2023
@kokoro-team kokoro-team removed the kokoro:force-run Tests on submitted change label Mar 14, 2023
@google-ml-butler google-ml-butler bot added kokoro:force-run Tests on submitted change ready to pull PR ready for merge process labels Mar 14, 2023
@kokoro-team kokoro-team removed the kokoro:force-run Tests on submitted change label Mar 14, 2023
@chuanhaozhuge
Contributor

Hi Zhoulong, we are actively testing a TPU implementation that uses NPD in order to meet a deadline, and we worry this change may cause breakage. I hope you don't mind if we hold off on merging this change temporarily.

@jzhoulon
Contributor Author

> Hi Zhoulong, we are actively testing some TPU implementation that uses NPD to meet deadline and worry this change may cause breakage. Hope you don't mind if we withhold merging this change temporarily.

Sure, we will wait. Thanks.

@gbaned gbaned removed awaiting review Pull request awaiting review ready to pull PR ready for merge process labels Mar 29, 2023
@github-actions github-actions bot added the kokoro:force-run Tests on submitted change label Apr 9, 2023
@kokoro-team kokoro-team removed the kokoro:force-run Tests on submitted change label Apr 9, 2023
@jzhoulon
Contributor Author

Hi @jyingl3, is there any update on this PR? Thanks.

      device_ordinal_(options.device_ordinal),
      compilation_device_type_(options.compilation_device_name) {
  allocator_ = std::make_unique<NextPluggableDeviceAllocator>(device_ordinal_);
  if (absl::GetFlag(FLAGS_next_pluggable_device_use_pjrt)) {

Do you mind adding a new flag FLAGS_next_pluggable_device_use_pjrt_allocator to guard this behavior?

Contributor Author

@jyingl3 Done, please take a look. Thanks very much.

Approved the change. Thank you so much!
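
To make the request in this review thread concrete, here is a hedged sketch of what guarding allocator selection behind a dedicated boolean could look like. It is an illustration under assumptions, not the merged change: `SketchAllocator`, `DefaultSketchAllocator`, `PjrtSketchAllocator`, and `MakeAllocator` are hypothetical, and the plain `bool` parameter stands in for reading a flag such as the FLAGS_next_pluggable_device_use_pjrt_allocator proposed above. How the real code combines this flag with FLAGS_next_pluggable_device_use_pjrt is not shown in this conversation.

```cpp
#include <memory>
#include <string>

// Hypothetical allocator interface, reduced to a name for illustration.
class SketchAllocator {
 public:
  virtual ~SketchAllocator() = default;
  virtual std::string Name() const = 0;
};

// Hypothetical stand-in for the existing per-device allocator.
class DefaultSketchAllocator : public SketchAllocator {
 public:
  explicit DefaultSketchAllocator(int device_ordinal)
      : device_ordinal_(device_ordinal) {}
  std::string Name() const override {
    return "default:" + std::to_string(device_ordinal_);
  }

 private:
  int device_ordinal_;
};

// Hypothetical stand-in for a PJRT-backed allocator.
class PjrtSketchAllocator : public SketchAllocator {
 public:
  std::string Name() const override { return "pjrt_async_value"; }
};

// Chooses the allocator from a dedicated switch, so the allocator can be
// toggled independently of any other runtime option.
// `use_pjrt_allocator` is assumed to come from a command-line flag.
std::unique_ptr<SketchAllocator> MakeAllocator(bool use_pjrt_allocator,
                                               int device_ordinal) {
  if (use_pjrt_allocator) {
    return std::make_unique<PjrtSketchAllocator>();
  }
  return std::make_unique<DefaultSketchAllocator>(device_ordinal);
}
```

Keeping the switch separate means a team that depends on the existing allocator can leave the new flag off and see no behavior change, which matches the breakage concern raised earlier in the conversation.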

@jyingl3

jyingl3 commented Apr 12, 2023

Thank you Zhoulong and sorry for the delay. Could you add a new flag so that the allocator can be switched? Left a comment about that. Thank you!!

@github-actions github-actions bot added the kokoro:force-run Tests on submitted change label Apr 13, 2023
@kokoro-team kokoro-team removed the kokoro:force-run Tests on submitted change label Apr 13, 2023
@gbaned gbaned requested a review from jyingl3 April 13, 2023 09:58
@google-ml-butler google-ml-butler bot added the awaiting review Pull request awaiting review label Apr 13, 2023
@google-ml-butler google-ml-butler bot added kokoro:force-run Tests on submitted change ready to pull PR ready for merge process labels Apr 13, 2023
@kokoro-team kokoro-team removed kokoro:force-run Tests on submitted change labels Apr 13, 2023
@copybara-service copybara-service bot merged commit 4d80da0 into tensorflow:master Apr 14, 2023

Labels

awaiting review Pull request awaiting review comp:core issues related to core part of tensorflow ready to pull PR ready for merge process size:S CL Change Size: Small

6 participants