[XLA/GPU] Add CustomCallSchedule to give schedule hints to custom-calls. #48806
Add schedule hints EARLY_AS_POSSIBLE and LATE_AS_POSSIBLE to custom-calls. This supports the case where a logical operation is lowered into two HLOs (e.g., PerformX and PerformXDone). The hints can be used either to hide host latency between the pair of custom-calls or to make the pair reflect the def-use relationship more accurately: typically PerformX is scheduled right after all of its producers have been scheduled, and PerformXDone is scheduled right before its first consumer.
@timshen91 could you help review this PR? Thanks!
// Postprocessor of the HloInstructionSequence. This is an opt-in postprocessing
// function to MemorySchedulerAlgorithm to enforce certain hlo schedule
// constraints desired for custom-calls.
typedef std::function<HloInstructionSequence(const HloInstructionSequence&)>
Is it more composable if we do postprocessing as another pass after memory scheduling?
+1
The scheduler tries to compare the mem usage of each schedule and pick the lowest one, so ideally postprocessing would happen in this file after the scheduling algorithm, but before the mem usage calculation.
Right, the main motivation for implementing it this way (vs. a separate pass) is to account for potential memory size changes caused by the postprocessors. This way, the change is reflected in the memory usage calculation.
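To make the ordering argument concrete, here is a rough sketch of "postprocess first, then measure memory". It is only an illustration under stated assumptions, not the code in hlo_memory_scheduler.cc: `Sequence`, `Algorithm`, `MemoryEstimator`, and `PickBestSchedule` are hypothetical names, and a sequence is modelled as a list of instruction names rather than an HloInstructionSequence.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <limits>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins; the real code works on HloInstructionSequence.
using Sequence = std::vector<std::string>;
using Algorithm = std::function<Sequence()>;
using Postprocessor = std::function<Sequence(const Sequence&)>;
using MemoryEstimator = std::function<std::int64_t(const Sequence&)>;

// Runs every candidate algorithm, applies the optional postprocessor, and only
// then estimates memory, so the comparison sees the schedule that is actually
// going to be used.
Sequence PickBestSchedule(const std::vector<Algorithm>& algorithms,
                          const Postprocessor& postprocessor,
                          const MemoryEstimator& peak_memory) {
  assert(!algorithms.empty());
  Sequence best;
  std::int64_t best_memory = std::numeric_limits<std::int64_t>::max();
  for (const Algorithm& algorithm : algorithms) {
    Sequence candidate = algorithm();
    if (postprocessor) {  // An empty postprocessor means "leave it as is".
      candidate = postprocessor(candidate);
    }
    const std::int64_t memory = peak_memory(candidate);
    if (memory < best_memory) {
      best_memory = memory;
      best = std::move(candidate);
    }
  }
  return best;
}
```

With a separate pass after scheduling, the winner would be chosen from the unprocessed candidates, and a postprocessor that changes memory usage could make that choice stale.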
if (CustomCallWithSchedule(user,
CustomCallSchedule::EARLY_AS_POSSIBLE) &&
absl::c_all_of(user->operands(), [&](const HloInstruction* opnd) {
return scheduled.contains(opnd);
Should control-dependencies also be checked?
Good question. I was also wondering whether checking control dependencies is necessary for this particular use of custom-calls. It might not be necessary in the current code base, i.e., I can't find any code that adds control dependencies to custom-calls. However, let's be conservative and add the check for control dependencies to be future-proof.
if (CustomCallWithSchedule(opnd,
CustomCallSchedule::LATE_AS_POSSIBLE) &&
absl::c_all_of(opnd->users(), [&](const HloInstruction* u) {
return scheduled.contains(u);
Same about control-dependencies. I think you might need to check it.
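For readers following the thread, the two passes with the control-dependency checks requested here can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code in gpu_hlo_schedule.cc: `Node`, `Hint`, `ScheduleEarliest`, and `ScheduleLatest` are hypothetical stand-ins for HloInstruction, CustomCallSchedule, and the postprocessor, and `std::all_of` replaces `absl::c_all_of`.

```cpp
#include <algorithm>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical stand-ins for the XLA types; not the real HloInstruction API.
enum class Hint { kNone, kEarly, kLate };

struct Node {
  std::string name;
  Hint hint = Hint::kNone;
  std::vector<Node*> operands;
  std::vector<Node*> users;
  std::vector<Node*> control_predecessors;
  std::vector<Node*> control_successors;
};

// Forward pass: right after emitting an instruction, emit any kEarly user
// whose operands and control predecessors have all been emitted already.
std::vector<Node*> ScheduleEarliest(const std::vector<Node*>& input) {
  std::vector<Node*> output;
  std::unordered_set<const Node*> scheduled;
  auto is_scheduled = [&](const Node* n) { return scheduled.count(n) > 0; };
  auto all_scheduled = [&](const std::vector<Node*>& nodes) {
    return std::all_of(nodes.begin(), nodes.end(), is_scheduled);
  };
  for (Node* instr : input) {
    if (is_scheduled(instr)) continue;
    output.push_back(instr);
    scheduled.insert(instr);
    for (Node* user : instr->users) {
      if (!is_scheduled(user) && user->hint == Hint::kEarly &&
          all_scheduled(user->operands) &&
          all_scheduled(user->control_predecessors)) {
        output.push_back(user);
        scheduled.insert(user);
      }
    }
  }
  return output;
}

// Backward pass: walk the sequence in reverse and place any kLate operand
// directly in front of the instruction once all of its users and control
// successors have been placed.
std::vector<Node*> ScheduleLatest(const std::vector<Node*>& input) {
  std::vector<Node*> reversed;
  std::unordered_set<const Node*> scheduled;
  auto is_scheduled = [&](const Node* n) { return scheduled.count(n) > 0; };
  auto all_scheduled = [&](const std::vector<Node*>& nodes) {
    return std::all_of(nodes.begin(), nodes.end(), is_scheduled);
  };
  for (auto it = input.rbegin(); it != input.rend(); ++it) {
    Node* instr = *it;
    if (is_scheduled(instr)) continue;
    reversed.push_back(instr);
    scheduled.insert(instr);
    for (Node* operand : instr->operands) {
      if (!is_scheduled(operand) && operand->hint == Hint::kLate &&
          all_scheduled(operand->users) &&
          all_scheduled(operand->control_successors)) {
        reversed.push_back(operand);
        scheduled.insert(operand);
      }
    }
  }
  std::reverse(reversed.begin(), reversed.end());
  return reversed;
}
```

In this sketch, a hinted instruction whose last unmet dependency is a control edge is not moved at all; it keeps its original position, which is still a valid schedule.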
ENTRY %CustomCall () -> f32[1,2,3] {
%constant = f32[1]{0} constant({12345})
ROOT %custom-call = f32[1,2,3]{0,2,1} custom-call(f32[1]{0} %constant), custom_call_target="foo\"bar", schedule=LATE_AS_POSSIBLE
Could you also add tests for EARLY?
Will do.
Please don't send me any more documents
Thanks for taking a look. Some responses are inline. Will update the code soon.
Code updated. Please take another look. Thanks!
@Kariddi The test failure was because I missed the initialization of custom_call_schedule_ in one of the … Could you please approve the PR again? Thanks!
ping~
I have a couple of nitpicky comments.
ENTRY %CustomCall () -> f32[1,2,3] {
%constant = f32[1]{0} constant({12345})
%custom-call.0 = f32[1,2,3]{0,2,1} custom-call(f32[1]{0} %constant), custom_call_target="foo", schedule=EARLY_AS_POSSIBLE
How about s/EARLY_AS_POSSIBLE/EARLIEST and s/LATE_AS_POSSIBLE/LATEST?
Makes sense. Updated.
@@ -1562,6 +1568,8 @@ class HloCustomCallInstruction : public HloInstruction {
std::vector<std::pair<ShapeIndex, std::pair<int64, ShapeIndex>>>
output_to_operand_aliasing_;
absl::optional<Literal> literal_;
// A custom-call schedule hint.
CustomCallSchedule custom_call_schedule_;
You could just add a default initializer (to CustomCallSchedule::NONE) here instead of doing it in every constructor.
I know. I just wanted to conform to the original coding style in the file. Let me know if you still want me to switch to use default initializers.
Updated. Please help to take another look. Thanks!
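As a small illustration of the default-initializer suggestion, a sketch under stated assumptions: `CustomCallSketch` is a hypothetical toy class, not HloCustomCallInstruction, and the enum below is a stand-in that only borrows the value names mentioned in this thread.

```cpp
#include <string>
#include <utility>

// Stand-in enum; value names follow the ones discussed in this thread.
enum class CustomCallSchedule { NONE, EARLIEST, LATEST };

// Hypothetical toy class used only to show the initialization pattern.
class CustomCallSketch {
 public:
  // This constructor never mentions schedule_, yet the member is still
  // initialized thanks to the default member initializer below.
  explicit CustomCallSketch(std::string target) : target_(std::move(target)) {}

  // A constructor can still override the default when it needs to.
  CustomCallSketch(std::string target, CustomCallSchedule schedule)
      : target_(std::move(target)), schedule_(schedule) {}

  const std::string& target() const { return target_; }
  CustomCallSchedule schedule() const { return schedule_; }

 private:
  std::string target_;
  // Default member initializer: applies to every constructor that does not
  // set schedule_ explicitly, so no constructor can forget it.
  CustomCallSchedule schedule_ = CustomCallSchedule::NONE;
};
```

This pattern avoids the kind of test failure mentioned earlier in the thread, where one code path left the field uninitialized.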
EARLY_AS_POSSIBLE -> EARLIEST. LATE_AS_POSSIBLE -> LATEST.
@sanjoy are you still on this?
// relationship of the two calls (typically PerformX is scheduled right after
// all of its producers have been scheduled and PerformXDone is scheduled right
// before its first consumer.)
HloInstructionSequence postprocessor_to_custom_schedule(
Conceptually this is a bit dodgy: we are exposing the schedule to all clients, but only the GPU backend uses it.
Should we explicitly error out on other backends? Or ignore it?
The current behavior is that if a client adds the schedule attributes on a backend other than GPU, the attributes are simply ignored because those backends' postprocessors are empty.
An alternative is to add a default postprocessor that emits a warning, e.g., "the schedule is set but ignored on the backend." What do you think?
Note that the current implementation in effect ignores the schedule attributes if they are used on other backends. This is safe and correct behavior.
After some thought, I am fine leaving this as it is (i.e., no postprocessors on other backends, which effectively ignores the schedule attributes). Let me know if you have other thoughts or if I haven't addressed your concern.
};
const std::vector<HloInstruction*>& instrs = input.instructions();
for (HloInstruction* instr : instrs) {
if (scheduled.contains(instr)) {
Haven't we just defined the lambda checking precisely this? Or more concretely: do we need the lambda at all then?
The lambda is needed because the absl::c_all_of() below requires a function form.
absl::c_all_of(user->operands(), is_scheduled) && absl::c_all_of(user->control_predecessors(), is_scheduled)) {
It is cleaner to also use the lambda here. Will update it.
Updated.
return scheduled.contains(instr);
};
const std::vector<HloInstruction*>& instrs = input.instructions();
for (HloInstruction* instr : instrs) {
Optional nitpick: maybe just use input.instructions() inline?
Updated.
// relationship of the two calls (typically PerformX is scheduled right after
// all of its producers have been scheduled and PerformXDone is scheduled right
// before its first consumer.)
HloInstructionSequence postprocessor_to_custom_schedule(
Style guide says functions are UpperCase
Right. I don't know why I made it lower case. Will revise it.
Done.
@@ -237,6 +243,9 @@ message HloInstructionProto {
repeated xla.CustomCallOutputOperandAliasing
custom_call_output_operand_aliasing = 74;

// Specifies the desired schedule for the custom-call.
Specify that the field is only present for custom calls?
Will add comments to make it clear.
Added.
@@ -3823,6 +3828,24 @@ StatusOr<PrecisionConfig::Precision> StringToPrecision(const string& name) {
return found->second;
}

StatusOr<CustomCallSchedule> StringToCustomCallSchedule(const string& name) {
Nit: absl::string_view for param
Updated.
@@ -3823,6 +3828,24 @@ StatusOr<PrecisionConfig::Precision> StringToPrecision(const string& name) {
return found->second;
}

StatusOr<CustomCallSchedule> StringToCustomCallSchedule(const string& name) {
static std::unordered_map<string, CustomCallSchedule>* map = [] {
Can we use absl::flat_hash_map?
Yes, updated.
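A rough sketch of the shape these two nits point toward: a string-view parameter plus a lookup table that is built once. To stay self-contained it swaps in standard-library types (C++17 `std::string_view`, `std::unordered_map`, and `std::optional` in place of absl::string_view, absl::flat_hash_map, and StatusOr), so it is not the code in hlo_parser.cc. The enum and the accepted spellings are assumptions based on the names used in this thread.

```cpp
#include <optional>
#include <string_view>
#include <unordered_map>

// Stand-in enum; value names follow the ones discussed in this thread.
enum class CustomCallSchedule { NONE, EARLIEST, LATEST };

// Returns the schedule named by `name`, or std::nullopt for unknown names.
// Taking a string_view avoids copying the caller's string.
std::optional<CustomCallSchedule> StringToCustomCallSchedule(
    std::string_view name) {
  // Built on first use and intentionally never destroyed, which sidesteps
  // static destruction order issues. The keys are views of string literals,
  // so they stay valid for the lifetime of the program.
  static const auto* const kSchedules =
      new std::unordered_map<std::string_view, CustomCallSchedule>{
          {"NONE", CustomCallSchedule::NONE},
          {"EARLIEST", CustomCallSchedule::EARLIEST},
          {"LATEST", CustomCallSchedule::LATEST},
      };
  const auto found = kSchedules->find(name);
  if (found == kSchedules->end()) {
    return std::nullopt;
  }
  return found->second;
}
```

The real function reports unknown names through a StatusOr error rather than an empty optional; the optional is used here only to keep the sketch free of external dependencies.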
// Postprocessor of the HloInstructionSequence. This is an opt-in postprocessing
// function to MemorySchedulerAlgorithm to enforce certain hlo schedule
// constraints desired for custom-calls.
typedef std::function<HloInstructionSequence(const HloInstructionSequence&)>
Nit: "using" is preferred to typedef.
Updated.
Thanks, George, for taking a look. I will update the code accordingly soon.
Code updated. Please take another look and let me know if all the previous comments are addressed. Thanks!
Warning: Unhelpful next ID: saw 76 but field "custom_call_schedule" already uses it. Use 77 instead? [misleading_next_id]
Will do. I have an internet issue at the moment; I will update once my connection is back up.
Updated. Please help to take a look again.
PiperOrigin-RevId: 375939575 Change-Id: If25885fd0f0e48e961a5596ff76a83fdf61c36b3
I need this change to implement XLA Horovod ops. I have a working prototype internally at NVIDIA, and it works well with this change for our tracked DL models: host overhead is well hidden, and I see some overlap between communication and computation.