[xla:gpu] Add runtime optimization using frontend attribute #63430
[xla:gpu] Add runtime optimization using frontend attribute
_xla_send_recv_validation.
A collective-permute instruction inside a loop may not send or receive data
that affects the output of the whole module in every iteration. If this
information is encoded in the frontend attribute _xla_send_recv_validation
attached to the Send and Recv instructions decomposed from such a
collective-permute instruction, the runtime can use it to skip the NCCL API
calls that perform the Send and Recv operations.
Add tests.
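
A minimal sketch of the skip decision described above, not the code in this change. It assumes the attribute has already been decoded into an inclusive range of loop iterations in which the transfer matters. The description does not specify the attribute's encoding, so `IterationRange` and `ShouldIssueTransfer` are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical decoded form of _xla_send_recv_validation for one device:
// the inclusive range of loop iterations in which the Send/Recv carries
// data that affects the module output. The real encoding is not given in
// the description above; this struct is an assumption.
struct IterationRange {
  int64_t first;
  int64_t last;
};

// Returns true if the runtime should invoke the NCCL Send/Recv for the
// given loop iteration, false if the call can be skipped.
bool ShouldIssueTransfer(const std::optional<IterationRange>& valid,
                         int64_t iteration) {
  assert(iteration >= 0);
  // No attribute attached: nothing is known, so always run the transfer.
  if (!valid.has_value()) return true;
  return iteration >= valid->first && iteration <= valid->last;
}
```

With a range of {1, 3}, the transfer would run in iterations 1 through 3 and be skipped in iterations 0 and 4. An instruction without the attribute keeps its existing behavior.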