[BACKEND] Remove special handling for bf16 in fp->int, int->fp handling #4281
This PR removes some special handling for int->bf16 and bf16->int conversions in the TritonNVIDIAGPU->LLVM lowerings, in order to support, e.g., the `cvt.bf16.s32` and `cvt.s32.bf16` instructions that are now available on Hopper.

Before this PR, there was some special handling for conversions to and from bf16; for int->bf16, the conversion would be done as an int->fp32 conversion followed by fp32->bf16. Presumably, this was done because, before sm90, the PTX `cvt` instruction did not support conversions to/from bf16.
However, sm90 does support direct conversions to/from bf16, so this PR removes the special handling in order to make use of the direct `cvt` instructions. For Ampere, it looks like the special handling is no longer needed, and LLVM handles the details of the different hardware implementations (perhaps thanks to llvm/llvm-project#74827?).
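
For readers unfamiliar with the two-step path, the int->fp32->bf16 sequence described above can be modeled numerically in plain Python. This is only a sketch of the arithmetic, assuming round-to-nearest-even at every step; it is not the Triton lowering code, and it does not claim to reproduce the exact semantics of the PTX `cvt` instructions. The helper names (`int_to_bf16_two_step`, `int_to_bf16_single_rounding`) are hypothetical and exist only for this illustration.

```python
import struct


def _f32_bits(x: float) -> int:
    """Bit pattern of x after rounding it to fp32."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def _bits_to_f32(bits: int) -> float:
    """Interpret a 32-bit pattern as an fp32 value."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def int_to_bf16_two_step(i: int) -> float:
    """Model of int -> fp32 -> bf16 (assumed round-to-nearest-even twice)."""
    # Step 1: int -> fp32. An int32 is exact as a Python float (double);
    # packing with "<f" then rounds it to fp32.
    bits = _f32_bits(float(i))
    # Step 2: fp32 -> bf16. Keep the top 16 bits, rounding to nearest even.
    # NaN/Inf handling is omitted because integer inputs never produce them.
    bias = 0x7FFF + ((bits >> 16) & 1)
    bf16_bits = ((bits + bias) >> 16) & 0xFFFF
    return _bits_to_f32(bf16_bits << 16)


def int_to_bf16_single_rounding(i: int) -> float:
    """Model of a direct int -> bf16 conversion with one rounding step."""
    if i == 0:
        return 0.0
    sign = -1 if i < 0 else 1
    m = abs(i)
    # bf16 keeps 8 significant bits (1 implicit + 7 stored).
    shift = m.bit_length() - 8
    if shift <= 0:
        return float(i)  # already representable exactly
    q, r = divmod(m, 1 << shift)
    half = 1 << (shift - 1)
    if r > half or (r == half and (q & 1)):
        q += 1  # round up; ties go to the even significand
    return float(sign * (q << shift))


if __name__ == "__main__":
    for value in (3, 257, 2**30 + 2**22 + 1):
        print(value, int_to_bf16_two_step(value), int_to_bf16_single_rounding(value))
```

In this model the two paths agree for most inputs, but they can differ when the first rounding lands exactly on a bf16 tie: `2**30 + 2**22 + 1` gives `1073741824.0` via the two-step path and `1082130432.0` with a single rounding. Whether such differences are observable on real hardware depends on the rounding modes the generated instructions actually use, which this sketch does not verify.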
The core Triton is a small number of people, and we receive many PRs (thank
you!). To help us review your code more quickly, if you are a new
contributor (less than 3 PRs merged) we ask that you complete the following
tasks and include the filled-out checklist in your PR description.
Complete the following tasks before sending your PR, and replace `[ ]` with `[x]` to indicate you have done them.

- I am not making a trivial change, such as fixing a typo in a comment.
- I have written a PR description following these rules.
- I have run `pre-commit run --from-ref origin/main --to-ref HEAD`.
- Select one of the following.
  - `/test` for `lit` tests
  - `/unittest` for C++ tests
  - `/python/test` for end-to-end tests
  - `FILL THIS IN`.
- Select one of the following.
  - `lit` tests.
  - The `lit` tests I have added follow these best practices, including the "tests should be minimal" section. (Usually running Python code and using the instructions it generates is not minimal.)