Fix LLVM issue with bitcast+fptrunc vectorization and add type load/downconversion test #2915
Closed
Conversation
Also fixes a similar issue with the double->half conversion, which emits a similar software-truncation intrinsic.
being imported into the TG. We load the data as a stream of ints of various sizes, then bitcast them to match the original type, and then often cast to some of our internal data types if needed. That turns out to be a non-trivial task to get right, especially when dealing with historical backends that have a lot of hair and varying levels of support for some of the types involved. This test exercises that conversion; the goal here is to have most of the backends provide at least consistent behaviour when dealing with truncations and signedness. Metal and OpenCL seem to be the platforms closest to the truth, though OpenCL also seems to vary between different backends.
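The load-as-ints, bitcast, then downconvert pipeline described above can be sketched in plain Python with `struct`. This is an illustrative assumption (the function name and the float32->float16 target are made up for the example), not the actual tinygrad code path:

```python
import struct

def load_and_downcast(raw: bytes) -> list[float]:
    """Hypothetical sketch: read raw bytes as 32-bit ints, bitcast them back
    to float32, then downconvert to float16. The real pipeline goes through
    the project's own dtype machinery."""
    n = len(raw) // 4
    ints = struct.unpack(f"<{n}I", raw)  # step 1: load as a stream of ints
    # step 2: bitcast each int back to its original float32 value
    floats = [struct.unpack("<f", struct.pack("<I", i))[0] for i in ints]
    # step 3: downconvert by round-tripping through IEEE half ('e' format)
    return [struct.unpack("<e", struct.pack("<e", f))[0] for f in floats]

raw = struct.pack("<2f", 1.5, 0.333251953125)  # both exactly representable in fp16
print(load_and_downcast(raw))
```

Values that are not exactly representable in fp16 come back rounded, which is exactly the class of behaviour the test tries to pin down across backends.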
vectorization would emit a call to an undeclared function __truncsfhf2@PLT, which is not defined and essentially translates to a call through a NULL pointer. What's annoying is that this only happens when there are several bitcasts followed by a matching number of fptruncs, which only triggers when loading weights from disk in bulk.
Also work around an LLVM codegen bug in the double->half conversion.
Suppress warnings while doing type gymnastics.
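For context, `__truncsfhf2` is compiler-rt's software routine for float32->float16 truncation, which LLVM falls back to when the target lacks native f16 conversion support. A simplified Python sketch of the core bit manipulation it performs (normal values only, round-toward-zero; the real routine also handles rounding modes, subnormals, infinities, and NaNs):

```python
import struct

def trunc_sf_hf(f: float) -> int:
    """Simplified float32 -> float16 bit truncation (normal values only)."""
    bits = struct.unpack("<I", struct.pack("<f", f))[0]
    sign = (bits >> 16) & 0x8000   # sign bit moves from bit 31 to bit 15
    exp = (bits >> 23) & 0xFF      # 8-bit biased exponent
    mant = bits & 0x7FFFFF         # 23-bit mantissa
    # rebias the exponent from float32's 127 to float16's 15,
    # and keep only the top 10 mantissa bits (dropping the rest)
    return sign | ((exp - 127 + 15) << 10) | (mant >> 13)

half_bits = trunc_sf_hf(1.5)  # 1.5 survives the truncation exactly
print(hex(half_bits), struct.unpack("<e", struct.pack("<H", half_bits))[0])
```

When the target has hardware support (e.g. x86's F16C extension), LLVM emits the conversion instruction directly instead of this library call, which is why the fix described below works.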
Changes
There's a whole bunch of different unrelated things in this PR; it's unreviewable. Want to submit small, clear-win PRs?
This patch is expected to fix issue #1367, and it also adds some tests around various types being downconverted after loading from disk. The test brings to light various inconsistencies in how that conversion behaves across different backends and different type permutations, and should hopefully help us clean this up over time to make it all more consistent.
The goal over time is to reduce the number of broken cases to 0.
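As an illustration of the kind of behaviour the test pins down (a hand-rolled sketch, not the actual test): integer downconversion should truncate to the low bits and then reinterpret them under the target type's signedness, the way C does:

```python
def downcast(value: int, width_bits: int, signed: bool) -> int:
    """Truncate an integer to width_bits, reinterpreting under signedness."""
    v = value & ((1 << width_bits) - 1)   # keep only the low bits
    if signed and v >= (1 << (width_bits - 1)):
        v -= 1 << width_bits              # the top bit becomes the sign bit
    return v

print(downcast(300, 8, False))  # 300 -> uint8: 44  (300 mod 256)
print(downcast(200, 8, True))   # 200 -> int8: -56  (200 - 256)
```

A backend is "consistent" in this sense when every type permutation matches this reference behaviour; the inconsistencies the test surfaces are backends that saturate, zero, or otherwise diverge from plain truncation.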
The original bug I was after causes a crash in the LLVM-generated code, and it only happens when the compiler sees multiple fptrunc operations like so:
The code was crashing, so a dive into gdb revealed this:
%r15 is very obviously 0 at this point, so trying to compile the IR with llc gave me this:
Which was a bit unexpected, but gave me a clue. After trying a few things that did not work, adding +f16c to the target features seems to do the trick.
None of the existing dtype tests hit the issue, since you need multiple conversion operations to trigger the vectorization.
I've fixed the LLVM backend so it passes all checks. As a bonus, I've fixed coder to check whether bfloat is supported and fall back to doing the conversion via the LLVM backend if not. This makes coder work with GPU and TORCH, however for some reason it produces gibberish on torch (is uint support lacking?)
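For reference, the bfloat fallback is conceptually cheap: bfloat16 is just the upper 16 bits of an IEEE float32, so a round-toward-zero conversion is a shift. This is a hedged sketch of the idea, not the backend's actual implementation:

```python
import struct

def f32_to_bf16_bits(f: float) -> int:
    """bfloat16 keeps float32's sign, full 8-bit exponent, and top 7 mantissa bits."""
    return struct.unpack("<I", struct.pack("<f", f))[0] >> 16

def bf16_bits_to_f32(b: int) -> float:
    """Widening back is exact: pad the low 16 bits with zeros."""
    return struct.unpack("<f", struct.pack("<I", b << 16))[0]

print(hex(f32_to_bf16_bits(1.0)))                # 1.0 in bf16 bits
print(bf16_bits_to_f32(f32_to_bf16_bits(-3.0)))  # -3.0 is exactly representable
```

Because the exponent range matches float32's, this never overflows the way float16 can, which is part of why a generic fallback path is viable when a backend lacks native bfloat support.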