-
Notifications
You must be signed in to change notification settings - Fork 11.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
vulkan: workaround for #10710 and #12147 16 bit unpack8 bug #12472
Conversation
@netrunnereve As a quick check, it looks good. The performance of sd.cpp may have been a little bit (< 1%) faster than b4785. |
Thank you for looking into this. |
Works like a charm, thanks @netrunnereve. It makes it even stranger that the bug still isn't fixed at the driver level, the logic between unpacking from 16 bits or 32 bits should be pretty much the same. |
The SPIRV is different as glslc doesn't optimize out the additional unpacks and the 32 bit one is treated as an unpack of 4 8 bit values into 4 floats. In the assembly I'm seeing that it correctly loads the bottom 8 bits into the .x slot and the next set of 8 bits in the .y slot for a 32 bit input, but if I use a 16 bit input it somehow loads the identical bottom bits into both the .x and .y slots. I mean they should have some sort of automated regression test for this... right? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not convinced we'll ever be able to remove this again, but let's keep hoping.
Also, more of these will sneak in over time, probably.
Even if they fix it for RDNA GCN still has this bug and Windows GCN drivers are finished. So yeah I guess we'll have to avoid 16 bit unpack8s from now on. |
This is a temporary workaround for the AMD unpack8 driver bug reported in #10710 and #12147. I managed to trigger this bug in older versions of RGA using the offline compiler and I could see that it only happens with an unpack8 on a 16 bit input. If I make the unpack8 use a 32 bit input instead it works fine.
Most compilers should be smart enough to optimize out the extra two unpacks and performance is pretty much the same on my 470.
PR:
Master:
@stduhpf @Danik-droid @masamaru-san can you test and make sure that the bug is gone now?
And yeah AMD still needs to fix their driver. Something as simple as the little shader below will trigger it 🤦♀️.