shader_ir/warp: Implement SHFL for Nvidia devices #2855
Implements SHFL (not
On non-Nvidia devices, SHFL is emulated as a theoretical device with a warp size of one: the behaviour is the same as NX hardware, but with a single thread per warp. We won't have to do this on Vulkan for devices that offer the option of a subgroup size of 32 (gen9 Intel, Vega and Nvidia). That being said, SPIR-V instructions can't query whether a thread is out of bounds. The IR is generic enough to handle three cases.
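To make the "warp size of one" idea concrete, here is a minimal CPU-side sketch of an indexed shuffle over a simulated warp. It is not the code from this PR: `ShuffleResult`, `ShuffleIndexed` and the rule that an out-of-range source returns the caller's own value are assumptions made for illustration, and NX hardware may behave differently in the out-of-bounds case.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical result type: the shuffled value plus an in-bounds flag.
struct ShuffleResult {
    std::uint32_t value;
    bool in_bounds;
};

// Hypothetical indexed shuffle over a simulated warp.
// `lanes` holds one value per thread, `self` is the calling thread,
// `index` is the source lane inside the caller's segment and `width`
// is the segment size (assumed to be a power of two that divides the
// warp size).
ShuffleResult ShuffleIndexed(const std::vector<std::uint32_t>& lanes, std::uint32_t self,
                             std::uint32_t index, std::uint32_t width) {
    // First lane of the segment that contains the calling thread.
    const std::uint32_t segment_base = self & ~(width - 1);
    // Assumption: a source outside the segment is reported as out of bounds
    // and the caller reads back its own value.
    const bool in_bounds = index < width;
    const std::uint32_t source = in_bounds ? segment_base + index : self;
    return {lanes[source], in_bounds};
}
```

With a single-element `lanes` vector and `width == 1`, every call reads the caller's own value, and only `index == 0` is reported as in bounds. That is the degenerate case the emulation relies on.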
I'm not happy with the math used to convert Nvidia's SHFL mask back into GLSL's width. I don't know what this mask means at the time of writing.
This also introduces the usage of