Description
Since #87651, the struct layout when compiling CUDA/HIP code for Windows is consistent between host and device. However, there is the __declspec(empty_bases)
attribute that can modify the struct layout. Right now this attribute is respected on the host side but ignored on the device side.
Consider this example:
struct A {};
struct B {};
struct __declspec(empty_bases) C : A, B {
int i;
};
__global__ void C_kernel(C c)
{
c.i = 1;
}
The device-side compilation reports warning: __declspec attribute 'empty_bases' is not supported [-Wignored-attributes]
. With -fdump-record-layouts
, you can see that the device-side layout is
0 | struct C
0 | struct A (base) (empty)
1 | struct B (base) (empty)
4 | int i
| [sizeof=8, align=4,
| nvsize=8, nvalign=4]
while the host-side layout is
0 | struct C
0 | struct A (base) (empty)
0 | struct B (base) (empty)
0 | int i
| [sizeof=4, align=4,
| nvsize=4, nvalign=4]
Godbolt: https://cuda.godbolt.org/z/ccrs4Ecf6
Since #87651 you no longer need the empty_bases
attribute to manually make struct layouts consistent between host and device. The attribute is used e.g. in Nvidia's CCCL library for nvcc compatibility and since NVIDIA/cccl#3155 (NVIDIA/cccl@fc84efd) it is causing a layout inconsistency with Clang on Windows. ROCm's libhipcxx also uses the empty_bases
attribute in at least one place and is thus susceptible to the bug: https://github.com/ROCm/libhipcxx/blob/4d5d918b1f6d85406bc389b8b55be72047228a1c/include/cuda/std/detail/libcxx/include/tuple#L420.
To solve this issue, Clang should respect __declspec(empty_bases)
on the device side too when targeting Windows.