[dx11] [wip] (do not submit yet) Redirect global_tmps_buffer_i32 to u32 so DX11 doesn't bind 2 UAVs pointing to the same resource #5151
It seems that in the SPIR-V program, `global_tmps_buffer_u32` and `global_tmps_buffer_i32` point to the same underlying resource. In DX11, this means two UAVs can point to the same "global temp buffer" resource and be bound to the same pipeline.
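To make the aliasing concrete, here is a minimal Python analogy (not Taichi or HLSL code): two typed views over one byte buffer stand in for the `u32` and `i32` views of the global temp buffer. The names `storage`, `as_u32` and `as_i32` are illustrative only, and the sketch assumes a 4-byte C `int`. A value written through one view is visible through the other, which is what happens when both names refer to one allocation.

```python
# Analogy only: one allocation, two typed views (unsigned and signed 32-bit).
storage = bytearray(16)                  # room for 4 x 4-byte elements
as_u32 = memoryview(storage).cast("I")   # unsigned int view ("u32")
as_i32 = memoryview(storage).cast("i")   # signed int view ("i32")

as_u32[0] = 5        # "first kernel" writes the loop count through the u32 view
print(as_i32[0])     # "second kernel" reads it back through the i32 view -> 5
```

This only works because both views really share memory. If one of the two views were silently replaced by nothing, the read would no longer see the write.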
This behavior seems to be fine with the Vulkan backend, but it causes the following warning with the DX11 backend:
This can lead to incorrect results, which can be reproduced with the following Taichi kernel:
Output before the fix:
Expected result:
Here is how it happens: the `hey` Taichi kernel gets compiled into two HLSL kernels. The first one saves the current value of `idx[None]`, or `ix`, to `global_tmps_buffer_u32`. However, for some reason the second kernel reads the loop count (`ix`) from `global_tmps_buffer_i32`.

The first kernel looks like:
The second kernel looks like:
As shown above, with the current code the `i32` and `u32` versions of the global temp buffer get two UAVs, `u2` and `u3` respectively. However, since they point to the same buffer allocation, the following DX runtime warning occurs:

The DX runtime binds either `u2` or `u3` to NULL, depending on which one is bound later. This causes `_74` in the second kernel to have a value of 0, so only the first element in `scratch` gets incremented.

This change attempts to fix the bug by redirecting all operations on `i32` to `u32` (since `u32` seems to be present by default).
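The binding problem and the proposed redirect can be illustrated with a toy Python model. This is a hypothetical sketch, not Taichi's codegen and not the real D3D11 runtime: `REDIRECT`, `assign_uav_slots` and `bind_uavs` are made-up names, and `bind_uavs` simply assumes the behavior described above (when a resource already bound to one UAV slot is bound to another, the earlier slot is forced to NULL). Without the redirect, the two names get slots 2 and 3 and one of them ends up NULL; with it, there is a single slot.

```python
# Toy model only: not Taichi's codegen and not the actual D3D11 runtime.

# Hypothetical redirect table mirroring the idea of this PR:
# every access to the i32 view is rewritten to use the u32 view.
REDIRECT = {"global_tmps_buffer_i32": "global_tmps_buffer_u32"}


def resolve(name):
    """Return the buffer name a kernel should actually bind."""
    return REDIRECT.get(name, name)


def assign_uav_slots(buffer_names, first_slot=2, redirect=True):
    """Give each distinct (optionally redirected) buffer name one UAV slot."""
    slots = {}
    next_slot = first_slot
    for name in buffer_names:
        key = resolve(name) if redirect else name
        if key not in slots:
            slots[key] = next_slot
            next_slot += 1
    return slots


def bind_uavs(bindings):
    """Bind (slot, resource) pairs in order.

    Assumed behavior: if the resource is already bound to a different
    slot, that earlier slot is forced to None (standing in for NULL).
    """
    bound = {}
    for slot, resource in bindings:
        for other_slot, other_resource in list(bound.items()):
            if other_resource == resource and other_slot != slot:
                bound[other_slot] = None
        bound[slot] = resource
    return bound


# Both names are backed by the same allocation.
BACKING = {
    "global_tmps_buffer_i32": "global_tmps",
    "global_tmps_buffer_u32": "global_tmps",
}
names = ["global_tmps_buffer_i32", "global_tmps_buffer_u32"]

slots_before = assign_uav_slots(names, redirect=False)
before = bind_uavs([(slot, BACKING[n]) for n, slot in slots_before.items()])

slots_after = assign_uav_slots(names)
after = bind_uavs([(slot, BACKING[n]) for n, slot in slots_after.items()])

print(before)  # {2: None, 3: 'global_tmps'}
print(after)   # {2: 'global_tmps'}
```

In this model, the redirect removes the hazard because only one UAV ever refers to the shared allocation, so nothing can be unbound behind the second kernel's back.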
This has not been extensively tested yet and may need some improvement. Is there a better way to do this? Any comments are appreciated.
Thanks!
Related issue = #