Fix data race in bss #338
Merged
tinebp merged 1 commit into vortexgpgpu:master from Apr 21, 2026
Fix: BSS data race in multi-core configuration
Hello, I have encountered a data race in the BSS segment when running kernels on multi-core configurations.
All cores start concurrently from _start, and each one was calling memset to zero the entire BSS region. gridDim and blockDim are BSS globals written by vx_spawn_threads on core 0. A lagging core's memset would overwrite these values after core 0 had already set them, causing get_global_size() to return 0 and kernels to produce wrong results or loop indefinitely.
Cause of bug
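The lost update can be replayed deterministically in a small host-side model. This is only an illustrative sketch, not code from the repository: the Bss struct, core_clear_bss, core0_set_dims, global_size_x and replay are hypothetical names, and global_size_x assumes the global size is the product gridDim.x * blockDim.x. The model runs the steps sequentially in a chosen order instead of using real threads, so it shows the bad interleaving without itself containing a data race.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for the BSS globals shared by all cores.
struct Dim3 { uint32_t x, y, z; };
struct Bss {
    Dim3 gridDim;
    Dim3 blockDim;
};

// Models what every core did in _start before the fix: zero the whole BSS.
inline void core_clear_bss(Bss& bss) {
    std::memset(&bss, 0, sizeof(bss));
}

// Models core 0 publishing the launch dimensions (the role the PR
// attributes to vx_spawn_threads).
inline void core0_set_dims(Bss& bss, Dim3 grid, Dim3 block) {
    assert(grid.x && grid.y && grid.z && block.x && block.y && block.z);
    bss.gridDim = grid;
    bss.blockDim = block;
}

// Assumed simplification of get_global_size() for dimension 0.
inline uint32_t global_size_x(const Bss& bss) {
    return bss.gridDim.x * bss.blockDim.x;
}

// Replays one ordering of two cores. When lagging_core_is_late is true,
// core 1's memset lands after core 0 has written the dimensions.
inline uint32_t replay(bool lagging_core_is_late) {
    Bss bss;
    std::memset(&bss, 0xCD, sizeof(bss));            // stale memory contents
    core_clear_bss(bss);                             // core 0 clears BSS
    if (!lagging_core_is_late) core_clear_bss(bss);  // core 1 clears in time
    core0_set_dims(bss, {4, 1, 1}, {8, 1, 1});       // core 0 sets dimensions
    if (lagging_core_is_late) core_clear_bss(bss);   // core 1 clears too late
    return global_size_x(bss);
}
```

With the benign ordering the dimensions survive; with the late memset they are zeroed, which matches the symptom of get_global_size() returning 0.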
In kernel/src/vx_start.S, every core executed the BSS-clearing memset. Since all cores start simultaneously, any core can run this memset after core 0 has already written gridDim/blockDim via vx_spawn_threads, silently zeroing them.
Fix
Remove the BSS memset from _start entirely.
Move BSS zeroing to the host side in vx_upload_kernel_bytes: vx_copy_to_dev uploads the kernel binary (bin_size bytes), then explicitly zeros the remaining BSS region (runtime_size - bin_size bytes) before the GPU starts.
This guarantees BSS is cleared exactly once, with no possibility of a race.
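The upload-then-zero idea can be sketched against a mock device. This is not the actual vx_upload_kernel_bytes implementation: FakeDevice, copy_to_dev, upload_kernel and all_bytes are hypothetical stand-ins, and the real runtime's signatures may differ. The sketch only assumes what the description states, namely that the image occupies bin_size bytes and the BSS occupies the following runtime_size - bin_size bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical stand-in for device memory; not a Vortex type.
struct FakeDevice {
    std::vector<uint8_t> mem;
    explicit FakeDevice(size_t size) : mem(size, 0xCD) {}  // stale contents
};

// Hypothetical host-to-device copy, playing the role of vx_copy_to_dev.
inline void copy_to_dev(FakeDevice& dev, size_t dst, const void* src, size_t size) {
    assert(dst <= dev.mem.size() && size <= dev.mem.size() - dst);
    if (size != 0) std::memcpy(dev.mem.data() + dst, src, size);
}

// Uploads the kernel image, then zeros the BSS tail [bin_size, runtime_size)
// from the host, before any core would start running.
inline void upload_kernel(FakeDevice& dev, size_t base,
                          const std::vector<uint8_t>& bin, size_t runtime_size) {
    const size_t bin_size = bin.size();
    assert(runtime_size >= bin_size);
    copy_to_dev(dev, base, bin.data(), bin_size);
    const std::vector<uint8_t> zeros(runtime_size - bin_size, 0);
    copy_to_dev(dev, base + bin_size, zeros.data(), zeros.size());
}

// Helper for checking results: true if every byte in [begin, end) equals value.
inline bool all_bytes(const FakeDevice& dev, size_t begin, size_t end, uint8_t value) {
    for (size_t i = begin; i < end; ++i)
        if (dev.mem[i] != value) return false;
    return true;
}
```

Because the zeroing happens in a single host-side step that completes before launch, no device core ever writes the BSS as part of startup, so there is nothing for a late core to overwrite.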