Fix data race in bss #338

Merged
tinebp merged 1 commit into vortexgpgpu:master from talubik:bss_dr_fix
Apr 21, 2026

@talubik talubik commented Apr 17, 2026

Fix: BSS data race in multi-core configuration

Hello, I encountered a data race on the BSS segment when running kernels on multi-core configurations.
All cores start concurrently at _start, and each one called memset to zero the entire BSS region.
gridDim and blockDim are BSS globals that vx_spawn_threads writes on core 0.
A lagging core's memset could overwrite these values after core 0 had already set them,
causing get_global_size() to return 0 and kernels to produce wrong results or loop indefinitely.

Cause of bug

In kernel/src/vx_start.S, every core executed:

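la    a0, _edata      # a0 = start of the region to zero
la    a2, _end        # a2 = end of the region
sub   a2, a2, a0      # a2 = size in bytes (_end - _edata)
li    a1, 0           # a1 = fill value
call  memset          # memset(a0, 0, a2), executed by every core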

Since all cores start simultaneously, any core can run this memset after core 0 has already
written gridDim/blockDim via vx_spawn_threads, silently zeroing them.
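
The interleaving described above can be replayed deterministically on the host. The following is only a sketch under stated assumptions: the Bss struct, the helper names, and the grid * block formula for the global size are hypothetical stand-ins, not the Vortex sources.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical model of the BSS globals that hold the launch dimensions.
struct Bss {
    uint32_t gridDim;
    uint32_t blockDim;
};

// What every core did in _start before the fix: zero all of BSS.
void core_clear_bss(Bss& bss) { std::memset(&bss, 0, sizeof(bss)); }

// What core 0 does when it spawns threads: publish the launch dimensions.
void core0_set_dims(Bss& bss, uint32_t grid, uint32_t block) {
    bss.gridDim = grid;
    bss.blockDim = block;
}

// Assumed stand-in for a global-size query: grid * block.
uint32_t global_size(const Bss& bss) { return bss.gridDim * bss.blockDim; }

// Replays one ordering of events for two cores and returns the
// global size that a kernel would observe afterwards.
uint32_t replay(bool core1_is_late) {
    Bss bss{0xDEADBEEF, 0xDEADBEEF};          // uninitialized memory
    core_clear_bss(bss);                      // core 0 clears BSS
    if (!core1_is_late) core_clear_bss(bss);  // core 1 clears on time
    core0_set_dims(bss, 4, 64);               // core 0 writes the dimensions
    if (core1_is_late) core_clear_bss(bss);   // lagging core 1 wipes them
    return global_size(bss);
}
```

With the late ordering, the replay returns 0, which matches the symptom reported for get_global_size().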

Fix

Remove the BSS memset from _start entirely.

Move BSS zeroing to the host side in vx_upload_kernel_bytes:
it uploads the kernel binary (bin_size bytes) with vx_copy_to_dev, then explicitly
zeros the remaining BSS region (runtime_size - bin_size bytes) before the GPU starts.
This guarantees BSS is cleared exactly once, before any core runs, so no core can race with the clear.
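
A minimal sketch of the host-side approach, assuming a mock device buffer. FakeDeviceBuffer, fake_copy_to_dev, and upload_kernel_sketch are hypothetical names used for illustration only; they do not reproduce the real vx_copy_to_dev or vx_upload_kernel_bytes signatures.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical stand-in for device memory; the real runtime uses its own handle type.
struct FakeDeviceBuffer {
    std::vector<uint8_t> mem;
};

// Hypothetical stand-in for a host-to-device copy. Returns 0 on success.
int fake_copy_to_dev(FakeDeviceBuffer& buf, const void* src,
                     size_t dst_offset, size_t size) {
    if (dst_offset > buf.mem.size() || size > buf.mem.size() - dst_offset)
        return -1;                       // copy would run out of bounds
    if (size == 0) return 0;             // nothing to copy
    std::memcpy(buf.mem.data() + dst_offset, src, size);
    return 0;
}

// Uploads bin_size bytes of the kernel image, then zeros the trailing
// (runtime_size - bin_size) bytes that make up BSS, all before launch.
int upload_kernel_sketch(FakeDeviceBuffer& buf, const uint8_t* image,
                         size_t bin_size, size_t runtime_size) {
    if (runtime_size < bin_size) return -1;
    if (fake_copy_to_dev(buf, image, 0, bin_size) != 0) return -1;

    const size_t bss_size = runtime_size - bin_size;
    if (bss_size > 0) {
        std::vector<uint8_t> zeros(bss_size, 0);  // host-side zero block
        if (fake_copy_to_dev(buf, zeros.data(), bin_size, bss_size) != 0)
            return -1;
    }
    return 0;
}
```

Because the clear happens on the host before any core starts, the device-side startup code no longer needs to touch BSS at all.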

@tinebp tinebp merged commit b732330 into vortexgpgpu:master Apr 21, 2026