cuda FFOR + perfetto logging #6588

Draft: a10y wants to merge 6 commits into develop from aduffy/fix-gpu-scan

@a10y (Contributor) commented Feb 18, 2026

This PR adds Perfetto trace generation to the CUDA scan-cli. Traces are written to trace.pb, which you can open by dropping it directly into https://ui.perfetto.dev/. For example:

[Screenshot: example trace opened in the Perfetto UI]

All span + log arguments are preserved.
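The PR writes Perfetto's protobuf format to trace.pb, and its emitter is not shown here. As a dependency-free illustration of how a span and its arguments map onto a trace event, the sketch below swaps in the older Chrome JSON trace-event format, which the Perfetto UI can also open. The `Span` struct, the helper names and the example span are hypothetical, not the scan-cli's code.

```rust
use std::fmt::Write;

/// One completed span (hypothetical type): start and duration in microseconds plus key/value args.
struct Span {
    name: String,
    start_us: u64,
    dur_us: u64,
    args: Vec<(String, String)>,
}

/// Minimal JSON string escaping: quotes, backslashes and control characters.
fn escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            c if (c as u32) < 0x20 => {
                let _ = write!(out, "\\u{:04x}", c as u32);
            }
            c => out.push(c),
        }
    }
    out
}

/// Render spans as Chrome trace-event JSON; "ph":"X" marks a complete event (start + duration).
fn to_trace_json(spans: &[Span], pid: u32, tid: u32) -> String {
    let mut out = String::from("{\"traceEvents\":[");
    for (i, s) in spans.iter().enumerate() {
        if i > 0 {
            out.push(',');
        }
        let _ = write!(
            out,
            "{{\"name\":\"{}\",\"ph\":\"X\",\"ts\":{},\"dur\":{},\"pid\":{},\"tid\":{},\"args\":{{",
            escape(&s.name), s.start_us, s.dur_us, pid, tid
        );
        // Every span argument is carried through as a string-valued arg.
        for (j, (k, v)) in s.args.iter().enumerate() {
            if j > 0 {
                out.push(',');
            }
            let _ = write!(out, "\"{}\":\"{}\"", escape(k), escape(v));
        }
        out.push_str("}}"); // close "args" and the event object
    }
    out.push_str("]}");
    out
}

fn main() {
    let spans = vec![Span {
        name: "decode_batch".into(), // hypothetical span name
        start_us: 0,
        dur_us: 120,
        args: vec![("rows".into(), "65536".into())],
    }];
    println!("{}", to_trace_json(&spans, 1, 1));
}
```

Saving that output as a `.json` file gives something the Perfetto UI can load; the protobuf format the PR uses carries the same span, timestamp and argument information in a more compact binary encoding.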

This PR also migrates our BP kernel to a fused FOR+BP operation. Pure BP uses a reference of 0, and FOR execution collapses the encoding tree and fuses into the same kernel. This gives a noticeable speedup, roughly 10-20%, in some of the per-batch traces.
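A minimal CPU-side sketch of what a fused FOR+BP decode computes, assuming a simple LSB-first bit layout in consecutive `u64` words. The PR's CUDA kernel, its memory layout and its signatures are not shown here; `bitpack`, `ffor_encode` and `unpack_ffor` are hypothetical names. Each field is unpacked and has the frame-of-reference added in the same pass, and pure BP is the same call with `reference = 0`.

```rust
/// Pack `values` into `bit_width`-bit fields, LSB-first, across consecutive u64 words.
/// (Hypothetical simple layout; not necessarily the layout the GPU kernel uses.)
fn bitpack(values: &[u32], bit_width: u32) -> Vec<u64> {
    assert!((1..=32).contains(&bit_width));
    let bw = bit_width as usize;
    let mut words = vec![0u64; (values.len() * bw + 63) / 64];
    let mask = (1u64 << bit_width) - 1;
    for (i, &v) in values.iter().enumerate() {
        let (word, shift) = ((i * bw) / 64, (i * bw) % 64);
        let v = u64::from(v) & mask;
        words[word] |= v << shift;
        // A field that straddles a word boundary spills its high bits into the next word.
        if shift + bw > 64 {
            words[word + 1] |= v >> (64 - shift);
        }
    }
    words
}

/// Frame-of-reference encode: subtract the minimum, then bit-pack the deltas.
/// Returns (packed words, bit width, reference).
fn ffor_encode(values: &[u32]) -> (Vec<u64>, u32, u32) {
    let reference = values.iter().copied().min().unwrap_or(0);
    let deltas: Vec<u32> = values.iter().map(|&v| v - reference).collect();
    let max_delta = deltas.iter().copied().max().unwrap_or(0);
    let bit_width = (32 - max_delta.leading_zeros()).max(1);
    (bitpack(&deltas, bit_width), bit_width, reference)
}

/// Fused decode: unpack each field and add `reference` in the same pass.
/// Pure bit-packing is the special case `reference == 0`.
fn unpack_ffor(words: &[u64], bit_width: u32, reference: u32, len: usize) -> Vec<u32> {
    let bw = bit_width as usize;
    let mask = (1u64 << bit_width) - 1;
    (0..len)
        .map(|i| {
            let (word, shift) = ((i * bw) / 64, (i * bw) % 64);
            let mut v = words[word] >> shift;
            if shift + bw > 64 {
                v |= words[word + 1] << (64 - shift);
            }
            ((v & mask) as u32).wrapping_add(reference)
        })
        .collect()
}

fn main() {
    let values = vec![1000u32, 1003, 1001, 1007, 1002];
    let (packed, bit_width, reference) = ffor_encode(&values);
    let decoded = unpack_ffor(&packed, bit_width, reference, values.len());
    assert_eq!(decoded, values);
    println!("bit_width={bit_width} reference={reference} decoded={decoded:?}");
}
```

The point of fusing is that the packed words are read once and the final values written once, instead of one pass writing the unpacked deltas to memory and a second FOR pass reading them back just to add the reference.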

```rust
if self.ops.is_empty() {
    self.log(format_args!("{msg}\n{}", array.display_tree()));
} else {
    self.log(msg);
}
```
Comment by the PR author:

This was taking >50% of execution time when I had logging enabled for the vortex-cuda crate, even when it wasn't enabled for vortex-array, mostly from computing display_tree().

Removing it also made the traces easier to read.

@a10y (Contributor, Author) commented Feb 18, 2026

This is helpful, but >90% of scan time is still spent in ZSTD.

[Screenshot of the scan trace]

a10y added 4 commits February 18, 2026 20:35
Signed-off-by: Andrew Duffy <andrew@a10y.dev> (on each commit)
a10y force-pushed the aduffy/fix-gpu-scan branch from b889973 to 3ed9fdf February 18, 2026 21:32
a10y requested a review from 0ax1 February 18, 2026 21:38
Signed-off-by: Andrew Duffy <andrew@a10y.dev>
a10y force-pushed the aduffy/fix-gpu-scan branch from 2b11947 to d0119f2 February 18, 2026 21:59
Signed-off-by: Andrew Duffy <andrew@a10y.dev>
a10y changed the title from "add perfetto logging for gpu-scan-cli" to "cuda FFOR + perfetto logging" Feb 18, 2026
a10y added the changelog/skip label (Do not list PR in the changelog) Feb 18, 2026
1 participant