feat[gpu]: scalar encodings by joseph-isaacs · Pull Request #6109 · vortex-data/vortex

joseph-isaacs · 2026-01-22T17:24:19Z

No description provided.

0ax1 · 2026-01-23T10:25:31Z

vortex-cuda/kernels/scalar_kernel.cuh

+        ? (block_start + elements_per_block)
+        : array_len;
+
+    // Vectorized loop - process 16 bytes per iteration for better memory throughput.


Did I leave that comment? In any case, the ops here are not vectorized.

Yeah, this isn't true. I thought this would happen on some archs, but CUDA can only do vector ops on loads and stores. It really is only the unrolling doing the trick here.

I'll remove it in the next one.
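
To illustrate the point made in this thread, here is a minimal sketch of a kernel that moves 16 bytes per iteration while still doing scalar arithmetic. It is not the kernel from this PR: the name `add_reference_u32`, its parameters, and the add-a-reference operation are all assumptions chosen for illustration. Only the load and the store are 16 bytes wide; the four adds are ordinary per-lane scalar instructions that happen to be unrolled.

```cuda
#include <cstddef>
#include <cstdint>
#include <cuda_runtime.h>

// Hypothetical kernel (not the one in this PR): adds `reference` to each element.
// Assumes `in` and `out` are 16-byte aligned, which holds for base pointers
// returned by cudaMalloc.
__global__ void add_reference_u32(const uint32_t* __restrict__ in,
                                  uint32_t* __restrict__ out,
                                  uint32_t reference,
                                  size_t len) {
    const size_t tid = static_cast<size_t>(blockIdx.x) * blockDim.x + threadIdx.x;
    const size_t stride = static_cast<size_t>(gridDim.x) * blockDim.x;
    const size_t vec_len = len / 4;  // number of complete 4-element groups

    const uint4* in4 = reinterpret_cast<const uint4*>(in);
    uint4* out4 = reinterpret_cast<uint4*>(out);

    for (size_t i = tid; i < vec_len; i += stride) {
        uint4 v = in4[i];   // one 16-byte load
        v.x += reference;   // four independent scalar adds:
        v.y += reference;   // this is unrolling, not vector arithmetic
        v.z += reference;
        v.w += reference;
        out4[i] = v;        // one 16-byte store
    }

    // Scalar tail for the last len % 4 elements.
    for (size_t i = vec_len * 4 + tid; i < len; i += stride) {
        out[i] = in[i] + reference;
    }
}
```

Under these assumptions, an accurate comment for such a loop would describe it as using 16-byte loads and stores with an unrolled scalar body, rather than as a vectorized loop.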

0ax1 · 2026-01-23T10:31:21Z

vortex-cuda/src/kernel/encodings/alp.rs

+
+    // Launch kernel
+    let _cuda_events =
+        launch_cuda_kernel_impl(&mut launch_builder, CU_EVENT_DISABLE_TIMING, array_len)?;


If we rework the launcher logic, it'd be nice to not record events by default for each launch.
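
One way to read this suggestion is a launcher where event recording is opt-in. The sketch below expresses that idea with the CUDA runtime API rather than the Rust launcher in this PR; `LaunchEvents`, `placeholder_kernel`, and `launch_placeholder` are hypothetical names, and the actual rework may look quite different.

```cuda
#include <cuda_runtime.h>

// Hypothetical pair of events that bracket a single launch.
struct LaunchEvents {
    cudaEvent_t start;
    cudaEvent_t stop;
};

// Placeholder kernel standing in for a real decode kernel.
__global__ void placeholder_kernel(int* out) {
    if (blockIdx.x == 0 && threadIdx.x == 0) {
        *out = 1;
    }
}

// Launches the kernel on `stream`. Events are recorded only when the caller
// passes a non-null `events`, so the default path does no event work at all.
cudaError_t launch_placeholder(int* out,
                               cudaStream_t stream,
                               LaunchEvents* events = nullptr) {
    cudaError_t err;
    if (events != nullptr) {
        err = cudaEventRecord(events->start, stream);
        if (err != cudaSuccess) return err;
    }

    placeholder_kernel<<<1, 32, 0, stream>>>(out);
    err = cudaGetLastError();  // surfaces launch configuration errors
    if (err != cudaSuccess) return err;

    if (events != nullptr) {
        err = cudaEventRecord(events->stop, stream);
    }
    return err;
}
```

In this sketch, a caller that only needs ordering can create its events with `cudaEventCreateWithFlags(&e, cudaEventDisableTiming)`, a caller that wants timings can use `cudaEventCreate` and `cudaEventElapsedTime`, and every other caller passes nothing and skips event recording entirely.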

codspeed-hq · 2026-01-23T10:34:23Z

Merging this PR will degrade performance by 18.15%

⚠️ Unknown Walltime execution environment detected

Using the Walltime instrument on standard Hosted Runners will lead to inconsistent data.

For the most accurate results, we recommend using CodSpeed Macro Runners: bare-metal machines fine-tuned for performance measurement consistency.

⚡ 8 improved benchmarks
❌ 3 regressed benchmarks
✅ 1263 untouched benchmarks
⏩ 1254 skipped benchmarks¹

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Performance Changes

| | Mode | Benchmark | `BASE` | `HEAD` | Efficiency |
|---|---|---|---|---|---|
| ⚡ | WallTime | `u8_FoR[1K]` | 14.4 µs | 6.2 µs | ×2.3 |
| ❌ | WallTime | `u16_FoR[1M]` | 6.1 µs | 7.4 µs | -18.15% |
| ⚡ | Simulation | `canonical_into_non_nullable[(10000, 100, 0.01)]` | 2.9 ms | 2.1 ms | +37.72% |
| ⚡ | Simulation | `canonical_into_non_nullable[(10000, 100, 0.0)]` | 2.7 ms | 1.9 ms | +42.32% |
| ⚡ | Simulation | `canonical_into_non_nullable[(10000, 100, 0.1)]` | 4.5 ms | 3.7 ms | +22.17% |
| ❌ | Simulation | `canonical_into_nullable[(10000, 10, 0.0)]` | 444.5 µs | 529.1 µs | -15.99% |
| ❌ | Simulation | `canonical_into_nullable[(10000, 100, 0.0)]` | 4.1 ms | 4.9 ms | -16.51% |
| ⚡ | Simulation | `into_canonical_non_nullable[(10000, 100, 0.01)]` | 3 ms | 2.2 ms | +36.64% |
| ⚡ | Simulation | `into_canonical_non_nullable[(10000, 100, 0.1)]` | 4.6 ms | 3.8 ms | +21.44% |
| ⚡ | Simulation | `into_canonical_non_nullable[(10000, 100, 0.0)]` | 2.7 ms | 1.9 ms | +41.68% |
| ⚡ | Simulation | `into_canonical_nullable[(10000, 100, 0.0)]` | 5.2 ms | 4.4 ms | +18.47% |

Comparing ji/scalar-gpu (f3a7bf3) with develop (03f0140)

¹ 1254 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, archive them to remove them from the performance reports.

0ax1

nice!

joseph-isaacs added 3 commits January 22, 2026 17:23

feat[gpu]: scalar encodings

485aeba

Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>

feat[gpu]: scalar encodings

bc518c3

Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>

fix

3daec62

Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>

joseph-isaacs marked this pull request as ready for review January 23, 2026 10:07

Merge remote-tracking branch 'origin/develop' into ji/scalar-gpu

c066b69

# Conflicts: # Cargo.toml # vortex-cuda/src/lib.rs

0ax1 reviewed Jan 23, 2026

vortex-cuda-macros/src/lib.rs (outdated, resolved)

f

f3a7bf3

Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>

0ax1 reviewed Jan 23, 2026

joseph-isaacs added the changelog/feature label Jan 23, 2026

0ax1 reviewed Jan 23, 2026

0ax1 approved these changes Jan 23, 2026

joseph-isaacs merged commit f0be28a into develop Jan 23, 2026
61 of 64 checks passed

joseph-isaacs deleted the ji/scalar-gpu branch January 23, 2026 10:36

danking pushed a commit that referenced this pull request Feb 6, 2026

feat[gpu]: scalar encodings (#6109)

3f74808

Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>

joseph-isaacs merged 5 commits into develop from ji/scalar-gpu
