feat[gpu]: scalar encodings #6109

Merged
joseph-isaacs merged 5 commits into develop from ji/scalar-gpu
Jan 23, 2026

@joseph-isaacs (Contributor)

No description provided.

Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>
Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>
Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>
@joseph-isaacs joseph-isaacs marked this pull request as ready for review January 23, 2026 10:07
# Conflicts:
#	Cargo.toml
#	vortex-cuda/src/lib.rs
f
Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>
? (block_start + elements_per_block)
: array_len;

// Vectorized loop - process 16 bytes per iteration for better memory throughput.

Contributor: Did I leave that comment? In any case, the ops here are not vectorized.

Contributor (Author): From FoR.

Contributor: Yeah, this isn't true. I thought this would happen on some archs, but CUDA can only do vector ops on loads and stores. It really is only the unrolling doing the trick here.

Contributor (Author): I'll remove it in the next one.
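
To illustrate the point made in this thread, here is a minimal host-side Rust sketch; it is not the CUDA kernel under review. It assumes the usual frame-of-reference (FoR) scheme, where each stored value is an offset that is decoded by adding a reference value. The block end is clamped to the array length in the spirit of the reviewed snippet (the actual condition is elided there, so the one used here is an assumption), and the loop body is unrolled by hand. Every add is still a scalar operation, so any speedup comes from the unrolling. The names `ELEMENTS_PER_BLOCK`, `UNROLL`, `for_decode_block`, and `for_decode` are hypothetical.

```rust
/// Hypothetical block size; the real kernel's value is not shown on this page.
const ELEMENTS_PER_BLOCK: usize = 8;
/// Hypothetical unroll factor.
const UNROLL: usize = 4;

/// Decode one block of a FoR-encoded array in place by adding `reference`
/// to every element of the block.
fn for_decode_block(values: &mut [u32], block_idx: usize, reference: u32) {
    let array_len = values.len();
    let block_start = (block_idx * ELEMENTS_PER_BLOCK).min(array_len);
    // Clamp the block end to the array length (assumed condition).
    let block_end = if block_start + ELEMENTS_PER_BLOCK < array_len {
        block_start + ELEMENTS_PER_BLOCK
    } else {
        array_len
    };

    let block = &mut values[block_start..block_end];
    let mut chunks = block.chunks_exact_mut(UNROLL);
    for chunk in &mut chunks {
        // Manually unrolled body: four independent scalar adds.
        chunk[0] = chunk[0].wrapping_add(reference);
        chunk[1] = chunk[1].wrapping_add(reference);
        chunk[2] = chunk[2].wrapping_add(reference);
        chunk[3] = chunk[3].wrapping_add(reference);
    }
    // Scalar tail for elements that do not fill a whole unrolled step.
    for v in chunks.into_remainder() {
        *v = v.wrapping_add(reference);
    }
}

/// Decode the whole array block by block.
fn for_decode(values: &mut [u32], reference: u32) {
    let blocks = (values.len() + ELEMENTS_PER_BLOCK - 1) / ELEMENTS_PER_BLOCK;
    for block_idx in 0..blocks {
        for_decode_block(values, block_idx, reference);
    }
}
```

Calling `for_decode(&mut v, 100)` on a 19-element vector decodes two full blocks through the unrolled path and the final 3 elements through the scalar tail.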

@joseph-isaacs joseph-isaacs added the changelog/feature A new feature label Jan 23, 2026

// Launch kernel
let _cuda_events =
launch_cuda_kernel_impl(&mut launch_builder, CU_EVENT_DISABLE_TIMING, array_len)?;

Contributor: If we rework the launcher logic, it'd be nice not to record events by default for each launch.

Contributor (Author): Agreed.

codspeed-hq bot commented Jan 23, 2026

Merging this PR will degrade performance by 18.15%

⚠️ Unknown Walltime execution environment detected

Using the Walltime instrument on standard Hosted Runners will lead to inconsistent data.

For the most accurate results, we recommend using CodSpeed Macro Runners: bare-metal machines fine-tuned for performance measurement consistency.

⚡ 8 improved benchmarks
❌ 3 regressed benchmarks
✅ 1263 untouched benchmarks
⏩ 1254 skipped benchmarks [1]

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Performance Changes

| Mode | Benchmark | BASE | HEAD | Efficiency |
|---|---|---|---|---|
| WallTime | u8_FoR[1K] | 14.4 µs | 6.2 µs | ×2.3 |
| WallTime | u16_FoR[1M] | 6.1 µs | 7.4 µs | -18.15% |
| Simulation | canonical_into_non_nullable[(10000, 100, 0.01)] | 2.9 ms | 2.1 ms | +37.72% |
| Simulation | canonical_into_non_nullable[(10000, 100, 0.0)] | 2.7 ms | 1.9 ms | +42.32% |
| Simulation | canonical_into_non_nullable[(10000, 100, 0.1)] | 4.5 ms | 3.7 ms | +22.17% |
| Simulation | canonical_into_nullable[(10000, 10, 0.0)] | 444.5 µs | 529.1 µs | -15.99% |
| Simulation | canonical_into_nullable[(10000, 100, 0.0)] | 4.1 ms | 4.9 ms | -16.51% |
| Simulation | into_canonical_non_nullable[(10000, 100, 0.01)] | 3 ms | 2.2 ms | +36.64% |
| Simulation | into_canonical_non_nullable[(10000, 100, 0.1)] | 4.6 ms | 3.8 ms | +21.44% |
| Simulation | into_canonical_non_nullable[(10000, 100, 0.0)] | 2.7 ms | 1.9 ms | +41.68% |
| Simulation | into_canonical_nullable[(10000, 100, 0.0)] | 5.2 ms | 4.4 ms | +18.47% |
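
The Efficiency column appears consistent with a relative change of `base / head - 1` (and a plain `base / head` ratio for the `×` entry). This formula is inferred from the figures in the table, not taken from CodSpeed's documentation, so treat the sketch below as an assumption. The function names are hypothetical.

```rust
/// Relative change in percent, assuming the formula `base / head - 1`.
/// Positive means HEAD is faster than BASE.
fn relative_change_percent(base: f64, head: f64) -> f64 {
    (base / head - 1.0) * 100.0
}

/// Speedup ratio, assuming the `×` entries are `base / head`.
fn speedup_ratio(base: f64, head: f64) -> f64 {
    base / head
}
```

With the displayed values, `relative_change_percent(444.5, 529.1)` comes out at about -15.99 and `speedup_ratio(14.4, 6.2)` at about 2.3, matching the table. Other rows differ by a fraction of a percentage point, presumably because the displayed times are rounded.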

Comparing ji/scalar-gpu (f3a7bf3) with develop (03f0140)

Footnotes

  1. 1254 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports.

@0ax1 (Contributor) left a comment:

nice!

@joseph-isaacs joseph-isaacs merged commit f0be28a into develop Jan 23, 2026
61 of 64 checks passed
@joseph-isaacs joseph-isaacs deleted the ji/scalar-gpu branch January 23, 2026 10:36
danking pushed a commit that referenced this pull request Feb 6, 2026
Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>

2 participants