perf: compute LogUp fingerprints in parallel chunks by nicole-graus · Pull Request #548 · yetanotherco/lambda_vm

nicole-graus · 2026-04-23T20:02:45Z

Replaces the per-column parallel dispatch in LogUp fingerprint computation with par_chunks_mut(1024), processing fingerprints, batch-inversion, and term accumulation per chunk for better L2 cache locality.

Optimization extracted from PR #518.

github-actions · 2026-04-23T20:04:59Z

Codex Code Review

Medium - Performance regression in parallel builds at crypto/stark/src/lookup.rs and crypto/stark/src/lookup.rs. The old code used into_par_iter() over committed interaction pairs, so many pairs could run concurrently even when the trace was short. The new path serializes pairs and only parallelizes inside each pair by fixed LOGUP_CHUNK_SIZE = 1024. For tables with many interaction pairs but trace_len <= 1024 (or only a few chunks), this effectively collapses the work to one pair at a time and can be a substantial throughput regression. Keep pair-level parallelism, or switch between pair-parallel and chunk-parallel based on trace_len / number of pairs.

No security or correctness issues stood out in the diff.

I couldn’t run a full cargo check here because the sandbox blocks rustup from creating temp files under /home/runner/.rustup.

claude · 2026-04-23T20:06:59Z

Review: perf: compute LogUp fingerprints in parallel chunks

Overview: Replaces per-column into_par_iter dispatch with par_chunks_mut(1024), so each Rayon task processes a contiguous block of rows (fingerprint computation → batch inversion → term accumulation) for better L2 cache locality. The change is architecturally sound and the algorithm is mathematically equivalent to the sequential path.

Medium — Dead code rather than removal

The original compute_logup_term_column and compute_logup_batched_term_column are now unreachable when parallel is enabled, silenced with #[cfg_attr(feature = "parallel", allow(dead_code))]. This leaves ~300 lines of production code that can silently diverge from the new chunked versions. The right fix is to drop the allow(dead_code) attributes, use those functions as correctness oracles in unit tests (assert parallel output == sequential output on a small trace), then delete them once confidence is established. See inline comment at line 1397.

Low — No test for parallel ↔ sequential equivalence

There are no unit tests verifying that compute_logup_batched_term_column_chunked / compute_logup_term_column_chunked produce field-element-identical output to their sequential counterparts. Given this is cryptographic arithmetic, even a small trace (e.g. 16 rows) test would give good confidence and is cheap to write.

Nit — `_table_name` computed unconditionally

let _table_name = self.name.as_deref().unwrap_or("UNKNOWN"); is always evaluated but only referenced in the #[cfg(not(feature = "parallel"))] branch. Harmless (cheap string slice), but a matching #[cfg(not(feature = "parallel"))] guard on the binding would make the intent explicit.

Overall the implementation looks correct: per-row fingerprint computation is independent so chunk-local batch inversion is equivalent to the global one, disjoint par_chunks_mut slices are race-free, and Send + Sync bounds are correctly propagated. The main ask is replacing the dead-code suppression with an actual equivalence test.

nicole-graus · 2026-04-23T20:14:51Z

/bench 10

github-actions · 2026-04-23T20:26:17Z

Benchmark — fib_iterative_8M (median of 10)

_{Table parallelism: 32 (auto = cores / 3)}

Metric	main	PR	Δ
Peak heap	62014 MB	63562 MB	+1548 MB (+2.5%) ⚪
Prove time	30.985s	30.088s	-0.897s (-2.9%) ⚪

✅ No significant change.

⚠️ Heap spread: 7.8% (60766 MB / 62303 MB / 65744 MB / 65377 MB / 62343 MB / 63562 MB / 63657 MB / 61191 MB / 64536 MB / 63726 MB)
⚠️ Baseline heap spread: 7.3% (62014 MB / 60897 MB / 65412 MB) — comparison may be less reliable
Consider re-running /bench

_{Commit: 7b91793 · Baseline: cached · Runner: self-hosted bench}

…nked matches sequential output on 2048-row trace

…umn_chunked and add test

github-actions · 2026-04-24T15:13:59Z

Codex Code Review

Medium - build_auxiliary_trace() now disables parallelism across committed pairs whenever trace_len > LOGUP_CHUNK_SIZE (crypto/stark/src/lookup.rs), and relies only on the per-pair chunked worker (crypto/stark/src/lookup.rs). That assumption is too coarse: for traces only slightly above 1024 rows, each pair produces only 2-3 Rayon tasks, so a table with many committed pairs can regress sharply versus the old into_par_iter() over pairs. This is a real throughput regression for medium-size, interaction-heavy tables. The scheduling heuristic should consider both trace_len and num_committed_pairs (or total chunk count), not just trace_len.

No security or correctness issues stood out in the diff beyond that. I couldn’t run cargo test here because cargo/rustup tries to write under a read-only home in this environment.

claude · 2026-04-24T15:17:48Z

Review: perf: compute LogUp fingerprints in parallel chunks

Correctness: pass. The chunked functions are mathematically equivalent to the originals. Per-chunk batch inversion is independent between chunks (each chunk's fingerprints are self-contained), and the two new tests confirm output parity.

Architecture is sound. The two-regime strategy (parallelize across pairs when trace fits one chunk; parallelize within each pair via par_chunks_mut when trace is large) is a reasonable trade-off for cache locality.

Issues

Medium – duplicate Multiplicity match arms (see inline at line 1791)
compute_multiplicity_for_row is a third copy of the same exhaustive match multiplicity { … } that already lives in compute_multiplicities and inside compute_logup_batched_term_column. Adding a new Multiplicity variant requires updating all three independently; the compiler won't catch a mismatch. A single shared helper (or a method on Multiplicity) would eliminate this risk.

Low – missing debug-checks instrumentation in batched chunked path (see inline at line 1796)
compute_logup_batched_term_column_chunked has no bus-debugger logging when debug-checks is enabled, while the single-interaction chunked variant does. Batch-absorbed interactions become invisible to the bus debugger in parallel builds.

Low – allow(dead_code) on the reference implementations (see inline at line 1422)
The original compute_logup_term_column / compute_logup_batched_term_column now exist only as test oracles in parallel builds. Consider marking them explicitly as reference implementations, or ensuring they are also exercised in non-parallel CI.

Minor notes

LOGUP_CHUNK_SIZE = 1024 has no documented tuning rationale. A one-line comment (e.g. field element size × 2 fingerprint vectors × 1024 rows ≈ ~96 KB — fits within typical L2) would help future tuning.
LinearTerm::Column and LinearTerm::ColumnUnsigned arms in compute_multiplicity_for_row look identical but are correct: the From<i64>/From<u64> dispatch handles signed negation transparently. A brief comment would prevent a future reader from incorrectly collapsing them.

MauroToscano · 2026-04-24T17:34:38Z

/bench 10

compute LogUp fingerprints in parallel chunks of 1024 rows

293dd24

claude Bot reviewed Apr 23, 2026

View reviewed changes

Comment thread crypto/stark/src/lookup.rs

claude Bot reviewed Apr 23, 2026

View reviewed changes

Comment thread crypto/stark/src/lookup.rs Outdated

nicole-graus added 3 commits April 23, 2026 18:35

add differential test verifying compute_logup_batched_term_column_chu…

3d712cc

…nked matches sequential output on 2048-row trace

Emit per-row bus_debug::log_interaction inside compute_logup_term_col…

34acc74

…umn_chunked and add test

Dispatch committed LogUp term columns

db91608

nicole-graus marked this pull request as ready for review April 24, 2026 15:11

claude Bot reviewed Apr 24, 2026

View reviewed changes

Comment thread crypto/stark/src/lookup.rs

claude Bot reviewed Apr 24, 2026

View reviewed changes

Comment thread crypto/stark/src/lookup.rs Outdated

claude Bot reviewed Apr 24, 2026

View reviewed changes

Comment thread crypto/stark/src/lookup.rs Outdated

MauroToscano and others added 2 commits April 24, 2026 14:33

simplify code

e55d12a

Merge branch 'main' into logup-chunked-fingerprints

7b91793

ColoCarletti approved these changes Apr 24, 2026

View reviewed changes

MauroToscano added this pull request to the merge queue Apr 24, 2026

Merged via the queue into main with commit 61a1eca Apr 24, 2026
10 checks passed

MauroToscano deleted the logup-chunked-fingerprints branch April 24, 2026 19:29

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

perf: compute LogUp fingerprints in parallel chunks#548

perf: compute LogUp fingerprints in parallel chunks#548
MauroToscano merged 6 commits into
mainfrom
logup-chunked-fingerprints

nicole-graus commented Apr 23, 2026

Uh oh!

github-actions Bot commented Apr 23, 2026

Uh oh!

Uh oh!

Uh oh!

claude Bot commented Apr 23, 2026

Uh oh!

nicole-graus commented Apr 23, 2026

Uh oh!

github-actions Bot commented Apr 23, 2026 •

edited

Loading

Uh oh!

github-actions Bot commented Apr 24, 2026

Uh oh!

Uh oh!

Uh oh!

Uh oh!

claude Bot commented Apr 24, 2026

Uh oh!

MauroToscano commented Apr 24, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

nicole-graus commented Apr 23, 2026

Uh oh!

github-actions Bot commented Apr 23, 2026

Codex Code Review

Uh oh!

Uh oh!

Uh oh!

claude Bot commented Apr 23, 2026

Review: perf: compute LogUp fingerprints in parallel chunks

Medium — Dead code rather than removal

Low — No test for parallel ↔ sequential equivalence

Nit — _table_name computed unconditionally

Uh oh!

nicole-graus commented Apr 23, 2026

Uh oh!

github-actions Bot commented Apr 23, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Benchmark — fib_iterative_8M (median of 10)

Uh oh!

github-actions Bot commented Apr 24, 2026

Codex Code Review

Uh oh!

Uh oh!

Uh oh!

Uh oh!

claude Bot commented Apr 24, 2026

Review: perf: compute LogUp fingerprints in parallel chunks

Issues

Minor notes

Uh oh!

MauroToscano commented Apr 24, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Nit — `_table_name` computed unconditionally

github-actions Bot commented Apr 23, 2026 •

edited

Loading