perf: compute LogUp fingerprints in parallel chunks #548
Codex Code Review
No security or correctness issues stood out in the diff. I couldn’t run a full …

Review: perf: compute LogUp fingerprints in parallel chunks
Overview: Replaces per-column …
Medium: Dead code rather than removal. The original …
Low: No test for parallel ↔ sequential equivalence. There are no unit tests verifying that …
Nit: …
/bench 10

Benchmark: fib_iterative_8M (median of 10)
Table parallelism: 32 (auto = cores / 3)
Commit: 7b91793 · Baseline: cached · Runner: self-hosted bench
…nked matches sequential output on 2048-row trace
…umn_chunked and add test
Codex Code Review
No security or correctness issues stood out in the diff beyond that. I couldn’t run …

Review: perf: compute LogUp fingerprints in parallel chunks
Correctness: pass. The chunked functions are mathematically equivalent to the originals. Per-chunk batch inversion is independent between chunks (each chunk's fingerprints are self-contained), and the two new tests confirm output parity.
Architecture is sound. The two-regime strategy (parallelize across pairs when trace fits one chunk; parallelize within each pair via …
Issues
Medium: duplicate …
Low: missing …
Low: …
Minor notes
/bench 10
Replaces the per-column parallel dispatch in LogUp fingerprint computation with par_chunks_mut(1024), processing fingerprints, batch inversion, and term accumulation per chunk for better L2 cache locality.

Optimization extracted from PR #518.
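
To make the per-chunk idea concrete, here is a minimal, dependency-free sketch. It is not the code from this PR. It assumes a toy prime field (p = 2^31 - 1), a simplified fingerprint f_i = alpha - v_i, and hypothetical helper names (process_chunk, terms_sequential, terms_chunked). It also swaps rayon's par_chunks_mut for std::thread::scope over chunks_mut so that it builds with the standard library alone, and it leaves out the term accumulation step. What it does illustrate is why batch-inverting each 1024-row chunk independently gives the same inverses as batch-inverting the whole column at once.

```rust
use std::thread;

// Toy field modulus (assumption; the field used in the PR is not shown here).
const P: u64 = (1 << 31) - 1;
// Chunk size taken from the PR description.
const CHUNK: usize = 1024;

fn mul(a: u64, b: u64) -> u64 {
    a * b % P // a, b < 2^31, so the product fits in u64
}

fn sub(a: u64, b: u64) -> u64 {
    (a + P - b) % P
}

fn pow(mut base: u64, mut exp: u64) -> u64 {
    let mut acc = 1;
    while exp > 0 {
        if exp & 1 == 1 {
            acc = mul(acc, base);
        }
        base = mul(base, base);
        exp >>= 1;
    }
    acc
}

/// In-place batch inversion (Montgomery's trick): one field inversion
/// plus a linear number of multiplications. Inputs must be nonzero.
fn batch_invert(xs: &mut [u64]) {
    // prefix[i] = xs[0] * ... * xs[i - 1]
    let mut prefix = Vec::with_capacity(xs.len());
    let mut acc = 1u64;
    for &x in xs.iter() {
        prefix.push(acc);
        acc = mul(acc, x);
    }
    // Invert the full product once (Fermat), then peel off one factor per step.
    let mut inv = pow(acc, P - 2);
    for i in (0..xs.len()).rev() {
        let x = xs[i];
        xs[i] = mul(inv, prefix[i]);
        inv = mul(inv, x);
    }
}

/// Hypothetical per-chunk kernel: write fingerprints, then invert them in place.
fn process_chunk(alpha: u64, values: &[u64], out: &mut [u64]) {
    for (o, &v) in out.iter_mut().zip(values) {
        *o = sub(alpha, v);
    }
    batch_invert(out);
}

/// Reference path: a single batch inversion over the whole column.
fn terms_sequential(alpha: u64, values: &[u64]) -> Vec<u64> {
    let mut out = vec![0; values.len()];
    process_chunk(alpha, values, &mut out);
    out
}

/// Chunked path: each chunk is processed independently on its own scoped thread.
fn terms_chunked(alpha: u64, values: &[u64]) -> Vec<u64> {
    let mut out = vec![0; values.len()];
    thread::scope(|s| {
        for (o, v) in out.chunks_mut(CHUNK).zip(values.chunks(CHUNK)) {
            s.spawn(move || process_chunk(alpha, v, o));
        }
    });
    out
}

fn main() {
    // 2048 rows of made-up column values, all distinct from alpha.
    let values: Vec<u64> = (1..=2048u64).map(|i| i * 7919 + 1).collect();
    let alpha = 1_000_000_007u64;
    assert_eq!(terms_sequential(alpha, &values), terms_chunked(alpha, &values));
    println!("chunked output matches sequential output");
}
```

Because every inverse in a chunk is derived only from that chunk's own running product, chunk boundaries do not change the results; they only bound the working set each thread touches. A thread pool such as rayon's would avoid spawning one OS thread per chunk, which this sketch does purely for simplicity.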