Skip to content

Perf/registers on main#493

Draft
jotabulacios wants to merge 8 commits into
mainfrom
perf/registers-on-main
Draft

Perf/registers on main#493
jotabulacios wants to merge 8 commits into
mainfrom
perf/registers-on-main

Conversation

@jotabulacios
Copy link
Copy Markdown
Collaborator

@jotabulacios jotabulacios commented Apr 13, 2026

This PR removes the MEMW_R register fast-path table and has the CPU chip emit Memory bus interactions directly for all register accesses. This eliminates ~3–4 trace rows per instruction while keeping identical security guarantees. The Memory bus LogUp argument enforces register read/write ordering the same way MEMW_R did.

The one new requirement is handling large timestamp gaps: if a register goes unaccessed for many instructions, the gap between the current and previous timestamp can exceed IS_HALF's 16-bit limit. The new REGISTER_RELOAD table handles this by inserting intermediate bridge rows (each at most 65534 timestamps apart) whenever such a gap is detected. In practice this is rare, it only triggers for programs where a register is idle for 65534+ consecutive instructions.

CPU column count grows from 74 to 80 (six new witness-only columns for previous timestamps and values). Effective trace width is unchanged at 194.

  accesses (rs1, rs2, rd, PC), eliminating the register fast-path table
  and its ~3-4 rows-per-instruction overhead
  accesses, preventing IS_HALF overflow when a register goes unaccessed
  for many instructions
@jotabulacios
Copy link
Copy Markdown
Collaborator Author

/bench

@github-actions
Copy link
Copy Markdown

Codex Code Review

  1. Critical — Security/Soundness: REGISTER_RELOAD is unconstrained and can arbitrarily rewrite register history
  • File: prover/src/test_utils.rs:783, prover/src/tables/register_reload.rs:122
  • create_register_reload_air installs zero transition constraints for a table that emits Memory bus sender/receiver tokens.
  • With no constraints tying old_ts, new_ts, ordering, or gap bounds, a malicious prover can choose arbitrary reload rows (including backward timestamp moves) to rebalance Memory tokens and potentially violate register causality.
  • Action: add constraints at least for old_ts < new_ts, bounded delta (<= MAX_REG_GAP), and range constraints for timestamp/register columns.
  1. High — Functional bug: wrong TraceTable step size drops almost all reload rows from evaluation
  • File: prover/src/tables/register_reload.rs:107
  • TraceTable::new_main(data, cols::NUM_COLUMNS, num_rows) passes num_rows as step_size.
  • In this codebase, step size is 1 for trace tables; using num_rows makes the trace effectively a single step (num_steps = 1), so only row 0 is evaluated in transition/bus contexts.
  • Action: change third argument to 1.

Assumption: I reviewed only the provided PR diff.
I couldn’t run tests in this environment due sandbox/rustup temp-file restrictions.

@github-actions
Copy link
Copy Markdown

Benchmark — fib_iterative_8M (median of 3)

Table parallelism: 32 (auto = cores / 3)

Metric main PR Δ
Peak heap 221277 MB 242380 MB +21103 MB (+9.5%) 🔴
Prove time 47.030s 32.273s -14.757s (-31.4%) 🟢

⚠️ Regression detected — heap or time increased by more than 5%.

✅ Low variance (time: 3.4%, heap: 0.1%)

Commit: 369a915 · Baseline: built from main · Runner: self-hosted bench

Comment thread prover/src/tables/register_reload.rs Outdated
}
// Padding rows remain zero-initialized: old_ts = new_ts = 0 → self-canceling tokens.

TraceTable::new_main(data, cols::NUM_COLUMNS, num_rows)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Medium — Wrong step_size passed to TraceTable::new_main

Every other table in the codebase calls:

TraceTable::new_main(data, cols::NUM_COLUMNS, 1)

Here num_rows is passed instead of 1. The third argument is step_size — how many rows represent a single VM state — not the height of the table (height is inferred from data.len() / num_cols).

With step_size = num_rows, the STARK sees this table as having height / num_rows = 1 step. This means:

  • num_steps() returns 1 instead of num_rows
  • The LDE/FRI computation operates on a different polynomial degree than intended
  • If transition constraints are ever added, they will only be evaluated once regardless of table size

Since REGISTER_RELOAD currently has zero transition constraints the proof still passes in tests, but this is a latent correctness bug.

Suggested change
TraceTable::new_main(data, cols::NUM_COLUMNS, num_rows)
TraceTable::new_main(data, cols::NUM_COLUMNS, 1)

let val_hi = (value >> 32) as u32;

let mut ts = prev_ts;
while curr_ts.saturating_sub(ts) > MAX_REG_GAP {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Low — saturating_sub silently swallows inverted timestamp order

If curr_ts < prev_ts (e.g. due to a future bug in timestamp assignment), saturating_sub returns 0 and the while-loop is never entered. The function then returns prev_ts, causing the CPU's IS_HALF lookup to receive a field-element-wrapped negative delta and fail at proof generation in an opaque way.

A debug_assert! makes the invariant explicit and produces a useful panic in debug builds:

debug_assert!(
    curr_ts >= prev_ts,
    "bridge_timestamp_gap: curr_ts ({}) < prev_ts ({}), reg_idx={}",
    curr_ts, prev_ts, reg_idx
);

// rs1: IS_HALF[ts - rs1_prev_ts - 1] — fires when READ_REGISTER1=1
if d.read_register1 && d.rs1 != 0 {
let prev = (op.rs1_prev_ts & 0xFFFF_FFFF) as u32;
let delta = ts.wrapping_sub(prev).wrapping_sub(1) as u16;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Low — wrapping_sub hides ordering violations silently

For rs1, the prover-side delta is ts - prev - 1. If prev >= ts this wraps to a large u16, adding a spurious IS_HALF entry with the wrong value. The CPU bus sends the algebraic expression TIMESTAMP - RS1_PREV_TS - 1 directly, so there would be a bus imbalance and proof failure — but the root cause would be hard to diagnose.

The invariant that ts >= prev + 1 (for rs1) and ts >= prev (for rs2/rd/PC) should hold by construction from bridge_timestamp_gap, so a debug_assert! is cheap and documents the contract:

// rs1: IS_HALF[ts - rs1_prev_ts - 1]
if d.read_register1 && d.rs1 != 0 {
    let prev = (op.rs1_prev_ts & 0xFFFF_FFFF) as u32;
    debug_assert!(ts > prev, "rs1 delta underflow: ts={ts} prev={prev}");
    let delta = ts.wrapping_sub(prev).wrapping_sub(1) as u16;
    ...
}

Same pattern applies to the rs2, rd, and PC deltas below.

@claude
Copy link
Copy Markdown

claude Bot commented Apr 13, 2026

Review: Perf/registers on main

Overall the design is sound — the Memory bus (LogUp) correctly enforces value consistency without needing explicit timestamp-ordering constraints in REGISTER_RELOAD, and the per-register IS_HALF checks in the CPU enforce local ordering. The MAX_REG_GAP maths check out for all four access types (rs1/rs2/rd/PC).

Medium

Wrong step_size in generate_register_reload_trace (link)

Every other table passes 1 as the third argument to TraceTable::new_main. REGISTER_RELOAD passes num_rows. The third parameter is step_size (rows per VM state), not the table height; height is inferred from data.len() / num_cols. With step_size = num_rows, the STARK sees the table as having exactly 1 step, distorting the polynomial degree. Tests pass only because there are zero transition constraints, masking the bug.

Fix: TraceTable::new_main(data, cols::NUM_COLUMNS, 1)


Low

bridge_timestamp_gap uses saturating_sub without asserting curr_ts >= prev_ts (link)

If timestamps are ever inverted, saturating_sub silently returns 0, the loop is skipped, and the CPU IS_HALF check receives a field-element-wrapped negative value — proof failure with no useful diagnostic. A debug_assert!(curr_ts >= prev_ts) makes the invariant explicit.

wrapping_sub in collect_bitwise_from_cpu_registers without invariant guards (link)

The deltas are safe by construction (bridge enforces bounded gaps), but if the invariant is ever violated the wrapping produces a wrong IS_HALF entry and a confusing bus-imbalance failure. A debug_assert! on each delta before the cast to u16 would document the contract and catch regressions.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants