Perf/registers on main #493
accesses (rs1, rs2, rd, PC), eliminating the register fast-path table and its ~3-4 rows-per-instruction overhead
accesses, preventing IS_HALF overflow when a register goes unaccessed for many instructions
/bench
Codex Code Review
Assumption: I reviewed only the provided PR diff.
Benchmark: fib_iterative_8M (median of 3)
Table parallelism: 32 (auto = cores / 3)
Commit: 369a915 · Baseline: built from main · Runner: self-hosted bench
}
// Padding rows remain zero-initialized: old_ts = new_ts = 0 → self-canceling tokens.

TraceTable::new_main(data, cols::NUM_COLUMNS, num_rows)
Medium — Wrong step_size passed to TraceTable::new_main
Every other table in the codebase calls:
TraceTable::new_main(data, cols::NUM_COLUMNS, 1)
Here num_rows is passed instead of 1. The third argument is step_size, meaning how many rows represent a single VM state, not the height of the table (height is inferred from data.len() / num_cols).
With step_size = num_rows, the STARK sees this table as having height / num_rows = 1 step. This means:
- num_steps() returns 1 instead of num_rows
- The LDE/FRI computation operates on a different polynomial degree than intended
- If transition constraints are ever added, they will only be evaluated once regardless of table size
Since REGISTER_RELOAD currently has zero transition constraints, the proof still passes in tests, but this is a latent correctness bug.
- TraceTable::new_main(data, cols::NUM_COLUMNS, num_rows)
+ TraceTable::new_main(data, cols::NUM_COLUMNS, 1)
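To make the effect of the third argument concrete, here is a minimal sketch. The `TraceTable` below is a hypothetical stand-in, not the project's real type; it assumes only what the comment above states, namely that height is `data.len() / num_cols` and that the step count is `height / step_size`.

```rust
/// Hypothetical stand-in for the project's `TraceTable` (assumption):
/// only the fields needed to show how `step_size` changes the step count.
pub struct TraceTable {
    data: Vec<u64>,
    num_cols: usize,
    step_size: usize,
}

impl TraceTable {
    /// Mirrors the argument order used in the diff: data, column count, step size.
    pub fn new_main(data: Vec<u64>, num_cols: usize, step_size: usize) -> Self {
        assert!(num_cols > 0 && step_size > 0);
        assert_eq!(data.len() % num_cols, 0, "data must fill whole rows");
        Self { data, num_cols, step_size }
    }

    /// Table height is inferred from the flat data length.
    pub fn height(&self) -> usize {
        self.data.len() / self.num_cols
    }

    /// Number of VM steps: height divided by rows per step.
    pub fn num_steps(&self) -> usize {
        self.height() / self.step_size
    }
}
```

Under these assumptions, an 8-row table built with `step_size = num_rows` reports a single step, while the same data built with `step_size = 1` reports 8.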
let val_hi = (value >> 32) as u32;

let mut ts = prev_ts;
while curr_ts.saturating_sub(ts) > MAX_REG_GAP {
Low — saturating_sub silently swallows inverted timestamp order
If curr_ts < prev_ts (e.g. due to a future bug in timestamp assignment), saturating_sub returns 0 and the while-loop is never entered. The function then returns prev_ts, causing the CPU's IS_HALF lookup to receive a field-element-wrapped negative delta and fail at proof generation in an opaque way.
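The masking behaviour can be seen in isolation with the small sketch below. The helper names are hypothetical and the `u64` timestamp type is an assumption; only the standard-library methods `saturating_sub` and `checked_sub` are taken as given.

```rust
/// Gap computed with `saturating_sub`: an inverted order collapses to 0,
/// which is indistinguishable from "no gap at all".
pub fn saturating_gap(curr_ts: u64, prev_ts: u64) -> u64 {
    curr_ts.saturating_sub(prev_ts)
}

/// Gap computed with `checked_sub`: an inverted order is reported as `None`
/// instead of being hidden.
pub fn checked_gap(curr_ts: u64, prev_ts: u64) -> Option<u64> {
    curr_ts.checked_sub(prev_ts)
}
```

With `curr_ts = 5` and `prev_ts = 9`, the saturating version yields 0 while the checked version yields `None`, so only the latter exposes the inversion.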
A debug_assert! makes the invariant explicit and produces a useful panic in debug builds:
debug_assert!(
    curr_ts >= prev_ts,
    "bridge_timestamp_gap: curr_ts ({}) < prev_ts ({}), reg_idx={}",
    curr_ts, prev_ts, reg_idx
);

// rs1: IS_HALF[ts - rs1_prev_ts - 1] — fires when READ_REGISTER1=1
if d.read_register1 && d.rs1 != 0 {
    let prev = (op.rs1_prev_ts & 0xFFFF_FFFF) as u32;
    let delta = ts.wrapping_sub(prev).wrapping_sub(1) as u16;
Low — wrapping_sub hides ordering violations silently
For rs1, the prover-side delta is ts - prev - 1. If prev >= ts this wraps to a large u16, adding a spurious IS_HALF entry with the wrong value. The CPU bus sends the algebraic expression TIMESTAMP - RS1_PREV_TS - 1 directly, so there would be a bus imbalance and proof failure — but the root cause would be hard to diagnose.
The invariant that ts >= prev + 1 (for rs1) and ts >= prev (for rs2/rd/PC) should hold by construction from bridge_timestamp_gap, so a debug_assert! is cheap and documents the contract:
// rs1: IS_HALF[ts - rs1_prev_ts - 1]
if d.read_register1 && d.rs1 != 0 {
let prev = (op.rs1_prev_ts & 0xFFFF_FFFF) as u32;
debug_assert!(ts > prev, "rs1 delta underflow: ts={ts} prev={prev}");
let delta = ts.wrapping_sub(prev).wrapping_sub(1) as u16;
...
}
Same pattern applies to the rs2, rd, and PC deltas below.
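As a rough illustration of what the wrapping arithmetic produces when the ordering is violated, the sketch below isolates the rs1 delta. The function names are hypothetical; the first body copies the expression from the diff, and the checked variant is only one possible alternative, not something the PR contains.

```rust
use std::convert::TryFrom;

/// rs1 delta exactly as written in the diff: wraps silently when `prev >= ts`.
pub fn rs1_delta_wrapping(ts: u32, prev: u32) -> u16 {
    ts.wrapping_sub(prev).wrapping_sub(1) as u16
}

/// Hypothetical checked variant: returns `None` when `ts <= prev`
/// or when the delta does not fit in 16 bits.
pub fn rs1_delta_checked(ts: u32, prev: u32) -> Option<u16> {
    let delta = ts.checked_sub(prev)?.checked_sub(1)?;
    u16::try_from(delta).ok()
}
```

For `ts = 10, prev = 7` both versions give 2. For `ts = 7, prev = 7` the wrapping version gives 65535, and for `ts = 7, prev = 10` it gives 65532, whereas the checked version returns `None` in both cases.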
Review: Perf/registers on main
Overall the design is sound: the Memory bus (LogUp) correctly enforces value consistency without needing explicit timestamp-ordering constraints in REGISTER_RELOAD, and the per-register IS_HALF checks in the CPU enforce local ordering. The MAX_REG_GAP maths check out for all four access types (rs1/rs2/rd/PC).
Medium
Wrong Every other table passes Fix:
Low
If timestamps are ever inverted,
The deltas are safe by construction (bridge enforces bounded gaps), but if the invariant is ever violated the wrapping produces a wrong IS_HALF entry and a confusing bus-imbalance failure. A
This PR removes the MEMW_R register fast-path table and has the CPU chip emit Memory bus interactions directly for all register accesses. This eliminates ~3–4 trace rows per instruction while keeping identical security guarantees. The Memory bus LogUp argument enforces register read/write ordering the same way MEMW_R did.
The one new requirement is handling large timestamp gaps: if a register goes unaccessed for many instructions, the gap between the current and previous timestamp can exceed IS_HALF's 16-bit limit. The new REGISTER_RELOAD table handles this by inserting intermediate bridge rows (each at most 65534 timestamps apart) whenever such a gap is detected. In practice this is rare: it only triggers for programs where a register is idle for 65534+ consecutive instructions.
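The bridging idea can be sketched roughly as follows. This is not the PR's implementation: the signature and return type are hypothetical, the `u64` timestamp type is an assumption, and `MAX_REG_GAP = 65_534` is inferred from the "at most 65534 timestamps apart" wording above. Only the loop condition is taken from the diff.

```rust
/// Assumed bound (inferred from the PR description, not confirmed by the diff).
pub const MAX_REG_GAP: u64 = 65_534;

/// Hypothetical sketch: returns the timestamps of the bridge rows to insert,
/// plus the timestamp the next access should treat as "previous".
pub fn bridge_timestamp_gap(prev_ts: u64, curr_ts: u64) -> (Vec<u64>, u64) {
    // Make the ordering invariant explicit in debug builds.
    debug_assert!(
        curr_ts >= prev_ts,
        "bridge_timestamp_gap: curr_ts ({}) < prev_ts ({})",
        curr_ts,
        prev_ts
    );

    let mut bridges = Vec::new();
    let mut ts = prev_ts;
    // Same loop condition as in the diff: keep bridging while the gap is too large.
    while curr_ts.saturating_sub(ts) > MAX_REG_GAP {
        ts += MAX_REG_GAP; // each bridge row advances by the maximum allowed gap
        bridges.push(ts);
    }
    (bridges, ts)
}
```

Under these assumptions, a gap of exactly 65534 needs no bridge row, a gap of 65535 needs one, and a gap of 200000 needs three, leaving a final gap of 3398 for the real access.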
CPU column count grows from 74 to 80 (six new witness-only columns for previous timestamps and values). Effective trace width is unchanged at 194.