Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
25 commits
Select commit Hold shift + click to select a range
40cc0bc
spec: Draft CPU rework, still missing updates for all ALU chips, DECO…
RobinJadoul May 26, 2026
eaffb54
Decoding for the new CPU
RobinJadoul May 26, 2026
97c17ce
Bitwise using BYTE_ALU
RobinJadoul May 26, 2026
c003b61
Update old chips for the new CPU
RobinJadoul May 26, 2026
dd81c6b
Fix decode for MEMORY signed flag
RobinJadoul May 26, 2026
cc50d00
Add input range check to SHIFTs typ
RobinJadoul May 26, 2026
720512c
MEMORY signature HL -> WL
RobinJadoul May 26, 2026
ce1a06a
Clarify SHIFTW decoding
RobinJadoul May 26, 2026
5158513
Further updates and new chips for the new CPU
RobinJadoul May 27, 2026
d90ebc6
Fix arg2 muxes
RobinJadoul May 27, 2026
8c0b399
Link instruction length to CPU32
RobinJadoul May 27, 2026
761abbd
Apply easy changes from code review
RobinJadoul May 29, 2026
013b865
Address more review comments
RobinJadoul May 29, 2026
f3b041d
Fix CPU32 register interactions
RobinJadoul May 29, 2026
88fc8cf
Fix CPUs PC write multiplicity
RobinJadoul May 29, 2026
7d14501
Fix HALT's interaction with the CPU padding
RobinJadoul May 29, 2026
9b30b9e
Add some optional extra arith constraints to enforce the "easy" assum…
RobinJadoul May 29, 2026
a495f8e
Combine assumptions constraints
RobinJadoul Jun 1, 2026
322bfbc
Remove todo comments that got a tracking issue
RobinJadoul Jun 1, 2026
1dee915
word_instr gate for read_registerX out of caution
RobinJadoul Jun 1, 2026
c9540a5
Fix JAL(R) decoding and ALU
RobinJadoul Jun 1, 2026
dad2eea
Placate the type checker
RobinJadoul Jun 1, 2026
b1b51c9
Represent instruction length as half
RobinJadoul Jun 2, 2026
fdcf023
Update spec/src/cpu.toml
RobinJadoul Jun 2, 2026
9be4ecd
fixes
RobinJadoul Jun 2, 2026
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 4 additions & 0 deletions spec/bitwise.typ
Original file line number Diff line number Diff line change
Expand Up @@ -26,11 +26,15 @@ and convenience functionalities over small domains.

The #bitwise chip is comprised of #nr_variables variables that are expressed using #nr_columns columns.
Of these, the _input_ and _output_ variables (#nr_precomputed in total) are precomputed.

#render_chip_variable_table(chip, config)

*Note*: This table contains one row for every possible value of `(X, Y, Z)`.
As such, it has length $2^8 dot 2^8 dot 2^4 = 2^(20)$.

We use the ALU operation descriptors from @decode to identify the operations in the `BYTE_ALU` interaction.
Since each of the three columns is only $2^16$ rows long, they can be combined in a single $2^20$ column (with room to spare).

= Lookup
This chip adds the following interactions to the lookup:
#render_constraint_table(chip, config)
Expand Down
12 changes: 8 additions & 4 deletions spec/book.typ
Original file line number Diff line number Diff line change
Expand Up @@ -23,21 +23,25 @@
("add.typ", [`ADD`/`SUB` template], <add>),
("neg.typ", [`NEG` template], <neg>),
)),
("MEMORY", (
("memw.typ", [`MEMW` chip], <memw>),
)),
("CPU", (
("decode.typ", [`DECODE` table], <decode>),
("cpu.typ", [`CPU` chip], <cpu>),
("cpu32.typ", [`CPU32` chip], <cpu32>),
)),
("ALU", (
("shift.typ", [`SHIFT` chip], <shift>),
("branch.typ", [`BRANCH` chip], <branch>),
("lt.typ", [`LT` chip], <lt>),
("eq.typ", [`EQ` chip], <eq>),
("mul.typ", [`MUL` chip], <mul>),
("dvrm.typ", [`DVRM` chip], <dvrm>),
("load.typ", [`LOAD` chip], <load>),
("bitwise.typ", [`BITWISE` chips], <bitwise>),
("bytewise.typ", [`BYTEWISE` chip], <bytewise>)
)),
("MEMORY", (
("memw.typ", [`MEMW` chip], <memw>),
("load.typ", [`LOAD` chip], <load>),
("store.typ", [`STORE` chip], <store>),
)),
("ECALLS", (
("about_ecalls.typ", [About `ECALL`], <ecall>),
Expand Down
5 changes: 5 additions & 0 deletions spec/branch.typ
Original file line number Diff line number Diff line change
Expand Up @@ -30,6 +30,11 @@ The #branch chip is comprised of #nr_variables variables that are expressed usin

#render_chip_assumptions(chip, config)

Some of the assumptions can be checked with only arithmetic constraints, so we
provide these below.

#render_constraint_table(chip, config, groups: "assumptions")

= Constraints

We constrain `next_pc` to be $#`base_address` + #`offset`$,
Expand Down
38 changes: 38 additions & 0 deletions spec/bytewise.typ
Original file line number Diff line number Diff line change
@@ -0,0 +1,38 @@
#import "/book.typ": book-page, rj
#import "/src.typ": load_config, load_chip
#import "/chip.typ": (
render_chip_assumptions,
render_chip_variable_table,
total_nr_variables,
total_nr_instantiated_columns,
compute_nr_interactions,
render_constraint_table,
render_chip_padding_table,
)

#let config = load_config()
#let chip = load_chip("src/bytewise.toml", config)

#show: book-page(chip.name)
#let bytewise = raw(chip.name)

The #bytewise chip is an ALU chip that decomposes the input `DWordWL` values into bytes and
performs a `BITWISE` operation pairwise (AND, OR, XOR).
The `BITWISE` lookup inherently performs a range check, so no further constraints are necessary.

= Variables
#let nr_variables = total_nr_variables(chip)
#let nr_columns = total_nr_instantiated_columns(chip, config)
#let nr_interactions = compute_nr_interactions(chip)

The #bytewise chip is comprised of #nr_variables variables that are expressed using #nr_columns columns and leverages #nr_interactions interaction(s):
#render_chip_variable_table(chip, config)

= Constraints

#render_constraint_table(chip, config)

= Padding

The chip can be padded with the following values:
#render_chip_padding_table(chip, config)
76 changes: 48 additions & 28 deletions spec/cpu.typ
Original file line number Diff line number Diff line change
Expand Up @@ -17,7 +17,7 @@
#let cpu = raw(chip.name)

The #cpu chip coordinates memory accesses and dispatches to other chips for arithmetic and logical operations.
It bases its decisions on the entry of the `DECODE` table (@decode) corresponding the the current program counter (PC).
It bases its decisions on the entry of the `DECODE` table (@decode) corresponding the current program counter (PC).

= Variables
#let nr_variables = total_nr_variables(chip)
Expand All @@ -30,36 +30,52 @@ The #cpu chip is comprised of #nr_variables variables that are expressed using #
= Assumptions
#render_chip_assumptions(chip, config)

Additionally, the following constraints can be used to provide defense-in-depth
validation of the assumptions.

#render_constraint_table(chip, config, groups: "assumptions")

= Constraints

First, we perform a decoding lookup for the current PC.
Instructions having the `word_instr` flag set are not decoded here, as they are delegated to the `CPU32` chip.
In that case, we ensure that the current row of the CPU cannot have any other observable effects.

#render_constraint_table(chip, config, groups: "decode")
Comment thread
erik-3milabs marked this conversation as resolved.

== Range checks

We constrain all columns to have the appropriate ranges.
The flags and register indices looked up from the decoding need to be checked,
as they are communicated through the interaction in a packed form.
All values in `packed_decode` need to be checked to ensure
the packing is correct for the interaction.
In contrast, we know ahead of time that decoding will ensure proper range checks for `pc` and `imm`.
Similarly, since `next_pc` will propagate through the memory argument and be looked up
in the instruction decoding on the next cycle, it is forced to be in the correct range.#rj[is this true, do we need this elsewhere for chip assumptions?]
For the auxiliary columns, we need to check the limbs of `arg1`, `arg2`, and `res`.
in the instruction decoding on the next cycle, it is forced to be in the correct range;
the final value for `next_pc` is similarly fixed by the memory finalization.
For the auxiliary columns, we need to check the limbs of `res`, since
`rv1` and `rv2` are enforced by the memory argument, and `rvd` is correct by the correctness of the dependent chips.
The ranges of the other auxiliary columns are enforced through later constraints.
#rj[Make sure we argue for every column here]
#rj[is `rvd` still sufficiently constrained? (can also be done through the memory argument like `pc`?)]

#render_constraint_table(chip, config, groups: "range")

== ALU

The ALU functionality is then obtained through judicious dispatching to the corresponding chips.
The ALU functionality is then obtained through delegation to the `ALU` signature, backed by the various ALU chips,
or by using the appropriate template.
For the pure ALU path, `arg2` is computed as `rv2 + imm`, which relies on @cpu:a:arg2-multiplex to
be either `rv2` or `imm`, depending on the instruction.
The other contributions for `arg2` are specific to the (mutually exclusive, @cpu:a:mem-branch-mutex)
`MEMORY` and `BRANCH` flags:
- For the `MEMORY` path, we want the output of the ALU to be $#`rv1` + #`imm`$, as that is the
address at which the memory access occurs.
- For the `BRANCH` path, we want the ALU output to reflect the branch condition (or just be inactive for JALR).

#render_constraint_table(chip, config, groups: "alu")

== Memory<cpu:memory>

The interactions with the memory, both for register loading and storing, as for `LOAD` and `STORE` instructions are handled.
Note that since registers need no byte-addressing, we store them in the memory argument with `Word` limbs.
Note that since registers need no byte-addressing, we store them in the memory argument with `Word` limbs,
simultaneously ensuring that register reads are properly range checked as long as all writes are.
The `pc` register behaves very predictably with respect to its timestamps and when it is being read,
so for performance reasons, we inline its memory interactions directly into the #cpu chip.

Expand All @@ -70,32 +86,36 @@ Constraints regarding whether `pc_double_read` corresponds to an `AUIPC` instruc
as regardless of its value, the old timestamp is guaranteed smaller than the new timestamp,
and the integrity of the memory argument therefore ensures the correctness of this bit.

#render_constraint_table(chip, config, groups: "mem")

=== Potential optimizations
The memory interaction itself is handled by the `MEMORY` signature,
which will read the `mem_flags` argument to perform either a `LOAD` or a `STORE`.
We refer to the previous section's description of `arg2` for how
the address is computed.

- `double_pc_read` could be integrated into decoding, so that `AUIPC` could set `read_register1 = 0` and no extra MEMW access for `rv1` is needed at this point.
The value to (potentially) be written back to `rd` is stored in `rvd`,
which can either come from the ALU --- in case of an ALU operation or a JALR branch ---
or from the MEMORY interaction.
Comment thread
RobinJadoul marked this conversation as resolved.

== System

The interactions with the wider system.
#render_constraint_table(chip, config, groups: "mem")

#render_constraint_table(chip, config, groups: "sys")
== Branching

== Input and output to the ALU
A branch is expressed by having the `BRANCH` flag set to 1.
Since `BRANCH` and `MEMORY` are mutually exclusive (@cpu:a:mem-branch-mutex),
we can repurpose the `mem_flags` field to indicate a JALR instruction.
When JALR is not set, we have a conditional branch that is decided upon by the result of the ALU instructions,
as set in the `res` variable.
As such, we can set `branch_cond` appropriately as multiplicity flag for the `BRANCH` chip.

We constrain `arg1`, `arg2` and `rvd` to correspond to the wanted values,
including the appropriate sign/zero extension, depending on `word_instr`.
#render_constraint_table(chip, config, groups: "branch")

#render_constraint_table(chip, config, groups: "ext")
== System

== Other constraints
For @cpu:c:is_equal, note that @cpu:c:sub sets `res` to be the difference between `arg1` and `arg2` whenever `BEQ` is $1$.
Given that this difference is $0$ when both are equal, @cpu:c:is_equal ensures `is_equal` is set to $1$ if and only if $#`arg1` = #`arg2`$ and `BEQ` is set.
The interactions with the wider system go through the `ECALL` interface.
Since we treat `EBREAK` instructions as unprovable traps, we avoid emitting `DECODE` rows
for these, and do not need any further handling in the CPU.

#render_constraint_table(chip, config, groups: "misc")
#render_constraint_table(chip, config, groups: "sys")

#rj[Document the choice to not have a multiplicity column here for padding]

= Padding

Expand All @@ -104,4 +124,4 @@ in the DECODE table, at the _odd_ address 1, only reachable through a HALT ecall

#render_chip_padding_table(chip, config)

This approach minimizes the number of dependent lookups, increasing only multiplicities in the DECODE table and the IS_BYTE lookup.
This approach minimizes the number of dependent lookups, increasing only multiplicities in the `DECODE` table and the `IS_BYTE` and `IS_HALF` lookups.
61 changes: 61 additions & 0 deletions spec/cpu32.typ
Original file line number Diff line number Diff line change
@@ -0,0 +1,61 @@
#import "/book.typ": book-page, rj
#import "/src.typ": load_config, load_chip
#import "/chip.typ": (
render_chip_assumptions,
render_chip_variable_table,
render_chip_padding_table,
render_constraint_table,
compute_nr_interactions,
total_nr_instantiated_columns,
total_nr_variables,
)

#let config = load_config()
#let chip = load_chip("src/cpu32.toml", config)

#show: book-page(chip.name)
#let cpu32 = raw(chip.name)

The #cpu32 chip is used to delegate the 32-bit instructions of the RV64I instruction set
from the main CPU table (@cpu).
All 32-bit instructions are ALU-only instructions, so the BRANCH, MEMORY and ECALL paths need no elaboration.
The timestamp and PC have already been read by the CPU table at this point, and need no further checking;
the PC for the next instruction will also already be handled by CPU.

The structure follows the regular ALU path, with some extra variables and constraints to contain the required sign extensions.

= Variables
#let nr_variables = total_nr_variables(chip)
#let nr_columns = total_nr_instantiated_columns(chip, config)
#let nr_interactions = compute_nr_interactions(chip)

The #cpu32 chip is comprised of #nr_variables variables that are expressed using #nr_columns columns and leverages #nr_interactions interaction(s):
#render_chip_variable_table(chip, config)

= Assumptions

#render_chip_assumptions(chip, config)

Some of the assumptions can be checked with only arithmetic constraints, so we
provide these below.

#render_constraint_table(chip, config, groups: "assumptions")

= Constraints

Most constraints correspond to those already present in the CPU, and we present them here first,
including some updates to the range checking corresponding to the differing types.

#render_constraint_table(chip, config, groups: ("decode", "range", "alu", "mem", "logup"))

Then, we have the constraints corresponding to the sign-extension and definition of `arg1`, `arg2` and `rd`.
This includes a step where we extract the `signed` bit from the `alu_flags`, as this determines
whether to sign extend the inputs or not.

#render_constraint_table(chip, config, groups: "ext")

= Padding

The table can be padded with the following values:
#render_chip_padding_table(chip, config)

Loading
Loading