Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Shift32 chip: output constraint(s) #124

Open
morganthomas opened this issue Mar 17, 2024 · 2 comments
Open

Shift32 chip: output constraint(s) #124

morganthomas opened this issue Mar 17, 2024 · 2 comments

Comments

@morganthomas
Copy link
Collaborator

morganthomas commented Mar 17, 2024

Currently the Shift32 chip STARK does not put any constraint on its output. The output should be constrained to be equal to input_1 * power_of_two, truncated to a 32-bit word.

@morganthomas morganthomas self-assigned this Mar 17, 2024
@morganthomas
Copy link
Collaborator Author

morganthomas commented Mar 18, 2024

I had thought that this issue would be easy, because I thought that to complete the argument we could just say output = input * power_of_two. That's not correct in all cases, though; the multiplication may overflow. So now it's looking like a bigger issue than I previously estimated. Today I've been looking at different ways this argument could be completed.

I found two approaches to take inspiration from: Succinct SP1's approach, and TinyRAM's approach. I tried to look at how RISC Zero solves the problem, but I didn't end up finding their code that solves this problem. In summary, the approaches:

  1. SP1's approach. They decompose the base and the exponent into limbs. The base is decomposed into 8-bit bytes. The exponent is decomposed into the least significant three bits (the number of bits to shift) and the two least significant bits after the three LSBs (the number of bytes to shift). The limbs of the base are individually bit shifted by the number of bits to shift, and the limbs of the base are shifted as units by the number of bytes to shift.
  2. TinyRAM's approach. They multiply the base by 2 to the power of the exponent and truncate.

TinyRAM's approach works easily enough in a large enough field to perform the multiplication without overflow. This is the setting of TinyRAM. TinyRAM's approach can't be directly adapted to Valida. Prima facie, this suggests in favor of emulating SP1's approach. However, I'm thinking it may make more sense to take inspiration from TinyRAM's approach, even though it can't be directly adapted.

The basic premise behind TinyRAM's approach is that bit shifts can be checked using multiplication or division by a power of two, provided a large enough algebraic structure to do that arithmetic in without overflowing. No such algebraic structure currently exists in Valida. I propose to add a 64-bit unsigned multiplication chip to Valida, which would be used to check 32-bit signed and unsigned division, and 32-bit signed and unsigned bit shifts.

I am not proposing to add a 64-bit unsigned multiplication instruction. A 64-bit unsigned multiplication instruction would exceed the data throughput of Valida's memory channel. Valida has three 32-bit memory channels, for a total memory bus throughput of 96 bits per clock cycle. A 64-bit unsigned multiplication instruction, let's say with two 32-bit inputs and one 64-bit output, would require a memory bus throughput of at least 128 bits per clock cycle. However, the data throughput (i.e., the number of bits of data per row) of some of the cross-chip interactions is substantially greater than the throughput of the memory buses, so the data throughput required by a 64-bit unsigned multiplication chip, which is only used on the back end by other chips, seems like a viable idea without substantially changing the performance characteristics of the permutation argument which checks the cross-chip interactions.

A 64-bit unsigned multiplication chip's trace could be checked in a similar fashion to the 32-bit unsigned multiplication chip. The 32-bit unsigned multiplication chip decomposes the inputs into bytes, multiplies these separately, and sums together the results. What would mainly be different about a 64-bit unsigned multiplication chip with 32-bit inputs, compared to the existing 32-bit chip, is that the 64-bit chip's output would be bigger, and there would be no truncating of the answer in the case of the 64-bit chip.

Given a 64-bit unsigned multiplication chip, a 32-bit unsigned division c = a / b can be checked by verifying a = b * c. Notice that if the equation is true, then b * c must be less than 2^32, because a is less than 2^32. This may lead to the question, why is 64-bit multiplication required if the result of the multiplication always fits in 32 bits when the constraint is satisfied? The answer is that if we use 32-bit multiplication to multiply b and c, then the answer may be truncated, and we wouldn't necessarily know that. 32-bit unsigned multiplication with an overflow bit would maybe suffice for this application, but 64-bit unsigned multiplication with 32-bit inputs would also suffice.

Given 32-bit unsigned division, 32-bit signed division can be checked by factoring out and multiplying together the sign bits, which reduces the problem to 32-bit unsigned division.

A 64-bit unsigned multiplication chip would also help for implementing MULHU32 (32-bit unsigned multiplication, high / most significant 32 bits of the result only) and MULHS32 (32-bit signed multiplication, high / most significant bits only).

Finally, given 64-bit unsigned multiplication and 32-bit signed and unsigned division, shifts left, logical shifts right, and arithmetic shifts right can all be implemented as special cases.

This suggests to me that a 64-bit unsigned multiplication chip would help to kill several birds with one stone, as several unsolved problems in the Valida STARK can be straightforwardly reduced to it. As such, I'd like to try solving this issue by reducing it to 64-bit unsigned multiplication and implementing a 64-bit unsigned multiplication chip. I'm not sure if I want to do this by way of the 32-bit division chips, or without those layers of indirection; currently leaning towards the latter.

@morganthomas morganthomas removed their assignment Mar 26, 2024
@morganthomas
Copy link
Collaborator Author

See: #137

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant