Implement recursion circuit logic for handling public inputs #249
In Halo Study Club today, we discussed this, and I made the following notes to clarify things for myself.

Assume we have a circuit that verifies some earlier proof P1. Let Fp be the circuit field (i.e. the base field of the "application curve"), and let Fq be the scalar field of the application curve. The process for verifying P1 inside a circuit involves arithmetic in both Fp and Fq, but the circuit cells can only store Fp elements and perform Fp arithmetic. So our goal is to defer the Fq arithmetic to the next phase of the cycle, where the circuit field is Fq.

The act of deferring Fq arithmetic means that we need to expose the Fq elements in the proof P2 for this circuit. This happens via the instance columns used to verify P2, which contain Fp cells. Then, in the next phase of the cycle, we need to witness those instance columns by assigning them into Fq cells, and then compute a commitment to those cells (as part of the first steps of the Halo 2 verifier).

We therefore need to choose the decomposition of Fq elements into Fp cells to align with the decomposition of Fq elements into Fq cells. If we decompose the Fq elements into bitstring sequences that can fit into both Fp and Fq cells, then we can use endoscaling to implement a 1:1:1 mapping between bitstring sequences, Fp cell values, and Fq cell values.
It just so happens (by design) that we can also use these endoscalar values witnessed into the Fq cells to construct commitments to these elements, which act as the proof's commitments to the instance columns provided by the verifier. This corresponds to the commitment step at the start of the Halo 2 verifier.

In summary, we have two separate things we need to do:
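A scaled-down sketch of the shared decomposition described above (tiny illustrative moduli and chunk width, not real curve parameters): every chunk is small enough to fit in a cell of either field, so the same chunk values can be witnessed on both sides of the cycle.

```rust
// Toy model of the shared decomposition: an Fq element is split into
// K-bit chunks small enough to fit in a cell of *either* field, giving
// the 1:1 mapping between bitstring chunks, Fp cells, and Fq cells.
// P, Q, and K below are illustrative stand-ins, not real parameters.
const P: u64 = 1009; // stand-in for the Fp modulus
const Q: u64 = 997;  // stand-in for the Fq modulus
const K: u32 = 8;    // chunk width; 2^K must be < min(P, Q)

fn decompose(mut x: u64, num_chunks: usize) -> Vec<u64> {
    let mut chunks = Vec::with_capacity(num_chunks);
    for _ in 0..num_chunks {
        chunks.push(x & ((1 << K) - 1)); // little-endian K-bit chunks
        x >>= K;
    }
    chunks
}

fn recompose(chunks: &[u64]) -> u64 {
    chunks.iter().rev().fold(0, |acc, &c| (acc << K) | c)
}

fn main() {
    let x: u64 = 831; // an Fq element (x < Q)
    let chunks = decompose(x, 2);
    // Every chunk fits in a cell of either field without reduction...
    for &c in &chunks {
        assert!(c < P && c < Q);
    }
    // ...so the Fp circuit and the Fq circuit witness identical values,
    // and recomposition in the Fq circuit recovers the original element.
    assert_eq!(recompose(&chunks), x);
}
```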
So the public inputs "gadget" doesn't actually need to be a single gadget. It should be two separate gadgets relying on two separate instruction sets: one for shuttling around the "payload" from one circuit to the other, and another for constraining the commitment to the instance columns. These gadgets would then be constrained to use the same endoscalar type and semantics at the higher layer. It would then be up to the chip implementer to decide whether to implement one chip per instruction set, or a single chip implementing both (but the point is that implementing one of the instruction sets and using one of the gadgets shouldn't require implementing the other instruction set as well).
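A rough sketch of how that two-instruction-set split might look. All names and signatures here are hypothetical, and the "commitment" is a toy integer fold rather than a sum of endoscaled bases; the point is only that each gadget depends on one trait, while a single chip may implement both.

```rust
// Hypothetical shape of the two independent instruction sets described
// above; names and signatures are illustrative, not the halo2 API.

/// Shuttles the application payload between circuits as chunks.
trait PayloadInstructions {
    type Chunk;
    fn decompose_payload(&self, payload: u64) -> Vec<Self::Chunk>;
    fn recompose_payload(&self, chunks: &[Self::Chunk]) -> u64;
}

/// Constrains the commitment to the instance columns.
trait InstanceCommitmentInstructions {
    type Chunk;
    /// Toy "commitment": in a real chip this would be a sum of
    /// endoscaled fixed bases, not an integer fold.
    fn commit(&self, chunks: &[Self::Chunk]) -> u64;
}

/// One chip can implement both instruction sets, but a gadget written
/// against one trait never forces the chip to provide the other.
struct ToyChip;

impl PayloadInstructions for ToyChip {
    type Chunk = u64;
    fn decompose_payload(&self, payload: u64) -> Vec<u64> {
        (0..8).map(|i| (payload >> (8 * i)) & 0xff).collect()
    }
    fn recompose_payload(&self, chunks: &[u64]) -> u64 {
        chunks.iter().rev().fold(0, |acc, &c| (acc << 8) | c)
    }
}

impl InstanceCommitmentInstructions for ToyChip {
    type Chunk = u64;
    fn commit(&self, chunks: &[u64]) -> u64 {
        chunks
            .iter()
            .fold(1u64, |acc, &c| acc.wrapping_mul(31).wrapping_add(c))
    }
}

fn main() {
    let chip = ToyChip;
    let chunks = chip.decompose_payload(0xdead_beef);
    assert_eq!(chip.recompose_payload(&chunks), 0xdead_beef);
    let _c = chip.commit(&chunks); // both gadgets share the same chunk type
}
```

The shared `Chunk` associated type is where the "same endoscalar type and semantics" constraint would be enforced at the higher layer.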
Having talked with @ebfull again, I now have a much better understanding of what he meant for the endoscalar usage. In my notes below, treat
And my attempt to turn the above into a fancy Mermaid diagram:

```mermaid
graph TB
    subgraph Fp circuit
        InstanceA(Instance column <br/> commitments) -- absorb --> TranscriptA(Transcript)
        PayloadA(App payload) -- Internal <br/> decomposition --> ChunksAp
        PubInA(Other public inputs) -- Fp --> ChunksAp
        TranscriptA(Transcript) -- squeeze Fp --> ChallengeAp(Challenge <br/> bitstring)
        ChallengeAp -- split --> ChunksAp(Bitstring chunks)
        ChunksAp -- endoscale_q, Alg 2 <br/> using Fp scalars <br/> optimised with lookup --> EndoAq(Fp endoscalars)
        ChallengeAp -- endoscale_p, Alg 1 <br/> using Fp bases --> IPAp(IPA)
    end
    ChunksAp ====> ChunksAq
    EndoAq -..-> EndoBGp
    subgraph Fq circuit
        ChunksAq(Bitstring chunks) -- Internal <br/> recomposition --> PayloadBin(App payload in)
        ChunksAq -- concat --> ChallengeAq(Challenge <br/> bitstring)
        ChallengeAq -- endoscale_p, Alg 2 <br/> using Fq scalars --> EndoBp(Fq endoscalars)
        EndoBp --> DeferredFq(Deferred Fq <br/> arithmetic)
        ChunksAq -- endoscale_q, Alg 1 <br/> using Fq bases --> EndoBGp("[Fp endoscalar] Gl_i")
        EndoBGp -- sum --> InstanceB(Instance column <br/> commitments)
        PayloadBin --> PayloadBout(App payload out)
        InstanceB -- absorb --> TranscriptB(Transcript)
        PayloadBout -- Internal <br/> decomposition --> ChunksBq
        PubInB(Other public inputs) -- Fq --> ChunksBq
        TranscriptB -- squeeze Fq --> ChallengeBq(Challenge <br/> bitstring)
        ChallengeBq -- split --> ChunksBq(Bitstring chunks)
        ChunksBq -- endoscale_r, Alg 2 <br/> using Fq scalars <br/> optimised with lookup --> EndoBq(Fq endoscalars)
        ChallengeBq -- endoscale_q', Alg 1 <br/> using Fq bases --> IPAq(IPA)
    end
    ChunksBq ====> ChunksBr
    EndoBq -..-> EndoCGq
    subgraph Fr circuit
        ChunksBr(Bitstring chunks) -- Internal <br/> recomposition --> PayloadCin(App payload in)
        ChunksBr -- concat --> ChallengeBr(Challenge <br/> bitstring)
        ChallengeBr -- endoscale_q', Alg 2 <br/> using Fr scalars --> EndoCq(Fr endoscalars)
        EndoCq --> DeferredFr(Deferred Fr <br/> arithmetic)
        ChunksBr -- endoscale_r, Alg 1 <br/> using Fr bases --> EndoCGq("[Fq endoscalar] Gl_i")
        EndoCGq -- sum --> InstanceC(Instance column <br/> commitments)
    end
```
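The `split`/`concat` edges in the diagram form a round trip across the circuit boundary: the Fp circuit splits a squeezed challenge bitstring into chunks, and the Fq circuit concatenates the same chunks back into the challenge used for the deferred arithmetic. A scaled-down sketch of that consistency requirement (widths hypothetical):

```rust
// Toy model of the cross-circuit challenge flow in the diagram above:
// the "Fp circuit" splits a challenge bitstring into chunks, and the
// "Fq circuit" concatenates the witnessed chunks back into the same
// challenge. Widths here are illustrative, not real challenge sizes.
const CHUNK_BITS: usize = 8; // hypothetical chunk width

fn split(challenge: u64, num_chunks: usize) -> Vec<u64> {
    (0..num_chunks)
        .map(|i| (challenge >> (CHUNK_BITS * i)) & ((1 << CHUNK_BITS) - 1))
        .collect()
}

fn concat(chunks: &[u64]) -> u64 {
    chunks.iter().rev().fold(0, |acc, &c| (acc << CHUNK_BITS) | c)
}

fn main() {
    // "Fp circuit": squeeze a challenge from the transcript and split it.
    let challenge: u64 = 0x1122_3344_5566_7788;
    let chunks = split(challenge, 8);
    // "Fq circuit": the witnessed chunks concatenate to the same
    // challenge, so the deferred Fq arithmetic uses exactly the
    // challenge that the Fp side committed to.
    assert_eq!(concat(&chunks), challenge);
}
```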
Given the above, I think we need one new instruction set
And then various gadgets that utilise this and other instruction sets. For example a
[Clarification: the discussion below assumes that 160-bit challenges can be proven secure, which had not been done at the time and still hasn't.]

In Halo Study Club today, we had yet another large discussion between @ebfull, @daira, @therealyingtong and myself on this, and clarified yet more differing assumptions we were making about how this all works. We spent time writing out the various logical types involved, and clarifying what logical operations were needed (vs. what types and relations might actually exist in an optimised chip implementation). We also established that my proposed

**Logical types and relations**
**Notes**

We might need to use the same challenge up to ~30 times.
Conversely, in order to take a Poseidon output and truncate it to 160 bits, we are forced to fully decompose the output in order to extract the 160-bit result, and then we can just copy those bits to where we need them. @daira wants to optimise the usage of Poseidon outputs in the transcript, so that when we sample multiple challenges in a row, we have the fewest "dangling decomposition rows" (e.g. running-sum rows constraining the upper bits of the field element), and the useful rows can then be inlined into other operations (like the endoscaling).
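A scaled-down model of that forced full decomposition (word-level integers, not in-circuit; a 120-bit "output" and 60-bit "challenge" stand in for the real ~255-bit and 160-bit widths): the running sum must consume the whole field element, and the windows beyond the challenge are exactly the "dangling" rows that only constrain the upper bits.

```rust
// Toy running-sum decomposition z_{i+1} = (z_i - k_i) / 2^K, done on a
// plain integer. Extracting a truncated challenge still forces a full
// decomposition of the output; the windows past the challenge width are
// the "dangling" rows. All widths here are illustrative.
const K: u32 = 10; // hypothetical window size

fn running_sum(mut z: u128, windows: usize) -> Vec<u128> {
    let mut ks = Vec::with_capacity(windows);
    for _ in 0..windows {
        let k = z & ((1 << K) - 1);
        ks.push(k);
        z = (z - k) >> K; // each step is one decomposition row
    }
    // The decomposition must account for the *entire* element.
    assert_eq!(z, 0, "final running-sum value must be zero");
    ks
}

fn main() {
    // A 120-bit stand-in for the ~255-bit Poseidon output.
    let output: u128 = 0x00ab_cdef_0123_4567_89ab_cdef_0123_4567;
    let ks = running_sum(output, 12); // 12 windows * 10 bits = 120 bits

    // The 60-bit "challenge" is recomposed from the low 6 windows;
    // windows 6..12 are the dangling rows constraining the upper bits.
    let challenge = ks[..6].iter().rev().fold(0u128, |acc, &k| (acc << K) | k);
    assert_eq!(challenge, output & ((1u128 << 60) - 1));
}
```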
@str4d would like us to write "straight-line" chip implementations without any optimisations, both for use in checking the optimised chips, and to ensure that the instruction sets being defined are in fact usable with non-optimised chips and don't encode optimisation-specific assumptions.

**Alg 1/2**

If we want the implementation of Alg 1 to rely on the assumption that there are no collisions in Alg 2, then we must limit the length of the Alg 1 bitstring input to some length that is (a few bits) less than the field size. The precise limit depends on details of the curve, and may or may not depend on the size of the lookup table that the user has requested.

**Circuit polymorphism**

For protocols like Zexe that verify multiple circuits (or if we want to allow varying PCD graphs within a protocol), we need a stable specification of the recursive verification algorithm. This algorithm must be able to operate polymorphically over some family of circuits (constrained by the application protocol), and over proofs created by multiple versions of halo2.
This will have two components: