Implement recursion circuit logic for handling public inputs #249

str4d · 2021-04-16T22:17:01Z

This will have two components:

Public input packing: Creating the commitment to the public inputs efficiently on one curve.
Unpacking the public inputs on the other curve.

Replace `PartialEq, PartialOrd` with `ConstantTimeEq` on `{Extended}SpendingKey`

str4d · 2022-02-11T21:20:29Z

In Halo Study Club today, we discussed this, and I made the following notes to clarifying things for myself:

Assume we have a circuit that verifies some earlier proof P₁. Let F_p be the circuit field (i.e. the base field of the "application curve"), and let F_q be the scalar field of the application curve. The process for verifying P₁ inside a circuit involves arithmetic in both F_p and F_q, but the circuit cells can only store F_p elements and perform F_p arithmetic. So our goal is to defer the F_q arithmetic to the next phase of the cycle, where the circuit field is F_q.

The act of deferring F_q arithmetic means that we need to expose the F_q elements in the proof P₂ for this circuit. This happens via the instance columns used to verify P₂, which contain F_p cells. Then, in the next phase of the cycle, we need to witness those instance columns by assigning them into F_q cells, and then compute a commitment to those cells (as part of the first steps of the Halo 2 verifier).

We therefore need to choose the decomposition of F_q elements into F_p cells, to align with the decomposition of F_q elements into F_q cells. If we decompose the F_q elements into bitstring sequences that can fit into both F_p and F_q cells, then we can use endoscaling to implement a 1:1:1 mapping between bitstring sequences, F_p cell values, and F_q cell values.

In the F_p circuit, we use a lookup table to store the 1:1 mapping between bitstring sequences and F_p cell values, which we use to constrain the F_q element chunks into the instance column F_p cells.
In the F_q circuit, we implement the endoscaling directly to constrain the 1:! relationship between the witnessed instance columns assigned into advice column F_q cells, and the bitstring sequences needed in the circuit.

It just so happens (by design) that we can also use these endoscalar values witnessed into the F_q cells to construct commitments to these elements, which acts as the proof's commitments to the instance columns provided by the verifier. This corresponds to the step in halo2_proofs 0.1.0 where we produce a polynomial commitment to each instance column, and then write the coordinates of that commitment point into the BLAKE2b transcript.

In summary, we have two separate things we need to do:

We need to map F_q elements to instance column F_p cell values, then witness (the equivalent of) those values in advice column F_q cells, then map back to F_q element to constrain the F_q arithmetic.
We need to commit to all instance column F_p cell values (not just the ones created for the above mapping) as part of the Halo 2 protocol.

So the public inputs "gadget" doesn't actually need to be a single gadget. It should be two separate gadgets relying on two separate instruction sets, one for shuttling around the "payload" from one circuit to the other, and another for constraining the commitment to the instance columns. These gadgets would then be constrained to use the same endoscalar type and semantics at the higher layer. It would then be up to the chip implementer to decide whether to implement one chip per instruction set, or a single chip implementing both (but the point is that implementing one of the instruction sets and using one of the gadgets, shouldn't require implementing the other instruction set as well).

str4d · 2022-02-16T23:21:26Z

Having talked with @ebfull again I now have a much better understanding of what he meant for the endoscalar usage. In my notes, below, treat Fr == Fp (i.e. I'm writing it below as a series of nested curves in order to simplify things, but it is actually a cycle).

Endoscaling:
    endoscale_p: Fp is base field, Fq is scalar field
    endoscale_q: Fq is base field, Fp is scalar field
    endoscale_r: Fr is base field, Fq is scalar field

Fp circuit:
    Other public inputs -> Fp values --------    App payload --(internal decomposition)
                                             \   /
Transcript -> Fp -> challenge bitstring -> bitstring chunks --(Alg 2: endoscale_q using Fp scalars)--> Fp endoscalars
                                      \                       (  which we optimise with lookup    )   (in instance columns)
                                       \                                                              (but never witnessed)
                                        --(Alg 1: endoscale_p with Fp as base)--> IPA                 (   in Fq circuit   )

Fq circuit:
             --> App payload
            /
  bitstring chunks -(concat)-> challenge bitstring --(Alg 2: endoscale_p using Fq scalars)-> Fq endoscalars -> deferred Fq arithmetic
           \
            --(Alg 1: endoscale_q with Fq as base)--> {[Fp endoscalar] G_l_i} --(sum)--> instance column commitments
                                                      (equivalent to proof's)
                                                      (   instance columns  )

    Other public inputs -> Fq values --------    App payload --(internal decomposition)
                                             \   /
Transcript -> Fq -> challenge bitstring -> bitstring chunks --(Alg 2: endoscale_r using Fq scalars)--> Fq endoscalars
                                      \                       (  which we optimise with lookup    )   (in instance columns)
                                       \                                                              (but never witnessed)
                                        --(Alg 1: endoscale_q' with Fq as base)--> IPA                (   in Fr circuit   )

Fr circuit:
             --> App payload
            /
  bitstring chunks -(concat)-> challenge bitstring --(Alg 2: endoscale_q' using Fr scalars)-> Fr endoscalars -> deferred Fr arithmetic
           \
            --(Alg 1: endoscale_r with Fr as base)--> {[Fq endoscalar] G_l_i} --(sum)--> instance column commitment
                                                      (equivalent to proof's)
                                                      (   instance columns  )

And my attempt to turn the above into a fancy Mermaid diagram:

graph TB
    subgraph Fp circuit
        InstanceA(Instance column <br/> commitments) -- absorb --> TranscriptA(Transcript)
        PayloadA(App payload) -- Internal <br/> decomposition --> ChunksAp
        PubInA(Other public inputs) -- Fp --> ChunksAp
        TranscriptA(Transcript) -- squeeze Fp --> ChallengeAp(Challenge <br/> bitstring)
        ChallengeAp -- split --> ChunksAp(Bitstring chunks)
        ChunksAp -- endoscale_q, Alg 2 <br/> using Fp scalars <br/> optimised with lookup --> EndoAq(Fp endoscalars)
        ChallengeAp -- endoscale_p, Alg1 <br/> using Fp bases --> IPAp(IPA)
    end

    ChunksAp ====> ChunksAq
    EndoAq -..-> EndoBGp

    subgraph Fq circuit
        ChunksAq(Bitstring chunks) -- Internal </br> recomposition --> PayloadBin(App payload in)
        ChunksAq -- concat --> ChallengeAq(Challenge <br/> bitstring)
        ChallengeAq -- endoscale_p, Alg 2 <br/> using Fq scalars --> EndoBp(Fq endoscalars)
        EndoBp --> DeferredFq(Deferred Fq <br/> arithmetic)
        ChunksAq -- endoscale_q, Alg 1 </br> using Fq bases --> EndoBGp("[Fp endoscalar] Gl_i")
        EndoBGp -- sum --> InstanceB(Instance column <br/> commitments)

        PayloadBin --> PayloadBout(App payload out)
        InstanceB -- absorb --> TranscriptB(Transcript)

        PayloadBout -- Internal <br/> decomposition --> ChunksBq
        PubInB(Other public inputs) -- Fq --> ChunksBq
        TranscriptB -- squeeze Fq --> ChallengeBq(Challenge <br/> bitstring)
        ChallengeBq -- split --> ChunksBq(Bitstring chunks)
        ChunksBq -- endoscale_r, Alg 2 <br/> using Fq scalars <br/> optimised with lookup --> EndoBq(Fq endoscalars)
        ChallengeBq -- endoscale_q', Alg 1 <br/> using Fq bases --> IPAq(IPA)
    end

    ChunksBq ====> ChunksBr
    EndoBq -..-> EndoCGq

    subgraph Fr circuit
        ChunksBr(Bitstring chunks) -- Internal </br> recomposition --> PayloadCin(App payload in)
        ChunksBr -- concat --> ChallengeBr(Challenge <br/> bitstring)
        ChallengeBr -- endoscale_q', Alg 2 <br/> using Fr scalars --> EndoCq(Fr endoscalars)
        EndoCq --> DeferredFr(Deferred Fr <br/> arithmetic)
        ChunksBr -- endoscale_r, Alg 1 </br> using Fr bases --> EndoCGq("[Fq endoscalar] Gl_i")
        EndoCGq -- sum --> InstanceC(Instance column <br/> commitments)
    end

Given the above, I think we need one new instruction set EndoscaleInstructions:

load_endoscale_scalar_table
endoscale_base: Performs Alg 1 with current field as base.
endoscale_scalar_fixed: Performs Alg 2 with current field as scalar, using a fixed-length bitstring that is in the lookup table loaded via load_endoscale_scalar_table.
endoscale_scalar_var: Performs Alg 2 with current field as scalar, using a variable-length bitstring.

And then various gadgets that utilise this and other instruction sets. For example a PublicInputsGadget that has two variants, one for the non-deferred (Fp) circuit, and one for the deferred (Fq) circuit:

In Fp circuit:
- Use closure map to define the types of public inputs, and consume them from somewhere.
- Split into bitstring chunks
- Use EndoscaleInstructions to map to endoscalars (in advice cells).
- Use closure map to figure out indices of endoscalars in instance column.
- Use utilities to constrain endoscalar advice cells to correct instance column positions.
In Fq circuit:
- Use closure map to constrain types of witnessed "public inputs"
- Split into bitstring chunks
- Use EndoscaleInstructions to convert into endoscalar point commitments to chunks
- Use EccInstructions to sum point commitments to produce instance column commitment.

str4d · 2022-02-18T17:23:58Z

[Clarification: the discussion below assumes that 160-bit challenges can be proven secure, which had not been done at the time and still hasn't.]

In Halo Study Club today, we had yet another large discussion between @ebfull, @daira, @therealyingtong and myself on this, and clarified yet more differing assumptions we were making about how this all works. We spent time writing out the various logical types involved, and clarifying what logical operations were needed (vs what types and relations might actually exist in an optimised chip implementation). We also established that my proposed EndoscaleInstructions above were too low-level; in particular, we don't need to expose the lookup table as part of the instruction set, but instead as a configuration of the chip itself.

Logical types and relations

Transcript <--> 160-length bitstring

These are the challenges we want to sample.

BaseFieldElement <--> 255-length bitstring

This covers other public inputs etc for the protocol.
Assuming 255-bit field for now.

Vec<var-length bitstring> <--pad-and-pack--> Vec<CHUNK-length bitstring>

This operation handles internal padding (alignment)
We need this to ensure that no matter how large the input, CHUNK is less than the Alg1/2 maximum input size (see notes below).
CHUNK-length bitstring is a constrained version of var-length bitstring (i.e we can ideally use the former anywhere we use the latter, but not vice versa). Similar to NonIdentityPoint and Point: From<NonIdentityPoint> on EccInstructions.

var-length bitstring <--decompose--> var-length boolean-decomposed bitstring

var-length boolean-decomposed bitstring <--EndoAlg1--> NonIdentityPoint (affine)

var-length has a limit in this context.

var-length boolean-decomposed bitstring <--EndoAlg2--> BaseFieldElement

@daira notes that given any K-bit endoscalar lookup table, you can optimise almost any var-length endoscaling operations (both less than and greater than K bits) with some work (and additional columns).
User would tell the chip their desired maximum table size, and the chip can decide what size tables to allocate for best performance, and what the optimal CHUNK size is for the pack-and-pad operation (since it can be larger than the table bitstring lookup length, but has restrictions on its possible lengths).

Notes

We might need to use the same challenge up to ~30 times.

var-length boolean-decomposed bitstring can potentially be optimised out by instead inlining the decomposition running sum (2k bits for 0 < k <= 80 or 127) because decomposition is almost free (in terms of advice area)

Conversely, in order to take a Poseidon output and truncate it to 160 bits, we are forced to fully decompose the output in order to extract the 160 bit outputs, and then we can just copy those bits to where we need them

@daira wants to optimise the usage of Poseidon outputs in the transcript, so when we sample multiple challenges in a row, we have the least number of "dangling decomposition rows" (e.g. running sum rows constraining upper bits of the field element), and the useful rows can then be inlined into other operations (like the endoscaling).

@ebfull points out that for Halo 2 specifically, we only sample at most two challenges in a row before needing to absorb data, and with 160-bit challenges we need at least two Poseidon outputs; therefore the simple thing (truncate each output to a single challenge) has exactly as many "dangling decomposition rows" as the packed form.
@str4d points out that the choice of simple vs packed affects the transcript API internals, which we're treating as a black box here. If we use the simple approach for Halo 2 recursion, it potentially makes the transcript API less efficient for other users who might benefit from packed. We should consider how to possibly tune the transcript API to enable this to be chosen by the user.

@str4d would like us to write "straight-line" chip implementations without any optimisations, both for use in checking the optimised chips, and to ensure that the instruction sets being defined are in fact usable with non-optimised chips and don't encode optimisation-specific assumptions.

Alg 1/2

If we want to rely in the implementation of Alg 1 on the assumption that there are no collisions on Alg 2, then we must limit the length of the Alg 1 bitstring input to some length that is (a few bits) less than the field size. The precise limit depends on details of the curve, and may or may not depend on the size of the lookup table that the user has requested.

Circuit polymorphism

For protocols like Zexe when verifying multiple circuits (or if we want to allow varying PCD graphs within a protocol), we need a stable specification of the recursive verification algorithm. This algorithm must be able to operate polymorphically over some family of circuits (constrained by the application protocol), and over proofs created by multiple versions of halo2.

str4d added the A-recursion Area: Recursive proofs label Jul 22, 2021

therealyingtong self-assigned this Sep 20, 2021

therealyingtong mentioned this issue Oct 4, 2021

[WIP] Public inputs gadget for recursion #378

Closed

r3ld3v added this to the Core Sprint 2021-40 milestone Oct 11, 2021

r3ld3v added the S-committed label Oct 13, 2021

r3ld3v modified the milestones: Core Sprint 2021-40, Core Sprint 2021-42 Oct 25, 2021

r3ld3v modified the milestones: Core Sprint 2021-42, Core Sprint 2021-44 Nov 15, 2021

str4d added S-committed and removed S-committed labels Nov 22, 2021

r3ld3v modified the milestones: Core Sprint 2021-44, Core Sprint 2021-46 Nov 29, 2021

daira changed the title ~~Implement circuit logic for handling public inputs~~ Implement recursion circuit logic for handling public inputs Nov 30, 2021

r3ld3v modified the milestones: Core Sprint 2021-46, Core Sprint 2021-48 Dec 6, 2021

r3ld3v added S-committed and removed S-committed labels Dec 7, 2021

therealyingtong mentioned this issue Dec 13, 2021

Endoscaling instructions and chip #428

Closed

r3ld3v modified the milestones: Core Sprint 2021-48, Core Sprint 2021-50 Dec 20, 2021

r3ld3v added S-committed and removed S-committed labels Dec 20, 2021

r3ld3v modified the milestones: Core Sprint 2021-50, 2021-52 Jan 3, 2022

r3ld3v modified the milestones: Core Sprint 2021-52, Core Sprint 2022-02 Jan 18, 2022

str4d pushed a commit that referenced this issue Jan 19, 2022

Merge pull request #249 from zcash/241-spendingkey-ct_eq

53b68ea

Replace `PartialEq, PartialOrd` with `ConstantTimeEq` on `{Extended}SpendingKey`

r3ld3v modified the milestones: Core Sprint 2022-02, Core Sprint 2022-04 Feb 7, 2022

r3ld3v modified the milestones: Core Sprint 2022-04, Core Sprint 2022-06, Core Sprint 2022-08 Feb 23, 2022

r3ld3v added S-committed and removed S-committed labels Feb 25, 2022

r3ld3v removed this from the Core Sprint 2022-08 milestone Feb 25, 2022

r3ld3v removed the S-committed label Feb 25, 2022

therealyingtong mentioned this issue Mar 22, 2022

Endoscaling instructions and chip. #529

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Implement recursion circuit logic for handling public inputs #249

Implement recursion circuit logic for handling public inputs #249

str4d commented Apr 16, 2021

str4d commented Feb 11, 2022 •

edited

str4d commented Feb 16, 2022 •

edited by daira

str4d commented Feb 18, 2022 •

edited by daira

Implement recursion circuit logic for handling public inputs #249

Implement recursion circuit logic for handling public inputs #249

Comments

str4d commented Apr 16, 2021

str4d commented Feb 11, 2022 • edited

str4d commented Feb 16, 2022 • edited by daira

str4d commented Feb 18, 2022 • edited by daira

Logical types and relations

Notes

Alg 1/2

Circuit polymorphism

str4d commented Feb 11, 2022 •

edited

str4d commented Feb 16, 2022 •

edited by daira

str4d commented Feb 18, 2022 •

edited by daira