Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement recursion circuit logic for handling public inputs #249

Open
str4d opened this issue Apr 16, 2021 · 3 comments
Open

Implement recursion circuit logic for handling public inputs #249

str4d opened this issue Apr 16, 2021 · 3 comments
Assignees
Labels
A-recursion Area: Recursive proofs

Comments

@str4d
Copy link
Contributor

str4d commented Apr 16, 2021

This will have two components:

  • Public input packing: Creating the commitment to the public inputs efficiently on one curve.
  • Unpacking the public inputs on the other curve.
@str4d str4d added the A-recursion Area: Recursive proofs label Jul 22, 2021
@therealyingtong therealyingtong self-assigned this Sep 20, 2021
@r3ld3v r3ld3v added this to the Core Sprint 2021-40 milestone Oct 11, 2021
@daira daira changed the title Implement circuit logic for handling public inputs Implement recursion circuit logic for handling public inputs Nov 30, 2021
@r3ld3v r3ld3v modified the milestones: Core Sprint 2021-50, 2021-52 Jan 3, 2022
str4d pushed a commit that referenced this issue Jan 19, 2022
Replace `PartialEq, PartialOrd` with `ConstantTimeEq` on `{Extended}SpendingKey`
@str4d
Copy link
Contributor Author

str4d commented Feb 11, 2022

In Halo Study Club today, we discussed this, and I made the following notes to clarifying things for myself:

Assume we have a circuit that verifies some earlier proof P1. Let Fp be the circuit field (i.e. the base field of the "application curve"), and let Fq be the scalar field of the application curve. The process for verifying P1 inside a circuit involves arithmetic in both Fp and Fq, but the circuit cells can only store Fp elements and perform Fp arithmetic. So our goal is to defer the Fq arithmetic to the next phase of the cycle, where the circuit field is Fq.

The act of deferring Fq arithmetic means that we need to expose the Fq elements in the proof P2 for this circuit. This happens via the instance columns used to verify P2, which contain Fp cells. Then, in the next phase of the cycle, we need to witness those instance columns by assigning them into Fq cells, and then compute a commitment to those cells (as part of the first steps of the Halo 2 verifier).

We therefore need to choose the decomposition of Fq elements into Fp cells, to align with the decomposition of Fq elements into Fq cells. If we decompose the Fq elements into bitstring sequences that can fit into both Fp and Fq cells, then we can use endoscaling to implement a 1:1:1 mapping between bitstring sequences, Fp cell values, and Fq cell values.

  • In the Fp circuit, we use a lookup table to store the 1:1 mapping between bitstring sequences and Fp cell values, which we use to constrain the Fq element chunks into the instance column Fp cells.
  • In the Fq circuit, we implement the endoscaling directly to constrain the 1:! relationship between the witnessed instance columns assigned into advice column Fq cells, and the bitstring sequences needed in the circuit.

It just so happens (by design) that we can also use these endoscalar values witnessed into the Fq cells to construct commitments to these elements, which acts as the proof's commitments to the instance columns provided by the verifier. This corresponds to the step in halo2_proofs 0.1.0 where we produce a polynomial commitment to each instance column, and then write the coordinates of that commitment point into the BLAKE2b transcript.

In summary, we have two separate things we need to do:

  • We need to map Fq elements to instance column Fp cell values, then witness (the equivalent of) those values in advice column Fq cells, then map back to Fq element to constrain the Fq arithmetic.
  • We need to commit to all instance column Fp cell values (not just the ones created for the above mapping) as part of the Halo 2 protocol.

So the public inputs "gadget" doesn't actually need to be a single gadget. It should be two separate gadgets relying on two separate instruction sets, one for shuttling around the "payload" from one circuit to the other, and another for constraining the commitment to the instance columns. These gadgets would then be constrained to use the same endoscalar type and semantics at the higher layer. It would then be up to the chip implementer to decide whether to implement one chip per instruction set, or a single chip implementing both (but the point is that implementing one of the instruction sets and using one of the gadgets, shouldn't require implementing the other instruction set as well).

@str4d
Copy link
Contributor Author

str4d commented Feb 16, 2022

Having talked with @ebfull again I now have a much better understanding of what he meant for the endoscalar usage. In my notes, below, treat Fr == Fp (i.e. I'm writing it below as a series of nested curves in order to simplify things, but it is actually a cycle).

Endoscaling:
    endoscale_p: Fp is base field, Fq is scalar field
    endoscale_q: Fq is base field, Fp is scalar field
    endoscale_r: Fr is base field, Fq is scalar field

Fp circuit:
    Other public inputs -> Fp values --------    App payload --(internal decomposition)
                                             \   /
Transcript -> Fp -> challenge bitstring -> bitstring chunks --(Alg 2: endoscale_q using Fp scalars)--> Fp endoscalars
                                      \                       (  which we optimise with lookup    )   (in instance columns)
                                       \                                                              (but never witnessed)
                                        --(Alg 1: endoscale_p with Fp as base)--> IPA                 (   in Fq circuit   )

Fq circuit:
             --> App payload
            /
  bitstring chunks -(concat)-> challenge bitstring --(Alg 2: endoscale_p using Fq scalars)-> Fq endoscalars -> deferred Fq arithmetic
           \
            --(Alg 1: endoscale_q with Fq as base)--> {[Fp endoscalar] G_l_i} --(sum)--> instance column commitments
                                                      (equivalent to proof's)
                                                      (   instance columns  )

    Other public inputs -> Fq values --------    App payload --(internal decomposition)
                                             \   /
Transcript -> Fq -> challenge bitstring -> bitstring chunks --(Alg 2: endoscale_r using Fq scalars)--> Fq endoscalars
                                      \                       (  which we optimise with lookup    )   (in instance columns)
                                       \                                                              (but never witnessed)
                                        --(Alg 1: endoscale_q' with Fq as base)--> IPA                (   in Fr circuit   )

Fr circuit:
             --> App payload
            /
  bitstring chunks -(concat)-> challenge bitstring --(Alg 2: endoscale_q' using Fr scalars)-> Fr endoscalars -> deferred Fr arithmetic
           \
            --(Alg 1: endoscale_r with Fr as base)--> {[Fq endoscalar] G_l_i} --(sum)--> instance column commitment
                                                      (equivalent to proof's)
                                                      (   instance columns  )

And my attempt to turn the above into a fancy Mermaid diagram:

graph TB
    subgraph Fp circuit
        InstanceA(Instance column <br/> commitments) -- absorb --> TranscriptA(Transcript)
        PayloadA(App payload) -- Internal <br/> decomposition --> ChunksAp
        PubInA(Other public inputs) -- Fp --> ChunksAp
        TranscriptA(Transcript) -- squeeze Fp --> ChallengeAp(Challenge <br/> bitstring)
        ChallengeAp -- split --> ChunksAp(Bitstring chunks)
        ChunksAp -- endoscale_q, Alg 2 <br/> using Fp scalars <br/> optimised with lookup --> EndoAq(Fp endoscalars)
        ChallengeAp -- endoscale_p, Alg1 <br/> using Fp bases --> IPAp(IPA)
    end

    ChunksAp ====> ChunksAq
    EndoAq -..-> EndoBGp

    subgraph Fq circuit
        ChunksAq(Bitstring chunks) -- Internal </br> recomposition --> PayloadBin(App payload in)
        ChunksAq -- concat --> ChallengeAq(Challenge <br/> bitstring)
        ChallengeAq -- endoscale_p, Alg 2 <br/> using Fq scalars --> EndoBp(Fq endoscalars)
        EndoBp --> DeferredFq(Deferred Fq <br/> arithmetic)
        ChunksAq -- endoscale_q, Alg 1 </br> using Fq bases --> EndoBGp("[Fp endoscalar] Gl_i")
        EndoBGp -- sum --> InstanceB(Instance column <br/> commitments)

        PayloadBin --> PayloadBout(App payload out)
        InstanceB -- absorb --> TranscriptB(Transcript)

        PayloadBout -- Internal <br/> decomposition --> ChunksBq
        PubInB(Other public inputs) -- Fq --> ChunksBq
        TranscriptB -- squeeze Fq --> ChallengeBq(Challenge <br/> bitstring)
        ChallengeBq -- split --> ChunksBq(Bitstring chunks)
        ChunksBq -- endoscale_r, Alg 2 <br/> using Fq scalars <br/> optimised with lookup --> EndoBq(Fq endoscalars)
        ChallengeBq -- endoscale_q', Alg 1 <br/> using Fq bases --> IPAq(IPA)
    end

    ChunksBq ====> ChunksBr
    EndoBq -..-> EndoCGq

    subgraph Fr circuit
        ChunksBr(Bitstring chunks) -- Internal </br> recomposition --> PayloadCin(App payload in)
        ChunksBr -- concat --> ChallengeBr(Challenge <br/> bitstring)
        ChallengeBr -- endoscale_q', Alg 2 <br/> using Fr scalars --> EndoCq(Fr endoscalars)
        EndoCq --> DeferredFr(Deferred Fr <br/> arithmetic)
        ChunksBr -- endoscale_r, Alg 1 </br> using Fr bases --> EndoCGq("[Fq endoscalar] Gl_i")
        EndoCGq -- sum --> InstanceC(Instance column <br/> commitments)
    end

Given the above, I think we need one new instruction set EndoscaleInstructions:

  • load_endoscale_scalar_table
  • endoscale_base: Performs Alg 1 with current field as base.
  • endoscale_scalar_fixed: Performs Alg 2 with current field as scalar, using a fixed-length bitstring that is in the lookup table loaded via load_endoscale_scalar_table.
  • endoscale_scalar_var: Performs Alg 2 with current field as scalar, using a variable-length bitstring.

And then various gadgets that utilise this and other instruction sets. For example a PublicInputsGadget that has two variants, one for the non-deferred (Fp) circuit, and one for the deferred (Fq) circuit:

  • In Fp circuit:
    • Use closure map to define the types of public inputs, and consume them from somewhere.
    • Split into bitstring chunks
    • Use EndoscaleInstructions to map to endoscalars (in advice cells).
    • Use closure map to figure out indices of endoscalars in instance column.
    • Use utilities to constrain endoscalar advice cells to correct instance column positions.
  • In Fq circuit:
    • Use closure map to constrain types of witnessed "public inputs"
    • Split into bitstring chunks
    • Use EndoscaleInstructions to convert into endoscalar point commitments to chunks
    • Use EccInstructions to sum point commitments to produce instance column commitment.

@str4d
Copy link
Contributor Author

str4d commented Feb 18, 2022

[Clarification: the discussion below assumes that 160-bit challenges can be proven secure, which had not been done at the time and still hasn't.]

In Halo Study Club today, we had yet another large discussion between @ebfull, @daira, @therealyingtong and myself on this, and clarified yet more differing assumptions we were making about how this all works. We spent time writing out the various logical types involved, and clarifying what logical operations were needed (vs what types and relations might actually exist in an optimised chip implementation). We also established that my proposed EndoscaleInstructions above were too low-level; in particular, we don't need to expose the lookup table as part of the instruction set, but instead as a configuration of the chip itself.

Logical types and relations

Transcript <--> 160-length bitstring
  • These are the challenges we want to sample.
BaseFieldElement <--> 255-length bitstring
  • This covers other public inputs etc for the protocol.
  • Assuming 255-bit field for now.
Vec<var-length bitstring> <--pad-and-pack--> Vec<CHUNK-length bitstring>
  • This operation handles internal padding (alignment)
  • We need this to ensure that no matter how large the input, CHUNK is less than the Alg1/2 maximum input size (see notes below).
  • CHUNK-length bitstring is a constrained version of var-length bitstring (i.e we can ideally use the former anywhere we use the latter, but not vice versa). Similar to NonIdentityPoint and Point: From<NonIdentityPoint> on EccInstructions.
var-length bitstring <--decompose--> var-length boolean-decomposed bitstring
var-length boolean-decomposed bitstring <--EndoAlg1--> NonIdentityPoint (affine)
  • var-length has a limit in this context.
var-length boolean-decomposed bitstring <--EndoAlg2--> BaseFieldElement
  • @daira notes that given any K-bit endoscalar lookup table, you can optimise almost any var-length endoscaling operations (both less than and greater than K bits) with some work (and additional columns).
  • User would tell the chip their desired maximum table size, and the chip can decide what size tables to allocate for best performance, and what the optimal CHUNK size is for the pack-and-pad operation (since it can be larger than the table bitstring lookup length, but has restrictions on its possible lengths).

Notes

We might need to use the same challenge up to ~30 times.

var-length boolean-decomposed bitstring can potentially be optimised out by instead inlining the decomposition running sum (2k bits for 0 < k <= 80 or 127) because decomposition is almost free (in terms of advice area)

Conversely, in order to take a Poseidon output and truncate it to 160 bits, we are forced to fully decompose the output in order to extract the 160 bit outputs, and then we can just copy those bits to where we need them

@daira wants to optimise the usage of Poseidon outputs in the transcript, so when we sample multiple challenges in a row, we have the least number of "dangling decomposition rows" (e.g. running sum rows constraining upper bits of the field element), and the useful rows can then be inlined into other operations (like the endoscaling).

  • @ebfull points out that for Halo 2 specifically, we only sample at most two challenges in a row before needing to absorb data, and with 160-bit challenges we need at least two Poseidon outputs; therefore the simple thing (truncate each output to a single challenge) has exactly as many "dangling decomposition rows" as the packed form.
  • @str4d points out that the choice of simple vs packed affects the transcript API internals, which we're treating as a black box here. If we use the simple approach for Halo 2 recursion, it potentially makes the transcript API less efficient for other users who might benefit from packed. We should consider how to possibly tune the transcript API to enable this to be chosen by the user.

@str4d would like us to write "straight-line" chip implementations without any optimisations, both for use in checking the optimised chips, and to ensure that the instruction sets being defined are in fact usable with non-optimised chips and don't encode optimisation-specific assumptions.

Alg 1/2

If we want to rely in the implementation of Alg 1 on the assumption that there are no collisions on Alg 2, then we must limit the length of the Alg 1 bitstring input to some length that is (a few bits) less than the field size. The precise limit depends on details of the curve, and may or may not depend on the size of the lookup table that the user has requested.

Circuit polymorphism

For protocols like Zexe when verifying multiple circuits (or if we want to allow varying PCD graphs within a protocol), we need a stable specification of the recursive verification algorithm. This algorithm must be able to operate polymorphically over some family of circuits (constrained by the application protocol), and over proofs created by multiple versions of halo2.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-recursion Area: Recursive proofs
Projects
None yet
3 participants