Expand Candidate Selection section #87

coriolinus wants to merge 6 commits into paritytech:master from coriolinus:prgn-candidate-backing
Conversation
While working on Candidate Selection, I kind of needed to create two different subsystems en passant. I think both of them were already planned, but this puts down a bit of starter text to flesh out each of them.
> The Candidate Selection subsystem monitors net traffic for two events:
>
> - a new parablock candidate is available
> - a peer has seconded a parablock candidate
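The two events above could be modelled as a single message type for the subsystem's inbox. A minimal sketch, assuming hypothetical stand-ins for the real node types (`ValidatorId`, `CandidateReceipt` are placeholders here, not the actual definitions):

```rust
/// Hypothetical stand-in for a validator identifier.
type ValidatorId = u32;

/// Hypothetical, heavily simplified candidate receipt.
#[derive(Debug)]
struct CandidateReceipt {
    candidate_hash: u64,
}

/// Events the Candidate Selection subsystem reacts to.
enum CandidateSelectionEvent {
    /// A collator announced a new parablock candidate.
    NewCandidate(CandidateReceipt),
    /// Another validator seconded a parablock candidate.
    PeerSeconded {
        peer: ValidatorId,
        candidate: CandidateReceipt,
    },
}
```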
The reason for this is that we'd prefer to avoid seconding the same parablock as another peer?
Given our short timeframe here, it might not be the case that higher candidate diversity is actually better.
Yes, I'd had the impression that it was actually a slashable error to second the same parablock as a peer; the best we can do in that case is mark it as Valid.
We do not afaik require slashing here. see https://github.com/w3f/research/pull/87/files#r427365330
> New parablock candidates may arrive from a potentially unbounded set of collators. This subsystem chooses either 0 or 1 of them per relay parent to second. If it chooses to second a candidate, it sends an appropriate message to the **Candidate Backing** subsystem to generate an appropriate `Statement`.
>
> All parablocks which peers have seconded are also sent to the Candidate Backing subsystem for re-validation.
> As seconded peers are tallied, double-votes are detected. If found, a report is sent to the **Misbehavior Arbitration** subsystem.
There is a division to be clarified about which subsystems are responsible for double-vote reporting. Candidate Backing, Statement Distribution, and Candidate Seconding subsystems are all going to be in position to detect double-vote misbehavior. We shouldn't be filing duplicate reports from all systems, since one should be sufficient. Of these, I think Statement Distribution is probably the best fit, as it's also the system that's capable of dropping new incoming statements - the marginal cost of double-voting more than once is 0, so someone who double-votes is likely to spam.
We could establish an invariant that statement distribution doesn't give other subsystems double-votes.
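That invariant could be enforced with a small per-(validator, candidate) tracker in Statement Distribution: forward the first judgement, drop repeats, and report contradictions. A minimal sketch, with all type names hypothetical:

```rust
use std::collections::HashMap;

/// Hypothetical identifiers; the real types live elsewhere in the node.
type ValidatorId = u32;
type CandidateHash = u64;

/// The judgement a validator expressed about a candidate.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
enum Judgement {
    Valid,
    Invalid,
}

/// What Statement Distribution should do with an incoming statement.
enum Outcome {
    /// First statement from this validator about this candidate: forward it.
    Forward,
    /// Same statement repeated: ignore it.
    Duplicate,
    /// Contradicts an earlier statement: report to Misbehavior Arbitration.
    DoubleVote,
}

/// Tracks each validator's first judgement per candidate, so downstream
/// subsystems never observe a double-vote.
#[derive(Default)]
struct StatementTracker {
    seen: HashMap<(ValidatorId, CandidateHash), Judgement>,
}

impl StatementTracker {
    fn note(&mut self, who: ValidatorId, candidate: CandidateHash, j: Judgement) -> Outcome {
        match self.seen.get(&(who, candidate)) {
            None => {
                self.seen.insert((who, candidate), j);
                Outcome::Forward
            }
            Some(prev) if *prev == j => Outcome::Duplicate,
            Some(_) => Outcome::DoubleVote,
        }
    }
}
```

Note that this can only stop distribution going forward; as pointed out below, statements already gossiped cannot be rescinded.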
OK, I don't feel strongly about which of those subsystems has the formal responsibility to detect double-votes. Agree that only one of them should really be in charge of it.
Not sure how to establish that invariant: couldn't we get Vote 1, distribute it, then receive Vote 2 from the same source? We can report double-voting at that point, and we can cease to distribute Vote 1, but we can't rescind those instances we've already sent.
What's the distinction between Candidate Backing and Candidate Seconding, by the way? The latter is new to me.
We do not afaik require double vote defenses:
We should prevent a parablock X's first backer from being the first backer for the next parablock Y if X expired instead of progressing to availability, but that's only so that one bad first backer cannot halt a parachain for one rotation. We do not require slashing because replacing an expired candidate receipt reveals the previous first backer, so replacing an expired candidate by a candidate with the same first backer makes the relay chain block invalid.
We could encourage duplicate backing votes on competing parablocks because the relay chain block producer chooses among them definitively. Availability and approval votes cannot be duplicates by construction.
We only slash when two validators disagree about validity. We could impose penalties for poor participation for performance reasons, but that sounds like future optimization work.
What does it mean to be the "first backer"? Ordering is really relative to network conditions. We sort the votes in the backing anyway - take a look at the Inclusion module. This first-backer constraint sounds complicated and doesn't seem to get us much, as the para will still be live upon the next rotation. Double-votes are the only misbehavior we want to discourage at this phase (although it's not strictly necessary, it's easy to catch and discourage), as everything else is covered by the approval system, which hasn't been specified yet. It's easy to avoid double-voting as an honest validator and the slashing condition is strict.
Thanks! In that case, I'd propose to simplify Statements:
```rust
/// A statement about the validity of a parachain candidate
enum Statement {
    /// This parachain candidate is valid.
    Valid(CandidateReceipt),
    /// The parachain candidate with this hash is invalid.
    Invalid(Hash),
}
```

Motivation: eliminating the `Seconded` variant makes it much more clear what the semantics should be: a validator should never contradict its own statements about a block, and all validators should agree on the validity. As far as I can tell, the only motivation for distinguishing the `Seconded` variant is to save some bytes for those validators who mark a block `Valid`. That feels like premature optimization, to me.
Yup, we forbid self conflicts. :)
I still think XCMP might distinguish between the first backer and later backers, meaning multiple parachain validators could compute different candidate receipts from the same parachain block, so "seconding" could refer to backing parachain validators after the first one. I'd assumed you meant this initially, actually.
In any case, we should avoid assuming a deterministic map from candidates to candidate receipts, which may distinguish this seconded notion.
> As far as I can tell, the only motivation for distinguishing the Seconded variant is to save some bytes for those validators who mark a block Valid. That feels like premature optimization, to me.
Actually, it's to bound the number of candidates a validator has to validate, which is a legitimate DoS vector. The Seconded message gives us an upper bound on the number of candidates we'd ever need to validate or distribute during the backing phase, as long as nobody wants to get slashed 100%. We can use the Seconded message to establish a data-dependency in gossip so we never have to deal with an unbounded amount of Valid messages either.
We can replace my suggestion to rotate "first backers" by merely asking that anytime we replace an expired candidate with signer set `S_old` by a new candidate with signer set `S_new`, then `S_new` must include `min_new_backers` validators not in `S_old`. We'd take `min_new_backers = 0` if we asked for 2/3rds, but some parameter choices give `min_new_backers > 0`. We could then second as many parablocks with the same parent as we like.
I'm still not sure why we want this - does it actually give us any more security, given that we will ensure the block is available and we have a secondary-checking system? If it's to prevent a liveness issue, I'd rather not worry about it as rotations are pretty fast anyway.
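The replacement rule floated above reduces to a set-difference check. A minimal sketch, assuming `u32` as a hypothetical stand-in for a validator identifier:

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for a validator identifier.
type ValidatorId = u32;

/// The proposed rule: when replacing an expired candidate backed by signer
/// set `s_old` with a new candidate backed by `s_new`, require that `s_new`
/// contain at least `min_new_backers` validators who were not in `s_old`.
fn replacement_allowed(
    s_old: &HashSet<ValidatorId>,
    s_new: &HashSet<ValidatorId>,
    min_new_backers: usize,
) -> bool {
    s_new.difference(s_old).count() >= min_new_backers
}
```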
@burdges and I talked separately; the issues are based on different mental models and are resolved by having collators fully responsible for generating PoVs, as they do in this document, so the first backer is not as important. We should keep the Seconded message, and we don't need to worry about the candidate-replacement signer-overlap stuff: with this division between collation and validation it is a liveness concern as opposed to a safety one.
We figured this out. We'll want a clearer delineation between collation and backing than my notion of "first" gave. It does not preclude later running the last phase of collation on validators, but this becomes an architectural question for cumulus, not A&V. :)
> New parablock candidates may arrive from a potentially unbounded set of collators. This subsystem chooses either 0 or 1 of them per relay parent to second. If it chooses to second a candidate, it sends an appropriate message to the **Candidate Backing** subsystem to generate an appropriate `Statement`.
>
> All parablocks which peers have seconded are also sent to the Candidate Backing subsystem for re-validation.
The Candidate Backing subsystem should be getting Seconded statements from the statement distribution subsystem (not yet defined), so sending them again here seems redundant.
This makes me think we want another subsystem that actually does validation, because there are going to be situations like these where different subsystems validate candidates and we don't want to do the work twice: this one, which validates candidates received from collators, and the candidate backing subsystem, which validates candidates received from the network.
Heh: I'd thought of breaking out a Candidate Validation subsystem, but I couldn't convince myself that it was sufficiently distinct from Candidate Backing to be worth it. That said, I'm perfectly willing to accept that distinction.
I think that we-the-team should probably have a verbal discussion in the near future to decide the rough overview of the enumeration of subsystems, and the major responsibilities of each. Otherwise we'll keep running into minor divergence of assumptions like this.
In particular, our first validator produces lite client proofs about the relay chain parent, which later validators only check, so that requires some difference in execution context I guess.
> In particular, our first validator produces lite client proofs about the relay chain parent, which later validators only check, so that requires some difference in execution context I guess.

Well, maybe in XCMP. Not with the state of the doc as-is.
> #### TODOs
>
> - threshold of reports for deciding that a validator has misbehaved
A single double-vote report is full evidence, so that should be enough.
Yes, anytime any two validators disagree about a parablock's validity, then we immediately escalate to all validators checking, and the minority side is slashed 100%.
We'll have lesser escalations later, but only for subtle things like a secondary approval checker being late, which may indicate a targeted DoS attack, or for a relay chain equivocation in BABE.
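The tallying step described here (all validators vote, the minority side is slashed 100%) can be sketched as a toy function; the real protocol would also need rules for ties and stake weighting, which this deliberately ignores:

```rust
/// Hypothetical stand-in for a validator identifier.
type ValidatorId = u32;

/// Given every validator's vote on a disputed parablock (`true` = valid),
/// return the validators on the minority side, i.e. those to be slashed.
fn slashed_minority(votes: &[(ValidatorId, bool)]) -> Vec<ValidatorId> {
    let valid_count = votes.iter().filter(|(_, v)| *v).count();
    let invalid_count = votes.len() - valid_count;
    // The side with fewer votes is the minority; on an exact tie this toy
    // version slashes nobody (the real protocol would need a rule here).
    let minority_vote = if valid_count < invalid_count {
        true
    } else if invalid_count < valid_count {
        false
    } else {
        return Vec::new();
    };
    votes
        .iter()
        .filter(|(_, v)| *v == minority_vote)
        .map(|(id, _)| *id)
        .collect()
}
```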
While true, I've assumed we only need to do those escalations for blocks that are going through the approval process. Full escalation during the backing phase is probably unnecessary.
I think escalation here improves security, since an adversary could miscalculate their ability to limit the other parachain validators' access. We could postpone doing this escalation until later, however, since we do not consider this the primary mechanism. @AlistairStewart ?
I suppose excluding slashing from an entire module dramatically simplifies the code, yes? After escalation works in approvals then we could consider abstracting that code to cover both, so maybe 2021.
Yeah, I think this might be one for 2021. The first AnV push will ideally focus only on stuff that's made its way on-chain, and we can schedule another one later. I don't want to block the whole project on marginal security gains. I'm pretty sure we can get 99% of the way there in much less time by focusing on the big stuff.
This document should be a simpler way to get a view of what the subsystems are, what their responsibilities are, and how the messages flow.
> ## Misbehavior Arbitration
>
> The misbehavior arbitration module kicks in when two validator nodes disagree about a candidate's validity. In this case, _all_ validators, not just those assigned to its parachain, weigh in on the validity of this candidate. The minority is slashed.
Why not have a specific misbehavior system for double-votes? The dispute between valid/invalid would be handled mostly by the Validity module, a runtime API, and a SecondaryChecking subsystem.
- Validity module tracks blocks currently up for approval, backing, and the output of all secondary checks so far. It handles slashing when there are disputes.
- Runtime API makes all information available to node-side
- SecondaryChecks uses information about blocks up for approval (random seed, escalation) to schedule and perform secondary checks.
We always envisioned the validity module counting the parachain validator's backing votes btw, so yes maybe it could just handle their slashing along with its own.
This should not be a module, but a subsystem within the validity module; I misspoke.
The point of Misbehavior Arbitration is that this role handles things when the whole voting set needs to vote on something: this is the subsystem which says "hey, peers: everyone judge this particular block", tallies the votes, and slashes the minority. That behavior is fairly complicated, so I thought it made more sense to put it into its own subsystem than to build it into a different subsystem with a different primary responsibility.
In other words, assuming I understand the three-step checklist properly, I think Misbehavior Arbitration is the subsystem within the Validity module which handles step 1.
> Let's follow a parachain candidate through the system. This follows the happy path, but we'll note exit points along the way.
>
> - Collator (external to this system)
I'd like collator subsystems to be covered here as well, since they're also block-based work.
> - Candidate Selection: every parachain validator can choose at most 1 candidate to back, so potentially many are discarded.
> - Candidate Backing
> - Statement Distribution: if there's disagreement about a candidate's validity, things move to misbehavior arbitration.
> - Overseer: a quorum of validators unanimously agrees about this candidate's validity.
why overseer here?
should probably go to a "block authorship" subsystem (although actually block authorship will happen outside of the overseer/subsystem hierarchy; I've been thinking we'll create a subsystem responsible for gathering information for that block authorship logic).
and from there it goes into the runtime.
> ### Inbound Messages
>
> - **Various Subsystems**: requests to validate a particular parablock candidate
one open question is how the PoV fetch is handled.
in the case of validating a block from a collator (to decide whether to second), the collator has sent us a PoV, so we could send it directly to the validation subsystem. In the case of validating a candidate seconded by another validator, we just have the receipt and not the PoV, so we'd want to proceed through these steps:
- Figure out if we've already validated the candidate
- If not, fetch PoV
- Upon PoV fetch, do validation
We might want something like a `PoVOrigin` sent alongside a validation request that tells us where to get the PoV. `Included` would be one option, and fetching from the network would be another. But maybe this is where things get complicated, since the network might want to use information about statements received from other validators to figure out who might have the PoV. At the moment, we'll just gossip all PoVs in the same subsystem as statements, so we can defer this to later.
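The `PoVOrigin` idea could look roughly like the following sketch. All names here are hypothetical illustrations, not the real node types:

```rust
/// Hypothetical, heavily simplified proof-of-validity blob.
struct PoV {
    block_data: Vec<u8>,
}

/// Where a validation request's PoV comes from.
enum PoVOrigin {
    /// The collator handed us the PoV directly; validate immediately.
    Included(PoV),
    /// Only the candidate receipt is known; fetch the PoV before validating.
    FetchFromNetwork,
}

struct ValidationRequest {
    candidate_hash: u64,
    pov_origin: PoVOrigin,
}

/// The steps from the comment above: skip work if already validated,
/// otherwise obtain the PoV (directly or by fetching) and then validate.
/// Returns a label for which path was taken, for illustration only.
fn handle(req: ValidationRequest, already_validated: &dyn Fn(u64) -> bool) -> &'static str {
    if already_validated(req.candidate_hash) {
        return "cached";
    }
    match req.pov_origin {
        PoVOrigin::Included(_pov) => "validate now",
        PoVOrigin::FetchFromNetwork => "fetch PoV, then validate",
    }
}
```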
We've punted several things here:
As the file at this new location included changes not present in paritytech/research#87, this is effectively a rebase, applied manually. I believe that I have successfully retained all of, and only, the intended changes.
Closing in favor of paritytech/polkadot#1161 as the canonical location of the edited document has moved into that repo.
* migrate subsystem expansion PR from w3f/research

  As the file at this new location included changes not present in paritytech/research#87, this is effectively a rebase, applied manually. I believe that I have successfully retained all of, and only, the intended changes.

* add section on collators
* note why the overseer is the terminal message point for the validation subsystem
* add detail about how the validity system gets PoV blocks
* rename to Validity Subsystems Overview
* get rid of changes to the implementor's guide

  I think it makes most sense to track and review the subsystems overview in a different PR than the one which adds content to the implementor's guide.

* punt misbehavior arbitration details to the future
* empty commit to rerun CI
Note: there are a bunch of whitespace edits where my editor stripped trailing whitespace; I can edit to leave those in, but IMO it's cleaner to just fix it.