Expand Candidate Selection section #87

Closed
coriolinus wants to merge 6 commits into paritytech:master from coriolinus:prgn-candidate-backing

Conversation

@coriolinus
Contributor

Note: there are a bunch of whitespace edits where my editor stripped trailing whitespace; I can edit to leave those in, but IMO it's cleaner to just fix it.

While working on Candidate Selection, I kind of needed to create two different subsystems en passant. I think both of them were already planned, but this puts down a bit of starter text to flesh out each of them.
The Candidate Selection subsystem monitors net traffic for two events:

- a new parablock candidate is available
- a peer has seconded a parablock candidate
Contributor

The reason for this is that we'd prefer to avoid seconding the same parablock as another peer?

Given our short timeframe here, it might not be the case that higher candidate diversity is actually better.

Contributor Author

Yes, I'd had the impression that it was actually a slashable error to second the same parablock as a peer; the best we can do in that case is mark it as Valid.

Contributor

We do not, AFAIK, require slashing here; see https://github.com/w3f/research/pull/87/files#r427365330

New parablock candidates may arrive from a potentially unbounded set of collators. This subsystem chooses either 0 or 1 of them per relay parent to second. If it chooses to second a candidate, it sends a message to the **Candidate Backing** subsystem to generate the appropriate `Statement`.

All parablocks which peers have seconded are also sent to the Candidate Backing subsystem for re-validation.
As peers' seconding statements are tallied, double-votes are detected. If found, a report is sent to the **Misbehavior Arbitration** subsystem.
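The "0 or 1 candidates per relay parent" rule above can be sketched as follows. This is a hypothetical illustration, not the actual implementation: `CandidateHash`, `CandidateSelection`, and the `looks_valid` flag (standing in for whatever validation check the subsystem delegates elsewhere) are all placeholders invented for this sketch.

```rust
/// Placeholder for a real candidate hash type.
type CandidateHash = [u8; 32];

/// Hypothetical per-relay-parent selection state.
#[derive(Default)]
struct CandidateSelection {
    /// The candidate we have chosen to second for this relay parent, if any.
    seconded: Option<CandidateHash>,
}

impl CandidateSelection {
    /// Consider a candidate arriving from a collator; returns true if we
    /// choose to second it. At most one call returns true per relay parent,
    /// so an unbounded stream of collator candidates yields 0 or 1 seconds.
    fn consider(&mut self, candidate: CandidateHash, looks_valid: bool) -> bool {
        if self.seconded.is_none() && looks_valid {
            self.seconded = Some(candidate);
            true
        } else {
            false
        }
    }
}

fn main() {
    let mut sel = CandidateSelection::default();
    assert!(sel.consider([1; 32], true));   // first good candidate: second it
    assert!(!sel.consider([2; 32], true));  // already seconded one: discard
}
```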
Contributor

There is a division to be clarified about which subsystems are responsible for double-vote reporting. The Candidate Backing, Statement Distribution, and Candidate Seconding subsystems are all going to be in a position to detect double-vote misbehavior. We shouldn't be filing duplicate reports from all of them, since one should be sufficient. Of these, I think Statement Distribution is probably the best fit, as it's also the system that's capable of dropping new incoming statements - the marginal cost of double-voting more than once is 0, so someone who double-votes is likely to spam.

We could establish an invariant that statement distribution doesn't give other subsystems double-votes.

Contributor Author

OK, I don't feel strongly about which of those subsystems has the formal responsibility to detect double-votes. Agree that only one of them should really be in charge of it.

Not sure how to establish that invariant: couldn't we get Vote 1, distribute it, then receive Vote 2 from the same source? We can report double-voting at that point, and we can cease to distribute Vote 1, but we can't rescind those instances we've already sent.
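The detection side of this (as opposed to the rescind-what-we-sent problem, which really is impossible) is straightforward to sketch. The names here (`DoubleVoteDetector`, `Judgment`, `note`) are hypothetical, invented for this illustration of tracking each validator's first judgment per candidate and flagging contradictions:

```rust
use std::collections::HashMap;

type ValidatorId = u32;            // placeholder
type CandidateHash = [u8; 32];     // placeholder

#[derive(Clone, Copy, PartialEq, Eq, Debug)]
enum Judgment { Valid, Invalid }

#[derive(Default)]
struct DoubleVoteDetector {
    /// First judgment seen from each validator for each candidate.
    seen: HashMap<(ValidatorId, CandidateHash), Judgment>,
}

impl DoubleVoteDetector {
    /// Record a statement. Returns Some((old, new)) when the validator
    /// contradicts an earlier statement about the same candidate - the
    /// evidence pair a misbehavior report would carry.
    fn note(
        &mut self,
        who: ValidatorId,
        candidate: CandidateHash,
        judgment: Judgment,
    ) -> Option<(Judgment, Judgment)> {
        match self.seen.get(&(who, candidate)) {
            Some(&prev) if prev != judgment => Some((prev, judgment)),
            Some(_) => None, // exact duplicate: not a double-vote
            None => {
                self.seen.insert((who, candidate), judgment);
                None
            }
        }
    }
}

fn main() {
    let mut d = DoubleVoteDetector::default();
    let c = [0u8; 32];
    assert!(d.note(1, c, Judgment::Valid).is_none());
    assert!(d.note(1, c, Judgment::Valid).is_none());   // duplicate, fine
    assert!(d.note(1, c, Judgment::Invalid).is_some()); // contradiction: report
}
```

As the comment above notes, the statement already distributed cannot be rescinded; this only stops further propagation and produces the report.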

Contributor Author

What's the distinction between Candidate Backing and Candidate Seconding, by the way? The latter is new to me.

Contributor

@burdges burdges May 19, 2020

We do not, AFAIK, require double-vote defenses:

We should prevent a parablock X's first backer from being first backer for the next parablock Y if X expired instead of progressing to availability, but that's only so that one bad first backer cannot halt a parachain for one rotation. We do not require slashing: replacing an expired candidate receipt reveals the previous first backer, so replacing an expired candidate by a candidate with the same first backer makes the relay chain block invalid.

We could encourage duplicate backing votes on competing parablocks because the relay chain block producer chooses among them definitively. Availability and approval votes cannot be duplicates by construction.

We only slash when two validators disagree about validity. We could impose penalties for poor participation for performance reasons, but that sounds like future optimization work.

Contributor

@rphmeier rphmeier May 19, 2020

What does it mean to be the "first backer"? Ordering is really relative to network conditions. We sort the votes in the backing anyway - take a look at the Inclusion module. This first-backer constraint sounds complicated and doesn't seem to get us much, as the para will still be live upon the next rotation. Double-votes are the only misbehavior we want to discourage at this phase (although it's not strictly necessary, it's easy to catch and discourage), as everything else is covered by the approval system, which hasn't been specified yet. It's easy to avoid double-voting as an honest validator, and the slashing condition is strict.

Contributor Author

@coriolinus coriolinus May 20, 2020

Thanks! In that case, I'd propose to simplify Statements:

/// A statement about the validity of a parachain candidate
enum Statement {
    /// This parachain candidate is valid.
    Valid(CandidateReceipt),
    /// The parachain candidate with this hash is invalid.
    Invalid(Hash),
}

Motivation: eliminating the Seconded variant makes it much clearer what the semantics should be: a validator should never contradict its own statements about a block, and all validators should agree on the validity. As far as I can tell, the only motivation for distinguishing the Seconded variant is to save some bytes for those validators who mark a block Valid. That feels like premature optimization, to me.

Contributor

@burdges burdges May 20, 2020

Yup, we forbid self conflicts. :)

I still think XCMP might distinguish between the first backer and later backers, meaning multiple parachain validators could compute different candidate receipts from the same parachain block, so "seconding" could refer to backing parachain validators after the first one. I'd assumed you meant this initially, actually.

In any case, we should avoid assuming a deterministic map from candidates to candidate receipts, which may distinguish this seconded notion.

Contributor

@rphmeier rphmeier May 20, 2020

> As far as I can tell, the only motivation for distinguishing the Seconded variant is to save some bytes for those validators who mark a block Valid. That feels like premature optimization, to me.

Actually, it's to bound the number of candidates a validator has to validate, which is a legitimate DoS vector. The Seconded message gives us an upper bound on the number of candidates we'd ever need to validate or distribute during the backing phase, as long as nobody wants to get slashed 100%. We can use the Seconded message to establish a data-dependency in gossip so we never have to deal with an unbounded amount of Valid messages either.
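The DoS-bounding role of `Seconded` can be illustrated by extending the three-variant proposal above. This is a hedged sketch, not the real type: `CandidateReceipt` is reduced to a placeholder struct, and `candidate_hash` is a helper invented for this example.

```rust
/// Placeholder hash type for this sketch.
type Hash = [u8; 32];

/// Reduced placeholder; the real receipt carries much more.
struct CandidateReceipt {
    hash: Hash,
}

enum Statement {
    /// Introduces a candidate. Each validator seconds at most one candidate
    /// per relay parent (on pain of slashing), so the number of candidates
    /// in play is bounded by the number of validators - the DoS bound.
    Seconded(CandidateReceipt),
    /// Attests validity of an already-seconded candidate, by hash only.
    Valid(Hash),
    /// Attests invalidity of an already-seconded candidate, by hash only.
    Invalid(Hash),
}

/// Every statement resolves to a candidate hash; gossip can require a
/// Seconded statement before accepting Valid/Invalid votes for that hash,
/// giving the data-dependency mentioned above.
fn candidate_hash(s: &Statement) -> Hash {
    match s {
        Statement::Seconded(receipt) => receipt.hash,
        Statement::Valid(h) | Statement::Invalid(h) => *h,
    }
}

fn main() {
    let intro = Statement::Seconded(CandidateReceipt { hash: [7; 32] });
    let vote = Statement::Valid([7; 32]);
    assert_eq!(candidate_hash(&intro), candidate_hash(&vote));
}
```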

We can replace my suggestion to rotate "first backers" by merely asking that anytime we replace an expired candidate with signer set S_old by a new candidate with signer set S_new, then S_new must include min_new_backers validators not in S_old. We'd take min_new_backers=0 if we asked for 2/3rds, but some parameter choices give min_new_backers>0. We could then second as many parablocks with the same parent as we like.

I'm still not sure why we want this - does it actually give us any more security, given that we will ensure the block is available and we have a secondary-checking system? If it's to prevent a liveness issue, I'd rather not worry about it as rotations are pretty fast anyway.

Contributor

@burdges and I talked separately; the issues are based on different mental models and are resolved by having collators fully responsible for generating PoVs, as they do in this document, so the first backer is not as important. We should keep the Seconded message and don't need to worry about the candidate-replacement signer-overlap stuff: with this division between collation and validation it is a liveness concern as opposed to a safety one.

Contributor

We figured this out. We'll want a clearer delineation between collation and backing than my notion of "first" gave. It does not preclude later running the last phase of collation on validators, but this becomes an architectural question for cumulus, not A&V. :)


New parablock candidates may arrive from a potentially unbounded set of collators. This subsystem chooses either 0 or 1 of them per relay parent to second. If it chooses to second a candidate, it sends a message to the **Candidate Backing** subsystem to generate the appropriate `Statement`.

All parablocks which peers have seconded are also sent to the Candidate Backing subsystem for re-validation.
Contributor

The Candidate Backing subsystem should be getting Seconded statements from the statement distribution subsystem (not yet defined), so sending them again here seems redundant.

This makes me think we want another subsystem that actually does validation, because there are going to be situations like these where different subsystems validate candidates and we don't want to do the work twice: this one, which validates candidates received from collators, and the candidate backing subsystem, which validates candidates received from the network.

Contributor Author

Heh: I'd thought of breaking out a Candidate Validation subsystem, but I couldn't convince myself that it was sufficiently distinct from Candidate Backing to be worth it. That said, I'm perfectly willing to accept that distinction.

I think that we-the-team should probably have a verbal discussion in the near future to decide the rough overview of the enumeration of subsystems, and the major responsibilities of each. Otherwise we'll keep running into minor divergence of assumptions like this.

Contributor

In particular, our first validator produces lite client proofs about the relay chain parent, which later validators only check, so that requires some difference in execution context I guess.

Contributor

> In particular, our first validator produces lite client proofs about the relay chain parent, which later validators only check, so that requires some difference in execution context I guess.

Well, maybe in XCMP. Not with the state of the doc as-is.


#### TODOs

- threshold of reports for deciding that a validator has misbehaved
Contributor

A single double-vote report is full evidence, so that should be enough.

Contributor

@burdges burdges May 19, 2020

Yes, anytime any two validators disagree about a parablock's validity, then we immediately escalate to all validators checking, and the minority side is slashed 100%.

We'll have lesser escalations later, but only for subtle things like a secondary approval checker being late, which may indicate a targeted DoS attack, or for a relay chain equivocation in BABE.

Contributor

While true, I've assumed we only need to do those escalations for blocks that are going through the approval process. Full escalation during the backing phase is probably unnecessary.

Contributor

I think escalation here improves security, since an adversary could miscalculate their ability to limit the other parachain validators' access. We could postpone doing this escalation until later, however, since we do not consider this the primary mechanism. @AlistairStewart ?

I suppose excluding slashing from an entire module dramatically simplifies the code, yes? After escalation works in approvals, we could consider abstracting that code to cover both, so maybe 2021.

Contributor

@rphmeier rphmeier May 20, 2020

Yeah, I think this might be one for 2021. The first AnV push will ideally focus only on stuff that's made its way on-chain, and we can schedule another one later. I don't want to block the whole project on marginal security gains. I'm pretty sure we can get 99% of the way there in much less time by focusing on the big stuff.

Comment thread on docs/polkadot/implementors-guide/guide.md (Outdated)
This document should be a simpler way to get a view of what the subsystems
are, what their responsibilities are, and how the messages flow.

## Misbehavior Arbitration

The misbehavior arbitration module kicks in when two validator nodes disagree about a candidate's validity. In this case, _all_ validators, not just those assigned to its parachain, weigh in on the validity of this candidate. The minority is slashed.
Contributor

Why not have a specific misbehavior system for double-votes? The dispute between valid/invalid would be handled mostly by the Validity module, a runtime API, and a SecondaryChecking subsystem.

  1. The Validity module tracks blocks currently up for approval and backing, plus the output of all secondary checks so far; it handles slashing when there are disputes.
  2. A runtime API makes all of this information available to the node side.
  3. The SecondaryChecks subsystem uses information about blocks up for approval (random seed, escalation) to schedule and perform secondary checks.

Contributor

We always envisioned the validity module counting the parachain validator's backing votes btw, so yes maybe it could just handle their slashing along with its own.

Contributor Author

This should not be a module, but a subsystem within the validity module; I misspoke.

The point of Misbehavior Arbitration is that this role handles things when the whole voting set needs to vote on something: this is the subsystem which says "hey, peers: everyone judge this particular block", tallies the votes, and slashes the minority. That behavior is fairly complicated, so I thought it made more sense to put it into its own subsystem than to build it into a different subsystem with a different primary responsibility.

In other words, assuming I understand the three-step checklist properly, I think Misbehavior Arbitration is the subsystem within the Validity module which handles step 1.
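The tally-and-slash behavior described above ("everyone judge this particular block", tally, slash the minority) reduces to a small core. A hedged sketch, with hypothetical names (`minority`, `ValidatorId`) and ties ignored (full escalation expects a clear majority):

```rust
type ValidatorId = u32; // placeholder

/// Given the full validator set's judgments on a disputed candidate
/// (true = valid), return the minority side: the validators to slash.
/// Illustrative only; ties are out of scope for this sketch.
fn minority(votes: &[(ValidatorId, bool)]) -> Vec<ValidatorId> {
    let valid = votes.iter().filter(|(_, v)| *v).count();
    let invalid = votes.len() - valid;
    let minority_says_valid = valid < invalid;
    votes
        .iter()
        .filter(|(_, v)| *v == minority_says_valid)
        .map(|(id, _)| *id)
        .collect()
}

fn main() {
    // Validators 1 and 2 say valid, validator 3 says invalid:
    // the minority (3) is slashed.
    let votes = [(1u32, true), (2, true), (3, false)];
    assert_eq!(minority(&votes), vec![3]);
}
```

The real complexity is in soliciting and distributing those votes, which is why the comment argues for a dedicated subsystem rather than folding this into one with a different primary responsibility.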


Let's follow a parachain candidate through the system. This follows the happy path, but we'll note exit points along the way.

- Collator (external to this system)
Contributor

I'd like collator subsystems to be covered here as well, since they're also block-based work.

- Candidate Selection: every parachain validator can choose at most 1 candidate to back, so potentially many are discarded.
- Candidate Backing
- Statement Distribution: if there's disagreement about a candidate's validity, things move to misbehavior arbitration.
- Overseer: a quorum of validators unanimously agrees about this candidate's validity.
Contributor

Why overseer here?

It should probably go to a "block authorship" subsystem (although block authorship will actually happen outside of the overseer/subsystem hierarchy, I've been thinking we'll create a subsystem responsible for gathering information for that block authorship logic).

And from there it goes into the runtime.


### Inbound Messages

- **Various Subsystems**: requests to validate a particular parablock candidate
Contributor

One open question is how the PoV fetch is handled.

In the case of validating a block from a collator (to decide whether to second), the collator has sent us a PoV, so we could send it directly to the validation subsystem. In the case of validating a candidate seconded by another validator, we just have the receipt and not the PoV, so we'd want to proceed through these steps:

  1. Figure out if we've already validated the candidate
  2. If not, fetch PoV
  3. Upon PoV fetch, do validation

We might want something like a `PoVOrigin` sent alongside a validation request that tells us where to get the PoV. `Included` would be one option, and fetching from the network would be another. But maybe this is where things get complicated, since the network might want to use information about statements received from other validators to figure out who might have the PoV. At the moment, we'll just gossip all PoVs in the same subsystem as statements, so we can defer this to later.
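The `PoVOrigin` idea floated above might look like this. Everything here is hypothetical, following the comment's wording: `PoV` and `Hash` are placeholder types, and `obtain_pov` collapses steps 2-3 of the list (step 1, the already-validated check, would happen before consulting the origin at all).

```rust
type Hash = [u8; 32]; // placeholder

/// Placeholder for a real Proof-of-Validity block.
struct PoV(Vec<u8>);

/// Hypothetical tag telling the validation subsystem where the PoV lives.
enum PoVOrigin {
    /// The PoV arrived alongside the request (e.g. directly from a collator).
    Included(PoV),
    /// The PoV must be fetched from the network, keyed by candidate hash.
    Network(Hash),
}

/// Resolve an origin to a PoV, using `fetch` for the network case.
fn obtain_pov(origin: PoVOrigin, fetch: impl Fn(Hash) -> PoV) -> PoV {
    match origin {
        PoVOrigin::Included(pov) => pov,          // already in hand
        PoVOrigin::Network(hash) => fetch(hash),  // fetch, then validate
    }
}

fn main() {
    // Stand-in for a network fetch (e.g. via PoV gossip).
    let fetch = |_hash: Hash| PoV(vec![1, 2, 3]);
    let pov = obtain_pov(PoVOrigin::Network([0; 32]), fetch);
    assert_eq!(pov.0, vec![1, 2, 3]);
}
```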

@burdges
Contributor

burdges commented May 21, 2020

We've punted several things here:

coriolinus added a commit to paritytech/polkadot that referenced this pull request May 27, 2020
As the file at this new location included changes not present in
paritytech/research#87, this is effectively a
rebase, applied manually. I believe that I have successfully retained
all of, and only, the intended changes.
@coriolinus
Contributor Author

Closing in favor of paritytech/polkadot#1161 as the canonical location of the edited document has moved into that repo.

@coriolinus coriolinus closed this May 27, 2020
rphmeier pushed a commit to paritytech/polkadot that referenced this pull request Jun 4, 2020
* migrate subsystem expansion PR from w3f/research

As the file at this new location included changes not present in
paritytech/research#87, this is effectively a
rebase, applied manually. I believe that I have successfully retained
all of, and only, the intended changes.

* add section on collators

* note why the overseer is the terminal message point for the validation subsystem

* add detail about how the validitiy system gets PoV blocks

* rename to Validity Subsystems Overview

* get rid of changes to the implementor's guide

I think it makes most sense to track and review the subsystems
overview in a different PR than the one which adds content to
the implementor's guide.

* punt misbehavior arbitration details to the future

* empty commit to rerun CI