New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Lite client bisection is unsafe without counterfactual slashing #3244

Open
cwgoes opened this Issue Feb 2, 2019 · 8 comments

Comments

Projects
None yet
4 participants
@cwgoes
Copy link

cwgoes commented Feb 2, 2019

Take as the desideratum of our lite client commit verification algorithm that for it to be the case that a lite client which checks in at least once per unbonding period is successfully fooled, it must necessarily be the case that a certain fraction of bonded or unbonding stake has committed an equivocation and will be punished if the lite client publishes that equivocation to the chain. ("fooling a lite client is always costly")

As I understand it, the current mechanism by which our lite client verification algorithm bisects headers, checks the difference in validator sets, and skips intermediary header verification violates this property: it is possible, in certain cases with sufficient validator set flux, for a portion of stake to successfully fool a lite client at zero cost.

Consider the following example:

A Tendermint chain started at block 0 and is currently at block 20, with all blocks within a single unbonding period. At block 10, the chain had some validator set V_10, and at block 20 the chain had some validator set V_20. There is less than 1/3 overlap between V_10 and V_20 - i.e., in between blocks 10 and 20, the validator set has changed by 2/3.

Consider a lite client with the correct root-of-trust for block 10, now connecting to a full node and trying to verify the state at block 20 using the bisection optimization.

An honest full node will report V_20 correctly and the lite client will notice that the validator set has changed by a sufficient amount, request intermediary headers, and check that the appropriate next-validator-set hashes were signed.

However, a dishonest full node working in conjunction with 2/3 of the validator set of block 10 can instead report V'_20 where V'_20 overlaps with V_10 by 2/3. The lite client will calculate that the validator set has changed by less than 1/3, not request intermediary headers, and accept whatever state root V'_20 signed.

The lite client can now be costlessly fooled - V'_20 can sign whatever it wants, without committing any equivocation at all, since at least 2/3 of V'_20 aren't actually bonded at block 20!

I haven't read all the lite client code, so this might not be exactly correct, but I think this class of attack works for any case where the lite client skips intermediary header verification by checking validator set overlap and where a now-unbonded fraction of a validator set of an earlier block could sign a later block header without committing an equivocation.

The only way I see to preserve the safety of bisection is to slash validators for signing headers while unbonding - Tendermint would need to track validators in the unbonding state, and if any signature of a header of a height when they were in that state is discovered (not necessarily a double-signature, any signature at all), Tendermint must consider that a slashable fault and report it to the state machine for punishment (presumably equivalent to punishment for regular equivocation).

As a short term quick fix, we could also disable lite client bisection and verify all headers.

cc @ebuchman @jaekwon @sunnya97

@zmanian

This comment has been minimized.

Copy link
Contributor

zmanian commented Feb 3, 2019

So my POV on this has been that counter factual slashing should ideally be implemented in the SDK not in Tendermint.

it's a detail of the generalized Byzantine behavior that the needs to be expressible in the SDK and specific to the proof of stake implementation.

@zmanian

This comment has been minimized.

Copy link
Contributor

zmanian commented Feb 3, 2019

Vitalik was pointing out that another possible way to do things is delay voting power changes for the signing window.

@cwgoes

This comment has been minimized.

Copy link
Author

cwgoes commented Feb 4, 2019

So my POV on this has been that counter factual slashing should ideally be implemented in the SDK not in Tendermint.

At first glance this seems plausible - the SDK is already tracking the unbonding validators and has their consensus signing keys, we would only need to add parsing for the relevant Tendermint data structures and then evidence of a signature-while-unbonding could be submitted via transaction.

it's a detail of the generalized Byzantine behavior that the needs to be expressible in the SDK and specific to the proof of stake implementation.

It does seem somewhat application-specific, but so far we've tried to write a Tendermint-generic lite client, so any Tendermint lite client (desiring this safety property) will need to worry about this.

@jaekwon

This comment has been minimized.

Copy link
Contributor

jaekwon commented Feb 4, 2019

Great discussion.

For now can we assume (and DISCLAIM) that the validator set won't change too much within the unbonding period? I don't think it would be a bad assumption for the cosmos hub after launch. It could be a problem for some applications but I don't foresee any concrete problems on the horizon.

It could/should be an offense to sign a block header which includes the wrong ValidatorsHash (or something like that). That could just be handled by Tendermint as another kind of Evidence, with vote(s) and header w/ invalid .ValidatorsHash for the given .Height., and the SDK can deal with punishing.

@cwgoes

This comment has been minimized.

Copy link
Author

cwgoes commented Feb 4, 2019

For now can we assume (and DISCLAIM) that the validator set won't change too much within the unbonding period? I don't think it would be a bad assumption for the cosmos hub after launch. It could be a problem for some applications but I don't foresee any concrete problems on the horizon.

This seems likely in practice - and maybe even desirable to enforce in the state machine - but at the moment we don't have any guaranteed bound; if delegators changed their mind the entire validator set could change in two blocks.

If you mean that we could release the lite client as-is, with a disclaimer that malicious-full-node security is only provided under the assumption that the validator set has not changed too much, that doesn't seem too dangerous to me for the short term.

It could/should be an offense to sign a block header which includes the wrong ValidatorsHash (or something like that). That could just be handled by Tendermint as another kind of Evidence, with vote(s) and header w/ invalid .ValidatorsHash for the given .Height., and the SDK can deal with punishing.

That seems like it deals with this case (and maybe would be easier to implement?) - in general, however, I do wonder if we should just slash unbonding validators for signing any block header with their consensus keys. I don't see any reason they would have for doing so, and it reduces the set of possible signatures the lite client security model has to reason about.

@jaekwon

This comment has been minimized.

Copy link
Contributor

jaekwon commented Feb 4, 2019

If you mean that we could release the lite client as-is, with a disclaimer that malicious-full-node security is only provided under the assumption that the validator set has not changed too much, that doesn't seem too dangerous to me for the short term.

That's what I mean, yes.

That seems like it deals with this case (and maybe would be easier to implement?) - in general, however, I do wonder if we should just slash unbonding validators for signing any block header with their consensus keys. I don't see any reason they would have for doing so, and it reduces the set of possible signatures the lite client security model has to reason about.

That seems like another kind of offense we should punish for... but even bonded validators can sign bad blocks w/ the wrong ValidatorHash, which should be punished because it could contribute to an attack.

@ebuchman

This comment has been minimized.

Copy link
Contributor

ebuchman commented Feb 6, 2019

Opened #3259 to track an immediate fix at some performance hit.

One concern is how this evidence would be detected and included in blocks in the first place, since attackers just need to send it to light clients, not to the full nodes. If specific lite clients are being targeted and simultaneously eclipsed, the evidence may never get out in time. I think this is what justifies it being an application-specific concern, since the client needs proof that the signers are still validators, ie. that the validator set changes slowly, or that it's determined sufficiently far in advance. For apps that allow it to change too quickly, bisection may not be possible?

So for the Tendermint lite-client to safely update validator sets using bisection, it will need to rely on some injected function from the application. We can do this easily for the SDK since it's in-process, but it won't yet work with non Go apps unless we expose the lite API out-of-process. We can also push for Tendermint lite client implementations in all the languages as the way to unlock bisection 🔓

Note we can't use a new evidence type for this without also adding better evidence handling to ABCI (eg. CheckEvidence). For apps in other languages, they'd have to implement it as txs (and verify the vote signature).

@ebuchman ebuchman closed this Feb 6, 2019

@ebuchman ebuchman reopened this Feb 6, 2019

@ebuchman

This comment has been minimized.

Copy link
Contributor

ebuchman commented Feb 6, 2019

Whoops finger slipped.

but even bonded validators can sign bad blocks w/ the wrong ValidatorHash, which should be punished because it could contribute to an attack.

We should probably add an evidence type for this, but it also won't necessarily be detected in the consensus pkg for anyone other than the proposer, so will similarly need to be reported by lite clients themselves

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment