Highly available validators #1758

Open
mdyring opened this Issue Jun 16, 2018 · 9 comments

@mdyring
Contributor

mdyring commented Jun 16, 2018

As per tendermint/kms#29, I am not convinced that adding in redundancy at the KMS level is the right place to do it.

Assuming the redundant KMS runs on hardware isolated from the validator (gaiad), the validator itself would still remain a single point of failure.

An idea I have floated earlier is that of a "logical validator", which I believe can also be made safe in the context of double signing (though I am not a consensus wizard). If done right, it should AFAICS be highly available and allow for simple and cheap deployment.

A logical validator means a number of gaiad instances signing with the same public key (via KMS or with a locally stored key, it doesn't matter). Let's imagine each such instance runs on its own dedicated hardware and call it a physical validator.

Each physical validator is assigned a priority, which must be unique among all the physical validators that constitute the logical validator.

All physical validators are active at the same time and behave exactly as they do today. The only difference is that they include their unique priority when voting (the priority is signed as well).

Simply put, the idea is that the existing consensus code is modified (slightly) to ensure a logical validator is always participating in consensus as long as at least one of the physical validators that constitute it is alive.

Physical validators might observe transactions in a different order and thus essentially double sign. This is resolved at the consensus layer by only considering the input of the physical validator with the highest priority and ignoring the remaining votes from the same public key (logical validator).

To put this in active/passive terms: the physical validator with the highest priority that manages to vote in a round is considered the "active" part, while the remaining physical validators (sharing the same pubkey) are considered "passive". The existing consensus code thus takes care of failing over, and no new dependencies are introduced.

Double signing (with the same priority and public key) should of course still result in slashing, but it should no longer be a risk when failing over.
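
A minimal Go sketch of the "highest priority wins" rule described above; the Vote type and its Priority field are hypothetical here, nothing like this exists in Tendermint's vote today:

```go
package logicalval

// Vote is a simplified stand-in for a consensus vote; the Priority field is
// part of the proposal above and does not exist in Tendermint today.
type Vote struct {
	PubKey   string // logical validator key, shared by all physical validators
	Priority uint8  // unique per physical validator, covered by the signature
	Height   int64
	Round    int
	BlockID  string
}

// selectVote implements the "highest priority wins" rule: given the vote
// currently accepted for a logical validator at some height/round/step and a
// newly received one from the same pubkey, keep only the higher priority.
// Two conflicting votes with *equal* priority would still be slashable.
func selectVote(current, incoming *Vote) *Vote {
	if current == nil || incoming.Priority > current.Priority {
		return incoming
	}
	return current
}
```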

Depending on the implementation details at the consensus layer, it seems to me that this might provide good availability while introducing little complexity.

It also lowers the barrier to entry for setting up an HA validator, which should translate to more deterministic block times overall.

Also see Virtual Router Redundancy Protocol.

Referencing #870.

@milosevic


Contributor

milosevic commented Jun 16, 2018

I am not sure consensus code is the right place to address validator HA aspects; it seems like mixing concerns. Furthermore, we are trying hard to keep the consensus core code as simple as possible, as it is a critical part of the system, so that it can be battle-tested, formally verified and proven correct. I like the idea of a logical validator, but with a different meaning: having multiple validator nodes and multiple signing nodes that act from the outside as a single logical validator that is highly available and never double signs.

I was thinking about the following architecture (apologies in advance if something is not fully correct; I am not familiar with the KMS details and with the SocketPV part of Tendermint).

Assume a set of Tendermint validator nodes that run normal Tendermint code, with the constraint that they can't sign protocol messages. This part is delegated to a signing process that communicates with Tendermint over the SocketPV network interface. The signing process is a deterministic state machine whose responsibility is signing protocol messages and ensuring that the validator does not double sign (therefore remembering the last height/round/step at which it signed a message, and the messages themselves). This part may already be implemented (or can be implemented) with KMS.
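
For illustration, a minimal sketch of such a double-sign guard, using hypothetical types (this is not the actual KMS or SocketPV API):

```go
package signer

import (
	"bytes"
	"fmt"
)

// hrs is a height/round/step position in consensus.
type hrs struct {
	Height int64
	Round  int
	Step   int8
}

// less reports whether a comes strictly before b.
func (a hrs) less(b hrs) bool {
	if a.Height != b.Height {
		return a.Height < b.Height
	}
	if a.Round != b.Round {
		return a.Round < b.Round
	}
	return a.Step < b.Step
}

// Guard remembers the last signed HRS and the exact bytes that were signed,
// and refuses anything that would be a double sign.
type Guard struct {
	signed    bool
	last      hrs
	lastBytes []byte
}

// Check returns nil if signing signBytes at req is safe, recording it as the
// new high-water mark; it allows re-signing the identical message (e.g. after
// a connection retry) but rejects any other request at or below the last HRS.
func (g *Guard) Check(req hrs, signBytes []byte) error {
	if !g.signed || g.last.less(req) {
		g.signed, g.last, g.lastBytes = true, req, signBytes
		return nil
	}
	if req == g.last && bytes.Equal(signBytes, g.lastBytes) {
		return nil
	}
	return fmt.Errorf("refusing to double sign at height=%d round=%d step=%d",
		req.Height, req.Round, req.Step)
}
```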

Every validator runs the Tendermint consensus protocol, and although it might receive messages out of order, and even a different set of messages, the decision of what block to commit at each height is the same at all validators (by Tendermint consensus guarantees), so the validator replicas are kept in sync by the Tendermint consensus protocol. The temporary local differences at validator nodes are not important, as the signing process will sign only a single message (for all validator replicas) for each height/round/step. So the set of validator replicas plus the replicated signing process looks from the outside like a single deterministic process.

Implementing the replicated signing process requires relying on a state machine replication library for the crash-tolerant model (Raft or Paxos based). The state machine logic shouldn't be very complex, as the signing process responsibility is relatively simple (signing messages and remembering what has been signed). I am not sure whether a good open-source RSM (Raft or Paxos based) library exists out there.
This is not the official view of the Tendermint team, only my current thinking on the subject.
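
For illustration, a library-agnostic sketch of that signing logic expressed as a deterministic state machine that could sit behind any crash-fault-tolerant replicated log (Raft or Paxos based); all names here are hypothetical:

```go
package rsmsigner

import "fmt"

// SignRequest is one entry appended to the replicated log shared by the
// signing replicas.
type SignRequest struct {
	Height    int64
	Round     int
	Step      int8
	SignBytes []byte
}

// SignerFSM is the deterministic state machine applied on every replica.
// Because all replicas apply the same committed log in the same order, they
// agree on exactly one message per height/round/step.
type SignerFSM struct {
	haveSigned bool
	lastH      int64
	lastR      int
	lastS      int8
	sign       func([]byte) ([]byte, error) // nil on replicas without key material
}

// newer reports whether req is strictly past the last signed HRS.
func (f *SignerFSM) newer(req SignRequest) bool {
	if req.Height != f.lastH {
		return req.Height > f.lastH
	}
	if req.Round != f.lastR {
		return req.Round > f.lastR
	}
	return req.Step > f.lastS
}

// Apply is invoked once per committed log entry, in log order, on each replica.
func (f *SignerFSM) Apply(req SignRequest) ([]byte, error) {
	if f.haveSigned && !f.newer(req) {
		return nil, fmt.Errorf("already signed at or past %d/%d/%d",
			req.Height, req.Round, req.Step)
	}
	f.haveSigned, f.lastH, f.lastR, f.lastS = true, req.Height, req.Round, req.Step
	if f.sign == nil {
		return nil, nil // passive replica: record the decision only
	}
	return f.sign(req.SignBytes)
}
```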

@xla


Contributor

xla commented Jun 16, 2018

As Raft has been the basis for a wide variety of infrastructure projects over the last few years, there is a plethora of tested implementations for it. Paxos itself is less popular as far as implementations go, but there is CASPaxos, a very recent attempt to iterate on the shortcomings of its predecessors.

@ebuchman


Contributor

ebuchman commented Jun 19, 2018

having multiple validator nodes and multiple signing nodes that act from the outside as a single logical validator that is highly available and never double signs.

I think this is what we want to go for, and I think we can do it all through the SocketPV (ie. without changes to Tendermint).

It should be noted that whenever you replicate key material across multiple HSMs you change the security model, even if you use a consensus algorithm across them, unless you build the consensus rules into the HSM itself. Otherwise there's no way for an HSM to know for certain "there was no signature produced for height X".

In any case, I don't think this is something that should be built into Tendermint any more than it already is (ie. the SocketPV). Really what we're talking about is a new program between Tendermint and KMS that implements the SocketPV but restricts forwarding messages to the actual KMS based on some consensus protocol.

@ebuchman


Contributor

ebuchman commented Jun 19, 2018

Here's one way to think about how to build that intermediate layer. We could actually do it as an ABCI app running on its own Tendermint cluster - let's call it clusign.

So we have the validator process, say gaiad, and the signer process, clusignd, and each has an ABCI-app and a Tendermint component, call them gaiad-app, gaiad-tm, clusignd-app, clusignd-tm.

Suppose we also give each clusign node an ID so we can distinguish who should sign.

The process could be like this:

  • clusignd-app implements SocketPV and connects to gaiad-tm as the signer
  • gaiad-tm sends a request for a signature to clusignd-app
  • clusignd-app sends a transaction to clusignd-tm with the HRS info and the ID of this clusign node
    • this is a signal to the cluster that clusign node with ID intends to sign for this HRS
  • The first transaction included in a clusignd-tm block with a particular HRS determines which clusign node gets to sign
  • clusignd-app processes txs in the clusignd-tm block, checks if the ID in the first tx for an HRS corresponds to itself:
    • if it does, it proceeds to forward the original signature request to its KMS, and returns the result to gaiad-tm
    • if it doesn't, it responds to gaiad-tm with an empty signature

Note a few things about this solution:

  • Tendermint all the way down, fuck yeh, but it means we need 4 replicas to tolerate one crash. We could hack Tendermint to change the quorum size since we don't need BFT, but meh. At least it's a useful illustration, and it shouldn't be hard to implement on other consensus providers
  • Tendermint will need to be ok with receiving empty signatures from the KMS
  • Validator replicas are not actually synced about which blocks they propose and which votes they send - that would require much deeper plumbing in Tendermint. Here, they will eg propose whatever they want, and only the validator that gets a tx into clusignd-tm first will actually get to sign their proposal. The rest will get ignored
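
A minimal sketch of the clusignd-app selection rule described above ("first transaction committed for an HRS wins"); the types are hypothetical and this is not the real ABCI interface:

```go
package clusign

type hrsKey struct {
	Height int64
	Round  int
	Step   int8
}

// IntentTx is what each clusign node submits to clusignd-tm when its gaiad
// asks it for a signature.
type IntentTx struct {
	HRS    hrsKey
	NodeID string // the clusign node that intends to sign at this HRS
}

// App holds the state derived from the clusignd-tm chain.
type App struct {
	selfID string
	winner map[hrsKey]string // first NodeID committed for each HRS wins
}

func NewApp(selfID string) *App {
	return &App{selfID: selfID, winner: make(map[hrsKey]string)}
}

// deliverTx is called for every transaction in a committed clusignd-tm block,
// in block order, so every replica computes the same winners.
func (a *App) deliverTx(tx IntentTx) {
	if _, taken := a.winner[tx.HRS]; !taken {
		a.winner[tx.HRS] = tx.NodeID
	}
}

// shouldSign tells this node whether to forward the original SocketPV request
// to its KMS (true) or to answer gaiad-tm with an empty signature (false).
func (a *App) shouldSign(hrs hrsKey) bool {
	return a.winner[hrs] == a.selfID
}
```
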
@ebuchman


Contributor

ebuchman commented Jun 19, 2018

Note there is also the possibility of implementing a BLS pubkey type for validator signing. This would allow multiple KMS instances to service a single validator, but it doesn't solve the problem of validator replicas. Also, AFAIKVZ (as far as I know via zaki ...), there are no constant-time impls of BLS yet

@ebuchman


Contributor

ebuchman commented Jun 19, 2018

The above was mostly for illustration purposes around how to do this with a consensus algorithm and Tendermint. We can work towards that. A simpler short term solution would be to build the clusign process to implement something like what's suggested here: #1758 (comment)

Example of storage service and algorithm guaranteeing no double-signing:

Storing H, R, and S as independent 64-bit unsigned integers (or whatever size is deemed sufficient).
Processors A and B receive a message to sign at H=1, R=3, S=2.
Processor A succeeds in updating (conditional upsert) or inserting (primary key uniqueness constraint) into the database, with the constraint that an item with the attributes H=1, R=3, S=2 does not exist.
Processor B necessarily fails to update the database, as the constraint is now unsatisfiable, and waits for the next HRS.
The only database property required in this scenario is an atomic CAS operation, which is trivial and exists in many databases today.
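
A minimal in-memory stand-in for that conditional insert, with hypothetical names (a real deployment would use a database's atomic conditional write instead):

```go
package signlock

import "sync"

type hrs struct {
	H, R, S uint64
}

// Lock records, per height/round/step, which processor won the right to sign.
type Lock struct {
	mu     sync.Mutex
	signed map[hrs]string
}

func NewLock() *Lock {
	return &Lock{signed: make(map[hrs]string)}
}

// TryAcquire atomically records that `processor` will sign at the given HRS.
// It returns true only for the first caller for that HRS; every later caller
// (e.g. processor B in the example above) gets false and must wait for the
// next HRS.
func (l *Lock) TryAcquire(h, r, s uint64, processor string) bool {
	l.mu.Lock()
	defer l.mu.Unlock()
	key := hrs{h, r, s}
	if _, exists := l.signed[key]; exists {
		return false
	}
	l.signed[key] = processor
	return true
}
```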

I would encourage folks to try to build something like this, but to do it as a layer between Tendermint and the KMS.

@mdyring


Contributor

mdyring commented Jun 19, 2018

So we have the validator process, say gaiad, and the signer process, clusignd, and each has an ABCI-app and a Tendermint component, call them gaiad-app, gaiad-tm, clusignd-app, clusignd-tm.

@ebuchman I conceptually like the idea of "Tendermint all the way down". If I understand the model correctly, it should protect against both gaiad and the KMS becoming SPOFs.

For this to work, clusignd-tm should create a block as soon as it sees a tx - but that shouldn't be a problem.

I agree with @milosevic that Tendermint and consensus code should be kept as simple as possible. At the same time, we need to explore how to provide production-level HA at the validator level, in a way that will make it "fairly easy" and cost effective for validators to set up (so they will actually use it).

For the purpose of network liveness and deterministic block times, from my perspective validator HA and consensus are closely related. Conceptually I don't see it as a mixing of concerns.

At the end of the day, any solution that solves the SPOF of both gaiad and KMS with minimal changes gets my 👍 - and it looks like that could be clusign.

@milosevic


Contributor

milosevic commented Jun 19, 2018

I am also all for "Tendermint all the way down". We will probably make it possible in the future to have the fault-tolerance model as a configuration parameter, so we can reduce the cost by one replica.

@ebuchman ebuchman added the validator label Jun 20, 2018

@melekes


Contributor

melekes commented Jun 20, 2018

I like the proposed solution, but if there is a much simpler algorithm out there (without BFT and leader rotation; leader-based; no logs even better), folks may be better off with it (given there is a battle-tested implementation in their language). But again, I'd encourage people to experiment and figure out the best possible solution.

@xla xla added this to the launch milestone Aug 2, 2018

@xla xla added the docs label Aug 2, 2018
