-
Notifications
You must be signed in to change notification settings - Fork 2.1k
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
ADR 082: The Future of the Socket Protocol
Based on our discussion at the dev call of 19-May-2022, this is a draft ADR that proposes some design decisions about the API surface for out-of-process ABCI applications in future versions of Tendermint.
- Loading branch information
M. J. Fromberger
committed
May 23, 2022
1 parent
6ff77ee
commit 98581b4
Showing
2 changed files
with
214 additions
and
0 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,213 @@ | ||
# ADR 082: The Future of the Socket Protocol | ||
|
||
## Changelog | ||
|
||
- 19-May-2022: Initial draft (@creachadair) | ||
|
||
## Status | ||
|
||
(in review) | ||
|
||
## Context | ||
|
||
The [Application Blockchain Interface (ABCI)][abci] is a client-server protocol | ||
used by the Tendermint consensus engine to communicate with the application on | ||
whose behalf it performs state replication. There are currently three transport | ||
options available for ABCI applications: | ||
|
||
1. **In-process**: Applications written in Go can be linked directly into the | ||
same binary as the consensus node. Such applications use a "local" ABCI | ||
client, which exposes application methods to the node as direct function | ||
calls. | ||
|
||
2. **Socket protocol**: Out-of-process applications may export the ABCI service | ||
via a custom socket protocol that sends requests and responses over a | ||
Unix-domain or TCP socket connection as length-prefixed protocol buffers. | ||
In Tendermint, this is handled by the [socket client][socket-client]. | ||
|
||
3. **gRPC**: Out-of-process applications may export the ABCI service via gRPC. | ||
In Tendermint, this is handled by the [gRPC client][grpc-client]. | ||
|
||
Both the out-of-process options (2) and (3) have a long history in Tendermint. | ||
The beginnings of the gRPC client were added in [May 2016][abci-start] when | ||
ABCI was still hosted in a separate repository, and the socket client (formerly | ||
called the "remote client") was part of ABCI from its inception in November | ||
2015. | ||
|
||
At that time when ABCI was first being developed, the gRPC project was very new | ||
(it launched Q4 2015) and it was not an obvious choice for use in Tendermint. | ||
It took a while before the language coverage and quality of gRPC reached a | ||
point where it could be a viable solution for out-of-process applications. For | ||
that reason, it made sense for the initial design of ABCI to focus on a custom | ||
protocol for out-of-process applications. | ||
|
||
## Problem Statement | ||
|
||
For practical reasons, ABCI needs an interprocess communication option to | ||
support applications not written in Go. The two practical options are RPC and | ||
FFI, and for operational reasons an RPC mechanism makes more sense. | ||
|
||
The socket protocol has not changed all that substantially since its original | ||
design, and has the advantage of being simple to implement in almost any | ||
reasonable language. However, its simplicity includes some limitations that | ||
have had a negative impact on the stability and performance of out-of-process | ||
applications using it. In particular: | ||
|
||
- The protocol lacks request identifiers, so the client and server must | ||
complete requests in strict FIFO order. Even if the client issues requests | ||
that have no dependency on each other, the protocol has no way except order | ||
of issue to map responses to requests. This strictly limits the application's | ||
ability to process requests concurrently, and has been a source of complaints | ||
from some network operators on that basis. | ||
|
||
- The protocol lacks method identifiers, so the only way for the client and | ||
server to understand which operation is requested is to dispatch on the type | ||
of the request and response payloads. For responses, this means that [any | ||
error condition is terminal not only to the request, but to the entire ABCI | ||
client](https://github.com/tendermint/tendermint/blob/master/abci/client/socket_client.go#L149). | ||
|
||
The historical intent of terminating for any error seems to have been that | ||
all ABCI errors are unrecoverable and hence protocol fatal. In practice, | ||
however, this greatly complicates debugging a faulty node, since the only way | ||
to respond to errors is to panic the node which loses valuable context that | ||
could have been logged. | ||
|
||
- There are subtle concurrency management dependencies between the client and | ||
the server that are not clearly documented anywhere, and it is very easy for | ||
small changes in both the client and the server to lead to tricky deadlocks, | ||
panics, race conditions, and slowdowns. As a recent example of this, see | ||
https://github.com/tendermint/tendermint/pull/8581. | ||
|
||
These limitations are fixable, but one important question is whether it is | ||
worthwhile to fix them. We can add request and method identifiers, for | ||
example, but doing so would be a breaking change to the protocol requiring | ||
every application using it to update. If applications have to migrate anyway, | ||
the stability and language coverage of gRPC have improved a lot, and today it | ||
is probably simpler to set up and maintain an application using gRPC transport | ||
than to reimplement the Tendermint socket protocol. | ||
|
||
Moreover, gRPC addresses all the above issues out-of-the-box, and requires | ||
(much) less custom code for both the server (i.e., the application) and the | ||
client. The project is well-funded and widely-used, which makes it a safe bet | ||
for a dependency. | ||
|
||
## Decision | ||
|
||
There is a set of related alternatives to consider: | ||
|
||
- Question 1: Designate a single IPC standard for out-of-process applications? | ||
|
||
Claim: We should converge on one (and only one) IPC option for out-of-process | ||
applications. We should choose an option that, after a suitable period of | ||
deprecation for alternatives, will address most or all the highest-impact | ||
uses of Tendermint. Maintaining multiple options increases the surface area | ||
for bugs and vulnerabilities, and we should not have multiple options for | ||
basic interfaces without a clear and well-documented reason. | ||
|
||
- Question 2a: Standardize on gRPC and deprecate/remove the socket protocol? | ||
|
||
Claim: Maintaining and improving a custom RPC protocol is a substantial | ||
project and not directly relevant to the requirements of consensus. We would | ||
be better served by depending on a well-maintained open-source library like | ||
gRPC. | ||
|
||
- Question 2b: Improve the socket protocol and deprecate/remove gRPC? | ||
|
||
Claim: If we find meaningful advantages to maintaining our own custom RPC | ||
protocol in Tendermint, we should treat it as a first-class project within | ||
the core and invest in making it good enough that we do not require other | ||
options. | ||
|
||
**One important consideration** when discussing these questions is that _any | ||
outcome which includes keeping the socket protocol will have eventual migration | ||
impacts for clients_ regardless. To fix the limitations of the socket protocol | ||
as it is currently designed will require making _breaking changes_ to the | ||
protocol. So, while we may put off a migration cost for clients by retaining | ||
the socket protocol in the short term, we will eventually have to pay those | ||
costs to fix the problems in its current design. | ||
|
||
## Detailed Design | ||
|
||
1. If we choose to standardize on gRPC, the main work in in Tendermint core | ||
will be removing and cleaning up the code for the socket client and server. | ||
|
||
Besides the code cleanup, we will also need to clearly document a | ||
deprecation schedule, and invest time in making the migration easier for | ||
applications currently using the socket protocol. | ||
|
||
> **Point for discussion:** Migrating from the socket protocol to gRPC | ||
> should mostly be a plumbing change, as long as we do it during a release | ||
> in which we are not making other breaking changes to ABCI. However, the | ||
> effort may be more or less depending on how gRPC integration works in the | ||
> application's implementation language, and would have to be sure networks | ||
> have plenty of time not only to make the change but to verify that it | ||
> preserves the function of the network. | ||
> | ||
> What questions should we be asking node operators and application | ||
> developers to understand the migration costs better? | ||
2. If we choose to keep only the socket protocol, we will need to follow up | ||
with a more detailed design for extending and upgrading the protocol to fix | ||
the existing performance and operational issues with the protocol. | ||
|
||
3. If we choose to keep both options, we will still need to do all the work of | ||
(2), but the gRPC implementation should not require any immediate changes. | ||
|
||
|
||
## Consequences | ||
|
||
- **Standardize on gRPC** | ||
|
||
- ✅ Addresses existing performance and operational issues. | ||
- ✅ Replaces custom code with a well-maintained widely-used library. | ||
- ✅ Aligns with Cosmos SDK, which already uses gRPC extensively. | ||
- ✅ Aligns with priv validator interface, for which the socket protocol is already deprecated for gRPC. | ||
- ❓ Applications will be hard to implement in a language without gRPC support. | ||
- ⛔ All users of the socket protocol have to migrate to gRPC, and we believe most current out-of-process applications use the socket protocol. | ||
|
||
- **Standardize on socket protocol** | ||
|
||
- ✅ Less immediate impact for existing users (but see below). | ||
- ✅ Simplifies ABCI API surface by removing gRPC. | ||
- ❓ Users of the socket protocol will have a (smaller) migration. | ||
- ❓ Potentially easier to implement for languages that do not have support. | ||
- ⛔ Need to do all the work to fix the socket protocol (which will require existing users to update anyway later). | ||
- ⛔ Ongoing maintenance burden for per-language server implementations. | ||
|
||
- **Keep both options** | ||
|
||
- ✅ Less immediate impact for existing users (but see below). | ||
- ❓ Users of the socket protocol will have a (smaller) migration. | ||
- ⛔ Still need to do all the work to fix the socket protocol (which will require existing users to update anyway later). | ||
- ⛔ Requires ongoing maintenance and support of both gRPC and socket protocol integrations. | ||
|
||
|
||
## References | ||
|
||
- [Application Blockchain Interface (ABCI)][abci] | ||
- [Tendermint ABCI socket client][socket-client] | ||
- [Tendermint ABCI gRPC client][grpc-client] | ||
- [Initial commit of gRPC client][abci-start] | ||
|
||
[abci]: https://github.com/tendermint/spec/tree/master/spec/abci | ||
[socket-client]: https://github.com/tendermint/tendermint/blob/master/abci/client/socket_client.go | ||
[socket-server]: https://github.com/tendermint/tendermint/blob/master/abci/server/socket_server.go | ||
[grpc-client]: https://github.com/tendermint/tendermint/blob/master/abci/client/grpc_client.go | ||
[abci-start]: https://github.com/tendermint/abci/commit/1ab3c747182aaa38418258679c667090c2bb1e0d | ||
|
||
## Appendix: Known Implementations of ABCI Socket Protocol | ||
|
||
This is a list of known implementations of the Tendermint custom socket | ||
protocol. Note that in most cases I have not checked how complete or correct | ||
these implementations are; these are based on search results and a cursory | ||
visual inspection. | ||
|
||
- Tendermint Core (Go): [client][socket-client], [server][socket-server] | ||
- Informal Systems [tendermint-rs](https://github.com/informalsystems/tendermint-rs) (Rust): [client](https://github.com/informalsystems/tendermint-rs/blob/master/abci/src/client.rs), [server](https://github.com/informalsystems/tendermint-rs/blob/master/abci/src/server.rs) | ||
- Tendermint [js-abci](https://github.com/tendermint/js-abci) (JS): [server](https://github.com/tendermint/js-abci/blob/master/src/server.js) | ||
- [Hotmoka](https://github.com/Hotmoka/hotmoka) ABCI (Java): [server](https://github.com/Hotmoka/hotmoka/blob/master/io-hotmoka-tendermint-abci/src/main/java/io/hotmoka/tendermint_abci/Server.java) | ||
- [Tower ABCI](https://github.com/penumbra-zone/tower-abci) (Rust): [server](https://github.com/penumbra-zone/tower-abci/blob/main/src/server.rs) | ||
- [abci-host](https://github.com/datopia/abci-host) (Clojure): [server](https://github.com/datopia/abci-host/blob/master/src/abci/host.clj) | ||
- [abci_server](https://github.com/KrzysiekJ/abci_server) (Erlang): [server](https://github.com/KrzysiekJ/abci_server/blob/master/src/abci_server.erl) | ||
- [py-abci](https://github.com/davebryson/py-abci) (Python): [server](https://github.com/davebryson/py-abci/blob/master/src/abci/server.py) | ||
- [scala-tendermint-server](https://github.com/intechsa/scala-tendermint-server) (Scala): [server](https://github.com/InTechSA/scala-tendermint-server/blob/master/src/main/scala/lu/intech/tendermint/Server.scala) |