ADR 082: The Future of the Socket Protocol

Based on our discussion at the dev call of 19-May-2022, this is a draft ADR
that proposes some design decisions about the API surface for out-of-process
ABCI applications in future versions of Tendermint.

M. J. Fromberger committed May 23, 2022 (commit 3777302, parent 8e0d004).

# ADR 082: The Future of the Socket Protocol

## Changelog

- 19-May-2022: Initial draft (@creachadair)

## Status

(in review)

## Context

The [Application Blockchain Interface (ABCI)][abci] is a client-server protocol
used by the Tendermint consensus engine to communicate with the application on
whose behalf it performs state replication. There are currently three transport
options available for ABCI applications:

1. **In-process**: Applications written in Go can be linked directly into the
same binary as the consensus node. Such applications use a "local" ABCI
client, which exposes application methods to the node as direct function
calls.

2. **Socket protocol**: Out-of-process applications may export the ABCI service
via a custom socket protocol that sends requests and responses over a
Unix-domain or TCP socket connection as length-prefixed protocol buffers.
In Tendermint, this is handled by the [socket client][socket-client].

3. **gRPC**: Out-of-process applications may export the ABCI service via gRPC.
In Tendermint, this is handled by the [gRPC client][grpc-client].

Both the out-of-process options (2) and (3) have a long history in Tendermint.
The beginnings of the gRPC client were added in [May 2016][abci-start] when
ABCI was still hosted in a separate repository, and the socket client (formerly
called the "remote client") was part of ABCI from its inception in November
2015.

At the time ABCI was first being developed, the gRPC project was very new
(it launched Q4 2015), and it was not an obvious choice for use in Tendermint.
It took a while before the language coverage and quality of gRPC reached a
point where it could be a viable solution for out-of-process applications. For
that reason, it made sense for the initial design of ABCI to focus on a custom
protocol for out-of-process applications.

## Problem Statement

For practical reasons, ABCI needs an interprocess communication option to
support applications not written in Go. The two main options are RPC and a
foreign function interface (FFI), and for operational reasons an RPC mechanism
makes more sense.

The socket protocol has not changed substantially since its original design,
and it has the advantage of being simple to implement in almost any reasonable
language. However, its simplicity comes with limitations that have had a
negative impact on the stability and performance of out-of-process
applications that use it. In particular:

- The protocol lacks request identifiers, so the client and server must
  complete requests in strict FIFO order. Even if the client issues requests
  that have no dependency on each other, the protocol has no way to match
  responses to requests other than the order in which they were issued. This
  strictly limits the application's ability to process requests concurrently,
  and has been a source of complaints from some network operators on that
  basis.

- The protocol lacks method identifiers, so the only way for the client and
  server to understand which operation is requested is to dispatch on the type
  of the request and response payloads. For responses, this means that [any
  error condition is terminal not only to the request, but to the entire ABCI
  client](https://github.com/tendermint/tendermint/blob/master/abci/client/socket_client.go#L149).

  The historical intent of terminating for any error seems to have been that
  all ABCI errors are unrecoverable and hence protocol fatal. In practice,
  however, this greatly complicates debugging a faulty node, since the only way
  to respond to errors is to panic the node, which loses valuable context that
  could have been logged.

- There are subtle concurrency management dependencies between the client and
  the server that are not clearly documented anywhere, and it is very easy for
  small changes in either the client or the server to lead to tricky deadlocks,
  panics, race conditions, and slowdowns. For a recent example, see
  https://github.com/tendermint/tendermint/pull/8581.
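The first two limitations can be illustrated with a simplified, hypothetical model of a client that has neither request nor method identifiers. The message types below are stand-ins for the generated ABCI protobuf types, and `fifoClient` is not Tendermint's socket client; the sketch only shows that each response can be paired solely with the oldest outstanding request, checked by payload type, and that anything unexpected has nowhere to go except a fatal error.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical stand-ins for ABCI messages; the real types are generated
// from protobuf and carry their payload in a oneof field.
type Request interface{ isRequest() }
type Response interface{ isResponse() }

type RequestEcho struct{ Message string }
type RequestFlush struct{}

func (RequestEcho) isRequest()  {}
func (RequestFlush) isRequest() {}

type ResponseEcho struct{ Message string }
type ResponseFlush struct{}
type ResponseException struct{ Error string }

func (ResponseEcho) isResponse()      {}
func (ResponseFlush) isResponse()     {}
func (ResponseException) isResponse() {}

// fifoClient pairs each response with the oldest outstanding request,
// because the wire format carries no request ID.
type fifoClient struct {
	pending []Request
}

func (c *fifoClient) send(req Request) { c.pending = append(c.pending, req) }

// receive matches res against the head of the queue by payload type.
// Every error returned here is meant to be treated as connection-fatal:
// without an ID or method tag there is no other request to attribute it to.
func (c *fifoClient) receive(res Response) (Request, error) {
	if len(c.pending) == 0 {
		return nil, errors.New("unexpected response: nothing pending")
	}
	req := c.pending[0]
	c.pending = c.pending[1:]
	ok := false
	switch r := res.(type) {
	case ResponseEcho:
		_, ok = req.(RequestEcho)
	case ResponseFlush:
		_, ok = req.(RequestFlush)
	case ResponseException:
		return req, fmt.Errorf("exception: %s", r.Error)
	}
	if !ok {
		return req, fmt.Errorf("response %T does not match request %T", res, req)
	}
	return req, nil
}

func main() {
	c := &fifoClient{}
	c.send(RequestEcho{Message: "hi"})
	c.send(RequestFlush{})
	// The server answers the Flush first; the client can only see a type mismatch.
	_, err := c.receive(ResponseFlush{})
	fmt.Println(err)
}
```

In this model, two outstanding `Echo` requests whose responses arrived swapped would be paired incorrectly without any error at all, since the payload types match.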

These limitations are fixable, but an important question is whether it is
worthwhile to fix them. We can add request and method identifiers, for
example, but doing so would be a breaking change to the protocol, requiring
every application that uses it to update. If applications have to migrate
anyway, it is worth noting that the stability and language coverage of gRPC
have improved a lot, and today it is probably simpler to set up and maintain
an application using gRPC transport than to reimplement the Tendermint socket
protocol.
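By contrast, a protocol that carried a request identifier would let responses complete out of order. The following Go sketch is purely hypothetical: `taggedResponse` and `idClient` are illustrative names, not a proposed wire format, and they model only the client-side bookkeeping of routing responses to callers by ID.

```go
package main

import (
	"fmt"
	"sync"
)

// taggedResponse is a hypothetical envelope carrying the identifiers the
// current protocol lacks. It is not part of any ABCI specification.
type taggedResponse struct {
	ID     uint64
	Method string
	Err    string // a per-request error, not a connection-fatal one
}

// idClient tracks outstanding calls by ID so responses may arrive in any order.
type idClient struct {
	mu      sync.Mutex
	nextID  uint64
	pending map[uint64]chan taggedResponse
}

func newIDClient() *idClient {
	return &idClient{pending: make(map[uint64]chan taggedResponse)}
}

// start registers a new call and returns its ID and a channel that will
// receive the matching response.
func (c *idClient) start() (uint64, <-chan taggedResponse) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.nextID++
	ch := make(chan taggedResponse, 1)
	c.pending[c.nextID] = ch
	return c.nextID, ch
}

// deliver routes a response to its caller by ID.
func (c *idClient) deliver(res taggedResponse) error {
	c.mu.Lock()
	ch, ok := c.pending[res.ID]
	delete(c.pending, res.ID)
	c.mu.Unlock()
	if !ok {
		return fmt.Errorf("no pending request with id %d", res.ID)
	}
	ch <- res // buffered with capacity 1, so this never blocks
	return nil
}

func main() {
	c := newIDClient()
	id1, ch1 := c.start()
	id2, ch2 := c.start()
	// The server completes the second request first.
	_ = c.deliver(taggedResponse{ID: id2, Method: "CheckTx"})
	_ = c.deliver(taggedResponse{ID: id1, Method: "Info", Err: "app not ready"})
	fmt.Println((<-ch1).Err, (<-ch2).Method)
}
```

Because the error in this sketch is attached to a single response, a failed call can be reported to its own caller and logged without tearing down the connection; a response with an unknown ID likewise comes back as an ordinary error that the caller may choose to log.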

Moreover, gRPC addresses all the above issues out-of-the-box, and requires
(much) less custom code for both the server (i.e., the application) and the
client. The project is well-funded and widely used, which makes it a safe bet
for a dependency.

## Decision

There is a set of related alternatives to consider:

- Question 1: Designate a single IPC standard for out-of-process applications?

  Claim: We should converge on one (and only one) IPC option for out-of-process
  applications. We should choose an option that, after a suitable period of
  deprecation for alternatives, will address most or all of the highest-impact
  uses of Tendermint. Maintaining multiple options increases the surface area
  for bugs and vulnerabilities, and we should not have multiple options for
  basic interfaces without a clear and well-documented reason.

- Question 2a: Standardize on gRPC and deprecate/remove the socket protocol?

  Claim: Maintaining and improving a custom RPC protocol is a substantial
  project and not directly relevant to the requirements of consensus. We would
  be better served by depending on a well-maintained open-source library like
  gRPC.

- Question 2b: Improve the socket protocol and deprecate/remove gRPC?

  Claim: If we find meaningful advantages to maintaining our own custom RPC
  protocol in Tendermint, we should treat it as a first-class project within
  the core and invest in making it good enough that we do not require other
  options.

**One important consideration** when discussing these questions is that _any
outcome that includes keeping the socket protocol will eventually have
migration impacts for clients_ regardless. Fixing the limitations of the socket
protocol as it is currently designed will require making _breaking changes_ to
the protocol. So, while we may put off a migration cost for clients by
retaining the socket protocol in the short term, we will eventually have to pay
those costs to fix the problems in its current design.

## Detailed Design

1. If we choose to standardize on gRPC, the main work in Tendermint core
   will be removing and cleaning up the code for the socket client and server.

   Besides the code cleanup, we will also need to clearly document a
   deprecation schedule, and invest time in making the migration easier for
   applications currently using the socket protocol.

   > **Point for discussion:** Migrating from the socket protocol to gRPC
   > should mostly be a plumbing change, as long as we do it during a release
   > in which we are not making other breaking changes to ABCI. However, the
   > effort may vary depending on how gRPC integration works in the
   > application's implementation language, and we would have to be sure that
   > networks have plenty of time not only to make the change but also to
   > verify that it preserves the function of the network.
   >
   > What questions should we be asking node operators and application
   > developers to understand the migration costs better?

2. If we choose to keep only the socket protocol, we will need to follow up
   with a more detailed design for extending and upgrading the protocol to fix
   its existing performance and operational issues.

3. If we choose to keep both options, we will still need to do all the work of
   (2), but the gRPC implementation should not require any immediate changes.


## Consequences

- **Standardize on gRPC**

  - ✅ Addresses existing performance and operational issues.
  - ✅ Replaces custom code with a well-maintained, widely used library.
  - ✅ Aligns with Cosmos SDK, which already uses gRPC extensively.
  - ✅ Aligns with the priv validator interface, for which the socket protocol is already deprecated in favor of gRPC.
  - ❓ Applications will be hard to implement in a language without gRPC support.
  - ⛔ All users of the socket protocol have to migrate to gRPC, and we believe most current out-of-process applications use the socket protocol.

- **Standardize on socket protocol**

  - ✅ Less immediate impact for existing users (but see below).
  - ✅ Simplifies ABCI API surface by removing gRPC.
  - ❓ Users of the socket protocol will have a (smaller) migration.
  - ❓ Potentially easier to implement for languages that do not have gRPC support.
  - ⛔ Need to do all the work to fix the socket protocol (which will require existing users to update anyway later).
  - ⛔ Ongoing maintenance burden for per-language server implementations.

- **Keep both options**

  - ✅ Less immediate impact for existing users (but see below).
  - ❓ Users of the socket protocol will have a (smaller) migration.
  - ⛔ Still need to do all the work to fix the socket protocol (which will require existing users to update anyway later).
  - ⛔ Requires ongoing maintenance and support of both gRPC and socket protocol integrations.


## References

- [Application Blockchain Interface (ABCI)][abci]
- [Tendermint ABCI socket client][socket-client]
- [Tendermint ABCI gRPC client][grpc-client]
- [Initial commit of gRPC client][abci-start]

[abci]: https://github.com/tendermint/spec/tree/master/spec/abci
[socket-client]: https://github.com/tendermint/tendermint/blob/master/abci/client/socket_client.go
[socket-server]: https://github.com/tendermint/tendermint/blob/master/abci/server/socket_server.go
[grpc-client]: https://github.com/tendermint/tendermint/blob/master/abci/client/grpc_client.go
[abci-start]: https://github.com/tendermint/abci/commit/1ab3c747182aaa38418258679c667090c2bb1e0d

## Appendix: Known Implementations of ABCI Socket Protocol

This is a list of known implementations of the Tendermint custom socket
protocol. Note that in most cases I have not checked how complete or correct
these implementations are; these are based on search results and a cursory
visual inspection.

- Tendermint Core (Go): [client][socket-client], [server][socket-server]
- Informal Systems [tendermint-rs](https://github.com/informalsystems/tendermint-rs) (Rust): [client](https://github.com/informalsystems/tendermint-rs/blob/master/abci/src/client.rs), [server](https://github.com/informalsystems/tendermint-rs/blob/master/abci/src/server.rs)
- Tendermint [js-abci](https://github.com/tendermint/js-abci) (JS): [server](https://github.com/tendermint/js-abci/blob/master/src/server.js)
- [Hotmoka](https://github.com/Hotmoka/hotmoka) ABCI (Java): [server](https://github.com/Hotmoka/hotmoka/blob/master/io-hotmoka-tendermint-abci/src/main/java/io/hotmoka/tendermint_abci/Server.java)
- [Tower ABCI](https://github.com/penumbra-zone/tower-abci) (Rust): [server](https://github.com/penumbra-zone/tower-abci/blob/main/src/server.rs)
- [abci-host](https://github.com/datopia/abci-host) (Clojure): [server](https://github.com/datopia/abci-host/blob/master/src/abci/host.clj)
- [abci_server](https://github.com/KrzysiekJ/abci_server) (Erlang): [server](https://github.com/KrzysiekJ/abci_server/blob/master/src/abci_server.erl)
- [py-abci](https://github.com/davebryson/py-abci) (Python): [server](https://github.com/davebryson/py-abci/blob/master/src/abci/server.py)
- [scala-tendermint-server](https://github.com/intechsa/scala-tendermint-server) (Scala): [server](https://github.com/InTechSA/scala-tendermint-server/blob/master/src/main/scala/lu/intech/tendermint/Server.scala)
