Tendermint has four serialization protocols #608

Closed
aphyr opened this issue Aug 11, 2017 · 0 comments
Labels
C:rpc Component: JSON RPC, gRPC
Milestone

aphyr commented Aug 11, 2017

Client implementers for Tendermint using Merkleeyes may need to understand and implement several distinct serialization formats: Tendermint URL Query Parameters, Tendermint JSON, Protocol Buffers, and Tendermint Wire. This issue exists to document the ways these formats interact, and to provide a locus of discussion on serialization. I suggest this issue be resolved by improving documentation, and possibly by reducing the number of formats.

Query Params

The Tendermint HTTP API generally takes URL-encoded query parameters and emits JSON, but it interprets parameters in a (sensible, but) somewhat unusual way. Numbers are encoded as their base-10 ASCII representations, e.g. ?num=123. Strings are wrapped in double quotes, e.g. ?string="some text". Byte arrays are represented as 0x followed by their hex-encoded bytes, e.g. ?bytes=0x1234abcd.
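
As a rough illustration of the three rules above, a client-side encoder might look like the sketch below. The function names `encode_param` and `build_query` are hypothetical, and the final percent-encoding step is an assumption about what the server accepts; only the number, string, and byte-array rules come from the description above.

```python
from urllib.parse import urlencode


def encode_param(value):
    """Render one value using the query-parameter rules described above."""
    if isinstance(value, bool):
        # Booleans are not covered by the rules above, so refuse rather than guess.
        raise TypeError("boolean encoding is not described")
    if isinstance(value, int):
        return str(value)                 # numbers: base-10 ASCII
    if isinstance(value, (bytes, bytearray)):
        return "0x" + bytes(value).hex()  # byte arrays: 0x followed by hex
    if isinstance(value, str):
        # Strings: wrapped in double quotes. Escaping of embedded quotes
        # is not described above and is not attempted here.
        return '"' + value + '"'
    raise TypeError(f"unsupported parameter type: {type(value).__name__}")


def build_query(params):
    """Encode each value, then percent-encode the result for use in a URL.

    Assumption: the server accepts standard percent-encoded query strings.
    """
    return urlencode({name: encode_param(v) for name, v in params.items()})
```

With this sketch, `encode_param(bytes.fromhex("1234abcd"))` gives `0x1234abcd`, matching the `?bytes=0x1234abcd` example, while a plain Python string gains its surrounding quotes before being placed in the URL.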

There is nothing particularly wrong about this, but users accustomed to APIs where the parse rules are inferred based on the expected type of a query parameter should be made explicitly aware of the serialization rules.

JSON

Tendermint JSON is used by the HTTP API, and also as an on-disk format in Tendermint's write-ahead log. Since it is JSON, parsers already exist, but clients will likely need to wrap their JSON ser/de code in an additional layer to interpret some datatypes: for instance, binary values are serialized as hex-encoded strings, but unlike URL parameters, no 0x prefix is included. Since Tendermint JSON is not self-describing, it is not possible to write a generalized ser/de library. Users must know the type of fields in advance to correctly decode them.
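
To make the "not self-describing" point concrete, here is a minimal sketch of the extra layer a client ends up writing. The field names (`hash`, `height`, `chain_id`) and the schema itself are made up for illustration and are not taken from any real Tendermint response; the only rule borrowed from the text is that binary values are hex strings with no 0x prefix.

```python
import json

# Hypothetical schema: the caller has to say which fields hold binary data,
# because nothing in the JSON itself distinguishes hex bytes from text.
EXAMPLE_SCHEMA = {"hash": bytes, "height": int, "chain_id": str}


def decode_typed(text, schema):
    """Parse JSON, then convert fields to the types the schema names."""
    raw = json.loads(text)
    out = {}
    for name, value in raw.items():
        if schema.get(name) is bytes:
            # Binary values arrive as hex strings with no 0x prefix.
            out[name] = bytes.fromhex(value)
        else:
            out[name] = value
    return out


# Both fields carry the same JSON string, but only the schema can say
# that one is two bytes and the other is four characters of text.
doc = '{"hash": "CAFE", "height": 5, "chain_id": "CAFE"}'
decoded = decode_typed(doc, EXAMPLE_SCHEMA)
```

Here `decoded["hash"]` is the two-byte value `b"\xca\xfe"`, while `decoded["chain_id"]` stays the string `"CAFE"`. A generic decoder with no schema has no way to tell the two apart.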

The structural representation of datatypes in Tendermint JSON is not always uniform. For example, the validator-set-change operation in Merkleeyes generates JSON responses which represent validator keys as a hex string, e.g. "1A5BCCF4...". However, validator.json and genesis.json use an object like {"type": "ed25519", "data": "1A5BCCF4..."}. This makes it difficult for operators to integrate the two APIs. There should be a document which provides a standard ontology of Tendermint concepts, with some sort of (even informal!) type specification of their common representations.

The names of fields in Tendermint JSON also vary. For example, the validator-set-change operation in Merkleeyes produces JSON responses where validator votes are called power, but in genesis.json, votes are named amount. The design document should define canonical names for these ideas, and APIs should use those names consistently.
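
Until the representations converge, operators bridging the two APIs need a shim along these lines. This is only a sketch: the `pub_key` field name and the `normalize_validator` helper are hypothetical, and the short hex value in the usage note is invented. The bare-string versus `{"type", "data"}` shapes and the `power` versus `amount` names are the ones described above.

```python
def normalize_validator(entry, key_field="pub_key"):
    """Map either validator shape onto one internal representation.

    `key_field` is an assumed name for wherever the key lives in `entry`.
    """
    key = entry[key_field]
    if isinstance(key, str):
        # Bare hex string form: the key type is not stated, so leave it unknown.
        key_type, key_hex = None, key
    else:
        # Object form: {"type": ..., "data": ...}
        key_type, key_hex = key["type"], key["data"]

    # The same quantity appears under two names depending on the source.
    power = entry["power"] if "power" in entry else entry["amount"]

    return {
        "key_type": key_type,
        "key": bytes.fromhex(key_hex),
        "power": int(power),
    }
```

For example, `normalize_validator({"pub_key": "ABCD1234", "power": 10})` and `normalize_validator({"pub_key": {"type": "ed25519", "data": "ABCD1234"}, "amount": 10})` yield the same key bytes and power. Only the first loses the key type, which is exactly the information the bare-string form omits.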

Protocol Buffers

Tendermint and blockchain applications communicate via ABCI, which is a Protocol Buffers spec. That means ABCI implementers need to speak Protocol Buffers as well. As far as ABCI is concerned, the application's messages are opaque bytes, which is a wholly sensible choice, because implementers may not want to be locked into a particular serialization format. However...

Wire

... There is a fourth format, called Wire, which is Tendermint's homegrown serialization format for Merkleeyes. Wire is actually two formats: a binary format and a JSON codec. I'm not sure whether Wire JSON is the same as Tendermint JSON. Wire appears relatively similar to Protocol Buffers, with a slightly different set of concrete types (e.g. time). The two appear roughly similar in compactness: Protocol Buffers spends an extra 1 or 2 tag bytes on each field, but by convention every field is optional, which allows for more compact representation in evolving schemas. Wire's varint encoding is usually one to two bytes longer than that of Protocol Buffers.
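
For reference, the "1 or 2 tag bytes" on the Protocol Buffers side can be reproduced with a few lines. The sketch below implements only the standard Protocol Buffers base-128 varint and field key; it does not attempt Wire's encoding, which has no specification to check against. The helper names are hypothetical.

```python
def encode_varint(n):
    """Protocol Buffers base-128 varint for a non-negative integer."""
    if n < 0:
        raise ValueError("only non-negative integers are handled here")
    out = bytearray()
    while True:
        low7 = n & 0x7F
        n >>= 7
        if n:
            out.append(low7 | 0x80)  # high bit set: more bytes follow
        else:
            out.append(low7)         # high bit clear: last byte
            return bytes(out)


def field_tag(field_number, wire_type):
    """Field key: (field_number << 3) | wire_type, itself varint-encoded."""
    return encode_varint((field_number << 3) | wire_type)


# Field numbers 1-15 fit in a one-byte tag; 16-2047 need two bytes.
one_byte = len(field_tag(15, 0))
two_bytes = len(field_tag(16, 0))
```

Small integers such as 1 encode to a single byte, and 300 encodes to the two bytes `ac 02`, so a small field typically costs one tag byte plus one or two value bytes.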

ABCI and Tendermint's network interfaces are intended to be language agnostic, but there are only Wire implementations for Go and (sort of) JavaScript. Client implementers must write at least a partial Wire serializer in order to talk to Merkleeyes, and there is no formal specification or test suite for verifying that implementation. It might be worth dropping Wire in favor of Protocol Buffers, since PB is already a part of Tendermint, and implementations of PB are available in most languages already. If Wire does continue, it would be nice to have a design document, some sort of grammar (EBNF, flowchart, etc.), and a directory full of example messages in some well-known format (e.g. Tendermint JSON, Protocol Buffers) and their Wire-serialized equivalents, so implementers can test their Wire implementations.

@aphyr aphyr changed the title Tendermint has four serialization protocols Tendermint has three serialization protocols Aug 11, 2017
@aphyr aphyr changed the title Tendermint has three serialization protocols Tendermint has four serialization protocols Aug 11, 2017
@ebuchman ebuchman added C:rpc Component: JSON RPC, gRPC user labels Mar 20, 2018
@xla xla added this to the post-launch milestone Aug 2, 2018
@ebuchman ebuchman mentioned this issue Feb 28, 2019
cmwaters pushed a commit to cmwaters/tendermint that referenced this issue Dec 12, 2022
* Fix super linter CI

* Update linter.yml

* Port linter fix from tendermint#608

* Add missed ADRs

* fix typo

4 participants