# RFC 004: Priority-based Mempool

## Changelog

- 2020-09-02: Initial draft

## Author(s)

- Junjia He (@ninjaahhh)

## Context

User-defined priorities for blockchain transactions can be useful, e.g. `gasPrice` in the Ethereum blockchain works as a first-price auction mechanism to cope with limited capacity. In Tendermint, however, the current approach to packing transactions into a block follows a first-in-first-out (FIFO) style, which is baked into the Tendermint engine and can't be affected by ABCI applications.

To make Tendermint support priority-based transaction processing, two main aspects of Tendermint's current architecture need to be updated: the ABCI interface for accepting a transaction (`CheckTx`), and Tendermint's mempool design and implementation. This RFC proposes corresponding changes to address these two aspects.

## Proposal

Add a new field `priority` to the ABCI `ResponseCheckTx` message:

```proto
message ResponseCheckTx {
  uint32 code = 1;
  bytes data = 2;
  string log = 3;  // nondeterministic
  string info = 4;  // nondeterministic
  int64 gas_wanted = 5 [json_name = "gas_wanted"];
  int64 gas_used = 6 [json_name = "gas_used"];
  repeated Event events = 7
      [(gogoproto.nullable) = false, (gogoproto.jsontag) = "events,omitempty"];
  string codespace = 8;
  uint64 priority = 9;
}
```

ABCI developers can then assign application-specific, user-defined priorities to transactions. This addition is naturally backward-compatible: by default all responses carry a priority of 0, in which case the mempool falls back to FIFO ordering.
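
For illustration, here is a minimal sketch of an ABCI application assigning priorities in `CheckTx`, assuming the proposed field lands in the generated Go types (`MyApp`, `decodedTx`, `decodeTx`, and the gas-price-as-priority rule are hypothetical):

```golang
package main

import (
	abci "github.com/tendermint/tendermint/abci/types"
)

// MyApp is a hypothetical ABCI application.
type MyApp struct {
	abci.BaseApplication
}

// decodedTx stands in for the application's own transaction format.
type decodedTx struct {
	GasPrice uint64
}

// decodeTx stands in for app-specific transaction decoding.
func decodeTx(raw []byte) (*decodedTx, error) {
	return &decodedTx{}, nil // app-specific logic elided
}

func (app *MyApp) CheckTx(req abci.RequestCheckTx) abci.ResponseCheckTx {
	tx, err := decodeTx(req.Tx)
	if err != nil {
		return abci.ResponseCheckTx{Code: 1, Log: err.Error()}
	}
	// Ethereum-style first-price auction: the gas price doubles as
	// the priority; Priority assumes the proposed field above.
	return abci.ResponseCheckTx{Code: 0, Priority: tx.GasPrice}
}
```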

**Contributor:**

Does this imply that all txs with the same priority will be processed in FIFO order? If so, that places an additional constraint on any sorting or heap algorithm, i.e. that it is stable.

**@ninjaahhh (author), Sep 11, 2020:**

Agree, but easy tweaks can achieve that even when the sort itself is not stable: e.g. use "priority + timestamp" as the comparison key, so that if two items have the same priority, the one with the lower (earlier) timestamp ranks higher.
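
For illustration, a minimal sketch of that comparison key in Go (the `mempoolTx` fields here are assumptions for the sketch, not the real mempool entry):

```golang
package mempool

import "time"

// mempoolTx is a pared-down stand-in for the real mempool entry;
// the priority and timestamp fields are assumptions for this sketch.
type mempoolTx struct {
	priority  uint64
	timestamp time.Time // when the tx was first admitted to the mempool
}

// less orders transactions for reaping: higher priority first, with ties
// broken by arrival time so that equal-priority txs keep FIFO order.
func less(a, b *mempoolTx) bool {
	if a.priority != b.priority {
		return a.priority > b.priority // higher priority ranks first
	}
	return a.timestamp.Before(b.timestamp) // earlier arrival ranks first
}
```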


The next step is to make the current mempool implementation priority-aware. We can implement this feature gradually in three phases, to keep breaking changes to a minimum.

### Phase 0: Keep existing CList-based mempool

Goal:

1. Zero cost on processing "legacy" transactions (i.e. those with 0 priority)
2. When creating a block, transactions with higher priorities should be included earlier

The first phase will be experimental and only achieves the desired functionality *with potential performance degradation*. The main purpose is to design the code flow and underlying APIs accordingly, without dramatic changes to existing data structures / algorithms.

A simple approach we can take is the following:

1. When reading `CheckTx` responses, the mempool keeps a single counter called `maxPriority`. If all transactions have 0 priority, the counter simply stays at 0
2. When `Mempool.ReapMaxBytesMaxGas` is called by the consensus engine, the mempool checks its `maxPriority` value (see the sketch below):
   - if it's zero, it follows the existing logic and returns eligible transactions in FIFO order
   - if not, it returns eligible transactions ranked by their priorities (either by `O(n^2)` brute force or an `O(n log n)` sort). During commit, the mempool's `maxPriority` is then updated accordingly, since transactions will be removed from the mempool
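
A rough sketch of that reap decision, under assumed names (`maxPriority` is the new counter, `priority` an assumed field on `mempoolTx`; the CList iteration follows the existing mempool, and byte/gas limit checks are elided for brevity):

```golang
// reapByPriority returns pending txs in priority order when any tx has a
// non-zero priority, and falls back to the existing FIFO reap otherwise.
func (mem *CListMempool) reapByPriority(maxBytes, maxGas int64) types.Txs {
	if mem.maxPriority == 0 {
		// All pending txs are "legacy" (priority 0): reuse the existing
		// FIFO reap, so the zero-priority case costs nothing extra.
		return mem.ReapMaxBytesMaxGas(maxBytes, maxGas)
	}
	// Snapshot the CList, then sort by priority. A stable sort keeps
	// FIFO order among txs that share the same priority.
	pending := make([]*mempoolTx, 0, mem.Size())
	for e := mem.txs.Front(); e != nil; e = e.Next() {
		pending = append(pending, e.Value.(*mempoolTx))
	}
	sort.SliceStable(pending, func(i, j int) bool {
		return pending[i].priority > pending[j].priority // higher first
	})
	txs := make(types.Txs, 0, len(pending))
	for _, memTx := range pending {
		txs = append(txs, memTx.tx) // maxBytes/maxGas checks elided
	}
	return txs
}
```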

### Phase 1: Abstract away existing mempool interface

**Contributor:**

Seems like a great idea and should move forward independently of the rest of this RFC. Probably best to move this part to an ADR in Tendermint directly (it doesn't break protocols or APIs, so no need for the spec-based RFC process, just changes to Tendermint Go I think) and work on it there while we figure out how to actually support the protocol changes for prioritization.

**@ninjaahhh (author):**

Sounds good. I'll draft an ADR covering the details of the refactoring.

**@ninjaahhh (author):**

Update: ADR proposed in tendermint/tendermint#5451.


In case phase 0 is not enough (for reference, go-ethereum uses [a heap](https://github.com/ethereum/go-ethereum/blob/6c9f040ebeafcc680b0c457e6f4886e2bca32527/core/tx_list.go#L440) for similar logic), we may want to change the underlying data structure that stores transactions. However, before adding a new data structure to replace the existing CList implementation, certain refactoring work needs to be finished.

Goal:

1. No functionality change and no performance improvement; this phase only designs a common underlying mempool interface that allows different implementations

The existing mempool code is tightly coupled with the CList implementation in many places (e.g. in the mempool reactor), and this phase mainly works as a middle ground that abstracts away this coupling, allowing future mempool implementations to use other data structures for better performance.

Proposed interface changes:

```golang
type basemempool struct {
	mempoolImpl

	// Atomic integers
	height   int64 // the last block Update()'d to
	txsBytes int64 // total size of mempool, in bytes

	// Notify listeners (i.e. consensus) when txs are available
	notifiedTxsAvailable bool
	txsAvailable         chan struct{} // fires once for each height, when the mempool is not empty

	// Keep a cache of already-seen txs.
	// This reduces the pressure on the proxyApp.
	cache     txCache
	preCheck  PreCheckFunc
	postCheck PostCheckFunc

	config *cfg.MempoolConfig

	// Exclusive mutex for Update method to prevent concurrent execution of
	// CheckTx or ReapMaxBytesMaxGas(ReapMaxTxs) methods.
	updateMtx sync.RWMutex

	wal          *auto.AutoFile // a log of mempool txs
	proxyAppConn proxy.AppConnMempool

	logger  log.Logger
	metrics *Metrics
}

type mempoolImpl interface {
	Size() int

	addTx(*mempoolTx, uint64) // add a tx with the given priority
	removeTx(types.Tx) bool   // report whether the corresponding element was removed
	updateRecheckCursor()
	reapMaxTxs(int) types.Txs
	reapMaxBytesMaxGas(int64, int64) types.Txs // based on priority
	recheckTxs(proxy.AppConnMempool)
	isRecheckCursorNil() bool
	getRecheckCursorTx() *mempoolTx
	getMempoolTx(types.Tx) *mempoolTx
	deleteAll()
	// ...and more
}
```

Mempool code shared across implementations (callback handling, WAL, etc.) will live in `basemempool`, while each `mempoolImpl` only needs to implement the required methods for transaction addition, removal, iteration, and so on. The proposed interface is [implemented in this code base](https://github.com/QuarkChain/tendermintx/blob/master/mempool/mempool.go).
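
For example, a shared entry point could acquire the lock and then delegate to the embedded implementation; a minimal sketch under the interface above:

```golang
// Sketch: basemempool owns the shared concerns (locking, accounting)
// and delegates storage-specific work to the embedded mempoolImpl.
func (mem *basemempool) ReapMaxBytesMaxGas(maxBytes, maxGas int64) types.Txs {
	mem.updateMtx.RLock() // block a concurrent Update(), allow other readers
	defer mem.updateMtx.RUnlock()
	// Storage-specific concern: the method is promoted from mempoolImpl.
	return mem.reapMaxBytesMaxGas(maxBytes, maxGas)
}
```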

### Phase 2: Implement mempool based on different data structures

More discussion is needed on which data structure fits the requirements best. For reference, we have some preliminary profiling results for [left-leaning red–black trees and BTrees](https://github.com/QuarkChain/tendermintx/blob/master/mempool/bench_test.go#L18-L31).
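
For a flavor of what a tree-backed store might look like, here is a small sketch using the `github.com/google/btree` package; the (priority, sequence) key layout is an assumption for illustration, not a settled design:

```golang
package main

import (
	"fmt"

	"github.com/google/btree"
)

// txItem keys a transaction by (priority, seq): seq is a monotonically
// increasing counter assigned at admission, standing in for arrival time
// and guaranteeing that no two items ever compare equal.
type txItem struct {
	priority uint64
	seq      uint64
	tx       []byte
}

// Less defines ascending order: lower priority first, and for equal
// priority, later arrivals first, so that Descend (greatest-first)
// yields high-priority txs first with FIFO order among equals.
func (a *txItem) Less(than btree.Item) bool {
	b := than.(*txItem)
	if a.priority != b.priority {
		return a.priority < b.priority
	}
	return a.seq > b.seq
}

func main() {
	t := btree.New(32) // degree 32 is arbitrary; tuning needs profiling
	t.ReplaceOrInsert(&txItem{priority: 1, seq: 0, tx: []byte("a")})
	t.ReplaceOrInsert(&txItem{priority: 5, seq: 1, tx: []byte("b")})
	t.ReplaceOrInsert(&txItem{priority: 5, seq: 2, tx: []byte("c")})

	// Reap in priority order via in-order traversal: prints "b c a".
	t.Descend(func(i btree.Item) bool {
		fmt.Printf("%s ", i.(*txItem).tx)
		return true // keep iterating
	})
}
```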

**Contributor:**

Why do we need anything fancy here? Can we not just use a max heap?

**@ValarDragon (contributor), Sep 4, 2020:**

You want a data structure that can be added to in parallel in a lock-free fashion, while having lookups that bypass the lock. Neither max heaps nor RB trees satisfy this when implemented "normally".

I am strongly in favor of the concurrent lock-free AVL variant from: http://ddana.cswp.cs.technion.ac.il/wp-content/uploads/sites/19/2015/12/logicalorderingavl.pdf

ref: tendermint/tendermint#2147 (comment)

**@ninjaahhh (author):**

1. Using a balanced tree makes the iteration during mempool reaping O(n) instead of O(n log n), since it's just an in-order traversal, while the cost is amortized over insertions; essentially a trade-off between the performance of tx addition and of reaping.
2. A lock-free data structure supporting parallel addition sounds really interesting, but considering the trade-off between performance gain and implementation complexity, I think some profiling would help get more insight. If a simpler lock-based data structure can provide performance similar to the current CList, then it's OK to have a lock.

**Contributor:**

> I am strongly in favor of the concurrent lock-free AVL variant from: http://ddana.cswp.cs.technion.ac.il/wp-content/uploads/sites/19/2015/12/logicalorderingavl.pdf

Are there any implementations?

**Contributor:**

> A lock-free data structure supporting parallel addition sounds really interesting.

Yeah. Also, being able to reap transactions from the mempool and, at the same time, iterate over the tree (broadcasting transactions) would be nice too.

**Contributor:**

Maybe I'm wrong here, but aren't CheckTx calls synchronous and serialized? So why do we need a lock-free data structure when access to it will be serialized by CheckTx anyway?

**Contributor:**

> Maybe I'm wrong here, but aren't CheckTx calls synchronous and serialized?

They're supposed to be async: https://github.com/tendermint/tendermint/blob/d1a88fe39f0200b89b406d783a5a2490cfe3c42a/mempool/clist_mempool.go#L291

**Contributor:**

> Maybe I'm wrong here, but aren't CheckTx calls synchronous and serialized? So why do we need a lock-free data structure when the access to it will be serialized by CheckTx anyway?

The calls are made asynchronously, but in serial order. But we may want to perform two other operations simultaneously:

- broadcast txs to peers (requires reading from the set of txs)
- reaping txs for a proposal block (requires reading an ordered batch of txs)


## Status

Proposed

## Consequences

### Positive

- ABCI blockchains will have more customizability on how transactions are included in blocks

### Negative

- In the early phases, a priority-based mempool could suffer performance degradation
- Mempool implementations may become more complex

### Neutral

- Mempool code could undergo a non-trivial refactoring

## References

- [Introduction to ABCIx](https://forum.cosmos.network/t/introduction-to-abcix-an-extension-of-abci-with-greater-flexibility-and-security/3771/)
- [On ABCIx’s priority-based mempool implementation](https://forum.cosmos.network/t/on-abcixs-priority-based-mempool-implementation/3912)
- [Existing ABCIx implementation](https://github.com/QuarkChain/tendermintx)