Co-authored-by: Marko <markobaricevic3778@gmail.com>
I like the incremental approach here. However, I am hesitant to introduce sorting to CList implementation because of potential performance hits. Maybe we should leave it as is, and instead focus on the interface and new implementation. What do others think?
Thank you for writing a detailed RFC. I apologize for the confusion; the RFC process is being reworked with external contributors in mind.
The role of an RFC is to build a greater understanding of a proposal between the teams implementing clients (tendermint-go & tendermint-rs), after an issue has been opened. In the rare cases where the issue already gives a clear understanding of the change and its intended ramifications, there is no need for an RFC.
Let's take a step back and open an issue with the details around the spec change needed (adding priority to CheckTx). We can facilitate discussion in the issue.
rfc/004-priority-based-mempool.md
### Phase 2: Implement mempool based on different data structures

We will have more discussions on what data structure fits the requirements best. For reference, we have some preliminary profiling results on [left-leaning red–black tree and BTree](https://github.com/QuarkChain/tendermintx/blob/master/mempool/bench_test.go#L18-L31).
Why do we need anything fancy here? Can we not just use a max heap?
You want a data structure that can be added to in parallel in a lock-free fashion, while having lookups that bypass the lock. Neither max heaps nor RB trees satisfy this when implemented 'normally'.
I am strongly in favor of the concurrent lock-free AVL variant from: http://ddana.cswp.cs.technion.ac.il/wp-content/uploads/sites/19/2015/12/logicalorderingavl.pdf
- using a balanced tree can make the iteration during mempool reaping O(n) instead of O(n log n), as it's just an in-order traversal, while the cost is amortized across insertions. essentially a trade-off between the performance of tx addition and of reaping
- a lock-free data structure supporting parallel addition sounds really interesting. but considering the trade-off between performance gain and implementation complexity, I think some profiling can help get more insight. I would say if a simpler lock-based data structure can provide similar performance to the current CList, then it's ok to have a lock
I am strongly in favor of the concurrent lock-free AVL variant from: http://ddana.cswp.cs.technion.ac.il/wp-content/uploads/sites/19/2015/12/logicalorderingavl.pdf
are there any implementations?
a lock-free data structure supporting parallel addition sounds really interesting.
yeah. also, being able to Reap transactions from the mempool and, at the same time, iterate over the tree (broadcasting transactions) would be nice too.
Maybe I'm wrong here, but aren't CheckTx calls synchronous and serialized? So why do we need a lock-free data structure when the access to it will be serialized by CheckTx anyway?
Maybe I'm wrong here, but aren't CheckTx calls synchronous and serialized?
they're supposed to be async https://github.com/tendermint/tendermint/blob/d1a88fe39f0200b89b406d783a5a2490cfe3c42a/mempool/clist_mempool.go#L291
Maybe I'm wrong here, but aren't CheckTx calls synchronous and serialized? So why do we need a lock-free data structure when the access to it will be serialized by CheckTx anyway?
The calls are made asynchronously, but in serial order. But we may want to perform two other operations simultaneously:
- broadcast txs to peers (requires reading from the set of txs)
- reaping txs for a proposal block (requires reading an ordered batch of txs)
sounds good. I'll use this RFC as the master issue for all my intended changes, and for me it's perfectly ok if only parts of them manage to get merged. also agree with your last point, I'll start a simpler issue on only adding
Co-authored-by: Alexander Bezobchuk <alexanderbez@users.noreply.github.com>
I'll drop my notes on the mempool here:

The mempool is an in-memory pool of uncommitted transactions. The typical flow is:

Current implementation and its flaws

At the heart of the current implementation there are a concurrent linked list and a map for quick access (key: SHA256 hash of the transaction). The list is unordered. The map was added at a much later stage, when we needed to record/retrieve the sender (in order to avoid sending the transaction back, which was wasteful).

The mempool also contains an LRU cache of the most recently used transactions. The cache prevents the same transactions from entering the mempool and serves as a base layer of replay protection for the ABCI application. However, ABCI applications must implement additional means of protection in case the cache is not enough. It's been reported that the linked list and the map could diverge after the cache becomes full - tendermint/tendermint#5281.

I guess the biggest downside of the current implementation is the lack of prioritisation. It should give the ABCI application a way to establish some order, based on the fee or some other field(s). Dev and others also expressed some concerns about concurrency patterns - "My concern is primarily that we currently have the downside that you can't keep receiving / broadcasting txs while updating / rechecking." (see tendermint/tendermint#2147 and tendermint/tendermint#2484)

Other possible optimisations:
What we want from the new design:
Proposal

Switch from the CList to a self-balancing AVL tree. QuarkChain (#154) suggests using a Red-Black tree. When choosing between different BSTs, AVL was picked because it's good for in-memory usage (compared to BTree, which is often used in databases where persistence is needed) and fast retrieval (compared to Red-Black tree, which is better for

When choosing between BSTs and array-based structures (raw array, max heap), BST was picked because more granular locking is possible (i.e. we don't want to lock the entire struct).

TODO: need to run tests to see if a map (similar to the existing sync.Map) is needed to provide amortised O(1) access to individual elements (for retrieving/recording senders).

Concurrency issues
A smaller tree can be used to receive transactions and broadcast them to other peers. This tree will be merged into the main one once rechecking is done.
When fee is always zero (for those applications which don't use fees), it may be possible to issue sequence numbers and use them to sort transactions.

Appendix A. What are other projects using?

Libra mempool (https://developers.libra.org/docs/crates/mempool) employs a hash map of accounts plus a BTree for PriorityIndex https://github.com/libra/libra/blob/c5f6a2b4a6be63f6ef8f17a2c1cf192c9a23bb07/mempool/src/core_mempool/index.rs#L25-L27.
another reasonable trade-off would be just using heaps, which might also address the concerns you mentioned. during reaping, make an array copy of the tx heap and then fetch the top-k from the copied heap. benefits:
comparing with alternatives
another aspect is the tx removal after commit:
Wait, do we even need a max-heap? We don't need to sort the mempool immediately after each transaction. So we can just have an array, which will be sorted before reaping.
Then ABCI developers can add application-specific, user-defined priorities for the transactions. This addition is naturally backward-compatible, because by default all requests will be returned with 0 priority and on the mempool side the process will fall back to FIFO.
Does this imply that all txs with the same priority will be processed in FIFO order? If so, that places an additional constraint on any sorting or heap algorithm, i.e. that it is stable.
agree, but easy tweaks can be made to achieve that even while the sorting is not stable: e.g. using "priority + timestamp" as the comparing key such that if two items have the same priority, the one with lower timestamp (earlier) should be ranked higher.
sorting the array during reaping, upsides:
on the other hand, say k is the usual number of txs in a block and n is the number of txs in the mempool, downsides:
personally I think it's a good starting point, especially if n is small or k is relatively big compared with n (which means the heap doesn't really show enough strength)
another benefit of an array: after we've sorted the array and reaped k txs, the rest remains sorted, so we can use this knowledge and not sort it again, but instead put new txs in another array and merge the two with merge sort at some later point.
@melekes can you elaborate? max-heaps typically aren't sorted internally.
oh, really? could you point me to any resource? I thought the whole point of a max-heap is that the array is always sorted; when adding an element you perform heapifying - "To heapify an element in a max heap we need to find the maximum of its children and swap it with the current element. We continue this process until the heap property is satisfied at each node." https://www.journaldev.com/44248/max-heap-java also, I realized my comment only applies to the array implementation. sorry for the confusion.
Haven't read this yet, but just wanted to flag another approach which doesn't necessarily negate this work but could complement it: ABCI-based reaping. The idea here is a new ABCI message (or potentially a repurposed Query msg) that is sent to the app before reaping txs. The app could then return a list of tx hashes it wants included in the block, and the mempool could load those from its map. It could potentially also allow apps to return custom txs as well (that weren't in the mempool, but that they want included). This way the app would completely control reaping without any changes to the mempool data structures.

Obviously we still want to improve the mempool data structures and find a solid priority-based solution, but this might be a valuable intermediate step or even an alternative approach, especially in domains where prioritization may be multi-dimensional or more complex than can be captured by a single
- if it's zero, it should follow existing logic and return eligible transactions in FIFO style
- if not, it should return eligible transactions ranked by their priorities (e.g. an `O(n log n)` sort). Then during commit, update the mempool's `maxPriority` accordingly, since transactions will be removed from the mempool

### Phase 1: Abstract away existing mempool interface
Seems like a great idea and should move forward independently of the rest of this RFC - probably best to move this part to an ADR in Tendermint directly (it doesn't break protocols or APIs so no need for spec-based RFC process, just changes to Tendermint Go I think) and work on it there while we figure out how to actually support the protocol changes for prioritization
sounds good. I'll draft an ADR covering the details of the refactoring.
update: ADR proposed in tendermint/tendermint#5451
- [remove gasWanted for non-validators](https://github.com/tendermint/tendermint/issues/2978)
- [time-based eviction](https://github.com/tendermint/tendermint/issues/3436)

What we want from the new design:
I think it would be helpful to have a clearer, more comprehensive statement of the functioning of the mempool. It should include the need to:
- Broadcast txs (in order) to peers, concurrently and efficiently (ie. without resending the same txs many times)
- Execute CheckTx serially but concurrently with other Tendermint functionality (like executing a block)
- Allow proposers to efficiently reap txs from the mempool (possibly concurrently with new CheckTxs happening?)
- Support tx rechecking and re-prioritization (?!) after a block is executed
- Ensure priority ordered txs still form a valid ordering (ie. they pass CheckTx in that order and thus would pass DeliverTx in that order)
- More?
The mempool has to handle a diverse range of concerns and it would be good to get clarity on them upfront so we can weigh the full set of constraints when evaluating designs.
It also might give us an opportunity to realize that certain requirements are actually not necessary, or certain other requirements are necessary.
Ensure priority ordered txs still form a valid ordering (ie. they pass CheckTx in that order and thus would pass DeliverTx in that order)
Does this mean we'll have to re-execute CheckTx on the whole mempool every time we sort it? Otherwise, how would we guarantee that a sequence will pass?
re-prioritization (?!)
we won't need this if we choose a tree, which is always kept sorted
Does this mean we'll have to re-execute CheckTx on the whole mempool every time we sort it? Otherwise, how would we guarantee that a sequence will pass?
This is going to be a challenge. We'll have to figure something out here. Possibly the mempool will have to be more aware of the relationship between txs (eg if they pertain to the same "account", whatever that might be from tendermint's perspective). One way to possibly do this is leave it to the app to return a list of txs that are invalidated by a new tx, for instance ... (?)
Ensure priority ordered txs still form a valid ordering (ie. they pass CheckTx in that order and thus would pass DeliverTx in that order)
where is this requirement coming from?
From the fact that we need proposers to propose blocks that have a sequence of validly ordered txs. Like if I send a gaia tx with sequence 5 and a small fee, and then another tx with sequence 6 and a large fee, the sequence 5 tx still needs to come first! (in bitcoin land they might call this "child pays for parent")
Right. Then we might need that "account" idea, where CheckTx optionally returns a grouping key ("account address") and a sequence number. The proposer would need to replace a transaction with the lower-sequence-number one if such exists.
- Mempool code could undergo a non-trivial refactoring

## Appendix A. Other Projects' Memory Pools
Would be helpful to provide some summarization here of how both the Libra and GoEthereum mempools work for context. Likely we can borrow quite a bit from them.
the libra mempool maintains a hashmap of accounts (each account contains a list of transactions, sorted by sequence number). A BTree is used for PriorityIndex (a global index of transactions sorted by priority).
Right. Let's add that to the text. Maybe we need ResponseCheckTx to include an "address" field so txs can be grouped according to their "sender" (whatever that is from tendermint's perspective lol) which will help manage some of the complexity here. Iirc, ethereum does something similar.
This is a case where being agnostic to txs is going to be challenging. Either we leak more through the abci boundary or we give the app more control here.
that's exactly the problem we faced when trying to simulate the "nonce" system from ethereum. implementing something like that could be useful, but I think it's beyond the scope of this RFC and deserves its own.
I'll update the text on how it's done in libra and eth.
This is an interesting idea. I considered giving the app full control of this to be too expensive, since we'd have to serialize and pass the entire mempool contents across ABCI, but I suppose the app has already seen the txs during CheckTx.

However, I suppose this would require the application to either know about or control mempool eviction, to have a complete picture of which txs are still pending in the mempool - otherwise it could request txs that no longer exist.
(While still reading up on the details) this idea fits very nicely with an issue we're having integrating an order-fairness preprotocol into Tendermint; essentially, we will give a set of transactions to the mempool that need to be in the same block, which (as far as I know) currently needs some sort of ugly workarounds.

A maybe easier way to implement this, and reduce the requirement for the App to be aware of all txs, would be to add tags to the txs, so that the App can get hashes and tags, and decide based on the tags what it wants to get into the next block.

Another add-on to the prioritisation that might be helpful (for our fairness approach, but probably also for other Apps that run a lot of independent things on one chain) would be to allow for independent priority queues; in this setting, again each tx has a tag, and is then prioritised only relative to messages with the same tag (so, for example, on a trading platform a cancel message can get priority over its related order, but not necessarily over unrelated orders).
Yes, an app should have all this info. Txs only get dropped from mempool if they're invalidated by the app, or if they're included in a block, both of which the app can keep track of. This does move some of the complexity here into the ABCI frameworks so we have to be mindful. We'll still want sane defaults obviously on the tendermint side but this could be promising.
Right! This is actually something we've talked about before in the context of wanting the mempool to double for gossiping evidence, which should have a higher priority than txs. So we should probably add this as a requirement: that the mempool design can handle msgs of different gossip-priorities. One goal might be to consolidate the evidence reactor into the mempool (of course this will complicate the mempool further, but with the right design, it could be worth it ..). Note these "gossip-priorities" are distinct from the economic priority we've been talking about, since we don't want people to be able to delay evidence by paying higher fees on their txs, so the priority of evidence is qualitatively higher than that of even a high-fee tx
I don't have the full context for this requirement on "gossip-priorities". will it be a distinct topic, or should this be implemented alongside the "economic priority"?
It's one thing to see, another - to have a record. Having both Tendermint and ABCI app contain duplicate information (transactions) undermines the whole purpose of Tendermint/ABCI app separation of concerns. I'd say we should either keep mempool in Tendermint or move it to ABCI completely.
should we design TM for all possible cases? won't a simple "priority" field be enough for 99% of applications?
This struggle is real, and the mempool is a place where these concerns are most likely to leak across the ABCI boundary, so we should think through more possibilities here.
Don't we want the mempool to support evidence? I thought we might be able to do away with the evidence reactor by consolidating it with the mempool
Considering the current evidence pool is 600 lines (albeit the reactor is simple) and contains logic which is different from the mempool's https://github.com/tendermint/tendermint/blob/7eb4e5c0b1de7aad987717d974ceec7012993f08/evidence/pool.go#L186, I'm leaning towards keeping these two separate. Even if we do merge them together, you don't need multi-dimensional sorting. Instead, two separate priority queues can be used (we probably don't want to mix transactions and evidence together).
I think our goal here is to find the right API to express the mempool relationship between the app and tendermint. it seems the app essentially needs to express two kinds of constraints: ordering (this tx must go before that one) and priority (barring other constraints, this tx should go before that one). I'm not sure what it means to fully give the mempool over to the app, since tendermint needs the txs to be able to gossip them to peers, and we probably don't want to push reactor logic through abci. Though I sympathize with the concern about recording redundant information, I'm not sure what the best way to resolve this is yet. The app could potentially just keep track of mempool txs by some index (even an integer), and have a more complex protocol between the app and tendermint for requesting replays and determining ordering and priority.
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
related to tendermint/tendermint#5314