mempool: Change data structure #2147

Closed
ValarDragon opened this issue Aug 5, 2018 · 6 comments
Labels
C:mempool Component: Mempool S:proposal Status: Proposal T:enhancement Type: Enhancement
Comments


ValarDragon commented Aug 5, 2018

Currently our mempool uses a linked list with a cache implemented as a hashmap (default size 100k).
The operations involved have the following asymptotics (assume the mempool already has N items in it):

- Add Tx: O(1) check in the cache, locks on the cache. (Note this involves a non-negligible hash, so the constant here matters!) It doesn't check whether the tx is in the main mempool linked list, which means we can add duplicate txs if the mempool size in the config is greater than the cache size.
- Update k Txs: O(N), progresses serially. This doesn't parallelize because we iterate through every tx and block on the recheck-Tx response. We also lock on the cache here, so parallelizing this part would still cause the threads to hang at the last step.
- Reap k Txs: O(k) locks, unsorted. The cache locks and unlocks here.
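For concreteness, here's a minimal sketch of what that shape looks like; the names and eviction details are illustrative, not the actual clist/cache code:

```go
// Illustrative sketch only (not the real tendermint mempool code): a FIFO
// linked list of pending txs plus a bounded "seen" cache keyed by tx hash.
package mempool

import (
	"container/list"
	"crypto/sha256"
	"sync"
)

type txKey [sha256.Size]byte

type Mempool struct {
	mtx      sync.Mutex
	txs      *list.List         // pending txs in insertion order
	cache    map[txKey]struct{} // bounded duplicate-suppression cache
	cacheCap int
}

// AddTx is O(1): one hash plus a map lookup. The duplicate check only
// consults the cache, so if the cache has evicted a tx that is still in
// the list, the same tx can be appended a second time.
func (mp *Mempool) AddTx(tx []byte) {
	key := txKey(sha256.Sum256(tx))

	mp.mtx.Lock()
	defer mp.mtx.Unlock()

	if _, seen := mp.cache[key]; seen {
		return
	}
	if len(mp.cache) >= mp.cacheCap {
		// Evict an entry here (the real cache keeps LRU order); omitted.
	}
	mp.cache[key] = struct{}{}
	mp.txs.PushBack(tx)
}
```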

The cache locking and unlocking on every operation is a huge barrier to scaling. We also want our mempool to be sorted, which with the current list would make our Add operation O(N). (Sorting is a property required of basically all blockchain mempools, to handle periods of high tx volume.)

I propose that we switch to the following concurrent AVL tree, which doesn't require locks: http://ddana.cswp.cs.technion.ac.il/wp-content/uploads/sites/19/2015/12/logicalorderingavl.pdf. Looking at table 1, it handles large numbers of threads quite well. Our expected number of txs in the mempool is between 10^3 and 10^5, I'd guess. No locking means we don't have to fear deadlocks in our implementation, and the implementation is significantly simpler. The other implementations suggested there either have locks or "virtual nodes", which opens up fears of memory leaks. (Note this suggestion presumes implementation of an ABCI "priority" function for txs.)

This gives us the sorting property we want and gets rid of all locks, so we can easily support parallelism in Add Tx and Update Tx. (No need to even lock between them!) (It's not fully clear to me whether Reap can be parallelized as well, but we don't need parallelism in Reap at all.) We also use lots of extra memory in our CList, so we would get a significantly more memory-efficient implementation. (This gets rid of the cache as well.)

Our operations are all now either O(log n), or O(k log n) for the batch variants. We get the advantages that we can execute them all in parallel safely, the tree is self-balancing, and there are no constant hash overheads.
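As a rough illustration of the shape this would take (hypothetical names; the ABCI priority field and all of these methods are assumptions, not existing Tendermint APIs):

```go
// Hypothetical interface for a priority-ordered mempool backed by a balanced
// (ideally lock-free) search tree keyed by (priority, tx hash).
package mempool

type PriorityMempool interface {
	// AddTx inserts a tx by priority; O(log n), safe to call concurrently
	// with Update and other AddTx calls.
	AddTx(tx []byte, priority int64) error

	// Update removes the k txs committed in the last block and triggers
	// recheck for the remainder; O(k log n) rather than a full O(n) scan.
	Update(committed [][]byte) error

	// Reap returns up to max txs in descending priority order; O(k log n).
	// This can stay single-threaded; we don't need parallelism in Reap.
	Reap(max int) [][]byte
}
```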

@ValarDragon

cc @jack


jaekwon commented Sep 4, 2018

The purpose of the mempool keeping its order is to keep determinism in CheckTx. This gives us the property that a proposer, as long as it is running the existing mempool logic, is guaranteed to propose a block whose txs all pass CheckTx. That's maybe not a property that the Cosmos Hub cares about, but it might be important for other applications.

There may be no need to optimize this part, because blocks shouldn't take 100% of the available CPU time; otherwise nodes that are behind will never be able to catch up. Ideally each block only requires 10% or so of the processing time available, so this extra synchronous CheckTx step isn't particularly onerous, as far as I can tell. The block has already been committed, it's time to execute it, and after we do so we'll probably want to wait a bit before the next block anyway.

I'm not sure that concurrent AVL tree is sufficient for determinism... we should discuss strategies for enabling parallel computation while still preserving determinism, and yeah, there are a number of algorithms that give us the ability to merge trees together after somewhat isolated processing. In any case, the design space of how to parallelize transactions on a single chain is large and warrants its own mega-discussion thread, because there's a lot to discuss and explore there... mostly after launch :)


ValarDragon commented Sep 4, 2018

Totally agree that this is all post-launch :)

> The purpose of the mempool keeping its order is to keep determinism in CheckTx. This gives us the property that a proposer, as long as it is running the existing mempool logic, is guaranteed to propose a block whose txs all pass CheckTx.

I believe that when building a block it is reasonable to have the software default to requiring that CheckTx passes for all txs. I agree that this set of CheckTx calls can be synchronous (this is what I meant when I said we don't need parallelism in Reap at all). This may also just be safer in the situation where we go with my recheck proposal and a chain has a MsgChangePubkey but forgets to account for it in recheck.

> There may be no need to optimize this part, because blocks shouldn't take 100% of the available CPU time; otherwise nodes that are behind will never be able to catch up.

I wrote this from the perspective that mempool sorting is something we eventually want. If we were to go with a scheme similar to Vitalik Buterin's min fee (which is very similar to the min fee you've described in person), then we would still need mempool sorting. GitHub issue describing it: zcash/zcash#3473, paper: https://ethresear.ch/uploads/default/original/2X/1/197884012ada193318b67c4b777441e4a1830f49.pdf, ethresearch post: https://ethresear.ch/t/draft-position-paper-on-resource-pricing/2838. The TL;DR is that it makes for a more socially optimal fee mechanism by having a consensus-determined min fee. This min fee gets burned and doesn't go to the miner. The min fee adapts based on network conditions to target blocks being 50% full (the threshold can be altered). A tx then carries a miner reward on top of the fee to incentivize inclusion on-chain. He does a significant amount of analysis to show why this is better.
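To illustrate the kind of update rule involved (only a rough sketch of the idea, not the exact formula from the paper; all names here are made up):

```go
// Rough sketch of an adaptive min fee: nudge the consensus-determined min
// fee toward a target block fullness, e.g. 50%.
package minfee

// NextMinFee raises the min fee when the last block used more gas than the
// target and lowers it otherwise. adjustQuotient bounds the per-block move
// (e.g. 8 allows at most ~12.5% change per block).
func NextMinFee(prevMinFee, gasUsed, gasLimit int64, targetFull float64, adjustQuotient int64) int64 {
	target := int64(float64(gasLimit) * targetFull)
	if target <= 0 || adjustQuotient <= 0 {
		return prevMinFee
	}
	delta := prevMinFee * (gasUsed - target) / target / adjustQuotient
	next := prevMinFee + delta
	if next < 1 {
		next = 1 // keep a floor so the fee can rise again after empty blocks
	}
	return next
}
```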

Given that we want mempool sorting, we need to adapt our data structure anyway. (A linked list is too slow.) I'd even venture that a linked list is too slow without the need for sorting; the part that is particularly concerning performance-wise right now is our Update function (filtering out txs that have already appeared). It is O(N), and each update involves a mutex lock, so it can't be parallelized.

My concern is mostly not with the above, though. (As you say, as long as we limit the amount of CPU that takes, it alone wouldn't affect block production / verification.) My concern is primarily that we currently can't keep receiving / broadcasting txs while updating / rechecking. This limits our p2p network, especially with low block times, and it will be true of any data structure that isn't concurrency safe. Once we add a concurrency-safe data structure, we can receive a new block and new txs in parallel (while staying sorted and balanced!). This means that our p2p network will be more robust.

> I'm not sure that concurrent AVL tree is sufficient for determinism... we should discuss strategies for enabling parallel computation while still preserving determinism, and yeah, there are a number of algorithms that give us the ability to merge trees together after somewhat isolated processing. In any case, the design space of how to parallelize transactions on a single chain is large and warrants its own mega-discussion thread, because there's a lot to discuss and explore there... mostly after launch :)

Totally agree that on-chain parallelization is its own mega-discussion, haha. I'm not really sure what determinism you are worried about here. (I think my paragraph above may address your concern?) If it's w.r.t. Reap, I do agree with keeping Reap unparallelized, and I even think enforcing CheckTx there is reasonable.

@ValarDragon

Jae and I had an in-person conversation about this a while ago. I forgot to write up the results of it.

The idea we had converged on by the end of the conversation was basically to have a concurrent tree, with sorting enabled, storing the bulk of the mempool, with txs verified independently. However, each validator should also maintain a few "threads" of txs. The idea of a "thread" is that its txs are all verified to pass CheckTx when run sequentially. (So the proposer can basically instantly propose its highest-value thread.) This is unlike the mempool, whose ordering constantly changes.

When initially building a thread, you make it sorted by value. If you detect that you are proposing soon, you check whether your mempool contains any txs with greater fees than what you have in the thread. Suppose there are n txs in the mempool better than what's in the thread; then you remove the last n txs from the thread and put the new ones in.
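A small sketch of that refresh step under one interpretation (hypothetical types and names; the real design would also re-run CheckTx sequentially over the modified suffix):

```go
// Illustrative only: replace the lowest-fee tail of a "thread" with
// higher-fee txs from the mempool right before proposing.
package mempool

import "sort"

type Tx struct {
	Bytes []byte
	Fee   int64
}

// RefreshThread assumes mempoolByFee is sorted by descending fee. It counts
// how many mempool txs beat the thread's current worst tx, drops that many
// txs from the thread's tail, and splices the better ones in. The caller
// would then re-check the modified suffix sequentially before proposing.
func RefreshThread(thread, mempoolByFee []Tx) []Tx {
	if len(thread) == 0 || len(mempoolByFee) == 0 {
		return thread
	}
	// Keep the thread sorted by fee, highest first.
	sort.Slice(thread, func(i, j int) bool { return thread[i].Fee > thread[j].Fee })

	worst := thread[len(thread)-1].Fee
	n := 0
	for _, tx := range mempoolByFee {
		if tx.Fee <= worst || n >= len(thread) {
			break
		}
		n++
	}

	kept := thread[:len(thread)-n]
	out := append([]Tx{}, kept...)
	return append(out, mempoolByFee[:n]...)
}
```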

The idea behind multiple threads is to reduce re-computation when you receive another person's block. There is a lot of design work for how to do this thread concept optimally. Please note that the thread design can be done incrementally: phase 1 may just have a single thread and a slightly slower update process right before you Reap; later phases can incorporate more threads, more improvements for incremental updates, and detection of when you will be proposing soon.


xla commented Sep 25, 2018

It would be extremely valuable to write this up in an ADR; created #2484 to keep track.

@tac0turtle

The idea is encapsulated in #2484, and there is also a link from that issue to this one. The information will not be lost.
