
Mempool: tx cache can be filled with invalid txs until memory is exhausted. #2180

Closed
1 of 2 tasks
ValarDragon opened this issue Aug 7, 2018 · 14 comments
Labels
C:mempool Component: Mempool T:bug Type Bug (Confirmed) T:security Type: Security (specify priority)
Milestone

Comments

@ValarDragon
Contributor

ValarDragon commented Aug 7, 2018

Found through HackerOne!

In mempool.CheckTx() we add any incoming transaction to the mempool and cache, and only after that do we run the application's CheckTx. CheckTx has a callback that deletes the tx from the mempool and cache if CheckTx failed. However, it's not deleting the tx from the mempool cache's linked list. This means that an attacker can spam garbage txs at the p2p layer, and they will clog up everyone's mempool. Under our default settings, each of these invalid txs can be up to 1 MB, and the default mempool cache holds 100k of them. This means you can cause out-of-memory errors through the p2p layer, as this allows up to 100 GB of RAM to be used up, given enough time.
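For context, a minimal sketch of the cache shape being described (field and method names are illustrative, not the exact Tendermint code): membership lives in a map, eviction order in a linked list, and the failure-callback path only cleans the map.

```go
package mempool

import (
	"container/list"
	"sync"
)

// txCache sketch: a bounded FIFO cache. The map answers "have we seen
// this tx?", the list tracks insertion order for eviction.
type txCache struct {
	mtx  sync.Mutex
	size int
	map_ map[string]struct{}
	list *list.List // oldest entry at the front
}

func newTxCache(size int) *txCache {
	return &txCache{size: size, map_: make(map[string]struct{}), list: list.New()}
}

// Push adds tx, evicting the oldest list entry (and its map key) when full.
func (c *txCache) Push(tx []byte) bool {
	c.mtx.Lock()
	defer c.mtx.Unlock()
	if _, exists := c.map_[string(tx)]; exists {
		return false
	}
	if c.list.Len() >= c.size {
		front := c.list.Front()
		c.list.Remove(front)
		delete(c.map_, front.Value.(string))
	}
	c.map_[string(tx)] = struct{}{}
	c.list.PushBack(string(tx))
	return true
}

// Remove reflects the bug: the map entry goes away, but the raw tx
// bytes stay pinned in the linked list indefinitely.
func (c *txCache) Remove(tx []byte) {
	c.mtx.Lock()
	delete(c.map_, string(tx))
	// BUG: the matching list element is never unlinked here.
	c.mtx.Unlock()
}
```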

  • This can be quickly patched by making the callback delete from the linked list as well, and writing a test for callbacks deleting from all three locations. We should iterate from the back of the linked list to delete the element from there.

  • Additionally, we should lower the default max tx size from 1 MB; I suggest 100KB. This should be a parameter in the config, though.

Additionally, we should change the CheckTx functionality to only add the tx to the cache after CheckTx passes. This could be done #postlaunch, but it needs to be done. If not, this same wasted-space vector exists for txs that take forever to pass through CheckTx (e.g. a multisig with many accounts). That attack vector is much less impactful and would likely never crash a node, but there's no reason to let someone make you waste extra memory/computation.
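A rough sketch of that reordering, heavily hedged: the interface and every method name below are hypothetical simplifications of the real (asynchronous) ABCI flow, just to show where the cache insertion would move.

```go
package mempool

import "errors"

// txChecker abstracts the pieces involved (hypothetical interface,
// purely for this sketch).
type txChecker interface {
	CacheHas(tx []byte) bool
	CachePush(tx []byte)
	AppCheckTx(tx []byte) error // stand-in for the async ABCI CheckTx call
	MempoolAdd(tx []byte)
}

// checkTxSketch shows the proposed ordering: a tx enters the cache
// (and the mempool) only after the application accepts it.
func checkTxSketch(m txChecker, tx []byte) error {
	if m.CacheHas(tx) {
		return errors.New("tx already in cache")
	}
	if err := m.AppCheckTx(tx); err != nil {
		return err // rejected txs never occupy cache memory
	}
	m.CachePush(tx)
	m.MempoolAdd(tx)
	return nil
}
```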

@ValarDragon ValarDragon added C:mempool Component: Mempool T:security Type: Security (specify priority) labels Aug 7, 2018
@ValarDragon ValarDragon added this to the launch milestone Aug 7, 2018
@ebuchman ebuchman added T:bug Type Bug (Confirmed) C:mempool Component: Mempool and removed C:mempool Component: Mempool labels Aug 8, 2018
@melekes
Contributor

melekes commented Aug 8, 2018

> However, it's not deleting the tx from the mempool cache's linked list.

I raised this issue a long time ago. I will double-check, but this should not be a problem, since old txs get evicted when we push new txs (the linked list has a fixed size).

> Additionally, we should change the CheckTx functionality to only add the tx to the cache after CheckTx passes.

The sole purpose of the cache is to prevent the same txs from entering app.CheckTx(tx []byte).

@melekes
Contributor

melekes commented Aug 8, 2018

And yes, we should enforce

```go
MaxBytes: 10240, // 10kB
```

@ValarDragon
Contributor Author

ValarDragon commented Aug 8, 2018

> I raised this issue a long time ago. I will double-check, but this should not be a problem, since old txs get evicted when we push new txs (the linked list has a fixed size).

The linked list's fixed size is the same as the cache size, which is 100k. This also causes bugs with the linked list trying to evict stuff from the hashmap that isn't in the cache. (Meaning you could make the cache enforce effectively nothing.)

> The sole purpose of the cache is to prevent the same txs from entering app.CheckTx(tx []byte).

If that's the goal, we're failing at it. We would need to not remove the tx from the hashmap, nor from the linked list. Currently we're removing it from the hashmap (which is what we use to actually check this), and not from the linked list. Even without removing from the hashmap, this still lets an adversary fill up your RAM quite aggressively: they could fill up 10KB * 2 * 100k = 2 GB under default params. That would make this problem less critical, but I think still quite subpar. We could cache the tx hash in the hashmap + linked list. That actually makes a lot of sense to me, and removes a lot of this concern. (The amount of maliciously wasted RAM is then in the megabytes.)
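A sketch of the hash-keyed variant (assuming SHA-256 purely for illustration; the hash choice was still open at this point): with a fixed 32-byte key on both sides, the worst case becomes roughly 32 B * 2 * 100k ≈ 6.4 MB instead of gigabytes.

```go
package mempool

import (
	"container/list"
	"crypto/sha256"
)

// txKey returns the fixed-size digest used as the cache key, so the
// cache holds 32 bytes per entry instead of the full (up to 10KB) tx.
func txKey(tx []byte) [sha256.Size]byte {
	return sha256.Sum256(tx)
}

// hashedTxCache sketch: same map + list shape as before, but both
// sides store digests, bounding memory regardless of tx size.
type hashedTxCache struct {
	size int
	map_ map[[sha256.Size]byte]struct{}
	list *list.List // of [sha256.Size]byte values, oldest first
}
```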

Storing txs only in the cache is very problematic, as our current implementation only checks for duplicate txs through the cache. The alternative (until we do my proposal in #2147, which will make searches in the tree log(n)) is to do an O(N) search in CheckTx to make sure we aren't adding a duplicate. This is too expensive. For prelaunch, I'd be fine with raising the mempool cache size by a factor of 10 relative to the actual mempool size, and we just hope no one tries to exploit mempool cache eviction. (It would take 1 million tx receives if we raise the factor, so it's probably only a #postlaunch concern.) The proper solution will come once we alter the data structure, so that we can properly check duplicates in the BST.

We shouldn't be adding things to the mempool until they've passed CheckTx, though. But that can be changed #postlaunch.

@melekes
Contributor

melekes commented Aug 8, 2018

> This also causes bugs with the linked list trying to evict stuff from the hashmap that isn't in the cache. (Meaning you could make the cache enforce effectively nothing.)

You mean this?

```go
// NOTE: the tx may have already been removed from the map
// but deleting a non-existent element is fine
delete(cache.map_, string(poppedTx))
```

How exactly can somebody make it enforce nothing?

> I suggest 100KB

It's 10KB currently, but we're not enforcing it.

> We could cache the tx hash in the hashmap + linked list.

👍 If we do that, I think we don't need the linked list. As for possible duplicates, that's fine, since our docs explicitly say the app should implement replay protection.

@ValarDragon
Contributor Author

The linked list is what handles evicting things from the hashmap. So consider the following sequence (sketched in code after the list):

  1. The hashmap + linked list are filled with good txs.
  2. Spam bad txs: these get removed from the hashmap (since the callback removes them), but each additional push to the linked list evicts a good tx from the end of the linked list and causes it to be evicted from the hashmap.
  3. The linked list now only has bad spam txs; all good txs have been evicted from the hashmap.
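Walking that through against the buggy sketch cache from the issue description (newTxCache, Push, and Remove are the illustrative names used there, assumed in scope), with a tiny capacity of 3:

```go
package main

import "fmt"

func main() {
	cache := newTxCache(3) // illustrative cache from the earlier sketch

	// Step 1: fill with good txs.
	for _, tx := range []string{"good1", "good2", "good3"} {
		cache.Push([]byte(tx))
	}

	// Step 2: spam bad txs. Each Push evicts the oldest good tx from
	// both the list and the map; the failure callback's Remove then
	// cleans only the map, leaving the spam pinned in the list.
	for i := 0; i < 3; i++ {
		bad := []byte(fmt.Sprintf("bad%d", i))
		cache.Push(bad)
		cache.Remove(bad)
	}

	// Step 3: the map has forgotten every good tx, and the list holds
	// only spam: the cache now enforces effectively nothing.
}
```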

> 👍 If we do that, I think we don't need the linked list. As for possible duplicates, that's fine, since our docs explicitly say the app should implement replay protection.

Not sure why this lets us remove the linked list? We use the linked list to figure out, once the cache is too big, which element we need to remove from the hashmap (i.e. the order in which txs came through).

@melekes
Contributor

melekes commented Aug 8, 2018

Oh, right. Thanks for the explanation. Yeah, we need to fix this asap.

@ValarDragon
Contributor Author

I'll start working on the fix for this then!

@fullyallocated

Related: #2185

@ValarDragon
Contributor Author

ValarDragon commented Aug 8, 2018

I didn't realize earlier that syncing the linked list and the hashmap doesn't have to be O(N). There's an easy O(1) solution: just make the hashmap store the linked list element pointer.
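A minimal sketch of that fix (illustrative names, not the exact PR): the map value becomes the *list.Element, so Remove can unlink the list node in O(1) instead of scanning.

```go
package mempool

import (
	"container/list"
	"sync"
)

// txCacheFixed sketch: the map now points at the list element, so a
// removal can update both structures in O(1).
type txCacheFixed struct {
	mtx  sync.Mutex
	size int
	map_ map[string]*list.Element
	list *list.List
}

// Push adds tx, evicting the oldest entry from both structures when full.
func (c *txCacheFixed) Push(tx []byte) bool {
	c.mtx.Lock()
	defer c.mtx.Unlock()
	if _, exists := c.map_[string(tx)]; exists {
		return false
	}
	if c.list.Len() >= c.size {
		front := c.list.Front()
		c.list.Remove(front)
		delete(c.map_, front.Value.(string))
	}
	c.map_[string(tx)] = c.list.PushBack(string(tx))
	return true
}

// Remove now deletes from both the map and the list, in O(1).
func (c *txCacheFixed) Remove(tx []byte) {
	c.mtx.Lock()
	defer c.mtx.Unlock()
	if e, ok := c.map_[string(tx)]; ok {
		c.list.Remove(e)
		delete(c.map_, string(tx))
	}
}
```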

So the solution we're at now is (AFAIK):

  1. Make Remove delete from the linked list, by making the hashmap store the element pointer. (PR in progress)
  2. Enforce a max tx size. (Being done in the MaxBytes PR)
  3. Make the cache use the tx hash. Blocked on deciding what hash algorithm to use here; we want something fast. Ref: mempool cache: use a fast hash #2187
  4. Consider changing the cache size. (I think we should reduce it until we make the cache use a tx hash)
  5. Make callbacks no longer delete from the cache.

@ValarDragon
Contributor Author

ValarDragon commented Aug 9, 2018

Re item 5: currently we remove bad txs from the cache because they may become valid later. (Thanks for pointing this out, Anton.) We probably need to be more nuanced and check what the ABCI error is before we decide to remove a tx from the cache. An alternative that may be simpler is to have the hashmap also store the block height at which invalid txs were seen, so they keep getting rejected until the subsequent block.
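A rough sketch of that alternative (hypothetical, not what was merged): the cache remembers the height at which a tx failed CheckTx and rejects it only until the chain moves past that height.

```go
package mempool

// rejectCache sketch (hypothetical): remember the height at which a tx
// failed CheckTx and reject it only until the chain advances past it.
type rejectCache struct {
	rejectedAt map[[32]byte]int64 // tx hash -> height of failed CheckTx
}

// ShouldReject reports whether the tx is still within its rejection window.
func (c *rejectCache) ShouldReject(key [32]byte, curHeight int64) bool {
	h, ok := c.rejectedAt[key]
	return ok && curHeight <= h
}

// MarkRejected records the height at which the tx failed CheckTx.
func (c *rejectCache) MarkRejected(key [32]byte, height int64) {
	c.rejectedAt[key] = height
}
```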

#2185 will take care of a lot of this particular problem, but we should probably still have a solution at the mempool level.

@ValarDragon
Contributor Author

@melekes @ebuchman How do you guys feel about reducing the cache size until we use a tx hash? I think we should make the cache size something like 10k until we implement the cache using a tx hash. (Per #2187, that will be postlaunch.) SHA2 is too slow here to make sense (imo), unless we used AVX2 SHA2.

Then, once the MaxBytes and linked-list-syncing PRs are merged, we're done with this issue prelaunch.

@ebuchman
Contributor

> SHA2 is too slow here to make sense

What is this based on?

@ValarDragon
Contributor Author

Thanks for asking! It turns out the number I was basing that on was for the 1MB case; I had forgotten to adjust it now that we've reduced the max tx size to 10KB.

```
BenchmarkSHA256_1mb-8        500    2432877 ns/op
BenchmarkSHA256_10kb-8     50000      24605 ns/op
```

SHA2 should be fine at 10KB :)
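For reference, one plausible way numbers like these could be produced (a sketch; the actual benchmark code isn't shown in the thread):

```go
package mempool

import (
	"crypto/sha256"
	"testing"
)

// benchSHA256 hashes a zero-filled payload of the given size once per
// iteration, mirroring the cost of hashing a max-size tx.
func benchSHA256(b *testing.B, size int) {
	payload := make([]byte, size)
	b.SetBytes(int64(size))
	for i := 0; i < b.N; i++ {
		sha256.Sum256(payload)
	}
}

func BenchmarkSHA256_1mb(b *testing.B)  { benchSHA256(b, 1<<20) }
func BenchmarkSHA256_10kb(b *testing.B) { benchSHA256(b, 10*1024) }
```

Run with `go test -bench SHA256` to reproduce comparable per-op timings on your own hardware.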

@melekes
Contributor

melekes commented Aug 29, 2018

Merged to develop.

@melekes melekes closed this as completed Aug 29, 2018