
mempool: disable MaxBatchBytes #5800

Merged

melekes merged 9 commits into master from anton/5796-mempool on Dec 21, 2020

Conversation

@melekes (Contributor) commented Dec 15, 2020

@p4u from vocdoni.io reported that the mempool might behave incorrectly under
high load. The consequences range from pauses between blocks to peers
disconnecting from this node.

My current theory is that the flowrate lib we're using for flow control
(we multiplex all reactors over a single TCP connection) was not designed
with large blobs (a 1MB batch of txs) in mind.
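
To see why a large blob is a problem on a shared connection, here is a back-of-the-envelope sketch in Go. The 5120000 bytes/s figure is the default p2p send-rate; modeling the limiter as a simple drain is a simplification of what flowrate actually does:

```go
package main

import (
	"fmt"
	"time"
)

// defaultSendRate is Tendermint's default p2p send-rate in bytes/second.
// The limit applies to the single TCP connection that all reactors share.
const defaultSendRate = 5_120_000

// drainTime estimates how long one write of n bytes occupies the
// rate-limited connection, stalling messages queued behind it.
func drainTime(n int) time.Duration {
	return time.Duration(float64(n) / defaultSendRate * float64(time.Second))
}

func main() {
	fmt.Println("1MB batch:     ", drainTime(1<<20)) // ~205ms of head-of-line blocking
	fmt.Println("4kB batch:     ", drainTime(4<<10)) // ~0.8ms
	fmt.Println("single ~1kB tx:", drainTime(1<<10)) // ~0.2ms
}
```

At the default rate, a single 1MB batch monopolizes the connection for roughly a fifth of a second, which is long enough to delay consensus messages queued behind it.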

I've tried decreasing the Mempool reactor priority, but that did not
have any visible effect. What actually worked was adding a time.Sleep
into mempool.Reactor#broadcastTxRoutine after each successful send, i.e.
manual flow control of sorts.
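
A minimal sketch of that experiment, with hypothetical names (peerSender, broadcastTxs, the 10ms pause) standing in for the real reactor internals:

```go
package sketch

import "time"

// peerSender abstracts the p2p peer; Tendermint's Peer interface exposes
// a Send(chID byte, msg []byte) bool method with this shape.
type peerSender interface {
	Send(chID byte, msg []byte) bool
}

const mempoolChannel = byte(0x30) // the mempool reactor's channel ID

// broadcastTxs mimics the idea behind the patched broadcastTxRoutine:
// pause briefly after each successful send so one peer's tx stream
// cannot saturate the shared, rate-limited connection.
func broadcastTxs(peer peerSender, txs <-chan []byte) {
	for tx := range txs {
		if peer.Send(mempoolChannel, tx) {
			time.Sleep(10 * time.Millisecond) // manual flow control
		}
	}
}
```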

As a temporary remedy (until the mempool package is refactored),
max-batch-bytes has been disabled. Transactions will be sent one by one,
without batching.
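
For reference, the key in question lives in the [mempool] section of config.toml; after this PR it is deprecated and ignored. The value and comment below are illustrative, not the shipped defaults:

```toml
[mempool]
# Deprecated: batching is disabled; transactions are relayed to peers
# one at a time regardless of this value.
max-batch-bytes = 0
```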

Closes #5796

@melekes melekes self-assigned this Dec 15, 2020
@melekes changed the title from "mempool: decrease MaxBatchBytes from 1MB to 4kB" to "mempool: decrease MaxBatchBytes from 10MB to 4kB" on Dec 15, 2020
codecov bot commented Dec 15, 2020

Codecov Report

Merging #5800 (c1db2eb) into master (6a056e0) will increase coverage by 0.05%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master    #5800      +/-   ##
==========================================
+ Coverage   59.77%   59.82%   +0.05%     
==========================================
  Files         262      262              
  Lines       23705    23688      -17     
==========================================
+ Hits        14169    14171       +2     
+ Misses       8023     8007      -16     
+ Partials     1513     1510       -3     
Impacted Files Coverage Δ
config/toml.go 60.86% <ø> (ø)
config/config.go 79.27% <100.00%> (+0.95%) ⬆️
mempool/reactor.go 82.50% <100.00%> (-3.87%) ⬇️
abci/client/socket_client.go 38.02% <0.00%> (-1.15%) ⬇️
mempool/clist_mempool.go 80.57% <0.00%> (-0.72%) ⬇️
consensus/reactor.go 74.21% <0.00%> (-0.13%) ⬇️
proxy/multi_app_conn.go 48.05% <0.00%> (ø)
blockchain/v0/reactor.go 63.96% <0.00%> (ø)
consensus/state.go 68.49% <0.00%> (+0.27%) ⬆️
... and 5 more

@melekes melekes marked this pull request as ready for review December 16, 2020 08:31
Review thread on config/config.go (outdated, resolved)
@p4u (Contributor) commented Dec 16, 2020

I hope you are sure about this @melekes; you know Tendermint better than me, but this workaround looks like it's burying a very serious problem that will eventually have to be solved with a complete refactor.

Review thread on mempool/reactor.go (outdated, resolved)
@erikgrinaker (Contributor) commented

My current theory is that the flowrate lib we're using to control flow

The rate limits are configurable. @p4u, can you see if increasing send-rate and recv-rate in config.toml to e.g. 10000000 (10 MB/s) improves the situation?
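
For reference, these knobs live in the [p2p] section of config.toml (the defaults are 5120000 bytes/s, i.e. ~5 MB/s); a sketch of the suggested bump:

```toml
[p2p]
# Rate limits for the multiplexed connection, in bytes/second.
send-rate = 10000000  # 10 MB/s, up from the 5120000 default
recv-rate = 10000000
```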

@melekes melekes marked this pull request as draft December 16, 2020 11:01
@melekes changed the title from "mempool: decrease MaxBatchBytes from 10MB to 4kB" to "mempool: disable MaxBatchBytes" on Dec 16, 2020
@melekes melekes marked this pull request as ready for review December 16, 2020 12:09
@tessr (Contributor) commented Dec 16, 2020

this workaround looks like it's burying a very serious problem that will eventually have to be solved with a complete refactor

This is something we're trying to balance. Although we intend to do a complete refactor, it's not slated to begin for a few months, and it will probably take a few months itself. I think we're looking at mid-2021 before the mempool refactor is ready to roll. In the meantime, we'd like to see if there's a quick (but safe) fix that we can apply to get everything working for you again.

@erikgrinaker (Contributor) left a comment

LGTM. I think the original PR that introduced this made some other changes as well; it might be worth verifying that this change actually fixes the problems, i.e. that no other bugs were introduced.

@p4u (Contributor) commented Dec 16, 2020

This is something we're trying to balance. Although we intend to do a complete refactor, it's not slated to begin for a few months, and it will probably take a few months itself. I think we're looking at mid-2021 before the mempool refactor is ready to roll. In the meantime, we'd like to see if there's a quick (but safe) fix that we can apply to get everything working for you again.

Sure, I understand. I just wanted to point out that if the workaround is not safe enough, it would become a mess for anyone using Tendermint and upgrading to 0.34. If we are not sure of the safety of this, I'd revert the whole batch-tx feature for now.

fix that we can apply to get everything working for you again.

Well, not only for me but for anyone using Tendermint I hope 👍

EDIT: I just noticed that the PR has been changed, so my comment referred to the previous version, which was quite risky IMO.

Review thread on CHANGELOG_PENDING.md (outdated, resolved)
@tessr (Contributor) commented Dec 16, 2020

Is this PR description still accurate, @melekes?

@melekes (Contributor, Author) commented Dec 16, 2020

Is this PR description still accurate, @melekes?

yes

@tessr tessr added this to the v0.34.1 milestone Dec 17, 2020
@melekes melekes merged commit 77deb71 into master Dec 21, 2020
@melekes melekes deleted the anton/5796-mempool branch December 21, 2020 15:17
melekes added a commit that referenced this pull request Dec 21, 2020
melekes added a commit that referenced this pull request Dec 21, 2020
cmwaters pushed a commit that referenced this pull request Jan 4, 2021
github-merge-queue bot pushed a commit to cometbft/cometbft that referenced this pull request Jan 17, 2024
This configuration is not used anymore; it's a leftover of batching txs
in the mempool, which was deprecated
(tendermint/tendermint#5800)
mergify bot pushed a commit to cometbft/cometbft that referenced this pull request Jan 17, 2024
This configuration is not used anymore; it's a leftover of batching txs
in the mempool, which was deprecated
(tendermint/tendermint#5800)

(cherry picked from commit dab72ad)
Successfully merging this pull request may close these issues:

mempool: "weird stuff" when sending transaction batches (#5796)

6 participants