p2p/conn: large (or a lot of) outgoing messages from other reactors can block consensus reactor from making progress #5816

Closed
melekes opened this issue Dec 21, 2020 · 3 comments
Labels
C:p2p Component: P2P pkg stale for use by stalebot T:bug Type Bug (Confirmed)
Milestone

Comments

@melekes
Contributor

melekes commented Dec 21, 2020

See #5796 and #3920 (comment)

What happened

Large messages (or many messages) produced by other reactors and scheduled to be sent can temporarily block the consensus reactor from making progress.

Why do you think it happens?

Either the libs/flow library does not perform as expected, or the p2p/conn/connection scheduling logic is invalid.

What did you expect?

No halting. Tendermint's top priority is exchanging votes and making blocks with a few transactions + evidence (if any). Consensus reactor messages should have the top priority, while others (e.g. mempool gossip, evidence) should have a lower priority.

@melekes melekes added T:bug Type Bug (Confirmed) C:p2p Component: P2P pkg labels Dec 21, 2020
@erikgrinaker
Contributor

erikgrinaker commented Dec 21, 2020

This is basically #2888. The new P2P stack will have separate queues per reactor channel.

@melekes
Contributor Author

melekes commented Dec 21, 2020

This is basically #2888. The new P2P stack will have separate queues per reactor channel.

How's that? The issue here is incorrect dispatching (sending) while multiplexing over a single TCP stream, while #2888 is about an individual Reactor#Receive blocking the receiving of messages => sending != receiving. But you're right that if we adopt QUIC (independent streams), this issue will go away.
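To make the dispatching concern concrete, here is a minimal sketch of one way to share a single stream between reactors: split queued data into fixed-size packets and, for each packet, pick the channel with the lowest bytes-sent to priority ratio. This is an illustration under assumptions, not the actual p2p/conn code: the `channel` type, `pick`, `sendPacket`, `packetsUntil`, the 1024-byte packet size, and the example byte counts are all hypothetical, and the sketch ignores rate limiting and any decay of the sent counters.

```go
package main

import "fmt"

// channel is a hypothetical per-reactor send queue multiplexed over one stream.
type channel struct {
	name     string
	priority int   // higher means a larger share of the stream
	pending  int   // bytes waiting to be sent
	sent     int64 // bytes sent so far
}

// pick returns the channel that has pending data and the lowest
// sent/priority ratio, or nil if nothing is pending.
func pick(chs []*channel) *channel {
	var best *channel
	var bestRatio float64
	for _, ch := range chs {
		if ch.pending == 0 {
			continue
		}
		ratio := float64(ch.sent) / float64(ch.priority)
		if best == nil || ratio < bestRatio {
			best, bestRatio = ch, ratio
		}
	}
	return best
}

// sendPacket sends at most packetSize bytes from the picked channel and
// returns that channel's name ("" when every channel is idle).
func sendPacket(chs []*channel, packetSize int) string {
	ch := pick(chs)
	if ch == nil {
		return ""
	}
	n := packetSize
	if ch.pending < n {
		n = ch.pending
	}
	ch.pending -= n
	ch.sent += int64(n)
	return ch.name
}

// packetsUntil counts packets written until the named channel first gets a
// turn. It returns -1 if all channels drain before that happens.
func packetsUntil(chs []*channel, name string, packetSize int) int {
	for i := 1; ; i++ {
		switch sendPacket(chs, packetSize) {
		case name:
			return i
		case "":
			return -1
		}
	}
}

func main() {
	mempool := &channel{name: "mempool", priority: 5, pending: 1 << 20}
	consensus := &channel{name: "consensus", priority: 6}
	chs := []*channel{mempool, consensus}

	// The mempool channel has the stream to itself for a while.
	for i := 0; i < 100; i++ {
		sendPacket(chs, 1024)
	}

	// A small consensus message is queued behind the 1 MiB backlog.
	consensus.pending = 200
	fmt.Println(packetsUntil(chs, "consensus", 1024)) // prints 1
}
```

In this sketch the consensus message waits for at most one packet rather than for the whole backlog, because scheduling happens per packet and not per message. A scheduler that picks per message, or that lets a rate limiter stall the shared stream, would not give that guarantee.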

@erikgrinaker
Contributor

Ah, got it -- you're right, different issue. New P2P stack will handle this as well, by having separate outbound queues per peer with some scheduling policy, and dropping messages if a peer can't keep up to avoid blocking reactors.
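The "separate outbound queues per peer" idea with message dropping can be sketched roughly as follows. This assumes a bounded Go channel per peer and a non-blocking enqueue. The `envelope` and `peerQueue` types, the `trySend` method, the queue capacity, and the channel ID value are hypothetical and are not taken from the new P2P stack. The scheduling policy that drains the queues is left out.

```go
package main

import "fmt"

// envelope is a hypothetical outbound message tagged with its reactor channel.
type envelope struct {
	channelID byte
	payload   []byte
}

// peerQueue is a hypothetical bounded outbound queue for a single peer.
// The dropped counter is not synchronized, so this sketch assumes a single
// producer goroutine per queue.
type peerQueue struct {
	out     chan envelope
	dropped int
}

func newPeerQueue(capacity int) *peerQueue {
	return &peerQueue{out: make(chan envelope, capacity)}
}

// trySend enqueues without blocking. When the queue is full it counts the
// message as dropped and returns false, so the calling reactor never stalls.
func (q *peerQueue) trySend(e envelope) bool {
	select {
	case q.out <- e:
		return true
	default:
		q.dropped++
		return false
	}
}

func main() {
	slow := newPeerQueue(2) // nobody is draining this peer's queue
	fast := newPeerQueue(2)

	for i := 0; i < 5; i++ {
		// 0x01 is an arbitrary channel ID chosen for the example.
		slow.trySend(envelope{channelID: 0x01, payload: []byte{byte(i)}})
	}
	ok := fast.trySend(envelope{channelID: 0x01, payload: []byte("vote")})

	fmt.Println(slow.dropped, len(slow.out), ok) // prints 3 2 true
}
```

The design trade-off is that a slow peer loses messages instead of applying backpressure to the reactor, so reactors must tolerate drops (for example by re-gossiping).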

@melekes melekes changed the title from "p2p/conn: large (a lot of) messages in other reactors can block consensus reactor from making progress" to "p2p/conn: large (or a lot of) outgoing messages from other reactors can block consensus reactor from making progress" Dec 21, 2020
@tessr tessr added this to Untriaged 🥚 in Tendermint Core Project Board via automation Dec 21, 2020
@tessr tessr moved this from Untriaged 🥚 to To Do 📝 in Tendermint Core Project Board Dec 21, 2020
@tessr tessr added this to the v0.34.1 milestone Dec 21, 2020
@tessr tessr moved this from To Do 📝 to In progress 🏃🏻 in Tendermint Core Project Board Dec 21, 2020
mergify bot pushed a commit that referenced this issue Dec 23, 2020
blockchain/vX reactor priority was decreased because during the normal operation
(i.e. when the node is not fast syncing) blockchain priority can't be
the same as consensus reactor priority. Otherwise, it's theoretically possible to
slow down consensus by constantly requesting blocks from the node.

NOTE: ideally blockchain/vX reactor priority would be dynamic. e.g. when
the node is fast syncing, the priority is 10 (max), but when it's done
fast syncing - the priority gets decreased to 5 (only to serve blocks
for other nodes). But it's not possible now, therefore I decided to
focus on the normal operation (priority = 5).

evidence and consensus critical messages are more important than
the mempool ones, hence priorities are bumped by 1 (from 5 to 6).

statesync reactor priority was changed from 1 to 5 to be the same as
blockchain/vX priority.

Refs #5816
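The dynamic priority described in the NOTE of the commit message could look roughly like the sketch below. The values 10 and 5 come from the commit message. The function name `blockchainPriority`, the constant names, and the `fastSyncing` flag are hypothetical, and the commit itself says this is not possible with the current code, so treat it as an illustration of the idea only.

```go
package main

import "fmt"

const (
	// Values quoted in the commit message.
	priorityWhileFastSyncing = 10 // max priority while catching up
	priorityWhileServing     = 5  // only serving blocks to other nodes
)

// blockchainPriority is a hypothetical helper that returns the send priority
// of the blockchain reactor's channel for the node's current state.
func blockchainPriority(fastSyncing bool) int {
	if fastSyncing {
		return priorityWhileFastSyncing
	}
	return priorityWhileServing
}

func main() {
	fmt.Println(blockchainPriority(true), blockchainPriority(false)) // prints 10 5
}
```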
melekes added a commit that referenced this issue Dec 23, 2020
tessr pushed a commit that referenced this issue Dec 23, 2020
@tessr tessr modified the milestones: v0.34.1, 0.34.2 Jan 4, 2021
@tessr tessr modified the milestones: v0.34.2, v0.34.4 Jan 21, 2021
@github-actions github-actions bot added the stale for use by stalebot label Jul 26, 2023
@github-actions github-actions bot closed this as not planned Jul 30, 2023