p2p/conn: large (or a lot of) outgoing messages from other reactors can block consensus reactor from making progress #5816

melekes · 2020-12-21T13:11:21Z

What happened

Large messages (or many messages) produced by other reactors and scheduled to be sent can temporarily block the consensus reactor from making progress.

Why do you think it happens?

Either the libs/flow library does not perform as expected or the p2p/conn/connection scheduling logic is invalid.

What did you expect?

No halting. Tendermint's top priority is exchanging votes and making blocks with a few transactions + evidence (if any). Consensus reactor messages should have the top priority, while others (e.g. mempool gossip, evidence) should have a lower priority.


erikgrinaker · 2020-12-21T13:15:02Z

This is basically #2888. The new P2P stack will have separate queues per reactor channel.

melekes · 2020-12-21T13:19:05Z

> This is basically #2888. The new P2P stack will have separate queues per reactor channel.

How's that? The issue here is incorrect dispatching (sending) while multiplexing over a single TCP stream, while #2888 is about an individual Reactor#Receive blocking the receipt of messages => sending != receiving. But you're right that if we adopt QUIC (independent streams), this issue will go away.

erikgrinaker · 2020-12-21T13:22:22Z

Ah, got it -- you're right, different issue. New P2P stack will handle this as well, by having separate outbound queues per peer with some scheduling policy, and dropping messages if a peer can't keep up to avoid blocking reactors.
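A minimal Go sketch of the "separate outbound queue per peer, drop when the peer can't keep up" idea described above. This is not the new P2P stack's actual code: `Envelope`, `PeerQueue`, `NewPeerQueue`, and `TrySend` are hypothetical names, and the scheduling policy is left out entirely. It only illustrates how a bounded queue with a non-blocking send keeps a reactor from stalling on a slow peer.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// Envelope is a hypothetical outbound message tagged with its reactor channel.
type Envelope struct {
	ChannelID byte
	Payload   []byte
}

// PeerQueue is a hypothetical bounded outbound queue for a single peer.
type PeerQueue struct {
	ch      chan Envelope
	dropped int64 // number of messages dropped because the queue was full
}

// NewPeerQueue creates a queue that holds at most capacity messages.
func NewPeerQueue(capacity int) *PeerQueue {
	return &PeerQueue{ch: make(chan Envelope, capacity)}
}

// TrySend enqueues e without blocking. If the queue is full (the peer is
// not draining it fast enough), the message is dropped and false is returned,
// so the calling reactor never waits on this peer.
func (q *PeerQueue) TrySend(e Envelope) bool {
	select {
	case q.ch <- e:
		return true
	default:
		atomic.AddInt64(&q.dropped, 1)
		return false
	}
}

// Dropped reports how many messages have been dropped so far.
func (q *PeerQueue) Dropped() int64 {
	return atomic.LoadInt64(&q.dropped)
}

func main() {
	q := NewPeerQueue(2) // tiny capacity, for illustration only
	for i := 0; i < 3; i++ {
		ok := q.TrySend(Envelope{ChannelID: 0x30, Payload: []byte{byte(i)}})
		fmt.Println("enqueued:", ok)
	}
	fmt.Println("dropped:", q.Dropped())
}
```

Dropping is only acceptable for messages the protocol can recover from losing (for example, gossip that will be re-sent); which channels qualify is a design decision this sketch does not address.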

blockchain/vX reactor priority was decreased because during normal operation (i.e. when the node is not fast syncing) blockchain priority can't be the same as consensus reactor priority. Otherwise, it's theoretically possible to slow down consensus by constantly requesting blocks from the node.

NOTE: ideally blockchain/vX reactor priority would be dynamic. For example, when the node is fast syncing, the priority is 10 (max), but when it's done fast syncing, the priority gets decreased to 5 (only to serve blocks for other nodes). But that's not possible now, therefore I decided to focus on normal operation (priority = 5).

evidence and consensus critical messages are more important than the mempool ones, hence their priorities are bumped by 1 (from 5 to 6).

statesync reactor priority was changed from 1 to 5 to be the same as blockchain/vX priority.

Refs #5816
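To make the effect of these numbers concrete, here is a simplified Go model of priority-weighted channel selection: among channels with pending data, pick the one with the lowest ratio of recently sent bytes to priority. This is an assumption about the general shape of such a scheduler, not a copy of the p2p/conn code; the `channel` struct, `pickNext`, the packet size, and the loop in `main` are all hypothetical, and only the priority values 6 and 5 come from the text above.

```go
package main

import "fmt"

// channel is a simplified stand-in for a per-reactor send channel.
type channel struct {
	name         string
	priority     int   // higher priority means a larger share of sends
	recentlySent int64 // bytes sent recently on this channel
	pending      int   // number of packets waiting to be sent
}

// pickNext returns the index of the channel that has pending packets and
// the lowest recentlySent/priority ratio, or -1 if nothing is pending.
func pickNext(chs []*channel) int {
	best := -1
	var bestRatio float64
	for i, c := range chs {
		if c.pending == 0 {
			continue
		}
		ratio := float64(c.recentlySent) / float64(c.priority)
		if best == -1 || ratio < bestRatio {
			best, bestRatio = i, ratio
		}
	}
	return best
}

func main() {
	// Priorities as described above: consensus-critical 6, mempool 5.
	chs := []*channel{
		{name: "consensus", priority: 6, pending: 60},
		{name: "mempool", priority: 5, pending: 60},
	}
	const packetSize = 1024 // illustrative packet size in bytes
	sent := map[string]int{}
	for n := 0; n < 22; n++ {
		i := pickNext(chs)
		if i < 0 {
			break
		}
		c := chs[i]
		c.pending--
		c.recentlySent += packetSize
		sent[c.name]++
	}
	fmt.Println(sent)
}
```

Under a weighting like this, priority determines a proportional share rather than strict precedence, so a backlog of large lower-priority messages still takes a substantial part of the connection's send capacity. That is consistent with the behavior reported in this issue.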

melekes added the T:bug (Type: Bug (Confirmed)) and C:p2p (Component: P2P pkg) labels Dec 21, 2020

melekes mentioned this issue Dec 21, 2020

Should mempool support flow control? #3920

Closed

melekes changed the title ~~p2p/conn: large (a lot of) messages in other reactors can block consensus reactor from making progress~~ p2p/conn: large (or a lot of) outgoing messages from other reactors can block consensus reactor from making progress Dec 21, 2020

tessr added this to Untriaged 🥚 in Tendermint Core Project Board via automation Dec 21, 2020

tessr moved this from Untriaged 🥚 to To Do 📝 in Tendermint Core Project Board Dec 21, 2020

tessr added this to the v0.34.1 milestone Dec 21, 2020

tessr moved this from To Do 📝 to In progress 🏃🏻 in Tendermint Core Project Board Dec 21, 2020

melekes mentioned this issue Dec 23, 2020

modify Reactor priorities #5826

Merged

melekes mentioned this issue Dec 23, 2020

modify Reactor priorities (#5826) #5830

Merged

tessr modified the milestones: v0.34.1, 0.34.2 Jan 4, 2021

tessr modified the milestones: v0.34.2, v0.34.4 Jan 21, 2021

github-actions bot added the stale for use by stalebot label Jul 26, 2023

github-actions bot closed this as not planned Jul 30, 2023
