rfc 27: report on bandwidth usage within Tendermint #9706
Therefore, block gossip can be updated to transmit a representation of the data contained in the block that assumes the peers will already have most of this data. Namely, the block gossip can be updated to only send 1) a list of transaction hashes and 2) a bit array of votes selected for the block along with the header and other required block metadata.

This new proposed method for gossiping block data would require a slight update to the mempool transaction gossip and consensus vote gossip. Since the contents of each block will no longer be gossiped together, it's possible that some nodes are missing a proposed transaction or the vote of a validator indicated in the new block gossip format. The mempool and consensus reactors would need to be updated to provide a `NeedTxs` and `NeedVotes` message. Each of these messages would allow a node to request a set of data from its peers. When a node receives one of these, it will then transmit the Txs/Votes indicated in the associated message regardless of whether it believes it has transmitted them to the peer before.
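To make the proposal concrete, here is a minimal Go sketch of how a node might work out what to put in a `NeedTxs` request. The `CompactBlock` type, its fields, `HashTx`, and `MissingTxs` are hypothetical names for illustration only; none of them exist in Tendermint, and the use of SHA-256 over the raw transaction bytes is an assumption.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// TxHash is the SHA-256 digest of a raw transaction (assumed hashing scheme).
type TxHash [sha256.Size]byte

// HashTx hashes the raw transaction bytes.
func HashTx(tx []byte) TxHash { return sha256.Sum256(tx) }

// CompactBlock is a hypothetical wire form of a proposed block: it carries
// transaction hashes and a vote bit array instead of the full payloads.
type CompactBlock struct {
	Height   int64
	TxHashes []TxHash
	VoteBits []bool // VoteBits[i] is true if validator i's vote is included
}

// MissingTxs returns the hashes referenced by the compact block that are not
// in the local mempool. These would populate a NeedTxs request.
func MissingTxs(cb CompactBlock, mempool map[TxHash][]byte) []TxHash {
	var missing []TxHash
	for _, h := range cb.TxHashes {
		if _, ok := mempool[h]; !ok {
			missing = append(missing, h)
		}
	}
	return missing
}

func main() {
	txA, txB := []byte("tx-a"), []byte("tx-b")
	// tx-b never reached this node through mempool gossip.
	mempool := map[TxHash][]byte{HashTx(txA): txA}
	cb := CompactBlock{
		Height:   10,
		TxHashes: []TxHash{HashTx(txA), HashTx(txB)},
		VoteBits: []bool{true, true, false, true},
	}
	missing := MissingTxs(cb, mempool)
	fmt.Printf("need %d of %d txs\n", len(missing), len(cb.TxHashes)) // prints "need 1 of 2 txs"
}
```

The same lookup would apply to votes, with `VoteBits` compared against the votes the node already holds for that height and round.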
GossipSub works a little bit in this way with `Want` and `Have` messages. I think you'd want to have more than just `Want`/`Need` messages because you don't know who actually has the data you're requesting and so you're just blindly requesting it. What would be better is that upon receiving a block a node communicates all the txs it has. This can be quite compact in the same way `VoteSetBits` is because you reference by index in the block as opposed to hash.
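A rough Go sketch of the index-based "have" announcement described above, assuming each node builds a bit array over the block's transaction list. `HaveTxBits` and `TxsToSend` are hypothetical helpers, not part of Tendermint, and transaction hashes are simplified to strings.

```go
package main

import "fmt"

// HaveTxBits builds a bit array over the block's transaction list:
// bit i is set when the local node already holds the tx at index i.
func HaveTxBits(blockTxs []string, local map[string]bool) []bool {
	bits := make([]bool, len(blockTxs))
	for i, h := range blockTxs {
		bits[i] = local[h]
	}
	return bits
}

// TxsToSend returns the block indices of txs that we hold and the peer
// has announced it lacks.
func TxsToSend(mine, peer []bool) []int {
	var idx []int
	for i := range mine {
		if mine[i] && i < len(peer) && !peer[i] {
			idx = append(idx, i)
		}
	}
	return idx
}

func main() {
	block := []string{"h0", "h1", "h2", "h3"}
	mine := HaveTxBits(block, map[string]bool{"h0": true, "h1": true, "h3": true})
	peer := HaveTxBits(block, map[string]bool{"h0": true})
	fmt.Println(TxsToSend(mine, peer)) // prints [1 3]
}
```

Referencing by index costs one bit per transaction in the block, whereas a hash-based request costs a full digest per transaction.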
This is a good approach to start thinking about, but I am afraid that the required changes are more complex. For instance, can a node forward a block when it does not have the full content (votes and txs) it references? Putting it another way, how do we guarantee that the referenced votes and txs are always available in the network?
> Putting it another way, how do we guarantee that the referenced votes and txs are always available in the network?

[I mentioned this in another comment] Why do we need to guarantee that? How would that be different from forwarding a block part of a block for which we haven't received all block parts yet (in the current logic)?
I'm only referring to TXs here. I understand votes are slightly different because consensus will stop propagating them upon decision.
> I think you'd want to have more than just Want/Need messages because you don't know who actually has the data you're requesting and so you're just blindly requesting it.

This is true; however, the proposed mechanism would still result in the node receiving all of the data from its peers, since those peers would have to receive the data as well in order to commit the block at all.

> Putting it another way, how do we guarantee that the referenced votes and txs are always available in the network?

I'm a bit unclear on how this would be any different from what we have today. If a validator has already committed the block, then all of the Txs and votes within the block are available within the block. If the block has been pruned by all validators, then the Txs and votes may also be missing, but the block parts would also be gone, so we are, as I see it, no worse off than we previously were.
Given that Tendermint informs all peers of _each_ vote message it receives, all nodes should be well informed of which votes their peers have. Given that the vote messages were the third largest consumer of bandwidth in the observation on Osmosis, it's possible that this system is not currently working correctly. Further analysis should examine where votes may be retransmitted.

### Suggested Improvements to Lower Message Transmission Bandwidth
Erasure coding for block propagation is another improvement that some (especially Dev) have been keen on. It decreases the chances that you receive duplicate parts because you only need, for instance, any 5 of 10 parts to reproduce the entire block. See #7932 (comment)
Erasure coding will only help if we are more structured about how we spread information, since each node still needs to receive the same amount of data for each proposal.
For example, if the proposal data amounts to 40k, or 10 "original" block parts, and you add 5 "parity" block parts, then the proposer has to spread 15 block parts and each node needs to receive 10 block parts, original or parity. If some of the received parts are parity, they will be used to reconstruct the missing original ones, which also adds some processing overhead before the node can reconstruct the proposal and run `ProcessProposal`.
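The arithmetic in this example can be written down as a small Go sketch. It assumes an MDS code such as Reed-Solomon, where any `DataParts` distinct parts are enough to rebuild the block; `ErasureParams` and its methods are hypothetical names, and no actual encoding is performed.

```go
package main

import "fmt"

// ErasureParams describes a hypothetical split of a proposal into
// original ("data") block parts plus parity parts.
type ErasureParams struct {
	DataParts   int
	ParityParts int
}

// TotalParts is the number of parts the proposer has to spread.
func (p ErasureParams) TotalParts() int { return p.DataParts + p.ParityParts }

// CanReconstruct reports whether a node holding this many distinct parts
// (original or parity) can rebuild the block, assuming an MDS code.
func (p ErasureParams) CanReconstruct(distinctReceived int) bool {
	return distinctReceived >= p.DataParts
}

// NeedsDecoding reports whether parity parts must be decoded, which is
// the extra processing step before the proposal can be processed.
func (p ErasureParams) NeedsDecoding(originalReceived int) bool {
	return originalReceived < p.DataParts
}

func main() {
	p := ErasureParams{DataParts: 10, ParityParts: 5}
	fmt.Println(p.TotalParts())       // prints 15
	fmt.Println(p.CanReconstruct(10)) // prints true
	fmt.Println(p.CanReconstruct(9))  // prints false
	fmt.Println(p.NeedsDecoding(8))   // prints true
}
```

The per-node download stays at 10 parts either way, which is the point made above: the saving has to come from fewer redundant deliveries, not from less data per node.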
Co-authored-by: Callum Waters <cmwaters19@gmail.com>
Excellent read!
Nice document.
I think it involves some discussions that are already present elsewhere in the repository, which should be referenced. Also, I think some of the proposed alternatives should be discussed in more detail in future versions or derived RFCs/ADRs.
#### BlockPart Transmission

Sending `BlockPart` messages consumes the most bandwidth out of all p2p message types as observed in the Blockpane Osmosis validator.
In the almost 3 hour observation, the validator sent about 20 gigabytes of `BlockPart` messages.
I probably already mentioned this, but it is useful here to know how many bytes were effectively added to the blockchain in this period. This gives an idea of the communication overhead: consumed bandwidth / payload.
#### Mempool Tx Transmission

The Tendermint mempool stages transactions that are yet to be committed to the blockchain and communicates these transactions to its peers. Each message contains one transaction. Data collected from the Blockpane node running on Osmosis indicates that the validator sent about 12 gigabytes of `Txs` messages during the nearly 3 hour observation period.
Again, it would be interesting to know the cumulative size of all transactions committed in the same time frame, so that we have an idea of the overhead.
I'll take a look at collecting this. I have the height offsets, so we should be able to figure this out.
Co-authored-by: Sergio Mena <sergio@informal.systems>
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
The Tendermint mempool starts a new [broadcastTxRoutine][broadcast-tx-routine] for each peer that it is informed of. The routine sends all transactions that the mempool is aware of to all peers, with one exception: if the mempool received a transaction from a peer, it marks the transaction as such and won't resend it to that peer. Otherwise, it retains no information about which transactions it has already sent to a peer. In some cases it may therefore resend transactions the peer already has. This can occur if the mempool removes a transaction from the `CList` data structure used to store the list of transactions while it is about to be sent and if the transaction was the tail of the `CList` during removal. This will be more likely to occur if a large number of transactions from the end of the list are removed during `RecheckTx`, since multiple transactions will become the tail and then be deleted. It is unclear at the moment how frequently this occurs on production chains.
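The sender-tracking rule described above can be illustrated with a short Go sketch. This is a simplified model, not the actual Tendermint code: the `memTx` type, its fields, and `shouldSendTo` are assumptions made for the example.

```go
package main

import "fmt"

// memTx is a simplified, hypothetical mempool entry that remembers which
// peers delivered the transaction to us.
type memTx struct {
	tx      []byte
	senders map[uint16]struct{} // peer IDs we received this tx from
}

// shouldSendTo returns false only for peers that sent us the transaction.
// Nothing is recorded about peers we have already sent it to, so a
// repeated pass over the list could resend the same transaction.
func (m *memTx) shouldSendTo(peerID uint16) bool {
	_, fromPeer := m.senders[peerID]
	return !fromPeer
}

func main() {
	m := &memTx{tx: []byte("tx"), senders: map[uint16]struct{}{1: {}}}
	fmt.Println(m.shouldSendTo(1), m.shouldSendTo(2)) // prints false true
}
```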
Beyond ensuring that transactions are rebroadcast to peers less frequently, there is no simple scheme to communicate fewer transactions to peers. Peers cannot communicate which transactions they need since they do not know which transactions exist on the network.
Here they could communicate what they have, for example in a Bloom filter, and let the other nodes figure out what they don't have and must be sent. Bloom filters could be reset at each new height or using some aging scheme.
Bloom filter size could be a consensus parameter, adjusted whenever the filter is deemed too full or too empty. A bad size choice will use more bandwidth to transmit an almost empty filter (too big) or slow the propagation of transactions due to false positives (too small), until a reconfiguration happens.
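A minimal Go sketch of this idea, assuming a plain Bloom filter with double hashing derived from FNV-1a. The `Bloom` type, its size and hash count, and `TxsToSend` are hypothetical; a real design would pack the bits and take the size from a consensus parameter as suggested above.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// Bloom is a small Bloom filter with k probes per item.
type Bloom struct {
	bits []bool
	k    int
}

func NewBloom(m, k int) *Bloom { return &Bloom{bits: make([]bool, m), k: k} }

// positions derives k bit positions from one 64-bit FNV-1a hash
// using double hashing: h1 + i*h2 mod m.
func (b *Bloom) positions(item []byte) []int {
	h := fnv.New64a()
	h.Write(item)
	sum := h.Sum64()
	h1, h2 := uint64(uint32(sum)), uint64(uint32(sum>>32))
	pos := make([]int, b.k)
	for i := range pos {
		pos[i] = int((h1 + uint64(i)*h2) % uint64(len(b.bits)))
	}
	return pos
}

func (b *Bloom) Add(item []byte) {
	for _, p := range b.positions(item) {
		b.bits[p] = true
	}
}

// MayContain never returns false for an added item, but may return true
// for an item that was never added (a false positive).
func (b *Bloom) MayContain(item []byte) bool {
	for _, p := range b.positions(item) {
		if !b.bits[p] {
			return false
		}
	}
	return true
}

// Reset clears the filter, e.g. at each new height.
func (b *Bloom) Reset() {
	for i := range b.bits {
		b.bits[i] = false
	}
}

// TxsToSend returns the mempool txs the peer's filter does not report.
func TxsToSend(mempool [][]byte, peerHas *Bloom) [][]byte {
	var out [][]byte
	for _, tx := range mempool {
		if !peerHas.MayContain(tx) {
			out = append(out, tx)
		}
	}
	return out
}

func main() {
	peer := NewBloom(1024, 3)
	peer.Add([]byte("tx-a"))
	out := TxsToSend([][]byte{[]byte("tx-a"), []byte("tx-b")}, peer)
	fmt.Println(len(out), "tx(s) to send")
}
```

A false positive in `MayContain` means a transaction is withheld from a peer that actually lacks it, which is the propagation slowdown mentioned above for filters that are too small.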
Block, vote, and mempool gossiping transmit much of the same data. The mempool reactor gossips candidate transactions to each peer. The consensus reactor, when gossiping the votes, sends vote metadata and the digital signature that signs over that metadata. Finally, when a block is proposed, the proposing node amalgamates the received votes and a set of transactions, and adds a header to produce the block. This block is then serialized and gossiped as a list of bytes. However, the data that the block contains, namely the votes and the transactions, were most likely _already transmitted to the nodes on the network_ via mempool transaction gossip and consensus vote gossip.

Therefore, block gossip can be updated to transmit a representation of the data contained in the block that assumes the peers will already have most of this data. Namely, the block gossip can be updated to only send 1) a list of transaction hashes and 2) a bit array of votes selected for the block along with the header and other required block metadata.
We could also have block parts that include transactions that were inserted during `PrepareProposal`, as these will not be in the mempool, or have the proposer add such transactions to the mempool. These cannot be enforced, though, and byzantine nodes could use this to stall rounds, but it is not different from their current ability to withhold block parts.
Co-authored-by: Sergio Mena <sergio@informal.systems>
Co-authored-by: Sergio Mena <sergio@informal.systems>
There is still an open question here regarding the total size of all transactions added to the chain during this experiment. I have not had a chance to retrieve that data. I am planning to still merge this despite the open question so it can remain in the repo for future reference.
This pull request adds a report on the major bandwidth usage within Tendermint.
PR checklist
- `CHANGELOG_PENDING.md` updated, or no changelog entry needed
- Documentation (`docs/`) and code comments updated, or no documentation updates needed