ux: block times #5911
Comments
@sunnya97 and Sikka want to do this. I thought I saw a discussion topic on this, but I can't find it... maybe it doesn't exist. In any case, the idea Sikka has is that the application defines variable block times that it submits to Tendermint.
Just thinking aloud: how would Tendermint slow down? By waiting in between committing a block and proposing a new one?
@cmwaters Yep. This will give more time for the state machine to finish executing.
You can actually see this happening now on the Kava network. Many validators are sacrificing liveness because the business logic takes longer than the expected commit timeout on some nodes, and thus their votes are not getting included in time. Having something like this would enable fairness amongst all validators to properly execute the relevant logic in time.
I'm on board with this. We could probably look into cleaning up some of the existing consensus config parameters while we are at it (#2920 (comment)).
@sunnya97 is this proposed timing change part of the ABCI++ proposal, or is it separate?
It's not AFAIK, because it requires no changes to ABCI semantics apart from data structure (i.e. proto) changes and some internal changes to Tendermint behind the ABCI boundary. Correct me if I'm wrong @sunnya97.
Yeah, this is separate from ABCI++, for the reasons @alexanderbez mentioned. It's mostly about making a Tendermint parameter hot-configurable and exposing it over ABCI, but it's not related to ABCI++'s new phases.
The question still remains of how this timeout should affect the internal consensus timeouts, in a setting where validators have very heterogeneous hardware setups / execution speeds. Something worth noting is that we can't really do anything in the case of a griefing proposer who proposes as soon as 2/3 of precommits appear. I think the simplest thing to do is to alter the relevant internal timeout and then expose a parameter for it over EndBlock. I believe the alternative is to change timeoutCommit, but it feels like that should exist to handle network delay and scale on its own with the partial synchrony bound.
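As a purely hypothetical illustration of the "expose it over EndBlock" idea, here is a minimal Go sketch; the ResponseEndBlock stand-in and its NextBlockDelay field are invented for this example and do not exist in Tendermint's ABCI types.

```go
package main

import (
	"fmt"
	"time"
)

// ResponseEndBlock is a stand-in for an ABCI EndBlock response, extended
// with a hypothetical field that lets the application ask consensus to wait
// before the next block. Tendermint's real ABCI types carry no such field.
type ResponseEndBlock struct {
	NextBlockDelay time.Duration // hypothetical knob, name invented here
}

// endBlock sketches an application that requests extra time only at epoch
// heights, where expensive end-of-epoch logic runs.
func endBlock(height, epochLength int64) ResponseEndBlock {
	resp := ResponseEndBlock{}
	if height%epochLength == 0 {
		// Give validators more time to finish executing the epoch block
		// before the next proposal is made.
		resp.NextBlockDelay = 2 * time.Minute
	}
	return resp
}

func main() {
	fmt.Println(endBlock(100, 100)) // epoch height: extra delay requested
	fmt.Println(endBlock(101, 100)) // normal height: no delay
}
```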
In thinking through this feature, I think we would be able to get rid of create-empty-blocks-interval. The reason I see this within the same basket is that longer block times are effectively what that setting provides.
I guess the difference, though, is that variable block times statically set the block time of the block at the following height, whereas create-empty-blocks-interval only delays block production when there are no transactions.
I'm not sure I follow @marbar3778. You can still create empty blocks with this feature.
Create empty blocks basically says: don't produce blocks within this duration unless there are txs. This duration is set in the config.toml:

```toml
# EmptyBlocks mode and possible interval between empty blocks
create-empty-blocks = {{ .Consensus.CreateEmptyBlocks }}
create-empty-blocks-interval = "{{ .Consensus.CreateEmptyBlocksInterval }}"
```

In my head the two are trying to solve the same problem.
I don't understand. Creating a variable block time via the response in EndBlock wouldn't replace the empty-blocks settings.
Correct. But this value is local to each node and not global.
Some conversation about application-defined block times was brought up in relation to the proposer-based timestamp work; I'm just relaying Josef's words here for future reference.
I think this largely concurs with what @ValarDragon has already mentioned above.
It would be nice to have this issue solved in the next major release of Tendermint. This would allow applications to do more computation at the end of epochs.
The @celestiaorg team also needs this feature. I'm happy to help with this. We might try to implement this in our fork first and then upstream it. Is there a good overview of the open questions and what needs further investigation?
Awesome. tbh, I expect the first order of business here is collecting those open questions and doing the investigation.
The big issue we run into within Osmosis is that whenever we have a long block, the entire p2p network disconnects and has to begin reforming after the block goes out. This causes significant validator outages / issues with getting the p2p network to reconnect after each such long block. This problem for Osmosis would probably also be solved by having some sort of keep-alive system in the p2p layer, so the p2p connections don't disconnect during a long block execution.
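Purely as a sketch of the keep-alive idea mentioned above (this is not Tendermint's p2p code; the Conn interface and ping call are invented stand-ins), one way such a loop could look:

```go
package p2p

import "time"

// Conn is a stand-in for a peer connection that can send small control
// messages independently of consensus traffic.
type Conn interface {
	SendPing() error
	Close() error
}

// keepAlive pings a peer on a fixed interval so the connection is not
// treated as dead while the local node is busy executing a long block.
// It stops when the done channel is closed or the peer stops responding.
func keepAlive(c Conn, interval time.Duration, done <-chan struct{}) {
	t := time.NewTicker(interval)
	defer t.Stop()
	for {
		select {
		case <-t.C:
			if err := c.SendPing(); err != nil {
				// The peer is genuinely unreachable; let the caller clean up.
				c.Close()
				return
			}
		case <-done:
			return
		}
	}
}
```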
This is interesting to hear @ValarDragon. Execution of a block just holds up the consensus reactor as it waits for the application to finish, but all the other reactors using the p2p network should still be working (i.e. the mempool should still be receiving/gossiping transactions). How often does this happen?
This happens every day without fail at the epoch time, where we have a block whose EndBlock takes ~4-5 minutes of computation. Validators have checked this repeatedly, with plots of their number of peers against time, concluding that the peers are actually disconnecting.
@cmwaters @ValarDragon It seems possible that what's happening in the Osmosis case is related to this documented behavior of ABCI: https://github.com/tendermint/spec/blob/master/spec/abci/apps.md#commit.
I'm wondering if simultaneous execution in ABCI++ may obviate that specific issue.
I think I'd like a little bit more clarity on what this specific GitHub issue is asking for and why. It definitely seems interesting, and I think the implementation would be reasonably straightforward (famous last words), but it adds complexity, and all changes to the consensus engine incur some amount of risk, so I think it's worth being sure why we're doing it before proceeding.

If I'm understanding correctly, the main reason for this change is that some networks have variable commit times and that commit times sometimes grow much longer at certain heights pre-agreed upon by all validators on the network. E.g. in the Osmosis example, at the 'Epoch', the application may wish to tell the consensus engine to use a longer timeout for that height.

I'm not sure that it makes sense to do this feature just for networks where some validators run on slow hardware. It seems like configuring a longer static timeout would cover that case.

@liamsi, would you be able to provide some input into the Celestia use case? i.e. what mostly motivates the need from @celestiaorg.

@ValarDragon, what would ensure that the different nodes actually return the same value in EndBlock?
Sure, in our case we just wanted to globally set longer (somewhat reliable) block intervals (e.g. let's say we only want blocks to be produced about every 30 seconds instead of as fast as possible).
Thanks @liamsi for the response. It sounds like what you're trying to accomplish should be able to be done with the existing consensus config settings.
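For reference, a minimal config.toml sketch of that kind of setup, assuming the create-empty-blocks settings shown earlier in the thread; the 30-second value is just the example from the Celestia use case, and the exact keys and their semantics should be checked against the Tendermint version in use:

```toml
[consensus]
# Keep producing empty blocks, but only about every 30 seconds while the
# chain is idle; transactions still trigger blocks as soon as they arrive.
create-empty-blocks = true
create-empty-blocks-interval = "30s"
```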
Thanks @williambanfield and @cmwaters 👍🏼 I can confirm this works as expected. Tested with two nodes.
The goal in my view is that when you have an occasionally long block, you have consensus allot time for this, so you don't end up in a situation where:
The first one is only bad because of #3044 (but this is a serious issue). FWIW: this is not at all high priority to me. It's helpful for full nodes to work properly in chains with long execution times.
Hey @ValarDragon, I am proposing a related change and would appreciate any feedback there if you think that this solution may be capable of solving the epoch block-time problem.
Summary
It's unclear how to change block times. Even if it were explained well, it would still be confusing because the setting is local to each node and not global.
Problem Definition
There is no global way to change block times. On top of this, it is unclear how to do it, and it can cause issues for validators if they forget to change the setting.
Proposal
Allow the app to set a block-time goal. If it's a larger block time, then Tendermint slows down; if it's a lower time, then Tendermint does its best to get to that time. Ideally this would be a consensus param.
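A hypothetical sketch of what such a consensus param could look like; the TimeGoal field and this simplified struct are invented for illustration and are not part of Tendermint:

```go
package types

import "time"

// BlockParams is a simplified stand-in for Tendermint's block-related
// consensus params, extended with a hypothetical block-time goal.
type BlockParams struct {
	MaxBytes int64
	MaxGas   int64
	// TimeGoal is the target interval between consecutive blocks. If
	// execution finishes early, consensus would wait out the remainder;
	// if execution already exceeds the goal, consensus proceeds as fast
	// as it can, so the goal is best-effort rather than a hard bound.
	TimeGoal time.Duration
}
```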
Open Question
Should the app be allowed to change the block time on a live network?