
ux: block times #5911

Closed
2 of 4 tasks
tac0turtle opened this issue Jan 15, 2021 · 29 comments
Labels
S:proposal (Status: Proposal) · stale (for use by stalebot) · T:ux (Type: Issue or Pull Request related to developer experience)
Milestone: Tendermint 1.0
Comments

@tac0turtle (Contributor) commented Jan 15, 2021

Summary

It's unclear how to change block times. Even if it were well explained, it would still be confusing, because the setting is local to each node rather than global.

Problem Definition

There is no global way to change block times. On top of this, it is unclear how to do it, and forgetting to change the setting can cause issues for validators.

Proposal

Allow the app to set a block-time goal. If the goal is a larger block time, Tendermint slows down; if it is a smaller one, Tendermint does its best to reach it. Ideally this would be a consensus param.

Open Question

Should the app be allowed to change the block time on a live network?


For Admin Use

  • Not duplicate issue
  • Appropriate labels applied
  • Appropriate contributors tagged
  • Contributor assigned/self-assigned
@tac0turtle added the S:proposal and T:ux labels on Jan 15, 2021
@tac0turtle added this to Untriaged 🥚 in the Tendermint Core Project Board via automation on Jan 15, 2021
@alexanderbez (Contributor)

@sunnya97 and Sikka want to do this. I thought I saw a discussion topic on this, but I can't find it; maybe it doesn't exist. In any case, the idea Sikka has is that the application defines variable block times that it submits to Tendermint at EndBlock. This could be useful for many reasons, e.g. needing to run expensive operations every epoch, where block times are expected to be significantly longer than the norm.
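For concreteness, one possible shape for that EndBlock plumbing, as a sketch only: the existing fields are elided, and the new field's name and number are made up here, not part of the actual ABCI definition.

```proto
syntax = "proto3";
package tendermint.abci;

import "google/protobuf/duration.proto";

// Hypothetical sketch of Sikka's idea: the application returns a target
// delay for the next block from EndBlock.
message ResponseEndBlock {
  // ... existing fields (validator_updates, consensus_param_updates,
  // events) elided ...

  // Proposed: minimum time Tendermint should wait before starting the
  // next height. A zero duration means "as fast as possible".
  google.protobuf.Duration next_block_delay = 4;
}
```

Tendermint would read this value after executing block N and apply it when scheduling block N+1.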

@cmwaters (Contributor)

Just thinking aloud: how would Tendermint slow down? By waiting between committing a block and proposing a new one?

@sunnya97 (Contributor)

@cmwaters Yep. This will give more time for the state machine to finish executing.

@alexanderbez (Contributor)

You can actually see this happening now on the Kava network. Many validators are sacrificing liveness because the business logic takes longer than the expected commit timeout on some nodes, and thus their votes are not getting included in time.

Having something like this would give all validators a fair chance to properly execute the relevant logic in time.

@cmwaters (Contributor)

I'm on board with this. We could probably look into cleaning up some of the existing consensus config parameters while we are at it (#2920 (comment))

@tessr (Contributor) commented Jan 22, 2021

@sunnya97 is this proposed timing change part of the ABCI++ proposal, or is it separate?

@tessr moved this from Untriaged 🥚 to Needs Design / Spec / Discussion 🕴 in the Tendermint Core Project Board on Jan 22, 2021
@tessr added this to the Tendermint 1.0 milestone on Jan 22, 2021
@alexanderbez (Contributor)

It's not, AFAIK, because it requires no changes to ABCI semantics apart from data-structure (i.e. proto) changes and some internal changes to Tendermint behind the ABCI boundary. Correct me if I'm wrong, @sunnya97.

@sunnya97 (Contributor)

Yeah, this is separate from ABCI++, for the reasons @alexanderbez mentioned. It's mostly about making a Tendermint parameter hot-configurable and exposing it over ABCI; it's not related to the new phases that ABCI++ introduces.

@ValarDragon (Contributor) commented Jan 29, 2021

The question still remains of how this timeout should affect the internal consensus timeouts in a setting where validators have very heterogeneous hardware setups / execution speeds.

Something worth noting is that we can't really do anything about a griefing proposer who proposes as soon as 2/3rds of the precommits appear.

I think the simplest thing to do is to alter the `StartTime` expression in the NewHeight consensus step (https://github.com/tendermint/spec/blob/master/spec/consensus/consensus.md#newheight-step-heighth).
Currently, `StartTime = CommitTime + timeoutCommit`. Instead, I think this should be `StartTime = CommitTime + timeoutCommit + expected_block_execution_time`.

Then we can expose a parameter `next_expected_block_execution_time` over EndBlock: when block N finishes executing, the app tells Tendermint how long the next block is expected to take to execute. Chains that wish to prioritize consensus time can always set this to 0.
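As a rough sketch of that `StartTime` change (illustrative Go, not actual Tendermint code; `nextExpectedExecTime` stands in for the proposed `next_expected_block_execution_time`):

```go
package consensus

import "time"

// scheduleNewHeight computes when a node leaves the NewHeight step.
//
// Today (per the consensus spec): StartTime = CommitTime + timeoutCommit.
// Proposed: additionally wait for the app-reported expected execution
// time of the next block, as returned via EndBlock.
func scheduleNewHeight(commitTime time.Time, timeoutCommit, nextExpectedExecTime time.Duration) time.Time {
	return commitTime.Add(timeoutCommit).Add(nextExpectedExecTime)
}
```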

I believe the alternative is to change `timeoutCommit`, but it feels like that parameter should exist to handle network delay and scale on its own with the partial-synchrony bound.

@tac0turtle (Contributor, Author)

In thinking through this feature, I think we would be able to get rid of the create_empty_blocks flag. The flow would change to the app saying "I want a long block time unless transactions build up."

The reason I put these in the same basket is that longer block times are effectively what create_empty_blocks provides when there are no transactions.

@cmwaters (Contributor)

I guess the difference, though, is that variable block times statically set the block time of the block at the following height, whereas create_empty_blocks = true makes the block time during the current height dynamic, right?

@alexanderbez (Contributor)

I'm not sure I follow, @marbar3778. You can still create empty blocks with this feature. I believe create_empty_blocks waits until there are txs (or the interval elapses), whereas with this feature the application doesn't know ahead of time whether there will be any; if it sends a large enough duration, we could still see empty blocks without txs.

@tac0turtle (Contributor, Author)

> You can still create empty blocks with this feature. I believe create_empty_blocks waits until there are txs (or the interval elapses), whereas with this feature the application doesn't know ahead of time whether there will be any; if it sends a large enough duration, we could still see empty blocks without txs.

Create empty blocks basically says: don't produce blocks within this duration unless there are txs. The duration is set in config.toml.

```toml
# EmptyBlocks mode and possible interval between empty blocks
create-empty-blocks = {{ .Consensus.CreateEmptyBlocks }}
create-empty-blocks-interval = "{{ .Consensus.CreateEmptyBlocksInterval }}"
```

In my head, create-empty-blocks becomes a consensus param, while the interval becomes the variable returned in EndBlock. This replicates the current functionality but makes it a global config instead of a local one, and it further trims config.toml.

@alexanderbez (Contributor)

I don't understand. Creating a variable block time via the response in EndBlock doesn't give you the same functionality as create-empty-blocks=false.

@tac0turtle (Contributor, Author)

> I don't understand. Creating a variable block time via the response in EndBlock doesn't give you the same functionality as create-empty-blocks=false.

Correct. But this value, create-empty-blocks-interval, is essentially the variable block time in EndBlock.

@cmwaters (Contributor)

Some conversation about application-defined block times was brought up in relation to the proposer-based timestamp work. I'm just relaying Josef's words here for future reference:

> I guess the cleanest way to do this is that the proposer waits until the BLOCKDELAY has expired before sending the PROPOSE message.
>
> In spec#265 (comment) I have estimated how TimeoutPropose needs to be computed to ensure monotonic block times, that is
>
> `block(i+1).Time > block(i).Time`
>
> This would need to be generalized to
>
> `block(i+1).Time > block(i).Time + BLOCKDELAY`
>
> That is, to compute T2, where `T2 = blocktime + 2 * ACCURACY + MSGDELAY - now_p`, we would need to add BLOCKDELAY.

I think this largely concurs with what @ValarDragon has already mentioned above. BLOCKDELAY would simply be read from the application in EndBlock (or FinalizeBlock).
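To make the arithmetic concrete, here is a sketch of the generalized waiting rule (hypothetical names following the quote above; not actual Tendermint code):

```go
package consensus

import "time"

// proposeWait returns how long a correct proposer should wait before
// sending PROPOSE so that block(i+1).Time > block(i).Time + BLOCKDELAY
// holds despite clock inaccuracy (ACCURACY) and message delay (MSGDELAY).
func proposeWait(prevBlockTime time.Time, accuracy, msgDelay, blockDelay time.Duration, now time.Time) time.Duration {
	// T2 = blocktime + 2*ACCURACY + MSGDELAY + BLOCKDELAY - now_p
	wait := prevBlockTime.Add(2 * accuracy).Add(msgDelay).Add(blockDelay).Sub(now)
	if wait < 0 {
		return 0
	}
	return wait
}
```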

@tac0turtle (Contributor, Author)

It would be nice to have this issue solved in the next major release of Tendermint. It would allow applications to do more computation at the end of epochs.

@liamsi (Contributor) commented Aug 16, 2021

The @celestiaorg team also needs this feature, and I'm happy to help with it. We might try to implement it in our fork first and then upstream it. Is there a good overview of the open questions and what needs further investigation?

@tessr (Contributor) commented Aug 16, 2021

Awesome. TBH, I expect the first order of business here is collecting those open questions and doing the investigation.

@ValarDragon (Contributor)

The big issue we run into with Osmosis is that whenever we have a long block, the entire p2p network disconnects and has to start reforming after the block goes out. This causes significant validator outages and issues getting the p2p network to reconnect after each such long block.

This problem for Osmosis would probably also be solved by some sort of keep-alive system in the p2p layer, so that p2p connections don't drop during a long block execution.

@cmwaters (Contributor)

This is interesting to hear, @ValarDragon. Executing a block just holds up the consensus reactor while it waits for the application to finish; all the other reactors using the p2p network should still be working (i.e. the mempool should still be receiving/gossiping transactions). How often does this happen?

@ValarDragon (Contributor) commented Aug 31, 2021

This happens every day, without fail, at the epoch time, where we have a block whose EndBlock takes ~4-5 minutes of computation. Validators have checked this repeatedly, plotting their peer counts against time, and concluded that peers are actually disconnecting.

@williambanfield (Contributor)

@cmwaters @ValarDragon It seems possible that what's happening in the Osmosis case is related to this documented behavior of ABCI: https://github.com/tendermint/spec/blob/master/spec/abci/apps.md#commit.

> Application state should only be persisted to disk during Commit.
>
> Before Commit is called, Tendermint locks and flushes the mempool so that no new messages will be received on the mempool connection. This provides an opportunity to safely update all four connection states to the latest committed state at once.
>
> When Commit completes, it unlocks the mempool.

I'm wondering if simultaneous execution in ABCI++ may obviate that specific issue.
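A minimal sketch of the locking behavior that excerpt describes (illustrative only, not Tendermint's actual code), which would explain why a 4-5 minute Commit stalls tx intake:

```go
package abci

import "sync"

// Application is a minimal stand-in for the app's Commit handler.
type Application interface {
	Commit()
}

// Mempool is a minimal stand-in for Tendermint's mempool.
type Mempool struct{ mtx sync.Mutex }

func (m *Mempool) Lock()   { m.mtx.Lock() }
func (m *Mempool) Unlock() { m.mtx.Unlock() }

// commitBlock locks the mempool before Commit and unlocks it only after
// Commit returns, so a long-running Commit blocks new txs for its full
// duration.
func commitBlock(mempool *Mempool, app Application) {
	mempool.Lock()
	defer mempool.Unlock()
	app.Commit() // app persists state; the mempool lock is held throughout
}
```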

@williambanfield (Contributor) commented Nov 12, 2021

I think I'd like a bit more clarity on what this specific GitHub issue is asking for, and why. It definitely seems interesting, and I think the implementation would be reasonably straightforward (famous last words), but it adds complexity, and all changes to the consensus engine incur some amount of risk, so it's worth being sure why we're doing it before proceeding.

If I'm understanding correctly, the main motivation is that some networks have variable commit times, and commit times sometimes grow much longer at certain heights pre-agreed upon by all validators on the network. E.g., in the Osmosis example, at the 'epoch', the application may wish to tell the consensus engine to use a longer timeout-commit because the application is expected to be processing state changes for a longer time.

I'm not sure it makes sense to build this feature just for networks where some validators run on slow hardware; configuring a longer static timeout-commit seems like the right solve for that. Perhaps implementing this feature has the upside of putting timeout-commit within the purview of the app, making it harder for validators to alter it when they want the network to move more quickly. That seems like a somewhat weak justification, but maybe I'm wrong. For networks with known periods of slowdown, epochs for example, this feature seems to make more sense.

@liamsi, would you be able to provide some input on the Celestia use case? I.e., what mostly motivates the need from @celestiaorg.

@ValarDragon What would ensure that the different nodes actually return the same value in EndBlock? Or, in your view, does this not really matter?

@liamsi (Contributor) commented Nov 12, 2021

> @liamsi, would you be able to provide some input on the Celestia use case? I.e., what mostly motivates the need from @celestiaorg.

Sure. In our case we just wanted to globally set longer (somewhat reliable) block intervals, e.g. say we only want blocks to be produced about every 30 seconds instead of as fast as possible.
After chatting with @cmwaters, this should already be possible by tweaking TimeoutCommit on the Tendermint side alone, and that should be sufficient for our use case. I still don't fully understand how a timeout param alone would fix this, but I trust Callum on this. Maybe the node just needs to wait before committing if the block would otherwise be committed too fast, and that is all. I will look into it in the next two weeks and report back if it worked.

@williambanfield (Contributor)

Thanks @liamsi for the response. It sounds like what you're trying to accomplish can be done with timeout-commit and skip-timeout-commit=false. This ensures that all validators with the param configured wait for the full timeout-commit duration. I'm interested to see how that turns out for your use case.
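For reference, the relevant config.toml settings might look like the sketch below (key names follow the dashed style used elsewhere in this thread; check your Tendermint version, as some releases use underscores instead):

```toml
[consensus]
# Wait this long after committing a block before starting the next
# height, yielding roughly 30-second block intervals.
timeout-commit = "30s"
# Never skip the commit timeout, even once all precommits are received,
# so every validator waits the full duration.
skip-timeout-commit = false
```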

@liamsi (Contributor) commented Nov 18, 2021

Thanks @williambanfield and @cmwaters 👍🏼 I can confirm this works as expected; tested with two nodes. We should maybe add this to the docs. nvm, it's mentioned here: https://docs.tendermint.com/master/nodes/configuration.html#create-empty-blocks-true

@ValarDragon (Contributor)

The goal, in my view, is that when you occasionally have a long block, consensus allots time for it, so you don't end up in a situation where:

  • rounds have incremented several times
  • nodes proceed as soon as the 2/3rds-weight validator comes online

The first is only bad because of #3044 (but that is a serious issue).
The second is bad for time to resilient network recovery: you may not want the hardware of your full nodes, data indexers, etc. to have to be fast enough to keep up as soon as 2/3rds of the validators start proceeding.

FWIW, this is not at all high priority for me. It's mainly about helping full nodes work properly on chains with long execution times.

@williambanfield (Contributor)

Hey @ValarDragon, I am proposing moving TimeoutCommit into the consensus parameters that chains control as part of #7503.

I would appreciate any feedback there if you think this solution may be capable of solving the epoch block-time problem.
