Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rationalize the timeout parameters #7274

Closed
williambanfield opened this issue Nov 12, 2021 · 2 comments
Closed

Rationalize the timeout parameters #7274

williambanfield opened this issue Nov 12, 2021 · 2 comments
Labels
stale for use by stalebot

Comments

@williambanfield
Copy link
Contributor

williambanfield commented Nov 12, 2021

This issue proposes changing the number of timeout parameters from 7 to 3.

We currently have 7 parameters to configuring timings for tendermint consensus. They are defined here in the local config. These parameters are as follows:

	// How long we wait for a proposal block before prevoting nil
        TimeoutPropose time.Duration `mapstructure:"timeout-propose"`
 	TimeoutProposeDelta time.Duration `mapstructure:"timeout-propose-delta"`
	// How long we wait after receiving +2/3 prevotes for “anything” (ie. not a single block or nil)
	TimeoutPrevote time.Duration `mapstructure:"timeout-prevote"`
	// How much the timeout-prevote increases with each round
	TimeoutPrevoteDelta time.Duration `mapstructure:"timeout-prevote-delta"`
	// How long we wait after receiving +2/3 precommits for “anything” (ie. not a single block or nil)
	TimeoutPrecommit time.Duration `mapstructure:"timeout-precommit"`
	// How much the timeout-precommit increases with each round
	TimeoutPrecommitDelta time.Duration `mapstructure:"timeout-precommit-delta"`
	// How long we wait after committing a block, before starting on the new
	// height (this gives us a chance to receive some more precommits, even
	// though we already have +2/3).
	TimeoutCommit time.Duration `mapstructure:"timeout-commit"`

The consensus paper does not appear to require this many timeout parameters. The paper describes the following relation between timeouts:

timeoutPropose(r) > 2∆+timeoutPrecommit(r−1)
timeoutPrevote(r) > 2∆ 
timeoutPrecommit(r) > 2∆

I.e., when the algorithm has picked the correct timeouts, timeouts greater than 2∆, the algorithm will terminate.
Here, ∆ is a bound on the message delay between any two processes. I.e., it is assumed in the paper that if a process sends a message, it will be received by any other process in at most ∆ time. ∆ is not known in advance, so the algorithm tries to determine it by increasing the timeout each round. This per-round increase is specified as timeoutDelta in the paper.

It would therefore be possible to simply fix an initial assumed estimatedDelta and a timeoutDelta, and grow all of the timeouts until they reach the system's true ∆ instead of having a different configured timeout for each step. This would allow us to use only 2 timeout parameters instead of 7. We could still include a 3rd, TimeoutCommit parameter for chains that want to allow precommits to arrive from slow validators.

This has the upside of increasing the simplicity of the configuration file and of the algorithm as written in code. This is also a UX improvement for networks/operators, as keeping track of what all of these parameters do is likely confusing.

It has the downside of forcing all steps to use the same timeout. Different steps are likely to take different amounts of times. Namely, the propose step is likely to take the longest time and the precommit and prevote steps are likely to take much less time. This is because blocks are much larger and will therefore take longer to gossip across the network. We could simply let networks to use the longest possible timeout of any step as the value, but this may cause the prevote and precommit steps to slow down.

@cason
Copy link
Contributor

cason commented Nov 16, 2021

I don't know how are the defaults in the upstream version, but in the last version I used are:

`timeout_propose = "3s"
timeout_propose_delta = "500ms"
timeout_prevote = "1s"
timeout_prevote_delta = "500ms"
timeout_precommit = "1s"
timeout_precommit_delta = "500ms"`

So, the delta is already the same for all timeouts, and I am for having a single delta for all of them. The others, however, I would keep, as devops might want to optimize them to the specific scenario their validators observe. I agree that they should not be mandatory, and that for default values (when not manually set), they should follow a simple and reasonable relation with consensus parameter like Delta.

@cason
Copy link
Contributor

cason commented Nov 16, 2021

Moreover, I never really understood the duration of such distinct timeouts, but probably taking a better look on the paper would help.

From what I remember, the crucial here in terms of correctness is to make sure that votes of a round of consensus are received by all processes before their vote for a proposal in the next round of consensus. In other words, this timeout (a mix of Precommit and Propose ones) should ensure that correct processes have consistent locked or valid values, so that they will accept the same block reproposed in a round.

mergify bot pushed a commit that referenced this issue Jan 14, 2022
related to: #7274 and #7275 

Still somewhat uncertain on two things that I'd appreciate more feedback on:
1. The optional temporary local overrides. Perhaps this is superfluous and we can simply make the transition without the override?
2. If this set of parameters seems to be large enough to allow application developers to create the chains they want but not so large as to be needlessly complex.
@github-actions github-actions bot added the stale for use by stalebot label Jun 2, 2023
@github-actions github-actions bot closed this as not planned Won't fix, can't repro, duplicate, stale Jun 7, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
stale for use by stalebot
Development

No branches or pull requests

3 participants
@cason @williambanfield and others