Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC 019: Configuration File Versioning #8379

Merged
merged 4 commits into from
Apr 21, 2022
Merged

Conversation

creachadair
Copy link
Contributor

@creachadair creachadair commented Apr 20, 2022

This RFC discusses issues in how we migrate configuration data across
Tendermint versions, and some options for how to improve the experience for
node operators in the future.

{rendered}

Copy link
Contributor

@cmwaters cmwaters left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice write up!

docs/rfc/rfc-019-config-version.md Outdated Show resolved Hide resolved
Comment on lines +138 to +144
> **Discussion point:** This example presumes we would keep config files
> compatible within a given release cycle, e.g., all of v0.36.x. We could also
> use patch numbers here, if we think there's some reason to permit changes
> that would require config file edits at that granularity. I don't think we
> should, but that's a design question to consider.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO, I think it's reasonable that we should be able to provide the guarantee to users that we won't change the config file over the life of a minor release.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although this may make it difficult to backport some features that depend on a certain configuration

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think addition-only changes that come with reasonable defaults are still safe in a minor release series, which I think gives us a enough latitude.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, some changes are safe. The main things we should avoid within a release if we want to be safe include:

  1. Changing the default value of an existing key (the operator may be using a migrated old config with that key not set). It would be fine to change what value a fresh node generates for a key though.
  2. Deleting an existing key.
  3. Moving or renaming an existing key, unless we also honor the old name/location as an alias.
  4. Changing the type of a value (e.g., string to array).

But simple additions should be fine as long as we make sure the default value is safe and universal. (Counterexamples include mode where the default is safe but may change the existing behaviour)

docs/rfc/rfc-019-config-version.md Outdated Show resolved Hide resolved
that the docs are bad, but to point out that prose is not the most efficient
format to convey detailed changes like this.

### Concrete Steps
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to propose that we ship this as a sub-command within the tendermint binary (and also encourage the SDK to embed this subcommand in their own CLI handler so that chain developers don't need to ship an additional binary (or have upgrade instructions that depend on having (a specific version of) go installed or configured.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to do that, but would prefer not to for the first release, until we have had some production use to validate that it works on real configs rather than just clean initial configs.

Putting it in the TM CLI just makes it harder to experiment and fix bugs—at this stage I'd like to be able to fix things without having to release a new Tendermint. Later, once we have more confidence, I agree we ought to bake it in.

I definitely don't want the SDK trying to make a programmatic layer out of this yet.

Agreed that it's a little less convenient to set up at first, but there's no specific Go version requirement and it's easily usable with go get.

```toml
# THe minimum version of Tendermint compatible with the contents of
# this configuration file.
config-version = 'v0.35'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I kind of want this to be an integer, but I don't think it's that important

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd be fine with that. In my mental version of this conversation my model of you incorrectly predicted you would prefer a string. 😂

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like a great idea, and many configuration file formats seem to make use of this approach of specifying the config file version within the config file itself.

Although one of the drawbacks of using a simple integer version value for the config file is that you now need to maintain a compatibility matrix between config file versions and versions of Tendermint, which seems tedious to me.

Just using the major/minor Tendermint versions in a string seems more optimal and intuitive.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although one of the drawbacks of using a simple integer version value for the config file is that you now need to maintain a compatibility matrix between config file versions and versions of Tendermint, which seems tedious to me.

Yeah, I was thinking more of just encoding the version in the integer, e.g., fmt.Sprintf("1%03d%03d", major, minor), so v0.35 == "1000035" or words to that effect.

Just using the major/minor Tendermint versions in a string seems more optimal and intuitive.

That was my original thought, but I think either works. There are good libraries for parsing semver strings.

Comment on lines +138 to +144
> **Discussion point:** This example presumes we would keep config files
> compatible within a given release cycle, e.g., all of v0.36.x. We could also
> use patch numbers here, if we think there's some reason to permit changes
> that would require config file edits at that granularity. I don't think we
> should, but that's a design question to consider.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think addition-only changes that come with reasonable defaults are still safe in a minor release series, which I think gives us a enough latitude.

@creachadair creachadair force-pushed the mjf/rfc-conversion branch 2 times, most recently from 7fbc83f to 84e701a Compare April 20, 2022 18:45
@williambanfield
Copy link
Contributor

I think that this is a good idea and a way of easing operator pain as we continue updating the configuration. I do wonder if, the fact that we thrash the contents of the config file so often, is an indication that we are providing too many knobs or knobs at the wrong level?

@thanethomson
Copy link
Contributor

I do wonder if, the fact that we thrash the contents of the config file so often, is an indication that we are providing too many knobs or knobs at the wrong level?

Is this not perhaps simply a product of the new features/functionality being rolled out? It's also, I imagine, a product of operator need. From what I remember, some parameters like those introduced in #7230 came about specifically because it's easier for operators to tune those parameters than us making changes to hard-coded defaults and performing new releases.

@creachadair
Copy link
Contributor Author

I think that this is a good idea and a way of easing operator pain as we continue updating the configuration. I do wonder if, the fact that we thrash the contents of the config file so often, is an indication that we are providing too many knobs or knobs at the wrong level?

I wrote and threw away a digression on that exact point several times now, and decided to leave it out. I think I agree, that it's not a great idea to keep making radical and incompatible changes to the config. For the 1.0 series, we should probably consider making stronger promises.

On the other hand, the config file is a practical way to deal with a lot of the ambiguity we've seen about user needs vs. design options. Ideally if we had better signal about which user requests are actually impactful, we could choose less-disruptively up front. But I think a lot of those product questions are likely to remain ambiguous because of how indirectly our end users experience what Tendermint does.

@creachadair
Copy link
Contributor Author

Is this not perhaps simply a product of the new features/functionality being rolled out? It's also, I imagine, a product of operator need. From what I remember, some parameters like those introduced in #7230 came about specifically because it's easier for operators to tune those parameters than us making changes to hard-coded defaults and performing new releases.

It's true that there are some knobs we probably always want to have. I don't think knobs are per se a problem. Rather, it's the churn that causes me concern.

A change like "renaming all the keys from snake_case to kebab-case" is an example of what was essentially valueless churn. Even though I prefer the new spellings, I don't think the disruption was worth it, and I think we should avoid those kinds of changes in the future even if we regret the aesthetics. ("No wait, let's switch to camelCase!" 🤯)

Temporary config settings are also probably OK if they ease migration (that's the purpose behind experimental-disable-websocket for example). There, I think we can signal transience with tricks like experimental-* prefixes, or using special sections for migration settings, or whatever. That also makes it easier to write tools to clean them up when they go out of scope.

Other changes reflect our learning experience: One could argue that putting the priv-validator-* settings at the top-level wasn't a great choice, but it wasn't terrible: The presence of validator keys is a fundamental property of the node, so you could see an argument for their globalness. In retrospect I think a subsection is better, and it's probably better for maintenance that we ripped that bandage off, but it was definitely a bit churny.

What I think we might want is a set of heuristics (or even rubrics) for what kinds of config changes we are OK with. Maybe I'm talking myself back into adding the discussion on that topic that I dropped when I first drafted this 👀

@creachadair
Copy link
Contributor Author

creachadair commented Apr 20, 2022

Motivated by some of the comments above, I dug out and expanded some of the thoughts I omitted from the first draft, about configuration settings in the design process. Anyone who's interested in that topic, PTAL.

@creachadair creachadair force-pushed the mjf/rfc-conversion branch 7 times, most recently from bbdb910 to 75ed385 Compare April 21, 2022 02:58
@creachadair creachadair force-pushed the mjf/rfc-conversion branch 14 times, most recently from e5b263c to 9af1817 Compare April 21, 2022 03:18
M. J. Fromberger added 4 commits April 21, 2022 07:03
This RFC discusses issues in how we migrate configuration data across
Tendermint versions, and some options for how to improve the experience for
node operators in the future.
@creachadair creachadair merged commit ef31470 into master Apr 21, 2022
@creachadair creachadair deleted the mjf/rfc-conversion branch April 21, 2022 17:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants