Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

config: set names automatically #9255

Merged
merged 9 commits into from
Oct 27, 2023

Conversation

Serpentian
Copy link
Contributor

@Serpentian Serpentian commented Oct 12, 2023

Notion link

We are trying to achieve the following behavior with this patchset:

  1. The behavior of box.cfg{} remains unchanged.

  2. lua/config, when there is a snapshot from which we are recovering:

    2.1 Before the first box.cfg{}, we access the .snap file using the xlog module and check the presence of all the names provided (replicaset, all replicas). If the own name or the replicaset name in the snapshot is not empty but different from the one provided, we raise an error.

    2.2 If any of the names are missing, we require that the UUID for that replica be provided in the configuration to match the name and UUID.

    2.3 The first (or first few) box.cfg{} calls are made without specifying the replicaset/replica name, if these specific names were not in the snapshot.

    2.4 We trigger an alert that the desired names do not match reality (if the name on that specific replica does not match).

    2.5 If the schema version is less than 3.0.0, we wait for a schema upgrade using a trigger.

    2.6 If we are not in read-write mode, we wait for read-write using box.ctl.wait_rw().

    2.7 Once read-write is achieved, we set all missing names from the configuration.

    2.8 On replicas: in the trigger, we check if the names arrived via replication. As soon as the corresponding name is applied, we call box.cfg{} with the actual instance/replicaset name, cancel any pending actions (schema version, read-write), and turn off the alert.

    2.9 In the documentation (if there is a section on changing names): after changing the name, a snapshot is mandatory.

  3. lua/config, when there are no snapshots (this is bootstrapping): In the very first box.cfg{}, we provide all names (instance, replicaset). No additional actions are taken.

Closes #8978

@Serpentian Serpentian force-pushed the gh-8978-upgrade branch 3 times, most recently from b2e52bc to fcd730d Compare October 12, 2023 14:13
@Serpentian Serpentian changed the title Set names automatically in config config: set names automatically Oct 12, 2023
@coveralls
Copy link

coveralls commented Oct 12, 2023

Coverage Status

coverage: 86.313% (-0.01%) from 86.327% when pulling 64f9087 on Serpentian:gh-8978-upgrade into e72e57b
on tarantool:master
.

@Serpentian Serpentian marked this pull request as ready for review October 12, 2023 14:31
@Serpentian Serpentian requested review from Totktonada and a team as code owners October 12, 2023 14:31
Copy link
Contributor

@p7nov p7nov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor changelog fixes.

changelogs/unreleased/lua-xlog-meta.md Outdated Show resolved Hide resolved
Copy link
Collaborator

@sergepetrenko sergepetrenko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the patch!
Good job.

Please see my comments below.

src/box/lua/xlog.c Outdated Show resolved Hide resolved
src/box/lua/config/configdata.lua Outdated Show resolved Hide resolved
src/box/lua/config/configdata.lua Outdated Show resolved Hide resolved
src/box/lua/config/configdata.lua Outdated Show resolved Hide resolved
src/box/lua/config/configdata.lua Outdated Show resolved Hide resolved
src/box/lua/config/applier/box_cfg.lua Outdated Show resolved Hide resolved
src/box/lua/config/applier/box_cfg.lua Outdated Show resolved Hide resolved
@sergepetrenko sergepetrenko removed their assignment Oct 16, 2023
@Serpentian Serpentian force-pushed the gh-8978-upgrade branch 3 times, most recently from d882fde to 5ec42cf Compare October 25, 2023 14:33
Copy link
Collaborator

@sergepetrenko sergepetrenko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this!

LGTM.

@sergepetrenko sergepetrenko removed their assignment Oct 25, 2023
Copy link
Collaborator

@Gerold103 Gerold103 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know almost nothing about the config module. But nonetheless you can find below some generic comments. I will approve the PR now because don't know if I will be at my PC until Monday to see any fixes.

src/box/lua/config/configdata.lua Show resolved Hide resolved
src/box/lua/config/configdata.lua Show resolved Hide resolved
src/box/lua/config/configdata.lua Outdated Show resolved Hide resolved
src/box/lua/config/applier/box_cfg.lua Outdated Show resolved Hide resolved
src/box/lua/config/applier/box_cfg.lua Show resolved Hide resolved
Config's box.cfg applier scans snapshot_dir in order to find out,
whether recovery is going to be done. It's needed in order to determine,
whether the instance should be started into ro mode firstly.

Let's move info about snapshot into separate file in utils. The commit
also introduces snapshot_path, which will be used in the following
commits in order to validate names.

Needed for tarantool#8978

NO_DOC=refactoring
NO_TEST=refactoring
NO_CHANGELOG=refactoring
This commit introduces the new method for xlog module: xlog.meta().
It opens an xlog file, reads and returns the meta block of the file,
which includes its filetype, instance_uuid and vclocks.

It's needed in order to introduce checking of names inside the config
module in the following commit.

Needed for tarantool#8978

@TarantoolBot document
Title: xlog.meta(file-name) method

Description: Open an xlog file, and return its meta block.
Possible errors: Failed to open a file or it does not contain properly
formatted meta block.

Example:

```lua
tarantool> xlog = require('xlog')
---
...

tarantool> xlog.meta('00000000000000000000.snap')
---
- filetype: SNAP
  prev_vclock: {}
  instance_uuid: 87b2e60f-275c-4efa-9b0e-e9562e309692
  vclock: {}
...
```
box.schema has a number of constants, e.g. IDs of system spaces,
which may be useful for the user. Let's allow to access box.schema
before box.cfg is called.

It is used in checking names, as we need to know, which snapshot spaces
to scan.

Needed for tarantool#8978

NO_DOC=minor change
@Serpentian Serpentian force-pushed the gh-8978-upgrade branch 2 times, most recently from eddc275 to 216c037 Compare October 26, 2023 20:30
Currently only instance_uuid is validated before recovery process.
All names and replicaset_uuid are checked only when recovery is done,
which can take a long time. It can be frustrating to users, which
have been waiting for several hours only to get name mismatch error.

Let's read the small part of snapshot file before calling box.cfg
in order to figure out, whether the names and uuids, passed to
configuration match the ones, saved inside the snapshot.

During config reload there's no sense in reading snapshot file, as
data is already saved inside spaces, let's read them. We still check
that names in config and names in spaces don't contradict during
config reload.

This commit also introduces methods, for getting names, which are
not set in snap (or memory), this'll be used in consequent commits
to set names automatically.

Needed for tarantool#8978

NO_DOC=tarantool/doc#3661
For now it's impossible to drop created alert in any way except
manual searching for _alerts table. However, we need to drop alerts
on missing names, when the names are set.

Let's introduce simple key-value alerts in order to easily drop them
by key.

Needed for tarantool#8978

NO_DOC=internal
NO_TEST=following commit
NO_CHANGELOG=internal
When the name is manually set on master by replace in _cluster space,
calling box.cfg on replica with the same name causes its hang. The
problem is the fact, that resubscribe is initiated and waiting for
APPLIER_REGISTERED status is started. As applier knows, that no
registration should be done, this never happens.

Let's don't initiate registration, when instance name is already set.

Needed for tarantool#8978

NO_DOC=bugfix
NO_CHANGELOG=not released yet
Copy link
Member

@Totktonada Totktonada left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you! That's the large work.

The changes look correct for me.

I'm sad, to be honest, that we need all this triggers+watcher code just to react on certain events with certain actions under certain conditions ('if a tuple in space X is updated and the database schema is Y and the instance is in the RW mode and there was an unhandled config reload, then ...'). I hope, at some time, we'll revisit it with idea of some database event framework that will not be so complicated to use. At least, it would be nice to write such code in a more declarative way.

I would, maybe, simplify it further by eliminating deletion of the triggers -- we can just mark them as some kind of system triggers to hide from a user API and assume that the needed actions are performed always at the given conditions.

Anyway, all these ideas are to consider in a future and they don't block this patchset.

For now it's impossible to use config module in order to recover
from snaps, which don't have names set in them. Calling box.cfg
with names on recovery fails with MISMATCH error, which is caused
by difficult implementation of setting names on first box.cfg,
as names can be set only on rw instance.

This commit doesn't call box.cfg with names, if these names are
missing from the snapshot file. Instead it creates a fiber, which
will set names, when it's possible to do so. Only master sets the
names for the whole cluster.

Closes tarantool#8978

NO_DOC=tarantool/doc#3661
NO_CHANGELOG=following commits
This commit moves all code, related to working with versions and which
was used in box/lua/upgrade.lua, into a separate module and exports it
to Lua API as 'internal.version'

This is needed, as in the following commit we set names automatically
only when schema version is more than 3.0.0. This module is used their
in order to avoid code duplication.

Follow-up tarantool#8978

NO_DOC=internal
NO_TEST=<already tested>
NO_CHANGELOG=internal
It's impossible to set names on Tarantool below 3.0.0, as all DDL
is forbidden before schema upgrade.

Let's make names NoOp on schema below Tarantool 3.0.0 and set names
automatically only when schema upgrade is done.

Follow-up tarantool#8978

NO_DOC=tarantool/doc#3661
@Serpentian
Copy link
Contributor Author

Serpentian commented Oct 26, 2023

All EE tests work all right. Patchset is ready for merge

@sergepetrenko sergepetrenko added the full-ci Enables all tests for a pull request label Oct 27, 2023
@sergepetrenko sergepetrenko merged commit 24d7c8f into tarantool:master Oct 27, 2023
106 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
full-ci Enables all tests for a pull request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Allow to set instance/replicaset/cluster names on recovery at first box.cfg
7 participants