-
Notifications
You must be signed in to change notification settings - Fork 376
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
config: set names automatically #9255
Conversation
b2e52bc
to
fcd730d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor changelog fixes.
changelogs/unreleased/gh-8978-config-set-names-automatically.md
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the patch!
Good job.
Please see my comments below.
177c7db
to
4ac1b0a
Compare
4ac1b0a
to
21ce718
Compare
d882fde
to
5ec42cf
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for working on this!
LGTM.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know almost nothing about the config module. But nonetheless you can find below some generic comments. I will approve the PR now because don't know if I will be at my PC until Monday to see any fixes.
Config's box.cfg applier scans snapshot_dir in order to find out, whether recovery is going to be done. It's needed in order to determine, whether the instance should be started into ro mode firstly. Let's move info about snapshot into separate file in utils. The commit also introduces snapshot_path, which will be used in the following commits in order to validate names. Needed for tarantool#8978 NO_DOC=refactoring NO_TEST=refactoring NO_CHANGELOG=refactoring
This commit introduces the new method for xlog module: xlog.meta(). It opens an xlog file, reads and returns the meta block of the file, which includes its filetype, instance_uuid and vclocks. It's needed in order to introduce checking of names inside the config module in the following commit. Needed for tarantool#8978 @TarantoolBot document Title: xlog.meta(file-name) method Description: Open an xlog file, and return its meta block. Possible errors: Failed to open a file or it does not contain properly formatted meta block. Example: ```lua tarantool> xlog = require('xlog') --- ... tarantool> xlog.meta('00000000000000000000.snap') --- - filetype: SNAP prev_vclock: {} instance_uuid: 87b2e60f-275c-4efa-9b0e-e9562e309692 vclock: {} ... ```
box.schema has a number of constants, e.g. IDs of system spaces, which may be useful for the user. Let's allow to access box.schema before box.cfg is called. It is used in checking names, as we need to know, which snapshot spaces to scan. Needed for tarantool#8978 NO_DOC=minor change
eddc275
to
216c037
Compare
Currently only instance_uuid is validated before recovery process. All names and replicaset_uuid are checked only when recovery is done, which can take a long time. It can be frustrating to users, which have been waiting for several hours only to get name mismatch error. Let's read the small part of snapshot file before calling box.cfg in order to figure out, whether the names and uuids, passed to configuration match the ones, saved inside the snapshot. During config reload there's no sense in reading snapshot file, as data is already saved inside spaces, let's read them. We still check that names in config and names in spaces don't contradict during config reload. This commit also introduces methods, for getting names, which are not set in snap (or memory), this'll be used in consequent commits to set names automatically. Needed for tarantool#8978 NO_DOC=tarantool/doc#3661
For now it's impossible to drop created alert in any way except manual searching for _alerts table. However, we need to drop alerts on missing names, when the names are set. Let's introduce simple key-value alerts in order to easily drop them by key. Needed for tarantool#8978 NO_DOC=internal NO_TEST=following commit NO_CHANGELOG=internal
When the name is manually set on master by replace in _cluster space, calling box.cfg on replica with the same name causes its hang. The problem is the fact, that resubscribe is initiated and waiting for APPLIER_REGISTERED status is started. As applier knows, that no registration should be done, this never happens. Let's don't initiate registration, when instance name is already set. Needed for tarantool#8978 NO_DOC=bugfix NO_CHANGELOG=not released yet
216c037
to
513d0da
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you! That's the large work.
The changes look correct for me.
I'm sad, to be honest, that we need all this triggers+watcher code just to react on certain events with certain actions under certain conditions ('if a tuple in space X is updated and the database schema is Y and the instance is in the RW mode and there was an unhandled config reload, then ...'). I hope, at some time, we'll revisit it with idea of some database event framework that will not be so complicated to use. At least, it would be nice to write such code in a more declarative way.
I would, maybe, simplify it further by eliminating deletion of the triggers -- we can just mark them as some kind of system triggers to hide from a user API and assume that the needed actions are performed always at the given conditions.
Anyway, all these ideas are to consider in a future and they don't block this patchset.
513d0da
to
51ee92c
Compare
For now it's impossible to use config module in order to recover from snaps, which don't have names set in them. Calling box.cfg with names on recovery fails with MISMATCH error, which is caused by difficult implementation of setting names on first box.cfg, as names can be set only on rw instance. This commit doesn't call box.cfg with names, if these names are missing from the snapshot file. Instead it creates a fiber, which will set names, when it's possible to do so. Only master sets the names for the whole cluster. Closes tarantool#8978 NO_DOC=tarantool/doc#3661 NO_CHANGELOG=following commits
This commit moves all code, related to working with versions and which was used in box/lua/upgrade.lua, into a separate module and exports it to Lua API as 'internal.version' This is needed, as in the following commit we set names automatically only when schema version is more than 3.0.0. This module is used their in order to avoid code duplication. Follow-up tarantool#8978 NO_DOC=internal NO_TEST=<already tested> NO_CHANGELOG=internal
It's impossible to set names on Tarantool below 3.0.0, as all DDL is forbidden before schema upgrade. Let's make names NoOp on schema below Tarantool 3.0.0 and set names automatically only when schema upgrade is done. Follow-up tarantool#8978 NO_DOC=tarantool/doc#3661
51ee92c
to
64f9087
Compare
All EE tests work all right. Patchset is ready for merge |
Notion link
We are trying to achieve the following behavior with this patchset:
The behavior of box.cfg{} remains unchanged.
lua/config, when there is a snapshot from which we are recovering:
2.1 Before the first box.cfg{}, we access the .snap file using the xlog module and check the presence of all the names provided (replicaset, all replicas). If the own name or the replicaset name in the snapshot is not empty but different from the one provided, we raise an error.
2.2 If any of the names are missing, we require that the UUID for that replica be provided in the configuration to match the name and UUID.
2.3 The first (or first few) box.cfg{} calls are made without specifying the replicaset/replica name, if these specific names were not in the snapshot.
2.4 We trigger an alert that the desired names do not match reality (if the name on that specific replica does not match).
2.5 If the schema version is less than 3.0.0, we wait for a schema upgrade using a trigger.
2.6 If we are not in read-write mode, we wait for read-write using box.ctl.wait_rw().
2.7 Once read-write is achieved, we set all missing names from the configuration.
2.8 On replicas: in the trigger, we check if the names arrived via replication. As soon as the corresponding name is applied, we call box.cfg{} with the actual instance/replicaset name, cancel any pending actions (schema version, read-write), and turn off the alert.
2.9 In the documentation (if there is a section on changing names): after changing the name, a snapshot is mandatory.
lua/config, when there are no snapshots (this is bootstrapping): In the very first box.cfg{}, we provide all names (instance, replicaset). No additional actions are taken.
Closes #8978