Cluster, replicaset, and instance names #8289
To be clear, I don't think it is ready for a review. But I was asked to submit it anyway.
Thanks for the patchset!
I've left some comments inline, and here's my opinion on your questions:
The behaviour of box.NULL-as-a-name might look a bit weird. When you write box.cfg{instance_name = box.NULL}, do you expect the name to be empty or "any"? Currently, when you write box.cfg{replicaset_uuid = box.NULL} or box.cfg{instance_uuid = box.NULL}, the behaviour is the same as if you wrote = nil. But I don't know whether the names should behave the same way.
I think we shouldn't distinguish box.NULL and nil. Otherwise it would look like some dirty hack IMO.
Maybe we can forbid setting name to box.NULL completely, so that the user isn't confused.
From the previous point: if name = box.NULL is treated like name = nil in box.cfg{}, then there is no way to drop a name. Assume you don't like a name and try to just drop it like this: box.cfg{force_recovery = true, instance_name = box.NULL}. However, this line won't do anything if we treat box.NULL in names as "any" (like we do with UUIDs). To drop a name, the user would have to do this: box.cfg{force_recovery = true} box.space._cluster:update({box.info.id}, {{'=', 'name', box.NULL}}). On one hand, changing the name after boot is not supposed to be easy anyway; it is banned without force_recovery. But on the other hand, it still looks weird. An alternative would be to allow a special name like box.cfg{instance_name = ""}, which would not mean "any", but would be displayed like nil everywhere (that is, box.info.name == nil). I didn't do this because I don't know if I should: it affects the public API and is not trivial to allow.
Do we really need to drop names?
IMO if the user thinks the name is bad, they can change it to something else, but not drop it completely.
The patchset won't land in 2.11. AFAIK, the next one is 3.0, right? If so, then perhaps the box.info rework shouldn't flicker depending on whether the names are used. It might as well just be reworked in accordance with the names unconditionally.
Sounds good to me. We may always show box.info.name.
The names in C code are stored in a new type tt_name_buf_t (which is just a static char array). I don't like the name tt_name too much, but couldn't find anything better. I am open to suggestions.
My thoughts were tt_entity_name or something.
Or maybe tt_unique_name would do.
I decided to store the names in a static char array (tt_name_buf_t) for simplicity, so they can be treated as strings everywhere. I didn't know whether it was a good idea to make it a proper struct. I thought about it a lot and decided to keep it a char array, but with special methods. We can discuss whether the names should get their own struct type.
Yep, that caught my eye. I vote for a proper struct, mainly because you assume that tt_name is no longer than 64 bytes, but the functions working on tt_name actually accept any string, so it's easy to misuse them.
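To illustrate the reviewer's point, here is a minimal sketch of what a proper struct could look like. `struct tt_name`, `NAME_MAX_LEN` and the `tt_name_*` functions are hypothetical names, not the patchset's API; the 63-character limit (64 bytes with the terminator) is taken from the thread and the documented name rules. Because the setter checks the length, an over-long string cannot silently end up in the buffer.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical: 63 chars max, plus one byte for the terminating zero. */
enum { NAME_MAX_LEN = 63 };

/* Hypothetical wrapper type; the struct makes the size limit part of the type. */
struct tt_name {
	char buf[NAME_MAX_LEN + 1];
};

/* Copy `str` into `name`. Returns false and leaves `name` untouched if too long. */
static bool
tt_name_set(struct tt_name *name, const char *str)
{
	size_t len = strlen(str);
	if (len > NAME_MAX_LEN)
		return false;
	memcpy(name->buf, str, len + 1);
	return true;
}

/* An empty buffer means "no name". */
static bool
tt_name_is_empty(const struct tt_name *name)
{
	return name->buf[0] == '\0';
}

/* Read-only access to the stored zero-terminated string. */
static const char *
tt_name_str(const struct tt_name *name)
{
	return name->buf;
}
```

Passing a `struct tt_name *` instead of a `const char *` means a function can only ever receive a value that already went through the length check.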
I don't like that _schema is no longer a key-value dictionary. There are new keys: _cluster and _replicaset. They are supposed to store the name in field 2 and the uuid in field 3 (cluster has no uuid yet, it is null now). That makes the new tuples key-and-2-values. Also, the field naming doesn't work well: you can't do _schema:get{'_replicaset'}.uuid, and for the name you have to do _schema:get{'_replicaset'}.value (because the _schema format is {key, value}). Maybe we should store names and uuids separately? Like _replicaset_uuid, _replicaset_name, _cluster_name?
I vote for storing "instance_uuid", "instance_name", "replicaset_uuid", "replicaset_name" separately.
We do distinguish them already.
Personally - I don't care too much. This is only about easy-to-use public API. Private API will be to drop it from
Oh, if it would be just about this. The major breaking change is that
Nah, tbh these look the same as just
Ok. If we decide to support dropping names, I'd probably prefer setting name to an empty string (""), rather than to box.NULL.
Yes, I missed that. Now I think this is fine for 3.0.
Maybe
Hi! Thanks for the patch!
Please find my comments on the first 14 commits below. I'll return to review the last 4 commits tomorrow.
Thanks for working on this!
Nicely done. I only have minor comments.
LGTM.
Waiting for fixes in integration tests (vshard, cartridge, crud, tarantool-python, go-tarantool).
The next major release of Tarantool (3.0.0) will not support Cartridge, so let's be ready in advance. This patch removes the cartridge and crud integration test runs to make the integration tests pass for #8289. NO_DOC=ci NO_TEST=ci NO_CHANGELOG=ci
The replicaset UUID in _schema wasn't allowed to be dropped, but was allowed to be updated. The patch bans the update too. Firstly, it was never supposed to work. Secondly, a future patch will introduce a new tuple in _schema which will store the replicaset UUID too, and it won't allow UUID updates. It would be strange if an update were let through one _schema tuple but didn't work via another. Needed for tarantool#5029 NO_DOC=bugfix
The function replica_check_id() is called on any change in _cluster: insert, delete, update. It was supposed to check whether the replica ID is valid: not nil and not out of range (VCLOCK_MAX). But it was also raising an error when the ID matched this instance's ID unless the instance was joining. That happened even if a _cluster tuple was updated without changing the ID at all, for example, if one just did _cluster:replace(_cluster:get(box.info.id)). That was a surprising side effect of the ID checker, and it blocked the next patches. The next commits are going to introduce a new field in _cluster (replica name) which will be mutable. Such behaviour of replica_check_id() wouldn't allow updating even that new field. It is better to do the check in the only place where the mutation can happen: on deletion. Since the replica ID is the primary key in _cluster, it can't be updated there, only inserted or deleted. Needed for tarantool#5029 NO_DOC=bugfix and refactoring NO_CHANGELOG=couldn't happen unless user touched _cluster in a weird way NO_TEST=covered by next commits, too insignificant for an own test
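A minimal sketch of the split described above: the range check applies to every change, while the "is this my own ID" check runs only on deletion. All names are hypothetical. The value 32 for VCLOCK_MAX and 0 for the nil replica ID are assumptions used for illustration, not values taken from this PR.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumptions for illustration: VCLOCK_MAX is 32 and ID 0 means "nil". */
enum { SKETCH_VCLOCK_MAX = 32, SKETCH_REPLICA_ID_NIL = 0 };

enum replica_id_check {
	REPLICA_ID_OK,
	REPLICA_ID_OUT_OF_RANGE,
	REPLICA_ID_IS_SELF,
};

/* Applies to insert, update and delete: the ID must be non-nil and in range. */
static enum replica_id_check
sketch_check_id_range(uint32_t id)
{
	if (id == SKETCH_REPLICA_ID_NIL || id >= SKETCH_VCLOCK_MAX)
		return REPLICA_ID_OUT_OF_RANGE;
	return REPLICA_ID_OK;
}

/*
 * Deletion is the only mutation that can remove this instance's own ID,
 * because the ID is the primary key and can't be updated in place.
 * Deleting self is tolerated only while joining.
 */
static enum replica_id_check
sketch_check_delete(uint32_t id, uint32_t self_id, bool is_joining)
{
	enum replica_id_check rc = sketch_check_id_range(id);
	if (rc != REPLICA_ID_OK)
		return rc;
	if (id == self_id && !is_joining)
		return REPLICA_ID_IS_SELF;
	return REPLICA_ID_OK;
}
```

With this split, replacing the own _cluster tuple without changing the ID goes only through `sketch_check_id_range()` and no longer fails.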
Deletion of the own entry from the _cluster space is allowed during the join stage, because the remote master could have already had the joining instance's UUID in the _cluster space but then deleted it. For the joining instance this looks like deletion of itself from _cluster. But that is fine: at the end of join the master will register the replica again. The case is handled, but not covered by a test. The patch adds one. NO_DOC=test NO_CHANGELOG=test
The _schema on_replace trigger used to treat replace as commit. There was no support for rollback at all, and all changes were immediately visible. That is fine most of the time, but still incorrect. The patch makes the space properly respect transactions. This is done as preparation for adding several new _schema keys which will be transactional from the start. It would be strange to leave certain keys ignoring transactions, hence this fix. In scope of tarantool#5029 NO_DOC=bugfix NO_CHANGELOG=couldn't happen with legal usage of public APIs
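One common way to make a trigger respect transactions is to stage the new value in on_replace, publish it in on_commit, and discard it in on_rollback. The sketch below shows that pattern for a single key; it is an illustration under that assumption, not necessarily how the patch does it (an alternative is to apply immediately and restore the old value on rollback). `struct schema_key` and the `schema_key_*` functions are hypothetical.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical state of one _schema-like key. */
struct schema_key {
	char committed[64]; /* Value visible to everyone. */
	char pending[64];   /* Value written by a not-yet-committed transaction. */
	bool has_pending;
};

/* on_replace: stage the new value without publishing it. */
static void
schema_key_on_replace(struct schema_key *key, const char *new_value)
{
	strncpy(key->pending, new_value, sizeof(key->pending) - 1);
	key->pending[sizeof(key->pending) - 1] = '\0';
	key->has_pending = true;
}

/* on_commit: the staged value becomes visible. */
static void
schema_key_on_commit(struct schema_key *key)
{
	if (!key->has_pending)
		return;
	memcpy(key->committed, key->pending, sizeof(key->committed));
	key->has_pending = false;
}

/* on_rollback: forget the staged value; the committed one is unchanged. */
static void
schema_key_on_rollback(struct schema_key *key)
{
	key->has_pending = false;
}
```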
box.cfg.force_recovery used to be needed only during box.cfg() in a few places, but its usage is going to extend. In future commits about cluster/replicaset/instance names it will be needed to allow renaming. Renaming won't be entirely legal (hence it can't be done without any flags), but it won't be fully illegal either. The "valid" rename will happen after upgrading, when an old cluster is upgraded to a new version and wants to start using the names. Then it will have to set force_recovery, set the names, sync the instances, and drop force_recovery. This is a one-time action that allows old installations to use the new feature, the names. Part of tarantool#5029 NO_DOC=refactoring NO_CHANGELOG=refactoring NO_TEST=already covered
There were a few places where instance and replicaset UUIDs from box.cfg were passed as arguments to box.cc functions. That was fine, although it sometimes caused confusion like "where along the call stack is the replicaset UUID created when it was nil in cfg". But soon the situation will get more complicated. There will be up to 3 new arguments: cluster, replicaset, and instance names. Passing all these identifiers as parameters would be cumbersome. The patch makes the functions which need the UUIDs fetch them from the config. The same will be done with the names where they are relevant. Needed for tarantool#5029 NO_DOC=refactoring NO_CHANGELOG=refactoring NO_TEST=already covered
If an attempt to set `box.cfg{replication_anon = false}` failed, the instance's ballot event had is_anon = false nonetheless. This was because on reconfig failure the option's scope guard reverted the option itself in the C++ code, but didn't update the ballot. NO_DOC=bugfix
To tell whether the instance is anon, there used to be just one flag in C code: replication_anon. Having one flag both for cfg and for the actual state is bad, because when cfg is updated there is a moment when that flag can't be safely used to check the actual state. For example, when replication_anon had been true and was set to false, it took time to register the instance. In the meantime the C flag replication_anon was already false, although the instance was still anon (not present in _cluster). In the existing code this could lead to minor errors: while an anon instance was being registered, it could already accept IPROTO_REGISTER requests, which would fail with ER_READONLY instead of ER_UNSUPPORTED. It wasn't a critical problem, but it still wasn't correct to use the cfg flag for checking the actual state. Now there is a separate cfg flag and a function for checking the real state. This patch is done because soon there will be a new option which also takes time to change: the instance name. This commit sets a pattern for dealing with such options. In scope of tarantool#5029 NO_DOC=refactoring NO_CHANGELOG=refactoring NO_TEST=already covered
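The pattern described above, keeping the configured value apart from the real state, can be sketched like this. The struct, fields, and return codes are hypothetical; the only grounded facts are that the instance stays anon until it is present in _cluster and that IPROTO_REGISTER should then be rejected as unsupported.

```c
#include <stdbool.h>

/* Hypothetical instance state: cfg value and real state are separate fields. */
struct instance_state {
	bool cfg_replication_anon; /* What box.cfg asked for; changes immediately. */
	bool is_registered;        /* Flips only once the row is in _cluster. */
};

/* Real state: the instance is anon until its registration has completed. */
static bool
instance_is_anon(const struct instance_state *s)
{
	return !s->is_registered;
}

/* Hypothetical result codes mirroring ER_UNSUPPORTED from the text. */
enum register_rc { REGISTER_OK, REGISTER_UNSUPPORTED };

/* Decide from the real state, never from the cfg flag. */
static enum register_rc
instance_can_accept_register(const struct instance_state *s)
{
	return instance_is_anon(s) ? REGISTER_UNSUPPORTED : REGISTER_OK;
}
```

In the window right after `box.cfg{replication_anon = false}`, the cfg flag is already false, but `instance_is_anon()` still reports true until registration finishes.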
The _cluster on_replace trigger in alter.cc was a huge multi-screen function with many indentation levels. It was not too bad in its old state, but soon it is going to get more complicated as _cluster gets a new field, 'name'. Updating that field will require its own on-commit and on-rollback triggers, its own checks and errors. Fitting name processing into the old monstrous function didn't look tempting, so the trigger is now split into multiple functions serving update, insert, and delete separately. At least it helps to reduce the indentation. Needed for tarantool#5029 NO_DOC=refactoring NO_CHANGELOG=refactoring NO_TEST=already covered
box_on_join() was called not only on IPROTO_JOIN but also on IPROTO_REGISTER, so the name was a bit misleading. It is now called box_register_replica(). The old box_register_replica() is renamed to box_insert_replica_record(). It says "insert record" because this is what it does: it inserts a new tuple into the _cluster space. It also skips the check whether the instance is read-only. This allows the bootstrap master to use the function to register itself. Needed for tarantool#5029 NO_DOC=refactoring NO_CHANGELOG=refactoring NO_TEST=already covered
Replicaset UUID was stored in the _schema['cluster'] tuple. This is going to be confusing soon, because an actual concept of a cluster as multiple replicasets will be introduced. The patch renames the key to 'replicaset_uuid'.

Part of tarantool#5029

@TarantoolBot document
Title: Update '_schema' with new 'replicaset_uuid' key

Currently the _schema system space is documented to have a 'cluster' key with the replicaset UUID value. Now this key is deleted (since 3.0) and the UUID is stored in the 'replicaset_uuid' key.
It was named 'cluster', but really was just about the replicaset. This is going to be even more confusing soon, because an actual concept of a cluster as multiple replicasets will be introduced. The patch renames it to 'replicaset'. `box.info.cluster` now means the whole cluster and is empty so far. Next patches will add the cluster name here.

Part of tarantool#5029

@TarantoolBot document
Title: `box.info.cluster` is renamed to `box.info.replicaset`

Done since 3.0.0. The old behaviour can be restored via the `compat` option `box_info_cluster_meaning`.

The `box.info.cluster` key is still present, but now means a totally different thing: the entire cluster with all its replicasets.

<h2>Compat documentation</h2>

The default meaning of `box.info.cluster` is the whole cluster with all its replicasets. To get info about only the current replicaset, `box.info.replicaset` should be used. In old versions (< 3.0.0) `box.info.cluster` meant the current replicaset and `box.info.replicaset` didn't exist.

<h3>Old and new behaviour</h3>

New behaviour:

```
tarantool> box.info.cluster
---
- <some cluster keys>
...

tarantool> box.info.replicaset
---
- uuid: <replicaset uuid>
- <... other attributes of the replicaset>
...
```

Old behaviour:

```
tarantool> box.info.cluster
---
- uuid: <replicaset uuid>
- <... other attributes of the replicaset>
...

tarantool> box.info.replicaset (= nil on < 3.0.0)
---
- uuid: <replicaset uuid>
- <... other attributes of the replicaset>
...
```

<h3>Known compatibility issues</h3>

VShard versions < 0.1.24 do not support the new behaviour.

<h3>Detecting issues in your codebase</h3>

Look for all usages of `box.info.cluster`, `info.cluster`, and even just `.cluster`, `['cluster']`, `["cluster"]`. For the new behaviour to work, all of them have to use the 'replicaset' key.
Node name is a DNS- and host-friendly string name. It will be used in the next patches for some new global names: cluster, replicaset, and instance. Part of tarantool#5029 NO_DOC=internal NO_CHANGELOG=internal
The new function check_global_ids_integrity() checks that the replicaset UUIDs specified in the config and found in the data match. The instance UUID is created at bootstrap and validated at the beginning of recovery, not at the end, hence it is not checked here. For now this function is not very useful, but soon there will be more global IDs stored in WAL which will need validation. Needed for tarantool#5029 NO_DOC=refactoring NO_CHANGELOG=refactoring NO_TEST=already covered
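The idea behind such an integrity check can be sketched as a loop over (config value, stored value) pairs that reports the first mismatch. This is a hypothetical illustration, not the actual check_global_ids_integrity(): the IDs are plain strings here, NULL means "not specified", and treating a missing side as "nothing to compare" is an assumption. The generic list anticipates the extra global IDs mentioned above.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical description of one global ID to validate. */
struct global_id {
	const char *what;   /* Label, e.g. "replicaset_uuid". */
	const char *cfg;    /* Value from box.cfg, NULL if not set. */
	const char *stored; /* Value recovered from the data, NULL if absent. */
};

/* Returns the label of the first mismatching ID, or NULL if all match. */
static const char *
sketch_check_global_ids(const struct global_id *ids, size_t count)
{
	for (size_t i = 0; i < count; i++) {
		/* A missing value on either side leaves nothing to compare. */
		if (ids[i].cfg == NULL || ids[i].stored == NULL)
			continue;
		if (strcmp(ids[i].cfg, ids[i].stored) != 0)
			return ids[i].what;
	}
	return NULL;
}
```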
The patch adds 2 new entities to replication: the concept of a cluster which has multiple replicasets, and a name for this cluster. So far the name doesn't participate in any replication protocols. It is just stored in _schema and validated against the config. The old mentions of 'cluster' everywhere (in logs, in some protocol keys like in the feedback daemon) are now considered obsolete and will probably be replaced with 'replicaset' eventually.

Part of tarantool#5029

@TarantoolBot document
Title: `box.cfg.cluster_name` and `box.info.cluster.name`

The new option `box.cfg.cluster_name` allows assigning the cluster a human-readable name, which is displayed in the new info key `box.info.cluster.name` and validated when the instances in the cluster connect to each other.

The name is broadcast in the "box.id" built-in event as the "cluster_name" key. It is a string when set and nil when not set.

When set, it has to match in all instances of the entire cluster, in all its replicasets.

If a name wasn't set on cluster bootstrap (it was forgotten or the cluster is upgraded from a version < 3.0), then it can be set on an already running instance via `box.cfg.cluster_name`.

To change or drop an already installed name, one has to use `box.cfg.force_recovery == true` in all instances of the cluster. After the name is updated and all the instances are synced, `force_recovery` can be set back to `false`.

The name can be at most 63 characters long and can consist only of the chars '0'-'9', '-', and 'a'-'z'. It must start with a letter. When upper-case letters are used in `box.cfg`, they are automatically converted to lower-case. The names are host- and DNS-friendly.
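The documented name rules translate into a small validation routine. The sketch below lower-cases the input, then checks the length, the allowed characters, and that the first character is a letter. `sketch_node_name_normalize()` is a hypothetical helper, not Tarantool's implementation. Rejecting the empty string is an assumption, based on an unset name being nil rather than "".

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Documented limit: a name can be at most 63 characters long. */
enum { NODE_NAME_MAX_LEN = 63 };

/*
 * Hypothetical helper: lower-case `src` into `dst` (at least
 * NODE_NAME_MAX_LEN + 1 bytes) and validate it against the documented rules.
 * Returns false if the name is invalid; `dst` is then unspecified.
 */
static bool
sketch_node_name_normalize(const char *src, char *dst)
{
	size_t len = strlen(src);
	/* Assumption: an empty name is invalid ("not set" is nil, not ""). */
	if (len == 0 || len > NODE_NAME_MAX_LEN)
		return false;
	for (size_t i = 0; i < len; i++) {
		/* Upper-case letters are accepted and converted to lower-case. */
		unsigned char c = (unsigned char)tolower((unsigned char)src[i]);
		bool is_letter = c >= 'a' && c <= 'z';
		bool is_allowed = is_letter || (c >= '0' && c <= '9') || c == '-';
		/* Only '0'-'9', '-', 'a'-'z'; the first char must be a letter. */
		if (!is_allowed || (i == 0 && !is_letter))
			return false;
		dst[i] = (char)c;
	}
	dst[len] = '\0';
	return true;
}
```

For example, "Storage-1" normalizes to "storage-1", while "1storage", "-a" and "a_b" are rejected.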
The replicaset name is carried with the replicaset UUID wherever sanity validations are needed, such as checking whether 2 instances belong to the same replicaset.

Part of tarantool#5029

@TarantoolBot document
Title: `box.cfg.replicaset_name` and `box.info.replicaset.name`

The new option `box.cfg.replicaset_name` allows assigning the replicaset a human-readable name, which is displayed in the new info key `box.info.replicaset.name` and validated when the instances in the replicaset connect to each other.

The name is broadcast in the "box.id" built-in event as the "replicaset_name" key. It is a string when set and nil when not set.

When set, it has to match in all instances of the entire replicaset.

If a name wasn't set on cluster bootstrap (it was forgotten or the cluster is upgraded from a version < 3.0), then it can be set on an already running instance via `box.cfg.replicaset_name`.

To change or drop an already installed name, one has to use `box.cfg.force_recovery == true` in all instances of the cluster. After the name is updated and all the instances are synced, `force_recovery` can be set back to `false`.

The name can be at most 63 characters long and can consist only of the chars '0'-'9', '-', and 'a'-'z'. It must start with a letter. When upper-case letters are used in `box.cfg`, they are automatically converted to lower-case. The names are host- and DNS-friendly.
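The rename policy documented above (a first-time assignment is always allowed; changing an installed name requires `force_recovery`) can be sketched as a single decision function. It is hypothetical and leaves dropping a name out on purpose, because the review thread did not settle whether nil/box.NULL in box.cfg means "drop" or "any". Here a NULL config value is assumed to mean "not specified, nothing to check".

```c
#include <stdbool.h>
#include <string.h>

enum name_change_rc {
	NAME_CHANGE_NOOP,
	NAME_CHANGE_ALLOWED,
	NAME_CHANGE_NEEDS_FORCE_RECOVERY,
};

/*
 * Hypothetical policy check. `installed` is the name stored in the data
 * (NULL if none), `cfg_name` is the value from box.cfg (NULL if not set).
 */
static enum name_change_rc
sketch_check_name_change(const char *installed, const char *cfg_name,
			 bool force_recovery)
{
	/* Assumption: nil in box.cfg means "don't check", not "drop". */
	if (cfg_name == NULL)
		return NAME_CHANGE_NOOP;
	if (installed != NULL && strcmp(installed, cfg_name) == 0)
		return NAME_CHANGE_NOOP;
	/* First-time assignment, e.g. after upgrading from a version < 3.0. */
	if (installed == NULL)
		return NAME_CHANGE_ALLOWED;
	/* Renaming an already installed name. */
	return force_recovery ? NAME_CHANGE_ALLOWED :
				NAME_CHANGE_NEEDS_FORCE_RECOVERY;
}
```

This matches the upgrade procedure from the force_recovery commit: set `force_recovery`, set the names, sync the instances, then drop `force_recovery`.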
The instance name is carried with the instance UUID everywhere in the replication protocols. It is visible in all other instances via _cluster and is displayed in monitoring.

Part of tarantool#5029

@TarantoolBot document
Title: `box.cfg.instance_name` and `box.info.name`

The new option `box.cfg.instance_name` allows assigning the instance a human-readable name, which is displayed in the new info key `box.info.name`. Instances can see the names of their peers in `box.info.replication[id].name`.

The name is broadcast in the "box.id" built-in event as the "instance_name" key. It is a string when set and nil when not set.

When set, it has to be unique in the instance's replicaset.

If a name wasn't set on cluster bootstrap (it was forgotten or the cluster is upgraded from a version < 3.0), then it can be set on an already running instance via `box.cfg.instance_name`.

To change or drop an already installed name, one has to use `box.cfg.force_recovery == true` in all instances of the cluster. After the name is updated and all the instances are synced, `force_recovery` can be set back to `false`.

The name can be at most 63 characters long and can consist only of the chars '0'-'9', '-', and 'a'-'z'. It must start with a letter. When upper-case letters are used in `box.cfg`, they are automatically converted to lower-case. The names are host- and DNS-friendly.
Previously it wasn't allowed to change the instance UUID in _cluster. When needed, it had to be done manually by deleting the instance from _cluster and inserting it back with a new UUID, or not done at all.

Re-UUID (like re-name) was reported to be used when people didn't want to register new replica IDs. They wanted to rejoin lost replicas from scratch but keep the numeric ID. They could deal with the UUID either by explicitly setting it to the old value on the new instance, or by doing the manual re-UUID described above.

This commit is supposed to make things simpler. If a replica has a name, then its rejoin with another UUID is not an error. Its record in _cluster is automatically updated to store the new UUID. That is only possible if the old-UUID instance is not connected anymore and is not listed in the replication cfg.

Closes tarantool#5029

@TarantoolBot document
Title: Instance rebootstrap with new UUID but same ID and name

If an instance has a non-empty instance name (`box.cfg.instance_name`), then at rebootstrap it can keep the name and its old numeric ID (the `_cluster['id']` field). This might be needed if one doesn't want to pollute `_cluster` with new rows and, for some reason, doesn't want to or can't just drop the rows belonging to the dead replicas.

In order for this to work:
1) the rebootstrapping replica must keep its old non-empty instance name;
2) the other instances should not have any alive connections to the old dead replica. Ideally, the old replica should just be deleted from `box.cfg.replication` everywhere.

When that works, the old row in `_cluster` is automatically updated with the new instance UUID.
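The rebootstrap rule above can be sketched as a lookup by name over _cluster-like rows. If the matching row's old UUID is no longer connected, the row gets the new UUID and keeps its numeric ID. Everything here is hypothetical: `struct replica_row`, `sketch_rejoin()` and the `is_connected` flag, which stands in for "peers still have a live connection to the old UUID". The "not listed in box.cfg.replication" condition is not modeled.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical _cluster-like row. */
struct replica_row {
	unsigned id;
	char uuid[37];     /* 36-char textual UUID plus '\0'. */
	char name[64];     /* Empty string means "no name". */
	bool is_connected; /* Whether peers still talk to this UUID. */
};

enum rejoin_rc { REJOIN_NEW_ROW, REJOIN_REUSED_ID, REJOIN_ERROR };

/* Decide what happens when an instance with `uuid` and `name` (re)joins. */
static enum rejoin_rc
sketch_rejoin(struct replica_row *rows, size_t count, const char *uuid,
	      const char *name, unsigned *id_out)
{
	size_t uuid_len = strlen(uuid);
	if (uuid_len >= sizeof(rows->uuid))
		return REJOIN_ERROR;
	/* Without a name there is nothing to match on: register a new ID. */
	if (name[0] == '\0')
		return REJOIN_NEW_ROW;
	for (size_t i = 0; i < count; i++) {
		struct replica_row *row = &rows[i];
		if (strcmp(row->name, name) != 0)
			continue;
		/* The old instance with this name is still alive. */
		if (row->is_connected && strcmp(row->uuid, uuid) != 0)
			return REJOIN_ERROR;
		/* Keep the numeric ID, store the new UUID (no-op if unchanged). */
		memcpy(row->uuid, uuid, uuid_len + 1);
		*id_out = row->id;
		return REJOIN_REUSED_ID;
	}
	return REJOIN_NEW_ROW;
}
```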
We need to use `box.info.replicaset.uuid` instead of `box.info.cluster.uuid` to support Tarantool 3.0 [1]. 1. tarantool/tarantool#8289 Part of #366 Closes #371
The patchset allows giving an instance, a replicaset, and a cluster (multiple replicasets) a string name. It is like a UUID, but with far fewer limitations on what the name can be. See also #5029.
Integration tests surely won't pass because there are notable breaking changes.
Closes #5029