Cluster, replicaset, and instance names #8289

Gerold103 · 2023-02-09T23:45:11Z

The patchset allows giving an instance, a replicaset, and a cluster (multiple replicasets) a string name. It is like a UUID, but with far fewer limitations on what the name can be. See also #5029.

Integration tests surely won't pass because there are notable breaking changes.

Closes #5029

Gerold103 · 2023-02-12T21:28:52Z

To be clear, I don't think it is ready for a review. But I was asked to submit it anyway.

sergepetrenko

Thanks for the patchset!

I've left some comments inline, and here's my opinion on your questions:

> The behaviour of box.NULL-as-a-name might look a bit weird. When you write box.cfg{instance_name = box.NULL}, do you expect the name to be empty or "any"? Currently when you write box.cfg{replicaset_uuid = box.NULL} or box.cfg{instance_uuid = box.NULL}, the behaviour is the same as if you wrote = nil. But I don't know if this is good to do with the names too.

I think we shouldn't distinguish box.NULL and nil. Otherwise it would look like some dirty hack IMO.
Maybe we can forbid setting name to box.NULL completely. So that the user isn't confused.

> From the previous point. If name = box.NULL is treated like name = nil in box.cfg{}, then there is no way to drop a name. Assume you don't like a name and you try to just drop it like this: box.cfg{force_recovery = true, instance_name = box.NULL}. However this line won't do anything if we treat box.NULL in names like "any" (like we do with UUIDs). To drop a name the user would have to do this: box.cfg{force_recovery = true} box.space._cluster:update({box.info.id}, {{'=', 'name', box.NULL}}). On one hand changing the name after boot is not supposed to be easy anyway, it is banned without force_recovery. But on the other hand it still looks weird. An alternative was to allow a special name like box.cfg{instance_name = ""}, which would be not "any", but would be displayed like nil everywhere (box.info.name == nil that is). I didn't do this because don't know if I should - it affects the public API and is not too trivial to allow.

Do we really need to drop names?
IMO if the user thinks name's bad, he can change it to something else, but not drop it completely.

> The patchset won't land into 2.11. AFAIK, the next is 3.0, no? If yes, then it might be that the box.info rework shouldn't be flickering like depending on whether the names are used. It might as well just be reworked in accordance with names unconditionally.

Sounds good to me. We may show box.info.name always.

> The names in C code are stored in a new type tt_name_buf_t (which is just a static char array). I don't like the name tt_name too much, but couldn't find anything better. I am open to suggestions.

My thoughts were tt_entity_name or something.
Or maybe tt_unique_name would do.

> I decided to store the names in a static char array (tt_name_buf_t) for simplicity - can treat them as strings everywhere. I didn't know if this is a good idea to make it a proper struct. I thought about it a lot, and decided to keep it a char array but with special methods. We can discuss if the names should get their own struct type.

Yep, that caught my eye. I vote for a proper struct. Mainly because you assume that tt_name is no longer than 64 bytes, but the functions working on tt_name actually accept any string, so it's easy to misuse them.
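To illustrate the concern about the bare char array, here is a minimal sketch of the struct-based alternative. All names in it (`name_buf`, `name_buf_set`, `name_buf_eq`, `NAME_BUF_SIZE`) are hypothetical and are not taken from the patchset; the 64-byte capacity only mirrors the size mentioned in this review. The point is that the length check lives in one place and the compiler rejects an arbitrary `char *` where a name is expected.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical capacity: 63 characters plus the terminating NUL. */
enum { NAME_BUF_SIZE = 64 };

/* Wrapping the array in a struct makes it a distinct type. */
struct name_buf {
	char str[NAME_BUF_SIZE];
};

/*
 * Store the first len bytes of src in dst as a NUL-terminated string.
 * Returns false and leaves dst empty when the input does not fit.
 */
static bool
name_buf_set(struct name_buf *dst, const char *src, size_t len)
{
	assert(dst != NULL && src != NULL);
	if (len >= NAME_BUF_SIZE) {
		dst->str[0] = '\0';
		return false;
	}
	memcpy(dst->str, src, len);
	dst->str[len] = '\0';
	return true;
}

/* Compare two stored names as NUL-terminated strings. */
static bool
name_buf_eq(const struct name_buf *a, const struct name_buf *b)
{
	assert(a != NULL && b != NULL);
	return strcmp(a->str, b->str) == 0;
}
```

With this shape, a function that takes `const struct name_buf *` cannot be handed an unbounded string by accident, which is the misuse described above.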

> I don't like that _schema is no longer a key-value dictionary. There are new keys: _cluster and _replicaset. They are supposed to store name in field-2 and uuid in field-3 (cluster has no uuid yet, it is null now). That makes it key-and-2-values in the new tuples. Also the field naming works not good: you can't do _schema:get{'_replicaset'}.uuid, and for name you have to do _schema:get{'_replicaset'}.value (because _schema format is {key, value}). Maybe we should store names and uuids separately? Like _replicaset_uuid, _replicaset_name, _cluster_name?

I vote for storing "instance_uuid", "instance_name", "replicaset_uuid", "replicaset_name" separately.

src/lib/core/tt_name.c

src/lib/core/tt_name.h

src/box/replication.cc

src/box/alter.cc

test/box-luatest/schema_sys_space_test.lua

src/box/box.cc

test/replication/suite.ini

Gerold103 · 2023-02-16T22:13:06Z

> I think we shouldn't distinguish box.NULL and nil

We do distinguish them already. box.NULL is supposed to mean 'set to default'. For example, you can drop box.cfg.listen this way.

> Do we really need to drop names?

Personally - I don't care too much. This is only about easy-to-use public API. Private API will be to drop it from _schema or _cluster via :update(). For me either way is ok.

> We may show box.info.name always.

Oh, if it would be just about this. The major breaking change is that box.info.cluster, an already existing field, completely changes its meaning.

> tt_entity_name, tt_unique_name

Nah, tbh these look the same as just tt_name for me. I was thinking about tt_instance_name, but the problem is that it is not just about instances. Then I thought about tt_host_name, but the same problem - it is also about replicasets and the whole cluster which are not hosts. Although still maybe it is better than just tt_name ...

sergepetrenko · 2023-02-17T14:59:04Z

> > I think we shouldn't distinguish box.NULL and nil
>
> We do distinguish them already. box.NULL is supposed to mean 'set to default'. For example, you can drop box.cfg.listen this way.

Ok. If we decide to support dropping names, I'd probably prefer setting name to an empty string (""), rather than to box.NULL.
Besides, you can stop listening the same way: box.cfg{listen=""}.

> > Do we really need to drop names?
>
> Personally - I don't care too much. This is only about easy-to-use public API. Private API will be to drop it from _schema or _cluster via :update(). For me either way is ok.

> > We may show box.info.name always.
>
> Oh, if it would be just about this. The major breaking change is that box.info.cluster, an already existing field, completely changes its meaning.

Yes, I missed that. Now I think this is fine for 3.0.
We could at most introduce a compat option for it with "default" equal to "new" (since it's a major release).

> > tt_entity_name, tt_unique_name
>
> Nah, tbh these look the same as just tt_name for me. I was thinking about tt_instance_name, but the problem is that it is not just about instances. Then I thought about tt_host_name, but the same problem - it is also about replicasets and the whole cluster which are not hosts. Although still maybe it is better than just tt_name ...

Maybe tt_identifier? But it's just a synonym to name.
Or simply tt_id? Or even tt_id_string.

sergepetrenko

Hi! Thanks for the patch!

Please find my comments on the first 14 commits below. I'll return to review the last 4 commits tomorrow.

test/box-luatest/schema_sys_space_test.lua

src/box/alter.cc

src/box/box.cc

test/replication-luatest/anon_test.lua

src/box/box.cc

changelogs/unreleased/schema-new-replicaset-uuid-key.md

src/lib/core/tt_hostname.h

sergepetrenko

Thanks for working on this!
Nicely done. I only have minor comments.

src/box/alter.cc

src/box/lua/load_cfg.lua

src/box/alter.cc

test/replication-luatest/cluster_name_test.lua

src/box/alter.cc

src/box/box.cc

test/replication-luatest/replicaset_name_test.lua

src/box/alter.cc

coveralls · 2023-03-27T14:35:47Z

Coverage: 85.769% (+0.03%) from 85.739% when pulling 0537386 on Gerold103:gerold103/gh-5029-names into 7316d81 on tarantool:master.

sergepetrenko

LGTM.

Waiting for fixes in integration tests (vshard, cartridge, crud, tarantool-python, go-tarantool).

The next major release of Tarantool (3.0.0) will not support Cartridge. So let's be ready in advance. This patch removes the cartridge and crud integration test runs to make the integration tests pass for #8289. NO_DOC=ci NO_TEST=ci NO_CHANGELOG=ci

It wasn't allowed to drop it, but was allowed to update. The patch bans it. Firstly, it was not supposed to work. Secondly, a future patch will introduce a new tuple in _schema, which would store replicaset UUID too. It won't allow UUID update. Would be strange, if the update would be let through one _schema tuple and wouldn't work via another. Needed for tarantool#5029 NO_DOC=bugfix

The function replica_check_id() is called on any change in _cluster: insert, delete, update. It was supposed to check if the replica ID is valid - not nil, not out of range (VCLOCK_MAX). But it was also raising an error when the ID matched this instance's ID unless the instance was joining. That happened even if a _cluster tuple was updated without changing the ID at all. For example, if one would just do _cluster:replace(_cluster:get(box.info.id)). That was a surprising side effect of the ID checker which blocked next patches. The next commits are going to introduce a new field in _cluster (replica name) which will be mutable. Such behaviour of replica_check_id() wouldn't allow to update even that new field. Better do the check in the only place where the mutation can happen - on deletion. Since replica ID is a primary key in _cluster, it can't be updated there. Only inserted or deleted. Needed for tarantool#5029 NO_DOC=bugfix and refactoring NO_CHANGELOG=couldn't happen unless user touched _cluster in a weird way NO_TEST=covered by next commits, too insignificant for an own test

Deletion of the own entry from _cluster space is allowed during the join stage, because the remote master could have already had the joining instance UUID in _cluster space but then deleted it. Then for the joining instance it looks like deletion of self from _cluster. But that is fine - in the end of join the master will register the replica again. The case is handled, but not covered with a test. The patch adds one. NO_DOC=test NO_CHANGELOG=test

_schema on_replace trigger used to treat replace as commit. No support for rollback at all and all changes are immediately visible. That is fine most of the time but still incorrect. The patch makes the space properly respect transactions. This is done as a preparation for adding several new _schema keys which will be transactional from the start. Would be strange to leave certain keys ignoring transactions. Hence this fix is done. In scope of tarantool#5029 NO_DOC=bugfix NO_CHANGELOG=couldn't happen with legal usage of public APIs

box.cfg.force_recovery used to be needed only during box.cfg() in a few places, but its usage is going to extend. In future commits about cluster/replicaset/instance names it will be needed to allow rename. It won't be entirely legal (hence can't be done without any flags), but won't be fully illegal either. The "valid" rename will be after upgrading, when an old cluster updated to a new version and wants to start using the names. Then it will have to set force_recovery, set the names, sync the instances, drop force_recovery. One-time action to allow old installations use the new feature - the names. Part of tarantool#5029 NO_DOC=refactoring NO_CHANGELOG=refactoring NO_TEST=already covered

There were a few places where instance and replicaset UUIDs from box.cfg were passed as arguments in box.cc functions. It was fine although sometimes could cause struggling like "where along the callstack replicaset UUID is created when it was nil in cfg". But soon the situation will get more complicated. There will be up to 3 new arguments - cluster, replicaset, and instance names. Passing all these identifiers as parameters would be cumbersome. The patch makes the UUIDs fetched from the config by the functions which need them. The same will be done with the names where they are relevant. Needed for tarantool#5029 NO_DOC=refactoring NO_CHANGELOG=refactoring NO_TEST=already covered

If an attempt to set `box.cfg{replication_anon = false}` failed, the instance's ballot event had is_anon = false nonetheless. This was because on reconfig failure the option's scope guard did revert the option itself in C++ code, but didn't update the ballot. NO_DOC=bugfix

To tell whether the instance is anon there used to be just one flag in C code: replication_anon. Having one flag both for cfg and for the actual state is bad because if cfg is updated, then there is a moment when that flag can't be safely used to check the actual state. For example, when replication_anon had been true and was set to false, it took time to register the instance. In the meantime the C flag replication_anon was already false, although the instance is still anon (not present in _cluster). In the existing code it could lead to insignificant errors like when an anon instance was being registered, it could already accept IPROTO_REGISTER requests. It would fail on ER_READONLY instead of ER_UNSUPPORTED. It wasn't a critical problem, but still it wasn't correct to use cfg flag for checking the actual state. Now there is a separate cfg flag and a function for checking the real state. This patch is done because soon there will be a new option which also takes time to change: instance name. This commit sets a pattern how to deal with such options. In scope of tarantool#5029 NO_DOC=refactoring NO_CHANGELOG=refactoring NO_TEST=already covered
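The split between the configured value and the actual state can be sketched as below. This is only an illustration of the pattern, not the code from the commit: the struct, its fields, `instance_is_anon()` and `SKETCH_REPLICA_ID_NIL` are hypothetical names, and "registered" is modelled simply as "has a non-nil ID", assuming an unregistered instance has no ID in _cluster.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical marker for "no ID assigned in _cluster yet". */
enum { SKETCH_REPLICA_ID_NIL = 0 };

struct instance_sketch {
	/* What the user asked for in the config; changes immediately. */
	bool cfg_replication_anon;
	/* ID from _cluster; stays nil until registration is committed. */
	uint32_t id;
};

/*
 * The actual state is derived from the registration result, never from
 * the config flag, so it stays correct while a reconfiguration is still
 * in progress.
 */
static bool
instance_is_anon(const struct instance_sketch *inst)
{
	assert(inst != NULL);
	return inst->id == SKETCH_REPLICA_ID_NIL;
}
```

Between the moment the config flag is switched to false and the moment the ID is assigned, `cfg_replication_anon` and `instance_is_anon()` disagree, which is exactly the window the commit message describes. The same shape would apply to an option like the instance name that also takes time to change.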

_cluster on_replace trigger in alter.cc was a huge multi-screen function with many indentation levels. It was not too bad in its old state. But soon it is going to get more complicated as _cluster will get a new field - 'name'. Its update will require own on commit and rollback triggers, own checks, errors. Trying to fit name processing into the old monstrous function didn't look too tempting, so the trigger now is split into multiple functions serving update, insert, and delete separately. At least it helps to reduce the indentation. Needed for tarantool#5029 NO_DOC=refactoring NO_CHANGELOG=refactoring NO_TEST=already covered

box_on_join() was called not only on IPROTO_JOIN but also on IPROTO_REGISTER. The name was a bit misleading. It is now called box_register_replica(). The old box_register_replica() is renamed to box_insert_replica_record(). It says "insert record", because this is what it does - inserts a new tuple into _cluster space. It also skips the check whether the instance is read-only. It allows to use the function on the bootstrap master so as it could register itself. Needed for tarantool#5029 NO_DOC=refactoring NO_CHANGELOG=refactoring NO_TEST=already covered

Replicaset UUID was stored in _schema['cluster'] tuple. This is going to be confusing soon, because there will be introduced an actual concept of cluster as multiple replicasets. The patch renames it to 'replicaset_uuid'. Part of tarantool#5029 @TarantoolBot document Title: Update '_schema' with new 'replicaset_uuid' key Currently _schema system space is documented to have 'cluster' key with replicaset UUID value. Now this key is deleted (since 3.0) and the UUID is stored in 'replicaset_uuid' key.

It was named 'cluster', but really was just about the replicaset. This is going to be even more confusing soon, because there will be introduced an actual concept of cluster as multiple replicasets. The patch renames it to 'replicaset'. `box.info.cluster` now means the whole cluster and is empty so far. Next patches will add here the cluster name.

Part of tarantool#5029

@TarantoolBot document
Title: `box.info.cluster` is renamed to `box.info.replicaset`

Done since 3.0.0. The old behaviour can be reverted back via the `compat` option `box_info_cluster_meaning`.

`box.info.cluster` key is still here, but now means a totally different thing - the entire cluster with all its replicasets.

<h2>Compat documentation</h2>

`box.info.cluster` default meaning is the whole cluster with all its replicasets. To get info about only the current replicaset `box.info.replicaset` should be used.

In old versions (< 3.0.0) `box.info.cluster` meant the current replicaset and `box.info.replicaset` didn't exist.

<h3>Old and new behaviour</h3>

New behaviour:
```
tarantool> box.info.cluster
---
- <some cluster keys>
...

tarantool> box.info.replicaset
---
- uuid: <replicaset uuid>
- <... other attributes of the replicaset>
...
```

Old behaviour:
```
tarantool> box.info.cluster
---
- uuid: <replicaset uuid>
- <... other attributes of the replicaset>
...

tarantool> box.info.replicaset (= nil on < 3.0.0)
---
- uuid: <replicaset uuid>
- <... other attributes of the replicaset>
...
```

<h3>Known compatibility issues</h3>

VShard versions < 0.1.24 do not support the new behaviour.

<h3>Detecting issues in your codebase</h3>

Look for all usages of `box.info.cluster`, `info.cluster`, and even just `.cluster`, `['cluster']`, `["cluster"]`. For the new behaviour to work all of them have to use 'replicaset' key.

Node name stores a DNS- and host- friendly string name. It will be used in the next patches for some new global names: cluster, replicaset, and instance. Part of tarantool#5029 NO_DOC=internal NO_CHANGELOG=internal

The new function check_global_ids_integrity() checks that the replicaset UUID specified in the config and found in the data match. Instance UUID is created at bootstrap and validated at the beginning of recovery, not in the end. Hence not checked here. For now this function is not very useful, but soon there will be more global IDs stored in WAL which will need validation. Needed for tarantool#5029 NO_DOC=refactoring NO_CHANGELOG=refactoring NO_TEST=already covered

The patch adds 2 new entities to replication: the concept of a cluster which has multiple replicasets and a name for this cluster. The name so far doesn't participate in any replication protocols. It is just stored in _schema and is validated against the config. The old mentions of 'cluster' (in logs, in some protocol keys like in the feedback daemon) everywhere are now considered obsolete and probably will be eventually replaced with 'replicaset'. Part of tarantool#5029 @TarantoolBot document Title: `box.cfg.cluster_name` and `box.info.cluster.name` The new option `box.cfg.cluster_name` allows to assign the cluster name to a human-readable text value to be displayed in the new info key - `box.info.cluster.name` - and to be validated when the instances in the cluster connect to each other. The name is broadcasted in "box.id" built-in event as "cluster_name" key. It is string when set and nil when not set. When set, it has to match in all instances of the entire cluster in all its replicasets. If a name wasn't set on cluster bootstrap (was forgotten or the cluster is upgraded from a version < 3.0), then it can be set on an already running instance via `box.cfg.cluster_name`. To change or drop an already installed name one has to use `box.cfg.force_recovery == true` in all instances of the cluster. After the name is updated and all the instances synced, the `force_recovery` can be set back to `false`. The name can be <= 63 symbols long, can consist only of chars '0'-'9', '-' and 'a'-'z'. It must start with a letter. When upper-case letters are used in `box.cfg`, they are automatically converted to lower-case. The names are host- and DNS-friendly.
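The naming rules in the documentation request above (at most 63 symbols, only '0'-'9', '-' and 'a'-'z', first symbol a letter, upper-case folded to lower-case) can be sketched as a single check-and-normalize function. This is a minimal sketch under those stated rules only: `node_name_normalize` and `NODE_NAME_LEN_MAX` are hypothetical names, not the functions from the patchset, and rejecting the empty string is an assumption that follows from "must start with a letter".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Maximal name length in symbols, without the terminating NUL. */
enum { NODE_NAME_LEN_MAX = 63 };

/*
 * Validate the name in `in` and write its lower-case form to `out`,
 * which must have room for NODE_NAME_LEN_MAX + 1 bytes.
 * Returns false on an invalid name; `out` is then unspecified.
 */
static bool
node_name_normalize(const char *in, char *out)
{
	assert(in != NULL && out != NULL);
	size_t len = strlen(in);
	if (len == 0 || len > NODE_NAME_LEN_MAX)
		return false;
	for (size_t i = 0; i < len; i++) {
		char c = in[i];
		/* Fold upper-case letters instead of rejecting them. */
		if (c >= 'A' && c <= 'Z')
			c = (char)(c - 'A' + 'a');
		bool is_letter = c >= 'a' && c <= 'z';
		bool is_digit = c >= '0' && c <= '9';
		/* Digits and '-' are allowed everywhere but first. */
		bool ok = is_letter || (i > 0 && (is_digit || c == '-'));
		if (!ok)
			return false;
		out[i] = c;
	}
	out[len] = '\0';
	return true;
}
```

Under these rules `"Replica-1"` would be accepted and stored as `"replica-1"`, while `"1replica"`, `"-replica"` and `"my_replica"` would be rejected.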

The replicaset name is carried with replicaset UUID wherever any sanity validations are needed like whether 2 instances belong to the same replicaset. Part of tarantool#5029 @TarantoolBot document Title: `box.cfg.replicaset_name` and `box.info.replicaset.name` The new option `box.cfg.replicaset_name` allows to assign the replicaset name to a human-readable text value to be displayed in the new info key - `box.info.replicaset.name` - and to be validated when the instances in the replicaset connect to each other. The name is broadcasted in "box.id" built-in event as "replicaset_name" key. It is string when set and nil when not set. When set, it has to match in all instances of the entire replicaset. If a name wasn't set on cluster bootstrap (was forgotten or the cluster is upgraded from a version < 3.0), then it can be set on an already running instance via `box.cfg.replicaset_name`. To change or drop an already installed name one has to use `box.cfg.force_recovery == true` in all instances of the cluster. After the name is updated and all the instances synced, the `force_recovery` can be set back to `false`. The name can be <= 63 symbols long, can consist only of chars ['0'-'9'], '-' and 'a'-'z'. It must start with a letter. When upper-case letters are used in `box.cfg`, they are automatically converted to lower-case. The names are host- and DNS-friendly.

The instance name is carried with instance UUID everywhere in the replication protocols. It is visible in all other instances via _cluster and is displayed in monitoring. Part of tarantool#5029 @TarantoolBot document Title: `box.cfg.instance_name` and `box.info.name` The new option `box.cfg.instance_name` allows to assign the instance name to a human-readable text value to be displayed in the new info key - `box.info.name`. Instances can see names of their peers in `box.info.replication[id].name`. The name is broadcasted in "box.id" built-in event as "instance_name" key. It is string when set and nil when not set. When set, it has to be unique in the instance's replicaset. If a name wasn't set on cluster bootstrap (was forgotten or the cluster is upgraded from a version < 3.0), then it can be set on an already running instance via `box.cfg.instance_name`. To change or drop an already installed name one has to use `box.cfg.force_recovery == true` in all instances of the cluster. After the name is updated and all the instances synced, the `force_recovery` can be set back to `false`. The name can be <= 63 symbols long, can consist only of chars ['0'-'9'], '-' and 'a'-'z'. It must start with a letter. When upper-case letters are used in `box.cfg`, they are automatically converted to lower-case. The names are host- and DNS-friendly.

Previously it wasn't allowed to change instance UUID in _cluster. When needed, it had to be done manually by deleting the instance from _cluster and inserting it back with a new UUID. Or not to be done at all. Re-UUID (like re-name) was reported to be used when people didn't want to register new replica IDs. They wanted to rejoin lost replicas from scratch but keep the numeric ID. With UUID they could deal by either setting it explicitly to the old value on a new instance, or by doing the manual re-UUID like described above. This commit is supposed to make things simpler. If a replica has a name, then its re-join with another UUID is not an error. Its record in _cluster is automatically updated to store the new UUID. That is only possible if the old-UUID-instance is not connected anymore and is not listed in replication cfg. Closes tarantool#5029 @TarantoolBot document Title: Instance rebootstrap with new UUID but same ID and name If an instance has a non-empty instance name (`box.cfg.instance_name`), then at rebootstrap it can keep the name and its old numeric ID (space `_cluster['id']` field). This might be needed if one doesn't want to pollute `_cluster` with new rows, and somewhy doesn't want to or can't just drop the rows belonging to the dead replicas. In order for this to work 1) the rebootstrapping replica must keep its old non-empty instance name, 2) the other instances should not have any alive connections to the old dead replica. Ideally, the old replica should be just deleted from `box.cfg.replication` everywhere. When that works, the old row in `_cluster` is automatically updated with the new instance UUID.

We need to use `box.info.replicaset.uuid` instead of `box.info.cluster.uuid` to support Tarantool 3.0 [1]. 1. tarantool/tarantool#8289 Part of #366 Closes #371

Gerold103 self-assigned this Feb 9, 2023

Gerold103 added the do not merge Not ready to be merged label Feb 9, 2023

sergos requested review from sergepetrenko, Mons and locker February 10, 2023 20:57

sergepetrenko reviewed Feb 16, 2023

Gerold103 force-pushed the gerold103/gh-5029-names branch from a35e567 to 0768ed5 Compare March 9, 2023 22:48

Gerold103 changed the title ~~[WIP] Cluster, replicaset, and instance names~~ Cluster, replicaset, and instance names Mar 9, 2023

Gerold103 removed the do not merge Not ready to be merged label Mar 9, 2023

Gerold103 force-pushed the gerold103/gh-5029-names branch from 0768ed5 to 4a89d09 Compare March 9, 2023 22:54

Gerold103 marked this pull request as ready for review March 9, 2023 22:54

Gerold103 requested review from sergepetrenko and removed request for Mons March 9, 2023 22:57

Gerold103 force-pushed the gerold103/gh-5029-names branch 3 times, most recently from 77d80b1 to a4f85c6 Compare March 10, 2023 00:35

sergepetrenko assigned sergepetrenko and unassigned Gerold103 Mar 10, 2023

sergepetrenko requested changes Mar 20, 2023

sergepetrenko requested changes Mar 21, 2023

sergepetrenko assigned Gerold103 and unassigned sergepetrenko Mar 21, 2023

Gerold103 force-pushed the gerold103/gh-5029-names branch 2 times, most recently from 5a19bab to 7d5335f Compare March 27, 2023 14:02

Gerold103 requested a review from sergepetrenko March 28, 2023 10:49

Gerold103 mentioned this pull request Mar 28, 2023

Support Tarantool 3.0 tarantool/vshard#407

Merged

sergepetrenko approved these changes May 18, 2023

ylobankov mentioned this pull request May 19, 2023

ci: rm cartridge and crud integration test runs #8674

Merged

Gerold103 added 18 commits May 19, 2023 17:01

box: introduce node_name funcs and constants

d6d9b00

Node name stores a DNS- and host- friendly string name. It will be used in the next patches for some new global names: cluster, replicaset, and instance. Part of tarantool#5029 NO_DOC=internal NO_CHANGELOG=internal

sergepetrenko force-pushed the gerold103/gh-5029-names branch from 93d3e7d to 0537386 Compare May 19, 2023 14:14

sergepetrenko merged commit 4507c59 into tarantool:master May 19, 2023
86 checks passed

Totktonada mentioned this pull request Jul 14, 2023

Support instance/replicaset/cluster names instead of UUIDs in the configuration tarantool/vshard#426

Closed

Totktonada mentioned this pull request Aug 29, 2023

doc/playground.lua does not start under tarantool 3.0 tarantool/crud#371

Closed
