SWIM: payload may be not disseminated after restart #4280
Comments
Gerold103 added a commit that referenced this issue on Jun 20, 2019:

Traditional SWIM describes member age as an incarnation: a monotonically growing number used to refute false gossip. That is not enough in the real world, because restarts must also be detected. Incarnations are not persisted, and even persisting them would not help without adding new incarnation-like attributes. This patch encapsulates incarnation into an 'age' to simplify further work in this area.

Part of #4280
Gerold103 added a commit that referenced this issue on Jun 20, 2019:

SWIM uses incarnation to refute old information, but that is not enough when restarts are possible. If an instance restarts, its incarnation is reset to 0. After several fast local updates it reaches N. Other instances may also know this instance's incarnation as N from its previous life, but with different information. They will never accept the new version of the data, because their current version is also considered up to date. So incarnation alone is not enough, and a persistent part of incarnation is needed. This patch introduces it and calls it 'generation'. As an additional benefit, generation allows user-defined triggers to react to an instance restart.

Closes #4280

@TarantoolBot document
Title: SWIM generation

Generation is a persistent part of incarnation that lets users refute old pieces of information left over from previous lives of an instance. It is a static attribute set when a SWIM instance is created, and it cannot be changed without restarting the instance. Generation not only helps to override old information, it can also be used to detect restarts in user-defined triggers.

How to set generation:

```Lua
swim = require('swim')
s = swim.new({generation = <value>})
```

Generation can't be set in `swim:cfg`. If it is omitted, 0 is used by default. But be careful: if the instance is not being started for the first time, it is safer to use a new generation. Ideally it should be persisted somewhere: in a file, in a space, or in a global service.

How to detect restarts:

```Lua
swim = require('swim')
s = swim.new()
s:on_member_event(function(m, e)
    if e:is_new_generation() then
        -- Process restart.
    end
end)
```

`is_new_generation` is a new method of the event object passed into triggers.

How to learn the generation: use the new `swim_member:generation()` method.

The binary protocol is updated. The Protocol Logic section now looks like this:

    +-------------------Protocol logic section--------------------+
    | map {                                                       |
    |     0 = SWIM_SRC_UUID: 16 byte UUID,                        |
    |                                                             |
    |                 AND                                         |
    |                                                             |
    |     2 = SWIM_FAILURE_DETECTION: map {                       |
    |         0 = SWIM_FD_MSG_TYPE: uint, enum swim_fd_msg_type,  |
    |         1 = SWIM_FD_GENERATION: uint,                       |
    |         2 = SWIM_FD_INCARNATION: uint                       |
    |     },                                                      |
    |                                                             |
    |               OR/AND                                        |
    |                                                             |
    |     3 = SWIM_DISSEMINATION: array [                         |
    |         map {                                               |
    |             0 = SWIM_MEMBER_STATUS: uint,                   |
    |                                     enum member_status,     |
    |             1 = SWIM_MEMBER_ADDRESS: uint, ip,              |
    |             2 = SWIM_MEMBER_PORT: uint, port,               |
    |             3 = SWIM_MEMBER_UUID: 16 byte UUID,             |
    |             4 = SWIM_MEMBER_GENERATION: uint,               |
    |             5 = SWIM_MEMBER_INCARNATION: uint,              |
    |             6 = SWIM_MEMBER_PAYLOAD: bin                    |
    |         },                                                  |
    |         ...                                                 |
    |     ],                                                      |
    |                                                             |
    |               OR/AND                                        |
    |                                                             |
    |     1 = SWIM_ANTI_ENTROPY: array [                          |
    |         map {                                               |
    |             0 = SWIM_MEMBER_STATUS: uint,                   |
    |                                     enum member_status,     |
    |             1 = SWIM_MEMBER_ADDRESS: uint, ip,              |
    |             2 = SWIM_MEMBER_PORT: uint, port,               |
    |             3 = SWIM_MEMBER_UUID: 16 byte UUID,             |
    |             4 = SWIM_MEMBER_GENERATION: uint,               |
    |             5 = SWIM_MEMBER_INCARNATION: uint,              |
    |             6 = SWIM_MEMBER_PAYLOAD: bin                    |
    |         },                                                  |
    |         ...                                                 |
    |     ],                                                      |
    |                                                             |
    |               OR/AND                                        |
    |                                                             |
    |     4 = SWIM_QUIT: map {                                    |
    |         0 = SWIM_QUIT_GENERATION: uint,                     |
    |         1 = SWIM_QUIT_INCARNATION: uint                     |
    |     }                                                       |
    | }                                                           |
    +-------------------------------------------------------------+

Note: SWIM_FD_INCARNATION, SWIM_MEMBER_INCARNATION, SWIM_MEMBER_PAYLOAD, and SWIM_QUIT_INCARNATION got new values. This is because 1) SWIM is not released yet, so it is legal to change values, and 2) I wanted to emphasize that 'generation' is the first/upper part of member age and 'incarnation' is the second/lower part.
Gerold103 added a commit that referenced this issue on Jun 22, 2019:

Traditional SWIM describes member version as an incarnation: a volatile, monotonically growing number used to refute false gossip. That is not enough in the real world, because restarts must be detected and information from previous lives of an instance must be refuted. Incarnation is going to become a two-part value with a persistent upper part and a volatile lower part. This patch prepares for that by making incarnation a struct instead of a number. The volatile part is called 'version'.

Part of #4280
Gerold103 added a commit that referenced this issue on Jun 22, 2019:

SWIM uses incarnation to refute old information, but that is not enough when restarts are possible. If an instance restarts, its incarnation is reset to 0. After several fast local updates it reaches N. Other instances may also know this instance's incarnation as N from its previous life, but with different information. They will never accept the new version of the data, because their current version is also considered up to date. So incarnation alone is not enough, and a persistent part of incarnation is needed. This patch introduces it and calls it 'generation'. As an additional benefit, generation allows user-defined triggers to react to an instance restart.

Closes #4280

@TarantoolBot document
Title: SWIM generation

Incarnation is now a two-part value {generation, version}.

Version is exactly what the original SWIM paper, and the code before this patch, called 'incarnation'. It is a volatile, automatically managed number used to refute false gossip and to update information on remote nodes.

Generation is a new persistent part of incarnation that lets users refute old pieces of information left over from previous lives of an instance. It is a static attribute set when a SWIM instance is created, and it cannot be changed without restarting the instance.

One can think of incarnation as a 128-bit unsigned integer whose upper 64 bits are static and persistent, while the lower 64 bits are volatile.

Generation not only helps to override old information, it can also be used to detect restarts in user-defined triggers, because it can change only when a SWIM instance is recreated.

How to set generation:

```Lua
swim = require('swim')
s = swim.new({generation = <value>})
```

Generation can't be set in `swim:cfg`. If it is omitted, 0 is used by default. But be careful: if the instance is not being started for the first time, it is safer to use a new generation. Ideally it should be persisted somewhere: in a file, in a space, or in a global service.

How handling of an incarnation update has changed:

```Lua
swim = require('swim')
s = swim.new()
s:on_member_event(function(m, e)
    if e:is_new_incarnation() then
        if e:is_new_generation() then
            -- Process restart.
        end
        if e:is_new_version() then
            -- Process version update. It means
            -- the member is somehow changed.
        end
    end
end)
```

Note that `is_new_incarnation` is now a shortcut for checking an update of generation, version, or both.

The method `member:incarnation()` has changed. It now returns a cdata object with the attributes `version` and `generation`. Usage:

```Lua
incarnation = member:incarnation()

tarantool> incarnation.version
---
- 15
...

tarantool> incarnation.generation
---
- 2
...
```

These objects can be compared using comparison operators:

```Lua
member1:incarnation() < member2:incarnation()
member1:incarnation() >= member2:incarnation()
-- Any operator works: ==, <, >, <=, >=, ~=.
```

When printed, an incarnation shows a string with both generation and version.

The binary protocol is updated. The Protocol Logic section now looks like this:

```
+-------------------Protocol logic section--------------------+
| map {                                                       |
|     0 = SWIM_SRC_UUID: 16 byte UUID,                        |
|                                                             |
|                 AND                                         |
|                                                             |
|     2 = SWIM_FAILURE_DETECTION: map {                       |
|         0 = SWIM_FD_MSG_TYPE: uint, enum swim_fd_msg_type,  |
|         1 = SWIM_FD_GENERATION: uint,                       |
|         2 = SWIM_FD_VERSION: uint                           |
|     },                                                      |
|                                                             |
|               OR/AND                                        |
|                                                             |
|     3 = SWIM_DISSEMINATION: array [                         |
|         map {                                               |
|             0 = SWIM_MEMBER_STATUS: uint,                   |
|                                     enum member_status,     |
|             1 = SWIM_MEMBER_ADDRESS: uint, ip,              |
|             2 = SWIM_MEMBER_PORT: uint, port,               |
|             3 = SWIM_MEMBER_UUID: 16 byte UUID,             |
|             4 = SWIM_MEMBER_GENERATION: uint,               |
|             5 = SWIM_MEMBER_VERSION: uint,                  |
|             6 = SWIM_MEMBER_PAYLOAD: bin                    |
|         },                                                  |
|         ...                                                 |
|     ],                                                      |
|                                                             |
|               OR/AND                                        |
|                                                             |
|     1 = SWIM_ANTI_ENTROPY: array [                          |
|         map {                                               |
|             0 = SWIM_MEMBER_STATUS: uint,                   |
|                                     enum member_status,     |
|             1 = SWIM_MEMBER_ADDRESS: uint, ip,              |
|             2 = SWIM_MEMBER_PORT: uint, port,               |
|             3 = SWIM_MEMBER_UUID: 16 byte UUID,             |
|             4 = SWIM_MEMBER_GENERATION: uint,               |
|             5 = SWIM_MEMBER_VERSION: uint,                  |
|             6 = SWIM_MEMBER_PAYLOAD: bin                    |
|         },                                                  |
|         ...                                                 |
|     ],                                                      |
|                                                             |
|               OR/AND                                        |
|                                                             |
|     4 = SWIM_QUIT: map {                                    |
|         0 = SWIM_QUIT_GENERATION: uint,                     |
|         1 = SWIM_QUIT_VERSION: uint                         |
|     }                                                       |
| }                                                           |
+-------------------------------------------------------------+
```

Note: SWIM_FD_INCARNATION, SWIM_MEMBER_INCARNATION, and SWIM_QUIT_INCARNATION are gone. Incarnation is now sent in two parts: version and generation. SWIM_MEMBER_PAYLOAD got a new value. These changes are legal because 1) SWIM is not released yet, so it is mutable, and 2) I wanted to emphasize that 'generation' is the first/upper part of incarnation and 'version' is the second/lower part.
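The advice to persist the generation "in a file" can be sketched in plain Lua. This is a minimal illustration, not part of the `swim` module: the helper name `next_generation` and the one-number file format are assumptions, and the write is not crash-safe (no atomic rename or fsync).

```lua
-- Hypothetical helper: read the previous generation from a file,
-- increment it, store it back, and return the new value.
local function next_generation(path)
    local prev = 0
    local f = io.open(path, 'r')
    if f then
        -- An empty or malformed file counts as "no previous generation".
        prev = tonumber(f:read('*l')) or 0
        f:close()
    end
    local gen = prev + 1
    f = assert(io.open(path, 'w'))
    f:write(tostring(gen), '\n')
    f:close()
    return gen
end

-- Simulate two consecutive starts of the same instance.
local path = os.tmpname()
local first = next_generation(path)
local second = next_generation(path)
print(first, second)
os.remove(path)
```

On a real instance the returned value would be passed as `swim.new({generation = gen})`, so that every start uses a generation greater than the previous one.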
Gerold103 added a commit that referenced this issue on Jun 23, 2019:

Traditional SWIM describes member version as an incarnation: a volatile, monotonically growing number used to refute false gossip. That is not enough in the real world, because restarts must be detected and information from previous lives of an instance must be refuted. Incarnation is going to become a two-part value with a persistent upper part and a volatile lower part. This patch prepares for that by making incarnation a struct instead of a number. The volatile part is called 'version'.

Part of #4280
Gerold103 added a commit that referenced this issue on Jun 27, 2019:

swim.new() is documented as callable before swim:cfg(). In fact, swim.new({generation = ...}) didn't work: after the generation was extracted, the empty config {} was passed to swim:cfg() and led to an error. This patch allows calling swim.new() with generation only, as well as with no parameters at all.

Follow up #4280
Gerold103 added a commit that referenced this issue on Jun 27, 2019:

Generation is supposed to be a persistent counter that distinguishes between different installations of the same SWIM instance. By default it was set to 0, which was quite unsafe. Kostja proposed a simple and neat solution: generation could default to a timestamp. In that case it will almost certainly be different on each restart.

Follow up #4280
Gerold103 added a commit that referenced this issue on Jun 28, 2019:

swim.new() is documented as callable before swim:cfg(). In fact, swim.new({generation = ...}) didn't work: after the generation was extracted, the empty config {} was passed to swim:cfg() and led to an error. This patch allows calling swim.new() with generation only, as well as with no parameters at all.

Follow up #4280
Gerold103 added a commit that referenced this issue on Jun 28, 2019:

Generation is supposed to be a persistent counter that distinguishes between different installations of the same SWIM instance. By default it was set to 0, which was quite unsafe. Kostja proposed a simple and neat solution: generation could default to a timestamp. In that case it will almost certainly be different on each restart.

Follow up #4280
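The timestamp default can be pictured with a small plain-Lua sketch. The helper `choose_generation` is hypothetical, and the use of `os.time()` with one-second resolution is an assumption: the thread does not say which clock or resolution the patch uses.

```lua
-- Hypothetical helper, not part of the swim module: an explicit
-- generation wins, otherwise fall back to the current timestamp.
local function choose_generation(opts)
    if opts ~= nil and opts.generation ~= nil then
        return opts.generation
    end
    -- Assumption: seconds since the epoch. Two restarts within the
    -- same second would get the same value.
    return os.time()
end

print(choose_generation({generation = 7}))
print(choose_generation())
```

A timestamp differs across restarts only as far as the clock allows, which is why the commit message says "almost". An explicitly persisted counter avoids that dependency.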
Payload dissemination has problems with restart. An example:
This happens because payloads are updated only together with a new incarnation. Here S1 had incarnation 1 before the restart and 1 again afterwards, so S2 sees no reason to update S1's payload.
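The failure can be reproduced with a simplified model in plain Lua. This is an illustrative sketch rather than the SWIM implementation: `apply_gossip` and the table fields are hypothetical names, and the only rule modelled is "accept a payload only if the incarnation is strictly newer".

```lua
-- Hypothetical model of S2's update rule: take the payload only
-- when the received incarnation is strictly newer than the known one.
local function apply_gossip(view, msg)
    if msg.incarnation > view.incarnation then
        view.incarnation = msg.incarnation
        view.payload = msg.payload
        return true
    end
    return false -- Not newer: the message is ignored.
end

-- What S2 learned about S1 before S1 restarted.
local s2_view_of_s1 = {incarnation = 1, payload = 'old payload'}

-- S1 restarts: incarnation is reset to 0, then one local update
-- (setting a new payload) bumps it back to 1.
local s1_after_restart = {incarnation = 0, payload = nil}
s1_after_restart.incarnation = s1_after_restart.incarnation + 1
s1_after_restart.payload = 'new payload'

local updated = apply_gossip(s2_view_of_s1, s1_after_restart)
print(updated, s2_view_of_s1.payload)
```

In this model S2 keeps `'old payload'`, because incarnation 1 is not greater than incarnation 1.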
After a discussion with Kostja, the following solution was approved. It is an evolution of the method used in ScyllaDB.
Let's add a new value, `generation`. It works exactly like incarnation, but is persisted. It participates in incarnation/status comparisons as part of a compound key: {generation, incarnation, status}. The user persists the generation anywhere and specifies it in the first `swim:cfg()`. The user increments the counter each time a new SWIM instance is created.

When a SWIM instance S1 receives a new generation from S2, it 1) fires a trigger so that the user can react to S2's restart, and 2) invalidates its local copy of S2's payload.
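The receiving side of this proposal can be sketched in plain Lua. The names `is_newer`, `apply_gossip`, and `on_restart` are hypothetical, and the status component of the compound key is left out to keep the example short.

```lua
-- Compare {generation, incarnation} as a compound key: generation
-- is the upper part, incarnation the lower part.
local function is_newer(a, b)
    if a.generation ~= b.generation then
        return a.generation > b.generation
    end
    return a.incarnation > b.incarnation
end

-- Hypothetical update rule for a member's local view.
local function apply_gossip(view, msg, on_restart)
    if not is_newer(msg, view) then
        return false
    end
    if msg.generation > view.generation then
        on_restart(view, msg) -- 1) Let the user react to the restart.
        view.payload = nil    -- 2) Invalidate the stale payload.
    end
    view.generation = msg.generation
    view.incarnation = msg.incarnation
    if msg.payload ~= nil then
        view.payload = msg.payload
    end
    return true
end

local restarts = 0
local function count_restart() restarts = restarts + 1 end

local view = {generation = 1, incarnation = 1, payload = 'old payload'}
-- Same incarnation as before, but a new generation after the restart.
local msg = {generation = 2, incarnation = 1, payload = 'new payload'}
local updated = apply_gossip(view, msg, count_restart)
print(updated, view.payload, restarts)

-- A leftover message from the previous life is refuted, even though
-- its incarnation is higher.
local stale = {generation = 1, incarnation = 5, payload = 'previous life'}
local stale_applied = apply_gossip(view, stale, count_restart)
print(stale_applied, view.payload)
```

Because generation is compared first, equal incarnations no longer hide a restart, and gossip from a previous life cannot override the current state.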