SWIM: payload may be not disseminated after restart #4280

Closed
Gerold103 opened this issue Jun 9, 2019 · 0 comments
@Gerold103 (Collaborator)

Payload dissemination breaks when an instance restarts. An example:

```Lua
swim = require('swim')
fiber = require('fiber')
-- Assumption: uuid(i) is a hypothetical test helper that builds a
-- deterministic UUID string from a small number.
function uuid(i) return string.format('00000000-0000-1000-8000-%012x', i) end

-- Create 2 SWIMs and interconnect them.
s1 = swim.new({uuid = uuid(1), uri = 0, heartbeat_rate = 0.3})
s2 = swim.new({uuid = uuid(2), uri = 0, heartbeat_rate = 0.3})
s2:add_member({uuid = uuid(1), uri = s1:self():uri()})
s1:set_payload('payload 1')
while not s2:member_by_uuid(uuid(1)):payload() do fiber.sleep(0.1) end
s2:member_by_uuid(uuid(1)):payload()

-- Now S2 knows S1's payload as 'payload 1'.

s1:delete()
s1 = swim.new({uuid = uuid(1), uri = 0, heartbeat_rate = 0.3})
s1:set_payload('payload 2')

-- From this moment on, S2 will never learn S1's new
-- payload 'payload 2'.
```

This happens because a member's payload is updated only when its
incarnation grows. Here S1 had incarnation 1 before the restart
and 1 afterwards, so S2 sees no reason to update S1's payload.
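The plain-Lua model below shows why the update is dropped. It is a minimal sketch. The `known` table and the `apply_update()` function are hypothetical stand-ins for S2's view of S1, not the real SWIM internals:

```Lua
-- Hypothetical model of what S2 stores about S1 before the restart.
local known = {incarnation = 1, payload = 'payload 1'}

-- Pre-fix rule: accept a payload only if the incarnation grew.
local function apply_update(member, incarnation, payload)
    if incarnation > member.incarnation then
        member.incarnation = incarnation
        member.payload = payload
        return true
    end
    return false
end

-- After the restart S1's incarnation starts over from 0, and
-- setting 'payload 2' brings it back to 1, a value S2 already has.
local accepted = apply_update(known, 1, 'payload 2')
print(accepted)      -- false
print(known.payload) -- payload 1
```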

After a discussion with Kostja, the following solution was approved. It is an evolution of the method used in ScyllaDB.

Let's add a new value, generation. It works exactly like incarnation, but is persisted. It takes part in incarnation/status comparisons as part of the compound key {generation, incarnation, status}. The user persists the generation anywhere and specifies it in the first swim:cfg(). The user increments the counter each time a new SWIM instance is created.

When a SWIM instance S1 receives a new generation from S2, it
1) fires a trigger so that the user can react to S2's restart,
and 2) invalidates its local copy of S2's payload.
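The proposal can be sketched in plain Lua. Ages are compared as the compound key {generation, incarnation, status}. A bigger generation fires a restart callback and drops the cached payload. This is a sketch under stated assumptions, not the actual implementation. The function names, the numeric status encoding (a larger value wins on a tie) and the `on_restart` callback are all hypothetical:

```Lua
-- Compare two ages as the compound key {generation, incarnation,
-- status}. Returns -1, 0 or 1. Status is assumed to be a number
-- where a larger value wins on a tie (hypothetical encoding).
local function compare_age(a, b)
    for _, field in ipairs({'generation', 'incarnation', 'status'}) do
        if a[field] < b[field] then return -1 end
        if a[field] > b[field] then return 1 end
    end
    return 0
end

-- Hypothetical handling of an incoming record about a member.
-- `on_restart` stands in for a user-defined trigger.
local function process(member, msg, on_restart)
    if compare_age(msg, member) <= 0 then
        return false -- Old or duplicate information, ignore it.
    end
    if msg.generation > member.generation then
        on_restart(member)
        member.payload = nil -- Invalidate the cached payload.
    end
    member.generation = msg.generation
    member.incarnation = msg.incarnation
    member.status = msg.status
    if msg.payload ~= nil then
        member.payload = msg.payload
    end
    return true
end

-- S2's view of S1 before the restart.
local view = {generation = 0, incarnation = 1, status = 0,
              payload = 'payload 1'}
local restarts = 0
-- The restarted S1 uses generation 1, so the repeated incarnation 1
-- no longer matters: the bigger generation wins the comparison.
local accepted = process(view,
    {generation = 1, incarnation = 1, status = 0, payload = 'payload 2'},
    function() restarts = restarts + 1 end)
print(accepted, restarts, view.payload)
```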

Gerold103 added the bug and app labels Jun 9, 2019
Gerold103 self-assigned this Jun 9, 2019
kyukhin added this to the 2.3.0 milestone Jun 13, 2019
Gerold103 added a commit that referenced this issue Jun 20, 2019
Traditional SWIM describes member age as incarnation: a
monotonically growing number used to refute false gossip. But this
is not enough in the real world, where restarts have to be
detected. Incarnations are not persisted, and even persistent
incarnations would not help without new incarnation-like
attributes.

This patch encapsulates incarnation into an 'age' to simplify
further work around this area.

Part of #4280
Gerold103 added a commit that referenced this issue Jun 20, 2019
SWIM uses incarnation to refute old information, but this is not
enough when restarts are possible. If an instance restarts, its
incarnation is reset to 0. After several fast local updates it
reaches N. But other instances may also know the incarnation of
this instance as N from its previous life, with different
information attached. They will never accept the new version of
the data, because they consider their current version up to date.

As a result, incarnation alone is not enough: a persistent part
of incarnation is needed. This patch introduces it and calls it
'generation'. As an additional benefit, generation allows
user-defined triggers to react to instance restarts.

Closes #4280

@TarantoolBot document
Title: SWIM generation

Generation is a persistent part of incarnation that allows users
to refute old pieces of information left from previous lives of an
instance. It is a static attribute set when a SWIM instance is
created, and it can't be changed without restarting the instance.

Generation not only helps with overriding old information, but
can also be used to detect restarts in user-defined triggers.

How to set generation:
```Lua
swim = require('swim')
s = swim.new({generation = <value>})
```
Generation can't be set in `swim:cfg()`. If it is omitted, 0 is
used by default. Be careful: if the instance is not being started
for the first time, it is safer to use a new generation. Ideally,
the generation should be persisted somewhere: in a file, in a
space, or in a global service.
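A minimal sketch of one way to persist it: keep the generation in a local file and bump it on every start. It uses only the standard Lua `io` and `os` libraries. The `next_generation()` helper and the file location are hypothetical:

```Lua
-- Hypothetical helper: read the last generation stored in `path`,
-- store the next one and return it. A missing or empty file means
-- "never started before", so the first call returns 0.
local function next_generation(path)
    local last = -1
    local f = io.open(path, 'r')
    if f ~= nil then
        last = tonumber(f:read('*l') or '') or -1
        f:close()
    end
    local generation = last + 1
    f = assert(io.open(path, 'w'))
    f:write(tostring(generation), '\n')
    f:close()
    return generation
end

-- In a real instance the result would go to swim.new(), e.g.
-- s = swim.new({generation = next_generation('/path/to/file')})
local path = os.tmpname()
os.remove(path)              -- Simulate the very first start.
print(next_generation(path)) -- 0
print(next_generation(path)) -- 1 (after a "restart")
os.remove(path)
```

A file is just one of the options listed above. A space or a global service works the same way, as long as the stored value only grows.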

How to detect restarts:
```Lua
swim = require('swim')
s = swim.new()
s:on_member_event(function(m, e)
    if e:is_new_generation() then
        -- Process restart.
    end
end)
```

`is_new_generation` is a new method of the event object passed to
triggers.

To learn a member's generation, use the new
`swim_member:generation()` method.

The binary protocol is updated. The Protocol Logic section now
looks like this:

```
+-------------------Protocol logic section--------------------+
| map {                                                       |
|     0 = SWIM_SRC_UUID: 16 byte UUID,                        |
|                                                             |
|                 AND                                         |
|                                                             |
|     2 = SWIM_FAILURE_DETECTION: map {                       |
|         0 = SWIM_FD_MSG_TYPE: uint, enum swim_fd_msg_type,  |
|         1 = SWIM_FD_GENERATION: uint,                       |
|         2 = SWIM_FD_INCARNATION: uint                       |
|     },                                                      |
|                                                             |
|               OR/AND                                        |
|                                                             |
|     3 = SWIM_DISSEMINATION: array [                         |
|         map {                                               |
|             0 = SWIM_MEMBER_STATUS: uint,                   |
|                                     enum member_status,     |
|             1 = SWIM_MEMBER_ADDRESS: uint, ip,              |
|             2 = SWIM_MEMBER_PORT: uint, port,               |
|             3 = SWIM_MEMBER_UUID: 16 byte UUID,             |
|             4 = SWIM_MEMBER_GENERATION: uint,               |
|             5 = SWIM_MEMBER_INCARNATION: uint,              |
|             6 = SWIM_MEMBER_PAYLOAD: bin                    |
|         },                                                  |
|         ...                                                 |
|     ],                                                      |
|                                                             |
|               OR/AND                                        |
|                                                             |
|     1 = SWIM_ANTI_ENTROPY: array [                          |
|         map {                                               |
|             0 = SWIM_MEMBER_STATUS: uint,                   |
|                                     enum member_status,     |
|             1 = SWIM_MEMBER_ADDRESS: uint, ip,              |
|             2 = SWIM_MEMBER_PORT: uint, port,               |
|             3 = SWIM_MEMBER_UUID: 16 byte UUID,             |
|             4 = SWIM_MEMBER_GENERATION: uint,               |
|             5 = SWIM_MEMBER_INCARNATION: uint,              |
|             6 = SWIM_MEMBER_PAYLOAD: bin                    |
|         },                                                  |
|         ...                                                 |
|     ],                                                      |
|                                                             |
|               OR/AND                                        |
|                                                             |
|     4 = SWIM_QUIT: map {                                    |
|         0 = SWIM_QUIT_GENERATION: uint,                     |
|         1 = SWIM_QUIT_INCARNATION: uint                     |
|     }                                                       |
| }                                                           |
+-------------------------------------------------------------+
```

Note: SWIM_FD_INCARNATION, SWIM_MEMBER_INCARNATION,
SWIM_MEMBER_PAYLOAD, and SWIM_QUIT_INCARNATION got new values.
This is because 1) SWIM is not released yet, so it is legal to
change the values, and 2) I wanted to emphasize that 'generation'
is the first/upper part of the member age, and 'incarnation' is
the second/lower part.
Gerold103 added a commit that referenced this issue Jun 22, 2019
Traditional SWIM describes member version as incarnation: a
volatile, monotonically growing number used to refute false
gossip. But this is not enough in the real world, where restarts
must be detected and information from previous lives of an
instance must be refuted.

Incarnation is going to become a two-part value with a persistent
upper part and a volatile lower part. This patch prepares for that
by making incarnation a struct instead of a number.

The volatile part is called 'version'.

Part of #4280
Gerold103 added a commit that referenced this issue Jun 22, 2019
SWIM uses incarnation to refute old information, but this is not
enough when restarts are possible. If an instance restarts, its
incarnation is reset to 0. After several fast local updates it
reaches N. But other instances may also know the incarnation of
this instance as N from its previous life, with different
information attached. They will never accept the new version of
the data, because they consider their current version up to date.

As a result, incarnation alone is not enough: a persistent part
of incarnation is needed. This patch introduces it and calls it
'generation'. As an additional benefit, generation allows
user-defined triggers to react to instance restarts.

Closes #4280

@TarantoolBot document
Title: SWIM generation

Incarnation is now a two-part value {generation, version}.

Version is exactly what the original SWIM paper, and this code
before the patch, called 'incarnation'. It is a volatile,
automatically managed number used to refute false gossip and to
update information on remote nodes.

Generation is a new persistent part of incarnation that allows
users to refute old pieces of information left from previous lives
of an instance. It is a static attribute set when a SWIM instance
is created, and it can't be changed without restarting the
instance.

One can think of incarnation as a 128-bit unsigned integer, where
the upper 64 bits are static and persistent, while the lower 64
bits are volatile.

Generation not only helps with overriding old information, but
can also be used to detect restarts in user-defined triggers,
because it can change only when a SWIM instance is recreated.

How to set generation:
```Lua
swim = require('swim')
s = swim.new({generation = <value>})
```
Generation can't be set in `swim:cfg()`. If it is omitted, 0 is
used by default. Be careful: if the instance is not being started
for the first time, it is safer to use a new generation. Ideally,
the generation should be persisted somewhere: in a file, in a
space, or in a global service.

How incarnation updates are reported now:
```Lua
swim = require('swim')
s = swim.new()
s:on_member_event(function(m, e)
    if e:is_new_incarnation() then
        if e:is_new_generation() then
            -- Process restart.
        end
        if e:is_new_version() then
            -- Process version update. It means
            -- the member is somehow changed.
        end
    end
end)
```

Note that `is_new_incarnation` is now a shortcut for checking
whether the generation, the version, or both were updated.
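The relation between the three checks can be modelled in plain Lua. This is a sketch only. `make_event()` is a hypothetical helper that compares the old and new {generation, version} pairs, and it is not the real event object from the `swim` module:

```Lua
-- Build a hypothetical event object describing how a member's
-- incarnation changed between two observations.
local function make_event(old, new)
    local event = {
        new_generation = new.generation ~= old.generation,
        new_version = new.version ~= old.version,
    }
    function event:is_new_generation() return self.new_generation end
    function event:is_new_version() return self.new_version end
    -- Shortcut: the generation, the version, or both changed.
    function event:is_new_incarnation()
        return self.new_generation or self.new_version
    end
    return event
end

-- A restart where the new life happens to reach the same version.
local e = make_event({generation = 1, version = 3},
                     {generation = 2, version = 3})
print(e:is_new_incarnation()) -- true
print(e:is_new_generation())  -- true
print(e:is_new_version())     -- false
```

The example shows why a trigger interested in restarts should check `is_new_generation()` and not just the version.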

The `member:incarnation()` method is changed. It now returns a
cdata object with attributes `version` and `generation`. Usage:
```Lua
tarantool> incarnation = member:incarnation()
---
...
tarantool> incarnation.version
---
- 15
...
tarantool> incarnation.generation
---
- 2
...
```

These objects can be compared using comparison operators:
```Lua
member1:incarnation() < member2:incarnation()
member1:incarnation() >= member2:incarnation()
-- Any operator works: ==, <, >, <=, >=, ~=.
```
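These comparisons follow lexicographic order on {generation, version}, which matches the 128-bit integer view of incarnation described above. The plain-Lua sketch below reproduces that ordering with metamethods. `new_incarnation()` is hypothetical and only approximates the cdata object returned by `member:incarnation()`:

```Lua
local incarnation_mt = {}

-- Hypothetical constructor for an incarnation value.
local function new_incarnation(generation, version)
    return setmetatable({generation = generation, version = version},
                        incarnation_mt)
end

-- Generation is the upper part and is compared first; version
-- breaks ties, like the lower 64 bits of a 128-bit number.
incarnation_mt.__eq = function(a, b)
    return a.generation == b.generation and a.version == b.version
end
incarnation_mt.__lt = function(a, b)
    if a.generation ~= b.generation then
        return a.generation < b.generation
    end
    return a.version < b.version
end
incarnation_mt.__le = function(a, b)
    return not incarnation_mt.__lt(b, a)
end
incarnation_mt.__tostring = function(self)
    return string.format('{generation = %d, version = %d}',
                         self.generation, self.version)
end

local before = new_incarnation(1, 100)
local after = new_incarnation(2, 0)
print(before < after)  -- true: a newer generation wins
print(after >= before) -- true
print(before ~= after) -- true
print(tostring(after)) -- {generation = 2, version = 0}
```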

When printed, an incarnation is shown as a string with both the
generation and the version.

The binary protocol is updated. The Protocol Logic section now
looks like this:

```
+-------------------Protocol logic section--------------------+
| map {                                                       |
|     0 = SWIM_SRC_UUID: 16 byte UUID,                        |
|                                                             |
|                 AND                                         |
|                                                             |
|     2 = SWIM_FAILURE_DETECTION: map {                       |
|         0 = SWIM_FD_MSG_TYPE: uint, enum swim_fd_msg_type,  |
|         1 = SWIM_FD_GENERATION: uint,                       |
|         2 = SWIM_FD_VERSION: uint                           |
|     },                                                      |
|                                                             |
|               OR/AND                                        |
|                                                             |
|     3 = SWIM_DISSEMINATION: array [                         |
|         map {                                               |
|             0 = SWIM_MEMBER_STATUS: uint,                   |
|                                     enum member_status,     |
|             1 = SWIM_MEMBER_ADDRESS: uint, ip,              |
|             2 = SWIM_MEMBER_PORT: uint, port,               |
|             3 = SWIM_MEMBER_UUID: 16 byte UUID,             |
|             4 = SWIM_MEMBER_GENERATION: uint,               |
|             5 = SWIM_MEMBER_VERSION: uint,                  |
|             6 = SWIM_MEMBER_PAYLOAD: bin                    |
|         },                                                  |
|         ...                                                 |
|     ],                                                      |
|                                                             |
|               OR/AND                                        |
|                                                             |
|     1 = SWIM_ANTI_ENTROPY: array [                          |
|         map {                                               |
|             0 = SWIM_MEMBER_STATUS: uint,                   |
|                                     enum member_status,     |
|             1 = SWIM_MEMBER_ADDRESS: uint, ip,              |
|             2 = SWIM_MEMBER_PORT: uint, port,               |
|             3 = SWIM_MEMBER_UUID: 16 byte UUID,             |
|             4 = SWIM_MEMBER_GENERATION: uint,               |
|             5 = SWIM_MEMBER_VERSION: uint,                  |
|             6 = SWIM_MEMBER_PAYLOAD: bin                    |
|         },                                                  |
|         ...                                                 |
|     ],                                                      |
|                                                             |
|               OR/AND                                        |
|                                                             |
|     4 = SWIM_QUIT: map {                                    |
|         0 = SWIM_QUIT_GENERATION: uint,                     |
|         1 = SWIM_QUIT_VERSION: uint                         |
|     }                                                       |
| }                                                           |
+-------------------------------------------------------------+
```

Note: SWIM_FD_INCARNATION, SWIM_MEMBER_INCARNATION, and
SWIM_QUIT_INCARNATION disappeared. Incarnation is now sent in two
parts: generation and version.

SWIM_MEMBER_PAYLOAD got a new value.

These changes are legal because 1) SWIM is not released yet, so
the protocol is still mutable, and 2) I wanted to emphasize that
'generation' is the first/upper part of incarnation, and 'version'
is the second/lower part.
Gerold103 added a commit that referenced this issue Jun 23, 2019
Gerold103 added a commit that referenced this issue Jun 27, 2019
swim.new() is documented as callable before swim:cfg(). But in
fact swim.new({generation = ...}) did not work: after the
generation was extracted, the empty config {} was passed to
swim:cfg(), which raised an error.

The patch allows calling swim.new() with only a generation, as
well as with no parameters at all.
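The fix comes down to calling `cfg()` only when something besides the generation is left in the options. Below is a plain-Lua sketch with a stubbed instance. `new_instance()` and the `cfg_calls` counter are hypothetical. This is not the module's real code:

```Lua
-- Hypothetical stand-in for swim.new(): move `generation` out of
-- the options and call cfg() only if other options remain.
local function new_instance(cfg)
    local opts = {}
    for k, v in pairs(cfg or {}) do opts[k] = v end
    -- 0 was the default generation at the time of this commit.
    local instance = {generation = opts.generation or 0, cfg_calls = 0}
    opts.generation = nil
    function instance:cfg(config)
        -- Mimics the error the old code hit on an empty config.
        assert(next(config) ~= nil, 'empty config')
        self.cfg_calls = self.cfg_calls + 1
    end
    if next(opts) ~= nil then
        instance:cfg(opts)
    end
    return instance
end

local a = new_instance()                          -- No options at all.
local b = new_instance({generation = 5})          -- Generation only.
local c = new_instance({generation = 5, uri = 0}) -- Full config.
-- cfg() is called 0, 0 and 1 times respectively.
print(a.cfg_calls, b.cfg_calls, c.cfg_calls)
```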

Follow up #4280
Gerold103 added a commit that referenced this issue Jun 27, 2019
Generation is supposed to be a persistent counter that
distinguishes between different installations of the same SWIM
instance. By default it was set to 0, which was quite unsafe.

Kostja proposed a simple and clever solution: set the generation
to a timestamp by default. Then it is almost certain to be
different on each restart.
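A sketch of a timestamp-based default, using only the standard `os.time()`. `default_generation()` is hypothetical, and the actual module may use a different clock or resolution. Because `os.time()` typically has one-second resolution, two starts within the same second would get the same value:

```Lua
-- Hypothetical default: use an explicit generation if the user
-- provided one, otherwise the current time in seconds.
local function default_generation(cfg)
    if cfg ~= nil and cfg.generation ~= nil then
        return cfg.generation
    end
    return os.time()
end

local explicit = default_generation({generation = 7})
local implicit = default_generation()
print(explicit, implicit)
```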

Follow up #4280
Gerold103 added a commit that referenced this issue Jun 28, 2019
Gerold103 added a commit that referenced this issue Jun 28, 2019