
Assertion fail in vclock_follow() #4739

Closed
Totktonada opened this issue Jan 22, 2020 · 4 comments

@Totktonada (Member)

Tarantool version: 2.4.0-16-gcdf502c66.
OS version: Linux.

How to reproduce

Add the file:

$ cat test/replication/upsert_stress.test.lua
test_run = require('test_run').new()
fiber = require('fiber')

SERVERS = { 'autobootstrap1', 'autobootstrap2', 'autobootstrap3' }
test_run:create_cluster(SERVERS, "replication", {args='0.1'})
test_run:wait_fullmesh(SERVERS)

_ = test_run:cmd("switch autobootstrap1")
test_run = require('test_run').new()
engine = test_run:get_cfg('engine')
_ = pcall(function() box.space.test:drop() end)
s = box.schema.space.create('test', {engine = engine})
_ = s:create_index('pk')
_ = s:create_index('sk', {parts = {{2, 'string'}}, unique = false})

-- Wait for schema update on all instances.
_ = test_run:cmd('switch default')
vclock = test_run:get_cluster_vclock(SERVERS)
_ = test_run:wait_cluster_vclock(SERVERS, vclock)

test_run:cmd("stop server autobootstrap1 with signal=KILL")
test_run:cmd("start server autobootstrap1")
_ = test_run:cmd("switch autobootstrap1")
fiber = require('fiber')
_ = fiber.create(function()                                                  \
    for j = 1, 10 do                                                         \
        for i = 1, 1000 do                                                   \
            box.space.test:upsert({i, tostring(i)}, {{'=', 2, tostring(i)}}) \
        end                                                                  \
    end                                                                      \
end)

test_run:cmd("stop server autobootstrap2 with signal=KILL")
test_run:cmd("start server autobootstrap2")
_ = test_run:cmd("switch autobootstrap2")
fiber = require('fiber')
_ = fiber.create(function()                                                  \
    for j = 1, 10 do                                                         \
        box.begin()                                                          \
        for i = 1, 1000 do                                                   \
            box.space.test:upsert({i, tostring(i)}, {{'=', 2, tostring(i)}}) \
        end                                                                  \
        box.commit()                                                         \
    end                                                                      \
end)

test_run:cmd("stop server autobootstrap3 with signal=KILL")
test_run:cmd("start server autobootstrap3")
_ = test_run:cmd("switch autobootstrap3")
fiber = require('fiber')
_ = fiber.create(function()                                                  \
    for j = 1, 10 do                                                         \
        for i = 1, 1000 do                                                   \
            box.space.test:upsert({i, tostring(i)}, {{'=', 2, tostring(i)}}) \
        end                                                                  \
    end                                                                      \
end)

test_run:cmd("stop server autobootstrap1 with signal=KILL")
test_run:cmd("start server autobootstrap1")
_ = test_run:cmd("switch autobootstrap1")
fiber = require('fiber')
_ = fiber.create(function()                                                  \
    for j = 1, 10 do                                                         \
        for i = 1, 1000 do                                                   \
            box.space.test:upsert({i, tostring(i)}, {{'=', 2, tostring(i)}}) \
        end                                                                  \
    end                                                                      \
end)

test_run:cmd("stop server autobootstrap2 with signal=KILL")
test_run:cmd("start server autobootstrap2")
_ = test_run:cmd("switch autobootstrap2")
fiber = require('fiber')
_ = fiber.create(function()                                                  \
    for j = 1, 10 do                                                         \
        for i = 1, 1000 do                                                   \
            box.space.test:upsert({i, tostring(i)}, {{'=', 2, tostring(i)}}) \
        end                                                                  \
    end                                                                      \
end)

test_run:cmd("stop server autobootstrap3 with signal=KILL")
test_run:cmd("start server autobootstrap3")
_ = test_run:cmd("switch autobootstrap3")
fiber = require('fiber')
_ = fiber.create(function()                                                  \
    for j = 1, 10 do                                                         \
        box.begin()                                                          \
        for i = 1, 1000 do                                                   \
            box.space.test:upsert({i, tostring(i)}, {{'=', 2, tostring(i)}}) \
        end                                                                  \
        box.commit()                                                         \
    end                                                                      \
end)

test_run:cmd("stop server autobootstrap1 with signal=KILL")
test_run:cmd("start server autobootstrap1")
_ = test_run:cmd("switch autobootstrap1")
fiber = require('fiber')
_ = fiber.create(function()                                                  \
    for j = 1, 10 do                                                         \
        box.begin()                                                          \
        for i = 1, 1000 do                                                   \
            box.space.test:upsert({i, tostring(i)}, {{'=', 2, tostring(i)}}) \
        end                                                                  \
        box.commit()                                                         \
    end                                                                      \
end)

test_run:cmd("stop server autobootstrap2 with signal=KILL")
test_run:cmd("start server autobootstrap2")
_ = test_run:cmd("switch autobootstrap2")
fiber = require('fiber')
_ = fiber.create(function()                                                  \
    for j = 1, 10 do                                                         \
        for i = 1, 1000 do                                                   \
            box.space.test:upsert({i, tostring(i)}, {{'=', 2, tostring(i)}}) \
        end                                                                  \
    end                                                                      \
end)

test_run:cmd("stop server autobootstrap3 with signal=KILL")
test_run:cmd("start server autobootstrap3")
_ = test_run:cmd("switch autobootstrap3")
fiber = require('fiber')
_ = fiber.create(function()                                                  \
    for j = 1, 10 do                                                         \
        for i = 1, 1000 do                                                   \
            box.space.test:upsert({i, tostring(i)}, {{'=', 2, tostring(i)}}) \
        end                                                                  \
    end                                                                      \
end)

_ = test_run:cmd("switch autobootstrap1")
box.space.test:drop()

_ = test_run:cmd("switch default")

test_run:drop_cluster(SERVERS)

Generate the result file:

$ (cd test && ./test-run.py upsert_stress --conf memtx)

Run in parallel (memtx or vinyl, it does not matter):

$ (cd test && ./test-run.py -j 20 $(yes upsert_stress | head -n 100) --conf memtx)

Result

[014] replication/upsert_stress.test.lua              memtx           
[014] 
[014] [Instance "autobootstrap1" killed by signal: 6 (SIGABRT)]
[014] Found assertion fail in the results file [/home/alex/projects/tarantool-meta/tarantool/test/var/014_replication/autobootstrap1.log]:
2020-01-22 14:42:10.561 [28419] main/110/main I> remote vclock {1: 9017, 2: 8962, 3: 10000} local vclock {1: 9017, 2: 8961, 3: 10000}
2020-01-22 14:42:10.620 [28419] relay/unix/:(socket)/101/main I> recover from `/home/alex/projects/tarantool-meta/tarantool/test/var/014_replication/autobootstrap1/00000000000000009035.xlog'
2020-01-22 14:42:10.647 [28419] relay/unix/:(socket)/101/main I> recover from `/home/alex/projects/tarantool-meta/tarantool/test/var/014_replication/autobootstrap1/00000000000000009035.xlog'
2020-01-22 14:42:10.910 [28419] main/112/applier/cluster@unix/:/home/ale I> can't read row
2020-01-22 14:42:10.910 [28419] main/112/applier/cluster@unix/:/home/ale coio.cc:378 !> SystemError unexpected EOF when reading from socket, called on fd 22, aka unix/:(socket), peer of unix/:(socket): Broken pipe
2020-01-22 14:42:10.910 [28419] main/112/applier/cluster@unix/:/home/ale I> will retry every 1.00 second
2020-01-22 14:42:10.929 [28419] relay/unix/:(socket)/101/main sio.c:268 !> SystemError writev(2), called on fd 32, aka unix/:(socket), peer of unix/:(socket): Broken pipe
2020-01-22 14:42:10.929 [28419] relay/unix/:(socket)/101/main C> exiting the relay loop
2020-01-22 14:42:13.595 [28419] main/113/applier/cluster@unix/:/home/ale I> can't read row
2020-01-22 14:42:13.595 [28419] main/113/applier/cluster@unix/:/home/ale xrow.c:1082 E> ER_SYSTEM: timed out
2020-01-22 14:42:13.595 [28419] main/113/applier/cluster@unix/:/home/ale I> will retry every 1.00 second
2020-01-22 14:42:14.596 [28419] main/113/applier/cluster@unix/:/home/ale I> authenticated
2020-01-22 14:42:14.597 [28419] main/113/applier/cluster@unix/:/home/ale I> subscribed
2020-01-22 14:42:14.597 [28419] main/113/applier/cluster@unix/:/home/ale I> remote vclock {1: 12991, 2: 9198, 3: 10000} local vclock {1: 12990, 2: 9198, 3: 10000}
tarantool: /home/alex/p/tarantool-meta/tarantool/src/box/vclock.c:43: vclock_follow: Assertion `lsn >= 0' failed.
[014] [ fail ]

Backtrace:

(gdb) bt
#0  0x00007fa78c6b61f1 in raise () from /lib64/libc.so.6
#1  0x00007fa78c69e55b in abort () from /lib64/libc.so.6
#2  0x00007fa78c69e42f in __assert_fail_base.cold () from /lib64/libc.so.6
#3  0x00007fa78c6ad9e2 in __assert_fail () from /lib64/libc.so.6
#4  0x000055889f0e40c8 in vclock_follow (vclock=0x7fa7884fe320, replica_id=1, lsn=-5929) at /home/alex/p/tarantool-meta/tarantool/src/box/vclock.c:43
#5  0x000055889ef27221 in wal_assign_lsn (vclock_diff=0x7fa7884fe320, base=0x55889f3ef518 <wal_writer_singleton+5976>, row=0x7fa755e3d2e0, end=0x7fa755e3d2e8)
    at /home/alex/p/tarantool-meta/tarantool/src/box/wal.c:954
#6  0x000055889ef27438 in wal_write_to_disk (msg=0x7fa78bc343a0) at /home/alex/p/tarantool-meta/tarantool/src/box/wal.c:1029
#7  0x000055889ef76efd in cmsg_deliver (msg=0x7fa78bc343a0) at /home/alex/p/tarantool-meta/tarantool/src/lib/core/cbus.c:353
#8  0x000055889ef77780 in cbus_process (endpoint=0x7fa7884ffe90) at /home/alex/p/tarantool-meta/tarantool/src/lib/core/cbus.c:635
#9  0x000055889ef777cf in cbus_loop (endpoint=0x7fa7884ffe90) at /home/alex/p/tarantool-meta/tarantool/src/lib/core/cbus.c:642
#10 0x000055889ef277c9 in wal_writer_f (ap=0x7fa788400278) at /home/alex/p/tarantool-meta/tarantool/src/box/wal.c:1127
#11 0x000055889ee25bc9 in fiber_cxx_invoke(fiber_func, typedef __va_list_tag __va_list_tag *) (f=0x55889ef27751 <wal_writer_f>, ap=0x7fa788400278)
    at /home/alex/p/tarantool-meta/tarantool/src/lib/core/fiber.h:761
#12 0x000055889ef7096b in fiber_loop (data=0x0) at /home/alex/p/tarantool-meta/tarantool/src/lib/core/fiber.c:830
#13 0x000055889f17e09f in coro_init () at /home/alex/p/tarantool-meta/tarantool/third_party/coro/coro.c:110

Observations

It seems that the LSN is not sequential for the replica with id 1 (the instance that crashed), or the replica_id in struct row is incorrect.
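
For context, here is a minimal self-contained sketch (simplified, with assumed names, not the actual Tarantool sources) of how re-applying an already-written remote row drives the per-replica LSN delta negative before it hits the `lsn >= 0` assertion from frame #4 of the backtrace:

```c
#include <assert.h>
#include <stdint.h>

/* Stubbed vclock: one LSN slot per replica id (a simplification). */
struct vclock { int64_t lsn[32]; };

static void
vclock_follow(struct vclock *vclock, uint32_t replica_id, int64_t lsn)
{
	assert(lsn >= 0);                      /* vclock.c:43, fires here */
	assert(lsn > vclock->lsn[replica_id]); /* the "lsn > prev_lsn" check */
	vclock->lsn[replica_id] = lsn;
}

/*
 * Hypothetical stand-in for wal_assign_lsn() handling a remote row: the
 * WAL thread records the per-replica delta between the row's own LSN and
 * the vclock the batch started from. If a row the instance already has
 * (row_lsn <= base LSN) gets applied again, the delta goes negative,
 * matching lsn=-5929 in frame #4 of the backtrace.
 */
static void
follow_remote_row(struct vclock *vclock_diff, const struct vclock *base,
		  uint32_t replica_id, int64_t row_lsn)
{
	vclock_follow(vclock_diff, replica_id,
		      row_lsn - base->lsn[replica_id]);
}

int
main(void)
{
	struct vclock base = { .lsn = { [1] = 12990 } };
	struct vclock diff = { { 0 } };
	follow_remote_row(&diff, &base, 1, 12991); /* ok: delta = 1 */
	follow_remote_row(&diff, &base, 1, 7061);  /* re-applied row: the
	                                              delta is -5929 and the
	                                              assertion aborts */
	return 0;
}
```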

Totktonada added the bug, crash and replication labels on Jan 22, 2020
kyukhin added this to the 2.4.1 milestone on Jan 24, 2020
@kyukhin (Contributor) commented Jan 25, 2020

I've reproduced it on 1.10.
On the 4th server, the command is:

./test-run.py --builddir ../../bld -j 200 $(yes replication/upsert-stress | head -n 200) --conf memtx

@Totktonada (Member, Author)

A RelWithDebInfo build does not fail on the assertion, of course, but it writes duplicate entries into an xlog file:

$ cat test/var/014_replication/autobootstrap1.log
2020-01-31 19:42:04.884 [23055] main/102/autobootstrap1 C> Tarantool 2.2.1-117-gb62c11108
2020-01-31 19:42:04.884 [23055] main/102/autobootstrap1 C> log level 5
2020-01-31 19:42:04.884 [23055] main/102/autobootstrap1 I> mapping 268435456 bytes for memtx tuple arena...
2020-01-31 19:42:04.884 [23055] main/102/autobootstrap1 I> mapping 134217728 bytes for vinyl tuple arena...
2020-01-31 19:42:05.027 [23055] main/102/autobootstrap1 I> instance uuid a62762e5-5437-4a65-9223-ad19747f0527
2020-01-31 19:42:05.030 [23055] main/102/autobootstrap1 F> LSN for 3 is used twice or COMMIT order is broken: confirmed: 7835, new: 7541, req: {type: 'UPSERT', replica_id: 3, lsn: 7541, space_id: 513, index_id: 0, tuple: [533, "533"], ops: [["=", 2, "533"]]}
2020-01-31 19:42:05.030 [23055] main/102/autobootstrap1 F> LSN for 3 is used twice or COMMIT order is broken: confirmed: 7835, new: 7541, req: {type: 'UPSERT', replica_id: 3, lsn: 7541, space_id: 513, index_id: 0, tuple: [533, "533"], ops: [["=", 2, "533"]]}
$ tarantoolctl cat --show-system test/var/014_replication/autobootstrap1/00000000000000000017.xlog | grep -A 9 -B 2 'lsn: 7541'
Processing file 'test/var/014_replication/autobootstrap1/00000000000000000017.xlog'
---
HEADER:
  lsn: 7541
  replica_id: 3
  type: UPSERT
  timestamp: 1580488922.5749
BODY:
  space_id: 513
  operations: [['=', 2, '533']]
  index_base: 1
  tuple: [533, '533']
---
--
---
HEADER:
  lsn: 7541
  replica_id: 3
  type: UPSERT
  timestamp: 1580488922.6664
BODY:
  space_id: 513
  operations: [['=', 2, '533']]
  index_base: 1
  tuple: [533, '533']
---
--
---
HEADER:
  lsn: 7541
  replica_id: 1
  type: UPSERT
  tsn: 7010
  timestamp: 1580488923.1669
BODY:
  space_id: 513
  operations: [['=', 2, '532']]
  index_base: 1
  tuple: [532, '532']

It seems we should panic in this case in a RelWithDebInfo build rather than write an incorrect xlog file.
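
A minimal sketch of what such a guard could look like (an illustration of the suggestion, not the committed fix; the real code would use Tarantool's panic() rather than abort()):

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

struct vclock { int64_t lsn[32]; };

/*
 * Unlike assert(), this check survives RelWithDebInfo and Release builds:
 * instead of silently writing a row with a duplicate LSN into the xlog,
 * stop the instance with the same message the recovery code prints.
 */
static void
vclock_follow_or_die(struct vclock *vclock, uint32_t replica_id, int64_t lsn)
{
	int64_t prev = vclock->lsn[replica_id];
	if (lsn <= prev) {
		fprintf(stderr, "LSN for %u is used twice or COMMIT order "
			"is broken: confirmed: %lld, new: %lld\n",
			replica_id, (long long)prev, (long long)lsn);
		abort(); /* stand-in for Tarantool's panic() */
	}
	vclock->lsn[replica_id] = lsn;
}

int
main(void)
{
	struct vclock vclock = { { 0 } };
	vclock_follow_or_die(&vclock, 3, 7835); /* first write: ok */
	vclock_follow_or_die(&vclock, 3, 7541); /* duplicate LSN: aborts with
	                                           the message from the log */
	return 0;
}
```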

The symptoms look similar to #4749.

@sergepetrenko (Collaborator)

Bisecting shows that the problem lies in commit 8c84932.

@sergepetrenko (Collaborator)

A different error appears on 1.10: `XlogError: invalid magic 0x0`.
This probably has to be investigated separately.

sergepetrenko added a commit that referenced this issue Feb 12, 2020
Fix replicaset.applier.vclock initialization issues: it wasn't
initialized at all previously. Moreover, there is no valid point in the
code to initialize it, since it may get stale right away if new entries
are written to the WAL. So, check both the applier and replicaset vclocks.
The greater one protects the instance from applying the rows it has
already applied or has already scheduled to write.
Also remove an unnecessary applier vclock initialization from
replication_init().

Closes #4739
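
For illustration, a rough sketch (names are assumptions, not the actual applier code) of the check this commit message describes:

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

struct vclock { int64_t lsn[32]; };

/*
 * A row is stale if it is already covered either by the replicaset vclock
 * (already written to the WAL) or by the applier vclock (already scheduled
 * to be written); the greater of the two per-replica LSNs is what counts.
 */
static bool
applier_row_is_stale(const struct vclock *applier_vclock,
		     const struct vclock *replicaset_vclock,
		     uint32_t replica_id, int64_t row_lsn)
{
	int64_t known = applier_vclock->lsn[replica_id];
	if (replicaset_vclock->lsn[replica_id] > known)
		known = replicaset_vclock->lsn[replica_id];
	return row_lsn <= known;
}

int
main(void)
{
	struct vclock applier = { .lsn = { [1] = 12990 } };
	struct vclock replicaset = { .lsn = { [1] = 12985 } };
	/* The row with lsn 12990 was already scheduled: skip it. */
	printf("%d\n", applier_row_is_stale(&applier, &replicaset, 1, 12990));
	/* The row with lsn 12991 is new: apply it. */
	printf("%d\n", applier_row_is_stale(&applier, &replicaset, 1, 12991));
	return 0;
}
```
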
sergepetrenko added a commit that referenced this issue Feb 12, 2020
There is an assertion in vclock_follow `lsn > prev_lsn`, which doesn't
fire in release builds, of course. So we'd better panic on an attempt to
write a record with a duplicate or otherwise broken lsn.

Follow-up #4739
sergepetrenko added a commit that referenced this issue Feb 13, 2020
When a master processes a subscribe request, it responds with its vclock
at the moment of receiving the request. However, the fiber processing
the request may yield on coio_write_xrow when sending the response to
the replica. In the meantime, the master may apply additional rows coming
from the replica after it has issued SUBSCRIBE.
Then, in relay_subscribe, the master sets its local vclock_at_subscribe to
a possibly updated value of replicaset.vclock.
So, set local_vclock_at_subscribe to a remembered value, rather than an
updated one.

Part of #4739
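
A toy single-threaded sketch (assumed names, no real fibers) of the ordering this commit enforces: snapshot the vclock before the response write that may yield, and hand the snapshot to the relay:

```c
#include <stdint.h>
#include <stdio.h>

struct vclock { int64_t lsn[4]; };

static struct vclock replicaset_vclock = { { 0, 100, 0, 0 } };

/* Pretend that writing the SUBSCRIBE response yields and lets an applier
 * advance the live replicaset vclock in the meantime. */
static void
send_subscribe_response_and_yield(void)
{
	replicaset_vclock.lsn[1] = 105;
}

int
main(void)
{
	/* Remember the vclock at the moment the request is processed ... */
	struct vclock vclock_at_subscribe = replicaset_vclock;
	/* ... because this call may yield and the live vclock may move on. */
	send_subscribe_response_and_yield();
	/* The relay must be given the snapshot, not the updated live value. */
	printf("snapshot: %lld, live: %lld\n",
	       (long long)vclock_at_subscribe.lsn[1],
	       (long long)replicaset_vclock.lsn[1]); /* 100 vs. 105 */
	return 0;
}
```
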
sergepetrenko added a commit that referenced this issue Feb 13, 2020
is_orphan status check is needed by applier in order to not re-apply
local instance rows coming from the replica after replication has
synced.

Prerequisite #4739
sergepetrenko added a commit that referenced this issue Feb 13, 2020
Remove applier vclock initialization from replication_init(), where it
is zeroed out, and place it at the end of box_cfg_xc(), where the
replicaset vclock already has a meaningful value.
Do not apply rows originating from the current instance if replication
sync has ended.

Closes #4739
sergepetrenko added a commit that referenced this issue Feb 13, 2020
There is an assertion in vclock_follow `lsn > prev_lsn`, which doesn't
fire in release builds, of course. Let's at least warn the user on an
attempt to write a record with a duplicate or otherwise broken lsn.

Follow-up #4739
sergepetrenko added a commit that referenced this issue Feb 13, 2020
When a master processes a subscribe request, it responds with its vclock
at the moment of receiving the request. However, the fiber processing
the request may yield on coio_write_xrow when sending the response to
the replica. In the meantime, the master may apply additional rows coming
from the replica after it has issued SUBSCRIBE.
Then, in relay_subscribe, the master sets its local vclock_at_subscribe to
a possibly updated value of replicaset.vclock.
So, set local_vclock_at_subscribe to a remembered value, rather than an
updated one.

Follow-up #4739
sergepetrenko added a commit that referenced this issue Feb 14, 2020
is_orphan status check is needed by applier in order to tell relay
whether to send the instance's own rows back or not.

Prerequisite #4739
sergepetrenko added a commit that referenced this issue Feb 14, 2020
We have a mechanism for restoring rows originating from an instance that
suffered a sudden power loss: remote masters resend the instance's rows
received before a certain point in time, defined by the remote master's
vclock at the moment of subscribe.
However, this is useful only on initial replication configuration, when
an instance has just recovered, so that it can receive what it has
relayed but hasn't synced to disk.
In other cases, when an instance is operating normally and master-master
replication is configured, the mechanism described above may lead to the
instance re-applying its own rows, coming from a master it has just
subscribed to.
To fix the problem, do not relay rows coming from a remote instance if
the instance has already recovered.

Closes #4739
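
A rough sketch (hypothetical names, not the actual relay code) of the decision this commit describes:

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

struct row { uint32_t replica_id; int64_t lsn; };

/*
 * During its initial subscribe after a crash, a replica still needs its
 * own rows back (it may have relayed them without syncing them to disk);
 * once it has recovered, relaying its own rows back only makes it
 * re-apply them.
 */
static bool
relay_should_send_row(const struct row *row, uint32_t subscriber_id,
		      bool subscriber_has_recovered)
{
	if (row->replica_id == subscriber_id && subscriber_has_recovered)
		return false; /* skip the subscriber's own row */
	return true;
}

int
main(void)
{
	struct row r = { .replica_id = 1, .lsn = 12991 };
	printf("%d\n", relay_should_send_row(&r, 1, true));  /* 0: skip */
	printf("%d\n", relay_should_send_row(&r, 1, false)); /* 1: send */
	return 0;
}
```
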
sergepetrenko added a commit that referenced this issue Feb 14, 2020
There is an assertion in vclock_follow `lsn > prev_lsn`, which doesn't
fire in release builds, of course. Let's at least warn the user on an
attempt to write a record with a duplicate or otherwise broken lsn.

Follow-up #4739
sergepetrenko added a commit that referenced this issue Feb 18, 2020
is_orphan status check is needed by applier in order to tell relay
whether to send the instance's own rows back or not.

Prerequisite #4739
sergepetrenko added a commit that referenced this issue Feb 18, 2020
We have a mechanism for restoring rows originating from an instance that
suffered a sudden power loss: remote masters resend the instance's rows
received before a certain point in time, defined by the remote master's
vclock at the moment of subscribe.
However, this is useful only on initial replication configuration, when
an instance has just recovered, so that it can receive what it has
relayed but hasn't synced to disk.
In other cases, when an instance is operating normally and master-master
replication is configured, the mechanism described above may lead to the
instance re-applying its own rows, coming from a master it has just
subscribed to.
To fix the problem, do not relay rows coming from a remote instance if
the instance has already recovered.

Closes #4739
sergepetrenko added a commit that referenced this issue Feb 18, 2020
There is an assertion in vclock_follow `lsn > prev_lsn`, which doesn't
fire in release builds, of course. Let's at least warn the user on an
attempt to write a record with a duplicate or otherwise broken lsn, and
not follow such an lsn.

Follow-up #4739
sergepetrenko added a commit that referenced this issue Feb 28, 2020
is_orphan status check is needed by applier in order to tell relay
whether to send the instance's own rows back or not.

Prerequisite #4739
sergepetrenko added a commit that referenced this issue Feb 28, 2020
There is an assertion in vclock_follow `lsn > prev_lsn`, which doesn't
fire in release builds, of course. Let's at least warn the user on an
attempt to write a record with a duplicate or otherwise broken lsn, and
not follow such an lsn.

Follow-up #4739
sergepetrenko added a commit that referenced this issue Feb 28, 2020
Add a filter for relay to skip rows coming from unwanted instances.
A list of instance ids whose rows replica doesn't want to fetch is encoded
together with SUBSCRIBE request after a freshly introduced flag IPROTO_ID_FILTER.

Filtering rows is needed to prevent an instance from fetching its own
rows from a remote master, which is useful on initial configuration and
harmful on resubscribe.

Prerequisite #4739, #3294

@TarantoolBot document

Title: document new binary protocol key and subscribe request changes

Add key `IPROTO_ID_FILTER = 0x51` to the internals reference.
This is an optional key used in SUBSCRIBE request followed by an array
of ids of instances whose rows won't be relayed to the replica.

SUBSCRIBE request is supplemented with an optional field of the
following structure:
```
+====================+
|      ID_FILTER     |
|   0x51 : ID LIST   |
| MP_INT : MP_ARRAY  |
|                    |
+====================+
```
The field is encoded only when the id list is not empty.
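
For illustration, a small self-contained sketch (not the actual encoder; it hand-rolls only the MessagePack fixint/fixarray cases) of how the optional ID_FILTER field could be appended to a SUBSCRIBE body:

```c
#include <stdint.h>
#include <stdio.h>

enum { IPROTO_ID_FILTER = 0x51 };

/*
 * Encode "0x51: [id, id, ...]". Replica ids and the key fit in MessagePack
 * positive fixints (< 0x80) and the list in a fixarray (< 16 elements),
 * so raw byte emission is enough for this illustration. As stated above,
 * the field is encoded only when the id list is not empty.
 */
static size_t
encode_id_filter(uint8_t *buf, const uint8_t *ids, uint8_t n_ids)
{
	size_t p = 0;
	if (n_ids == 0)
		return 0;                /* omit the field entirely */
	buf[p++] = IPROTO_ID_FILTER;     /* MP_INT key (positive fixint) */
	buf[p++] = 0x90 | n_ids;         /* MP_ARRAY header (fixarray) */
	for (uint8_t i = 0; i < n_ids; i++)
		buf[p++] = ids[i];       /* each id as a positive fixint */
	return p;
}

int
main(void)
{
	/* A replica with id 1 asking the master not to relay its own rows. */
	uint8_t buf[18], self[] = { 1 };
	size_t len = encode_id_filter(buf, self, 1);
	for (size_t i = 0; i < len; i++)
		printf("%02x ", buf[i]); /* prints: 51 91 01 */
	printf("\n");
	return 0;
}
```
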
sergepetrenko added a commit that referenced this issue Feb 28, 2020
We have a mechanism for restoring rows originating from an instance that
suffered a sudden power loss: remote masters resend the instance's rows
received before a certain point in time, defined by the remote master's
vclock at the moment of subscribe.
However, this is useful only on initial replication configuration, when
an instance has just recovered, so that it can receive what it has
relayed but hasn't synced to disk.
In other cases, when an instance is operating normally and master-master
replication is configured, the mechanism described above may lead to the
instance re-applying its own rows, coming from a master it has just
subscribed to.
To fix the problem, do not relay rows coming from a remote instance if
the instance has already recovered.

Closes #4739
sergepetrenko added a commit that referenced this issue Feb 28, 2020
We have a mechanism for restoring rows originating from an instance that
suffered a sudden power loss: remote masters resend the instance's rows
received before a certain point in time, defined by the remote master's
vclock at the moment of subscribe.
However, this is useful only on initial replication configuration, when
an instance has just recovered, so that it can receive what it has
relayed but hasn't synced to disk.
In other cases, when an instance is operating normally and master-master
replication is configured, the mechanism described above may lead to the
instance re-applying its own rows, coming from a master it has just
subscribed to.
To fix the problem, do not relay rows coming from a remote instance if
the instance has already recovered.

Closes #4739
sergepetrenko added a commit that referenced this issue Feb 28, 2020
…g.replication

When checking whether a rejoin is needed, the replica loops through all the
instances in box.cfg.replication, which makes it believe that there is a
master holding the files it needs, since it accounts for itself just like
all the other instances.
So make the replica skip itself when looking for an instance which holds
the files it needs, and when determining whether rebootstrap is needed.

We already have a working test for the issue; it missed the issue due to
replica.lua settings. Fix replica.lua to include the instance itself in
box.cfg.replication.

Closes #4739
Gerold103 pushed a commit that referenced this issue Feb 28, 2020
There is an assertion in vclock_follow `lsn > prev_lsn`, which doesn't
fire in release builds, of course. Let's at least warn the user on an
attempt to write a record with a duplicate or otherwise broken lsn, and
not follow such an lsn.

Follow-up #4739
Gerold103 pushed a commit that referenced this issue Feb 28, 2020
Add a filter for relay to skip rows coming from unwanted instances.
A list of instance ids whose rows replica doesn't want to fetch is encoded
together with SUBSCRIBE request after a freshly introduced flag IPROTO_ID_FILTER.

Filtering rows is needed to prevent an instance from fetching its own
rows from a remote master, which is useful on initial configuration and
harmful on resubscribe.

Prerequisite #4739, #3294

@TarantoolBot document

Title: document new binary protocol key and subscribe request changes

Add key `IPROTO_ID_FILTER = 0x51` to the internals reference.
This is an optional key used in SUBSCRIBE request followed by an array
of ids of instances whose rows won't be relayed to the replica.

SUBSCRIBE request is supplemented with an optional field of the
following structure:
```
+====================+
|      ID_FILTER     |
|   0x51 : ID LIST   |
| MP_INT : MP_ARRAY  |
|                    |
+====================+
```
The field is encoded only when the id list is not empty.
Gerold103 pushed a commit that referenced this issue Feb 28, 2020
We have a mechanism for restoring rows originating from an instance that
suffered a sudden power loss: remote masters resend the instance's rows
received before a certain point in time, defined by the remote master's
vclock at the moment of subscribe.
However, this is useful only on initial replication configuration, when
an instance has just recovered, so that it can receive what it has
relayed but hasn't synced to disk.
In other cases, when an instance is operating normally and master-master
replication is configured, the mechanism described above may lead to the
instance re-applying its own rows, coming from a master it has just
subscribed to.
To fix the problem, do not relay rows coming from a remote instance if
the instance has already recovered.

Closes #4739
sergepetrenko added a commit that referenced this issue Feb 29, 2020
is_orphan status check is needed by applier in order to tell relay
whether to send the instance's own rows back or not.

Prerequisite #4739
sergepetrenko added a commit that referenced this issue Feb 29, 2020
There is an assertion in vclock_follow `lsn > prev_lsn`, which doesn't
fire in release builds, of course. Let's at least warn the user on an
attempt to write a record with a duplicate or otherwise broken lsn, and
not follow such an lsn.

Follow-up #4739
sergepetrenko added a commit that referenced this issue Feb 29, 2020
Add a filter for relay to skip rows coming from unwanted instances.
A list of instance ids whose rows replica doesn't want to fetch is encoded
together with SUBSCRIBE request after a freshly introduced flag IPROTO_ID_FILTER.

Filtering rows is needed to prevent an instance from fetching its own
rows from a remote master, which is useful on initial configuration and
harmful on resubscribe.

Prerequisite #4739, #3294

@TarantoolBot document

Title: document new binary protocol key and subscribe request changes

Add key `IPROTO_ID_FILTER = 0x51` to the internals reference.
This is an optional key used in SUBSCRIBE request followed by an array
of ids of instances whose rows won't be relayed to the replica.

SUBSCRIBE request is supplemented with an optional field of the
following structure:
```
+====================+
|      ID_FILTER     |
|   0x51 : ID LIST   |
| MP_INT : MP_ARRRAY |
|                    |
+====================+
```
The field is encoded only when the id list is not empty.
sergepetrenko added a commit that referenced this issue Feb 29, 2020
We have a mechanism for restoring rows originating from an instance that
suffered a sudden power loss: remote masters resend the instance's rows
received before a certain point in time, defined by the remote master's
vclock at the moment of subscribe.
However, this is useful only on initial replication configuration, when
an instance has just recovered, so that it can receive what it has
relayed but hasn't synced to disk.
In other cases, when an instance is operating normally and master-master
replication is configured, the mechanism described above may lead to the
instance re-applying its own rows, coming from a master it has just
subscribed to.
To fix the problem, do not relay rows coming from a remote instance if
the instance has already recovered.

Closes #4739
kyukhin pushed a commit that referenced this issue Mar 2, 2020
is_orphan status check is needed by applier in order to tell relay
whether to send the instance's own rows back or not.

Prerequisite #4739
kyukhin pushed a commit that referenced this issue Mar 2, 2020
There is an assertion in vclock_follow `lsn > prev_lsn`, which doesn't
fire in release builds, of course. Let's at least warn the user on an
attempt to write a record with a duplicate or otherwise broken lsn, and
not follow such an lsn.

Follow-up #4739
kyukhin pushed a commit that referenced this issue Mar 2, 2020
Add a filter for relay to skip rows coming from unwanted instances.
A list of instance ids whose rows replica doesn't want to fetch is encoded
together with SUBSCRIBE request after a freshly introduced flag IPROTO_ID_FILTER.

Filtering rows is needed to prevent an instance from fetching its own
rows from a remote master, which is useful on initial configuration and
harmful on resubscribe.

Prerequisite #4739, #3294

@TarantoolBot document

Title: document new binary protocol key and subscribe request changes

Add key `IPROTO_ID_FILTER = 0x51` to the internals reference.
This is an optional key used in SUBSCRIBE request followed by an array
of ids of instances whose rows won't be relayed to the replica.

SUBSCRIBE request is supplemented with an optional field of the
following structure:
```
+====================+
|      ID_FILTER     |
|   0x51 : ID LIST   |
| MP_INT : MP_ARRAY  |
|                    |
+====================+
```
The field is encoded only when the id list is not empty.
kyukhin closed this as completed in ed2e143 on Mar 2, 2020
kyukhin pushed a commit that referenced this issue Mar 2, 2020
is_orphan status check is needed by applier in order to tell relay
whether to send the instance's own rows back or not.

Prerequisite #4739

(cherry picked from commit 7b83b73)
kyukhin pushed a commit that referenced this issue Mar 2, 2020
There is an assertion in vclock_follow `lsn > prev_lsn`, which doesn't
fire in release builds, of course. Let's at least warn the user on an
attempt to write a record with a duplicate or otherwise broken lsn, and
not follow such an lsn.

Follow-up #4739

(cherry picked from commit e075026)
kyukhin pushed a commit that referenced this issue Mar 2, 2020
Add a filter for relay to skip rows coming from unwanted instances.
A list of instance ids whose rows replica doesn't want to fetch is encoded
together with SUBSCRIBE request after a freshly introduced flag IPROTO_ID_FILTER.

Filtering rows is needed to prevent an instance from fetching its own
rows from a remote master, which is useful on initial configuration and
harmful on resubscribe.

Prerequisite #4739, #3294

@TarantoolBot document

Title: document new binary protocol key and subscribe request changes

Add key `IPROTO_ID_FILTER = 0x51` to the internals reference.
This is an optional key used in SUBSCRIBE request followed by an array
of ids of instances whose rows won't be relayed to the replica.

SUBSCRIBE request is supplemented with an optional field of the
following structure:
```
+====================+
|      ID_FILTER     |
|   0x51 : ID LIST   |
| MP_INT : MP_ARRAY  |
|                    |
+====================+
```
The field is encoded only when the id list is not empty.

(cherry picked from commit 45de990)
kyukhin pushed a commit that referenced this issue Mar 2, 2020
We have a mechanism for restoring rows originating from an instance that
suffered a sudden power loss: remote masters resend the instance's rows
received before a certain point in time, defined by the remote master's
vclock at the moment of subscribe.
However, this is useful only on initial replication configuration, when
an instance has just recovered, so that it can receive what it has
relayed but hasn't synced to disk.
In other cases, when an instance is operating normally and master-master
replication is configured, the mechanism described above may lead to the
instance re-applying its own rows, coming from a master it has just
subscribed to.
To fix the problem, do not relay rows coming from a remote instance if
the instance has already recovered.

Closes #4739

(cherry picked from commit ed2e143)
kyukhin pushed a commit that referenced this issue Mar 2, 2020
is_orphan status check is needed by applier in order to tell relay
whether to send the instance's own rows back or not.

Prerequisite #4739

(cherry picked from commit 7b83b73)
kyukhin pushed a commit that referenced this issue Mar 2, 2020
There is an assertion in vclock_follow `lsn > prev_lsn`, which doesn't
fire in release builds, of course. Let's at least warn the user on an
attempt to write a record with a duplicate or otherwise broken lsn, and
not follow such an lsn.

Follow-up #4739

(cherry picked from commit e075026)
kyukhin pushed a commit that referenced this issue Mar 2, 2020
Add a filter for relay to skip rows coming from unwanted instances.
A list of instance ids whose rows replica doesn't want to fetch is encoded
together with SUBSCRIBE request after a freshly introduced flag IPROTO_ID_FILTER.

Filtering rows is needed to prevent an instance from fetching its own
rows from a remote master, which is useful on initial configuration and
harmful on resubscribe.

Prerequisite #4739, #3294

@TarantoolBot document

Title: document new binary protocol key and subscribe request changes

Add key `IPROTO_ID_FILTER = 0x51` to the internals reference.
This is an optional key used in SUBSCRIBE request followed by an array
of ids of instances whose rows won't be relayed to the replica.

SUBSCRIBE request is supplemented with an optional field of the
following structure:
```
+====================+
|      ID_FILTER     |
|   0x51 : ID LIST   |
| MP_INT : MP_ARRAY  |
|                    |
+====================+
```
The field is encoded only when the id list is not empty.

(cherry picked from commit 45de990)
kyukhin pushed a commit that referenced this issue Mar 2, 2020
We have a mechanism for restoring rows originating from an instance that
suffered a sudden power loss: remote masters resend the instance's rows
received before a certain point in time, defined by the remote master's
vclock at the moment of subscribe.
However, this is useful only on initial replication configuration, when
an instance has just recovered, so that it can receive what it has
relayed but hasn't synced to disk.
In other cases, when an instance is operating normally and master-master
replication is configured, the mechanism described above may lead to the
instance re-applying its own rows, coming from a master it has just
subscribed to.
To fix the problem, do not relay rows coming from a remote instance if
the instance has already recovered.

Closes #4739

(cherry picked from commit ed2e143)
kyukhin modified the milestones: 1.10.6, 2.2.3 on Mar 2, 2020