Usage access to universe is denied for user 'admin' #4606

Closed
rosik opened this issue Nov 1, 2019 · 1 comment
Labels: bug, replication
Milestone: 1.10.5

Comments


rosik commented Nov 1, 2019

Tarantool version: 2.3.0-193-g2bb8d1ea1

Bug description:

Setting up replication fails on the replica with ER_ACCESS_DENIED.

Instance 1:

box.cfg({listen = 3301})                        -- accept replication connections on port 3301
box.schema.user.passwd('admin', '111')          -- set a password so the replica can authenticate as admin
box.schema.user.grant('admin', 'replication')   -- granting this role to admin is what triggers the bug

Instance 2:

box.cfg({replication = "admin:111@localhost:3301"})  -- bootstrap from instance 1 as admin

Master log


2019-11-01 19:28:33.361 [28614] main/115/main I> joining replica b28c6ed2-e693-411e-927b-06d03bda6dda at fd 13, aka 127.0.0.1:3301, peer of 127.0.0.1:48314
2019-11-01 19:28:33.366 [28614] main/115/main I> initial data sent.
2019-11-01 19:28:33.366 [28614] main I> assigned id 2 to replica b28c6ed2-e693-411e-927b-06d03bda6dda
2019-11-01 19:28:33.367 [28614] relay/127.0.0.1:48314/101/main I> recover from `./00000000000000000000.xlog'
2019-11-01 19:28:33.367 [28614] main/115/main I> final data sent.
2019-11-01 19:28:34.365 [28614] main/115/main I> subscribed replica b28c6ed2-e693-411e-927b-06d03bda6dda at fd 13, aka 127.0.0.1:3301, peer of 127.0.0.1:48316
2019-11-01 19:28:34.365 [28614] main/115/main I> remote vclock {1: 3} local vclock {1: 4}
2019-11-01 19:28:34.367 [28614] relay/127.0.0.1:48316/101/main I> recover from `./00000000000000000000.xlog'
2019-11-01 19:28:34.368 [28614] relay/127.0.0.1:48316/101/main coio.cc:379 !> SystemError unexpected EOF when reading from socket, called on fd 13, aka 127.0.0.1:3301: Broken pipe
2019-11-01 19:28:34.368 [28614] relay/127.0.0.1:48316/101/main C> exiting the relay loop

Replica log

2019-11-01 19:28:33.357 [28622] main/102/interactive I> connecting to 1 replicas
2019-11-01 19:28:33.359 [28622] main/109/applier/admin@localhost:3301 I> remote master 0598ee2d-d99d-4dd4-8490-03938667ba58 at 127.0.0.1:3301 running Tarantool 2.3.0
2019-11-01 19:28:33.360 [28622] main/102/interactive I> connected to 1 replicas
2019-11-01 19:28:33.360 [28622] main/109/applier/admin@localhost:3301 I> authenticated
2019-11-01 19:28:33.360 [28622] main/102/interactive I> bootstrapping replica from 0598ee2d-d99d-4dd4-8490-03938667ba58 at 127.0.0.1:3301
2019-11-01 19:28:33.362 [28622] main/109/applier/admin@localhost:3301 I> cluster uuid d59a749c-36bb-4ee7-aabe-d1ba973bed89
2019-11-01 19:28:33.383 [28622] main/109/applier/admin@localhost:3301 I> can't read row
2019-11-01 19:28:33.383 [28622] main/109/applier/admin@localhost:3301 alter.cc:116 E> ER_ACCESS_DENIED: Usage access to universe '' is denied for user 'admin'
2019-11-01 19:28:33.383 [28622] main/109/applier/admin@localhost:3301 I> will retry every 1.00 second
2019-11-01 19:28:34.365 [28622] main/109/applier/admin@localhost:3301 I> authenticated
2019-11-01 19:28:34.365 [28622] main/109/applier/admin@localhost:3301 I> subscribed
2019-11-01 19:28:34.365 [28622] main/109/applier/admin@localhost:3301 I> remote vclock {1: 4} local vclock {1: 3}
2019-11-01 19:28:34.367 [28622] main/109/applier/admin@localhost:3301 applier.cc:264 E> error applying row: {type: 'INSERT', replica_id: 1, lsn: 4, space_id: 320, index_id: 0, tuple: [2, "b28c6ed2-e693-411e-927b-06d03bda6dda"]}
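
The same failure can also be inspected programmatically on the replica via box.info.replication; a minimal sketch, assuming the standard upstream fields (status, message) that appear in box.info output:

-- Run on the replica (instance 2) while the applier keeps retrying.
for id, r in pairs(box.info.replication) do
    if r.upstream ~= nil then
        -- status and message reflect the applier state, e.g. the
        -- ER_ACCESS_DENIED error seen in the log above.
        print(id, r.upstream.status, r.upstream.message)
    end
end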

Gerold103 self-assigned this Nov 1, 2019
Gerold103 added a commit that referenced this issue Nov 1, 2019
The admin user has universal privileges before bootstrap or
recovery is done. That allows, for example, bootstrapping from a
remote master, because to do that admin must be able to insert
into system spaces such as _priv.

But after online credentials update was implemented
(#2763, 48d00b0), admin could lose its universal access if, for
example, a role was granted to it before universal access was
recovered.

That happened for two reasons:

    - any change in access rights, even in granted roles, led to a
      rebuild of universal access;

    - any change in access rights updated universal access in all
      existing sessions, thanks to #2763.

What happened: two Tarantool instances were started. One of them,
the master, granted the 'replication' role to admin. The second
node, the replica, tried to bootstrap from the master. The replica
created an admin session and started loading data. Once it applied
the row granting the replication role to admin, admin's universal
access was nullified everywhere, including in that session, so the
following rows could not be applied.

Closes #4606
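
A common way to avoid the problem entirely is to leave admin's privileges untouched and replicate under a dedicated user, so no grant to admin is replayed during bootstrap. A minimal sketch; the user name and password are illustrative, not taken from this issue:

-- On the master: create a dedicated replication user.
box.cfg({listen = 3301})
box.schema.user.create('replicator', {password = 'secret', if_not_exists = true})
box.schema.user.grant('replicator', 'replication', nil, nil, {if_not_exists = true})

-- On the replica: bootstrap with that user instead of admin.
box.cfg({replication = 'replicator:secret@localhost:3301'})
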
Gerold103 added a commit that referenced this issue Nov 1, 2019
Gerold103 added the bug and replication labels Nov 1, 2019

rosik commented Nov 1, 2019

Please note that this bug also affects the 1.10 branch. My reproducer didn't work on 1.10.4 (and I have no idea why), but @dokshina has provided a more complex one, tarantool/cartridge#322, which does.

Gerold103 added a commit that referenced this issue Nov 1, 2019
Gerold103 added a commit that referenced this issue Nov 1, 2019
Gerold103 added a commit that referenced this issue Nov 5, 2019
kyukhin added this to the 1.10.5 milestone Nov 8, 2019
kyukhin pushed a commit that referenced this issue Nov 12, 2019
(cherry picked from commit 95237ac)
kyukhin pushed a commit that referenced this issue Nov 12, 2019
(cherry picked from commit 95237ac)
kyukhin pushed a commit that referenced this issue Nov 12, 2019
(cherry picked from commit 95237ac)
Gerold103 added a commit that referenced this issue Nov 25, 2019
The tests didn't clean up the _cluster table. Autocleanup is
enabled only in versions newer than 1.10, so on 1.10 these tests
failed when launched in one test-run worker: the worker still
remembered previously deleted instances.

The patch adds a cleanup at the end of these tests. Autocleanup
still works on master, but for the sake of similarity the manual
cleanup is done for all branches, including those newer than 1.10.

Closes #4606
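
A minimal sketch of the kind of manual cleanup described above; the replica id value (2) is illustrative:

-- At the end of the test, on the master: forget the replica that
-- the test registered, so later tests start from a clean _cluster.
box.space._cluster:delete{2}
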
avtikhon added a commit that referenced this issue Sep 6, 2020
On heavily loaded hosts the following issue was found:

  [021] --- replication/gh-4606-admin-creds.result	Wed Apr 15 15:47:41 2020
  [021] +++ replication/gh-4606-admin-creds.reject	Sun Sep  6 20:23:09 2020
  [021] @@ -36,7 +36,42 @@
  [021]   | ...
  [021]  i.replication[i.id % 2 + 1].upstream.status == 'follow' or i
  [021]   | ---
  [021] - | - true
  [021] + | - version: 2.6.0-52-g71a24b9f2
  [021] + |   id: 2
  [021] + |   ro: false
  [021] + |   uuid: 3921679b-d994-4cf0-a6ef-1f6a0d96fc79
  [021] + |   package: Tarantool
  [021] + |   cluster:
  [021] + |     uuid: f27dfdfe-2802-486a-bc47-abc83b9097cf
  [021] + |   listen: unix/:/Users/tntmac02.tarantool.i/tnt/test/var/014_replication/replica_auth.socket-iproto
  [021] + |   replication_anon:
  [021] + |     count: 0
  [021] + |   replication:
  [021] + |     1:
  [021] + |       id: 1
  [021] + |       uuid: a07cad18-d27f-48c4-8d56-96b17026702e
  [021] + |       lsn: 3
  [021] + |       upstream:
  [021] + |         peer: admin@unix/:/Users/tntmac02.tarantool.i/tnt/test/var/014_replication/master.socket-iproto
  [021] + |         lag: 0.0030207633972168
  [021] + |         status: disconnected
  [021] + |         idle: 0.44824500009418
  [021] + |         message: timed out
  [021] + |         system_message: Operation timed out
  [021] + |     2:
  [021] + |       id: 2
  [021] + |       uuid: 3921679b-d994-4cf0-a6ef-1f6a0d96fc79
  [021] + |       lsn: 0
  [021] + |   signature: 3
  [021] + |   status: running
  [021] + |   vclock: {1: 3}
  [021] + |   uptime: 1
  [021] + |   lsn: 0
  [021] + |   sql: []
  [021] + |   gc: []
  [021] + |   vinyl: []
  [021] + |   memory: []
  [021] + |   pid: 40326
  [021]   | ...
  [021]  test_run:switch('default')
  [021]   | ---

It happened because the replication upstream status was checked too
early, while the upstream was still in the 'disconnected' state. To
let the upstream reach the needed 'follow' state, the test has to
wait for it using the test_run:wait_upstream() routine.

Closes #5233
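
A minimal sketch of the adjusted check, assuming the test-run Lua harness and that wait_upstream takes a replica id plus an options table with the expected status:

-- Before: a one-shot check that races with the applier.
-- i.replication[i.id % 2 + 1].upstream.status == 'follow' or i

-- After: block until the upstream actually reaches 'follow'.
test_run:wait_upstream(i.id % 2 + 1, {status = 'follow'})
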
avtikhon added a commit that referenced this issue Sep 6, 2020
kyukhin pushed a commit that referenced this issue Sep 11, 2020
kyukhin pushed a commit that referenced this issue Sep 11, 2020
(cherry picked from commit 11ba332)
kyukhin pushed a commit that referenced this issue Sep 11, 2020
(cherry picked from commit 11ba332)
avtikhon added a commit that referenced this issue Sep 11, 2020
kyukhin pushed a commit that referenced this issue Sep 11, 2020