Use SIGKILL to stop server replica
Used the signal option set to SIGKILL in the 'stop server replica'
routine, so that the replica is stopped immediately, imitating a
replica crash followed by a wake-up.
Just 'stop server replica' (SIGTERM) is not sufficient to stop
a tarantool instance when ERRINJ_WAL_DELAY is set, because the
"tarantool" thread waits infinitely for the paused "wal" thread.
Changed the server stop routine to a kill routine, so that SIGKILL
instead of SIGTERM is sent to the replica server. This way the
replica is killed immediately and the *.xlog files are removed as
expected.
An attempt was made to change the replication logic instead, but
it ran into new issues, so the fix suggested at commit:
b5b4809
was reverted at commit:
766cd3e
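The hang described above can be reproduced outside Tarantool. The Python sketch below is not Tarantool code: it assumes a POSIX system, and the child program, its never-released worker thread, and the `stop()` helper are hypothetical stand-ins for the replica, the paused "wal" thread, and the stop routine. A graceful SIGTERM handler that joins the blocked worker never returns, while SIGKILL cannot be caught and ends the process at once.

```python
import signal
import subprocess
import sys

# Hypothetical child program: its SIGTERM handler shuts down "gracefully" by
# joining a worker thread that stays paused forever (a stand-in for the
# "wal" thread blocked by ERRINJ_WAL_DELAY).
CHILD = r"""
import signal, sys, threading, time

paused = threading.Event()          # never set: the worker stays paused

def worker():
    paused.wait()                   # blocks forever

t = threading.Thread(target=worker, daemon=True)
t.start()

def on_sigterm(signum, frame):
    t.join()                        # graceful shutdown waits for the worker
    sys.exit(0)                     # never reached

signal.signal(signal.SIGTERM, on_sigterm)
print("ready", flush=True)          # handler is installed
while True:
    time.sleep(0.1)
"""


def stop(proc, sig, timeout=1.0):
    """Send `sig`; return the exit code, or None if still alive after timeout."""
    proc.send_signal(sig)
    try:
        return proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        return None


def main():
    proc = subprocess.Popen([sys.executable, "-c", CHILD],
                            stdout=subprocess.PIPE, text=True)
    proc.stdout.readline()               # wait until the handler is installed
    term = stop(proc, signal.SIGTERM)    # hangs in the handler: returns None
    kill = stop(proc, signal.SIGKILL)    # uncatchable: the process dies
    proc.stdout.close()
    return term, kill


if __name__ == "__main__":
    term, kill = main()
    print("after SIGTERM:", term)
    print("after SIGKILL:", kill)
```

`Popen` reports a process terminated by a signal with a negative return code, so the SIGKILL case yields `-9`.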

[029] --- replication/gc.result Mon Apr 15 14:58:09 2019
[029] +++ replication/gc.reject Tue Apr 16 09:17:47 2019
[029] @@ -290,7 +290,12 @@
[029] ...
[029] wait_xlog(1) or fio.listdir('./master')
[029] ---
[048] replication/gc.test.lua vinyl [ fail ]
[048]
[048] Test failed! Result content mismatch:
[029] -- true
[029] +- - 00000000000000000305.vylog
[029] + - 00000000000000000305.xlog
[029] + - '512'
[029] + - 00000000000000000310.xlog
[029] + - 00000000000000000310.vylog
[029] + - 00000000000000000310.snap
[029] ...
[029] -- Stop the replica.
[029] test_run:cmd("stop server replica")
[029] @@ -326,7 +331,13 @@
[029] ...
[029] wait_xlog(2) or fio.listdir('./master')
[029] ---
[029] -- true
[029] +- - 00000000000000000305.xlog
[029] + - 00000000000000000316.xlog
[029] + - 00000000000000000316.vylog
[029] + - '512'
[029] + - 00000000000000000310.xlog
[029] + - 00000000000000000317.vylog
[029] + - 00000000000000000317.snap
[029] ...
[029] -- The xlog should only be deleted after the replica
[029] -- is unregistered.
[029]
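The log above is a unified diff between `replication/gc.result`, which holds the expected output of the test, and `replication/gc.reject`, which holds the output of the failing run. A minimal Python sketch of that comparison with the standard `difflib` module follows; `result_mismatch()` and the sample lines (a simplified version of the first hunk of the log) are illustrative assumptions, not test-run code.

```python
import difflib

# Hypothetical sample: expected output recorded in the .result file versus the
# output of a failing run, simplified from the first hunk of the log.
EXPECTED = [
    "wait_xlog(1) or fio.listdir('./master')",
    "---",
    "- true",
    "...",
]
ACTUAL = [
    "wait_xlog(1) or fio.listdir('./master')",
    "---",
    "- - 00000000000000000305.xlog",
    "  - 00000000000000000310.xlog",
    "...",
]


def result_mismatch(expected, actual, name="replication/gc"):
    """Return the unified diff between the expected (.result) and actual
    (.reject) output lines; an empty list means the outputs match."""
    return list(difflib.unified_diff(
        expected, actual,
        fromfile=name + ".result", tofile=name + ".reject",
        lineterm=""))


if __name__ == "__main__":
    print("\n".join(result_mismatch(EXPECTED, ACTUAL)))
```

The expected `- true` line shows up as `-- true` in the diff, exactly as in the log: the first `-` is the diff marker, the second is part of the YAML output.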

Close #4162
avtikhon committed Apr 30, 2019
1 parent d5fdc53 commit c6a6f59
Showing 2 changed files with 13 additions and 15 deletions.
16 changes: 7 additions & 9 deletions test/replication/gc.result
@@ -252,20 +252,18 @@ wait_xlog(2) or fio.listdir('./master')
 ---
 - true
 ...
-test_run:cmd("switch replica")
+-- Imitate the replica crash and, then, wake up.
+-- Just 'stop server replica' (SIGTERM) is not sufficient to stop
+-- a tarantool instance when ERRINJ_WAL_DELAY is set, because
+-- "tarantool" thread wait for paused "wal" thread infinitely.
+test_run:cmd("stop server replica with signal=SIGKILL")
 ---
 - true
 ...
--- Unblock the replica and break replication.
-box.error.injection.set("ERRINJ_WAL_DELAY", false)
----
-- ok
-...
-box.cfg{replication = {}}
+test_run:cmd("start server replica")
 ---
+- true
 ...
--- Restart the replica to reestablish replication.
-test_run:cmd("restart server replica")
 -- Wait for the replica to catch up.
 test_run:cmd("switch replica")
 ---
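The test checks garbage collection with the idiom `wait_xlog(n) or fio.listdir('./master')`: when the expected number of xlog files is reached the statement yields `true`; otherwise the directory listing ends up in the output, so a `.reject` file shows exactly which files were left behind. The definition of `wait_xlog` is not part of this diff. The Python sketch below is a hypothetical analogue that assumes it polls until the directory holds exactly `n` `*.xlog` files or a timeout expires; `check_xlogs()` is likewise an illustrative name.

```python
import os
import tempfile
import time


def wait_xlog(directory, count, timeout=1.0, interval=0.05):
    """Hypothetical helper: poll until `directory` holds exactly `count`
    *.xlog files. Returns True on success, False once `timeout` expires."""
    deadline = time.monotonic() + timeout
    while True:
        xlogs = [f for f in os.listdir(directory) if f.endswith(".xlog")]
        if len(xlogs) == count:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


def check_xlogs(directory, count, timeout=1.0):
    """Mirror `wait_xlog(n) or fio.listdir(dir)`: True on success, otherwise
    the sorted directory listing, so a mismatch shows the leftover files."""
    return wait_xlog(directory, count, timeout) or sorted(os.listdir(directory))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        for name in ("00000000000000000305.xlog", "00000000000000000310.xlog",
                     "00000000000000000310.snap"):
            open(os.path.join(d, name), "w").close()
        print(check_xlogs(d, 2))               # True
        print(check_xlogs(d, 1, timeout=0.2))  # listing of all three files
```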
12 changes: 6 additions & 6 deletions test/replication/gc.test.lua
@@ -122,12 +122,12 @@ fiber.sleep(0.1) -- wait for master to relay data
 -- the old snapshot.
 wait_gc(1) or box.info.gc()
 wait_xlog(2) or fio.listdir('./master')
-test_run:cmd("switch replica")
--- Unblock the replica and break replication.
-box.error.injection.set("ERRINJ_WAL_DELAY", false)
-box.cfg{replication = {}}
--- Restart the replica to reestablish replication.
-test_run:cmd("restart server replica")
+-- Imitate the replica crash and, then, wake up.
+-- Just 'stop server replica' (SIGTERM) is not sufficient to stop
+-- a tarantool instance when ERRINJ_WAL_DELAY is set, because
+-- "tarantool" thread wait for paused "wal" thread infinitely.
+test_run:cmd("stop server replica with signal=SIGKILL")
+test_run:cmd("start server replica")
 -- Wait for the replica to catch up.
 test_run:cmd("switch replica")
 test_run:wait_cond(function() return box.space.test:count() == 310 end, 10)
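After the restart the test waits for the replica to catch up with `test_run:wait_cond(function() return box.space.test:count() == 310 end, 10)`, which repeatedly evaluates the condition for up to 10 seconds. The following Python sketch shows that polling pattern. It mirrors the call shape only and is not test-run's implementation; the default timeout and delay, as well as the hypothetical `FakeSpace` stand-in for `box.space.test`, are assumptions.

```python
import time


def wait_cond(cond, timeout=60.0, delay=0.001):
    """Poll `cond` until it returns a truthy value or `timeout` seconds pass.
    Returns True if the condition was met, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        if cond():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(delay)


class FakeSpace:
    """Hypothetical stand-in for box.space.test on a replica that is still
    replaying rows: every count() call applies up to 50 more rows."""

    def __init__(self, target):
        self.rows = 0
        self.target = target

    def count(self):
        self.rows = min(self.rows + 50, self.target)
        return self.rows


if __name__ == "__main__":
    space = FakeSpace(310)
    print(wait_cond(lambda: space.count() == 310, 10))  # True
    print(wait_cond(lambda: False, 0.05))              # False
```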
