test: flaky replication/wal_rw_stress.test.lua test #4977
avtikhon added a commit that referenced this issue on May 15, 2020
avtikhon added a commit that referenced this issue on May 15, 2020
avtikhon added a commit that referenced this issue on May 15, 2020
Marked flaky tests from parallel runs as fragile to avoid flaky fails in regular testing: app/fiber_channel.test.lua (gh-4961), app/socket.test.lua (gh-4978), box/alter_limits.test.lua (gh-4926), replication/wal_rw_stress.test.lua (gh-4977), replication-py/conflict.test.py (gh-4980), vinyl/misc.test.lua (gh-4979). Part of #4953
avtikhon added a commit that referenced this issue on May 15, 2020
Marked flaky tests from parallel runs as fragile to avoid flaky fails in regular testing: app/fiber_channel.test.lua (gh-4961), app/socket.test.lua (gh-4978), box/alter_limits.test.lua (gh-4926), ddl.test.lua (gh-4353), replication/wal_rw_stress.test.lua (gh-4977), replication-py/conflict.test.py (gh-4980), vinyl/misc.test.lua (gh-4979). Part of #4953
avtikhon added a commit that referenced this issue on May 15, 2020
Marked flaky tests from parallel runs as fragile to avoid flaky fails in regular testing: app/fiber_channel.test.lua (gh-4961), app/socket.test.lua (gh-4978), box/alter_limits.test.lua (gh-4926), engine/ddl.test.lua (gh-4353), replication/wal_rw_stress.test.lua (gh-4977), replication-py/conflict.test.py (gh-4980), vinyl/misc.test.lua (gh-4979). Part of #4953
avtikhon added a commit that referenced this issue on May 15, 2020
Marked flaky tests from parallel runs as fragile to avoid flaky fails in regular testing: app/fiber_channel.test.lua (gh-4961), app/socket.test.lua (gh-4978), box/alter_limits.test.lua (gh-4926), box/misc.test.lua (gh-4982), engine/ddl.test.lua (gh-4353), replication/wal_rw_stress.test.lua (gh-4977), replication-py/conflict.test.py (gh-4980), vinyl/misc.test.lua (gh-4979). Part of #4953
avtikhon added a commit that referenced this issue on May 15, 2020
Marked flaky tests from parallel runs as fragile to avoid flaky fails in regular testing: app/fiber_channel.test.lua (gh-4961), app/socket.test.lua (gh-4978), box/alter_limits.test.lua (gh-4926), box/misc.test.lua (gh-4982), engine/ddl.test.lua (gh-4353), replication/replica_rejoin.test.lua (gh-4985), replication/wal_rw_stress.test.lua (gh-4977), replication-py/conflict.test.py (gh-4980), vinyl/misc.test.lua (gh-4979), vinyl/snapshot.test.lua (gh-4984), vinyl/write_iterator.test.lua (gh-4572). Part of #4953
avtikhon added a commit that referenced this issue on May 15, 2020
Added a skip condition on OSX for the replication/box_set_replication_stress.test.lua test. Marked flaky tests from parallel runs as fragile to avoid flaky fails in regular testing: app/fiber_channel.test.lua (gh-4961), app/socket.test.lua (gh-4978), box/alter_limits.test.lua (gh-4926), box/misc.test.lua (gh-4982), replication/replica_rejoin.test.lua (gh-4985), replication/wal_rw_stress.test.lua (gh-4977), replication-py/conflict.test.py (gh-4980), vinyl/misc.test.lua (gh-4979), vinyl/snapshot.test.lua (gh-4984), vinyl/write_iterator.test.lua (gh-4572). Part of #4953 (cherry picked from commit 72a2bae)
avtikhon added a commit that referenced this issue on May 16, 2020
Added a skip condition on OSX for the replication/box_set_replication_stress.test.lua test. Marked flaky tests from parallel runs as fragile to avoid flaky fails in regular testing: app/fiber_channel.test.lua (gh-4961), app/socket.test.lua (gh-4978), app/fiber.test.lua (gh-4987), box/tuple.test.lua (gh-4988), box/transaction.test.lua (gh-4990), box/alter_limits.test.lua (gh-4926), box/misc.test.lua (gh-4982), engine/ddl.test.lua (gh-4353), replication/recover_missing_xlog.test.lua (gh-4989), replication/replica_rejoin.test.lua (gh-4985), replication/wal_rw_stress.test.lua (gh-4977), replication-py/conflict.test.py (gh-4980), vinyl/misc.test.lua (gh-4979), vinyl/snapshot.test.lua (gh-4984), vinyl/write_iterator.test.lua (gh-4572). Part of #4953 (cherry picked from commit 72a2bae)
avtikhon added a commit that referenced this issue on May 16, 2020
Marked flaky tests from parallel runs as fragile to avoid flaky fails in regular testing: app/fiber_channel.test.lua (gh-4961), app/socket.test.lua (gh-4978), app/fiber.test.lua (gh-4987), box/tuple.test.lua (gh-4988), box/transaction.test.lua (gh-4990), box/alter_limits.test.lua (gh-4926), box/misc.test.lua (gh-4982), engine/ddl.test.lua (gh-4353), replication/recover_missing_xlog.test.lua (gh-4989), replication/replica_rejoin.test.lua (gh-4985), replication/wal_rw_stress.test.lua (gh-4977), replication-py/conflict.test.py (gh-4980), vinyl/misc.test.lua (gh-4979), vinyl/snapshot.test.lua (gh-4984), vinyl/write_iterator.test.lua (gh-4572). Part of #4953
avtikhon added a commit that referenced this issue on May 17, 2020
Added a skip condition on OSX for the replication/box_set_replication_stress.test.lua test. Marked flaky tests from parallel runs as fragile to avoid flaky fails in regular testing: app/fiber_channel.test.lua (gh-4961), app/socket.test.lua (gh-4978), app/fiber.test.lua (gh-4987), box/tuple.test.lua (gh-4988), box/transaction.test.lua (gh-4990), box/alter_limits.test.lua (gh-4926), box/misc.test.lua (gh-4982), engine/ddl.test.lua (gh-4353), replication/box_set_replication_stress (gh-4992), replication/recover_missing_xlog.test.lua (gh-4989), replication/replica_rejoin.test.lua (gh-4985), replication/wal_rw_stress.test.lua (gh-4977), replication-py/conflict.test.py (gh-4980), vinyl/errinj_ddl.test.lua (gh-4993), vinyl/misc.test.lua (gh-4979), vinyl/snapshot.test.lua (gh-4984), vinyl/write_iterator.test.lua (gh-4572), xlog/panic_on_broken_lsn.test.lua (gh-4991). Part of #4953 (cherry picked from commit 72a2bae)
avtikhon added a commit that referenced this issue on May 17, 2020
Marked flaky tests from parallel runs as fragile to avoid flaky fails in regular testing: app/fiber_channel.test.lua (gh-4961), app/socket.test.lua (gh-4978), app/fiber.test.lua (gh-4987), box/tuple.test.lua (gh-4988), box/transaction.test.lua (gh-4990), box/alter_limits.test.lua (gh-4926), box/misc.test.lua (gh-4982), engine/ddl.test.lua (gh-4353), replication/box_set_replication_stress (gh-4992), replication/recover_missing_xlog.test.lua (gh-4989), replication/replica_rejoin.test.lua (gh-4985), replication/wal_rw_stress.test.lua (gh-4977), replication-py/conflict.test.py (gh-4980), vinyl/errinj_ddl.test.lua (gh-4993), vinyl/misc.test.lua (gh-4979), vinyl/snapshot.test.lua (gh-4984), vinyl/write_iterator.test.lua (gh-4572), xlog/panic_on_broken_lsn.test.lua (gh-4991). Part of #4953
avtikhon added a commit that referenced this issue on May 17, 2020
Added a skip condition on OSX for the replication/box_set_replication_stress.test.lua test. Marked flaky tests from parallel runs as fragile to avoid flaky fails in regular testing: app/fiber.test.lua (gh-4987), app/fiber_channel.test.lua (gh-4961), app/socket.test.lua (gh-4978), app-tap/popen.test.lua (gh-4995), box/alter_limits.test.lua (gh-4926), box/misc.test.lua (gh-4982), box/rtree_rect.test.lua (gh-4994), box/tuple.test.lua (gh-4988), engine/ddl.test.lua (gh-4353), replication/box_set_replication_stress (gh-4992), replication/recover_missing_xlog.test.lua (gh-4989), replication/replica_rejoin.test.lua (gh-4985), replication/wal_rw_stress.test.lua (gh-4977), replication-py/conflict.test.py (gh-4980), vinyl/errinj_ddl.test.lua (gh-4993), vinyl/misc.test.lua (gh-4979), vinyl/snapshot.test.lua (gh-4984), vinyl/write_iterator.test.lua (gh-4572), xlog/panic_on_broken_lsn.test.lua (gh-4991). Part of #4953 (cherry picked from commit 72a2bae)
avtikhon added a commit that referenced this issue on May 17, 2020
Marked flaky tests from parallel runs as fragile to avoid flaky fails in regular testing: app/fiber.test.lua (gh-4987), app/fiber_channel.test.lua (gh-4961), app/socket.test.lua (gh-4978), app-tap/popen.test.lua (gh-4995), box/alter_limits.test.lua (gh-4926), box/misc.test.lua (gh-4982), box/rtree_rect.test.lua (gh-4994), box/tuple.test.lua (gh-4988), engine/ddl.test.lua (gh-4353), replication/box_set_replication_stress (gh-4992), replication/recover_missing_xlog.test.lua (gh-4989), replication/replica_rejoin.test.lua (gh-4985), replication/wal_rw_stress.test.lua (gh-4977), replication-py/conflict.test.py (gh-4980), vinyl/errinj_ddl.test.lua (gh-4993), vinyl/misc.test.lua (gh-4979), vinyl/snapshot.test.lua (gh-4984), vinyl/write_iterator.test.lua (gh-4572), xlog/panic_on_broken_lsn.test.lua (gh-4991). Part of #4953
avtikhon added a commit that referenced this issue on May 18, 2020
Marked flaky tests from parallel runs as fragile to avoid flaky fails in regular testing: app/fiber.test.lua (gh-4987), app/fiber_channel.test.lua (gh-4961), app/socket.test.lua (gh-4978), app-tap/popen.test.lua (gh-4995), box/alter_limits.test.lua (gh-4926), box/misc.test.lua (gh-4982), box/role.test.lua (gh-4998), box/rtree_rect.test.lua (gh-4994), box/sequence.test.lua (gh-4996), box/tuple.test.lua (gh-4988), engine/ddl.test.lua (gh-4353), replication/box_set_replication_stress (gh-4992), replication/recover_missing_xlog.test.lua (gh-4989), replication/replica_rejoin.test.lua (gh-4985), replication/wal_rw_stress.test.lua (gh-4977), replication-py/conflict.test.py (gh-4980), vinyl/errinj_ddl.test.lua (gh-4993), vinyl/misc.test.lua (gh-4979), vinyl/snapshot.test.lua (gh-4984), vinyl/write_iterator.test.lua (gh-4572), xlog/panic_on_broken_lsn.test.lua (gh-4991). Part of #4953
avtikhon added a commit that referenced this issue on May 18, 2020
Added a skip condition on OSX for the replication/box_set_replication_stress.test.lua test. Marked flaky tests from parallel runs as fragile to avoid flaky fails in regular testing: app/fiber.test.lua (gh-4987), app/fiber_channel.test.lua (gh-4961), app/socket.test.lua (gh-4978), app-tap/popen.test.lua (gh-4995), box/alter_limits.test.lua (gh-4926), box/misc.test.lua (gh-4982), box/role.test.lua (gh-4998), box/rtree_rect.test.lua (gh-4994), box/sequence.test.lua (gh-4996), box/tuple.test.lua (gh-4988), engine/ddl.test.lua (gh-4353), replication/box_set_replication_stress (gh-4992), replication/recover_missing_xlog.test.lua (gh-4989), replication/replica_rejoin.test.lua (gh-4985), replication/wal_rw_stress.test.lua (gh-4977), replication-py/conflict.test.py (gh-4980), vinyl/errinj_ddl.test.lua (gh-4993), vinyl/misc.test.lua (gh-4979), vinyl/snapshot.test.lua (gh-4984), vinyl/write_iterator.test.lua (gh-4572), xlog/panic_on_broken_lsn.test.lua (gh-4991). Part of #4953 (cherry picked from commit 72a2bae)
avtikhon added a commit that referenced this issue on May 19, 2020
Added a skip condition on OSX for the replication/box_set_replication_stress.test.lua test. Marked flaky tests from parallel runs as fragile to avoid flaky fails in regular testing: app/fiber.test.lua (gh-4987), app/fiber_channel.test.lua (gh-4961), app/socket.test.lua (gh-4978), box/alter_limits.test.lua (gh-4926), box/misc.test.lua (gh-4982), box/role.test.lua (gh-4998), box/rtree_rect.test.lua (gh-4994), box/sequence.test.lua (gh-4996), box/tuple.test.lua (gh-4988), engine/ddl.test.lua (gh-4353), replication/box_set_replication_stress (gh-4992), replication/recover_missing_xlog.test.lua (gh-4989), replication/replica_rejoin.test.lua (gh-4985), replication/wal_rw_stress.test.lua (gh-4977), replication-py/conflict.test.py (gh-4980), vinyl/errinj_ddl.test.lua (gh-4993), vinyl/misc.test.lua (gh-4979), vinyl/snapshot.test.lua (gh-4984), vinyl/write_iterator.test.lua (gh-4572), xlog/panic_on_broken_lsn.test.lua (gh-4991). Part of #4953 (cherry picked from commit 72a2bae)
avtikhon added a commit that referenced this issue on May 19, 2020
Marked flaky tests from parallel runs as fragile to avoid flaky fails in regular testing: app/fiber.test.lua (gh-4987), app/fiber_channel.test.lua (gh-4961), app/socket.test.lua (gh-4978), box/alter_limits.test.lua (gh-4926), box/misc.test.lua (gh-4982), box/role.test.lua (gh-4998), box/rtree_rect.test.lua (gh-4994), box/sequence.test.lua (gh-4996), box/tuple.test.lua (gh-4988), engine/ddl.test.lua (gh-4353), replication/box_set_replication_stress (gh-4992), replication/recover_missing_xlog.test.lua (gh-4989), replication/replica_rejoin.test.lua (gh-4985), replication/wal_rw_stress.test.lua (gh-4977), replication-py/conflict.test.py (gh-4980), vinyl/errinj_ddl.test.lua (gh-4993), vinyl/misc.test.lua (gh-4979), vinyl/snapshot.test.lua (gh-4984), vinyl/write_iterator.test.lua (gh-4572), xlog/panic_on_broken_lsn.test.lua (gh-4991). Part of #4953
kyukhin pushed a commit that referenced this issue on May 20, 2020
Marked flaky tests from parallel runs as fragile to avoid flaky fails in regular testing: app/fiber.test.lua (gh-4987), app/fiber_channel.test.lua (gh-4961), app/socket.test.lua (gh-4978), box/alter_limits.test.lua (gh-4926), box/misc.test.lua (gh-4982), box/role.test.lua (gh-4998), box/rtree_rect.test.lua (gh-4994), box/sequence.test.lua (gh-4996), box/tuple.test.lua (gh-4988), engine/ddl.test.lua (gh-4353), replication/box_set_replication_stress (gh-4992), replication/recover_missing_xlog.test.lua (gh-4989), replication/replica_rejoin.test.lua (gh-4985), replication/wal_rw_stress.test.lua (gh-4977), replication-py/conflict.test.py (gh-4980), vinyl/errinj_ddl.test.lua (gh-4993), vinyl/misc.test.lua (gh-4979), vinyl/snapshot.test.lua (gh-4984), vinyl/write_iterator.test.lua (gh-4572), xlog/panic_on_broken_lsn.test.lua (gh-4991). Part of #4953 (cherry picked from commit 430c0e8)
kyukhin pushed a commit that referenced this issue on May 20, 2020
Same commit message as the previous commit (cherry picked from commit 430c0e8).
kyukhin pushed a commit that referenced this issue on May 20, 2020
Added a skip condition on OSX for the replication/box_set_replication_stress.test.lua test. Marked flaky tests from parallel runs as fragile to avoid flaky fails in regular testing: app/fiber.test.lua (gh-4987), app/fiber_channel.test.lua (gh-4961), app/socket.test.lua (gh-4978), box/alter_limits.test.lua (gh-4926), box/misc.test.lua (gh-4982), box/role.test.lua (gh-4998), box/rtree_rect.test.lua (gh-4994), box/sequence.test.lua (gh-4996), box/tuple.test.lua (gh-4988), engine/ddl.test.lua (gh-4353), replication/box_set_replication_stress (gh-4992), replication/recover_missing_xlog.test.lua (gh-4989), replication/replica_rejoin.test.lua (gh-4985), replication/wal_rw_stress.test.lua (gh-4977), replication-py/conflict.test.py (gh-4980), vinyl/errinj_ddl.test.lua (gh-4993), vinyl/misc.test.lua (gh-4979), vinyl/snapshot.test.lua (gh-4984), vinyl/write_iterator.test.lua (gh-4572), xlog/panic_on_broken_lsn.test.lua (gh-4991). Part of #4953 (cherry picked from commit 430c0e8)
kyukhin pushed a commit that referenced this issue on May 20, 2020
Marked flaky tests from parallel runs as fragile to avoid flaky fails in regular testing: app/fiber.test.lua (gh-4987), app/fiber_channel.test.lua (gh-4961), app/socket.test.lua (gh-4978), box/alter_limits.test.lua (gh-4926), box/misc.test.lua (gh-4982), box/role.test.lua (gh-4998), box/rtree_rect.test.lua (gh-4994), box/sequence.test.lua (gh-4996), box/tuple.test.lua (gh-4988), engine/ddl.test.lua (gh-4353), replication/box_set_replication_stress (gh-4992), replication/recover_missing_xlog.test.lua (gh-4989), replication/replica_rejoin.test.lua (gh-4985), replication/wal_rw_stress.test.lua (gh-4977), replication-py/conflict.test.py (gh-4980), vinyl/errinj_ddl.test.lua (gh-4993), vinyl/misc.test.lua (gh-4979), vinyl/snapshot.test.lua (gh-4984), vinyl/write_iterator.test.lua (gh-4572), xlog/panic_on_broken_lsn.test.lua (gh-4991). Part of #4953
avtikhon added a commit that referenced this issue on Jun 15, 2020
Found issue:

[016] --- replication/wal_rw_stress.result  Fri Feb 21 11:53:21 2020
[016] +++ replication/wal_rw_stress.reject  Fri May 8 08:23:56 2020
[016] @@ -73,7 +73,42 @@
[016]  ...
[016]  box.info.replication[1].downstream.status ~= 'stopped' or box.info
[016]  ---
[016] -- true
[016] +- version: 2.5.0-27-g32f59756a
[016] +  id: 2
[016] +  ro: false
[016] +  uuid: 41cbebcc-9105-11ea-96ab-08002739cbd6
[016] +  package: Tarantool
[016] +  cluster:
[016] +    uuid: 397c196f-9105-11ea-96ab-08002739cbd6
[016] +  listen: unix/:/home/vagrant/tarantool/test/var/016_replication/replica.socket-iproto
[016] +  replication:
[016] +    1:
[016] +      id: 1
[016] +      uuid: 397a1886-9105-11ea-96ab-08002739cbd6
[016] +      lsn: 10005
[016] +      upstream:
[016] +        status: follow
[016] +        idle: 0.46353673400017
[016] +        peer: unix/:/home/vagrant/tarantool/test/var/016_replication/master.socket-iproto
[016] +        lag: -0.45732522010803
[016] +      downstream:
[016] +        status: stopped
[016] +        message: writev(1), called on fd 24, aka unix/:/home/vagrant/tarantool/test/var/016_replicati
[016] +        system_message: Broken pipe
[016] +    2:
[016] +      id: 2
[016] +      uuid: 41cbebcc-9105-11ea-96ab-08002739cbd6
[016] +      lsn: 0
[016] +  signature: 10005
[016] +  status: running
[016] +  vinyl: []
[016] +  uptime: 2
[016] +  lsn: 0
[016] +  sql: []
[016] +  gc: []
[016] +  pid: 41231
[016] +  memory: []
[016] +  vclock: {1: 10005}
[016]  ...
[016]  test_run:cmd("switch default")
[016]  ---

To check the downstream status and its message, the test needs to wait until the downstream appears. This prevents an attempt to index a nil value when one of those checks is made before a record about the peer appears in box.info.replication. It was observed on the replication/show_error_on_disconnect test after commit c6bea65 ('replication: recfg with 0 quorum returns immediately').

Closes #4977
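In other words, the check must not index box.info.replication[1].downstream before it exists. A minimal Lua sketch of such a guard, assuming test-run's test_run:wait_cond() helper, which the test already uses; the exact condition in the committed patch may differ:

-- Sketch: wait for the peer record and its downstream section to appear
-- before reading any downstream fields.
test_run:wait_cond(function()
    local master = box.info.replication[1]
    return master ~= nil and master.downstream ~= nil
end)
-- Only now is it safe to inspect downstream.status.
test_run:wait_cond(function()
    return box.info.replication[1].downstream.status ~= 'stopped'
end) or box.info

Splitting the wait into an availability step and a status step keeps each condition free of nil indexing.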
avtikhon added a commit that referenced this issue on Jun 19, 2020
Found issue (reproduced on a VBox FreeBSD machine): the same replication/wal_rw_stress.result diff as in the Jun 15 commit above, with the replica's downstream stopped on a writev() 'Broken pipe' error.

To check the downstream status and its message, the test needs to wait until the downstream appears. This prevents an attempt to index a nil value when one of those checks is made before a record about the peer appears in box.info.replication. It was observed on the replication/show_error_on_disconnect test after commit c6bea65 ('replication: recfg with 0 quorum returns immediately').

Checked that the test still catches the error it was created for in b9db91e ('xlog: fix fallocate vs read race') and successfully got the needed 'tx checksum mismatch' error:

[153] --- replication/wal_rw_stress.result  Fri Jun 19 15:01:49 2020
[153] +++ replication/wal_rw_stress.reject  Fri Jun 19 15:04:02 2020
[153] @@ -73,7 +73,43 @@
[153]  ...
[153]  test_run:wait_cond(function() return box.info.replication[1].downstream.status ~= 'stopped' end) or box.info
...
[153] +      downstream:
[153] +        status: stopped
[153] +        message: tx checksum mismatch

Closes #4977
avtikhon added a commit that referenced this issue on Jun 23, 2020
Found issue (reproduced on a VBox FreeBSD machine): the same replication/wal_rw_stress.result diff as in the Jun 15 commit above, with the replica's downstream stopped on a writev() 'Broken pipe' error.

To check the downstream status and its message, the test needs to wait until the downstream appears. This prevents an attempt to index a nil value when one of those checks is made before a record about the peer appears in box.info.replication. It was observed on the replication/show_error_on_disconnect test after commit c6bea65 ('replication: recfg with 0 quorum returns immediately').

Checked that the test still catches the error it was created for in b9db91e ('xlog: fix fallocate vs read race') and successfully got the needed 'tx checksum mismatch' error:

[153] --- replication/wal_rw_stress.result  Fri Jun 19 15:01:49 2020
[153] +++ replication/wal_rw_stress.reject  Fri Jun 19 15:04:02 2020
[153] @@ -73,7 +73,43 @@
[153]  ...
[153]  test_run:wait_cond(function() return box.info.replication[1].downstream.status ~= 'stopped' end) or box.info
...
[153] +      downstream:
[153] +        status: stopped
[153] +        message: tx checksum mismatch

Note that wait_cond() lets the test ride out transient network connectivity errors, while 'tx checksum mismatch' is a persistent error and will still be caught.

Closes #4977
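For context on that note: wait_cond() polls its condition function until it returns a truthy value or a timeout expires, so transient hiccups are retried away while a persistent 'tx checksum mismatch' keeps the condition false until the timeout and is therefore caught. A rough, illustrative Lua re-implementation of such a polling helper (the real one lives in test-run; the name, signature, and default timeout here are assumptions):

local fiber = require('fiber')

-- Illustrative wait_cond()-style loop; not test-run's actual code.
local function wait_cond(cond, timeout)
    timeout = timeout or 60                  -- assumed default, seconds
    local deadline = fiber.clock() + timeout
    while fiber.clock() < deadline do
        if cond() then
            return true                      -- transient failures are retried away
        end
        fiber.sleep(0.01)                    -- yield between polls
    end
    return false                             -- persistent failures outlive the timeout
end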
kyukhin pushed a commit that referenced this issue on Jun 26, 2020
Same commit message as the Jun 23 commit above (cherry picked from commit 06eda0f).
kyukhin pushed a commit that referenced this issue on Jun 26, 2020
Same commit message as the Jun 23 commit above (cherry picked from commit 06eda0f).
kyukhin pushed a commit that referenced this issue on Jun 26, 2020
Same commit message as the Jun 23 commit above (cherry picked from commit 06eda0f).
Seems that the downstream status should be checked only when the downstream structure is available, to avoid this issue.
avtikhon added a commit that referenced this issue on Jul 14, 2020
Found that on heavily loaded hosts the test tries to check the replication downstream status while the downstream structure is not ready yet, and fails with the error:

[017] --- replication/wal_rw_stress.result  Thu Jul 9 17:04:16 2020
[017] +++ replication/wal_rw_stress.reject  Fri May 8 08:25:15 2020
[017] @@ -75,7 +75,8 @@
[017]  return box.info.replication[1].downstream.status ~= 'stopped' \
[017]  end) or box.info
[017]  ---
[017] -- true
[017] +- error: '[string "return test_run:wait_cond(function() ..."]:1: attempt to
[017] +    index field ''downstream'' (a nil value)'
[017]  ...
[017]  test_run:cmd("switch default")
[017]  ---

So the wait condition should start from a check of the downstream structure's availability.

Follows up #4977
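Concretely, starting the wait condition from the availability check means letting Lua's short-circuiting 'and' test the structure before its fields, so a not-yet-ready downstream makes the condition return false instead of raising an error. An illustrative sketch; the committed patch may word the condition differently:

test_run:wait_cond(function()
    local master = box.info.replication[1]
    -- The structure checks run first, so the field access never hits nil.
    return master ~= nil
        and master.downstream ~= nil
        and master.downstream.status ~= 'stopped'
end) or box.info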
avtikhon added a commit that referenced this issue on Jul 14, 2020
Same commit message as the previous commit.
kyukhin pushed a commit that referenced this issue on Jul 14, 2020
Same commit message as the commits above (cherry picked from commit d3e2a2a).
kyukhin pushed a commit that referenced this issue on Jul 14, 2020
Same commit message as the commits above.
kyukhin pushed a commit that referenced this issue on Jul 14, 2020
Same commit message as the commits above (cherry picked from commit d3e2a2a).
kyukhin pushed a commit that referenced this issue on Jul 14, 2020
Same commit message as the commits above (cherry picked from commit d3e2a2a).
Tarantool version:
Tarantool 2.5.0-27-g32f59756a
Target: FreeBSD-amd64-RelWithDebInfo
Build options: cmake . -DCMAKE_INSTALL_PREFIX=/usr/local -DENABLE_BACKTRACE=OFF
Compiler: /usr/bin/cc /usr/bin/c++
C_FLAGS: -Wno-unknown-pragmas -fexceptions -funwind-tables -fno-common -std=c11 -Wall -Wextra -Wno-strict-aliasing -Wno-char-subscripts -Werror
CXX_FLAGS: -Wno-unknown-pragmas -fexceptions -funwind-tables -fno-common -std=c++11 -Wall -Wextra -Wno-strict-aliasing -Wno-char-subscripts -Werror
OS version:
FreeBSD 12
Failed to reproduce on Linux, but that does not mean the test always passes there.
Bug description:
Steps to reproduce:
Used a VBox FreeBSD machine with commands: