test: flaky replication/wal_rw_stress.test.lua test #4977

Closed
avtikhon opened this issue May 15, 2020 · 1 comment
Assignees
Labels
flaky test, qa (Issues related to tests or testing subsystem)

Comments

@avtikhon
Contributor

avtikhon commented May 15, 2020

Tarantool version:
Tarantool 2.5.0-27-g32f59756a
Target: FreeBSD-amd64-RelWithDebInfo
Build options: cmake . -DCMAKE_INSTALL_PREFIX=/usr/local -DENABLE_BACKTRACE=OFF
Compiler: /usr/bin/cc /usr/bin/c++
C_FLAGS: -Wno-unknown-pragmas -fexceptions -funwind-tables -fno-common -std=c11 -Wall -Wextra -Wno-strict-aliasing -Wno-char-subscripts -Werror
CXX_FLAGS: -Wno-unknown-pragmas -fexceptions -funwind-tables -fno-common -std=c++11 -Wall -Wextra -Wno-strict-aliasing -Wno-char-subscripts -Werror

OS version:
FreeBSD 12
Failed to reproduce on Linux, though that does not mean the test always passes there.

Bug description:

 [016] --- replication/wal_rw_stress.result	Fri Feb 21 11:53:21 2020
 [016] +++ replication/wal_rw_stress.reject	Fri May  8 08:23:56 2020
 [016] @@ -73,7 +73,42 @@
 [016]  ...
 [016]  box.info.replication[1].downstream.status ~= 'stopped' or box.info
 [016]  ---
 [016] -- true
 [016] +- version: 2.5.0-27-g32f59756a
 [016] +  id: 2
 [016] +  ro: false
 [016] +  uuid: 41cbebcc-9105-11ea-96ab-08002739cbd6
 [016] +  package: Tarantool
 [016] +  cluster:
 [016] +    uuid: 397c196f-9105-11ea-96ab-08002739cbd6
 [016] +  listen: unix/:/home/vagrant/tarantool/test/var/016_replication/replica.socket-iproto
 [016] +  replication:
 [016] +    1:
 [016] +      id: 1
 [016] +      uuid: 397a1886-9105-11ea-96ab-08002739cbd6
 [016] +      lsn: 10005
 [016] +      upstream:
 [016] +        status: follow
 [016] +        idle: 0.46353673400017
 [016] +        peer: unix/:/home/vagrant/tarantool/test/var/016_replication/master.socket-iproto
 [016] +        lag: -0.45732522010803
 [016] +      downstream:
 [016] +        status: stopped
 [016] +        message: writev(1), called on fd 24, aka unix/:/home/vagrant/tarantool/test/var/016_replicati
 [016] +        system_message: Broken pipe
 [016] +    2:
 [016] +      id: 2
 [016] +      uuid: 41cbebcc-9105-11ea-96ab-08002739cbd6
 [016] +      lsn: 0
 [016] +  signature: 10005
 [016] +  status: running
 [016] +  vinyl: []
 [016] +  uptime: 2
 [016] +  lsn: 0
 [016] +  sql: []
 [016] +  gc: []
 [016] +  pid: 41231
 [016] +  memory: []
 [016] +  vclock: {1: 10005}
 [016]  ...
 [016]  test_run:cmd("switch default")
 [016]  ---

Steps to reproduce:
Used a VirtualBox FreeBSD VM with the following commands:

cmake . -DCMAKE_INSTALL_PREFIX=/usr/local -DENABLE_BACKTRACE=OFF
gmake -j
cd test
l=0 ; while ./test-run.py --long -j5 `for r in {1..10} ; do echo replication/wal_rw_stress.test.lua ; done 2>/dev/null` ; do l=$(($l+1)) ; echo "======= $l =============" ; done | tee a.log 2>&1

Optional (but very desirable):

  • coredump
  • backtrace
  • netstat
@avtikhon avtikhon self-assigned this May 15, 2020
@avtikhon avtikhon added the flaky test and qa (Issues related to tests or testing subsystem) labels May 15, 2020
avtikhon added a commit that referenced this issue May 15, 2020
Marked flaky tests as fragile for parallel runs to avoid
flaky failures in regular testing:

  app/fiber_channel.test.lua                ; gh-4961
  box/alter_limits.test.lua                 ; gh-4926
  replication/wal_rw_stress.test.lua        ; gh-4977

Part of #4953
avtikhon added a commit that referenced this issue May 15, 2020
Marked flaky tests as fragile for parallel runs to avoid
flaky failures in regular testing:

  app/fiber_channel.test.lua                ; gh-4961
  app/socket.test.lua                       ; gh-4978
  box/alter_limits.test.lua                 ; gh-4926
  replication/wal_rw_stress.test.lua        ; gh-4977

Part of #4953
avtikhon added a commit that referenced this issue May 15, 2020
Marked flaky tests as fragile for parallel runs to avoid
flaky failures in regular testing:

  app/fiber_channel.test.lua                ; gh-4961
  app/socket.test.lua                       ; gh-4978
  box/alter_limits.test.lua                 ; gh-4926
  replication/wal_rw_stress.test.lua        ; gh-4977
  vinyl/misc.test.lua                       ; gh-4979

Part of #4953
avtikhon added a commit that referenced this issue May 15, 2020
Marked flaky tests as fragile for parallel runs to avoid
flaky failures in regular testing:

  app/fiber_channel.test.lua                ; gh-4961
  app/socket.test.lua                       ; gh-4978
  box/alter_limits.test.lua                 ; gh-4926
  replication/wal_rw_stress.test.lua        ; gh-4977
  replication-py/conflict.test.py           ; gh-4980
  vinyl/misc.test.lua                       ; gh-4979

Part of #4953

avtikhon added a commit that referenced this issue May 15, 2020
Marked flaky tests as fragile for parallel runs to avoid
flaky failures in regular testing:

  app/fiber_channel.test.lua                ; gh-4961
  app/socket.test.lua                       ; gh-4978
  box/alter_limits.test.lua                 ; gh-4926
  ddl.test.lua                              ; gh-4353
  replication/wal_rw_stress.test.lua        ; gh-4977
  replication-py/conflict.test.py           ; gh-4980
  vinyl/misc.test.lua                       ; gh-4979

Part of #4953
avtikhon added a commit that referenced this issue May 15, 2020
Marked flaky tests as fragile for parallel runs to avoid
flaky failures in regular testing:

  app/fiber_channel.test.lua                ; gh-4961
  app/socket.test.lua                       ; gh-4978
  box/alter_limits.test.lua                 ; gh-4926
  engine/ddl.test.lua                       ; gh-4353
  replication/wal_rw_stress.test.lua        ; gh-4977
  replication-py/conflict.test.py           ; gh-4980
  vinyl/misc.test.lua                       ; gh-4979

Part of #4953
avtikhon added a commit that referenced this issue May 15, 2020
Marked flaky tests as fragile for parallel runs to avoid
flaky failures in regular testing:

  app/fiber_channel.test.lua                ; gh-4961
  app/socket.test.lua                       ; gh-4978
  box/alter_limits.test.lua                 ; gh-4926
  box/misc.test.lua                         ; gh-4982
  engine/ddl.test.lua                       ; gh-4353
  replication/wal_rw_stress.test.lua        ; gh-4977
  replication-py/conflict.test.py           ; gh-4980
  vinyl/misc.test.lua                       ; gh-4979

Part of #4953
avtikhon added a commit that referenced this issue May 15, 2020
Marked flaky tests as fragile for parallel runs to avoid
flaky failures in regular testing:

  app/fiber_channel.test.lua                ; gh-4961
  app/socket.test.lua                       ; gh-4978
  box/alter_limits.test.lua                 ; gh-4926
  box/misc.test.lua                         ; gh-4982
  engine/ddl.test.lua                       ; gh-4353
  replication/replica_rejoin.test.lua       ; gh-4985
  replication/wal_rw_stress.test.lua        ; gh-4977
  replication-py/conflict.test.py           ; gh-4980
  vinyl/misc.test.lua                       ; gh-4979
  vinyl/snapshot.test.lua                   ; gh-4984
  vinyl/write_iterator.test.lua             ; gh-4572

Part of #4953
avtikhon added a commit that referenced this issue May 15, 2020
Added skip condition on OSX for test:
  replication/box_set_replication_stress.test.lua

Marked flaky tests as fragile for parallel runs to avoid
flaky failures in regular testing:

  app/fiber_channel.test.lua                ; gh-4961
  app/socket.test.lua                       ; gh-4978
  box/alter_limits.test.lua                 ; gh-4926
  box/misc.test.lua                         ; gh-4982
  replication/replica_rejoin.test.lua       ; gh-4985
  replication/wal_rw_stress.test.lua        ; gh-4977
  replication-py/conflict.test.py           ; gh-4980
  vinyl/misc.test.lua                       ; gh-4979
  vinyl/snapshot.test.lua                   ; gh-4984
  vinyl/write_iterator.test.lua             ; gh-4572

Part of #4953

(cherry picked from commit 72a2bae)
avtikhon added a commit that referenced this issue May 16, 2020
Added skip condition on OSX for test:
  replication/box_set_replication_stress.test.lua

Marked flaky tests as fragile for parallel runs to avoid
flaky failures in regular testing:

  app/fiber_channel.test.lua                ; gh-4961
  app/socket.test.lua                       ; gh-4978
  app/fiber.test.lua                        ; gh-4987
  box/tuple.test.lua                        ; gh-4988
  box/transaction.test.lua                  ; gh-4990
  box/alter_limits.test.lua                 ; gh-4926
  box/misc.test.lua                         ; gh-4982
  engine/ddl.test.lua                       ; gh-4353
  replication/recover_missing_xlog.test.lua ; gh-4989
  replication/replica_rejoin.test.lua       ; gh-4985
  replication/wal_rw_stress.test.lua        ; gh-4977
  replication-py/conflict.test.py           ; gh-4980
  vinyl/misc.test.lua                       ; gh-4979
  vinyl/snapshot.test.lua                   ; gh-4984
  vinyl/write_iterator.test.lua             ; gh-4572

Part of #4953

(cherry picked from commit 72a2bae)
avtikhon added a commit that referenced this issue May 16, 2020
Marked flaky tests as fragile for parallel runs to avoid
flaky failures in regular testing:

  app/fiber_channel.test.lua                ; gh-4961
  app/socket.test.lua                       ; gh-4978
  app/fiber.test.lua                        ; gh-4987
  box/tuple.test.lua                        ; gh-4988
  box/transaction.test.lua                  ; gh-4990
  box/alter_limits.test.lua                 ; gh-4926
  box/misc.test.lua                         ; gh-4982
  engine/ddl.test.lua                       ; gh-4353
  replication/recover_missing_xlog.test.lua ; gh-4989
  replication/replica_rejoin.test.lua       ; gh-4985
  replication/wal_rw_stress.test.lua        ; gh-4977
  replication-py/conflict.test.py           ; gh-4980
  vinyl/misc.test.lua                       ; gh-4979
  vinyl/snapshot.test.lua                   ; gh-4984
  vinyl/write_iterator.test.lua             ; gh-4572

Part of #4953
avtikhon added a commit that referenced this issue May 17, 2020
Added skip condition on OSX for test:
  replication/box_set_replication_stress.test.lua

Marked flaky tests as fragile for parallel runs to avoid
flaky failures in regular testing:

  app/fiber_channel.test.lua                ; gh-4961
  app/socket.test.lua                       ; gh-4978
  app/fiber.test.lua                        ; gh-4987
  box/tuple.test.lua                        ; gh-4988
  box/transaction.test.lua                  ; gh-4990
  box/alter_limits.test.lua                 ; gh-4926
  box/misc.test.lua                         ; gh-4982
  engine/ddl.test.lua                       ; gh-4353
  replication/box_set_replication_stress    ; gh-4992
  replication/recover_missing_xlog.test.lua ; gh-4989
  replication/replica_rejoin.test.lua       ; gh-4985
  replication/wal_rw_stress.test.lua        ; gh-4977
  replication-py/conflict.test.py           ; gh-4980
  vinyl/errinj_ddl.test.lua                 ; gh-4993
  vinyl/misc.test.lua                       ; gh-4979
  vinyl/snapshot.test.lua                   ; gh-4984
  vinyl/write_iterator.test.lua             ; gh-4572
  xlog/panic_on_broken_lsn.test.lua         ; gh-4991

Part of #4953

(cherry picked from commit 72a2bae)
avtikhon added a commit that referenced this issue May 17, 2020
Marked flaky tests as fragile for parallel runs to avoid
flaky failures in regular testing:

  app/fiber_channel.test.lua                ; gh-4961
  app/socket.test.lua                       ; gh-4978
  app/fiber.test.lua                        ; gh-4987
  box/tuple.test.lua                        ; gh-4988
  box/transaction.test.lua                  ; gh-4990
  box/alter_limits.test.lua                 ; gh-4926
  box/misc.test.lua                         ; gh-4982
  engine/ddl.test.lua                       ; gh-4353
  replication/box_set_replication_stress    ; gh-4992
  replication/recover_missing_xlog.test.lua ; gh-4989
  replication/replica_rejoin.test.lua       ; gh-4985
  replication/wal_rw_stress.test.lua        ; gh-4977
  replication-py/conflict.test.py           ; gh-4980
  vinyl/errinj_ddl.test.lua                 ; gh-4993
  vinyl/misc.test.lua                       ; gh-4979
  vinyl/snapshot.test.lua                   ; gh-4984
  vinyl/write_iterator.test.lua             ; gh-4572
  xlog/panic_on_broken_lsn.test.lua         ; gh-4991

Part of #4953
avtikhon added a commit that referenced this issue May 17, 2020
Added skip condition on OSX for test:
  replication/box_set_replication_stress.test.lua

Marked flaky tests as fragile for parallel runs to avoid
flaky failures in regular testing:

  app/fiber.test.lua                        ; gh-4987
  app/fiber_channel.test.lua                ; gh-4961
  app/socket.test.lua                       ; gh-4978
  app-tap/popen.test.lua                    ; gh-4995
  box/alter_limits.test.lua                 ; gh-4926
  box/misc.test.lua                         ; gh-4982
  box/rtree_rect.test.lua                   ; gh-4994
  box/tuple.test.lua                        ; gh-4988
  engine/ddl.test.lua                       ; gh-4353
  replication/box_set_replication_stress    ; gh-4992
  replication/recover_missing_xlog.test.lua ; gh-4989
  replication/replica_rejoin.test.lua       ; gh-4985
  replication/wal_rw_stress.test.lua        ; gh-4977
  replication-py/conflict.test.py           ; gh-4980
  vinyl/errinj_ddl.test.lua                 ; gh-4993
  vinyl/misc.test.lua                       ; gh-4979
  vinyl/snapshot.test.lua                   ; gh-4984
  vinyl/write_iterator.test.lua             ; gh-4572
  xlog/panic_on_broken_lsn.test.lua         ; gh-4991

Part of #4953

(cherry picked from commit 72a2bae)
avtikhon added a commit that referenced this issue May 17, 2020
Marked flaky tests as fragile for parallel runs to avoid
flaky failures in regular testing:

  app/fiber.test.lua                        ; gh-4987
  app/fiber_channel.test.lua                ; gh-4961
  app/socket.test.lua                       ; gh-4978
  app-tap/popen.test.lua                    ; gh-4995
  box/alter_limits.test.lua                 ; gh-4926
  box/misc.test.lua                         ; gh-4982
  box/rtree_rect.test.lua                   ; gh-4994
  box/tuple.test.lua                        ; gh-4988
  engine/ddl.test.lua                       ; gh-4353
  replication/box_set_replication_stress    ; gh-4992
  replication/recover_missing_xlog.test.lua ; gh-4989
  replication/replica_rejoin.test.lua       ; gh-4985
  replication/wal_rw_stress.test.lua        ; gh-4977
  replication-py/conflict.test.py           ; gh-4980
  vinyl/errinj_ddl.test.lua                 ; gh-4993
  vinyl/misc.test.lua                       ; gh-4979
  vinyl/snapshot.test.lua                   ; gh-4984
  vinyl/write_iterator.test.lua             ; gh-4572
  xlog/panic_on_broken_lsn.test.lua         ; gh-4991

Part of #4953
avtikhon added a commit that referenced this issue May 18, 2020
Marked flaky tests as fragile for parallel runs to avoid
flaky failures in regular testing:

  app/fiber.test.lua                        ; gh-4987
  app/fiber_channel.test.lua                ; gh-4961
  app/socket.test.lua                       ; gh-4978
  app-tap/popen.test.lua                    ; gh-4995
  box/alter_limits.test.lua                 ; gh-4926
  box/misc.test.lua                         ; gh-4982
  box/role.test.lua                         ; gh-4998
  box/rtree_rect.test.lua                   ; gh-4994
  box/sequence.test.lua                     ; gh-4996
  box/tuple.test.lua                        ; gh-4988
  engine/ddl.test.lua                       ; gh-4353
  replication/box_set_replication_stress    ; gh-4992
  replication/recover_missing_xlog.test.lua ; gh-4989
  replication/replica_rejoin.test.lua       ; gh-4985
  replication/wal_rw_stress.test.lua        ; gh-4977
  replication-py/conflict.test.py           ; gh-4980
  vinyl/errinj_ddl.test.lua                 ; gh-4993
  vinyl/misc.test.lua                       ; gh-4979
  vinyl/snapshot.test.lua                   ; gh-4984
  vinyl/write_iterator.test.lua             ; gh-4572
  xlog/panic_on_broken_lsn.test.lua         ; gh-4991

Part of #4953
avtikhon added a commit that referenced this issue May 18, 2020
Added skip condition on OSX for test:
  replication/box_set_replication_stress.test.lua

Marked flaky tests as fragile for parallel runs to avoid
flaky failures in regular testing:

  app/fiber.test.lua                        ; gh-4987
  app/fiber_channel.test.lua                ; gh-4961
  app/socket.test.lua                       ; gh-4978
  app-tap/popen.test.lua                    ; gh-4995
  box/alter_limits.test.lua                 ; gh-4926
  box/misc.test.lua                         ; gh-4982
  box/role.test.lua                         ; gh-4998
  box/rtree_rect.test.lua                   ; gh-4994
  box/sequence.test.lua                     ; gh-4996
  box/tuple.test.lua                        ; gh-4988
  engine/ddl.test.lua                       ; gh-4353
  replication/box_set_replication_stress    ; gh-4992
  replication/recover_missing_xlog.test.lua ; gh-4989
  replication/replica_rejoin.test.lua       ; gh-4985
  replication/wal_rw_stress.test.lua        ; gh-4977
  replication-py/conflict.test.py           ; gh-4980
  vinyl/errinj_ddl.test.lua                 ; gh-4993
  vinyl/misc.test.lua                       ; gh-4979
  vinyl/snapshot.test.lua                   ; gh-4984
  vinyl/write_iterator.test.lua             ; gh-4572
  xlog/panic_on_broken_lsn.test.lua         ; gh-4991

Part of #4953

(cherry picked from commit 72a2bae)
avtikhon added a commit that referenced this issue May 19, 2020
Added skip condition on OSX for test:
  replication/box_set_replication_stress.test.lua

Marked flaky tests as fragile for parallel runs to avoid
flaky failures in regular testing:

  app/fiber.test.lua                        ; gh-4987
  app/fiber_channel.test.lua                ; gh-4961
  app/socket.test.lua                       ; gh-4978
  box/alter_limits.test.lua                 ; gh-4926
  box/misc.test.lua                         ; gh-4982
  box/role.test.lua                         ; gh-4998
  box/rtree_rect.test.lua                   ; gh-4994
  box/sequence.test.lua                     ; gh-4996
  box/tuple.test.lua                        ; gh-4988
  engine/ddl.test.lua                       ; gh-4353
  replication/box_set_replication_stress    ; gh-4992
  replication/recover_missing_xlog.test.lua ; gh-4989
  replication/replica_rejoin.test.lua       ; gh-4985
  replication/wal_rw_stress.test.lua        ; gh-4977
  replication-py/conflict.test.py           ; gh-4980
  vinyl/errinj_ddl.test.lua                 ; gh-4993
  vinyl/misc.test.lua                       ; gh-4979
  vinyl/snapshot.test.lua                   ; gh-4984
  vinyl/write_iterator.test.lua             ; gh-4572
  xlog/panic_on_broken_lsn.test.lua         ; gh-4991

Part of #4953

(cherry picked from commit 72a2bae)
avtikhon added a commit that referenced this issue May 19, 2020
Marked flaky tests as fragile for parallel runs to avoid
flaky failures in regular testing:

  app/fiber.test.lua                        ; gh-4987
  app/fiber_channel.test.lua                ; gh-4961
  app/socket.test.lua                       ; gh-4978
  box/alter_limits.test.lua                 ; gh-4926
  box/misc.test.lua                         ; gh-4982
  box/role.test.lua                         ; gh-4998
  box/rtree_rect.test.lua                   ; gh-4994
  box/sequence.test.lua                     ; gh-4996
  box/tuple.test.lua                        ; gh-4988
  engine/ddl.test.lua                       ; gh-4353
  replication/box_set_replication_stress    ; gh-4992
  replication/recover_missing_xlog.test.lua ; gh-4989
  replication/replica_rejoin.test.lua       ; gh-4985
  replication/wal_rw_stress.test.lua        ; gh-4977
  replication-py/conflict.test.py           ; gh-4980
  vinyl/errinj_ddl.test.lua                 ; gh-4993
  vinyl/misc.test.lua                       ; gh-4979
  vinyl/snapshot.test.lua                   ; gh-4984
  vinyl/write_iterator.test.lua             ; gh-4572
  xlog/panic_on_broken_lsn.test.lua         ; gh-4991

Part of #4953
kyukhin pushed a commit that referenced this issue May 20, 2020
Marked flaky tests as fragile for parallel runs to avoid
flaky failures in regular testing:

  app/fiber.test.lua                        ; gh-4987
  app/fiber_channel.test.lua                ; gh-4961
  app/socket.test.lua                       ; gh-4978
  box/alter_limits.test.lua                 ; gh-4926
  box/misc.test.lua                         ; gh-4982
  box/role.test.lua                         ; gh-4998
  box/rtree_rect.test.lua                   ; gh-4994
  box/sequence.test.lua                     ; gh-4996
  box/tuple.test.lua                        ; gh-4988
  engine/ddl.test.lua                       ; gh-4353
  replication/box_set_replication_stress    ; gh-4992
  replication/recover_missing_xlog.test.lua ; gh-4989
  replication/replica_rejoin.test.lua       ; gh-4985
  replication/wal_rw_stress.test.lua        ; gh-4977
  replication-py/conflict.test.py           ; gh-4980
  vinyl/errinj_ddl.test.lua                 ; gh-4993
  vinyl/misc.test.lua                       ; gh-4979
  vinyl/snapshot.test.lua                   ; gh-4984
  vinyl/write_iterator.test.lua             ; gh-4572
  xlog/panic_on_broken_lsn.test.lua         ; gh-4991

Part of #4953

(cherry picked from commit 430c0e8)
kyukhin pushed a commit that referenced this issue May 20, 2020
Marked flaky tests as fragile for parallel runs to avoid
flaky failures in regular testing:

  app/fiber.test.lua                        ; gh-4987
  app/fiber_channel.test.lua                ; gh-4961
  app/socket.test.lua                       ; gh-4978
  box/alter_limits.test.lua                 ; gh-4926
  box/misc.test.lua                         ; gh-4982
  box/role.test.lua                         ; gh-4998
  box/rtree_rect.test.lua                   ; gh-4994
  box/sequence.test.lua                     ; gh-4996
  box/tuple.test.lua                        ; gh-4988
  engine/ddl.test.lua                       ; gh-4353
  replication/box_set_replication_stress    ; gh-4992
  replication/recover_missing_xlog.test.lua ; gh-4989
  replication/replica_rejoin.test.lua       ; gh-4985
  replication/wal_rw_stress.test.lua        ; gh-4977
  replication-py/conflict.test.py           ; gh-4980
  vinyl/errinj_ddl.test.lua                 ; gh-4993
  vinyl/misc.test.lua                       ; gh-4979
  vinyl/snapshot.test.lua                   ; gh-4984
  vinyl/write_iterator.test.lua             ; gh-4572
  xlog/panic_on_broken_lsn.test.lua         ; gh-4991

Part of #4953

(cherry picked from commit 430c0e8)
kyukhin pushed a commit that referenced this issue May 20, 2020
Added skip condition on OSX for test:
  replication/box_set_replication_stress.test.lua

Marked flaky tests as fragile for parallel runs to avoid
flaky failures in regular testing:

  app/fiber.test.lua                        ; gh-4987
  app/fiber_channel.test.lua                ; gh-4961
  app/socket.test.lua                       ; gh-4978
  box/alter_limits.test.lua                 ; gh-4926
  box/misc.test.lua                         ; gh-4982
  box/role.test.lua                         ; gh-4998
  box/rtree_rect.test.lua                   ; gh-4994
  box/sequence.test.lua                     ; gh-4996
  box/tuple.test.lua                        ; gh-4988
  engine/ddl.test.lua                       ; gh-4353
  replication/box_set_replication_stress    ; gh-4992
  replication/recover_missing_xlog.test.lua ; gh-4989
  replication/replica_rejoin.test.lua       ; gh-4985
  replication/wal_rw_stress.test.lua        ; gh-4977
  replication-py/conflict.test.py           ; gh-4980
  vinyl/errinj_ddl.test.lua                 ; gh-4993
  vinyl/misc.test.lua                       ; gh-4979
  vinyl/snapshot.test.lua                   ; gh-4984
  vinyl/write_iterator.test.lua             ; gh-4572
  xlog/panic_on_broken_lsn.test.lua         ; gh-4991

Part of #4953

(cherry picked from commit 430c0e8)
kyukhin pushed a commit that referenced this issue May 20, 2020
Marked flaky tests as fragile for parallel runs to avoid
flaky failures in regular testing:

  app/fiber.test.lua                        ; gh-4987
  app/fiber_channel.test.lua                ; gh-4961
  app/socket.test.lua                       ; gh-4978
  box/alter_limits.test.lua                 ; gh-4926
  box/misc.test.lua                         ; gh-4982
  box/role.test.lua                         ; gh-4998
  box/rtree_rect.test.lua                   ; gh-4994
  box/sequence.test.lua                     ; gh-4996
  box/tuple.test.lua                        ; gh-4988
  engine/ddl.test.lua                       ; gh-4353
  replication/box_set_replication_stress    ; gh-4992
  replication/recover_missing_xlog.test.lua ; gh-4989
  replication/replica_rejoin.test.lua       ; gh-4985
  replication/wal_rw_stress.test.lua        ; gh-4977
  replication-py/conflict.test.py           ; gh-4980
  vinyl/errinj_ddl.test.lua                 ; gh-4993
  vinyl/misc.test.lua                       ; gh-4979
  vinyl/snapshot.test.lua                   ; gh-4984
  vinyl/write_iterator.test.lua             ; gh-4572
  xlog/panic_on_broken_lsn.test.lua         ; gh-4991

Part of #4953
avtikhon added a commit that referenced this issue Jun 15, 2020
Found issue:

 [016] --- replication/wal_rw_stress.result	Fri Feb 21 11:53:21 2020
 [016] +++ replication/wal_rw_stress.reject	Fri May  8 08:23:56 2020
 [016] @@ -73,7 +73,42 @@
 [016]  ...
 [016]  box.info.replication[1].downstream.status ~= 'stopped' or box.info
 [016]  ---
 [016] -- true
 [016] +- version: 2.5.0-27-g32f59756a
 [016] +  id: 2
 [016] +  ro: false
 [016] +  uuid: 41cbebcc-9105-11ea-96ab-08002739cbd6
 [016] +  package: Tarantool
 [016] +  cluster:
 [016] +    uuid: 397c196f-9105-11ea-96ab-08002739cbd6
 [016] +  listen: unix/:/home/vagrant/tarantool/test/var/016_replication/replica.socket-iproto
 [016] +  replication:
 [016] +    1:
 [016] +      id: 1
 [016] +      uuid: 397a1886-9105-11ea-96ab-08002739cbd6
 [016] +      lsn: 10005
 [016] +      upstream:
 [016] +        status: follow
 [016] +        idle: 0.46353673400017
 [016] +        peer: unix/:/home/vagrant/tarantool/test/var/016_replication/master.socket-iproto
 [016] +        lag: -0.45732522010803
 [016] +      downstream:
 [016] +        status: stopped
 [016] +        message: writev(1), called on fd 24, aka unix/:/home/vagrant/tarantool/test/var/016_replicati
 [016] +        system_message: Broken pipe
 [016] +    2:
 [016] +      id: 2
 [016] +      uuid: 41cbebcc-9105-11ea-96ab-08002739cbd6
 [016] +      lsn: 0
 [016] +  signature: 10005
 [016] +  status: running
 [016] +  vinyl: []
 [016] +  uptime: 2
 [016] +  lsn: 0
 [016] +  sql: []
 [016] +  gc: []
 [016] +  pid: 41231
 [016] +  memory: []
 [016] +  vclock: {1: 10005}
 [016]  ...
 [016]  test_run:cmd("switch default")
 [016]  ---

To check the downstream status and its message, we need to wait until the
downstream appears. This prevents an attempt to index a nil value when
one of those functions is called before a record about a peer appears
in box.info.replication. It was observed on the test:
  replication/show_error_on_disconnect
after commit
  c6bea65 ('replication: recfg with 0
quorum returns immediately').

Closes #4977
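
For reference, a minimal sketch of the wait_cond() wrapper the commit above describes, written against the test-run Lua helpers these tests already use (the exact line that landed in the test appears in the diffs further below):

-- Poll the downstream status instead of reading it once, so a transient
-- 'stopped' state does not fail the test; box.info is dumped only on timeout.
test_run:wait_cond(function() return box.info.replication[1].downstream.status ~= 'stopped' end) or box.info
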
@avtikhon avtikhon added this to ON REVIEW in Quality Assurance Jun 15, 2020
@avtikhon avtikhon moved this from ON REVIEW to DOING in Quality Assurance Jun 19, 2020
avtikhon added a commit that referenced this issue Jun 19, 2020
Found issue (reproduced on VBox FreeBSD machine):

 [016] --- replication/wal_rw_stress.result	Fri Feb 21 11:53:21 2020
 [016] +++ replication/wal_rw_stress.reject	Fri May  8 08:23:56 2020
 [016] @@ -73,7 +73,42 @@
 [016]  ...
 [016]  box.info.replication[1].downstream.status ~= 'stopped' or box.info
 [016]  ---
 [016] -- true
 [016] +- version: 2.5.0-27-g32f59756a
 [016] +  id: 2
 [016] +  ro: false
 [016] +  uuid: 41cbebcc-9105-11ea-96ab-08002739cbd6
 [016] +  package: Tarantool
 [016] +  cluster:
 [016] +    uuid: 397c196f-9105-11ea-96ab-08002739cbd6
 [016] +  listen: unix/:/home/vagrant/tarantool/test/var/016_replication/replica.socket-iproto
 [016] +  replication:
 [016] +    1:
 [016] +      id: 1
 [016] +      uuid: 397a1886-9105-11ea-96ab-08002739cbd6
 [016] +      lsn: 10005
 [016] +      upstream:
 [016] +        status: follow
 [016] +        idle: 0.46353673400017
 [016] +        peer: unix/:/home/vagrant/tarantool/test/var/016_replication/master.socket-iproto
 [016] +        lag: -0.45732522010803
 [016] +      downstream:
 [016] +        status: stopped
 [016] +        message: writev(1), called on fd 24, aka unix/:/home/vagrant/tarantool/test/var/016_replicati
 [016] +        system_message: Broken pipe
 [016] +    2:
 [016] +      id: 2
 [016] +      uuid: 41cbebcc-9105-11ea-96ab-08002739cbd6
 [016] +      lsn: 0
 [016] +  signature: 10005
 [016] +  status: running
 [016] +  vinyl: []
 [016] +  uptime: 2
 [016] +  lsn: 0
 [016] +  sql: []
 [016] +  gc: []
 [016] +  pid: 41231
 [016] +  memory: []
 [016] +  vclock: {1: 10005}
 [016]  ...
 [016]  test_run:cmd("switch default")
 [016]  ---

To check the downstream status and its message, we need to wait until the
downstream appears. This prevents an attempt to index a nil value when
one of those functions is called before a record about a peer appears
in box.info.replication. It was observed on the test:
  replication/show_error_on_disconnect
after commit
  c6bea65 ('replication: recfg with 0
quorum returns immediately').

Checked that the test still exercises the error for which it was created in
b9db91e ('xlog: fix fallocate vs
read race') and successfully reproduced the expected error "tx checksum
mismatch":

[153] --- replication/wal_rw_stress.result      Fri Jun 19 15:01:49 2020
[153] +++ replication/wal_rw_stress.reject      Fri Jun 19 15:04:02 2020
[153] @@ -73,7 +73,43 @@
[153]  ...
[153]  test_run:wait_cond(function() return box.info.replication[1].downstream.status ~= 'stopped' end) or box.info
...
[153] +      downstream:
[153] +        status: stopped
[153] +        message: tx checksum mismatch

Closes #4977
@avtikhon avtikhon moved this from DOING to ON REVIEW in Quality Assurance Jun 22, 2020
avtikhon added a commit that referenced this issue Jun 23, 2020
Found issue (reproduced on VBox FreeBSD machine):

 [016] --- replication/wal_rw_stress.result	Fri Feb 21 11:53:21 2020
 [016] +++ replication/wal_rw_stress.reject	Fri May  8 08:23:56 2020
 [016] @@ -73,7 +73,42 @@
 [016]  ...
 [016]  box.info.replication[1].downstream.status ~= 'stopped' or box.info
 [016]  ---
 [016] -- true
 [016] +- version: 2.5.0-27-g32f59756a
 [016] +  id: 2
 [016] +  ro: false
 [016] +  uuid: 41cbebcc-9105-11ea-96ab-08002739cbd6
 [016] +  package: Tarantool
 [016] +  cluster:
 [016] +    uuid: 397c196f-9105-11ea-96ab-08002739cbd6
 [016] +  listen: unix/:/home/vagrant/tarantool/test/var/016_replication/replica.socket-iproto
 [016] +  replication:
 [016] +    1:
 [016] +      id: 1
 [016] +      uuid: 397a1886-9105-11ea-96ab-08002739cbd6
 [016] +      lsn: 10005
 [016] +      upstream:
 [016] +        status: follow
 [016] +        idle: 0.46353673400017
 [016] +        peer: unix/:/home/vagrant/tarantool/test/var/016_replication/master.socket-iproto
 [016] +        lag: -0.45732522010803
 [016] +      downstream:
 [016] +        status: stopped
 [016] +        message: writev(1), called on fd 24, aka unix/:/home/vagrant/tarantool/test/var/016_replicati
 [016] +        system_message: Broken pipe
 [016] +    2:
 [016] +      id: 2
 [016] +      uuid: 41cbebcc-9105-11ea-96ab-08002739cbd6
 [016] +      lsn: 0
 [016] +  signature: 10005
 [016] +  status: running
 [016] +  vinyl: []
 [016] +  uptime: 2
 [016] +  lsn: 0
 [016] +  sql: []
 [016] +  gc: []
 [016] +  pid: 41231
 [016] +  memory: []
 [016] +  vclock: {1: 10005}
 [016]  ...
 [016]  test_run:cmd("switch default")
 [016]  ---

To check the downstream status and its message, we need to wait until the
downstream appears. This prevents an attempt to index a nil value when
one of those functions is called before a record about a peer appears
in box.info.replication. It was observed on the test:
  replication/show_error_on_disconnect
after commit
  c6bea65 ('replication: recfg with 0
quorum returns immediately').

Checked that the test still exercises the error for which it was created in
b9db91e ('xlog: fix fallocate vs
read race') and successfully reproduced the expected error "tx checksum
mismatch":

[153] --- replication/wal_rw_stress.result      Fri Jun 19 15:01:49 2020
[153] +++ replication/wal_rw_stress.reject      Fri Jun 19 15:04:02 2020
[153] @@ -73,7 +73,43 @@
[153]  ...
[153]  test_run:wait_cond(function() return box.info.replication[1].downstream.status ~= 'stopped' end) or box.info
...
[153] +      downstream:
[153] +        status: stopped
[153] +        message: tx checksum mismatch

Note that wait_cond() allows overcoming transient network
connectivity errors, while 'tx checksum mismatch' is a persistent
error and will still be caught.

Closes #4977
kyukhin pushed a commit that referenced this issue Jun 26, 2020
Found issue (reproduced on VBox FreeBSD machine):

 [016] --- replication/wal_rw_stress.result	Fri Feb 21 11:53:21 2020
 [016] +++ replication/wal_rw_stress.reject	Fri May  8 08:23:56 2020
 [016] @@ -73,7 +73,42 @@
 [016]  ...
 [016]  box.info.replication[1].downstream.status ~= 'stopped' or box.info
 [016]  ---
 [016] -- true
 [016] +- version: 2.5.0-27-g32f59756a
 [016] +  id: 2
 [016] +  ro: false
 [016] +  uuid: 41cbebcc-9105-11ea-96ab-08002739cbd6
 [016] +  package: Tarantool
 [016] +  cluster:
 [016] +    uuid: 397c196f-9105-11ea-96ab-08002739cbd6
 [016] +  listen: unix/:/home/vagrant/tarantool/test/var/016_replication/replica.socket-iproto
 [016] +  replication:
 [016] +    1:
 [016] +      id: 1
 [016] +      uuid: 397a1886-9105-11ea-96ab-08002739cbd6
 [016] +      lsn: 10005
 [016] +      upstream:
 [016] +        status: follow
 [016] +        idle: 0.46353673400017
 [016] +        peer: unix/:/home/vagrant/tarantool/test/var/016_replication/master.socket-iproto
 [016] +        lag: -0.45732522010803
 [016] +      downstream:
 [016] +        status: stopped
 [016] +        message: writev(1), called on fd 24, aka unix/:/home/vagrant/tarantool/test/var/016_replicati
 [016] +        system_message: Broken pipe
 [016] +    2:
 [016] +      id: 2
 [016] +      uuid: 41cbebcc-9105-11ea-96ab-08002739cbd6
 [016] +      lsn: 0
 [016] +  signature: 10005
 [016] +  status: running
 [016] +  vinyl: []
 [016] +  uptime: 2
 [016] +  lsn: 0
 [016] +  sql: []
 [016] +  gc: []
 [016] +  pid: 41231
 [016] +  memory: []
 [016] +  vclock: {1: 10005}
 [016]  ...
 [016]  test_run:cmd("switch default")
 [016]  ---

To check the downstream status and its message, we need to wait until the
downstream appears. This prevents an attempt to index a nil value when
one of those functions is called before a record about a peer appears
in box.info.replication. It was observed on the test:
  replication/show_error_on_disconnect
after commit
  c6bea65 ('replication: recfg with 0
quorum returns immediately').

Checked that the test still exercises the error for which it was created in
b9db91e ('xlog: fix fallocate vs
read race') and successfully reproduced the expected error "tx checksum
mismatch":

[153] --- replication/wal_rw_stress.result      Fri Jun 19 15:01:49 2020
[153] +++ replication/wal_rw_stress.reject      Fri Jun 19 15:04:02 2020
[153] @@ -73,7 +73,43 @@
[153]  ...
[153]  test_run:wait_cond(function() return box.info.replication[1].downstream.status ~= 'stopped' end) or box.info
...
[153] +      downstream:
[153] +        status: stopped
[153] +        message: tx checksum mismatch

Note that wait_cond() allows overcoming transient network
connectivity errors, while 'tx checksum mismatch' is a persistent
error and will still be caught.

Closes #4977

(cherry picked from commit 06eda0f)
kyukhin pushed a commit that referenced this issue Jun 26, 2020
Found issue (reproduced on VBox FreeBSD machine):

 [016] --- replication/wal_rw_stress.result	Fri Feb 21 11:53:21 2020
 [016] +++ replication/wal_rw_stress.reject	Fri May  8 08:23:56 2020
 [016] @@ -73,7 +73,42 @@
 [016]  ...
 [016]  box.info.replication[1].downstream.status ~= 'stopped' or box.info
 [016]  ---
 [016] -- true
 [016] +- version: 2.5.0-27-g32f59756a
 [016] +  id: 2
 [016] +  ro: false
 [016] +  uuid: 41cbebcc-9105-11ea-96ab-08002739cbd6
 [016] +  package: Tarantool
 [016] +  cluster:
 [016] +    uuid: 397c196f-9105-11ea-96ab-08002739cbd6
 [016] +  listen: unix/:/home/vagrant/tarantool/test/var/016_replication/replica.socket-iproto
 [016] +  replication:
 [016] +    1:
 [016] +      id: 1
 [016] +      uuid: 397a1886-9105-11ea-96ab-08002739cbd6
 [016] +      lsn: 10005
 [016] +      upstream:
 [016] +        status: follow
 [016] +        idle: 0.46353673400017
 [016] +        peer: unix/:/home/vagrant/tarantool/test/var/016_replication/master.socket-iproto
 [016] +        lag: -0.45732522010803
 [016] +      downstream:
 [016] +        status: stopped
 [016] +        message: writev(1), called on fd 24, aka unix/:/home/vagrant/tarantool/test/var/016_replicati
 [016] +        system_message: Broken pipe
 [016] +    2:
 [016] +      id: 2
 [016] +      uuid: 41cbebcc-9105-11ea-96ab-08002739cbd6
 [016] +      lsn: 0
 [016] +  signature: 10005
 [016] +  status: running
 [016] +  vinyl: []
 [016] +  uptime: 2
 [016] +  lsn: 0
 [016] +  sql: []
 [016] +  gc: []
 [016] +  pid: 41231
 [016] +  memory: []
 [016] +  vclock: {1: 10005}
 [016]  ...
 [016]  test_run:cmd("switch default")
 [016]  ---

To check the downstream status and its message, we need to wait until the
downstream appears. This prevents an attempt to index a nil value when
one of those functions is called before a record about a peer appears
in box.info.replication. It was observed on the test:
  replication/show_error_on_disconnect
after commit
  c6bea65 ('replication: recfg with 0
quorum returns immediately').

Checked that the test still exercises the error for which it was created in
b9db91e ('xlog: fix fallocate vs
read race') and successfully reproduced the expected error "tx checksum
mismatch":

[153] --- replication/wal_rw_stress.result      Fri Jun 19 15:01:49 2020
[153] +++ replication/wal_rw_stress.reject      Fri Jun 19 15:04:02 2020
[153] @@ -73,7 +73,43 @@
[153]  ...
[153]  test_run:wait_cond(function() return box.info.replication[1].downstream.status ~= 'stopped' end) or box.info
...
[153] +      downstream:
[153] +        status: stopped
[153] +        message: tx checksum mismatch

Note that wait_cond() allows overcoming transient network
connectivity errors, while 'tx checksum mismatch' is a persistent
error and will still be caught.

Closes #4977

(cherry picked from commit 06eda0f)
kyukhin pushed a commit that referenced this issue Jun 26, 2020
Found issue (reproduced on VBox FreeBSD machine):

 [016] --- replication/wal_rw_stress.result	Fri Feb 21 11:53:21 2020
 [016] +++ replication/wal_rw_stress.reject	Fri May  8 08:23:56 2020
 [016] @@ -73,7 +73,42 @@
 [016]  ...
 [016]  box.info.replication[1].downstream.status ~= 'stopped' or box.info
 [016]  ---
 [016] -- true
 [016] +- version: 2.5.0-27-g32f59756a
 [016] +  id: 2
 [016] +  ro: false
 [016] +  uuid: 41cbebcc-9105-11ea-96ab-08002739cbd6
 [016] +  package: Tarantool
 [016] +  cluster:
 [016] +    uuid: 397c196f-9105-11ea-96ab-08002739cbd6
 [016] +  listen: unix/:/home/vagrant/tarantool/test/var/016_replication/replica.socket-iproto
 [016] +  replication:
 [016] +    1:
 [016] +      id: 1
 [016] +      uuid: 397a1886-9105-11ea-96ab-08002739cbd6
 [016] +      lsn: 10005
 [016] +      upstream:
 [016] +        status: follow
 [016] +        idle: 0.46353673400017
 [016] +        peer: unix/:/home/vagrant/tarantool/test/var/016_replication/master.socket-iproto
 [016] +        lag: -0.45732522010803
 [016] +      downstream:
 [016] +        status: stopped
 [016] +        message: writev(1), called on fd 24, aka unix/:/home/vagrant/tarantool/test/var/016_replicati
 [016] +        system_message: Broken pipe
 [016] +    2:
 [016] +      id: 2
 [016] +      uuid: 41cbebcc-9105-11ea-96ab-08002739cbd6
 [016] +      lsn: 0
 [016] +  signature: 10005
 [016] +  status: running
 [016] +  vinyl: []
 [016] +  uptime: 2
 [016] +  lsn: 0
 [016] +  sql: []
 [016] +  gc: []
 [016] +  pid: 41231
 [016] +  memory: []
 [016] +  vclock: {1: 10005}
 [016]  ...
 [016]  test_run:cmd("switch default")
 [016]  ---

To check the downstream status and its message, we need to wait until the
downstream appears. This prevents an attempt to index a nil value when
one of those functions is called before a record about a peer appears
in box.info.replication. It was observed on the test:
  replication/show_error_on_disconnect
after commit
  c6bea65 ('replication: recfg with 0
quorum returns immediately').

Checked that the test still exercises the error for which it was created in
b9db91e ('xlog: fix fallocate vs
read race') and successfully reproduced the expected error "tx checksum
mismatch":

[153] --- replication/wal_rw_stress.result      Fri Jun 19 15:01:49 2020
[153] +++ replication/wal_rw_stress.reject      Fri Jun 19 15:04:02 2020
[153] @@ -73,7 +73,43 @@
[153]  ...
[153]  test_run:wait_cond(function() return box.info.replication[1].downstream.status ~= 'stopped' end) or box.info
...
[153] +      downstream:
[153] +        status: stopped
[153] +        message: tx checksum mismatch

Note that wait_cond() allows overcoming transient network
connectivity errors, while 'tx checksum mismatch' is a persistent
error and will still be caught.

Closes #4977

(cherry picked from commit 06eda0f)
@avtikhon avtikhon moved this from ON REVIEW to DONE in Quality Assurance Jun 26, 2020
@avtikhon avtikhon removed this from DONE in Quality Assurance Jun 26, 2020
@avtikhon
Contributor Author

avtikhon commented Jul 14, 2020

It seems that the downstream status should only be checked once the downstream structure is available, to avoid this issue:
https://gitlab.com/tarantool/tarantool/-/jobs/636795670
https://gitlab.com/tarantool/tarantool/-/jobs/637053046
https://gitlab.com/tarantool/tarantool/-/jobs/636795514

[017] --- replication/wal_rw_stress.result	Thu Jul  9 17:04:16 2020
[017] +++ replication/wal_rw_stress.reject	Fri May  8 08:25:15 2020
[017] @@ -75,7 +75,8 @@
[017]      return box.info.replication[1].downstream.status ~= 'stopped' \
[017]  end) or box.info
[017]  ---
[017] -- true
[017] +- error: '[string "return test_run:wait_cond(function()         ..."]:1: attempt to
[017] +    index field ''downstream'' (a nil value)'
[017]  ...
[017]  test_run:cmd("switch default")
[017]  ---
[017] 

@avtikhon avtikhon reopened this Jul 14, 2020
avtikhon added a commit that referenced this issue Jul 14, 2020
Found that on heavily loaded hosts the test tries to check the replication
downstream status while the downstream structure is not yet ready, and it fails
with the error:

[017] --- replication/wal_rw_stress.result	Thu Jul  9 17:04:16 2020
[017] +++ replication/wal_rw_stress.reject	Fri May  8 08:25:15 2020
[017] @@ -75,7 +75,8 @@
[017]      return box.info.replication[1].downstream.status ~= 'stopped' \
[017]  end) or box.info
[017]  ---
[017] -- true
[017] +- error: '[string "return test_run:wait_cond(function()         ..."]:1: attempt to
[017] +    index field ''downstream'' (a nil value)'
[017]  ...
[017]  test_run:cmd("switch default")
[017]  ---
[017]

So the wait condition should first check that the downstream
structure is available.

Follows up #4977
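
A hedged sketch of the follow-up condition (the guard is an assumption based on the commit description above; names follow box.info.replication, and the formatting in the actual test may differ):

-- Guard against a not-yet-created downstream entry before reading its status.
test_run:wait_cond(function()
    local peer = box.info.replication[1]
    return peer ~= nil and peer.downstream ~= nil and
           peer.downstream.status ~= 'stopped'
end) or box.info
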
avtikhon added a commit that referenced this issue Jul 14, 2020
Found that on heavily loaded hosts the test tries to check the replication
downstream status while the downstream structure is not yet ready, and it fails
with the error:

[017] --- replication/wal_rw_stress.result	Thu Jul  9 17:04:16 2020
[017] +++ replication/wal_rw_stress.reject	Fri May  8 08:25:15 2020
[017] @@ -75,7 +75,8 @@
[017]      return box.info.replication[1].downstream.status ~= 'stopped' \
[017]  end) or box.info
[017]  ---
[017] -- true
[017] +- error: '[string "return test_run:wait_cond(function()         ..."]:1: attempt to
[017] +    index field ''downstream'' (a nil value)'
[017]  ...
[017]  test_run:cmd("switch default")
[017]  ---
[017]

So the wait condition should first check that the downstream
structure is available.

Follows up #4977
@avtikhon avtikhon added this to ON REVIEW in Quality Assurance Jul 14, 2020
kyukhin pushed a commit that referenced this issue Jul 14, 2020
Found that on heavily loaded hosts the test tries to check the replication
downstream status while the downstream structure is not yet ready, and it fails
with the error:

[017] --- replication/wal_rw_stress.result	Thu Jul  9 17:04:16 2020
[017] +++ replication/wal_rw_stress.reject	Fri May  8 08:25:15 2020
[017] @@ -75,7 +75,8 @@
[017]      return box.info.replication[1].downstream.status ~= 'stopped' \
[017]  end) or box.info
[017]  ---
[017] -- true
[017] +- error: '[string "return test_run:wait_cond(function()         ..."]:1: attempt to
[017] +    index field ''downstream'' (a nil value)'
[017]  ...
[017]  test_run:cmd("switch default")
[017]  ---
[017]

So the wait condition should first check that the downstream
structure is available.

Follows up #4977

(cherry picked from commit d3e2a2a)
kyukhin pushed a commit that referenced this issue Jul 14, 2020
Found that on heavily loaded hosts the test tries to check the replication
downstream status while the downstream structure is not yet ready, and it fails
with the error:

[017] --- replication/wal_rw_stress.result	Thu Jul  9 17:04:16 2020
[017] +++ replication/wal_rw_stress.reject	Fri May  8 08:25:15 2020
[017] @@ -75,7 +75,8 @@
[017]      return box.info.replication[1].downstream.status ~= 'stopped' \
[017]  end) or box.info
[017]  ---
[017] -- true
[017] +- error: '[string "return test_run:wait_cond(function()         ..."]:1: attempt to
[017] +    index field ''downstream'' (a nil value)'
[017]  ...
[017]  test_run:cmd("switch default")
[017]  ---
[017]

So the wait condition should first check that the downstream
structure is available.

Follows up #4977
kyukhin pushed a commit that referenced this issue Jul 14, 2020
Found that on heavily loaded hosts the test tries to check the replication
downstream status while the downstream structure is not yet ready, and it fails
with the error:

[017] --- replication/wal_rw_stress.result	Thu Jul  9 17:04:16 2020
[017] +++ replication/wal_rw_stress.reject	Fri May  8 08:25:15 2020
[017] @@ -75,7 +75,8 @@
[017]      return box.info.replication[1].downstream.status ~= 'stopped' \
[017]  end) or box.info
[017]  ---
[017] -- true
[017] +- error: '[string "return test_run:wait_cond(function()         ..."]:1: attempt to
[017] +    index field ''downstream'' (a nil value)'
[017]  ...
[017]  test_run:cmd("switch default")
[017]  ---
[017]

So the wait condition should first check that the downstream
structure is available.

Follows up #4977

(cherry picked from commit d3e2a2a)
kyukhin pushed a commit that referenced this issue Jul 14, 2020
Found that on heavily loaded hosts the test tries to check the replication
downstream status while the downstream structure is not yet ready, and it fails
with the error:

[017] --- replication/wal_rw_stress.result	Thu Jul  9 17:04:16 2020
[017] +++ replication/wal_rw_stress.reject	Fri May  8 08:25:15 2020
[017] @@ -75,7 +75,8 @@
[017]      return box.info.replication[1].downstream.status ~= 'stopped' \
[017]  end) or box.info
[017]  ---
[017] -- true
[017] +- error: '[string "return test_run:wait_cond(function()         ..."]:1: attempt to
[017] +    index field ''downstream'' (a nil value)'
[017]  ...
[017]  test_run:cmd("switch default")
[017]  ---
[017]

So the wait condition should first check that the downstream
structure is available.

Follows up #4977

(cherry picked from commit d3e2a2a)
@avtikhon avtikhon moved this from ON REVIEW to DONE in Quality Assurance Jul 14, 2020
@avtikhon avtikhon removed this from DONE in Quality Assurance Jul 28, 2020