test: replication/wal_off test flaky fails on error message check #4355

Closed
avtikhon opened this issue Jul 16, 2019 · 1 comment
@avtikhon
Contributor

Tarantool version:
master

OS version:
All

Bug description:

[003] --- replication/wal_off.result	Thu Apr 25 13:10:18 2019
[003] +++ replication/wal_off.reject	Tue Jul 16 17:10:31 2019
[003] @@ -95,6 +95,8 @@
[003]  ...
[003]  while string.find(box.info.replication[wal_off_id].upstream.message, check) == nil do fiber.sleep(0.01) end
[003]  ---
[003] +- error: '[string "while string.find(box.info.replication[wal_of..."]:1: bad argument
[003] +    #1 to ''find'' (string expected, got nil)'
[003]  ...
[003]  box.cfg { replication = "" }
[003]  ---

Steps to reproduce:

l=0 ; while ./test-run.py -j20 `for r in {1..100} ; do echo replication/wal_off.test.lua ; done` 2>/dev/null ; do l=$(($l+1)) ; echo ======== $l ============= ; done

@avtikhon avtikhon self-assigned this Jul 16, 2019
@avtikhon avtikhon added flaky test qa Issues related to tests or testing subsystem labels Jul 16, 2019
@kyukhin kyukhin added this to the 2.3.0 milestone Jul 18, 2019
@kostja kostja modified the milestones: 2.3.1, QA Aug 6, 2019
@avtikhon
Contributor Author

Reproduced on 2.4.0-16-gcdf502c66

avtikhon added a commit that referenced this issue Jun 9, 2020
Found issue:

[003] --- replication/wal_off.result	Thu Apr 25 13:10:18 2019
[003] +++ replication/wal_off.reject	Tue Jul 16 17:10:31 2019
[003] @@ -95,6 +95,8 @@
[003]  ...
[003]  while string.find(box.info.replication[wal_off_id].upstream.message, check) == nil do fiber.sleep(0.01) end
[003]  ---
[003] +- error: '[string "while string.find(box.info.replication[wal_of..."]:1: bad argument
[003] +    #1 to ''find'' (string expected, got nil)'
[003]  ...
[003]  box.cfg { replication = "" }
[003]  ---

To check the upstream status and its message, the test needs to wait
until an upstream appears. This prevents an attempt to index a nil value
when one of those functions is called before a record about the peer
appears in box.info.replication. It was observed on test:
  replication/show_error_on_disconnect
after commit
  c6bea65 ('replication: recfg with 0 quorum returns immediately').

Closes #4355
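
The idea described above can be sketched as a nil-safe wait loop. This is only an illustration, not the code from the referenced commit: the helper names (upstream_message_matches, wait_for), the retry limit, the fake sleep function and the message text are all assumptions made for the example. A plain table stands in for box.info.replication so the snippet runs in stock Lua; in a real Tarantool test the sleep callback would presumably be something like function() fiber.sleep(0.01) end.

```lua
-- Hypothetical helper: true only when the peer record, its upstream and
-- the upstream message all exist and the message matches `pattern`.
local function upstream_message_matches(replication, id, pattern)
    local peer = replication[id]
    local upstream = peer and peer.upstream
    local message = upstream and upstream.message
    -- Guard against nil before calling string.find, which is what
    -- raised "string expected, got nil" in the failing test.
    return type(message) == 'string' and string.find(message, pattern) ~= nil
end

-- Hypothetical helper: poll `cond` at most `max_tries` times, calling
-- `sleep` between attempts. Returns true on success, false on timeout.
local function wait_for(cond, max_tries, sleep)
    for _ = 1, max_tries do
        if cond() then
            return true
        end
        sleep()
    end
    return false
end

-- Stand-in for box.info.replication (assumption for this demo only).
local replication = {}
local ticks = 0

-- Fake sleep: the peer record shows up first, its message one tick later.
local function fake_sleep()
    ticks = ticks + 1
    if ticks == 2 then
        replication[2] = { upstream = {} }
    elseif ticks == 3 then
        replication[2].upstream.message = 'example error text'
    end
end

local ok = wait_for(function()
    return upstream_message_matches(replication, 2, 'example error')
end, 10, fake_sleep)

print(ok, ticks)
```

The bounded retry count is a design choice for the sketch: an unbounded loop, like the original while statement, would hang the test if the expected message never appears.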
@avtikhon avtikhon added this to ON REVIEW in Quality Assurance Jun 9, 2020
avtikhon added a commit that referenced this issue Jun 15, 2020
kyukhin pushed a commit that referenced this issue Jun 26, 2020
(cherry picked from commit 3e90447)
kyukhin pushed a commit that referenced this issue Jun 26, 2020
(cherry picked from commit 3e90447)
kyukhin pushed a commit that referenced this issue Jun 26, 2020
(cherry picked from commit 3e90447)
@avtikhon avtikhon moved this from ON REVIEW to DONE in Quality Assurance Jun 26, 2020
@avtikhon avtikhon removed this from DONE in Quality Assurance Jun 26, 2020