[1pt] test: flaky replication/gh-3704-misc-replica-checks-cluster-id.test.lua #5293

Closed
avtikhon opened this issue Sep 11, 2020 · 0 comments

Tarantool version:
2.6.0-61-g5a9b79fa0

OS version:
All

Bug description:
Issue:

[037] --- replication/gh-3704-misc-replica-checks-cluster-id.result	Thu Sep 10 18:05:22 2020
[037] +++ replication/gh-3704-misc-replica-checks-cluster-id.reject	Fri Sep 11 11:09:38 2020
[037] @@ -25,7 +25,7 @@
[037]  ...
[037]  box.info.replication[2].downstream.status
[037]  ---
[037] -- follow
[037] +- stopped
[037]  ...
[037]  -- change master's cluster uuid and check that replica doesn't connect.
[037]  test_run:cmd("stop server replica")

Steps to reproduce:
Reproduced on the slow Mac host tntmac02 with the following command:

l=0 ; while ./test-run.py -j50 `for r in {1..100} ; do echo replication/gh-3704-misc-replica-checks-cluster-id ; done 2>/dev/null` ; do l=$(($l+1)) ; echo ======== $l ============= ; done

Optional (but very desirable):

  • coredump
  • backtrace
  • netstat
@avtikhon avtikhon added the qa (Issues related to tests or testing subsystem) and flaky test labels Sep 11, 2020
@avtikhon avtikhon self-assigned this Sep 11, 2020
avtikhon added a commit that referenced this issue Sep 11, 2020
On heavily loaded hosts the following issue was found:

  [037] --- replication/gh-3704-misc-replica-checks-cluster-id.result	Thu Sep 10 18:05:22 2020
  [037] +++ replication/gh-3704-misc-replica-checks-cluster-id.reject	Fri Sep 11 11:09:38 2020
  [037] @@ -25,7 +25,7 @@
  [037]  ...
  [037]  box.info.replication[2].downstream.status
  [037]  ---
  [037] -- follow
  [037] +- stopped
  [037]  ...
  [037]  -- change master's cluster uuid and check that replica doesn't connect.
  [037]  test_run:cmd("stop server replica")

It happened because the replication downstream status was checked too
early, while the downstream was still in the 'stopped' state. To give
the downstream time to reach the required 'follow' state, the test
needs to wait for it using the test_run:wait_downstream() routine.

Closes #5293
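
The idea behind the fix is to poll until the status reaches the expected value instead of reading it once. A minimal Python sketch of that idea follows. It is not the test_run Lua API and not the code from the commit: wait_for_status and make_fake_downstream are hypothetical names, and the fake downstream only imitates a replica that reports 'stopped' for its first few polls before switching to 'follow'.

```python
import time


def wait_for_status(get_status, expected, timeout=60.0, delay=0.1):
    """Poll get_status() until it returns `expected` or `timeout` seconds pass.

    Returns True if the expected status was observed, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        if get_status() == expected:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(delay)


def make_fake_downstream(stopped_polls):
    """Hypothetical stand-in for a replication downstream.

    The returned callable reports 'stopped' for the first `stopped_polls`
    calls and 'follow' afterwards, imitating a slow host.
    """
    state = {"polls": 0}

    def get_status():
        state["polls"] += 1
        return "stopped" if state["polls"] <= stopped_polls else "follow"

    return get_status


if __name__ == "__main__":
    # One-shot read: on a slow host this may still observe 'stopped'.
    print(make_fake_downstream(3)())
    # Polling read: keeps checking until 'follow' appears or the timeout expires.
    print(wait_for_status(make_fake_downstream(3), "follow", timeout=5.0, delay=0.01))
```

A one-shot read makes the test result depend on host speed, while a bounded poll only fails if the state is never reached within the timeout.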
@avtikhon avtikhon added this to ON REVIEW in Quality Assurance Sep 11, 2020
@avtikhon avtikhon changed the title from "test: flaky replication/gh-3704-misc-replica-checks-cluster-id.test.lua" to "[1pt] test: flaky replication/gh-3704-misc-replica-checks-cluster-id.test.lua" Sep 14, 2020
kyukhin pushed a commit that referenced this issue Sep 15, 2020
(cherry picked from commit db3dd8d)
kyukhin pushed a commit that referenced this issue Sep 15, 2020
(cherry picked from commit db3dd8d)
@avtikhon avtikhon moved this from ON REVIEW to DONE in Quality Assurance Sep 15, 2020
@avtikhon avtikhon removed this from DONE in Quality Assurance Sep 25, 2020