test: flaky replication/qsync_with_anon test at 120 line #5196

Closed
avtikhon opened this issue Jul 22, 2020 · 0 comments

@avtikhon

Tarantool version:
Tarantool 2.6.0-7-g5a856023e
Target: Linux-x86_64-RelWithDebInfo
Build options: cmake . -DCMAKE_INSTALL_PREFIX=/usr/local -DENABLE_BACKTRACE=ON
Compiler: /usr/bin/cc /usr/bin/c++
C_FLAGS: -static-libstdc++ -fexceptions -funwind-tables -fno-omit-frame-pointer -fno-stack-protector -fno-common -fopenmp -msse2 -std=c11 -Wall -Wextra -Wno-strict-aliasing -Wno-char-subscripts -Wno-format-truncation -Wno-gnu-alignof-expression -fno-gnu89-inline -Wno-cast-function-type -Werror
CXX_FLAGS: -static-libstdc++ -fexceptions -funwind-tables -fno-omit-frame-pointer -fno-stack-protector -fno-common -fopenmp -msse2 -std=c++11 -Wall -Wextra -Wno-strict-aliasing -Wno-char-subscripts -Wno-format-truncation -Wno-invalid-offsetof -Wno-gnu-alignof-expression -Wno-cast-function-type -Werror

OS version:
Linux (Debian 9)

Bug description:
After issue #5165, the next failure was found at:
https://gitlab.com/tarantool/tarantool/-/jobs/647895835#L4527
Issue:

[078] --- replication/qsync_with_anon.result	Mon Jul 20 20:42:44 2020
[078] +++ replication/qsync_with_anon.reject	Tue Jul 21 03:31:42 2020
[078] @@ -120,7 +120,7 @@
[078]   | ...
[078]  box.space.sync:select{} -- none
[078]   | ---
[078] - | - []
[078] + | - - [1]
[078]   | ...
[078]  test_run:switch('default')
[078]   | ---
[078] 
[078] Last 15 lines of Tarantool Log file [Instance "master"][/builds/kGZwxbcs/0/tarantool/tarantool/test/var/078_replication/master.log]:
[078] 2020-07-21 03:31:42.383 [30082] main/120/main I> sending current read-view to replica at fd 30, aka unix/:(socket), peer of unix/:(socket)
[078] 2020-07-21 03:31:42.397 [30082] main/120/main I> read-view sent.
[078] 2020-07-21 03:31:42.414 [30082] main/120/main I> subscribed replica 3815c752-d193-47e8-9bc9-433a8721e482 at fd 30, aka unix/:(socket), peer of unix/:(socket)
[078] 2020-07-21 03:31:42.414 [30082] main/120/main I> remote vclock {1: 46} local vclock {0: 1, 1: 46}
[078] 2020-07-21 03:31:42.419 [30082] relay/unix/:(socket)/101/main I> recover from `/builds/kGZwxbcs/0/tarantool/tarantool/test/var/078_replication/master/00000000000000000000.xlog'
[078] 2020-07-21 03:31:42.463 [30082] main/184/console/unix/: I> set 'replication_synchro_timeout' configuration option to 1000
[078] 2020-07-21 03:31:42.463 [30082] main/184/console/unix/: I> set 'replication_synchro_quorum' configuration option to 2
[078] 2020-07-21 03:31:42.513 [30082] main/188/console/unix/: I> set 'replication_synchro_timeout' configuration option to 0.1
[078] 2020-07-21 03:31:42.513 [30082] main/188/console/unix/: I> set 'replication_synchro_quorum' configuration option to 3
[078] 2020-07-21 03:31:42.645 [30082] main/200/console/unix/: I> set 'replication_synchro_timeout' configuration option to 1000
[078] 2020-07-21 03:31:42.645 [30082] main/200/console/unix/: I> set 'replication_synchro_quorum' configuration option to 2
[078] 2020-07-21 03:31:42.706 [30082] relay/unix/:(socket)/101/main coio.cc:379 !> SystemError unexpected EOF when reading from socket, called on fd 30, aka unix/:(socket), peer of unix/:(socket): Broken pipe
[078] 2020-07-21 03:31:42.706 [30082] relay/unix/:(socket)/101/main C> exiting the relay loop
[078] 2020-07-21 03:31:42.747 [30082] main/208/console/unix/: I> set 'replication_synchro_timeout' configuration option to 5
[078] 2020-07-21 03:31:42.747 [30082] main/208/console/unix/: I> set 'replication_synchro_quorum' configuration option to 1

avtikhon added the qa, flaky test, qsync, and replication labels Jul 22, 2020
avtikhon self-assigned this Jul 22, 2020
Gerold103 self-assigned this Jul 22, 2020
Gerold103 added a commit that referenced this issue Jul 22, 2020
One of the test cases had 2 problems.

- The same as in the previous commit: it started a sync
  transaction on master and switched to replica assuming the
  replica sees everything up to this sync transaction, but the
  replica could still see data from the previous test case;

- The test case tried to write a sync transaction on master, got
  a timeout, and switched to replica to ensure the data is
  removed there too. But since dirty reads are possible, the data
  could already be delivered to replica while ROLLBACK was not
  delivered yet. In that window the rolled back data was still
  visible on the replica.

The first issue is solved by flushing master's state to replica
via making a successful sync transaction.

The second issue is fixed by splitting it into more steps, not
depending on timeouts (1000 is considered infinity).

Closes #5196
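
The race described in the commit message can be illustrated with a small toy model. This is not Tarantool code and not the change made in the commit: the `Replica` class, the `flush` helper and the marker key 99 are hypothetical names invented for this sketch. It assumes only what the message states: the replica applies the master's log in order, reads on the replica are dirty, and a successful sync transaction implies the replica has applied everything logged before it.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """Toy replica (hypothetical) that applies a replicated log in order."""
    rows: set = field(default_factory=set)
    pending: list = field(default_factory=list)  # delivered, not yet applied

    def deliver(self, entry):
        self.pending.append(entry)

    def apply_one(self):
        """Apply the oldest pending entry; return False if none is pending."""
        if not self.pending:
            return False
        op, key = self.pending.pop(0)
        if op == "insert":
            self.rows.add(key)      # dirty read: visible before confirmation
        elif op == "rollback":
            self.rows.discard(key)
        return True

    def select(self):
        return sorted(self.rows)


def flush(replica, marker, max_steps=100):
    """Model of a successful sync transaction used as a barrier.

    The marker is appended to the log, so once it is visible on the
    replica, every earlier entry (including ROLLBACK) has been applied.
    """
    replica.deliver(("insert", marker))
    for _ in range(max_steps):
        if marker in replica.rows:
            return True
        if not replica.apply_one():
            break
    return marker in replica.rows


replica = Replica()
# Master: the sync insert of key 1 timed out, so ROLLBACK follows it.
replica.deliver(("insert", 1))
replica.deliver(("rollback", 1))

replica.apply_one()             # only the insert has reached the space
racy_read = replica.select()    # the flaky observation from the diff

flushed = flush(replica, 99)    # barrier: drain the log up to the marker
stable_read = replica.select()
print(racy_read, stable_read)
```

In this model the read taken right after the insert is applied returns the rolled back row, which matches the `- - [1]` line in the rejected result. The read taken after the barrier no longer depends on timing, which is the idea behind replacing timeout-based expectations with explicit steps.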
Gerold103 added a commit that referenced this issue Jul 28, 2020
(cherry picked from commit cd292ad)
kyukhin added this to the 2.5.2 milestone Aug 4, 2020