test: flaky replication/qsync_basic.test.lua test #5162

Closed
avtikhon opened this issue Jul 10, 2020 · 0 comments
Labels: flaky test, freebsd, qa (Issues related to tests or testing subsystem), qsync, replication

avtikhon commented Jul 10, 2020

Tarantool version:
Tarantool 2.5.0-243-gad80a4b24
Target: FreeBSD-amd64-RelWithDebInfo
Build options: cmake . -DCMAKE_INSTALL_PREFIX=/usr/local -DENABLE_BACKTRACE=OFF
Compiler: /usr/bin/cc /usr/bin/c++
C_FLAGS: -Wno-unknown-pragmas -fexceptions -funwind-tables -fno-common -std=c11 -Wall -Wextra -Wno-strict-aliasing -Wno-char-subscripts -Wno-gnu-alignof-expression -Werror
CXX_FLAGS: -Wno-unknown-pragmas -fexceptions -funwind-tables -fno-common -std=c++11 -Wall -Wextra -Wno-strict-aliasing -Wno-char-subscripts -Wno-invalid-offsetof -Wno-gnu-alignof-expression -Werror

OS version:
FreeBSD 12

Bug description:
https://gitlab.com/tarantool/tarantool/-/jobs/632988440

After commit:

commit c14563f523300542522a8f4200903d9d08efe29c
Author: Vladislav Shpilevoy <v.shpilevoy@tarantool.org>
Date:   Sat Apr 25 00:34:06 2020 +0200

    replication: introduce space.is_sync option

    Synchronous space makes every transaction, affecting its data,
    wait until it is replicated on a quorum of replicas before it is
    committed.
    
    Part of #4844
    Part of #5073 

Issue on FreeBSD 12:

[021] replication/qsync_basic.test.lua                vinyl           [ fail ]
[021] 
[021] Test failed! Result content mismatch:
[021] --- replication/qsync_basic.result  Fri Jul 10 11:16:58 2020
[021] +++ replication/qsync_basic.reject  Fri May  8 08:23:56 2020
[021] @@ -388,7 +388,7 @@
[021]   | ...
[021]  f2:status()
[021]   | ---
[021] - | - dead
[021] + | - suspended
[021]   | ...
[021]  box.space.sync:select{9}
[021]   | ---
[021] 
[021] Last 15 lines of Tarantool Log file [Instance "master"][/home/vagrant/tarantool/test/var/021_replication/master.log]:
[021] 2020-05-08 08:23:54.946 [28240] main/293/console/unix/: I> set 'replication_synchro_timeout' configuration option to 1000
[021] 2020-05-08 08:23:54.946 [28240] main/293/console/unix/: I> set 'replication_synchro_quorum' configuration option to 3
[021] 2020-05-08 08:23:54.948 [28240] main/293/console/unix/: I> set 'replication_synchro_timeout' configuration option to 0.001
[021] 2020-05-08 08:23:54.979 [28240] main/296/console/unix/: I> set 'replication_synchro_timeout' configuration option to 1000
[021] 2020-05-08 08:23:54.982 [28240] main/296/console/unix/: I> set 'replication_synchro_quorum' configuration option to 2
[021] 2020-05-08 08:23:55.543 [28240] relay/unix/:(socket)/101/main coio.cc:379 !> SystemError unexpected EOF when reading from socket, called on fd 29, aka unix/:/home/vagrant/tarantool/test/var/021_replicati: Broken pipe
[021] 2020-05-08 08:23:55.544 [28240] relay/unix/:(socket)/101/main C> exiting the relay loop
[021] 2020-05-08 08:23:55.818 [28240] main/205/main I> subscribed replica 409e3359-9105-11ea-8a18-08002741e734 at fd 28, aka unix/:/home/vagrant/tarantool/test/var/021_replicati
[021] 2020-05-08 08:23:55.818 [28240] main/205/main I> remote vclock {1: 398, 2: 1} local vclock {0: 1023, 1: 398, 2: 1}
[021] 2020-05-08 08:23:55.818 [28240] relay/unix/:(socket)/101/main I> recover from `/home/vagrant/tarantool/test/var/021_replication/master/00000000000000000153.xlog'
[021] 2020-05-08 08:23:56.150 [28240] main/325/console/unix/: I> set 'replication_synchro_quorum' configuration option to 1
[021] 2020-05-08 08:23:56.150 [28240] main/325/console/unix/: I> set 'replication_synchro_timeout' configuration option to 5
[021] 2020-05-08 08:23:56.151 [28240] main/325/console/unix/: I> set 'replication_timeout' configuration option to 0.1
[021] 2020-05-08 08:23:56.160 [28240] relay/unix/:(socket)/101/main coio.cc:379 !> SystemError unexpected EOF when reading from socket, called on fd 28, aka unix/:/home/vagrant/tarantool/test/var/021_replicati: Broken pipe
[021] 2020-05-08 08:23:56.160 [28240] relay/unix/:(socket)/101/main C> exiting the relay loop

Steps to reproduce:

Optional (but very desirable):

  • coredump
  • backtrace
  • netstat
avtikhon added the qa, freebsd, and flaky test labels Jul 10, 2020
Gerold103 added a commit that referenced this issue Jul 11, 2020
In one of the test cases two fibers were started, each making a
transaction. In the first fiber the transaction was rolled back,
and the second fiber was expected to do the same.

It did roll back too, but not always immediately after the first
one, because the first fiber had to do more than just roll back: it
had to write a ROLLBACK entry into the WAL before applying the
rollback to all subsequent transactions. This led to a yield, during
which it was possible to observe the second fiber not dead yet.

The patch makes the test explicitly wait for the fibers' death.

Closes #5162
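
For illustration, the race described in this commit message can be reproduced without any replication setup. This is only a minimal sketch, not the actual patch: the `wait_cond` helper below is hypothetical, written just for this example, and the `fiber.sleep(0.01)` inside the fiber is an assumed stand-in for the yield caused by the WAL write. The name `f2` and the `status()` check come from the test diff above.

```lua
-- Run with: tarantool wait_dead.lua
local fiber = require('fiber')

-- Hypothetical helper (an assumption for this sketch): poll `cond`
-- until it returns true or `timeout` seconds have passed.
-- Returns true on success and false on timeout.
local function wait_cond(cond, timeout, delay)
    local deadline = fiber.clock() + timeout
    while not cond() do
        if fiber.clock() >= deadline then
            return false
        end
        fiber.sleep(delay or 0.001)
    end
    return true
end

-- Stand-in for the second transaction: the fiber yields once (as if
-- waiting for a WAL write) before it finishes.
local f2 = fiber.create(function()
    fiber.sleep(0.01)
end)

-- Checking right away is racy: the fiber has yielded but not finished.
print(f2:status()) -- 'suspended'

-- Waiting for the fiber to die makes the check deterministic.
assert(wait_cond(function() return f2:status() == 'dead' end, 10))
print(f2:status()) -- 'dead'
```

The point of the sketch is that a status check placed directly after the triggering action depends on scheduling, while polling with a generous timeout depends only on the fiber eventually finishing.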
kyukhin modified the milestones: 2.6.1, 2.5.1 Jul 13, 2020
Gerold103 added a commit that referenced this issue Jul 16, 2020
Too small timeouts were used for testing that synchronous
transactions succeed.

Follow-up #5162
Gerold103 added a commit that referenced this issue Jul 20, 2020
Too small timeouts were used for testing that synchronous
transactions succeed.

Follow-up #5162
Gerold103 added a commit that referenced this issue Jul 22, 2020
Too small timeouts were used for testing that synchronous
transactions succeed.

Follow-up #5162

(cherry picked from commit 2ef8fd3)