test: engine/ddl.test.lua flaky fails on index counter checks #13

Closed
avtikhon opened this issue Jul 16, 2019 · 11 comments · Fixed by tarantool/tarantool#6136

@avtikhon
Contributor

Tarantool version:
Tarantool 2.5.0-158-g357281133
Target: Linux-x86_64-RelWithDebInfo
Build options: cmake . -DCMAKE_INSTALL_PREFIX=/usr -DENABLE_BACKTRACE=ON
Compiler: /usr/bin/cc /usr/bin/c++
C_FLAGS:-O2 -g -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -fexceptions -fstack-protector-strong -grecord-gcc-switches -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -m64 -mtune=generic -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection -fexceptions -funwind-tables -fno-omit-frame-pointer -fno-stack-protector -fno-common -fopenmp -msse2 -std=c11 -Wall -Wextra -Wno-strict-aliasing -Wno-char-subscripts -Wno-format-truncation -Wno-gnu-alignof-expression -fno-gnu89-inline -Wno-cast-function-type
CXX_FLAGS:-O2 -g -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -fexceptions -fstack-protector-strong -grecord-gcc-switches -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -m64 -mtune=generic -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection -fexceptions -funwind-tables -fno-omit-frame-pointer -fno-stack-protector -fno-common -fopenmp -msse2 -std=c++11 -Wall -Wextra -Wno-strict-aliasing -Wno-char-subscripts -Wno-format-truncation -Wno-invalid-offsetof -Wno-gnu-alignof-expression -Wno-cast-function-type

OS version:
Fedora 31

Bug description:
https://gitlab.com/tarantool/tarantool/-/jobs/606635367

 [159] --- engine/ddl.result	Mon Jun 22 21:45:31 2020
 [159] +++ engine/ddl.reject	Mon Jun 22 22:00:01 2020
 [159] @@ -2558,7 +2558,7 @@
 [159]  ...
 [159]  inspector:wait_cond(function() return box.space.test.index.pk:count() == box.space.test.index.sk:count() end)
 [159]  ---
 [159] -- true
 [159] +- false
 [159]  ...
 [159]  inspector:wait_cond(function() return box.space.test.index.pk:count() == box.space.test.index.tk:count() end)
 [159]  ---
 [159] @@ -2570,24 +2570,24 @@
 [159]  ...
 [159]  inspector:wait_cond(function() return box.space.test.index.pk:count() == box.space.test.index.sk:count() end)
 [159]  ---
 [159] +- false
 [159] +...
 [159] +inspector:wait_cond(function() return box.space.test.index.pk:count() == box.space.test.index.tk:count() end)
 [159] +---
 [159]  - true
 [159]  ...
 [159] +box.snapshot()
 [159] +---
 [159] +- ok
 [159] +...
 [159] +inspector:wait_cond(function() return box.space.test.index.pk:count() == box.space.test.index.sk:count() end)
 [159] +---
 [159] +- false
 [159] +...
 [159]  inspector:wait_cond(function() return box.space.test.index.pk:count() == box.space.test.index.tk:count() end)
 [159]  ---
 [159]  - true
 [159]  ...
 [159] -box.snapshot()
 [159] ----
 [159] -- ok
 [159] -...
 [159] -inspector:wait_cond(function() return box.space.test.index.pk:count() == box.space.test.index.sk:count() end)
 [159] ----
 [159] -- true
 [159] -...
 [159] -inspector:wait_cond(function() return box.space.test.index.pk:count() == box.space.test.index.tk:count() end)
 [159] ----
 [159] -- true
 [159] -...
 [159]  box.space.test:drop()
 [159]  ---
 [159]  ...
 [159] 
 [159] Last 15 lines of Tarantool Log file [Instance "box"][/build/usr/src/debug/tarantool-2.5.0.158/test/var/159_engine/box.log]:
 [159] 2020-06-22 21:59:01.270 [32461] vinyl.dump.0/104/task I> writing `/build/usr/src/debug/tarantool-2.5.0.158/test/var/159_engine/box/599/0/00000000000000000275.index'
 [159] 2020-06-22 21:59:01.270 [32461] main/108/vinyl.scheduler I> 599/0: dump completed
 [159] 2020-06-22 21:59:01.270 [32461] main/108/vinyl.scheduler I> dumped 305714 bytes in 0.0 s, rate 99.8 MB/s
 [159] 2020-06-22 21:59:01.270 [32461] main/121/console/unix/: I> vinyl checkpoint completed
 [159] 2020-06-22 21:59:01.270 [32461] main/108/vinyl.scheduler I> 599/0: started compacting range (-inf..inf), runs 4/4
 [159] 2020-06-22 21:59:01.270 [32461] vinyl.compaction.0/102/task I> writing `/build/usr/src/debug/tarantool-2.5.0.158/test/var/159_engine/box/599/0/00000000000000000277.run'
 [159] 2020-06-22 21:59:01.271 [32461] main I> removed /build/usr/src/debug/tarantool-2.5.0.158/test/var/159_engine/box/00000000000000003036.snap
 [159] 2020-06-22 21:59:01.271 [32461] main I> removed /build/usr/src/debug/tarantool-2.5.0.158/test/var/159_engine/box/00000000000000002736.vylog
 [159] 2020-06-22 21:59:01.271 [32461] main/105/gc I> removed /build/usr/src/debug/tarantool-2.5.0.158/test/var/159_engine/box/599/0/00000000000000000247.index
 [159] 2020-06-22 21:59:01.271 [32461] main/105/gc I> removed /build/usr/src/debug/tarantool-2.5.0.158/test/var/159_engine/box/599/0/00000000000000000247.run
 [159] 2020-06-22 21:59:01.271 [32461] main/105/gc I> removed /build/usr/src/debug/tarantool-2.5.0.158/test/var/159_engine/box/599/0/00000000000000000249.index
 [159] 2020-06-22 21:59:01.272 [32461] main/105/gc I> removed /build/usr/src/debug/tarantool-2.5.0.158/test/var/159_engine/box/599/0/00000000000000000249.run
 [159] 2020-06-22 21:59:01.273 [32461] wal I> removed /build/usr/src/debug/tarantool-2.5.0.158/test/var/159_engine/box/00000000000000003036.xlog
 [159] 2020-06-22 21:59:01.280 [32461] vinyl.compaction.0/102/task I> writing `/build/usr/src/debug/tarantool-2.5.0.158/test/var/159_engine/box/599/0/00000000000000000277.index'
 [159] 2020-06-22 21:59:01.287 [32461] main/108/vinyl.scheduler I> 599/0: completed compacting range (-inf..inf)

Steps to reproduce:

l=0 ; while ./test-run.py -j20 `for r in {1..100} ; do echo engine/ddl.test.lua ; done` 2>/dev/null ; do l=$(($l+1)) ; echo ======== $l ============= ; done
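
For readability, the one-liner above can be unrolled into a small helper. This is only a sketch: `repeat_until_fail` and its `max_rounds` argument are names invented here, not part of test-run.py, and the round limit is an addition so that the loop cannot run forever.

```shell
# Hypothetical helper: rerun a command until it fails or until max_rounds
# rounds have passed. Prints the number of rounds that passed and returns
# success only if a failure was actually reproduced.
repeat_until_fail() {
    max_rounds=$1
    shift
    passed=0
    while [ "$passed" -lt "$max_rounds" ]; do
        # Stop at the first failing round.
        "$@" || break
        passed=$((passed + 1))
    done
    echo "$passed"
    [ "$passed" -lt "$max_rounds" ]
}
```

For example, `repeat_until_fail 50 ./test-run.py -j20 engine/ddl.test.lua` would stop at the first failing round and print how many rounds passed before it.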

The root cause appears to be that the primary and secondary indexes hold different numbers of keys. The following debug patch logs the index counts at each stage:

diff --git a/test/engine/ddl.test.lua b/test/engine/ddl.test.lua
index 1d77705dd..1991212d0 100644
--- a/test/engine/ddl.test.lua
+++ b/test/engine/ddl.test.lua
@@ -966,6 +966,9 @@ math.randomseed(os.time())
 s = box.schema.space.create('test', {engine = engine})
 _ = s:create_index('pk')
 
+log = require('log')
+log.info("ERROR ======= BEFORE 1 ======= PK " .. box.space.test.index.pk:count())
+
 inspector:cmd("setopt delimiter ';'")
 
 box.begin()
@@ -1005,17 +1008,36 @@ end;
 
 inspector:cmd("setopt delimiter ''");
 
+log.info("ERROR ======= AFTER 1 ======= PK " .. box.space.test.index.pk:count())
+
 fiber = require('fiber')
 ch = fiber.channel(1)
 
 _ = fiber.create(function() gen_load() ch:put(true) end)
 _ = box.space.test:create_index('sk', {unique = false, parts = {2, 'unsigned'}})
+
+log.info("ERROR ======= BEFORE 1 ======= PK " .. box.space.test.index.pk:count())
+log.info("ERROR ======= BEFORE 1 ======= SK " .. box.space.test.index.sk:count())
+
 ch:get()
 
+log.info("ERROR ======= AFTER 1 ======= PK " .. box.space.test.index.pk:count())
+log.info("ERROR ======= AFTER 1 ======= SK " .. box.space.test.index.sk:count())
+
+
 _ = fiber.create(function() gen_load() ch:put(true) end)
 _ = box.space.test:create_index('tk', {unique = true, parts = {3, 'unsigned'}})
+
+log.info("ERROR ======= BEFORE 2 ======= PK " .. box.space.test.index.pk:count())
+log.info("ERROR ======= BEFORE 2 ======= SK " .. box.space.test.index.sk:count())
+log.info("ERROR ======= BEFORE 2 ======= TK " .. box.space.test.index.tk:count())
+
 ch:get()
 
+log.info("ERROR ======= AFTER 2 ======= PK " .. box.space.test.index.pk:count())
+log.info("ERROR ======= AFTER 2 ======= SK " .. box.space.test.index.sk:count())
+log.info("ERROR ======= AFTER 2 ======= TK " .. box.space.test.index.tk:count())
+
 inspector:wait_cond(function() return box.space.test.index.pk:count() == box.space.test.index.sk:count() end)
 inspector:wait_cond(function() return box.space.test.index.pk:count() == box.space.test.index.tk:count() end)
 

Run it on the dev1 host with the command:

( c=0 ; while ./test-run.py --builddir /tnt -j 160 `for r in {1..200} ; do echo engine/ddl.test.lua ; done` --force --long --no-output-timeout=20 ; do date ; c=$(($c+1)) ; echo ALX $c ; done ; echo ALX $c ) | tee a.log &

Then check the results with the command:

grep -RI ERROR var/*_engine/ 2>/dev/null

The output looks like this:
a) for passed tests:

 ======= BEFORE 1 ======= PK 0
 ======= AFTER 1 ======= PK 1000
 ======= BEFORE 1 ======= PK 974
 ======= BEFORE 1 ======= SK 974
 ======= AFTER 1 ======= PK 957
 ======= AFTER 1 ======= SK 957
 ======= BEFORE 2 ======= PK 931
 ======= BEFORE 2 ======= SK 931
 ======= BEFORE 2 ======= TK 931
 ======= AFTER 2 ======= PK 920
 ======= AFTER 2 ======= SK 920
 ======= AFTER 2 ======= TK 920

b) for failed tests:

 ======= BEFORE 1 ======= PK 0
 ======= AFTER 1 ======= PK 1000
 ======= BEFORE 1 ======= PK 958
 ======= BEFORE 1 ======= SK 956
 ======= AFTER 1 ======= PK 958
 ======= AFTER 1 ======= SK 956
 ======= BEFORE 2 ======= PK 930
 ======= BEFORE 2 ======= SK 928
 ======= BEFORE 2 ======= TK 930
 ======= AFTER 2 ======= PK 930
 ======= AFTER 2 ======= SK 928
 ======= AFTER 2 ======= TK 930
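
The comparison above can also be done mechanically. A minimal sketch, assuming the log lines keep the `======= <STAGE> <N> ======= <INDEX> <COUNT>` shape produced by the debug patch and that PK is logged before SK and TK at each stage; `find_count_mismatches` is a hypothetical helper name:

```shell
# Hypothetical helper: reads debug log lines on stdin and prints every stage
# where the SK or TK count differs from the most recent PK count.
# Exits non-zero if at least one mismatch was found.
find_count_mismatches() {
    awk '
    {
        # Locate the "======= <STAGE> <N> ======= <INDEX> <COUNT>" part,
        # skipping any file name or timestamp prefix.
        for (i = 1; i + 5 <= NF; i++) {
            if ($i == "=======" && $(i + 3) == "=======") {
                stage = $(i + 1) " " $(i + 2)
                idx = $(i + 4)
                count = $(i + 5) + 0
                if (idx == "PK") {
                    pk[stage] = count
                } else if ((stage in pk) && count != pk[stage]) {
                    printf "%s: %s=%d PK=%d\n", stage, idx, count, pk[stage]
                    bad = 1
                }
                break
            }
        }
    }
    END { exit (bad ? 1 : 0) }
    '
}
```

It could be fed from the grep command above, for example `grep -RI ERROR var/*_engine/ 2>/dev/null | find_count_mismatches`. For the failed run shown above, it would flag the SK count at every stage where it is logged, while the TK lines pass.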

@avtikhon avtikhon self-assigned this Jul 16, 2019
@avtikhon avtikhon changed the title test: engine/ddl flaky fails om index conter checks test: engine/ddl flaky fails on index conter checks Aug 14, 2019
@avtikhon avtikhon changed the title test: engine/ddl flaky fails on index conter checks test: engine/ddl flaky fails on index counter checks Aug 14, 2019
@avtikhon
Contributor Author

Reproduced on 2.4.0-16-gcdf502c66

@ligurio
Member

ligurio commented Mar 18, 2020

Reproduced on the 23rd execution:

======================================================================================
WORKR TEST                                            PARAMS          RESULT
---------------------------------------------------------------------------------
[001] engine/ddl.test.lua                             memtx           [ pass ]
[001] engine/ddl.test.lua                             vinyl           [ fail ]
[001] 
[001] Test failed! Result content mismatch:
[001] --- engine/ddl.result     Wed Mar 18 07:23:32 2020
[001] +++ engine/ddl.reject     Wed Mar 18 10:38:25 2020
[001] @@ -2463,7 +2463,7 @@
[001]  ...
[001]  box.space.test.index.pk:count() == box.space.test.index.sk:count()
[001]  ---
[001] -- true
[001] +- false
[001]  ...
[001]  box.space.test.index.pk:count() == box.space.test.index.tk:count()
[001]  ---
[001] @@ -2472,24 +2472,24 @@
[001]  inspector:cmd("restart server default")
[001]  box.space.test.index.pk:count() == box.space.test.index.sk:count()
[001]  ---
[001] +- false
[001] +...
[001] +box.space.test.index.pk:count() == box.space.test.index.tk:count()
[001] +---
[001]  - true
[001]  ...
[001] +box.snapshot()
[001] +---
[001] +- ok
[001] +...
[001] +box.space.test.index.pk:count() == box.space.test.index.sk:count()
[001] +---
[001] +- false
[001] +...
[001]  box.space.test.index.pk:count() == box.space.test.index.tk:count()
[001]  ---
[001]  - true
[001]  ...
[001] -box.snapshot()
[001] ----
[001] -- ok
[001] -...
[001] -box.space.test.index.pk:count() == box.space.test.index.sk:count()
[001] ----
[001] -- true
[001] -...
[001] -box.space.test.index.pk:count() == box.space.test.index.tk:count()
[001] ----
[001] -- true
[001] -...
[001]  box.space.test:drop()
[001]  ---
[001]  ...
[001] 
[001] Last 15 lines of Tarantool Log file [Instance "box"][/home/s.bronnikov/tarantool/build/test/var/001_engine/box.log]:
[001] 2020-03-18 10:38:25.604 [8945] vinyl.dump.0/104/task I> writing `/home/s.bronnikov/tarantool/build/test/var/001_engine/box/599/0/00000000000000000275.index'
[001] 2020-03-18 10:38:25.636 [8945] main/107/vinyl.scheduler I> 599/0: dump completed
[001] 2020-03-18 10:38:25.636 [8945] main/107/vinyl.scheduler I> dumped 156864 bytes in 0.1 s, rate 1.4 MB/s
[001] 2020-03-18 10:38:25.640 [8945] main/120/console/unix/: I> vinyl checkpoint completed
[001] 2020-03-18 10:38:25.640 [8945] main/107/vinyl.scheduler I> 599/0: started compacting range (-inf..inf), runs 4/4
[001] 2020-03-18 10:38:25.641 [8945] vinyl.compaction.0/102/task I> writing `/home/s.bronnikov/tarantool/build/test/var/001_engine/box/599/0/00000000000000000277.run'
[001] 2020-03-18 10:38:25.692 [8945] vinyl.compaction.0/102/task I> writing `/home/s.bronnikov/tarantool/build/test/var/001_engine/box/599/0/00000000000000000277.index'
[001] 2020-03-18 10:38:25.712 [8945] main I> removed /home/s.bronnikov/tarantool/build/test/var/001_engine/box/00000000000000003032.snap
[001] 2020-03-18 10:38:25.712 [8945] main I> removed /home/s.bronnikov/tarantool/build/test/var/001_engine/box/00000000000000002732.vylog
[001] 2020-03-18 10:38:25.722 [8945] main/104/gc I> removed /home/s.bronnikov/tarantool/build/test/var/001_engine/box/599/0/00000000000000000247.index
[001] 2020-03-18 10:38:25.722 [8945] main/104/gc I> removed /home/s.bronnikov/tarantool/build/test/var/001_engine/box/599/0/00000000000000000247.run
[001] 2020-03-18 10:38:25.722 [8945] main/104/gc I> removed /home/s.bronnikov/tarantool/build/test/var/001_engine/box/599/0/00000000000000000249.index
[001] 2020-03-18 10:38:25.723 [8945] main/104/gc I> removed /home/s.bronnikov/tarantool/build/test/var/001_engine/box/599/0/00000000000000000249.run
[001] 2020-03-18 10:38:25.723 [8945] wal I> removed /home/s.bronnikov/tarantool/build/test/var/001_engine/box/00000000000000003032.xlog
[001] 2020-03-18 10:38:25.760 [8945] main/107/vinyl.scheduler I> 599/0: completed compacting range (-inf..inf)
[Main process] Got failed test; gently terminate all workers...
[001] Worker "001_engine" got failed test; stopping the server...
Tarantool 2.4.0-101-g1f7e7aa2b
Target: Linux-x86_64-Debug
Build options: cmake . -DCMAKE_INSTALL_PREFIX=/usr/local -DENABLE_BACKTRACE=ON
Compiler: /usr/bin/cc /usr/bin/c++
C_FLAGS: -fexceptions -funwind-tables -fno-omit-frame-pointer -fno-stack-protector -fno-common -fopenmp -msse2 -std=c11 -Wall -Wextra -Wno-strict-aliasing -Wno-char-subscripts -Wno-format-truncation -fno-gnu89-inline -Wno-cast-function-type -Werror
CXX_FLAGS: -fexceptions -funwind-tables -fno-omit-frame-pointer -fno-stack-protector -fno-common -fopenmp -msse2 -std=c++11 -Wall -Wextra -Wno-strict-aliasing -Wno-char-subscripts -Wno-format-truncation -Wno-invalid-offsetof -Wno-cast-function-type -Werror

Command line:

for i in `seq 1 1 100`; do echo "$i XXXXXXXXXXXXXXXXX"; ../../test/test-run.py --builddir=/home/s.bronnikov/tarantool/build --vardir=/home/s.bronnikov/tarantool/build/test/var engine/ddl || break; done 2>&1 | tee ../../engine_ddl.log

Env: mcs1, CentOS Linux release 8.0.1905 (Core)

@ligurio ligurio assigned ligurio and unassigned avtikhon Mar 25, 2020
ligurio referenced this issue in tarantool/tarantool Apr 10, 2020
Test was flaky from the beginning 39d0e42
The time needed to build indexes varies from run to run, and the problem was
caused by the absence of synchronization between building the indexes and
checking the counts of these indexes.

Fixes #4353
@ligurio
Member

ligurio commented Apr 10, 2020

kyukhin referenced this issue in tarantool/tarantool Apr 15, 2020
Test was flaky from the beginning 39d0e42
The time needed to build indexes varies from run to run, and the problem was
caused by the absence of synchronization between building the indexes and
checking the counts of these indexes.

Fixes #4353

(cherry picked from commit 5f96ee5)
@avtikhon avtikhon reopened this May 15, 2020
@avtikhon
Contributor Author

It seems the issue still exists: failures occurred a few times in GitLab CI, for example:
https://gitlab.com/tarantool/tarantool/-/jobs/553633335

 [091] --- engine/ddl.result	Thu May 14 16:12:09 2020
 [091] +++ engine/ddl.reject	Fri May 15 04:15:07 2020
 [091] @@ -2558,7 +2558,7 @@
 [091]  ...
 [091]  inspector:wait_cond(function() return box.space.test.index.pk:count() == box.space.test.index.sk:count() end)
 [091]  ---
 [091] -- true
 [091] +- false
 [091]  ...
 [091]  inspector:wait_cond(function() return box.space.test.index.pk:count() == box.space.test.index.tk:count() end)
 [091]  ---
 [091] @@ -2570,24 +2570,24 @@
 [091]  ...
 [091]  inspector:wait_cond(function() return box.space.test.index.pk:count() == box.space.test.index.sk:count() end)
 [091]  ---
 [091] +- false
 [091] +...
 [091] +inspector:wait_cond(function() return box.space.test.index.pk:count() == box.space.test.index.tk:count() end)
 [091] +---
 [091]  - true
 [091]  ...
 [091] +box.snapshot()
 [091] +---
 [091] +- ok
 [091] +...
 [091] +inspector:wait_cond(function() return box.space.test.index.pk:count() == box.space.test.index.sk:count() end)
 [091] +---
 [091] +- false
 [091] +...
 [091]  inspector:wait_cond(function() return box.space.test.index.pk:count() == box.space.test.index.tk:count() end)
 [091]  ---
 [091]  - true
 [091]  ...
 [091] -box.snapshot()
 [091] ----
 [091] -- ok
 [091] -...
 [091] -inspector:wait_cond(function() return box.space.test.index.pk:count() == box.space.test.index.sk:count() end)
 [091] ----
 [091] -- true
 [091] -...
 [091] -inspector:wait_cond(function() return box.space.test.index.pk:count() == box.space.test.index.tk:count() end)
 [091] ----
 [091] -- true
 [091] -...
 [091]  box.space.test:drop()
 [091]  ---
 [091]  ...

avtikhon referenced this issue in tarantool/tarantool May 15, 2020
Marked flaky tests as fragile, removing them from parallel runs to avoid
flaky failures in regular testing:

  app/fiber_channel.test.lua                ; gh-4961
  app/socket.test.lua                       ; gh-4978
  box/alter_limits.test.lua                 ; gh-4926
  ddl.test.lua                              ; gh-4353
  replication/wal_rw_stress.test.lua        ; gh-4977
  replication-py/conflict.test.py           ; gh-4980
  vinyl/misc.test.lua                       ; gh-4979

Part of #4953
avtikhon referenced this issue in tarantool/tarantool May 15, 2020
Marked flaky tests as fragile, removing them from parallel runs to avoid
flaky failures in regular testing:

  app/fiber_channel.test.lua                ; gh-4961
  app/socket.test.lua                       ; gh-4978
  box/alter_limits.test.lua                 ; gh-4926
  engine/ddl.test.lua                       ; gh-4353
  replication/wal_rw_stress.test.lua        ; gh-4977
  replication-py/conflict.test.py           ; gh-4980
  vinyl/misc.test.lua                       ; gh-4979

Part of #4953
avtikhon referenced this issue in tarantool/tarantool May 15, 2020
Marked flaky tests as fragile, removing them from parallel runs to avoid
flaky failures in regular testing:

  app/fiber_channel.test.lua                ; gh-4961
  app/socket.test.lua                       ; gh-4978
  box/alter_limits.test.lua                 ; gh-4926
  box/misc.test.lua                         ; gh-4982
  engine/ddl.test.lua                       ; gh-4353
  replication/wal_rw_stress.test.lua        ; gh-4977
  replication-py/conflict.test.py           ; gh-4980
  vinyl/misc.test.lua                       ; gh-4979

Part of #4953
avtikhon referenced this issue in tarantool/tarantool May 15, 2020
Marked flaky tests as fragile, removing them from parallel runs to avoid
flaky failures in regular testing:

  app/fiber_channel.test.lua                ; gh-4961
  app/socket.test.lua                       ; gh-4978
  box/alter_limits.test.lua                 ; gh-4926
  box/misc.test.lua                         ; gh-4982
  engine/ddl.test.lua                       ; gh-4353
  replication/replica_rejoin.test.lua       ; gh-4985
  replication/wal_rw_stress.test.lua        ; gh-4977
  replication-py/conflict.test.py           ; gh-4980
  vinyl/misc.test.lua                       ; gh-4979
  vinyl/snapshot.test.lua                   ; gh-4984
  vinyl/write_iterator.test.lua             ; gh-4572

Part of #4953
avtikhon referenced this issue in tarantool/tarantool May 16, 2020
Added skip condition on OSX for test:
  replication/box_set_replication_stress.test.lua

Marked flaky tests as fragile, removing them from parallel runs to avoid
flaky failures in regular testing:

  app/fiber_channel.test.lua                ; gh-4961
  app/socket.test.lua                       ; gh-4978
  app/fiber.test.lua                        ; gh-4987
  box/tuple.test.lua                        ; gh-4988
  box/transaction.test.lua                  ; gh-4990
  box/alter_limits.test.lua                 ; gh-4926
  box/misc.test.lua                         ; gh-4982
  engine/ddl.test.lua                       ; gh-4353
  replication/recover_missing_xlog.test.lua ; gh-4989
  replication/replica_rejoin.test.lua       ; gh-4985
  replication/wal_rw_stress.test.lua        ; gh-4977
  replication-py/conflict.test.py           ; gh-4980
  vinyl/misc.test.lua                       ; gh-4979
  vinyl/snapshot.test.lua                   ; gh-4984
  vinyl/write_iterator.test.lua             ; gh-4572

Part of #4953

(cherry picked from commit 72a2bae)
avtikhon referenced this issue in tarantool/tarantool May 16, 2020
Marked flaky tests as fragile, removing them from parallel runs to avoid
flaky failures in regular testing:

  app/fiber_channel.test.lua                ; gh-4961
  app/socket.test.lua                       ; gh-4978
  app/fiber.test.lua                        ; gh-4987
  box/tuple.test.lua                        ; gh-4988
  box/transaction.test.lua                  ; gh-4990
  box/alter_limits.test.lua                 ; gh-4926
  box/misc.test.lua                         ; gh-4982
  engine/ddl.test.lua                       ; gh-4353
  replication/recover_missing_xlog.test.lua ; gh-4989
  replication/replica_rejoin.test.lua       ; gh-4985
  replication/wal_rw_stress.test.lua        ; gh-4977
  replication-py/conflict.test.py           ; gh-4980
  vinyl/misc.test.lua                       ; gh-4979
  vinyl/snapshot.test.lua                   ; gh-4984
  vinyl/write_iterator.test.lua             ; gh-4572

Part of #4953
avtikhon referenced this issue in tarantool/tarantool May 17, 2020
Added skip condition on OSX for test:
  replication/box_set_replication_stress.test.lua

Marked flaky tests as fragile, removing them from parallel runs to avoid
flaky failures in regular testing:

  app/fiber_channel.test.lua                ; gh-4961
  app/socket.test.lua                       ; gh-4978
  app/fiber.test.lua                        ; gh-4987
  box/tuple.test.lua                        ; gh-4988
  box/transaction.test.lua                  ; gh-4990
  box/alter_limits.test.lua                 ; gh-4926
  box/misc.test.lua                         ; gh-4982
  engine/ddl.test.lua                       ; gh-4353
  replication/box_set_replication_stress    ; gh-4992
  replication/recover_missing_xlog.test.lua ; gh-4989
  replication/replica_rejoin.test.lua       ; gh-4985
  replication/wal_rw_stress.test.lua        ; gh-4977
  replication-py/conflict.test.py           ; gh-4980
  vinyl/errinj_ddl.test.lua                 ; gh-4993
  vinyl/misc.test.lua                       ; gh-4979
  vinyl/snapshot.test.lua                   ; gh-4984
  vinyl/write_iterator.test.lua             ; gh-4572
  xlog/panic_on_broken_lsn.test.lua         ; gh-4991

Part of #4953

(cherry picked from commit 72a2bae)
avtikhon referenced this issue in tarantool/tarantool May 17, 2020
Marked flaky tests as fragile, removing them from parallel runs to avoid
flaky failures in regular testing:

  app/fiber_channel.test.lua                ; gh-4961
  app/socket.test.lua                       ; gh-4978
  app/fiber.test.lua                        ; gh-4987
  box/tuple.test.lua                        ; gh-4988
  box/transaction.test.lua                  ; gh-4990
  box/alter_limits.test.lua                 ; gh-4926
  box/misc.test.lua                         ; gh-4982
  engine/ddl.test.lua                       ; gh-4353
  replication/box_set_replication_stress    ; gh-4992
  replication/recover_missing_xlog.test.lua ; gh-4989
  replication/replica_rejoin.test.lua       ; gh-4985
  replication/wal_rw_stress.test.lua        ; gh-4977
  replication-py/conflict.test.py           ; gh-4980
  vinyl/errinj_ddl.test.lua                 ; gh-4993
  vinyl/misc.test.lua                       ; gh-4979
  vinyl/snapshot.test.lua                   ; gh-4984
  vinyl/write_iterator.test.lua             ; gh-4572
  xlog/panic_on_broken_lsn.test.lua         ; gh-4991

Part of #4953
avtikhon referenced this issue in tarantool/tarantool May 17, 2020
Added skip condition on OSX for test:
  replication/box_set_replication_stress.test.lua

Marked flaky tests as fragile, removing them from parallel runs to avoid
flaky failures in regular testing:

  app/fiber.test.lua                        ; gh-4987
  app/fiber_channel.test.lua                ; gh-4961
  app/socket.test.lua                       ; gh-4978
  app-tap/popen.test.lua                    ; gh-4995
  box/alter_limits.test.lua                 ; gh-4926
  box/misc.test.lua                         ; gh-4982
  box/rtree_rect.test.lua                   ; gh-4994
  box/tuple.test.lua                        ; gh-4988
  engine/ddl.test.lua                       ; gh-4353
  replication/box_set_replication_stress    ; gh-4992
  replication/recover_missing_xlog.test.lua ; gh-4989
  replication/replica_rejoin.test.lua       ; gh-4985
  replication/wal_rw_stress.test.lua        ; gh-4977
  replication-py/conflict.test.py           ; gh-4980
  vinyl/errinj_ddl.test.lua                 ; gh-4993
  vinyl/misc.test.lua                       ; gh-4979
  vinyl/snapshot.test.lua                   ; gh-4984
  vinyl/write_iterator.test.lua             ; gh-4572
  xlog/panic_on_broken_lsn.test.lua         ; gh-4991

Part of #4953

(cherry picked from commit 72a2bae)
avtikhon referenced this issue in tarantool/tarantool May 17, 2020
Marked flaky tests as fragile, removing them from parallel runs to avoid
flaky failures in regular testing:

  app/fiber.test.lua                        ; gh-4987
  app/fiber_channel.test.lua                ; gh-4961
  app/socket.test.lua                       ; gh-4978
  app-tap/popen.test.lua                    ; gh-4995
  box/alter_limits.test.lua                 ; gh-4926
  box/misc.test.lua                         ; gh-4982
  box/rtree_rect.test.lua                   ; gh-4994
  box/tuple.test.lua                        ; gh-4988
  engine/ddl.test.lua                       ; gh-4353
  replication/box_set_replication_stress    ; gh-4992
  replication/recover_missing_xlog.test.lua ; gh-4989
  replication/replica_rejoin.test.lua       ; gh-4985
  replication/wal_rw_stress.test.lua        ; gh-4977
  replication-py/conflict.test.py           ; gh-4980
  vinyl/errinj_ddl.test.lua                 ; gh-4993
  vinyl/misc.test.lua                       ; gh-4979
  vinyl/snapshot.test.lua                   ; gh-4984
  vinyl/write_iterator.test.lua             ; gh-4572
  xlog/panic_on_broken_lsn.test.lua         ; gh-4991

Part of #4953
avtikhon referenced this issue in tarantool/tarantool May 18, 2020
Marked flaky tests as fragile, removing them from parallel runs to avoid
flaky failures in regular testing:

  app/fiber.test.lua                        ; gh-4987
  app/fiber_channel.test.lua                ; gh-4961
  app/socket.test.lua                       ; gh-4978
  app-tap/popen.test.lua                    ; gh-4995
  box/alter_limits.test.lua                 ; gh-4926
  box/misc.test.lua                         ; gh-4982
  box/role.test.lua                         ; gh-4998
  box/rtree_rect.test.lua                   ; gh-4994
  box/sequence.test.lua                     ; gh-4996
  box/tuple.test.lua                        ; gh-4988
  engine/ddl.test.lua                       ; gh-4353
  replication/box_set_replication_stress    ; gh-4992
  replication/recover_missing_xlog.test.lua ; gh-4989
  replication/replica_rejoin.test.lua       ; gh-4985
  replication/wal_rw_stress.test.lua        ; gh-4977
  replication-py/conflict.test.py           ; gh-4980
  vinyl/errinj_ddl.test.lua                 ; gh-4993
  vinyl/misc.test.lua                       ; gh-4979
  vinyl/snapshot.test.lua                   ; gh-4984
  vinyl/write_iterator.test.lua             ; gh-4572
  xlog/panic_on_broken_lsn.test.lua         ; gh-4991

Part of #4953
avtikhon referenced this issue in tarantool/tarantool May 18, 2020
Added skip condition on OSX for test:
  replication/box_set_replication_stress.test.lua

Marked flaky tests as fragile, removing them from parallel runs to avoid
flaky failures in regular testing:

  app/fiber.test.lua                        ; gh-4987
  app/fiber_channel.test.lua                ; gh-4961
  app/socket.test.lua                       ; gh-4978
  app-tap/popen.test.lua                    ; gh-4995
  box/alter_limits.test.lua                 ; gh-4926
  box/misc.test.lua                         ; gh-4982
  box/role.test.lua                         ; gh-4998
  box/rtree_rect.test.lua                   ; gh-4994
  box/sequence.test.lua                     ; gh-4996
  box/tuple.test.lua                        ; gh-4988
  engine/ddl.test.lua                       ; gh-4353
  replication/box_set_replication_stress    ; gh-4992
  replication/recover_missing_xlog.test.lua ; gh-4989
  replication/replica_rejoin.test.lua       ; gh-4985
  replication/wal_rw_stress.test.lua        ; gh-4977
  replication-py/conflict.test.py           ; gh-4980
  vinyl/errinj_ddl.test.lua                 ; gh-4993
  vinyl/misc.test.lua                       ; gh-4979
  vinyl/snapshot.test.lua                   ; gh-4984
  vinyl/write_iterator.test.lua             ; gh-4572
  xlog/panic_on_broken_lsn.test.lua         ; gh-4991

Part of #4953

(cherry picked from commit 72a2bae)
avtikhon referenced this issue in tarantool/tarantool May 19, 2020
Added skip condition on OSX for test:
  replication/box_set_replication_stress.test.lua

Marked flaky tests as fragile, removing them from parallel runs to avoid
flaky failures in regular testing:

  app/fiber.test.lua                        ; gh-4987
  app/fiber_channel.test.lua                ; gh-4961
  app/socket.test.lua                       ; gh-4978
  box/alter_limits.test.lua                 ; gh-4926
  box/misc.test.lua                         ; gh-4982
  box/role.test.lua                         ; gh-4998
  box/rtree_rect.test.lua                   ; gh-4994
  box/sequence.test.lua                     ; gh-4996
  box/tuple.test.lua                        ; gh-4988
  engine/ddl.test.lua                       ; gh-4353
  replication/box_set_replication_stress    ; gh-4992
  replication/recover_missing_xlog.test.lua ; gh-4989
  replication/replica_rejoin.test.lua       ; gh-4985
  replication/wal_rw_stress.test.lua        ; gh-4977
  replication-py/conflict.test.py           ; gh-4980
  vinyl/errinj_ddl.test.lua                 ; gh-4993
  vinyl/misc.test.lua                       ; gh-4979
  vinyl/snapshot.test.lua                   ; gh-4984
  vinyl/write_iterator.test.lua             ; gh-4572
  xlog/panic_on_broken_lsn.test.lua         ; gh-4991

Part of #4953

(cherry picked from commit 72a2bae)
avtikhon referenced this issue in tarantool/tarantool May 19, 2020
Marked flaky tests as fragile, removing them from parallel runs to avoid
flaky failures in regular testing:

  app/fiber.test.lua                        ; gh-4987
  app/fiber_channel.test.lua                ; gh-4961
  app/socket.test.lua                       ; gh-4978
  box/alter_limits.test.lua                 ; gh-4926
  box/misc.test.lua                         ; gh-4982
  box/role.test.lua                         ; gh-4998
  box/rtree_rect.test.lua                   ; gh-4994
  box/sequence.test.lua                     ; gh-4996
  box/tuple.test.lua                        ; gh-4988
  engine/ddl.test.lua                       ; gh-4353
  replication/box_set_replication_stress    ; gh-4992
  replication/recover_missing_xlog.test.lua ; gh-4989
  replication/replica_rejoin.test.lua       ; gh-4985
  replication/wal_rw_stress.test.lua        ; gh-4977
  replication-py/conflict.test.py           ; gh-4980
  vinyl/errinj_ddl.test.lua                 ; gh-4993
  vinyl/misc.test.lua                       ; gh-4979
  vinyl/snapshot.test.lua                   ; gh-4984
  vinyl/write_iterator.test.lua             ; gh-4572
  xlog/panic_on_broken_lsn.test.lua         ; gh-4991

Part of #4953
kyukhin referenced this issue in tarantool/tarantool May 20, 2020
Marked tests that are flaky in parallel runs as fragile to avoid
flaky failures in regular testing:

  app/fiber.test.lua                        ; gh-4987
  app/fiber_channel.test.lua                ; gh-4961
  app/socket.test.lua                       ; gh-4978
  box/alter_limits.test.lua                 ; gh-4926
  box/misc.test.lua                         ; gh-4982
  box/role.test.lua                         ; gh-4998
  box/rtree_rect.test.lua                   ; gh-4994
  box/sequence.test.lua                     ; gh-4996
  box/tuple.test.lua                        ; gh-4988
  engine/ddl.test.lua                       ; gh-4353
  replication/box_set_replication_stress    ; gh-4992
  replication/recover_missing_xlog.test.lua ; gh-4989
  replication/replica_rejoin.test.lua       ; gh-4985
  replication/wal_rw_stress.test.lua        ; gh-4977
  replication-py/conflict.test.py           ; gh-4980
  vinyl/errinj_ddl.test.lua                 ; gh-4993
  vinyl/misc.test.lua                       ; gh-4979
  vinyl/snapshot.test.lua                   ; gh-4984
  vinyl/write_iterator.test.lua             ; gh-4572
  xlog/panic_on_broken_lsn.test.lua         ; gh-4991

Part of #4953

(cherry picked from commit 430c0e8)
kyukhin referenced this issue in tarantool/tarantool Dec 4, 2020
Found hanging test vinyl/ddl.test.lua on:

  [159]  inspector:wait_cond(function() return box.space.test.index.pk:count() == box.space.test.index.tk:count() end)
  [159]  ---
  [159]  - true
  [159]  ...
  [159] -box.snapshot()
  [159] ----
  [159] -- ok
  [159] -...

The real issue happened earlier, when the test failed on:

  [091] --- engine/ddl.result   Thu May 14 16:12:09 2020
  [091] +++ engine/ddl.reject   Fri May 15 04:15:07 2020
  [091] @@ -2558,7 +2558,7 @@
  [091]  ...
  [091]  inspector:wait_cond(function() return box.space.test.index.pk:count() == box.space.test.index.sk:count() end)
  [091]  ---
  [091] -- true
  [091] +- false
  [091]  ...

Our test files consist of several standalone subtests. To be able
to check all of them, this hang must be neutralized so that the next
standalone subtest gets a chance to pass. To avoid the hang, it was
decided to disable the box.snapshot check if the previous check of
the current subtest failed.

Needed for #4353
kyukhin referenced this issue in tarantool/tarantool Dec 4, 2020
Found that the previous fix of the engine/ddl.test.lua test, committed
in:

  5f96ee5 ('Fix flaky test engine/ddl')

did not actually fix issue #4353, so it was reverted.

Needed for #4353
avtikhon referenced this issue in tarantool/tarantool Dec 4, 2020
Found hanging test vinyl/ddl.test.lua on:

  [159]  inspector:wait_cond(function() return box.space.test.index.pk:count() == box.space.test.index.tk:count() end)
  [159]  ---
  [159]  - true
  [159]  ...
  [159] -box.snapshot()
  [159] ----
  [159] -- ok
  [159] -...

The real issue happened earlier, when the test failed on:

  [091] --- engine/ddl.result   Thu May 14 16:12:09 2020
  [091] +++ engine/ddl.reject   Fri May 15 04:15:07 2020
  [091] @@ -2558,7 +2558,7 @@
  [091]  ...
  [091]  inspector:wait_cond(function() return box.space.test.index.pk:count() == box.space.test.index.sk:count() end)
  [091]  ---
  [091] -- true
  [091] +- false
  [091]  ...

Our test files consist of several standalone subtests. To be able
to check all of them, this hang must be neutralized so that the next
standalone subtest gets a chance to pass. To avoid the hang, it was
decided to disable the box.snapshot check if the previous check of
the current subtest failed. Also, ch:get() could cause the test to
reach the overall timeout because no local timeout was passed to it;
to avoid this, a 10-second timeout is now set with ch:get(10).

Needed for #4353
@ligurio ligurio removed their assignment Dec 4, 2020
@Totktonada Totktonada transferred this issue from tarantool/tarantool Jan 15, 2021
@alyapunov alyapunov self-assigned this Feb 24, 2021
@EvgenyMekhanik

While working on this task, I found the following: the problem is caused by the gen_load function running in a separate fiber in parallel with the creation of a new index. If you wait for this function to finish before creating the index, the problem disappears. I made a faster test to experiment with in the branch https://github.com/tarantool/tarantool/tree/mechanik20051988/gh-4353-fix-flaky-test. Further work is needed to find and fix the error in the vinyl engine.

@avtikhon
Contributor Author

avtikhon commented Apr 14, 2021

> While working on this task, I found the following: the problem is caused by the gen_load function running in a separate fiber in parallel with the creation of a new index. If you wait for this function to finish before creating the index, the problem disappears. I made a faster test to experiment with in the branch https://github.com/tarantool/tarantool/tree/mechanik20051988/gh-4353-fix-flaky-test. Further work is needed to find and fix the error in the vinyl engine.

Right, the test is a bit faster to use, thanks.

To make it fail, it should be run sequentially, but many times:

./test-run.py -j1 --builddir /tnt `for t in {1..1000} ; do echo engine/ddl_simple.test.lua ; done`

or like this, which makes it possible to insert commands between runs for debugging:

c=0 ; while ./test-run.py -j1 --builddir /tnt engine/ddl_simple.test.lua ; do date ; c=$(($c+1)) ; done ; echo ================= FAILED on: $c

The error output is:
artifacts.zip

[001] engine/ddl_simple.test.lua                      vinyl           [ fail ]
[001]
[001] Test failed! Result content mismatch:
[001] --- engine/ddl_simple.result      Wed Apr 14 05:41:02 2021
[001] +++ var/rejects/engine/ddl_simple.reject  Wed Apr 14 05:49:58 2021
[001] @@ -119,7 +119,7 @@
[001]   | ...
[001]  check_fiber()
[001]   | ---
[001] - | - true
[001] + | - false
[001]   | ...
[001]
[001]  inspector:cmd("restart server default")
[001] @@ -168,7 +168,7 @@
[001]
[001]  check_server_restart()
[001]   | ---
[001] - | - true
[001] + | - false
[001]   | ...
[001]
[001]  box.space.test:drop()
[001]
[001] Last 15 lines of Tarantool Log file [Instance "box"][/source/test/var/001_engine/box.log]:
[001] 2021-04-14 05:49:57.958 [7400] main/103/box I> ready to accept requests
[001] 2021-04-14 05:49:57.958 [7400] main/103/box C> leaving orphan mode
[001] 2021-04-14 05:49:57.958 [7400] main/105/gc I> wal/engine cleanup is resumed
[001] 2021-04-14 05:49:57.958 [7400] main/103/box I> set 'log_level' configuration option to 5
[001] 2021-04-14 05:49:57.958 [7400] main/106/checkpoint_daemon I> scheduled next checkpoint for Wed Apr 14 07:00:07 2021
[001] 2021-04-14 05:49:57.958 [7400] main/103/box I> set 'memtx_memory' configuration option to 107374182
[001] 2021-04-14 05:49:57.958 [7400] main/103/box I> set 'replication_sync_timeout' configuration option to 500
[001] 2021-04-14 05:49:57.958 [7400] main/103/box I> set 'vinyl_max_tuple_size' configuration option to 104857600
[001] 2021-04-14 05:49:57.958 [7400] main/103/box I> set 'listen' configuration option to "\/source\/test\/var\/001_engine\/box.socket-iproto"
[001] 2021-04-14 05:49:57.958 [7400] main/103/box I> set 'memtx_max_tuple_size' configuration option to 104857600
[001] 2021-04-14 05:49:57.958 [7400] main/103/box I> set 'log_format' configuration option to "plain"
[001] 2021-04-14 05:49:57.958 [7400] main/119/console/unix/:/source/test/var/001_engine/box.socket-admin I> started
[001] 2021-04-14 05:49:57.958 [7400] main C> entering the event loop
[001] 2021-04-14 05:49:58.102 [7400] main/123/console/unix/: [string "function check_equal(check, pk, k)     if pk ..."]:1 E> Error on server restart check: failed '1rd step secondary keys' check on equal pk 982 and k = 981
[001] 2021-04-14 05:49:58.102 [7400] wal/101/main xlog.c:1050 W> fallocate is not supported, proceeding without it

@avtikhon
Contributor Author

avtikhon commented Apr 20, 2021

I also tried reducing the test further, down to:

test_run = require('test_run')
inspector = test_run.new()
engine = inspector:get_cfg('engine')

--
-- Check that all modifications done to the space during index build
-- are reflected in the new index.
--
math.randomseed(os.time())

s = box.schema.space.create('test', {engine = engine})
_ = s:create_index('pk')
inspector:cmd("setopt delimiter ';'")

last_val = 10;
box.begin()
for i = 1, last_val do box.space.test:replace{i, i, i} end
box.commit();

function gen_load()
    local s = box.space.test
    for i = 1, 10 do
        local key = math.random(last_val)
        local val1 = math.random(last_val)
        local val2 = last_val + 1
        last_val = val2
        pcall(s.upsert, s, {key, val1, val2}, {{'=', 2, val1}, {'=', 3, val2}})
    end
end;

function check_fiber()
    _ = fiber.create(function() gen_load() ch:put(true) end)
    _ = box.space.test:create_index('sk', {unique = false, parts = {2, 'unsigned'}})

    assert(ch:get(10) == true)

    local index = box.space.test.index
    if index.pk:count() ~= index.sk:count() then
        require('log').error("Error on fiber check: failed '1st step secondary keys' check on equal" ..
                             " pk = " .. index.pk:count() .. " and k = " .. index.sk:count())
        return false
    end

    return true
end;

inspector:cmd("setopt delimiter ''");

fiber = require('fiber')
ch = fiber.channel(1)
check_fiber()

box.space.test:drop()

To create the result file:
./test-run.py --builddir /shared_tmpfs engine/ddl_simple.test.lua --update-result
To reproduce the issue:

c=0 ; while ./test-run.py -j1 --vardir /shared_tmpfs/vardir --collect-statistics --builddir /shared_tmpfs engine/ddl_simple.test.lua ; do date ; sync ; echo 3 > /proc/sys/vm/drop_caches ; c=$(($c+1)) ; echo PASSED ======================== : $c ; done ; echo ================= FAILED: $c

box.log:

2021-04-20 11:29:03.529 [32439] main C> entering the event loop
2021-04-20 11:29:03.561 [32439] main/122/lua I> DEBUG: key, val1, val2 {4, 9, 11}
2021-04-20 11:29:03.562 [32439] main/122/lua I> DEBUG: key, val1, val2 {10, 10, 12}
2021-04-20 11:29:03.562 [32439] main/122/lua I> DEBUG: key, val1, val2 {5, 12, 13}
2021-04-20 11:29:03.565 [32439] main/122/lua I> DEBUG: key, val1, val2 {13, 6, 14}
2021-04-20 11:29:03.565 [32439] main/108/vinyl.scheduler I> 512/1: dump started
2021-04-20 11:29:03.565 [32439] main/122/lua I> DEBUG: key, val1, val2 {7, 14, 15}
2021-04-20 11:29:03.565 [32439] main/122/lua I> DEBUG: key, val1, val2 {6, 13, 16}
2021-04-20 11:29:03.565 [32439] main/122/lua I> DEBUG: key, val1, val2 {13, 7, 17}
2021-04-20 11:29:03.565 [32439] vinyl.dump.0/102/task I> writing `/shared_tmpfs/vardir/001_engine/box/512/1/00000000000000000004.run'
2021-04-20 11:29:03.565 [32439] main/122/lua I> DEBUG: key, val1, val2 {14, 17, 18}
2021-04-20 11:29:03.565 [32439] main/122/lua I> DEBUG: key, val1, val2 {9, 1, 19}
2021-04-20 11:29:03.565 [32439] main/122/lua I> DEBUG: key, val1, val2 {2, 18, 20}
2021-04-20 11:29:03.567 [32439] vinyl.dump.0/102/task I> writing `/shared_tmpfs/vardir/001_engine/box/512/1/00000000000000000004.index'
2021-04-20 11:29:03.568 [32439] main/108/vinyl.scheduler I> 512/1: dump completed
2021-04-20 11:29:03.568 [32439] main/108/vinyl.scheduler I> 512/0: dump started
2021-04-20 11:29:03.568 [32439] vinyl.dump.0/103/task I> writing `/shared_tmpfs/vardir/001_engine/box/512/0/00000000000000000006.run'
2021-04-20 11:29:03.569 [32439] vinyl.dump.0/103/task I> writing `/shared_tmpfs/vardir/001_engine/box/512/0/00000000000000000006.index'
2021-04-20 11:29:03.571 [32439] main/108/vinyl.scheduler I> 512/0: dump completed
2021-04-20 11:29:03.571 [32439] main/108/vinyl.scheduler I> dumped 0 bytes in 0.0 s, rate 0.0 MB/s
2021-04-20 11:29:03.572 [32439] main/121/console/unix/: [string "function check_fiber()     _ = fiber.create(f..."]:1 E> Error on fiber check: failed '1st step secondary keys' check on equal pk = 12 and k = 11

The failure occurs here:

function check_fiber()
    _ = fiber.create(function() gen_load() ch:put(true) end)
    _ = box.space.test:create_index('sk', {unique = false, parts = {2, 'unsigned'}})

    assert(ch:get(10) == true)

To avoid running gen_load() in parallel with create_index('sk', ...), the code can be changed to:

function check_fiber()
    _ = fiber.create(function() gen_load() ch:put(true) end)
    assert(ch:get(10) == true)

    _ = box.space.test:create_index('sk', {unique = false, parts = {2, 'unsigned'}})

This way there is no parallel work and the test never fails.

@Totktonada
Member

@avtikhon Please recheck after tarantool/tarantool#6102 and update the fragile test list accordingly.

@avtikhon
Contributor Author

avtikhon commented Jun 22, 2021

> @avtikhon Please recheck after tarantool/tarantool#6102 and update the fragile test list accordingly.

Checked with and without the patchset:

  1. Without the patchset the issue was hit in 3 runs, on the 10th, 1st and 10th loops:
git revert c5e185474479797c87efec3b6ac568223f0eb5c9
git revert eecd2b90b69038e2c3fa23a4a1cdeb708e94b127
cmake . && gmake -j && cd test && \
c=0 ; while ./test-run.py -j1 engine/ddl_simple.test.lua ; do c=$(($c+1)) ; echo PASSED ======================== : $c ; done ; echo ================= FAILED: $c
  2. On Tarantool 2.9.0-105-gb35e4708e the issue was not reproduced in runs of up to 1270 loops.
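
For reference, the repeat-until-failure loops used in this thread could be wrapped in a small shell helper that also puts an upper bound on the number of runs. This is only a sketch: the function name run_until_fail and its messages are made up for illustration and are not part of test-run.py.

```shell
# run_until_fail MAX CMD [ARGS...]
# Runs CMD up to MAX times and stops at the first non-zero exit status.
# Hypothetical helper for illustration; not part of test-run.py.
run_until_fail() {
    max=$1
    shift
    c=0
    while [ "$c" -lt "$max" ]; do
        # "$@" is the command under test with its arguments.
        if ! "$@"; then
            echo "FAILED after $c passed runs"
            return 1
        fi
        c=$((c + 1))
    done
    echo "PASSED all $c runs"
    return 0
}
```

With it, the check above could be started as `run_until_fail 1000 ./test-run.py -j1 engine/ddl_simple.test.lua`, where the limit of 1000 is an arbitrary choice.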

@Totktonada
Member

@Totktonada Totktonada reopened this Jun 22, 2021
avtikhon added a commit to tarantool/tarantool that referenced this issue Jun 22, 2021
Checked and found that:

  #4353: engine/ddl.test.lua fixed in #6102.
  #4926, #115: box/alter_limits.test.lua fixed
    in tarantool/tarantool-qa#126.
  #5547: box/net.box_schema_change_gh-2666.test.lua fixed in
    tarantool/tarantool-qa#126.
  #5583: box/net.box_methods_gh-3107.test.lua fixed in
    tarantool/tarantool-qa#126.

Closes tarantool/tarantool-qa#13
Closes #4353
Closes tarantool/tarantool-qa#115
Closes #4926
Closes tarantool/tarantool-qa#50
Closes #5547
Closes tarantool/tarantool-qa#22
Closes #5583
avtikhon added a commit to tarantool/tarantool that referenced this issue Jun 23, 2021
Checked and found that:

  #4353 -> tarantool/tarantool-qa#13:
    engine/ddl.test.lua fixed in #6102.
  #4926, #115:
    box/alter_limits.test.lua fixed in
    tarantool/tarantool-qa#126.
  #5547 -> tarantool/tarantool-qa#50:
    box/net.box_schema_change_gh-2666.test.lua fixed in
    tarantool/tarantool-qa#126.
  #5583 -> tarantool/tarantool-qa#22:
    box/net.box_methods_gh-3107.test.lua fixed in
    tarantool/tarantool-qa#126.

Closes tarantool/tarantool-qa#13
Closes tarantool/tarantool-qa#115
Closes #4926
Closes tarantool/tarantool-qa#50
Closes tarantool/tarantool-qa#22
kyukhin pushed a commit to tarantool/tarantool that referenced this issue Jun 23, 2021
Checked and found that:

  #4353 -> tarantool/tarantool-qa#13:
    engine/ddl.test.lua fixed in #6102.
  #4926, #115:
    box/alter_limits.test.lua fixed in
    tarantool/tarantool-qa#126.
  #5547 -> tarantool/tarantool-qa#50:
    box/net.box_schema_change_gh-2666.test.lua fixed in
    tarantool/tarantool-qa#126.
  #5583 -> tarantool/tarantool-qa#22:
    box/net.box_methods_gh-3107.test.lua fixed in
    tarantool/tarantool-qa#126.

Closes tarantool/tarantool-qa#13
Closes tarantool/tarantool-qa#115
Closes #4926
Closes tarantool/tarantool-qa#50
Closes tarantool/tarantool-qa#22

(cherry picked from commit 4053a35)