test-run worker hangs, when an instance does not enter to the event loop #276

Closed
ligurio opened this issue Mar 19, 2021 · 2 comments · Fixed by #302

ligurio commented Mar 19, 2021

How to reproduce

test reproducer (test path xlog/hang.test.lua):

test_run = require('test_run').new()
test_run:cmd('create server replica with rpl_master=default, script="xlog/replica.lua"')
test_run:cmd('start server replica')
test_run:cmd('cleanup server replica')
test_run:cmd('delete server replica')

The test hangs because the replica lacks read access (see the log var/001_xlog/replica.log):

...
2021-03-19 18:11:44.865 [1987357] main/103/replica I> connected to 1 replicas
2021-03-19 18:11:44.865 [1987357] main/103/replica I> bootstrapping replica from 8a89e9f8-7364-4d6a-96cd-d2c8fe5b93bb at unix/:/home/sergeyb/sources/MRG/tarantool/build/test/var/001_xlog/xlog.socket-iproto
2021-03-19 18:11:44.865 [1987357] main/112/applier/unix/:/home/sergeyb/sources/MRG/tarantool/build/test/var/001_xlog/xlog.socket-iproto I> can't read row
2021-03-19 18:11:44.865 [1987357] main/112/applier/unix/:/home/sergeyb/sources/MRG/tarantool/build/test/var/001_xlog/xlog.socket-iproto session.cc:332 E> ER_ACCESS_DENIED: Read access to universe '' is denied for user 'guest'
2021-03-19 18:11:44.865 [1987357] main/112/applier/unix/:/home/sergeyb/sources/MRG/tarantool/build/test/var/001_xlog/xlog.socket-iproto I> will retry every 1.00 second

The test passes if the line box.schema.user.grant('guest', 'replication') is added at the top of the test.

Versions

test-run 5941741

tarantool --version:

Tarantool 2.8.0-134-g81c663335
Target: Linux-x86_64-Debug
Build options: cmake . -DCMAKE_INSTALL_PREFIX=/usr/local -DENABLE_BACKTRACE=ON
Compiler: /usr/bin/cc /usr/bin/c++
C_FLAGS: -fexceptions -funwind-tables -fno-omit-frame-pointer -fno-stack-protector -fno-common -fopenmp -msse2 -std=c11 -Wall -Wextra -Wno-strict-aliasing -Wno-char-subscripts -Wno-format-truncation -Wno-gnu-alignof-expression -fno-gnu89-inline -Wno-cast-function-type -Werror
CXX_FLAGS: -fexceptions -funwind-tables -fno-omit-frame-pointer -fno-stack-protector -fno-common -fopenmp -msse2 -std=c++11 -Wall -Wextra -Wno-strict-aliasing -Wno-char-subscripts -Wno-format-truncation -Wno-invalid-offsetof -Wno-gnu-alignof-expression -Wno-cast-function-type -Werror
ligurio added the bug label Mar 19, 2021
ligurio changed the title from "Test hang when no enough rights for user in Tarantool" to "Test hang when not enough rights for user in Tarantool" Mar 19, 2021
@Totktonada

This test is expected to hang: we are unable to bootstrap the replica, because it cannot join the master (it lacks the grant).

However, the problem is that the test timeout does not fire and the test-run worker 'hangs'. This seems to be caused by the use of time.sleep(<...>), which does not work well with gevent cooperative greenlets. At least, the following patch seems to fix the problem:

diff --git a/lib/tarantool_server.py b/lib/tarantool_server.py
index f611dba..ee83489 100644
--- a/lib/tarantool_server.py
+++ b/lib/tarantool_server.py
@@ -449,7 +449,7 @@ class TarantoolLog(object):
         while True:
             if os.path.exists(self.path):
                 break
-            time.sleep(0.001)
+            gevent.sleep(0.001)
 
         with open(self.path, 'r') as f:
             f.seek(self.log_begin, os.SEEK_SET)
@@ -460,7 +460,7 @@ class TarantoolLog(object):
                         raise TarantoolStartError(name)
                 log_str = f.readline()
                 if not log_str:
-                    time.sleep(0.001)
+                    gevent.sleep(0.001)
                     f.seek(cur_pos, os.SEEK_SET)
                     continue
                 if re.findall(msg, log_str):

I guess similar behaviour can be reproduced around other time.sleep(<...>) usages.
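
The effect can be illustrated without gevent. The sketch below uses `asyncio` from the Python standard library as a stand-in for gevent; this substitution is an assumption made only to keep the example self-contained, since test-run itself uses gevent greenlets. The names `poll_blocking`, `poll_cooperative` and `run_with_timeout` are hypothetical. A poller that calls `time.sleep` never yields to the scheduler, so the timeout cannot fire until the poller returns on its own, while a poller that sleeps cooperatively is interrupted on time.

```python
import asyncio
import time


async def poll_blocking(deadline):
    # Blocking sleep: the event loop never regains control,
    # so a pending timeout cannot be delivered.
    while time.monotonic() < deadline:
        time.sleep(0.001)
    return "finished"


async def poll_cooperative(deadline):
    # Cooperative sleep: control returns to the event loop on
    # every iteration, so a pending timeout can interrupt us.
    while time.monotonic() < deadline:
        await asyncio.sleep(0.001)
    return "finished"


async def run_with_timeout(poller, run_for=0.3, timeout=0.05):
    # Ask for a timeout much shorter than the poller's run time.
    deadline = time.monotonic() + run_for
    try:
        return await asyncio.wait_for(poller(deadline), timeout)
    except asyncio.TimeoutError:
        return "timed out"
```

Under these assumptions, `asyncio.run(run_with_timeout(poll_blocking))` should run to completion despite the shorter timeout, while `asyncio.run(run_with_timeout(poll_cooperative))` should be cut off by the timeout.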

@Totktonada

However, it would be good to have a timeout for tarantool instance startup and to report a meaningful error in that case.
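
A minimal sketch of such a startup timeout, assuming a polling loop similar to the one in the patch above. This is not the actual test-run code: the signature, the `sleep` parameter and the return convention are hypothetical. The `sleep` argument is injectable so that a gevent-based caller could pass `gevent.sleep` instead of the blocking default.

```python
import os
import re
import time


def seek_wait(path, pattern, timeout, sleep=time.sleep, poll_interval=0.001):
    """Return True if a line matching `pattern` appears in the file at
    `path` within `timeout` seconds, otherwise False."""
    deadline = time.monotonic() + timeout

    # Wait for the log file to be created, but not forever.
    while not os.path.exists(path):
        if time.monotonic() >= deadline:
            return False
        sleep(poll_interval)

    with open(path, 'r') as f:
        while True:
            cur_pos = f.tell()
            line = f.readline()
            if not line:
                # No new data yet: give up on timeout, otherwise retry
                # from the same position.
                if time.monotonic() >= deadline:
                    return False
                sleep(poll_interval)
                f.seek(cur_pos, os.SEEK_SET)
                continue
            if re.search(pattern, line):
                return True
```

A caller can turn a `False` result into a meaningful error, for example by raising an exception that names the instance and the timeout that expired.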

Totktonada changed the title from "Test hang when not enough rights for user in Tarantool" to "test-run worker hangs, when an instance does not enter to the event loop" Mar 22, 2021
Totktonada added a commit to tarantool/tarantool that referenced this issue Mar 22, 2021
This update fixes a sporadic problem with hanging test-run workers. The
reason is an incorrect garbage collector handler. See [1] for details.

This is not the last test-run problem that leads to a hung worker:
there is at least one more known problem [2].

[1]: tarantool/test-run#275
[2]: tarantool/test-run#276

Part of tarantool/tarantool-qa#96
Totktonada added a commit to tarantool/tarantool that referenced this issue Mar 22, 2021
This update fixes a sporadic problem with hanging test-run workers. The
reason is an incorrect garbage collector handler. See [1] for details.

This is not the last test-run problem that leads to a hung worker:
there is at least one more known problem [2].

[1]: tarantool/test-run#275
[2]: tarantool/test-run#276

Part of tarantool/tarantool-qa#96

(cherry picked from commit 680990a)
kyukhin added the teamQ label Apr 26, 2021
kyukhin added this to the Q2-21 milestone Apr 28, 2021
kyukhin added the 5sp label Apr 30, 2021
VitaliyaIoffe added a commit that referenced this issue May 17, 2021
The tarantool server is considered started when the pattern
'entering the event loop|will retry binding|hot standby mode' is found
in the xlog. If the server hangs, it could be killed after the test
timeout. This patch adds start-server-timeout: the pattern is now
searched for until this timeout expires. If the pattern is not found,
wait_until_started returns False (otherwise True), and
TarantoolServer.start() returns the same value. If an instance hangs,
the preprocessor kills the test.
The default value of start-server-timeout is 90 seconds.

Fixes: #276
VitaliyaIoffe added a commit that referenced this issue May 17, 2021
The tarantool server is considered started when the pattern
'entering the event loop|will retry binding|hot standby mode' is found
in the xlog. If the server hangs, it could be killed after the test
timeout. This patch adds start-server-timeout: the pattern is now
searched for until this timeout expires. If the pattern is not found,
wait_until_started returns False (otherwise True), and
TarantoolServer.start() returns the same value.
The default value of start-server-timeout is 90 seconds.

Fixes: #276
VitaliyaIoffe added a commit that referenced this issue May 17, 2021
Found ASAN error:

[001] +    ok 206 - =================================================================
[001] +==6889==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x604000000031 at pc 0x0000005a72e7 bp 0x7ffe47c30c80 sp 0x7ffe47c30c78
[001] +WRITE of size 1 at 0x604000000031 thread T0
[001] +    #0 0x5a72e6 in mp_store_u8 /tarantool/src/lib/msgpuck/msgpuck.h:258:1
[001] +    #1 0x5a72e6 in mp_encode_uint /tarantool/src/lib/msgpuck/msgpuck.h:1768
[001] +    #2 0x4fa657 in test_mp_print /tarantool/src/lib/msgpuck/test/msgpuck.c:957:16
[001] +    #3 0x509024 in main /tarantool/src/lib/msgpuck/test/msgpuck.c:1331:2
[001] +    #4 0x7f3658fd909a in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2409a)
[001] +    #5 0x41f339 in _start (/tnt/test/unit/msgpack.test+0x41f339)
[001] +
[001] +0x604000000031 is located 0 bytes to the right of 33-byte region [0x604000000010,0x604000000031)
[001] +allocated by thread T0 here:
[001] +    #0 0x4cace3 in malloc (/tnt/test/unit/msgpack.test+0x4cace3)
[001] +    #1 0x4fa5db in test_mp_print /tarantool/src/lib/msgpuck/test/msgpuck.c:945:18
[001] +    #2 0x509024 in main /tarantool/src/lib/msgpuck/test/msgpuck.c:1331:2
[001] +    #3 0x7f3658fd909a in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2409a)
[001] +
[001] +SUMMARY: AddressSanitizer: heap-buffer-overflow /tarantool/src/lib/msgpuck/msgpuck.h:258:1 in mp_store_u8
[001] +Shadow bytes around the buggy address:
[001] +  0x0c087fff7fb0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[001] +  0x0c087fff7fc0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[001] +  0x0c087fff7fd0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[001] +  0x0c087fff7fe0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[001] +  0x0c087fff7ff0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[001] +=>0x0c087fff8000: fa fa 00 00 00 00[01]fa fa fa fa fa fa fa fa fa
[001] +  0x0c087fff8010: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
[001] +  0x0c087fff8020: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
[001] +  0x0c087fff8030: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
[001] +  0x0c087fff8040: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
[001] +  0x0c087fff8050: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
[001] +Shadow byte legend (one shadow byte represents 8 application bytes):
[001] +  Addressable:           00
[001] +  Partially addressable: 01 02 03 04 05 06 07
[001] +  Heap left redzone:       fa
[001] +  Freed heap region:       fd
[001] +  Stack left redzone:      f1
[001] +  Stack mid redzone:       f2
[001] +  Stack right redzone:     f3
[001] +  Stack after return:      f5
[001] +  Stack use after scope:   f8
[001] +  Global redzone:          f9
[001] +  Global init order:       f6
[001] +  Poisoned by user:        f7
[001] +  Container overflow:      fc
[001] +  Array cookie:            ac
[001] +  Intra object redzone:    bb
[001] +  ASan internal:           fe
[001] +  Left alloca redzone:     ca

Investigation showed that the allocated buffer was 33 bytes, but
34 bytes were needed. The fix was to enlarge the buffer for one more
mp_encode_array(1).

Part of tarantool/tarantool#4360

Reviewed-by: Vladislav Shpilevoy <v.shpilevoy@tarantool.org>
test: obuf test refactoring

Added slab_arena_destroy for graceful resources release,
removed global seed value, removed unused value from enum.
Merge pull request #136 from tbeu/patch-1

Update README.rst
test: move unit/ to test/

This virtually reverts commit 436218defd4c284134f59975d4642405bdf2d918
('move unit tests to unit'), that was made in the scope of #106.

Despite the fact that testing of the connector uses `unittest`
framework, it is functional (and integration) testing by its nature:
most of the test cases verify that public API of the connector properly
works with tarantool.

It seems meaningful to locate such test cases in the `test/`
directory, not `unit/`, regardless of the framework used.

Follows up #106.
Add timeout for starting tarantool server

The tarantool server is considered started when the pattern
'entering the event loop|will retry binding|hot standby mode' is found
in the xlog. If the server hangs, it could be killed after the test
timeout. This patch adds start-server-timeout: the pattern is now
searched for until this timeout expires. If the pattern is not found,
wait_until_started returns False (otherwise True), and
TarantoolServer.start() returns the same value.
The default value of start-server-timeout is 90 seconds.

Fixes: #276
RELEASE-NOTES: synced

curl 7.76.0 release
Use rawset() when exporting functions to _G
test: fix directory detection in lua-Harness suite

A test <314-regex.t> uses `arg[0]:find'314'` to determine the name of
the directory where rx_* files are located. This leads to the test
failure, when lua-Harness suite runs in a directory containing "314" in
its name, because the found path doesn't contain the required files.

This patch fixes directory name detection.

Follows up tarantool/tarantool#5844

Reviewed-by: Igor Munkin <imun@tarantool.org>
Reviewed-by: Sergey Ostanevich <sergos@tarantool.org>
Signed-off-by: Igor Munkin <imun@tarantool.org>
Add the chdir option for make

The --chdir flag (with help text) has been added to the make command.
It makes it possible to specify the source directory of the rock when running make.
Merge pull request #2435 from facebook/dev

v1.4.8 hotfix
VitaliyaIoffe added a commit that referenced this issue May 17, 2021
The wait_until_started function in TarantoolServer calls seek_wait,
which waits for a pattern in the log file. If the pattern never
appears, the server hangs. This patch adds start-server-time (90
seconds by default). The pattern is sought until the time runs out;
wait_until_started returns True if the pattern was found (otherwise
False), and TarantoolServer.start() returns the same value. A new log
message reports that the instance was not started.

Fixes: #276
VitaliyaIoffe added a commit that referenced this issue May 21, 2021
`time.sleep` was changed to `gevent.sleep` to allow the current greenlet
to sleep while others run. With time.sleep(), the greenlet context is
not switched from the main process to the test greenlet. As a result,
the main process received no data while tarantool was hanging, and
the suite failed by the common timeout (no output timeout). Using the
greenlet timeout lets the test fail by the test timeout.

Fixes: #276
VitaliyaIoffe added a commit that referenced this issue May 21, 2021
The wait_until_started function in TarantoolServer calls seek_wait,
which waits for a pattern in the log file. If the pattern never
appears, the server hangs. This patch adds start-server-time (90
seconds by default). The pattern is sought until the time runs out;
wait_until_started returns True if the pattern was found (otherwise
False). A new log message reports that the instance was not started.

Fixes: #276
VitaliyaIoffe added a commit that referenced this issue May 31, 2021
`time.sleep` was changed to `gevent.sleep` to allow the current greenlet
to sleep while others run. With time.sleep(), the greenlet context is
not switched from the main process to the test greenlet. As a result,
the main process received no data while tarantool was hanging, and
the suite failed by the common timeout (no output timeout). Using the
greenlet timeout lets the test fail by the test timeout.

Part of #276
VitaliyaIoffe added a commit that referenced this issue May 31, 2021
The wait_until_started function in TarantoolServer calls seek_wait,
which waits for a pattern in the log file. If the pattern never
appears, the server hangs. This patch adds start-server-time (90
seconds by default). The pattern is sought until the time runs out;
wait_until_started returns True if the pattern was found (otherwise
False). A new log message reports that the instance was not started.

Fixes: #276
VitaliyaIoffe added a commit that referenced this issue May 31, 2021
This patch adds another way of checking patterns in the log: the test
sequentially finds the expected patterns and checks that there are no
unexpected lines. For this, TarantoolLog.seek_wait() was changed so
that it can search from a position other than the beginning (for a
sequence of patterns). When a pattern is found, the position of its
last character is saved as the starting point for the next search.

This approach helps to verify the result of a hang. The pytest script
test_hanging_xlog.py executes hang.test.lua, and the test-run log is
compared against the expected patterns. It also checks that all
subprocesses were killed; otherwise, an exception is raised with
information about the remaining processes.

Follows up #276
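
The sequential search described in this commit message can be sketched as follows. This is an illustration only and not the code of TarantoolLog.seek_wait(): the function name `find_in_order` and its return value are hypothetical, and the sketch searches an in-memory string rather than a growing log file. Each search starts where the previous match ended, so patterns must appear in the given order.

```python
import re


def find_in_order(text, patterns, start=0):
    """Find each pattern in `text` in sequence.

    Return the list of end offsets of the matches, or None if some
    pattern does not occur after the previous match.
    """
    pos = start
    ends = []
    for pattern in patterns:
        # Compiled patterns accept a start offset for the search.
        match = re.compile(pattern).search(text, pos)
        if match is None:
            return None
        # The next search begins right after this match.
        pos = match.end()
        ends.append(pos)
    return ends
```

Because the saved position only moves forward, a pattern that occurs in the log before its predecessor is reported as missing rather than matched out of order.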
ylobankov pushed a commit that referenced this issue Feb 10, 2022
The wait_until_started function in TarantoolServer calls seek_wait,
which waits for a pattern in the log file. If the pattern never
appears, the server hangs. This patch adds start-server-time (90
seconds by default). The pattern is sought until the time runs out;
wait_until_started returns True if the pattern was found (otherwise
False). A new log message reports that the instance was not started.

Fixes: #276
ylobankov pushed a commit that referenced this issue Feb 10, 2022
`time.sleep` was changed to `gevent.sleep` to allow the current greenlet
to sleep while others run. When `time.sleep` is used, the greenlet
context is not switched from the main process to the test greenlet.

As a result, the main process receives no data while the tarantool
server process hangs, and the test fails by the common timeout
(NO_OUTPUT_TIMEOUT). Moreover, the process is not killed by test-run.

Using `gevent.sleep` makes the test fail by the test timeout and kills
the tarantool server process.

Part of #276
ylobankov pushed a commit that referenced this issue Feb 10, 2022
When a tarantool server starts, it waits for a special pattern in the
log file to proceed. If there is no pattern present, the server hangs.
After the test timeout runs out, the test will fail.

So this patch adds the `--start-server-timeout` option (90 seconds by
default). Now, when the server hangs and the time runs out,
a comprehensible exception is raised with the message that the server
failed to start within the timeout.

Fixes: #276
ylobankov added a commit that referenced this issue Feb 10, 2022
It was found that processes of non-started tarantool servers are not
killed by test-run and are left hanging. This situation can be
reproduced by creating the main server and then a replica server that
is unable to join the master, for example, due to lack of user
permissions. In this case, the test fails by the server start timeout
and only the main server process is killed. This patch fixes that.

Follows up #276
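
The idea of not leaving a non-started server behind can be sketched as below. This is a hedged illustration under stated assumptions, not the test-run implementation: `start_or_kill`, `ServerStartError` and the `is_ready` callback are hypothetical names, and readiness is left to the caller (in test-run it is detected by a pattern in the server log).

```python
import subprocess
import time


class ServerStartError(Exception):
    """Raised when the process does not become ready in time."""


def start_or_kill(cmd, is_ready, timeout, poll_interval=0.01):
    """Start `cmd` and wait until `is_ready(proc)` returns True.

    If the process is not ready within `timeout` seconds (or exits
    early), make sure it is not left running and raise an error.
    """
    proc = subprocess.Popen(cmd)
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if is_ready(proc):
            return proc
        if proc.poll() is not None:
            # The process has already exited on its own.
            break
        time.sleep(poll_interval)

    # Startup failed: kill the process if it is still alive and
    # reap it, so that no hanging process remains.
    if proc.poll() is None:
        proc.kill()
    proc.wait()
    raise ServerStartError(
        "process failed to start within {} seconds".format(timeout))
```

The same cleanup would need to be applied to every instance a test has created, not only to the main one, which is the gap this commit message describes.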
ylobankov pushed a commit that referenced this issue Feb 11, 2022
`time.sleep` was changed to `gevent.sleep` to allow the current greenlet
to sleep while others run. When `time.sleep` is used, the greenlet
context is not switched from the main process to the test's greenlet.

As a result, the main process receives no data while the tarantool
server process hangs, and the test fails by the common timeout
(NO_OUTPUT_TIMEOUT). Even worse, the tarantool server process is not
killed by test-run.

Using `gevent.sleep` makes the test fail by the test timeout and kills
the tarantool server process.

Part of #276
ylobankov pushed a commit that referenced this issue Feb 11, 2022
When a tarantool server starts, it waits for a special pattern in the
log file to proceed. If there is no pattern present, the server hangs.
After the test timeout (TEST_TIMEOUT) runs out, the test fails.

This patch adds the `--server-start-timeout` option to test-run (90
seconds by default). Now, when the server hangs and the
time (SERVER_START_TIMEOUT) runs out, a comprehensible exception is
raised with the message that the server failed to start within the
timeout.

Fixes: #276
ylobankov added a commit that referenced this issue Feb 11, 2022
It was found that processes of non-started tarantool servers are not
killed by test-run and are left hanging. This situation can be reproduced
by creating the main server, then creating a replica server, but the
replica server is unable to join the master, for example, due to lack
of user permissions. In this case, the test fails by the server start
timeout and test-run kills the main server process only. This patch
fixes the issue.

Follows up #276
ylobankov pushed a commit that referenced this issue Feb 18, 2022
`time.sleep` was changed to `gevent.sleep` to allow the current greenlet
to sleep while others run. When `time.sleep` is used, the greenlet
context is not switched from the main process to the test's greenlet.

As a result, the main process receives no data while the tarantool
server process hangs, and the test fails by the common timeout
(NO_OUTPUT_TIMEOUT). Even worse, the tarantool server process is not
killed by test-run.

Using `gevent.sleep` makes the test fail by the test timeout and kills
the tarantool server process.

Part of #276
ylobankov pushed a commit that referenced this issue Feb 18, 2022
When a tarantool server starts, it waits for a special pattern in the
log file to proceed. If there is no pattern present, the server hangs.
After the test timeout (TEST_TIMEOUT) runs out, the test fails.

This patch adds the `--server-start-timeout` option to test-run (90
seconds by default). Now, when the server hangs and the
time (SERVER_START_TIMEOUT) runs out, a comprehensible exception is
raised with the message that the server failed to start within the
timeout.

Fixes: #276
ylobankov added a commit that referenced this issue Feb 18, 2022
It was found that processes of non-started tarantool servers are not
killed by test-run and are left hanging. This situation can be reproduced
by creating the main server, then creating a replica server, but the
replica server is unable to join the master, for example, due to lack
of user permissions. In this case, the test fails by the server start
timeout and test-run kills the main server process only. This patch
fixes the issue.

Follows up #276
ylobankov added a commit that referenced this issue Feb 18, 2022
It was found that processes of non-started tarantool servers are not
killed by test-run and are left hanging. This situation can be reproduced
by creating the main server, then creating a replica server, but the
replica server is unable to join the master, for example, due to lack
of user permissions. In this case, the test fails by the server start
timeout and test-run kills the main server process only. This patch
fixes the issue.

Fixes #256
Follows up #276
ylobankov pushed a commit that referenced this issue Mar 10, 2022
When a tarantool server starts, it waits for a special pattern in the
log file to proceed. If there is no pattern present, the server hangs.
After the test timeout (TEST_TIMEOUT) runs out, the test fails.

This patch adds the `--server-start-timeout` option to test-run (90
seconds by default). Now, when the server hangs and the
time (SERVER_START_TIMEOUT) runs out, a comprehensible exception is
raised with the message that the server failed to start within the
timeout.

Fixes: #276
ylobankov added a commit that referenced this issue Mar 10, 2022
It was found that processes of non-started tarantool servers are not
killed by test-run and are left hanging. This situation can be reproduced
by creating the main server, then creating a replica server, but the
replica server is unable to join the master, for example, due to lack
of user permissions. In this case, the test fails by the server start
timeout and test-run kills the main server process only. This patch
fixes the issue.

Fixes #256
Follows up #276
ylobankov added a commit that referenced this issue Mar 10, 2022
This patch adds a simple unit test checking that if a tarantool server
failed to start within a certain number of seconds, test-run raises a
comprehensible exception and kills the server process.

Follows up #256
Follows up #276
ylobankov added a commit that referenced this issue Mar 11, 2022
This patch adds a simple unit test checking that if a tarantool server
failed to start within a certain number of seconds, test-run raises a
comprehensible exception and kills the server process.

Follows up #256
Follows up #276
ylobankov pushed a commit that referenced this issue Mar 15, 2022
`time.sleep` was changed to `gevent.sleep` to allow the current greenlet
to sleep while others run. When `time.sleep` is used, the greenlet
context is not switched from the main process to the test's greenlet.

As a result, the main process receives no data while the tarantool
server process hangs, and the test fails by the common timeout
(NO_OUTPUT_TIMEOUT). Even worse, the tarantool server process is not
killed by test-run.

Using `gevent.sleep` makes the test fail by the test timeout and kills
the tarantool server process.

Part of #276
ylobankov pushed a commit that referenced this issue Mar 15, 2022
When a tarantool server starts, it waits for a special pattern in the
log file to proceed. If there is no pattern present, the server hangs.
After the test timeout (TEST_TIMEOUT) runs out, the test fails.

This patch adds the `--server-start-timeout` option to test-run (90
seconds by default). Now, when the server hangs and the
time (SERVER_START_TIMEOUT) runs out, a comprehensible exception is
raised with the message that the server failed to start within the
timeout.

Fixes: #276
ylobankov added a commit that referenced this issue Mar 15, 2022
It was found that processes of non-started tarantool servers are not
killed by test-run and are left hanging. This situation can be reproduced
by creating the main server, then creating a replica server, but the
replica server is unable to join the master, for example, due to lack
of user permissions. In this case, the test fails by the server start
timeout and test-run kills the main server process only. This patch
fixes the issue.

Fixes #256
Follows up #276
ylobankov added a commit that referenced this issue Mar 15, 2022
This patch adds a simple unit test checking that if a tarantool server
failed to start within a certain number of seconds, test-run raises a
comprehensible exception and kills the server process.

Follows up #256
Follows up #276
ylobankov added a commit that referenced this issue Mar 17, 2022
It was found that hanging processes of non-started tarantool servers
are not killed by test-run and are left hanging. This situation can be
reproduced by creating the main server, then creating a replica server,
but the replica server is unable to join the master, for example, due
to lack of user permissions. In this case, the test fails by the server
start timeout and test-run kills the main server process only.
This patch fixes the issue.

Fixes #256
Follows up #276