Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Graceful shutdown exit before shutdown timeout expiration #9411

Closed
nshy opened this issue Nov 27, 2023 · 0 comments · Fixed by #9415
Closed

Graceful shutdown exit before shutdown timeout expiration #9411

nshy opened this issue Nov 27, 2023 · 0 comments · Fixed by #9415
Assignees
Labels
2.11 Target is 2.11 and all newer release/master branches bug Something isn't working

Comments

@nshy
Copy link
Contributor

nshy commented Nov 27, 2023

Tarantool version is 3.0.0-beta1-11-g3ee68d8ba.

Repro script:

local fiber = require('fiber')

box.ctl.on_shutdown(function()
    fiber.sleep(0.2)
    print('shutdown callback finished')
end, nil)
box.ctl.set_on_shutdown_timeout(1)

fiber.create(function()
    os.exit(0)
end)

fiber.sleep(0.1)

The script is expected to print shutdown callback finished but it does not.

@nshy nshy added the bug Something isn't working label Nov 27, 2023
@nshy nshy self-assigned this Nov 27, 2023
nshy added a commit to nshy/tarantool that referenced this issue Nov 27, 2023
Graceful shutdown is done in a special fiber which is started for
example on SIGTERM. So it can run concurrently with fiber executing
Tarantool init script. On init fiber exit we break event loop to pass
control back to the Tarantool initialization code. But we fail to run
event loop a bit more to finish graceful shutdown.

The test is a bit contrived. A more real world case is when Tarantool is
termintated during lingering box.cfg().

Close tarantool#9411

NO_DOC=bugfix
nshy added a commit to nshy/tarantool that referenced this issue Nov 27, 2023
Graceful shutdown is done in a special fiber which is started for
example on SIGTERM. So it can run concurrently with fiber executing
Tarantool init script. On init fiber exit we break event loop to pass
control back to the Tarantool initialization code. But we fail to run
event loop a bit more to finish graceful shutdown.

The test is a bit contrived. A more real world case is when Tarantool is
termintated during lingering box.cfg().

Close tarantool#9411

NO_DOC=bugfix
locker pushed a commit that referenced this issue Nov 28, 2023
Graceful shutdown is done in a special fiber which is started for
example on SIGTERM. So it can run concurrently with fiber executing
Tarantool init script. On init fiber exit we break event loop to pass
control back to the Tarantool initialization code. But we fail to run
event loop a bit more to finish graceful shutdown.

The test is a bit contrived. A more real world case is when Tarantool is
termintated during lingering box.cfg().

Close #9411

NO_DOC=bugfix
@locker locker added the 2.11 Target is 2.11 and all newer release/master branches label Nov 28, 2023
locker pushed a commit that referenced this issue Nov 28, 2023
Graceful shutdown is done in a special fiber which is started for
example on SIGTERM. So it can run concurrently with fiber executing
Tarantool init script. On init fiber exit we break event loop to pass
control back to the Tarantool initialization code. But we fail to run
event loop a bit more to finish graceful shutdown.

The test is a bit contrived. A more real world case is when Tarantool is
termintated during lingering box.cfg().

Close #9411

NO_DOC=bugfix

(cherry picked from commit 786eb2a)
Totktonada added a commit to Totktonada/tarantool that referenced this issue Dec 13, 2023
I got four fails on the given tests in a row on debug-asan job in CI for
tarantool-ee.

It seems, tarantool-ee is more sensitive to small timeouts, when the
address sanitizer slows down the execution. Or I'm just lucky.

Anyway, the given tests don't really need small timeouts: increasing it
doesn't break any test logic, doesn't increase duration of the test in a
successful case and doesn't increase it in case of a failure.

The tests are more stable after the change: I verified it locally by
running each of the tests in parallel many times on tarantool built with
enabled address sanitizer.

See the following commits for details about the given test cases and the
problems behind.

* commit 1fcfb8c ("app: start init script event loop explicitly")
* commit 786eb2a ("main: don't break graceful shutdown on init
  script exit")

Follows up tarantool#9266
Follows up tarantool#9411

NO_DOC=test adjustment
NO_CHANGELOG=see NO_DOC
Totktonada added a commit that referenced this issue Dec 14, 2023
I got four fails on the given tests in a row on debug-asan job in CI for
tarantool-ee.

It seems, tarantool-ee is more sensitive to small timeouts, when the
address sanitizer slows down the execution. Or I'm just lucky.

Anyway, the given tests don't really need small timeouts: increasing it
doesn't break any test logic, doesn't increase duration of the test in a
successful case and doesn't increase it in case of a failure.

The tests are more stable after the change: I verified it locally by
running each of the tests in parallel many times on tarantool built with
enabled address sanitizer.

See the following commits for details about the given test cases and the
problems behind.

* commit 1fcfb8c ("app: start init script event loop explicitly")
* commit 786eb2a ("main: don't break graceful shutdown on init
  script exit")

Follows up #9266
Follows up #9411

NO_DOC=test adjustment
NO_CHANGELOG=see NO_DOC
sergepetrenko pushed a commit to sergepetrenko/tarantool that referenced this issue Feb 19, 2024
I got four fails on the given tests in a row on debug-asan job in CI for
tarantool-ee.

It seems, tarantool-ee is more sensitive to small timeouts, when the
address sanitizer slows down the execution. Or I'm just lucky.

Anyway, the given tests don't really need small timeouts: increasing it
doesn't break any test logic, doesn't increase duration of the test in a
successful case and doesn't increase it in case of a failure.

The tests are more stable after the change: I verified it locally by
running each of the tests in parallel many times on tarantool built with
enabled address sanitizer.

See the following commits for details about the given test cases and the
problems behind.

* commit 1fcfb8c ("app: start init script event loop explicitly")
* commit 786eb2a ("main: don't break graceful shutdown on init
  script exit")

Follows up tarantool#9266
Follows up tarantool#9411

NO_DOC=test adjustment
NO_CHANGELOG=see NO_DOC

(cherry picked from commit dfca3c6)
sergepetrenko pushed a commit that referenced this issue Feb 19, 2024
I got four fails on the given tests in a row on debug-asan job in CI for
tarantool-ee.

It seems, tarantool-ee is more sensitive to small timeouts, when the
address sanitizer slows down the execution. Or I'm just lucky.

Anyway, the given tests don't really need small timeouts: increasing it
doesn't break any test logic, doesn't increase duration of the test in a
successful case and doesn't increase it in case of a failure.

The tests are more stable after the change: I verified it locally by
running each of the tests in parallel many times on tarantool built with
enabled address sanitizer.

See the following commits for details about the given test cases and the
problems behind.

* commit 1fcfb8c ("app: start init script event loop explicitly")
* commit 786eb2a ("main: don't break graceful shutdown on init
  script exit")

Follows up #9266
Follows up #9411

NO_DOC=test adjustment
NO_CHANGELOG=see NO_DOC

(cherry picked from commit dfca3c6)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
2.11 Target is 2.11 and all newer release/master branches bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants