Error in net.box on_disconnect trigger hangs server #9797

Closed
drewdzzz opened this issue Feb 28, 2024 · 1 comment · Fixed by #9838
Labels: 2.11 (Target is 2.11 and all newer release/master branches), bug (Something isn't working), netbox

drewdzzz commented Feb 28, 2024

Consider a simple test:

local server = require('luatest.server')
local t = require('luatest')
local net = require('net.box')

local g = t.group()

g.test_net_box_hang = function()
    g.server = server:new()
    g.server:start()

    local conn = net.connect(g.server.net_box_uri, {wait_connected = true})
    local errmsg = 'net.box on_disconnect error'
    conn:on_disconnect(function() error(errmsg) end)
    
    -- Call connection close to receive an error
    t.assert_error_msg_content_equals(errmsg, conn.close, conn)

    g.server:drop()
end

This test fails with the following error: Timed out to wait for "process is terminated" condition for server (alias: server, workdir: server-CKeEWskegH-t, pid: 39893) within 60s. The error is reported from the line g.server:drop(). If I comment out the line that closes the connection, the test passes.

I suspect that luatest causes the problem, because the same scenario on bare Tarantool, with the server stopped by SIGTERM (ctrl + C), does not hang.

My specs: macOS 11, Darwin Kernel Version 20.6.0
Tarantool is built from sources at this commit, test-run and luatest commits can be found there.
The test was run using the test-run utility from Tarantool.

@drewdzzz drewdzzz added the bug Something isn't working label Feb 28, 2024

CuriousGeorgiy commented Mar 12, 2024

I get the following error messages in the logs, when reproducing this issue on two bare servers (one with box, one with net.box):

-- server
box.cfg{listen = 3301}
require('console').start()

-- client
c = require('net.box').connect(3301)
c:on_disconnect(function() error(777) end)
pcall(function() c:close() end)
require('console').start()

2024-03-12 14:43:59.173 [87145] main/115/iproto.shutdown I> tx_binary: stopped
2024-03-12 14:44:02.173 [87145] main/103/on_shutdown fiber.c:804 E> TimedOut: timed out
2024-03-12 14:44:02.173 [87145] main/103/on_shutdown main.cc:156 E> on_shutdown triggers failed
2024-03-12 14:44:02.173 [87145] main/103/on_shutdown on_shutdown.c:273 E> TimedOut: timed out
tarantool> 2024-03-12 14:44:02.176 [87145] main I> fiber `sched' has been cancelled
2024-03-12 14:44:02.176 [87145] main I> fiber `sched': exiting
2024-03-12 14:44:02.176 [87145] main F> fatal error, exiting the event loop

If I set a longer shutdown timeout on the server with box.ctl.set_on_shutdown_timeout(600), the server hangs. Apparently, this is a Tarantool bug.

@CuriousGeorgiy CuriousGeorgiy transferred this issue from tarantool/luatest Mar 12, 2024
@CuriousGeorgiy CuriousGeorgiy self-assigned this Mar 18, 2024
CuriousGeorgiy added a commit to CuriousGeorgiy/tarantool that referenced this issue Mar 20, 2024
According to the documentation [1]:
> If the trigger function causes an error, the error is logged but
otherwise is ignored.

However, currently, the `on_disconnect` trigger behaves the same way as the
`on_connect` trigger, i.e., the connection is terminated and its state
changes to 'error'. Let's fix this inconsistency and log errors from the
`on_disconnect` trigger, but otherwise ignore them.

Closes tarantool#9677
Closes tarantool#9797

NO_DOC=<bugfix>

1. https://www.tarantool.io/en/doc/latest/reference/reference_lua/net_box/#lua-function.conn.on_disconnect
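
The behavior described in the commit message above can be illustrated with a minimal sketch in plain Lua. This is not the actual net.box implementation: `run_triggers` and the logger callback are hypothetical names used only for illustration. Each trigger runs under `pcall`, a failure is handed to the logger, and the remaining triggers still run.

```lua
-- Hypothetical helper, not part of net.box: run every trigger, log
-- failures, and never propagate them to the caller.
local function run_triggers(triggers, log_error, ...)
    for _, trigger in ipairs(triggers) do
        local ok, err = pcall(trigger, ...)
        if not ok then
            log_error(err)
        end
    end
end

-- Usage: the first trigger fails, the second one still runs.
local logged = {}
local calls = 0
run_triggers({
    function() error('net.box on_disconnect error', 0) end,
    function() calls = calls + 1 end,
}, function(err) logged[#logged + 1] = tostring(err) end)
```

With this shape, the caller of the triggers (in the issue, the connection close path) is not interrupted by a failing callback, which matches the documented "logged but otherwise ignored" wording.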
CuriousGeorgiy added a commit to CuriousGeorgiy/tarantool that referenced this issue Mar 23, 2024
CuriousGeorgiy added a commit to CuriousGeorgiy/tarantool that referenced this issue Mar 30, 2024
@sergepetrenko sergepetrenko added the 2.11 Target is 2.11 and all newer release/master branches label Apr 2, 2024
sergepetrenko pushed a commit to sergepetrenko/tarantool that referenced this issue Apr 2, 2024
(cherry picked from commit 1d6d6a3)
sergepetrenko pushed a commit that referenced this issue Apr 3, 2024
(cherry picked from commit 1d6d6a3)