Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Active connection is closed, when Lua GC is invoked #9629

Closed
Serpentian opened this issue Jan 29, 2024 · 1 comment · Fixed by #9847
Closed

Active connection is closed, when Lua GC is invoked #9629

Serpentian opened this issue Jan 29, 2024 · 1 comment · Fixed by #9847
Assignees
Labels
2.11 Target is 2.11 and all newer release/master branches bug Something isn't working netbox

Comments

@Serpentian
Copy link
Contributor

Bug description

The connection is closed, when it's active but it doesn't have any strong references. It's caused by Lua GC. Related: tarantool/vshard#469

  • OS: Linux
  • OS Version: NixOS
  • Architecture: amd64
Tarantool 3.1.0-entrypoint-51-g47940311e
Target: Linux-x86_64-Debug
Build options: cmake . -DCMAKE_INSTALL_PREFIX=/var/empty/local -DENABLE_BACKTRACE=FALSE
Compiler: GNU-12.3.0
C_FLAGS: -fexceptions -funwind-tables -fasynchronous-unwind-tables -fno-common -msse2 -Wformat -Wformat-security -Werror=format-security -fstack-protector-strong -fPIC -fmacro-prefix-map=/home/serpentian/Programming/tnt/tarantool=. -std=c11 -Wall -Wextra -Wno-gnu-alignof-expression -fno-gnu89-inline -Wno-cast-function-type -Werror -g -ggdb -O0 
CXX_FLAGS: -fexceptions -funwind-tables -fasynchronous-unwind-tables -fno-common -msse2 -Wformat -Wformat-security -Werror=format-security -fstack-protector-strong -fPIC -fmacro-prefix-map=/home/serpentian/Programming/tnt/tarantool=. -std=c++11 -Wall -Wextra -Wno-invalid-offsetof -Wno-gnu-alignof-expression -Wno-cast-function-type -Werror -g -ggdb **-O0**

Steps to reproduce

local t = require('luatest')
local server = require('luatest.server')
local nb = require('net.box')
local g = t.group()

g.before_all = function(lg)
    lg.master = server:new({alias = 'master'})
    lg.master:start()
    lg.master:exec(function()
        rawset(_G, 'echo', function(arg)
            return arg
        end)
    end)
end

g.after_all = function(lg)
    lg.master:drop()
end

g.test_conn_not_deleted_without_ref = function(lg)
    local a = string.rep('a', 1024 * 1024 * 500)
    local opts = {is_async = true}
    local f = nb.connect(lg.master.net_box_uri):call('_G.echo', {a}, opts)
    collectgarbage()
    local res, err = f:wait_result(10)
    t.assert_equals(err, nil)
    t.assert_equals(#res[1], #a)
end

Actual behavior

[001] not ok 1	replication-luatest.lua_gc.test_conn_not_deleted_without_ref
[001] #   ...g/tnt/tarantool/test/replication-luatest/lua_gc_test.lua:26: expected: nil, actual: Connection closed
[001] #   stack traceback:
[001] #   	...g/tnt/tarantool/test/replication-luatest/lua_gc_test.lua:26: in function 'replication-luatest.lua_gc.test_conn_not_deleted_without_ref'
[001] #   	...
[001] #   	[C]: in function 'xpcall'
[001] # Ran 1 tests in 1.849 seconds, 0 succeeded, 1 failed

If you'll delete collectgarbage the test will probably pass. If you'll create ref to conn, test also passes e.g:

diff --git a/test/replication-luatest/lua_gc_test.lua b/test/replication-luatest/lua_gc_test.lua
index 77052385c..0fda9f62e 100644
--- a/test/replication-luatest/lua_gc_test.lua
+++ b/test/replication-luatest/lua_gc_test.lua
@@ -20,7 +20,8 @@ end
 g.test_conn_not_deleted_without_ref = function(lg)
     local a = string.rep('a', 1024 * 1024 * 500)
     local opts = {is_async = true}
-    local f = nb.connect(lg.master.net_box_uri):call('_G.echo', {a}, opts)
+    local conn = nb.connect(lg.master.net_box_uri)
+    local f = conn:call('_G.echo', {a}, opts)
     collectgarbage()
     local res, err = f:wait_result(10)
     t.assert_equals(err, nil)

Expected behavior

Connection should be active while request is done.

@Serpentian Serpentian added the bug Something isn't working label Jan 29, 2024
Serpentian added a commit to Serpentian/vshard that referenced this issue Jan 29, 2024
Router pings replicas and if they cannot respond in
failover_ping_timeout, it detaches connection from this replica.
The test checks, that connection is not closed, when long response is
done, even if failover_ping_timeout is small.

However, when the connection is detached, it can be garbage collected,
which leads to 'Connection closed' error. Let's stop Lua GC for the
connection by creating reference to it.

Related to tarantool/tarantool#9629

NO_DOC=test
Gerold103 pushed a commit to tarantool/vshard that referenced this issue Jan 30, 2024
Router pings replicas and if they cannot respond in
failover_ping_timeout, it detaches connection from this replica.
The test checks, that connection is not closed, when long response is
done, even if failover_ping_timeout is small.

However, when the connection is detached, it can be garbage collected,
which leads to 'Connection closed' error. Let's stop Lua GC for the
connection by creating reference to it.

Related to tarantool/tarantool#9629

NO_DOC=test
@sergos
Copy link
Contributor

sergos commented Feb 2, 2024

Apparently future object should keep a reference to the connection.

CuriousGeorgiy added a commit to CuriousGeorgiy/tarantool that referenced this issue Mar 21, 2024
In order to prevent the garbage collection of the discarded connection,
asynchronous requests must reference the connection object. We must
reference the connection object rather than the transport object, because
our garbage collection hook is attached to the former.

Closes tarantool#9629

NO_DOC=<bugfix>
CuriousGeorgiy added a commit to CuriousGeorgiy/tarantool that referenced this issue Mar 21, 2024
In order to prevent the garbage collection of the discarded connection,
asynchronous requests must reference the connection object. We must
reference the connection object rather than the transport object, because
our garbage collection hook is attached to the former.

Closes tarantool#9629

NO_DOC=<bugfix>
CuriousGeorgiy added a commit to CuriousGeorgiy/tarantool that referenced this issue Mar 23, 2024
In order to prevent the garbage collection of the discarded connection,
asynchronous requests must reference the connection object. We must
reference the connection object rather than the transport object, because
our garbage collection hook is attached to the former.

Closes tarantool#9629

NO_DOC=<bugfix>
CuriousGeorgiy added a commit to CuriousGeorgiy/tarantool that referenced this issue Mar 27, 2024
In order to prevent the garbage collection of the discarded connection,
asynchronous requests must reference the connection object. We must
reference the connection object rather than the transport object, because
our garbage collection hook is attached to the former.

Closes tarantool#9629

NO_DOC=<bugfix>
@sergepetrenko sergepetrenko added the 2.11 Target is 2.11 and all newer release/master branches label Mar 27, 2024
sergepetrenko pushed a commit that referenced this issue Mar 27, 2024
In order to prevent the garbage collection of the discarded connection,
asynchronous requests must reference the connection object. We must
reference the connection object rather than the transport object, because
our garbage collection hook is attached to the former.

Closes #9629

NO_DOC=<bugfix>
sergepetrenko pushed a commit to sergepetrenko/tarantool that referenced this issue Mar 27, 2024
In order to prevent the garbage collection of the discarded connection,
asynchronous requests must reference the connection object. We must
reference the connection object rather than the transport object, because
our garbage collection hook is attached to the former.

Closes tarantool#9629

NO_DOC=<bugfix>

(cherry picked from commit fb5bf51)
sergepetrenko pushed a commit to sergepetrenko/tarantool that referenced this issue Mar 27, 2024
In order to prevent the garbage collection of the discarded connection,
asynchronous requests must reference the connection object. We must
reference the connection object rather than the transport object, because
our garbage collection hook is attached to the former.

Closes tarantool#9629

NO_DOC=<bugfix>

(cherry picked from commit fb5bf51)
sergepetrenko pushed a commit that referenced this issue Mar 28, 2024
In order to prevent the garbage collection of the discarded connection,
asynchronous requests must reference the connection object. We must
reference the connection object rather than the transport object, because
our garbage collection hook is attached to the former.

Closes #9629

NO_DOC=<bugfix>

(cherry picked from commit fb5bf51)
sergepetrenko pushed a commit that referenced this issue Mar 28, 2024
In order to prevent the garbage collection of the discarded connection,
asynchronous requests must reference the connection object. We must
reference the connection object rather than the transport object, because
our garbage collection hook is attached to the former.

Closes #9629

NO_DOC=<bugfix>

(cherry picked from commit fb5bf51)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
2.11 Target is 2.11 and all newer release/master branches bug Something isn't working netbox
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants