Assertion in box_wait_limbo_acked #9235

Closed
Astronomax opened this issue Oct 9, 2023 · 0 comments · Fixed by #9379
Labels
2.10 Target is 2.10 and all newer release/master branches bug Something isn't working crash replication

Astronomax commented Oct 9, 2023

Bug description
Currently, the following assertion in the box_wait_limbo_acked function can fail:

assert(last_entry->lsn > 0);
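
To illustrate what this assertion expects, here is a minimal toy model in C. It is only a sketch under stated assumptions, not Tarantool's actual code: every `toy_*` name is hypothetical, and the idea that an entry can sit in the limbo without yet being handed to the WAL is an assumption drawn from the reproducer below (which sets `wal_queue_max_size=1`) and from the commit messages, which state that the last limbo entry's LSN should be positive after `wal_sync`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy model; these are NOT Tarantool's real structures. */
enum { TOY_LIMBO_CAPACITY = 8 };

struct toy_entry {
	int64_t lsn; /* -1 until the toy WAL assigns an LSN. */
};

struct toy_limbo {
	struct toy_entry entries[TOY_LIMBO_CAPACITY];
	size_t len;       /* Number of entries in the limbo. */
	size_t submitted; /* entries[0..submitted) were handed to the toy WAL. */
	int64_t next_lsn; /* Next LSN the toy WAL will assign. */
};

static void
toy_limbo_init(struct toy_limbo *limbo)
{
	limbo->len = 0;
	limbo->submitted = 0;
	limbo->next_lsn = 1;
}

/* Append an entry whose LSN is not known yet. */
static void
toy_limbo_append(struct toy_limbo *limbo)
{
	assert(limbo->len < TOY_LIMBO_CAPACITY);
	limbo->entries[limbo->len++].lsn = -1;
}

/* Hand every entry currently in the limbo to the toy WAL. */
static void
toy_wal_submit_all(struct toy_limbo *limbo)
{
	limbo->submitted = limbo->len;
}

/* Assign LSNs only to entries that were already handed to the toy WAL. */
static void
toy_wal_sync(struct toy_limbo *limbo)
{
	for (size_t i = 0; i < limbo->submitted; i++) {
		if (limbo->entries[i].lsn < 0)
			limbo->entries[i].lsn = limbo->next_lsn++;
	}
}

/* The condition the assertion checks, expressed as a predicate. */
static bool
toy_last_entry_has_lsn(const struct toy_limbo *limbo)
{
	return limbo->len > 0 && limbo->entries[limbo->len - 1].lsn > 0;
}

/* Return whether the invariant holds after the sync in this toy run. */
static bool
toy_scenario(bool submit_last_entry)
{
	struct toy_limbo limbo;
	toy_limbo_init(&limbo);
	toy_limbo_append(&limbo);
	toy_wal_submit_all(&limbo);
	/* A second entry arrives but may still be queued before the WAL. */
	toy_limbo_append(&limbo);
	if (submit_last_entry)
		toy_wal_submit_all(&limbo);
	toy_wal_sync(&limbo);
	return toy_last_entry_has_lsn(&limbo);
}
```

In this model, `toy_scenario(true)` satisfies the invariant, while `toy_scenario(false)` leaves the last entry with `lsn == -1`, which is the kind of state the assertion rejects. Whether this matches the exact sequence inside Tarantool is not confirmed by this report.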

Steps to reproduce
Run the following Lua test:

local t = require('luatest')
local cluster = require('luatest.replica_set')
local proxy = require('luatest.replica_proxy')
local server = require('luatest.server')

local g = t.group('assertion-in-box-wait-limbo-acked')
--
-- gh-9235:
-- Assertion in box_wait_limbo_acked.
--
local wait_timeout = 10

local function wait_pair_sync(server1, server2)
    -- Without retrying it fails sometimes when vclocks are empty and both
    -- instances are in 'connect' state instead of 'follow'.
    t.helpers.retrying({timeout = wait_timeout}, function()
        server1:wait_for_vclock_of(server2)
        server2:wait_for_vclock_of(server1)
        server1:assert_follows_upstream(server2:get_instance_id())
        server2:assert_follows_upstream(server1:get_instance_id())
    end)
end

local function server_wait_wal_is_blocked(server)
    server:exec(function(wait_timeout)
        t.helpers.retrying({timeout = wait_timeout}, function()
            t.assert(box.error.injection.get('ERRINJ_WAL_DELAY'))
        end)
    end, {wait_timeout})
end

local function server_wait_synchro_queue_len_is_equal(server, expected)
    server:exec(function(expected, wait_timeout)
        t.helpers.retrying({timeout = wait_timeout}, function(expected)
            t.assert_equals(box.info.synchro.queue.len, expected)
        end, expected)
    end, {expected, wait_timeout})
end

local function server_becomes_the_leader_again(server)
    local prev_cfg = server:exec(function()
        local prev_cfg = box.cfg
        box.cfg{
            election_mode='candidate',
            replication_synchro_quorum=1
        }
        return prev_cfg
    end)
    server:wait_until_election_leader_found()
    server:exec(function(prev_cfg)
        t.assert_equals(box.info.election.leader, box.info.id)
        box.cfg{
            election_mode=prev_cfg.election_mode,
            replication_synchro_quorum=prev_cfg.replication_synchro_quorum,
        }
    end, {prev_cfg})
end

g.before_each(function(cg)
    cg.cluster = cluster:new({})
    cg.master = cg.cluster:build_and_add_server({
        alias = 'master',
        box_cfg = {
            replication = {
                server.build_listen_uri('master', cg.cluster.id),
                server.build_listen_uri('replica', cg.cluster.id),
            },
            election_mode = 'candidate',
            replication_synchro_quorum = 2,
            replication_synchro_timeout = 100000,
        }
    })
    cg.replica = cg.cluster:build_and_add_server({
        alias = 'replica',
        box_cfg = {
            replication = {
                server.build_listen_uri('replica', cg.cluster.id),
                server.build_listen_uri('master', cg.cluster.id),
            },
            election_mode = 'off',
            replication_synchro_quorum = 2,
            replication_synchro_timeout = 100000,
        }
    })
    cg.cluster:start()
    cg.master:wait_until_election_leader_found()
    cg.replica:wait_until_election_leader_found()
    cg.master:exec(function()
        box.schema.space.create('test', {is_sync = true})
        box.space.test:create_index('pk')
    end)
    wait_pair_sync(cg.replica, cg.master)
end)

g.after_each(function(cg)
    cg.cluster:drop()
end)

g.test_assert_last_entry_lsn_is_positive = function(cg)
    local f = cg.replica:exec(function()
        box.error.injection.set('ERRINJ_WAL_DELAY_COUNTDOWN', 0)
        local f = require('fiber').create(function() box.ctl.promote() end)
        box.cfg{wal_queue_max_size=1}
        f:set_joinable(true)
        return f:id()
    end)
    server_wait_wal_is_blocked(cg.replica)
    cg.master:exec(function()
        require('fiber').create(function()
            box.begin()
            require('fiber').create(function() box.space.test:insert{1} end)
            require('fiber').create(function() box.space.test:insert{2} end)
            box.commit()
        end)
    end)
    server_wait_synchro_queue_len_is_equal(cg.replica, 1)
    cg.replica:exec(function()
        box.error.injection.set('ERRINJ_WAL_DELAY_COUNTDOWN', 0)
        box.error.injection.set('ERRINJ_WAL_DELAY', false)
    end)
    server_wait_wal_is_blocked(cg.replica)
    cg.replica:exec(function(f)
        box.error.injection.set('ERRINJ_WAL_DELAY', false)
        require('fiber').find(f):join()
    end, {f})
    cg.master:exec(function()
        box.cfg{replication_synchro_quorum=1}
    end)
    server_wait_synchro_queue_len_is_equal(cg.replica, 0)
end

Actual behavior

The assertion fails.

Expected behavior

The assertion does not fail.

@Astronomax Astronomax added crash bug Something isn't working replication labels Oct 9, 2023
@Astronomax Astronomax self-assigned this Nov 3, 2023
Astronomax added a commit to Astronomax/tarantool that referenced this issue Nov 17, 2023
Before this patch there was an execution sequence in which the
assertion in box_wait_limbo_acked would fail. The assertion is that
the lsn of the last entry in limbo is always positive after wal_sync.
Fix it.

Closes tarantool#9235

NO_DOC=bugfix
Astronomax added twelve more commits with the same message to Astronomax/tarantool that referenced this issue: one on Nov 20, four on Nov 23, one on Nov 29, one on Dec 1, and five on Dec 6, 2023.
@sergepetrenko sergepetrenko added the 2.10 Target is 2.10 and all newer release/master branches label Dec 11, 2023
sergepetrenko pushed a commit that referenced this issue Dec 12, 2023
Before this patch there was an execution sequence in which the
assertion in box_wait_limbo_acked would fail. The assertion is that
the lsn of the last entry in limbo is always positive after wal_sync.
Fix it.

Closes #9235

NO_DOC=bugfix
sergepetrenko pushed a commit to sergepetrenko/tarantool that referenced this issue Dec 12, 2023
Before this patch there was an execution sequence in which the
assertion in box_wait_limbo_acked would fail. The assertion is that
the lsn of the last entry in limbo is always positive after wal_sync.
Fix it.

Closes tarantool#9235

NO_DOC=bugfix

(cherry picked from commit 59b817e)
sergepetrenko added a commit to sergepetrenko/tarantool that referenced this issue Dec 13, 2023
Follow-up tarantool#9235

NO_DOC=changelog
NO_TEST=changelog

Co-authored-by: Kseniia Antonova <73473519+xuniq@users.noreply.github.com>
sergepetrenko pushed a commit that referenced this issue Dec 13, 2023
Before this patch there was an execution sequence in which the
assertion in box_wait_limbo_acked would fail. The assertion is that
the lsn of the last entry in limbo is always positive after wal_sync.
Fix it.

Closes #9235

NO_DOC=bugfix

(cherry picked from commit 59b817e)
sergepetrenko added a commit that referenced this issue Dec 13, 2023
Follow-up #9235

NO_DOC=changelog
NO_TEST=changelog

Co-authored-by: Kseniia Antonova <73473519+xuniq@users.noreply.github.com>