
Assertion failure in raft_stop_candidate during an ongoing WAL write #8169

@sergepetrenko

Description

Bug description

Reproduced on Tarantool 2.11.0-entrypoint-880-g493870ef5.
Reproduced on Tarantool 2.10.4-0-g816000e10.
Assertion failed: (raft_ev_timer_is_active(&raft->timer)), function raft_stop_candidate, file raft.c, line 1168.
raft_stop_candidate asserts that the Raft timer is active whenever a leader is seen (the timer tracks the moment of the leader's death). However, this does not hold during an ongoing WAL write: while the write is in progress, the timer is stopped.
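The invariant and the case that breaks it can be sketched as a toy model in plain Lua (it runs under any Lua 5.1+ interpreter, including Tarantool). All names here (`new_node`, `start_wal_write`, `on_leader_seen`, `stop_candidate_invariant_holds`) are hypothetical and do not correspond to Tarantool's C implementation in raft.c. The assumption that a leader seen during a WAL write does not restart the timer is ours, based only on the description above.

```lua
-- Hypothetical toy model of the Raft state involved in this bug.
-- Names are illustrative only; this is not Tarantool's raft.c code or API.
local function new_node()
    return {
        leader_seen = false,           -- a leader is known for the current term
        wal_write_in_progress = false, -- e.g. a write held up by ERRINJ_WAL_DELAY
        timer_active = false,          -- timer tracking the moment of leader death
    }
end

-- A WAL write starts: per the issue text, the timer is stopped meanwhile.
local function start_wal_write(node)
    node.wal_write_in_progress = true
    node.timer_active = false
end

-- A leader is seen: normally the timer is started to track leader death.
-- Assumption: the timer is not started while a WAL write is in progress.
local function on_leader_seen(node)
    node.leader_seen = true
    if not node.wal_write_in_progress then
        node.timer_active = true
    end
end

-- The condition asserted by raft_stop_candidate: leader seen => timer active.
local function stop_candidate_invariant_holds(node)
    return (not node.leader_seen) or node.timer_active
end

-- Normal case: no WAL write in flight, so the invariant holds.
local normal = new_node()
on_leader_seen(normal)
print('no WAL write:', stop_candidate_invariant_holds(normal))           -- true

-- Bug case: a leader is seen while a WAL write is still in progress.
local during_write = new_node()
start_wal_write(during_write)
on_leader_seen(during_write)
print('during WAL write:', stop_candidate_invariant_holds(during_write)) -- false
```

In this model the same violation also occurs in the opposite order (the leader is seen first, then a WAL write starts and stops the timer), which is why a check based on the timer alone fails whenever a write is in flight.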

Steps to reproduce

Reproduced only in debug builds. Nothing bad happens in release builds, since the assertion is followed by a no-op.

-- Instance 1.
-- Step 1.
box.cfg{listen = 3301, replication = {3301, 3302}, election_mode = 'candidate', replication_timeout = 4}

-- Step 3.
box.schema.user.grant('guest', 'replication')

-- Step 4. Wait until the 2nd instance connects before running.
box.error.injection.set('ERRINJ_WAL_DELAY', true)

-- Step 6.
box.cfg{election_mode = 'voter'}

-- Instance 2.
-- Step 2.
box.cfg{listen = 3302, replication = {3301, 3302}, election_mode = 'candidate', replication_timeout = 4, replication_synchro_quorum = 1, read_only = true}

-- Step 5.
box.ctl.promote()

After step 6, instance 1 fails with an assertion failure:

tarantool> box.cfg{election_mode = 'voter'}
Assertion failed: (raft_ev_timer_is_active(&raft->timer)), function raft_stop_candidate, file raft.c, line 1168.

The issue was found while debugging the gh_6036_qsync_order_test.lua test (#7785).

Metadata

Labels

2.10 (Target is 2.10 and all newer release/master branches), bug (Something isn't working), raft (RAFT protocol)
