Replication/hot_standby crashed in txn_limbo sometimes #5163

Closed
Gerold103 opened this issue Jul 10, 2020 · 1 comment

Here is a failed job: https://travis-ci.org/github/tarantool/tarantool/jobs/706833309.

[007] replication/hot_standby.test.lua                                
[007] [Instance "hot_standby"] Tarantool server failed to start
[007] 
[007] Last 15 lines of Tarantool Log file [Instance "hot_standby"][/tarantool/test/var/007_replication/hot_standby.log]:
[007] 2020-07-10 11:06:36.544 [32490] main/103/hot_standby I> mapping 134217728 bytes for vinyl tuple arena...
[007] 2020-07-10 11:06:36.559 [32490] main/103/hot_standby I> instance uuid f0967d71-d838-4964-a6e1-12725f9a2283
[007] 2020-07-10 11:06:36.581 [32490] main/103/hot_standby I> instance vclock {0: 1023, 1: 1600, 2: 1}
[007] 2020-07-10 11:06:36.584 [32490] main/103/hot_standby I> recovery start
[007] 2020-07-10 11:06:36.584 [32490] main/103/hot_standby I> recovering from `master/00000000000000000040.snap'
[007] 2020-07-10 11:06:36.607 [32490] main/103/hot_standby I> cluster uuid 0b4051f2-5a9d-4a07-b51d-bddd52ce30d8
[007] 2020-07-10 11:06:36.636 [32490] main/103/hot_standby I> assigned id 1 to replica f0967d71-d838-4964-a6e1-12725f9a2283
[007] 2020-07-10 11:06:36.636 [32490] main/103/hot_standby I> recover from `master/00000000000000000040.xlog'
[007] 2020-07-10 11:06:36.637 [32490] main/103/hot_standby I> assigned id 2 to replica 67382cb7-e363-4fa3-b483-53a5cffc632b
[007] 2020-07-10 11:06:36.637 [32490] main/103/hot_standby I> removed replica 67382cb7-e363-4fa3-b483-53a5cffc632b
[007] 2020-07-10 11:06:36.639 [32490] main/103/hot_standby I> assigned id 2 to replica 21098aa9-820c-45de-84b5-3c199e75a22b
[007] 2020-07-10 11:06:36.639 [32490] main/103/hot_standby I> removed replica 21098aa9-820c-45de-84b5-3c199e75a22b
[007] 2020-07-10 11:06:36.640 [32490] main/103/hot_standby I> assigned id 2 to replica 1f95dc56-cbbe-43a0-b341-5488ba61a57f
[007] 2020-07-10 11:06:36.640 [32490] main/103/hot_standby I> removed replica 1f95dc56-cbbe-43a0-b341-5488ba61a57f
[007] tarantool: /tarantool/src/box/txn_limbo.c:117: txn_limbo_assign_remote_lsn: Assertion `limbo->instance_id != instance_id'

The test does not use synchronous transactions at all, which makes me think it may be just uninitialized memory.

Gerold103 self-assigned this Jul 10, 2020

Gerold103 commented Jul 10, 2020

box.cfg{}
s = box.schema.space.create('sync', {is_sync=true})
_ = s:create_index('pk')
s:replace{1}

Now restart the instance and call box.cfg{} again: recovery reads the synchronous transaction back from the WAL and hits the assertion.

Gerold103 added a commit that referenced this issue Jul 11, 2020
Recovery uses txn_commit_async() so as not to block the recovery
process when it meets a synchronous transaction. Such transactions
are either committed later, when CONFIRM is read, or stay in the
limbo after recovery.

However, txn_commit_async() assumed it was used for remote
transactions only, and had some assertions about that. One of them
failed when the master restarted and had a synchronous transaction
in its WAL.

The patch makes txn_commit_async() not assume anything about the
transaction's origin.

Closes #5163
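
The assumption described in the commit message can be illustrated with a small, self-contained C model. This is only a sketch: the names `remote_only_assumption_holds`, `choose_lsn_path`, and `enum lsn_path` are hypothetical and do not come from Tarantool's sources, and the second function shows just one possible way to avoid assuming an origin, not necessarily what the patch does. The only detail taken from the log above is the shape of the failed check, `limbo->instance_id != instance_id`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical labels for how an LSN could be attached to a limbo entry. */
enum lsn_path {
	LSN_PATH_LOCAL,
	LSN_PATH_REMOTE,
};

/*
 * Models the pre-fix assumption: the transactions waiting in the limbo
 * were created by some other instance. Returns false in exactly the case
 * where a check like `limbo->instance_id != instance_id` would fail.
 */
static bool
remote_only_assumption_holds(uint32_t limbo_owner_id, uint32_t local_id)
{
	return limbo_owner_id != local_id;
}

/*
 * One possible origin-agnostic alternative (an assumption for
 * illustration): pick the path from the ids instead of asserting
 * that the transaction is remote.
 */
static enum lsn_path
choose_lsn_path(uint32_t limbo_owner_id, uint32_t local_id)
{
	/* Id 0 is treated as "not assigned" in this model. */
	assert(limbo_owner_id != 0 && local_id != 0);
	return limbo_owner_id == local_id ? LSN_PATH_LOCAL : LSN_PATH_REMOTE;
}
```

In this model a replica with id 2 applying transactions owned by instance 1 satisfies the assumption, while a restarted master with id 1 replaying its own synchronous transaction does not, which matches the reproducer above.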
kyukhin added this to the 2.5.1 milestone Jul 13, 2020
Gerold103 reopened this Jul 21, 2020
tarantool deleted a comment from TarantoolBot Jul 21, 2020