box: disable split-brain detection until schema is upgraded #9050

sergepetrenko
Collaborator

@sergepetrenko sergepetrenko commented Aug 28, 2023

Our split-brain detection machinery relies, among other things, on all nodes tracking the synchro queue confirmed lsn. This tracking was only added together with the split-brain detection; before that, only the synchro queue owner tracked the confirmed lsn.

This means that after an upgrade all the replicas remember the latest confirmed lsn as 0, and any PROMOTE/DEMOTE request from the queue owner is treated as a split brain.

Let's fix this and only enable split-brain detection on the replica set once the schema version is updated. Thanks to the synchro queue freeze on restart, this can only happen after a new PROMOTE or DEMOTE entry is written by one of the nodes, and thus the correct confirmed lsn is propagated with this PROMOTE/DEMOTE to all the cluster members.

Closes #8996

NO_DOC=bugfix

@coveralls

coveralls commented Aug 28, 2023

coverage: 86.451% (+0.02%) from 86.43% when pulling 1d0e13f on sergepetrenko:gh-8996-relax-split-brain-detection into 74f723b on tarantool:master.

Member

@grafin grafin left a comment

Thanks for the patch! Only one nit and one question from me.

src/box/alter.cc (resolved)
src/box/alter.cc (outdated, resolved)
@sergepetrenko sergepetrenko force-pushed the gh-8996-relax-split-brain-detection branch from e005089 to 8f38270 September 1, 2023 13:34
@grafin grafin removed their assignment Sep 7, 2023
Collaborator

@Gerold103 Gerold103 left a comment

Thanks for the patch!

@sergepetrenko sergepetrenko added full-ci Enables all tests for a pull request and removed full-ci Enables all tests for a pull request labels Sep 14, 2023
@sergepetrenko sergepetrenko force-pushed the gh-8996-relax-split-brain-detection branch 2 times, most recently from 5f1d8c8 to b1c8973 September 15, 2023 11:16
@sergepetrenko sergepetrenko requested a review from a team as a code owner September 15, 2023 11:16
@sergepetrenko
Collaborator Author

@Gerold103, @grafin, please take another look. I've noticed I never woke the fibers up. Fixed that and added a test case to avoid such errors.

@grafin
Member

grafin commented Sep 18, 2023

Great job on noticing the problem!

@grafin grafin removed their assignment Sep 18, 2023
Collaborator

@Gerold103 Gerold103 left a comment

Thanks for the patch!

@sergepetrenko sergepetrenko force-pushed the gh-8996-relax-split-brain-detection branch from b1c8973 to 1d0e13f September 27, 2023 07:59
@sergepetrenko sergepetrenko added the full-ci Enables all tests for a pull request label Sep 27, 2023
@sergepetrenko sergepetrenko merged commit a844bd3 into tarantool:master Sep 28, 2023
101 checks passed
@sergepetrenko
Collaborator Author

Will cherry-pick to 2.10 and 2.11 in the scope of #9192 and #9190.

@sergepetrenko sergepetrenko deleted the gh-8996-relax-split-brain-detection branch September 29, 2023 12:11
Labels
full-ci Enables all tests for a pull request
Development

Successfully merging this pull request may close these issues.

Disable split-brain checks for partially upgraded clusters
6 participants