Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[DocDB] Disallow ChangeConfig while we are in the middle of a config change #18335

Closed
yugabyte-ci opened this issue Jul 20, 2023 · 0 comments
Closed

Comments

@yugabyte-ci
Copy link
Contributor

yugabyte-ci commented Jul 20, 2023

Jira Link: DB-7323

Previously when a ChangeConfig results in a add server from ABC -> ABCD where D is in PRE_VOTER stage and still bootstrapping, a subsequent ChangeConfig from ABCD->ABD was disallowed because ‘D’ was still being bootstrapped and the config would be considered in transition. The ChangeConfig would fail with the below error. However this behavior was recently modified as part of commit and is now allowed which makes the transition from ABCD->ABD possible and also a subsequent ABD->ADE possible even when D & E are in pre-voter state. This needs to be revisited. (Discussion). The motivation behind the change is that if there are PRE_VOTERs in the config that are failing to bootstrap, other config changes to add or remove masters are disallowed.

@yugabyte-ci yugabyte-ci added area/docdb YugabyteDB core features jira-originated kind/bug This issue is a bug priority/low Low priority status/awaiting-triage Issue awaiting triage priority/high High Priority and removed status/awaiting-triage Issue awaiting triage priority/low Low priority labels Jul 20, 2023
@yugabyte-ci yugabyte-ci added kind/enhancement This is an enhancement of an existing feature and removed kind/bug This issue is a bug labels Aug 23, 2023
@yugabyte-ci yugabyte-ci added priority/medium Medium priority issue and removed priority/high High Priority labels Sep 21, 2023
basavaraj29 added a commit that referenced this issue Oct 17, 2023
…ther server is amidst transition

Summary:
Previously when a ChangeConfig results in a add server from ABC -> ABCD where D is in PRE_VOTER stage and still bootstrapping, a subsequent ChangeConfig from ABCD->ABD was disallowed because ‘D’ was still being bootstrapped and the config would be considered in transition. However this behavior was recently modified as part of 739ea8f and is now allowed which makes the transition from ABCD->ABD possible and also a subsequent ABD->ADE possible even when D & E are in PRE_VOTER state. This could lead to data loss scenarios. For instance, if we end up from A,B,C -> A,D,E,F (where D,E & F are newly added masters) with D,E & F still in PRE_VOTER state, any attempt to remove A will fail with a bad status. But if the operator mistakenly assumes that the new masters (D,E & F) are caught up and overwrites the master conf file, the new peers would form a quorum and outvote A, thus leading to orphaned tablet cleanup, and hence data loss.

Hence, it is better we reject `ConfigChange` requests for sys catalog when another server is in transition. A better fix is to still allow concurrent changes while restricting num VOTERS to drop below rf/2 + 1, but that can be taken up once #19453 is addressed.

This diff reverts to old behavior for sys catalog alone, i.e. We reject `ConfigChange` requests for sys catalog when another server is in transition.
Jira: DB-7323

Test Plan:
./yb_build.sh --cxx-test integration-tests_master_config-itest --gtest_filter MasterChangeConfigTest.TestConcurrentAddMastersFails -n 4
./yb_build.sh --cxx-test integration-tests_master_config-itest --gtest_filter MasterChangeConfigTest.TestBlockRemoveServerWhenConfigHasTransitioningServer -n 4

Reviewers: amitanand, rthallam, hsunder

Reviewed By: amitanand, hsunder

Subscribers: qhu, hsunder, ybase, bogdan

Differential Revision: https://phorge.dev.yugabyte.com/D28676
basavaraj29 added a commit that referenced this issue Oct 18, 2023
…atalog when another server is amidst transition

Summary:
Original commit: 38abc37 / D28676
Previously when a ChangeConfig results in a add server from ABC -> ABCD where D is in PRE_VOTER stage and still bootstrapping, a subsequent ChangeConfig from ABCD->ABD was disallowed because ‘D’ was still being bootstrapped and the config would be considered in transition. However this behavior was recently modified as part of 739ea8f and is now allowed which makes the transition from ABCD->ABD possible and also a subsequent ABD->ADE possible even when D & E are in PRE_VOTER state. This could lead to data loss scenarios. For instance, if we end up from A,B,C -> A,D,E,F (where D,E & F are newly added masters) with D,E & F still in PRE_VOTER state, any attempt to remove A will fail with a bad status. But if the operator mistakenly assumes that the new masters (D,E & F) are caught up and overwrites the master conf file, the new peers would form a quorum and outvote A, thus leading to orphaned tablet cleanup, and hence data loss.

Hence, it is better we reject `ConfigChange` requests for sys catalog when another server is in transition. A better fix is to still allow concurrent changes while restricting num VOTERS to drop below rf/2 + 1, but that can be taken up once #19453 is addressed.

This diff reverts to old behavior for sys catalog alone, i.e. We reject `ConfigChange` requests for sys catalog when another server is in transition.
Jira: DB-7323

Test Plan:
./yb_build.sh --cxx-test integration-tests_master_config-itest --gtest_filter MasterChangeConfigTest.TestConcurrentAddMastersFails -n 4
./yb_build.sh --cxx-test integration-tests_master_config-itest --gtest_filter MasterChangeConfigTest.TestBlockRemoveServerWhenConfigHasTransitioningServer -n 4

Reviewers: amitanand, rthallam, hsunder

Reviewed By: rthallam

Subscribers: bogdan, ybase, hsunder, qhu

Tags: #jenkins-ready

Differential Revision: https://phorge.dev.yugabyte.com/D29427
@rthallamko3 rthallamko3 added priority/high High Priority and removed priority/medium Medium priority issue labels Nov 13, 2023
basavaraj29 added a commit that referenced this issue Nov 14, 2023
…atalog when another server is amidst transition

Summary:
Original commit: 38abc37 / D28676
Previously when a ChangeConfig results in a add server from ABC -> ABCD where D is in PRE_VOTER stage and still bootstrapping, a subsequent ChangeConfig from ABCD->ABD was disallowed because ‘D’ was still being bootstrapped and the config would be considered in transition. However this behavior was recently modified as part of 739ea8f and is now allowed which makes the transition from ABCD->ABD possible and also a subsequent ABD->ADE possible even when D & E are in PRE_VOTER state. This could lead to data loss scenarios. For instance, if we end up from A,B,C -> A,D,E,F (where D,E & F are newly added masters) with D,E & F still in PRE_VOTER state, any attempt to remove A will fail with a bad status. But if the operator mistakenly assumes that the new masters (D,E & F) are caught up and overwrites the master conf file, the new peers would form a quorum and outvote A, thus leading to orphaned tablet cleanup, and hence data loss.

Hence, it is better we reject `ConfigChange` requests for sys catalog when another server is in transition. A better fix is to still allow concurrent changes while restricting num VOTERS to drop below rf/2 + 1, but that can be taken up once #19453 is addressed.

This diff reverts to old behavior for sys catalog alone, i.e. We reject `ConfigChange` requests for sys catalog when another server is in transition.
Jira: DB-7323

Test Plan:
./yb_build.sh --cxx-test integration-tests_master_config-itest --gtest_filter MasterChangeConfigTest.TestConcurrentAddMastersFails -n 4
./yb_build.sh --cxx-test integration-tests_master_config-itest --gtest_filter MasterChangeConfigTest.TestBlockRemoveServerWhenConfigHasTransitioningServer -n 4

Reviewers: amitanand, rthallam, hsunder

Reviewed By: amitanand

Subscribers: bogdan, ybase, hsunder, qhu

Tags: #jenkins-ready

Differential Revision: https://phorge.dev.yugabyte.com/D30143
basavaraj29 added a commit that referenced this issue Nov 14, 2023
…atalog when another server is amidst transition

Summary:
Original commit: 38abc37 / D28676
Previously when a ChangeConfig results in a add server from ABC -> ABCD where D is in PRE_VOTER stage and still bootstrapping, a subsequent ChangeConfig from ABCD->ABD was disallowed because ‘D’ was still being bootstrapped and the config would be considered in transition. However this behavior was recently modified as part of 739ea8f and is now allowed which makes the transition from ABCD->ABD possible and also a subsequent ABD->ADE possible even when D & E are in PRE_VOTER state. This could lead to data loss scenarios. For instance, if we end up from A,B,C -> A,D,E,F (where D,E & F are newly added masters) with D,E & F still in PRE_VOTER state, any attempt to remove A will fail with a bad status. But if the operator mistakenly assumes that the new masters (D,E & F) are caught up and overwrites the master conf file, the new peers would form a quorum and outvote A, thus leading to orphaned tablet cleanup, and hence data loss.

Hence, it is better we reject `ConfigChange` requests for sys catalog when another server is in transition. A better fix is to still allow concurrent changes while restricting num VOTERS to drop below rf/2 + 1, but that can be taken up once #19453 is addressed.

This diff reverts to old behavior for sys catalog alone, i.e. We reject `ConfigChange` requests for sys catalog when another server is in transition.
Jira: DB-7323

Test Plan:
./yb_build.sh --cxx-test integration-tests_master_config-itest --gtest_filter MasterChangeConfigTest.TestConcurrentAddMastersFails -n 4
./yb_build.sh --cxx-test integration-tests_master_config-itest --gtest_filter MasterChangeConfigTest.TestBlockRemoveServerWhenConfigHasTransitioningServer -n 4

Reviewers: amitanand, rthallam, hsunder

Reviewed By: amitanand

Subscribers: bogdan, ybase, hsunder, qhu

Tags: #jenkins-ready

Differential Revision: https://phorge.dev.yugabyte.com/D30144
basavaraj29 added a commit that referenced this issue Nov 14, 2023
…atalog when another server is amidst transition

Summary:
Original commit: 38abc37 / D28676
Previously when a ChangeConfig results in a add server from ABC -> ABCD where D is in PRE_VOTER stage and still bootstrapping, a subsequent ChangeConfig from ABCD->ABD was disallowed because ‘D’ was still being bootstrapped and the config would be considered in transition. However this behavior was recently modified as part of 739ea8f and is now allowed which makes the transition from ABCD->ABD possible and also a subsequent ABD->ADE possible even when D & E are in PRE_VOTER state. This could lead to data loss scenarios. For instance, if we end up from A,B,C -> A,D,E,F (where D,E & F are newly added masters) with D,E & F still in PRE_VOTER state, any attempt to remove A will fail with a bad status. But if the operator mistakenly assumes that the new masters (D,E & F) are caught up and overwrites the master conf file, the new peers would form a quorum and outvote A, thus leading to orphaned tablet cleanup, and hence data loss.

Hence, it is better we reject `ConfigChange` requests for sys catalog when another server is in transition. A better fix is to still allow concurrent changes while restricting num VOTERS to drop below rf/2 + 1, but that can be taken up once #19453 is addressed.

This diff reverts to old behavior for sys catalog alone, i.e. We reject `ConfigChange` requests for sys catalog when another server is in transition.
Jira: DB-7323

Test Plan:
./yb_build.sh --cxx-test integration-tests_master_config-itest --gtest_filter MasterChangeConfigTest.TestConcurrentAddMastersFails -n 4
./yb_build.sh --cxx-test integration-tests_master_config-itest --gtest_filter MasterChangeConfigTest.TestBlockRemoveServerWhenConfigHasTransitioningServer -n 4

Reviewers: amitanand, rthallam, hsunder

Reviewed By: amitanand

Subscribers: bogdan, ybase, hsunder, qhu

Tags: #jenkins-ready

Differential Revision: https://phorge.dev.yugabyte.com/D30145
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants