New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[DocDB] Disallow ChangeConfig while we are in the middle of a config change #18335
Labels
2.14 Backport Required
2.16 Backport Required
2.18 Backport Required
2.20 Backport Required
area/docdb
YugabyteDB core features
jira-originated
kind/enhancement
This is an enhancement of an existing feature
priority/high
High Priority
Comments
yugabyte-ci
added
area/docdb
YugabyteDB core features
jira-originated
kind/bug
This issue is a bug
priority/low
Low priority
status/awaiting-triage
Issue awaiting triage
priority/high
High Priority
and removed
status/awaiting-triage
Issue awaiting triage
priority/low
Low priority
labels
Jul 20, 2023
yugabyte-ci
added
kind/enhancement
This is an enhancement of an existing feature
and removed
kind/bug
This issue is a bug
labels
Aug 23, 2023
yugabyte-ci
added
priority/medium
Medium priority issue
and removed
priority/high
High Priority
labels
Sep 21, 2023
basavaraj29
added a commit
that referenced
this issue
Oct 17, 2023
…ther server is amidst transition Summary: Previously when a ChangeConfig results in a add server from ABC -> ABCD where D is in PRE_VOTER stage and still bootstrapping, a subsequent ChangeConfig from ABCD->ABD was disallowed because ‘D’ was still being bootstrapped and the config would be considered in transition. However this behavior was recently modified as part of 739ea8f and is now allowed which makes the transition from ABCD->ABD possible and also a subsequent ABD->ADE possible even when D & E are in PRE_VOTER state. This could lead to data loss scenarios. For instance, if we end up from A,B,C -> A,D,E,F (where D,E & F are newly added masters) with D,E & F still in PRE_VOTER state, any attempt to remove A will fail with a bad status. But if the operator mistakenly assumes that the new masters (D,E & F) are caught up and overwrites the master conf file, the new peers would form a quorum and outvote A, thus leading to orphaned tablet cleanup, and hence data loss. Hence, it is better we reject `ConfigChange` requests for sys catalog when another server is in transition. A better fix is to still allow concurrent changes while restricting num VOTERS to drop below rf/2 + 1, but that can be taken up once #19453 is addressed. This diff reverts to old behavior for sys catalog alone, i.e. We reject `ConfigChange` requests for sys catalog when another server is in transition. Jira: DB-7323 Test Plan: ./yb_build.sh --cxx-test integration-tests_master_config-itest --gtest_filter MasterChangeConfigTest.TestConcurrentAddMastersFails -n 4 ./yb_build.sh --cxx-test integration-tests_master_config-itest --gtest_filter MasterChangeConfigTest.TestBlockRemoveServerWhenConfigHasTransitioningServer -n 4 Reviewers: amitanand, rthallam, hsunder Reviewed By: amitanand, hsunder Subscribers: qhu, hsunder, ybase, bogdan Differential Revision: https://phorge.dev.yugabyte.com/D28676
basavaraj29
added a commit
that referenced
this issue
Oct 18, 2023
…atalog when another server is amidst transition Summary: Original commit: 38abc37 / D28676 Previously when a ChangeConfig results in a add server from ABC -> ABCD where D is in PRE_VOTER stage and still bootstrapping, a subsequent ChangeConfig from ABCD->ABD was disallowed because ‘D’ was still being bootstrapped and the config would be considered in transition. However this behavior was recently modified as part of 739ea8f and is now allowed which makes the transition from ABCD->ABD possible and also a subsequent ABD->ADE possible even when D & E are in PRE_VOTER state. This could lead to data loss scenarios. For instance, if we end up from A,B,C -> A,D,E,F (where D,E & F are newly added masters) with D,E & F still in PRE_VOTER state, any attempt to remove A will fail with a bad status. But if the operator mistakenly assumes that the new masters (D,E & F) are caught up and overwrites the master conf file, the new peers would form a quorum and outvote A, thus leading to orphaned tablet cleanup, and hence data loss. Hence, it is better we reject `ConfigChange` requests for sys catalog when another server is in transition. A better fix is to still allow concurrent changes while restricting num VOTERS to drop below rf/2 + 1, but that can be taken up once #19453 is addressed. This diff reverts to old behavior for sys catalog alone, i.e. We reject `ConfigChange` requests for sys catalog when another server is in transition. Jira: DB-7323 Test Plan: ./yb_build.sh --cxx-test integration-tests_master_config-itest --gtest_filter MasterChangeConfigTest.TestConcurrentAddMastersFails -n 4 ./yb_build.sh --cxx-test integration-tests_master_config-itest --gtest_filter MasterChangeConfigTest.TestBlockRemoveServerWhenConfigHasTransitioningServer -n 4 Reviewers: amitanand, rthallam, hsunder Reviewed By: rthallam Subscribers: bogdan, ybase, hsunder, qhu Tags: #jenkins-ready Differential Revision: https://phorge.dev.yugabyte.com/D29427
rthallamko3
added
priority/high
High Priority
and removed
priority/medium
Medium priority issue
labels
Nov 13, 2023
basavaraj29
added a commit
that referenced
this issue
Nov 14, 2023
…atalog when another server is amidst transition Summary: Original commit: 38abc37 / D28676 Previously when a ChangeConfig results in a add server from ABC -> ABCD where D is in PRE_VOTER stage and still bootstrapping, a subsequent ChangeConfig from ABCD->ABD was disallowed because ‘D’ was still being bootstrapped and the config would be considered in transition. However this behavior was recently modified as part of 739ea8f and is now allowed which makes the transition from ABCD->ABD possible and also a subsequent ABD->ADE possible even when D & E are in PRE_VOTER state. This could lead to data loss scenarios. For instance, if we end up from A,B,C -> A,D,E,F (where D,E & F are newly added masters) with D,E & F still in PRE_VOTER state, any attempt to remove A will fail with a bad status. But if the operator mistakenly assumes that the new masters (D,E & F) are caught up and overwrites the master conf file, the new peers would form a quorum and outvote A, thus leading to orphaned tablet cleanup, and hence data loss. Hence, it is better we reject `ConfigChange` requests for sys catalog when another server is in transition. A better fix is to still allow concurrent changes while restricting num VOTERS to drop below rf/2 + 1, but that can be taken up once #19453 is addressed. This diff reverts to old behavior for sys catalog alone, i.e. We reject `ConfigChange` requests for sys catalog when another server is in transition. Jira: DB-7323 Test Plan: ./yb_build.sh --cxx-test integration-tests_master_config-itest --gtest_filter MasterChangeConfigTest.TestConcurrentAddMastersFails -n 4 ./yb_build.sh --cxx-test integration-tests_master_config-itest --gtest_filter MasterChangeConfigTest.TestBlockRemoveServerWhenConfigHasTransitioningServer -n 4 Reviewers: amitanand, rthallam, hsunder Reviewed By: amitanand Subscribers: bogdan, ybase, hsunder, qhu Tags: #jenkins-ready Differential Revision: https://phorge.dev.yugabyte.com/D30143
basavaraj29
added a commit
that referenced
this issue
Nov 14, 2023
…atalog when another server is amidst transition Summary: Original commit: 38abc37 / D28676 Previously when a ChangeConfig results in a add server from ABC -> ABCD where D is in PRE_VOTER stage and still bootstrapping, a subsequent ChangeConfig from ABCD->ABD was disallowed because ‘D’ was still being bootstrapped and the config would be considered in transition. However this behavior was recently modified as part of 739ea8f and is now allowed which makes the transition from ABCD->ABD possible and also a subsequent ABD->ADE possible even when D & E are in PRE_VOTER state. This could lead to data loss scenarios. For instance, if we end up from A,B,C -> A,D,E,F (where D,E & F are newly added masters) with D,E & F still in PRE_VOTER state, any attempt to remove A will fail with a bad status. But if the operator mistakenly assumes that the new masters (D,E & F) are caught up and overwrites the master conf file, the new peers would form a quorum and outvote A, thus leading to orphaned tablet cleanup, and hence data loss. Hence, it is better we reject `ConfigChange` requests for sys catalog when another server is in transition. A better fix is to still allow concurrent changes while restricting num VOTERS to drop below rf/2 + 1, but that can be taken up once #19453 is addressed. This diff reverts to old behavior for sys catalog alone, i.e. We reject `ConfigChange` requests for sys catalog when another server is in transition. Jira: DB-7323 Test Plan: ./yb_build.sh --cxx-test integration-tests_master_config-itest --gtest_filter MasterChangeConfigTest.TestConcurrentAddMastersFails -n 4 ./yb_build.sh --cxx-test integration-tests_master_config-itest --gtest_filter MasterChangeConfigTest.TestBlockRemoveServerWhenConfigHasTransitioningServer -n 4 Reviewers: amitanand, rthallam, hsunder Reviewed By: amitanand Subscribers: bogdan, ybase, hsunder, qhu Tags: #jenkins-ready Differential Revision: https://phorge.dev.yugabyte.com/D30144
basavaraj29
added a commit
that referenced
this issue
Nov 14, 2023
…atalog when another server is amidst transition Summary: Original commit: 38abc37 / D28676 Previously when a ChangeConfig results in a add server from ABC -> ABCD where D is in PRE_VOTER stage and still bootstrapping, a subsequent ChangeConfig from ABCD->ABD was disallowed because ‘D’ was still being bootstrapped and the config would be considered in transition. However this behavior was recently modified as part of 739ea8f and is now allowed which makes the transition from ABCD->ABD possible and also a subsequent ABD->ADE possible even when D & E are in PRE_VOTER state. This could lead to data loss scenarios. For instance, if we end up from A,B,C -> A,D,E,F (where D,E & F are newly added masters) with D,E & F still in PRE_VOTER state, any attempt to remove A will fail with a bad status. But if the operator mistakenly assumes that the new masters (D,E & F) are caught up and overwrites the master conf file, the new peers would form a quorum and outvote A, thus leading to orphaned tablet cleanup, and hence data loss. Hence, it is better we reject `ConfigChange` requests for sys catalog when another server is in transition. A better fix is to still allow concurrent changes while restricting num VOTERS to drop below rf/2 + 1, but that can be taken up once #19453 is addressed. This diff reverts to old behavior for sys catalog alone, i.e. We reject `ConfigChange` requests for sys catalog when another server is in transition. Jira: DB-7323 Test Plan: ./yb_build.sh --cxx-test integration-tests_master_config-itest --gtest_filter MasterChangeConfigTest.TestConcurrentAddMastersFails -n 4 ./yb_build.sh --cxx-test integration-tests_master_config-itest --gtest_filter MasterChangeConfigTest.TestBlockRemoveServerWhenConfigHasTransitioningServer -n 4 Reviewers: amitanand, rthallam, hsunder Reviewed By: amitanand Subscribers: bogdan, ybase, hsunder, qhu Tags: #jenkins-ready Differential Revision: https://phorge.dev.yugabyte.com/D30145
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Labels
2.14 Backport Required
2.16 Backport Required
2.18 Backport Required
2.20 Backport Required
area/docdb
YugabyteDB core features
jira-originated
kind/enhancement
This is an enhancement of an existing feature
priority/high
High Priority
Jira Link: DB-7323
Previously when a ChangeConfig results in a add server from ABC -> ABCD where D is in PRE_VOTER stage and still bootstrapping, a subsequent ChangeConfig from ABCD->ABD was disallowed because ‘D’ was still being bootstrapped and the config would be considered in transition. The ChangeConfig would fail with the below error. However this behavior was recently modified as part of commit and is now allowed which makes the transition from ABCD->ABD possible and also a subsequent ABD->ADE possible even when D & E are in pre-voter state. This needs to be revisited. (Discussion). The motivation behind the change is that if there are PRE_VOTERs in the config that are failing to bootstrap, other config changes to add or remove masters are disallowed.
The text was updated successfully, but these errors were encountered: