[DocDB] ClusterConfig version is not being incremented when setting the universe_uuid #21491

Closed
1 task done
lingamsandeep opened this issue Mar 14, 2024 · 0 comments
Labels
area/docdb YugabyteDB core features kind/bug This issue is a bug priority/medium Medium priority issue

lingamsandeep commented Mar 14, 2024

Jira Link: DB-10377

Description

The universe_uuid was recently added to ClusterConfig as part of commit fb98e56. It is an identity for the universe that all the t-servers inherit from the master through heartbeats. Once set, this value is not meant to change on either the t-servers or the masters, and it gives the master a way to reject heartbeats from a different universe.
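
To make the heartbeat check concrete, here is a minimal sketch in Python. It is a toy model, not the catalog manager code: the `Master` class and `accept_heartbeat` method are hypothetical names, and accepting a t-server that has no universe_uuid yet (so that it can inherit one) is an assumption based on the description above.

```python
from dataclasses import dataclass


@dataclass
class Master:
    """Toy stand-in for a master; hypothetical, not the real catalog manager."""
    universe_uuid: str

    def accept_heartbeat(self, ts_universe_uuid: str) -> bool:
        # Assumption: a t-server with no uuid yet is accepted so that it can
        # inherit the master's universe_uuid from the heartbeat response.
        if not ts_universe_uuid:
            return True
        # A t-server carrying a different universe_uuid belongs to another
        # universe, so its heartbeat is rejected.
        return ts_universe_uuid == self.universe_uuid
```

The check is only useful if the master's universe_uuid stays stable, which is why regenerating it is a problem.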

For universes upgrading from an older release to one that includes commit fb98e56, the catalog manager generates a new universe_uuid and propagates it to the t-servers. However, it does not increment the ClusterConfig version number when it persists the universe_uuid in ClusterConfig.

As a result of this, the following race is possible:

  1. Cluster gets upgraded to a release with commit fb98e56 and the feature master_enable_universe_uuid_heartbeat_check is enabled.
  2. Reader reads the ClusterConfig at version 'X'.
  3. Catalog Manager background thread runs and generates a new universe_uuid, persists it in ClusterConfig and propagates it to all the t-servers.
  4. Reader from Step 2 updates the ClusterConfig using ChangeMasterClusterConfigRequestPB with version 'X'.
  5. Update from Step 4 succeeds because ClusterConfig version 'X' on disk matches the one in the request 'X' - effectively overwriting the universe_uuid generated in Step 3.
  6. Catalog Manager background thread runs again and since universe_uuid is empty, it generates a new universe_uuid again.
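
The six steps above can be replayed in a small Python simulation. This is a sketch under stated assumptions, not YugabyteDB code: `ClusterConfig`, `Store`, the `placement` field, and the starting version are hypothetical stand-ins, and the version check in `change_config` is assumed to be a plain compare-and-set that bumps the version on success.

```python
import uuid
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ClusterConfig:
    version: int = 0
    universe_uuid: str = ""
    placement: str = ""  # hypothetical stand-in for any other config field


class Store:
    """Toy model of the persisted ClusterConfig."""

    def __init__(self):
        self.config = ClusterConfig(version=7)  # version 'X'

    def read(self):
        return self.config

    def change_config(self, new):
        # Assumed compare-and-set: accept only if the caller saw the
        # current version, then bump the version.
        if new.version != self.config.version:
            return False
        self.config = replace(new, version=new.version + 1)
        return True

    def background_set_uuid_buggy(self):
        # The bug: universe_uuid is persisted WITHOUT a version bump.
        if not self.config.universe_uuid:
            self.config = replace(self.config, universe_uuid=str(uuid.uuid4()))
        return self.config.universe_uuid


def run_race():
    store = Store()
    stale = store.read()                                          # step 2
    first = store.background_set_uuid_buggy()                     # step 3
    ok = store.change_config(replace(stale, placement="zone-a"))  # steps 4-5
    after_overwrite = store.read().universe_uuid
    second = store.background_set_uuid_buggy()                    # step 6
    return ok, first, after_overwrite, second
```

In this model the stale update succeeds, the universe_uuid is wiped, and the second background run produces a different universe_uuid from the first.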

Issue Type

kind/bug

Warning: Please confirm that this issue does not contain any sensitive information

  • I confirm this issue does not contain any sensitive information.
@lingamsandeep lingamsandeep added area/docdb YugabyteDB core features status/awaiting-triage Issue awaiting triage labels Mar 14, 2024
@lingamsandeep lingamsandeep self-assigned this Mar 14, 2024
@yugabyte-ci yugabyte-ci added kind/bug This issue is a bug priority/medium Medium priority issue and removed status/awaiting-triage Issue awaiting triage labels Mar 14, 2024
lingamsandeep added a commit that referenced this issue Mar 17, 2024
…update and yb-ts-cli to clear universe uuid.

Summary:
This fix addresses the following race condition which is possible when upgrading a universe from a release which does not have master_enable_universe_uuid_heartbeat_check to a release which has universe_uuid generation and checks.

  # Cluster gets upgraded to a release with commit fb98e56 and the feature master_enable_universe_uuid_heartbeat_check is enabled.
  # Reader reads the ClusterConfig at version 'X'.
  # Catalog Manager background thread runs and generates a new universe_uuid, persists it in ClusterConfig and propagates it to all the t-servers.
  # Reader from Step 2 updates the ClusterConfig using ChangeMasterClusterConfigRequestPB with version 'X'.
  # Update from Step 4 succeeds because ClusterConfig version 'X' on disk matches the one in the request 'X' - effectively overwriting the universe_uuid generated in Step 3.
  # Catalog Manager background thread runs again and since universe_uuid is empty, it generates a new universe_uuid again.

The fix is to increment the ClusterConfig version when the background task updates it. Also added checks to ensure that universe_uuid, once set, cannot be modified by ChangeMasterClusterConfig.
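
A minimal Python sketch of the two parts of this fix follows. It is a toy model rather than the actual patch: the `ClusterConfig` and `Store` classes, the method names, and the returned status strings are all hypothetical.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ClusterConfig:
    version: int = 0
    universe_uuid: str = ""


class Store:
    """Toy model of the persisted ClusterConfig with the fix applied."""

    def __init__(self, config):
        self.config = config

    def background_set_uuid(self, new_uuid):
        # Part 1: bump the version together with the universe_uuid, so any
        # reader holding the old version fails its compare-and-set.
        if not self.config.universe_uuid:
            self.config = replace(
                self.config,
                universe_uuid=new_uuid,
                version=self.config.version + 1,
            )

    def change_config(self, new):
        if new.version != self.config.version:
            return "version mismatch"
        # Part 2: once set, universe_uuid may not be changed or cleared here.
        if self.config.universe_uuid and new.universe_uuid != self.config.universe_uuid:
            return "universe_uuid cannot be modified"
        self.config = replace(new, version=new.version + 1)
        return "ok"
```

With both checks in place, a stale writer is rejected on version, and a writer with the current version is still rejected if it tries to alter the universe_uuid.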

Added a yb-ts-cli command to clear the universe uuid in case we ever run into issues because of this.

**Upgrade/Rollback safety:**
- The proto changes are strictly for the yb-ts-cli only and as such should not have any upgrade implications.
Jira: DB-10377

Test Plan:
- Manually verified with an upgrade test that the clusterconfig version gets bumped up correctly and ChangeMasterClusterConfig fails if it attempts to modify the universe_uuid.
- Manually tested the yb-ts-cli command to validate that it clears the universe_uuid correctly.

ybt integration-tests_master_heartbeat-itest MasterHeartbeatITestWithUpgrade.ClearUniverseUuidToRecoverUniverse

Reviewers: hsunder

Reviewed By: hsunder

Subscribers: ybase, bogdan

Differential Revision: https://phorge.dev.yugabyte.com/D33180
lingamsandeep added a commit that referenced this issue Mar 18, 2024
… universe_uuid update and yb-ts-cli to clear universe uuid.

Summary:
Original commit: 425d7c2 / D33180
This fix addresses the following race condition which is possible when upgrading a universe from a release which does not have master_enable_universe_uuid_heartbeat_check to a release which has universe_uuid generation and checks.

The fix is to increment the ClusterConfig version when the background task updates it. Also added checks to ensure that universe_uuid, once set, cannot be modified by ChangeMasterClusterConfig.

Added a yb-ts-cli command to clear the universe uuid in case we ever run into issues because of this.

**Upgrade/Rollback safety:**
- The proto changes are strictly for the yb-ts-cli only and as such should not have any upgrade implications.
Jira: DB-10377

Test Plan:
- Manually verified with an upgrade test that the clusterconfig version gets bumped up correctly and ChangeMasterClusterConfig fails if it attempts to modify the universe_uuid.
- Manually tested the yb-ts-cli command to validate that it clears the universe_uuid correctly.

ybt integration-tests_master_heartbeat-itest MasterHeartbeatITestWithUpgrade.ClearUniverseUuidToRecoverUniverse

Reviewers: hsunder

Reviewed By: hsunder

Subscribers: bogdan, ybase

Tags: #jenkins-ready

Differential Revision: https://phorge.dev.yugabyte.com/D33248
lingamsandeep added a commit that referenced this issue Mar 18, 2024
…ng universe_uuid update and yb-ts-cli to clear universe uuid.

Summary:
Original commit: 425d7c2 / D33180
This fix addresses the following race condition which is possible when upgrading a universe from a release which does not have master_enable_universe_uuid_heartbeat_check to a release which has universe_uuid generation and checks.

The fix is to increment the ClusterConfig version when the background task updates it. Also added checks to ensure that universe_uuid, once set, cannot be modified by ChangeMasterClusterConfig.

Added a yb-ts-cli command to clear the universe uuid in case we ever run into issues because of this.

**Upgrade/Rollback safety:**
- The proto changes are strictly for the yb-ts-cli only and as such should not have any upgrade implications.
Jira: DB-10377

Test Plan:
- Manually verified with an upgrade test that the clusterconfig version gets bumped up correctly and ChangeMasterClusterConfig fails if it attempts to modify the universe_uuid.
- Manually tested the yb-ts-cli command to validate that it clears the universe_uuid correctly.

ybt integration-tests_master_heartbeat-itest MasterHeartbeatITestWithUpgrade.ClearUniverseUuidToRecoverUniverse

Reviewers: hsunder

Reviewed By: hsunder

Subscribers: bogdan, ybase

Tags: #jenkins-ready

Differential Revision: https://phorge.dev.yugabyte.com/D33249
lingamsandeep added a commit that referenced this issue Mar 18, 2024
… universe_uuid update and yb-ts-cli to clear universe uuid.

Summary:
Original commit: 425d7c2 / D33180
This fix addresses the following race condition which is possible when upgrading a universe from a release which does not have master_enable_universe_uuid_heartbeat_check to a release which has universe_uuid generation and checks.

The fix is to increment the ClusterConfig version when the background task updates it. Also added checks to ensure that universe_uuid, once set, cannot be modified by ChangeMasterClusterConfig.

Added a yb-ts-cli command to clear the universe uuid in case we ever run into issues because of this.

**Upgrade/Rollback safety:**
- The proto changes are strictly for the yb-ts-cli only and as such should not have any upgrade implications.
Jira: DB-10377

Test Plan:
- Manually verified with an upgrade test that the clusterconfig version gets bumped up correctly and ChangeMasterClusterConfig fails if it attempts to modify the universe_uuid.
- Manually tested the yb-ts-cli command to validate that it clears the universe_uuid correctly.

ybt integration-tests_master_heartbeat-itest MasterHeartbeatITestWithUpgrade.ClearUniverseUuidToRecoverUniverse

Reviewers: hsunder

Reviewed By: hsunder

Subscribers: bogdan, ybase

Tags: #jenkins-ready

Differential Revision: https://phorge.dev.yugabyte.com/D33247
lingamsandeep added a commit that referenced this issue Mar 18, 2024
…ng universe_uuid update and yb-ts-cli to clear universe uuid.

Summary:
Original commit: 425d7c2 / D33180
This fix addresses the following race condition which is possible when upgrading a universe from a release which does not have master_enable_universe_uuid_heartbeat_check to a release which has universe_uuid generation and checks.

The fix is to increment the ClusterConfig version when the background task updates it. Also added checks to ensure that universe_uuid, once set, cannot be modified by ChangeMasterClusterConfig.

Added a yb-ts-cli command to clear the universe uuid in case we ever run into issues because of this.

**Upgrade/Rollback safety:**
- The proto changes are strictly for the yb-ts-cli only and as such should not have any upgrade implications.
Jira: DB-10377

Test Plan:
- Manually verified with an upgrade test that the clusterconfig version gets bumped up correctly and ChangeMasterClusterConfig fails if it attempts to modify the universe_uuid.
- Manually tested the yb-ts-cli command to validate that it clears the universe_uuid correctly.

ybt integration-tests_master_heartbeat-itest MasterHeartbeatITestWithUpgrade.ClearUniverseUuidToRecoverUniverse

Reviewers: hsunder

Reviewed By: hsunder

Subscribers: bogdan, ybase

Tags: #jenkins-ready

Differential Revision: https://phorge.dev.yugabyte.com/D33250
asrinivasanyb pushed a commit to asrinivasanyb/yugabyte-db that referenced this issue Mar 18, 2024
…se_uuid update and yb-ts-cli to clear universe uuid.
