[DocDB] Add Node task fails after Backup/Restores with EAR rotations #18001

Closed
1 task done
yusong-yan opened this issue Jun 29, 2023 · 2 comments
Assignees
Labels
2.16 Backport Required 2.18 Backport Required area/docdb YugabyteDB core features kind/bug This issue is a bug priority/medium Medium priority issue

Comments

@yusong-yan
Contributor

yusong-yan commented Jun 29, 2023

Jira Link: DB-7062

Description

The Update Universe task fails after the following operations on an EAR-enabled universe:

1. Create an RF 3, 4-node universe
2. Run Sample Apps
3. Take a backup
4. Rotate the KMS key
5. Restore in the same universe by renaming the keyspace
6. Take a backup of the new keyspace
7. Disable EAR
8. Restore in the same universe by renaming the keyspace
9. Take a backup of the new keyspace
10. Enable EAR
11. Restore in the same universe by renaming the keyspace
12. Take a backup of the new keyspace
13. Restore in the same universe by renaming the keyspace (without any rotations)

Add node (Update Universe) task will fail with the following logs:

Failed to execute task {"platformVersion":"2.19.1.0-b189","sleepAfterMasterRestartMillis":180000,"sleepAfterTServerRestartMillis":180000,"nodeExporterUser":"prometheus","universeUUID":"7e756665-30e5-419e-9284-a3ec33a3cd82","enableYbc":false,"installYbc":false,"ybcInstalled":false,"encryptionAtRestConfig":{"encryptionAtRestEnabled":false,"opType":"UNDEFINED","type":"DATA_KEY"},"communicationPorts":{"masterHttpPort":7000,"masterRpcPort":7100,"tserverHttpPort":9000,"tserverRpcPort":9100,"ybControllerHttpPort":14000,"y..., hit error:

 WaitForServer(7e756665-30e5-419e-9284-a3ec33a3cd82, yb-dev-ui-auto-aws-21910-b189-kms-114-n5, type=TSERVER) did not respond in the set time..
W0627 18:17:42.236019 65736 tablet_server.cc:417] Getting full universe key registry from master Leader failed: 'Not found (yb/master/encryption_manager.cc:246): Could not find key with version '. Attempts: 2294, Total Time: 6859638200092ms. Retrying...

Warning: Please confirm that this issue does not contain any sensitive information

  • I confirm this issue does not contain any sensitive information.
@yusong-yan yusong-yan added area/docdb YugabyteDB core features status/awaiting-triage Issue awaiting triage labels Jun 29, 2023
@yugabyte-ci yugabyte-ci added kind/bug This issue is a bug priority/medium Medium priority issue labels Jun 29, 2023
@yusong-yan
Contributor Author

yusong-yan commented Jun 29, 2023

This can be reproduced more simply by creating a universe with EAR enabled and then performing the tasks shown in the following screenshots:
[Screenshot 2023-06-28 at 8 01 30 PM]
[Screenshot 2023-06-28 at 8 01 57 PM]

@yusong-yan yusong-yan self-assigned this Jun 29, 2023
@yugabyte-ci yugabyte-ci removed the status/awaiting-triage Issue awaiting triage label Jun 29, 2023
yusong-yan added a commit that referenced this issue Jul 6, 2023
…etFullUniverseKeyRegistry() when EAR is disabled.

Summary:
When a cluster is created with EAR enabled and EAR is disabled soon afterward, the cluster or a TServer fails at the next restart.

Here are the detailed steps:
1. Create a universe with EAR enabled. The Master encrypts the UniverseKeyRegistry with the key at creation.
2. Disable EAR, so that the Master **decrypts** the UniverseKeyRegistry.
3. Restart the server, which then tries to get the full UniverseKeyRegistry from the Master. (Note: the TServer tries to obtain the UniverseKeyRegistry from the Master during Init() regardless of whether EAR is enabled or disabled.)
4. The Master receives the TServer's request. To send the full UniverseKeyRegistry, it tries to decrypt the **already decrypted UniverseKeyRegistry** and hits an error.

We should prevent the failure in step 4 by skipping the decryption step when EAR is disabled and sending the already decrypted UniverseKeyRegistry to the TServer.

Test Plan: ./yb_build.sh --cxx-test integration-tests_encryption-test --gtest_filter EncryptionTest.DisableEncryptionAndRestartCluster --clang15

Reviewers: rthallam, hsunder, rahuldesirazu

Reviewed By: hsunder, rahuldesirazu

Subscribers: ybase, bogdan

Differential Revision: https://phorge.dev.yugabyte.com/D26554
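
The guard described in the commit summary above can be illustrated with a minimal, self-contained C++ sketch. This is not the actual YugabyteDB code: `StoredRegistry`, `ToyXor`, and `GetFullRegistry` are hypothetical names, and the XOR "cipher" is a stand-in chosen only so the example runs. It assumes the Master keeps the registry as plaintext while EAR is disabled and as ciphertext while EAR is enabled.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical stand-in for the registry as stored on the Master.
// Not YugabyteDB's real type.
struct StoredRegistry {
  bool encryption_enabled;  // whether EAR is currently enabled
  std::string bytes;        // ciphertext when enabled, plaintext when disabled
};

// Toy XOR transform used only to make the sketch runnable.
// XOR is its own inverse, so it serves as both "encrypt" and "decrypt".
std::string ToyXor(const std::string& data, char key) {
  assert(key != 0 && "a zero key would leave the data unchanged");
  std::string out = data;
  for (char& c : out) c ^= key;
  return out;
}

// Returns the plaintext registry to send to a TServer, or std::nullopt
// when decryption is required but no key is available.
std::optional<std::string> GetFullRegistry(const StoredRegistry& stored,
                                           std::optional<char> key) {
  if (!stored.encryption_enabled) {
    // EAR is disabled: the stored bytes are already plaintext, so
    // decrypting them again would fail. Return them unchanged.
    return stored.bytes;
  }
  if (!key) {
    // Loosely mirrors the "Could not find key" failure in the issue's log.
    return std::nullopt;
  }
  return ToyXor(stored.bytes, *key);
}
```

With this shape, a request that arrives while EAR is disabled never touches the key lookup, so a restarted TServer would receive the registry even though no universe key is available.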
yusong-yan added a commit that referenced this issue Jul 17, 2023
…ptionManager::GetFullUniverseKeyRegistry() when EAR is disabled.

Summary:
Original commit: 30bde0a / D26554
When a cluster is created with EAR enabled and EAR is disabled soon afterward, the cluster or a TServer fails at the next restart.

Here are the detailed steps:
1. Create a universe with EAR enabled. The Master encrypts the UniverseKeyRegistry with the key at creation.
2. Disable EAR, so that the Master **decrypts** the UniverseKeyRegistry.
3. Restart the server, which then tries to get the full UniverseKeyRegistry from the Master. (Note: the TServer tries to obtain the UniverseKeyRegistry from the Master during Init() regardless of whether EAR is enabled or disabled.)
4. The Master receives the TServer's request. To send the full UniverseKeyRegistry, it tries to decrypt the **already decrypted UniverseKeyRegistry** and hits an error.

We should prevent the failure in step 4 by skipping the decryption step when EAR is disabled and sending the already decrypted UniverseKeyRegistry to the TServer.

Test Plan: ./yb_build.sh --cxx-test integration-tests_encryption-test --gtest_filter EncryptionTest.DisableEncryptionAndRestartCluster --clang15

Reviewers: rthallam, hsunder, rahuldesirazu

Reviewed By: rthallam

Subscribers: bogdan, ybase

Differential Revision: https://phorge.dev.yugabyte.com/D26816
@yusong-yan
Contributor Author

Reopening for more backports.

@yusong-yan yusong-yan reopened this Jul 26, 2023
yusong-yan added a commit that referenced this issue Jul 27, 2023
…ptionManager::GetFullUniverseKeyRegistry() when EAR is disabled.

Summary:
Original commit: 30bde0a / D26554
When a cluster is created with EAR enabled and EAR is disabled soon afterward, the cluster or a TServer fails at the next restart.

Here are the detailed steps:
1. Create a universe with EAR enabled. The Master encrypts the UniverseKeyRegistry with the key at creation.
2. Disable EAR, so that the Master **decrypts** the UniverseKeyRegistry.
3. Restart the server, which then tries to get the full UniverseKeyRegistry from the Master. (Note: the TServer tries to obtain the UniverseKeyRegistry from the Master during Init() regardless of whether EAR is enabled or disabled.)
4. The Master receives the TServer's request. To send the full UniverseKeyRegistry, it tries to decrypt the **already decrypted UniverseKeyRegistry** and hits an error.

We should prevent the failure in step 4 by skipping the decryption step when EAR is disabled and sending the already decrypted UniverseKeyRegistry to the TServer.

Test Plan: ./yb_build.sh --cxx-test integration-tests_encryption-test --gtest_filter EncryptionTest.DisableEncryptionAndRestartCluster --clang15

Reviewers: rthallam, hsunder, rahuldesirazu

Reviewed By: rahuldesirazu

Subscribers: bogdan, ybase

Differential Revision: https://phorge.dev.yugabyte.com/D27284