[docdb] Tool to execute a remote bootstrap on demand in case of majority raft peer failure #8558

Closed
bmatican opened this issue May 21, 2021 · 1 comment
Labels: 2.14 Backport Required, area/docdb, kind/enhancement, priority/medium

bmatican commented May 21, 2021

Jira Link: DB-1876
Sometimes after a majority peer failure we can still recover the nodes with the same IPs but no data (e.g., local SSDs wiped). In this case, we should be able to reuse the existing Raft config from the minority peer(s) instead of using #8557, but we still need the data.

Currently, we perform the equivalent of a remote bootstrap manually: we copy the Raft WALs, the RocksDB regular and intents data, and potentially snapshots from one remaining peer to seed an empty old peer. We should have a tool that does this on demand, potentially triggering a remote bootstrap from a follower onto a new node that is not serving this tablet.

Ideally we'd do it from the most up-to-date peer, as per #8556.

@bmatican bmatican added the area/docdb YugabyteDB core features label May 21, 2021
@bmatican bmatican self-assigned this May 21, 2021
@bmatican bmatican added this to Backlog in YBase features via automation May 21, 2021
@bmatican bmatican added this to Backlog in Data integrity via automation May 21, 2021
@yugabyte-ci yugabyte-ci added kind/bug This issue is a bug priority/medium Medium priority issue labels Jun 9, 2022
@yugabyte-ci yugabyte-ci assigned es1024 and unassigned bmatican Jul 27, 2022
@yugabyte-ci yugabyte-ci added kind/enhancement This is an enhancement of an existing feature and removed kind/bug This issue is a bug labels Jul 28, 2022
es1024 added a commit that referenced this issue Sep 17, 2022
Summary:
Added `yb-ts-cli remote_bootstrap` which initiates a remote bootstrap from a specified
node for a specified tablet id:

```
yb-ts-cli --server_address=$TO_ADDR remote_bootstrap $FROM_ADDR $TABLET_ID
```
to perform a remote bootstrap for tablet `$TABLET_ID` using data from tablet server `$FROM_ADDR`
onto tablet server `$TO_ADDR`.

The command fails as expected when the tablet is in the `TABLET_DATA_READY` state.
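
A minimal sketch of how the invocation above might be wrapped for scripted recovery. The function names, the `YB_TS_CLI` and `DRY_RUN` environment variables, and the example addresses are hypothetical and introduced only for illustration; the only part taken from this change is the `--server_address=... remote_bootstrap FROM_ADDR TABLET_ID` command shape.

```shell
#!/usr/bin/env bash
# Sketch only: wraps the remote_bootstrap command shown in the summary.
# YB_TS_CLI (path to the binary) and DRY_RUN are hypothetical knobs, not yb-ts-cli features.

remote_bootstrap_cmd() {
  # Print the command line that bootstraps tablet $3 from server $2 onto server $1.
  local to_addr="$1" from_addr="$2" tablet_id="$3"
  printf '%s --server_address=%s remote_bootstrap %s %s\n' \
    "${YB_TS_CLI:-yb-ts-cli}" "$to_addr" "$from_addr" "$tablet_id"
}

run_remote_bootstrap() {
  # Validate the argument count before touching any server.
  if [ "$#" -ne 3 ]; then
    echo "usage: run_remote_bootstrap TO_ADDR FROM_ADDR TABLET_ID" >&2
    return 2
  fi
  # With DRY_RUN=1, only print what would be executed.
  if [ "${DRY_RUN:-0}" = "1" ]; then
    remote_bootstrap_cmd "$@"
    return 0
  fi
  # Otherwise run the real command; its exit status is returned to the caller,
  # so a rejected request (e.g. tablet already TABLET_DATA_READY) surfaces as non-zero.
  "${YB_TS_CLI:-yb-ts-cli}" --server_address="$1" remote_bootstrap "$2" "$3"
}
```

For example, `DRY_RUN=1 run_remote_bootstrap 10.0.0.2:9100 10.0.0.1:9100 "$TABLET_ID"` prints the command without contacting either server (the addresses are placeholders), which can help when reviewing a recovery plan that covers many tablets.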

Test Plan:
Manually tested by removing everything in `data/rocksdb` and `tablet-meta` on a stopped
local cluster with RF=3, then running the remote bootstrap and performing queries. Added test case
(`ybd --cxx-test tools_yb-ts-cli-test --gtest_filter YBTsCliTest.TestManualRemoteBootstrap`).

Reviewers: mbautin, bogdan, amitanand

Reviewed By: amitanand

Subscribers: zyu, ybase

Differential Revision: https://phabricator.dev.yugabyte.com/D13258
es1024 commented Sep 17, 2022

Added with dec0805

@es1024 es1024 closed this as completed Sep 17, 2022
YBase features automation moved this from Backlog to Done Sep 17, 2022
Data integrity automation moved this from Backlog to Done Sep 17, 2022
@rthallamko3 rthallamko3 reopened this Sep 17, 2022
YBase features automation moved this from Done to In progress Sep 17, 2022
Data integrity automation moved this from Done to In progress Sep 17, 2022
es1024 added a commit that referenced this issue Sep 20, 2022
Summary:
Added `yb-ts-cli remote_bootstrap` which initiates a remote bootstrap from a specified
node for a specified tablet id:

```
yb-ts-cli --server_address=$TO_ADDR remote_bootstrap $FROM_ADDR $TABLET_ID
```
to perform a remote bootstrap for tablet `$TABLET_ID` using data from tablet server `$FROM_ADDR`
onto tablet server `$TO_ADDR`.

The command fails as expected when the tablet is in the `TABLET_DATA_READY` state.

Original Commit: dec0805 / D13258

This diff also backports a test fix for the test introduced in the above commit:

Original Commit: a398484 / D19653

Test Plan:
Manually tested by removing everything in `data/rocksdb` and `tablet-meta` on a stopped
local cluster with RF=3, then running the remote bootstrap and performing queries. Added test case
(`ybd --cxx-test tools_yb-ts-cli-test --gtest_filter YBTsCliTest.TestManualRemoteBootstrap`).

Jenkins: rebase: 2.8

Reviewers: bogdan, rthallam, amitanand

Reviewed By: rthallam, amitanand

Differential Revision: https://phabricator.dev.yugabyte.com/D19618
es1024 added a commit that referenced this issue Sep 20, 2022
Summary:
Added `yb-ts-cli remote_bootstrap` which initiates a remote bootstrap from a specified
node for a specified tablet id:

```
yb-ts-cli --server_address=$TO_ADDR remote_bootstrap $FROM_ADDR $TABLET_ID
```
to perform a remote bootstrap for tablet `$TABLET_ID` using data from tablet server `$FROM_ADDR`
onto tablet server `$TO_ADDR`.

The command fails as expected when the tablet is in the `TABLET_DATA_READY` state.

Original Commit: dec0805 / D13258

This diff also backports a test fix for the test introduced in the above commit:

Original Commit: a398484 / D19653

Test Plan:
Manually tested by removing everything in `data/rocksdb` and `tablet-meta` on a stopped
local cluster with RF=3, then running the remote bootstrap and performing queries. Added test case
(`ybd --cxx-test tools_yb-ts-cli-test --gtest_filter YBTsCliTest.TestManualRemoteBootstrap`).

Jenkins: rebase: 2.12

Reviewers: bogdan, rthallam, amitanand

Reviewed By: rthallam, amitanand

Differential Revision: https://phabricator.dev.yugabyte.com/D19619
es1024 added a commit that referenced this issue Sep 20, 2022
Summary:
Added `yb-ts-cli remote_bootstrap` which initiates a remote bootstrap from a specified
node for a specified tablet id:

```
yb-ts-cli --server_address=$TO_ADDR remote_bootstrap $FROM_ADDR $TABLET_ID
```
to perform a remote bootstrap for tablet `$TABLET_ID` using data from tablet server `$FROM_ADDR`
onto tablet server `$TO_ADDR`.

The command fails as expected when the tablet is in the `TABLET_DATA_READY` state.

Original Commit: dec0805 / D13258

This diff also backports a test fix for the test introduced in the above commit:

Original Commit: a398484 / D19653

Test Plan:
Manually tested by removing everything in `data/rocksdb` and `tablet-meta` on a stopped
local cluster with RF=3, then running the remote bootstrap and performing queries. Added test case
(`ybd --cxx-test tools_yb-ts-cli-test --gtest_filter YBTsCliTest.TestManualRemoteBootstrap`).

Jenkins: rebase: 2.14

Reviewers: bogdan, rthallam, amitanand

Reviewed By: rthallam, amitanand

Differential Revision: https://phabricator.dev.yugabyte.com/D19620
@es1024 es1024 closed this as completed Sep 20, 2022
YBase features automation moved this from In progress to Done Sep 20, 2022
Data integrity automation moved this from In progress to Done Sep 20, 2022