-
Notifications
You must be signed in to change notification settings - Fork 1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[DocDB] Core dumps found when exercising ddl atomicity + cross db concurrent ddls #21773
Closed
1 task done
Labels
2024.1 Backport Required
area/docdb
YugabyteDB core features
kind/bug
This issue is a bug
priority/medium
Medium priority issue
Comments
myang2021
added a commit
that referenced
this issue
Apr 2, 2024
Summary: The relevant code that can lead to rpc callback invocation on a deleted ddl background task is in `PollTransactionStatusBase::VerifyTransaction()` ``` *rpc_handle = client::GetTransactionStatus( TransactionRpcDeadline(), nullptr /* tablet */, client, &req, [this, rpc_handle] (Status status, const tserver::GetTransactionStatusResponsePB& resp) { auto retained = rpcs_.Unregister(rpc_handle); TransactionReceived(std::move(status), resp); }); ``` The base class `PollTransactionStatusBase` has two subclasses: ``` class NamespaceVerificationTask : public MultiStepNamespaceTaskBase, public PollTransactionStatusBase { ``` and ``` class TableSchemaVerificationTask : public MultiStepTableTaskBase, public PollTransactionStatusBase { ``` Both tasks can be asynchronously deleted by `TasksTracker::CleanupOldTasks`. Therefore it is possible that the background task is already deleted before the callback shown above is invoked. This can lead to yb-master crash. To fix this bug, we need to ensure that the bg task isn't deleted before the callback is invoked. I added a new virtual function `GetSharedFromThis()` to retrieve a `shared_ptr` to the bg task object and capture it within the callback itself as a weak_ptr<MonitoredTask>. Jira: DB-10649 Test Plan: jenkins Reviewers: hsunder, tverona Reviewed By: hsunder Subscribers: ybase, yql, bogdan Differential Revision: https://phorge.dev.yugabyte.com/D33731
1 task
myang2021
added a commit
that referenced
this issue
Apr 5, 2024
…dl bg task Summary: The relevant code that can lead to rpc callback invocation on a deleted ddl background task is in `PollTransactionStatusBase::VerifyTransaction()` ``` *rpc_handle = client::GetTransactionStatus( TransactionRpcDeadline(), nullptr /* tablet */, client, &req, [this, rpc_handle] (Status status, const tserver::GetTransactionStatusResponsePB& resp) { auto retained = rpcs_.Unregister(rpc_handle); TransactionReceived(std::move(status), resp); }); ``` The base class `PollTransactionStatusBase` has two subclasses: ``` class NamespaceVerificationTask : public MultiStepNamespaceTaskBase, public PollTransactionStatusBase { ``` and ``` class TableSchemaVerificationTask : public MultiStepTableTaskBase, public PollTransactionStatusBase { ``` Both tasks can be asynchronously deleted by `TasksTracker::CleanupOldTasks`. Therefore it is possible that the background task is already deleted before the callback shown above is invoked. This can lead to yb-master crash. To fix this bug, we need to ensure that the bg task isn't deleted before the callback is invoked. I added a new virtual function `GetSharedFromThis()` to retrieve a `shared_ptr` to the bg task object and capture it within the callback itself as a weak_ptr<MonitoredTask>. Jira: DB-10649 Original commit: 70261b0 / D33731 Test Plan: jenkins Reviewers: hsunder, tverona Reviewed By: hsunder Subscribers: jason, bogdan, yql, ybase Tags: #jenkins-ready Differential Revision: https://phorge.dev.yugabyte.com/D33781
hari90
added a commit
that referenced
this issue
Apr 6, 2024
Summary: rpcs_.Shutdown() does not wait for all calls to complete. It has a max timeout after which it gives up. Usually this is not a problem since we only Shutdown rpcs on process exit, so its fine to end the process quickly. In the case of PollTransactionStatusBase we need to explicitly wait for the callback completion when the Task ends. Jira: DB-10649 Test Plan: Jenkins Reviewers: myang Reviewed By: myang Subscribers: ybase, bogdan Differential Revision: https://phorge.dev.yugabyte.com/D33819
myang2021
pushed a commit
that referenced
this issue
Apr 8, 2024
…tatusBase shutdown Summary: rpcs_.Shutdown() does not wait for all calls to complete. It has a max timeout after which it gives up. Usually this is not a problem since we only Shutdown rpcs on process exit, so its fine to end the process quickly. In the case of PollTransactionStatusBase we need to explicitly wait for the callback completion when the Task ends. Jira: DB-10649 Original commit: 70620c7 / D33819 Test Plan: Jenkins Reviewers: tverona, kfranz, hsunder Reviewed By: hsunder Subscribers: bogdan, ybase Tags: #jenkins-ready Differential Revision: https://phorge.dev.yugabyte.com/D33918
Closed
1 task
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Labels
2024.1 Backport Required
area/docdb
YugabyteDB core features
kind/bug
This issue is a bug
priority/medium
Medium priority issue
Jira Link: DB-10649
Description
http://stress.dev.yugabyte.com/stress_test/a26e46ed-0009-40eb-bbfe-e7e845a46a38
http://stress.dev.yugabyte.com/stress_test/0e143f1f-bb16-40c9-ad81-d97fac67bc2f
Build: 2024.1.0.0-b32
Core dump:
http://stress.dev.yugabyte.com/files/get?name=97b4f41a-e2e0-47ce-8ab0-106812ebf9cd-core[…].1.0.0-b32-almalinux8-aarch64!bin!yb-server.backtrace.txt
http://stress.dev.yugabyte.com/files/get?name=c27aa0ef-765d-40b2-ad5e-797fd6a6d5ff-core[…].1.0.0-b32-almalinux8-aarch64!bin!yb-server.backtrace.txt
http://stress.dev.yugabyte.com/files/get?name=17cde259-9f37-4958-8ae1-0d762618dc68-core[…].1.0.0-b32-almalinux8-aarch64!bin!yb-server.backtrace.txt
Issue Type
kind/bug
Warning: Please confirm that this issue does not contain any sensitive information
The text was updated successfully, but these errors were encountered: