Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[DocDB] Use async callback mechanism in YBMetaDataCache::GetTable #17038

Closed
1 task done
dr0pdb opened this issue Apr 26, 2023 · 1 comment
Closed
1 task done

[DocDB] Use async callback mechanism in YBMetaDataCache::GetTable #17038

dr0pdb opened this issue Apr 26, 2023 · 1 comment
Assignees

Comments

@dr0pdb
Copy link
Contributor

dr0pdb commented Apr 26, 2023

Jira Link: DB-6349

Description

For #16697, as a short term solution, we will protect multiple calls to OpenTable through a barrier approach.

For long term, it should be made non-blocking.

probably a barrier approach could still lead to all threads getting stuck, as the other N-1 threads would now wait on the mutex instead of on the OpenTable call, so we could still end up in a situation where all threads are stuck…
We likely would need some light form of async callback instead, to get rid of that problem, where we combine your approach, of one thread goes out and fetches metadata, with some queue, where the other N-1 add some callbacks to a queue, waiting on that metadata to be fetched, but not keep a thread busy.

This will be slightly more involving.

Warning: Please confirm that this issue does not contain any sensitive information

  • I confirm this issue does not contain any sensitive information.
@dr0pdb dr0pdb added area/docdb YugabyteDB core features status/awaiting-triage Issue awaiting triage labels Apr 26, 2023
@dr0pdb dr0pdb self-assigned this Apr 26, 2023
@yugabyte-ci yugabyte-ci added kind/bug This issue is a bug priority/medium Medium priority issue labels Apr 26, 2023
@yugabyte-ci yugabyte-ci removed the status/awaiting-triage Issue awaiting triage label Apr 26, 2023
@yugabyte-ci yugabyte-ci added priority/highest Highest priority issue and removed priority/medium Medium priority issue labels May 31, 2023
@yugabyte-ci yugabyte-ci assigned spolitov and unassigned dr0pdb Jun 1, 2023
spolitov added a commit that referenced this issue Jun 2, 2023
Summary:
It could happen that opening table during index update blocks too many threads, resulting in deadlock.

Introduce GetTableAsync in YBMetaDataCache and use while updating indexes.

Test Plan: Jenkins

Reviewers: stiwary

Reviewed By: stiwary

Subscribers: ybase

Differential Revision: https://phorge.dev.yugabyte.com/D25887
@dr0pdb dr0pdb self-assigned this Jun 8, 2023
dr0pdb added a commit that referenced this issue Jun 14, 2023
Summary:
Make the MetaDataCache::GetTable logic completely async by introducing an async OpenTable in YBClient.

Presently, the first thread blocks while other threads register a callback. Post this diff, the first thread will also not block.
Jira: DB-6349

Test Plan: Jenkins

Reviewers: sergei, bogdan, skumar

Reviewed By: sergei

Differential Revision: https://phorge.dev.yugabyte.com/D26081
dr0pdb added a commit that referenced this issue Jun 16, 2023
Summary:
As part of https://phorge.dev.yugabyte.com/D26081, we made the OpenTable async.

An issue broke the fastdebug gcc11 build. The `info` shared_ptr was being captured via `std::move` and the `info->table_id` was passed as a const reference inside `FetchPartitions` leading to SIGSEGV

Also, while converting from the sync to the async version of `GetTableSchema`, I missed that we also have to handle the status sent in the callback & not just in the functions return value.

Fixed them both & tested on dev server.
Jira: DB-6349

Test Plan:
Jenkins: urgent

`./yb_build.sh fastdebug --gcc11 --cxx-test pgwrapper_create_initial_sys_catalog_snapshot --gtest_filter CreateInitialSysCatalogSnapshotTest.CreateInitialSysCatalogSnapshot`

Other Jenkins tests

Reviewers: sergei

Reviewed By: sergei

Differential Revision: https://phorge.dev.yugabyte.com/D26209
spolitov added a commit that referenced this issue Jun 24, 2023
Summary:
It could happen that opening table during index update blocks too many threads, resulting in deadlock.

Introduce GetTableAsync in YBMetaDataCache and use while updating indexes.

Original commit: 099a148/D25887
Jira: DB-6349

Test Plan: Jenkins

Reviewers: stiwary, bogdan

Reviewed By: bogdan

Subscribers: ybase

Differential Revision: https://phorge.dev.yugabyte.com/D26385
spolitov added a commit that referenced this issue Jun 25, 2023
Summary:
It could happen that opening table during index update blocks too many threads, resulting in deadlock.

Introduce GetTableAsync in YBMetaDataCache and use while updating indexes.

Original commit: 099a148/D25887

Test Plan: Jenkins

Reviewers: stiwary, bogdan

Reviewed By: bogdan

Subscribers: ybase

Differential Revision: https://phorge.dev.yugabyte.com/D26387
spolitov added a commit that referenced this issue Jun 30, 2023
Summary:
It could happen that opening table during index update blocks too many threads, resulting in deadlock.

Introduce GetTableAsync in YBMetaDataCache and use while updating indexes.

Original commit: 099a148/D25887
Jira: DB-6349

Test Plan: Jenkins

Reviewers: stiwary, bogdan

Reviewed By: stiwary

Subscribers: ybase

Differential Revision: https://phorge.dev.yugabyte.com/D26429
@dr0pdb
Copy link
Contributor Author

dr0pdb commented Jul 4, 2023

Note that we didn't backport the async OpenTable of YBClient. That is okay because it was a minor piece.

The main fix was to make YBMetaDataCache::GetTable async which was done in Sergei's fix. Post his fix, only the first thread would block while the other threads would register callback and get freed.

The async OpenTable of YBClient ensures that even the first thread doesn't get blocked. So it is ok to not backport it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants