Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prevent CF being dropped while GC is running #72

Merged
merged 9 commits into from
Sep 18, 2019

Conversation

yiwu-arbug
Copy link
Collaborator

Summary:
If DropColumnFamilies is called while GC is running, GC can fail because the CF is gone. The GC job will then set background error which halts Titan as a whole. To prevent it, we let DropColumnFamilies wait till there no running GC before proceed, and let GC job wait for pending drop CF requests before start running. Fixes #71.

Also fix DropColumnFamilies mark obsolete file as obsolete again, which will cause assert failure.

Test Plan:
Run titandb_stress and the above issues don't reproduce.

Yi Wu added 4 commits September 10, 2019 22:46
Signed-off-by: Yi Wu <yiwu@pingcap.com>
Signed-off-by: Yi Wu <yiwu@pingcap.com>
Signed-off-by: Yi Wu <yiwu@pingcap.com>
Signed-off-by: Yi Wu <yiwu@pingcap.com>
@codecov-io
Copy link

codecov-io commented Sep 11, 2019

Codecov Report

Merging #72 into master will increase coverage by 0.07%.
The diff coverage is 95.34%.

@@            Coverage Diff            @@
##           master     #72      +/-   ##
=========================================
+ Coverage   82.63%   82.7%   +0.07%     
=========================================
  Files          44      44              
  Lines        3017    3041      +24     
=========================================
+ Hits         2493    2515      +22     
- Misses        524     526       +2

Copy link
Member

@Connor1996 Connor1996 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a unit test to reproduce?

@yiwu-arbug
Copy link
Collaborator Author

Can you add a unit test to reproduce?

yes, will do.

Signed-off-by: Yi Wu <yiwu@pingcap.com>
@yiwu-arbug yiwu-arbug mentioned this pull request Sep 16, 2019
Yi Wu added 3 commits September 16, 2019 23:38
Signed-off-by: Yi Wu <yiwu@pingcap.com>
Signed-off-by: Yi Wu <yiwu@pingcap.com>
@@ -259,19 +259,15 @@ Status TitanDBImpl::Open(const std::vector<TitanCFDescriptor>& descs,
s = blob_file_set_->Open(column_families);
if (!s.ok()) return s;

static bool has_init_background_threads = false;
if (!has_init_background_threads) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why we need this before?

Copy link
Collaborator Author

@yiwu-arbug yiwu-arbug Sep 17, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

totally no idea. It was from @DorianZheng 's initial version.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the intent was to increase the thread pool size by max_background_gc, but don't want to do it multiple times if env is shared by multiple db instance. I'm changing the logic to set thread pool size to max_background_gc instead.

src/titan_db_test.cc Show resolved Hide resolved
Copy link
Member

@Connor1996 Connor1996 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Signed-off-by: Yi Wu <yiwu@pingcap.com>
Env::Priority::BOTTOM);
assert(env_->GetBackgroundThreads(Env::Priority::BOTTOM) ==
bottom_pri_threads_num + db_options_.max_background_gc);
env_->IncBackgroundThreadsIfNeeded(db_options_.max_background_gc,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what if other works need more bottom threads

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

then the other work should decide how many threads it needs (I agree this env API is awkward..)

@yiwu-arbug yiwu-arbug merged commit 8768067 into tikv:master Sep 18, 2019
@yiwu-arbug yiwu-arbug deleted the fix_drop_gc branch September 18, 2019 02:45
yiwu-arbug pushed a commit to yiwu-arbug/titan that referenced this pull request Sep 24, 2019
Summary:
If `DropColumnFamilies` is called while GC is running, GC can fail because the CF is gone. The GC job will then set background error which halts Titan as a whole. To prevent it, we let `DropColumnFamilies` wait till there no running GC before proceed, and let GC job wait for pending drop CF requests before start running. Fixes tikv#71.

Also fix `DropColumnFamilies` mark obsolete file as obsolete again, which will cause assert failure.

Test Plan:
Run titandb_stress and the above issues don't reproduce.

Signed-off-by: Yi Wu <yiwu@pingcap.com>
Connor1996 pushed a commit that referenced this pull request Sep 24, 2019
cherry-pick following patches
```
ff30897 2019-09-23 zbk602423539@gmail.. reset gc mark after sampling (#88)
cb67cd5 2019-09-22 zbk602423539@gmail.. fix wrong property after restart (#82)
c6e408c 2019-09-20 yiwu@pingcap.com     Fix blob file format description (#83)
fc4106d 2019-09-18 yiwu@pingcap.com     Fix new CF missing BlobFileSizeCollector (#78)
10710bb 2019-09-18 zbk602423539@gmail.. Add more metrics (#79)
8768067 2019-09-17 yiwu@pingcap.com     Prevent CF being dropped while GC is running (#72)
07aa065 2019-09-16 zbk602423539@gmail.. Rename version_set to blob_file_set (#69)
f963b88 2019-09-16 zbk602423539@gmail.. support histogram for titan stats (#74)
144e20f 2019-09-12 wujy.cs@gmail.com    gc breakdown metrics (#73)
715dbd6 2019-09-11 zbk602423539@gmail.. Update BlobFileMeta format (#68)
468ddc9 2019-09-03 yiwu@pingcap.com     Add internal operation stats and dump to info log periodically (#62)
```
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Drop CF while GC is running causing background error
3 participants