-
Notifications
You must be signed in to change notification settings - Fork 165
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Prevent CF being dropped while GC is running #72
Conversation
Signed-off-by: Yi Wu <yiwu@pingcap.com>
Signed-off-by: Yi Wu <yiwu@pingcap.com>
Signed-off-by: Yi Wu <yiwu@pingcap.com>
Signed-off-by: Yi Wu <yiwu@pingcap.com>
Codecov Report
@@ Coverage Diff @@
## master #72 +/- ##
=========================================
+ Coverage 82.63% 82.7% +0.07%
=========================================
Files 44 44
Lines 3017 3041 +24
=========================================
+ Hits 2493 2515 +22
- Misses 524 526 +2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add a unit test to reproduce?
yes, will do. |
Signed-off-by: Yi Wu <yiwu@pingcap.com>
@@ -259,19 +259,15 @@ Status TitanDBImpl::Open(const std::vector<TitanCFDescriptor>& descs, | |||
s = blob_file_set_->Open(column_families); | |||
if (!s.ok()) return s; | |||
|
|||
static bool has_init_background_threads = false; | |||
if (!has_init_background_threads) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why we need this before?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
totally no idea. It was from @DorianZheng 's initial version.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the intent was to increase the thread pool size by max_background_gc
, but don't want to do it multiple times if env
is shared by multiple db instance. I'm changing the logic to set thread pool size to max_background_gc
instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Signed-off-by: Yi Wu <yiwu@pingcap.com>
Env::Priority::BOTTOM); | ||
assert(env_->GetBackgroundThreads(Env::Priority::BOTTOM) == | ||
bottom_pri_threads_num + db_options_.max_background_gc); | ||
env_->IncBackgroundThreadsIfNeeded(db_options_.max_background_gc, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what if other works need more bottom threads
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
then the other work should decide how many threads it needs (I agree this env API is awkward..)
Summary: If `DropColumnFamilies` is called while GC is running, GC can fail because the CF is gone. The GC job will then set background error which halts Titan as a whole. To prevent it, we let `DropColumnFamilies` wait till there no running GC before proceed, and let GC job wait for pending drop CF requests before start running. Fixes tikv#71. Also fix `DropColumnFamilies` mark obsolete file as obsolete again, which will cause assert failure. Test Plan: Run titandb_stress and the above issues don't reproduce. Signed-off-by: Yi Wu <yiwu@pingcap.com>
cherry-pick following patches ``` ff30897 2019-09-23 zbk602423539@gmail.. reset gc mark after sampling (#88) cb67cd5 2019-09-22 zbk602423539@gmail.. fix wrong property after restart (#82) c6e408c 2019-09-20 yiwu@pingcap.com Fix blob file format description (#83) fc4106d 2019-09-18 yiwu@pingcap.com Fix new CF missing BlobFileSizeCollector (#78) 10710bb 2019-09-18 zbk602423539@gmail.. Add more metrics (#79) 8768067 2019-09-17 yiwu@pingcap.com Prevent CF being dropped while GC is running (#72) 07aa065 2019-09-16 zbk602423539@gmail.. Rename version_set to blob_file_set (#69) f963b88 2019-09-16 zbk602423539@gmail.. support histogram for titan stats (#74) 144e20f 2019-09-12 wujy.cs@gmail.com gc breakdown metrics (#73) 715dbd6 2019-09-11 zbk602423539@gmail.. Update BlobFileMeta format (#68) 468ddc9 2019-09-03 yiwu@pingcap.com Add internal operation stats and dump to info log periodically (#62) ```
Summary:
If
DropColumnFamilies
is called while GC is running, GC can fail because the CF is gone. The GC job will then set background error which halts Titan as a whole. To prevent it, we letDropColumnFamilies
wait till there no running GC before proceed, and let GC job wait for pending drop CF requests before start running. Fixes #71.Also fix
DropColumnFamilies
mark obsolete file as obsolete again, which will cause assert failure.Test Plan:
Run titandb_stress and the above issues don't reproduce.