New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

optimize gc with user properties #1971

Merged
merged 13 commits into from Jul 17, 2017

Conversation

Projects
None yet
7 participants
@huachaohuang
Member

huachaohuang commented Jun 29, 2017

This PR optimize GC with user collected properties from RocksDB.
It registers a UserPropertiesCollector to RocksDB to collect keys / versions information about each SST file. Before scanning keys for GC, it first get properties of a region from RocksDB, to check if it can skip GC on the region.

Properties:

  • min_ts: the minimal timestamp
  • max_ts: the maximal timestamp
  • num_rows: number of unique rows
  • num_puts: number of MVCC puts of all rows
  • num_versions: number of MVCC versions of all rows
  • max_row_versions: maximal number of MVCC versions of a single row
@disksing

This comment has been minimized.

Show comment
Hide comment
Contributor

disksing commented Jun 29, 2017

Show outdated Hide outdated src/storage/metrics.rs
pub static ref KV_COMMAND_GC_SKIPPED_COUNTER: Counter =
register_counter!(
"tikv_storage_gc_skipped_counter",
"Total number of gc command skipped owe to optimization"

This comment has been minimized.

@andelf

andelf Jun 29, 2017

Contributor

owing to?

@andelf

andelf Jun 29, 2017

Contributor

owing to?

Show outdated Hide outdated src/storage/mvcc/reader.rs
@@ -417,4 +418,33 @@ impl<'a> MvccReader<'a> {
keys.push(key);
}
}
// Returns true if it need gc.

This comment has been minimized.

@andelf

andelf Jun 29, 2017

Contributor

needs

@andelf

andelf Jun 29, 2017

Contributor

needs

Show outdated Hide outdated src/util/rocksdb.rs
}
}
pub fn num_versions(&self) -> u64 {

This comment has been minimized.

@andelf

andelf Jun 29, 2017

Contributor

is usize better?

@andelf

andelf Jun 29, 2017

Contributor

is usize better?

Show outdated Hide outdated src/util/rocksdb.rs
map.insert(k.as_bytes().to_owned(), buf);
}
map
}

This comment has been minimized.

@andelf

andelf Jun 29, 2017

Contributor
items
   .iter()
   .map(|(k,v)| {
         let mut buf = Vec::with_capacity(8);
         buf.encode_u64(v).unwrap();
        (k.as_bytes().to_vec(), buf)
    })
    .collect()
@andelf

andelf Jun 29, 2017

Contributor
items
   .iter()
   .map(|(k,v)| {
         let mut buf = Vec::with_capacity(8);
         buf.encode_u64(v).unwrap();
        (k.as_bytes().to_vec(), buf)
    })
    .collect()
Show outdated Hide outdated src/util/rocksdb.rs
let (k, ts) = match types::split_encoded_key_on_ts(key) {
Ok((k, ts)) => (k, ts),
Err(_) => {
// Should we ignore this or panic?

This comment has been minimized.

@siddontang

siddontang Jul 1, 2017

Contributor

we may meet error for the lock cf?

@siddontang

siddontang Jul 1, 2017

Contributor

we may meet error for the lock cf?

Show outdated Hide outdated src/util/rocksdb.rs
}
match entry_type {
DBEntryType::Put => self.props.num_puts += 1,
DBEntryType::Merge => self.props.num_merges += 1,

This comment has been minimized.

@siddontang

siddontang Jul 1, 2017

Contributor

seem we don't need to care merge?

@siddontang

siddontang Jul 1, 2017

Contributor

seem we don't need to care merge?

Show outdated Hide outdated src/util/rocksdb.rs
DBEntryType::Put => self.props.num_puts += 1,
DBEntryType::Merge => self.props.num_merges += 1,
DBEntryType::Delete |
DBEntryType::SingleDelete => self.props.num_deletes += 1,

This comment has been minimized.

@siddontang

siddontang Jul 1, 2017

Contributor

we don't have SingleDelete too.

@siddontang

siddontang Jul 1, 2017

Contributor

we don't have SingleDelete too.

@huachaohuang huachaohuang changed the title from [DNM] optimize gc with user properties to optimize gc with user properties Jul 5, 2017

/// Splits encoded key on timestamp.
/// Returns the split key and timestamp.
pub fn split_encoded_key_on_ts(key: &[u8]) -> Result<(&[u8], u64), codec::Error> {

This comment has been minimized.

@siddontang

siddontang Jul 5, 2017

Contributor

any test?

@siddontang

siddontang Jul 5, 2017

Contributor

any test?

@huachaohuang

This comment has been minimized.

Show comment
Hide comment
Show outdated Hide outdated src/storage/mvcc/reader.rs
// No keys have more than one versions, and the version is effective.
// Notice: Since the properties are file-based, it can be false positive.
// For example, if multiple files have a different version of the same key.
!(props.num_keys == props.num_versions && props.num_puts == props.num_versions)

This comment has been minimized.

@siddontang

siddontang Jul 9, 2017

Contributor

do we need a ratio check here, e,g, the num versions is 100 and num keys is 99, I still think we don't need do GC here.

@siddontang

siddontang Jul 9, 2017

Contributor

do we need a ratio check here, e,g, the num versions is 100 and num keys is 99, I still think we don't need do GC here.

This comment has been minimized.

@huachaohuang

huachaohuang Jul 9, 2017

Member

How about adding a ratio configuration and default to 10%?

@huachaohuang

huachaohuang Jul 9, 2017

Member

How about adding a ratio configuration and default to 10%?

This comment has been minimized.

@huachaohuang

huachaohuang Jul 9, 2017

Member

Maybe I should add a TiDB variable like tikv_gc_min_ratio and default to 0.1?
How do you think @disksing @AndreMouche ?

@huachaohuang

huachaohuang Jul 9, 2017

Member

Maybe I should add a TiDB variable like tikv_gc_min_ratio and default to 0.1?
How do you think @disksing @AndreMouche ?

Show outdated Hide outdated src/storage/types.rs
fn test_split_ts() {
let k = b"k";
let ts = 123;
assert_eq!(split_encoded_key_on_ts(k).is_err(), true);

This comment has been minimized.

@siddontang

siddontang Jul 9, 2017

Contributor

you can use assert! to check is_err directly.

@siddontang

siddontang Jul 9, 2017

Contributor

you can use assert! to check is_err directly.

Show outdated Hide outdated src/util/rocksdb.rs
pub num_keys: u64,
pub num_puts: u64,
pub num_versions: u64,
}

This comment has been minimized.

@siddontang

siddontang Jul 9, 2017

Contributor

do we need to record deletes here?

@siddontang

siddontang Jul 9, 2017

Contributor

do we need to record deletes here?

This comment has been minimized.

@huachaohuang

huachaohuang Jul 9, 2017

Member

You mean DBEntryType::Delete or WriteType::Delete?

@huachaohuang

huachaohuang Jul 9, 2017

Member

You mean DBEntryType::Delete or WriteType::Delete?

pub fn lock(&mut self, pk: &[u8], start_ts: u64, commit_ts: u64) {
let m = Mutation::Lock(make_key(pk));
self.prewrite(m, pk, start_ts);
self.commit(pk, start_ts, commit_ts);

This comment has been minimized.

@zhangjinpeng1987

zhangjinpeng1987 Jul 9, 2017

Member

Lock will never commit but rollback.

@zhangjinpeng1987

zhangjinpeng1987 Jul 9, 2017

Member

Lock will never commit but rollback.

This comment has been minimized.

@huachaohuang

huachaohuang Jul 9, 2017

Member

Lock will be committed in some cases, e.g. SELECT FOR UPDATE?

@huachaohuang

huachaohuang Jul 9, 2017

Member

Lock will be committed in some cases, e.g. SELECT FOR UPDATE?

This comment has been minimized.

@zhangjinpeng1987

zhangjinpeng1987 Jul 10, 2017

Member

select for update will lock the key first and then clean the lock use rollback.

@zhangjinpeng1987

zhangjinpeng1987 Jul 10, 2017

Member

select for update will lock the key first and then clean the lock use rollback.

Show outdated Hide outdated src/util/rocksdb.rs
pub struct UserProperties {
pub min_ts: u64,
pub max_ts: u64,
pub num_keys: u64,

This comment has been minimized.

@zhangjinpeng1987

zhangjinpeng1987 Jul 10, 2017

Member

Does num_keys means num_rows in our situation?

@zhangjinpeng1987

zhangjinpeng1987 Jul 10, 2017

Member

Does num_keys means num_rows in our situation?

This comment has been minimized.

Show outdated Hide outdated src/util/rocksdb.rs
}
pub fn encode(&self) -> HashMap<Vec<u8>, Vec<u8>> {
let items = [("tikv.min_ts", self.min_ts),

This comment has been minimized.

@zhangjinpeng1987

zhangjinpeng1987 Jul 10, 2017

Member

Please use const str.

@zhangjinpeng1987

zhangjinpeng1987 Jul 10, 2017

Member

Please use const str.

Show outdated Hide outdated src/util/rocksdb.rs
}
if k != self.last_key.as_slice() {
self.props.num_keys += 1;

This comment has been minimized.

@zhangjinpeng1987

zhangjinpeng1987 Jul 10, 2017

Member

I prefer num_rows

@zhangjinpeng1987

zhangjinpeng1987 Jul 10, 2017

Member

I prefer num_rows

Show outdated Hide outdated src/util/rocksdb.rs
Ok(v) => v,
Err(_) => return, // Ignore error
};
if v.write_type == WriteType::Put {

This comment has been minimized.

@zhangjinpeng1987

zhangjinpeng1987 Jul 10, 2017

Member

I think we should also record the count of MVCC DEL.

@zhangjinpeng1987

zhangjinpeng1987 Jul 10, 2017

Member

I think we should also record the count of MVCC DEL.

Show outdated Hide outdated src/storage/mvcc/reader.rs
// No rows have more than one versions, and the version is effective.
// Notice: Since the properties are file-based, it can be false positive.
// For example, if multiple files have a different version of the same row.
!(props.num_rows == props.num_versions && props.num_puts == props.num_versions)

This comment has been minimized.

@zhangjinpeng1987

zhangjinpeng1987 Jul 10, 2017

Member

I think there are several cases that we need to run GC: 1) the num_versions > num_rows *(1 + 0.1(tunable)); 2) num_mvcc_del rate over threshold; 3) a single row has too many versions (usually produced by update).

@zhangjinpeng1987

zhangjinpeng1987 Jul 10, 2017

Member

I think there are several cases that we need to run GC: 1) the num_versions > num_rows *(1 + 0.1(tunable)); 2) num_mvcc_del rate over threshold; 3) a single row has too many versions (usually produced by update).

This comment has been minimized.

@huachaohuang

huachaohuang Jul 11, 2017

Member

2 and 3 are covered by 1) in a way:
2) num_mvcc_del is included in num_versions
3) If a single row has too many versions, the total num_versions may or may not > num_rows * threshold. Do you want to set a fixed number for this case, like 10 or 100 versions of a single row?

@huachaohuang

huachaohuang Jul 11, 2017

Member

2 and 3 are covered by 1) in a way:
2) num_mvcc_del is included in num_versions
3) If a single row has too many versions, the total num_versions may or may not > num_rows * threshold. Do you want to set a fixed number for this case, like 10 or 100 versions of a single row?

This comment has been minimized.

@zhangjinpeng1987

zhangjinpeng1987 Jul 11, 2017

Member

2 is not covered by 1, for example we execute sql like delete * from t..., that will add a mvcc_del for each row we have deleted, the old version mvcc_put may be dropped after gc finished, but the fresh mvcc_del will left, these mvcc_del also need gc when they are out of date.
3 is not covered by 1 too, you have finger it out above, I think 100 is enough.

@zhangjinpeng1987

zhangjinpeng1987 Jul 11, 2017

Member

2 is not covered by 1, for example we execute sql like delete * from t..., that will add a mvcc_del for each row we have deleted, the old version mvcc_put may be dropped after gc finished, but the fresh mvcc_del will left, these mvcc_del also need gc when they are out of date.
3 is not covered by 1 too, you have finger it out above, I think 100 is enough.

This comment has been minimized.

@AndreMouche

AndreMouche Jul 11, 2017

Member

I prefer to make a new function in UserProperties to estimate whether gc is needed. And I'm not pretty sure that if we could use an approximate range(use percentage) instead of the exact value((props.num_rows == props.num_versions && props.num_puts == props.num_versions))

@AndreMouche

AndreMouche Jul 11, 2017

Member

I prefer to make a new function in UserProperties to estimate whether gc is needed. And I'm not pretty sure that if we could use an approximate range(use percentage) instead of the exact value((props.num_rows == props.num_versions && props.num_puts == props.num_versions))

@siddontang

This comment has been minimized.

Show comment
Hide comment
@siddontang

siddontang Jul 11, 2017

Contributor

Any update? @huachaohuang

Contributor

siddontang commented Jul 11, 2017

Any update? @huachaohuang

Show outdated Hide outdated src/storage/mvcc/reader.rs
// No rows have more than one versions, and the version is effective.
// Notice: Since the properties are file-based, it can be false positive.
// For example, if multiple files have a different version of the same row.
!(props.num_rows == props.num_versions && props.num_puts == props.num_versions)

This comment has been minimized.

@AndreMouche

AndreMouche Jul 11, 2017

Member

I prefer to make a new function in UserProperties to estimate whether gc is needed. And I'm not pretty sure that if we could use an approximate range(use percentage) instead of the exact value((props.num_rows == props.num_versions && props.num_puts == props.num_versions))

@AndreMouche

AndreMouche Jul 11, 2017

Member

I prefer to make a new function in UserProperties to estimate whether gc is needed. And I'm not pretty sure that if we could use an approximate range(use percentage) instead of the exact value((props.num_rows == props.num_versions && props.num_puts == props.num_versions))

Show outdated Hide outdated src/util/rocksdb.rs
("cd", 3, WriteType::Put, DBEntryType::Put),
("ef", 5, WriteType::Put, DBEntryType::Put),
("ef", 5, WriteType::Put, DBEntryType::Delete),
("gh", 6, WriteType::Delete, DBEntryType::Put)];

This comment has been minimized.

@AndreMouche

AndreMouche Jul 11, 2017

Member

When would we use DBEntryType::Delete in rocksdb? Only in GC operation?

@AndreMouche

AndreMouche Jul 11, 2017

Member

When would we use DBEntryType::Delete in rocksdb? Only in GC operation?

This comment has been minimized.

@huachaohuang

huachaohuang Jul 11, 2017

Member

Maybe, what's the problem?

@huachaohuang

huachaohuang Jul 11, 2017

Member

Maybe, what's the problem?

This comment has been minimized.

@AndreMouche

AndreMouche Jul 11, 2017

Member

no problem, I'm just curious @huachaohuang

@AndreMouche

AndreMouche Jul 11, 2017

Member

no problem, I'm just curious @huachaohuang

Show outdated Hide outdated src/storage/mvcc/reader.rs
engine.flush();
check_need_gc(db.clone(), region.clone(), 10, true);
}
}

This comment has been minimized.

@AndreMouche

AndreMouche Jul 11, 2017

Member

Could you split this test to test_gc_without_db_properties and test_gc_with_db_properties?

@AndreMouche

AndreMouche Jul 11, 2017

Member

Could you split this test to test_gc_without_db_properties and test_gc_with_db_properties?

Show outdated Hide outdated src/util/rocksdb.rs
}
impl UserProperties {
pub fn new() -> UserProperties {

This comment has been minimized.

@AndreMouche

AndreMouche Jul 11, 2017

Member

Could we implement trait Default instead of new with empty params?

@AndreMouche

AndreMouche Jul 11, 2017

Member

Could we implement trait Default instead of new with empty params?

Show outdated Hide outdated src/util/rocksdb.rs
}
impl UserPropertiesCollector {
fn new() -> UserPropertiesCollector {

This comment has been minimized.

@AndreMouche

AndreMouche Jul 11, 2017

Member

ditto, Could we implement trait Default instead of new with empty params?

@AndreMouche

AndreMouche Jul 11, 2017

Member

ditto, Could we implement trait Default instead of new with empty params?

Show outdated Hide outdated src/util/rocksdb.rs
match v.write_type {
WriteType::Put => self.props.num_puts += 1,
WriteType::Delete => self.props.num_deletes += 1,
_ => {}

This comment has been minimized.

@AndreMouche

AndreMouche Jul 11, 2017

Member

Why not consider Rollback and Lock here?

@AndreMouche

AndreMouche Jul 11, 2017

Member

Why not consider Rollback and Lock here?

Show outdated Hide outdated src/util/rocksdb.rs
pub struct UserPropertiesCollectorFactory {}
impl UserPropertiesCollectorFactory {
pub fn new() -> UserPropertiesCollectorFactory {

This comment has been minimized.

@AndreMouche

AndreMouche Jul 11, 2017

Member

ditto

Show outdated Hide outdated src/util/rocksdb.rs
let (k, ts) = match types::split_encoded_key_on_ts(key) {
Ok((k, ts)) => (k, ts),
Err(_) => return, // Ignore error

This comment has been minimized.

@AndreMouche

AndreMouche Jul 11, 2017

Member

Could we do some logs to indicate an error once happened?

@AndreMouche

AndreMouche Jul 11, 2017

Member

Could we do some logs to indicate an error once happened?

This comment has been minimized.

@siddontang

siddontang Jul 16, 2017

Contributor

maybe adding a prop field to record is better.

@siddontang

siddontang Jul 16, 2017

Contributor

maybe adding a prop field to record is better.

This comment has been minimized.

@siddontang

siddontang Jul 16, 2017

Contributor

but for raw KV, we can return here directly.

@siddontang

siddontang Jul 16, 2017

Contributor

but for raw KV, we can return here directly.

This comment has been minimized.

@huachaohuang

huachaohuang Jul 16, 2017

Member

We only register this collector for CF_WRITE, so it will not affect raw KV.

@huachaohuang

huachaohuang Jul 16, 2017

Member

We only register this collector for CF_WRITE, so it will not affect raw KV.

This comment has been minimized.

@siddontang

siddontang Jul 17, 2017

Contributor

So the struct name should contain Write to make it more scene, like WriteUserProperty.

And we should add a field counter to record the error.

@siddontang

siddontang Jul 17, 2017

Contributor

So the struct name should contain Write to make it more scene, like WriteUserProperty.

And we should add a field counter to record the error.

This comment has been minimized.

@huachaohuang

huachaohuang Jul 17, 2017

Member

UserProperties aims to be a common type returned by the snapshot interface.
It can be split into multiple parts when it's too large, but we have only a few properties right now.

@huachaohuang

huachaohuang Jul 17, 2017

Member

UserProperties aims to be a common type returned by the snapshot interface.
It can be split into multiple parts when it's too large, but we have only a few properties right now.

Show outdated Hide outdated src/util/rocksdb.rs
let v = match Write::parse(value) {
Ok(v) => v,
Err(_) => return, // Ignore error

This comment has been minimized.

@AndreMouche

AndreMouche Jul 11, 2017

Member

ditto

Show outdated Hide outdated src/util/rocksdb.rs
Ok(res)
}
pub fn need_gc(&self, safe_point: u64, ratio_threshold: f64) -> bool {

This comment has been minimized.

@zhangjinpeng1987

zhangjinpeng1987 Jul 11, 2017

Member

I don't think move need_gc into user properties is a good idea.

@zhangjinpeng1987

zhangjinpeng1987 Jul 11, 2017

Member

I don't think move need_gc into user properties is a good idea.

@huachaohuang

This comment has been minimized.

Show comment
Hide comment
Show outdated Hide outdated src/storage/mod.rs
pub struct Options {
pub lock_ttl: u64,
pub skip_constraint_check: bool,
pub key_only: bool,
pub gc_ratio_threshold: f64,

This comment has been minimized.

@zhangjinpeng1987

zhangjinpeng1987 Jul 13, 2017

Member

Why embedding gc_ratio_threshold in Options?

@zhangjinpeng1987

zhangjinpeng1987 Jul 13, 2017

Member

Why embedding gc_ratio_threshold in Options?

@zhangjinpeng1987

This comment has been minimized.

Show comment
Hide comment
Member

zhangjinpeng1987 commented Jul 14, 2017

LGTM

@siddontang

This comment has been minimized.

Show comment
Hide comment
Contributor

siddontang commented Jul 14, 2017

@@ -107,6 +109,37 @@ impl RegionSnapshot {
Ok(())
}
pub fn get_properties_cf(&self,

This comment has been minimized.

@siddontang

siddontang Jul 14, 2017

Contributor

do we have a test to cover it?

@siddontang

siddontang Jul 14, 2017

Contributor

do we have a test to cover it?

@siddontang

This comment has been minimized.

Show comment
Hide comment
Contributor

siddontang commented Jul 14, 2017

@siddontang

This comment has been minimized.

Show comment
Hide comment
@siddontang

siddontang Jul 16, 2017

Contributor

PTAL @disksing

Contributor

siddontang commented Jul 16, 2017

PTAL @disksing

@siddontang

This comment has been minimized.

Show comment
Hide comment
@siddontang

siddontang Jul 17, 2017

Contributor

I prefer moving the property codes from rocksdb.rs to another file, the rocksdb.rs is too huge now.

Contributor

siddontang commented Jul 17, 2017

I prefer moving the property codes from rocksdb.rs to another file, the rocksdb.rs is too huge now.

@huachaohuang

This comment has been minimized.

Show comment
Hide comment
@disksing

This comment has been minimized.

Show comment
Hide comment
@disksing

disksing Jul 17, 2017

Contributor

LGTM

Contributor

disksing commented Jul 17, 2017

LGTM

@huachaohuang huachaohuang merged commit 7f49a4f into master Jul 17, 2017

3 checks passed

ci/circleci Your tests passed on CircleCI!
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
license/cla Contributor License Agreement is signed.
Details

@huachaohuang huachaohuang deleted the huachao/optimize-gc branch Jul 17, 2017

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment