Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

util: compare raw and encoded keys without explicit decoding #5613

Merged
merged 12 commits into from Oct 14, 2019

Conversation

@sticnarf
Copy link
Contributor

sticnarf commented Oct 10, 2019

What have you changed?

This PR adds a function to check if the encoded and raw format refers to the same key.

Motivation

The motivation is that we want to know if a lock is a primary lock. However, inside TiKV we use encoded keys while the stored primary key in the lock is in the raw format.

An alternatives are directly passing raw keys into the transaction layer. But this makes the transaction code uglier. The benchmark result below shows that for a typical key length (30 bytes), it takes just less than 10ns and in most cases, two different keys are different in their tails. So the performance impact is quite small.

What is the type of the changes?

  • New feature (a change which adds functionality)

How is the PR tested?

  • Unit test

Does this PR affect documentation (docs) or should it be mentioned in the release notes?

No

Does this PR affect tidb-ansible?

No

Refer to a related PR or issue link (optional)

#5575 may utilize it.

Benchmark result if necessary (optional)

test codec::bytes::tests::bench_is_encoded_from       ... bench:         949 ns/iter (+/- 67)
test codec::bytes::tests::bench_is_encoded_from_small ... bench:           5 ns/iter (+/- 0)

Any examples? (optional)

sticnarf added 2 commits Oct 10, 2019
Signed-off-by: Yilin Chen <sticnarf@gmail.com>
Copy link
Contributor

BusyJay left a comment

Why does index key hardly become primary key? I think it works in the opposite way.

for raw_len in 0..=255 {
let raw: Vec<u8> = (0..raw_len).collect();
let encoded = Key::from_raw(&raw);
assert!(encoded.is_encoded_from(&raw));

This comment has been minimized.

Copy link
@BusyJay

BusyJay Oct 10, 2019

Contributor

Should test negative cases too.

This comment has been minimized.

Copy link
@sticnarf

sticnarf Oct 10, 2019

Author Contributor

Added. PTAL again

Tests are deliberately repeated in tikv_util::codec and keys, in case we change the implementation in keys to components/codec.

@sticnarf

This comment has been minimized.

Copy link
Contributor Author

sticnarf commented Oct 10, 2019

Why does index key hardly become primary key? I think it works in the opposite way.

My fault...Index keys don't become primary keys in pessimistic transactions. In optimistic transactions, index keys tend to become primary keys. I'll update the main thread.

Anyway, comparing from the end helps return early.

Signed-off-by: Yilin Chen <sticnarf@gmail.com>
@sticnarf sticnarf added the C: Util label Oct 10, 2019
.zip(raw)
.all(|(&enc, &raw)| enc == !raw)
&& encoded[len..encoded.len() - 1].iter().all(|&v| v == 0xff)
&& encoded[ENC_GROUP_SIZE] == !(ENC_MARKER - pad)

This comment has been minimized.

Copy link
@youjiali1995

youjiali1995 Oct 12, 2019

Contributor

How about comparing it first?

Signed-off-by: Yilin Chen <sticnarf@gmail.com>
Copy link
Contributor

youjiali1995 left a comment

👍

@sticnarf sticnarf requested a review from breeswish Oct 12, 2019
@breeswish

This comment has been minimized.

Copy link
Member

breeswish commented Oct 12, 2019

When will this compare function being used?

@sticnarf

This comment has been minimized.

Copy link
Contributor Author

sticnarf commented Oct 12, 2019

When will this compare function being used?

In #5575, to prevent primary keys of pessimistic transactions from being collapsed.

And maybe in the future, to avoid writing rollback or lock of secondary keys.

Copy link
Member

breeswish left a comment

Looks like this comparing implementation is even slower than simply decoding..

1000 bytes:

test codec::bytes::tests::bench_is_encoded_from       ... bench:         121 ns/iter (+/- 5)
test byte::benches::bench_memcmp_decode_first_asc_large                               ... bench:          92 ns/iter (+/- 4)
test byte::benches::bench_memcmp_decode_first_desc_large                              ... bench:         113 ns/iter (+/- 5)

10000 bytes:

test codec::bytes::tests::bench_is_encoded_from       ... bench:       1,117 ns/iter (+/- 76)
test byte::benches::bench_memcmp_decode_first_asc_large                               ... bench:         749 ns/iter (+/- 45)
test byte::benches::bench_memcmp_decode_first_desc_large                              ... bench:         960 ns/iter (+/- 37)

let mut rev_encoded_chunks = encoded.rchunks_exact(ENC_GROUP_SIZE + 1);
// Valid encoded bytes must has complete chunks
assert!(rev_encoded_chunks.remainder().is_empty());

This comment has been minimized.

Copy link
@breeswish

breeswish Oct 12, 2019

Member

I think it would be better to accept invalid bytes but also return false.

This comment has been minimized.

Copy link
@sticnarf

sticnarf Oct 12, 2019

Author Contributor

Encoded bytes are all generated and used inside TiKV. Therefore, it's either a bug or data corruption if encoded bytes are invalid. So I choose to panic here.

This comment has been minimized.

Copy link
@breeswish

breeswish Oct 12, 2019

Member

However as a utility function it should not hold that knowledge. You can't prevent this utility function not to be used elsewhere as a common way to compare some user passed-in bytes or TiDB given bytes.

This comment has been minimized.

Copy link
@sticnarf

sticnarf Oct 12, 2019

Author Contributor

makes sense.

// Valid encoded bytes must has complete chunks
assert!(rev_encoded_chunks.remainder().is_empty());

// Bytes are compared in reverse order because in TiKV, if two keys are different, the last

This comment has been minimized.

Copy link
@breeswish

breeswish Oct 12, 2019

Member

Actually this is a TiDB feature instead of TiKV.

This comment has been minimized.

Copy link
@sticnarf

sticnarf Oct 12, 2019

Author Contributor

Ah, yes...Anyway, I still think it's all right to depend on this feature.

@sticnarf

This comment has been minimized.

Copy link
Contributor Author

sticnarf commented Oct 12, 2019

Looks like this comparing implementation is even slower than simply decoding..

@breeswish If necessary, I think we can do the similar optimizations. But actually this PR helps when we can get the result by check the length or just a few bytes...

@breeswish

This comment has been minimized.

Copy link
Member

breeswish commented Oct 12, 2019

@sticnarf If there can be optimizations to do in this PR, then why not do it 🤪I guess the source of slowness is caused from too much abstractions instead of simple and raw array operation..

Note that in TiDB scenario, the key is very short and looks like the effectiveness of this PR can be very trivial.

@sticnarf

This comment has been minimized.

Copy link
Contributor Author

sticnarf commented Oct 12, 2019

@sticnarf If there can be optimizations to do in this PR, then why not do it

My consideration is that this PR is going to be used in transactions to fix a bug in 3.0. Correctness is more important than effectiveness. (And comparing keys is far less costly compared to other operations like rocksdb seek, such optimizations don't improve the final performance much)

Later in master, we can further improve this. There will be more time to test before the next major release.

sticnarf added 2 commits Oct 12, 2019
Signed-off-by: Yilin Chen <sticnarf@gmail.com>
@breeswish

This comment has been minimized.

Copy link
Member

breeswish commented Oct 12, 2019

@sticnarf I don't think bug fix can be an excuse of writing inefficient code, especially considering that it is not that hard to be efficient, as well as this is not a bug that occurs in high priority clients (?).. If there are future plans to improve it, please open an new issue to track it and I can accept it.


let mut rev_encoded_chunks = encoded.rchunks_exact(ENC_GROUP_SIZE + 1);
// Valid encoded bytes must has complete chunks
assert!(rev_encoded_chunks.remainder().is_empty());

This comment has been minimized.

Copy link
@breeswish

breeswish Oct 12, 2019

Member

However as a utility function it should not hold that knowledge. You can't prevent this utility function not to be used elsewhere as a common way to compare some user passed-in bytes or TiDB given bytes.

let raw_chunks = raw.chunks_exact(ENC_GROUP_SIZE);
// Check the last chunk first
match rev_encoded_chunks.next() {
Some(encoded_chunk) if check_single_chunk(encoded_chunk, raw_chunks.remainder()) => {}

This comment has been minimized.

Copy link
@breeswish

breeswish Oct 12, 2019

Member

what will happen if raw_chunks's length is N*ENC_GROUP_SIZE so that raw_chunks.remainder() is empty?

This comment has been minimized.

Copy link
@sticnarf

sticnarf Oct 12, 2019

Author Contributor

Then the encoded chunk must be 00 00 00 00 00 00 00 00 FF

Copy link
Member

breeswish left a comment

Except for the panic design, I'm fine with the rest.

.iter()
.zip(raw)
.all(|(&enc, &raw)| enc == !raw)
&& encoded[len..encoded.len() - 1].iter().all(|&v| v == 0xff)

This comment has been minimized.

Copy link
@breeswish

breeswish Oct 12, 2019

Member

ENC_MARKER

} else {
encoded[ENC_GROUP_SIZE] == (ENC_MARKER - pad)
&& &encoded[..len] == raw
&& encoded[len..encoded.len() - 1].iter().all(|&v| v == 0)

This comment has been minimized.

Copy link
@breeswish

breeswish Oct 12, 2019

Member

!ENC_MARKER

Signed-off-by: Yilin Chen <sticnarf@gmail.com>
@sticnarf

This comment has been minimized.

Copy link
Contributor Author

sticnarf commented Oct 13, 2019

@breeswish Panic behavior is changed.

#5641 gives a faster implementation. Could you review that PR and help us evaluate the risk of using that one now?

Signed-off-by: Yilin Chen <sticnarf@gmail.com>
/// # Panics
///
/// Panics if `encoded` is not valid
/// Returns whether `encoded` bytes is encoded from `raw`. Returns `false` if `encoded` is invalid.

This comment has been minimized.

Copy link
@breeswish

breeswish Oct 14, 2019

Member

If encoded is invalid, it must not be encoded from raw. Thus this scenario is already covered by the first sentence.

This comment has been minimized.

Copy link
@sticnarf

sticnarf Oct 14, 2019

Author Contributor

It's just to clarify that this function can accept invalid encoded bytes :)

@sticnarf sticnarf requested a review from youjiali1995 Oct 14, 2019
@sticnarf

This comment has been minimized.

Copy link
Contributor Author

sticnarf commented Oct 14, 2019

/test

///
/// # Panics
///
/// Panics if `self` is not a valid encoded key.

This comment has been minimized.

Copy link
@youjiali1995

youjiali1995 Oct 14, 2019

Contributor

Seems it never panic.

This comment has been minimized.

Copy link
@sticnarf

sticnarf Oct 14, 2019

Author Contributor

Fixed. r? @breeswish

Signed-off-by: Yilin Chen <sticnarf@gmail.com>
@breeswish

This comment has been minimized.

Copy link
Member

breeswish commented Oct 14, 2019

/merge

@sre-bot sre-bot added the S: CanMerge label Oct 14, 2019
@sre-bot

This comment has been minimized.

Copy link
Collaborator

sre-bot commented Oct 14, 2019

/run-all-tests

@sre-bot sre-bot merged commit e18d5a5 into tikv:master Oct 14, 2019
6 checks passed
6 checks passed
DCO All commits are signed off!
Details
idc-jenkins-ci-tikv/integration-common-test Jenkins job succeeded.
Details
idc-jenkins-ci-tikv/integration-compatibility-test Jenkins job succeeded.
Details
idc-jenkins-ci-tikv/integration-cop-push-down-test Jenkins job succeeded.
Details
idc-jenkins-ci-tikv/integration-ddl-test Jenkins job succeeded.
Details
idc-jenkins-ci/test Jenkins job succeeded.
Details
@sre-bot

This comment has been minimized.

Copy link
Collaborator

sre-bot commented Oct 14, 2019

cherry pick to release-3.1 in PR #5645

@sre-bot

This comment has been minimized.

Copy link
Collaborator

sre-bot commented Oct 14, 2019

cherry pick to release-3.0 in PR #5646

sre-bot added a commit that referenced this pull request Oct 14, 2019
…5646)

Signed-off-by: Yilin Chen <sticnarf@gmail.com>
sre-bot added a commit that referenced this pull request Oct 14, 2019
…5645)

Signed-off-by: Yilin Chen <sticnarf@gmail.com>
sticnarf added a commit to sticnarf/tikv that referenced this pull request Oct 27, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
6 participants
You can’t perform that action at this time.