Types for Batch Execution #3660

Merged
breeswish merged 32 commits into tikv:master from breeswish:_batch/2_infra_typedef_and_batch on Nov 17, 2018

4 participants
@breeswish
Member

commented Oct 10, 2018

What have you changed? (mandatory)

This PR implements the primitive types BatchColumn and BatchRows for use in Coprocessor batch execution, along with the corresponding tests and benchmarks.

  • Wait for merge: This PR is based on #3651

What are the type of the changes? (mandatory)

  • New feature (non-breaking change which adds functionality)

How has this PR been tested? (mandatory)

See new unit tests introduced by this PR.

Benchmark result if necessary (optional)

test coprocessor::codec::batch::column::benches::bench_batch_decode                             ... bench:      53,312 ns/iter (+/- 11,136)
test coprocessor::codec::batch::column::benches::bench_push_datum_int                           ... bench:      17,264 ns/iter (+/- 5,818)
test coprocessor::codec::batch::column::benches::bench_retain                                   ... bench:       2,240 ns/iter (+/- 769)
test coprocessor::codec::batch::rows::benches::bench_lazy_batch_column_by_vec_clone             ... bench:      37,944 ns/iter (+/- 6,581)
test coprocessor::codec::batch::rows::benches::bench_lazy_batch_column_by_vec_push_raw_10bytes  ... bench:      41,078 ns/iter (+/- 8,300)
test coprocessor::codec::batch::rows::benches::bench_lazy_batch_column_clone                    ... bench:       9,736 ns/iter (+/- 2,586)
test coprocessor::codec::batch::rows::benches::bench_lazy_batch_column_clone_10bytes            ... bench:      47,573 ns/iter (+/- 19,126)
test coprocessor::codec::batch::rows::benches::bench_lazy_batch_column_clone_and_decode         ... bench:      31,776 ns/iter (+/- 2,540)
test coprocessor::codec::batch::rows::benches::bench_lazy_batch_column_clone_and_decode_decoded ... bench:         344 ns/iter (+/- 55)
test coprocessor::codec::batch::rows::benches::bench_lazy_batch_column_clone_decoded            ... bench:         312 ns/iter (+/- 69)
test coprocessor::codec::batch::rows::benches::bench_lazy_batch_column_clone_naive              ... bench:      75,691 ns/iter (+/- 10,705)
test coprocessor::codec::batch::rows::benches::bench_lazy_batch_column_push_raw_10bytes         ... bench:      43,391 ns/iter (+/- 7,464)
test coprocessor::codec::batch::rows::benches::bench_lazy_batch_column_push_raw_4bytes          ... bench:      11,531 ns/iter (+/- 2,098)
test coprocessor::codec::batch::rows::benches::bench_lazy_batch_column_push_raw_9bytes          ... bench:      11,607 ns/iter (+/- 1,660)

Explain

In the comparisons below, the item in italics (also marked "this PR" where applicable) is the implementation used in this PR.

Note that this PR does not provide the most efficient implementation for every kind of payload, but it tries its best. For example, SmallVec is used because it cuts push time by about 75% for small, common data types such as Int and Real, at the cost of about 4% overhead for large data types such as Json and String.

Decode Multiple Datums

Decode 1000 datums

  • Original: bench_batch_decode -- 53,312 ns/iter (+/- 11,136)
  • This PR: bench_push_datum_int -- 17,264 ns/iter (+/- 5,818)

SmallVec vs Vec: Push Item

BatchRows is implemented with SmallVec instead of Vec. The following shows how this benefits performance (1000 elements):

When the datum is small enough to be stored inline (applies to Int, Real, DateTime, Duration):

  • SmallVec: bench_lazy_batch_column_push_raw_9bytes -- 11,607 ns/iter (+/- 1,660)
  • Vec: bench_lazy_batch_column_by_vec_push_raw_10bytes -- 41,078 ns/iter (+/- 8,300)

When the datum is too large to be stored inline (applies to other kinds of data):

  • SmallVec: bench_lazy_batch_column_push_raw_10bytes -- 43,391 ns/iter (+/- 7,464)
  • Vec: bench_lazy_batch_column_by_vec_push_raw_10bytes -- 41,078 ns/iter (+/- 8,300)
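
The inline versus spilled behaviour behind these numbers can be illustrated with a toy type. This is only a sketch of the general small-buffer idea, not the smallvec crate and not the code in this PR: the name `RawDatum` and the capacity `INLINE_CAP = 9` are assumptions, the latter inferred from the 9-byte and 10-byte benchmark names.

```rust
/// Hypothetical inline capacity: a 9-byte datum fits, a 10-byte datum spills.
const INLINE_CAP: usize = 9;

/// Toy small-buffer type. Short payloads are stored inside the value itself,
/// longer ones fall back to a heap allocation.
#[derive(Debug, Clone, PartialEq)]
enum RawDatum {
    Inline { len: u8, buf: [u8; INLINE_CAP] },
    Heap(Vec<u8>),
}

impl RawDatum {
    /// Copies `data` into inline storage when it fits, otherwise onto the heap.
    fn from_slice(data: &[u8]) -> Self {
        if data.len() <= INLINE_CAP {
            let mut buf = [0u8; INLINE_CAP];
            buf[..data.len()].copy_from_slice(data);
            RawDatum::Inline { len: data.len() as u8, buf }
        } else {
            RawDatum::Heap(data.to_vec())
        }
    }

    /// Returns the stored bytes regardless of where they live.
    fn as_slice(&self) -> &[u8] {
        match self {
            RawDatum::Inline { len, buf } => &buf[..*len as usize],
            RawDatum::Heap(v) => v.as_slice(),
        }
    }

    /// True when no heap allocation was needed.
    fn is_inline(&self) -> bool {
        matches!(self, RawDatum::Inline { .. })
    }
}
```

In this sketch, pushing 1000 inline values into a Vec needs no per-row allocation, while every spilled value pays for one, which is consistent with the gap between the 9-byte and 10-byte results above.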

SmallVec vs Vec vs Optimized SmallVec: Clone

SmallVec's default clone() implementation is not efficient and this PR writes our own (Optimized SmallVec). This shows how it benefits performance (1000 elements):

When datum is small and SmallVec is in inline mode:

  • SmallVec: bench_lazy_batch_column_clone_naive -- 75,691 ns/iter (+/- 10,705)
  • Vec: bench_lazy_batch_column_by_vec_clone -- 37,944 ns/iter (+/- 6,581)
  • Optimized SmallVec (this PR): bench_lazy_batch_column_clone -- 9,736 ns/iter (+/- 2,586)

When the datum is large and SmallVec is not in inline mode:

  • Vec: bench_lazy_batch_column_by_vec_clone -- 37,944 ns/iter (+/- 6,581)
  • Optimized SmallVec (this PR): bench_lazy_batch_column_clone_10bytes -- 47,573 ns/iter (+/- 19,126)
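
The difference between cloning each element and copying the bytes in bulk can be sketched with plain Vec<u8>. This is an illustration of the general technique only, not the clone implementation added in this PR; the function names are hypothetical, and an optimizing compiler may well turn the first version into the second for a type as simple as u8.

```rust
/// Element-by-element clone: every byte goes through `Clone::clone` and a push.
fn clone_each(src: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(src.len());
    for b in src {
        out.push(b.clone());
    }
    out
}

/// Bulk copy: because `u8` is `Copy`, the whole slice can be appended in one
/// operation instead of being cloned one element at a time.
fn clone_bulk(src: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(src.len());
    out.extend_from_slice(src);
    out
}
```

Both functions return the same bytes; they differ only in how the copy is expressed.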

Encoded clone vs Decoded clone

For BatchRows, each column may be either encoded or decoded (in a specific data type). This shows their clone performance (1000 elements):

  • Encoded: bench_lazy_batch_column_clone -- 9,736 ns/iter (+/- 2,586)
  • Decoded: bench_lazy_batch_column_clone_decoded -- 312 ns/iter (+/- 69)
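
A column that is either encoded or decoded can be sketched as an enum. Everything here is an assumption made for illustration: the name `LazyColumn`, the single `DecodedInt` variant, and the 8-byte little-endian stand-in encoding, which is not TiKV's datum format. One plausible reason for the gap above is visible in the sketch: the encoded form holds one buffer per row, while the decoded form is a single contiguous vector.

```rust
/// Hypothetical column that is either still encoded or already decoded.
#[derive(Debug, Clone, PartialEq)]
enum LazyColumn {
    /// One encoded byte buffer per row.
    Raw(Vec<Vec<u8>>),
    /// One typed value per row, stored contiguously.
    DecodedInt(Vec<i64>),
}

impl LazyColumn {
    /// Decodes the column in place. Does nothing if it is already decoded.
    /// Assumes each row is exactly 8 little-endian bytes (stand-in encoding).
    fn decode(&mut self) -> Result<(), String> {
        if let LazyColumn::Raw(rows) = self {
            let mut values = Vec::with_capacity(rows.len());
            for row in rows.iter() {
                if row.len() != 8 {
                    return Err(format!("expected 8 bytes, got {}", row.len()));
                }
                let mut bytes = [0u8; 8];
                bytes.copy_from_slice(row);
                values.push(i64::from_le_bytes(bytes));
            }
            *self = LazyColumn::DecodedInt(values);
        }
        Ok(())
    }
}
```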

Retain

For the Selection executor, rows are retained according to the result of expression evaluation. This benchmark shows the performance of that operation (1000 elements):

  • Retain: bench_retain -- 2,240 ns/iter (+/- 769)
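
Retaining rows according to an evaluation result can be sketched with the standard Vec::retain, which visits each element exactly once in its original order. The helper name `retain_by_mask` and the boolean-mask representation of the evaluation result are assumptions for illustration, not the API introduced by this PR.

```rust
/// Keeps only the rows whose corresponding entry in `keep` is true.
/// `keep` stands in for the per-row result of evaluating a filter expression.
fn retain_by_mask<T>(rows: &mut Vec<T>, keep: &[bool]) {
    assert_eq!(rows.len(), keep.len(), "mask must cover every row");
    let mut idx = 0;
    // `Vec::retain` walks the elements in order, so a running index
    // lines each row up with its mask entry.
    rows.retain(|_| {
        let k = keep[idx];
        idx += 1;
        k
    });
}
```

For example, applying the mask `[true, false, true, false]` to the rows `[10, 20, 30, 40]` leaves `[10, 30]`.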

breeswish added some commits Sep 30, 2018

initial
Signed-off-by: Breezewish <breezewish@pingcap.com>
some refinement
Signed-off-by: Breezewish <breezewish@pingcap.com>
improve push_datum performance
Signed-off-by: Breezewish <breezewish@pingcap.com>
compile pass
Signed-off-by: Breezewish <breezewish@pingcap.com>
Fix unit tests
Signed-off-by: Breezewish <breezewish@pingcap.com>
BatchExecution Primitive Types
Signed-off-by: Breezewish <breezewish@pingcap.com>

@breeswish breeswish requested review from solotzg and huachaohuang Oct 10, 2018

@siddontang

Contributor

commented Oct 10, 2018

SmallVec's default clone() implementation is not efficient and this PR writes our own (Optimized SmallVec)

Why is the clone not efficient? Where is our own Clone?

@breeswish

Member Author

commented Oct 10, 2018

SmallVec's default clone() implementation is not efficient and this PR writes our own (Optimized SmallVec)

Why is the clone not efficient? Where is our own Clone?

SmallVec::clone() is not as efficient as our own implementation, mainly because ours is a SmallVec of u8, whose bytes can simply be copied in bulk instead of being cloned one by one.

as_mut_accessor()
Signed-off-by: Breezewish <breezewish@pingcap.com>
@siddontang

Contributor

commented Oct 10, 2018

got it.
Em, do we have any other places that use default Clone but can be replaced by Copy directly?

breeswish added some commits Oct 10, 2018

as_mut_accessor()
Signed-off-by: Breezewish <breezewish@pingcap.com>
@breeswish

Member Author

commented Oct 10, 2018

@siddontang We are not using SmallVec in TiKV now.

@siddontang

Contributor

commented Oct 10, 2018

@breeswish

What I mean is not only SmallVec, but other Clone usages as well.

@breeswish

Member Author

commented Oct 10, 2018

@siddontang Maybe we can investigate the performance of cloning vectors and see whether it can be made faster. I don't have ideas about other kinds of structures.

breeswish added some commits Oct 11, 2018

utilize as_mut_accessor()
Signed-off-by: Breezewish <breezewish@pingcap.com>

breeswish added some commits Oct 17, 2018

Merge upstream
Signed-off-by: Breezewish <breezewish@pingcap.com>
fix build issues
Signed-off-by: Breezewish <breezewish@pingcap.com>
Merge remote-tracking branch 'origin/master' into _batch/2_infra_typedef
Signed-off-by: Breezewish <breezewish@pingcap.com>
@breeswish

Member Author

commented Nov 6, 2018

/run-integration-tests

breeswish and others added some commits Nov 7, 2018

Merge remote-tracking branch 'origin/master' into _batch/2_infra_typedef_and_batch
Signed-off-by: Breezewish <breezewish@pingcap.com>
@breeswish

Member Author

commented Nov 8, 2018

This PR is reviewable now. Please take a look. Thanks! @huachaohuang @DorianZheng @solotzg

breeswish added some commits Nov 15, 2018

Implement a Clone that preserves capacity
Signed-off-by: Breezewish <breezewish@pingcap.com>
Address comments and add more tests
Signed-off-by: Breezewish <breezewish@pingcap.com>
@breeswish

Member Author

commented Nov 15, 2018

@solotzg @huachaohuang PTAL again, thanks!

breeswish added some commits Nov 16, 2018

@huachaohuang
Contributor

left a comment

LGTM

@breeswish breeswish merged commit 7c8ed8f into tikv:master Nov 17, 2018

3 checks passed

DCO All commits are signed off!
ci/circleci: test Your tests passed on CircleCI!
jenkins-ci-tikv/build Jenkins job succeeded.

@breeswish breeswish deleted the breeswish:_batch/2_infra_typedef_and_batch branch Nov 17, 2018
