coprocessor: use mur3 to calculate fmsketch #14204

xuyifangreeneyes · 2023-02-12T15:35:16Z

Signed-off-by: xuyifangreeneyes xuyifangreeneyes@gmail.com

What is changed and how it works?

Issue Number: ref #14231

What's Changed:

Use https://github.com/tikv/mur3 to calculate fmsketch, which is more efficient.

Related changes

PR to update pingcap/docs/pingcap/docs-cn:
Need to cherry-pick to the release branch

Check List

Tests

Unit test
Integration test
Manual test (add detailed scripts or steps below)
No code

Before the PR:

test coprocessor::statistics::analyze::benches::bench_collect_column                        ... bench:         215 ns/iter (+/- 29)
test coprocessor::statistics::analyze::benches::bench_collect_column_group                  ... bench:         183 ns/iter (+/- 14)

After the PR:

test coprocessor::statistics::analyze::benches::bench_collect_column                        ... bench:         119 ns/iter (+/- 24)
test coprocessor::statistics::analyze::benches::bench_collect_column_group                  ... bench:         120 ns/iter (+/- 13)

Also, I tested end-to-end analyze commands on OSSInsight data. The analyze time is reduced from 702s to 650s.

Side effects

Performance regression
- Consumes more CPU
- Consumes more MEM
Breaking backward compatibility

Release note

None

ti-chi-bot · 2023-02-12T15:35:17Z

[REVIEW NOTIFICATION]

This pull request has been approved by:

BusyJay
skyzh

To complete the pull request process, please ask the reviewers in the list to review by filling /cc @reviewer in the comment.
After your PR has acquired the required number of LGTMs, you can assign this pull request to the committer in the list by filling /assign @committer in the comment to help you merge this pull request.

The full list of commands accepted by this bot can be found here.

Reviewer can indicate their review by submitting an approval review.
Reviewer can cancel approval by submitting a request changes review.

Signed-off-by: xuyifan <675434007@qq.com>

xuyifangreeneyes · 2023-02-15T10:08:10Z

@winoros

xuyifangreeneyes · 2023-02-15T10:08:33Z

/cc @BusyJay @skyzh

BusyJay · 2023-02-15T10:27:33Z

Cargo.toml

@@ -117,6 +117,7 @@ match-template = "0.0.1"
 memory_trace_macros = { workspace = true }
 mime = "0.3.13"
 more-asserts = "0.2"
+mur3 = "0.1"
 murmur3 = "0.5.1"


Can this be removed?

Yes. Fixed.

Signed-off-by: xuyifan <675434007@qq.com>

xuyifangreeneyes · 2023-02-15T12:30:32Z

/run-build

skyzh

Rest LGTM

skyzh · 2023-02-15T17:25:38Z

src/coprocessor/statistics/analyze.rs

                }
+                self.fm_sketches[col_len + i].insert_hash_value(hasher.finish());


looks a little bit suspicious to me that we are using different ways of computation for len = 1 and len != 1, I don't think it's consistent with the previous way of computation...

As mur3 doc suggests, Hasher version(Hasher128) is a little slower than the function version murmurhash3_x64_128. Hence, when len = 1, we use the faster one(murmurhash3_x64_128), while when len > 1, we use Hasher128 to avoid concating byte slices.

BusyJay

Any performance difference?

BusyJay

I see. LGTM

xuyifangreeneyes · 2023-02-16T09:00:07Z

/merge

ti-chi-bot · 2023-02-16T09:00:09Z

@xuyifangreeneyes: It seems you want to merge this PR, I will help you trigger all the tests:

/run-all-tests

You only need to trigger /merge once, and if the CI test fails, you just re-trigger the test that failed and the bot will merge the PR for you after the CI passes.

If you have any questions about the PR merge process, please refer to pr process.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the ti-community-infra/tichi repository.

ti-chi-bot · 2023-02-16T09:00:11Z

@xuyifangreeneyes: /merge is only allowed for the committers, you can assign this pull request to the committer in list by filling /assign @committer in the comment to help merge this pull request.

In response to this:

/merge

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the ti-community-infra/tichi repository.

sticnarf · 2023-02-16T09:01:27Z

/merge

ti-chi-bot · 2023-02-16T09:01:28Z

@sticnarf: It seems you want to merge this PR, I will help you trigger all the tests:

/run-all-tests

You only need to trigger /merge once, and if the CI test fails, you just re-trigger the test that failed and the bot will merge the PR for you after the CI passes.

If you have any questions about the PR merge process, please refer to pr process.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the ti-community-infra/tichi repository.

ti-chi-bot · 2023-02-16T09:01:30Z

This pull request has been accepted and is ready to merge.

Commit hash: b764980

* json, copr: implement unary not for json (tikv#14070) close tikv#14069 Signed-off-by: YangKeao <yangkeao@chunibyo.icu> Co-authored-by: Ti Chi Robot <ti-community-prow-bot@tidb.io> * copr: (refactor) Lift heap struct out from top_n_executor (tikv#14096) ref tikv#13936 Signed-off-by: Zhi Qi <qizhi@pingcap.com> * copr: (feat) Implement operator PartitionTopN (tikv#14116) ref tikv#13936 Signed-off-by: Zhi Qi <qizhi@pingcap.com> Co-authored-by: Ti Chi Robot <ti-community-prow-bot@tidb.io> * copr: fix error when cast const Enum to any type (tikv#14149) close tikv#14156, close pingcap/tidb#40341 copr: fix error when cast const Enum to any type Co-authored-by: Ti Chi Robot <ti-community-prow-bot@tidb.io> * coprocessor: use mur3 to calculate fmsketch (tikv#14204) ref tikv#14231 Signed-off-by: xuyifan <675434007@qq.com> * copr: early stop paging copr when resultset is drained. (tikv#14209) close tikv#14254 When the result set is drained, it indicates that no more data is required in the range. This PR set the scanned range to None to avoid the following paging requests in the current range. Co-authored-by: Ti Chi Robot <ti-community-prow-bot@tidb.io> * coprocessor: avoid unnecessary vec allocation in collect_column_stats (tikv#14280) ref tikv#14231 When collect_column_stats handles each row, reuse column_vals and collation_key_vals to avoid allocating many small objects. Signed-off-by: xuyifan <675434007@qq.com> * copr: (enhance) support executor limit with partition_by fields (tikv#14359) ref tikv#13936 Signed-off-by: Zhi Qi <qizhi@pingcap.com> * coprocessor: reuse EvalContext in collect_column_stats (tikv#14376) ref tikv#14231 Signed-off-by: xuyifan <675434007@qq.com> * copr: fix extral physical table id when idx key < `MAX_OLD_ENCODED_VALUE_LEN` (tikv#14618) close tikv#14619 fix a bug with `process_old_collation_kv` function. related with tikv#11931, forget process `physical_table_id_column_cnt` in process_old_collation_kv function Signed-off-by: Jason Mo <mohangjie1995@gmail.com> Co-authored-by: Ti Chi Robot <ti-community-prow-bot@tidb.io> --------- Signed-off-by: Zhi Qi <qizhi@pingcap.com> Signed-off-by: xuyifan <675434007@qq.com> Co-authored-by: YangKeao <yangkeao@chunibyo.icu> Co-authored-by: Ti Chi Robot <ti-community-prow-bot@tidb.io> Co-authored-by: Zhi Qi <30543181+LittleFall@users.noreply.github.com> Co-authored-by: Shenghui Wu <793703860@qq.com> Co-authored-by: Yifan Xu <30385241+xuyifangreeneyes@users.noreply.github.com> Co-authored-by: you06 <you1474600@gmail.com> Co-authored-by: Hangjie Mo <mohangjie1995@gmail.com>

ti-chi-bot added do-not-merge/needs-linked-issue release-note do-not-merge/work-in-progress contribution Type: PR - From contributors size/L release-note-none and removed do-not-merge/needs-linked-issue release-note labels Feb 12, 2023

xuyifangreeneyes marked this pull request as ready for review February 15, 2023 09:56

ti-chi-bot removed the do-not-merge/work-in-progress label Feb 15, 2023

xuyifangreeneyes added 4 commits February 15, 2023 18:05

add test

3d10d89

Signed-off-by: xuyifan <675434007@qq.com>

compare

8f7cc3a

Signed-off-by: xuyifan <675434007@qq.com>

upd

baf5144

Signed-off-by: xuyifan <675434007@qq.com>

make format

5d78717

Signed-off-by: xuyifan <675434007@qq.com>

xuyifangreeneyes force-pushed the use-mur3 branch from 46d878d to 5d78717 Compare February 15, 2023 10:06

ti-chi-bot requested review from BusyJay and skyzh February 15, 2023 10:08

BusyJay reviewed Feb 15, 2023

View reviewed changes

xuyifangreeneyes added 2 commits February 15, 2023 18:54

remove murmur3 and fix clippy

03bb38a

Signed-off-by: xuyifan <675434007@qq.com>

make format

0209a29

Signed-off-by: xuyifan <675434007@qq.com>

skyzh approved these changes Feb 15, 2023

View reviewed changes

ti-chi-bot added the status/LGT1 Status: PR - There is already 1 approval label Feb 15, 2023

xuyifangreeneyes requested a review from BusyJay February 16, 2023 02:51

BusyJay reviewed Feb 16, 2023

View reviewed changes

BusyJay approved these changes Feb 16, 2023

View reviewed changes

ti-chi-bot removed the status/LGT1 Status: PR - There is already 1 approval label Feb 16, 2023

ti-chi-bot added the status/LGT2 Status: PR - There are already 2 approvals label Feb 16, 2023

Merge branch 'master' into use-mur3

b764980

ti-chi-bot added the status/can-merge Status: Can merge to base branch label Feb 16, 2023

ti-chi-bot merged commit bbe06e9 into tikv:master Feb 16, 2023

ti-chi-bot added this to the Pool milestone Feb 16, 2023

xuyifangreeneyes deleted the use-mur3 branch February 16, 2023 09:28

xuyifangreeneyes mentioned this pull request Mar 6, 2023

Improve Analyze Performance and Stability pingcap/tidb#41930

Open

8 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

coprocessor: use mur3 to calculate fmsketch #14204

coprocessor: use mur3 to calculate fmsketch #14204

xuyifangreeneyes commented Feb 12, 2023 •

edited

ti-chi-bot commented Feb 12, 2023 •

edited

xuyifangreeneyes commented Feb 15, 2023

xuyifangreeneyes commented Feb 15, 2023

BusyJay Feb 15, 2023

xuyifangreeneyes Feb 15, 2023

xuyifangreeneyes commented Feb 15, 2023

skyzh left a comment

skyzh Feb 15, 2023

xuyifangreeneyes Feb 16, 2023

BusyJay left a comment

BusyJay left a comment

xuyifangreeneyes commented Feb 16, 2023

ti-chi-bot commented Feb 16, 2023

ti-chi-bot commented Feb 16, 2023

sticnarf commented Feb 16, 2023

ti-chi-bot commented Feb 16, 2023

ti-chi-bot commented Feb 16, 2023

		}
		self.fm_sketches[col_len + i].insert_hash_value(hasher.finish());

coprocessor: use mur3 to calculate fmsketch #14204

coprocessor: use mur3 to calculate fmsketch #14204

Conversation

xuyifangreeneyes commented Feb 12, 2023 • edited

What is changed and how it works?

Related changes

Check List

Release note

ti-chi-bot commented Feb 12, 2023 • edited

xuyifangreeneyes commented Feb 15, 2023

xuyifangreeneyes commented Feb 15, 2023

BusyJay Feb 15, 2023

Choose a reason for hiding this comment

xuyifangreeneyes Feb 15, 2023

Choose a reason for hiding this comment

xuyifangreeneyes commented Feb 15, 2023

skyzh left a comment

Choose a reason for hiding this comment

skyzh Feb 15, 2023

Choose a reason for hiding this comment

xuyifangreeneyes Feb 16, 2023

Choose a reason for hiding this comment

BusyJay left a comment

Choose a reason for hiding this comment

BusyJay left a comment

Choose a reason for hiding this comment

xuyifangreeneyes commented Feb 16, 2023

ti-chi-bot commented Feb 16, 2023

ti-chi-bot commented Feb 16, 2023

sticnarf commented Feb 16, 2023

ti-chi-bot commented Feb 16, 2023

ti-chi-bot commented Feb 16, 2023

xuyifangreeneyes commented Feb 12, 2023 •

edited

ti-chi-bot commented Feb 12, 2023 •

edited