Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

coprocessor: use mur3 to calculate fmsketch #14204

Merged
merged 7 commits into from Feb 16, 2023

Conversation

xuyifangreeneyes
Copy link
Contributor

@xuyifangreeneyes xuyifangreeneyes commented Feb 12, 2023

Signed-off-by: xuyifangreeneyes xuyifangreeneyes@gmail.com

What is changed and how it works?

Issue Number: ref #14231

What's Changed:

Use https://github.com/tikv/mur3 to calculate fmsketch, which is more efficient.

Related changes

  • PR to update pingcap/docs/pingcap/docs-cn:
  • Need to cherry-pick to the release branch

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No code

Before the PR:

test coprocessor::statistics::analyze::benches::bench_collect_column                        ... bench:         215 ns/iter (+/- 29)
test coprocessor::statistics::analyze::benches::bench_collect_column_group                  ... bench:         183 ns/iter (+/- 14)

After the PR:

test coprocessor::statistics::analyze::benches::bench_collect_column                        ... bench:         119 ns/iter (+/- 24)
test coprocessor::statistics::analyze::benches::bench_collect_column_group                  ... bench:         120 ns/iter (+/- 13)

Also, I tested end-to-end analyze commands on OSSInsight data. The analyze time is reduced from 702s to 650s.

Side effects

  • Performance regression
    • Consumes more CPU
    • Consumes more MEM
  • Breaking backward compatibility

Release note

None

@ti-chi-bot
Copy link
Member

ti-chi-bot commented Feb 12, 2023

[REVIEW NOTIFICATION]

This pull request has been approved by:

  • BusyJay
  • skyzh

To complete the pull request process, please ask the reviewers in the list to review by filling /cc @reviewer in the comment.
After your PR has acquired the required number of LGTMs, you can assign this pull request to the committer in the list by filling /assign @committer in the comment to help you merge this pull request.

The full list of commands accepted by this bot can be found here.

Reviewer can indicate their review by submitting an approval review.
Reviewer can cancel approval by submitting a request changes review.

Signed-off-by: xuyifan <675434007@qq.com>
Signed-off-by: xuyifan <675434007@qq.com>
Signed-off-by: xuyifan <675434007@qq.com>
Signed-off-by: xuyifan <675434007@qq.com>
@xuyifangreeneyes
Copy link
Contributor Author

@winoros

@xuyifangreeneyes
Copy link
Contributor Author

/cc @BusyJay @skyzh

Cargo.toml Outdated
@@ -117,6 +117,7 @@ match-template = "0.0.1"
memory_trace_macros = { workspace = true }
mime = "0.3.13"
more-asserts = "0.2"
mur3 = "0.1"
murmur3 = "0.5.1"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this be removed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. Fixed.

Signed-off-by: xuyifan <675434007@qq.com>
Signed-off-by: xuyifan <675434007@qq.com>
@xuyifangreeneyes
Copy link
Contributor Author

/run-build

Copy link
Member

@skyzh skyzh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rest LGTM

}
self.fm_sketches[col_len + i].insert_hash_value(hasher.finish());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks a little bit suspicious to me that we are using different ways of computation for len = 1 and len != 1, I don't think it's consistent with the previous way of computation...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As mur3 doc suggests, Hasher version(Hasher128) is a little slower than the function version murmurhash3_x64_128. Hence, when len = 1, we use the faster one(murmurhash3_x64_128), while when len > 1, we use Hasher128 to avoid concating byte slices.

@ti-chi-bot ti-chi-bot added the status/LGT1 Status: PR - There is already 1 approval label Feb 15, 2023
Copy link
Member

@BusyJay BusyJay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any performance difference?

Copy link
Member

@BusyJay BusyJay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. LGTM

@ti-chi-bot ti-chi-bot removed the status/LGT1 Status: PR - There is already 1 approval label Feb 16, 2023
@ti-chi-bot ti-chi-bot added the status/LGT2 Status: PR - There are already 2 approvals label Feb 16, 2023
@xuyifangreeneyes
Copy link
Contributor Author

/merge

@ti-chi-bot
Copy link
Member

@xuyifangreeneyes: It seems you want to merge this PR, I will help you trigger all the tests:

/run-all-tests

You only need to trigger /merge once, and if the CI test fails, you just re-trigger the test that failed and the bot will merge the PR for you after the CI passes.

If you have any questions about the PR merge process, please refer to pr process.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the ti-community-infra/tichi repository.

@ti-chi-bot
Copy link
Member

@xuyifangreeneyes: /merge is only allowed for the committers, you can assign this pull request to the committer in list by filling /assign @committer in the comment to help merge this pull request.

In response to this:

/merge

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the ti-community-infra/tichi repository.

@sticnarf
Copy link
Contributor

/merge

@ti-chi-bot
Copy link
Member

@sticnarf: It seems you want to merge this PR, I will help you trigger all the tests:

/run-all-tests

You only need to trigger /merge once, and if the CI test fails, you just re-trigger the test that failed and the bot will merge the PR for you after the CI passes.

If you have any questions about the PR merge process, please refer to pr process.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the ti-community-infra/tichi repository.

@ti-chi-bot
Copy link
Member

This pull request has been accepted and is ready to merge.

Commit hash: b764980

@ti-chi-bot ti-chi-bot added the status/can-merge Status: Can merge to base branch label Feb 16, 2023
@ti-chi-bot ti-chi-bot merged commit bbe06e9 into tikv:master Feb 16, 2023
@ti-chi-bot ti-chi-bot added this to the Pool milestone Feb 16, 2023
@xuyifangreeneyes xuyifangreeneyes deleted the use-mur3 branch February 16, 2023 09:28
iosmanthus pushed a commit to iosmanthus/tikv that referenced this pull request Oct 30, 2023
* json, copr: implement unary not for json (tikv#14070)

close tikv#14069

Signed-off-by: YangKeao <yangkeao@chunibyo.icu>

Co-authored-by: Ti Chi Robot <ti-community-prow-bot@tidb.io>

* copr: (refactor) Lift heap struct out from top_n_executor (tikv#14096)

ref tikv#13936

Signed-off-by: Zhi Qi <qizhi@pingcap.com>

* copr: (feat) Implement operator PartitionTopN (tikv#14116)

ref tikv#13936

Signed-off-by: Zhi Qi <qizhi@pingcap.com>

Co-authored-by: Ti Chi Robot <ti-community-prow-bot@tidb.io>

* copr: fix error when cast const Enum to any type (tikv#14149)

close tikv#14156, close pingcap/tidb#40341

copr: fix error when cast const Enum to any type

Co-authored-by: Ti Chi Robot <ti-community-prow-bot@tidb.io>

* coprocessor: use mur3 to calculate fmsketch (tikv#14204)

ref tikv#14231

Signed-off-by: xuyifan <675434007@qq.com>

* copr: early stop paging copr when resultset is drained. (tikv#14209)

close tikv#14254

When the result set is drained, it indicates that no more data is required in the range.
This PR set the scanned range to None to avoid the following paging requests in the current range.

Co-authored-by: Ti Chi Robot <ti-community-prow-bot@tidb.io>

* coprocessor: avoid unnecessary vec allocation in collect_column_stats (tikv#14280)

ref tikv#14231

When collect_column_stats handles each row, reuse column_vals and collation_key_vals to avoid allocating many small objects.

Signed-off-by: xuyifan <675434007@qq.com>

* copr: (enhance) support executor limit with partition_by fields (tikv#14359)

ref tikv#13936

Signed-off-by: Zhi Qi <qizhi@pingcap.com>

* coprocessor: reuse EvalContext in collect_column_stats (tikv#14376)

ref tikv#14231

Signed-off-by: xuyifan <675434007@qq.com>

* copr: fix extral physical table id when idx key < `MAX_OLD_ENCODED_VALUE_LEN` (tikv#14618)

close tikv#14619

fix a bug with `process_old_collation_kv` function. 

related with tikv#11931, forget process `physical_table_id_column_cnt` in process_old_collation_kv function

Signed-off-by: Jason Mo <mohangjie1995@gmail.com>

Co-authored-by: Ti Chi Robot <ti-community-prow-bot@tidb.io>

---------

Signed-off-by: Zhi Qi <qizhi@pingcap.com>
Signed-off-by: xuyifan <675434007@qq.com>
Co-authored-by: YangKeao <yangkeao@chunibyo.icu>
Co-authored-by: Ti Chi Robot <ti-community-prow-bot@tidb.io>
Co-authored-by: Zhi Qi <30543181+LittleFall@users.noreply.github.com>
Co-authored-by: Shenghui Wu <793703860@qq.com>
Co-authored-by: Yifan Xu <30385241+xuyifangreeneyes@users.noreply.github.com>
Co-authored-by: you06 <you1474600@gmail.com>
Co-authored-by: Hangjie Mo <mohangjie1995@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
contribution Type: PR - From contributors release-note-none size/L status/can-merge Status: Can merge to base branch status/LGT2 Status: PR - There are already 2 approvals
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants