Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve TopN row number / rank performance #16753

Merged
merged 4 commits into from Apr 10, 2023

Conversation

pettyjamesm
Copy link
Member

@pettyjamesm pettyjamesm commented Mar 27, 2023

Description

Avoids inserting and accounting for new Pages in GroupedTopN{RowNumber, Rank} until we identify the first row that would need to be inserted. When no new rows are inserted, a significant amount of overhead can be avoided entirely which can be common enough to yield appreciable throughput improvements.

Benchmark Results:

Release notes

( ) This is not user-visible or docs only and no release notes are required.
( ) Release notes are required, please propose a release note for me.
(x) Release notes are required, with the following suggested text:

# General
* Improve performance of selective TopN row number and rank queries

@cla-bot cla-bot bot added the cla-signed label Mar 27, 2023
@pettyjamesm pettyjamesm force-pushed the improve-grouped-topn-row-number branch from 7fe5a33 to 4f6d7f7 Compare March 28, 2023 13:01
@pettyjamesm pettyjamesm force-pushed the improve-grouped-topn-row-number branch from 784b656 to 2f37c7b Compare March 28, 2023 20:19
@pettyjamesm pettyjamesm force-pushed the improve-grouped-topn-row-number branch 2 times, most recently from e510847 to 8bcb55e Compare March 30, 2023 21:39
@pettyjamesm pettyjamesm changed the title Improve GroupedTopNRowNumber performance Improve TopN row number / rank performance Mar 30, 2023
@pettyjamesm pettyjamesm force-pushed the improve-grouped-topn-row-number branch 2 times, most recently from d79351f to 42225ce Compare April 3, 2023 14:26
Adds a new benchmark method that only benchmarks processPage without
attempting to produce the operator output since that is a separate
operation. Also changes the benchmark to insert the same page more than
once to exercise situations where no new rows necessarily need to be
added to the TopN heap.
Changes include:
- returning created objects directly from IdRegistry instead their id
- using a CMOV friendly compaction scheme in RowReferencePageManager
- making classes final for better JIT inlining opportunities
Avoids inserting and accounting for new Pages in GroupedTopNBuilder
until we identify the first row that would need to be inserted. When no
new rows are inserted, a significant amount of overhead can be avoided
entirely which in some situation is common enough to produce appreciable
throughput improvements.
@pettyjamesm pettyjamesm force-pushed the improve-grouped-topn-row-number branch from 42225ce to 2040fba Compare April 5, 2023 15:18
@pettyjamesm
Copy link
Member Author

@erichwang / @martint - build is green and the PR is ready for review when you get a chance

@martint martint merged commit ccb06d5 into trinodb:master Apr 10, 2023
89 checks passed
@pettyjamesm pettyjamesm deleted the improve-grouped-topn-row-number branch April 10, 2023 19:44
@github-actions github-actions bot added this to the 413 milestone Apr 10, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

None yet

2 participants