Support re-enabling partial aggregation adaptively #11361

Closed
lukasz-stec opened this issue Mar 8, 2022 · 0 comments · Fixed by #17143

Comments

@lukasz-stec
Member

With #11011, the partial step of Trino's aggregation operator can be switched off if, for the rows processed so far, it did not reduce the number of rows much (that is, most of the input rows are distinct).

If this happens but the rows yet to be processed have a different distribution, i.e. a small number of distinct values, we want the partial aggregation step to be re-enabled.

One way to implement this is to periodically calculate or estimate (e.g. using HyperLogLog) the number of distinct values in the split, possibly with exponential backoff for the window between calculations, and to enable partial aggregation again if the ratio of distinct values to input position count is low for the given split.
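
A minimal sketch of the periodic check described above, under stated assumptions. The class name `AdaptiveReenableSketch`, its parameters, and its methods are hypothetical and are not Trino APIs. An exact `HashSet` over group-key hashes stands in for HyperLogLog to keep the example self-contained; a real implementation would likely use an approximate estimator.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical illustration only; not Trino's actual implementation. */
final class AdaptiveReenableSketch {
    private final int sampleSize;          // rows inspected in each check window
    private final double maxDistinctRatio; // re-enable when distinct / rows is at or below this
    private final long maxSkipRows;        // upper bound for the backoff gap
    private final Set<Long> sample = new HashSet<>(); // exact stand-in for HyperLogLog
    private long skipRows;                 // rows to skip before the next check window
    private long skipped;
    private int sampled;
    private boolean partialAggregationEnabled; // starts disabled, as after the operator switched it off

    AdaptiveReenableSketch(int sampleSize, double maxDistinctRatio, long initialSkipRows, long maxSkipRows) {
        if (sampleSize <= 0 || initialSkipRows <= 0 || maxSkipRows < initialSkipRows) {
            throw new IllegalArgumentException("invalid window configuration");
        }
        this.sampleSize = sampleSize;
        this.maxDistinctRatio = maxDistinctRatio;
        this.skipRows = initialSkipRows;
        this.maxSkipRows = maxSkipRows;
    }

    /** Feed the hash of one row's group-by key while partial aggregation is disabled. */
    void observe(long groupKeyHash) {
        if (partialAggregationEnabled) {
            return;
        }
        if (skipped < skipRows) {
            skipped++; // still inside the backoff gap, do no work
            return;
        }
        sample.add(groupKeyHash);
        sampled++;
        if (sampled == sampleSize) {
            double distinctRatio = (double) sample.size() / sampled;
            if (distinctRatio <= maxDistinctRatio) {
                partialAggregationEnabled = true; // few distinct keys: aggregation pays off again
            }
            else {
                skipRows = Math.min(skipRows * 2, maxSkipRows); // exponential backoff
            }
            sample.clear();
            sampled = 0;
            skipped = 0;
        }
    }

    boolean isPartialAggregationEnabled() {
        return partialAggregationEnabled;
    }

    long currentSkipRows() {
        return skipRows;
    }
}
```

The backoff keeps the cost of the checks low when the input stays mostly distinct, at the price of reacting more slowly if the distribution changes late in the split.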

Another idea, which may not be doable but is worth recording here: if we had per-split statistics on the number of distinct values per column (ideally including the correlation between columns), we could decide to enable or disable partial aggregation on a per-split basis.
The Parquet format supports distinct_count per column chunk and per page, but I suspect it is not populated in most real-life scenarios.
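
The per-split decision could look roughly like the sketch below. Everything here is an assumption for illustration: the class `SplitStatsDecisionSketch`, the method name, and the threshold parameter are hypothetical, and the code does not read Parquet metadata. Without correlation information, the product of per-column distinct counts is only an upper bound on the number of groups, so the decision is conservative. When any statistic is missing, the method returns an empty result so the caller can fall back to the adaptive behavior.

```java
import java.util.List;
import java.util.Optional;
import java.util.OptionalLong;

/** Hypothetical illustration only; not a Trino or Parquet API. */
final class SplitStatsDecisionSketch {
    private SplitStatsDecisionSketch() {}

    /**
     * Returns whether to enable partial aggregation for a split, or empty
     * when the statistics are insufficient to decide.
     */
    static Optional<Boolean> enablePartialAggregation(
            long splitRowCount,
            List<OptionalLong> groupKeyDistinctCounts,
            double maxDistinctRatio)
    {
        if (splitRowCount <= 0 || groupKeyDistinctCounts.isEmpty()) {
            return Optional.empty();
        }
        // Upper bound on the number of groups, assuming independent columns.
        // A double avoids long overflow; the bound can never exceed the row count.
        double groupsUpperBound = 1;
        for (OptionalLong distinctCount : groupKeyDistinctCounts) {
            if (!distinctCount.isPresent() || distinctCount.getAsLong() <= 0) {
                return Optional.empty(); // statistic missing or unusable
            }
            groupsUpperBound = Math.min(splitRowCount, groupsUpperBound * distinctCount.getAsLong());
        }
        double distinctRatio = groupsUpperBound / splitRowCount;
        return Optional.of(distinctRatio <= maxDistinctRatio);
    }
}
```

For example, a split of 1,000,000 rows grouped on two columns with 10 and 20 distinct values has at most 200 groups, so partial aggregation would be enabled under any reasonable threshold.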

@lukasz-stec lukasz-stec changed the title Support enabling partial aggregation adaptively Support re-enabling partial aggregation adaptively Mar 8, 2022