Avoid O(N^2) in VALUES with ordinals grouping #130576

Open
dnhatn wants to merge 4 commits into main from fix-values-aggregator

dnhatn
Member

@dnhatn dnhatn commented Jul 3, 2025

Using the VALUES aggregator with ordinals grouping leads to accidental quadratic complexity. Queries such as FROM .. | STATS ... VALUES(field) ... BY keyword-field are affected by this performance issue. This change caches the sorted structure (previously introduced to fix a similar O(N^2) problem when emitting the output block) so that it is also used during the merging phase of the OrdinalGroupingOperator.
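
To illustrate the idea only (this is not the code in this PR): the sketch below assumes values are stored as flat (group, value) pairs. Looking up one group by scanning every pair costs O(N) per group, which becomes quadratic when repeated for every group. Building a group-ordered index once with a counting sort and caching it makes each later lookup proportional to that group's size. The class `GroupedValuesSketch` and all of its fields and methods are hypothetical names, not Elasticsearch APIs.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch; names and data layout are assumptions, not Elasticsearch classes. */
final class GroupedValuesSketch {
    private final int[] groupIds; // groupIds[i] is the group of pair i
    private final long[] values;  // values[i] is the value of pair i
    private final int numGroups;

    // Cached index, built lazily on first use.
    private int[] starts; // positions starts[g] .. starts[g + 1] - 1 of `order` belong to group g
    private int[] order;  // pair indices, ordered by group

    GroupedValuesSketch(int[] groupIds, long[] values, int numGroups) {
        this.groupIds = groupIds;
        this.values = values;
        this.numGroups = numGroups;
    }

    /** Scans every pair: O(N) per call, so O(N * G) when called once per group. */
    List<Long> valuesForGroupNaive(int group) {
        List<Long> out = new ArrayList<>();
        for (int i = 0; i < groupIds.length; i++) {
            if (groupIds[i] == group) {
                out.add(values[i]);
            }
        }
        return out;
    }

    /** Counting sort of pair indices by group: O(N + G), done once. */
    private void buildIndex() {
        starts = new int[numGroups + 1];
        for (int g : groupIds) {
            starts[g + 1]++; // count pairs per group
        }
        for (int g = 0; g < numGroups; g++) {
            starts[g + 1] += starts[g]; // prefix sums give each group's start offset
        }
        order = new int[groupIds.length];
        int[] next = starts.clone(); // next free slot per group
        for (int i = 0; i < groupIds.length; i++) {
            order[next[groupIds[i]]++] = i;
        }
    }

    /** Uses the cached index: cost is proportional to the size of the requested group. */
    List<Long> valuesForGroupCached(int group) {
        if (order == null) {
            buildIndex();
        }
        List<Long> out = new ArrayList<>();
        for (int p = starts[group]; p < starts[group + 1]; p++) {
            out.add(values[order[p]]);
        }
        return out;
    }

    public static void main(String[] args) {
        GroupedValuesSketch s = new GroupedValuesSketch(
            new int[] { 0, 1, 0, 2, 1 }, new long[] { 10, 20, 30, 40, 50 }, 3);
        for (int g = 0; g < 3; g++) {
            System.out.println(g + " -> " + s.valuesForGroupCached(g));
        }
    }
}
```

The sketch ignores deduplication and cache invalidation when new pairs arrive; a real aggregator would need to handle both.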

@dnhatn dnhatn force-pushed the fix-values-aggregator branch from 73349d4 to ef7d00c Compare July 3, 2025 22:29
@dnhatn dnhatn closed this Jul 3, 2025
@dnhatn dnhatn deleted the fix-values-aggregator branch July 3, 2025 22:29
@dnhatn dnhatn restored the fix-values-aggregator branch July 3, 2025 22:29
@dnhatn dnhatn reopened this Jul 3, 2025
@dnhatn dnhatn force-pushed the fix-values-aggregator branch from ef7d00c to 81326cb Compare July 3, 2025 22:39
@dnhatn dnhatn force-pushed the fix-values-aggregator branch from 81326cb to 0ee7d9e Compare July 4, 2025 03:49
@elasticsearchmachine
Collaborator

Hi @dnhatn, I've created a changelog YAML for you.

@dnhatn
Member Author

dnhatn commented Jul 4, 2025

I updated the benchmark to simulate the ordinal grouping operator. On the main branch, the benchmark could not complete with 1,000,000 groups, so the Before results use 200,000 groups as the largest size. With the fix, the 1,000,000-group case took 1625 ms. The benchmark results are below.

Before:
Benchmark                      (dataType)  (groups)  (numOrdinalMerges)  Mode  Cnt      Score   Error  Units
ValuesAggregatorBenchmark.run    BytesRef         1                   0  avgt    2      3.680          ms/op
ValuesAggregatorBenchmark.run    BytesRef         1                   1  avgt    2      3.632          ms/op
ValuesAggregatorBenchmark.run    BytesRef      1000                   0  avgt    2      2.515          ms/op
ValuesAggregatorBenchmark.run    BytesRef      1000                   1  avgt    2      9.397          ms/op
ValuesAggregatorBenchmark.run    BytesRef    200000                   0  avgt    2    148.966          ms/op
ValuesAggregatorBenchmark.run    BytesRef    200000                   1  avgt    2  90055.908          ms/op
ValuesAggregatorBenchmark.run         int         1                   0  avgt    2      0.494          ms/op
ValuesAggregatorBenchmark.run         int         1                   1  avgt    2      0.488          ms/op
ValuesAggregatorBenchmark.run         int      1000                   0  avgt    2      2.788          ms/op
ValuesAggregatorBenchmark.run         int      1000                   1  avgt    2      8.232          ms/op
ValuesAggregatorBenchmark.run         int    200000                   0  avgt    2    198.020          ms/op
ValuesAggregatorBenchmark.run         int    200000                   1  avgt    2  70918.020          ms/op
ValuesAggregatorBenchmark.run        long         1                   0  avgt    2      0.862          ms/op
ValuesAggregatorBenchmark.run        long         1                   1  avgt    2      0.873          ms/op
ValuesAggregatorBenchmark.run        long      1000                   0  avgt    2      4.212          ms/op
ValuesAggregatorBenchmark.run        long      1000                   1  avgt    2     10.450          ms/op
ValuesAggregatorBenchmark.run        long    200000                   0  avgt    2    257.926          ms/op
ValuesAggregatorBenchmark.run        long    200000                   1  avgt    2  75686.076          ms/op



After:
Benchmark                      (dataType)  (groups)  (numOrdinalMerges)  Mode  Cnt     Score   Error  Units
ValuesAggregatorBenchmark.run    BytesRef         1                   0  avgt    2     3.909          ms/op
ValuesAggregatorBenchmark.run    BytesRef         1                   1  avgt    2     3.951          ms/op
ValuesAggregatorBenchmark.run    BytesRef      1000                   0  avgt    2     2.635          ms/op
ValuesAggregatorBenchmark.run    BytesRef      1000                   1  avgt    2     2.703          ms/op
ValuesAggregatorBenchmark.run    BytesRef   1000000                   0  avgt    2  1519.385          ms/op
ValuesAggregatorBenchmark.run    BytesRef   1000000                   1  avgt    2  1623.915          ms/op
ValuesAggregatorBenchmark.run         int         1                   0  avgt    2     0.601          ms/op
ValuesAggregatorBenchmark.run         int         1                   1  avgt    2     0.613          ms/op
ValuesAggregatorBenchmark.run         int      1000                   0  avgt    2     2.504          ms/op
ValuesAggregatorBenchmark.run         int      1000                   1  avgt    2     2.591          ms/op
ValuesAggregatorBenchmark.run         int   1000000                   0  avgt    2  1396.017          ms/op
ValuesAggregatorBenchmark.run         int   1000000                   1  avgt    2  1441.373          ms/op
ValuesAggregatorBenchmark.run        long         1                   0  avgt    2     0.598          ms/op
ValuesAggregatorBenchmark.run        long         1                   1  avgt    2     0.597          ms/op
ValuesAggregatorBenchmark.run        long      1000                   0  avgt    2     2.397          ms/op
ValuesAggregatorBenchmark.run        long      1000                   1  avgt    2     2.510          ms/op
ValuesAggregatorBenchmark.run        long   1000000                   0  avgt    2  1538.923          ms/op
ValuesAggregatorBenchmark.run        long   1000000                   1  avgt    2  1625.971          ms/op

@dnhatn dnhatn requested a review from nik9000 July 4, 2025 04:01
@dnhatn dnhatn marked this pull request as ready for review July 4, 2025 04:02
@elasticsearchmachine
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

@elasticsearchmachine elasticsearchmachine added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label Jul 4, 2025
Labels
:Analytics/ES|QL (AKA ESQL), >bug, Team:Analytics (Meta label for analytical engine team (ESQL/Aggs/Geo)), v8.18.4, v8.19.1, v9.0.4, v9.1.1, v9.2.0
2 participants