Skip to content

ES|QL - Allow full text functions to be used in STATS ... WHERE #125479

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
Show file tree
Hide file tree
Changes from 2 commits
Commits
Show all changes
21 commits
Select commit Hold shift + click to select a range
5f4eff5
Add full text functions usage in STATS.. WHERE
carlosdelest Mar 24, 2025
748e576
Fix test
carlosdelest Mar 24, 2025
dc1dbe6
Skip rest compat test
carlosdelest Mar 24, 2025
3117d50
Fix test error message
carlosdelest Mar 24, 2025
3a753d2
Remove match operator check
carlosdelest Mar 26, 2025
2a8e987
Included ShardContext list into LocalExecutionPlannerContext
carlosdelest Mar 26, 2025
c99d373
Update docs/changelog/125479.yaml
carlosdelest Mar 26, 2025
2cc6898
Merge remote-tracking branch 'origin/main' into enhancement/esql-text…
carlosdelest Mar 26, 2025
425f146
Merge remote-tracking branch 'carlosdelest/enhancement/esql-text-sear…
carlosdelest Mar 26, 2025
c842f14
Add shard context information to aggregate filters
carlosdelest Mar 26, 2025
8a0f48f
Merge branch 'main' into enhancement/esql-text-search-functions-stats
carlosdelest Apr 28, 2025
dfce035
Allow FTFs in WHERE clauses but not in grouping clauses
carlosdelest May 5, 2025
3963fd3
Add CSV tests for STATS
carlosdelest May 5, 2025
85a2374
Add stats scores test
carlosdelest May 5, 2025
4f3228a
Merge remote-tracking branch 'origin/main' into enhancement/esql-text…
carlosdelest May 5, 2025
351d1ff
Fix capabilities
carlosdelest May 5, 2025
52dd7b9
Fix test
carlosdelest May 5, 2025
02f8748
Changed error message
carlosdelest May 5, 2025
f9bf170
[CI] Auto commit changes from spotless
elasticsearchmachine May 5, 2025
e98ad43
Forbid usage of _score aggregations on STATS when there is a WHERE cl…
carlosdelest May 9, 2025
40eaeb8
Merge remote-tracking branch 'origin/main' into enhancement/esql-text…
carlosdelest May 9, 2025
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -54,6 +54,7 @@ public record ShardConfig(Query query, IndexSearcher searcher) {}
private final List<ShardState> perShardState;

protected LuceneQueryEvaluator(BlockFactory blockFactory, ShardConfig[] shards) {
assert shards.length > 0 : "LuceneQueryEvaluator expects shard configs";
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will help detect non-pushable expressions that need to be created with ShardConfig information

this.blockFactory = blockFactory;
this.shards = shards;
this.perShardState = new ArrayList<>(Collections.nCopies(shards.length, null));
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -189,3 +189,28 @@ book_no:keyword
7140
2714
;

testKqlInStatsNonPushable
required_capability: kql_function
required_capability: full_text_functions_in_stats_where

from books
| where length(title) > 40
| stats c = count(*) where kql("title:Lord")
;

c:long
3
;

testKqlInStatsPushable
required_capability: kql_function
required_capability: full_text_functions_in_stats_where

from books
| stats c = count(*) where kql("author:tolkien")
;

c:long
22
;
Original file line number Diff line number Diff line change
Expand Up @@ -755,3 +755,29 @@ book_no:keyword
7140
2714
;

testMatchInStatsNonPushable
required_capability: match_function
required_capability: full_text_functions_in_stats_where

from books
| where length(title) > 40
| stats c = count(*) where match(title, "Lord")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here are some suggestions on the test coverages that I can think of, these also apply to the other full text functions and operators:

  1. Can we have some tests with a bit more complicated predicates in the where clause under stats? For example some combinations of and, or and not, with functions that can or cannot be pushed down to Lucent? Perhaps borrow some queries from the existing tests where the where clause is not under stats.
  2. I wonder how score works with where under stats, can we have some tests to capture how it behaves?
  3. Can we have some tests to cover multiple aggregate functions with where clause under stats? The predicates for each aggregation can have some overlaps, we have some optimization rules to deal with overlapped predicates under stats.
  4. Can we have some test with aggregation and grouping(BY) with full text functions under where clause?
  5. Add some full text functions with options for the completeness.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we have some tests with a bit more complicated predicates in the where clause under stats? For example some combinations of and, or and not, with functions that can or cannot be pushed down to Lucent? Perhaps borrow some queries from the existing tests where the where clause is not under stats.
Can we have some tests to cover multiple aggregate functions with where clause under stats? The predicates for each aggregation can have some overlaps, we have some optimization rules to deal with overlapped predicates under stats.
Can we have some test with aggregation and grouping(BY) with full text functions under where clause?
Add some full text functions with options for the completeness.

I added testing in 3963fd3, hopefully that works!

I wonder how score works with where under stats, can we have some tests to capture how it behaves?

It affects scoring as well. I'm thinking that

FROM my_index METADATA _score
| WHERE match(field, "query")
| STATS c = AVG(_score)

is the same as

FROM my_index METADATA _score
| STATS c = AVG(_score) WHERE match(field, "query")

so using a FTF in a STATS WHERE clause affects scoring as well. I think it's better for consistency, but happy to discuss with the team.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It affects scoring as well. I'm thinking that

FROM my_index METADATA _score
| WHERE match(field, "query")
| STATS c = AVG(_score)

is the same as

FROM my_index METADATA _score
| STATS c = AVG(_score) WHERE match(field, "query")

so using a FTF in a STATS WHERE clause affects scoring as well. I think it's better for consistency, but happy to discuss with the team.

The example above makes sense to me. I wonder how the _score works, when there are multiple aggregate functions in a stats command? For example, does the query below make sense? avg and max's scores seem to come from different sources.

FROM my_index METADATA _score
| STATS avg = AVG(_score) WHERE match(field, "query1"), max = max(_score) WHERE match(field, "query2")

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder how the _score works, when there are multiple aggregate functions in a stats command?

@fang-xing-esql good catch. That doesn't work, as the filters are not pushed down to Lucene and thus there's no scoring available 😓

It could be doable to calculate the scoring on the individual WHERE clauses in the aggregation, but I'm inclined not to do it. We're looking into having separate scores for individual queries as a separate effort, and this kind of replicates some of those efforts.

What I've done is to disallow the usage of _score aggregations in STATS when that includes WHERE clauses. This allows to do things like:

from books metadata _score 
| where match(title, "Lord Rings", {"operator": "AND"})
| stats avg_score = avg(_score), max_score = max(_score), min_score = min(_score)

but does not allow to use WHERE in STATS:

from books metadata _score 
| stats avg_score = avg(_score) where match(title, "Lord Rings", {"operator": "AND"})

Change done in e98ad43

I think this is a good compromise, and we can work to lift that limitation later if needed be.

LMKWYT

;

c:long
3
;

testMatchInStatsPushable
required_capability: match_function
required_capability: full_text_functions_in_stats_where

from books
| stats c = count(*) where match(author, "tolkien")
;

c:long
22
;

Original file line number Diff line number Diff line change
Expand Up @@ -757,3 +757,29 @@ from semantic_text
host:keyword | semantic_text_field:text | language_name:keyword | language_code:integer
"host1" | live long and prosper | English | 1
;


testMatchInStatsNonPushable
required_capability: match_operator_colon
required_capability: full_text_functions_in_stats_where

from books
| where length(title) > 40
| stats c = count(*) where title:"Lord"
;

c:long
3
;

testMatchInStatsPushable
required_capability: match_operator_colon
required_capability: full_text_functions_in_stats_where

from books
| stats c = count(*) where author:"tolkien"
;

c:long
22
;
Original file line number Diff line number Diff line change
Expand Up @@ -210,3 +210,29 @@ book_no:keyword | title:text
7480 | The Hobbit
// end::qstr-with-options-result[]
;

testQstrInStatsNonPushable
required_capability: qstr_function
required_capability: full_text_functions_in_stats_where

from books
| where length(title) > 40
| stats c = count(*) where qstr("title:Lord")
;

c:long
3
;

testQstrInStatsPushable
required_capability: qstr_function
required_capability: full_text_functions_in_stats_where

from books
| stats c = count(*) where qstr("author:tolkien")
;

c:long
22
;

Original file line number Diff line number Diff line change
Expand Up @@ -66,7 +66,7 @@ public void testKqlQueryWithinEval() {
""";

var error = expectThrows(VerificationException.class, () -> run(query));
assertThat(error.getMessage(), containsString("[KQL] function is only supported in WHERE commands"));
assertThat(error.getMessage(), containsString("[KQL] function is only supported in WHERE and STATS ... WHERE commands"));
}

public void testInvalidKqlQueryEof() {
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -246,14 +246,35 @@ public void testWhereMatchWithRow() {
);
}

public void testMatchWithStats() {
var errorQuery = """
FROM test
| STATS c = count(*) BY match(content, "fox")
""";

var error = expectThrows(ElasticsearchException.class, () -> run(errorQuery));
assertThat(error.getMessage(), containsString("[MATCH] function is only supported in WHERE and STATS ... WHERE commands"));

var query = """
FROM test
| STATS c = count(*) WHERE match(content, "fox"), d = count(*) WHERE match(content, "dog")
""";

try (var resp = run(query)) {
assertColumnNames(resp.columns(), List.of("c", "d"));
assertColumnTypes(resp.columns(), List.of("long", "long"));
assertValues(resp.values(), List.of(List.of(2L, 4L)));
}
}

public void testMatchWithinEval() {
var query = """
FROM test
| EVAL matches_query = match(content, "fox")
""";

var error = expectThrows(VerificationException.class, () -> run(query));
assertThat(error.getMessage(), containsString("[MATCH] function is only supported in WHERE commands"));
assertThat(error.getMessage(), containsString("[MATCH] function is only supported in WHERE and STATS ... WHERE commands"));
}

private void createAndPopulateIndex() {
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -354,7 +354,7 @@ public void testMatchWithinEval() {
""";

var error = expectThrows(VerificationException.class, () -> run(query));
assertThat(error.getMessage(), containsString("[:] operator is only supported in WHERE commands"));
assertThat(error.getMessage(), containsString("[:] operator is only supported in WHERE and STATS ... WHERE commands"));
}

public void testMatchWithNonTextField() {
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -63,7 +63,7 @@ public void testQueryStringWithinEval() {
""";

var error = expectThrows(VerificationException.class, () -> run(query));
assertThat(error.getMessage(), containsString("[QSTR] function is only supported in WHERE commands"));
assertThat(error.getMessage(), containsString("[QSTR] function is only supported in WHERE and STATS ... WHERE commands"));
}

public void testInvalidQueryStringEof() {
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -57,7 +57,7 @@ public void testTermWithinEval() {
""";

var error = expectThrows(VerificationException.class, () -> run(query));
assertThat(error.getMessage(), containsString("[Term] function is only supported in WHERE commands"));
assertThat(error.getMessage(), containsString("[Term] function is only supported in WHERE and STATS ... WHERE commands"));
}

public void testMultipleTerm() {
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -912,7 +912,12 @@ public enum Cap {
/**
* The metrics command
*/
METRICS_COMMAND(Build.current().isSnapshot());
METRICS_COMMAND(Build.current().isSnapshot()),

/**
* Full text functions in STATS ... WHERE
*/
FULL_TEXT_FUNCTIONS_IN_STATS_WHERE;

private final boolean enabled;

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -26,6 +26,7 @@
import org.elasticsearch.xpack.esql.core.tree.Source;
import org.elasticsearch.xpack.esql.core.type.DataType;
import org.elasticsearch.xpack.esql.evaluator.mapper.EvaluatorMapper;
import org.elasticsearch.xpack.esql.expression.function.aggregate.FilteredExpression;
import org.elasticsearch.xpack.esql.expression.predicate.logical.BinaryLogic;
import org.elasticsearch.xpack.esql.expression.predicate.logical.Not;
import org.elasticsearch.xpack.esql.optimizer.rules.physical.local.LucenePushdownPredicates;
Expand Down Expand Up @@ -209,13 +210,32 @@ private static void checkFullTextQueryFunctions(LogicalPlan plan, Failures failu
failures
);
checkFullTextFunctionsParents(condition, failures);
} else if (plan instanceof Aggregate agg) {
agg.forEachExpression(exp -> checkFullTextFunctionsInAggs(exp, failures));
} else {
plan.forEachExpression(FullTextFunction.class, ftf -> {
failures.add(fail(ftf, "[{}] {} is only supported in WHERE commands", ftf.functionName(), ftf.functionType()));
failures.add(
fail(ftf, "[{}] {} is only supported in WHERE and STATS ... WHERE commands", ftf.functionName(), ftf.functionType())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

STATS ... WHERE commands looks a bit weird to me, especially the ..., but I also had hard time thinking of a good way to describe this situation, some candidates that I can think of are:

{} is only supported in WHERE commands   ===> implies standalone or nested WHERE command
{} is only supported as filters in WHERE commands 
{} is only supported as predicates in WHERE commands 
{} is only supported in WHERE commands and WHERE commands under STATS commands
{} is only supported as filters in WHERE and STATS commands 
{} is only supported as predicates in WHERE and STATS commands 

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is only supported as filters in WHERE and STATS commands works for me - changed it in 02f8748

);
});
}
}

private static void checkFullTextFunctionsInAggs(Expression expression, Failures failures) {
if (expression instanceof FilteredExpression) {
return;
}
for (Expression child : expression.children()) {
if (child instanceof FullTextFunction ftf) {
failures.add(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: we could maybe add a method that builds the failure message and reuse it above as well.

fail(ftf, "[{}] {} is only supported in WHERE and STATS ... WHERE commands", ftf.functionName(), ftf.functionType())
);
return;
}
checkFullTextFunctionsInAggs(child, failures);
}
}

/**
* Checks all commands that exist before a specific type satisfy conditions.
*
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -60,7 +60,8 @@ public abstract class AbstractPhysicalOperationProviders implements PhysicalOper
public final PhysicalOperation groupingPhysicalOperation(
AggregateExec aggregateExec,
PhysicalOperation source,
LocalExecutionPlannerContext context
LocalExecutionPlannerContext context,
List<EsPhysicalOperationProviders.ShardContext> shardContexts
) {
// The layout this operation will produce.
Layout.Builder layout = new Layout.Builder();
Expand Down Expand Up @@ -95,7 +96,8 @@ public final PhysicalOperation groupingPhysicalOperation(
aggregatorMode,
sourceLayout,
false, // non-grouping
s -> aggregatorFactories.add(s.supplier.aggregatorFactory(s.mode, s.channels))
s -> aggregatorFactories.add(s.supplier.aggregatorFactory(s.mode, s.channels)),
shardContexts
);

if (aggregatorFactories.isEmpty() == false) {
Expand Down Expand Up @@ -169,7 +171,8 @@ else if (aggregatorMode.isOutputPartial()) {
aggregatorMode,
sourceLayout,
true, // grouping
s -> aggregatorFactories.add(s.supplier.groupingAggregatorFactory(s.mode, s.channels))
s -> aggregatorFactories.add(s.supplier.groupingAggregatorFactory(s.mode, s.channels)),
shardContexts
);

if (groupSpecs.size() == 1 && groupSpecs.get(0).channel == null) {
Expand Down Expand Up @@ -259,7 +262,8 @@ private void aggregatesToFactory(
AggregatorMode mode,
Layout layout,
boolean grouping,
Consumer<AggFunctionSupplierContext> consumer
Consumer<AggFunctionSupplierContext> consumer,
List<EsPhysicalOperationProviders.ShardContext> shardContexts
) {
// extract filtering channels - and wrap the aggregation with the new evaluator expression only during the init phase
for (NamedExpression ne : aggregates) {
Expand Down Expand Up @@ -319,7 +323,8 @@ else if (mode == AggregatorMode.FINAL || mode == AggregatorMode.INTERMEDIATE) {
EvalOperator.ExpressionEvaluator.Factory evalFactory = EvalMapper.toEvaluator(
foldContext,
aggregateFunction.filter(),
layout
layout,
shardContexts
);
aggSupplier = new FilteredAggregatorFunctionSupplier(aggSupplier, evalFactory);
}
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -308,7 +308,7 @@ private PhysicalOperation planRrfScoreEvalExec(RrfScoreEvalExec rrf, LocalExecut

private PhysicalOperation planAggregation(AggregateExec aggregate, LocalExecutionPlannerContext context) {
var source = plan(aggregate.child(), context);
return physicalOperationProviders.groupingPhysicalOperation(aggregate, source, context);
return physicalOperationProviders.groupingPhysicalOperation(aggregate, source, context, shardContexts);
}

private PhysicalOperation planEsQueryNode(EsQueryExec esQueryExec, LocalExecutionPlannerContext context) {
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -13,6 +13,8 @@
import org.elasticsearch.xpack.esql.planner.LocalExecutionPlanner.LocalExecutionPlannerContext;
import org.elasticsearch.xpack.esql.planner.LocalExecutionPlanner.PhysicalOperation;

import java.util.List;

interface PhysicalOperationProviders {
PhysicalOperation fieldExtractPhysicalOperation(FieldExtractExec fieldExtractExec, PhysicalOperation source);

Expand All @@ -21,6 +23,7 @@ interface PhysicalOperationProviders {
PhysicalOperation groupingPhysicalOperation(
AggregateExec aggregateExec,
PhysicalOperation source,
LocalExecutionPlannerContext context
LocalExecutionPlannerContext context,
List<EsPhysicalOperationProviders.ShardContext> shardContexts
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's feels a bit odd to provide an implementation piece of EsPhysicalOperationProviders (in the list) to the interface, given that actually EsPhysicalOperationProviders, which implements this if'ace, already has the list of shardContexts. I wonder if we could embbed it into the LocalExecutionPlannerContext itself, since it's local execution specific anyways.
Maybe it should be passed it to LocalExecutionPlaner#plan from ComputeService#runCompute, where the list of shard contexts is already available?

Copy link
Member Author

@carlosdelest carlosdelest Mar 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes sense - I did that change in 2a8e987, and some follow up work in c842f14

);
}
Original file line number Diff line number Diff line change
Expand Up @@ -1208,7 +1208,7 @@ public void testWeightedAvg() {
public void testMatchInsideEval() throws Exception {
assumeTrue("Match operator is available just for snapshots", Build.current().isSnapshot());
assertEquals(
"1:36: [:] operator is only supported in WHERE commands\n"
"1:36: [:] operator is only supported in WHERE and STATS ... WHERE commands\n"
+ "line 1:36: [:] operator cannot operate on [title], which is not a field from an index mapping",
error("row title = \"brown fox\" | eval x = title:\"fox\" ")
);
Expand Down Expand Up @@ -1368,12 +1368,12 @@ public void testKqlFunctionsNotAllowedAfterCommands() throws Exception {
}

public void testQueryStringFunctionOnlyAllowedInWhere() throws Exception {
assertEquals("1:9: [QSTR] function is only supported in WHERE commands", error("row a = qstr(\"Anna\")"));
assertEquals("1:9: [QSTR] function is only supported in WHERE and STATS ... WHERE commands", error("row a = qstr(\"Anna\")"));
checkFullTextFunctionsOnlyAllowedInWhere("QSTR", "qstr(\"Anna\")", "function");
}

public void testKqlFunctionOnlyAllowedInWhere() throws Exception {
assertEquals("1:9: [KQL] function is only supported in WHERE commands", error("row a = kql(\"Anna\")"));
assertEquals("1:9: [KQL] function is only supported in WHERE and STATS ... WHERE commands", error("row a = kql(\"Anna\")"));
checkFullTextFunctionsOnlyAllowedInWhere("KQL", "kql(\"Anna\")", "function");
}

Expand All @@ -1393,23 +1393,15 @@ public void testMatchOperatornOnlyAllowedInWhere() throws Exception {
private void checkFullTextFunctionsOnlyAllowedInWhere(String functionName, String functionInvocation, String functionType)
throws Exception {
assertEquals(
"1:22: [" + functionName + "] " + functionType + " is only supported in WHERE commands",
"1:22: [" + functionName + "] " + functionType + " is only supported in WHERE and STATS ... WHERE commands",
error("from test | eval y = " + functionInvocation)
);
assertEquals(
"1:18: [" + functionName + "] " + functionType + " is only supported in WHERE commands",
"1:18: [" + functionName + "] " + functionType + " is only supported in WHERE and STATS ... WHERE commands",
error("from test | sort " + functionInvocation + " asc")
);
assertEquals(
"1:23: [" + functionName + "] " + functionType + " is only supported in WHERE commands",
error("from test | STATS c = " + functionInvocation + " BY first_name")
);
assertEquals(
"1:50: [" + functionName + "] " + functionType + " is only supported in WHERE commands",
error("from test | stats max_salary = max(salary) where " + functionInvocation)
);
assertEquals(
"1:47: [" + functionName + "] " + functionType + " is only supported in WHERE commands",
"1:47: [" + functionName + "] " + functionType + " is only supported in WHERE and STATS ... WHERE commands",
error("from test | stats max_salary = max(salary) by " + functionInvocation)
);
}
Expand Down
Loading