Skip to content

ES|QL: No plain strings in Literal #129399

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

luigidellaquila
Copy link
Contributor

@luigidellaquila luigidellaquila commented Jun 13, 2025

Fixes: #129322

Making sure that all the TEXT, KEYWORD and VERSION Literals contain a BytesRef (and not a plain java.lang.String).

The mix of the two was the reason for some ClassCastExceptions, eg. #129322

Steps:

  • Force the conversion from String to BytesRef inside Literal constructor
  • Make all the tests pass
  • Replace the forced conversion with an assert, checking that no java.lang.String is passed to the constructor
  • Make tests pass again

@luigidellaquila luigidellaquila marked this pull request as ready for review June 17, 2025 14:01
@elasticsearchmachine elasticsearchmachine added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label Jun 17, 2025
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

} else if (dataType == VERSION && value instanceof BytesRef br) {
str = new Version(br).toString();
// } else if (dataType == IP && value instanceof BytesRef ip) {
// str = DocValueFormat.IP.format(ip);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just in case, is this intentionally commented?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah... unfortunately it is, for two reasons:

  • IP conversion from/to BytesRefs could trigger assertion errors, and it's pretty annoying in tests
  • We are explicitly using strings for IPs in some cases, see CobineDisjunctions; we'll have to change that before we can force IPs as BytesRefs

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please add this as a comment/todo referencing an issue to resolve it eventually?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I created an issue and I'm linking it in a TODO

Copy link
Contributor

@ivancea ivancea left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

if (dataType == KEYWORD || dataType == TEXT) {
str = BytesRefs.toString(value);
} else if (dataType == VERSION && value instanceof BytesRef br) {
str = new Version(br).toString();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this an improvement? Or were Versions a readable String before?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We had some cases (mostly unit tests) where we passed plain strings to VERSION literals.
Even for tests it's not a good thing, as we are testing something that is different than the actual production execution.

@@ -398,7 +398,7 @@ eth0 |gamma |fe80::cae2:65ff:fece:feb9
lo0 |gamma |fe80::cae2:65ff:fece:feb9 |fe81::cae2:65ff:fece:feb9
;

cdirMatchEqualsInsOrsIPs
cicdirMatchEqualsInsOrsIPs
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
cicdirMatchEqualsInsOrsIPs
cidrMatchEqualsInsOrsIPs

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: I would replace every cast+utf8ToString() to BytesRefs.toString(), if possible

@@ -182,8 +183,10 @@ private static String readEvaluator() {
}

private static String stringEvaluator(ToIp.LeadingZeros leadingZeros) {
if (leadingZeros == null) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As this was here before, and in jdk21 the case null works (right?), do we need this change?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My IntelliJ (Java 21) keeps complaining, but maybe it's some wrong setting on my side

@luigidellaquila luigidellaquila added v8.19.0 auto-backport Automatically create backport pull requests when merged labels Jun 18, 2025
Comment on lines 61 to 66
if (value instanceof String) {
return false;
}
if (value instanceof Collection<?> c) {
return c.stream().allMatch(x -> noPlainStrings(x, dataType));
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: consider pattern matching here

Suggested change
if (value instanceof String) {
return false;
}
if (value instanceof Collection<?> c) {
return c.stream().allMatch(x -> noPlainStrings(x, dataType));
}
return switch (value) {
case String s -> false;
case Collection<?> c -> c.stream().allMatch(x -> noPlainStrings(x, dataType));
default -> true;
}

}

public static Literal l(Object value, DataType type) {
if (value instanceof String && (type == DataType.TEXT || type == DataType.KEYWORD)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NIT: type == DataType.TEXT || type == DataType.KEYWORD theck looks a bit cheaper. Do you mind making it first (before instanceof)?

@@ -39,6 +42,10 @@ public static Literal of(Source source, Object value) {
if (value instanceof Literal) {
return (Literal) value;
}
DataType type = DataType.fromJava(value);
if (value instanceof String && (type == TEXT || type == KEYWORD)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here

@@ -116,7 +117,7 @@ public ExpressionEvaluator.Factory toEvaluator(ToEvaluator toEvaluator) {
Object folded = field.fold(toEvaluator.foldCtx());
if (folded instanceof Integer num) {
checkNumber(num);
return toEvaluator.apply(new Literal(source(), " ".repeat(num), KEYWORD));
return toEvaluator.apply(new Literal(source(), BytesRefs.toBytesRef(" ".repeat(num)), KEYWORD));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to repeat a lot, I wonder if it is worth introducing:

pusblic static Literal.keyword(Source source, String literal) {
    return new Literal(source, BytesRefs.toBytesRef(literal), KEYWORD);
}

to simplify construction of string literals?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 I think it makes sense

Copy link
Contributor

@idegtiarenko idegtiarenko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@luigidellaquila
Copy link
Contributor Author

Thanks for the feedback folks!

@luigidellaquila luigidellaquila enabled auto-merge (squash) June 18, 2025 09:20
@luigidellaquila luigidellaquila merged commit 22e592a into elastic:main Jun 18, 2025
27 checks passed
@elasticsearchmachine
Copy link
Collaborator

💔 Backport failed

Status Branch Result
8.19 Commit could not be cherrypicked due to conflicts

You can use sqren/backport to manually backport by running backport --upstream elastic/elasticsearch --pr 129399

albertzaharovits pushed a commit to albertzaharovits/elasticsearch that referenced this pull request Jun 18, 2025
luigidellaquila added a commit to luigidellaquila/elasticsearch that referenced this pull request Jun 19, 2025
@luigidellaquila
Copy link
Contributor Author

Manual backport #129702

kderusso pushed a commit to kderusso/elasticsearch that referenced this pull request Jun 23, 2025
kanoshiou added a commit to kanoshiou/elasticsearch that referenced this pull request Jul 24, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Analytics/ES|QL AKA ESQL auto-backport Automatically create backport pull requests when merged backport pending >refactoring Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) v8.19.0 v9.1.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ES|QL: Folding of REPLACE with constants throws ClassCastException
4 participants