I have found a bug in the `max_by(x, y, n)` function when providing a value for `n`, in versions of Trino after v423 up to the current version (v437 as of writing). Running this test query:
```sql
SELECT
  col1,
  MAX_BY(col2, length(col2), 3) AS longest_three_values
FROM
  my_iceberg_table
WHERE
  customer_id = 'abcd1234'
  AND time >= TIMESTAMP '2023-09-30 00:00'
  AND time < TIMESTAMP '2023-10-01 00:00'
  AND col1 IS NOT NULL
GROUP BY
  col1
ORDER BY
  col1;
```
results in query failure, returning a `GENERIC_INTERNAL_ERROR(65536)` with this stack trace:
```
java.lang.ArrayIndexOutOfBoundsException: arraycopy: last source index 1501 out of bounds for byte[1024]
	at java.base/java.lang.System.arraycopy(Native Method)
	at io.trino.spi.block.VariableWidthBlockBuilder.writeEntry(VariableWidthBlockBuilder.java:103)
	at io.trino.spi.type.AbstractVariableWidthType$DefaultReadOperators.readFlatToBlock(AbstractVariableWidthType.java:174)
	at io.trino.operator.aggregation.minmaxbyn.TypedKeyValueHeap.write(TypedKeyValueHeap.java:217)
	at io.trino.operator.aggregation.minmaxbyn.TypedKeyValueHeap.writeAllUnsorted(TypedKeyValueHeap.java:172)
	at io.trino.operator.aggregation.minmaxbyn.MinMaxByNStateFactory$AbstractMinMaxByNState.lambda$serialize$0(MinMaxByNStateFactory.java:75)
	at io.trino.spi.block.ArrayBlockBuilder.buildEntry(ArrayBlockBuilder.java:117)
	at io.trino.operator.aggregation.minmaxbyn.MinMaxByNStateFactory$AbstractMinMaxByNState.lambda$serialize$1(MinMaxByNStateFactory.java:75)
	at io.trino.spi.block.ArrayBlockBuilder.buildEntry(ArrayBlockBuilder.java:117)
	at io.trino.operator.aggregation.minmaxbyn.MinMaxByNStateFactory$AbstractMinMaxByNState.lambda$serialize$2(MinMaxByNStateFactory.java:75)
	at io.trino.spi.block.RowBlockBuilder.buildEntry(RowBlockBuilder.java:110)
	at io.trino.operator.aggregation.minmaxbyn.MinMaxByNStateFactory$AbstractMinMaxByNState.serialize(MinMaxByNStateFactory.java:70)
	at io.trino.operator.aggregation.minmaxbyn.MinMaxByNStateSerializer.serialize(MinMaxByNStateSerializer.java:41)
	at io.trino.operator.aggregation.minmaxbyn.MinMaxByNStateSerializer.serialize(MinMaxByNStateSerializer.java:22)
	at io.trino.$gen.max_byGroupedAccumulator_20240131_181420_42.evaluateIntermediate(Unknown Source)
	at io.trino.operator.aggregation.GroupedAggregator.evaluate(GroupedAggregator.java:105)
	at io.trino.operator.aggregation.builder.InMemoryHashAggregationBuilder.lambda$buildResult$2(InMemoryHashAggregationBuilder.java:282)
	at io.trino.operator.WorkProcessorUtils$ProcessWorkProcessor.process(WorkProcessorUtils.java:412)
	at io.trino.operator.HashAggregationOperator.getOutput(HashAggregationOperator.java:512)
	at io.trino.operator.Driver.processInternal(Driver.java:398)
	at io.trino.operator.Driver.lambda$process$8(Driver.java:301)
	at io.trino.operator.Driver.tryWithLock(Driver.java:704)
	at io.trino.operator.Driver.process(Driver.java:293)
	at io.trino.operator.Driver.processForDuration(Driver.java:264)
	at io.trino.execution.SqlTaskExecution$DriverSplitRunner.processFor(SqlTaskExecution.java:887)
	at io.trino.execution.executor.timesharing.PrioritizedSplitRunner.process(PrioritizedSplitRunner.java:187)
	at io.trino.execution.executor.timesharing.TimeSharingTaskExecutor$TaskRunner.run(TimeSharingTaskExecutor.java:565)
	at io.trino.$gen.Trino_437____20240131_180440_2.run(Unknown Source)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
	at java.base/java.lang.Thread.run(Thread.java:1583)
```
This error only occurs when a value for `n` is provided, i.e. when returning an array of the top `n` matches. It also only occurs at the scale of our production data; the issue does not reproduce in our smaller dev environment.
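In case it helps with triage: the `arraycopy: last source index 1501 out of bounds for byte[1024]` failure during flat serialization of the heap state suggests the problem may only surface once an aggregated VARCHAR value exceeds the 1024-byte buffer, which would explain why it needs production-scale (longer) data to trigger. Below is a hypothetical, untested standalone query I would try as a minimal reproduction; the generated values are deliberately longer than 1024 bytes, and all names here are invented rather than taken from our schema:

```sql
-- Untested sketch of a possible standalone repro: aggregate VARCHAR values
-- longer than the 1024-byte buffer seen in the stack trace, across enough
-- groups that intermediate (partial) aggregation state gets serialized.
SELECT
  g,
  MAX_BY(v, length(v), 3) AS longest_three_values
FROM (
  SELECT
    n % 100 AS g,                                  -- spread rows across groups
    rpad('x', 1500, 'x') || cast(n AS varchar) AS v -- ~1500-byte values (> 1024)
  FROM UNNEST(sequence(1, 10000)) AS t (n)          -- sequence() caps at 10000 entries
)
GROUP BY g;
```

If this reproduces the `GENERIC_INTERNAL_ERROR`, it would confirm the value-length theory without needing access to large tables.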
Since this error occurs in version 424 and above, I suspect it was introduced somewhere in this PR.
This bug has our team locked at version 423, and we cannot upgrade until this is fixed. If someone could take a look at this, that would be greatly appreciated. Thank you!