Conditional aggregation fails with: Invalid position %s in block with %s positions #21285

Closed
davseitsev opened this issue Mar 27, 2024 · 6 comments

After upgrading Trino from version 409 to 442/443, we get an error on conditional aggregation. Example query:

SELECT
    MIN(rhs.id) FILTER(WHERE lhs.id IS NOT NULL) AS max_id
FROM agg_test_lhs AS lhs
         LEFT JOIN agg_test_rhs AS rhs
                   ON lhs.id = rhs.id

When join_distribution_type='PARTITIONED', it fails on a small dataset with the following stack trace:

java.lang.IllegalArgumentException: Invalid position 256 in block with 256 positions
    at io.trino.spi.block.BlockUtil.checkValidPosition(BlockUtil.java:72)
    at io.trino.spi.block.BlockUtil.checkReadablePosition(BlockUtil.java:78)
    at io.trino.spi.block.LongArrayBlock.isNull(LongArrayBlock.java:141)
    at io.trino.$gen.AggregationMaskBuilder_20240325_112624_214.buildAggregationMask(Unknown Source)
    at io.trino.operator.aggregation.GroupedAggregator.processPage(GroupedAggregator.java:83)
    at io.trino.operator.aggregation.builder.InMemoryHashAggregationBuilder.lambda$processPage$1(InMemoryHashAggregationBuilder.java:157)
    at io.trino.operator.TransformWork.process(TransformWork.java:44)
    at io.trino.operator.HashAggregationOperator.addInput(HashAggregationOperator.java:443)
    at io.trino.operator.Driver.processInternal(Driver.java:403)
    at io.trino.operator.Driver.lambda$process$8(Driver.java:301)
    at io.trino.operator.Driver.tryWithLock(Driver.java:704)
    at io.trino.operator.Driver.process(Driver.java:293)
    at io.trino.operator.Driver.processForDuration(Driver.java:264)
    at io.trino.execution.SqlTaskExecution$DriverSplitRunner.processFor(SqlTaskExecution.java:887)
    at io.trino.execution.executor.dedicated.SplitProcessor.run(SplitProcessor.java:76)
    at io.trino.execution.executor.dedicated.TaskEntry$VersionEmbedderBridge.lambda$run$0(TaskEntry.java:191)
    at io.trino.$gen.Trino_442____20240325_112320_2.run(Unknown Source)
    at io.trino.execution.executor.dedicated.TaskEntry$VersionEmbedderBridge.run(TaskEntry.java:192)
    at io.trino.execution.executor.scheduler.FairScheduler.runTask(FairScheduler.java:174)
    at io.trino.execution.executor.scheduler.FairScheduler.lambda$submit$0(FairScheduler.java:161)
    at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:572)
    at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:131)
    at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:76)
    at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:82)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
    at java.base/java.lang.Thread.run(Thread.java:1583)

When join_distribution_type='REPLICATED', it fails only on a big dataset with the following stack trace:

java.lang.IllegalArgumentException: Invalid position 6635 in block with 6635 positions
	at io.trino.spi.block.BlockUtil.checkValidPosition(BlockUtil.java:72)
	at io.trino.spi.block.BlockUtil.checkReadablePosition(BlockUtil.java:78)
	at io.trino.spi.block.VariableWidthBlock.isNull(VariableWidthBlock.java:197)
	at io.trino.$gen.AggregationMaskBuilder_20240326_130128_50.buildAggregationMask(Unknown Source)
	at io.trino.operator.aggregation.Aggregator.processPage(Aggregator.java:75)
	at io.trino.operator.AggregationOperator.addInput(AggregationOperator.java:137)
	at io.trino.operator.Driver.processInternal(Driver.java:403)
	at io.trino.operator.Driver.lambda$process$8(Driver.java:301)
	at io.trino.operator.Driver.tryWithLock(Driver.java:704)
	at io.trino.operator.Driver.process(Driver.java:293)
	at io.trino.operator.Driver.processForDuration(Driver.java:264)
	at io.trino.execution.SqlTaskExecution$DriverSplitRunner.processFor(SqlTaskExecution.java:887)
	at io.trino.execution.executor.dedicated.SplitProcessor.run(SplitProcessor.java:76)
	at io.trino.execution.executor.dedicated.TaskEntry$VersionEmbedderBridge.lambda$run$0(TaskEntry.java:191)
	at io.trino.$gen.Trino_443____20240326_130047_2.run(Unknown Source)
	at io.trino.execution.executor.dedicated.TaskEntry$VersionEmbedderBridge.run(TaskEntry.java:192)
	at io.trino.execution.executor.scheduler.FairScheduler.runTask(FairScheduler.java:174)
	at io.trino.execution.executor.scheduler.FairScheduler.lambda$submit$0(FairScheduler.java:161)
	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:572)
	at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:131)
	at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:76)
	at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:82)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
	at java.base/java.lang.Thread.run(Thread.java:1583)
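
In both traces the reported position equals the block's position count (256 of 256, 6635 of 6635), i.e. the read is exactly one past the last valid zero-based index. A minimal sketch of that kind of bounds check, assuming it behaves the way the message suggests: the class `PositionCheckSketch` and the helper `tryPosition` are hypothetical names for illustration, and this is not Trino's actual `BlockUtil` source. Only the message format is taken from the issue title.

```java
public class PositionCheckSketch {
    // Hypothetical stand-in for the check named in the stack traces (assumption, not Trino code).
    // Valid positions are 0 .. positionCount - 1.
    static void checkValidPosition(int position, int positionCount) {
        if (position < 0 || position >= positionCount) {
            throw new IllegalArgumentException(
                    String.format("Invalid position %s in block with %s positions", position, positionCount));
        }
    }

    // Hypothetical helper: returns the error message, or null when the position is valid.
    static String tryPosition(int position, int positionCount) {
        try {
            checkValidPosition(position, positionCount);
            return null;
        } catch (IllegalArgumentException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(tryPosition(255, 256)); // last valid position in a 256-position block
        System.out.println(tryPosition(256, 256)); // one past the end, as in the first trace
    }
}
```

Under these assumptions, position 255 passes and position 256 produces the same message as the first stack trace, which is consistent with the caller iterating over more positions than the block it reads from actually holds.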

Trino cluster size also affects how big the dataset must be to reproduce the issue: the bigger the cluster, the bigger the dataset needs to be.
I managed to reproduce it in a unit test. The file is attached:
TestCorruptedAggregation.java.zip

Please let me know if I can provide more information.

Contributor

wendigo commented Mar 27, 2024

cc @dain

Contributor

wendigo commented Mar 27, 2024

Relates to #21272

Member

guyco33 commented Mar 28, 2024

The issue can be easily reproduced with the following queries:

CREATE TABLE iceberg.default.tbl AS
SELECT uuid() id,
       date_add('day', cast(rand() * 365 as int), date'2020-01-01')  payment_date,
       if(rand() < 0.3, 1)  converted,
       if(rand() > 0.9, 1)  realized
FROM TABLE(sequence(1,4734676))
;

SELECT
    SUM(COALESCE(converted, realized)) FILTER(WHERE (payment_date <= DATE('2023-11-30'))) AS cumulative
FROM iceberg.default.tbl
;

java.lang.IllegalArgumentException: Invalid position 256 in block with 256 positions
	at io.trino.spi.block.BlockUtil.checkValidPosition(BlockUtil.java:72)
	at io.trino.spi.block.BlockUtil.checkReadablePosition(BlockUtil.java:78)
	at io.trino.spi.block.LongArrayBlock.isNull(LongArrayBlock.java:141)
	at io.trino.$gen.AggregationMaskBuilder_20240328_084411_60.buildAggregationMask(Unknown Source)
	at io.trino.operator.aggregation.Aggregator.processPage(Aggregator.java:75)
	at io.trino.operator.AggregationOperator.addInput(AggregationOperator.java:137)
	at io.trino.operator.Driver.processInternal(Driver.java:403)
	at io.trino.operator.Driver.lambda$process$8(Driver.java:301)
	at io.trino.operator.Driver.tryWithLock(Driver.java:704)
	at io.trino.operator.Driver.process(Driver.java:293)
	at io.trino.operator.Driver.processForDuration(Driver.java:264)
	at io.trino.execution.SqlTaskExecution$DriverSplitRunner.processFor(SqlTaskExecution.java:887)
	at io.trino.execution.executor.dedicated.SplitProcessor.run(SplitProcessor.java:76)
	at io.trino.execution.executor.dedicated.TaskEntry$VersionEmbedderBridge.lambda$run$0(TaskEntry.java:191)
	at io.trino.$gen.Trino_443_122_gb52e3f6____20240328_082125_2.run(Unknown Source)
	at io.trino.execution.executor.dedicated.TaskEntry$VersionEmbedderBridge.run(TaskEntry.java:192)
	at io.trino.execution.executor.scheduler.FairScheduler.runTask(FairScheduler.java:174)
	at io.trino.execution.executor.scheduler.FairScheduler.lambda$submit$0(FairScheduler.java:161)
	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:572)
	at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:131)
	at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:76)
	at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:82)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
	at java.base/java.lang.Thread.run(Thread.java:1583)

Contributor

wendigo commented Mar 28, 2024

cc @sopel39

Contributor

Pluies commented Mar 28, 2024

Same issue on the delta connector, if that helps. 👍

Running DeltaLakeQueryRunner#S3DeltaLakeQueryRunnerMain with the following SQL:

CREATE TABLE delta.tpch.tbl AS
SELECT
  date_add('day', cast(rand() * 365 as int), date'2020-01-01')  payment_date,
  if(rand() < 0.3, 1)  converted,
  if(rand() > 0.9, 1)  realized
FROM TABLE(sequence(1,4734676));

SELECT
  SUM(COALESCE(converted, realized)) FILTER(WHERE (payment_date <= DATE('2023-11-30'))) AS cumulative
FROM delta.tpch.tbl;

Member

findepi commented Apr 2, 2024

We believe it is the same issue as #21272.

@findepi closed this as not planned on Apr 2, 2024