AggregationMask generated code throws IllegalArgumentException: Invalid position %d in block with %d positions #21272

Closed
sdaberdaku opened this issue Mar 26, 2024 · 4 comments · Fixed by #21363
Labels: bug, RELEASE-BLOCKER


sdaberdaku commented Mar 26, 2024

In Trino 443, the following SQL query raises an IllegalArgumentException when using the Delta Lake connector:

SELECT
    SUM(COALESCE(converted_expected_amount, expected_amount)) FILTER(WHERE (payment_date <= DATE('2023-11-30'))) AS cumulated_expected_amount_to_date,
    SUM(COALESCE(converted_realized_amount, realized_amount)) FILTER(WHERE (payment_date <= DATE('2023-11-30'))) AS cumulated_realized_amount_to_date
FROM my_catalog.asset_collections_timeline
WHERE (spv_id in ('3517cc86-9cb9-415b-b62e-761482337a99')) AND (reporting_date = DATE('2023-11-03')) AND (cash_flow_type not in (1)) AND (payment_date <= DATE('2023-11-03'))

An example stack trace is the following:

java.lang.IllegalArgumentException: Invalid position 470 in block with 470 positions
	at io.trino.spi.block.BlockUtil.checkValidPosition(BlockUtil.java:72)
	at io.trino.spi.block.BlockUtil.checkReadablePosition(BlockUtil.java:78)
	at io.trino.spi.block.Int128ArrayBlock.isNull(Int128ArrayBlock.java:156)
	at io.trino.$gen.AggregationMaskBuilder_20240326_131546_78.buildAggregationMask(Unknown Source)
	at io.trino.operator.aggregation.Aggregator.processPage(Aggregator.java:75)
	at io.trino.operator.AggregationOperator.addInput(AggregationOperator.java:137)
	at io.trino.operator.Driver.processInternal(Driver.java:403)
	at io.trino.operator.Driver.lambda$process$8(Driver.java:301)
	at io.trino.operator.Driver.tryWithLock(Driver.java:704)
	at io.trino.operator.Driver.process(Driver.java:293)
	at io.trino.operator.Driver.processForDuration(Driver.java:264)
	at io.trino.execution.SqlTaskExecution$DriverSplitRunner.processFor(SqlTaskExecution.java:887)
	at io.trino.execution.executor.dedicated.SplitProcessor.run(SplitProcessor.java:76)
	at io.trino.execution.executor.dedicated.TaskEntry$VersionEmbedderBridge.lambda$run$0(TaskEntry.java:191)
	at io.trino.$gen.Trino_443____20240326_131420_2.run(Unknown Source)
	at io.trino.execution.executor.dedicated.TaskEntry$VersionEmbedderBridge.run(TaskEntry.java:192)
	at io.trino.execution.executor.scheduler.FairScheduler.runTask(FairScheduler.java:174)
	at io.trino.execution.executor.scheduler.FairScheduler.lambda$submit$0(FairScheduler.java:161)
	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:572)
	at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:131)
	at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:76)
	at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:82)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
	at java.base/java.lang.Thread.run(Thread.java:1583)
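
For anyone reading the message for the first time: the top frames suggest a zero-based bounds check, so a block with 470 positions would accept positions 0 through 469, and position 470 is one past the end. The snippet below is only a simplified model of such a check, written in Python for illustration. The function name and the exception type are assumptions and this is not Trino's actual implementation.

```python
def check_valid_position(position: int, position_count: int) -> None:
    """Hypothetical stand-in for a zero-based bounds check (not Trino's code)."""
    if position < 0 or position >= position_count:
        raise ValueError(
            f"Invalid position {position} in block with {position_count} positions"
        )


# The last valid position in a 470-position block is 469.
check_valid_position(469, 470)

# Position 470 is one past the end, which matches the reported message.
try:
    check_valid_position(470, 470)
except ValueError as error:
    message = str(error)
    print(message)
```

In other words, the generated mask builder appears to ask for a position equal to the block's size, which is the pattern to look for when debugging.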

The error persists in earlier versions down to 440, where the same query fails with a slightly different error message:

java.lang.NoSuchMethodError: 'io.trino.spi.block.ValueBlock io.trino.spi.block.DictionaryBlock.getDictionary(int)'
	at io.trino.$gen.AggregationMaskBuilder_20240326_133141_45.buildAggregationMask(Unknown Source)
	at io.trino.operator.aggregation.Aggregator.processPage(Aggregator.java:75)
	at io.trino.operator.AggregationOperator.addInput(AggregationOperator.java:137)
	at io.trino.operator.Driver.processInternal(Driver.java:403)
	at io.trino.operator.Driver.lambda$process$8(Driver.java:301)
	at io.trino.operator.Driver.tryWithLock(Driver.java:704)
	at io.trino.operator.Driver.process(Driver.java:293)
	at io.trino.operator.Driver.processForDuration(Driver.java:264)
	at io.trino.execution.SqlTaskExecution$DriverSplitRunner.processFor(SqlTaskExecution.java:887)
	at io.trino.execution.executor.dedicated.SplitProcessor.run(SplitProcessor.java:76)
	at io.trino.execution.executor.dedicated.TaskEntry$VersionEmbedderBridge.lambda$run$0(TaskEntry.java:191)
	at io.trino.$gen.Trino_440____20240326_133029_2.run(Unknown Source)
	at io.trino.execution.executor.dedicated.TaskEntry$VersionEmbedderBridge.run(TaskEntry.java:192)
	at io.trino.execution.executor.scheduler.FairScheduler.runTask(FairScheduler.java:174)
	at io.trino.execution.executor.scheduler.FairScheduler.lambda$submit$0(FairScheduler.java:161)
	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:572)
	at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:131)
	at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:76)
	at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:82)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
	at java.base/java.lang.Thread.run(Thread.java:1583)

With Trino 439 the query works fine.

Here is also a link to the related Slack discussion: https://trinodb.slack.com/archives/CGB0QHWSW/p1711454712473549

Contributor

Pluies commented Mar 27, 2024

@sdaberdaku I tried replicating this type of query with data from tpch, but it didn't give me an error (admittedly because I tweaked the query too much 😄). Do you have some sample data we could use to replicate it? 🙏

Author

sdaberdaku commented Mar 27, 2024

Hope this helps!

I created a sample Delta table with the following PySpark code from Databricks 14.3 (Spark 3.5.0):

from pyspark.sql.functions import rand, expr, when

def generate_dataframe(n):
    # `spark` is the SparkSession that the Databricks notebook provides.
    # spark.range(n) yields n rows; its numeric "id" column is replaced by a UUID string.
    # rand() returns a double, so "converted" and "realized" are nullable double columns.
    df = (
        spark.range(n)
        .withColumn("id", expr("uuid()"))
        # Random date in 2020
        .withColumn("payment_date", expr("date_add(to_date('2020-01-01'), cast(rand() * 365 as int))"))
        # Roughly 30% of "converted" values are NULL
        .withColumn("converted", when(rand() < 0.3, None).otherwise(rand()))
        # Roughly 10% of "realized" values are NULL
        .withColumn("realized", when(rand() > 0.9, None).otherwise(rand()))
    )

    return df

# Define number of records
n = 4734676  # Change this value to your desired number of records

# Generate DataFrame
df = generate_dataframe(n)

# Show DataFrame
display(df)

df.write.format("delta").mode("overwrite").option("overwriteSchema", "true").saveAsTable("my_schema.test_table")

Then I ran the following query in Trino 443:

select
    SUM(COALESCE(converted, realized)) FILTER(WHERE (payment_date <= DATE('2023-11-30'))) AS cumulative
from my_schema.test_table;

and got:
[65536] Query failed (#20240327_114625_00533_2hswp): Invalid position 256 in block with 256 positions java.lang.IllegalArgumentException: Invalid position 256 in block with 256 positions

EDIT: just confirmed that in Trino 439 this same query works without errors on the same data.
EDIT2: I am running Trino with one coordinator and 5 workers.
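
To make the expected result explicit, here is a small pure-Python model of what the query should compute. It is a sketch of standard SQL semantics as I understand them (COALESCE falls back to the second argument, FILTER keeps only rows where the predicate is true, SUM skips NULLs and returns NULL when nothing is summed). The helper name `filtered_sum` and the sample rows are made up for illustration.

```python
from datetime import date


def filtered_sum(rows, cutoff):
    """Model of SUM(COALESCE(converted, realized)) FILTER (WHERE payment_date <= cutoff)."""
    total = None  # SUM over no non-NULL inputs is NULL (None here)
    for payment_date, converted, realized in rows:
        # FILTER keeps a row only when the predicate is true (NULL counts as not true).
        if payment_date is None or payment_date > cutoff:
            continue
        # COALESCE returns the first non-NULL argument.
        value = converted if converted is not None else realized
        if value is None:
            continue  # SUM ignores NULL inputs
        total = value if total is None else total + value
    return total


# Hypothetical sample rows: (payment_date, converted, realized)
rows = [
    (date(2020, 3, 1), 0.5, 0.25),    # converted is used
    (date(2020, 6, 15), None, 0.25),  # falls back to realized
    (date(2020, 9, 9), None, None),   # both NULL, skipped by SUM
    (date(2024, 1, 1), 1.0, 1.0),     # rejected by the filter
]
result = filtered_sum(rows, date(2023, 11, 30))
print(result)  # 0.75
```

Since every generated payment_date falls in 2020, the filter keeps all rows of the sample table, so the query should simply return one number rather than fail.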

Member

guyco33 commented Mar 28, 2024

The issue can be easily reproduced with the following queries:

CREATE TABLE iceberg.temp.tbl AS
SELECT uuid() id,
       date_add('day', cast(rand() * 365 as int), date'2020-01-01')  payment_date,
       if(rand() < 0.3, 1)  converted,
       if(rand() > 0.9, 1)  realized
FROM TABLE(sequence(1,4734676))
;

SELECT
    SUM(COALESCE(converted, realized)) FILTER(WHERE (payment_date <= DATE('2023-11-30'))) AS cumulative
FROM iceberg.temp.tbl
;

java.lang.IllegalArgumentException: Invalid position 256 in block with 256 positions
	at io.trino.spi.block.BlockUtil.checkValidPosition(BlockUtil.java:72)
	at io.trino.spi.block.BlockUtil.checkReadablePosition(BlockUtil.java:78)
	at io.trino.spi.block.LongArrayBlock.isNull(LongArrayBlock.java:141)
	at io.trino.$gen.AggregationMaskBuilder_20240328_084411_60.buildAggregationMask(Unknown Source)
	at io.trino.operator.aggregation.Aggregator.processPage(Aggregator.java:75)
	at io.trino.operator.AggregationOperator.addInput(AggregationOperator.java:137)
	at io.trino.operator.Driver.processInternal(Driver.java:403)
	at io.trino.operator.Driver.lambda$process$8(Driver.java:301)
	at io.trino.operator.Driver.tryWithLock(Driver.java:704)
	at io.trino.operator.Driver.process(Driver.java:293)
	at io.trino.operator.Driver.processForDuration(Driver.java:264)
	at io.trino.execution.SqlTaskExecution$DriverSplitRunner.processFor(SqlTaskExecution.java:887)
	at io.trino.execution.executor.dedicated.SplitProcessor.run(SplitProcessor.java:76)
	at io.trino.execution.executor.dedicated.TaskEntry$VersionEmbedderBridge.lambda$run$0(TaskEntry.java:191)
	at io.trino.$gen.Trino_443_122_gb52e3f6____20240328_082125_2.run(Unknown Source)
	at io.trino.execution.executor.dedicated.TaskEntry$VersionEmbedderBridge.run(TaskEntry.java:192)
	at io.trino.execution.executor.scheduler.FairScheduler.runTask(FairScheduler.java:174)
	at io.trino.execution.executor.scheduler.FairScheduler.lambda$submit$0(FairScheduler.java:161)
	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:572)
	at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:131)
	at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:76)
	at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:82)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
	at java.base/java.lang.Thread.run(Thread.java:1583)
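
Judging only from the frames above, the failure happens inside the generated buildAggregationMask code when it calls isNull on an argument block at a position equal to the block's size. As a rough mental model, a mask builder of this kind picks the positions that pass the FILTER predicate and whose aggregation argument is not NULL. The Python sketch below models that idea with plain lists. It is an assumption for illustration, not Trino's generated code, and it says nothing about the actual root cause.

```python
def build_aggregation_mask(filter_column, argument_column):
    """Conceptual model: positions where the filter is true and the argument is not NULL."""
    if len(filter_column) != len(argument_column):
        raise ValueError("filter and argument columns must have the same length")
    selected = []
    # Positions are zero-based, so the loop must stop at len(...) - 1.
    for position in range(len(argument_column)):
        passes_filter = filter_column[position] is True  # NULL (None) does not pass
        is_null = argument_column[position] is None
        if passes_filter and not is_null:
            selected.append(position)
    return selected


mask = build_aggregation_mask(
    [True, True, False, None],  # FILTER predicate result per position
    [1, None, 1, 1],            # aggregation argument per position
)
print(mask)  # [0]
```

Any loop of this shape that reads position N of an N-position block would produce exactly the "Invalid position N in block with N positions" message seen here.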

findinpath added the bug label on Mar 29, 2024
Member

findepi commented Mar 29, 2024

The error persists in earlier versions down to 440, where the same query fails with a slightly different error message:

java.lang.NoSuchMethodError: 'io.trino.spi.block.ValueBlock io.trino.spi.block.DictionaryBlock.getDictionary(int)'
	at io.trino.$gen.AggregationMaskBuilder_20240326_133141_45.buildAggregationMask(Unknown Source)

For reference, this is #21002, which was fixed in #21064 (Trino 441).

dain changed the title from "java.lang.IllegalArgumentException: Invalid position %d in block with %d positions" to "AggregationMask generated code throws IllegalArgumentException: Invalid position %d in block with %d positions" on Apr 2, 2024
6 participants