AggregationMask generated code throws IllegalArgumentException: Invalid position %d in block with %d positions #21272
Comments
@sdaberdaku I tried replicating this type of query with data from tpch, but it didn't give me an error (admittedly because I tweaked the query too much 😄). Do you have some sample data we could use to replicate it? 🙏
Hope this helps! I created a sample Delta table with the following PySpark code on Databricks 14.3 (Spark 3.5.0):

from pyspark.sql.functions import rand, expr, when
from pyspark.sql.types import StructType, StructField, StringType, DateType, FloatType

def generate_dataframe(n):
    # Define schema (for reference only; it is not applied to the DataFrame below)
    schema = StructType([
        StructField("id", StringType(), True),
        StructField("payment_date", DateType(), True),
        StructField("converted", FloatType(), True),
        StructField("realized", FloatType(), True)
    ])
    # Generate DataFrame (`spark` is the session provided by the Databricks notebook)
    df = spark.range(n)\
        .withColumn("id", expr("uuid()"))\
        .withColumn("payment_date", expr("date_add(to_date('2020-01-01'), cast(rand() * 365 as int))"))\
        .withColumn("converted", when(rand() < 0.3, None).otherwise(rand()))\
        .withColumn("realized", when(rand() > 0.9, None).otherwise(rand()))
    return df

# Define number of records
n = 4734676 # Change this value to your desired number of records
# Generate DataFrame
df = generate_dataframe(n)
# Show DataFrame
display(df)
df.write.format("delta").mode("overwrite").option("overwriteSchema", "true").saveAsTable("my_schema.test_table")

Then I ran the query in Trino 443:

select
    SUM(COALESCE(converted, realized)) FILTER(WHERE (payment_date <= DATE('2023-11-30'))) AS cumulative
from my_schema.test_table;

and got:

EDIT: just confirmed that in Trino 439 this same query works without errors on the same data.
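
For anyone checking results once the query runs, here is a minimal pure-Python sketch of what the aggregate is expected to compute under standard SQL semantics. It is not Trino code: the helper name `filtered_coalesce_sum` and the sample rows are made up for illustration, and `None` stands in for SQL NULL. With the generated data every `payment_date` falls in 2020, so the filter keeps all rows.

```python
from datetime import date


def filtered_coalesce_sum(rows, cutoff):
    """Mimic SUM(COALESCE(converted, realized)) FILTER (WHERE payment_date <= cutoff).

    rows: iterable of (payment_date, converted, realized) tuples; None means NULL.
    Returns None (SQL NULL) when no non-NULL value passes the filter.
    """
    total = None
    for payment_date, converted, realized in rows:
        # FILTER clause: skip rows whose predicate is false or NULL
        if payment_date is None or payment_date > cutoff:
            continue
        # COALESCE: first non-NULL argument
        value = converted if converted is not None else realized
        if value is None:
            continue  # SUM ignores NULL inputs
        total = value if total is None else total + value
    return total


# Hypothetical sample rows, not taken from the real table
rows = [
    (date(2020, 1, 15), 0.5, 0.25),   # converted is used
    (date(2020, 3, 1), None, 0.25),   # falls back to realized
    (date(2020, 6, 30), None, None),  # both NULL, ignored by SUM
    (date(2024, 1, 1), 1.0, 1.0),     # excluded by the filter
]
result = filtered_coalesce_sum(rows, date(2023, 11, 30))
print(result)
```

Comparing a small sample like this against the output on Trino 439 (where the query works) can help confirm that a fixed version returns the same value.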
Issue can be easily reproduced by the following queries:
For reference, this is #21002 and has been fixed in #21064 (Trino 441).
In Trino 443, the following SQL query raises an IllegalArgumentException when using the Delta Lake connector:
An example stack trace is the following:
The error also occurs in earlier versions back to 440, where the same query produces a slightly different error message:
With Trino 439 the query works fine.
Here is also a link to the related Slack discussion: https://trinodb.slack.com/archives/CGB0QHWSW/p1711454712473549