Closed
What happened?
The following code fails with the Trino and PySpark backends. I presume it's a SQL-level issue. It might not even be a bug, since SQL generally doesn't guarantee that the original row order is preserved. Still, it's surprising that it fails, and it gets in the way of quick-and-dirty experimentation in the REPL.
import ibis
import ibis.backends.pyspark
import polars as pl
from ibis import _
from pyspark.sql import SparkSession

data = [
    {"id": 1, "col": "a"},
    {"id": 2, "col": "b"},
]

# this works
ibis.set_backend("polars")
tbl = ibis.memtable(pl.DataFrame(data))
tbl.order_by("id").select("col").distinct()

# this fails
spark = SparkSession.builder.getOrCreate()
scon: ibis.backends.pyspark.Backend = ibis.pyspark.connect(spark)
scon.create_table("tmp", pl.DataFrame(data))
tbl = scon.table("tmp")
tbl.order_by("id").select("col").distinct()
# AnalysisException: [UNRESOLVED_COLUMN.WITH_SUGGESTION] A column or function parameter with name `t0`.`id` cannot be resolved. Did you mean one of the following? [`t1`.`col`].; line 1 pos 79;
# 'GlobalLimit 101
# +- 'LocalLimit 101
#    +- 'Sort ['t0.id ASC NULLS LAST], true
#       +- Project [col#9]
#          +- SubqueryAlias t1
#             +- Distinct
#                +- Project [col#9]
#                   +- SubqueryAlias t0
#                      +- SubqueryAlias spark_catalog.default.tmp
#                         +- Relation spark_catalog.default.tmp[id#8L,col#9] parquet
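A similar error occurs with Trino.
The generated SQL is wrong: the outer ORDER BY references `t0`.`id`, but the alias `t0` is only visible inside the subquery, and `id` is not among the columns the subquery returns.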
ibis.to_sql(tbl.order_by("id").select("col").distinct())
SELECT
  *
FROM (
  SELECT DISTINCT
    `t0`.`col`
  FROM `tmp` AS `t0`
) AS `t1`
ORDER BY
  `t0`.`id` ASC NULLS LAST
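The scoping problem can be reproduced without Spark or Trino. The sketch below uses Python's built-in sqlite3 module as a stand-in engine (an assumption: error messages and dialect details differ from Spark's and Trino's). It runs the same query shape, then two hand-written queries that are valid: one that sorts the distinct values themselves, and one that orders each distinct value by the smallest `id` it appears with. Neither is claimed to be what ibis should emit.

```python
import sqlite3

# sqlite3 is an assumed stand-in engine; the report used Spark and Trino.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tmp (id INTEGER, col TEXT)")
conn.executemany("INSERT INTO tmp VALUES (?, ?)", [(1, "a"), (2, "b")])


def run(sql):
    """Return (rows, None) on success or (None, error message) on failure."""
    try:
        return conn.execute(sql).fetchall(), None
    except sqlite3.OperationalError as exc:
        return None, str(exc)


# Same shape as the SQL ibis generated: the outer ORDER BY names `t0`,
# an alias that only exists inside the subquery.
broken_rows, broken_error = run(
    """
    SELECT * FROM (
      SELECT DISTINCT t0.col FROM tmp AS t0
    ) AS t1
    ORDER BY t0.id
    """
)

# Valid alternative 1: sort the distinct values themselves.
by_value, _ = run("SELECT DISTINCT col FROM tmp ORDER BY col")

# Valid alternative 2: one row per value, ordered by the smallest id
# that value appears with ("order by id, then deduplicate").
by_first_id, _ = run("SELECT col FROM tmp GROUP BY col ORDER BY MIN(id)")

print(broken_error)
print(by_value, by_first_id)
```

The GROUP BY form is one way to express "order by `id`, then deduplicate" without referring to a column that the projection has already dropped.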
What version of ibis are you using?
10.5
What backend(s) are you using, if any?
trino, spark, polars
Relevant log output