bug: [trino, spark] distinct() after order_by() fails with "Column 't0.xyz' cannot be resolved" #11295

Closed
@vspinu

Description

What happened?

The following code fails with the Trino and PySpark backends. I presume it's a SQL-level issue. It might not even be a bug, since SQL generally doesn't guarantee that the original order is preserved. But the failure is still surprising, and it gets in the way of quick-and-dirty experimentation in the REPL.

import ibis
import ibis.backends.pyspark
import polars as pl
from ibis import _

data = [
    {'id': 1, "col": "a"},
    {'id': 2, "col": "b"},
]

# this works
ibis.set_backend("polars")
tbl = ibis.memtable(pl.DataFrame(data))
tbl.order_by("id").select("col").distinct()

# this fails (assumes `spark` is an existing SparkSession)
scon: ibis.backends.pyspark.Backend = ibis.pyspark.connect(spark)
scon.create_table("tmp", pl.DataFrame(data))
tbl = scon.table("tmp")
tbl.order_by("id").select("col").distinct()

# AnalysisException: [UNRESOLVED_COLUMN.WITH_SUGGESTION] A column or function parameter with name `t0`.`id` cannot be resolved. Did you mean one of the following? [`t1`.`col`].; line 1 pos 79;
# 'GlobalLimit 101
# +- 'LocalLimit 101
#    +- 'Sort ['t0.id ASC NULLS LAST], true
#       +- Project [col#9]
#          +- SubqueryAlias t1
#             +- Distinct
#                +- Project [col#9]
#                   +- SubqueryAlias t0
#                      +- SubqueryAlias spark_catalog.default.tmp
#                         +- Relation spark_catalog.default.tmp[id#8L,col#9] parquet

A similar error occurs with Trino.

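The generated SQL is wrong: the outer ORDER BY references `t0`.`id`, but the alias `t0` is only in scope inside the subquery, and `id` is not among the columns the subquery selects: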

ibis.to_sql(tbl.order_by("id").select("col").distinct())
SELECT
  *
FROM (
  SELECT DISTINCT
    `t0`.`col`
  FROM `tmp` AS `t0`
) AS `t1`
ORDER BY
  `t0`.`id` ASC NULLS LAST
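
The scoping problem can be reproduced without Spark or Trino. The sketch below is an illustration only: it uses SQLite from the Python standard library as a stand-in engine, with unquoted identifiers and without `NULLS LAST` for portability. `BAD_SQL` mirrors the shape of the query above; `GOOD_SQL` is one possible hand-written rewrite (group by `col`, order by the smallest `id` per group). It is an assumption about what the user likely intends, not what ibis generates and not necessarily how the issue was fixed. The helper names `make_connection` and `try_query` are hypothetical.

```python
import sqlite3

# Same shape as the generated query: the outer ORDER BY refers to t0,
# which only exists inside the subquery.
BAD_SQL = """
SELECT *
FROM (
  SELECT DISTINCT t0.col
  FROM tmp AS t0
) AS t1
ORDER BY t0.id ASC
"""

# One possible rewrite (an assumption, not ibis output): deduplicate with
# GROUP BY and order each distinct value by its smallest id.
GOOD_SQL = """
SELECT t0.col
FROM tmp AS t0
GROUP BY t0.col
ORDER BY MIN(t0.id) ASC
"""


def make_connection():
    """Create an in-memory table matching the data in the report."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE tmp (id INTEGER, col TEXT)")
    con.executemany("INSERT INTO tmp VALUES (?, ?)", [(1, "a"), (2, "b")])
    return con


def try_query(con, sql):
    """Return the rows, or the exception if the engine rejects the query."""
    try:
        return con.execute(sql).fetchall()
    except sqlite3.OperationalError as exc:
        return exc


con = make_connection()
print(try_query(con, BAD_SQL))   # an OperationalError: t0.id cannot be resolved
print(try_query(con, GOOD_SQL))  # [('a',), ('b',)]
```

The rewrite makes the ordering well defined even when the same `col` value appears under several `id`s, which the original `order_by` followed by `distinct()` leaves ambiguous.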

What version of ibis are you using?

10.5

What backend(s) are you using, if any?

trino, spark, polars


Labels

  • bug: Incorrect behavior inside of ibis
  • pyspark: The Apache PySpark backend
  • sql: Backends that generate SQL
  • trino: The Trino backend
