Skip to content

duckdb's RANGE incorrectly transpiles to spark #7291

@huydo862003

Description

@huydo862003

Thank you for this wonderful library. As I'm studying the code of sqlglot and tinkering with it, I found the following issue.

Information

  • Version: 29.0.1
  • "read" dialect: duckdb
  • "write" dialect: spark

Issue

  • The issue is reproducible on main via the following test in sqlglot/tests/dialects/test_duckdb.py:
    self.validate_all(
    "WITH t AS (SELECT 5 AS c) SELECT RANGE(1, c) FROM t",
    write={
    "duckdb": "WITH t AS (SELECT 5 AS c) SELECT RANGE(1, c) FROM t",
    "spark": "WITH t AS (SELECT 5 AS c) SELECT IF((c - 1) <= 1, ARRAY(), SEQUENCE(1, (c - 1))) FROM t",
    },
    )

Fully reproducible code snippet

sqlglot.transpile('''
  WITH t AS (SELECT 2 AS c)
  SELECT RANGE(1, c) FROM t
''', read='duckdb', write='spark')

It gives the following:

['WITH t AS (SELECT 2 AS c) SELECT IF((c - 1) <= 1, ARRAY(), SEQUENCE(1, (c - 1))) FROM t']

If I understood correctly:

  • The duckdb query RANGE(1, c) would return [1].
  • The spark query IF((c - 1) <= 1, ARRAY(), SEQUENCE(1, (c - 1))) FROM t would return [], because (2-1)<=1 is true.

Therefore, it seems that the transpiled query is not equivalent to the duckdb query.

I suspect that this piece of code may be at fault:

if expression.args.get("is_end_exclusive"):
step_value = step or exp.Literal.number(1)
end = exp.paren(exp.Sub(this=end, expression=step_value), copy=False)
sequence_call = exp.Anonymous(
this="SEQUENCE", expressions=[e for e in (start, end, step) if e]
)
zero = exp.Literal.number(0)
should_return_empty = exp.or_(
exp.EQ(this=step_value.copy(), expression=zero.copy()),
exp.and_(
exp.GT(this=step_value.copy(), expression=zero.copy()),
exp.GTE(this=start.copy(), expression=end.copy()),
),
exp.and_(
exp.LT(this=step_value.copy(), expression=zero.copy()),
exp.LTE(this=start.copy(), expression=end.copy()),
),
)
empty_array_or_sequence = exp.If(
this=should_return_empty,
true=exp.Array(expressions=[]),
false=sequence_call,
)
return self.sql(self._simplify_unless_literal(empty_array_or_sequence))

Thank you.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions