Skip to content

bug: MSSQL syntax error #11021

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
1 task done
albersonmiranda opened this issue Mar 19, 2025 · 9 comments
Open
1 task done

bug: MSSQL syntax error #11021

albersonmiranda opened this issue Mar 19, 2025 · 9 comments
Labels
bug Incorrect behavior inside of ibis mssql The Microsoft SQL Server backend

Comments

@albersonmiranda
Copy link

albersonmiranda commented Mar 19, 2025

What happened?

Nested mutate() generates invalid SQL query. See reprex adapted from https://github.com/posit-dev/pointblank/blob/main/pointblank/_interrogation.py#L91-L152:

# %%
import ibis
import pandas as pd

ibis.options.interactive = True

# dataframe
data = {"col1": [1, 2, 3], "col2": ["A", "B", "C"]}
df = pd.DataFrame(data)

# mssql con
con = ibis.mssql.connect(
    user=input("User: "),
    password=input("Password: "),
    host=input("Host: "),
    driver="SQL Server",
    database=input("db: ")
)

con.create_table("test_table", df)
t = con.table("test_table")

na_pass = False
column = "col1"
compare = 0

# %%
tbl = t.mutate(
    pb_is_good_1=t[column].isnull() & ibis.literal(na_pass),
    pb_is_good_2=t[column] > ibis.literal(compare),
)

tbl = tbl.mutate(pb_is_good_2=ibis.ifelse(tbl.pb_is_good_2.notnull(), tbl.pb_is_good_2, False))

result_tbl = tbl.mutate(pb_is_good_=tbl.pb_is_good_1 | tbl.pb_is_good_2).drop(
    "pb_is_good_1", "pb_is_good_2"
)

sql_query = ibis.to_sql(result_tbl)
print(sql_query)
# %%

Is this a bug or misuse of mutate()?

What version of ibis are you using?

10.3.1

What backend(s) are you using, if any?

MSSQL

Relevant log output

Here's SQL generated:


SELECT
  [t1].[col1],
  [t1].[col2],
  IIF([t1].[pb_is_good_1] <> 0 OR [t1].[pb_is_good_2] <> 0, 1, 0) AS [pb_is_good_]
FROM (
  SELECT
    [t0].[col1],
    [t0].[col2],
    (
      [t0].[col1] IS NULL -------------------------------------- Invalid syntax
    ) AND (1 = 0) AS [pb_is_good_1],---------------------------- Invalid syntax
    IIF((
      [t0].[col1] > 0
    ) IS NOT NULL, [t0].[col1] > 0, (1 = 0)) AS [pb_is_good_2]
  FROM [test_table] AS [t0]
) AS [t1]

Code of Conduct

  • I agree to follow this project's Code of Conduct
@albersonmiranda albersonmiranda added the bug Incorrect behavior inside of ibis label Mar 19, 2025
@gforsyth
Copy link
Member

gforsyth commented Mar 19, 2025

Hey @albersonmiranda -- can you confirm that Ibis version number? 3.3.0 is quite old I don't believe there is a version 3.3.0

@albersonmiranda
Copy link
Author

albersonmiranda commented Mar 19, 2025

@gforsyth I'm sorry, user mistake (I've previously installed ibis). Version is 2.1.1. Error still occurs in a fresh venv.

@gforsyth
Copy link
Member

@albersonmiranda -- that version is several years old. Current release is 10.3.1 -- is there something in your environment preventing an upgrade?

@albersonmiranda
Copy link
Author

Took from wrong venv again... @gforsyth here's correct output:

(.venv) PS C:\Users\030075675\Documents\bees_projects\ghm_banestes> python -V
Python 3.12.7
(.venv) PS C:\Users\030075675\Documents\bees_projects\ghm_banestes> pip show ibis-framework
Name: ibis-framework
Version: 10.3.1
Summary: The portable Python dataframe library
Home-page: https://ibis-project.org
Author:
Author-email: Ibis Maintainers <maintainers@ibis-project.org>
License: Apache-2.0
Location: C:\Users\030075675\Documents\bees_projects\ghm_banestes\.venv\Lib\site-packages
Requires: atpublic, parsy, python-dateutil, sqlglot, toolz, typing-extensions, tzdata
Required-by:

And query again:

print(sql_query)

SELECT
  [t1].[col1],
  [t1].[col2],
  IIF([t1].[pb_is_good_1] <> 0 OR [t1].[pb_is_good_2] <> 0, 1, 0) AS [pb_is_good_]
FROM (
  SELECT
    [t0].[col1],
    [t0].[col2],
    (
      [t0].[col1] IS NULL
    ) AND (1 = 0) AS [pb_is_good_1],
    IIF((
      [t0].[col1] > 0
    ) IS NOT NULL, [t0].[col1] > 0, (1 = 0)) AS [pb_is_good_2]
  FROM [test_table] AS [t0]
) AS [t1]

@gforsyth gforsyth added the mssql The Microsoft SQL Server backend label Mar 20, 2025
@gforsyth
Copy link
Member

gforsyth commented Mar 20, 2025

Thanks @albersonmiranda ! This does look like a bug and a moderately gnarly one.

I don't think your usage is incorrect.

A simplified example shows that we're initially doing the right thing:

[ins] In [42]: t = ibis.table({'a': 'str', 'b': 'str'})

[ins] In [43]: expr = t.mutate(c=t.a.isnull() & ibis.literal(False))

[nav] In [44]: ibis.to_sql(expr, dialect="mssql")
Out[44]: 
SELECT
  [t0].[a],
  [t0].[b],
  IIF((
    [t0].[a] IS NULL
  ) AND (1 = 0), 1, 0) AS [c]
FROM [unbound_table_3] AS [t0]

but when we then use an ifelse and generate a subquery, but the syntax inside the subquery is incorrect.

[ins] In [45]: expr2 = expr.mutate(d=ibis.ifelse(expr.c.notnull(), expr.c, False))

[ins] In [46]: ibis.to_sql(expr2, dialect="mssql")
Out[46]: 
SELECT
  [t1].[a],
  [t1].[b],
  IIF([t1].[c] <> 0, 1, 0) AS [c],
  IIF(IIF([t1].[c] IS NOT NULL, [t1].[c], (1 = 0)), 1, 0) AS [d]
FROM (
  SELECT
    [t0].[a],
    [t0].[b],
    (
      [t0].[a] IS NULL
    ) AND (1 = 0) AS [c]
  FROM [unbound_table_3] AS [t0]
) AS [t1]

@albersonmiranda
Copy link
Author

@gforsyth I see you've got a pretty big backlog. Any pointers on which file the bug might be in, so I can take a shot at fixing it?

@gforsyth
Copy link
Member

@albersonmiranda -- I think this is likely a pretty deep rabbit hole. I took a poke at it, and in the MSSQL compiler, we call _to_sqlglot to convert the ibis expression to a sqlglot expression:

https://github.com/ibis-project/ibis/blob/main/ibis/backends/sql/compilers/mssql.py#L159-L178

as a part of that, we run this conversion code:

        conversions = {
            name: ibis.ifelse(table_expr[name], 1, 0).cast(dt.boolean)
            for name, typ in table_expr.schema().items()
            if typ.is_boolean()
        }

to convert any boolean types to explicit ifelse because mssql has terrible handling for booleans.

I don't think the issue is actually here, but if we compare the table expression after those conversions are applied, there's an indication of the problem:

r0 := UnboundTable: unbound_table_0
  a string
  b string

r1 := Project[r0]
  a: r0.a
  b: r0.b
  c: IsNull(r0.a) & False

r2 := Project[r1]
  a: r1.a
  b: r1.b
  c: r1.c
  d: IfElse(bool_expr=NotNull(r1.c), true_expr=r1.c, false_null_expr=False)

Project[r2]
  a: r2.a
  b: r2.b
  c: Cast(IfElse(bool_expr=r2.c, true_expr=1, false_null_expr=0), to=boolean)
  d: Cast(IfElse(bool_expr=r2.d, true_expr=1, false_null_expr=0), to=boolean)

Note that the final projection (r2) includes Cast(..., to=boolean) around the IfElse and that is being correctly transpiled -- however, the IfElse inside the subquery (r1) is not explicitly cast to a bool.

I'm fairly sure that's the problem -- I suspect we may need an explicit rule in the mssql compiler to handle the IfElse node and ensure that it is wrapped in an explicit cast.

@albersonmiranda
Copy link
Author

@gforsyth Thanks for the detailed instructions! I'm getting close but not there yet:

def visit_IfElse(self, op, *, bool_expr, true_expr, false_null_expr):
        if isinstance(bool_expr, sge.Boolean):
            processed_bool_expr = bool_expr
        else:
            is_comparison = isinstance(bool_expr, (sge.Is, sge.EQ, sge.NEQ, 
                                                sge.GT, sge.LT, sge.GTE, sge.LTE))
            processed_bool_expr = bool_expr if is_comparison else bool_expr.neq(0)

        if op.true_expr.dtype.is_boolean():
            processed_true_expr = true_expr if isinstance(true_expr, (sge.Literal, int)) else self.if_(true_expr, 1, 0)
        else:
            processed_true_expr = true_expr

        if false_null_expr is not None and op.false_null_expr is not None and op.false_null_expr.dtype.is_boolean():
            processed_false_expr = false_null_expr if isinstance(false_null_expr, (sge.Literal, int)) else self.if_(false_null_expr, 1, 0)
        else:
            processed_false_expr = false_null_expr

        return sge.If(
            this=processed_bool_expr,
            true=processed_true_expr,
            false=processed_false_expr,
        )
    
    def visit_And(self, op, *, left, right):
        result = super().visit_And(op, left=left, right=right)
        return self.if_(result, 1, 0)

    def visit_Or(self, op, *, left, right):
        result = super().visit_Or(op, left=left, right=right)
        return self.if_(result, 1, 0)

    def visit_IsNull(self, op, *, arg):
        return self.if_(arg.is_(NULL), 1, 0)
    
    def visit_NotNull(self, op, *, arg):
        return self.if_(arg.is_(NULL).not_(), 1, 0)

For my example, this generates:

SELECT
  [t1].[col1],
  [t1].[col2],
  IIF(IIF([t1].[pb_is_good_1] <> 0 OR [t1].[pb_is_good_2] <> 0, 1, 0) <> 0, 1, 0) AS [pb_is_good_]
FROM (
  SELECT
    [t0].[col1],
    [t0].[col2],
    IIF(IIF([t0].[col1] IS NULL, 1, 0) AND (1 = 0), 1, 0) AS [pb_is_good_1],
    IIF(
      IIF(NOT (
        [t0].[col1] > 0
      ) IS NULL, 1, 0) <> 0,
      IIF([t0].[col1] > 0, 1, 0),
      IIF((1 = 0), 1, 0)
    ) AS [pb_is_good_2]
  FROM [test_table] AS [t0]
) AS [t1]

@gforsyth
Copy link
Member

@albersonmiranda -- another option that might work is to add a rewrite rule to the mssql compiler

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Incorrect behavior inside of ibis mssql The Microsoft SQL Server backend
Projects
Status: backlog
Development

Successfully merging a pull request may close this issue.

2 participants