Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixes for invalid SQL generated in WITH queries #572

Merged
merged 6 commits into from
Oct 3, 2023

Conversation

shane-circuithub
Copy link
Contributor

This PR contains two fixes pertaining to WITH queries. The first wraps an otherwise pointless SELECT * FROM around WITH queries to workaround PostgreSQL's arbitrary rejection of WITH foo AS (_) WITH bar AS (_) _ as invalid syntax. The second fixes a more serious issue whereby withExplicit is missing a Rebind and can generate a "valid" query that silently produces incorrect results.

@tomjaguarpaw
Copy link
Owner

Thanks! Can you also add tests that fail before this PR and pass after it?

This is necessary because PostgreSQL arbitrarily prohibits (directly) nesting `WITH`. For example, `WITH xs AS (VALUES ('x')) WITH ys AS (VALUES ('y')) SELECT * FROM xs, ys;` is a syntax error, PostgreSQL instead wants you to write `WITH xs AS (VALUES ('x')), ys AS (VALUES ('y')) SELECT * FROM xs, ys;`. This commit instead will produce the following, which also works: `SELECT * FROM (WITH xs AS (VALUES ('x')) SELECT * FROM (WITH ys AS (VALUES ('y')) SELECT * FROM xs, ys) _) _;`.
This is essential, otherwise `with foo bar` and `with (fmap swap foo) bar` will generate the same SQL, which is incorrect.
@shane-circuithub
Copy link
Contributor Author

I've added failing tests before the commits that subsequently fix them.

Comment on lines +842 to +846
testWithRebind :: Test
testWithRebind = it "with (rebinding)" $ testH with (`shouldBe` expected)
where
with = O.with (fmap swap table1Q) $ \t -> (,) <$> t <*> table2Q
expected = (,) <$> fmap swap table1data <*> table2data
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was surprised by this failure! For reference, here is the generated code before and after

SELECT
"cte_renamed0_4" as "result1_6",
"cte_renamed1_4" as "result2_6",
"column10_5" as "result3_6",
"column21_5" as "result4_6"
FROM (SELECT
      *
      FROM (WITH "cte_1" (cte0_2,
                          cte1_2) AS
            (SELECT
             "column1" as "column10_3",
             "column2" as "column21_3"
             FROM "table1" AS "T1")
            SELECT
            *
            FROM (SELECT
                  "cte0_2" as "cte_renamed0_4",
                  "cte1_2" as "cte_renamed1_4"
                  FROM "cte_1" AS "T1") AS "T1",
                 (SELECT
                  "column1" as "column10_5",
                  "column2" as "column21_5"
                  FROM "TABLE2" AS "T1") AS "T2") AS "T1") AS "T1"
SELECT
"cte_renamed0_5" as "result1_7",
"cte_renamed1_5" as "result2_7",
"column10_6" as "result3_7",
"column21_6" as "result4_7"
FROM (SELECT
      *
      FROM (WITH "cte_1" (cte0_2,
                          cte1_2) AS
            (SELECT
             "column21_3" as "rebind0_4",
             "column10_3" as "rebind1_4"
             FROM (SELECT
                   "column1" as "column10_3",
                   "column2" as "column21_3"
                   FROM "table1" AS "T1") AS "T1")
            SELECT
            *
            FROM (SELECT
                  "cte0_2" as "cte_renamed0_5",
                  "cte1_2" as "cte_renamed1_5"
                  FROM "cte_1" AS "T1") AS "T1",
                 (SELECT
                  "column1" as "column10_6",
                  "column2" as "column21_6"
                  FROM "TABLE2" AS "T1") AS "T2") AS "T1") AS "T1"

and the difference is specifically

      FROM (WITH "cte_1" (cte0_2,
                          cte1_2) AS
            (SELECT
             "column1" as "column10_3",
             "column2" as "column21_3"
             FROM "table1" AS "T1")
      FROM (WITH "cte_1" (cte0_2,
                          cte1_2) AS
            (SELECT
             "column21_3" as "rebind0_4",
             "column10_3" as "rebind1_4"
             FROM (SELECT
                   "column1" as "column10_3",
                   "column2" as "column21_3"
                   FROM "table1" AS "T1") AS "T1")

Presumably ("column10_3", "column21_3") is what occurs in the tuple returned from the Select but when we swap them that change is not reflected in the generated code. Thus we run a rebind to get them in the correct order (although the rebound names are not used).

We could probably do this without a rebind by using WITH "cte_1" ("column10_3", "column21_3") AS but that doesn't matter for this PR. There's probably also a way of architecting Opaleye so that such bugs are harder to write.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even if we could fix the ordering of the ("column10_3", "column21_3") in the WITH, I think we would still want a rebinding. I didn't realise this when I first wrote this code, but WITH foo (bar, baz) AS (VALUES ('a', 'b', 'c'), ('x', 'y', 'z')) SELECT * FROM foo generates a query that returns three rows, not two. The parentheses after foo only renames the columns, it doesn't restrict their number. This is important for WITH because it isn't purely syntactical, in many cases Postgres materializes the expressions therein, so without the rebind (which does restrict the number of columns) it can end up doing unnecessary work.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

generates a query that returns three rows, not two

Do you mean "three columns, not two"? Very interesting! In which case, yes, best to be explicit about the rebind. But then maybe we can do without the ctei_j?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I meant to say "columns" there.

I think in theory we can do without the ctei_j in that both cases (non-recursive and recursive WITH) now end up with a rebinding step (the former because we explicitly make one, and the latter because it goes through union or unionAll which adds a rebinding step), but at least union doesn't give us a good way to know the names of the symbols it will choose for the rebinding "in advance", which we need for recursive queries because they need to be able to refer to their own columns, the names of which we don't know yet. You could make a version of union where the symbols to use in the rebinding are passed in explicitly and similarly for rebind, but that was outside the scope of what I was trying to do here.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair enough. It's most important to have something that works and we could revisit the exact form of the generated code later if we want.

@tomjaguarpaw
Copy link
Owner

I should probably add with to the property tests. They ought to have caught this.

Thanks @duairc!

@tomjaguarpaw tomjaguarpaw merged commit fcda290 into tomjaguarpaw:master Oct 3, 2023
6 checks passed
@tomjaguarpaw
Copy link
Owner

Released as https://hackage.haskell.org/package/opaleye-0.10.1.1, and deprecated 0.10.1.0.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants