-
Notifications
You must be signed in to change notification settings - Fork 115
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fixes for invalid SQL generated in WITH queries #572
Conversation
Thanks! Can you also add tests that fail before this PR and pass after it? |
This is necessary because PostgreSQL arbitrarily prohibits (directly) nesting `WITH`. For example, `WITH xs AS (VALUES ('x')) WITH ys AS (VALUES ('y')) SELECT * FROM xs, ys;` is a syntax error, PostgreSQL instead wants you to write `WITH xs AS (VALUES ('x')), ys AS (VALUES ('y')) SELECT * FROM xs, ys;`. This commit instead will produce the following, which also works: `SELECT * FROM (WITH xs AS (VALUES ('x')) SELECT * FROM (WITH ys AS (VALUES ('y')) SELECT * FROM xs, ys) _) _;`.
This is essential, otherwise `with foo bar` and `with (fmap swap foo) bar` will generate the same SQL, which is incorrect.
24c1c2d
to
feb0205
Compare
I've added failing tests before the commits that subsequently fix them. |
testWithRebind :: Test | ||
testWithRebind = it "with (rebinding)" $ testH with (`shouldBe` expected) | ||
where | ||
with = O.with (fmap swap table1Q) $ \t -> (,) <$> t <*> table2Q | ||
expected = (,) <$> fmap swap table1data <*> table2data |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was surprised by this failure! For reference, here is the generated code before and after
SELECT
"cte_renamed0_4" as "result1_6",
"cte_renamed1_4" as "result2_6",
"column10_5" as "result3_6",
"column21_5" as "result4_6"
FROM (SELECT
*
FROM (WITH "cte_1" (cte0_2,
cte1_2) AS
(SELECT
"column1" as "column10_3",
"column2" as "column21_3"
FROM "table1" AS "T1")
SELECT
*
FROM (SELECT
"cte0_2" as "cte_renamed0_4",
"cte1_2" as "cte_renamed1_4"
FROM "cte_1" AS "T1") AS "T1",
(SELECT
"column1" as "column10_5",
"column2" as "column21_5"
FROM "TABLE2" AS "T1") AS "T2") AS "T1") AS "T1"
SELECT
"cte_renamed0_5" as "result1_7",
"cte_renamed1_5" as "result2_7",
"column10_6" as "result3_7",
"column21_6" as "result4_7"
FROM (SELECT
*
FROM (WITH "cte_1" (cte0_2,
cte1_2) AS
(SELECT
"column21_3" as "rebind0_4",
"column10_3" as "rebind1_4"
FROM (SELECT
"column1" as "column10_3",
"column2" as "column21_3"
FROM "table1" AS "T1") AS "T1")
SELECT
*
FROM (SELECT
"cte0_2" as "cte_renamed0_5",
"cte1_2" as "cte_renamed1_5"
FROM "cte_1" AS "T1") AS "T1",
(SELECT
"column1" as "column10_6",
"column2" as "column21_6"
FROM "TABLE2" AS "T1") AS "T2") AS "T1") AS "T1"
and the difference is specifically
FROM (WITH "cte_1" (cte0_2,
cte1_2) AS
(SELECT
"column1" as "column10_3",
"column2" as "column21_3"
FROM "table1" AS "T1")
FROM (WITH "cte_1" (cte0_2,
cte1_2) AS
(SELECT
"column21_3" as "rebind0_4",
"column10_3" as "rebind1_4"
FROM (SELECT
"column1" as "column10_3",
"column2" as "column21_3"
FROM "table1" AS "T1") AS "T1")
Presumably ("column10_3", "column21_3")
is what occurs in the tuple returned from the Select
but when we swap them that change is not reflected in the generated code. Thus we run a rebind to get them in the correct order (although the rebound names are not used).
We could probably do this without a rebind by using WITH "cte_1" ("column10_3", "column21_3") AS
but that doesn't matter for this PR. There's probably also a way of architecting Opaleye so that such bugs are harder to write.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Even if we could fix the ordering of the ("column10_3", "column21_3")
in the WITH
, I think we would still want a rebinding. I didn't realise this when I first wrote this code, but WITH foo (bar, baz) AS (VALUES ('a', 'b', 'c'), ('x', 'y', 'z')) SELECT * FROM foo
generates a query that returns three rows, not two. The parentheses after foo
only renames the columns, it doesn't restrict their number. This is important for WITH
because it isn't purely syntactical, in many cases Postgres materializes the expressions therein, so without the rebind (which does restrict the number of columns) it can end up doing unnecessary work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
generates a query that returns three rows, not two
Do you mean "three columns, not two"? Very interesting! In which case, yes, best to be explicit about the rebind. But then maybe we can do without the ctei_j
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I meant to say "columns" there.
I think in theory we can do without the ctei_j
in that both cases (non-recursive and recursive WITH
) now end up with a rebinding step (the former because we explicitly make one, and the latter because it goes through union
or unionAll
which adds a rebinding step), but at least union
doesn't give us a good way to know the names of the symbols it will choose for the rebinding "in advance", which we need for recursive queries because they need to be able to refer to their own columns, the names of which we don't know yet. You could make a version of union
where the symbols to use in the rebinding are passed in explicitly and similarly for rebind
, but that was outside the scope of what I was trying to do here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair enough. It's most important to have something that works and we could revisit the exact form of the generated code later if we want.
I should probably add Thanks @duairc! |
Released as https://hackage.haskell.org/package/opaleye-0.10.1.1, and deprecated 0.10.1.0. |
This PR contains two fixes pertaining to
WITH
queries. The first wraps an otherwise pointlessSELECT * FROM
aroundWITH
queries to workaround PostgreSQL's arbitrary rejection ofWITH foo AS (_) WITH bar AS (_) _
as invalid syntax. The second fixes a more serious issue wherebywithExplicit
is missing aRebind
and can generate a "valid" query that silently produces incorrect results.