
fix: unnest_subqueries produces invalid column references when subquery contains a UNION #7667

Merged
georgesittas merged 6 commits into tobymao:main from snovik75:fix/unnest-subqueries-union-alias
May 22, 2026

@snovik75
Contributor

Fixes #7666

Problem

When unnest_subqueries encounters a SetOperation (UNION/UNION ALL) subquery, it wraps it in a derived table:

if isinstance(select, exp.SetOperation):
    select = exp.select(*select.selects).from_(select.subquery(next_alias_name()))

select.selects returns the fully-qualified column expressions from the left branch of the UNION (e.g. child_table.col_a AS col_a). These are copied verbatim into the outer wrapper SELECT, but the FROM source is the new subquery alias _u_0, making the column references invalid in that scope.
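The scoping problem can be reproduced outside sqlglot. The following is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in engine (an assumption; the PR itself does not mention SQLite) and made-up sample rows. It runs a wrapper that copies child_table.col_a verbatim, then one that references _u_0.col_a instead.

```python
import sqlite3

# In-memory database with illustrative data (the rows are made up for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE child_table (col_a INTEGER, col_b INTEGER)")
conn.execute("INSERT INTO child_table VALUES (1, 10), (2, 20)")

union_sql = "SELECT col_a FROM child_table UNION ALL SELECT col_b FROM child_table"

# Wrapper that copies the inner, table-qualified reference verbatim.
broken = f"SELECT child_table.col_a AS col_a FROM ({union_sql}) AS _u_0"
# Wrapper that points at the derived alias instead.
fixed = f"SELECT _u_0.col_a AS col_a FROM ({union_sql}) AS _u_0"

try:
    conn.execute(broken).fetchall()
    broken_error = None
except sqlite3.OperationalError as exc:
    # child_table is not visible in the outer scope; only _u_0 is.
    broken_error = str(exc)

fixed_rows = sorted(conn.execute(fixed).fetchall())

print("broken wrapper error:", broken_error)
print("fixed wrapper rows:", fixed_rows)
```

In this sketch the first wrapper is rejected because child_table is out of scope in the outer SELECT, while the second resolves col_a through the derived alias.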

Fix

Build column references that point to the derived alias instead of copying the inner expressions:

if isinstance(select, exp.SetOperation):
    inner_alias = next_alias_name()
    select = (
        exp.select(*(
            exp.alias_(exp.column(s.alias_or_name, inner_alias), s.alias_or_name)
            for s in select.selects
        ))
        .from_(select.subquery(inner_alias))
    )

Reproducer

import sqlglot
from sqlglot.optimizer.qualify import qualify
from sqlglot.optimizer.unnest_subqueries import unnest_subqueries

sql = """
SELECT t.id AS ref_id
FROM parent_table t
WHERE t.id NOT IN (
    SELECT DISTINCT col_a FROM child_table
    UNION ALL
    SELECT col_b FROM child_table
)
"""

ast = sqlglot.parse_one(sql)
result = unnest_subqueries(qualify(ast))
print(result.sql(pretty=True))

Before: child_table.col_a referenced inside a scope where only _u_0 exists.

After: _u_0.col_a — correctly qualified against the derived alias.

Sergej Novik added 5 commits May 22, 2026 08:15
…ry contains a UNION

When a NOT IN subquery is a SetOperation (UNION/UNION ALL), the wrapper
SELECT was built by copying inner qualified column references verbatim
(e.g. child_table.col_a). But the FROM source is the new subquery alias,
so those table-qualified references are invalid in that scope.

Fix by building proper column references using the derived alias:
  exp.column(s.alias_or_name, inner_alias)

Fixes tobymao#7666
@snovik75 snovik75 marked this pull request as ready for review May 22, 2026 07:37
@snovik75
Contributor Author

dear team, there was a bug in the old test. could you please have a careful look? thanks

@snovik75
Contributor Author

snovik75 commented May 22, 2026

tbh, that test in optimizer.sql seems redundant and/or misplaced (and does not have a title) but I just corrected it. your call what to do with it

Collaborator

@geooo109 geooo109 left a comment

LGTM

A note here (future work):

CREATE TABLE x AS SELECT * FROM (VALUES (1),(2),(3),(4)) AS t(a);
CREATE TABLE y AS SELECT * FROM (VALUES (1),(NULL)) AS t(a);
CREATE TABLE z AS SELECT * FROM (VALUES (2),(3)) AS t(a);
SELECT * FROM x WHERE x.a NOT IN (SELECT y.a AS a FROM y UNION ALL SELECT z.a AS a FROM z);

For the case above, in DuckDB the optimized query doesn't match the semantics of the input.
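One plausible reading of this note, not confirmed in the thread, is that the NULL in y is what matters: under SQL's three-valued logic, NOT IN against a list containing NULL never evaluates to true. The sketch below uses Python's built-in sqlite3 module rather than DuckDB, and compares the input query with a hand-written LEFT JOIN anti-join. Both choices are assumptions for illustration; the anti-join is not sqlglot's actual output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Same data as the reviewer's example, created with plain INSERTs.
conn.executescript(
    """
    CREATE TABLE x (a INTEGER);
    CREATE TABLE y (a INTEGER);
    CREATE TABLE z (a INTEGER);
    INSERT INTO x VALUES (1), (2), (3), (4);
    INSERT INTO y VALUES (1), (NULL);
    INSERT INTO z VALUES (2), (3);
    """
)

# The reviewer's input query.
not_in_sql = """
SELECT * FROM x
WHERE x.a NOT IN (SELECT y.a AS a FROM y UNION ALL SELECT z.a AS a FROM z)
"""

# Hypothetical anti-join rewrite (an assumption, not taken from the PR).
anti_join_sql = """
SELECT x.a FROM x
LEFT JOIN (SELECT y.a AS a FROM y UNION ALL SELECT z.a AS a FROM z) AS u
  ON x.a = u.a
WHERE u.a IS NULL
"""

not_in_rows = conn.execute(not_in_sql).fetchall()
anti_join_rows = conn.execute(anti_join_sql).fetchall()

print("NOT IN:", not_in_rows)
print("anti-join:", anti_join_rows)
```

Here the NOT IN query returns no rows, because 4 NOT IN (1, NULL, 2, 3) evaluates to NULL, while the anti-join returns the row with a = 4. A rewrite of this shape would need extra NULL handling to preserve the original semantics.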

Collaborator

@georgesittas georgesittas left a comment

Thanks!

Comment on lines -1454 to +1457
- "cascade"."tag_input" AS "tagname"
+ "_u_0"."tagname" AS "tagname"
  FROM "_u_0" AS "_u_0"
  GROUP BY
- "cascade"."tag_input"
+ "_u_0"."tagname"
Collaborator

This is embarrassing, seems like I didn't notice this at all when I added the test...

@georgesittas georgesittas merged commit f173fde into tobymao:main May 22, 2026
8 checks passed
