fix(presto): iterate over copy in eliminate_semi_and_anti_joins #7455

Merged
georgesittas merged 1 commit into tobymao:main from Evgeniy-Sinyak:fix/eliminate-semi-anti-joins-iteration
Apr 6, 2026

Conversation

@Evgeniy-Sinyak
Contributor

Summary

When a SELECT has multiple SEMI or ANTI joins, eliminate_semi_and_anti_joins only rewrites every other one. The remaining joins leak into the generated SQL as bare ANTI JOIN / SEMI JOIN — invalid syntax for dialects like Presto/Trino.

Root cause: the transform iterates over expression.args.get("joins") while calling join.pop() inside the loop. After popping index i, the next element shifts into position i, but the iterator has already advanced to i+1 — so that element is never visited.

Fix: iterate over list(...) so removals don't shift unvisited elements. Same approach as #4364 which fixed the identical pattern in unnest_to_explode.
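
For illustration, the skip can be reproduced without sqlglot. The following is a minimal sketch that uses plain strings in place of join expressions and `list.remove` in place of `join.pop()`; the function names `drop_anti_buggy` and `drop_anti_fixed` are hypothetical and are not part of sqlglot.

```python
# Toy stand-in for the joins list: strings instead of sqlglot Join nodes.
# list.remove() plays the role of join.pop(), which detaches the node
# from its parent's list of joins.

def drop_anti_buggy(joins):
    """Remove ANTI entries while iterating over the same list (buggy)."""
    for join in joins:
        if join.startswith("anti"):
            joins.remove(join)  # shifts later items left; the iterator skips one
    return joins


def drop_anti_fixed(joins):
    """Remove ANTI entries while iterating over a shallow copy (fixed)."""
    for join in list(joins):
        if join.startswith("anti"):
            joins.remove(join)  # the copy being iterated is unaffected
    return joins


buggy = drop_anti_buggy(["inner t2", "anti t3", "anti t4"])
fixed = drop_anti_fixed(["inner t2", "anti t3", "anti t4"])

print(buggy)  # ['inner t2', 'anti t4']
print(fixed)  # ['inner t2']
```

With two adjacent ANTI entries, the buggy loop leaves the second one behind, which mirrors how t4 leaks in the reproduction in this description.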

Minimal reproduction

from sqlglot import exp

select = (
    exp.Select()
    .select(exp.column("id", table="t1"))
    .from_("t1")
    .join("t2", on=exp.EQ(this=exp.column("id", table="t1"),
                           expression=exp.column("id", table="t2")))
    .join("t3", join_type="anti",
          on=exp.EQ(this=exp.column("id", table="t1"),
                    expression=exp.column("id", table="t3")))
    .join("t4", join_type="anti",
          on=exp.EQ(this=exp.column("id", table="t1"),
                    expression=exp.column("id", table="t4")))
)

print(select.sql(dialect="presto", pretty=True))

Before (bug): t4 leaks as bare ANTI JOIN

SELECT
  t1.id
FROM t1
JOIN t2
  ON t1.id = t2.id
ANTI JOIN t4
  ON t1.id = t4.id
WHERE
  NOT EXISTS(SELECT 1 FROM t3 WHERE t1.id = t3.id)

After (fix): both rewritten to NOT EXISTS

SELECT
  t1.id
FROM t1
JOIN t2
  ON t1.id = t2.id
WHERE
  NOT EXISTS(SELECT 1 FROM t3 WHERE t1.id = t3.id)
  AND NOT EXISTS(SELECT 1 FROM t4 WHERE t1.id = t4.id)

Changes

  • sqlglot/transforms.py: iterate over list(expression.args.get("joins") or []) instead of the original list
  • tests/test_transforms.py: add tests for single, double, and triple ANTI/SEMI joins

Made with Cursor

When a SELECT has multiple SEMI or ANTI joins,
eliminate_semi_and_anti_joins iterates over the joins list while
calling join.pop(), which mutates the list mid-iteration. This
causes every other SEMI/ANTI join to be skipped, leaving bare
ANTI JOIN / SEMI JOIN syntax in the generated SQL — invalid for
dialects like Presto/Trino that rely on this transform.

Fix: iterate over list(...) so removals don't shift unvisited
elements. Same approach as PR tobymao#4364 which fixed the identical
pattern in unnest_to_explode.

Made-with: Cursor
@Evgeniy-Sinyak
Contributor Author

@georgesittas This is the same iterate-and-mutate pattern you fixed in #4364 for unnest_to_explode, but in eliminate_semi_and_anti_joins. One-line fix + tests.

Collaborator

@georgesittas georgesittas left a comment

thanks!

@georgesittas georgesittas merged commit 2b3dba1 into tobymao:main Apr 6, 2026
8 checks passed