feat(starrocks): stop eliminating semi/anti joins, QUALIFY, and FULL OUTER JOIN [CLAUDE]#7524

Merged
georgesittas merged 3 commits into tobymao:main from dwoldemariam-klav:feat/starrocks-update-natively-supported on Apr 20, 2026

@dwoldemariam-klav
Contributor

StarRocks inherits from the MySQL dialect, but it natively supports several SQL features that MySQL does not. The MySQL generator unnecessarily rewrites these into workarounds, producing suboptimal SQL for StarRocks. This PR corrects that.

Features now preserved (no longer eliminated):

  • Semi/anti joins: StarRocks supports LEFT SEMI JOIN and LEFT ANTI JOIN natively (docs), so these are no longer rewritten into EXISTS/NOT EXISTS subqueries. ANTI and SEMI are also removed from TABLE_ALIAS_TOKENS so the parser doesn't consume them as table aliases.
  • QUALIFY clause: StarRocks supports QUALIFY for filtering on window functions (PR #13239), so it is no longer rewritten into a subquery wrapper.
  • FULL OUTER JOIN: StarRocks supports FULL OUTER JOIN natively (docs), so it is no longer decomposed into a UNION of left/right joins.
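
For context on the semi/anti join item, here is a minimal sketch of the kind of EXISTS / NOT EXISTS workaround that is no longer needed for StarRocks. It runs against SQLite purely as a stand-in engine (SQLite has no `LEFT ANTI JOIN` syntax, so the native form is only shown as a string). The table names and data are made up for illustration, and the exact SQL that sqlglot emits may differ from what is written here.

```python
import sqlite3

# Hypothetical tables, used only to illustrate the semantics.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER, customer_id INTEGER);
    CREATE TABLE refunds (order_id INTEGER);
    INSERT INTO orders VALUES (1, 10), (2, 10), (3, 20);
    INSERT INTO refunds VALUES (2);
""")

# Native form that StarRocks accepts (not executed here, SQLite cannot parse it).
native_anti_join = (
    "SELECT o.id FROM orders AS o LEFT ANTI JOIN refunds AS r ON o.id = r.order_id"
)

# Workaround shape for engines without anti joins: keep rows with no match.
anti_workaround = """
    SELECT o.id FROM orders AS o
    WHERE NOT EXISTS (SELECT 1 FROM refunds AS r WHERE o.id = r.order_id)
    ORDER BY o.id
"""

# Workaround shape for engines without semi joins: keep rows with a match.
semi_workaround = """
    SELECT o.id FROM orders AS o
    WHERE EXISTS (SELECT 1 FROM refunds AS r WHERE o.id = r.order_id)
    ORDER BY o.id
"""

anti_ids = [row[0] for row in conn.execute(anti_workaround)]
semi_ids = [row[0] for row in conn.execute(semi_workaround)]
print(anti_ids, semi_ids)
```

Both forms return the same rows; the point of the change is that StarRocks receives the join form it already understands rather than the correlated subquery.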

Transforms carried over from the MySQL generator (features StarRocks also does not support):

  • DISTINCT ON: Rewritten to a ROW_NUMBER() window function pattern, since StarRocks does not support DISTINCT ON (#33842).
  • UNNEST(GENERATE_DATE_ARRAY(...)): Rewritten to a recursive CTE, since StarRocks' array_generate and generate_series only accept numeric inputs (#49575).
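
As a rough illustration of the DISTINCT ON item, the sketch below shows the general ROW_NUMBER() pattern that stands in for `DISTINCT ON`. It is executed on SQLite only because that is available from the Python standard library (it needs a SQLite build with window function support). The `events` table, its data, and the `_row_number` / `_t` aliases are assumptions for this example, not necessarily the names sqlglot generates.

```python
import sqlite3

# Hypothetical event log; the goal is the latest row per user.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE events (user_id INTEGER, ts INTEGER, action TEXT);
    INSERT INTO events VALUES
        (1, 100, 'login'), (1, 200, 'click'),
        (2, 150, 'login'), (2, 120, 'view');
""")

# PostgreSQL-style input, shown for reference only:
#   SELECT DISTINCT ON (user_id) user_id, ts, action
#   FROM events ORDER BY user_id, ts DESC

# Equivalent shape without DISTINCT ON: number the rows inside each
# partition, then keep only the first row of every partition.
rewritten = """
    SELECT user_id, ts, action
    FROM (
        SELECT user_id, ts, action,
               ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY ts DESC) AS _row_number
        FROM events
    ) AS _t
    WHERE _row_number = 1
    ORDER BY user_id
"""

latest = conn.execute(rewritten).fetchall()
print(latest)
```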

Made with Cursor

…ERATE_DATE_ARRAY [CLAUDE]

StarRocks supports LEFT SEMI JOIN and LEFT ANTI JOIN natively, so
remove ANTI and SEMI from TABLE_ALIAS_TOKENS (MySQL added them back
since it lacks semi/anti join syntax).
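
To make the alias issue concrete, here is a toy parser (not sqlglot's parser, and much simpler than it) for the hypothetical grammar `<table> [alias] [SEMI|ANTI] JOIN <table>`. The function name and the `join_kinds_usable_as_alias` flag are invented for this sketch. It only shows why a keyword that is allowed as a table alias gets swallowed before the join kind can be recognized.

```python
JOIN_KINDS = {"SEMI", "ANTI"}


def parse_from(sql: str, join_kinds_usable_as_alias: bool) -> dict:
    """Parse `<table> [alias] [SEMI|ANTI] JOIN <table>` from whitespace tokens."""
    tokens = sql.split()
    result = {"table": tokens[0], "alias": None, "join_kind": None, "joined": None}
    pos = 1

    # Optional implicit alias: any word that is allowed to act as an alias.
    word = tokens[pos].upper()
    if word != "JOIN" and (word not in JOIN_KINDS or join_kinds_usable_as_alias):
        result["alias"] = tokens[pos]
        pos += 1

    # Optional join kind.
    if tokens[pos].upper() in JOIN_KINDS:
        result["join_kind"] = tokens[pos].upper()
        pos += 1

    if tokens[pos].upper() != "JOIN":
        raise ValueError(f"expected JOIN, got {tokens[pos]!r}")
    result["joined"] = tokens[pos + 1]
    return result


sql = "orders ANTI JOIN refunds"

# Keyword allowed as an alias: ANTI is consumed as the alias of `orders`.
alias_friendly = parse_from(sql, join_kinds_usable_as_alias=True)

# Keyword excluded from alias tokens: ANTI is recognized as the join kind.
alias_strict = parse_from(sql, join_kinds_usable_as_alias=False)

print(alias_friendly)
print(alias_strict)
```

In the first call the join silently degrades to a plain join with an odd alias; in the second the anti join survives parsing, which is the behavior the token removal is after.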

Add Select preprocessors to eliminate DISTINCT ON (not supported) and
rewrite UNNEST(GENERATE_DATE_ARRAY(...)) to recursive CTEs (StarRocks
only supports numeric inputs for array_generate/generate_series).
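
For readers unfamiliar with the recursive CTE approach, the following sketch generates a date series the same general way. It uses SQLite as a stand-in, so the date functions (`DATE(x, '+1 day')`) are SQLite's rather than StarRocks', and the CTE and column names are assumptions chosen for this example rather than what sqlglot actually generates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Anchor row is the start date; each recursive step adds one day until
# the end date has been produced.
recursive_cte = """
    WITH RECURSIVE _generated_dates(date_value) AS (
        SELECT DATE('2024-01-01')
        UNION ALL
        SELECT DATE(date_value, '+1 day')
        FROM _generated_dates
        WHERE date_value < DATE('2024-01-05')
    )
    SELECT date_value FROM _generated_dates ORDER BY date_value
"""

dates = [row[0] for row in conn.execute(recursive_cte)]
print(dates)
```

The series is inclusive of both endpoints here because the recursion stops only once the end date itself has been emitted.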

Made-with: Cursor
Collaborator

@georgesittas georgesittas left a comment


Thank you for the contribution! LGTM. A couple of small suggestions related to styling.

Comment thread tests/dialects/test_starrocks.py Outdated
Comment thread tests/dialects/test_starrocks.py Outdated
@georgesittas georgesittas merged commit d289db3 into tobymao:main Apr 20, 2026
8 checks passed
@dwoldemariam-klav
Contributor Author

Hi @georgesittas, thank you for the quick turnaround on this. Do you know when there'll be a release with this commit? The anti join change would fix an active StarRocks bug for us: StarRocks/starrocks#67860

@georgesittas
Collaborator

There is no fixed cadence for releases, but we tend to release frequently. I expect to cut one sometime this week.

@georgesittas
Collaborator

@dwoldemariam-klav just released v30.5.0.

@siddharthgupta-klaviyo

Hey @georgesittas, any chance we could also get this change as a patch release on v29? Our current codebase is not compatible with v30, and we would really like to have the changes in this PR available to us. Thank you!


3 participants