feat: Compile base & BigQuery's parser with mypyc #7206

Merged
VaggelisD merged 8 commits into main from compile-parser-poc
Mar 6, 2026

@VaggelisD
Collaborator

This PR enables compilation of parser.py and, as a PoC, also fully compiles BigQuery's parser. The goals of this PR are to:

  1. Ensure that we can compile the base Parser
  2. Show that other parsers can remain interpreted and nested within their Dialects, maintaining backwards compatibility
  3. Show that we can incrementally move each subparser to its own file for full compilation

The key changes include:

  • Extracting BigQuery.Parser into its own file for compilation alongside parser.py (mypyc does not handle nested classes well)
  • Exposing parser dicts (_FUNCTIONS, _NO_PAREN_FUNCTIONS, etc.) as module-level variables so dialect subclasses can reference them without __dict__ introspection (mypyc stores class attributes as getset descriptors instead of plain dict entries, so walking __dict__ doesn't work)
  • Adding a _pa helper that handles mypyc-compiled class attributes by reading them from a temporary instance instead of __dict__ (same mypyc getset descriptor limitation)
  • Fixing a _NO_PAREN_FUNCTIONS bug: TokenType.CURRENT_DATETIME was mapped to exp.CurrentDate instead of exp.CurrentDatetime (pre-existing bug, exposed by the _pa fallback path triggered under mypyc)
  • Switching the sqlglotc build dependency from upstream mypy to sqlglot-mypy via build isolation, while keeping upstream mypy for type checking
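
The getset descriptor limitation mentioned above can be illustrated in plain Python. This is only a sketch, not the PR's `_pa` implementation: `CompiledLikeParser`, `PlainParser`, and `read_class_attr` are hypothetical names, and a `property` is used as a stand-in for the descriptors that mypyc generates. It shows why looking a name up in `__dict__` can return a descriptor rather than the value, and how reading from a temporary instance sidesteps that.

```python
from typing import Any


class CompiledLikeParser:
    """Hypothetical stand-in for a compiled class: the value sits behind a descriptor."""

    @property
    def LOG_DEFAULTS_TO_LN(self) -> bool:
        return True


class PlainParser:
    """Ordinary interpreted class: the value is stored directly in the class __dict__."""

    LOG_DEFAULTS_TO_LN = False


def read_class_attr(cls: type, name: str) -> Any:
    """Hypothetical helper: return the effective value of a class-level setting.

    Walks the MRO like a __dict__ lookup would, but if the entry it finds is a
    descriptor, it reads the attribute from a temporary instance instead.
    Assumes cls can be constructed without arguments.
    """
    for klass in cls.__mro__:
        if name in vars(klass):
            raw = vars(klass)[name]
            break
    else:
        raise AttributeError(name)

    if hasattr(type(raw), "__get__"):
        # The class __dict__ holds a descriptor, not the value itself.
        return getattr(cls(), name)
    return raw


compiled_value = read_class_attr(CompiledLikeParser, "LOG_DEFAULTS_TO_LN")
plain_value = read_class_attr(PlainParser, "LOG_DEFAULTS_TO_LN")
```

In this sketch `vars(CompiledLikeParser)["LOG_DEFAULTS_TO_LN"]` is a `property` object, which is truthy regardless of the underlying setting, so code that treats the raw `__dict__` entry as a bool or a dict would silently misbehave.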

sqlglot-mypy

sqlglot-mypy is a lightweight fork of mypy that carries the mypyc bug fixes needed to compile sqlglot, e.g. cross-module class attribute defaults and globals export for separate compilation. It is used only at build time (via [build-system] requires plus pip build isolation) and does not affect the main venv or type checking, which continue to use upstream mypy. It is intended to be used until all the bugs we run into with SQLGlot are fixed upstream, and as a testing ground for our own mypyc experiments.

@github-actions
Contributor

github-actions bot commented Mar 5, 2026

SQLGlot Integration Test Results

Comparing:

  • this branch (sqlglot:compile-parser-poc, sqlglot version: compile-parser-poc)
  • baseline (main, sqlglot version: 29.0.2.dev29)

⚠️ Limited to dialects: bigquery, duckdb, snowflake

By Dialect

| dialect | main | sqlglot:compile-parser-poc | transitions | links |
| --- | --- | --- | --- | --- |
| bigquery -> bigquery | 2592/2624 passed (98.8%) | 2592/2624 passed (98.8%) | No change | full result / delta |
| bigquery -> duckdb | 2077/2623 passed (79.2%) | 2078/2623 passed (79.2%) | 1 fail -> pass | full result / delta |
| duckdb -> duckdb | 4003/4003 passed (100.0%) | 4003/4003 passed (100.0%) | No change | full result / delta |
| snowflake -> duckdb | 1617/2642 passed (61.2%) | 1645/2642 passed (62.3%) | 28 fail -> pass | full result / delta |
| snowflake -> snowflake | 2863/2863 passed (100.0%) | 2863/2863 passed (100.0%) | No change | full result / delta |

Overall

main: 14755 total, 13152 passed (pass rate: 89.1%), sqlglot version: 29.0.2.dev29

sqlglot:compile-parser-poc: 14755 total, 13181 passed (pass rate: 89.3%), sqlglot version: compile-parser-poc

Transitions:
29 fail -> pass
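
As a quick cross-check, the overall figures can be recomputed from the per-dialect rows. The snippet below is just arithmetic over the numbers reported above; the variable names are illustrative.

```python
# (passed on main, passed on branch, total) per dialect pair, copied from the table.
rows = {
    "bigquery -> bigquery": (2592, 2592, 2624),
    "bigquery -> duckdb": (2077, 2078, 2623),
    "duckdb -> duckdb": (4003, 4003, 4003),
    "snowflake -> duckdb": (1617, 1645, 2642),
    "snowflake -> snowflake": (2863, 2863, 2863),
}

total = sum(t for _, _, t in rows.values())
main_passed = sum(m for m, _, _ in rows.values())
branch_passed = sum(b for _, b, _ in rows.values())


def pass_rate(passed: int, total_cases: int) -> float:
    """Pass rate as a percentage, rounded to one decimal place."""
    return round(100 * passed / total_cases, 1)


main_rate = pass_rate(main_passed, total)
branch_rate = pass_rate(branch_passed, total)
net_gain = branch_passed - main_passed
```

The net gain matches the 29 fail -> pass transitions because the report lists no pass -> fail transitions.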

Extract BigQuery parser into sqlglot/parsers/bigquery.py for separate
compilation alongside parser.py. This enables mypyc to compile both
modules together, with cross-module class attribute defaults resolved
correctly via the mypyc globals export fix.

Key changes:
- Move BigQuery.Parser from dialects/bigquery.py to parsers/bigquery.py
  to allow mypyc compilation as a separate module
- Expose module-level parser dicts (_FUNCTIONS, _NO_PAREN_FUNCTIONS, etc.)
  for cross-module reference by bigquery parser
- Fix _NO_PAREN_FUNCTIONS: CURRENT_DATETIME was incorrectly mapped to
  CurrentDate instead of CurrentDatetime
- Fix _pa helper in dialect metaclass to read mypyc compiled class
  attributes via temporary instance (getset descriptors aren't dicts)
- Fix build_logarithm to read LOG_DEFAULTS_TO_LN from instance instead
  of walking __dict__ (mypyc descriptors aren't bools)
- Add parsers/bigquery.py to sqlglotc/setup.py compilation targets
- Refactor dialect metaclass to centralize parser attribute overrides
  that were previously duplicated across dialect Parser subclasses
@VaggelisD force-pushed the compile-parser-poc branch from 90d6932 to 6788b61 on March 5, 2026 13:08
Collaborator

@georgesittas georgesittas left a comment

Getting an initial pass in before moving on to parser.py and parsers/bigquery.py

@VaggelisD force-pushed the compile-parser-poc branch from c377116 to 7b9e0cd on March 6, 2026 14:59
Collaborator

@georgesittas georgesittas left a comment

Final set of comments from me, LGTM otherwise.

@VaggelisD merged commit 5599478 into main on Mar 6, 2026
9 checks passed
@VaggelisD deleted the compile-parser-poc branch on March 6, 2026 17:21

3 participants