feat(mypyc): Enable incremental compilation, deprecate Python 3.9 #7574

Merged
VaggelisD merged 1 commit into main from feat/mypyc-incremental-compilation on Apr 29, 2026

@VaggelisD
Collaborator

@VaggelisD VaggelisD commented Apr 28, 2026

What

Switch sqlglotc's mypyc build to separate=True, plus the small sqlglot-side tweaks needed to make it work end-to-end. Also deprecates Python 3.9 for sqlglot[c] (pure-Python sqlglot still supports 3.9).

The separate=True flag gives each compiled module its own shared lib + shim, so mypyc only has to regenerate / recompile the modules whose source actually changed.

Changes

sqlglot[c] build:

  • sqlglotc/setup.py: pass separate=True to mypycify().
  • sqlglotc/pyproject.toml: bump requires-python to >= 3.10, pin build dep sqlglot-mypy >= 1.20.0.post4. The post4 release carries a cached-SCC codegen fix that makes pip's two-pass wheel build actually package the shared libs (without it, the second mypycify pass returned Extension(sources=[]) and produced an installable but broken wheel).
  • setup.py: gate the [c] / [rs] extras on python_version >= '3.10'. On 3.9, the extra in pip install sqlglot[c] resolves to nothing, so you get pure-Python sqlglot. Dev extras fall back to upstream mypy for type checking on 3.9.
  • Makefile: install-devc / install-devc-release skip the sqlglotc build on 3.9 with a clear message.
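
The build-side changes above can be sketched roughly as follows. This is an illustrative sketch, not the PR's actual files: the helper names (`compiled_extras`, `supports_compiled_build`, `build_ext_modules`) and the requirement name `sqlglotc` are assumptions. Only the `separate=True` argument to `mypycify()` and the `python_version >= '3.10'` marker come from the description.

```python
import sys

# Environment marker from the PR description. pip evaluates it at install
# time, so on Python 3.9 the extra contributes no requirement at all.
COMPILED_MARKER = "python_version >= '3.10'"
MIN_COMPILED_VERSION = (3, 10)


def compiled_extras():
    """Return an extras_require-style mapping with the [c] extra gated on 3.10+.

    The requirement name "sqlglotc" is an assumption made for illustration.
    """
    return {"c": [f"sqlglotc; {COMPILED_MARKER}"]}


def supports_compiled_build(version_info=sys.version_info):
    """Mirror the Makefile guard: only build the compiled package on 3.10+."""
    return tuple(version_info[:2]) >= MIN_COMPILED_VERSION


def build_ext_modules(paths):
    """Compile each module into its own shared lib + shim via separate=True.

    mypyc is imported lazily so this sketch can be imported without it installed.
    """
    from mypyc.build import mypycify

    return mypycify(list(paths), separate=True)


if __name__ == "__main__":
    if supports_compiled_build():
        print("compiled build supported; extras:", compiled_extras())
    else:
        print("sqlglotc requires Python 3.10+; skipping compiled build")
```

In a real setup.py the result of `build_ext_modules(...)` would be passed as `ext_modules` to `setup()`, and the extras mapping as `extras_require`.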

sqlglot package:

  • sqlglot/__init__.py: drop the legacy *__mypyc*.so bootstrap preloader. It was written for the old monolithic build where a single hash-named .so sat at the package root; under separate=True the per-module shared libs (e.g. errors__mypyc.so) live next to their .py siblings and resolve through Python's normal import machinery via the shim.
  • sqlglot/optimizer/__init__.py: swap the eager top-level re-exports for PEP 562 __getattr__. Under separate=True's cross-group init ordering, the previous eager from sqlglot.optimizer.optimizer import ... could fire while sqlglot/__init__.py was still mid-bootstrap and trip a circular ImportError on from sqlglot import Schema, exp. Lazy resolution defers the cycle past sqlglot's init. Concurrent first-access lookups are serialised with an RLock, mirroring sqlglot/dialects/__init__.py.
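
For reference, the lazy re-export pattern described for sqlglot/optimizer/__init__.py can be sketched as below. This is a minimal, self-contained illustration and not the PR's code: the export table, the `make_lazy_package` helper, and the stdlib stand-in targets (`math.sqrt`, `json.dumps`) are assumptions. In a real package the `__getattr__` function would be defined at module level in `__init__.py` rather than attached to a synthetic module.

```python
import importlib
import threading
import types

# Hypothetical export table: public name -> (module to import, attribute).
# Stdlib targets stand in for the real submodules here.
_LAZY_EXPORTS = {
    "sqrt": ("math", "sqrt"),
    "dumps": ("json", "dumps"),
}

# RLock so a lookup that re-enters on the same thread (e.g. via a nested
# import) does not deadlock, while other threads wait their turn.
_lock = threading.RLock()


def make_lazy_package(name, exports):
    """Build a module whose exports are resolved on first attribute access."""
    pkg = types.ModuleType(name)

    def __getattr__(attr):
        try:
            module_name, target = exports[attr]
        except KeyError:
            raise AttributeError(
                f"module {name!r} has no attribute {attr!r}"
            ) from None
        with _lock:
            # Another thread may have resolved the name while we waited.
            if attr in pkg.__dict__:
                return pkg.__dict__[attr]
            value = getattr(importlib.import_module(module_name), target)
            # Cache on the module so later lookups bypass __getattr__.
            setattr(pkg, attr, value)
            return value

    # PEP 562: a __getattr__ in the module namespace handles missing names.
    pkg.__getattr__ = __getattr__
    return pkg


lazy_pkg = make_lazy_package("lazy_pkg", _LAZY_EXPORTS)
```

Nothing is imported until a name such as `lazy_pkg.sqrt` is first accessed, which is what lets the import cycle resolve after the parent package has finished initialising.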

Replaces #7558 (squashed history).

Switch sqlglotc's mypyc build to separate=True so each compiled module
gets its own shared lib + shim. Clean builds are roughly the same
wall-clock; incremental rebuilds after a one-line edit drop from a full
~110s monolithic rebuild to a few seconds once the cache is warm.

Python 3.9 is dropped from `sqlglot[c]` because sqlglot-mypy 1.20+ (the
build dep needed for the separate=True codegen fixes) only ships wheels
for 3.10+. Pure-Python sqlglot still supports 3.9; on 3.9
`pip install sqlglot[c]` is a no-op via an environment marker, and
`make install-devc` short-circuits with a clear "requires Python 3.10+"
message.

  - sqlglotc/pyproject.toml: requires-python >= 3.10, build dep pinned
    to sqlglot-mypy >= 1.20.0.post4 (which carries the cached-SCC fix
    that makes the wheel-build path actually package shared libs).
  - setup.py: [c] / [rs] extras gated on python_version >= '3.10'; on
    3.9 fall back to upstream `mypy` for type checking instead of
    sqlglot-mypy.
  - Makefile: install-devc / install-devc-release skip the build on 3.9.

sqlglot-side tweaks for the new build:

  - sqlglot/__init__.py: drop the legacy *__mypyc*.so bootstrap
    preloader. It was written for the old monolithic build where a
    single hash-named .so sat at the package root; under separate=True
    per-module shared libs sit next to their .py siblings and resolve
    through the shim.
  - sqlglot/optimizer/__init__.py: swap eager top-level re-exports for
    PEP 562 `__getattr__`. Under separate=True's cross-group init
    ordering, the previous eager `from sqlglot.optimizer.optimizer
    import ...` could fire while sqlglot/__init__.py was still
    mid-bootstrap and trip a circular `ImportError` on
    `from sqlglot import Schema, exp`. Lazy resolution defers the
    cycle past sqlglot's init. Concurrent first-access lookups are
    serialised with an RLock, mirroring sqlglot/dialects/__init__.py.
@VaggelisD VaggelisD force-pushed the feat/mypyc-incremental-compilation branch from a348e6a to 61d1586 Compare April 28, 2026 13:22
@github-actions
Contributor

SQLGlot Integration Test Results

Comparing:

  • this branch (sqlglot:feat/mypyc-incremental-compilation, sqlglot version: feat/mypyc-incremental-compilation)
  • baseline (main, sqlglot version: 0.0.1.dev1)

By Dialect

| dialect | main | sqlglot:feat/mypyc-incremental-compilation | transitions |
|---|---|---|---|
| bigquery -> bigquery | 24645/24650 passed (100.0%) | 23495/23495 passed (100.0%) | No change |
| bigquery -> duckdb | 867/1154 passed (75.1%) | 0/0 passed (0.0%) | Results not found |
| duckdb -> duckdb | 5823/5823 passed (100.0%) | 0/0 passed (0.0%) | Results not found |
| snowflake -> duckdb | 1063/1961 passed (54.2%) | 0/0 passed (0.0%) | Results not found |
| snowflake -> snowflake | 65133/65133 passed (100.0%) | 63027/63027 passed (100.0%) | No change |
| databricks -> databricks | 1370/1370 passed (100.0%) | 1370/1370 passed (100.0%) | No change |
| postgres -> postgres | 6042/6042 passed (100.0%) | 6042/6042 passed (100.0%) | No change |
| redshift -> redshift | 7101/7101 passed (100.0%) | 7101/7101 passed (100.0%) | No change |

Overall

main: 113234 total, 112044 passed (pass rate: 98.9%), sqlglot version: 0.0.1.dev1

sqlglot:feat/mypyc-incremental-compilation: 101035 total, 101035 passed (pass rate: 100.0%), sqlglot version: feat/mypyc-incremental-compilation

Transitions:
No change

Dialect pair changes: 0 previous results not found, 3 current results not found

✅ 37 test(s) passed

@georgesittas
Collaborator

@VaggelisD did you check for memory leaks here?

@VaggelisD
Collaborator Author

@georgesittas The PR was fine when I first tested it, but curiosity got the better of me and I checked again: it looks like there's a tiny leak, on the order of 5 Token objects across all of make unitc (~2300 tests).

Investigating why that is

@VaggelisD
Collaborator Author

VaggelisD commented Apr 28, 2026

Answered internally as well but leaving it here for reference: I think this was just a fluke caused by make leakcheck being non-deterministic (depends on GC cycles, statistics etc).

When we had the first Identifier leak there were 200k leaking refs per sample run (= 1 full make testc suite) whereas now there were only a handful of objects flagged.

Considering this a non-issue for now; we might need to decrease leakcheck's sensitivity.
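
As a rough illustration of what a less sensitive check could look like, here is a sketch that counts live instances of a type before and after a workload and only flags growth above a threshold. It is not how make leakcheck is implemented (that code is not shown in this thread); the function names and the default threshold of 100 are assumptions picked for the example.

```python
import gc


def count_instances(cls):
    """Count live, GC-tracked objects whose exact type is cls."""
    gc.collect()  # drop unreachable cycles first so they are not counted
    return sum(1 for obj in gc.get_objects() if type(obj) is cls)


def leaked_instances(cls, workload, runs=3):
    """Return the growth in live cls instances after running workload."""
    before = count_instances(cls)
    for _ in range(runs):
        workload()
    return count_instances(cls) - before


def is_leak(growth, threshold=100):
    """Flag only growth above the threshold, so a handful of stray objects
    (e.g. from GC timing) is not reported. The default is a placeholder."""
    return growth > threshold
```

With a threshold like this, a handful of flagged objects would be ignored while something like the earlier 200k-ref Identifier leak would still be caught.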

@VaggelisD VaggelisD merged commit 1a10806 into main Apr 29, 2026
8 checks passed
@VaggelisD VaggelisD deleted the feat/mypyc-incremental-compilation branch April 29, 2026 06:41