perf(tokens): cache TokenizerCore per thread #7547

Merged
tobymao merged 1 commit into main from perf/cache-tokenizer-core
Apr 23, 2026
Conversation

@tobymao
Owner

@tobymao tobymao commented Apr 23, 2026

Summary

  • Tokenizer.__init__ was rebuilding TokenizerCore on every parse_one call (~6µs). Its args are pure functions of the Tokenizer subclass, so it's cached per (thread, class) via a new ThreadLocalCache helper in sqlglot/helper.py.
  • Drops two list[t.Union[str, tuple[str, str]]](...) subscripted-generic constructions at init that were pure type-annotation theatre (2.7µs / 41% of init cost).
  • bit_strings / hex_strings on TokenizerCore are only truthy-checked, so they're narrowed to has_bit_strings / has_hex_strings bools.
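The per-(thread, class) caching pattern can be sketched as below. The exact `ThreadLocalCache` interface in sqlglot/helper.py isn't shown in this PR, so the helper's shape, the `get_core` function, and the `object()` stand-in for `TokenizerCore` are all illustrative assumptions:

```python
import threading
from typing import Any, Dict, Type


class ThreadLocalCache(threading.local):
    """Per-thread cache: threading.local swaps the instance __dict__ per
    thread, so each thread runs __init__ once and gets its own dict."""

    def __init__(self) -> None:
        self.cache: Dict[Type[Any], Any] = {}


_CORES = ThreadLocalCache()


def get_core(cls: Type[Any]) -> Any:
    # One entry per (thread, Tokenizer subclass): the thread dimension
    # comes from threading.local, the class dimension from the dict key.
    core = _CORES.cache.get(cls)
    if core is None:
        # Stand-in for constructing TokenizerCore(...) from `cls` attributes,
        # which the PR says are pure functions of the Tokenizer subclass.
        core = _CORES.cache[cls] = object()
    return core
```

Within one thread, `get_core(SomeTokenizer)` returns the same object on every call; a different thread builds (and reuses) its own copy.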

Benchmark — parse_one("1")

Pure Python, best of 5 × 30k iters, same session:

|        | time          | delta |
| ------ | ------------- | ----- |
| before | 31.98 µs/call |       |
| after  | 24.58 µs/call | −23%  |

The speedup is a fixed-overhead reduction, so the win is largest on short SQL. Large queries (tpch, many_joins, etc.) see the same absolute ~7µs drop, but it's drowned out by parse time.

Thread safety

TokenizerCore.tokenize() mutates internal state (sql, _current, _line, …) via reset() before each call. Sharing one core across threads would race on those fields. ThreadLocalCache subclasses threading.local, so each thread gets its own cache dict and its own per-class core — same guarantees as today's "construct-fresh-each-call" behavior.

Verified with a stress test modeled on #520: 32 threads × 10 iterations parsing a mix of SQL across 6 dialects concurrently; all results matched the serial baseline exactly.
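A harness in the shape of that stress test might look like the sketch below. The `parse` function here is a deterministic stand-in for `sqlglot.parse_one(sql).sql(dialect=...)` (the real test uses 6 dialects); the point is the harness structure, comparing every thread's output against a serial baseline:

```python
import threading
from typing import List


def parse(sql: str) -> str:
    # Deterministic stand-in for sqlglot.parse_one(sql).sql(); the harness
    # only needs "same input -> same output" to detect cross-thread races.
    return " ".join(sql.split()).upper()


def stress(queries: List[str], n_threads: int = 32, iters: int = 10) -> bool:
    # Serial baseline computed once, before any concurrency.
    baseline = [parse(q) for q in queries]
    failures: List[List[str]] = []

    def worker() -> None:
        for _ in range(iters):
            got = [parse(q) for q in queries]
            if got != baseline:
                failures.append(got)  # list.append is thread-safe in CPython

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return not failures
```

With a shared mutable core and no thread-local isolation, interleaved `reset()`/`tokenize()` calls would make `got` diverge from `baseline` and `stress` would return False.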

Test plan

  • make style passes
  • make unit passes (pure Python; SKIP_INTEGRATION=1 python -m unittest — 1231 tests)
  • Threaded stress test (32 workers, mixed dialects) — no divergence from serial
  • Bit/hex literal paths exercised manually (b'101', x'ff', 0b101, 0xff) post-rename

@tobymao tobymao changed the title perf(tokens): cache TokenizerCore per thread [CLAUDE] perf(tokens): cache TokenizerCore per thread Apr 23, 2026
@tobymao tobymao force-pushed the perf/cache-tokenizer-core branch from a22a0c5 to 51b2bf8 on April 23, 2026 06:11
@georgesittas
Collaborator

Seems like this caused a segfault.

Rebuilding TokenizerCore on every `parse_one` call was ~6µs of wasted work;
the core's construction is purely a function of the Tokenizer subclass, so
caching it per (thread, class) is safe. Also drops two `list[T](...)`
subscripted-generic constructions that were pure type-annotation theatre,
and narrows `bit_strings` / `hex_strings` to `has_bit_strings` /
`has_hex_strings` bools since TokenizerCore only truthy-checks them.

ThreadLocalCache lives in tokens.py (not sqlglotc-compiled). Subclassing
threading.local inside a mypyc-compiled module causes a segfault because
mypyc's fixed-slot attribute access bypasses threading.local's per-thread
__dict__ swap, racing all threads on the same C slot.
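The per-thread `__dict__` swap that mypyc's fixed-slot access bypasses can be observed directly in pure Python. This snippet is illustrative only, not sqlglot code:

```python
import threading


class Local(threading.local):
    def __init__(self) -> None:
        # Runs again on first access from each new thread, because
        # threading.local gives every thread a fresh instance __dict__.
        self.value = None


loc = Local()
loc.value = "main"
seen = {}


def worker() -> None:
    seen["before"] = loc.value  # this thread's fresh copy, not "main"
    loc.value = "worker"
    seen["after"] = loc.value


t = threading.Thread(target=worker)
t.start()
t.join()
# Back in the main thread, loc.value is still "main".
```

Pure-Python attribute access routes through that swapped `__dict__`, which is what makes the isolation work; compiled fixed-slot access reads one shared C field instead, hence the segfault risk the commit message describes.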
@tobymao tobymao force-pushed the perf/cache-tokenizer-core branch from c453c28 to 1bc41d2 on April 23, 2026 14:49
@tobymao
Owner Author

tobymao commented Apr 23, 2026

@georgesittas fixed

@github-actions
Contributor

SQLGlot Integration Test Results

Comparing:

  • this branch (sqlglot:perf/cache-tokenizer-core, sqlglot version: perf/cache-tokenizer-core)
  • baseline (main, sqlglot version: 0.0.1.dev1)

By Dialect

| dialect | main | sqlglot:perf/cache-tokenizer-core | transitions | links |
| --- | --- | --- | --- | --- |
| bigquery -> bigquery | 24645/24650 passed (100.0%) | 23495/23495 passed (100.0%) | No change | full result / delta |
| bigquery -> duckdb | 867/1154 passed (75.1%) | 0/0 passed (0.0%) | Results not found | full result / delta |
| duckdb -> duckdb | 5823/5823 passed (100.0%) | 5823/5823 passed (100.0%) | No change | full result / delta |
| snowflake -> duckdb | 1063/1961 passed (54.2%) | 0/0 passed (0.0%) | Results not found | full result / delta |
| snowflake -> snowflake | 65133/65133 passed (100.0%) | 63027/63027 passed (100.0%) | No change | full result / delta |
| databricks -> databricks | 1370/1370 passed (100.0%) | 1370/1370 passed (100.0%) | No change | full result / delta |
| postgres -> postgres | 6042/6042 passed (100.0%) | 6042/6042 passed (100.0%) | No change | full result / delta |
| redshift -> redshift | 7101/7101 passed (100.0%) | 7101/7101 passed (100.0%) | No change | full result / delta |

Overall

main: 113234 total, 112044 passed (pass rate: 98.9%), sqlglot version: 0.0.1.dev1

sqlglot:perf/cache-tokenizer-core: 106858 total, 106858 passed (pass rate: 100.0%), sqlglot version: perf/cache-tokenizer-core

Transitions:
No change

Dialect pair changes: 0 previous results not found, 2 current results not found

✅ 34 test(s) passed

@tobymao
Owner Author

tobymao commented Apr 23, 2026

/benchmark

@tobymao
Owner Author

tobymao commented Apr 23, 2026

/bench

@github-actions
Contributor

Benchmark Results

Legend: 🟢🟢 = 5%+ faster | 🟢 = 3-5% faster | 🟩 = 1-3% faster | ⚪ = unchanged | 🟧 = 1-3% slower | 🔴 = 3-5% slower | 🔴🔴 = 5%+ slower

sqlglot

| Query | main | PR | diff |
| --- | --- | --- | --- |
| tpch | 2.7ms | 2.7ms | 0.1% slower |
| short | 199us | 195us | 2.0% faster 🟩 |
| deep_arithmetic | 8.4ms | 8.5ms | 0.6% slower |
| large_in | 449.2ms | 446.0ms | 0.7% faster |
| values | 513.6ms | 511.4ms | 0.4% faster |
| many_joins | 11.4ms | 11.3ms | 0.6% faster |
| many_unions | 40.8ms | 41.3ms | 1.3% slower 🟧 |
| nested_subqueries | 1.1ms | 1.1ms | 2.1% slower 🟧 |
| many_columns | 13.0ms | 13.1ms | 0.0% |
| large_case | 38.5ms | 37.6ms | 2.5% faster 🟩 |
| complex_where | 28.2ms | 28.4ms | 0.9% slower |
| many_ctes | 17.0ms | 16.4ms | 3.1% faster 🟢 |
| many_windows | 20.9ms | 21.0ms | 0.7% slower |
| nested_functions | 691us | 701us | 1.4% slower 🟧 |
| large_strings | 5.6ms | 5.6ms | 0.5% slower |
| many_numbers | 109.4ms | 102.3ms | 6.5% faster 🟢🟢 |

sqlglot[c]

| Query | main | PR | diff |
| --- | --- | --- | --- |
| tpch | 657us | 665us | 1.1% slower 🟧 |
| short | 54us | 47us | 13.3% faster 🟢🟢 |
| deep_arithmetic | 2.2ms | 2.7ms | 26.2% slower 🔴🔴 |
| large_in | 114.8ms | 121.9ms | 6.2% slower 🔴🔴 |
| values | 126.8ms | 134.3ms | 5.9% slower 🔴🔴 |
| many_joins | 2.5ms | 2.5ms | 0.2% slower |
| many_unions | 8.5ms | 8.3ms | 2.2% faster 🟩 |
| nested_subqueries | 239us | 231us | 3.3% faster 🟢 |
| many_columns | 3.2ms | 3.0ms | 3.9% faster 🟢 |
| large_case | 10.1ms | 8.9ms | 11.6% faster 🟢🟢 |
| complex_where | 7.4ms | 6.7ms | 9.9% faster 🟢🟢 |
| many_ctes | 3.5ms | 3.5ms | 0.4% faster |
| many_windows | 4.9ms | 5.2ms | 5.0% slower 🔴🔴 |
| nested_functions | 161us | 148us | 7.8% faster 🟢🟢 |
| large_strings | 1.4ms | 1.3ms | 5.7% faster 🟢🟢 |
| many_numbers | 25.8ms | 28.1ms | 8.9% slower 🔴🔴 |

Comment /benchmark to re-run.

@tobymao tobymao merged commit 63f8dc6 into main Apr 23, 2026
8 checks passed
@tobymao tobymao deleted the perf/cache-tokenizer-core branch April 23, 2026 16:15