V8 non-backtracking RegExp engine
A V8-internal experimental regex engine has shipped in every Node.js release since Node 16, alongside the default Irregexp engine. It takes the same algorithmic approach as RE2 (automaton walking, no backtracking, time linear in the input), and if it were adopted natively across runtimes it would in principle make node-re2 obsolete. As of May 2026 it has not moved toward that. Two architectural observations, a missing JIT and an apparent intent to route between engines transparently rather than expose a flag, explain why the gap is structural rather than merely current.
It shipped with V8 8.8 (January 2021) and is described in the V8 blog post "An additional non-backtracking RegExp engine". It is a breadth-first NFA walker that runs alongside Irregexp, with the same theoretical underpinning as RE2 (Thompson NFA → automaton, no backtracking, linear in the size of the input). On a pathological pattern like /(a*)*b/ against aaaa… it terminates in linear time, while Irregexp blows up exponentially.
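A toy sketch of what "breadth-first NFA walker" means, written in JavaScript and not taken from V8 (whose engine compiles patterns to its own bytecode): a hand-built Thompson-style NFA for /(a*)*b/ is simulated by keeping the set of all live states and advancing them together, one input character at a time. The state layout and the names nfa, epsClosure and nfaSearch are hypothetical.

```javascript
// Toy Thompson-style NFA for /(a*)*b/ (hypothetical layout, not V8's bytecode).
// Each state has epsilon edges and at most one character-consuming edge.
const nfa = [
  { eps: [1, 3], on: null },           // 0: outer star: enter the group or go on to 'b'
  { eps: [2, 0], on: null },           // 1: inner star: take an 'a' or leave the group
  { eps: [], on: ["a", 1] },           // 2: consume 'a', loop back to the inner star
  { eps: [], on: ["b", 4] },           // 3: consume 'b'
  { eps: [], on: null, accept: true }, // 4: match
];

// Follow epsilon edges from the seed states; the `seen` set makes the
// 0 -> 1 -> 0 epsilon cycle (the source of exponential backtracking) harmless.
function epsClosure(states, seeds) {
  const seen = new Set(seeds);
  const stack = [...seeds];
  while (stack.length > 0) {
    const s = stack.pop();
    for (const t of states[s].eps) {
      if (!seen.has(t)) {
        seen.add(t);
        stack.push(t);
      }
    }
  }
  return seen;
}

// Unanchored search: advance every live state in lock-step, one character at
// a time. Work per character is bounded by the size of the NFA, so the total
// is linear in the input length, with no backtracking.
function nfaSearch(states, start, input) {
  let current = epsClosure(states, [start]);
  for (let i = 0; ; i++) {
    for (const s of current) {
      if (states[s].accept) return true;
    }
    if (i === input.length) return false;
    const next = [start]; // a new match attempt may begin after position i
    for (const s of current) {
      const edge = states[s].on;
      if (edge !== null && edge[0] === input[i]) next.push(edge[1]);
    }
    current = epsClosure(states, next);
  }
}

console.log(nfaSearch(nfa, 0, "aaab"));             // true
console.log(nfaSearch(nfa, 0, "a".repeat(100000))); // false, in linear time
```

A backtracking engine facing the same all-a input can try exponentially many ways of splitting the a's between the two stars before giving up; the state-set walk never revisits an input position.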
V8's experimental engine source lives in src/regexp/experimental/ and contains exactly three concerns:
- experimental-compiler.{cc,h}: pattern → bytecode
- experimental-bytecode.{cc,h}: bytecode definitions
- experimental-interpreter.{cc,h}: bytecode → execution
No codegen, no assembler, no JIT. Irregexp, by contrast, has both an interpreter and a JIT, so V8 already owns all the regex-JIT infrastructure (including the tier-up architecture) and simply hasn't pointed it at the experimental engine. Adding a JIT is doable, but the work has gone unowned for five years.
This is the root cause of the benchmark numbers below. The experimental engine is a single-tier NFA bytecode interpreter; RE2 is a tiered engine (BitState → NFA → OnePass → lazy-DFA) whose DFA path is a tight loop doing one state-table lookup per byte. Both are "interpreted", but RE2 is an interpreter with algorithm-tier selection and a tuned inner loop, while V8's experimental engine is a single-strategy bytecode interpreter. That difference is the 14× gap on the pathological benchmark.
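To illustrate the "state-table lookup per byte" loop, here is a hand-built toy DFA for an unanchored search of /ab/ over UTF-8 bytes. Once the automaton is deterministic there is exactly one current state, so the inner loop is a single array index per byte, rather than a set of live states to maintain as in an NFA walk. This is a simplified sketch under that assumption: RE2's lazy DFA builds and caches states on demand instead of precomputing a table, and table, dfaSearch and the state names are hypothetical.

```javascript
// Toy byte-level DFA for an unanchored search of /ab/ (hypothetical, hand-built;
// RE2's lazy DFA constructs and caches its states on demand instead).
const NUM_STATES = 3;
const START = 0; // nothing useful seen yet
const SAW_A = 1; // last byte was 'a'
const MATCH = 2; // found "ab"

// One row of 256 entries per state; entries left at 0 send us back to START.
const table = new Uint8Array(NUM_STATES * 256);
table[START * 256 + 0x61] = SAW_A; // 'a'
table[SAW_A * 256 + 0x61] = SAW_A; // another 'a': still just saw an 'a'
table[SAW_A * 256 + 0x62] = MATCH; // 'b' right after 'a'
table.fill(MATCH, MATCH * 256, (MATCH + 1) * 256); // MATCH is absorbing

function dfaSearch(bytes) {
  let state = START;
  for (let i = 0; i < bytes.length; i++) {
    state = table[state * 256 + bytes[i]]; // the whole inner loop: one lookup per byte
    if (state === MATCH) return true;
  }
  return false;
}

const enc = new TextEncoder();
console.log(dfaSearch(enc.encode("xxaab"))); // true
console.log(dfaSearch(enc.encode("acb")));   // false
```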
There is no TC39 proposal to standardise the l flag, and we suspect there won't be. The V8 flag surface points to transparent engine routing as the intended end state, where engine selection is invisible to user code:
- --default-to-experimental-regexp-engine: static routing; use the experimental engine wherever the pattern allows.
- --enable-experimental-regexp-engine-on-excessive-backtracks, paired with --regexp-backtracks-before-fallback=50000: dynamic routing; start on Irregexp and switch engines after N backtracks.
Both are user-invisible. Standardising /foo/l would commit V8 (and ECMAScript) to engine choice as a permanent language feature, which is a worse design if you believe you can route correctly under the hood. The l flag reads like internal scaffolding for testing the engine on specific patterns, not the intended user-facing surface, which is consistent with the absence of any proposal track.
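Until routing happens inside V8, the nearest user-land approximation of static routing is to try the l flag and fall back. The sketch assumes the RegExp constructor throws a SyntaxError both when the l flag is unknown (node started without --enable-experimental-regexp-engine) and when the experimental engine cannot handle the pattern; compileLinearIfPossible is a hypothetical helper, not a Node or V8 API.

```javascript
// Hypothetical user-land helper, not a Node or V8 API. It approximates
// "use the experimental engine wherever the pattern allows" by trying the
// 'l' flag first and falling back to the default engine (Irregexp).
function compileLinearIfPossible(source, flags = "") {
  try {
    // Accepted only when node was started with
    // --enable-experimental-regexp-engine AND (by assumption) the pattern is
    // inside the subset the experimental engine supports.
    return { re: new RegExp(source, flags + "l"), linear: true };
  } catch (err) {
    if (!(err instanceof SyntaxError)) throw err;
    // Either the 'l' flag is unknown or the pattern was rejected for /l.
    // A genuinely invalid pattern throws again here, which is what we want.
    return { re: new RegExp(source, flags), linear: false };
  }
}

const { re, linear } = compileLinearIfPossible("(a*)*b");
console.log(linear, re.test("aaab")); // linear is true only under the V8 flag
```

The linear field lets a caller see which engine a pattern landed on. Falling back silently is a policy choice; for untrusted patterns, refusing to compile may be the safer default, since the fallback is exactly the backtracking engine the caller was trying to avoid.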
The closest TC39 work in this space targets different mechanisms entirely:
- tc39/proposal-regexp-atomic-operators: atomic groups (?>...), Stage 1. A backtracking-mitigation lever for backtracking engines such as Irregexp, not a separate engine.
- rbuckton/proposal-regexp-features: an informal umbrella for future RegExp work; no linear-engine proposal sits under it.
Neither addresses linear-time execution as a contract.
Five years on, the engine is still experimental in the sense V8 8.8 originally meant: opt-in, off by default, and undocumented in Node. The full flag surface, from node --v8-options:
--enable-experimental-regexp-engine # recognise the non-standard /l flag
--default-to-experimental-regexp-engine # run all regexps on it where possible
--experimental-regexp-engine-capture-group-opt # added since 2021, optional capture-group support
--enable-experimental-regexp-engine-on-excessive-backtracks # start on Irregexp, fall back after too many backtracks
--regexp-backtracks-before-fallback=50000 # threshold for the fallback mode
The boolean flags all default to off. The two flags added since the 2021 blog post (default-to-..., capture-group-opt) show that V8 is still maintaining the engine, but there is no public roadmap toward making it the default, no Node-side announcement, and no follow-up blog post.
Two practical wrinkles:
- Not allowed in NODE_OPTIONS: the flag must be passed directly to node, because Node treats it as an unsupported V8 flag (see nodejs/node #55400).
- REPL-incompatible literal form: acorn, which Node uses to preprocess top-level await, does not recognise the /foo/l syntax. The constructor form new RegExp(pat, 'l') works fine.
| Feature | Irregexp (V8 default) | V8 experimental /l | node-re2 (RE2) |
|---|---|---|---|
| Linear-time guarantee | no | yes | yes |
| Backreferences | yes | no | no |
| Lookahead / lookbehind | yes | no | no |
| Capture groups | yes | partial (capture-group-opt) | yes |
| Unicode (u) flag | yes | no | yes (always implicit) |
| Case-insensitive (i) | yes | no | yes |
| Large counted repetition /a{200,500}/ | yes | no | yes |
| Buffer / binary input | no | no | yes |
| Multi-pattern Set API | no | no | yes (RE2.Set) |
| Cross-runtime standardised | yes | no (flag-gated, V8-only) | yes (wherever native addons run) |
Both linear engines (V8 /l and RE2) support a strict subset of standard RegExp semantics; that is the price of dropping backreferences and lookaround. Where V8 /l works, RE2 works and does more.
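Because both linear engines give up backreferences and lookaround, a caller handling untrusted patterns may want a cheap pre-check before choosing an engine. The sketch below is a rough, hypothetical heuristic (usesBacktrackingOnlyFeatures is not part of node-re2 or V8): it scans the pattern source, skips escaped characters and character classes, and flags numbered or named backreferences plus the four lookaround openers. It does not parse the full RegExp grammar, so a false result means "probably fine", not a guarantee; the authoritative check remains whether the engine's constructor accepts the pattern.

```javascript
// Rough, hypothetical pre-check (not part of node-re2 or V8): does a pattern
// source use features that neither linear engine supports? It skips escaped
// characters and [...] classes but does not parse the full RegExp grammar.
function usesBacktrackingOnlyFeatures(source) {
  let inClass = false;
  for (let i = 0; i < source.length; i++) {
    const ch = source[i];
    if (ch === "\\") {
      const next = source[i + 1] ?? "";
      // Outside a class, \1..\9 or \k<name> is a backreference.
      if (!inClass && (/[1-9]/.test(next) || next === "k")) return true;
      i++; // skip the escaped character
      continue;
    }
    if (inClass) {
      if (ch === "]") inClass = false;
      continue;
    }
    if (ch === "[") {
      inClass = true;
      continue;
    }
    if (ch === "(" && source[i + 1] === "?") {
      const after = source.slice(i + 2, i + 4);
      // (?=  (?!  (?<=  (?<!  open a lookahead or lookbehind
      if (after[0] === "=" || after[0] === "!" || after === "<=" || after === "<!") {
        return true;
      }
    }
  }
  return false;
}

console.log(usesBacktrackingOnlyFeatures("([a-z]+)+$"));   // false
console.log(usesBacktrackingOnlyFeatures("(?<=\\$)\\d+")); // true (lookbehind)
```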
bench/bad-pattern.mjs and bench/set-match.mjs in this repo carry comparable runs. The Linear entrant registers itself when node is started with --enable-experimental-regexp-engine and is silently dropped otherwise.
Pattern ([a-z]+)+$ against aaaaaaaaaa! (10 a's, no match; a textbook ReDoS case):
| engine | median time | vs Irregexp |
|---|---|---|
| RegExp (Irregexp, backtracking) | 15.73 µs | baseline |
| V8 experimental (/l) | 1.463 µs | 10.8× faster |
| RE2 (node-re2) | 101.4 ns | 155× faster than Irregexp; 14× faster than V8 /l |
The 14× RE2-over-experimental gap is the architecture section made concrete: same algorithmic family, but RE2's tuned tiered interpreter beats V8's single-tier bytecode interpreter by an order of magnitude.
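A minimal stand-in for the bad-pattern run, using only Node built-ins: it times Irregexp and, when the l flag is accepted, the experimental engine, and reports the median nanoseconds per test() call. It mirrors the bench's behaviour of registering the Linear entrant only under --enable-experimental-regexp-engine. The harness (entrants, medianNsPerCall) is hypothetical and is not nano-bench; the RE2 entrant is left out so the sketch runs without the re2 addon, and absolute numbers will vary by machine.

```javascript
// Hypothetical micro-harness in the spirit of bench/bad-pattern.mjs (the repo
// uses nano-bench; this is a plain-JS stand-in). Reports median ns per test().
const PATTERN = "([a-z]+)+$";
const INPUT = "a".repeat(10) + "!"; // 10 a's then '!': never matches

const entrants = { RegExp: new RegExp(PATTERN) };
try {
  // Only succeeds under node --enable-experimental-regexp-engine;
  // otherwise the 'l' flag is rejected and the entrant is silently dropped.
  entrants.Linear = new RegExp(PATTERN, "l");
} catch {
  // no Linear entrant
}

// Time `batch` calls per sample and return the median per-call cost.
function medianNsPerCall(fn, runs = 101, batch = 50) {
  const samples = [];
  for (let r = 0; r < runs; r++) {
    const t0 = process.hrtime.bigint();
    for (let k = 0; k < batch; k++) fn();
    samples.push(Number(process.hrtime.bigint() - t0) / batch);
  }
  samples.sort((a, b) => a - b);
  return samples[Math.floor(samples.length / 2)];
}

const results = {};
for (const [name, re] of Object.entries(entrants)) {
  results[name] = medianNsPerCall(() => re.test(INPUT));
  console.log(`${name}: ${results[name].toFixed(1)} ns per call`);
}
```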
200 simple patterns tokenN(?:[a-z]+)? × 500 inputs:
| engine | median time | vs Irregexp |
|---|---|---|
| RE2.Set | 808 µs | 5.85× faster |
| RegExp (Irregexp) | 4.73 ms | baseline |
| RE2 (per-pattern) | 127 ms | 26.9× slower |
| V8 experimental (/l) | 142.3 ms | 30.1× slower |
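Every per-pattern entrant in this benchmark has the same shape: a JS loop calling test() once per (pattern, input) pair, so 200 × 500 = 100,000 calls. A sketch of that shape follows; the input strings are made up because the real ones in bench/set-match.mjs are not reproduced here, and hits and calls are hypothetical names. With node-re2 in place of RegExp, each call in the inner loop crosses into native code, whereas RE2.Set checks all 200 patterns in one call per input.

```javascript
// Hypothetical reconstruction of the per-pattern workload shape. The real
// inputs in bench/set-match.mjs are not reproduced here; these are made up.
const patterns = Array.from({ length: 200 }, (_, n) => new RegExp(`token${n}(?:[a-z]+)?`));
const inputs = Array.from({ length: 500 }, (_, i) => `prefix token${i % 200}xyz suffix`);

let calls = 0;
// For each input, collect the indices of every pattern that matches it.
const hits = inputs.map((input) => {
  const matched = [];
  for (let idx = 0; idx < patterns.length; idx++) {
    calls++; // with per-pattern RE2, each of these is a call into native code
    if (patterns[idx].test(input)) matched.push(idx);
  }
  return matched;
});

console.log(calls); // 100000
```

Note that unanchored patterns overlap: an input containing token12 also matches token1, so a multi-pattern API has to report every matching index, not just the first.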
Three takeaways:
- RE2.Set is unmatched for multi-pattern dispatch. Neither V8 engine has an analog. A planned RE2::FilteredRE2 binding (see the project queue) doubles down on this direction.
- Per-pattern RE2 loses to Irregexp when a JS loop calls across the FFI boundary 100,000 times. FFI overhead dominates. RE2's value is safety on untrusted patterns, not raw speed on benign ones.
- V8 experimental is the slowest on benign patterns: slower than per-pattern RE2 even though RE2 pays the FFI cost, and 30× slower than Irregexp. The V8 team's own caveat ("Irregexp is orders of magnitude faster on most common patterns") still holds, and the missing JIT is why.
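The set-match medians can be turned into rough per-call costs by dividing by the 100,000 calls (this folds loop overhead into the figure, so treat it as an estimate). Irregexp comes to about 47 ns per call and per-pattern RE2 to about 1.27 µs, the gap the FFI explanation accounts for; V8 experimental costs about 1.42 µs per call with no FFI crossing at all, while RE2.Set spends about 1.6 µs per input to check all 200 patterns.

```javascript
// Median totals from the set-match table, converted to nanoseconds.
const totalsNs = {
  "RegExp (Irregexp)": 4.73e6,
  "RE2 (per-pattern)": 127e6,
  "V8 experimental (/l)": 142.3e6,
};
const CALLS = 200 * 500; // one test() per (pattern, input) pair

const perCallNs = {};
for (const [name, ns] of Object.entries(totalsNs)) {
  perCallNs[name] = ns / CALLS;
  console.log(`${name}: ~${perCallNs[name].toFixed(1)} ns per call`);
}

// RE2.Set instead makes one call per input: 808 µs over 500 inputs.
const setPerInputNs = 808e3 / 500;
console.log(`RE2.Set: ~${setPerInputNs.toFixed(1)} ns per input`);
```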
Run with:
node --enable-experimental-regexp-engine $(npx nano-bench --self) bench/bad-pattern.mjs
node --enable-experimental-regexp-engine $(npx nano-bench --self) bench/set-match.mjs

We think it is good that V8 implements a non-backtracking engine: it validates the algorithmic approach node-re2 has shipped since 2014. In an ideal world Node, Bun, and Deno would all expose proper non-backtracking regex natively, and this project would become redundant. That outcome is something to root for.
The ideal world is not the one we live in, and the architectural observations above explain why it is unlikely to arrive soon:
- Two compounding moats, both rate-limited by the missing JIT. Even if V8 shipped transparent routing tomorrow, the experimental engine is 30× slower than Irregexp on benign patterns, so any routing policy has to be conservative: V8 cannot risk regressing the common case. Meanwhile node-re2, with its tiered RE2 engine, stays meaningfully faster on the pathological case. Both moats erode only when V8 invests in JIT-compiling the experimental engine; until then the gap is structural, not just current. No one owns that work upstream.
- No standardised user-facing guarantee. Routing by heuristic ("fall back after 50,000 backtracks") is a mitigation, not a contract. App authors who want a hard linear-time guarantee, typically because they execute patterns from untrusted input, have nowhere to go on stock Node. node-re2 is the only Node-side tool that gives that contract with a single import.
- Smaller feature set even where V8 /l works. No Unicode, no case-insensitive matching, no large counted repetition, and weaker capture-group support. The intersection of "patterns V8 /l accepts" and "patterns real code writes" is small.
- No multi-pattern API. RE2.Set (and the planned RE2::FilteredRE2 binding) have no V8-side equivalent regardless of routing.
For Node users with untrusted patterns, node-re2 works today, covers a much wider regex language, and is the only tool of its kind across runtimes. V8's engine is the right algorithmic shape but the wrong shipping state — and the routing-then-JIT-then-stable path has at least one major piece missing with no owner.