Replies: 1 comment
Would be cool to have.
This is a feature inquiry / scoping question — checking which direction the team wants before anyone (potentially CX or others) invests in a port. Filing as an Issue to invite discussion; happy to move to Discussions if preferred.
Where V is today
V's regex ecosystem now has two engines:
- vlib/regex: the original custom-syntax NFA engine (~4.5K LOC, not PCRE-compatible by design per its README: "V philosophy, to have one way and keep it simple").
- vlib/regex.pcre: a newer PCRE-syntax VM with active optimisation work (commits 41d9d85d6, 97f5b8994, 53bb04c6f, et al. in late 2025 / early 2026). Its README describes a non-recursive VM with dynamic backtracking-stack growth, i.e. a Spencer-style backtracking matcher with PCRE ergonomics.
Neither is linear-time-guaranteed. Both can exhibit catastrophic backtracking on adversarial input, which is a real concern for use cases that match against untrusted strings (schema validators, log/event ingestion, content classification).
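To make the backtracking concern concrete, here is a minimal sketch in Python (chosen only for brevity). It is not how vlib/regex, vlib/regex.pcre, RE2, or Go's regexp are implemented. It hand-builds a tiny NFA for the hypothetical pattern (a|aa)*b and runs it two ways: a naive backtracking search, and a state-set simulation of the kind linear-time engines rely on. The names `backtrack_match`, `simulate_match`, and the step counters are assumptions made up for illustration.

```python
# Hand-built NFA for the hypothetical pattern (a|aa)*b, matched against the
# whole input. State 0: loop head, state 1: middle of "aa", state 2: accept.
NFA = {
    (0, "a"): (0, 1),  # "a" alone, or the first half of "aa"
    (1, "a"): (0,),    # second half of "aa"
    (0, "b"): (2,),    # trailing "b"
}
START, ACCEPT = 0, {2}


def backtrack_match(text):
    """Naive backtracking: try each alternative in turn, one path at a time."""
    steps = 0

    def explore(state, pos):
        nonlocal steps
        steps += 1
        if pos == len(text):
            return state in ACCEPT
        # Each alternative is explored separately, so the same (state, pos)
        # pair can be revisited many times on a failing input.
        return any(explore(nxt, pos + 1)
                   for nxt in NFA.get((state, text[pos]), ()))

    return explore(START, 0), steps


def simulate_match(text):
    """State-set simulation: advance all live states together, one char at a time."""
    steps = 0
    current = {START}
    for ch in text:
        following = set()
        for state in current:
            steps += 1  # at most (number of states) visits per input character
            following.update(NFA.get((state, ch), ()))
        current = following
        if not current:
            break  # no live states left, the match cannot succeed
    return bool(current & ACCEPT), steps


if __name__ == "__main__":
    for n in (10, 15, 20):
        adversarial = "a" * n  # no trailing "b", so every path must fail
        _, bt_steps = backtrack_match(adversarial)
        _, sim_steps = simulate_match(adversarial)
        print(f"n={n}: backtracking steps={bt_steps}, state-set steps={sim_steps}")
```

On inputs of the form "aaa…a" with no trailing "b", the backtracking step count grows roughly exponentially with the input length, while the state-set count stays bounded by (input length × number of states). Real backtracking engines add optimisations that this sketch omits, so a given pattern may or may not trigger the blow-up in practice; the point is only that nothing in the backtracking approach rules it out.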
The gap
A linear-time engine in the RE2 / Go regexp / Rust regex lineage gives:
- O(n·m) matching time (n = input length, m = pattern size), regardless of input or pattern. ReDoS-immune by construction.
Historical note
Issue #1114 (closed 2020-01-17) explored exactly this. A maintainer (@joe-conigliaro) wrote "I agree we should port re2" and "I'm going to begin porting re2 to v specifically the golang implementation." The issue was closed when work began on vlib/regex, but the engine that landed was the custom-syntax NFA, not the Go regexp / RE2-derived linear-time engine originally discussed.
regex.pcre is more recent and clearly the team's current focus for "real-world syntax." A separate linear-time engine would complement, not replace, it.
The scoping question
- Is there interest in a linear-time engine in vlib (e.g. vlib/regex.linear or vlib/regex.re2) that prioritises linearity over PCRE features?
- Which source to port from: Go's regexp (BSD-3, well-tested, ~10K LOC), Rust's regex-automata (MIT, lazy DFA, more capable), or something else?
- Should it live in vlib/ or as a third-party module under vlang/?
Context from downstream
CX (a V-native data interchange library) ships its schema validator against an RE2 C++ shim today (system libre2 + a ~200-line wrapper). The shim works but adds a system dependency. CX has the motivation to port a linear-time engine to V after its v0.7.0 release, but that is only worth doing if there's a chance of upstreaming. If the answer is "third-party module," that's also fine and informs the design (it'd then be optimised for CX-shaped queries rather than general use).
Happy to help scope further or contribute a prototype if the direction is welcome. Just looking for a signal on whether to invest before anyone writes code.
Note
You can use the 👍 reaction to increase the issue's priority for developers.
Please note that only the 👍 reaction to the issue itself counts as a vote.
Other reactions and those to comments will not be taken into account.