⚠️ API STABILITY NOTICE
This is an alpha release with a stabilizing API. While core functionality is complete and well-tested,
API changes may occur in future versions as we refine the implementation.
Regular expression engine for MoonBit — inspired by Russ Cox's regex series.
```moonbit
test {
  // Compile once, use everywhere
  let regexp = @regexp.compile("a(bc|de)f")
  let result = regexp.execute("xxabcf")
  if result.matched() {
    // ["abcf", "bc"]
    inspect(
      result.results(),
      content=(
        #|[Some("abcf"), Some("bc")]
      ),
    )
  }
}
```
- `compile(pattern)` → Creates an `Engine`
- `engine.execute(text)` → Returns `MatchResult`
- `result.matched()` → `Bool`
- `result.get(index)` → Capture group content
- `result.results()` → Iterator over the whole match followed by each capture group
- `engine.group_by_name(name)` → Find group index by name
- `engine.group_count()` → Total capture groups
- `result.groups()` → Get named group content
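
The calls listed above can be combined as in the following minimal sketch. It assumes the package is imported under the `@regexp` alias, as in the first example, and that the return values implement `Show` so they can be printed. The exact return types (for instance, whether `get` and `group_by_name` return an `Option`) and the assumption that index 1 refers to the first capture group are not confirmed by this README, so the sketch prints the values instead of asserting them.

```moonbit
test "api tour (sketch)" {
  // Assumption: the package alias is @regexp, as in the example above.
  let engine = @regexp.compile("(?<year>\\d{4})-(?<month>\\d{2})")
  let result = engine.execute("released 2024-03")
  if result.matched() {
    // Total number of capture groups in the pattern.
    println(engine.group_count())
    // Index of the group named "month".
    println(engine.group_by_name("month"))
    // Content of capture group 1 (presumably the "year" group).
    println(result.get(1))
    // Contents of all named groups.
    println(result.groups())
  }
}
```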
| Feature | Example | What it does |
|---|---|---|
| Literals | `abc` | Match exact text |
| Wildcards | `a.c` | `.` matches any character |
| Quantifiers | `a+`, `b*`, `c?` | One or more, zero or more, optional |
| Ranges | `a{2,5}` | Between 2-5 repetitions |
| Classes | `[a-z]`, `[^0-9]` | Character sets, negated sets |
| Groups | `(abc)`, `(?:xyz)` | Capturing, non-capturing |
| Named | `(?<word>abc)` | Named capture groups |
| Choice | `cat\|dog` | Match either option |
| Anchors | `^start`, `end$` | Line boundaries |
| Escapes | `\\u{41}`, `\\u0041` | Unicode escapes, standard escapes |
| Unicode Props | `\\p{L}`, `\\p{Nd}` | Unicode general categories |
| Backrefs | `(.)\\1` | Reference previous captures |
Match characters by their Unicode general categories:
```moonbit
test "unicode properties" {
  // Matching gc=L
  let regex = @regexp.compile("\\p{Letter}+")
  inspect(
    regex.execute("Hello 世界").results(),
    content=(
      #|[Some("Hello")]
    ),
  )
  // Matching gc=N
  let regex = @regexp.compile("\\p{Number}+")
  inspect(
    regex.execute("123 and 456").results(),
    content=(
      #|[Some("123")]
    ),
  )
}
```
Supported Properties:
⚠️ Performance Warning: Backreferences can cause exponential time complexity in worst cases!
```moonbit
test "backreferences" {
  // Palindrome detection (simple)
  let palindrome = @regexp.compile("^(.)(.)\\2\\1")
  inspect(
    palindrome.execute("abba").results(),
    content=(
      #|[Some("abba"), Some("a"), Some("b")]
    ),
  )
  // HTML tag matching
  let html_regex = @regexp.compile("<([a-zA-Z]+)[^>]*>(.*?)</\\1>")
  let result = html_regex.execute("<div class='test'>content</div>")
  inspect(
    result.results(),
    content=(
      #|[Some("<div class='test'>content</div>"), Some("div"), Some("content")]
    ),
  )
}
```
```moonbit
test "character classes" {
  // Email validation (simplified)
  let email = @regexp.compile(
    (
      #|[\w-]+@[\w-]+\.\w+
    ),
  )
  let email_result = email.execute("user@example.com").results()
  inspect(
    email_result,
    content=(
      #|[Some("user@example.com")]
    ),
  )
  // Extract numbers
  let numbers = @regexp.compile(
    (
      #|\d+\.\d{2}
    ),
  )
  let result = numbers.execute("Price: $42.99").results()
  inspect(
    result,
    content=(
      #|[Some("42.99")]
    ),
  )
  // Named captures for parsing
  let parser = @regexp.compile(
    (
      #|(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})
    ),
  )
  let date_result = parser.execute("2024-03-15")
  inspect(
    date_result.groups(),
    content=(
      #|{"year": "2024", "month": "03", "day": "15"}
    ),
  )
}
```
```moonbit
test {
  try {
    let _ = @regexp.compile("a(b") // Oops! Missing )
  } catch {
    RegexpError(err=MissingParenthesis, source_fragment=_) => println("Fix your regex! 🔧")
    _ => ()
  }
}
```
- Predictable complexity — Designed to avoid catastrophic backtracking (except with backreferences)
- VM-based — Structured interpreter design
- Unicode support — Character set and property support
Built with reliability and correctness as primary goals.
This implementation has some behavior differences compared to other popular regex engines:
- Empty Character Class Handling:
  - In JavaScript: `[][]` is parsed as two character classes with no characters
  - In Golang: `[][]` is parsed as one character class containing `]` and `[`
  - In MoonBit: we follow the JavaScript interpretation
- Empty Alternatives Behavior:
  - Expressions like `(|a)*` and `(|a)+` have specific behavior that may differ from other implementations
  - See Golang issue #46123 for related discussion
- Backreferences:
  - Backreferences are supported but may impact the complexity guarantees of the engine
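
To see how these differences play out in practice, a small probe like the one below may help. It is only a sketch: it assumes the `@regexp` alias used in the earlier examples and that the results can be printed with `println`. Because this README does not spell out the exact results for these patterns, the sketch prints them instead of asserting expected values.

```moonbit
test "behavior probes (sketch)" {
  // Per the notes above, `[][]` is read as two empty character classes,
  // so it should not be able to match any input.
  let empty_classes = @regexp.compile("[][]")
  println(empty_classes.execute("][").matched())
  // Empty alternatives under repetition: print the captures to see
  // which groups this engine reports.
  let star = @regexp.compile("(|a)*")
  println(star.execute("aa").results())
  let plus = @regexp.compile("(|a)+")
  println(plus.execute("aa").results())
}
```

Comparing the printed captures against the same patterns in Go or JavaScript is a quick way to spot where a ported regex might behave differently.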