v1.14.0
✨ New Features
Predefined character classes (#87)
Character classes now support regex-style shorthand escapes and POSIX named classes, both with ASCII semantics:
Number <- [\d]+
Ident <- [[:alpha:]_][\w]*
NoWS <- [^\s]+
- Shorthands: \d \D \w \W \s \S (inside [...] and [^...])
- POSIX classes: [[:alpha:]], [[:^digit:]], etc. Supported names: alnum, alpha, ascii, blank, cntrl, digit, graph, lower, print, punct, space, upper, word, xdigit
- Both are desugared into plain codepoint ranges at grammar-parse time, so serialization, first-set computation, and matching performance are unaffected.
- Fully backward compatible: \d was previously a syntax error, and non-POSIX sequences like [[:] keep parsing as literal characters. An unknown POSIX class name (e.g. [[:foo:]]) fails grammar loading with a clear error.
- Unicode property classes (\p{L}, UTS #18) are left as a future extension.
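To illustrate what "desugared into plain codepoint ranges" can look like, here is a minimal standalone sketch. It is not peglib's implementation: the names Range, shorthand_ranges, complement, desugar and matches are hypothetical, and both the exact \s set and the choice of 0x10FFFF as the upper bound for negated classes are assumptions.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Inclusive codepoint range, e.g. {U'0', U'9'}.
using Range = std::pair<char32_t, char32_t>;

// Hypothetical helper: sorted ASCII ranges for a lowercase shorthand letter.
// Returns an empty vector for letters this sketch does not know.
std::vector<Range> shorthand_ranges(char c) {
  switch (c) {
  case 'd': return {Range{U'0', U'9'}};
  case 'w': return {Range{U'0', U'9'}, Range{U'A', U'Z'},
                    Range{U'_', U'_'}, Range{U'a', U'z'}};
  case 's': return {Range{U'\t', U'\r'}, Range{U' ', U' '}};  // assumed set
  default:  return {};
  }
}

// Complement of sorted, non-overlapping ranges within [0, max].
std::vector<Range> complement(const std::vector<Range>& rs, char32_t max) {
  std::vector<Range> out;
  char32_t next = 0;  // first codepoint not yet covered
  for (const Range& r : rs) {
    assert(r.first >= next && r.first <= r.second);
    if (r.first > next)
      out.emplace_back(next, static_cast<char32_t>(r.first - 1));
    next = static_cast<char32_t>(r.second + 1);
  }
  if (next <= max) out.emplace_back(next, max);
  return out;
}

// \D, \W and \S are modelled as the complement of \d, \w and \s.
std::vector<Range> desugar(char c, char32_t max = 0x10FFFF) {
  if (c == 'D' || c == 'W' || c == 'S')
    return complement(shorthand_ranges(static_cast<char>(c - 'A' + 'a')), max);
  return shorthand_ranges(c);
}

// After desugaring, matching is an ordinary range test.
bool matches(const std::vector<Range>& rs, char32_t cp) {
  for (const Range& r : rs)
    if (r.first <= cp && cp <= r.second) return true;
  return false;
}
```

Because the result is an ordinary list of ranges, anything that already handles [0-9] style classes needs no special case for the shorthands, which is the property the notes above describe.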
{ no_whitespace } instruction (#44)
A rule can now opt out of %whitespace skipping — requested back in 2018:
StrQuot <- '"' (StrEscape / StrChars)* '"' { no_whitespace }
StrEscape <- '\\' .
StrChars <- (!'"' !'\\' .)+
%whitespace <- [ \t\r\n]*
Whitespace skipping is disabled inside the rule and resumes after it — like a token boundary, without the token capture. It composes with other instructions ({ no_whitespace; no_ast_opt }) and works with packrat parsing and grammar blobs.
peglint: whitespace/predicate lint (#319)
With %whitespace defined, a literal skips its trailing whitespace before a following predicate tests the input, so KEYWORD <- "create" !IDCHAR silently checks the wrong position. peglint now detects this pattern and suggests wrapping it in a token boundary or adding { no_whitespace }. The !. end-of-input idiom is excluded.
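The pitfall can be reproduced with a small hand-written matcher. This is only a sketch of the behaviour described above, not peglib code: is_idchar, skip_ws and match_keyword are hypothetical names, IDCHAR is assumed to be [a-zA-Z0-9_], and the skip_after_literal flag stands in for whether %whitespace is applied after the literal.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Assumed definition of IDCHAR for this sketch: [a-zA-Z0-9_].
bool is_idchar(char c) {
  return std::isalnum(static_cast<unsigned char>(c)) || c == '_';
}

// Skips [ \t\r\n]* starting at pos and returns the new position.
std::size_t skip_ws(const std::string& s, std::size_t pos) {
  while (pos < s.size() && (s[pos] == ' ' || s[pos] == '\t' ||
                            s[pos] == '\r' || s[pos] == '\n'))
    ++pos;
  return pos;
}

// KEYWORD <- "create" !IDCHAR
// skip_after_literal = true models %whitespace being skipped after the
// literal; false models a token boundary or { no_whitespace }.
bool match_keyword(const std::string& s, bool skip_after_literal) {
  const std::string kw = "create";
  if (s.compare(0, kw.size(), kw) != 0) return false;
  std::size_t pos = kw.size();
  if (skip_after_literal) pos = skip_ws(s, pos);
  // !IDCHAR succeeds only if the next character is not an identifier char.
  return pos >= s.size() || !is_idchar(s[pos]);
}
```

With skipping enabled, the input "create table" is rejected because the predicate is tested at the "t" of "table" rather than at the space directly after the keyword. With skipping disabled, the same input matches and "created" is still rejected.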
📚 Documentation
- New README section "How %whitespace works exactly": the three exact skip points, the no-skip zones, and the recurring pitfalls behind #44, #292, #319, #325, #327, and #328, including the previously undocumented rule that definitions whose name starts with _ are hidden from error messages.
🐛 Bug Fixes
- operator"" _ deprecation warning re-fixed (#338, #340): the original fix was accidentally reverted by an older clang-format run (clang-format < 19 inserts a space between "" and the suffix identifier). Compiles warning-free in C++23 mode again.
- Test suite no longer skips silently: when the mini-js grammar file cannot be found, the tests now fail with the reason instead of being reported as skipped while ctest shows "100% passed" (follow-up to #341).
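For readers unfamiliar with the literal-operator warning mentioned in the first fix: C++23 deprecates declaring a literal operator with whitespace between "" and the suffix identifier. The sketch below shows the non-deprecated spelling with a hypothetical suffix _tag, which is not the suffix peglib defines.

```cpp
#include <cstddef>
#include <string>

// Non-deprecated spelling: no space between "" and the suffix.
// Writing `operator"" _tag` (with a space) is the form that triggers the
// deprecation warning in C++23 mode.
std::string operator""_tag(const char* s, std::size_t n) {
  return "<" + std::string(s, n) + ">";
}
```

A formatter that re-inserts the space silently brings the warning back, which matches the regression described above.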
⚠️ Compatibility Notes
- Grammar blob format bumped (PEG1 → PEG2) to carry the per-rule no_whitespace flag. Blobs generated with v1.13.0 are cleanly rejected with a bad-magic error; regenerate them with peglint --blob or parser::serialize_grammar().
- peg::Definition gained a data member; as a header-only library this only requires recompiling, but don't mix object files compiled against different peglib versions.
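If an application caches blobs on disk, it may want to tell stale blobs apart from corrupt ones before handing them to the library. The following is a rough sketch under a loud assumption: that the magic is the 4-byte ASCII tag PEG1 or PEG2 at offset 0 of the blob. The notes above do not specify the layout, and BlobKind and classify_blob are hypothetical names, not part of peglib.

```cpp
#include <string>

enum class BlobKind { Current, Stale, Unknown };

// Hypothetical pre-check. Assumes the magic is a 4-byte ASCII tag at
// offset 0 of the blob; the real layout may differ.
BlobKind classify_blob(const std::string& bytes) {
  if (bytes.compare(0, 4, "PEG2") == 0) return BlobKind::Current;
  if (bytes.compare(0, 4, "PEG1") == 0) return BlobKind::Stale;  // regenerate
  return BlobKind::Unknown;
}
```

A caller could regenerate the blob on Stale and treat Unknown as a corrupt cache entry. The library's own bad-magic error remains the authoritative check.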