v1.14.0

@yhirose yhirose released this 02 Jul 18:23

✨ New Features

Predefined character classes (#87)

Character classes now support regex-style shorthand escapes and POSIX named classes, both with ASCII semantics:

Number <- [\d]+
Ident  <- [[:alpha:]_][\w]*
NoWS   <- [^\s]+
  • Shorthands: \d \D \w \W \s \S (inside [...] and [^...])
  • POSIX classes: [[:alpha:]], [[:^digit:]], etc. — alnum, alpha, ascii, blank, cntrl, digit, graph, lower, print, punct, space, upper, word, xdigit
  • Both are desugared into plain codepoint ranges at grammar-parse time, so serialization, first-set computation, and matching performance are unaffected.
  • Fully backward compatible: \d was previously a syntax error, and non-POSIX sequences like [[:] keep parsing as literal characters. An unknown POSIX class name (e.g. [[:foo:]]) fails grammar loading with a clear error.
  • Unicode property classes (\p{L}, UTS #18) are left as a future extension.
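The desugaring step described above can be pictured as a lookup from a class name to plain codepoint ranges. The following is only a minimal sketch of that idea, not the library's implementation: `posix_class`, `shorthand`, `complement`, and `matches` are hypothetical helpers, only some of the listed class names are covered, and the choice to complement negated classes over the full Unicode range (rather than ASCII only) is an assumption.

```cpp
#include <cstdint>
#include <optional>
#include <string_view>
#include <utility>
#include <vector>

// Inclusive codepoint range [first, second]; lists are kept sorted.
using Range = std::pair<char32_t, char32_t>;
using Ranges = std::vector<Range>;

constexpr char32_t kMaxCodepoint = 0x10FFFF;

// Hypothetical helper: ASCII ranges for a few POSIX class names.
std::optional<Ranges> posix_class(std::string_view name) {
  if (name == "digit") return Ranges{{U'0', U'9'}};
  if (name == "lower") return Ranges{{U'a', U'z'}};
  if (name == "upper") return Ranges{{U'A', U'Z'}};
  if (name == "alpha") return Ranges{{U'A', U'Z'}, {U'a', U'z'}};
  if (name == "word")
    return Ranges{{U'0', U'9'}, {U'A', U'Z'}, {U'_', U'_'}, {U'a', U'z'}};
  if (name == "space") return Ranges{{U'\t', U'\r'}, {U' ', U' '}};
  return std::nullopt;  // unknown name: the caller reports an error
}

// Complement of sorted, non-overlapping ranges.
// Assumption: the complement is taken over all Unicode codepoints.
Ranges complement(const Ranges& rs) {
  Ranges out;
  char32_t next = 0;
  for (auto [lo, hi] : rs) {
    if (lo > next) out.emplace_back(next, static_cast<char32_t>(lo - 1));
    next = static_cast<char32_t>(hi + 1);
  }
  if (next <= kMaxCodepoint) out.emplace_back(next, kMaxCodepoint);
  return out;
}

// Maps the character after the backslash (d, D, w, W, s, S) to ranges.
std::optional<Ranges> shorthand(char c) {
  switch (c) {
    case 'd': return posix_class("digit");
    case 'w': return posix_class("word");
    case 's': return posix_class("space");
    case 'D': return complement(*posix_class("digit"));
    case 'W': return complement(*posix_class("word"));
    case 'S': return complement(*posix_class("space"));
    default:  return std::nullopt;
  }
}

// Matching only ever sees plain ranges, never the original escape.
bool matches(const Ranges& rs, char32_t cp) {
  for (auto [lo, hi] : rs) {
    if (cp >= lo && cp <= hi) return true;
  }
  return false;
}
```

Because the matcher in this sketch only consumes `Ranges`, anything downstream of the grammar parser would not need to know that a shorthand was ever written, which is the property the release note attributes to desugaring.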

{ no_whitespace } instruction (#44)

A rule can now opt out of %whitespace skipping — requested back in 2018:

StrQuot   <- '"' (StrEscape / StrChars)* '"'  { no_whitespace }
StrEscape <- '\\' .
StrChars  <- (!'"' !'\\' .)+

%whitespace <- [ \t\r\n]*

Whitespace skipping is disabled inside the rule and resumes after it — like a token boundary, without the token capture. It composes with other instructions ({ no_whitespace; no_ast_opt }) and works with packrat parsing and grammar blobs.
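The "disabled inside the rule, resumes after it" behavior can be modeled as a scoped flag. The sketch below is a toy hand-written matcher, not cpp-peglib code: `Context`, `NoWhitespace`, `literal`, and `str_quot` are hypothetical names, escapes are omitted, and the assumption that whitespace is skipped once after the rule finishes follows the "like a token boundary" wording above.

```cpp
#include <cctype>
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical parsing context; not the library's internal type.
struct Context {
  std::string_view input;
  std::size_t pos = 0;
  bool skip_ws = true;  // whether literals skip trailing whitespace
};

// RAII guard: turns skipping off for one rule and restores it on exit.
struct NoWhitespace {
  Context& ctx;
  bool saved;
  explicit NoWhitespace(Context& c) : ctx(c), saved(c.skip_ws) {
    ctx.skip_ws = false;
  }
  ~NoWhitespace() { ctx.skip_ws = saved; }
};

void skip_whitespace(Context& ctx) {
  while (ctx.pos < ctx.input.size() &&
         std::isspace(static_cast<unsigned char>(ctx.input[ctx.pos]))) {
    ++ctx.pos;
  }
}

// Matches a literal, then skips trailing whitespace if skipping is enabled.
bool literal(Context& ctx, std::string_view lit) {
  if (ctx.input.substr(ctx.pos, lit.size()) != lit) return false;
  ctx.pos += lit.size();
  if (ctx.skip_ws) skip_whitespace(ctx);
  return true;
}

// Toy version of: StrQuot <- '"' (!'"' .)* '"'   (escapes omitted)
std::optional<std::string> str_quot(Context& ctx, bool no_whitespace) {
  const std::size_t start = ctx.pos;
  std::string body;
  {
    std::optional<NoWhitespace> guard;
    if (no_whitespace) guard.emplace(ctx);
    if (!literal(ctx, "\"")) return std::nullopt;
    while (ctx.pos < ctx.input.size() && ctx.input[ctx.pos] != '"') {
      body += ctx.input[ctx.pos++];
    }
    if (!literal(ctx, "\"")) {
      ctx.pos = start;  // backtrack on failure
      return std::nullopt;
    }
  }  // guard (if any) restores the previous flag here
  if (ctx.skip_ws) skip_whitespace(ctx);  // skipping resumes after the rule
  return body;
}
```

In this model, parsing `"  hi "  rest` with the guard keeps the leading spaces in the string body, while parsing without it loses them because the opening quote literal consumes them first.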

peglint: whitespace/predicate lint (#319)

With %whitespace defined, a literal skips its trailing whitespace before a following predicate tests the input, so KEYWORD <- "create" !IDCHAR silently checks the wrong position: on "create table", the predicate runs at "t", after the skipped space, rather than directly after "create". peglint now detects this pattern and suggests wrapping it in a token boundary or adding { no_whitespace }. The !. end-of-input idiom is excluded.
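To make the pitfall concrete, here is a small stand-alone model of KEYWORD <- "create" !IDCHAR. It is a sketch, not library code: `keyword_create` and `is_idchar` are hypothetical, IDCHAR is assumed to mean ASCII alphanumerics plus underscore, and the `skip_after_literal` flag switches between the problematic behavior and a check made directly at the end of the literal.

```cpp
#include <cctype>
#include <cstddef>
#include <string_view>

// Assumed definition of IDCHAR: ASCII letters, digits, and underscore.
bool is_idchar(char c) {
  return std::isalnum(static_cast<unsigned char>(c)) != 0 || c == '_';
}

// Models: KEYWORD <- "create" !IDCHAR
// skip_after_literal = true  : whitespace is consumed before the predicate.
// skip_after_literal = false : the predicate runs right after the literal.
bool keyword_create(std::string_view input, bool skip_after_literal) {
  constexpr std::string_view lit = "create";
  if (input.substr(0, lit.size()) != lit) return false;
  std::size_t pos = lit.size();
  if (skip_after_literal) {
    while (pos < input.size() &&
           std::isspace(static_cast<unsigned char>(input[pos]))) {
      ++pos;
    }
  }
  // !IDCHAR succeeds only if the next character is not an identifier char.
  return pos == input.size() || !is_idchar(input[pos]);
}
```

With skipping, `keyword_create("create table", true)` is rejected because the predicate sees the `t` of `table`; with the check at the literal's end, the same input is accepted while `created` is still rejected.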

📚 Documentation

  • New README section "How %whitespace works exactly": the three exact skip points, the no-skip zones, and the recurring pitfalls behind #44, #292, #319, #325, #327, and #328 — including the previously undocumented rule that definitions whose name starts with _ are hidden from error messages.

🐛 Bug Fixes

  • operator"" _ deprecation warning re-fixed (#338, #340): the original fix was accidentally reverted by an older clang-format run (clang-format < 19 inserts a space between "" and the suffix identifier). Compiles warning-free in C++23 mode again.
  • Test suite no longer skips silently: when the mini-js grammar file cannot be found, the tests now fail with the reason instead of being reported as skipped while ctest shows "100% passed" (follow-up to #341).
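For readers unfamiliar with the literal-operator warning mentioned above, the difference is a single space. The sketch below uses a hypothetical `_tag` suffix for illustration (not a suffix defined by the library) and shows the spelling without whitespace between `""` and the suffix, which is the form that recent C++ modes do not warn about.

```cpp
#include <cstddef>
#include <string>

// Deprecated spelling (space before the suffix), shown only as a comment:
//   std::string operator"" _tag(const char* s, std::size_t n);
//
// Preferred spelling: no whitespace between "" and the suffix identifier.
// `_tag` is a hypothetical suffix used for illustration.
inline std::string operator""_tag(const char* s, std::size_t n) {
  return "<" + std::string(s, n) + ">";
}
```

A formatter that reinserts the space silently brings the warning back, so it may be worth pinning the formatter version or excluding such lines from reformatting.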

⚠️ Compatibility Notes

  • Grammar blob format bumped (PEG1 → PEG2) to carry the per-rule no_whitespace flag. Blobs generated with v1.13.0 are cleanly rejected with a bad-magic error; regenerate them with peglint --blob or parser::serialize_grammar().
  • peg::Definition gained a data member; as a header-only library this only requires recompiling — but don't mix object files compiled against different peglib versions.
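Applications that cache grammar blobs on disk may want to detect stale files before handing them to the library. The following is a rough sketch under a stated assumption: that the format tag named above (PEG2) appears as a 4-byte ASCII prefix at the start of the blob. The actual layout is not described in these notes, and `has_current_magic` is a hypothetical helper, not a library function.

```cpp
#include <string_view>

// Assumption: the blob begins with a 4-byte ASCII magic tag.
constexpr std::string_view kCurrentMagic = "PEG2";

// Returns true if the blob appears to use the current format tag.
// A false result suggests the cached blob should be regenerated.
bool has_current_magic(std::string_view blob) {
  return blob.substr(0, kCurrentMagic.size()) == kCurrentMagic;
}
```

The library's own bad-magic rejection remains the authoritative check; a pre-check like this only lets a build script or cache layer decide to regenerate without attempting a load first.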