✨ New Features
Structured error reports (#272)
set_logger delivers errors as pre-formatted strings, which is awkward if you want error codes, localized messages, or IDE diagnostics. The new set_error_reporter delivers the same errors as structured data, before they are flattened into a display string:
parser.set_error_reporter([](const peg::ErrorReport &r) {
// r.line, r.col : 1-based error position
// r.position : byte offset in the input
// r.unexpected_token : the token found at the error position
// r.expected_literals : e.g. {"}", ";"}
// r.expected_rules : e.g. {"NAME"} (rules starting with '_' excluded)
// r.message : custom error_message if any (placeholders resolved)
// r.label : rule name or recovery label the error belongs to
});
Both callbacks can be set at once; each error is delivered to both. With error recovery, the reporter is called once per recovered error, so a single parse can produce multiple reports. Mapping r.label to an application-defined enum is the intended way to get typed errors.
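As a rough sketch of the r.label mapping mentioned above: the struct below is a stand-in that mirrors only some of the fields listed in the comments (its field types are assumptions, not the actual peg::ErrorReport declaration), and the enum, label names, and helper functions are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Stand-in for peg::ErrorReport with a subset of the documented fields.
// Field types are assumptions; check peglib.h for the real declaration.
struct ErrorReportSketch {
  std::size_t line = 0;
  std::size_t col = 0;
  std::string label;
  std::string message;
};

// Hypothetical application-defined error codes.
enum class ParseErrorCode { Unknown, EmptyEnum, MissingSemicolon };

// Map a rule name or recovery label to a typed code.
// Labels that are not in the table fall back to Unknown.
inline ParseErrorCode to_error_code(const std::string &label) {
  static const std::unordered_map<std::string, ParseErrorCode> table = {
      {"enum_count", ParseErrorCode::EmptyEnum},
      {"semicolon", ParseErrorCode::MissingSemicolon},
  };
  auto it = table.find(label);
  return it == table.end() ? ParseErrorCode::Unknown : it->second;
}

// A typed diagnostic as an IDE integration might want it.
struct Diagnostic {
  ParseErrorCode code;
  std::size_t line;
  std::size_t col;
  std::string message;
};

// Append one diagnostic per report; with error recovery a single parse
// may call this several times.
inline void collect(std::vector<Diagnostic> &out, const ErrorReportSketch &r) {
  out.push_back({to_error_code(r.label), r.line, r.col, r.message});
}
```

With the real library, the body of collect would presumably live inside the lambda passed to set_error_reporter, with the vector captured by reference.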
Named captures in custom error messages (#289)
A custom error_message can now reference a named capture with %{name}, which expands to the text captured by $name<...> earlier in the parse (in addition to the existing %t / %c placeholders). An unknown name expands to an empty string.
Enum <- 'enum' $name<NAME> '{' NAME+^enum_count '}'
enum_count <- '' { error_message "enum '%{name}' must contain at least one member" }
> enum Color { }
1:14: enum 'Color' must contain at least one member
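The expansion rule can be illustrated with a small standalone function. This is not peglib's implementation, only a sketch of the described behavior: %{name} is replaced by the captured text, and an unknown name becomes an empty string. The function name expand_named is hypothetical, and the existing %t / %c placeholders are deliberately left untouched here.

```cpp
#include <cassert>
#include <map>
#include <string>

// Expand %{name} placeholders from a table of named captures.
// Unknown names expand to an empty string. An unterminated "%{" is
// copied through unchanged (an assumption of this sketch).
inline std::string expand_named(
    const std::string &tmpl,
    const std::map<std::string, std::string> &captures) {
  std::string out;
  std::size_t i = 0;
  while (i < tmpl.size()) {
    if (tmpl.compare(i, 2, "%{") == 0) {
      std::size_t close = tmpl.find('}', i + 2);
      if (close != std::string::npos) {
        std::string name = tmpl.substr(i + 2, close - (i + 2));
        auto it = captures.find(name);
        if (it != captures.end()) out += it->second;
        i = close + 1;  // skip past the closing brace
        continue;
      }
    }
    out += tmpl[i++];  // ordinary character, copy as-is
  }
  return out;
}
```

Applied to the message template from the grammar above with name bound to Color, this yields the message text shown in the sample output.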
⚡ Performance
Selective packrat memoization
A packrat cache hit requires the same rule to be queried twice at the same input position, which in a PEG only happens when the alternatives of a choice share a leftmost prefix. The packrat filter is now restricted to rules that are leftmost-reachable (considering nullable prefixes) from 2+ alternatives of the same choice. On the SQL benchmark, the previous filter cached ~100 rules of which only two ever hit; the new filter keeps exactly those. This follows Becket & Somogyi's observation ("DCGs + Memoing = Packrat Parsing but Is It Worth It?", PADL 2008) that memoizing one or two nonterminals is all that pays off.
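The selection criterion can be sketched on a toy grammar model. Everything below is hypothetical and much simpler than peglib's real grammar representation: choices exist only at the top level of a rule, and there are no repetition or predicate operators. It only illustrates "leftmost-reachable, looking past nullable prefixes, from 2+ alternatives of the same choice".

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Toy model: a rule is an ordered choice of alternatives, an alternative
// is a sequence, an element is a literal or a reference to another rule.
struct Element {
  bool is_rule;
  std::string text;  // rule name, or literal text
};
using Alternative = std::vector<Element>;
using Grammar = std::map<std::string, std::vector<Alternative>>;

// Fixed-point computation of the rules that can match the empty string.
inline std::set<std::string> nullable_rules(const Grammar &g) {
  std::set<std::string> nullable;
  bool changed = true;
  while (changed) {
    changed = false;
    for (const auto &kv : g) {
      if (nullable.count(kv.first)) continue;
      for (const auto &alt : kv.second) {
        bool all = true;
        for (const auto &e : alt)
          all = all && (e.is_rule ? nullable.count(e.text) > 0
                                  : e.text.empty());
        if (all) { nullable.insert(kv.first); changed = true; break; }
      }
    }
  }
  return nullable;
}

// Collect every rule that can be tried at the start position of `alt`.
inline void leftmost(const Grammar &g, const std::set<std::string> &nullable,
                     const Alternative &alt, std::set<std::string> &seen) {
  for (const auto &e : alt) {
    if (!e.is_rule) {
      if (e.text.empty()) continue;  // empty literal consumes nothing
      return;                        // non-empty literal ends the prefix
    }
    if (seen.insert(e.text).second) {  // guard against left recursion
      auto it = g.find(e.text);
      if (it != g.end())
        for (const auto &sub : it->second) leftmost(g, nullable, sub, seen);
    }
    if (!nullable.count(e.text)) return;  // only look past nullable rules
  }
}

// Keep rules reachable from two or more alternatives of one choice.
inline std::set<std::string> select_cached(const Grammar &g) {
  const std::set<std::string> nullable = nullable_rules(g);
  std::set<std::string> cached;
  for (const auto &kv : g) {
    std::map<std::string, int> hits;
    for (const auto &alt : kv.second) {
      std::set<std::string> seen;
      leftmost(g, nullable, alt, seen);
      for (const auto &r : seen) ++hits[r];
    }
    for (const auto &h : hits)
      if (h.second >= 2) cached.insert(h.first);
  }
  return cached;
}
```

For a rule such as Stmt <- Expr ';' / Expr '=' Expr / 'if' Cond, this sketch selects Expr (and whatever Expr reaches leftmost) but not Cond, since Cond is only ever tried from one alternative.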
Rules outside the cached set no longer touch the cache tables at all; their re-entry protection is a per-rule active-position guard. The cache bitvectors are indexed by a compact slot, shrinking from def_count·(len+1) (~30 MB on a 1.2 MB input) to cached_count·(len+1) (~1.2 MB), which also eliminates most of the per-parse zero-fill.
- big.sql (~1.2 MB): 66.8 ms → 45.8 ms (−31%), YACC (libpg_query) ratio 2.1× → 1.44×
- Behavior is unchanged (memoization does not affect correctness); all tests pass.
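The compact-slot indexing can be pictured roughly as follows. The class and member names are hypothetical and peglib's internal layout may differ; the sketch only shows why the bitvector scales with cached_count·(len+1) rather than def_count·(len+1).

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Bitvector that has storage only for cached rules.
class SlotBitmap {
public:
  // slot_of_def[d] is the compact slot of definition d, or -1 when
  // d is not in the cached set and therefore owns no storage.
  SlotBitmap(std::vector<int> slot_of_def, std::size_t cached_count,
             std::size_t input_len)
      : slot_of_def_(std::move(slot_of_def)),
        stride_(input_len + 1),
        bits_(cached_count * (input_len + 1), false) {}

  bool is_cached_rule(std::size_t def_id) const {
    return slot_of_def_[def_id] >= 0;
  }
  void set(std::size_t def_id, std::size_t pos) {
    bits_[index(def_id, pos)] = true;
  }
  bool test(std::size_t def_id, std::size_t pos) const {
    return bits_[index(def_id, pos)];
  }
  std::size_t size_in_bits() const { return bits_.size(); }

private:
  // Row = compact slot, column = input position (0..len inclusive).
  std::size_t index(std::size_t def_id, std::size_t pos) const {
    assert(is_cached_rule(def_id) && pos < stride_);
    return static_cast<std::size_t>(slot_of_def_[def_id]) * stride_ + pos;
  }

  std::vector<int> slot_of_def_;
  std::size_t stride_;
  std::vector<bool> bits_;
};
```

Callers would check is_cached_rule first and skip the table entirely for every other rule, which matches the note above that uncached rules no longer touch the cache.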
🦀 rust-peglib
The same selective-packrat optimization was backported to the Rust port (which previously memoized every rule), giving big.sql 65 ms → 51.5 ms (−20%). Behavior verified against the C++ reference via the language-independent spec harness (530 cases).
⚠️ Compatibility Notes
- As a header-only library, upgrading only requires recompiling.
- peg::Definition and the internal parse Context changed layout for the new error-reporter and packrat plumbing, so don't mix object files compiled against different peglib versions.