pxf: parser-side @<name>/@entry/@table directive grammar #9
Merged
Adds the v0.72-v0.75 directive grammar to the PXF parser and the fast
direct-decode path (parser-tier only — runtime semantics arrive in
follow-up PRs of the cpp catch-up sequence).
AST changes:
- Document gains directives[], tables[], body_offset; type_url and
entries[] keep their meaning
- Directive { pos, name, prefixes[], type (back-compat single-prefix),
body, has_body, leading_comments } — Body holds the raw bytes
between '{' and '}', preserved verbatim
- TableDirective { pos, type, columns[], rows[], leading_comments }
with TableRow::cells[] = vector<optional<ValuePtr>> (nullopt =
absent cell, *NullVal = present-but-null, other = present-with-value)
- Position gains `offset` (byte offset into lexer input) so directive
Body extraction can slice raw bytes
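The three-state cell encoding above can be illustrated with a small sketch. The `Value`, `NullVal`, `IntVal`, `CellState`, and `Classify` names below are hypothetical stand-ins, and `ValuePtr` is assumed to be a `shared_ptr` here; the real AST types may differ.

```cpp
#include <cassert>
#include <memory>
#include <optional>
#include <vector>

// Hypothetical stand-ins for the AST value hierarchy (not the real types).
struct Value {
  virtual ~Value() = default;
};
struct NullVal : Value {};
struct IntVal : Value {
  explicit IntVal(long long v) : value(v) {}
  long long value;
};
using ValuePtr = std::shared_ptr<Value>;  // assumption: pointer flavor unknown

struct TableRow {
  std::vector<std::optional<ValuePtr>> cells;
};

enum class CellState { kAbsent, kNull, kValue };

// Maps one cell onto the three-state grammar:
// nullopt = absent, NullVal node = present-but-null, anything else = value.
CellState Classify(const std::optional<ValuePtr>& cell) {
  if (!cell.has_value()) return CellState::kAbsent;
  assert(*cell != nullptr);  // a present cell is assumed to carry a node
  if (dynamic_cast<const NullVal*>(cell->get()) != nullptr) {
    return CellState::kNull;
  }
  return CellState::kValue;
}
```

Keeping "absent" in the `optional` layer rather than as another `Value` subtype lets a consumer distinguish "no cell written" from "explicit null" without inspecting the node type first.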
Lexer changes:
- kAtDirective (any @<ident> not "type" / "table"; Token.value is the
bare name) and kAtTable join the existing kAtType
- kLParen / kRParen for @table column lists and row tuples
- lex_.Input() exposed so the parser can slice directive Body bytes
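A minimal sketch of the `@`-token classification described above. The `Token` layout, the `kError` kind, the `LexAt` name, and the identifier character set (alphanumerics and `_`) are assumptions for illustration, not the actual lexer.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <string_view>

enum class TokenKind { kAtType, kAtTable, kAtDirective, kError };

struct Token {
  TokenKind kind;
  std::string value;  // bare name for kAtDirective, empty otherwise
};

// Lexes one '@'-prefixed token that starts at input[0].
Token LexAt(std::string_view input) {
  assert(!input.empty() && input[0] == '@');
  std::size_t end = 1;
  while (end < input.size() &&
         (std::isalnum(static_cast<unsigned char>(input[end])) ||
          input[end] == '_')) {
    ++end;
  }
  std::string_view name = input.substr(1, end - 1);
  if (name.empty()) return {TokenKind::kError, ""};
  if (name == "type") return {TokenKind::kAtType, ""};
  if (name == "table") return {TokenKind::kAtTable, ""};
  // Any other @<ident> is a generic directive carrying its bare name.
  return {TokenKind::kAtDirective, std::string(name)};
}
```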
Parser (slow / AST tier):
- parseDocument runs a top-of-document directive prelude: @type,
@<directive>, @table in any order; doc.body_offset tracks the
end of the last directive (chameleon's hashing anchor)
- parseDirective handles zero-or-more prefix identifiers with
one-token lookahead (IDENT followed by '=' / ':' is a body key,
not a prefix); optional inline block, body raw bytes extracted
via findMatchingBrace (mirrors protowire-go)
- parseTableDirective + parseTableRow enforce v1 cell grammar
(scalar shapes only — no list / block in cells); arity check
against the column list; dotted column paths rejected
- Standalone constraint (draft §3.4.4): a document with any @table
MUST NOT also have @type or top-level field entries
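The brace matching used for directive bodies has to ignore braces inside strings and comments. The sketch below shows one way to do that; `FindMatchingBrace` here is an illustrative reimplementation, not the code in this PR, and the set of skipped constructs (`"..."` strings, `#` and `//` line comments, `/* */` block comments) is an assumption about the grammar.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string_view>

// Returns the offset of the '}' matching the '{' at `open`,
// or nullopt if the block is unterminated.
std::optional<std::size_t> FindMatchingBrace(std::string_view in,
                                             std::size_t open) {
  assert(open < in.size() && in[open] == '{');
  int depth = 0;
  std::size_t i = open;
  while (i < in.size()) {
    const char c = in[i];
    if (c == '"') {
      // Skip a string literal, honoring backslash escapes. A b"..." literal
      // is covered too, since its leading 'b' is an ordinary character.
      ++i;
      while (i < in.size() && in[i] != '"') {
        if (in[i] == '\\') ++i;
        ++i;
      }
      ++i;  // step past the closing quote
      continue;
    }
    if (c == '#' || (c == '/' && i + 1 < in.size() && in[i + 1] == '/')) {
      // Line comment: skip to end of line.
      while (i < in.size() && in[i] != '\n') ++i;
      continue;
    }
    if (c == '/' && i + 1 < in.size() && in[i + 1] == '*') {
      // Block comment: skip to the closing "*/".
      const std::size_t end = in.find("*/", i + 2);
      if (end == std::string_view::npos) return std::nullopt;
      i = end + 2;
      continue;
    }
    if (c == '{') ++depth;
    if (c == '}' && --depth == 0) return i;
    ++i;
  }
  return std::nullopt;
}
```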
Fast path (direct decode):
- consumeDirectives mirrors the AST parser's prelude at the token
level (no AST allocation); discards directive contents in this
PR — Result accessors / TableReader / BindRow land in subsequent
PRs
- Same standalone constraint and arity checks
Tests:
- 21 new tests in pxf_directive_test.cc covering bare directives,
one/two-prefix shapes, lookahead disambiguation, block body raw
bytes, nested braces, braces inside strings, @table happy path,
empty / null cells (three-state grammar), zero rows, arity
mismatch, dotted columns, list/block cell rejection, both
standalone-constraint violations
- All 132 tests pass; end-to-end smoke via cmd/check_decode
confirms the fast path correctly skips chameleon-style
`@header T { ... }` plus a multi-prefix `@frob alpha beta`
before decoding the schema-typed body
trendvidia added a commit that referenced this pull request on May 12, 2026
Adds 26 new tests on top of PR #9's initial 21.
Fast-path (PxfDirectiveFast fixture, 19 tests): every code path in
ConsumeDirectives, including:
- standalone-constraint enforcement in both orderings (@type before
@table and @table before @type)
- bare / single-prefix / multi-prefix / inline-block / nested-block
directive shapes
- the prefix-lookahead disambiguator
- every @table error return (missing type, missing '(', empty column
list, bad column token, missing ',' or ')' in column list and rows,
arity mismatch, dotted columns, list-cell and block-cell rejection)
- the "@type accepts string form" back-compat that the fast path
supports but the AST parser does not
AST-tier error paths (7 tests):
- @type without an IDENT
- findMatchingBrace's #-comment / //-comment / /*-comment / b"..."
sub-skip branches
- zero-prefix-no-legacy-type back-compat
- the @table-after-@type rejection symmetric to @type-after-@table
All 170 tests pass locally. The fast-path tests exercise the 123-line
ConsumeDirectives block that PR #9's initial test set missed because it
only invoked Parse() (AST tier).
trendvidia added a commit that referenced this pull request on May 12, 2026
Third PR of the v0.72-v0.75 cpp catch-up. The fast-path direct
decoder previously walked directives just enough to satisfy the
standalone constraint and arity checks; it discarded their content.
This PR wires the parsed shape onto Result so consumers can read the
document-root directive list after UnmarshalFull returns.
API additions on Result:
- Directives() → const vector<Directive>& : generic
`@<name> *(prefix) [{ ... }]` blocks in source order. body holds
raw bytes between '{' and '}', preserved verbatim for downstream
re-parsing (chameleon's @Header reader, etc.). Single-prefix
populates the back-compat `type` field per v0.72.0 shape.
- Tables() → const vector<TableDirective>& : @table directives
with full column metadata and parsed cell ValuePtr per row,
faithful to the three-state cell grammar (absent / present-null /
present-with-value).
- AddDirective(...) / AddTable(...) : internal mutators used by
the fast path; not part of the consumer API.
Fast path (decode_fast.cc):
- ConsumeDirectives builds Directive / TableDirective structs
inline, conditionally appending to result_ when non-null.
Unmarshal (Result=nullptr) retains its zero-allocation contract:
the fast path still walks and validates directives but allocates
nothing on the prelude.
- New ParseScalarCellValue helper mirrors the scalar branches of
the AST parser's ParseValue. Used by @table row parsing; list /
block cell tokens are already rejected before it's called.
- Body bytes for @<directive> blocks are sliced from
lex_.Input().substr(open + 1, close - (open + 1)) using the
Position::offset added in PR #9.
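The slice arithmetic above excludes both braces. A minimal sketch, where `SliceBody` is a hypothetical helper name and `open` / `close` are assumed to be the byte offsets of the `{` and its matching `}`:

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Returns the bytes strictly between the braces at `open` and `close`.
std::string_view SliceBody(std::string_view input, std::size_t open,
                           std::size_t close) {
  assert(open < close && close < input.size());
  assert(input[open] == '{' && input[close] == '}');
  // Start one past '{'; the length stops one short of '}'.
  return input.substr(open + 1, close - (open + 1));
}
```

For `@header T { a = 1 }` this yields ` a = 1 ` with the surrounding whitespace intact, which is what "preserved verbatim" implies for downstream re-parsing.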
Tests (14 new in pxf_result_directives_test.cc, 198 total):
- Empty document, bare / single-prefix / multi-prefix directives
- @type does not leak into Directives()
- Nested block body preserved verbatim
- Multiple directives in source order
- @table columns / rows / cells (concrete value tagging)
- Three-state cells (absent / null / value)
- Multiple tables in order
- Directives + tables coexisting
- Unmarshal (no Result) still succeeds (regression check on the
result_-null branch)
Summary
First of 5 PRs porting v0.72-v0.75 features to protowire-cpp. Adds the directive grammar to the AST parser and the fast direct-decode path — parser-tier only; runtime semantics (Result accessors, TableReader streaming, per-row Scan/BindRow) arrive in subsequent PRs.
Mirrors the Go reference at `protowire-go/encoding/pxf/{parser,decode_fast}.go` (draft §3.4.2 – §3.4.4).
AST additions to `Document`: `directives[]`, `tables[]`, and `body_offset` (`type_url` and `entries[]` keep their meaning).
Lexer: `kAtDirective` / `kAtTable` join `kAtType`; `kLParen` / `kRParen` added for table column lists.
Parser (slow / AST tier): top-of-document directive prelude; `parseDirective` uses one-token lookahead (IDENT followed by `=` / `:` is a body key, not a prefix); inline-block body extracted via `findMatchingBrace` (string- and comment-aware); `parseTableDirective` enforces v1 cell grammar (scalar shapes only), column arity, and rejects dotted column paths.
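The one-token lookahead can be sketched over a pre-tokenized stream. The `Kind`, `Tok`, and `ParsePrefixes` names are illustrative only; the real parser pulls tokens from the lexer rather than from a vector.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

enum class Kind { kIdent, kEquals, kColon, kLBrace, kOther };

struct Tok {
  Kind kind;
  std::string text;
};

// Collects directive prefixes starting at toks[pos] and advances pos
// past them. Stops at the first IDENT that is really a body key.
std::vector<std::string> ParsePrefixes(const std::vector<Tok>& toks,
                                       std::size_t& pos) {
  assert(pos <= toks.size());
  std::vector<std::string> prefixes;
  while (pos < toks.size() && toks[pos].kind == Kind::kIdent) {
    // One-token lookahead: IDENT followed by '=' or ':' is a body key.
    if (pos + 1 < toks.size() &&
        (toks[pos + 1].kind == Kind::kEquals ||
         toks[pos + 1].kind == Kind::kColon)) {
      break;
    }
    prefixes.push_back(toks[pos].text);
    ++pos;
  }
  return prefixes;
}
```

With the tokens for `@frob alpha beta` followed by `name = "x"`, the loop collects `alpha` and `beta` and leaves `name` for the body parser.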
Fast path (`decode_fast.cc`): `consumeDirectives` mirrors the AST prelude at the token level — no AST allocation, contents discarded in this PR. Same standalone constraint and arity checks.
Standalone constraint (draft §3.4.4): a document with any `@table` MUST NOT also have `@type` or top-level field entries.
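As a rough illustration of the standalone constraint, the check reduces to a predicate over what the prelude and body contained. `DocShape`, `CheckStandalone`, and the error strings below are hypothetical; the real parser reports errors in its own way and enforces the rule as tokens arrive.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>

// Hypothetical summary of what a parsed document contained.
struct DocShape {
  bool has_type;
  bool has_table;
  std::size_t top_level_entries;
};

// Returns an error message if the standalone constraint is violated,
// or nullopt if the document shape is acceptable.
std::optional<std::string> CheckStandalone(const DocShape& d) {
  if (!d.has_table) return std::nullopt;  // constraint only applies to @table
  if (d.has_type) {
    return std::string("@table cannot be combined with @type");
  }
  if (d.top_level_entries > 0) {
    return std::string("@table cannot be combined with top-level entries");
  }
  return std::nullopt;
}
```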
Test plan