encoding.cbor: add CBOR (RFC 8949) module #27018
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 2bcd4bd9c4
    if info == 31 {
        return -1
    }
    return i64(u.read_arg(info)!)
Reject oversized definite lengths in header unpackers
unpack_array_header converts the decoded length from u64 to i64 without a bounds check, so values above the i64 maximum wrap negative (for example, the u64 maximum becomes -1, the same value the function returns for an indefinite-length header) and callers treat a definite-length container as indefinite. This can make malformed payloads parse as valid and corrupt downstream decoding logic. The same pattern is repeated in unpack_map_header, so both should guard the value before casting.
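A minimal sketch of such a guard, assuming a hypothetical helper named checked_len (it is not a function in this PR) and relying only on the builtin max_i64 constant:

```v
// Hypothetical helper, not part of the PR: converts a decoded CBOR
// length argument to i64 and returns an error instead of wrapping
// when the value exceeds the i64 range.
fn checked_len(raw u64) !i64 {
	if raw > u64(max_i64) {
		return error('cbor: definite length ${raw} does not fit in i64')
	}
	return i64(raw)
}
```

Assuming read_arg yields a u64 result, as the cast in the excerpt suggests, the header unpackers could route the value through a check like this so that -1 stays reserved for the indefinite-length case.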
    if u.peek_break() {
        u.pos++
        break
    }
    u.skip_inner(depth + 1)!
Enforce chunk type rules when skipping indefinite strings
In the skip_inner path for major types 2/3 with indefinite length, each chunk is skipped via skip_inner recursively without checking that chunks are definite and of the same major type. That means malformed CBOR like an indefinite byte string containing text chunks can be accepted whenever decoding relies on skip_value (notably RawMessage, Unmarshaler, and unknown-field skipping), so this branch should validate chunk headers the same way unpack_bytes/unpack_text do.
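The chunk rule from RFC 8949 §3.2.3 can be sketched as a small header check. The function name check_chunk_header and its parameters are assumptions made for illustration, not the PR's code:

```v
// Hypothetical check, not the PR's code. `parent_major` is the major
// type of the enclosing indefinite-length string (2 = bytes, 3 = text)
// and `initial` is the first byte of a chunk that is not the 0xff break.
// Reserved additional-info values 28..30 are assumed to be rejected by
// the regular header decoder.
fn check_chunk_header(parent_major u8, initial u8) ! {
	major := initial >> 5
	info := initial & 0x1f
	if major != parent_major {
		return error('cbor: chunk of major type ${major} inside indefinite string of major type ${parent_major}')
	}
	if info == 31 {
		return error('cbor: chunk inside an indefinite-length string must have a definite length')
	}
}
```

Calling a check like this before each recursive skip would reject, for instance, a text chunk (initial byte 0x63) inside an indefinite byte string.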
    p.pack_map_header(u64(field_count))
    $for field in T.fields {
        if !cbor_field_skipped(field) {
Sort struct map keys in canonical encoding mode
When encoding structs, canonical mode still emits keys in declaration order via $for field in T.fields instead of RFC 8949 deterministic key order. Because structs are encoded as CBOR maps, this violates the module's canonical-encoding contract and can break hash/signature interoperability for typed payloads; the struct branch should apply the same encoded-key sorting used for map/value canonical paths.
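RFC 8949 §4.2.1 orders map keys by the bytewise lexicographic order of their encoded form. A rough sketch of that ordering, assuming each field has already been encoded into a hypothetical EncodedEntry buffer pair (these names are illustrative and do not come from the PR):

```v
// Hypothetical types and helpers for illustration only.
struct EncodedEntry {
	key   []u8 // already-encoded CBOR key
	value []u8 // already-encoded CBOR value
}

// compare_bytes orders two byte slices bytewise; a slice that is a
// strict prefix of the other sorts first.
fn compare_bytes(a []u8, b []u8) int {
	n := if a.len < b.len { a.len } else { b.len }
	for i in 0 .. n {
		if a[i] != b[i] {
			return if a[i] < b[i] { -1 } else { 1 }
		}
	}
	if a.len == b.len {
		return 0
	}
	return if a.len < b.len { -1 } else { 1 }
}

// sort_canonical reorders entries by their encoded keys.
fn sort_canonical(mut entries []EncodedEntry) {
	entries.sort_with_compare(fn (a &EncodedEntry, b &EncodedEntry) int {
		return compare_bytes(a.key, b.key)
	})
}
```

In canonical mode the struct branch could collect its fields into such a list, sort it, and only then write the key/value bytes after the map header.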
Fixed all 3 points and added tests. Let me know if you want me to squash commits.
Adds encoding.cbor, a module implementing the Concise Binary Object Representation as specified by RFC 8949 (the current CBOR Internet Standard, December 2020).
The module covers the full encode + decode surface (all 8 major types, indefinite-length items, tags, half/single/double float, simple values, self-describe tag, deterministic / canonical mode), comptime generic encode/decode for arbitrary V structs, and the standard Marshaler/Unmarshaler interfaces.
Why
CBOR is the binary equivalent of JSON used by COSE (RFC 8152), CWT (RFC 8392), WebAuthn / FIDO2, OCF, OpenTelemetry's experimental binary encoding, IPFS DAG-CBOR, and several IoT / mTLS profiles. Other ecosystems ship a stdlib or quasi-stdlib implementation (Go's fxamacker/cbor, Rust's ciborium, Python's cbor2); a comparable module makes it simpler for developers to implement these standards in V.
Module surface
Conformance / test vectors
All of the recognised public CBOR conformance corpora ship under vlib/encoding/cbor/tests/ and run as part of v test:
- rfc8949_appendix_a_test.v (80+ entries, hex + value + roundtrip flag).
- tests/appendix_a.json + upstream_appendix_a_test.v.
- tests/cbor_wg/, driven by cbor_wg_test.v: 88 well-formed payloads (must decode), 47 malformed payloads (must reject), plus the per-major-type files appA_mt0.edn … appA_mt6.edn, appA_mt7-float.edn, appA_mt7-simple.edn, appA_streaming.edn.
- cose_cwt_test.v decodes the canonical CWT claims-set vector from RFC 8392 §A.1 and re-encodes it byte-exact.
- rfc8949_appendix_a_test.v and generic_test.v.
In addition to the public corpora, the in-tree suite covers:
- canonical_test.v: RFC 8949 §4.2 deterministic encoding (sorted-by-encoded-key maps, shortest-form integers, shortest-form floats, no indefinite-length items).
- security_test.v: depth-limit DoS, duplicate-key rejection, tag-0/1 content-type validation (RFC 8949 §3.4.1), nested recursion bomb, malformed UTF-8 in text strings.
- generic_test.v: comptime encode[T] / decode[T] over primitives, arrays, fixed arrays, maps, optionals, sum types, nested structs, time.Time (tag 0 / tag 1), []u8, enums, and the cbor: "-" / cbor_optional field attributes.
- time_test.v: RFC 3339 (tag 0) and epoch (tag 1) on the V time.Time type.
- smoke_test.v: minimal sanity harness used by editors / vls.
Out-of-tree validation harness (not part of this PR)
To gain extra confidence before opening this PR, the module was also driven against three external reference implementations. The harness lives outside the V tree (it depends on cargo and cddl):
- ciborium interop: V encodes a representative struct, ciborium::de decodes it byte-exact, ciborium::ser re-emits it, V re-decodes. Includes a non-preferred-width fixture to verify the Robustness-Principle decode path.
- cddl: the Rust binary validates V-emitted payloads against custom schemas (a User schema and the canonical CWT schema), and a negative case (iss typed as uint where the schema requires text) confirms rejection.
- authData / attStmt substructures are re-encoded and compared.
All of the above pass on the current branch.