
encoding.cbor: add CBOR (RFC 8949) module #27018

Open
davlgd wants to merge 4 commits into vlang:master from davlgd:davlgd-cbor


davlgd commented Apr 28, 2026

Adds encoding.cbor, a module implementing the Concise Binary Object Representation as specified by RFC 8949 (the current CBOR Internet Standard, December 2020).

The module covers the full encode + decode surface (all 8 major types, indefinite-length items, tags, half/single/double float, simple values, self-describe tag, deterministic / canonical mode), comptime generic encode/decode for arbitrary V structs, and the standard Marshaler / Unmarshaler interfaces.

Why

CBOR is the binary equivalent of JSON used by COSE (RFC 8152), CWT (RFC 8392), WebAuthn / FIDO2, OCF, OpenTelemetry's experimental binary encoding, IPFS DAG-CBOR, and several IoT / mTLS profiles. Other ecosystems ship a stdlib or quasi-stdlib implementation (Go's fxamacker/cbor, Rust's ciborium, Python's cbor2); a comparable module in vlib would make it easier to implement these standards in V.

Module surface

import encoding.cbor

bytes := cbor.encode(my_struct)!
back  := cbor.decode[MyStruct](bytes, cbor.DecodeOpts{})!

// streaming
mut p := cbor.new_packer(cbor.EncodeOpts{deterministic: true})
p.pack_array_header(3)
p.pack_uint(1)
p.pack_text('two')
p.pack_value(cbor.Bool{value: true})
out := p.bytes()

Conformance / test vectors

The public CBOR conformance corpora listed below ship under vlib/encoding/cbor/tests/ and run as part of v test:

  • RFC 8949 Appendix A — every vector from the normative appendix is encoded as an inline table in rfc8949_appendix_a_test.v (80+ entries, hex + value + roundtrip flag).
  • cbor/test-vectors (https://github.com/cbor/test-vectors — the same JSON corpus that drives ciborium, serde_cbor and cbor2): tests/appendix_a.json + upstream_appendix_a_test.v.
  • cbor-wg/cbor-test-vectors (https://github.com/cbor-wg/cbor-test-vectors, the IETF working-group corpus): EDN fixtures under tests/cbor_wg/ driven by cbor_wg_test.v, covering 88 well-formed payloads (must decode), 47 malformed payloads (must reject), plus the per-major-type files appA_mt0.edn through appA_mt6.edn, appA_mt7-float.edn, appA_mt7-simple.edn, and appA_streaming.edn.
  • COSE / CWT (RFC 8152 / RFC 8392) — cose_cwt_test.v decodes the canonical CWT claims-set vector from RFC 8392 §A.1 and re-encodes it byte-exact.
  • Half-precision (binary16) — every special bit pattern (±0, ±Inf, qNaN, sNaN, all subnormals) is round-tripped in rfc8949_appendix_a_test.v and generic_test.v.

In addition to the public corpora, the in-tree suite covers:

  • canonical_test.v — RFC 8949 §4.2 deterministic encoding (sorted-by-encoded-key maps, shortest-form integers, shortest-form floats, no indefinite-length items).
  • security_test.v — depth-limit DoS, duplicate-key rejection, tag-0/1 content-type validation (RFC 8949 §3.4.1), nested recursion bomb, malformed-UTF-8 in text strings.
  • generic_test.v — comptime encode[T] / decode[T] over primitives, arrays, fixed arrays, maps, optionals, sum types, nested structs, time.Time (tag 0 / tag 1), []u8, enums, and the cbor: "-" / cbor_optional field attributes.
  • time_test.v — RFC 3339 (tag 0) and epoch (tag 1) on the V time.Time type.
  • smoke_test.v — minimal sanity harness used by editors / vls.
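
To make the "shortest-form integers" rule from the canonical_test.v bullet concrete, here is a minimal sketch of preferred serialization for unsigned integers (major type 0). The function name encode_uint_shortest is hypothetical and is not part of the module's API; the expected bytes in the asserts are the RFC 8949 Appendix A encodings of 10, 100, 1000 and 1000000.

```v
// encode_uint_shortest is a hypothetical helper for illustration only.
// It encodes v as a CBOR unsigned integer using the narrowest argument width.
fn encode_uint_shortest(v u64) []u8 {
	if v < 24 {
		// Values 0..23 fit directly in the initial byte.
		return [u8(v)]
	}
	// Additional info 24, 25, 26, 27 announces a 1, 2, 4 or 8 byte argument.
	mut info := u8(27)
	mut width := 8
	if v <= 0xff {
		info = 24
		width = 1
	} else if v <= 0xffff {
		info = 25
		width = 2
	} else if v <= u64(0xffff_ffff) {
		info = 26
		width = 4
	}
	mut out := [info]
	// Emit the argument in network byte order (most significant byte first).
	for i := width - 1; i >= 0; i-- {
		out << u8((v >> (u64(i) * 8)) & 0xff)
	}
	return out
}

fn main() {
	assert encode_uint_shortest(10) == [u8(0x0a)]
	assert encode_uint_shortest(100) == [u8(0x18), 0x64]
	assert encode_uint_shortest(1000) == [u8(0x19), 0x03, 0xe8]
	assert encode_uint_shortest(1000000) == [u8(0x1a), 0x00, 0x0f, 0x42, 0x40]
	println(encode_uint_shortest(1000000).hex())
}
```

A deterministic decoder can apply the inverse check: an argument that would have fit in a narrower width is not in shortest form.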

Out-of-tree validation harness (not part of this PR)

To gain extra confidence before opening this PR, the module was also driven against three external reference implementations. The harness lives outside the V tree (it depends on cargo and cddl):

  • Rust ciborium interop — V encodes a representative struct, ciborium::de decodes it byte-exact, ciborium::ser re-emits it, V re-decodes. Includes a non-preferred-width fixture to verify the Robustness-Principle decode path.
  • CDDL (RFC 8610) — the cddl Rust binary validates V-emitted payloads against custom schemas (a User schema and the canonical CWT schema), and a negative case (iss typed as uint where the schema requires text) confirms rejection.
  • WebAuthn / FIDO2 — a real attestation-object payload from the W3C WebAuthn examples is decoded and the authData / attStmt substructures are re-encoded and compared.

All of the above pass on the current branch.


chatgpt-codex-connector (bot) left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 2bcd4bd9c4


Comment thread vlib/encoding/cbor/decoder.v Outdated
if info == 31 {
	return -1
}
return i64(u.read_arg(info)!)

P1: Reject oversized definite lengths in header unpackers

unpack_array_header converts the decoded length from u64 to i64 without a bounds check, so values above the i64 maximum wrap negative (for example, the largest u64 value becomes -1, the same value the unpacker returns for an indefinite-length header) and callers treat a definite-length container as indefinite. This can make malformed payloads parse as valid and corrupt downstream decoding logic; the same pattern is repeated in unpack_map_header and should be guarded before casting.
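
One way to guard the cast is sketched below. It assumes a standalone helper named checked_len, which is hypothetical and not taken from the PR; the actual fix may be structured differently.

```v
// checked_len is a hypothetical helper for illustration; it is not the PR's code.
// It converts a decoded CBOR length argument to i64 and fails instead of wrapping.
fn checked_len(raw u64) !i64 {
	// max_i64 is V's builtin constant 9223372036854775807.
	if raw > u64(max_i64) {
		return error('cbor: definite length exceeds i64 range')
	}
	return i64(raw)
}

fn main() {
	// An in-range length converts unchanged.
	n := checked_len(24) or { panic(err) }
	assert n == 24

	// Without the guard, i64(max_u64) would wrap to -1, the value the
	// header unpackers return for an indefinite-length container.
	checked_len(max_u64) or {
		println('rejected: ${err}')
		return
	}
	panic('oversized length was accepted')
}
```

With a check of this kind in both header unpackers, -1 can only come from the info == 31 branch, so the sentinel stays unambiguous.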


Comment thread vlib/encoding/cbor/decoder.v Outdated
Comment on lines +527 to +531
if u.peek_break() {
	u.pos++
	break
}
u.skip_inner(depth + 1)!

P1: Enforce chunk type rules when skipping indefinite strings

In the skip_inner path for major types 2/3 with indefinite length, each chunk is skipped via skip_inner recursively without checking that chunks are definite and of the same major type. That means malformed CBOR like an indefinite byte string containing text chunks can be accepted whenever decoding relies on skip_value (notably RawMessage, Unmarshaler, and unknown-field skipping), so this branch should validate chunk headers the same way unpack_bytes/unpack_text do.
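
The rule in question (RFC 8949 §3.2.3: every chunk of an indefinite-length string must be a definite-length string of the same major type) can be sketched as a standalone function. The name skip_indefinite_chunks and its signature are assumptions made for this illustration; the PR's skip_inner works on the unpacker's own state rather than a plain byte slice.

```v
// skip_indefinite_chunks is a hypothetical standalone sketch, not the PR's skip_inner.
// data holds the bytes that follow an indefinite-length string header (0x5f or 0x7f);
// major is that header's major type (2 for byte strings, 3 for text strings).
// It returns the offset just past the 0xff break byte.
fn skip_indefinite_chunks(data []u8, major u8) !int {
	mut pos := 0
	for pos < data.len {
		head := data[pos]
		pos++
		if head == 0xff {
			return pos
		}
		if head >> 5 != major {
			return error('chunk has a different major type')
		}
		info := head & 0x1f
		if info == 31 {
			return error('nested indefinite-length chunk')
		}
		if info > 27 {
			return error('reserved additional info')
		}
		mut n := u64(info)
		if info >= 24 {
			// Additional info 24..27: a 1, 2, 4 or 8 byte big-endian length follows.
			width := 1 << int(info - 24)
			if data.len - pos < width {
				return error('truncated chunk length')
			}
			n = 0
			for i in 0 .. width {
				n = (n << 8) | u64(data[pos + i])
			}
			pos += width
		}
		if n > u64(data.len - pos) {
			return error('truncated chunk body')
		}
		pos += int(n)
	}
	return error('missing break byte')
}

fn main() {
	// Two definite text chunks ('ab', 'c') followed by the break byte.
	good := [u8(0x62), 0x61, 0x62, 0x61, 0x63, 0xff]
	end := skip_indefinite_chunks(good, 3) or { panic(err) }
	assert end == good.len

	// A byte-string chunk (0x41) inside an indefinite text string is malformed.
	bad := [u8(0x41), 0x00, 0xff]
	skip_indefinite_chunks(bad, 3) or {
		println('rejected: ${err}')
		return
	}
	panic('malformed input was accepted')
}
```

Because the chunks are validated by header rather than skipped recursively, the skip path rejects the same inputs that a full string decode would.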


Comment thread vlib/encoding/cbor/generic.v Outdated
Comment on lines +114 to +116
p.pack_map_header(u64(field_count))
$for field in T.fields {
	if !cbor_field_skipped(field) {

P2: Sort struct map keys in canonical encoding mode

When encoding structs, canonical mode still emits keys in declaration order via $for field in T.fields instead of RFC 8949 deterministic key order. Because structs are encoded as CBOR maps, this violates the module's canonical-encoding contract and can break hash/signature interoperability for typed payloads; the struct branch should apply the same encoded-key sorting used for map/value canonical paths.
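
RFC 8949 §4.2.1 orders map keys by the bytewise lexicographic order of their encoded form, which for text keys means shorter names sort first because the length sits in the header byte. The sketch below shows that ordering for three field names; every function name in it is hypothetical, and it only handles text keys shorter than 24 bytes.

```v
// All names below are hypothetical; this is a sketch, not the PR's encoder.

// encode_short_text_key encodes a text key shorter than 24 bytes:
// one header byte (0x60 + length) followed by the UTF-8 bytes.
fn encode_short_text_key(s string) []u8 {
	assert s.len < 24
	mut out := [u8(0x60 + s.len)]
	out << s.bytes()
	return out
}

// less_bytes reports whether a sorts before b in bytewise lexicographic order.
fn less_bytes(a []u8, b []u8) bool {
	n := if a.len < b.len { a.len } else { b.len }
	for i in 0 .. n {
		if a[i] != b[i] {
			return a[i] < b[i]
		}
	}
	return a.len < b.len
}

// sort_encoded_keys returns the keys in deterministic order. It uses a simple
// selection sort, which is adequate for the small number of fields in a struct.
fn sort_encoded_keys(keys [][]u8) [][]u8 {
	mut used := []bool{len: keys.len}
	mut out := [][]u8{cap: keys.len}
	for _ in 0 .. keys.len {
		mut best := -1
		for i, k in keys {
			if !used[i] && (best < 0 || less_bytes(k, keys[best])) {
				best = i
			}
		}
		used[best] = true
		out << keys[best]
	}
	return out
}

fn main() {
	// Field names in declaration order.
	names := ['name', 'id', 'email']
	keys := names.map(encode_short_text_key(it))
	sorted := sort_encoded_keys(keys)
	// Header bytes 0x62 < 0x64 < 0x65, so the order is id, name, email.
	assert sorted[0][1..].bytestr() == 'id'
	assert sorted[1][1..].bytestr() == 'name'
	assert sorted[2][1..].bytestr() == 'email'
}
```

In a struct encoder the field values would have to travel with their keys, for example by sorting (encoded key, encoded value) pairs before writing them after the map header.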



davlgd commented Apr 28, 2026

Fixed all 3 points and added tests. Let me know if you want me to squash commits.
