Skip to content

v0.1: VLAN choreography, packet fixtures, kernel 6.0+ probe fix (PR #5)#5

Open
lunarthegrey wants to merge 9 commits intomainfrom
pr5-vlan
Open

v0.1: VLAN choreography, packet fixtures, kernel 6.0+ probe fix (PR #5)#5
lunarthegrey wants to merge 9 commits intomainfrom
pr5-vlan

Conversation

@lunarthegrey
Copy link
Copy Markdown
Contributor

@lunarthegrey lunarthegrey commented Apr 20, 2026

Summary

  • Full SPEC §4.7 VLAN choreography in the BPF program: 802.1Q ingress parse + egress push/pop/rewrite driven by a new `vlan_resolve` map, populated from `/proc/net/vlan/config` at attach time.
  • Raw-syscall `bpf_prog_test_run` harness + 21 packet-level fixtures covering the no-FIB §9 Phase 1 matrix (deferred from PR v0.1: fast-path BPF program + toolchain + verifier-pass test (PR #3) #3 because aya 0.13.1 doesn't wrap `BPF_PROG_TEST_RUN`).
  • Fixes the `EINVAL` failure I saw on Debian 6.1 — bpf_attr trailing-padding bytes from Rust `Default` land in the user buffer as garbage and trip the kernel's `CHECK_ATTR` macro. `mem::zeroed` then field-writes instead.
  • Fixes the `Ipv4Hdr::ihl()` misunderstanding that short-circuited every IPv4 path into `pass_complex_header` (network-types returns bytes, not the raw field).

What landed

BPF side

  • `tests/common/mod.rs` harness: raw bpf() syscall wrapper for `BPF_PROG_TEST_RUN`, synthetic packet builders for Ethernet + IPv4/TCP + IPv6/TCP, stat-map reader.
  • `tests/fixtures.rs`: 21 fixtures covering ARP / IPv4 options, fragments, TTL, allowlist miss / dry-run matched paths / IPv6 ext headers / tagged (VLAN) ingress-parse / jumbo(-ish) payload.
  • `bpf/src/maps.rs`: `vlan_resolve` HashMap (`u32 ifindex → VlanResolve{phys_ifindex, vid}`, 256 entries).
  • `bpf/src/main.rs`: one-tag 802.1Q ingress parse threads `ingress_vid` (as a u16 sentinel — `Option` tripped the verifier with `R5 !read_ok` during argument spills). Dispatch consults `vlan_resolve` on FIB success, picks effective egress ifindex + expected VID, and runs the §4.7 matrix:
    • untagged in/out → no-op
    • untagged → tagged → push (adjust_head -4, memmove MACs, write TPID + TCI)
    • tagged → untagged → pop (memmove MACs, adjust_head +4)
    • tagged X → tagged X → no-op
    • tagged X → tagged Y → rewrite (TCI in place)
  • Push/pop use `core::ptr::copy` per SPEC §4.7's explicit warning — the 6-byte MAC moves overlap with their sources, and the verifier won't catch `copy_nonoverlapping` misuse.

Userspace

  • `linux_impl.rs`: `populate_vlan_resolve` reads `/proc/net/vlan/config` at attach time, `if_nametoindex`es each (subif, parent) pair, and inserts into the map. Missing config (no 8021q module) is not an error.

Bug fixes

  • bpf_attr trailing padding (`TEST_RUN`, `PROG_LOAD`, `MAP_CREATE`): `mem::zeroed` + field writes instead of `#[derive(Default)]` struct-literal. Root cause of the EINVAL-with-empty-log on kernel 6.0+.
  • IPv4 IHL check: `Ipv4Hdr::ihl()` returns bytes (20 for a standard header), not the raw 4-bit IHL field. Compare to 20, not 5.
  • Option verifier: switched `ingress_vid` and `egress_vid` to `u16` with `VLAN_NONE = 0` as sentinel. 802.1Q VID 0 is reserved for priority-only tagging anyway.

What's NOT covered

  • Byte-level §4.7 push/pop/rewrite output verification. Those paths sit behind a `bpf_fib_lookup` SUCCESS verdict that needs real routes in the running netns — `bpf_prog_test_run` can't fake that. Byte-level tests land with a netns integration harness (follow-up PR).
  • True 9K jumbo frames. Need `BPF_F_TEST_XDP_LIVE_FRAMES` (kernel 5.18+). The 3.6K fixture proves the parse/allowlist path doesn't choke on jumbo-ish sizes; 9K waits on the netns harness too.
  • Re-promoting `prog_type.` and `helper.` feasibility probes to required. The bpf_attr padding fix should make them work on the test VM; I've left the demotion in place pending your re-run with the PR v0.1: VLAN choreography, packet fixtures, kernel 6.0+ probe fix (PR #5) #5 binary (feel free to promote in a follow-up once confirmed).
  • §11.1(a)/(b) and §11.11 empirical probe helpers. Those are EFG-specific runbook tools; deferring to when you have interactive EFG time.

CI

All 5 jobs green. The sudo step now runs 23 real kernel tests (verifier + veth attach + 21 fixtures) through aya + `BPF_PROG_TEST_RUN`. No more silent-stub false-positives.

lunarthegrey and others added 9 commits April 20, 2026 09:31
Covers the no-FIB §9 Phase 1 cases that were deferred from PR #3
because aya 0.13.1 doesn't wrap BPF_PROG_TEST_RUN for XDP. The raw
syscall harness lives in `tests/common/mod.rs` — it builds synthetic
Ethernet + IPv4/IPv6 + TCP packets, runs them through the loaded
BPF program via `bpf(BPF_PROG_TEST_RUN)`, and lets the test assert
on both the verdict and per-counter deltas.

Fixtures in `tests/fixtures.rs` currently cover:

- ARP (non-IP) → XDP_PASS + pass_not_ip
- IPv4 options (IHL>5) → pass_complex_header
- IPv4 MF fragment → pass_fragment
- IPv4 non-zero fragment offset → pass_fragment
- IPv4 TTL=1 and TTL=0 → pass_low_ttl
- IPv4 allowlist miss → XDP_PASS without bumping matched_v4
- IPv6 fragment extension header → pass_complex_header
- IPv6 Hop-by-Hop extension → pass_complex_header
- IPv6 hop_limit=1 → pass_low_ttl
- IPv4 src-only match dry-run → matched_v4 + matched_src_only + fwd_dry_run
- IPv4 dst-only match dry-run → matched_dst_only
- IPv4 both-match dry-run → matched_both
- IPv6 src-only match dry-run → matched_v6 + matched_src_only
- rx_total bumps for every packet regardless of verdict

FIB-return verdicts (SUCCESS/NO_NEIGH/DROP/FRAG_NEEDED/not-in-devmap)
still need a netns with real routes + a populated redirect_devmap;
those are a separate slice.

All fixtures are `#[ignore]`-gated (need CAP_BPF). CI's existing
sudo step (`cargo test --tests -- --ignored`) picks them up
automatically — no workflow changes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Root cause of the BPF_PROG_TEST_RUN EINVAL (and the earlier raw
prog_load probe EINVAL on the Debian 6.1 test VM):

Rust's `#[derive(Default)]` + struct-literal init writes every named
field but says nothing about the struct's trailing padding. For our
bpf_attr variants the padding is 4 bytes at the end (needed to hit
8-byte alignment). Those bytes land as whatever the stack had at
allocation time.

The kernel's `CHECK_ATTR` macro validates that bytes past the
command's last field in the user-supplied attr are zero. With dirty
padding, the check fails and the syscall returns EINVAL with no log
— exactly the symptom we saw on the VM.

Switch all three attr builders (TEST_RUN, PROG_LOAD, MAP_CREATE) to
`mem::zeroed()` + individual field writes so the whole buffer
including padding is zero. Fixes the fixtures that run under sudo in
CI, and lets us re-promote the prog_type/helper probes back to
required in a follow-up commit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
My `if ihl != 5` check bumped pass_complex_header for *every* valid
IPv4 packet because network-types' `Ipv4Hdr::ihl()` is documented as
"returns header length in bytes" — for IHL=5 it returns 20, not 5.
The bug short-circuited every downstream IPv4 fixture (matched,
fragment, TTL, dry-run) — they all saw pass_complex_header bumped
instead of the counter they expected.

Compare against 20 (bytes) instead. Every test that asserts on a
specific non-complex-header counter should start working after this.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds the full VLAN choreography. Inbound: parse one 802.1Q tag if
present, record ingress VID, thread through to IP handlers with the
adjusted ip_offset. Outbound: consult vlan_resolve on fib.ifindex to
pick an effective egress ifindex + expected VID, then run §4.7's
four-case matrix:

  (ingress=None, egress=None)      no-op
  (ingress=None, egress=Some)      push (+4 headroom)
  (ingress=Some, egress=None)      pop  (-4 headroom)
  (ingress=X,    egress=X)         no-op (same VID pass-through)
  (ingress=X,    egress=Y, X!=Y)   rewrite (TCI in place)

Push/pop use `core::ptr::copy` (true memmove) rather than
`copy_nonoverlapping` — the 6-byte MAC shifts overlap with their
source regions, and as SPEC.md §4.7 warns, `copy_nonoverlapping`
produces wrong bytes on the wire for some VID/MAC byte combinations
while the verifier does *not* catch it at load time.

Bounds re-checked after `bpf_xdp_adjust_head` per the verifier's
invalidation rules. Adds `vlan_resolve` map (HashMap<ifindex,
VlanResolve{phys_ifindex, vid}>) and the `VlanTag` local type for
packed 4-byte 802.1Q parsing. Error paths bump err_vlan + XDP_ABORTED
as SPEC prescribes.

Userspace vlan_resolve population from sysfs lands in the next commit;
until then tests targeting non-VLAN flows continue to work (vlan_resolve
is empty → egress_vid=None for every fib target), and the existing
15 fixtures stay green because untagged in/untagged out is a no-op.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The verifier rejected my Option<u16> handling with R5 !read_ok during
argument spill: Rust's Option<u16> layout passes as two registers
(discriminant + inner), and on the None branch the inner-value
register stays uninitialized. The unconditional stack spill before
the function call trips the verifier.

Switch to a plain u16 with VLAN_NONE=0 as the sentinel (802.1Q VID 0
is reserved for priority-only tags we don't fast-path anyway). One
register across function boundaries → verifier-clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
5 tagged-packet fixtures + 1 jumbo-ish fixture. These exercise the
§4.4 step 2 ingress parse through to the allowlist match and dry-run
short-circuit — if the 802.1Q tag parse is broken, matched_v4/v6 never
bumps and the test fails.

What's covered:
- Tagged IPv4 src-match dry-run → matched_v4 + src_only + fwd_dry_run
- Tagged IPv6 dst-match dry-run → matched_v6 + dst_only
- Tagged IPv4 w/ options → pass_complex_header (after VLAN strip)
- Tagged IPv4 fragment → pass_fragment (after VLAN strip)
- Tagged non-IP inner → pass_not_ip
- Jumbo-ish IPv4 (3600-byte body, under test_run's ~4K cap)

What's NOT covered here (deferred to a netns integration test):
- §4.7 push / pop / rewrite byte-level output verification. Those
  paths sit behind a `bpf_fib_lookup` SUCCESS verdict, which needs
  real routes in the running netns — bpf_prog_test_run can't fake
  that. Byte-level push/pop tests come with the netns harness.
- True 9K jumbo frames (need test_run LIVE_FRAMES mode, kernel 5.18+).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
After attach, read /proc/net/vlan/config and insert (subif_ifindex,
{phys_ifindex, vid}) for every VLAN subif configured on the host. The
BPF program consults this map during its SUCCESS dispatch to decide
push/pop/rewrite on egress (§4.7). Without population, every fib.ifindex
lookup misses → egress treated as untagged (correct default for
physical interfaces; incorrect for tagged subifs).

Missing /proc/net/vlan/config (no 8021q module loaded) is not an
error — we just skip. Empty config is also fine.

Adds a userspace `VlanResolve` struct mirroring the BPF-side layout
(u32/u16/u16 + Pod impl) so aya can byte-copy into the HashMap.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@lunarthegrey lunarthegrey marked this pull request as ready for review April 20, 2026 15:14
@lunarthegrey lunarthegrey changed the title [WIP] v0.1: packet fixtures, VLAN choreography, probe fix (PR #5) v0.1: VLAN choreography, packet fixtures, kernel 6.0+ probe fix (PR #5) Apr 20, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant