LFX Mentorship — Summer 2026: Reproducible benchmarking suite for urunc #759

ananos (Maintainer) · 2026-06-10T15:36:01Z

Mentee: @jim-junior (Beingana Jim Junior)
Mentors: @ananos (Tassos), @cmainas (Babis), @amallikopoulou (Anastasia)
Term: June 8 – August 29, 2026 (12 weeks) · Midterm evaluation: end of Week 6 (~July 21) · Final evaluation: end of Week 12 (~August 29)
Sync: Weekly call, Mondays · Ad-hoc deep-dive sessions as needed · Async via this Discussion + CNCF Slack (#urunc)

This is a living document. We will finalize the plan at the Week 2 sync (Monday, June 15) and append status updates, decisions, and issues as comments below.

Background & motivation

urunc is a lightweight sandboxed container runtime; performance is a core project promise. We currently lack a robust, reproducible evaluation suite: running the same measurement on the same machine on different days does not yield consistent results (storage/devmapper snapshot provisioning, machine I/O load, and orchestrator layers such as Kubernetes scheduling all introduce variance).

The goal of this mentorship is a benchmarking suite that (a) measures urunc against other runtimes fairly (apples-to-apples), (b) is reproducible, i.e. consists of published scripts that anyone can run to obtain the same results/ratios, and (c) eventually runs as a scheduled GitHub Action to catch performance regressions as the codebase evolves.

Runtimes under test: urunc (focus), runc (baseline), Kata Containers, gVisor. Scripts must be runtime-agnostic.
Notes: spawn time and density are already benchmarked; prioritize network, storage/I/O, and memory. cgroups support in urunc is expected to land in the coming weeks and will simplify resource accounting. The memory methodology must distinguish guest-assigned / VMM / host / physical memory (RSS alone is misleading: e.g., a Kata guest assigned 256 MB showed ~40 MB RSS on Raspberry Pi runs).
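To make the RSS caveat concrete, here is a minimal sketch that parses the text of /proc/<pid>/smaps_rollup (a Linux procfs file) for a VMM process and reports it next to the memory assigned to the guest. The function names are hypothetical and the sample values are made up to mirror the 256 MB / ~40 MB case; this is an illustration, not the accounting method the suite will settle on.

```python
def parse_smaps_rollup(text: str) -> dict[str, int]:
    """Parse 'Key:   <n> kB' lines from smaps_rollup text into KiB values."""
    fields = {}
    for line in text.splitlines():
        parts = line.split()
        # Data lines look like "Rss:   40960 kB"; the header line does not.
        if len(parts) == 3 and parts[0].endswith(":") and parts[2] == "kB":
            fields[parts[0][:-1]] = int(parts[1])
    return fields


def memory_report(assigned_mib: int, smaps_text: str) -> dict[str, float]:
    """Report guest-assigned memory next to what the host actually backs."""
    usage = parse_smaps_rollup(smaps_text)
    rss_mib = usage.get("Rss", 0) / 1024
    pss_mib = usage.get("Pss", 0) / 1024
    return {
        "guest_assigned_mib": float(assigned_mib),
        "vmm_rss_mib": rss_mib,
        "vmm_pss_mib": pss_mib,
        # Gap between what the guest was promised and what is resident.
        "assigned_minus_rss_mib": max(assigned_mib - rss_mib, 0.0),
    }


# Made-up sample in the smaps_rollup layout (not a real measurement).
SAMPLE = """\
55d0a0000000-7ffd00000000 ---p 00000000 00:00 0    [rollup]
Rss:               40960 kB
Pss:               38912 kB
"""

report = memory_report(256, SAMPLE)
for key, value in report.items():
    print(f"{key}: {value:.1f}")
```

Reporting all four figures side by side keeps the comparison honest across runtimes: a runtime with a small RSS may still reserve far more guest memory than it touches.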

Goal

A reproducible, runtime-agnostic benchmark/regression suite for urunc vs runc/Kata/gVisor — kick off the scripts, get results, compare against previous runs — merged in the urunc-dev org and wired into CI as a scheduled action.

Metrics matrix

Area	Metrics	Tooling (candidate)
Lifecycle	OCI create/start/delete latency, app/HTTP readiness	ctr/crictl + timestamps
Memory	Guest-assigned vs VMM vs host overhead (not RSS-only)	cgroups, smem/procfs, per-runtime accounting
CPU	Sandbox overhead vs native	stress-ng / app-level workloads
Storage	Throughput + IOPS (reads/writes)	fio
Network	Throughput, latency, RTT, per-hypervisor (QEMU/KVM, Firecracker) virtio path	iperf3, netperf
Tail/stress	p99 behavior, constrained guest/host resources, spikes	constrained cgroup runs
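For the storage row, one possible way to turn fio output into the throughput and IOPS figures is sketched below. It assumes fio is run with `--output-format=json` and that each job entry carries `read`/`write` objects with `iops` and `bw_bytes` fields, as recent fio 3.x releases emit; please verify the field names against the fio version on the benchmark machine. `summarize_fio` and the sample document are hypothetical.

```python
import json


def summarize_fio(fio_json: str) -> dict[str, float]:
    """Sum IOPS and bandwidth over all jobs in a fio JSON report."""
    doc = json.loads(fio_json)
    totals = {
        "read_iops": 0.0,
        "write_iops": 0.0,
        "read_mib_s": 0.0,
        "write_mib_s": 0.0,
    }
    for job in doc.get("jobs", []):
        for direction in ("read", "write"):
            stats = job.get(direction, {})
            totals[f"{direction}_iops"] += stats.get("iops", 0.0)
            # bw_bytes is assumed to be bytes per second.
            totals[f"{direction}_mib_s"] += stats.get("bw_bytes", 0) / (1024 * 1024)
    return totals


# Hand-written sample in the assumed layout (not a real measurement).
SAMPLE = json.dumps({
    "jobs": [
        {"jobname": "randread-0",
         "read": {"iops": 1000.5, "bw_bytes": 104857600},
         "write": {"iops": 0.0, "bw_bytes": 0}},
        {"jobname": "randread-1",
         "read": {"iops": 999.5, "bw_bytes": 104857600},
         "write": {"iops": 0.0, "bw_bytes": 0}},
    ]
})

summary = summarize_fio(SAMPLE)
print(summary)
```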

Plan (by week)

Phase 0 — Onboarding & environment (Weeks 1–2, Jun 8–21)

Kickoff call (Jun 8); proposal walkthrough
Join CNCF Slack (#urunc)
Set up & run workloads on runc, Kata, gVisor (as already done with urunc); record setup steps/configs -> this feeds the apples-to-apples methodology
Experiment with urunc + Linux guests
Define test objectives & environment; Week 2 sync (Jun 15): finalize this plan (partly done, will be finished next week)

Phase 1 — Methodology & harness skeleton (Weeks 3–5, Jun 22–Jul 12)

Write the methodology doc (as a comment here): workloads, metrics, fairness rules (equal resources/configs per runtime), variance handling (warm/cold runs, repetitions, reporting ratios not just raw numbers)
Build the harness skeleton: one entry point, per-runtime drivers (runc/urunc/Kata/gVisor), structured results output (JSON/CSV), environment capture (kernel, versions, hardware)
Implement first two metric groups end-to-end: lifecycle latency and storage (fio)
Mentors: provide dedicated benchmarking machine; share previous evaluation material
Reproducibility check: repeated runs on the dedicated machine across different days; quantify variance
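As a starting point for discussion, the harness skeleton described in this phase (one entry point, per-runtime drivers, structured output, environment capture) could look roughly like the sketch below. Everything here is an assumption: the `Result` layout, the `run_suite` name, and the containerd runtime handler strings, which should be checked against the actual containerd configuration. The driver is a stand-in that returns a fixed value instead of launching containers.

```python
import json
import platform
import time
from dataclasses import asdict, dataclass
from typing import Callable


@dataclass
class Result:
    runtime: str
    metric: str
    value: float
    unit: str


# Assumed containerd runtime handler names; verify before use.
RUNTIMES = {
    "runc": "io.containerd.runc.v2",
    "urunc": "io.containerd.urunc.v2",
    "kata": "io.containerd.kata.v2",
    "gvisor": "io.containerd.runsc.v1",
}

# A driver takes a runtime handler and returns one measurement in seconds.
Driver = Callable[[str], float]


def capture_environment() -> dict[str, str]:
    """Record host details so that runs can be compared later."""
    return {
        "os": platform.system(),
        "kernel": platform.release(),
        "arch": platform.machine(),
        "python": platform.python_version(),
    }


def run_suite(measure: Driver, repetitions: int = 3) -> dict:
    """Single entry point: run the same measurement on every runtime."""
    results = []
    for name, handler in RUNTIMES.items():
        for _ in range(repetitions):
            value = measure(handler)
            results.append(asdict(Result(name, "start_latency", value, "s")))
    return {
        "timestamp": time.time(),
        "environment": capture_environment(),
        "results": results,
    }


def fake_measure(handler: str) -> float:
    """Placeholder driver; a real one would start a container and time it."""
    return 0.1


report = run_suite(fake_measure, repetitions=2)
report_json = json.dumps(report, indent=2)
```

Keeping the driver as a plain callable means the measurement loop, the output format, and the environment capture stay identical for all four runtimes, which is the core of the fairness rule.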

Phase 2 — Midterm milestone (Week 6, Jul 13–19)

🎯 Concrete milestone for the midterm evaluation (end of Week 6, ~Jul 21):

A runnable, reproducible benchmark harness covering at least lifecycle + storage (and network if ready) across all four runtimes; single-command kickoff, structured results, with a variance report demonstrating run-to-run consistency (stable ratios), published as a draft PR/repo in urunc-dev.

Acceptance criteria:

./run-bench.sh (or equivalent) executes the suite unattended across runc, urunc, Kata, gVisor
Results land in a structured format with environment metadata; a comparison report (urunc vs others) is generated
Variance across ≥3 runs on different days is quantified and within agreed bounds for ratios
Methodology doc reviewed by mentors; draft PR open
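The variance criterion can be sketched as follows: compute one urunc/runc ratio per day from the medians, then take the coefficient of variation of those ratios across days. The numbers are made up, and the 10% bound is only a placeholder, since the agreed bounds are still to be decided.

```python
import statistics


def daily_ratio(target: list[float], baseline: list[float]) -> float:
    """Ratio of medians for one day (median damps single outliers)."""
    return statistics.median(target) / statistics.median(baseline)


def coefficient_of_variation(values: list[float]) -> float:
    """Sample standard deviation divided by the mean."""
    return statistics.stdev(values) / statistics.mean(values)


# Made-up start latencies in seconds; absolute values drift from day to day.
runs = {
    "day1": {"urunc": [0.19, 0.20, 0.22], "runc": [0.10, 0.09, 0.11]},
    "day2": {"urunc": [0.40, 0.42, 0.45], "runc": [0.20, 0.19, 0.21]},
    "day3": {"urunc": [0.55, 0.57, 0.60], "runc": [0.30, 0.29, 0.31]},
}

ratios = [daily_ratio(day["urunc"], day["runc"]) for day in runs.values()]
cv = coefficient_of_variation(ratios)

PLACEHOLDER_BOUND = 0.10  # hypothetical; replace with the agreed bound
within_bounds = cv <= PLACEHOLDER_BOUND

print("ratios:", [round(r, 2) for r in ratios])
print(f"CV of ratios: {cv:.3f}, within bounds: {within_bounds}")
```

In this sample the raw urunc latency nearly triples between day 1 and day 3, yet the ratio to runc stays close to 2, which is the kind of stability the variance report is meant to demonstrate.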

Phase 3 — Full metric coverage (Weeks 7–9, Jul 20–Aug 9)

Add network benchmarks (throughput/latency/RTT, per hypervisor)
Add memory methodology (guest/VMM/host split; leverage cgroups once merged)
Add CPU + tail/stress scenarios (p99, constrained resources)
Validate on a cloud-provider VM (e.g. an AWS instance) for public reproducibility of final numbers
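For the tail/stress scenarios, a small sketch of a p99 computation using the nearest-rank method is shown below. The method choice and the sample latencies are assumptions for illustration; the suite may well end up using a different percentile definition or a library routine.

```python
import math
import statistics


def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile: smallest value with at least p% at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[rank - 1]


# Made-up request latencies in ms: 98 fast requests and 2 spikes.
latencies = [10.0] * 98 + [250.0] * 2

p50 = percentile(latencies, 50)
p99 = percentile(latencies, 99)
mean = statistics.mean(latencies)

print(f"mean={mean:.1f} ms, p50={p50:.1f} ms, p99={p99:.1f} ms")
```

The mean and median barely move when two requests spike, while p99 exposes them, which is why the tail metrics are reported separately.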

Phase 4 — CI integration & wrap-up (Weeks 10–12, Aug 10–29)

Wire the suite as a scheduled GitHub Action with regression detection against previous runs
Final results pass + write-up (blog-post style summary of findings)
Merge in urunc-dev; final evaluation + mentee feedback/blog submission (end of Week 12)
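Regression detection against previous runs could be as simple as the sketch below: compare each metric with its value from the last stored run and flag changes beyond a tolerance in the "worse" direction. The metric names, the 10% tolerance, and the sample values are all hypothetical.

```python
def find_regressions(
    previous: dict[str, float],
    current: dict[str, float],
    tolerance: float = 0.10,
    higher_is_better: frozenset = frozenset(),
) -> list[tuple[str, float, float, float]]:
    """Return (metric, old, new, relative_change) for metrics that got worse."""
    regressions = []
    for metric, old in previous.items():
        new = current.get(metric)
        if new is None or old == 0:
            continue  # nothing to compare against
        change = (new - old) / old
        # For throughput-style metrics a drop is bad; for latency a rise is bad.
        worse_by = -change if metric in higher_is_better else change
        if worse_by > tolerance:
            regressions.append((metric, old, new, change))
    return regressions


# Made-up values standing in for two stored result files.
previous = {"start_latency_ms": 100.0, "fio_read_iops": 20000.0, "net_gbps": 9.0}
current = {"start_latency_ms": 125.0, "fio_read_iops": 19500.0, "net_gbps": 7.2}

found = find_regressions(
    previous,
    current,
    higher_is_better=frozenset({"fio_read_iops", "net_gbps"}),
)
for metric, old, new, change in found:
    print(f"REGRESSION {metric}: {old} -> {new} ({change:+.1%})")
```

In a scheduled GitHub Action, the script would exit with a non-zero status when the list is non-empty so that the workflow run is marked as failed.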

Stretch goals (time permitting)

Paravirtualized sandboxing measurements (AntGroup/Alibaba custom KVM module — sandboxes without nested virtualization, relevant for cloud providers)
Expanded corner-case coverage beyond the reproducible baseline

References

Previous urunc evaluation blog post (startup latency): https://nubificus.co.uk/blog/runtime_benchmarking_rpi/

Status updates: please post a short weekly comment (done / in progress / blockers) before each Monday sync.
