
feat: aarch64 tracer port + Linux aarch64 wheel publishing #81

Open
christophergeyer wants to merge 9 commits into main from cg/aarch64-tracer

Conversation

@christophergeyer
Member

Summary

  • Port the ptrace tracer to aarch64: a new target-gated arch module isolates syscall numbers, audit-arch, and register access so roar-tracer compiles and runs on both x86_64 and aarch64. The previous code hardcoded x86_64 register fields (orig_rax, rdi, rsi, etc.) that don't exist on libc::user_regs_struct for aarch64.
  • Make the eBPF attach loop arch-tolerant: probe /sys/kernel/{tracing,debug/tracing}/events/<cat>/<tp>/id before attaching so missing tracepoints (e.g. sys_enter_open/rename/link/dup2/fork/vfork on aarch64) are skipped with a warning instead of aborting the whole tracer.
  • Make the Linux wheel build arch-aware: scripts/build_wheel_with_bins.sh and scripts/sync_packaged_rust_artifacts.py detect host arch (uname/platform.machine()) and pick the matching cargo-zigbuild target. The reusable Build Rust Binaries action grows a target input.
  • Publish Linux aarch64 wheels: publish-pypi.yml and publish-testpypi.yml matrix build-rust-binaries-linux over (ubuntu-latest, x86_64) + (ubuntu-24.04-arm, aarch64) and add 4 aarch64 wheel matrix entries (cp310–cp313). Expected wheel count 12 → 16.
  • Stop the silent wrong-arch-binary trap on sdist: the maturin include no longer ships roar/bin/* in the sdist. Previously the sdist bundled x86_64 ELFs, and aarch64 hosts without a matching wheel fell through to it. The sdist now ships the tracer Rust source instead, so an sdist install with cc + cargo can build the binaries; the README documents the requirement.
  • Harden the wheel verifier: scripts/ci/verify_wheel_contents.py now reads e_machine from each bundled ELF and requires it to match the wheel's platform tag, plus asserts the executable bit on bundled binaries. A wrong-arch or non-exec payload now fails CI.
  • Add a cross-backend Python sandbox at scripts/test_tracers_sandbox.py for local validation: runs a single-process Python read+write under each backend (avoids the known shell-spawned-write race) and asserts the msgpack report saw both files.

Test plan

Verified end-to-end on both arches.

aarch64 (AWS Graviton, Ubuntu 26.04, kernel 7.0.0-1004-aws)

  • cargo build --release: roar-tracer / roar-tracer-preload / roar-tracer-ebpf / roard / libroar_tracer_preload.so — all native aarch64 ELFs
  • cargo test --workspace (debug): all green (95 tests across 23 binaries)
  • scripts/test_tracers_sandbox.py: ptrace + preload (unprivileged), ebpf (sudo) — all OK
  • pytest unit + happy_path: 1382 passed, 3 skipped
  • pytest integration: 95 passed, 3 skipped
  • pytest ebpf (sudo): 12 passed
  • End-to-end `roar run` × 3 backends: each backend records `in:1 out:1` with a real artifact hash
  • scripts/build_wheel_with_bins.sh: produces `roar_cli-0.2.11-cp313-cp313-linux_aarch64.whl` (9.2 MB), all bin/* are aarch64 ELFs, verify_wheel_contents passes
  • Wheel install + verify_installed_distribution.py: clean
  • sdist build: no `roar/bin/` payload, includes `rust/tracers/`, `rust/services/`, `rust/crates/tracer-*/**`

x86_64 (parallel verification)

Same 10 steps, all clean — wheel emits `linux_x86_64` with x86_64 ELFs in bin/, identical pytest counts.

Lint / format

  • ruff check . — clean
  • ruff format --check . — clean (718 files)
  • cargo clippy: only pre-existing warnings (none introduced by this branch)

🤖 Generated with Claude Code

chrisgeyertreqs and others added 8 commits May 9, 2026 13:06
Move syscall numbers, audit-arch, and register access into a new
target-gated `arch` module so the ptrace tracer compiles and runs on
both x86_64 and aarch64. The previous code hardcoded x86_64 register
fields (orig_rax, rax, rdi, rsi, rdx, r10, r8) and AUDIT_ARCH_X86_64;
on aarch64 those don't exist on libc::user_regs_struct and the audit
arch must be AUDIT_ARCH_AARCH64. Syscalls that don't exist on aarch64
(open/rename/link) are mapped to distinct out-of-range sentinels so
shared match arms keyed on them stay compilable but never fire.

Preflight now accepts both x86_64 and aarch64 hosts.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Several syscall tracepoints don't exist on aarch64 because the syscalls
themselves don't exist there: sys_enter_open / sys_enter_rename /
sys_enter_link / sys_enter_dup2 / sys_enter_fork / sys_enter_vfork.
The previous attach loop hard-failed on the first missing tracepoint
with "No such file or directory", so the whole tracer aborted on
aarch64.

Probe `/sys/kernel/{tracing,debug/tracing}/events/<cat>/<tp>/id` before
attaching; if the tracepoint isn't there, log a warning and continue.
This also makes the tracer robust to kernels built without
CONFIG_FTRACE_SYSCALLS for a particular syscall.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
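
The probe described in this commit can be sketched as follows. This is an illustrative Python sketch only: the tracer itself is Rust, and the helper names `tracepoint_exists` and `partition_tracepoints` are hypothetical, not taken from the branch. It checks for the `id` file under either tracefs mount point and splits the wanted tracepoints into those that can be attached and those that should be skipped with a warning.

```python
import os
from pathlib import Path

# The two tracefs mount points named in the commit message.
TRACEFS_ROOTS = (Path("/sys/kernel/tracing"), Path("/sys/kernel/debug/tracing"))


def tracepoint_exists(category, name, roots=TRACEFS_ROOTS):
    """Return True if events/<category>/<name>/id exists under any root.

    os.path.isfile returns False on any OSError, so an unreadable
    debugfs mount is treated the same as a missing tracepoint.
    """
    return any(
        os.path.isfile(root / "events" / category / name / "id") for root in roots
    )


def partition_tracepoints(wanted, roots=TRACEFS_ROOTS):
    """Split (category, name) pairs into (attachable, skipped) lists."""
    attachable, skipped = [], []
    for category, name in wanted:
        if tracepoint_exists(category, name, roots):
            attachable.append((category, name))
        else:
            # The real tracer logs a warning here and continues.
            skipped.append((category, name))
    return attachable, skipped
```

On an aarch64 host, a pair such as `("syscalls", "sys_enter_open")` would be expected to land in the skipped list rather than abort the attach loop.
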
scripts/test_tracers_sandbox.py runs a single-process Python workload
(read input.txt -> write output.txt) under each tracer backend and
checks the resulting msgpack report contains both files. Useful for
local validation after a build, especially when porting the tracers
to a new arch.

Uses a Python workload rather than the more obvious `bash -c "echo > x"`
because there's a known race where short-lived shell-spawned writes can
complete before the tracer observes them; an in-process read+write
avoids that flake.

Backends that aren't available (binary not built, eBPF without CAP_BPF,
etc.) are reported as SKIP rather than FAIL so the script is meaningful
to run unprivileged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

scripts/build_wheel_with_bins.sh and scripts/sync_packaged_rust_artifacts.py
both hardcoded x86_64-unknown-linux-gnu.2.17 as the cargo-zigbuild target.
On aarch64 hosts that produced wrong-arch (x86_64) ELFs in roar/bin and a
silently-broken wheel.

Detect host arch (uname -m / platform.machine()) and pick the matching
target triple (x86_64 or aarch64), keeping the manylinux2014 glibc 2.17
floor. The reusable Build Rust Binaries GitHub action grows a `target`
input so callers can pass either x86_64 or aarch64 explicitly.

Also runs ruff format on the touched scripts.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
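
A minimal sketch of the host-arch detection, assuming the aarch64 target follows the same naming pattern as the x86_64 one quoted above. The function name `zigbuild_target` and the alias table are hypothetical; the actual scripts may structure this differently.

```python
import platform

# manylinux2014 glibc floor, appended as a cargo-zigbuild version suffix.
GLIBC_FLOOR = "2.17"

# Normalise the spellings platform.machine() may report on various hosts.
_ARCH_ALIASES = {
    "x86_64": "x86_64",
    "amd64": "x86_64",
    "aarch64": "aarch64",
    "arm64": "aarch64",
}


def zigbuild_target(machine=None):
    """Map a machine name (default: this host) to a cargo-zigbuild target."""
    machine = (machine or platform.machine()).lower()
    try:
        arch = _ARCH_ALIASES[machine]
    except KeyError:
        # Fail loudly instead of silently building for the wrong arch.
        raise RuntimeError(f"unsupported host architecture: {machine!r}") from None
    return f"{arch}-unknown-linux-gnu.{GLIBC_FLOOR}"
```

Raising on an unknown architecture is a deliberate choice in this sketch: falling back to x86_64 is exactly the silent wrong-arch behavior this commit removes.
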
The maturin include directive shipped pre-built tracer binaries inside
the sdist, so sdist installs on a non-x86_64 host got x86_64 ELFs that
failed at exec time with "Exec format error" — silent on the install
side, broken on first roar run. Drop the sdist entry; the wheel still
includes roar/bin/* via the wheel-only entry. sdist users build the
binaries from the rust/ source tree.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

build-rust-binaries-linux is now a matrix over (ubuntu-latest, x86_64) +
(ubuntu-24.04-arm, aarch64). The wheel matrix gains 4 aarch64 entries
(cp310..cp313 on ubuntu-24.04-arm). The publish job's expected wheel
count rises from 12 to 16 and now asserts both Linux x86_64 and
aarch64 wheels are present.

build-sdist no longer copies binaries into roar/bin since the sdist
include excludes them — see the prior commit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
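
The publish-side assertion could look roughly like the sketch below. It is not the workflow's actual check (which lives in the YAML and is not shown here); `platform_tag` and `check_wheel_set` are hypothetical helpers. The sketch relies only on the standard wheel filename layout, where the platform tag is the last dash-separated field.

```python
def platform_tag(wheel_name):
    """Return the platform tag, the last dash-separated field of a wheel name."""
    if not wheel_name.endswith(".whl"):
        raise ValueError(f"not a wheel filename: {wheel_name!r}")
    return wheel_name[: -len(".whl")].rsplit("-", 1)[-1]


def check_wheel_set(wheel_names, expected_count=16,
                    required_arches=("x86_64", "aarch64")):
    """Return a list of problems; an empty list means the set looks complete."""
    problems = []
    if len(wheel_names) != expected_count:
        problems.append(
            f"expected {expected_count} wheels, found {len(wheel_names)}"
        )
    # Covers plain linux_* tags as well as manylinux_* tags.
    linux_tags = [t for t in map(platform_tag, wheel_names) if "linux" in t]
    for arch in required_arches:
        if not any(tag.endswith("_" + arch) for tag in linux_tags):
            problems.append(f"no Linux {arch} wheel present")
    return problems
```

Checking per-arch presence in addition to the total count matters because a count of 16 alone would not catch a matrix that produced duplicate x86_64 wheels and no aarch64 ones.
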
…table

The aarch64 install regression was an sdist that shipped x86_64 ELFs
with no exec bit. The wheel-side equivalent would be just as silently
broken: a Linux wheel with the wrong arch in roar/bin would install
fine but fail at exec. Add two assertions to verify_wheel_contents.py:

  - read e_machine from each bundled ELF and require it to match the
    wheel's platform tag (x86_64 wheels → 0x3E, aarch64 wheels → 0xB7)
  - require the executable bit on bundled binaries (libroar*.so are
    exempted)

Both assertions fire as part of the existing "Verify wheel contents"
step in CI, so a wrong-arch or non-exec payload fails the publish
pipeline instead of shipping.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
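
For reference, the two assertions can be sketched as below. This is a hedged illustration rather than the contents of verify_wheel_contents.py: the helper names are hypothetical, and the caller is assumed to supply the file bytes and Unix mode (for a wheel, `ZipInfo.external_attr >> 16` carries the mode). In the ELF header, `e_machine` is a 16-bit field at byte offset 18, and byte 5 (`EI_DATA`) gives the byte order.

```python
import stat
import struct

# e_machine values from the commit message: EM_X86_64 and EM_AARCH64.
EXPECTED_E_MACHINE = {"x86_64": 0x3E, "aarch64": 0xB7}


def read_e_machine(data):
    """Return e_machine from the leading bytes of an ELF file."""
    if len(data) < 20 or data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    # EI_DATA: 1 = little-endian, 2 = big-endian.
    endian = "<" if data[5] == 1 else ">"
    (e_machine,) = struct.unpack_from(endian + "H", data, 18)
    return e_machine


def check_binary(data, mode, wheel_arch, is_shared_lib=False):
    """Return a list of problems for one bundled binary (empty means OK)."""
    problems = []
    expected = EXPECTED_E_MACHINE[wheel_arch]
    actual = read_e_machine(data)
    if actual != expected:
        problems.append(
            f"e_machine {actual:#x} does not match {wheel_arch} ({expected:#x})"
        )
    # Shared libraries (libroar*.so in the commit) are exempt from the exec bit.
    if not is_shared_lib and not mode & stat.S_IXUSR:
        problems.append("missing executable bit")
    return problems
```
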
The previous commit dropped pre-built binaries from the sdist, but cargo's
auto-included package list (anchored at artifact-hash-py) doesn't reach
the tracer/proxy crates, so sdist users had no way to build the binaries
either. Add explicit maturin sdist includes for rust/tracers/**,
rust/services/**, rust/crates/tracer-*/**, and the artifact sync script.

Also documents the sdist install path in the README: when a matching
wheel isn't available, the install requires a C toolchain and Rust to
compile tracers — replacing what was previously a silent broken install.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The CI workflow's `pull_request: branches: [main]` filter meant only
PRs whose base is `main` triggered CI. Stacked PRs (cg/foo → cg/bar)
landed blind: PRs #83–#87 in the current ptrace/eBPF/preload stack
all reported "no checks reported on the branch" because each one
targets a feature-branch base.

Drop the branch filter on `pull_request` so CI fires for every PR
regardless of base. The `push: branches: [main]` constraint stays —
we don't want to burn CI minutes on every feature-branch push, only
on PR diffs and main.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>