
cgroup, systemd: use BPFProgram=device if supported #1795



Draft: wants to merge 1 commit into main


@kolyshkin (Collaborator) commented Jun 20, 2025

This is a proof of concept to take a look at and/or play with.

It appears to work for me (runc's dev.bats tests pass), but it has not been tested much yet. CI may fail.

TODO:

  • support device updates;
  • add tests;
  • add documentation.

Related to #1765.

Summary by Sourcery

Use eBPF programs to apply device cgroup v2 policies and inject them into systemd via the BPFProgram property when supported, while preserving compatibility with legacy device permissions.

New Features:

  • Add support for generating and loading eBPF programs to configure device cgroup v2 rules
  • Integrate BPFProgram property in systemd cgroup management when a BPF filesystem is available

Enhancements:

  • Detect BPF filesystem availability with has_bpf_fs() and conditionally attach or pin eBPF programs
  • Implement fallback logic via can_skip_devices() to gracefully revert to traditional device permissions if eBPF setup fails
  • Propagate cgroup identifier through systemd functions to construct BPF pinning paths
  • Refine libcrun_ebpf_load() to attach programs only when a valid cgroup directory is specified
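For reference, a BPF filesystem check of the kind has_bpf_fs() performs can be sketched with statfs(2) and BPF_FS_MAGIC from <linux/magic.h>. This is a sketch under assumptions, not the code in src/libcrun/ebpf.c: the name has_bpf_fs_at() and its path parameter are hypothetical, added so the check can be pointed at any directory.

```c
#include <linux/magic.h> /* BPF_FS_MAGIC */
#include <stdbool.h>
#include <sys/vfs.h>     /* statfs(2) */

/* Hypothetical stand-in for has_bpf_fs(): returns true only if `path`
 * (normally the bpffs mount point, /sys/fs/bpf) is a mounted bpffs. */
static bool
has_bpf_fs_at (const char *path)
{
  struct statfs sfs;

  if (statfs (path, &sfs) < 0)
    return false; /* missing mount point, EACCES, ... */

  /* f_type is a signed word on some architectures; compare the low 32 bits. */
  return (unsigned int) sfs.f_type == BPF_FS_MAGIC;
}
```

Called as has_bpf_fs_at ("/sys/fs/bpf"), it returns false when bpffs is not mounted there, which is the "not supported" case where the systemd path keeps using the legacy device properties.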

This is a proof of concept to check and/or play with.
It appears to work for me (runc's dev.bats tests work).
CI may fail.

TODO:
 - support device updates;
 - add tests;
 - add documentation.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>

sourcery-ai bot commented Jun 20, 2025

Reviewer's Guide

This PR refactors device cgroup handling to build and load BPF programs when the BPF filesystem is available, and integrates BPFProgram support into the systemd path by pinning and appending a BPFProgram property in lieu of explicit device rules.

Sequence diagram for systemd cgroup device BPF program integration

sequenceDiagram
    participant systemd_integration as Systemd Integration
    participant cgroup_resources as cgroup-resources
    participant ebpf as ebpf
    participant bpf_fs as BPF Filesystem

    systemd_integration->>cgroup_resources: create_dev_bpf(devs, devs_len, err)
    cgroup_resources->>ebpf: bpf_program_new/init/append/complete
    cgroup_resources-->>systemd_integration: bpf_program
    systemd_integration->>ebpf: libcrun_ebpf_load(program, -1, path, err)
    ebpf->>bpf_fs: Pin BPF program to /sys/fs/bpf/crun/<id>
    ebpf-->>systemd_integration: Success/Failure
    systemd_integration->>systemd_integration: sd_bus_message_append("BPFProgram", "device", path)
    Note right of systemd_integration: If BPFProgram is set, skip explicit device rules

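Because the container ID ends up as a file name under /sys/fs/bpf/crun, the pin path is worth building defensively. The following sketch assumes the pin directory is passed in as a string; dev_prog_pin_path() is a hypothetical helper, not a crun function.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: write "<dir>/<id>" into buf.
 * Returns 0 on success, -1 if id is not a single safe path component
 * or if the result would not fit in buf. */
static int
dev_prog_pin_path (char *buf, size_t len, const char *dir, const char *id)
{
  /* The id becomes one file name: no empty names, slashes, "." or "..". */
  if (id[0] == '\0' || strchr (id, '/') != NULL
      || strcmp (id, ".") == 0 || strcmp (id, "..") == 0)
    return -1;

  int n = snprintf (buf, len, "%s/%s", dir, id);
  if (n < 0 || (size_t) n >= len)
    return -1; /* truncated: refuse rather than pin under a shortened name */
  return 0;
}
```

Rejecting IDs with a slash keeps every pin directly inside the crun directory, and reporting truncation avoids two containers silently sharing a shortened name.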
Class diagram for BPF program integration and device cgroup handling

classDiagram
    class bpf_program {
        +bpf_program_new(size_t size)
        +bpf_program_init_dev(program, err)
        +bpf_program_append_dev(program, access, type, major, minor, allow, err)
        +bpf_program_complete_dev(program, err)
    }
    class cgroup_resources {
        +create_dev_bpf(devs, devs_len, err)
        +write_devices_resources_v2_internal(dirfd, devs, devs_len, err)
        +write_devices_resources_v2(dirfd, devs, devs_len, err)
        +can_skip_devices(can_skip, devs, devs_len, err)
    }
    class ebpf {
        +libcrun_ebpf_load(program, dirfd, pin, err)
        +has_bpf_fs()
    }
    bpf_program <.. cgroup_resources : uses
    ebpf <.. cgroup_resources : uses
    cgroup_resources <.. systemd_integration : used by
    class systemd_integration {
        +append_resources(m, is_update, state_dir, resources, cgroup_mode, id, err)
        +enter_systemd_cgroup_scope(resources, cgroup_mode, annotations, state_root, scope, slice, pid, id, can_retry, err)
        +libcrun_cgroup_enter_systemd(args, ...)
        +libcrun_update_resources_systemd(cgroup_status, ...)
    }

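bpf_program_append_dev() takes the same fields as an OCI device rule (type, major, minor, access, allow). To make the intended filtering concrete, here is a small userspace model of how such a rule list decides a single access. It assumes the OCI convention that later entries override earlier ones and that an access matching no rule is denied; the struct and function names are hypothetical, and in the PR the decision is made in the kernel by the generated cgroup device BPF program, not by code like this.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical userspace model of one device rule; field names mirror
 * the bpf_program_append_dev() parameters but are not crun's types. */
struct dev_rule
{
  char type;          /* 'a' (any), 'b' (block) or 'c' (char) */
  long major, minor;  /* -1 acts as a wildcard */
  const char *access; /* subset of "rwm" */
  bool allow;
};

static bool
dev_rule_matches (const struct dev_rule *r, char type, long major, long minor, char acc)
{
  if (r->type != 'a' && r->type != type)
    return false;
  if (r->major != -1 && r->major != major)
    return false;
  if (r->minor != -1 && r->minor != minor)
    return false;
  return acc != '\0' && strchr (r->access, acc) != NULL;
}

/* Walk the rules from last to first so later rules override earlier
 * ones; an access that matches no rule is denied. */
static bool
dev_access_allowed (const struct dev_rule *rules, size_t n, char type,
                    long major, long minor, char acc)
{
  for (size_t i = n; i-- > 0;)
    if (dev_rule_matches (&rules[i], type, major, minor, acc))
      return rules[i].allow;
  return false;
}
```

With a leading deny-all rule { 'a', -1, -1, "rwm", false } followed by { 'c', 1, 3, "rwm", true }, a read of /dev/null (char 1:3) is allowed while a read of block device 8:0 is denied.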
Class diagram for new and modified functions in ebpf and cgroup-resources

classDiagram
    class ebpf {
        +libcrun_ebpf_load(program, dirfd, pin, err)
        +has_bpf_fs()
    }
    class cgroup_resources {
        +create_dev_bpf(devs, devs_len, err)
        +can_skip_devices(can_skip, devs, devs_len, err)
    }

File-Level Changes

Change Details Files
Introduce BPF program creation and refactor device cgroup handlers
  • Add create_dev_bpf() to encapsulate building the BPF device filter program
  • Convert write_devices_resources_v2_internal() to return struct bpf_program*
  • Refactor write_devices_resources_v2() to call create_dev_bpf() and invoke libcrun_ebpf_load()
  • Expose create_dev_bpf() in cgroup-resources.h
src/libcrun/cgroup-resources.c
src/libcrun/cgroup-resources.h
Add BPF filesystem detection and conditional attach logic
  • Define SYS_FS_BPF and CRUN_BPF_DIR constants in ebpf.h
  • Implement has_bpf_fs() in ebpf.c using statfs and BPF_FS_MAGIC
  • Include sys/vfs.h and linux/magic.h headers
  • Modify libcrun_ebpf_load() to only attach when dirfd>=0
src/libcrun/ebpf.c
src/libcrun/ebpf.h
Extend systemd cgroup path to use BPFProgram property when supported
  • Add id parameter and need_append_devices flag to append_resources()
  • If has_bpf_fs() reports a BPF filesystem and no BPFProgram property is already set, create, load and pin a BPF program
  • Append a ("BPFProgram", "a(ss)") sd-bus message instead of device rules when BPF succeeds
  • Update enter_systemd_cgroup_scope() and libcrun_update_resources_systemd() to propagate id
  • Fall back to legacy append_devices() if BPF setup fails or is not supported
src/libcrun/cgroup-systemd.c
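The fallback described in the last group of changes boils down to a small decision. The sketch below models it as a pure function so the branches are easy to see. struct dev_probe, choose_dev_mode() and the enum are hypothetical names, and reading "absent BPFProgram property" as "no BPFProgram was already supplied" is an assumption.

```c
#include <stdbool.h>

/* Hypothetical names: which mechanism the systemd path ends up using. */
enum dev_mode
{
  DEV_USE_BPF_PROGRAM,   /* pin a program, set BPFProgram ("device", <path>) */
  DEV_USE_APPEND_DEVICES /* legacy append_devices() properties */
};

struct dev_probe
{
  bool has_bpf_fs;        /* has_bpf_fs() found a BPF filesystem */
  bool bpfprogram_preset; /* a BPFProgram property was already supplied */
  bool prog_pinned;       /* create_dev_bpf() + libcrun_ebpf_load() succeeded;
                             only meaningful when the two checks above pass */
};

static enum dev_mode
choose_dev_mode (const struct dev_probe *p)
{
  if (!p->has_bpf_fs || p->bpfprogram_preset)
    return DEV_USE_APPEND_DEVICES; /* not supported, or already configured */
  if (!p->prog_pinned)
    return DEV_USE_APPEND_DEVICES; /* BPF setup failed: fall back */
  return DEV_USE_BPF_PROGRAM;
}
```

Keeping the BPF path strictly best effort means a host without bpffs, or a failed load, behaves exactly as before the change.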


Ephemeral COPR build failed. @containers/packit-build please check.


TMT tests failed. @containers/packit-build please check.


mkdir(CRUN_BPF_DIR, 0700); // Best effort.

ret = append_paths (&path, err, CRUN_BPF_DIR, id, NULL);
Member


It really stinks that systemd requires us to install a program instead of just reading a file, but this is definitely an improvement over the current status.

Is what we have in place enough to make sure it is deleted when the container is deleted?

Collaborator Author


> it really stinks that systemd requires us to install a program instead of just reading a file

Yep.

> What we have in place is enough to make sure it is deleted when the container is deleted?

We don't have anything yet, but it is easy to add; I just haven't done it yet. In general, this is still very raw code.

I was also thinking about using a shared program for the common case of default rules, but maybe it's too much hassle for no sizable benefit.
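On the cleanup question: a pinned BPF object is removed with a plain unlink(2) of its bpffs path, and the kernel frees the program once no pin, open file descriptor or cgroup attachment still references it. Below is a sketch of a delete-time helper. unpin_dev_prog() is hypothetical, and treating ENOENT as success (so that delete stays idempotent) is a design choice, not something the PR does yet.

```c
#include <errno.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: remove the pinned device program "<dir>/<id>".
 * Returns 0 if it was removed or was already gone, -errno otherwise. */
static int
unpin_dev_prog (const char *dir, const char *id)
{
  char path[4096];
  int n = snprintf (path, sizeof path, "%s/%s", dir, id);
  if (n < 0 || (size_t) n >= sizeof path)
    return -ENAMETOOLONG;

  /* Unpinning is an ordinary unlink of the bpffs file. */
  if (unlink (path) < 0 && errno != ENOENT)
    return -errno; /* report real failures, e.g. EACCES */
  return 0;
}
```

Because a missing pin is not an error, the helper can be called unconditionally from the delete path, including for containers created before pinning existed.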

@kolyshkin
Collaborator Author

Another concern is using the container ID for the program file name. A container ID is only unique within a given state (--root) directory, so collisions are quite possible. We could use a hash of the state directory, the state directory with slashes replaced by underscores (like /sys/fs/bpf/crun/run_crun_$CTID), or just the state directory as is (i.e. /sys/fs/bpf/run/crun/$CTID).

WDYT, @giuseppe?
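The underscore variant can be sketched as follows; flatten_state_name() is a hypothetical helper. The mapping is not injective: /run/crun and /run_crun both flatten to run_crun_<id>, which is one argument for hashing instead.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: "/run/crun" + "ctr1" -> "run_crun_ctr1".
 * Returns 0 on success, -1 if the result does not fit in buf. */
static int
flatten_state_name (char *buf, size_t len, const char *state_dir, const char *id)
{
  while (*state_dir == '/') /* drop leading slashes */
    state_dir++;

  int n = snprintf (buf, len, "%s_%s", state_dir, id);
  if (n < 0 || (size_t) n >= len)
    return -1;

  for (char *c = buf; *c != '\0'; c++) /* remaining slashes become '_' */
    if (*c == '/')
      *c = '_';
  return 0;
}
```

The result also grows with the state directory, so deep --root paths still need a length check, which is the concern raised in the reply.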

@giuseppe
Member

> Another concern is using container ID for the program file name. Container ID is only unique in a given state (--root) directory, so collisions are quite possible. Can use something like hash of a state directory, or a state directory with slashes replaced with underscores (like /sys/fs/bpf/crun/run_crun_$CTID), or just a state directory as is (i.e. /sys/fs/bpf/run/crun/$CTID).

What do you think about using some hash of the runroot? It could be something basic, like adding all the chars to a uint64_t, so we don't have to worry about paths that are too long.

@keszybz do you think there is any hope systemd would accept a path for BPFProgram=?
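A sketch of the hashing idea. sum_hash() is the "add all the chars to a uint64_t" scheme from the comment; it is order-insensitive, so any two runroots that are permutations of each other collide. fnv1a64() is a swapped-in alternative (64-bit FNV-1a) that is still cheap but order-sensitive. hashed_pin_path() and the <hash>_<id> naming are hypothetical, chosen so the pin name has a fixed-length prefix regardless of how long the runroot is.

```c
#include <inttypes.h> /* uint64_t, PRIx64 */
#include <stdio.h>

/* The "add all the chars" scheme: cheap, but anagrams collide. */
static uint64_t
sum_hash (const char *s)
{
  uint64_t h = 0;
  for (; *s != '\0'; s++)
    h += (unsigned char) *s;
  return h;
}

/* Swapped-in alternative: 64-bit FNV-1a. */
static uint64_t
fnv1a64 (const char *s)
{
  uint64_t h = 0xcbf29ce484222325ULL; /* FNV offset basis */
  for (; *s != '\0'; s++)
    {
      h ^= (unsigned char) *s;
      h *= 0x100000001b3ULL; /* FNV prime */
    }
  return h;
}

/* Hypothetical naming: "<bpf_dir>/<16 hex digits>_<id>". */
static int
hashed_pin_path (char *buf, size_t len, const char *bpf_dir,
                 const char *state_dir, const char *id)
{
  int n = snprintf (buf, len, "%s/%016" PRIx64 "_%s", bpf_dir,
                    fnv1a64 (state_dir), id);
  return (n < 0 || (size_t) n >= len) ? -1 : 0;
}
```

A 64-bit hash keeps the name short, but collisions are still possible in principle, so keeping the ID in the name (rather than hashing runroot and ID together) at least makes any clash require two runroots that hash identically.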
