AF_XDP: Provide synchronisation between processes using XSKs on the same interface #93
Comments
Just a note that we probably need to resolve this before libbpf 1.0. It seems downstream users have started relying on the bpf_link approach in libbpf, so things like this DPDK commit will break when moving to libxdp-based XSK: @magnus-karlsson any plans to work on this?
I guess you are referring to the missing "bpf_link like" support in libxdp? If so, we have it in our backlog but should probably bump it up towards the top. I will dig through my old mails on why the libbpf approach could not be ported over as-is to libxdp. I have forgotten why.
Ciara, the owner of the AF_XDP PMD in DPDK, also has a task for DPDK 22.02 or 22.05 to move it over from libbpf to libxdp. Before that, this needs to be in place for sure.
Great. Briefly, we can't use the libbpf approach because we're pinning all the bpf_links to build the multi-prog dispatcher, so even if we did pass the link fd to the application, closing that fd wouldn't actually detach anything... |
The way it works in libbpf from the user's point of view is the following: Under the hood, xsk_socket__create(xsk1) creates a bpf_link and the creation of xsk2 takes a reference to the same, thus it needs to be closed twice before the XDP program is detached. Maciej can correct me if wrong. In libxdp, the creation of xsk1 triggers an xdp_program__attach() and the creation of xsk2 triggers an xdp_program__clone(). If I understand the code correctly, and I might not, there is no reference taken in xdp_program__clone() to the bpf_link created by xdp_program__attach(). Would taking a reference to the bpf_link when cloning be a solution? From the xsk point of view, this would be elegant. Closing the clone or the original just decreases the refcount of the bpf_link. But introducing this in xdp_program__clone might have other side effects as it is used internally in libxdp.c.
But we pin the link, so closing the last entry is not going to remove the XDP program. And we have to pin it to make it available for someone else who might want to attach a second BPF program before or after the XSK default prog. However, we already synchronise the creation and removal of programs using flock() on the pin directory (see xdp_lock_{acquire,release}()). So we can just make the XSK code do the socket map manipulation while holding that lock, and then simply remove the default program if it's removing the last socket from the map. At least I think this would be sufficient? |
Does the xdp_lock_* require bpffs support compiled in the kernel? |
Yup, the whole libxdp dispatcher logic depends on bpffs |
OK. So then we need two ways to deal with this. One when the dispatcher can and is loaded and one when it is not. It would be nice to come up with a single implementation, but maybe not possible. |
Right, I don't think we can use a single implementation, unfortunately. We already have a compatibility check in |
Fixed with the merging of #203 (including support when bpffs is not mounted). |
With the addition of the AF_XDP code we will be attaching XDP programs on behalf of the socket owner to redirect the packets. We should make sure this is synchronised so multiple processes can attach sockets to the same interface without stepping on each other's toes.
Since we already have a synchronisation point when building the dispatcher, this could likely be reused.
See also this upstream discussion about doing this in libbpf using bpf_link (which is an approach that will not work with multi-prog dispatchers today, sadly):
https://lore.kernel.org/bpf/3a625c8b-6dc0-3933-25e5-f747197ae1f4@intel.com/T/#t