RDMA sniffing support for pcap #585

Merged
merged 1 commit into from
Aug 31, 2017

rolandd
Contributor

@rolandd rolandd commented May 18, 2017

Implement capture support for offloaded RDMA traffic. This uses the RDMA
verbs "flow steering" interface, which is available in the Linux kernel
since version 3.12. The userspace interface is ibv_create_flow() - so
building this support in pcap adds a new dependency on libibverbs.

I added a new "rdmasniff" pcap module, which exposes RDMA devices under an
interface name equal to their libibverbs name. The module uses the RDMA
verbs interface to create a receive queue with a flow steering rule that
gets a copy of all packets, even offloaded packets generated by or consumed
by the hardware.

I'm definitely not an expert on pcap, so please point out all the places
where I misused APIs or something could be improved.

@rolandd rolandd force-pushed the master branch 3 times, most recently from 45f98f4 to 7473a0d Compare May 19, 2017 00:35
@rolandd
Contributor Author

rolandd commented May 19, 2017

The Travis CI automated builds show that my changes don't regress - but because the Linux build environments are old (Ubuntu 12.04 includes libibverbs 1.1.5), the configure script does not enable my new module, and so the C code isn't compiled.

configure.ac Outdated

if test "xxx_only" = yes; then
# User requested something-else-only pcap, so they don't
# want D-Bus support.
Member

Presumably you mean "RDMA capture support" rather than "D-Bus support" here.

Contributor Author

Yes, you can guess where I copied and pasted that code snippet from :)

Fixed in the updated commit.

pcap-rdmasniff.c Outdated
if (strlen(dev_list[i]->name) == namelen &&
!strncmp(device, dev_list[i]->name, namelen)) {
p = pcap_create_common(ebuf, sizeof (struct pcap_rdmasniff));
if (p) {
Member

If p is NULL, you should just return NULL, with *is_ours set to 1 - pcap_create_common() failing isn't "this isn't an RDMA capture device", it's "this is an RDMA capture device, but something went wrong trying to create the pcap_t for it".

Contributor Author

Fixed, thanks for clarifying the pcap interface for me.
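
For readers unfamiliar with the convention discussed in this thread, here is a minimal sketch of the "is_ours" contract. It is not the code from the pull request: `struct fake_pcap`, `fake_create_common()`, `simulate_alloc_failure` and `sketch_create()` are hypothetical stand-ins invented for illustration, and the sketch matches whole names only, ignoring any port suffix.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for pcap_t; not libpcap's real definition. */
struct fake_pcap { int unused; };

/* Illustration-only switch that makes the allocator below fail. */
static int simulate_alloc_failure = 0;

/* Hypothetical stand-in for pcap_create_common(). */
static struct fake_pcap *fake_create_common(void)
{
    if (simulate_alloc_failure)
        return NULL;
    return calloc(1, sizeof(struct fake_pcap));
}

/*
 * Sketch of a module "create" routine.
 * *is_ours says whether the name belongs to this module; the return
 * value says whether a handle could be allocated. The two are
 * independent, so a NULL return with *is_ours == 1 means "our device,
 * but creating the handle failed".
 */
static struct fake_pcap *sketch_create(const char *device,
                                       const char *const *known,
                                       size_t nknown, int *is_ours)
{
    size_t i;

    *is_ours = 0;
    for (i = 0; i < nknown; i++) {
        if (strcmp(device, known[i]) == 0) {
            *is_ours = 1;                /* the name matched */
            return fake_create_common(); /* may be NULL; is_ours stays 1 */
        }
    }
    return NULL; /* not ours: another module may claim the name */
}
```

With this shape, the caller can tell "try the next module" (NULL with `*is_ours == 0`) apart from "stop and report the error" (NULL with `*is_ours == 1`).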

}

for (i = 0; i < numdev; ++i) {
if (!add_dev(devlistp, dev_list[i]->name, 0, "RDMA sniffer", err_str)) {
Member

Are all the RDMA sniffer devices the same, or do different devices have different purposes? If they have different purposes, the description string should probably indicate the purpose of the device in question.

Contributor Author

The difference in devices is analogous to the differences between network interfaces ("ethX" and "ethY"). For example, on my system I see:

# tcpdump --list-interfaces | grep RDMA
11.mlx5_3 (RDMA sniffer)
12.mlx5_2 (RDMA sniffer)
13.mlx5_1 (RDMA sniffer)
14.mlx5_0 (RDMA sniffer)

because I have two 2-port adapters installed - each device corresponds to capturing packets on one port of an adapter.

Member

So the names should probably be more than just "RDMA sniffer", they should include some indication of the port on which they'll capture traffic.

@guyharris
Member

Other than the questions/comments I added, it looks OK.

@guyharris
Member

A couple of questions:

  1. Does opening an RDMA sniffing device require any special privileges?

  2. Is there a file descriptor for an open RDMA sniffing device on which a select()/poll()/epoll() can be done to wait for packets to arrive?

@rolandd rolandd force-pushed the master branch 2 times, most recently from dc4ac72 to b08dff8 Compare May 19, 2017 13:55
@rolandd
Contributor Author

rolandd commented May 19, 2017

With regard to the questions:

  1. Yes, opening the device requires creating a "raw packet QP," and the kernel checks capable(CAP_NET_RAW) before allowing that.

  2. Yes, there is a file descriptor - I tried to expose it to pcap with the code handle->selectable_fd = priv->channel->fd; in my activate function. But I'm not sure if there's more I'm supposed to be doing.
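
As a rough illustration of what a caller does with such a descriptor, here is a sketch of waiting for readability with poll(). It is an assumption-laden example, not code from the patch: `wait_readable()` is a hypothetical helper, and in a real program the descriptor would come from pcap_get_selectable_fd() rather than being passed in directly.

```c
#include <poll.h>

/*
 * Hypothetical helper: wait until fd is readable.
 * Returns 1 if readable, 0 on timeout, -1 on error or when the
 * descriptor reports a condition other than POLLIN.
 */
static int wait_readable(int fd, int timeout_ms)
{
    struct pollfd pfd;
    int rc;

    pfd.fd = fd;
    pfd.events = POLLIN;
    pfd.revents = 0;

    rc = poll(&pfd, 1, timeout_ms);
    if (rc < 0)
        return -1;  /* poll() itself failed */
    if (rc == 0)
        return 0;   /* nothing arrived within the timeout */
    return (pfd.revents & POLLIN) ? 1 : -1;
}
```

A capture loop would typically call something like this and then read packets with the usual pcap calls once it returns 1.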

@mcr
Member

mcr commented May 19, 2017 via email

@rolandd rolandd changed the title [RFC] RDMA sniffing support for pcap RDMA sniffing support for pcap May 29, 2017
@ogerlitz

I looked at the code; it seems crazily simple and cool

+static const int RDMASNIFF_RECEIVE_SIZE = 10000;

I haven't used jumbo frames with mlx5, but I guess the max is 9k, so you're fine

We (Mellanox) made some runs with your patch, and it turns out that it even works for ConnectX3!!
Both the mlx5 and mlx4 drivers support IB_FLOW_ATTR_SNIFFER.

@rolandd
Contributor Author

rolandd commented Jul 12, 2017

As far as I know, this code is ready to go. @guyharris is there anything further you need? How does this pull request actually get landed?

Thanks!

@guyharris
Member

@guyharris is there anything further you need?

As per my comment, should the descriptions for the "RDMA sniffing" devices include an indication of the port on which the device in question will sniff? The description, if available, is supposed to say more than what the device does, e.g. USB sniffing devices indicate on which bus they'll sniff.

Implement capture support for offloaded RDMA traffic.  This uses the RDMA
verbs "flow steering" interface, which is available in the Linux kernel
since version 3.12.  The userspace interface is ibv_create_flow() - so
building this support in pcap adds a new dependency on libibverbs.

I added a new "rdmasniff" pcap module, which exposes RDMA devices under an
interface name equal to their libibverbs name.  The module uses the RDMA
verbs interface to create a receive queue with a flow steering rule that
gets a copy of all packets, even offloaded packets generated by or consumed
by the hardware.

The autoconf test for a usable version of libibverbs is a bit complicated
because ibv_create_flow() is defined as an inline function in the header
file, so we need to find the library and header and then try to link a
program to check if the API is usable (it appeared in libibverbs 1.1.8).

@paravmellanox paravmellanox left a comment

Any updates on merging this request?

@guyharris
Member

Any updates on merging this request?

Any updates on my question about device names?

@paravmellanox

Hi Guy,
Is this the question?
'As per my comment, should the descriptions for the "RDMA sniffing" devices include an indication of the port on which the device in question will sniff?'

If a device has multiple ports, the sniffing capability is generally available on all of the ports. From the code, it appears that to sniff on a particular port, the device name takes a form like the following:
mlx5_0:0 (<- port 0)
mlx5_0:1 (<- port 1)

By default it sniffs on port 0.
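
To make the "name:port" form concrete, here is a small sketch of how such a string could be split. This is not the parsing code from the pull request: `split_device_port()` is a hypothetical helper, and the default port is passed in as a parameter because the value used when no port is given is taken from the comment above rather than verified against the code.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: split "name[:port]".
 * On success returns 0 and stores the length of the name part and the
 * port number; default_port is used when there is no ':' at all.
 * Returns -1 if the text after ':' is empty or not a plain decimal number.
 */
static int split_device_port(const char *device, unsigned long default_port,
                             size_t *namelen, unsigned long *port)
{
    const char *colon = strchr(device, ':');
    char *end;

    if (colon == NULL) {
        *namelen = strlen(device);
        *port = default_port;
        return 0;
    }

    *namelen = (size_t)(colon - device);

    /* Reject an empty suffix, signs and leading whitespace up front. */
    if (colon[1] < '0' || colon[1] > '9')
        return -1;

    errno = 0;
    *port = strtoul(colon + 1, &end, 10);
    if (errno != 0 || *end != '\0')
        return -1;  /* overflow or trailing junk */
    return 0;
}
```

The returned name length can then be compared against each enumerated device name, which is the same idea as the strlen()/strncmp() check quoted in the review above.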

@guyharris
Member

OK, so:

  1. Is there ever more than one RDMA sniffing device?

  2. If so, what is the difference between the devices?

  3. If there isn't any difference between them, why is there more than one?

@paravmellanox

  1. Yes. There can be.
  2. Mostly no difference, at least I don't know of any at this point.
  3. RDMA devices are essentially just another kind of networking device. Just as a given server can have multiple networking devices, say ethX, ethY, etc., it can have multiple RDMA devices.
    An RDMA sniffing device is not a dedicated sniffing device. It's just that a given RDMA device might support sniffing. The primary role of the device is not sniffing; its primary role is networking, and it might support sniffing those networking packets. Hope this clarifies.

@guyharris
Member

So, for ethX and ethY devices, different devices are normally on different networks; if you want to see the traffic on network X, you sniff on ethX, and if you want to see the traffic on network Y, you sniff on ethY.

Do the RDMA devices differ in a similar fashion, so somebody would know that they'd want to sniff on mlx5_0 rather than mlx5_1? If so, is there any "network name" available that could be used as a hint in the description?

@paravmellanox

Your description is correct. The user would know which device he/she wants to sniff on. Similar to network devices, which are identified by device name, RDMA devices are also identified by a "device name" such as mlx5_0, mlx4_0, rxe0, etc.

At this point, sniffing traffic on a particular network port or connection is not supported. It's limited to sniffing at the device level only. But it could possibly be extended in the future with such extra sniffing parameters, depending on users' needs.

@guyharris
Member

So it looks as if there's no human-readable description that could be given for a particular device identifying not only that it's an RDMA device but also that it's a device corresponding to whatever a particular RDMA device would correspond to.

It also looks as if you wouldn't necessarily want to open, for example, mlx5_0 - you might want to open mlx5_0:17, and that there's no way to enumerate what would come after the colon. This would cause problems for Wireshark, as its GUI currently (and incorrectly!) assumes that the only names you can use to open a device are the names you get from pcap_findalldevs(), but CLI programs such as tcpdump and TShark wouldn't have this problem. (It needs to be fixed in Wireshark, not in libpcap.)

@paravmellanox

I think the Wireshark GUI should let the user choose a device and its port, so pcap_findalldevs() should create an object that represents mlx5_0:0, mlx5_0:1 and sniffing should happen on it. Frankly, I haven't reviewed the code. But pcap_findalldevs() should provide information about both the device and the port, so that Wireshark (and a human user) can pick the appropriate port. I'm not sure if this can be done incrementally without breaking compatibility between pcap and Wireshark.

@guyharris
Member

I think the Wireshark GUI should let the user choose a device and its port

I think the Wireshark GUI should let you do more than just select devices from the list it gets from pcap_findalldevs() - the netmap devices cannot enumerate all the possible device names, so you can't do something such as

pcap_findalldevs() should create an object that represents mlx5_0:0, mlx5_0:1 and sniffing should happen on it

for netmap devices, so we have to change the way the Wireshark GUI works to allow arbitrary strings to be specified as interfaces, or to allow a particular pcap module to indicate module-specific parameters that can be specified by the program.

I have a project to support those module-specific parameters; they would be specified with a command-line flag for command-line programs such as tcpdump or TShark, and GUI programs would have to ask libpcap what parameters a particular device supports and offer them in the GUI.

In the case of RDMA sniffing devices, the port number would be one such parameter.

But that doesn't yet exist, so you would either have to

  1. for each device, enumerate all the ports available, and provide a separate capture device for each available port for pcap_findalldevs();

or

  2. just punt on supporting capturing on particular ports in Wireshark for now.

@paravmellanox

I would go with (2) for now given the need of this feature.

In the future, rdmasniff_findalldevs() can be enhanced to add a per-port sniffing device, and the description would change from "RDMA sniffer" to "RDMA sniffer port X".
This should be fine, I think.
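
A possible shape for that future enhancement, sketched under assumptions: `format_port_entry()` is a hypothetical helper that builds the per-port interface name and description strings. How the list of ports would be obtained, and how the entries would be registered with libpcap, is left out because the thread does not settle it.

```c
#include <stdio.h>

/*
 * Hypothetical helper: build the per-port interface name ("dev:port")
 * and its description ("RDMA sniffer port N").
 * Returns 0 on success, -1 if either buffer is too small.
 */
static int format_port_entry(const char *dev, unsigned port,
                             char *name, size_t namesz,
                             char *desc, size_t descsz)
{
    int n;

    n = snprintf(name, namesz, "%s:%u", dev, port);
    if (n < 0 || (size_t)n >= namesz)
        return -1;  /* name would have been truncated */

    n = snprintf(desc, descsz, "RDMA sniffer port %u", port);
    if (n < 0 || (size_t)n >= descsz)
        return -1;  /* description would have been truncated */

    return 0;
}
```

A findalldevs routine could call this once per device and port and pass the two strings to add_dev() in place of the fixed "RDMA sniffer" description.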

@guyharris guyharris merged commit 4f96b10 into the-tcpdump-group:master Aug 31, 2017