RDMA sniffing support for pcap #585

rolandd · 2017-05-18T18:00:36Z

Implement capture support for offloaded RDMA traffic. This uses the RDMA
verbs "flow steering" interface, which is available in the Linux kernel
since version 3.12. The userspace interface is ibv_create_flow() - so
building this support in pcap adds a new dependency on libibverbs.

I added a new "rdmasniff" pcap module, which exposes RDMA devices under an
interface name equal to their libibverbs name. The module uses the RDMA
verbs interface to create a receive queue with a flow steering rule that
gets a copy of all packets, even offloaded packets generated by or consumed
by the hardware.

I'm definitely not an expert on pcap, so please point out all the places
where I misused APIs or something could be improved.

rolandd · 2017-05-19T00:38:02Z

The Travis CI automated builds show that my changes don't regress - but because the Linux build environments are old (Ubuntu 12.04 includes libibverbs 1.1.5), the configure script does not enable my new module, and so the C code isn't compiled.

guyharris · 2017-05-19T04:36:17Z

configure.ac

+
+if test "xxx_only" = yes; then
+	# User requested something-else-only pcap, so they don't
+	# want D-Bus support.


Presumably you mean "RDMA capture support" rather than "D-Bus support" here.

Yes, you can guess where I copied and pasted that code snippet from :)

Fixed in the updated commit.

guyharris · 2017-05-19T04:45:17Z

pcap-rdmasniff.c

+		if (strlen(dev_list[i]->name) == namelen &&
+		    !strncmp(device, dev_list[i]->name, namelen)) {
+			p = pcap_create_common(ebuf, sizeof (struct pcap_rdmasniff));
+			if (p) {


If p is NULL, you should just return NULL, with *is_ours set to 1 - pcap_create_common() failing isn't "this isn't an RDMA capture device", it's "this is an RDMA capture device, but something went wrong trying to create the pcap_t for it".

Fixed, thanks for clarifying the pcap interface for me.
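
The three outcomes described in this review comment can be sketched in isolation. The following is only an illustration, not the code from the pull request: `struct fake_pcap`, `fake_create_common()` and `sketch_create()` are hypothetical stand-ins (for `pcap_t`, `pcap_create_common()` and the module's create routine), and the `simulate_failure` flag exists only to exercise the failure path.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for pcap_t. */
struct fake_pcap {
	char name[64];
};

/* Hypothetical stand-in for pcap_create_common(); fails on request. */
static struct fake_pcap *
fake_create_common(int simulate_failure)
{
	if (simulate_failure)
		return NULL;
	return calloc(1, sizeof(struct fake_pcap));
}

/*
 * Sketch of a module create routine:
 *   name not recognized           -> NULL,   *is_ours == 0
 *   name recognized, handle made  -> handle, *is_ours == 1
 *   name recognized, alloc failed -> NULL,   *is_ours == 1
 */
static struct fake_pcap *
sketch_create(const char *device, const char *const *known, size_t nknown,
    int simulate_failure, int *is_ours)
{
	size_t i;

	assert(device != NULL && is_ours != NULL);

	*is_ours = 0;
	for (i = 0; i < nknown; i++) {
		if (strcmp(device, known[i]) == 0) {
			struct fake_pcap *p;

			/* Claim the device before anything can fail. */
			*is_ours = 1;
			p = fake_create_common(simulate_failure);
			if (p == NULL)
				return NULL;
			/* calloc() zeroed the buffer, so it stays terminated. */
			strncpy(p->name, device, sizeof(p->name) - 1);
			return p;
		}
	}
	return NULL;
}
```

The point of setting `*is_ours` before the allocation is that the caller can then tell "try the next module" apart from "stop and report the error".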

guyharris · 2017-05-19T04:46:49Z

pcap-rdmasniff.c

+	}
+
+	for (i = 0; i < numdev; ++i) {
+		if (!add_dev(devlistp, dev_list[i]->name, 0, "RDMA sniffer", err_str)) {


Are all the RDMA sniffer devices the same, or do different devices have different purposes? If they have different purposes, the description string should probably indicate the purpose of the device in question.

The difference in devices is analogous to the differences between network interfaces ("ethX" and "ethY"). For example, on my system I see:

# tcpdump --list-interfaces | grep RDMA
11.mlx5_3 (RDMA sniffer)
12.mlx5_2 (RDMA sniffer)
13.mlx5_1 (RDMA sniffer)
14.mlx5_0 (RDMA sniffer)

because I have two 2-port adapters installed - each device corresponds to capturing packets on one port of an adapter.

So the names should probably be more than just "RDMA sniffer", they should include some indication of the port on which they'll capture traffic.

guyharris · 2017-05-19T04:47:27Z

Other than the questions/comments I added, it looks OK.

guyharris · 2017-05-19T09:00:24Z

A couple of questions:

Does opening an RDMA sniffing device require any special privileges?
Is there a file descriptor for an open RDMA sniffing device on which a select()/poll()/epoll() can be done to wait for packets to arrive?

rolandd · 2017-05-19T16:15:11Z

With regards to the questions:

Yes, opening the device requires creating a "raw packet QP," and the kernel checks capable(CAP_NET_RAW) before allowing that.
Yes, there is a file descriptor - I tried to expose it to pcap with the code handle->selectable_fd = priv->channel->fd; in my activate function. But I'm not sure if there's more I'm supposed to be doing.
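
For context on the second answer, this is roughly how an application might wait on such a descriptor. It is a minimal sketch using plain POSIX `poll()`; the helper name `wait_readable()` is hypothetical. In a real program the descriptor would normally come from libpcap's `pcap_get_selectable_fd()`, which is not called here so that the sketch does not depend on libpcap or RDMA hardware.

```c
#include <assert.h>
#include <poll.h>
#include <unistd.h>

/*
 * Wait up to timeout_ms milliseconds for fd to become readable.
 * Returns 1 if readable, 0 on timeout, -1 on error (including an
 * interrupted call or an error/hangup condition without data).
 */
static int
wait_readable(int fd, int timeout_ms)
{
	struct pollfd pfd;
	int rc;

	assert(fd >= 0);

	pfd.fd = fd;
	pfd.events = POLLIN;
	pfd.revents = 0;

	rc = poll(&pfd, 1, timeout_ms);
	if (rc < 0)
		return -1;
	if (rc == 0)
		return 0;
	return (pfd.revents & POLLIN) ? 1 : -1;
}
```

A caller would typically loop on `wait_readable()` and, whenever it returns 1, ask libpcap to process whatever packets are pending.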

mcr · 2017-05-19T20:17:26Z

Roland Dreier <notifications@github.com> wrote:

> The Travis CI automated builds show that my changes don't regress - but because the Linux build environments are old (Ubuntu 12.04 includes libibverbs 1.1.5), the configure script does not enable my new module, and so the C code isn't compiled.

Yes, it's a bit of a pain in the ass. Supposedly, Travis is working on updates to 14.x or 16.x, but last time I looked it wasn't done. They also used to make EC2/etc. VMs configured identically to theirs, in which one could debug, but that isn't the case anymore.

If your dependencies are available as dpkg, you could attempt to add them to the packages; if not, then you could configure them with a build script. See what I do here, where I want the latest libpcap in unstrung:

https://github.com/mcr/unstrung/blob/master/.travis.yml
https://github.com/mcr/unstrung/blob/master/build-setup-travis.sh

I build things, and then ask Travis to cache them.

…

-- ] Never tell me the odds! | ipv6 mesh networks [ ] Michael Richardson, Sandelman Software Works | network architect [ ] mcr@sandelman.ca http://www.sandelman.ca/ | ruby on rails [

ogerlitz · 2017-06-21T13:06:57Z

I looked at the code; it seems crazily simple and cool.

+static const int RDMASNIFF_RECEIVE_SIZE = 10000;

I haven't used jumbo frames with mlx5, but I guess the max is 9k, so you're fine.

We (Mellanox) made some runs with your patch, and it turns out that it even works for ConnectX3!! Both the mlx5 and mlx4 drivers support IB_FLOW_ATTR_SNIFFER.

rolandd · 2017-07-12T21:11:15Z

As far as I know, this code is ready to go. @guyharris is there anything further you need? How does this pull request actually get landed?

Thanks!

guyharris · 2017-07-22T18:55:25Z

@guyharris is there anything further you need?

As per my comment, should the descriptions for the "RDMA sniffing" devices include an indication of the port on which the device in question will sniff? The description, if available, is supposed to say more than what the device does, e.g. USB sniffing devices indicate on which bus they'll sniff.

Implement capture support for offloaded RDMA traffic. This uses the RDMA verbs "flow steering" interface, which is available in the Linux kernel since version 3.12. The userspace interface is ibv_create_flow() - so building this support in pcap adds a new dependency on libibverbs.

I added a new "rdmasniff" pcap module, which exposes RDMA devices under an interface name equal to their libibverbs name. The module uses the RDMA verbs interface to create a receive queue with a flow steering rule that gets a copy of all packets, even offloaded packets generated by or consumed by the hardware.

The autoconf test for a usable version of libibverbs is a bit complicated because ibv_create_flow() is defined as an inline function in the header file, so we need to find the library and header and then try to link a program to check if the API is usable (it appeared in libibverbs 1.1.8).

paravmellanox

Any updates on merging this request?

guyharris · 2017-08-30T22:58:28Z

Any updates on merging this request?

Any updates on my question about device names?

paravmellanox · 2017-08-30T23:22:48Z

Hi Guy,
Is this the question?
'As per my comment, should the descriptions for the "RDMA sniffing" devices include an indication of the port on which the device in question will sniff?'

If a device has multiple ports, the sniffing capability is generally available on all of the ports. From the code, it appears that to sniff on a particular port, the format is something like the following:
mlx5_0:0 (<- port 0)
mlx5_0:1 (<- port 1)

By default it sniffs on port 0.
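
To make the "device:port" naming concrete, here is a small sketch of how such a name could be split. It is not the parser from the pull request: `sketch_parse_name()` and `SKETCH_DEFAULT_PORT` are hypothetical, the default of 0 simply follows the statement above (the merged module may behave differently), and the upper bound of 255 is an assumption based on verbs port numbers being 8-bit.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Assumption: default taken from the comment above, not from the module. */
#define SKETCH_DEFAULT_PORT 0u

/*
 * Split a name such as "mlx5_0:1" into the length of the device part
 * and the port number. A name without ":port" gets the default port.
 * Returns 0 on success, -1 if the suffix is not a decimal in 0..255.
 */
static int
sketch_parse_name(const char *device, size_t *namelen, unsigned int *port)
{
	const char *colon;
	char *end;
	unsigned long val;

	assert(device != NULL && namelen != NULL && port != NULL);

	colon = strchr(device, ':');
	if (colon == NULL) {
		*namelen = strlen(device);
		*port = SKETCH_DEFAULT_PORT;
		return 0;
	}
	if (colon[1] < '0' || colon[1] > '9')
		return -1;	/* empty or non-numeric suffix */
	val = strtoul(colon + 1, &end, 10);
	if (*end != '\0' || val > 255)
		return -1;	/* trailing junk or out of range */
	*namelen = (size_t)(colon - device);
	*port = (unsigned int)val;
	return 0;
}
```

The returned length plays the same role as `namelen` in the diff excerpt earlier in the review, where it is compared against each libibverbs device name.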

guyharris · 2017-08-31T01:18:31Z

OK, so:

Is there ever more than one RDMA sniffing device?
If so, what is the difference between the devices?
If there isn't any difference between them, why is there more than one?

paravmellanox · 2017-08-31T01:31:05Z

Yes. There can be.
Mostly no difference, at least I don't know of any at this point.
RDMA devices are essentially another kind of networking device. Just as a given server can have multiple networking devices, say ethX, ethY, etc., there can be multiple such RDMA devices.
An RDMA sniffing device is not a dedicated sniffing device. It's just that a given RDMA device might support sniffing. The primary role of the device is not sniffing; its primary role is networking, and it might support sniffing those networking packets. Hope this clarifies.

guyharris · 2017-08-31T01:36:39Z

So, for ethX and ethY devices, different devices are normally on different networks; if you want to see the traffic on network X, you sniff on ethX, and if you want to see the traffic on network Y, you sniff on ethY.

Do the RDMA devices differ in a similar fashion, so somebody would know that they'd want to sniff on mlx5_0 rather than mlx5_1? If so, is there any "network name" available that could be used as a hint in the description?

paravmellanox · 2017-08-31T01:52:00Z

Your description is correct. A user would know which device he/she wants to sniff. Just as network devices are identified by device name, RDMA devices are also identified by a "device name" such as mlx5_0, mlx4_0, rxe0, etc.

At this point, sniffing traffic on a particular network port or connection is not supported. It's limited to sniffing at the device level only. But it could possibly be extended in the future with such extra sniffing parameters, depending on users' needs.

guyharris · 2017-08-31T02:24:22Z

So it looks as if there's no human-readable description that could be given for a particular device that identifies not only that it's an RDMA device but also what that particular RDMA device corresponds to.

It also looks as if you wouldn't necessarily want to open, for example, mlx5_0 - you might want to open mlx5_0:17, and that there's no way to enumerate what would come after the colon. This would cause problems for Wireshark, as its GUI currently (and incorrectly!) assumes that the only names you can use to open a device are the names you get from pcap_findalldevs(), but CLI programs such as tcpdump and TShark wouldn't have this problem. (It needs to be fixed in Wireshark, not in libpcap.)

paravmellanox · 2017-08-31T03:25:06Z

I think the Wireshark GUI should let the user choose a device and its port, so pcap_findalldevs() should create an object that represents mlx5_0:0, mlx5_0:1, and sniffing should happen on it. Frankly, I haven't reviewed the code. But pcap_findalldevs() should provide information about both the device and the port, so that Wireshark (and a human user) can pick the appropriate port. I'm not sure if this can be done incrementally without breaking compatibility between pcap and Wireshark.

guyharris · 2017-08-31T03:49:41Z

I think Wireshark GUI should let user choose device and its port

I think the Wireshark GUI should let you do more than just select devices from the list it gets from pcap_findalldevs() - the netmap devices cannot enumerate all the possible device names, so you can't do something such as

pcap_findalldevs() should create a object that represents mlx5_0:0, mlx5_0:1 and sniffing should happen on it

for netmap devices, so we have to change the way the Wireshark GUI works to allow arbitrary strings to be specified as interfaces, or to allow a particular pcap module to indicate module-specific parameters that can be specified by the program.

I have a project to support those module-specific parameters; they would be specified with a command-line flag for command-line programs such as tcpdump or TShark, and GUI programs would have to ask libpcap what parameters a particular device supports and offer them in the GUI.

In the case of RDMA sniffing devices, the port number would be one such parameter.

But that doesn't yet exist, so you would either have to

1. for each device, enumerate all the ports available, and provide a separate capture device for each available port for pcap_findalldevs();

or

2. just punt on supporting capturing on particular ports in Wireshark for now.

paravmellanox · 2017-08-31T15:04:59Z

I would go with (2) for now given the need of this feature.

In the future, rdmasniff_findalldevs() can be enhanced to add a per-port sniffing device, and the description would change from "RDMA sniffer" to "RDMA sniffer port X".
This should be fine, I think.
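
As a rough illustration of that possible future enhancement (it is not part of the merged code), the per-port name and description could be formatted along these lines. `sketch_port_entry()` is a hypothetical helper; the "device:port" name format and the "RDMA sniffer port X" wording are taken from this thread.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Format the capture-device name ("mlx5_0:1") and description
 * ("RDMA sniffer port 1") for one port of one RDMA device.
 * Returns 0 on success, -1 if the device name is empty or either
 * output buffer is too small.
 */
static int
sketch_port_entry(const char *dev, unsigned int port,
    char *name, size_t namesz, char *desc, size_t descsz)
{
	int n;

	assert(dev != NULL && name != NULL && desc != NULL);

	if (strlen(dev) == 0)
		return -1;

	n = snprintf(name, namesz, "%s:%u", dev, port);
	if (n < 0 || (size_t)n >= namesz)
		return -1;	/* name was truncated */

	n = snprintf(desc, descsz, "RDMA sniffer port %u", port);
	if (n < 0 || (size_t)n >= descsz)
		return -1;	/* description was truncated */

	return 0;
}
```

An enumeration routine could call this once per port of each device (the port count is reported by libibverbs in the `phys_port_cnt` field of the device attributes) and pass each resulting pair to `add_dev()` in place of the single fixed "RDMA sniffer" string.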

rolandd force-pushed the master branch 3 times, most recently from 45f98f4 to 7473a0d Compare May 19, 2017 00:35

guyharris reviewed May 19, 2017

rolandd force-pushed the master branch 2 times, most recently from dc4ac72 to b08dff8 Compare May 19, 2017 13:55

rolandd force-pushed the master branch from b08dff8 to 7120837 Compare May 29, 2017 15:38

rolandd changed the title ~~[RFC] RDMA sniffing support for pcap~~ RDMA sniffing support for pcap May 29, 2017

rolandd force-pushed the master branch from 7120837 to 81e9d19 Compare July 11, 2017 22:26

rolandd force-pushed the master branch from 725052d to f5f1484 Compare August 25, 2017 17:48

paravmellanox reviewed Aug 30, 2017

guyharris merged commit 4f96b10 into the-tcpdump-group:master Aug 31, 2017

paravmellanox mentioned this pull request Feb 13, 2018

When will be new release of libpcap #675

Closed
