pcap_set_fanout function for PACKET_FANOUT support on linux platform #674
.\" WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF | ||
.\" MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. | ||
.\" | ||
.TH PCAP_SET_FANTOUT 3PCAP "10 February 2018" |
"FANOUT", not "FANTOUT".
@@ -6945,6 +6950,26 @@ pcap_set_protocol(pcap_t *p, int protocol) | |||
return (0); | |||
} | |||
|
|||
int | |||
|
No need for an extra blank line here.
@@ -206,6 +206,11 @@ | |||
# endif /* PCAP_SUPPORT_PACKET_RING */ | |||
#endif /* PF_PACKET */ | |||
|
|||
/* check if kernel supports fanout for socket */ | |||
# ifdef PACKET_FANOUT | |||
# define HAVE_FANOUT |
As this is a Linux-only file, wouldn't it be sufficient to just test PACKET_FANOUT in the #ifdefs, rather than defining a new symbol?
@@ -343,6 +343,7 @@ PCAP_API const char *pcap_tstamp_type_val_to_description(int); | |||
|
|||
#ifdef __linux__ | |||
PCAP_API int pcap_set_protocol(pcap_t *, int); | |||
PCAP_API int pcap_set_fanout(pcap_t *, int, int); |
I think there's one extra space between int and the function name, so that the names don't line up.
.B pcap_set_protocol() | ||
is used for forming sockets in fanout groups. Each received | ||
packet will be scheduled to only one socket from this group. | ||
More information about scheduling policies could be found in the |
"...can be found in packet(7)", rather than "...could be found in the packet(7)".
int pcap_set_fanout(pcap_t *p, int flags, int group_id); | ||
.ft | ||
.fi | ||
.SH DESCRIPTION |
This section should use the same language that the pcap_set_protocol(3pcap) page does to indicate that 1) this is Linux-specific, and the function isn't even available on other platforms and 2) it only affects network interfaces, not other devices.
@@ -6945,6 +6950,26 @@ pcap_set_protocol(pcap_t *p, int protocol) | |||
return (0); | |||
} | |||
|
|||
int | |||
|
|||
pcap_set_fanout(pcap_t *handle, int flags, int group_id) |
flags is actually a type and flags, at least as I read packet_setsockopt() and fanout_add() in af_packet.c. Either it should be called type_flags, or there should be three arguments - type, flags, and group_id - which are combined in the argument to setsockopt().
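The combination mentioned above can be sketched as follows. This is a minimal illustration, not code from the pull request: `fanout_arg()` is a hypothetical helper name, and the layout (group ID in the low 16 bits, type ORed with flags in the high 16 bits) is the one described in packet(7). The constants are repeated with their `<linux/if_packet.h>` values only so that the sketch builds on any platform.

```c
#include <assert.h>

/* Values as in <linux/if_packet.h>; repeated so the sketch builds anywhere. */
#ifndef PACKET_FANOUT_HASH
#define PACKET_FANOUT_HASH		0
#define PACKET_FANOUT_LB		1
#define PACKET_FANOUT_CPU		2
#define PACKET_FANOUT_FLAG_DEFRAG	0x8000
#endif

/*
 * Hypothetical helper (not libpcap API): build the integer that is
 * passed to setsockopt(..., PACKET_FANOUT, ...).
 * Low 16 bits: group ID.  High 16 bits: fanout type ORed with flags.
 */
static int
fanout_arg(int type, int flags, int group_id)
{
	unsigned int arg;

	/* Each half must fit in 16 bits. */
	assert(group_id >= 0 && group_id <= 0xffff);
	assert(((type | flags) & ~0xffff) == 0);

	arg = ((unsigned int)(type | flags) << 16) | (unsigned int)group_id;
	return ((int)arg);
}
```

With three separate arguments, a caller would write something like `fanout_arg(PACKET_FANOUT_HASH, PACKET_FANOUT_FLAG_DEFRAG, 7)`; with a single type_flags argument, the caller would do the OR itself.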
.ft | ||
.fi | ||
.SH DESCRIPTION | ||
.B pcap_set_protocol() |
This should indicate what the second and third arguments - or the second, third, and fourth arguments if we go with separate type and flags arguments - do.
So presumably you'd have multiple processes or threads opening the same interface and joining the same fanout group, and different threads/processes using the resulting |
Given that this is Linux-specific, perhaps, instead, we should add This would be an "escape hatch" for UN*X similar to the |
@guyharris ok, I will close this pull request. For pcap_getsockopt(), pcap_setsockopt(), and pcap_ioctl() I will add wrappers in the next pull request. Something like this:
Is it good enough? Or should I add more detailed error handling in these functions? |
Given that there are other capture mechanisms that allow packets from a single source to be distributed amongst multiple readers, another possibility might be to provide a standard API to support all of them. @luigirizzo: It looks as if netmap can do that, as per your USENIX paper:
@myri: Can the API for the Myricom cards support multiple threads reading packets from the same interface, and, if so, can separate @sfd: Can that be done with DAG cards and the DAG API? @ntop: Can PF_RING support that? Note that Linux |
For DAG: Only one application can attach to each device at once, e.g. they are single-reader. This means multiple applications/workers can read load-balanced packets from a single interface, but they can't read the same packets. If there was an API in libpcap for supporting multiple load-balanced streams/rings/queues within a single 'device/interface' we could probably support it. |
I.e., any given packet will be only seen by one worker - the other workers won't see it? That seems to be the same sort of fanout that |
Correct, a current limitation is that only one 'reader' can attach to a stream at once, so packets are only seen by one worker. This is likely similar to other load-balancing mechanisms, RSS etc. |
Yes, in netmap it is possible to bind each RX ring (of the same interface) to a different reader thread. |
As I found in the documentation for the Myricom API (https://s3.amazonaws.com/hpp-cspi-sdrive/a0ij0000005xHcdAAE%2FSNFv3_API_Reference_Manual+%282%29.pdf?response-content-disposition=attachment%3Bfilename*%3DUTF-8%27%27SNFv3_API_Reference_Manual%2520%25282%2529.pdf&AWSAccessKeyId=AKIAJCCINNC6VUDONGUA&Expires=2112637986&Signature=eI5JrhAIdYVlTCvHuFtyYY0gS8o%3D), libpcap already has multithreaded support:
It can be configured by using environment variables like SNF_FLAGS, SNF_NUM_RINGS. |
So can that be done using the DAG API, or do you have to configure the DAG card with the |
On 7/04/2018, at 4:43 AM, Guy Harris wrote:
By default captured packet records from all interfaces go to one stream, however records can be steered to streams based on interface, flow load-balancing, filters, or any combination of the above.
So can that be done using the DAG API, or do you have to configure the DAG card with the dagconfig command?
It can be done via C API calls.
Libpcap could potentially determine that load balancing was already configured, or it could configure it on request.
I would be happy to contribute, just need to know where to start.
Stephen
|
So my initial idea for APIs to control fanout are: To open the first
and then, for all other
Another possibility is to have another object of type So could something such as that be made to work? For splitting |
I like the idea of having a PCAP_POLICY_DEFAULT which is always implemented, even if the behaviour may vary. Is it necessary for each subsequent pcap_ts to set the same policy, or is it assumed global? How is the size of the group set? Is it elastic, e.g. starts at 1 and increases as more 'readers' attach, or is it pre-determined? Is there a way to 'get' the maximum number of readers in a group, or only return an error once the maximum supported number is reached? For a DAG device we would probably only support one group, otherwise we would have to duplicate traffic at additional cost. Would the functions be stubbed in the plugin interface, e.g. dag_pcap_set_fanout_policy() would be implemented in pcap-dag.c? If so, would pcap_set_fanout_group() be stubbed, or can we use the kernel group id space even when not creating a kernel group? For multiple processes is it sufficient for one parent process to create the group, and pass the id to the later processes? All processes would still need appropriate permissions/capabilities. |
Yes. There's the policies, for which we could return a bitmap, which would require either that we limit it to 32 or 64 policies or that we have For the policy options, we'd have another call, returning a bitset of all supported options.
We probably should just have it be global - meaning "stored in a per-group data structure", with DAG having only one such structure - so that you don't have to set it for every
For Linux fanout, I think there's no fixed upper limit, so either one could work. Which would work better for DAG cards?
So only group 0 would be supported;
Yes.
Yes. That way, DAG could implement the "fail if the group ID is non-zero or if The per-group data structure would have a count of
That would be ideal, but that might be hard to do if the kernel isn't doing the reference counting of groups, as you can't rely on userland counting, especially if the new processes are created with new programs. |
Sounds reasonable. If you put the stubs in I should be able to fill them for pcap-dag.
DAG cards currently support up to 32 receive streams, so up to 32-way load balancing. Simply returning an error on exceeding the limit would save one function stub, for the get_limit call. Note each time the fanout count increases the set of flows going to each existing client changes, which may cause confusion for some stateful applications. Could it be arranged so that you could create/define the full fanout set before activating them?
That should work. |
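The per-group bookkeeping discussed above could look roughly like the sketch below. Everything here is hypothetical: none of these names exist in libpcap, and the design (policy stored once per group, a member count, a limit fixed when the group is created) is just one reading of the options raised in this thread. The limit of 32 is the DAG receive-stream figure quoted above.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_DAG_MAX_READERS	32	/* DAG figure from the discussion */

/* Hypothetical per-group state; not a libpcap structure. */
struct demo_fanout_group {
	int id;				/* group ID */
	int policy;			/* set once for the whole group */
	unsigned int readers;		/* readers attached so far */
	unsigned int max_readers;	/* fixed when the group is created */
};

/* Create the group with its full size decided up front. */
static void
demo_group_init(struct demo_fanout_group *g, int id, int policy,
    unsigned int max_readers)
{
	assert(g != NULL);
	g->id = id;
	g->policy = policy;
	g->readers = 0;
	g->max_readers = max_readers;
}

/* Attach one more reader: 0 on success, -1 once the limit is reached. */
static int
demo_group_join(struct demo_fanout_group *g)
{
	assert(g != NULL);
	if (g->readers >= g->max_readers)
		return (-1);
	g->readers++;
	return (0);
}

/* A backend with a single group (as suggested for DAG) accepts only ID 0. */
static int
demo_single_group_check(int group_id)
{
	return (group_id == 0 ? 0 : -1);
}
```

Returning an error from the join call when the limit is exceeded avoids the need for a separate "get limit" call, which matches the simplification suggested above; whether the real API should also expose the limit is left open here.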
Is this still active? I’m in a situation where fanout would really be beneficial and just checking to see what options are available |
Also, is this only for dag cards or does this work with any cards? |
Yes, in the sense that we do want such an API.
Not only is it NOT only for DAG cards, it wasn't even originally proposed for DAG cards, it was proposed for regular Linux network interfaces. The goal is to come up with something that 1) can be used in code that can work with Linux interfaces, DAG cards, etc. if it uses only capabilities that make sense with all of those sets of adapters and 2) also allows code to use particular capabilities of particular interfaces if it's only going to support those interfaces.
It will only work if the software for the adapters - and, if necessary, the hardware/firmware for the adapters - supports it. I think all Linux adapters support the Linux fanout API, which is the API that libpcap would use for this. I don't know whether any OSes other than Linux support an API that can do the same sort of thing. I don't know whether any adapters that don't work as "normal" network interfaces, other than DAG cards, support some form of fanout such as this. |
Very cool, thanks for clarifying all my questions. I wish I knew cpp better so I could contribute something to this effort. I’m definitely following this particular feature with great interest. |
C (libpcap is written in C, not C++) isn't the hard part; coming up with a libpcap API that matches what's needed by Linux, the DAG library, and other adapters' APIs is the hard part (for which I haven't had much time lately, unfortunately). |
I think the FANOUT API is a good idea. I do not think adoption should be delayed to resolve any open questions about DAG support. |
Would it be better to name the function something like |
No, I think it is generically useful. Many capture systems support multiple queues or streams of some form. The proposed implementation covers Linux fine. The underlying issue is that internally libpcap doesn't have the concept of capture devices which have multiple physical interfaces. I don't think this can be properly solved until libpcap gets 'pcapng' style APIs and capture support. That is a much larger task. |
return 0; | ||
#else | ||
pcap_fmt_errmsg_for_errno(handle->errbuf, PCAP_ERRBUF_SIZE, | ||
errno, "funout is not supported"); |
Probably meant fanout instead of funout.
I am closing this pull request, I think that the code needs to be rebased, and it seems like consensus is that a different mechanism should be implemented. Re-open if I'm wrong. |
This work is based on PR the-tcpdump-group#674 while taking into consideration the review comments which caused the PR to be closed by the original author, @xpahos.
For multithreaded applications, it would be useful to add PACKET_FANOUT support.
Tests with n CPU threads per application, without FANOUT, only RX_RING:
3412458 pps
3432391 pps
Tests with n CPU threads per application, with FANOUT and RX_RING:
8813457 pps
8932582 pps
More information about PACKET_FANOUT, with examples: https://www.kernel.org/doc/Documentation/networking/packet_mmap.txt
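For reference, joining an existing AF_PACKET socket to a fanout group at the socket level looks roughly like this. It is a sketch modelled on the kernel documentation linked above, not the code from this pull request: `join_fanout_group()` is a hypothetical name, and the function tests PACKET_FANOUT directly so that it still compiles (and simply fails) where the kernel headers lack fanout support.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/socket.h>
#ifdef __linux__
#include <linux/if_packet.h>	/* PACKET_FANOUT and the fanout types */
#endif

#ifndef SOL_PACKET
#define SOL_PACKET	263	/* Linux value; fallback for older libc headers */
#endif

/*
 * Hypothetical helper (not libpcap API): add the AF_PACKET socket fd to
 * the fanout group group_id.  type_flags is the fanout type ORed with
 * any flags.  Returns 0 on success, -1 with errno set on failure.
 */
static int
join_fanout_group(int fd, int type_flags, int group_id)
{
	assert(group_id >= 0 && group_id <= 0xffff);
#ifdef PACKET_FANOUT
	/* Low 16 bits: group ID.  High 16 bits: type and flags. */
	unsigned int arg = ((unsigned int)type_flags << 16) |
	    (unsigned int)group_id;

	return (setsockopt(fd, SOL_PACKET, PACKET_FANOUT, &arg, sizeof(arg)));
#else
	(void)fd;
	(void)type_flags;
	errno = ENOTSUP;	/* fanout not available in these headers */
	return (-1);
#endif
}
```

Each worker thread or process would open its own socket on the same interface and call this with the same group ID, for example `join_fanout_group(fd, PACKET_FANOUT_HASH, 1)`; the kernel then delivers each packet to only one socket in the group.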