Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for PACKET_FANOUT on Linux #869

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

tsnoam
Copy link

@tsnoam tsnoam commented Oct 23, 2019

This work is based on PR #674 while taking into consideration the review
comments which caused the PR to be closed by the original author,
@xpahos.

@mluscon
Copy link

mluscon commented Feb 11, 2020

Hi all, are there any issues with merging this PR?

@Oppen
Copy link
Contributor

Oppen commented Feb 13, 2020

I wonder if there's a more generic way to expose this, as many network stacks provide similar capabilities. Ones I definitely know of are DPDK and XDP (technically Linux, too, but you configure it by other means), but I expect most other mainstream *nixes to have equivalent features.

@tsnoam
Copy link
Author

tsnoam commented Feb 19, 2020

@Oppen
I get what you're saying that a consolidated way is for the best.
However, since DPDK, XDP & PF_PACKET are so different in semantics, requiring the direct consumer to understand:

  1. They're using one technology over another.
  2. How the technology works (how to spawn threads, how to associate them with the kernel, who allocates the memory?).
  3. As of "2" above, how to properly configure the fanout?
    I would say that such consumer would want the full power to configure fanout without being masked out of control.

From what I gather, libpcap currently has initial support for DPDK, while it provides no support for XDP.
I suggest to currently implement fanout support for PF_PACKET as suggested in the PR (maybe even mark it as "unstable API" in the manpage). Once there will be enough code & knowledge about how to use other tech, it will be possible to either have separate APIs or find a way to consolidate them all.

pcap-linux.c Outdated Show resolved Hide resolved
@Oppen
Copy link
Contributor

Oppen commented Apr 14, 2020

@tsnoam I see your point, but I think masking implementation details is more or less the point of using libpcap as an abstraction. I'm OK-ish with marking it as an unstable API, although I have my doubts on that concept (eventually it has users and you can never drop it without facing their wrath).
Note though that I'm not a maintainer. My opinions and suggestions are just that, and carry no weight towards acceptance. Just in case you feel discouraged by my reluctance to make it platform-specific at the API level.

Also, it may be interesting to take a look at this patchset.

pcap-linux.c Outdated Show resolved Hide resolved
@tsnoam
Copy link
Author

tsnoam commented Apr 16, 2020

@tsnoam I see your point, but I think masking implementation details is more or less the point of using libpcap as an abstraction. I'm OK-ish with marking it as an unstable API, although I have my doubts on that concept (eventually it has users and you can never drop it without facing their wrath).
Note though that I'm not a maintainer. My opinions and suggestions are just that, and carry no weight towards acceptance. Just in case you feel discouraged by my reluctance to make it platform-specific at the API level.

Also, it may be interesting to take a look at this patchset.

  1. Again, I get what you're saying about properly wrapping all various technologies into a single API, hiding implementation details from the user. However, waiting until everything was properly learned and consolidated may take too long and practically stops progress.
  2. I looked at the patchset you've sent. It seems to be already merged into the kernel. If you were referring to the fact that there are more functions to wrap: well, this is exactly the point: you make small progress by slowly adding functions to competing APIs (packet socket vs dpdk vs ... ). One day you reach a point where you know enough of all technologies and feel comfortable wrapping them into a single API. But until then, why wait?

@tsnoam
Copy link
Author

tsnoam commented Apr 16, 2020

Also, thanks for the review. I pushed and updated version of the code.

Copy link
Contributor

@Oppen Oppen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tsnoam I see your point, but I think masking implementation details is more or less the point of using libpcap as an abstraction. I'm OK-ish with marking it as an unstable API, although I have my doubts on that concept (eventually it has users and you can never drop it without facing their wrath).
Note though that I'm not a maintainer. My opinions and suggestions are just that, and carry no weight towards acceptance. Just in case you feel discouraged by my reluctance to make it platform-specific at the API level.
Also, it may be interesting to take a look at this patchset.

1. Again, I get what you're saying about properly wrapping all various technologies into a single API, hiding implementation details from the user. However, waiting until everything was properly learned and consolidated may take too long and practically stops progress.

2. I looked at the patchset you've sent. It seems to be already merged into the kernel. If you were referring to the fact that there are more functions to wrap: well, this is exactly the point: you make small progress by slowly adding functions to competing APIs (packet socket vs dpdk vs ... ). One day you reach a point where you know enough of all technologies and feel comfortable wrapping them into a single API. But until then, why wait?
  1. I agree.
  2. That wasn't a counter argument or a suggestion for you to implement it now. It was just a "look at this cool thing", that may interest you because you're interested in something similar :)
    I know it's merged, but the patch has a nice explanation about what it does, and I think it's a good way to point to features.

pcap/pcap.h Outdated Show resolved Hide resolved
@tsnoam
Copy link
Author

tsnoam commented Apr 16, 2020

  • I agree.

  • That wasn't a counter argument or a suggestion for you to implement it now. It was just a "look at this cool thing", that may interest you because you're interested in something similar :)
    I know it's merged, but the patch has a nice explanation about what it does, and I think it's a good way to point to features.

Oh, thanks for the reference then :-)

In any case, I feel that we do have an understanding that this PR is worthy of finding itself into upstream, with the open question of whether it should be marked as "unstable" pending maintainers decision.

@Oppen
Copy link
Contributor

Oppen commented Apr 16, 2020

In any case, I feel that we do have an understanding that this PR is worthy of finding itself into upstream, with the open question of whether it should be marked as "unstable" pending maintainers decision.

I think it's definitely worthy of getting upstream. That doesn't mean improving it (whatever that means) is out of the question.

@mcr mcr added this to the next-release milestone Aug 28, 2020
@mcr mcr self-assigned this Aug 28, 2020
@mcr mcr added the TPACKET_V3 label Aug 28, 2020
.SH DESCRIPTION
On Linux network interface devices,
.B pcap_set_fanout_linux()
sets the fanout configuration to be used in the
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please explain what the affect of having packet fanout mode?
When would I turn it on and why? Is there is a downside to it? What if every application turned it on?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the AF_PACKET fanout mode, packet reception can be load balanced among processes/threads.

When "configuring" the socket to be used, you would be required to set a group_id, use the same group_id to achieve the required load balancing.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't tell me. Put it in the manpage!

@aouinizied
Copy link

aouinizied commented Sep 12, 2020

@tsnoam @mcr @guyharris I tried building libpcap from your branch. And I followed the doc. Everything compiles well and I test it using 3 parallel processes.

So I changed my pcap_open_live to the following in each process:

pcap_handle = pcap_create((char*)pcap_file, err_open);
set = pcap_set_fanout_linux(pcap_handle, 1, 0, root_idx);
pcap_set_snaplen(pcap_handle, snaplen);
pcap_set_promisc(pcap_handle, promisc);
pcap_set_timeout(pcap_handle, 500);
pcap_activate(pcap_handle);

Which prints the desired output:

process (group) id: 0 fanout set: 0
process (group) id: 1 fanout set: 0
process (group) id: 2 fanout set: 0

However, when checking if FANOUT worked, I observe that each process still receives all packets.
Note that I choose HASH to be flow-based (0 for mode value) and I know that testing traffic is well balanced based on flow hash (30%, 30%, 30%). Do you have any clue how to fix it?

@xpahos
Copy link

xpahos commented Sep 12, 2020

@aouinizied it's very strange. I'm using the same construction in production:

#if defined(_linux_)
    int val = (getpid() & 0xffff) | (PACKET_FANOUT_LB << 16);
    if (setsockopt(pcap_fileno(pcap.Session()), SOL_PACKET, PACKET_FANOUT, &val, sizeof(val)) < 0) {
        Cerr << "Failed to set fanout: "
             << strerror(errno)
             << "\n";
        return;
    }
#endif

pcap_set_fanout_linux here do the same as

int val = (getpid() & 0xffff) | (PACKET_FANOUT_LB << 16);

Does your root_idx is unique?

@xpahos
Copy link

xpahos commented Sep 12, 2020

@aouinizied looks like I found the problem:

pcap_set_fanout_linux(pcap_t *p, int enable, uint16_t mode, uint16_t group_id)

I think it will be better to use:

set = pcap_set_fanout_linux(pcap_handle, 1, PACKET_FANOUT_LB, root_idx);

because the third parameter is mode. I think it will be better to remove enable because it's not used in code.

@aouinizied
Copy link

@xpahos Thanks for the feedback. I will replace PACKET_FANOUT_LB by 0 for FLOW_HASH mode, test, and get back to you.

@aouinizied
Copy link

@xpahos checked that each root_idx is unique but the same observations.

set = pcap_set_fanout_linux(pcap_handle, 1, 0, root_idx)

The only difference with your setup is the mode. You set it to PACKET_FANOUT_LB whereas I use PACKET_FANOUT_HASH as the third parameter.

@xpahos
Copy link

xpahos commented Sep 12, 2020

@aouinizied yes, you right. PACKET_FANOUT_HASH should send packets only to one unique socket per each address port mapping.

@aouinizied
Copy link

@xpahos another detail that may help reproducing is that I'm using:

int rv_handle = pcap_next_ex(pcap_handle, &hdr, &data);

to poll the received packets. But this should not affect fanout right?

@aouinizied
Copy link

aouinizied commented Sep 13, 2020

@xpahos I found what was wrong. And it's clearly stated in the doc.

  • group_id, as the name indicates, is a group of processes/threads.
  • FANOUT will duplicate traffic across groups. (what I was observing)
  • For a specific group_id. Each process/thread started with this group_id will receive 1/N of the traffic (where N is the number of processes/threads started with this group_id).
  • The mode defines load balancing strategy (on kernel side).

In my example replacing root_idx by 0 solve it and now I see traffic correctly load-balanced across my processes. @tsnoam Thank you for this great contribution!

A detail that maybe need to be added to the doc is that for PACKET_FANOUT_HASH there is a known bug on some Linux kernel versions. The bug consists in that a src->dst flow traffic will be hashed with a different hash than dst->src and this will result in inconsistency for bidirectional flow-based apps.

@tsnoam
Copy link
Author

tsnoam commented Sep 13, 2020

@aouinizied Sorry for the delay in response, I'm happy to see that you've eventually fixed the problem. Anyhow, most of the thanks are to @xpahos who had made the original PR, I have based on their work.

@xpahos that's a good catch regarding the enable flag. It appears that I have forgot to actually use it. I have just pushed a commit to do so.

@tsnoam
Copy link
Author

tsnoam commented Nov 10, 2020

@mcr I've rebased my branch on top of the latest master and made appropriate fixes as of commit 97f59a7

@javedshakeel
Copy link

Hi. I wanted to try out fanout support in tcpdump. Is this mode controlled by a command line option to tcpdump or is it always enabled in this branch?

@aouinizied
Copy link

@tsnoam is it possible to rebase your branch?

Many thanks for your contribution.

@tsnoam
Copy link
Author

tsnoam commented Sep 1, 2021

@aouinizied
I tried rebasing (and then build), and it passes cleanly.

have you encountered problems trying to rebase yourself?

@aouinizied
Copy link

@tsnoam no but I'm using your fork (until it gets merged within libpcap master) within my project (https://www.nfstream.org/docs/#building-nfstream-from-sources) and seems to be outdated. updating your branch with your rebase would be really helpful.

@tsnoam
Copy link
Author

tsnoam commented Sep 1, 2021

Hi @aouinizied ,

As the branch can be directly merged to master, i mostly wait for the maintainers to pick it up.
Since rebasing will also require me to test that no internal changes make it incompatible - things which can only be seen when running, I prefer to leave my PR and branch as it is, so the maintainers of of libpcap will be able to see on which commit I am based.

Personally, my systems work in production based on my branch, but I do not have the capacity at the moment to test a rebase over latest master.

@tsnoam
Copy link
Author

tsnoam commented Sep 1, 2021

@mcr
Do you have an estimation when this PR will be merged to master?
Is there something you expect me to change which I've missed?

Anything I can do to help, let me know!

.SH DESCRIPTION
On Linux network interface devices,
.B pcap_set_fanout_linux()
sets the fanout configuration to be used in the
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't tell me. Put it in the manpage!

@mcr mcr requested a review from guyharris September 1, 2021 17:25
@infrastation
Copy link
Member

Thank you for proposing these changes. If you consider them ready for merging, please rebase them on the current master branch and squash the commits together into one or more commits, so each change completely progress the working tree from one consistent working state to another.

This work is based on PR the-tcpdump-group#674 while taking into consideration the review
comments which caused the PR to be closed by the original author,
@xpahos.
@tsnoam
Copy link
Author

tsnoam commented Sep 2, 2021

@mcr I've added the documentation as requested. I hope it is clear enough. If not, I'll be happy to fix as necessary.

@infrastation I do consider this change as ready for merging (FWIW, I've been using this code for 2+ years in production). As requested, I've squashed all commits and rebased it over current master.

Regarding CI failure in AppVeyor, it seems unrelated to my change (winflexbison is missing from the system).

@aouinizied
Copy link

@guyharris @tsnoam any plan to merge this PR?
I'm using this branch since more than 1 year on a production network and it works well. It will be good to have it as part of next official release.

Thanks,
Zied

@aouinizied
Copy link

@guyharris any news regarding this PR?

@aouinizied
Copy link

REMINDER: @infrastation @mcr any estimation of when this PR will be merged?

Zied

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

None yet

8 participants