T5086: Add sFlow feature based on hsflowd #1891
Hello all,

Three comments on the CLI settings:

(1) It is probably better to set the sflow agent-address to an interface, rather than an IP address. That way it doesn't cause confusion if that address goes away. Or worse, if it is allocated to some other device in the network. The line that goes in /etc/hsflowd.conf looks like this:
    agent=DEVICE
e.g.
    agent=eth0
Did you have some other reason for wanting to set an explicit IP address here?

(2) Since the sflow agent-address goes in the payload of the sFlow datagrams, the source-address of those packets is not relevant. So there is no need to make it a setting here. It will be whatever it would normally be to send UDP to that server. If an sFlow feed is forwarded to another server the source address will change. The server will typically ignore the source-address and look only at the agent-address.

(3) The sflow server setting should probably allow an extra argument to determine the namespace or vrf. The line that goes in hsflowd.conf looks like this:
    collector { ip=IPADDRESS udpport=PORT namespace=NAMESPACE }
or:
    collector { ip=IPADDRESS udpport=PORT dev=DEVICE }
e.g.
    collector { ip=10.0.0.30 udpport=6343 dev=eth0 }
Whether you use the namespace or the dev parameter depends on how VyOS uses namespaces in Linux.

I hope that makes sense?

Neil McKee
What happens if there are several aliases on the interface? Which address will it be?
(1) If there are several addresses on the interface, it will choose one of them - the one it considers to be the most likely to be globally unique. And as long as nothing changes it will choose the same one every time. This election will apply to all IP addresses on the router if you do not set agent=DEVICE, so the "set system sflow agent-address" setting can be optional.

It is also possible to set agent.cidr=CIDR to indicate a preference, or agent.cidr=!CIDR to put a thumb on the scale the other way. For example:
    agent.cidr=!10.0.0.0/8
tells the election that 10.* addresses should be avoided. Or:
    agent.cidr=::/0
indicates that IPv6 is preferred.

I don't know if it is necessary to expose this agent.cidr setting. I guess it could be an optional parameter:
    set system sflow agent DEVICE [ CIDR ]
but I don't know if anyone would use it. If the DEVICE has a global IPv4 address then it is almost always OK to use it as the sflow-agent-address. If it only has an IPv6 address then no problem, we'll use that.

(3) The process does not have to be started in a namespace/VRF. It can send to multiple collectors in different namespaces as long as it has permission to switch namespaces and open sockets:

And new comment:
    set system sflow enable
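The election described above can be illustrated with a small model. This is only a sketch of the idea (prefer globally unique addresses, let a CIDR tip the scale either way, and break ties deterministically); it is not hsflowd's actual algorithm, and the function name `elect_agent_address`, its `prefer`/`avoid` parameters, and the score weights are all assumptions made for illustration.

```python
import ipaddress


def elect_agent_address(addresses, prefer=None, avoid=None):
    """Pick one address from a list of address strings.

    Hypothetical scoring, not taken from hsflowd:
      +2 if the address is globally routable
      +4 if it falls inside the preferred CIDR (like agent.cidr=CIDR)
      -4 if it falls inside the avoided CIDR (like agent.cidr=!CIDR)
      -8 if it is loopback or link-local
    """
    prefer_net = ipaddress.ip_network(prefer) if prefer else None
    avoid_net = ipaddress.ip_network(avoid) if avoid else None

    def in_net(ip, net):
        # Only compare addresses and networks of the same IP version.
        return net is not None and ip.version == net.version and ip in net

    def score(addr):
        ip = ipaddress.ip_address(addr)
        total = 0
        if ip.is_global:
            total += 2
        if in_net(ip, prefer_net):
            total += 4
        if in_net(ip, avoid_net):
            total -= 4
        if ip.is_loopback or ip.is_link_local:
            total -= 8
        return total

    # Highest score wins; ties fall back to the address text so the
    # same input always yields the same choice.
    return min(addresses, key=lambda a: (-score(a), a))
```

With no preference set, `elect_agent_address(["10.0.0.5", "8.8.8.8", "127.0.0.1"])` picks the global address; passing `avoid="10.0.0.0/8"` or `prefer="::/0"` shifts the result in the way the comment describes.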
(5) For the 1:N packet sampling rate the default behavior requires explanation. hsflowd will consider the ifSpeed of each port in turn. If the ifSpeed (bits/sec) is unknown then the default given as samplingRate=N is used, but if the interface has a known ifSpeed then the samplingRate is given by the expression: ifSpeed / 1e6. So for a 1G interface it is 1000. For a 10G interface it is 10000 and so on. This works well as a default. It is based on the requirement to detect a new flow of 10% bandwidth in 1 second. (I don't know if hsflowd is picking up the ifSpeed of the interfaces on VyOS. If not, we should try to ensure that it can.)

The interface sampling rates can be overridden with settings like this:
    sampling.1G=2000
Those are easy to understand, so you might want to allow them to be set via the VyOS CLI. Something like:
    set system sflow sampling-rate speed 400G 65536
In practice the analysis is not particularly sensitive to the sampling rate so that should be enough flexibility for almost any real-world deployment, but if you want to allow the sampling rate to be forced to a specific value for a specific interface then that can be made possible too. Let me know.

(6) It seems that the VyOS Linux kernel is compiled without the "drop_monitor" module. Is that correct? If that module can be included in the build then you could add this to hsflowd.conf:
    dropmon { limit=50 start=on sw=on hw=off }
so that hsflowd will export the headers of dropped packets (along with the name of the function in the linux kernel where that skb was dropped) as part of the standard sFlow feed. This measurement complements the sFlow packet-sampling and counter-telemetry well because it provides visibility into the traffic that is not flowing. Very helpful for troubleshooting. The limit (a rate limit max of N drops per second that will be sent out in the sFlow datagrams) is the parameter that you would set in the CLI. Perhaps something like:
    set system sflow dropmon limit 50
I hope you will consider enabling this feature. Very powerful.
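The default rule in (5) can be written down as a short function. This is a minimal model of the behavior as described in the comment, not code from hsflowd: the names `parse_speed` and `default_sampling_rate` are hypothetical, and matching an override such as `sampling.1G` by exact speed is an assumption.

```python
# Multipliers for speed labels such as "100M" or "1G" (bits per second).
SPEED_SUFFIXES = {"M": 10**6, "G": 10**9}


def parse_speed(label):
    """Convert a label such as '1G' or '100M' to bits per second."""
    return int(label[:-1]) * SPEED_SUFFIXES[label[-1]]


def default_sampling_rate(if_speed_bps, fallback, overrides=None):
    """Return N for 1-in-N sampling on one interface.

    if_speed_bps: interface speed in bits/sec, or None/0 if unknown.
    fallback:     the top-level default used when the speed is unknown.
    overrides:    optional dict like {"1G": 2000}, modelling sampling.1G=2000.
    """
    overrides = overrides or {}
    if not if_speed_bps:
        return fallback
    for label, rate in overrides.items():
        # Assumption: an override applies only on an exact speed match.
        if parse_speed(label) == if_speed_bps:
            return rate
    # Known speed, no override: ifSpeed / 1e6, never below 1-in-1.
    return max(1, if_speed_bps // 1_000_000)
```

Under this model a 1G port gets 1-in-1000 and a 10G port 1-in-10000 regardless of the fallback value, which only applies when the speed is unknown.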
Sure, there is a typical use case where the user configures the firewall, which requires an explicitly set source/destination for the allowed directions. So it could be extended
If interface eth1 is already associated with
Should it be
Or
I guess it depends on the driver
data/templates/sflow/hsflowd.conf.j2
# 1G interface will be sampled at 1-in-1000
# 10G interface will be sampled at 1-in-10000
# 40G interface will be sampled at 1-in-40000
# If the ifSpeed (bits/sec) is unknown then the default given as samplingRate=N is used
I researched their documentation, and at https://sflow.net/host-sflow-linux-config.php I found the following:
sampling=400
For interfaces with no speed or applications with no specific sampling setting, fall back on this default 1-in-N rate.
I read this as meaning that it applies when the configuration has no entries like the following:
sampling.100M = 100
sampling.1G = 500
sampling.10G = 1000
sampling.40G = 4000
I tested this on my own machine, which has a 1G interface, with the following setup:
sflow {
DNSSD = off
polling = 30
agentIP = 127.0.0.1
sampling = 150
collector { ip = 127.0.0.1 udpport = 6343 }
# Add all relevant interfaces to this list
pcap { dev = enp37s0f0 }
}
It did not behave as I expected. There is no sampling.1G entry here, but hsflowd ignored the sampling = 150 default and used a hardcoded 1:1000 rate:
sudo tshark -i lo -n -V -f 'port 6343'|grep 'Sampling rate'
Sampling rate: 1 out of 1000 packets
Sampling rate: 1 out of 1000 packets
Sampling rate: 1 out of 1000 packets
Sampling rate: 1 out of 1000 packets
Sampling rate: 1 out of 1000 packets
This approach makes sampling rate setup extremely complicated and counter-intuitive.
I believe an intermediate option, such as setting all per-speed rates and the fallback rate to the same value, would be the best choice:
sampling.100M = {{ sampling_rate }}
sampling.1G = {{ sampling_rate }}
sampling.10G = {{ sampling_rate }}
sampling.40G = {{ sampling_rate }}
sampling = {{ sampling_rate }}
Based on our deployments, 1:1000 may be a safe option from a performance and accuracy perspective for almost all speeds.
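The tshark check used in this comment can be scripted so the observed rates are tallied rather than read by eye. This is a small sketch that assumes the text contains lines in exactly the form shown above ("Sampling rate: 1 out of N packets"); the function name `count_sampling_rates` is hypothetical.

```python
import re
from collections import Counter

# Matches the dissector lines shown in the tshark output above.
_RATE_LINE = re.compile(r"Sampling rate: 1 out of (\d+) packets")


def count_sampling_rates(text):
    """Count how often each 1-in-N rate appears in captured tshark output."""
    return Counter(int(m.group(1)) for m in _RATE_LINE.finditer(text))


if __name__ == "__main__":
    sample = "Sampling rate: 1 out of 1000 packets\n" * 5
    for rate, seen in sorted(count_sampling_rates(sample).items()):
        print(f"1-in-{rate}: {seen} samples")
```

Feeding it the saved output of the tshark command makes it easy to compare the rate hsflowd actually reports against the configured value.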
Their default pre-calculated value for 10G is way too high, and it even conflicts with their official recommendations:
10G interface will be sampled at 1-in-10000
@sflow, honestly, it looks like a bug and should work differently. So this syntax looks ugly
If we already have:
We also use defaultValues, from ssh-port to container-registry, and they are all overwritten by configured values. Maybe some new option like
If the auto settings/calculation works well, I prefer to leave it as is until a better solution is found.
(3) dev=mgmt or dev=eth1. I think either of these will work, actually.

(5) In hsflowd version 2.0.48-1 (released today) you have some new choices. You now have more control over the bps_ratio that applies to interfaces that have an ifSpeed. The default is sampling.bps_ratio=1000000 as discussed above, but you can turn that behavior off altogether with:
    sampling.bps_ratio = 0
So as long as you haven't set anything like "sampling.1G=1000" it will fall back to the top-level default. So if the goal is to set the same sampling rate to 5000 for all interfaces regardless of ifSpeed then you can do that with:
    sampling=5000
Also in version 2.0.48-1, you can override the samplingRate down in a pcap{} section. So if you want to allow a custom samplingRate on a given interface (which can occasionally be useful for testing) then you can do this:
    pcap { dev=eth1 sampling=100 }

And one more new comment:
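As a rough illustration of how a generator could emit these settings, the sketch below builds the three kinds of lines mentioned in this comment: `sampling.bps_ratio = 0`, a single top-level `sampling`, and `pcap{}` entries with an optional per-interface override. The function name `render_sampling` and its arguments are hypothetical; this is not the VyOS template.

```python
def render_sampling(rate, interfaces, per_interface=None):
    """Return hsflowd.conf lines applying one rate to every interface.

    rate:          top-level 1-in-N rate, e.g. 5000.
    interfaces:    device names to sample, e.g. ["eth0", "eth1"].
    per_interface: optional dict of device -> rate for pcap{} overrides.
    """
    per_interface = per_interface or {}
    lines = [
        # Turn off the ifSpeed-based calculation so the default applies.
        "sampling.bps_ratio = 0",
        f"sampling = {rate}",
    ]
    for dev in interfaces:
        override = per_interface.get(dev)
        if override is None:
            lines.append(f"pcap {{ dev={dev} }}")
        else:
            lines.append(f"pcap {{ dev={dev} sampling={override} }}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_sampling(5000, ["eth0", "eth1"], {"eth1": 100}))
```

Keeping the per-interface override optional means the common case stays a single rate, while a test interface can still be sampled more aggressively.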
@sflow awesome, thank you so much for this improvement! I've tried hsflowd 2.0.48 on a 1G physical interface:
I used the following configuration:
And it worked just fine:
FastNetMon was able to parse traffic without any issues:
Then I repeated my tests in GCE on a virtual network interface:
And it worked just fine with FastNetMon too:
It looks very solid and works as expected. Well done!
@pavel-odintsov @sflow Thanks!!! I'll change the template soon.
We now have
Yes it does - CI will soon build both versions.
@c-po Sure, I guess we'll drop
And thanks for fixing arm64 :)
The PR is completely ready for review.
Add sFlow feature based on hsflowd
According to user reviews, it is more stable and performs better than pmacct.
I haven't deleted 'pmacct' 'system flow-accounting sflow' yet. It could be migrated or deprecated later.
    set system sflow agent-address '192.0.2.14'
    set system sflow interface 'eth0'
    set system sflow interface 'eth1'
    set system sflow polling '30'
    set system sflow sampling-rate '100'
    set system sflow server 192.0.2.1 port '6343'
    set system sflow server 192.0.2.11 port '6343'
@c-po could you take a look at this in the future?
Added in vyos/vyos-build@771b1f6
We found a minor issue. If you don't specify the UDP port in the CLI it seems that the hsflowd.conf that is generated ends up with:
    collector { ip=10.1.2.3 udpport= }
which fails to parse. Seems like this would be easy to fix. You can either generate:
    collector { ip=10.1.2.3 }
or make sure the default is used:
    collector { ip=10.1.2.3 udpport=6343 }
The effect will be the same either way.
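Both fixes suggested here are easy to express in a generator. The following is a minimal sketch, assuming a hypothetical helper `render_collector`; it is not the code from the actual fix, and only the two output forms and the 6343 default come from the comment above.

```python
DEFAULT_SFLOW_PORT = 6343  # default UDP port quoted in the comment above


def render_collector(ip, port=None, use_default=True):
    """Render one collector line, never emitting an empty 'udpport='.

    port:        configured UDP port, or None/'' when the CLI left it unset.
    use_default: if True, fill in the default port; if False, omit udpport.
    """
    if port in (None, ""):
        if not use_default:
            return f"collector {{ ip={ip} }}"
        port = DEFAULT_SFLOW_PORT
    return f"collector {{ ip={ip} udpport={port} }}"


if __name__ == "__main__":
    print(render_collector("10.1.2.3"))
    print(render_collector("10.1.2.3", use_default=False))
```

Either branch avoids the unparseable `udpport=` line; filling in the default keeps the generated file explicit about which port is in use.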
@sflow Thanks, I'll take a look.
The fix is in #1898.
Change Summary
Add sFlow feature based on hsflowd
According to user reviews, it is more stable and performs better than pmacct.
I haven't deleted 'pmacct' 'system flow-accounting sflow' yet. It could be migrated or deprecated later.
Types of changes
Related PR
vyos/vyos-build#320
Related Task(s)
Component(s) name
sflow
Proposed changes
How to test
check service
config:
Dump:
Checklist: