packet loss stats #106
Actually one suggestion is D-ITG, which is already used for the VoIP tests. Just noticed that it can do both one-way and RTT tests, and return packet loss. That could possibly be used instead of netperf for the UDP flows. |
Yes, twd was a bridge too far at the time. I wanted something that was reliably realtime and capable of doing gigE well, in particular, and writing twd in C was too hard. Now things have evolved a bit in these worlds (netmap for example), and perhaps me taking another tack entirely ( https://github.com/dtaht/libv6/blob/master/erm/doc/philosophy.org ) will yield results... eventually. It still seems like punting the whole tcp (or quic?) stack to userspace, using raw sockets, or leveraging bpf (the last seems to have promise)... would get me to the extreme I wanted. |
Wow, that sounds very high end. That (especially your comment about a userspace TCP stack) triggered a couple of thoughts I've been having as I'm completing my second round of point-to-point WiFi tests:
For one, I do know that rig setups can be extremely specific and variable (for me, it's all about Wi-Fi backhaul, which is vastly different from other setups), so I'm not proposing something that varies rig setups automatically, but maybe there could be an automatic test of sorts, after rig setup has occurred, that runs in phases and ramps up until some limit. Two possible phases could be: A) "no load" stuff that summarizes physical link characteristics (useful both for CPE devices with low numbers of users and for just understanding the basics of backhaul and router links):
B) response under load (useful for loaded CPE devices, and at higher connection counts for backhaul and routers):
I don't mean to start summarizing the RRUL spec! So I'll stop; I only meant that maybe the process for testing RRUL could have just a minimal set of parameters, and the test program could automate the process of producing relevant results. The only parameters to specify, in case the user wants to, might be "how far and how fast" (meaning, for example, how many simultaneous flows to go up to and how quickly to get there), which could either be calculated from the phase A results ("what it's probably capable of") or estimated during phase B ("how it's going"). The "how far to take the test" could be specified in case someone wants to push a link well beyond its limits, or wants to stop well short of them to get something quick. Maybe this could just be a Flent "automatic" test? So I know there's a place for a highly configurable "hard-core" tester that can hit 10 GigE and produce microsecond-level accuracy, but I think there may also be a place for such an automated, "good enough for many" test. PS- Go 1.8 was released with GC improvements that bring pauses to "usually under 100 microseconds and often as low as 10 microseconds". I know that still might not be good enough for some tests, especially 10 GigE or microsecond-sensitive results (I'm starting to look at microseconds for VoIP tests as well), but it's getting better. |
Pete Heist <notifications@github.com> writes:
Actually one suggestion is D-ITG, which is already used for the VoIP
tests. Just noticed that it can do both one-way and RTT tests,
and return packet loss. That could possibly be used instead of netperf
for the UDP flows.
Yeah, we're missing tools to do UDP packet loss statistics,
unfortunately. D-ITG is a possibility, but it is a PITA to set up, and
not suitable to run over the public internet...
…-Toke
|
I noticed that. The VoIP tests were a little painful to get working. My Mac Mini G4 also had really bad clock drift, which at first produced some beautiful but useless delay curves. "adjtimex --tick 10029" got the system clock close enough so that ntp would agree to do the rest. Still, one-way delay can be off by up to a millisecond or so, depending on how ntp is feeling at the moment. |
Pete Heist <notifications@github.com> writes:
I noticed that. The VoIP tests were a little painful to get working.
My Mac Mini G4 also had really bad clock drift, which at first
produced some beautiful but useless delay curves. "adjtimex --tick
10029" got the system clock close enough so that ntp would agree to do
the rest. Still, one-way delay can be off by up to a millisecond or
so, depending on how ntp is feeling at the moment.
Yeah, I run PTP in my testbed to get around that.
For most cases, though, having a simple ping-like (i.e. isochronous)
back-and-forth UDP RTT measurement would be fine... Unfortunately, I
haven't yet found a tool that will do that...
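For illustration, a minimal sketch of what such a tool could look like (hypothetical code, assuming a plain UDP echo server at ADDR; not an existing implementation): probes go out on a fixed schedule, whether or not earlier replies have arrived:
```
# Sketch of an isochronous UDP RTT prober (hypothetical, for illustration):
# probes go out every INTERVAL seconds on a fixed schedule, so the probe
# rate is independent of the RTT, unlike a ping-pong test.
import socket
import struct
import time

INTERVAL = 0.02              # 20 ms between probes
COUNT = 200
ADDR = ("127.0.0.1", 9000)   # assumed plain UDP echo server

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.settimeout(INTERVAL)

rtts = {}
next_send = time.monotonic()
for seq in range(COUNT):
    # Payload carries a sequence number and a monotonic send timestamp.
    sock.sendto(struct.pack("!Id", seq, time.monotonic()), ADDR)
    next_send += INTERVAL
    try:
        data, _ = sock.recvfrom(64)
        rseq, sent = struct.unpack("!Id", data[:12])
        rtts[rseq] = time.monotonic() - sent
    except socket.timeout:
        pass  # reply lost or late; the next probe still goes out on time
    time.sleep(max(0.0, next_send - time.monotonic()))

lost = COUNT - len(rtts)
print("%d/%d replies (%.1f%% loss)" % (len(rtts), COUNT, 100.0 * lost / COUNT))
if rtts:
    print("mean RTT: %.3f ms" % (1000 * sum(rtts.values()) / len(rtts)))
```
The key difference from a ping-pong test is the fixed send schedule: packet loss shows up as missing replies rather than as a stalled test.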
…-Toke
|
Thanks, I might try PTP, didn't know about it. Anything depending on the old xinetd echo service is probably out, right? I guess you'd want a small, native standalone client and server? It's not as easy to find as I thought it would be. |
Pete Heist <notifications@github.com> writes:
Thanks, I might try PTP, didn't know about it.
Anything depending on the old xinetd echo service is probably out,
right? I guess you'd want a small, native standalone client and
server? It's not as easy to find as I thought it would be.
Main requirement is that the client can be convinced to timestamp its
output and run for a pre-defined time interval. Accuracy in timing is a
bonus... ;)
…-Toke
|
Ok, and as near as I can tell, netperf UDP_RR sends a packet, waits for a response, then sends another without any delay. If packets are lost, the test apparently stops, although it at least resumes after I built and installed 2.7.0 from source (after your tip in an email a while back).
I would think that, instead of stopping after not receiving a response, it should send another packet after some delay so the test doesn't stop. Perhaps the delay could be around 5x the current mean RTT (maybe measured within the last 5x mean RTT window of time also, so it adapts to changes). That would need testing.
I'm surprised that the UDP_RR test is that aggressive actually, that it sends continuously instead of at a fixed rate. It means that your UDP flows are in continuous competition with one another, as well as the TCP flows, whereas something like VoIP sends at a fixed rate. Perhaps that's what you want for the benchmark. So I'll write if I find anything...
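A sketch of that adaptive-delay idea (hypothetical, not netperf's actual behaviour): keep a rolling window of recent RTT samples and wait at most 5x their mean before giving up on a reply and sending the next request:
```
# Hypothetical adaptive resend timeout: 5x the rolling mean RTT, so a
# lost packet costs one timeout instead of stalling the whole test.
from collections import deque

class AdaptiveTimeout:
    def __init__(self, initial_rtt=0.1, window=16):
        self.rtts = deque([initial_rtt], maxlen=window)  # recent RTT samples

    def record(self, rtt):
        self.rtts.append(rtt)  # called on each successful round trip

    def timeout(self):
        return 5 * sum(self.rtts) / len(self.rtts)
```
The sender would wait at most `timeout()` for each reply before moving on, and the window makes the threshold track changing conditions. |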
Pete Heist <notifications@github.com> writes:
Ok, and near as I can tell netperf UDP_RR sends a packet, waits for a
response then sends another without any delay. If packets are lost,
the test apparently stops, although it at least resumes after I built
and installed 2.7.0 from source (after your tip in an email a while
back).
I would think that, instead of stopping after not receiving a
response, it should send another packet after some delay so the test
doesn't stop. Perhaps the delay could be around 5x current mean RTT
(maybe within the last 5x mean RTT window of time also, so it adapts
to changes). That would need testing.
I'm surprised that the UDP_RR test is that aggressive actually, that
it sends continuously instead of at a fixed rate. It means that your
UDP flows are in continuous competition with one another, as well as
the TCP flows, whereas something like VoIP rather sends at a fixed
rate. Perhaps that's what you want for the benchmark.
Exactly. The ping-pong means that the rate consumed by the measurement
flow varies with the RTT (so if you fix bufferbloat, you'll lose bandwidth
as far as that test is concerned). Also, since netperf only reports the
number of successful back-and-forth transactions (which Flent then turns
into an RTT measure), a hiccup turns into a very high RTT value, even with
the restart behaviour.
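To put rough numbers on that coupling (illustrative arithmetic only, with an assumed 64-byte probe each way):
```
# A ping-pong flow completes one transaction per RTT, so the bandwidth it
# consumes scales as 1/RTT (64-byte probes assumed for illustration).
PAYLOAD = 64
for rtt_ms in (100.0, 10.0, 1.0):
    rate = 1000.0 / rtt_ms                    # transactions per second
    kbps = rate * PAYLOAD * 2 * 8 / 1000.0    # request + reply
    print("RTT %5.1f ms -> %4.0f trans/s, %6.1f kbit/s" % (rtt_ms, rate, kbps))
```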
So yeah, fixed (isochronous) rate, similar to 'ping' is what we need.
…-Toke
|
Thanks, now I see where twd was headed and why. :) |
one thing that I increasingly think is worth doing is adding an ipv6
timestamp header type. there are internet drafts on this, and it
wouldn't break on local connections.
…On 3/30/17 3:47 AM, Toke Høiland-Jørgensen wrote:
Pete Heist ***@***.***> writes:
> Thanks, I might try PTP, didn't know about it.
>
> Anything depending on the old xinetd echo service is probably out,
> right? I guess you'd want a small, native standalone client and
> server? It's not as easy to find as I thought it would be.
Main requirement is that the client can be convinced to timestamp its
output and run for a pre-defined time interval. Accuracy in timing is a
bonus... ;)
-Toke
|
I wrote a quick mockup in Go to see what's possible. Here's pinging localhost for 200 packets with standard 'ping':
200 packets transmitted, 200 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 0.044/0.150/0.224/0.026 ms
And using my 'rrperf' mockup, sending and echoing 200 UDP packets with nanosecond timestamps to localhost:
Iter 199, avg 0.388392 ms, min 0.096244 ms, max 0.527164 ms, stddev 0.075881 ms
Summary:
* mean RTT is 150 microseconds for ping/ICMP, 388 microseconds for rrperf/UDP
* stddev is 26 microseconds for ping/ICMP, 76 microseconds for rrperf/UDP
Do you think these stats are within the realm of acceptability for local traffic, and would you use this at all from Flent? If so, I could complete a latency (and throughput, for that matter) tester pretty quickly in Go that outputs results, say, to JSON. Basically it could just run multiple isochronous RTT tests simultaneously, specifying packet size, spacing and diffserv marking for each, along with multiple TCP flows, specifying direction and diffserv marking. As for results, I suppose it would have periodic samples from each flow and totals at the end. For the UDP flows, I could have packet loss and RTT, but not OWD, for now (maybe later). I don't know what extra features are needed from netperf, but I suspect there can be more detail. :) Notes / Caveats:
|
Pete Heist <notifications@github.com> writes:
I wrote a quick mockup in Go to see what's possible. Here's pinging localhost for 200 packets with standard 'ping':
200 packets transmitted, 200 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 0.044/0.150/0.224/0.026 ms
And using my 'rrperf' mockup, sending and echoing 200 UDP packets with nanosecond timestamps to localhost:
Iter 199, avg 0.388392 ms, min 0.096244 ms, max 0.527164 ms, stddev 0.075881 ms
Summary:
* mean RTT is 150 microseconds for ping/ICMP, 388 microseconds for rrperf/UDP
* stddev is 26 microseconds for ping/ICMP, 76 microseconds for rrperf/UDP
Do you think these stats are within the realm of acceptability for
local traffic, and would you use this at all from Flent? If so, I
could complete a latency (and throughput for that matter) tester
pretty quickly in Go, that outputs results say, to JSON.
Sure! The latency test is the most pressing, so far netperf works quite
well for throughput (and has a ton of features that it would take some
time to replicate fully).
What does the server look like, and is it safe to expose to the
internet? :)
Basically it could just run multiple isochronous RTT tests
simultaneously, specifying packet size, spacing and diffserv marking
for each, along with multiple TCP flows, specifying direction and
diffserv marking. As for results, I suppose it would have periodic
samples from each flow and totals at the end.
Actually, having Flent run multiple instances of the tool is probably
easier than having to split the output of one instance into multiple
data series.
…-Toke
|
Toke Høiland-Jørgensen <notifications@github.com> writes:
It's a three-way handshake that's rather needed before inflicting it on the internet.
|
Ok, so I'll start with a latency-only, single RTT test then. Keep things simple! The client and server are separate, like netperf, as the server might end up a little smaller. As for safe to expose, it's a given that it should be safe from buffer overflow problems (I'll avoid Go's 'unsafe' package). But beyond that, which of these is important:
1. Challenge/response test with a fixed shared key to smoke test for legitimate clients. (I take this as the "three-way handshake".)
2. Configurable limits on server for length of test, send interval, etc. (basic DoS protection).
3. Accounts / permissions with "request and grant" (I want to do this test, will you let me?)
4. Invisibility to unauthorized clients (requires #3).
I think #1 and #2 make sense to me and are "easy", but as for #3 and #4, I assume they're not needed now. This is something that might run on public servers and you want the server to be safe, but it's not something that needs to be run securely between trusted parties across the open Internet, right? There's no way to prevent someone from writing a rogue client and hogging up resources, but we could stop random probes with #1, reduce the impact of any attacks with #2, and obviously lock things down more with #3 and #4, with more effort. If all this sounds reasonable, I'll just put something together and welcome any critique... |
Pete Heist <notifications@github.com> writes:
Ok, so I'll start with a latency only, single RTT test then. Keep
things simple!
The client and server are separate, like netperf, as the server might
end up a little smaller.
As for safe to expose, it's a given that it should be safe from buffer
overflow problems (I'll avoid Go's 'unsafe' package). But what else of
these is important:
> 1. Challenge/response test with a fixed shared key to smoke test for
> legitimate clients. (I take this as the "three-way handshake".)
Basically, the concern is anything that would cause a test like this to
become an inadvertent amplifier. A forged src address that just starts a
test with no confirmation is bad.
So long as there is a challenge response phase (and a strict upper
limit on the duration and number of tests), I can sleep at night.
I'm one of the guys that predicted the ntp amplification attacks...
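A minimal sketch of such a challenge/response phase (hypothetical protocol and message names, purely illustrative): the server replies to an unauthenticated request with a small random token, never larger than the request, and only starts a test once the client echoes the token back, proving it really receives traffic at the claimed source address:
```
# Hypothetical anti-amplification handshake, for illustration only.
import os
import socket

tokens = {}  # addr -> outstanding challenge token

def start_test(sock, addr):
    ...  # begin the duration- and rate-limited test (not shown)

def handle_packet(sock, data, addr):
    if data.startswith(b"HELO") and len(data) >= 12:
        # The client pads its request, so this 12-byte reply never
        # amplifies traffic toward a spoofed source address.
        token = os.urandom(8)
        tokens[addr] = token
        sock.sendto(b"TOKN" + token, addr)
    elif data.startswith(b"STRT") and tokens.pop(addr, None) == data[4:12]:
        # Echoing the token proves the client owns the source address.
        start_test(sock, addr)
    # Anything else is silently dropped.
```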
> 2. Configurable limits on server for length of test, send interval,
> etc. (basic DoS protection).
> 3. Accounts / permissions with "request and grant" (I want to do this
> test, will you let me?)
'Tis much harder and not strictly necessary. A reply of "I'm busy now,
please try again later" seems much simpler.
For how deep that rathole can go, see owamp for inspiration.
> 4. Invisibility to unauthorized clients (requires #3).
> I think #1 and #2 make sense to me and are "easy", but as for #3 and
> #4, I assume they're not needed now. This is something that might run
> on public servers and you want the server to be safe, but it's not
> something that needs to be run securely between trusted parties across
> the open Internet, right?
> There's no way to prevent someone from writing a rogue client and
> hogging up resources, but we could stop random probes with #1, reduce
> the impact of any attacks with #2, and obviously lock things down more
> with #3 and #4, with more effort.
> If all this sounds reasonable, I'll just put something together and
> welcome any critique...
|
Ok, understood. Also, a compiled-in pre-shared key for the handshake that can be overridden from the command line is still easy to implement and would allow for restricted tests, if needed. More later when something's ready... |
Pete Heist <notifications@github.com> writes:
Ok, understood.
Also, a compiled-in pre-shared key for the handshake that can be
overridden from the command line is still easy to implement and would
allow for restricted tests, if needed.
Sure, that is useful (netperf has a similar feature), but it is also
important that one can have a "public" server that will not send large
amounts of unsolicited traffic to random addresses given a spoofed
source IP.
I'm not too worried about this as long as the reply is no bigger than
the request; then it will be no worse than normal ping (well, rate
limiting may be necessary).
Also, it should play nicely with firewalls (the server should be able to
listen on a configurable port number, and not use a large random port
number space).
…-Toke
|
Yep, I'll try to keep it to a single UDP port on the server (random on client). There might be a bit of a delay as I still have to complete tests of Ubiquiti's stuff for FreeNet and finish the report and presentation by 5/1 (along with the day job :) One of my main motivations for getting this test done asap though is WMM. After Dave's tip I did tests with WMM on and off (or at least avoided) and the results were really surprising. With WMM on, even when you do the default rrul (not rrul_be) test, latencies are 5-10x what they probably should be. Disable or bypass WMM and things look much, much better. So either:
1. WMM is "bad" and to be avoided, particularly for higher numbers of diffserv-marked flows, OR
2. The netperf UDP_RR test, with its zero-delay back and forth (which arguably doesn't represent what you see in the real world), doesn't play well with WMM when marked with higher-priority diffserv markings like EF or others
Or something in between. Hopefully we can determine that soon. PS- I finally did Chaos Calmer tests. LEDE's latency improvements under load are pretty staggering, particularly when it comes to dynamic rate drops. So hopefully the report I produce helps highlight the good work you guys are doing. :) |
Pete Heist <notifications@github.com> writes:
Yep, I'll try to keep it to a single UDP port on the server (random on client).
There might be a bit of a delay as I still have to complete tests of Ubiquiti's stuff for FreeNet and finish the report and presentation by 5/1 (along with the day job :)
One of my main motivations for getting this test done asap though is
WMM. After Dave's tip I did tests with WMM on and off (or at least
avoided) and the results were really surprising. With WMM on, even
when you do the default rrul (not rrul_be) test, latencies are 5-10x
what they probably should be. Disable or bypass WMM and things look
much, much better. So either:
1 WMM is "bad" and to be avoided, particularly for higher numbers of
diffserv marked flows, OR
2 The netperf UDP_RR test, with its zero-delay back and forth (which
arguably doesn't represent what you see in the real world) when marked
with higher priority diffserv markings like EF or others, doesn't play
well with WMM
Well, the plain RRUL test will put a TCP flow in each WMM priority bin,
which will hammer them pretty hard. This tends to break things, since
there's no admission control on the different queues.
For instance, according to the standard, the VI queue should never send
aggregates with durations longer than 1ms. The code checks this, but
only for the first rate in the configured rate chain for the
transmission. So if that fails, and subsequent retries drop down to a
lower rate, the aggregate can suddenly be longer than 1ms, in violation
of the standard.
And also, since the queues are served in strict priority order,
hammering the VO and VI queues tends to lock out the others.
…-Toke
|
On Apr 10, 2017, at 11:59 AM, Toke Høiland-Jørgensen ***@***.***> wrote:
Well, the plain RRUL test will put a TCP flow in each WMM priority bin,
which will hammer them pretty hard. This tends to break things, since
there's no admission control on the different queues.
For instance, according to the standard, the VI queue should never send
aggregates with durations longer than 1ms. The code checks this, but
only for the first rate in the configured rate chain for the
transmission. So if that fails, and subsequent retries drop down to a
lower rate, the aggregate can suddenly be longer than 1ms, in violation
of the standard.
And also, since the queues are served in strict priority order,
hammering the VO and VI queues tends to lock out the others.
Interesting, so it sounds like even just the TCP flows by themselves are already giving WMM problems...
I'm trying to answer the question of whether or not FreeNet should be disabling or bypassing WMM in their backhaul. Since we can’t control what goes into the backhaul, it seems like either the diffserv markings need to be removed or WMM needs to be disabled or bypassed.
As for the bypassing option, I can make RRUL results look _way_ better by sending backhaul traffic through an IPIP tunnel with TOS 0. Yes, your MTU goes down a bit, but you don’t have problems with WMM, and that _may_ be better than removing diffserv markings entirely, as they’re preserved in the encapsulated packet for any routers upstream. RRUL results for rate-limited cake with ‘diffserv4’ look far better with IPIP tunneling than not. I’m asking myself why NOT to do this in production, and one main component of that is determining whether the RRUL test is an approximation of reality, at least some of the time, or not.
|
Pete Heist <peteheist@gmail.com> writes:
> On Apr 10, 2017, at 11:59 AM, Toke Høiland-Jørgensen ***@***.***> wrote:
>
> Well, the plain RRUL test will put a TCP flow in each WMM priority bin,
> which will hammer them pretty hard. This tends to break things, since
> there's no admission control on the different queues.
>
> For instance, according to the standard, the VI queue should never send
> aggregates with durations longer than 1ms. The code checks this, but
> only for the first rate in the configured rate chain for the
> transmission. So if that fails, and subsequent retries drop down to a
> lower rate, the aggregate can suddenly be longer than 1ms, in violation
> of the standard.
>
> And also, since the queues are served in strict priority order,
> hammering the VO and VI queues tends to lock out the others.
Interesting, so it sounds like even just the TCP flows by themselves
are already giving WMM problems...
I'm trying to answer the question of whether or not FreeNet should be
disabling or bypassing WMM in their backhaul. Since we can’t control
what goes into the backhaul, it seems like either the diffserv
markings need to be removed or WMM needs to be disabled or bypassed.
The question is what you gain by having it on. If you're running
FQ-CoDel-enabled WiFi nodes you get almost equivalent behaviour for
voice traffic without using the VO queue (or at least that's what I've
seen in the scenarios I've been testing). And the efficiency of the
network is higher if you don't use the VO and VI queues (since they
can't aggregate as much (or at all in the case of VO)). For a backhaul
link (which I assume is point-to-point?) you are not going to have a lot
of multi-node contention (which is where the VO queue might help as that
also affects contention parameters). So a lot of the potential benefits
of different 802.11 priorities are probably not relevant for this case...
As for the bypassing option, I can make RRUL results look _way_ better
by sending backhaul traffic through an IPIP tunnel with TOS 0. Yes,
your MTU goes down a bit, but you don’t have problems with WMM and
that _may_ be better than removing diffserv markings entirely as
they’re preserved in the encapsulated packet for any routers upstream.
RRUL results for rate limited cake with ‘diffserv4’ looks far better
with IPIP tunneling than not.
'Far better' how? I'm guessing we're down to the micro-optimisation
level here?
I’m asking myself why NOT to do this in production, and one main
component of that is determining whether the RRUL test is an
approximation of reality, at least some of the time, or not.
The RRUL test is designed to be a representation of the worst case,
rather than typical traffic most applications generate. So probably, in
*most* cases you will be fine with leaving it on, since no applications
will generate high volumes of VO or VI traffic. It's more of a security
issue, really, making it easier to DOS your network...
…-Toke
|
On Apr 10, 2017, at 12:37 PM, Toke Høiland-Jørgensen ***@***.***> wrote:
Pete Heist ***@***.*** ***@***.***>> writes:
> Interesting, so it sounds like even just the TCP flows by themselves
> are already giving WMM problems...
>
> I'm trying to answer the question of whether or not FreeNet should be
> disabling or bypassing WMM in their backhaul. Since we can’t control
> what goes into the backhaul, it seems like either the diffserv
> markings need to be removed or WMM needs to be disabled or bypassed.
The question is what you gain by having it on. If you're running
FQ-CoDel-enabled WiFi nodes you get almost equivalent behaviour for
voice traffic without using the VO queue (or at least that's what I've
seen in the scenarios I've been testing). And the efficiency of the
network is higher if you don't use the VO and VI queues (since they
can't aggregate as much (or at all in the case of VO)). For a backhaul
link (which I assume is point-to-point?) you are not going to have a lot
of multi-node contention (which is where the VO queue might help as that
also affects contention parameters). So a lot of the potential benefits
of different 802.11 priorities are probably not relevant for this case…
Exactly, that’s what I’m thinking (for the backhaul). But WMM is required in 802.11n and later as I understand it. If you disable it in LEDE, you fall back to 802.11g speeds. You can’t disable it in Ubiquiti’s UI, but it looks like it can be done manually in their config files, another thing I need to test.
> As for the bypassing option, I can make RRUL results look _way_ better
> by sending backhaul traffic through an IPIP tunnel with TOS 0. Yes,
> your MTU goes down a bit, but you don’t have problems with WMM and
> that _may_ be better than removing diffserv markings entirely as
> they’re preserved in the encapsulated packet for any routers upstream.
> RRUL results for rate limited cake with ‘diffserv4’ looks far better
> with IPIP tunneling than not.
'Far better' how? I'm guessing we're down to the micro-optimisation
level here?
Here are three rrul results using the default LEDE config, one with no IPIP tunnel, one with an IPIP tunnel with TOS 0, and one with TOS ‘inherit’, which takes the TOS value of the encapsulated packet:
http://www.drhleny.cz/bufferbloat_tmp/diffserv_rrul_default_noipip/index.html
http://www.drhleny.cz/bufferbloat_tmp/diffserv_rrul_default_ipip_tos_0/index.html
http://www.drhleny.cz/bufferbloat_tmp/diffserv_rrul_default_ipip_tos_inh/index.html
The TOS 0 version basically looks like the rrul_be test, because all traffic going over the P2P link is effectively best effort. TOS ‘inherit’ basically looks the same as having no IPIP tunnel, because the packets have the same diffserv markings, just with a little bit of overhead. It all makes sense, apart from the fundamental question of why latencies are so much higher with the rrul test than with rrul_be.
Now, I can use an IPIP tunnel with TOS 0, rate limit and shape that with Cake diffserv4, for example, and get much better results than with no IPIP tunnel:
http://www.drhleny.cz/bufferbloat_tmp/diffserv_rrul_eg_cake_ds4_36mbit_noipip/index.html
http://www.drhleny.cz/bufferbloat_tmp/diffserv_rrul_eg_cake_ds4_36mbit_ipip_tos_0/index.html
By adding the IPIP tunnel with TOS 0, average latency goes from ~50 ms to ~10 ms, and to summarize:
ICMP: 21.2 -> 7.0 ms
UDP BE: 20.8 -> 7.3 ms
UDP BK: 84.3 -> 14.2 ms
UDP EF: 9.0 -> 8.1 ms
All the latencies go down considerably, so why not do this? Unless the test just isn’t close to reality.
Now, and this is also important, I’m making an assumption here that this is demonstrating a problem with WMM, but maybe it’s not. Is LEDE’s new driver also prioritizing based on diffserv markings, or what exactly am I seeing?
> I’m asking myself why NOT to do this in production, and one main
> component of that is determining whether the RRUL test is an
> approximation of reality, at least some of the time, or not.
The RRUL test is designed to be a representation of the worst case,
rather than typical traffic most applications generate. So probably, in
*most* cases you will be fine with leaving it on, since no applications
will generate high volumes of VO or VI traffic. It's more of a security
issue, really, making it easier to DOS your network...
That’s what I’d like to also confirm with the tests. :) Thanks Toke for your advice...
|
On Apr 10, 2017, at 2:28 PM, Toke Høiland-Jørgensen ***@***.***> wrote:
Pete Heist ***@***.*** ***@***.***>> writes:
> Now, I can use an IPIP tunnel with TOS 0, rate limit and shape that with Cake diffserv4, for example, and get much better results than with no IPIP tunnel:
>
> http://www.drhleny.cz/bufferbloat_tmp/diffserv_rrul_eg_cake_ds4_36mbit_noipip/index.html
>
> http://www.drhleny.cz/bufferbloat_tmp/diffserv_rrul_eg_cake_ds4_36mbit_ipip_tos_0/index.html
>
> By adding the IPIP tunnel with TOS 0, average latency goes from ~50 ms to ~10 ms, and to summarize:
>
> ICMP: 21.2 -> 7.0 ms
> UDP BE: 20.8 -> 7.3 ms
> UDP BK: 84.3 -> 14.2 ms
> UDP EF: 9.0 -> 8.1 ms
>
> All the latencies go down considerably, so why not do this? Unless the test just isn’t close to reality.
>
> Now, and this is also important, I’m making an assumption here that
> this is demonstrating a problem with WMM, but maybe it’s not. Is
> LEDE’s new driver also prioritizing based on diffserv markings, or
> what exactly am I seeing?
The Linux WiFi stack will serve the different priority queues in strict
priority order. So if you have a long backlog on the VO queue, that will
basically lock out the others; which is why you see the other traffic
classes worsen in this case. In addition, the VO queue can't aggregate
packets, so when it is busy, the effective bandwidth of the link drops.
Which I believe is the reason you're seeing Cake behave better when
you're using tunnelling that hides the diffserv markings from the WiFi
stack: when the tunnelling is turned off, Cake is simply no longer
shaping at less than the effective bandwidth of the link.
Ok, but of the two possibilities, I think your first assessment is more likely (that the VO queue is taking priority over the others). If Cake were not in control of the queue for the no IPIP tunnel case because the effective bandwidth of the link were too low, I would expect the average bandwidth for upload and download to not be as flat as it appears.
I would like to prove this with another test limited to lower rates, but my LEDE hardware has gone back to its home as of today (springtime at the camp) running Open Mesh’s firmware. I’ll be testing Ubiquiti hardware soon. I expect its WiFi stack is the same though(?), and it will be interesting to see how it compares. Now that LEDE has a stable release build, I’m also considering trying that at the camp, but that’s another step, and maybe more tests, for later.
I’ll also have to do a real test of disabling WMM, to sort out what if any difference here is coming from WMM, or elsewhere. It was a “bridge too far” for me to try that in LEDE thus far.
I do believe the 'wash' keyword made it back into cake, btw. If you
enable that, cake will zero out the diffserv marks (after acting on
them); so you could get the behaviour you want without using tunneling,
as long as you don't have any other bottlenecks further along the path
where you need the markings...
I noticed it in the source. I’m not sure yet that removing the markings entirely is better than tunneling through the WiFi links, as they may be useful elsewhere upstream. I’m also not sure they’d make much difference on the downstream in consumer hardware though, when that’s probably not usually where the bottleneck is. Overall, in order to add the complexity of tunneling, I need to show it makes a real difference in the real world.
So maybe I’ll have more on this after the Ubiquiti results…
|
The "rightest" answer for backhaul wifi is to leave wmm enabled, but
stop classifying anything into anything other than the BE queue.
That was a one line patch at the time. There is a new abstraction
that I don't know how to get to, that allows setting a qos map for
wifi, that can be made to do the same thing.
|
On Apr 10, 2017, at 7:03 PM, Dave Täht ***@***.***> wrote:
The "rightest" answer for backhaul wifi is to leave wmm enabled, but
stop classifying anything into anything other than the BE queue.
That was a one line patch at the time. There is a new abstraction
that I don't know how to get to, that allows setting a qos map for
wifi, that can be made to do the same thing.
Oh yeah, that's “righter” for sure. Then you’d preserve diffserv markings without tunneling. I just need to find out if this is a problem on UBNT, and fix it there if so.
Even still, I’m not done experimenting with my “one weird trick” just yet. I tested a “poor man’s full duplex WiFi” by using the fact that the transmit and receive sides of IPIP tunnels are separate, and can travel over separate routes. Unfortunately I didn’t have four WiFi devices at the time I tried it, so I used Ethernet for one direction in the link. But the Flent results were rather beautiful, and one-way latency under load for the WiFi link was ~2.5ms with Cake limiting. I just got four UBNT NanoBridge M5’s to test this fully, so will include it in my results. If it works well, then in theory you could have a full-duplex WiFi setup, with failover to half-duplex, for much less than what it usually costs. This might be useful for backhaul. (BTW, one might be able to do this with straight asymmetric routing and no tunnel also, but as I understand it, asymmetric routing is generally not something you want to do on purpose. I don’t understand all of the reasons why, beyond breaking NAT setups.)
|
So any updates on any of this work? :) -Toke |
Funny you should ask...it was impossible to do anything over the summer, but in the last couple of weeks I've gotten close on the new latency tester. It took some time playing around with timer error, system vs monotonic clock values, and socket options, among other things (Windows might be mostly a lost cause on that). A few more things left to do, and I hope to update more soon...
|
Pete Heist <notifications@github.com> writes:
Funny you should ask...it was impossible to do anything over the
summer, but in the last couple of weeks I've gotten close on the new
latency tester. It took some time playing around with timer error,
system vs monotonic clock values, and socket options, among other
things (Windows might be mostly a lost cause on that). A few more
things left to do, and I hope to update more soon...
Neat! Does it do machine parsable output, and can it output stats at
intervals during the test? :)
…-Toke
|
Pete Heist <notifications@github.com> writes:
G.711 can be simulated today with `-i 20ms -l 172 -fill rand
-fillall`. I do this test pretty often, and I think it would be a good
default voip test.
The problem with this is that it also changes the sampling rate. I don't
necessarily want to plot the latency every 20ms, so I'd have to
compensate for that in the Flent plotter somehow. Also, a better way to
deal with loss would be needed.
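For reference, the standard G.711 arithmetic behind those flags (64 kbit/s, one packet per 20 ms, plus a 12-byte RTP header):
```
# G.711: 8000 samples/s x 8 bits = 64 kbit/s; one packet per 20 ms.
payload = int(8000 * 8 / 8 * 0.020)  # 160 bytes of audio per packet
print(payload + 12)                  # + 12-byte RTP header = 172, hence -l 172
```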
…-Toke
|
On Nov 20, 2017, at 1:11 PM, Toke Høiland-Jørgensen ***@***.***> wrote:
Pete Heist ***@***.***> writes:
> G.711 can be simulated today with `-i 20ms -l 172 -fill rand
> -fillall`. I do this test pretty often, and I think it would be a good
> default voip test.
The problem with this is that it also changes the sampling rate. I don't
necessarily want to plot the latency every 20ms, so I'd have to
compensate for that in the Flent plotter somehow. Also, a better way to
deal with loss would be needed.
I wondered if/when this would come up… Why not plot the latency every 20ms, too dense? I guess even if not, eventually at a low enough interval the round trip and plotting intervals would need to be decoupled, no matter what plot type is used.
If we want to minimize flent changes, irtt could optionally produce a `round_trip_snapshots` (name TBD) array in the json with elements created at a specified interval (`-si duration` or similar) that would summarize the data from multiple round trips. For each snapshot, there would be no timestamps, but the start and end seqnos would be there (if needed), mean delays and ipdv, counts (or percentages?) of lost, lost_up or lost_down, etc. I’d need to spec this out, but would something like this help?
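For illustration, one element of that array might look something like the following (all field names are a hypothetical strawman, not an implemented format):
```
# Strawman for one element of the proposed `round_trip_snapshots` array;
# field names are hypothetical, pending a real spec.
snapshot = {
    "seqno_start": 200,      # first round trip covered by this snapshot
    "seqno_end": 299,        # last round trip covered
    "mean_rtt_ns": 15570000,
    "mean_ipdv_ns": 2520000,
    "lost": 3,
    "lost_up": 2,
    "lost_down": 1,
}
```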
|
Pete Heist <notifications@github.com> writes:
> On Nov 20, 2017, at 1:11 PM, Toke Høiland-Jørgensen ***@***.***> wrote:
>
> Pete Heist ***@***.***> writes:
>
> > G.711 can be simulated today with `-i 20ms -l 172 -fill rand
> > -fillall`. I do this test pretty often, and I think it would be a good
> > default voip test.
>
> The problem with this is that it also changes the sampling rate. I don't
> necessarily want to plot the latency every 20ms, so I'd have to
> compensate for that in the Flent plotter somehow. Also, a better way to
> deal with loss would be needed.
I wondered if/when this would come up… Why not plot the latency every
20ms, too dense?
For the current plot type (where data points are connected by lines),
certainly. It would probably be possible to plot denser data sets by a
point cloud type plot, but that would make denser data series harder to
read.
I guess even if not, eventually at a low enough interval the round
trip and plotting intervals would need to be decoupled, no matter what
plot type is used.
Yeah, exactly.
If we want to minimize flent changes, irtt could optionally produce a
`round_trip_snapshots` (name TBD) array in the json with elements
created at a specified interval (`-si duration` or similar) that would
summarize the data from multiple round trips. For each snapshot, there
would be no timestamps, but the start and end seqnos would be there
(if needed), mean delays and ipdv, counts (or percentages?) of lost,
lost_up or lost_down, etc. I’d need to spec this out, but would
something like this help?
Hmm, seeing as we probably want to keep all the data points in the Flent
data file anyway, I think we might as well do the sub-sampling in Flent.
Just thinning the plots is a few lines of numpy code; just need to
figure out a good place to apply it.
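The kind of thinning meant here could be as simple as the following sketch (illustrative only, not Flent's actual code):
```
# Sketch of plot thinning with numpy: bucket a dense (time, value) series
# into fixed windows and keep the per-window mean.
import numpy as np

def thin(t, v, window=0.2):
    t, v = np.asarray(t, dtype=float), np.asarray(v, dtype=float)
    bins = np.arange(t.min(), t.max() + window, window)
    idx = np.digitize(t, bins)
    keep = np.unique(idx)
    out_t = np.array([t[idx == i].mean() for i in keep])
    out_v = np.array([np.nanmean(v[idx == i]) for i in keep])  # NaN = all lost
    return out_t, out_v
```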
Handling loss is another matter, but one that I need to deal with
anyway. Right now I'm just throwing away lost data points entirely,
which loses the lost_{up,down} information. Will fix that and also
figure out the right way to indicate losses.
…-Toke
|
On Nov 20, 2017, at 2:21 PM, Toke Høiland-Jørgensen ***@***.***> wrote:
Pete Heist ***@***.***> writes:
>> On Nov 20, 2017, at 1:11 PM, Toke Høiland-Jørgensen ***@***.***> wrote:
> I wondered if/when this would come up… Why not plot the latency every
> 20ms, too dense?
For the current plot type (where data points are connected by lines),
certainly. It would probably be possible to plot denser data sets by a
point cloud type plot, but that would make denser data series harder to
read.
Yeah, thought of the same, or some area fill between min and max, or 98th percentile values, or something.
Hmm, seeing as we probably want to keep all the data points in the Flent
data file anyway, I think we might as well do the sub-sampling in Flent.
Just thinning the plots is a few lines of numpy code; just need to
figure out a good place to apply it.
Didn’t think of that (keeping all data points anyway), but it really makes more sense. At first I thought numpy was a new street-talkin’ adjective (as in, that’s some really numpy code). I see: NumPy. :)
Handling loss is another matter, but one that I need to deal with
anyway. Right now I'm just throwing away lost data points entirely,
which loses the lost_{up,down} information. Will fix that and also
figure out the right way to indicate losses.
Really looking forward to it!
|
Really looking forward to it!
Working on it. Turned out to need a bit of refactoring. This is me
currently: https://i.imgur.com/t0XHtgJ.gif
…-Toke
|
:) Gut laugh, I know that feeling sometimes... |
Okay, testable code in the runner-refactor branch. Ended up doing a fairly involved refactoring of how runners work with data; which is good, as the new way to structure things makes a lot more sense in general; but it did mean I had to change the data format, so quite a few places this can break. So testing appreciated, both for running new tests, and for plotting old data files. |
On Mon, Nov 20, 2017 at 5:21 AM, Toke Høiland-Jørgensen <
notifications@github.com> wrote:
Pete Heist ***@***.***> writes:
>> On Nov 20, 2017, at 1:11 PM, Toke Høiland-Jørgensen <
***@***.***> wrote:
>>
>> Pete Heist ***@***.***> writes:
>>
>> > G.711 can be simulated today with `-i 20ms -l 172 -fill rand
>> > -fillall`. I do this test pretty often, and I think it would be a good
>> > default voip test.
>>
>> The problem with this is that it also changes the sampling rate. I don't
>> necessarily want to plot the latency every 20ms, so I'd have to
>> compensate for that in the Flent plotter somehow. Also, a better way to
>> deal with loss would be needed.
>
>
> I wondered if/when this would come up… Why not plot the latency every
> 20ms, too dense?
For the current plot type (where data points are connected by lines),
certainly. It would probably be possible to plot denser data sets by a
point cloud type plot, but that would make denser data series harder to
read.
Winstein plot of latency variance? It doesn't get denser, it gets darker.
Packet loss vs throughput?
> I guess even if not, eventually at a low enough interval the round
> trip and plotting intervals would need to be decoupled, no matter what
> plot type is used.
Yeah, exactly.
> If we want to minimize flent changes, irtt could optionally produce a
> `round_trip_snapshots` (name TBD) array in the json with elements
> created at a specified interval (`-si duration` or similar) that would
> summarize the data from multiple round trips. For each snapshot, there
> would be no timestamps, but the start and end seqnos would be there
> (if needed), mean delays and ipdv, counts (or percentages?) of lost,
> lost_up or lost_down, etc. I’d need to spec this out, but would
> something like this help?
Hmm, seeing as we probably want to keep all the data points in the Flent
data file anyway, I think we might as well do the sub-sampling in Flent.
Just thinning the plots is a few lines of numpy code; just need to
figure out a good place to apply it.
Handling loss is another matter, but one that I need to deal with
anyway. Right now I'm just throwing away lost data points entirely,
which loses the lost_{up,down} information. Will fix that and also
figure out the right way to indicate losses.
Groovy.
… -Toke
|
A goal for me has been to be able to run Opus at 24-bit, 96 kHz, with 2.7ms
sampling latency.
Actually getting 8 channels of that through a loaded box would be mahvelous.
…On Mon, Nov 20, 2017 at 1:14 PM, Dave Taht ***@***.***> wrote:
|
On Nov 20, 2017, at 10:44 PM, flent-users ***@***.***> wrote:
A goal for me has been to be able to run Opus at 24 bit, 96Khz, with 2.7ms
sampling latency.
Actually getting 8 channels of that through a loaded box would be marvelous.
Sounds like a musician. :) If it were CBR, maybe this is a way to estimate it:
2.7ms ~= 370 packets/sec
@128 kbps, 56 bytes / packet (44 data + 12 RTP)
@256 kbps, 99 bytes / packet (87 data + 12 RTP)
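The arithmetic behind those estimates, for the record:
```
# Bytes per packet for a CBR stream sent every 2.7 ms, plus a 12-byte
# RTP header.
import math

interval_s = 0.0027                            # ~370 packets/s
for kbps in (128, 256):
    data = math.ceil(kbps * 1000 / 8 * interval_s)
    print("%d kbps: %d data + 12 RTP = %d bytes" % (kbps, data, data + 12))
```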
Just for fun, a ~256 kbps test between two sites, 50km apart, both using p2p WiFi to the Internet. For realtime audio, I guess it’s the maximums that could be the biggest issue.
```
% ./irtt client -i 2.7ms -l 99 -q -d 10s a.b.c.d
[Connecting] connecting to a.b.c.d
[Connected] connected to a.b.c.d:2112
Min Mean Median Max Stddev
--- ---- ------ --- ------
RTT 10.16ms 15.57ms 14.14ms 71.37ms 4.89ms
send delay 4.5ms 8.01ms 6.85ms 33.1ms 3.6ms
receive delay 4.99ms 7.56ms 6.93ms 64.86ms 3.05ms
IPDV (jitter) 1.06µs 2.52ms 2.56ms 56.16ms 2.55ms
send IPDV 50ns 2.1ms 1.93ms 25.94ms 2.18ms
receive IPDV 49ns 1.14ms 663µs 58.63ms 1.9ms
send call time 38.2µs 83.2µs 13.46ms 310µs
timer error 2ns 44.7µs 18.23ms 620µs
server proc. time 33.6µs 47.4µs 242µs 18.1µs
duration: 10.2s (wait 214.1ms)
packets sent/received: 3647/3644 (0.08% loss)
server packets received: 3644/3647 (0.08%/0.00% loss up/down)
bytes sent/received: 361053/360756
send/receive rate: 288.9 Kbps / 288.7 Kbps
packet length: 99 bytes
timer stats: 57/3704 (1.54%) missed, 1.65% error
```
|
On Nov 20, 2017, at 9:58 PM, Toke Høiland-Jørgensen ***@***.***> wrote:
Okay, testable code in the runner-refactor branch.
Ended up doing a fairly involved refactoring of how runners work with
data; which is good, as the new way to structure things makes a lot more
sense in general; but it did mean I had to change the data format, so
quite a few places this can break. So testing appreciated, both for
running new tests, and for plotting old data files.
Awesome, I’m sure it could take some shaking out. I tried an rrul_be test on the runner-refactor branch...
```
% flent rrul_be --socket-stats -l 60 -H 10.72.0.231 -p all_scaled --figure-width=10 --figure-height=7.5 -t new_runner_test -o new_runner_test.png
Started Flent 1.1.1-git-b958d01 using Python 2.7.13.
Starting rrul_be test. Expected run time: 70 seconds.
Traceback (most recent call last):
File "/usr/local/bin/flent", line 11, in <module>
load_entry_point('flent===1.1.1-git-b958d01', 'console_scripts', 'flent')()
File "/usr/local/lib/python2.7/dist-packages/flent-1.1.1_git_b958d01-py2.7.egg/flent/__init__.py", line 59, in run_flent
b.run()
File "/usr/local/lib/python2.7/dist-packages/flent-1.1.1_git_b958d01-py2.7.egg/flent/batch.py", line 609, in run
return self.run_test(self.settings, self.settings.DATA_DIR, True)
File "/usr/local/lib/python2.7/dist-packages/flent-1.1.1_git_b958d01-py2.7.egg/flent/batch.py", line 508, in run_test
res = self.agg.postprocess(self.agg.aggregate(res))
File "/usr/local/lib/python2.7/dist-packages/flent-1.1.1_git_b958d01-py2.7.egg/flent/aggregators.py", line 232, in aggregate
measurements, metadata, raw_values = self.collect()
File "/usr/local/lib/python2.7/dist-packages/flent-1.1.1_git_b958d01-py2.7.egg/flent/aggregators.py", line 120, in collect
t.check()
File "/usr/local/lib/python2.7/dist-packages/flent-1.1.1_git_b958d01-py2.7.egg/flent/runners.py", line 964, in check
ip_version=args['ip_version'])
File "/usr/local/lib/python2.7/dist-packages/flent-1.1.1_git_b958d01-py2.7.egg/flent/runners.py", line 232, in add_child
c.check()
File "/usr/local/lib/python2.7/dist-packages/flent-1.1.1_git_b958d01-py2.7.egg/flent/runners.py", line 1652, in check
super(SsRunner, self).check()
File "/usr/local/lib/python2.7/dist-packages/flent-1.1.1_git_b958d01-py2.7.egg/flent/runners.py", line 393, in check
self.metadata['UNITS'] = self.units
AttributeError: 'SsRunner' object has no attribute 'units'
```
Without --socket-stats:
```
% flent rrul_be -l 60 -H 10.72.0.231 -p all_scaled --figure-width=10 --figure-height=7.5 -t new_runner_test -o new_runner_test.png
Started Flent 1.1.1-git-b958d01 using Python 2.7.13.
Starting rrul_be test. Expected run time: 70 seconds.
Traceback (most recent call last):
File "/usr/local/bin/flent", line 11, in <module>
load_entry_point('flent===1.1.1-git-b958d01', 'console_scripts', 'flent')()
File "/usr/local/lib/python2.7/dist-packages/flent-1.1.1_git_b958d01-py2.7.egg/flent/__init__.py", line 59, in run_flent
b.run()
File "/usr/local/lib/python2.7/dist-packages/flent-1.1.1_git_b958d01-py2.7.egg/flent/batch.py", line 609, in run
return self.run_test(self.settings, self.settings.DATA_DIR, True)
File "/usr/local/lib/python2.7/dist-packages/flent-1.1.1_git_b958d01-py2.7.egg/flent/batch.py", line 508, in run_test
res = self.agg.postprocess(self.agg.aggregate(res))
File "/usr/local/lib/python2.7/dist-packages/flent-1.1.1_git_b958d01-py2.7.egg/flent/aggregators.py", line 232, in aggregate
measurements, metadata, raw_values = self.collect()
File "/usr/local/lib/python2.7/dist-packages/flent-1.1.1_git_b958d01-py2.7.egg/flent/aggregators.py", line 120, in collect
t.check()
File "/usr/local/lib/python2.7/dist-packages/flent-1.1.1_git_b958d01-py2.7.egg/flent/runners.py", line 1458, in check
delay=self.delay, remote_host=self.remote_host,
AttributeError: 'UdpRttRunner' object has no attribute 'delay'
```
|
On Nov 20, 2017, at 10:14 PM, flent-users ***@***.***> wrote:
Winstein plot of latency variance? It doesn't get denser, it gets darker.
Packet loss vs throughput?
Not sure what that is exactly. Something like from July 2014 on this page?
https://cs.stanford.edu/~keithw/
|
Pete Heist <notifications@github.com> writes:
> On Nov 20, 2017, at 9:58 PM, Toke Høiland-Jørgensen ***@***.***> wrote:
> Okay, testable code in the runner-refactor branch.
>
> Ended up doing a fairly involved refactoring of how runners work with
> data; which is good, as the new way to structure things makes a lot more
> sense in general; but it did mean I had to change the data format, so
> quite a few places this can break. So testing appreciated, both for
> running new tests, and for plotting old data files.
>
Awesome, I’m sure it could take some shaking out. I tried an rrul_be
test on the runner-refactor branch...
Ha! Epic fail! :D
Well, I only just managed to finish writing the code and unbreaking the
CI tests; didn't actually get around to running any tests. I've fixed
those two errors, and am running a full test run on my testbed now...
…-Toke
|
On Nov 21, 2017, at 11:36 AM, Toke Høiland-Jørgensen ***@***.***> wrote:
Ha! Epic fail! :D
Well, I only just managed to finish writing the code and unbreaking the
CI tests; didn't actually get around to running any tests. I've fixed
those two errors, and am running a full test run on my testbed now…
Much better now though! Both rrul_be tests ran fine for me (with and without --socket-stats).
I have a number of .flent.gz files from Jan this year I can try when I get a chance. I just deleted thousands of them from my newer (unreleased) tests from March or so as I want to re-run them all in my new test bed, but oh well...
Next thing I noticed as for current tests, for rrul_be_nflows, the test completed but only one irtt instance ran (also just saw one connection to the server).
% flent rrul_be_nflows --test-parameter upload_streams=8 --test-parameter download_streams=8 --socket-stats -l 60 -H $SERVER -p all_scaled --figure-width=10 --figure-height=7.5 -t irtt -o irtt_8flows.png
|
Pete Heist <notifications@github.com> writes:
> On Nov 21, 2017, at 11:36 AM, Toke Høiland-Jørgensen ***@***.***> wrote:
>
> Ha! Epic fail! :D
>
> Well, I only just managed to finish writing the code and unbreaking the
> CI tests; didn't actually get around to running any tests. I've fixed
> those two errors, and am running a full test run on my testbed now…
Much better now though! Both rrul_be tests ran fine for me (with and
without --socket-stats).
Cool. Getting closer. Still a few bugs to fix with the more esoteric
runners, but I'm working on that.
Next thing I noticed as for current tests, for rrul_be_nflows, the
test completed but only one irtt instance ran (also just saw one
connection to the server).
% flent rrul_be_nflows --test-parameter upload_streams=8
--test-parameter download_streams=8 --socket-stats -l 60 -H $SERVER -p
all_scaled --figure-width=10 --figure-height=7.5 -t irtt -o
irtt_8flows.png
Well that's actually to be expected. That test only varies the number of
TCP streams; there's always a single ICMP and a single UDP latency
measurement.
…-Toke
|
On Nov 21, 2017, at 3:53 PM, Toke Høiland-Jørgensen ***@***.***> wrote:
> Next thing I noticed as for current tests, for rrul_be_nflows, the
> test completed but only one irtt instance ran (also just saw one
> connection to the server).
>
> % flent rrul_be_nflows --test-parameter upload_streams=8
> --test-parameter download_streams=8 --socket-stats -l 60 -H $SERVER -p
> all_scaled --figure-width=10 --figure-height=7.5 -t irtt -o
> irtt_8flows.png
Well that's actually to be expected. That test only varies the number of
TCP streams; there's always a single ICMP and a single UDP latency
measurement.
Aha, my bad, I must have never noticed that. I’ll plot some of my older stuff too and let you know…
Pete
|
Pete Heist <notifications@github.com> writes:
> On Nov 20, 2017, at 10:44 PM, flent-users ***@***.***> wrote:
>
> A goal for me has been to be able to run Opus at 24 bit, 96Khz, with 2.7ms
> sampling latency.
> Actually getting 8 channels of that through a loaded box would be marvelous.
Sounds like a musician. :) If it were CBR, I don’t know if this is a way to
estimate it:
2.7ms ~= 370 packets/sec
Well, it might be 8 of those with different tuples.
@128KBPS, 56 bytes / packet (44 data + 12 RTP)
@256kbps, 99 bytes / packet (87 data + 12 RTP)
Just for fun, a ~256 kbps test between two sites, 50km apart, both using p2p
WiFi to the Internet. For realtime audio, I guess it’s the maximums that could
be the biggest issue.
Hah. I didn't say over wifi. That's impossible.
…
```
% ./irtt client -i 2.7ms -l 99 -q -d 10s a.b.c.d
[Connecting] connecting to a.b.c.d
[Connected] connected to a.b.c.d:2112
                       Min     Mean   Median      Max  Stddev
                       ---     ----   ------      ---  ------
               RTT  10.16ms  15.57ms  14.14ms  71.37ms  4.89ms
        send delay    4.5ms   8.01ms   6.85ms   33.1ms   3.6ms
     receive delay   4.99ms   7.56ms   6.93ms  64.86ms  3.05ms
     IPDV (jitter)   1.06µs   2.52ms   2.56ms  56.16ms  2.55ms
         send IPDV     50ns    2.1ms   1.93ms  25.94ms  2.18ms
      receive IPDV     49ns   1.14ms    663µs  58.63ms   1.9ms
    send call time   38.2µs   83.2µs           13.46ms   310µs
       timer error      2ns   44.7µs           18.23ms   620µs
 server proc. time   33.6µs   47.4µs             242µs  18.1µs
duration: 10.2s (wait 214.1ms)
packets sent/received: 3647/3644 (0.08% loss)
server packets received: 3644/3647 (0.08%/0.00% loss up/down)
bytes sent/received: 361053/360756
send/receive rate: 288.9 Kbps / 288.7 Kbps
packet length: 99 bytes
timer stats: 57/3704 (1.54%) missed, 1.65% error
```
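A quick cross-check of the packet-size arithmetic quoted above, as a minimal Go sketch; it assumes ceil-rounded payload bytes at a fixed 2.7ms packet interval (~370 packets/sec) plus a 12-byte RTP header, and counts no IP/UDP overhead:

```
// Back-of-the-envelope check of the bytes-per-packet estimate above:
// payload = bitrate * interval / 8, rounded up, plus a 12-byte RTP header.
package main

import (
	"fmt"
	"math"
)

func main() {
	const interval = 0.0027 // 2.7ms between packets, ~370 packets/sec
	for _, kbps := range []float64{128, 256} {
		data := math.Ceil(kbps * 1000 * interval / 8) // payload bytes per packet
		fmt.Printf("@%gkbps: %.0f data + 12 RTP = %.0f bytes/packet\n",
			kbps, data, data+12)
	}
}
```

This reproduces the 56 and 99 bytes/packet figures above.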
|
Trying to confirm how latency was being calculated before with the UDP_RR test. Looking at its raw output, I see that transactions per second is probably used to calculate RTT, with interim results like:
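```
NETPERF_INTERIM_RESULT[0]=3033.41
NETPERF_UNITS[0]=Trans/s
NETPERF_INTERVAL[0]=0.200
NETPERF_ENDING[0]=1511296777.475
```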
So RTT = (1 / 3033.41) ~= 330µs

And this likely takes the mean value of all transactions and summarizes it at the end of the interval, then the calculated latency was what was plotted in flent?
|
Pete Heist <notifications@github.com> writes:
> Trying to confirm how latency was being calculated before with the
> UDP_RR test. Looking at its raw output, I see that transactions per
> second is probably used to calculate RTT, with interim results like:
>
> ```
> NETPERF_INTERIM_RESULT[0]=3033.41
> NETPERF_UNITS[0]=Trans/s
> NETPERF_INTERVAL[0]=0.200
> NETPERF_ENDING[0]=1511296777.475
> ```
>
> So RTT = (1 / 3033.41) ~= 330us
>
> And this likely takes the mean value of all transactions and
> summarizes it at the end of the interval, then the calculated latency
> was what was plotted in flent?
Yup, that's exactly it :)
|
On Nov 21, 2017, at 10:56 PM, Toke Høiland-Jørgensen ***@***.***> wrote:
> Pete Heist ***@***.***> writes:
>
> > Trying to confirm how latency was being calculated before with the
> > UDP_RR test. Looking at its raw output, I see that transactions per
> > second is probably used to calculate RTT, with interim results like:
> >
> > ```
> > NETPERF_INTERIM_RESULT[0]=3033.41
> > NETPERF_UNITS[0]=Trans/s
> > NETPERF_INTERVAL[0]=0.200
> > NETPERF_ENDING[0]=1511296777.475
> > ```
> >
> > So RTT = (1 / 3033.41) ~= 330us
> >
> > And this likely takes the mean value of all transactions and
> > summarizes it at the end of the interval, then the calculated latency
> > was what was plotted in flent?
>
> Yup, that's exactly it :)
Ok, it’ll be interesting for me to look at the differences between the two going forward. Naturally doing it the udp_rr way would probably result in a smoother line. The other impacts on the test might be fun to explore.
|
Pete Heist <notifications@github.com> writes:
> > > And this likely takes the mean value of all transactions and
> > > summarizes it at the end of the interval, then the calculated latency
> > > was what was plotted in flent?
> >
> > Yup, that's exactly it :)
>
> Ok, it'll be interesting for me to look at the differences between the
> two going forward. Naturally doing it the udp_rr way would probably
> result in a smoother line. The other impacts on the test might be fun
> to explore.
Well the obvious one is that the netperf measurement uses more bandwidth
as the latency decreases. Have been meaning to add that to the Flent
bandwidth graphs, but now I'm not sure I'll even bother :P
Also, the netperf measurement will stop at the first packet loss (later
versions added in a timeout parameter that will restart it, but even
with that we often see UDP latency graphs completely stopping after a
few seconds of the RRUL test).
…-Toke
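To put rough numbers on the bandwidth point: UDP_RR completes one request/response pair per RTT, so its offered load scales inversely with latency. A sketch assuming netperf's default 1-byte request/response payloads and 28 bytes of IPv4+UDP header per packet (no Ethernet overhead counted):

```
// Rough illustration: one UDP_RR transaction = 2 packets per RTT,
// so the measurement's packet rate and bandwidth grow as latency falls.
package main

import "fmt"

func main() {
	const pktBytes = 1 + 28.0 // assumed: 1-byte payload + IPv4/UDP headers
	for _, rtt := range []float64{330e-6, 33e-6} { // RTT in seconds
		tps := 1 / rtt // transactions per second
		mbps := tps * 2 * pktBytes * 8 / 1e6
		fmt.Printf("RTT %3.0fµs -> %5.0f trans/s, ~%.1f Mbps\n", rtt*1e6, tps, mbps)
	}
}
```

On a tight GigE loop the ~330µs RTT above drops by an order of magnitude, and the measurement traffic grows by the same factor.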
|
On Nov 22, 2017, at 8:49 AM, Toke Høiland-Jørgensen ***@***.***> wrote:
> Pete Heist ***@***.***> writes:
>
> > > > And this likely takes the mean value of all transactions and
> > > > summarizes it at the end of the interval, then the calculated latency
> > > > was what was plotted in flent?
> > >
> > > Yup, that's exactly it :)
> >
> > Ok, it'll be interesting for me to look at the differences between the
> > two going forward. Naturally doing it the udp_rr way would probably
> > result in a smoother line. The other impacts on the test might be fun
> > to explore.
>
> Well the obvious one is that the netperf measurement uses more bandwidth
> as the latency decreases. Have been meaning to add that to the Flent
> bandwidth graphs, but now I'm not sure I'll even bother :P
True that, it ends up in a pretty tight loop with straight cabled GigE, as in my test bed...
> Also, the netperf measurement will stop at the first packet loss (later
> versions added in a timeout parameter that will restart it, but even
> with that we often see UDP latency graphs completely stopping after a
> few seconds of the RRUL test).
Yes, was noticing that before (one of our original motivations).
I know it’s a random connection, but I wonder how this would affect the throughput asymmetry I was seeing on the MBPs, for example. Would the driver/card grab airtime more aggressively when it’s transmitting many small packets, or do those get grouped together anyway? I can test it again when I get a chance, but I’m out of my league on the theory side here.
|
Right, so I convinced myself that I'd fixed most of the breakages in the refactor (which turned out to be a multi-thousand-line patch, but with a net reduction of 400 lines of code; not too bad), merged it, and closed this issue. Please open new issue(s) for any breakage that I missed. I'll open a new one specifically for using irtt for VoIP tests. |
Oh, and many thanks for your work on irtt, @peteheist! We really needed such a tool :) |
Oh yeah, probably time for this issue thread to retire. :) So I'm glad! Looking forward to playing with this more soon. Thanks for all that refactoring too, looks like it was some real walking through walls... |
Pete Heist <notifications@github.com> writes:
> So I'm glad! Looking forward to playing with this more soon. Thanks
> for all that refactoring too, looks like it was some real walking
> through walls...
Meh, it needed doing anyway. You just gave me a chance to repay a bit of
technical debt ;)
|
Toke Høiland-Jørgensen <notifications@github.com> writes:
> Oh, and many thanks for your work on irtt, @peteheist! We really needed such a
> tool :)
Thx very much also. I'd really like to get some owd plots out of
flent....
|
The owd data is already being collected, so it's fairly trivial to add the plots... |
Loss stats and jitter are listed in the RRUL spec (https://www.bufferbloat.net/projects/bloat/wiki/RRUL_Spec/) but not available in Flent. I'd like to be able to see packet loss to compare the drop decisions made by different qdiscs, particularly on UDP flows.
It looks like this is actually a limitation of netperf, as I don't see packet loss available in the UDP_RR test results. And I understand that getting TCP loss could be challenging, as we'd have to get it from the OS somehow or capture packets, but isn't there a different utility that could be used for the UDP flows that would measure both RTT and packet loss? If not, perhaps one could be written. :)

I remember now that Dave was starting a twd project a while back, but it ended up being a bridge too far. What I'm thinking of is probably simpler, though I don't know if it's enough: UDP packets could be sent from each end (of a certain size, at a certain rate, TBD) with sequence numbers and timestamps, and the receiver could both count how many it didn't receive and send a response packet back to the client, so you have both requests and responses being sent and received from each end. One-way and two-way delay could potentially be measured.
Suggestions?
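For what it's worth, a minimal Go sketch of such a probe (hypothetical field layout and names, not the wire format of any existing tool): each packet carries a sequence number and a send timestamp; the receiver counts sequence gaps for one-way loss and could echo packets back for RTT:

```
// Hypothetical probe format: a sequence number detects loss, a send
// timestamp gives one-way delay (given synchronized clocks); echoing
// the packet back yields RTT without any clock sync.
package main

import (
	"encoding/binary"
	"fmt"
	"time"
)

type probe struct {
	Seq      uint64 // sender's sequence number
	SentNano int64  // sender's wall-clock send time, in nanoseconds
}

func (p probe) marshal() []byte {
	b := make([]byte, 16)
	binary.BigEndian.PutUint64(b[0:8], p.Seq)
	binary.BigEndian.PutUint64(b[8:16], uint64(p.SentNano))
	return b
}

// lossCounter does receiver-side accounting: anything missing below the
// highest sequence seen so far counts as lost (reordering ignored).
type lossCounter struct {
	received, maxSeq uint64
}

func (l *lossCounter) record(seq uint64) {
	l.received++
	if seq > l.maxSeq {
		l.maxSeq = seq
	}
}

// lost assumes sequence numbers start at 1.
func (l *lossCounter) lost() uint64 {
	return l.maxSeq - l.received
}

func main() {
	p := probe{Seq: 1, SentNano: time.Now().UnixNano()}
	fmt.Printf("probe bytes: % x\n", p.marshal())

	var lc lossCounter
	for _, seq := range []uint64{1, 2, 4, 5} { // seq 3 never arrives
		lc.record(seq)
	}
	fmt.Printf("received %d, lost %d\n", lc.received, lc.lost())
}
```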