packet loss stats #106

Closed
heistp opened this issue Mar 25, 2017 · 148 comments

heistp (Contributor) commented Mar 25, 2017

Loss stats and jitter are listed in the RRUL spec (https://www.bufferbloat.net/projects/bloat/wiki/RRUL_Spec/) but not available in Flent. I'd like to be able to see packet loss to compare the drop decisions made by different qdiscs, particularly on UDP flows.

It looks like this is actually a limitation of netperf, as I don't see packet loss available in the UDP_RR test results. And I understand that getting TCP loss could be challenging, since we'd have to get it from the OS somehow or capture packets, but isn't there a different utility that could be used for the UDP flows, one that would measure both RTT and packet loss? If not, perhaps one could be written. :)

I remember now that Dave was starting a twd project a while back, but it ended up being a bridge too far. What I'm thinking of is probably simpler, though I don't know if it's enough: UDP packets could be sent from each end (of a certain size, at a certain rate, TBD) with sequence numbers and timestamps, and the receiver could both count how many it didn't receive and send a response packet back to the client, so you have requests and responses being sent and received at each end. One-way and two-way delay could potentially be measured.
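Just to make that concrete, here's a rough sketch in Go of the kind of probe I'm picturing (the packet layout, names and sizes are all made up at this point, not any existing tool's format):

```go
// Hypothetical sketch: each UDP probe carries a sequence number and a send
// timestamp; the far end echoes it back, and the sender derives RTT from the
// echoed timestamp and loss from the sequence numbers that never come back.
package main

import (
	"encoding/binary"
	"fmt"
	"time"
)

type probe struct {
	Seq      uint64 // monotonically increasing sequence number
	SentNano int64  // sender's timestamp in nanoseconds
}

func (p probe) marshal() []byte {
	b := make([]byte, 16)
	binary.BigEndian.PutUint64(b[0:8], p.Seq)
	binary.BigEndian.PutUint64(b[8:16], uint64(p.SentNano))
	return b
}

func unmarshal(b []byte) probe {
	return probe{
		Seq:      binary.BigEndian.Uint64(b[0:8]),
		SentNano: int64(binary.BigEndian.Uint64(b[8:16])),
	}
}

func main() {
	// Simulate one round trip locally, just to show the accounting.
	sent := probe{Seq: 1, SentNano: time.Now().UnixNano()}
	wire := sent.marshal()    // client -> server
	echoed := unmarshal(wire) // server echoes the same bytes back
	rtt := time.Duration(time.Now().UnixNano() - echoed.SentNano)

	// Loss is simply sent minus received once the test ends (or times out).
	sentCount, rcvdCount := 200, 199
	loss := 100 * float64(sentCount-rcvdCount) / float64(sentCount)
	fmt.Printf("seq=%d rtt=%v loss=%.1f%%\n", echoed.Seq, rtt, loss)
}
```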

Suggestions?

heistp (Contributor, Author) commented Mar 26, 2017

Actually, one suggestion is D-ITG, which is already used for the VoIP tests. Just noticed that it can do both one-way and RTT tests, and return packet loss. That could possibly be used instead of netperf for the UDP flows.

dtaht (Collaborator) commented Mar 27, 2017

Yes, twd was a bridge too far at the time. I wanted something that was reliably real-time and capable of doing gigE well in particular, and writing twd in C was too hard. Now things have evolved a bit in these worlds (netmap, for example), and perhaps taking another tack entirely ( https://github.com/dtaht/libv6/blob/master/erm/doc/philosophy.org ) will yield results... eventually.

It still seems like punting the whole TCP (or QUIC?) stack to userspace, using raw sockets, or leveraging BPF (the last seems to have promise) would get me to the extreme I wanted.

heistp (Contributor, Author) commented Mar 27, 2017

Wow, that sounds very high end. That (especially your comment about a userspace TCP stack) triggered a couple of thoughts I've been having as I'm completing my second round of point-to-point WiFi tests:

  1. With my Flent runs, I'm testing what would happen if clients were connected directly to the backhaul at the backhaul's maximum rate, but in reality the client connections are the bottleneck, and several slower, varying-rate clients come together to eventually (and only sometimes) saturate the backhaul links. I think simulating this may produce a different response from AQM than straight-on saturating flows do (and perhaps make AQM look even better). A more accurate test would respect this, but then, as you suggested, either the test itself would need a TCP stack in it, or the client flows would need to go through virtual interfaces (maybe with netem? I don't know how else) to simulate changing rates, latency or loss for the individual flows that feed the RRUL test. This is, for me, a bridge too far right now, but it makes me hope that the results I'm producing are still useful somehow!

  2. The original spirit (and spec) of the RRUL test sounded like it envisioned some sort of "metric" summarizing the response of the system being tested under load. After spending many hours putting together my second round of Flent results, I yearn for a mostly automatic test that would produce relevant results without having to configure enumerations of individual tests. It doesn't need to be a single metric, but a minimal set of metrics representing response under load. I'm not sure if this can be done, but...

For one, I do know that rig setups can be extremely specific and variable (for me, it's all about Wi-Fi backhaul, which is vastly different from other setups), so I'm not proposing something that varies rig setups automatically, but maybe there could be an automatic test of sorts, after rig setup has occurred somehow, that runs in phases and ramps up until some limit. Two possible phases could be:

A) "no load" stuff that summarizes physical link characteristics (useful both for CPE devices with low numbers of users, or for just understanding the basics of backhaul and router links):

  • straight throughput in either and both directions, testing symmetry
  • unloaded one-way latency, RTT and jitter, for VoIP-like, videoconferencing-like, gaming-like and sparse UDP flows like DNS/NTP

B) response under load (useful for loaded CPE devices, and at higher connection counts for backhaul and routers):

  • TCP side: increasing numbers of real-world flows like:
    • conversational and temporary one-way flows, web-browsing-like and POP/IMAP-like
    • pulsed downloads (YouTube / streaming video)
    • aggressive, P2P or torrent-like flow swarms
  • UDP/ICMP side: increasing numbers of flows similar to part A; now how do those look?
  • diffserv markings (and not)

I don't mean to start summarizing the RRUL spec! So I'll stop; I only meant that maybe the process for testing RRUL could have just a minimal set of parameters, and the test program could automate the process of producing relevant results.

The only parameters to specify, in case the user wants to, might be "how far and how fast" (meaning for example how many simultaneous flows to go up to and how quickly to get there), which could be calculated based on either the results from phase A, "what it's probably capable of" or estimated during phase B based on "how it's going". The "how far to take the test" could be specified in case someone wants to push a link well beyond its limits, or wants to stop well short of them to get something quick.

Maybe this could just be a Flent "automatic" test?

So I know there's a place for a highly configurable "hard-core" tester that can hit 10 GigE and produce microsecond-level accuracy, but I think there may also be a place for such an automated, "good enough for many" test.

PS- Go 1.8 was released with GC improvements that bring pauses to "usually under 100 microseconds and often as low as 10 microseconds". I know that still might not be good enough for some tests, especially 10 GigE or microsecond-sensitive results (I'm starting to look at microseconds for VoIP tests as well), but it's getting better.

tohojo (Owner) commented Mar 28, 2017 via email

heistp (Contributor, Author) commented Mar 29, 2017

I noticed that. The VoIP tests were a little painful to get working. My Mac Mini G4 also had really bad clock drift, which at first produced some beautiful but useless delay curves. "adjtimex --tick 10029" got the system clock close enough so that ntp would agree to do the rest. Still, one-way delay can be off by up to a millisecond or so, depending on how ntp is feeling at the moment.

tohojo (Owner) commented Mar 29, 2017 via email

heistp (Contributor, Author) commented Mar 29, 2017

Thanks, I might try PTP, didn't know about it.

Anything depending on the old xinetd echo service is probably out, right? I guess you'd want a small, native standalone client and server? It's not as easy to find as I thought it would be.

tohojo (Owner) commented Mar 30, 2017 via email

heistp (Contributor, Author) commented Mar 30, 2017

Ok, as near as I can tell, netperf UDP_RR sends a packet, waits for a response, then sends another without any delay. If packets are lost, the test apparently stops, although it at least resumes now that I've built and installed 2.7.0 from source (after your tip in an email a while back).

I would think that, instead of stopping after not receiving a response, it should send another packet after some delay so the test doesn't stall. Perhaps the delay could be around 5x the current mean RTT (maybe computed over the last 5x-mean-RTT window of time, so it adapts to changes). That would need testing.
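Purely to illustrate that idea (the numbers and names are made up), the bookkeeping could be as simple as:

```go
// Illustrative only: track a running mean RTT and give up on a probe after
// roughly 5x that mean, instead of stalling the whole test on one lost packet.
package main

import (
	"fmt"
	"time"
)

type rttTracker struct {
	mean time.Duration
	n    int
}

func (t *rttTracker) add(rtt time.Duration) {
	t.n++
	// cumulative mean; a sliding window or EWMA would adapt faster to changes
	t.mean += (rtt - t.mean) / time.Duration(t.n)
}

func (t *rttTracker) timeout() time.Duration {
	if t.n == 0 {
		return time.Second // arbitrary fallback before any samples exist
	}
	return 5 * t.mean
}

func main() {
	var t rttTracker
	samples := []time.Duration{300 * time.Microsecond, 500 * time.Microsecond, 400 * time.Microsecond}
	for _, rtt := range samples {
		t.add(rtt)
	}
	fmt.Println("next probe times out after", t.timeout())
}
```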

I'm surprised that the UDP_RR test is actually that aggressive, that it sends continuously instead of at a fixed rate. It means that your UDP flows are in continuous competition with one another, as well as with the TCP flows, whereas something like VoIP sends at a fixed rate. Perhaps that's what you want for the benchmark.

So I'll write if I find anything...

tohojo (Owner) commented Mar 30, 2017 via email

heistp (Contributor, Author) commented Mar 30, 2017

Thanks, now I see where twd was headed and why. :)

dtaht (Collaborator) commented Mar 30, 2017 via email

heistp (Contributor, Author) commented Apr 7, 2017

I wrote a quick mockup in Go to see what's possible. Here's pinging localhost for 200 packets with standard 'ping':

200 packets transmitted, 200 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 0.044/0.150/0.224/0.026 ms

And using my 'rrperf' mockup, sending and echoing 200 UDP packets with nanosecond timestamps to localhost:

Iter 199, avg 0.388392 ms, min 0.096244 ms, max 0.527164 ms, stddev 0.075881 ms

Summary:

  • mean RTT is 150 microseconds for ping/ICMP, 388 microseconds for rrperf/UDP
  • stddev is 26 microseconds for ping/ICMP, 76 microseconds for rrperf/UDP

Do you think these stats are within the realm of acceptability for local traffic, and would you use this at all from Flent? If so, I could complete a latency (and, for that matter, throughput) tester pretty quickly in Go that outputs results to, say, JSON.

Basically it could just run multiple isochronous RTT tests simultaneously, specifying packet size, spacing and diffserv marking for each, along with multiple TCP flows, specifying direction and diffserv marking. As for results, I suppose it would have periodic samples from each flow and totals at the end. For the UDP flows, I could have packet loss and RTT, but not OWD, for now (maybe later).
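As a first guess, the results might serialize to something like the sketch below (every field name here is hypothetical, just reflecting the periodic-samples-plus-totals idea):

```go
// A guess at the JSON output shape: per-flow periodic samples plus totals.
// Nothing here is final; field names and units are placeholders.
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

type Sample struct {
	Time time.Time     `json:"time"`
	RTT  time.Duration `json:"rtt_ns"` // Duration marshals as nanoseconds
}

type UDPFlowResult struct {
	PacketsSent     int           `json:"packets_sent"`
	PacketsReceived int           `json:"packets_received"`
	LossPercent     float64       `json:"loss_percent"`
	MeanRTT         time.Duration `json:"mean_rtt_ns"`
	Samples         []Sample      `json:"samples"`
}

type Result struct {
	UDPFlows map[string]UDPFlowResult `json:"udp_flows"`
}

func main() {
	r := Result{UDPFlows: map[string]UDPFlowResult{
		"EF": {PacketsSent: 500, PacketsReceived: 499, LossPercent: 0.2,
			MeanRTT: 388 * time.Microsecond},
	}}
	out, _ := json.MarshalIndent(r, "", "  ")
	fmt.Println(string(out))
}
```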

I don't know what extra features are needed from netperf, but I suspect there can be more detail. :)

Notes / Caveats:

  • I did note that things don't change much using a single goroutine for both send and receive, so I don't think there's much of a measurable impact from goroutine scheduling.
  • Like we talked about, there is the 2+ MB statically linked executable, which can be reduced on some platforms with -s -w to ldflags, or there's upx for executable compression, but it isn't reliable on all platforms all the time in my experience.
  • As for throughput, I think saturating 1Gbit would be no problem, but I don't have 10Gbit to test with.

tohojo (Owner) commented Apr 8, 2017 via email

dtaht (Collaborator) commented Apr 8, 2017 via email

heistp (Contributor, Author) commented Apr 8, 2017

Ok, so I'll start with a latency-only, single-RTT test then. Keep things simple!

The client and server are separate, like netperf, as the server might end up a little smaller.

As for safe to expose, it's a given that it should be safe from buffer overflow problems (I'll avoid Go's 'unsafe' package). But beyond that, which of these is important:

  1. Challenge/response test with a fixed shared key to smoke test for legitimate clients. (I take this as the "three-way handshake".)
  2. Configurable limits on server for length of test, send interval, etc (basic DoS protection).
  3. Accounts / permissions with "request and grant" (I want to do this test, will you let me?)
  4. Invisibility to unauthorized clients (requires PING -D option #3).

I think #1 and #2 make sense to me and are "easy", but as for #3 and #4, I assume they're not needed now. This is something that might run on public servers and you want the server to be safe, but it's not something that needs to be run securely between trusted parties across the open Internet, right?

There's no way to prevent someone from writing a rogue client and hogging up resources, but we could stop random probes with #1, reduce the impact of any attacks with #2, and obviously lock things down more with #3 and #4, with more effort.
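To make #1 concrete, here's the kind of shared-key challenge/response I have in mind (purely illustrative; HMAC-SHA256 is just one possible choice, and all names are made up):

```go
// Illustrative sketch of #1: the server issues a random challenge, the client
// returns HMAC(sharedKey, challenge), and the server only starts a test if
// the MAC verifies. A rogue client that has the key can still connect.
package main

import (
	"crypto/hmac"
	"crypto/rand"
	"crypto/sha256"
	"fmt"
)

var sharedKey = []byte("compiled-in default key")

func newChallenge() ([]byte, error) {
	c := make([]byte, 16)
	_, err := rand.Read(c)
	return c, err
}

func respond(key, challenge []byte) []byte {
	m := hmac.New(sha256.New, key)
	m.Write(challenge)
	return m.Sum(nil)
}

func verify(key, challenge, response []byte) bool {
	return hmac.Equal(respond(key, challenge), response)
}

func main() {
	challenge, err := newChallenge()
	if err != nil {
		panic(err)
	}
	resp := respond(sharedKey, challenge) // what a legitimate client sends back
	fmt.Println("accepted:", verify(sharedKey, challenge, resp))
}
```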

If all this sounds reasonable, I'll just put something together and welcome any critique...

dtaht (Collaborator) commented Apr 9, 2017 via email

heistp (Contributor, Author) commented Apr 9, 2017

Ok, understood.

Also, a compiled-in pre-shared key for the handshake that can be overridden from the command line is still easy to implement and would allow for restricted tests, if needed.
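Something along these lines is what I mean (hypothetical names; overriding a package-level string at build time with the -X link flag is standard Go):

```go
// Sketch of the override idea: a default key baked in at build time, which a
// -key command-line flag can replace at runtime.
package main

import (
	"flag"
	"fmt"
)

// defaultKey can be replaced at build time, e.g.:
//   go build -ldflags "-X main.defaultKey=something-else"
var defaultKey = "public-default"

func main() {
	key := flag.String("key", defaultKey, "pre-shared key for the handshake")
	flag.Parse()
	fmt.Println("using key:", *key)
}
```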

More later when something's ready...

tohojo (Owner) commented Apr 9, 2017 via email

heistp (Contributor, Author) commented Apr 10, 2017

Yep, I'll try to keep it to a single UDP port on the server (random on client).

There might be a bit of a delay as I still have to complete tests of Ubiquiti's stuff for FreeNet and finish the report and presentation by 5/1 (along with the day job :)

One of my main motivations for getting this test done asap though is WMM. After Dave's tip I did tests with WMM on and off (or at least avoided) and the results were really surprising. With WMM on, even when you do the default rrul (not rrul_be) test, latencies are 5-10x what they probably should be. Disable or bypass WMM and things look much, much better. So either:

  1. WMM is "bad" and to be avoided, particularly for higher numbers of diffserv marked flows, OR
  2. The netperf UDP_RR test, with its zero-delay back-and-forth (which arguably doesn't represent what you see in the real world), doesn't play well with WMM when marked with higher-priority diffserv markings like EF or others

Or something in between. Hopefully we can determine that soon.

PS- I finally did Chaos Calmer tests. LEDE's latency improvements under load are pretty staggering, particularly when it comes to dynamic rate drops. So hopefully the report I produce helps highlight the good work you guys are doing. :)

tohojo (Owner) commented Apr 10, 2017 via email

heistp (Contributor, Author) commented Apr 10, 2017 via email

heistp (Contributor, Author) commented Apr 10, 2017 via email

heistp (Contributor, Author) commented Apr 10, 2017 via email

heistp (Contributor, Author) commented Apr 10, 2017 via email

dtaht (Collaborator) commented Apr 10, 2017 via email

heistp (Contributor, Author) commented Apr 10, 2017 via email

tohojo (Owner) commented Sep 18, 2017

So any updates on any of this work? :)

-Toke

heistp (Contributor, Author) commented Sep 18, 2017

Funny you should ask...it was impossible to do anything over the summer, but in the last couple of weeks I've gotten close on the new latency tester. It took some time playing around with timer error, system vs monotonic clock values, and socket options, among other things (Windows might be mostly a lost cause on that). A few more things left to do, and I hope to update more soon...

tron:~/src/github.com/peteheist/irtt:% ./irtt -fill rand -fillall -i 10ms -l 160 -d 5s -timer comp -ts a.b.c.d
IRTT to a.b.c.d (a.b.c.d:2112)

                  RTT: mean=19.7584ms min=12.4221ms max=64.2016ms
   one-way send delay: mean=9.1482ms min=3.8003ms max=44.6247ms
one-way receive delay: mean=10.6096ms min=8.0872ms max=42.1626ms
packets received/sent: 498/499 (0.20% loss)
  bytes received/sent: 79680/79840
    receive/send rate: 127.8 Kbps / 127.7 Kbps
             duration: 5.209s (wait 193ms)
         timer misses: 1/500 (0.20% missed)
          timer error: mean=-997ns (-0.01%) min=-3.492683ms max=2.55169ms
       send call time: mean=84.8µs min=13.2µs max=180.4µs

tohojo (Owner) commented Sep 18, 2017 via email

tohojo (Owner) commented Nov 20, 2017 via email

heistp (Contributor, Author) commented Nov 20, 2017 via email

tohojo (Owner) commented Nov 20, 2017 via email

heistp (Contributor, Author) commented Nov 20, 2017 via email

tohojo (Owner) commented Nov 20, 2017 via email

heistp (Contributor, Author) commented Nov 20, 2017

:) Gut laugh, I know that feeling sometimes...

tohojo (Owner) commented Nov 20, 2017

Okay, testable code in the runner-refactor branch.

Ended up doing a fairly involved refactoring of how runners work with data, which is good, as the new way of structuring things makes a lot more sense in general; but it did mean I had to change the data format, so this can break in quite a few places. So testing is appreciated, both for running new tests and for plotting old data files.

flent-users commented Nov 20, 2017 via email

flent-users commented Nov 20, 2017 via email

heistp (Contributor, Author) commented Nov 21, 2017 via email

heistp (Contributor, Author) commented Nov 21, 2017 via email

heistp (Contributor, Author) commented Nov 21, 2017 via email

tohojo (Owner) commented Nov 21, 2017 via email

heistp (Contributor, Author) commented Nov 21, 2017 via email

tohojo (Owner) commented Nov 21, 2017 via email

heistp (Contributor, Author) commented Nov 21, 2017 via email

dtaht (Collaborator) commented Nov 21, 2017 via email

heistp (Contributor, Author) commented Nov 21, 2017

Trying to confirm how latency was being calculated before with the UDP_RR test. Looking at its raw output, I see that transactions per second is probably used to calculate RTT, with interim results like:

NETPERF_INTERIM_RESULT[0]=3033.41
NETPERF_UNITS[0]=Trans/s
NETPERF_INTERVAL[0]=0.200
NETPERF_ENDING[0]=1511296777.475

So RTT = (1 / 3033.41) ~= 330us

And this presumably reflects the mean over all transactions, summarized at the end of the interval, and the calculated latency is what was plotted in Flent?
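In code form, the conversion is just the inverse (assuming a single outstanding transaction at a time, which is how UDP_RR works):

```go
// The arithmetic from above: with one transaction in flight at a time, mean
// RTT over an interval is the inverse of netperf's transactions-per-second.
package main

import (
	"fmt"
	"time"
)

func rttFromTransPerSec(tps float64) time.Duration {
	return time.Duration(float64(time.Second) / tps)
}

func main() {
	fmt.Println(rttFromTransPerSec(3033.41)) // ≈ 330µs
}
```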

tohojo (Owner) commented Nov 21, 2017 via email

heistp (Contributor, Author) commented Nov 21, 2017 via email

tohojo (Owner) commented Nov 22, 2017 via email

heistp (Contributor, Author) commented Nov 22, 2017 via email

tohojo closed this as completed in 117b39c on Nov 22, 2017
tohojo (Owner) commented Nov 22, 2017

Right, so I convinced myself that I'd fixed most of the breakages in the refactor (which turned out to be a multi-thousand-line patch, but with a net reduction of 400 lines of code; not too bad), so I merged it and closed this issue.

Please open new issue(s) for any breakage that I missed. I'll open a new one specifically for using irtt for VoIP tests.

tohojo (Owner) commented Nov 22, 2017

Oh, and many thanks for your work on irtt, @peteheist! We really needed such a tool :)

heistp (Contributor, Author) commented Nov 22, 2017

Oh yeah, probably time for this issue thread to retire. :)

So I'm glad! Looking forward to playing with this more soon. Thanks for all that refactoring too; it looks like it was some real walking through walls...

tohojo (Owner) commented Nov 22, 2017 via email

dtaht (Collaborator) commented Nov 22, 2017 via email

tohojo (Owner) commented Nov 22, 2017

The OWD data is already being collected, so it's fairly trivial to add the plots...
