torproject / sbws Public archive
Check that the scaling is working #182
|
I was going to plot what pastly did in [1] with new files. |
|
https://github.com/pastly/simple-bw-scanner/blob/master/scripts/tools/v3bw-into-xy.sh used to take a v3bw file and produce x,y output, but it looks like it needs minor updates to handle the new v3bw file format. https://github.com/pastly/simple-bw-scanner/blob/master/scripts/tools/plot-v3bw-xy.py takes the output of the previous script and plots the scatter plots that you referenced (which no longer exist, because share.riseup.net deletes stuff after about a week). Example usage (from memory): You can avoid making temporary files with some bash magic. This does the same thing as above without temporary files (this is how I was doing it, but again from memory): |
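The parsing step those scripts perform can be sketched in Python; the key=value relay-line format shown in the comments is an assumption, not the real v3bw spec:

```python
# Rough sketch of v3bw-into-xy parsing (format details are assumptions):
# each relay line is space-separated key=value pairs such as
# "node_id=$AAAA bw=500 nick=relay1"; we extract node_id -> bw.
def parse_v3bw(lines):
    relays = {}
    for line in lines:
        if not line.startswith("node_id="):
            continue  # skip the timestamp/header lines
        kv = dict(pair.split("=", 1) for pair in line.split())
        relays[kv["node_id"]] = int(kv["bw"])
    return relays
```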
|
I replaced v3bw-into-xy.sh with v3bw-into-xy.py since it was easier to parse the slightly more complex v3bw files in python. |
|
Thanks for that change. |
|
The plot script works now that I rewrote it. What more work needs to happen on the parsing/plotting scripts? What modifications are you making? I'm going to stop the webserver that I think is causing my problems, and in ~5 days hopefully my results will look better. |
|
Yeah, now that |
I should mention that I didn't do any scaling with those results. What I did do is wget a file from a variety of allegedly fast sources -- including my freebird server -- and freebird capped out at ~7.5 MBps while tityos got to 60+ MBps. (Yes, bytes in both cases.) |
|
hmm, you mean that |
|
The pastly results are generated by
|
|
This is from sbws 0.4.2-dev, using only my "tityos" destination (no longer using the "freebird" destination) and measuring from ln5's host. It's like ... no better. It's no closer to being similar to torflow. By the way, this is what it looks like if I don't cap the Y at 10,000 (same data, different view). Scaling my results doesn't really help. To understand why: scaling the way we've proposed doesn't change the shape of the curve. There's either something really wrong with sbws or the environment I'm running it in. |
|
Or the graphs are not accurate. If every |
Because the dots are ordered by moria's data. Relay number 0 is the fastest relay according to moria and relay number ~7000 is the slowest according to moria.
The black dots come from a v3bw file fetched from moria. The red ones come from a v3bw file I generated from sbws data. The two v3bw files were fetched at the same time.
I did not scale sbws data.
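A minimal sketch of how the two plotted series could be built (the function and names are mine, not the plotting script's):

```python
# Order both data sets by moria's bandwidths, fastest first, so relay
# number 0 is moria's fastest relay; relays missing from the sbws data
# are plotted as 0.
def ordered_series(moria, sbws):
    order = sorted(moria, key=moria.get, reverse=True)
    return ([moria[n] for n in order],
            [sbws.get(n, 0) for n in order])
```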
I could plot this, sure. I'll try to remember to do so. |
|
Remember: there is significant variance between bandwidth authorities, and sbws only has to be similar to one of them. Please plot all existing bandwidth authorities on the same graph, and order by sbws measurement. |
|
These are the 3 bwauths that make their v3bw files public. URLs fetched from here Compared to sbws, the bwauths are not that different from each other. I'd include sbws and run my new plotting scripts, except my sbws died a few days ago and I hadn't noticed until now. I'm starting to think we're going to have to do more than just single circuit download performance. For example, download over many circuits at once through a target relay, or do whatever torflow does with relays' self-measured bandwidth (like @binnacle talked about in #150). |
So sbws produces a flatter curve. Remember: the goal of the bandwidth measurement system is to produce weights that make efficient use of the available relay bandwidth. And there is evidence that torflow is allocating too much load to large relays. So before we make sbws match torflow, let's check if torflow's results are actually what the Tor network needs:
I'm not sure how we can answer all these questions. As a first step, let's scale sbws's results to match torflow's results. Yes, the shape of the curve will be the same. But maybe that shape is better than torflow's. As a second step, let's set up both the sbws client and server on fast servers in Germany or France, near most of the current high-bandwidth relays. Then we can see if the results are closer to torflow's. |
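The first step, linear scaling to match torflow's totals, can be sketched as follows (function and names are illustrative, not sbws code):

```python
# Scale every sbws bandwidth by one constant factor so that the scaled
# total equals torflow's total. A single linear factor cannot change
# the shape of the sorted-bandwidth curve, only its height.
def scale_to_match(sbws_bws, torflow_total):
    ratio = torflow_total / sum(sbws_bws)
    return [bw * ratio for bw in sbws_bws]
```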
|
Or here's a better comparison: Set up torflow and sbws on the same client and server, and compare the results. |
|
Set up torflow and sbws on the same client and server, and compare the results.
(But don't run them at exactly the same time, they'll fight for bandwidth.)
Then maybe we need to compare only the relays that are present in both,
or take the median of all the ones that are not present in both?
I guess that on average we can ignore the fact that bandwidth for the
same relay might be different at different times?
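Restricting the comparison to relays measured by both scanners might look like this (a sketch; the fingerprint-keyed dicts are an assumption):

```python
# Keep only relays present in both result sets, pairing the two
# measurements so they can be compared relay by relay.
def common_relays(torflow, sbws):
    common = torflow.keys() & sbws.keys()
    return {n: (torflow[n], sbws[n]) for n in common}
```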
|
|
@pastly, which server machine would you use for the client? i've run Torflow before on ln5's machine and the vps i'm using; let me know if i should set it up somewhere else. |
I'm using ln5's machine for my scanner and tityos for the server. I think you can keep using whatever you want because right now I don't think there's a bandwidth contention issue on ln5's machine. If we can get tjr to run sbws and get a 1 GiB file on whatever webserver he's using, then we will have a direct comparison. |
|
I was curious to graph the same sbws data with and without scaling. |
Graph Analysis
It's closer, but the curve is a different shape:
That's interesting, because then we get to ask ourselves:
Metcalfe's law suggests that the network itself should follow a linear to parabolic distribution (n·log(n) to n²). And maybe a hyperbolic distribution is bad for the Tor network?

Possible Explanations

We could be seeing a different distribution because torflow distributes its bandwidth files based on its own scaled bandwidth measurements. torflow claims that each relay is measured using > 5 times that relay's bandwidth (since the files are in powers of two, that's 5-10 times). But it's actually measuring them at (scaled bandwidth) * 5-10 times.

Next Steps

Let's increase the sbws download length so that bandwidth dominates, rather than latency. Tor latency is at most 1 second for large relays. So let's try 20-40 second sbws downloads, rather than 5-10 second downloads? |
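The back-of-the-envelope reasoning for longer downloads: with up to 1 second of Tor latency, latency's share of the measured time shrinks as the download gets longer (a sketch with illustrative numbers):

```python
# Fraction of the total measured time that is latency rather than
# transfer, assuming a fixed per-download latency: ~1/6 for a 5 s
# download, ~1/31 for a 30 s one.
def latency_fraction(download_secs, latency_secs=1.0):
    return latency_secs / (download_secs + latency_secs)
```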
Okay. I'm running sbws with the following settings (changed the |
|
First I'll mention that a 30 second target causes the scanning to take much, much longer. sbws has been running for about two days and isn't done measuring every relay yet. (This is what I predicted, but I just wanted to make sure it was explicit.) Now, on to the results. Here's the same graph as 15 days ago but with today's data from moria and sbws (and sbws targeting 30s downloads). This looks very, very similar to the old graph when sbws was targeting 6s. I don't think a 30s target is a good idea; 6s was fine. What I think the next steps should be: Either
|
|
I have a couple of suggestions: In addition to comparing raw and cooked absolute votes, try comparing vote sets normalized to selection probability. Include the synchronous consensus selection probability curve as an overlay. Avoid comparing anachronistic vote and consensus sets, since the numbers shift by as much as 10% in twelve hours. |
|
Both arma and tjr have expressed ability to share raw scanner results with us (option 2 above), though tjr says he'd like instructions on how/what to share. arma pointed to https://trac.torproject.org/projects/tor/ticket/2532, but I don't think we need to reopen it because I don't think metrics needs to get involved at this time. |
|
In https://lists.torproject.org/pipermail/tor-dev/2018-July/013330.html, @teor2345 said: |
|
I left the IRC conversation today expecting you to run sbws and torflow. That's why I thanked arma and tjr. I see how it was confusing that I mentioned advantages to having tjr run sbws, but I don't think he's planning on doing it (because AFAICT, we un-asked him to run it on IRC). |
|
One thing i've been thinking about, and started to work on, is to refactor part of the v3bwfile.py code so that we can also generate the relays' bw subtracting their own rtt. I think this may discard latency. |
I'm not sure I understand what you mean here.
You could add debug logging, or add debug attributes to the bandwidth lines.
How many times are we going to generate the graphs?
Yes, please open new tickets for new code, this task is an analysis task. |
|
> One thing i've been thinking on and started to work on is to refactor part of the v3bwfile.py code so that we have the possibility to also generate the relays' bw subtracting their own rtt. I think this may discard latency.
I'm not sure I understand what you mean here.
Can you explain how sbws uses the rtt at the moment?
sbws calculates rtt but does not use it for anything except
including it in the bandwidth list files.
i've already generated graphs comparing bandwidths with and without rtt
(without rtt: bw = amount of data downloaded / (time it took - rtt)),
and bandwidths are slightly higher without rtt (as expected), but the
curve looks the same.
I still need to compare it with the results obtained by increasing the
download time.
@pastly: did you plan to do something else with the rtt measurements?
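The two computations being compared can be sketched as follows (names are mine, not sbws's actual functions):

```python
# With rtt left in: bandwidth over the whole wall-clock download time.
def bw_including_rtt(nbytes, elapsed_secs):
    return nbytes / elapsed_secs

# "Without rtt": subtract the circuit rtt from the elapsed time first,
# which yields a slightly higher bandwidth, as observed above.
def bw_excluding_rtt(nbytes, elapsed_secs, rtt_secs):
    return nbytes / (elapsed_secs - rtt_secs)
```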
[...]
> Maybe i should open new tickets for this?
Yes, please open new tickets for new code, this task is an analysis task.
ok, so far i'm modifying the code to generate bandwidths in different
ways and modifying the graphs code to compare them. Will open tickets
for the code when it's ready.
|
|
It would be great if any of you could remind me why our scaling is |
See the comment here: And the thread here: |
oh, sorry, i didn't follow that link. yeah, that's a more elaborate explanation. So, in theory, a network with 7500 relays would have:
If i understand the mail correctly, in theory, a network of 6460 measured relays would have:
Correct? In practice, a network with 6460 relays, measured with sbws, has:
So it seems there is an error somewhere with the units; sbws results would make more sense divided by 100 and multiplied by 2:
Correct? Same measurements with sbws using scaling:
So with scaling, are we trying to get the mean to ~1, and the total bw to ~num relays? If yes, then scaling is almost working, but then i might not be interpreting the mail correctly. With measurements from Torflow, having 8748 relays:
Which confirms the previous paragraph. With that, the previous scaled
Correct? I can graph sbws scaled results with that, though i think the shape is not going to change. And what's the advantage of scaling this way? Would it make more sense to just have the "raw" bandwidths and make Tor calculate the weights in a different way? Sorry i'm questioning this now and not months ago. |
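If the target really were a mean of ~1 (so total bandwidth ~ number of relays), the normalization would just be a division by the mean; a sketch of that interpretation (mine, not sbws's code):

```python
# Divide every bandwidth by the mean: the normalized values then have
# mean exactly 1, and their total equals the number of relays.
def normalize_to_mean_one(bws):
    mean = sum(bws) / len(bws)
    return [bw / mean for bw in bws]
```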
|
yes, i did, the correct results are:
i mean, they don't make more sense, but they are closer to what's expected. Actually i could just multiply each sbws measured bw by ~1/45 to get total bw: ~50000000 and mean bw: ~7500. But then, why ~1/45?
ok, it's not the case, but it would make sense if what we wanted was to normalize. I'll show some graphs here soon |
|
@juga0, I'm not sure how to answer the questions in your last 3 comments. The purpose of scaling is to make sure that bandwidth weights don't change when torflow instances are replaced with sbws instances. So we want the total bandwidth (or average bandwidth) to be similar between torflow and sbws. |
|
Right, what @teor2345 said. In my mind this ticket morphed from "check that scaling is working" into "check that sbws produces sane results that may or may not need scaling" a while ago. |
|
I parsed the torflow raw files, took strm_bw, then took the median of all the bandwidths for each node.
sbws results from days before:
Results are now closer. The max and min differences could be because i ran sbws for less time, or just because they ran at different times. Now plotting both together, first ordering by torflow, then ordering by sbws. |
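The aggregation just described can be sketched like this (parsing of the raw files is elided; the (node, strm_bw) pair shape is an assumption):

```python
import statistics
from collections import defaultdict

# For each node, collect every strm_bw seen in the raw files and
# reduce them to one value per node: the median.
def median_strm_bw(measurements):
    per_node = defaultdict(list)
    for node_id, strm_bw in measurements:
        per_node[node_id].append(strm_bw)
    return {n: statistics.median(v) for n, v in per_node.items()}
```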
|
All bandwidths are in Bytes/second, so there are no conversion errors. |
|
Very exciting results, thanks @juga0! (I'll analyze better ASAP) |
Ok, so we have two options:
I suggest we go with option 1, but make the scaling depend on a consensus parameter. Then the directory authority operators can turn scaling off after a majority of bandwidth authority operators transition to sbws. |
|
Hmm, the main conclusion i get from those graphs is:
So i was thinking: |
|
According to the specs, when the [1] https://gitweb.torproject.org/torflow.git/tree/NetworkScanners/BwAuthority/aggregate.py |
|
Hmm, what about calculating the weights that would be assigned to each relay with our sbws scaled results, and comparing them with the weights actually assigned to the relays in that period? |
|
Maybe this way we can check how much the weights might differ using sbws instead of Torflow.
I'm not sure if we're talking about the same thing here. Torflow scales each relay's observed bandwidths using the ratio between that relay's measured bandwidth, and the total measured bandwidth for all relays. If we want sbws to match torflow, we need to do similar scaling in sbws. Specifically, we need to:
We know that stream bandwidths are similar between torflow and sbws, and we also know that PID control is broken. So I think we need to copy these 4 lines of torflow's scaling code: We might also need to cap the result:
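A rough sketch of the scaling and cap described above (this is not torflow's actual code; the network-average divisor and all names are my assumptions):

```python
# Scale a relay's self-observed bandwidth by the ratio of its measured
# bandwidth to the network-average measured bandwidth, then cap the
# result so one relay cannot dominate.
def torflow_style_scale(observed_bw, measured_bw, avg_measured_bw, cap):
    ratio = measured_bw / avg_measured_bw
    return min(observed_bw * ratio, cap)
```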
We can't change Tor's code, because it takes too long to deploy new tor versions. And if we did change Tor's code, it would be very easy to double-scale torflow's results. |
bwweightscale is for bandwidth-weights. bandwidth-weights are not used to scale relay measured bandwidths. bandwidth-weights are used for relay position weights:
These divisions are used for the consensus parameters for PID control, which is broken. (The feedback loop is not fast enough or reliable enough. I can't find the ticket.)
You already did scaling in: scaling might make the results a similar size, but it isn't going to change the shape of the curve:
I think that this graph could be useful. If we decide to implement torflow's observed bandwidth scaling in sbws, we can compare the graphs. |
|
Would like to add my thoughts. Per earlier comments I favor the idea that observed self-measure is a necessary ingredient, that it cannot (with reasonable resources) be discerned remotely. Suggest a spreadsheet representing the data from SBWS, Torflow input, and aggregate.py output will illuminate this more clearly than the graph; the idea is to examine individual relays with an eye toward the sanity of the weights assigned to them. I believe cases of potential severe misrating will be evident. Yet the approach of biasing self-measure may benefit from any number of refinements -- no effort in that direction was pursued previously. Some ideas, in no particular order:

- Instead of a single simple linear factor (Kp=1), perhaps parameterized polynomial equations could be incorporated such that the degree of SBWS adjustment to self-measure can vary depending on the advertised (or measured) speed of each relay. A separate equation for above-mean and below-mean biasing would allow for curves that emphasize optimal consensus balance for the former and collaring of nodes gaming to high bandwidth for the latter.
- Instead of a single all-relay average: a) perhaps calculate a sliding weighted average by bandwidth, or bin by decile, ventile, etc., and/or b) perhaps calculate averages separately for each consensus class (e.g. exit, guard, unflagged/middle, ?exit+guard, ?exit-only). Relay selection probability determination in each class operates independently of the others, if I understand correctly.
- All of the above and more could be implemented with consensus parameter controls that allow for cautious, iterative refinement of the consensus outcome.
- Modelling results saves a great deal of time, but cannot represent all real-world behaviors due to the feedback dimensions of bandwidth scanning and consensus construction.
- Relay self-measure bears improvement so that inputs to the process represent true capacity. In particular, relays under-report capacity when lightly loaded. 
Informing all of the above is the reality that gaming of bandwidth measurement appears to have little value to sophisticated adversaries--else more trouble surely would have arrived by now. Consider: overrated relays attract attention and an overload condition impairs various nuanced attack models. Gaming for high bandwidth is for amateurs and miscreants. |
|
Replying to teor's comment in #182 (comment):
Thanks for pointing me at those lines, i think i finally understand what torflow is doing. |
|
Replying to teor's comment in #182 (comment)
oh, right, sorry i got confused with that |
|
Code (self-explanatory) in #243 |
|
I'm quite sure now i found the problem. |
|
Oh, and not only is it written once, it's also not updated from day to day |
|
This is dealt with by https://trac.torproject.org/projects/tor/ticket/27108 and the other children of https://trac.torproject.org/projects/tor/ticket/27107 |













As part of the tasks for having an MVP mentioned in https://trac.torproject.org/projects/tor/wiki/org/meetings/2018NetworkTeamHackfestSeattle/sbws.