torproject / sbws Public archive
Conversation
3f97c4b
to
75168c8
Thanks for updating the documentation.
I think we should fix some of the comments.
We should also change one of the key names so it is less confusing.
sbws/lib/v3bwfile.py
Outdated
| BW_HEADER_KEYVALUES_RECENT_MEASUREMENTS_EXCLUDED = [ | ||
| # The number of success results should be: | ||
| # the number of attempts - the number of failures | ||
| # 4.6 header: the number of successful results, created in the last 5 days, |
What does "4.6 header:" mean here?
There is no link to any other document, so the section number is not useful.
Also, the code counts relays, not results.
| # 4.6 header: the number of successful results, created in the last 5 days, | |
| # header: the number of relays with enough results to be measured |
sbws/lib/v3bwfile.py
Outdated
| @@ -43,6 +43,18 @@ | |||
| STATS_KEYVALUES = ['number_eligible_relays', 'minimum_number_eligible_relays', | |||
| 'number_consensus_relays', 'percent_eligible_relays', | |||
| 'minimum_percent_eligible_relays'] | |||
| BW_HEADER_KEYVALUES_RECENT_MEASUREMENTS_EXCLUDED = [ | |||
| # The number of success results should be: | |||
| # the number of attempts - the number of failures | |||
I don't understand how these two comment lines are related to the code.
Should they be removed?
sbws/lib/v3bwfile.py
Outdated
| # This is the sum of the following 3 + not success results | ||
| # 'recent_measurement_exclusion_count', | ||
| 'recent_measurement_exclusion_not_success_count', | ||
| 'recent_measurement_exclusion_not_distanciated_count', |
"distanciated" is an English word that many people do not know.
And it has a psychological meaning.
Can you replace it with a more common word?
(Or explain it in a comment?)
Distant is better. But don't use two negatives "exclusion not distant". It's confusing.
Instead, explain what is being counted:
These recent measurements are in a short time range. Here is the exclusion count for those measurements.
| 'recent_measurement_exclusion_not_distanciated_count', | |
| 'recent_measurement_exclusion_short_time_range_count', |
sbws/lib/v3bwfile.py
Outdated
| @@ -135,11 +138,14 @@ | |||
| 'relay_recent_measurement_failure_count', | |||
| # The number of success results should be: | |||
| # the number of attempts - the number of failures | |||
I don't understand how these two comment lines are related to the code.
Should they be removed?
sbws/lib/v3bwfile.py
Outdated
| if not success_results: | ||
| return None | ||
| return None, 'recent_measurement_exclusion_not_success_count' |
This code counts relays, not results.
But the header key comment says "results", not "relays".
We should update the header key comment.
sbws/lib/v3bwfile.py
Outdated
| if not results_away: | ||
| return None | ||
| return None, 'recent_measurement_exclusion_not_distanciated_count' |
This code counts relays, not results. (See above.)
It also counts relays that were not excluded by 'recent_measurement_exclusion_not_success_count', but are excluded by 'recent_measurement_exclusion_not_distanciated_count'. But this is not documented in comments or the spec.
sbws/lib/v3bwfile.py
Outdated
| if not results_recent: | ||
| return None | ||
| return None, 'recent_measurement_exclusion_not_recent_count' |
This code suffers from the same issues as the other exclusion keys.
sbws/lib/v3bwfile.py
Outdated
| if not len(results_recent) >= min_num: | ||
| # log.debug('The number of results is less than %s', min_num) | ||
| return None | ||
| return None, 'recent_measurement_exclusion_not_min_num_count' |
This code suffers from the same issues as the other exclusion keys.
sbws/lib/v3bwfile.py
Outdated
| # They rules were introduced in #28061 and #27338 | ||
| # In #28565 we introduce the KeyValues to know why they're excluded. | ||
| # In #28563 will include again these lines, but make them unparseable | ||
| # by Tor. |
This comment is confusing: it sounds like you will make the file unparseable by Tor, which is bad. What you actually want is to make Tor ignore some of the relay lines in the file.
sbws/lib/v3bwfile.py
Outdated
| # that were excluded by a filter | ||
| # This is the sum of the following 3 + not success results | ||
| # 'recent_measurement_exclusion_count', | ||
| 'recent_measurement_exclusion_not_success_count', |
This key name is hard to understand, because it uses two negatives in a row: "exclusion not success".
Usually, we use a short name for the total, and then add extra information for each part of the total:
| 'recent_measurement_exclusion_not_success_count', | |
| 'recent_measurement_exclusion_count', |
I left commented that key because that would be the sum of all the relays excluded by a rule.
What about recent_measurement_exclusion_error_count?
This is actually the sum of all the errors, which we can count externally, so maybe just not include it?
I think we do not need it. Most parsers will be able to add numbers.
hmm, thinking it better, we could now deduce all the relays excluded by a rule counting "vote=0" when it's implemented and then looking why there were measurements excluded for each relay.
But if we leave all the reasons whey the relays are excluded, it'll give us a fast idea on what it's failing.
And then i think we should leave this one for consistency.
sbws/lib/v3bwfile.py
Outdated
| # The relay measurements are not X time recent. | ||
| # By default this is 5 days, which is the same time the older | ||
| # the measurements can be by default. | ||
| 'recent_measurement_exclusion_not_recent_count', |
| 'recent_measurement_exclusion_not_recent_count', | |
| 'recent_measurement_exclusion_old_count', |
sbws/lib/v3bwfile.py
Outdated
| 'recent_measurement_exclusion_not_recent_count', | ||
| # The minimum number of measurement a relay needs to have. | ||
| # By default this is two. | ||
| 'recent_measurement_exclusion_not_min_num_count', |
| 'recent_measurement_exclusion_not_min_num_count', | |
| 'recent_measurement_exclusion_under_min_num_count', |
sbws/lib/v3bwfile.py
Outdated
| @@ -135,12 +145,21 @@ | |||
| 'relay_recent_measurement_failure_count', | |||
| # The number of success results should be: | |||
| # the number of attempts - the number of failures | |||
| # Number of error results, it can be deduced counting the number of errors | |||
| # but this way seem clearer. | |||
| 'relay_recent_measurement_exclusion_not_success_count', | |||
| 'relay_recent_measurement_exclusion_not_success_count', | |
| 'relay_recent_measurement_exclusion_count', |
I left commented that key because that would be the sum of all the excluded measurements.
For instance, if each rule would exclude a measurement.
What about relay_recent_measurement_exclusion_error_count?
This is actually the sum of all the errors, which we can count externally, so maybe just not include it?
or maybe leaving it, it makes it more clear that failure (when the relay was attempted to be measured but we don't know what happened) is different from error
sbws/lib/v3bwfile.py
Outdated
| # 'relay_recent_measurement_exclusion_count', | ||
| # The number of measurements excluded because they were not X time | ||
| # away from each other. By default it is one day. | ||
| 'relay_recent_measurement_exclusion_not_distanciated_count', |
| 'relay_recent_measurement_exclusion_not_distanciated_count', | |
| 'relay_recent_measurement_exclusion_short_time_range_count', |
sbws/lib/v3bwfile.py
Outdated
| # away from each other. By default it is one day. | ||
| 'relay_recent_measurement_exclusion_not_distanciated_count', | ||
| # The number of measurements excluded because there were not X time recent. | ||
| # By default this is 5 days. | ||
| 'relay_recent_measurement_exclusion_not_recent_count', |
| 'relay_recent_measurement_exclusion_not_recent_count', | |
| 'relay_recent_measurement_exclusion_old_count', |
sbws/lib/v3bwfile.py
Outdated
| 'relay_recent_measurement_exclusion_not_recent_count', | ||
| # The number of measurements excluded because there were not at least X. | ||
| # By default it is 2. | ||
| 'relay_recent_measurement_exclusion_not_min_num_count', |
| 'relay_recent_measurement_exclusion_not_min_num_count', | |
| 'relay_recent_measurement_exclusion_under_min_num_count', |
|
Also, i would replace all the keys:
What do you think? |
|
I think i'm also going to modify the code so that the |
|
I made a "wip" commit with the names i was suggesting, because i don't know if you would like them.
In the end i only include them when a measurement is excluded, no matter if the relay is excluded or not. |
Yes, that seems easier to understand. |
Ok |
|
I will review the new code in about 12-18 hours. |
|
Added more tests in last fixup |
sbws/lib/v3bwfile.py
Outdated
| @@ -43,13 +43,17 @@ | |||
| STATS_KEYVALUES = ['number_eligible_relays', 'minimum_number_eligible_relays', | |||
| 'number_consensus_relays', 'percent_eligible_relays', | |||
| 'minimum_percent_eligible_relays'] | |||
| # KeyValues to count the number of relays not included in the bandwidth file | |||
| # of for which is not reported a bandwidth value. | |||
This comment is really hard to understand, and I don't think it matches the latest code.
Did you mean:
# KeyValues that count the number of relays that are in the bandwidth file,
# but ignored by Tor when voting, because they do not have a measured bandwidth.
I think you missed this change. |
sbws/lib/v3bwfile.py
Outdated
| # The rules were introduced in #28061 and #27338. | ||
| # In #28565 we introduce the KeyValues to know why they're excluded. | ||
| # In #28563 we report these relays, but make Tor ignore them. | ||
| # This might confirm #28042. |
I don't think you need to repeat this comment?
Repeated comments can get out of sync.
Just say "see from_results() for details"
Part of #28565.
and return reason why the relay was excluded to be able to count the relays excluded in the header. Part of #28565.
Part of #28565.
Part of #28565.
|
made other fixup, squashed all, force pushed |
To review only from 70531ba included
The text was updated successfully, but these errors were encountered: