Skip to content
This repository has been archived by the owner. It is now read-only.

Ticket28565 rebased #348

Merged
merged 4 commits into from Mar 21, 2019
Merged

Ticket28565 rebased #348

merged 4 commits into from Mar 21, 2019

Conversation

Labels
None yet
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

2 participants
@juga0
Copy link
Contributor

@juga0 juga0 commented Mar 12, 2019

To review only from 70531ba included

@juga0 juga0 force-pushed the ticket28565_rebased branch 4 times, most recently from 3f97c4b to 75168c8 Mar 17, 2019
Copy link
Contributor

@teor2345 teor2345 left a comment

Thanks for updating the documentation.
I think we should fix some of the comments.
We should also change one of the key names so it is less confusing.

BW_HEADER_KEYVALUES_RECENT_MEASUREMENTS_EXCLUDED = [
# The number of success results should be:
# the number of attempts - the number of failures
# 4.6 header: the number of successful results, created in the last 5 days,
Copy link
Contributor

@teor2345 teor2345 Mar 18, 2019

What does "4.6 header:" mean here?
There is no link to any other document, so the section number is not useful.

Also, the code counts relays, not results.

Suggested change
# 4.6 header: the number of successful results, created in the last 5 days,
# header: the number of relays with enough results to be measured

@@ -43,6 +43,18 @@
STATS_KEYVALUES = ['number_eligible_relays', 'minimum_number_eligible_relays',
'number_consensus_relays', 'percent_eligible_relays',
'minimum_percent_eligible_relays']
BW_HEADER_KEYVALUES_RECENT_MEASUREMENTS_EXCLUDED = [
# The number of success results should be:
# the number of attempts - the number of failures
Copy link
Contributor

@teor2345 teor2345 Mar 18, 2019

I don't understand how these two comment lines are related to the code.
Should they be removed?

# This is the sum of the following 3 + not success results
# 'recent_measurement_exclusion_count',
'recent_measurement_exclusion_not_success_count',
'recent_measurement_exclusion_not_distanciated_count',
Copy link
Contributor

@teor2345 teor2345 Mar 18, 2019

"distanciated" is an English word that many people do not know.
And it has a psychological meaning.

Can you replace it with a more common word?
(Or explain it in a comment?)

Copy link
Contributor Author

@juga0 juga0 Mar 18, 2019

how does distant sounds?, better away?

Copy link
Contributor

@teor2345 teor2345 Mar 19, 2019

Distant is better. But don't use two negatives "exclusion not distant". It's confusing.

Instead, explain what is being counted:

These recent measurements are in a short time range. Here is the exclusion count for those measurements.

Suggested change
'recent_measurement_exclusion_not_distanciated_count',
'recent_measurement_exclusion_short_time_range_count',

Copy link
Contributor Author

@juga0 juga0 Mar 20, 2019

what about recent_measurement_exclusion_near_count?

Copy link
Contributor

@teor2345 teor2345 Mar 20, 2019

Ok

@@ -135,11 +138,14 @@
'relay_recent_measurement_failure_count',
# The number of success results should be:
# the number of attempts - the number of failures
Copy link
Contributor

@teor2345 teor2345 Mar 18, 2019

I don't understand how these two comment lines are related to the code.
Should they be removed?

Copy link
Contributor

@teor2345 teor2345 Mar 21, 2019

I think you missed this fix?

if not success_results:
return None
return None, 'recent_measurement_exclusion_not_success_count'
Copy link
Contributor

@teor2345 teor2345 Mar 18, 2019

This code counts relays, not results.

But the header key comment says "results", not "relays".

We should update the header key comment.

if not results_away:
return None
return None, 'recent_measurement_exclusion_not_distanciated_count'
Copy link
Contributor

@teor2345 teor2345 Mar 18, 2019

This code counts relays, not results. (See above.)

It also counts relays that were not excluded by 'recent_measurement_exclusion_not_success_count', but are excluded by 'recent_measurement_exclusion_not_distanciated_count'. But this is not documented in comments or the spec.

if not results_recent:
return None
return None, 'recent_measurement_exclusion_not_recent_count'
Copy link
Contributor

@teor2345 teor2345 Mar 18, 2019

This code suffers from the same issues as the other exclusion keys.

if not len(results_recent) >= min_num:
# log.debug('The number of results is less than %s', min_num)
return None
return None, 'recent_measurement_exclusion_not_min_num_count'
Copy link
Contributor

@teor2345 teor2345 Mar 18, 2019

This code suffers from the same issues as the other exclusion keys.

sbws/lib/v3bwfile.py Show resolved Hide resolved
# They rules were introduced in #28061 and #27338
# In #28565 we introduce the KeyValues to know why they're excluded.
# In #28563 will include again these lines, but make them unparseable
# by Tor.
Copy link
Contributor

@teor2345 teor2345 Mar 18, 2019

This comment is confusing: it sounds like you will make the file unparseable by Tor, which is bad. What you actually want is to make Tor ignore some of the relay lines in the file.

# that were excluded by a filter
# This is the sum of the following 3 + not success results
# 'recent_measurement_exclusion_count',
'recent_measurement_exclusion_not_success_count',
Copy link
Contributor

@teor2345 teor2345 Mar 19, 2019

This key name is hard to understand, because it uses two negatives in a row: "exclusion not success".

Usually, we use a short name for the total, and then add extra information for each part of the total:

Suggested change
'recent_measurement_exclusion_not_success_count',
'recent_measurement_exclusion_count',

Copy link
Contributor Author

@juga0 juga0 Mar 20, 2019

I left commented that key because that would be the sum of all the relays excluded by a rule.
What about recent_measurement_exclusion_error_count?

Copy link
Contributor Author

@juga0 juga0 Mar 20, 2019

This is actually the sum of all the errors, which we can count externally, so maybe just not include it?

Copy link
Contributor

@teor2345 teor2345 Mar 20, 2019

I think we do not need it. Most parsers will be able to add numbers.

Copy link
Contributor Author

@juga0 juga0 Mar 20, 2019

hmm, thinking it better, we could now deduce all the relays excluded by a rule counting "vote=0" when it's implemented and then looking why there were measurements excluded for each relay.
But if we leave all the reasons whey the relays are excluded, it'll give us a fast idea on what it's failing.
And then i think we should leave this one for consistency.

# The relay measurements are not X time recent.
# By default this is 5 days, which is the same time the older
# the measurements can be by default.
'recent_measurement_exclusion_not_recent_count',
Copy link
Contributor

@teor2345 teor2345 Mar 19, 2019

Suggested change
'recent_measurement_exclusion_not_recent_count',
'recent_measurement_exclusion_old_count',

'recent_measurement_exclusion_not_recent_count',
# The minimum number of measurement a relay needs to have.
# By default this is two.
'recent_measurement_exclusion_not_min_num_count',
Copy link
Contributor

@teor2345 teor2345 Mar 19, 2019

Suggested change
'recent_measurement_exclusion_not_min_num_count',
'recent_measurement_exclusion_under_min_num_count',

Copy link
Contributor Author

@juga0 juga0 Mar 20, 2019

what about recent_measurement_exclusion_few_count?

Copy link
Contributor

@teor2345 teor2345 Mar 20, 2019

Ok

@@ -135,12 +145,21 @@
'relay_recent_measurement_failure_count',
# The number of success results should be:
# the number of attempts - the number of failures
# Number of error results, it can be deduced counting the number of errors
# but this way seem clearer.
'relay_recent_measurement_exclusion_not_success_count',
Copy link
Contributor

@teor2345 teor2345 Mar 19, 2019

Suggested change
'relay_recent_measurement_exclusion_not_success_count',
'relay_recent_measurement_exclusion_count',

Copy link
Contributor Author

@juga0 juga0 Mar 20, 2019

I left commented that key because that would be the sum of all the excluded measurements.
For instance, if each rule would exclude a measurement.
What about relay_recent_measurement_exclusion_error_count?

Copy link
Contributor Author

@juga0 juga0 Mar 20, 2019

This is actually the sum of all the errors, which we can count externally, so maybe just not include it?

Copy link
Contributor

@teor2345 teor2345 Mar 20, 2019

Ok

Copy link
Contributor Author

@juga0 juga0 Mar 20, 2019

or maybe leaving it, it makes it more clear that failure (when the relay was attempted to be measured but we don't know what happened) is different from error

# 'relay_recent_measurement_exclusion_count',
# The number of measurements excluded because they were not X time
# away from each other. By default it is one day.
'relay_recent_measurement_exclusion_not_distanciated_count',
Copy link
Contributor

@teor2345 teor2345 Mar 19, 2019

Suggested change
'relay_recent_measurement_exclusion_not_distanciated_count',
'relay_recent_measurement_exclusion_short_time_range_count',

# away from each other. By default it is one day.
'relay_recent_measurement_exclusion_not_distanciated_count',
# The number of measurements excluded because there were not X time recent.
# By default this is 5 days.
'relay_recent_measurement_exclusion_not_recent_count',
Copy link
Contributor

@teor2345 teor2345 Mar 19, 2019

Suggested change
'relay_recent_measurement_exclusion_not_recent_count',
'relay_recent_measurement_exclusion_old_count',

'relay_recent_measurement_exclusion_not_recent_count',
# The number of measurements excluded because there were not at least X.
# By default it is 2.
'relay_recent_measurement_exclusion_not_min_num_count',
Copy link
Contributor

@teor2345 teor2345 Mar 19, 2019

Suggested change
'relay_recent_measurement_exclusion_not_min_num_count',
'relay_recent_measurement_exclusion_under_min_num_count',

@juga0
Copy link
Contributor Author

@juga0 juga0 commented Mar 20, 2019

Also, i would replace all the keys:

s/_measurement_exclusion_/_measurements_excluded/

What do you think?

@juga0
Copy link
Contributor Author

@juga0 juga0 commented Mar 20, 2019

I think i'm also going to modify the code so that the relay_recent_measurement_exclusion(*) is not present when it actually does not exclude any measurement.
I think it make it clearer and less data in the file.

@juga0
Copy link
Contributor Author

@juga0 juga0 commented Mar 20, 2019

I made a "wip" commit with the names i was suggesting, because i don't know if you would like them.

I think i'm also going to modify the code so that the relay_recent_measurement_exclusion(*) is not present when it actually does not exclude any measurement.

In the end i only include them when a measurement is excluded, no matter if the relay is excluded or not.
That commit would need to be squashed if you think is a good solution.
I also change the name of a key in that commit (should have been in the wip commit)

@teor2345
Copy link
Contributor

@teor2345 teor2345 commented Mar 20, 2019

Also, i would replace all the keys:

s/_measurement_exclusion_/_measurements_excluded/

What do you think?

Yes, that seems easier to understand.

@teor2345
Copy link
Contributor

@teor2345 teor2345 commented Mar 20, 2019

I think i'm also going to modify the code so that the relay_recent_measurement_exclusion(*) is not present when it actually does not exclude any measurement.
I think it make it clearer and less data in the file.

Ok

@teor2345
Copy link
Contributor

@teor2345 teor2345 commented Mar 20, 2019

I will review the new code in about 12-18 hours.

@juga0
Copy link
Contributor Author

@juga0 juga0 commented Mar 20, 2019

Added more tests in last fixup

@@ -43,13 +43,17 @@
STATS_KEYVALUES = ['number_eligible_relays', 'minimum_number_eligible_relays',
'number_consensus_relays', 'percent_eligible_relays',
'minimum_percent_eligible_relays']
# KeyValues to count the number of relays not included in the bandwidth file
# of for which is not reported a bandwidth value.
Copy link
Contributor

@teor2345 teor2345 Mar 21, 2019

This comment is really hard to understand, and I don't think it matches the latest code.

Did you mean:

# KeyValues that count the number of relays that are in the bandwidth file,
# but ignored by Tor when voting, because they do not have a measured bandwidth.

@teor2345
Copy link
Contributor

@teor2345 teor2345 commented Mar 21, 2019

Also, i would replace all the keys:
s/_measurement_exclusion_/_measurements_excluded/
What do you think?

Yes, that seems easier to understand.

I think you missed this change.

# The rules were introduced in #28061 and #27338.
# In #28565 we introduce the KeyValues to know why they're excluded.
# In #28563 we report these relays, but make Tor ignore them.
# This might confirm #28042.
Copy link
Contributor

@teor2345 teor2345 Mar 21, 2019

I don't think you need to repeat this comment?
Repeated comments can get out of sync.
Just say "see from_results() for details"

juga0 added 4 commits Mar 21, 2019
and return reason why the relay was excluded to be able to count
the relays excluded in the header.
Part of #28565.
@juga0 juga0 force-pushed the ticket28565_rebased branch from c1f48a7 to 127cf46 Mar 21, 2019
@juga0
Copy link
Contributor Author

@juga0 juga0 commented Mar 21, 2019

made other fixup, squashed all, force pushed

@juga0 juga0 merged commit 127cf46 into torproject:master Mar 21, 2019
1 check passed
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.