torproject / tor Public
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adaptive padding rebased 0.3.6 #439
Adaptive padding rebased 0.3.6 #439
Conversation
We need this for padding negotiation so that we can have later machine revisions supercede earlier ones.
Pull Request Test Coverage Report for Build 2699
|
This is a good code review start point, to get an overview of the interfaces and types used in circuit padding.
This helps us to determine if a middle node can pad to us or not.
These event callbacks allow circuit padding to decide when to attempt to launch and negotiate new padding machines, and when to tear old ones down.
These callbacks allow the padding state machines to react to various types of sent and received relay cells.
This implements all of the event handling, state machines, and padding decisions for circuit padding. I recommend reviewing this after you look at the call-in points into it from the rest of Tor.
* Tighten and justify cutoff for approximation in GPD sampler.
* Use expm1(-xi log U) to evaluate U^{-xi} - 1 without precision loss.
* Use log1p(x) to evaluate log(1 + x) precisely even for x near 0.
* Avoid 0 by rejection sampling, not by choosing to return 0 instead.
(This avoids the problematic 0 return from crypto_rand_double()
into the log() without introducing distribution bias).
Hope is this will make it easier to test on the live tor network. Does not need to be merged if we don't want to, but will come in handy for researchers.
Note to self/others: don't merge this.
65a334c
to
09d2269
Optimize the padding_sent event callpath. By moving it out from relay_send_command_from_edge(), we avoid recursing with a cell_t and a relay_header_t in each stack frame for back-to-back padding cells. The recursion thing still happens, and is an argument for using Nick's pub/sub API here instead, since no packets will be sent until we're finished deciding to add padding packets back-to-back.
I did an incomplete review and found a few minor bugs, a some code that could be improved if you want, and a lot of questions. I think many of my questions could be answered by docs that talk about the general structure of the code, the threats it is trying to defend against, and the kind of timing accuracy it's trying to achieve.
| /* Researchers need a way to tell their clients to use specific | ||
| * middles that they also control, to allow safe live-network | ||
| * experimentation with new padding machines. */ | ||
| V(MiddleNodes, ROUTERSET, NULL), |
We'll also need a feature like this for PrivCount.
When will the MiddleNodes feature be merged?
(PrivCount will probably also need some way of tagging circuits to be measured, but I don't think padding needs that, because you already have padding request cells.)
Please document this new option in the man page.
Do we also need to update the man page entry for StrictNodes to say that it doesn't apply to MiddleNodes?
Ok I put this in the TODO file. I will probably not get to it until next week,
src/feature/nodelist/nodelist.c
Outdated
| @@ -2393,6 +2393,7 @@ compute_frac_paths_available(const networkstatus_t *consensus, | |||
| smartlist_free(mid); | |||
| smartlist_free(exits); | |||
|
|
|||
| // XXX-MP-AP: Do the same for options->MiddleNodes? | |||
Yes, you need to do the same for MiddleNodes, otherwise clients may bootstrap and try to open circuits before they have any allowed middle nodes.
But the middle node code should be much simpler, because there's no need to check for the Exit flag, or exit policies.
| @@ -2580,7 +2584,24 @@ choose_good_middle_server(uint8_t purpose, | |||
| return choice; | |||
| } | |||
How do MiddleNodes and Vanguards interact?
Should we document the fact that Vanguards overrides MiddleNodes?
I think the sanest way for them to interact is by having the vanguards addon inspect MiddleNodes, and only choose HSLayer2Nodes from that set. Otherwise, the manpage should say that researchers must manually set both values to the same nodes if they want to test padding on vanguards-like circuits without using the addon.
src/core/or/circuitpadding.c
Outdated
| FREE_AND_NULL(circpad_machineinfo_handle_t, \ | ||
| circpad_machineinfo_handle_free_, (h)) | ||
|
|
||
| #define USEC_PER_SEC (1000000) |
TOR_USEC_PER_SEC is only defined in C files and test files. I can name my thing to be the same. Or I can move the tvdiff.c definitions out of their C and test files and into their header, and use that.
Though which header? Right now, tvdiff.c is the thing that defines TOR_USEC_PER_SEC, and I don't need to use tvdiff functions... Should I put it in some other header?
I'm not sure, I think @nmathewson knows where time constants belong. Somewhere in lib?
Switched name to TOR_USEC_PER_SEC in f4d84db.
I put an XXX there for Nick when he looks at the branch later this week.
src/core/or/circuitpadding.c
Outdated
| return start_usec+1; | ||
|
|
||
| return start_usec | ||
| + (state->range_sec*USEC_PER_SEC)/(1<<(state->histogram_len-bin+1)); |
What happens when (state->range_sec*USEC_PER_SEC)/(1<<(state->histogram_len-bin+1)) is zero?
Hrmm. A bunch of 0 sized bins.
We should probably have a circpad_machine_validate() function that checks for obvious signs of machine degeneracy like this while initializing machines. We might be able to wait for the serialization code to be written before we do that, since we only have one hardcoded machine right now that is presumed to be valid.
On a related note: this line is very complex, it contains 14-16 elements: 8 values and 6-8 operators (and 3 sets of brackets). Most humans are good with 3-7 things in their head at once.
For maintainability, please split it into at least 3 lines, using temporary variables. If you make your variables const, most compilers will eliminate the extra operations. (Some will do it even if they're not const.)
| static double | ||
| circpad_distribution_sample(circpad_distribution_t dist) | ||
| { | ||
| double p = 0; |
General comment:
If we can work out an accurate way to do p/(1.0-p), let's put it in a function and use it.
src/core/or/circuitpadding.c
Outdated
| /* https://en.wikipedia.org/wiki/Weibull_distribution \ | ||
| * #Cumulative_distribution_function | ||
| * param1 is k, param2 is Lambda */ | ||
| return dist.param2*pow(-log1p(p), 1.0/dist.param1); |
Here's what we can fix:
- pow(-log1p(p), x) will either produce negative results, or imaginary numbers, it should be pow(-log1p(-p), x)
- Please add unit tests to catch bugs like this
- increase the resolution of p to at least 1/2**53
- 1.0/param1 is a constant, let's just specify param1 = 1.0/k instead
Here are my remaining questions:
- what are typical values for param1 and param2? What are the min and max values?
- is covering the full range of outputs important? What range of outputs do we need?
- will infinite floating point results cause errors?
- I don't know the typical accuracy of the pow() function, so let's get a second opinion on this code
| return dist.param2*pow(-log1p(p), 1.0/dist.param1); | |
| /* log(p) is inaccurate near 1 when the resolution of p is greater than 1/2**53, because p is inaccurate near 1. | |
| * So we use log1p(-p) instead. | |
| * The result is potentially infinite. | |
| * The maximum value is approximately param2*pow(2**32, param1). */ | |
| return dist.param2*pow(-log1p(-p), dist.param1); |
| * param1 is k, param2 is Lambda */ | ||
| return dist.param2*pow(-log1p(p), 1.0/dist.param1); | ||
| case CIRCPAD_DIST_PARETO: | ||
| p = 1.0-crypto_rand_double(); // Pareto quantile needs (0,1] |
Let's make crypto_rand_double() take flags for including the endpoints. (Or let's make a new function that takes those flags, and implement crypto_rand_double() using that function.)
Hrmm.. I am kinda against a bunch of flags if this is the only case we need it for.
I could easily write up a crypto_rand_double64() though, and use that here.
I am suggesting we add two flags: allow_zero, and allow_one.
I'll also need these flags when I rewrite the Laplace code.
We'll also use the flags for the logistic distribution. And I bet if we review the existing code that uses crypto_rand_double(), we'll find other cases where they're useful. (I also expect to use them in PrivCount.)
I am not sure what you mean by crypto_rand_double64().
The current implementation of crypto_rand_double() tries to use whatever INT is on the platform as the underlying range for the double. I am saying I could make one that always uses uint64_t.
| * < 1. Since -log U < 1000 for all U in (0, 1] in binary64 | ||
| * floating-point, we can safely cut xi off at 1e-20 < eps/4000 | ||
| * and attain <1ulp error. | ||
| */ |
Ideally, all our floating point code should have a detailed error analysis like this.
In practice, we should document why we chose particular code, but I don't think we need to calculate ulps.
(How accurate do these distributions need to be?)
| * and attain <1ulp error. | ||
| */ | ||
| if (fabs(dist.param2) <= 1e-20) | ||
| return -dist.param1*tor_mathlog(p); |
Here are my remaining questions:
- what are typical values for param1 and param2? What are the min and max values?
- is covering the full range of outputs important? What range of outputs do we need?
- is the discontinuity at the changeover point significant? Do we need to give advice about setting param2 to just above or just below 1e-20?
- can we get infinite floating point results from either part of this function? will infinite floating point results cause errors?
I don't actually know the typical values here. It is supported in case someone needs it, because it was easy to do. Same for Weibull..
For the changeover point, in the limit as Xi->0, these functions are the same. So it's just a matter of trying to pick which one to use when close to 0. That's the analysis Riastradh did in the comment.
|
GitHub doesn't like the number of comments I added in my review: some of them will be hidden. |
TODO notes from teor's comments; more test plans; other things.
| cell_t cell; | ||
| ssize_t len; | ||
|
|
||
| if (!circpad_circuit_supports_padding(circ, target_hopnum)) { |
This seems to be an error path. Let's document it as such and also explain the scenarios that can make us end up here.
| { | ||
| const circpad_state_t *state = circpad_machine_current_state(mi); | ||
| circpad_delay_t start_usec; | ||
| int bin; |
My compiler warns me aabout the fact that state can be NULL here:
+ src/core/or/circuitpadding.c: In function ‘circpad_histogram_usec_to_bin’:
+ src/core/or/circuitpadding.c:162:7: error: potential null pointer dereference [-Werror=null-dereference]
+ if (state->use_rtt_estimate)
+ ^~~~~
I patched it with:
+ if (!state) {
+ return CIRCPAD_DELAY_INFINITE;
+ }
just to be able to proceed.
Huh. This is checked explicitly by the caller (circpad_machine_remove_token()). Unless maybe the unit tests don't check it, and that's where the error came from?
Anyway I added a check in d00ff6b
Update todo.
Add comments
Rename and refactor circuit_get_nth_hop().
Satisfy pedantic compiler.
Fix loops for cell delivery events. The initial implementation did not have fixed machine indexes - instead we just used first available. But that would cause non-deterministic event ordering between two machines. So now we specify machine index. But this also means that either machine slot can be empty.
* Fix typo in Weibull use of pow(-log1p(-p), x). * Specify our parameters as 1.0/param when distributions use that value, so we can control rounding issues directly.
More TODO notes.
Fix a comment.
Co-Authored-By: mikeperry-tor <github@unencrypted.info>
Co-Authored-By: mikeperry-tor <github@unencrypted.info>
Co-Authored-By: mikeperry-tor <github@unencrypted.info>
Hrmm. I hope this is right. I don't understand the mapping between C enums and rust ones..
Co-Authored-By: mikeperry-tor <github@unencrypted.info>
Comment circuit fields a bit better.
Clarify machine conditions type.
Clamp iat_dist sampling.
I still don't know about this log1p stuff..
Better logs.
Spaces fixes.
Improvements to machine creation events.
Machine creation event improvements.
Add relay_early event; rename purpose_changed event.
Comment why we need shutdown on END as an option; also teardown the machineinfo in that case so it can be replaced immediately.
Mock time to fix test flapping.
Pad circsetup only while we have relay_early left.
Save infinity bin in case it was scheduled.
Remove newline.
|
Closing this PR in favor of the squashed one (#461). |
PADDING_TODO.txt
Outdated
| - Specify ports for machine?? | ||
|
|
||
| - circpad_machine_validate() function to sanity-check histograms loaded from | ||
| consensus/torrc. |
Fuzz code that loads anything complex from consensus/torrc.
| /** | ||
| * EWMA estimate of the RTT of the circuit from this hop | ||
| * to the exit end, in microseconds. */ | ||
| circpad_delay_t rtt_estimate_us; |
Consistency in the C files is also important for maintainability.
In the C files, I've also found bin_start, bin_end, and rtt_time.
Can you standardise on "usec" everywhere? Including function names?
Review branch for asn. Strongly recommend reviewing in commit order.
WARNING: Github is being lame. The below order seems to be timestamp order, not git commit order :/
Actual git commit order is the best way to review.
The text was updated successfully, but these errors were encountered: