Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adaptive padding rebased 0.3.6 #439

Conversation

Labels
None yet
Projects
None yet
5 participants
@mikeperry-tor
Copy link
Contributor

@mikeperry-tor mikeperry-tor commented Oct 23, 2018

Review branch for asn. Strongly recommend reviewing in commit order.

WARNING: Github is being lame. The below order seems to be timestamp order, not git commit order :/

Actual git commit order is the best way to review.

We need this for padding negotiation so that we can have later machine
revisions supercede earlier ones.
@coveralls
Copy link

@coveralls coveralls commented Oct 23, 2018

Pull Request Test Coverage Report for Build 2699

  • 552 of 771 (71.6%) changed or added relevant lines in 11 files are covered.
  • 476 unchanged lines in 6 files lost coverage.
  • Overall coverage increased (+0.2%) to 62.215%

Changes Missing Coverage Covered Lines Changed/Added Lines %
src/feature/nodelist/routerlist.c 1 2 50.0%
src/core/or/circuituse.c 3 5 60.0%
src/core/or/connection_edge.c 0 2 0.0%
src/core/or/relay.c 4 11 36.36%
src/core/or/circuitbuild.c 2 13 15.38%
src/core/or/circuitpadding.c 533 729 73.11%
Files with Coverage Reduction New Missed Lines %
src/core/or/circuitbuild.c 2 42.06%
src/feature/hs/hs_config.c 2 85.96%
src/lib/net/address.c 27 91.14%
src/feature/hs/hs_descriptor.c 98 88.01%
src/feature/hs/hs_service.c 151 73.92%
src/feature/relay/dns.c 196 17.89%
Totals Coverage Status
Change from base Build 2564: 0.2%
Covered Lines: 44703
Relevant Lines: 71852

💛 - Coveralls

mikeperry-tor and others added 13 commits Oct 24, 2018
This is a good code review start point, to get an overview of the interfaces
and types used in circuit padding.
This helps us to determine if a middle node can pad to us or not.
These event callbacks allow circuit padding to decide when to attempt to
launch and negotiate new padding machines, and when to tear old ones down.
These callbacks allow the padding state machines to react to various types of
sent and received relay cells.
This implements all of the event handling, state machines, and padding
decisions for circuit padding.

I recommend reviewing this after you look at the call-in points into it from
the rest of Tor.
* Tighten and justify cutoff for approximation in GPD sampler.
* Use expm1(-xi log U) to evaluate U^{-xi} - 1 without precision loss.
* Use log1p(x) to evaluate log(1 + x) precisely even for x near 0.
* Avoid 0 by rejection sampling, not by choosing to return 0 instead.
  (This avoids the problematic 0 return from crypto_rand_double()
   into the log() without introducing distribution bias).
Hope is this will make it easier to test on the live tor network.

Does not need to be merged if we don't want to, but will come in handy
for researchers.
Note to self/others: don't merge this.
@mikeperry-tor mikeperry-tor force-pushed the adaptive_padding-rebased_0.3.6 branch from 65a334c to 09d2269 Oct 24, 2018
Optimize the padding_sent event callpath. By moving it out from
relay_send_command_from_edge(), we avoid recursing with a cell_t and a
relay_header_t in each stack frame for back-to-back padding cells.

The recursion thing still happens, and is an argument for using Nick's
pub/sub API here instead, since no packets will be sent until we're finished
deciding to add padding packets back-to-back.
Copy link
Contributor

@teor2345 teor2345 left a comment

I did an incomplete review and found a few minor bugs, a some code that could be improved if you want, and a lot of questions. I think many of my questions could be answered by docs that talk about the general structure of the code, the threats it is trying to defend against, and the kind of timing accuracy it's trying to achieve.

/* Researchers need a way to tell their clients to use specific
* middles that they also control, to allow safe live-network
* experimentation with new padding machines. */
V(MiddleNodes, ROUTERSET, NULL),
Copy link
Contributor

@teor2345 teor2345 Oct 24, 2018

We'll also need a feature like this for PrivCount.
When will the MiddleNodes feature be merged?

(PrivCount will probably also need some way of tagging circuits to be measured, but I don't think padding needs that, because you already have padding request cells.)

Copy link
Contributor

@teor2345 teor2345 Oct 24, 2018

Please document this new option in the man page.

Do we also need to update the man page entry for StrictNodes to say that it doesn't apply to MiddleNodes?

Copy link
Contributor Author

@mikeperry-tor mikeperry-tor Oct 25, 2018

Ok I put this in the TODO file. I will probably not get to it until next week,

Copy link
Contributor Author

@mikeperry-tor mikeperry-tor Oct 29, 2018

@@ -2393,6 +2393,7 @@ compute_frac_paths_available(const networkstatus_t *consensus,
smartlist_free(mid);
smartlist_free(exits);

// XXX-MP-AP: Do the same for options->MiddleNodes?
Copy link
Contributor

@teor2345 teor2345 Oct 24, 2018

Yes, you need to do the same for MiddleNodes, otherwise clients may bootstrap and try to open circuits before they have any allowed middle nodes.

But the middle node code should be much simpler, because there's no need to check for the Exit flag, or exit policies.

Copy link
Contributor Author

@mikeperry-tor mikeperry-tor Oct 25, 2018

Ok put this in the TODO too.

Copy link
Contributor Author

@mikeperry-tor mikeperry-tor Oct 29, 2018

@@ -2580,7 +2584,24 @@ choose_good_middle_server(uint8_t purpose,
return choice;
}
Copy link
Contributor

@teor2345 teor2345 Oct 24, 2018

How do MiddleNodes and Vanguards interact?
Should we document the fact that Vanguards overrides MiddleNodes?

Copy link
Contributor Author

@mikeperry-tor mikeperry-tor Oct 24, 2018

I think the sanest way for them to interact is by having the vanguards addon inspect MiddleNodes, and only choose HSLayer2Nodes from that set. Otherwise, the manpage should say that researchers must manually set both values to the same nodes if they want to test padding on vanguards-like circuits without using the addon.

Copy link
Contributor Author

@mikeperry-tor mikeperry-tor Oct 29, 2018

Documented in 74404f7

FREE_AND_NULL(circpad_machineinfo_handle_t, \
circpad_machineinfo_handle_free_, (h))

#define USEC_PER_SEC (1000000)
Copy link
Contributor

@teor2345 teor2345 Oct 24, 2018

Redundant: use TOR_USEC_PER_SEC

Copy link
Contributor Author

@mikeperry-tor mikeperry-tor Oct 25, 2018

TOR_USEC_PER_SEC is only defined in C files and test files. I can name my thing to be the same. Or I can move the tvdiff.c definitions out of their C and test files and into their header, and use that.

Though which header? Right now, tvdiff.c is the thing that defines TOR_USEC_PER_SEC, and I don't need to use tvdiff functions... Should I put it in some other header?

Copy link
Contributor

@teor2345 teor2345 Oct 26, 2018

I'm not sure, I think @nmathewson knows where time constants belong. Somewhere in lib?

Copy link
Contributor Author

@mikeperry-tor mikeperry-tor Oct 29, 2018

Switched name to TOR_USEC_PER_SEC in f4d84db.

I put an XXX there for Nick when he looks at the branch later this week.

return start_usec+1;

return start_usec
+ (state->range_sec*USEC_PER_SEC)/(1<<(state->histogram_len-bin+1));
Copy link
Contributor

@teor2345 teor2345 Oct 24, 2018

What happens when (state->range_sec*USEC_PER_SEC)/(1<<(state->histogram_len-bin+1)) is zero?

Copy link
Contributor Author

@mikeperry-tor mikeperry-tor Oct 24, 2018

Hrmm. A bunch of 0 sized bins.

We should probably have a circpad_machine_validate() function that checks for obvious signs of machine degeneracy like this while initializing machines. We might be able to wait for the serialization code to be written before we do that, since we only have one hardcoded machine right now that is presumed to be valid.

Copy link
Contributor

@teor2345 teor2345 Oct 26, 2018

On a related note: this line is very complex, it contains 14-16 elements: 8 values and 6-8 operators (and 3 sets of brackets). Most humans are good with 3-7 things in their head at once.

For maintainability, please split it into at least 3 lines, using temporary variables. If you make your variables const, most compilers will eliminate the extra operations. (Some will do it even if they're not const.)

Copy link
Contributor Author

@mikeperry-tor mikeperry-tor Oct 29, 2018

static double
circpad_distribution_sample(circpad_distribution_t dist)
{
double p = 0;
Copy link
Contributor

@teor2345 teor2345 Oct 24, 2018

General comment:

If we can work out an accurate way to do p/(1.0-p), let's put it in a function and use it.

/* https://en.wikipedia.org/wiki/Weibull_distribution \
* #Cumulative_distribution_function
* param1 is k, param2 is Lambda */
return dist.param2*pow(-log1p(p), 1.0/dist.param1);
Copy link
Contributor

@teor2345 teor2345 Oct 24, 2018

Here's what we can fix:

  • pow(-log1p(p), x) will either produce negative results, or imaginary numbers, it should be pow(-log1p(-p), x)
  • Please add unit tests to catch bugs like this
  • increase the resolution of p to at least 1/2**53
  • 1.0/param1 is a constant, let's just specify param1 = 1.0/k instead

Here are my remaining questions:

  • what are typical values for param1 and param2? What are the min and max values?
  • is covering the full range of outputs important? What range of outputs do we need?
  • will infinite floating point results cause errors?
  • I don't know the typical accuracy of the pow() function, so let's get a second opinion on this code
Suggested change
return dist.param2*pow(-log1p(p), 1.0/dist.param1);
/* log(p) is inaccurate near 1 when the resolution of p is greater than 1/2**53, because p is inaccurate near 1.
* So we use log1p(-p) instead.
* The result is potentially infinite.
* The maximum value is approximately param2*pow(2**32, param1). */
return dist.param2*pow(-log1p(-p), dist.param1);

* param1 is k, param2 is Lambda */
return dist.param2*pow(-log1p(p), 1.0/dist.param1);
case CIRCPAD_DIST_PARETO:
p = 1.0-crypto_rand_double(); // Pareto quantile needs (0,1]
Copy link
Contributor

@teor2345 teor2345 Oct 24, 2018

Let's make crypto_rand_double() take flags for including the endpoints. (Or let's make a new function that takes those flags, and implement crypto_rand_double() using that function.)

Copy link
Contributor Author

@mikeperry-tor mikeperry-tor Oct 24, 2018

Hrmm.. I am kinda against a bunch of flags if this is the only case we need it for.

I could easily write up a crypto_rand_double64() though, and use that here.

Copy link
Contributor

@teor2345 teor2345 Oct 26, 2018

I am suggesting we add two flags: allow_zero, and allow_one.
I'll also need these flags when I rewrite the Laplace code.

Copy link
Contributor

@teor2345 teor2345 Oct 26, 2018

We'll also use the flags for the logistic distribution. And I bet if we review the existing code that uses crypto_rand_double(), we'll find other cases where they're useful. (I also expect to use them in PrivCount.)

I am not sure what you mean by crypto_rand_double64().

Copy link
Contributor Author

@mikeperry-tor mikeperry-tor Oct 29, 2018

The current implementation of crypto_rand_double() tries to use whatever INT is on the platform as the underlying range for the double. I am saying I could make one that always uses uint64_t.

* < 1. Since -log U < 1000 for all U in (0, 1] in binary64
* floating-point, we can safely cut xi off at 1e-20 < eps/4000
* and attain <1ulp error.
*/
Copy link
Contributor

@teor2345 teor2345 Oct 24, 2018

Ideally, all our floating point code should have a detailed error analysis like this.
In practice, we should document why we chose particular code, but I don't think we need to calculate ulps.
(How accurate do these distributions need to be?)

* and attain <1ulp error.
*/
if (fabs(dist.param2) <= 1e-20)
return -dist.param1*tor_mathlog(p);
Copy link
Contributor

@teor2345 teor2345 Oct 24, 2018

Here are my remaining questions:

  • what are typical values for param1 and param2? What are the min and max values?
  • is covering the full range of outputs important? What range of outputs do we need?
  • is the discontinuity at the changeover point significant? Do we need to give advice about setting param2 to just above or just below 1e-20?
  • can we get infinite floating point results from either part of this function? will infinite floating point results cause errors?

Copy link
Contributor Author

@mikeperry-tor mikeperry-tor Oct 25, 2018

I don't actually know the typical values here. It is supported in case someone needs it, because it was easy to do. Same for Weibull..

For the changeover point, in the limit as Xi->0, these functions are the same. So it's just a matter of trying to pick which one to use when close to 0. That's the analysis Riastradh did in the comment.

@teor2345
Copy link
Contributor

@teor2345 teor2345 commented Oct 24, 2018

GitHub doesn't like the number of comments I added in my review: some of them will be hidden.

TODO notes from teor's comments; more test plans; other things.
src/lib/smartlist_core/smartlist_foreach.h Show resolved Hide resolved
src/core/or/circuitpadding.c Show resolved Hide resolved
src/core/or/circuitpadding.c Outdated Show resolved Hide resolved
src/core/or/circuitpadding.c Show resolved Hide resolved
src/core/or/circuitpadding.c Show resolved Hide resolved
cell_t cell;
ssize_t len;

if (!circpad_circuit_supports_padding(circ, target_hopnum)) {
Copy link
Member

@asn-d6 asn-d6 Oct 25, 2018

This seems to be an error path. Let's document it as such and also explain the scenarios that can make us end up here.

Copy link
Contributor Author

@mikeperry-tor mikeperry-tor Oct 25, 2018

Also documented in eb11886

{
const circpad_state_t *state = circpad_machine_current_state(mi);
circpad_delay_t start_usec;
int bin;
Copy link
Member

@asn-d6 asn-d6 Oct 25, 2018

My compiler warns me aabout the fact that state can be NULL here:

+     src/core/or/circuitpadding.c: In function ‘circpad_histogram_usec_to_bin’:
+     src/core/or/circuitpadding.c:162:7: error: potential null pointer dereference [-Werror=null-dereference]
+   if (state->use_rtt_estimate)
+       ^~~~~

Copy link
Member

@asn-d6 asn-d6 Oct 25, 2018

I patched it with:

+  if (!state) {
+    return CIRCPAD_DELAY_INFINITE;
+  }

just to be able to proceed.

Copy link
Contributor Author

@mikeperry-tor mikeperry-tor Oct 25, 2018

Huh. This is checked explicitly by the caller (circpad_machine_remove_token()). Unless maybe the unit tests don't check it, and that's where the error came from?

Anyway I added a check in d00ff6b

src/core/or/circuitpadding.c Outdated Show resolved Hide resolved
Rename and refactor circuit_get_nth_hop().
Satisfy pedantic compiler.
Fix loops for cell delivery events.

The initial implementation did not have fixed machine indexes - instead we
just used first available. But that would cause non-deterministic event
ordering between two machines. So now we specify machine index. But this also
means that either machine slot can be empty.
* Fix typo in Weibull use of pow(-log1p(-p), x).
* Specify our parameters as 1.0/param when distributions use that value,
  so we can control rounding issues directly.
mikeperry-tor and others added 6 commits Oct 29, 2018
Co-Authored-By: mikeperry-tor <github@unencrypted.info>
Co-Authored-By: mikeperry-tor <github@unencrypted.info>
Co-Authored-By: mikeperry-tor <github@unencrypted.info>
Hrmm. I hope this is right. I don't understand the mapping between C enums and
rust ones..
Co-Authored-By: mikeperry-tor <github@unencrypted.info>
src/core/or/circuitpadding.c Outdated Show resolved Hide resolved
mikeperry-tor added 20 commits Oct 30, 2018
Comment circuit fields a bit better.
Clarify machine conditions type.
Clamp iat_dist sampling.
I still don't know about this log1p stuff..
Improvements to machine creation events.
Machine creation event improvements.
Add relay_early event; rename purpose_changed event.
Comment why we need shutdown on END as an option; also teardown the
machineinfo in that case so it can be replaced immediately.
Mock time to fix test flapping.
Pad circsetup only while we have relay_early left.
Save infinity bin in case it was scheduled.
@mikeperry-tor
Copy link
Contributor Author

@mikeperry-tor mikeperry-tor commented Oct 31, 2018

Closing this PR in favor of the squashed one (#461).

PADDING_TODO.txt Outdated
- Specify ports for machine??

- circpad_machine_validate() function to sanity-check histograms loaded from
consensus/torrc.
Copy link
Contributor

@teor2345 teor2345 Nov 1, 2018

Fuzz code that loads anything complex from consensus/torrc.

/**
* EWMA estimate of the RTT of the circuit from this hop
* to the exit end, in microseconds. */
circpad_delay_t rtt_estimate_us;
Copy link
Contributor

@teor2345 teor2345 Nov 1, 2018

Consistency in the C files is also important for maintainability.
In the C files, I've also found bin_start, bin_end, and rtt_time.

Can you standardise on "usec" everywhere? Including function names?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment