Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Integration patterns 1: publish/subscribe messaging #452

Closed
wants to merge 10 commits into from

Conversation

Labels
None yet
Projects
None yet
4 participants
@nmathewson
Copy link
Contributor

@nmathewson nmathewson commented Oct 29, 2018

No description provided.

@coveralls
Copy link

@coveralls coveralls commented Oct 29, 2018

Pull Request Test Coverage Report for Build 2887

  • 474 of 484 (97.93%) changed or added relevant lines in 4 files are covered.
  • 5348 unchanged lines in 34 files lost coverage.
  • Overall coverage increased (+0.3%) to 62.324%

Changes Missing Coverage Covered Lines Changed/Added Lines %
src/lib/dispatch/dispatch.c 74 76 97.37%
src/lib/dispatch/dispatch_build.c 328 336 97.62%
Files with Coverage Reduction New Missed Lines %
src/feature/hs/hs_config.c 2 85.96%
src/lib/thread/compat_pthreads.c 4 91.78%
src/lib/evloop/timers.c 4 93.83%
src/core/or/versions.c 4 95.74%
src/lib/tls/tortls.c 8 95.45%
src/lib/err/torerr.c 8 85.51%
src/lib/crypt_ops/crypto_init.c 8 81.13%
src/feature/dirparse/authcert_parse.c 18 68.93%
src/lib/tls/tortls_openssl.c 23 95.89%
src/lib/compress/compress.c 25 84.24%
Totals Coverage Status
Change from base Build 2595: 0.3%
Covered Lines: 44704
Relevant Lines: 71728

💛 - Coveralls

Copy link
Contributor

@tlyu tlyu left a comment

This batch of comments is mostly about msg*.h.

#include "lib/dispatch/msg_impl.h"

#define DECLARE_MESSAGE_COMMON_(messagename, typename, ctype) \
typedef ctype msg_arg_type__ ##messagename; \
Copy link
Contributor

@tlyu tlyu Nov 9, 2018

ctype is a confusing name here. It's both a macro parameter for the concrete C type and a component of the msg_arg_ctype__-prefixed typedef name below (apparently to mean a const version of the type).

How about renaming the typename parameter to typestr, and making it a string parameter (so there's no need for a # string-generating operator)? Then rename ctype to typename.

Copy link
Contributor

@tlyu tlyu Nov 9, 2018

Maybe it would help to group the DECLARE_MESSAGE_ macros together, and have a comment that's a high-level overview describing how they're related? Like "declare set and get functions with names that look like ____"? Also maybe explain that the parameter types of these functions differ, and how that helps to achieve type safety.

Copy link
Contributor Author

@nmathewson nmathewson Nov 9, 2018

Okay, will do. I'm not sure I'll go exactly with these names, but I'll try to find something better.

Copy link
Contributor Author

@nmathewson nmathewson Nov 10, 2018

I've tried to explain all of this in e3b1e73; better now?

typedef union {
void *ptr;
uint64_t u64;
} msg_aux_data_t;
Copy link
Contributor

@tlyu tlyu Nov 9, 2018

Does anything assume that void * is the same size as uint64_t? Do we care about wasted space on 32-bit platforms?

Copy link
Contributor Author

@nmathewson nmathewson Nov 9, 2018

I actually did this union because I'm assuming that we don't want our integers to max out at 64 bits on some platforms and 32 bits on other. I don't think the 4 bytes per message will matter.

Copy link
Contributor Author

@nmathewson nmathewson Nov 10, 2018

I've tried to explain this in e3b1e73; better now?

*
* It helps with strong typing.
*/
#define DECLARE_MESSAGE(messagename, typename, ctype) \
Copy link
Contributor

@tlyu tlyu Nov 9, 2018

Why is there a ctype parameter when there's already a typedef name defined above that can be derived from messagename?

Copy link
Contributor Author

@nmathewson nmathewson Nov 9, 2018

It is passed to the DECLARE_MESSAGE_COMMON_() macro, which is supposed to be only for internal use between DECLARE_MESSAGE() and DECLARE_MESSAGE_INT(). I didn't intend that people call DECLARE_MESSAGE_COMMON_() directly.

Copy link
Contributor Author

@nmathewson nmathewson Nov 10, 2018

I've tried to explain all of this in e3b1e73; better now?

@@ -217,4 +217,8 @@
/** Macro: Yields the number of elements in array x. */
#define ARRAY_LENGTH(x) ((sizeof(x)) / sizeof(x[0]))

/* Eat a semicolon that somebody puts at the end of a global macro. */
Copy link
Contributor

@tlyu tlyu Nov 9, 2018

Maybe explain a little how this works, like app/config/confparse.h does for a very similar use?

Copy link
Contributor Author

@nmathewson nmathewson Nov 9, 2018

Sure; will do

Copy link
Contributor Author

@nmathewson nmathewson Nov 10, 2018

Documented in 3c4d5d3


/**
* Use this macro inside a C file to declare that we'll be publishing a given
* message type from within this module. It helps with strong typing. */
Copy link
Contributor

@tlyu tlyu Nov 9, 2018

Maybe group DECLARE_PUBLISH and DECLARE_SUBSCRIBE, and add an overview comment explaining how the stuff they declare gets used by the other macros?

Copy link
Contributor Author

@nmathewson nmathewson Nov 9, 2018

Sounds good.

Copy link
Contributor Author

@nmathewson nmathewson Nov 10, 2018

I've tried to explain all of this in e3b1e73; better now?

* Use this macro inside a C file to declare that we're subscribing to a
* given message and associating it with a given "hook function". It
* declares the hook function static, and helps with strong typing.
*/
Copy link
Contributor

@tlyu tlyu Nov 9, 2018

So the caller has to write the body of hoookfn as a static function with the correct signature?

Copy link
Contributor Author

@nmathewson nmathewson Nov 9, 2018

That's correct.

Copy link
Contributor Author

@nmathewson nmathewson Nov 10, 2018

Documented in e3b1e73

lib/dispatch/*.h
lib/log/*.h
lib/malloc/*.h
lib/string/*.h
Copy link
Contributor

@tlyu tlyu Nov 9, 2018

nit: missing EOL

Copy link
Contributor Author

@nmathewson nmathewson Nov 10, 2018

Fixed in 8240d16

nmathewson added 3 commits Nov 10, 2018
Document the EAT_SEMICOLON macro better.
Add a long comment about how to use it, document its internals, and
expand on the existing documentation.

Also, try to disambiguate "ctype" -- use "consttype" when we mean
the constant version of a type, and "c_type" when we mean a type in
C, as opposed to the unique identifier ("typename") for a type.
Copy link
Contributor

@tlyu tlyu left a comment

This batch is for the mqueue stuff.

struct smartlist_t;
/**
* A message queue, backed by a pair of smartlists.
*/
Copy link
Contributor

@tlyu tlyu Nov 12, 2018

I'm not sure I understand why there needs to be this additional complexity of using a pair of smartlists. Is this to work around the limitations of smartlists?

Copy link
Contributor Author

@nmathewson nmathewson Nov 13, 2018

We want to remove from the front of the queue and append to the back. But smartlists are just dynamic C arrays -- they only grow or shrink in one direction: so, pushing to the end of a smartlist is amortized O(1), and popping from the end is O(1), but popping or pushing at the front is O(N).

To get an O(1) pop, we have two arrays: one stored in reverse order that is draining, and one stored in forward order that is filling.

As an alternative, we could use a single array, and use various tricks to make push and pop O(1) -- like treating the array as circular and growing it when it becomes full, for example. The implementation isn't essential here, and it should be fine to replace this mqueue implementation with any other array-backed queue with O(1) amortized behavior.

Did that answer your question? Shall I put it in a comment?

Copy link
Contributor

@tlyu tlyu Nov 13, 2018

Hm. I think maybe it's worth it to go with a simpler implementation here. I have a slight preference for taking a O(n) performance hit to delete from the front of the array, under the assumption that n will tend to be small. At a minimum there's already a penalty of O(n) to do the reversal whenever the "flushing" smartlist is empty and we need to refill it. (Also memmove() is likely better optimized than our code that does smartlist reversal. Then again, with the single smartlist approach, we would potentially call memmove() for each pop compared to reversing the list fewer times in the dual smartlist approach.)

If it turns out that we do need O(1) amortized pop-from-beginning, we can implement a circular queue. Are there other places in tor that could take advantage of a ring buffer or circular queue?

Do we already have expected use cases for pubsub where we expect large queues? I'm not sure.

I'm undecided whether the dual smartlist approach is more or less risky than a circular queue in terms of complexity.

Copy link
Contributor Author

@nmathewson nmathewson Nov 14, 2018

Then again, with the single smartlist approach, we would potentially call memmove() for each pop compared to reversing the list fewer times in the dual smartlist approach.

Right. I'm pretty sure that the dual-smartlist approach is amortized O(1) for pop, as opposed to the single smartlist approach which is always O(N). [Proof sketch: every time we reverse a smartlist, it takes N move operations, but it gives us N elements that we can pop before we have to reverse a smartlist again.]

Are there other places in tor that could take advantage of a ring buffer or circular queue?

Hm. Possibly the queued_control_events array in control.c, or the workqueue structure in workqueue.c would like something like this.

I'm undecided whether the dual smartlist approach is more or less risky than a circular queue in terms of complexity.

Hm. I thought the dual stack approach is simpler, but I'll try hacking out the ring buffer one so we can compare.

Copy link
Contributor Author

@nmathewson nmathewson Nov 14, 2018

I've done two slightly different ringbuf implementations in "messaging_ringbuf" and "messaging_ringbuf_v2". I don't vouch for their correctness: much more testing and analysis would be needed. Nonetheless, they might help come up with a picture of how the complexity differs between the two approaches.

Copy link
Contributor

@tlyu tlyu Nov 19, 2018

Summarizing some IRC conversation: using the BSD queue macros (maybe SIMPLEQ?) seems like a logical thing here: always O(1) enqueue and dequeue. The downsides are that they involve intrusive structures. msg_t is visible to at least the receiving functions in subscribers, so the intrusive link fields would be visible to them as well. We don't gain much locality of reference from using an array-based implementation, given that it's arrays of pointers to individually heap-allocated objects.

Wrapping a msg_t in another struct might be one way to hide the links from the consuming code, assuming that only approved functions get used to allocate or free these objects. This is similar to an approach I've seen elsewhere in tor.

I think it's less complicated to allow the link structures to be intrusive, so I think we should do that unless there's a good reason. (Do we need to be defensive against other tor developers inappropriately using the link fields in a publisher or subscriber? Data hiding comes with a complexity cost, and C isn't a good language for hiding that complexity.)

Copy link
Contributor

@tlyu tlyu left a comment

namemap stuff looks good

Copy link
Contributor

@tlyu tlyu left a comment

Initial comments on the main dispatch code. I'll probably have more comments later, but wanted to get these out there while I followed up on the mqueue.h stuff. (GitHub doesn't seem to let you edit multiple pending reviews at once.)

#include "lib/dispatch/msgtypes.h"

/**
* Overview: Messages are sent over channels. Before sending a message on a
Copy link
Contributor

@tlyu tlyu Nov 19, 2018

Is this meant to be the main header file? It seems it might be, given the amount of documentation comments. If so, maybe it should be called dispatch.h?

for (i=0; i < N_FAST_FNS && i < n_fns; ++i) {
ent->fns[i](m);
}
for ( ; i < n_fns; ++i) {
Copy link
Contributor

@tlyu tlyu Nov 19, 2018

I'm not sure I like how the same index gets used for two different arrays and we do arithmetic on it. Is there a reason these aren't kept as separate lengths?

* them, and return -1. Otherwise return 0.
**/
static int
lint_message(const dispatch_adjacency_map_t *map, message_id_t msg)
Copy link
Contributor

@tlyu tlyu Nov 19, 2018

This is a long (150 line) function that could probably be broken into helper functions. The bitarray tests for example could possibly be a separate function. It also should get unit-tested. Maybe test_dispatch_build.c covers it adequately? It's not completely clear.

It might even make sense to put the extensive checking code in a separate file.

**/
typedef struct dispatch_adjacency_map_t {
/* XXXX The next three fields are currently constructed but not yet
* XXXX used. I beleive we'll want them in the future, though. -nickm
Copy link
Contributor

@tlyu tlyu Nov 19, 2018

Maybe we shouldn't declare stuff we're not going to use until we actually use it? This is an internal interface so it doesn't make sense to try to keep it stable. This also eliminates the code that maintains these fields.

It does look like n_subsystems gets used by some lint_message()stuff in dispatch_build.c so this comment isn't quite accurate.

Or if we think the'll be useful for debugging, actually add some functions to print/log their contents in a helpful way?

@torproject-pusher torproject-pusher deleted the branch torproject:master May 25, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment