torproject / tor Public
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Integration patterns 1: publish/subscribe messaging #452
Conversation
(Currently, it is all unimplemented.)
We'll be using this for four kinds of identifier in dispatch.c
We already have something like this that we use for controller events; we should eventually refactor it to use this.
Pull Request Test Coverage Report for Build 2887
|
src/lib/dispatch/msg.h
Outdated
| #include "lib/dispatch/msg_impl.h" | ||
|
|
||
| #define DECLARE_MESSAGE_COMMON_(messagename, typename, ctype) \ | ||
| typedef ctype msg_arg_type__ ##messagename; \ |
ctype is a confusing name here. It's both a macro parameter for the concrete C type and a component of the msg_arg_ctype__-prefixed typedef name below (apparently to mean a const version of the type).
How about renaming the typename parameter to typestr, and making it a string parameter (so there's no need for a # string-generating operator)? Then rename ctype to typename.
Maybe it would help to group the DECLARE_MESSAGE_ macros together, and have a comment that's a high-level overview describing how they're related? Like "declare set and get functions with names that look like ____"? Also maybe explain that the parameter types of these functions differ, and how that helps to achieve type safety.
Okay, will do. I'm not sure I'll go exactly with these names, but I'll try to find something better.
I've tried to explain all of this in e3b1e73; better now?
| typedef union { | ||
| void *ptr; | ||
| uint64_t u64; | ||
| } msg_aux_data_t; |
Does anything assume that void * is the same size as uint64_t? Do we care about wasted space on 32-bit platforms?
I actually did this union because I'm assuming that we don't want our integers to max out at 64 bits on some platforms and 32 bits on other. I don't think the 4 bytes per message will matter.
I've tried to explain this in e3b1e73; better now?
src/lib/dispatch/msg.h
Outdated
| * | ||
| * It helps with strong typing. | ||
| */ | ||
| #define DECLARE_MESSAGE(messagename, typename, ctype) \ |
Why is there a ctype parameter when there's already a typedef name defined above that can be derived from messagename?
It is passed to the DECLARE_MESSAGE_COMMON_() macro, which is supposed to be only for internal use between DECLARE_MESSAGE() and DECLARE_MESSAGE_INT(). I didn't intend that people call DECLARE_MESSAGE_COMMON_() directly.
I've tried to explain all of this in e3b1e73; better now?
src/lib/cc/compat_compiler.h
Outdated
| @@ -217,4 +217,8 @@ | |||
| /** Macro: Yields the number of elements in array x. */ | |||
| #define ARRAY_LENGTH(x) ((sizeof(x)) / sizeof(x[0])) | |||
|
|
|||
| /* Eat a semicolon that somebody puts at the end of a global macro. */ | |||
Maybe explain a little how this works, like app/config/confparse.h does for a very similar use?
src/lib/dispatch/msg.h
Outdated
|
|
||
| /** | ||
| * Use this macro inside a C file to declare that we'll be publishing a given | ||
| * message type from within this module. It helps with strong typing. */ |
Maybe group DECLARE_PUBLISH and DECLARE_SUBSCRIBE, and add an overview comment explaining how the stuff they declare gets used by the other macros?
I've tried to explain all of this in e3b1e73; better now?
| * Use this macro inside a C file to declare that we're subscribing to a | ||
| * given message and associating it with a given "hook function". It | ||
| * declares the hook function static, and helps with strong typing. | ||
| */ |
So the caller has to write the body of hoookfn as a static function with the correct signature?
src/lib/dispatch/.may_include
Outdated
| lib/dispatch/*.h | ||
| lib/log/*.h | ||
| lib/malloc/*.h | ||
| lib/string/*.h |
Document the EAT_SEMICOLON macro better.
Add a long comment about how to use it, document its internals, and
expand on the existing documentation.
Also, try to disambiguate "ctype" -- use "consttype" when we mean
the constant version of a type, and "c_type" when we mean a type in
C, as opposed to the unique identifier ("typename") for a type.
Add an EOF newline.
| struct smartlist_t; | ||
| /** | ||
| * A message queue, backed by a pair of smartlists. | ||
| */ |
I'm not sure I understand why there needs to be this additional complexity of using a pair of smartlists. Is this to work around the limitations of smartlists?
We want to remove from the front of the queue and append to the back. But smartlists are just dynamic C arrays -- they only grow or shrink in one direction: so, pushing to the end of a smartlist is amortized O(1), and popping from the end is O(1), but popping or pushing at the front is O(N).
To get an O(1) pop, we have two arrays: one stored in reverse order that is draining, and one stored in forward order that is filling.
As an alternative, we could use a single array, and use various tricks to make push and pop O(1) -- like treating the array as circular and growing it when it becomes full, for example. The implementation isn't essential here, and it should be fine to replace this mqueue implementation with any other array-backed queue with O(1) amortized behavior.
Did that answer your question? Shall I put it in a comment?
Hm. I think maybe it's worth it to go with a simpler implementation here. I have a slight preference for taking a O(n) performance hit to delete from the front of the array, under the assumption that n will tend to be small. At a minimum there's already a penalty of O(n) to do the reversal whenever the "flushing" smartlist is empty and we need to refill it. (Also memmove() is likely better optimized than our code that does smartlist reversal. Then again, with the single smartlist approach, we would potentially call memmove() for each pop compared to reversing the list fewer times in the dual smartlist approach.)
If it turns out that we do need O(1) amortized pop-from-beginning, we can implement a circular queue. Are there other places in tor that could take advantage of a ring buffer or circular queue?
Do we already have expected use cases for pubsub where we expect large queues? I'm not sure.
I'm undecided whether the dual smartlist approach is more or less risky than a circular queue in terms of complexity.
Then again, with the single smartlist approach, we would potentially call memmove() for each pop compared to reversing the list fewer times in the dual smartlist approach.
Right. I'm pretty sure that the dual-smartlist approach is amortized O(1) for pop, as opposed to the single smartlist approach which is always O(N). [Proof sketch: every time we reverse a smartlist, it takes N move operations, but it gives us N elements that we can pop before we have to reverse a smartlist again.]
Are there other places in tor that could take advantage of a ring buffer or circular queue?
Hm. Possibly the queued_control_events array in control.c, or the workqueue structure in workqueue.c would like something like this.
I'm undecided whether the dual smartlist approach is more or less risky than a circular queue in terms of complexity.
Hm. I thought the dual stack approach is simpler, but I'll try hacking out the ring buffer one so we can compare.
I've done two slightly different ringbuf implementations in "messaging_ringbuf" and "messaging_ringbuf_v2". I don't vouch for their correctness: much more testing and analysis would be needed. Nonetheless, they might help come up with a picture of how the complexity differs between the two approaches.
Summarizing some IRC conversation: using the BSD queue macros (maybe SIMPLEQ?) seems like a logical thing here: always O(1) enqueue and dequeue. The downsides are that they involve intrusive structures. msg_t is visible to at least the receiving functions in subscribers, so the intrusive link fields would be visible to them as well. We don't gain much locality of reference from using an array-based implementation, given that it's arrays of pointers to individually heap-allocated objects.
Wrapping a msg_t in another struct might be one way to hide the links from the consuming code, assuming that only approved functions get used to allocate or free these objects. This is similar to an approach I've seen elsewhere in tor.
I think it's less complicated to allow the link structures to be intrusive, so I think we should do that unless there's a good reason. (Do we need to be defensive against other tor developers inappropriately using the link fields in a publisher or subscriber? Data hiding comes with a complexity cost, and C isn't a good language for hiding that complexity.)
Initial comments on the main dispatch code. I'll probably have more comments later, but wanted to get these out there while I followed up on the mqueue.h stuff. (GitHub doesn't seem to let you edit multiple pending reviews at once.)
| #include "lib/dispatch/msgtypes.h" | ||
|
|
||
| /** | ||
| * Overview: Messages are sent over channels. Before sending a message on a |
Is this meant to be the main header file? It seems it might be, given the amount of documentation comments. If so, maybe it should be called dispatch.h?
| for (i=0; i < N_FAST_FNS && i < n_fns; ++i) { | ||
| ent->fns[i](m); | ||
| } | ||
| for ( ; i < n_fns; ++i) { |
I'm not sure I like how the same index gets used for two different arrays and we do arithmetic on it. Is there a reason these aren't kept as separate lengths?
| * them, and return -1. Otherwise return 0. | ||
| **/ | ||
| static int | ||
| lint_message(const dispatch_adjacency_map_t *map, message_id_t msg) |
This is a long (150 line) function that could probably be broken into helper functions. The bitarray tests for example could possibly be a separate function. It also should get unit-tested. Maybe test_dispatch_build.c covers it adequately? It's not completely clear.
It might even make sense to put the extensive checking code in a separate file.
| **/ | ||
| typedef struct dispatch_adjacency_map_t { | ||
| /* XXXX The next three fields are currently constructed but not yet | ||
| * XXXX used. I beleive we'll want them in the future, though. -nickm |
Maybe we shouldn't declare stuff we're not going to use until we actually use it? This is an internal interface so it doesn't make sense to try to keep it stable. This also eliminates the code that maintains these fields.
It does look like n_subsystems gets used by some lint_message()stuff in dispatch_build.c so this comment isn't quite accurate.
Or if we think the'll be useful for debugging, actually add some functions to print/log their contents in a helpful way?
No description provided.
The text was updated successfully, but these errors were encountered: