Expose sorting API and test C++ implementation. #627
|
Easiest/laziest thing regarding exceptions is the following:
try
{
    std::sort(blah blah blah);
}
catch(...)
{
    return -1;
}
This is like a bare |
|
As for tests, I'd just make sure the return value is 0 and then use existing C logic along with your existing test framework. I think you could also make the C++ sorting work via a function with C linkage, so that you can compile tests using that function as if they are C. That'll work as long as any exceptions are caught. |
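The suggestion above can be sketched as follows. This is only an illustration under stated assumptions: the edge_t record and the sort_edges_c name are hypothetical and are not tskit's actual API. It shows a function with C linkage that wraps std::sort and turns any exception into a -1 return value, so that C test code can call it and check for 0.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical edge record, for illustration only; not tskit's actual layout.
struct edge_t {
    double left;
    int parent;
    int child;
};

// C linkage: the name is not mangled, so C code can declare and call it.
extern "C" int
sort_edges_c(edge_t *edges, std::size_t num_edges)
{
    try {
        // Illustrative ordering: by parent, then child, then left coordinate.
        std::sort(edges, edges + num_edges, [](const edge_t &a, const edge_t &b) {
            if (a.parent != b.parent) {
                return a.parent < b.parent;
            }
            if (a.child != b.child) {
                return a.child < b.child;
            }
            return a.left < b.left;
        });
    } catch (...) {
        // No exception may cross into C code; report a generic error instead.
        return -1;
    }
    return 0;
}
```

A C test would then only need a declaration such as int sort_edges_c(edge_t *edges, size_t num_edges); and a check that the return value is 0.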
Codecov Report
@@ Coverage Diff @@
## master #627 +/- ##
==========================================
+ Coverage 87.45% 87.46% +0.01%
==========================================
Files 24 24
Lines 18156 18162 +6
Branches 3606 3609 +3
==========================================
+ Hits 15878 15886 +8
+ Misses 1100 1099 -1
+ Partials 1178 1177 -1
|
|
Looks very reasonable to me. I only skimmed the diffs; I can do a more full review if necessary. The question I have at this point is: why is this scheme being implemented only for the sorting of edges? Wouldn't it make sense to do the same thing for the sorting of the other tables? Perhaps I missed the beginning of the conversation regarding this. |
The other sorts aren't really registering in profiles the way edge sorting is. |
Might that depend upon the model being run? Or are you pretty sure it's not a problem for any model? I suppose we can always add the same sort of delegated sorting for other tables later if it turns out to be necessary... |
|
You can always make a model to emphasize mutation sorting: set u = 1000 and r = 0. But a lot of the models folks seem to want involve r >= u, and the edge sorting function is the most expensive of the 3: its comparison function requires the most logic. We could add the ability for other callbacks, but if you're doing 2, you probably want to do all 3, and at that point you've written a standalone sorter. |
The only reason for skipping it @bhaller was I wanted to see if you and @molpopgen agreed with the basic architecture before I bothered setting it up for the other tables. No reason why we wouldn't expose the functionality, although things are a bit trickier for mutation and site tables. |
|
Great, thanks both. I'll finish this up, document it and ping you for a review in a few days. One quick question @molpopgen - is this safe:
int
sort_edges(tsk_table_sorter_t *sorter, stuff)
{
    int ret;
    try {
        ret = cpp_sort_edges(sorter, stuff);
    } catch (...) {
        ret = -1;
    }
    return ret;
}
That seems like a reasonable thing to recommend in the documentation then? |
I believe that it is safe*, but I find the semantics confusing--why does an edge sorting function need to know about the entire sorter layout? What would this mean in terms of the callback proposal described above?
|
This is the same structure I have above. Rather than functions with callbacks, I thought it would be simpler to just overwrite the function pointer on the "class". The |
|
If I understand right, the sorter still contains a callback? So you have a c++ function taking a pointer to a sorter containing a callback to another c++ function? |
|
Yes, exactly, that's how it's set up in the |
That's right. The signature doesn't include the function name, which is "mangled" if it is a C++ function. |
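The arrangement described in the last few comments could look roughly like the sketch below. The sorter_t struct is a hypothetical stand-in for tsk_table_sorter_t (whose real fields are not shown in this thread), and the keys array is a placeholder for the edge table. The point is only that the callback slot is typed by its signature, so any function with matching parameters and return type can be assigned to it, whatever its (possibly mangled) name.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the sorter "class"; not the real tskit struct.
struct sorter_t {
    int *keys;            // placeholder for the data to be sorted
    std::size_t num_keys;
    // Callback slot: only the signature matters, not the function's name.
    int (*sort_edges)(sorter_t *self, std::size_t start);
};

// An ordinary C++ function with a matching signature.
int
cpp_sort_edges(sorter_t *self, std::size_t start)
{
    try {
        std::sort(self->keys + start, self->keys + self->num_keys);
    } catch (...) {
        return -1; // never let an exception escape to the caller
    }
    return 0;
}

// What the library side would do: call through the pointer if one is set.
int
run_sort(sorter_t *self)
{
    if (self->sort_edges != nullptr) {
        return self->sort_edges(self, 0);
    }
    return 0; // no callback set: the edges are left untouched
}
```

Overwriting the pointer is then a single assignment, for example sorter.sort_edges = cpp_sort_edges; before calling run_sort(&sorter).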
|
Interesting idea from @molpopgen offline: we should check if Probably wouldn't generalise to sites and mutations though, since they must be done in a certain order (which will be documented). |
Yeah, the home thread should suffice there anyways: the async edge sorter could run in parallel. |
|
This should be nearly done now, I think. I hit an annoying problem with doxygen where it's ignoring the typedef'd struct name, so will have to figure out a way around that. Should be done except for documentation though, if you'd like to base your sorting work from this @mufernando. I'll try and get this finished up ASAP and merge though. |
|
This is ready for review I think - @molpopgen, @bhaller, any chance you could take a look please? Hopefully we can do the actual example as a follow up, I think the C++ tests should be a clear enough indication of how this is intended to be used for now. |
|
Will take a look tomorrow. |
bhaller
left a comment
Seems very good, just trivial tweaks and comments/questions.
| tables.sequence_length = 1; | ||
|
|
||
| // ret = tsk_table_collection_simplify(&tables, NULL, 0, 0, NULL); | ||
| ret = tsk_table_collection_simplify(&tables, NULL, 0, 0, NULL); |
unclear why this is in the diffs; is this just an unrelated bug fix that's along for the ride?
Yep, exactly. I must have spotted it while I was in there.
| goto out; | ||
| } | ||
| ret = table_sorter_sort_sites(self); | ||
| if (self->sort_edges != NULL) { |
Is there any utility in not sorting the edges, for some case? I.e., is that why passing NULL produces that behavior, instead of just asserting that self->sort_edges must be non-NULL?
You can imagine the following type of logic (pseudocode) being useful for very large tables:
std::packaged_task<int()> efficient_edge_sorter(...);
std::future<int> sorter_future = efficient_edge_sorter.get_future();
tsk_table_collection_sort(...); // sort_edges = nullptr
sorter_future.wait();
The result is that the edge sorting is done asynchronously to the sorting of the other tables, which only depend on times, meaning the contents of the node table.
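A runnable variant of the pseudocode above might look like the following sketch. It swaps std::packaged_task for std::async, which launches the task itself. Everything here is an assumption for illustration: sort_other_tables stands in for tsk_table_collection_sort called with sort_edges unset, efficient_edge_sorter is a hypothetical user-supplied sorter, and plain vectors of ints replace the real tables.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <future>
#include <vector>

// Placeholder for the library sort with sort_edges unset: it sorts
// everything except the edges. Here that is a single vector of ints.
int
sort_other_tables(std::vector<int> &other)
{
    std::sort(other.begin(), other.end());
    return 0;
}

// Hypothetical user-supplied edge sorter; returns 0 on success, -1 on error.
int
efficient_edge_sorter(std::vector<int> &edges)
{
    try {
        std::sort(edges.begin(), edges.end());
    } catch (...) {
        return -1;
    }
    return 0;
}

// Sort the edges on a worker thread while the calling thread sorts the rest.
int
sort_all(std::vector<int> &edges, std::vector<int> &other)
{
    std::future<int> edge_result = std::async(
        std::launch::async, efficient_edge_sorter, std::ref(edges));
    int ret = sort_other_tables(other);
    int edge_ret = edge_result.get(); // blocks until the worker finishes
    return ret != 0 ? ret : edge_ret;
}
```

The sketch is only safe because the two threads touch disjoint data, which mirrors the observation above that the other tables depend only on the node table. On some platforms threading support must be enabled at link time (for example with -pthread).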
|
@jeromekelleher -- I think that this is in excellent shape. The only outstanding issue that I see is @bhaller's comment about giving the other tables a similar treatment. I'm fine holding off on that for now, though. |
|
Thanks a bunch @bhaller and @molpopgen, this was really helpful. I'll open an issue to track adding the C++ example - I think it'd be a good idea for us to come up with the most efficient (single-threaded) way we can think of to sort the tables. I |
Closes #616
Here's an initial proposal @molpopgen and @bhaller. I've tested out basic C++ linkage by stealing @molpopgen's nice code, which should hopefully make the proposal clear.
Basically, in C++ code we'd have
One thing I'm a little bit uneasy about is what happens if std::sort throws an exception within the sort_edges function. The C code is expecting sort_edges to return an error if something bad happens, so what'll happen if an exception occurs? Will it all just work out, or will things explode? I guess we should test it. (If anyone has ideas for less lame ways than assert(something) for defining the C++ tests that don't involve an annoying dependency, I'd love to hear them!)
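One dependency-free way to exercise the exception question is to force a failure deliberately. The sketch below is an assumption-laden illustration, not the PR's actual test: guarded_sort is a hypothetical function whose comparator throws on demand, so a plain assert can check that the exception is converted into an error code rather than escaping.

```cpp
#include <algorithm>
#include <cassert>
#include <stdexcept>
#include <vector>

// Hypothetical wrapper: sorts values, or fails on purpose when fail is true.
int
guarded_sort(std::vector<int> &values, bool fail)
{
    try {
        std::sort(values.begin(), values.end(), [fail](int a, int b) {
            if (fail) {
                // Simulate something going wrong inside the sort.
                throw std::runtime_error("comparison failed");
            }
            return a < b;
        });
    } catch (...) {
        // The exception stops here and becomes an ordinary error code.
        return -1;
    }
    return 0;
}
```

A test can then assert that guarded_sort returns -1 when asked to fail and 0 otherwise. After a failed sort the elements are all still present but in an unspecified order, so the test should not check their order in that case.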