-
Notifications
You must be signed in to change notification settings - Fork 78
Alignments using C API #3326
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Alignments using C API #3326
Conversation
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #3326 +/- ##
==========================================
+ Coverage 89.72% 89.75% +0.03%
==========================================
Files 29 29
Lines 31181 31225 +44
Branches 5720 5728 +8
==========================================
+ Hits 27976 28026 +50
+ Misses 1796 1792 -4
+ Partials 1409 1407 -2
Flags with carried forward coverage won't be shown. Click here to find out more.
🚀 New features to boost your workflow:
|
7d75d5a to
970e0b0
Compare
9c46c51 to
4558bdb
Compare
|
Slightly uneasy about the API breakage here, although I agree it's the right approach. I guess returning a numpy array is in practise quite similar to returning an iterator? |
jeromekelleher
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally looks good!
python/tskit/trees.py
Outdated
| bool(isolated_as_missing), | ||
| ) | ||
|
|
||
| flat_arr = np.frombuffer( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you test that this is definitely the same underlying buffer as the low-level version? A copy would be bad here.
python/_tskitmodule.c
Outdated
| num_nodes = (tsk_size_t) PyArray_DIM(nodes_array, 0); | ||
| nodes = PyArray_DATA(nodes_array); | ||
|
|
||
| Py_ssize_t total |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a bit convoluted cast-wise, what about
total = (Py_ssize_t)(num_nodes * (tsk_size_t)(right - left)
standard crib about declarations at the top.
python/_tskitmodule.c
Outdated
| } | ||
| buf = PyBytes_AS_STRING(buf_obj); | ||
|
|
||
| Py_BEGIN_ALLOW_THREADS err = tsk_treeseq_decode_alignments(self->tree_sequence, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You have to turn off clang format locally to prevent it messing things up around Py_BEGIN_ALLOW_THREADS
I'm 50/50 on it - I don't like that the iterator suggests to the user that this is a memory efficient method, but you're right that it is bit of a break, although |
|
But then |
|
I think we follow the pattern. We can add another more explicit numpy approach later if needed. Not breaking stuff is what we do. |
|
Also iterator pattern would allow us do things like specify a memory budget,which would be quite handy |
|
This last point is decisive for me - something like would be super handy, especially if it double-buffered and decoded alignments in a background thread while the foreground thread was feeding the iterator. We do not need to do this for the 1.0 release though! |
OMG I love this. |
cd4b359 to
4db3588
Compare
|
Ok back to an iterator with other comments addressed and a nasty bug with return values fixed. |
|
LGTM - I think we just need to verify the memory-tightness (standard loop on something medium sized) to check the Python-C bit and we're done. |
4db3588 to
ee0b2c9
Compare
|
Ran with a 5000 sample, 100000 length tree sequence for 200 iterations, max memory at iteration 22. |
|
Great. Can you check before/after memory and time for (say) 100K SARS-CoV-2 samples. If we have a nice perf bump we can add to the changelog. |
|
This branch around 30s. |
|
It's not main we're comparing to though, it's the last released version. Can you compare please and update the CHANGELOG accordingly? |
|
Ah, of course. In progress |
Stacked on #3319