@benjeffery
Member

@benjeffery benjeffery commented Nov 14, 2025

Stacked on #3319

@codecov

codecov bot commented Nov 14, 2025

Codecov Report

❌ Patch coverage is 93.65079% with 4 lines in your changes missing coverage. Please review.
✅ Project coverage is 89.75%. Comparing base (5ffcf6f) to head (ee0b2c9).
⚠️ Report is 4 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| python/_tskitmodule.c | 91.83% | 2 Missing and 2 partials ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3326      +/-   ##
==========================================
+ Coverage   89.72%   89.75%   +0.03%     
==========================================
  Files          29       29              
  Lines       31181    31225      +44     
  Branches     5720     5728       +8     
==========================================
+ Hits        27976    28026      +50     
+ Misses       1796     1792       -4     
+ Partials     1409     1407       -2     
| Flag | Coverage Δ | |
|---|---|---|
| c-tests | 86.77% <100.00%> (+0.06%) | ⬆️ |
| lwt-tests | 80.38% <ø> (ø) | |
| python-c-tests | 87.09% <91.83%> (+0.04%) | ⬆️ |
| python-tests | 98.84% <100.00%> (-0.01%) | ⬇️ |
| python-tests-no-jit | 33.58% <0.00%> (+0.02%) | ⬆️ |
| python-tests-numpy1 | 50.15% <0.00%> (+0.04%) | ⬆️ |

Flags with carried forward coverage won't be shown.

| Files with missing lines | Coverage Δ | |
|---|---|---|
| c/tskit/genotypes.c | 82.50% <100.00%> (+2.02%) | ⬆️ |
| python/tskit/trees.py | 98.88% <100.00%> (-0.01%) | ⬇️ |
| python/_tskitmodule.c | 87.09% <91.83%> (+0.04%) | ⬆️ |

... and 1 file with indirect coverage changes

@jeromekelleher
Member

Slightly uneasy about the API breakage here, although I agree it's the right approach. I guess returning a numpy array is in practice quite similar to returning an iterator?

Member

@jeromekelleher jeromekelleher left a comment

Generally looks good!

bool(isolated_as_missing),
)

flat_arr = np.frombuffer(
Member

Can you test that this is definitely the same underlying buffer as the low-level version? A copy would be bad here.
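
One way to check for an accidental copy is sketched below. It uses a plain `bytes` object as a stand-in for the buffer returned by the low-level call (an assumption for illustration, not the actual object produced by `_tskitmodule.c`), and the names `raw`, `flat` and `rows` are hypothetical.

```python
import numpy as np

# Stand-in for the bytes object returned by the low-level decoder:
# 3 alignments of length 4, stored back to back.
raw = b"ACGT" * 3

# frombuffer wraps the existing memory rather than allocating a new array.
flat = np.frombuffer(raw, dtype=np.int8)
rows = flat.reshape(3, 4)  # reshaping a contiguous array returns a view

# A second wrapper around the same bytes object points at the same memory.
same_buffer = np.shares_memory(rows, np.frombuffer(raw, dtype=np.int8))

# An explicit copy, by contrast, owns freshly allocated memory.
copied = rows.copy()
copy_shares = np.shares_memory(copied, rows)

print(same_buffer, copy_shares, flat.flags.owndata, copied.flags.owndata)
```

A test along these lines would fail if a copy slipped in somewhere between the low-level call and the array handed to the user.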

num_nodes = (tsk_size_t) PyArray_DIM(nodes_array, 0);
nodes = PyArray_DATA(nodes_array);

Py_ssize_t total
Member

This is a bit convoluted cast-wise. What about

total = (Py_ssize_t) (num_nodes * (tsk_size_t) (right - left));

Also, the standard crib about declarations at the top.

}
buf = PyBytes_AS_STRING(buf_obj);

Py_BEGIN_ALLOW_THREADS err = tsk_treeseq_decode_alignments(self->tree_sequence,
Member

You have to turn off clang-format locally to prevent it messing things up around Py_BEGIN_ALLOW_THREADS.

@benjeffery
Member Author

> I guess returning a numpy array is in practice quite similar to returning an iterator?

I'm 50/50 on it - I don't like that the iterator suggests to the user that this is a memory-efficient method, but you're right that it is a bit of a break, although for i in alignments should still work.
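
The compatibility point can be illustrated with a small sketch. A 2-D NumPy array stands in for the array-returning version and a generator for the iterator-returning one (both are stand-ins, not the tskit implementation). A plain `for` loop treats them the same way, while code that calls `next()` directly would break on the array.

```python
import numpy as np

# Stand-in for an array-returning API: one row per alignment.
array_result = np.array([list(b"ACGT"), list(b"TTGA")], dtype=np.int8)


def iterator_result():
    """Stand-in for an iterator-returning API."""
    for row in (b"ACGT", b"TTGA"):
        yield row


# A for loop (or comprehension) works on both return types.
from_array = [row.tobytes() for row in array_result]
from_iterator = [row for row in iterator_result()]

# next() only works on a real iterator, so this is where callers would break.
try:
    next(array_result)
    next_works_on_array = True
except TypeError:
    next_works_on_array = False

print(from_array == from_iterator, next_works_on_array)
```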

@benjeffery
Member Author

But then haplotypes returns an iterator - so probably best to follow that pattern or change haplotypes too.

@jeromekelleher
Member

I think we follow the pattern. We can add another more explicit numpy approach later if needed. Not breaking stuff is what we do.

@jeromekelleher
Member

Also, the iterator pattern would allow us to do things like specify a memory budget, which would be quite handy.

@jeromekelleher
Member

This last point is decisive for me - something like

for a in ts.alignments(max_mem="1G"):
     # do something with a

would be super handy, especially if it double-buffered and decoded alignments in a background thread while the foreground thread was feeding the iterator.

We do not need to do this for the 1.0 release though!
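
A minimal sketch of how such a budgeted, double-buffered iterator might look, purely for illustration. Everything here is hypothetical: `max_mem` is not an existing tskit parameter, and `parse_mem`, `decode_batch` and `batched_alignments` are made-up names, with `decode_batch` standing in for the real bulk decoder.

```python
import queue
import threading


def parse_mem(budget):
    """Convert a size such as "1G" or "512M" to bytes (hypothetical helper)."""
    units = {"K": 1024, "M": 1024**2, "G": 1024**3}
    suffix = budget[-1].upper()
    if suffix in units:
        return int(float(budget[:-1]) * units[suffix])
    return int(budget)


def decode_batch(start, stop, length):
    """Stand-in for a bulk decoder: one bytes row of `length` per sample."""
    return [bytes([65 + (i % 4)]) * length for i in range(start, stop)]


def batched_alignments(num_samples, length, max_mem="1G"):
    """Yield alignments one at a time, decoding in budget-sized batches.

    A background thread decodes the next batch while the caller consumes
    the current one; the bounded queue holds at most one waiting batch.
    """
    # Up to three batches can be alive at once (being consumed, queued,
    # being decoded), so each batch gets a third of the budget.
    rows_per_batch = max(1, parse_mem(max_mem) // (3 * length))
    batches = queue.Queue(maxsize=1)
    done = object()  # sentinel marking the end of the stream

    def producer():
        for start in range(0, num_samples, rows_per_batch):
            stop = min(start + rows_per_batch, num_samples)
            batches.put(decode_batch(start, stop, length))
        batches.put(done)

    threading.Thread(target=producer, daemon=True).start()
    while True:
        batch = batches.get()
        if batch is done:
            return
        yield from batch


for a in batched_alignments(num_samples=10, length=8, max_mem="1K"):
    pass  # do something with a
```

The sketch leaves out things a real version would need, such as propagating exceptions from the background thread and shutting the thread down if the caller stops iterating early.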

@benjeffery
Member Author

> This last point is decisive for me - something like
>
> for a in ts.alignments(max_mem="1G"):
>      # do something with a
>
> would be super handy, especially if it double-buffered and decoded alignments in a background thread while the foreground thread was feeding the iterator.
>
> We do not need to do this for the 1.0 release though!

OMG I love this.

@benjeffery benjeffery force-pushed the alignments-python branch 2 times, most recently from cd4b359 to 4db3588 on November 20, 2025 17:38
@benjeffery
Member Author

OK, back to an iterator, with the other comments addressed and a nasty bug with return values fixed.

@jeromekelleher
Member

LGTM - I think we just need to verify the memory-tightness (standard loop on something medium-sized) to check the Python-C bit and we're done.

@benjeffery
Member Author

Ran with a 5000-sample, 100000-length tree sequence for 200 iterations; max memory was reached at iteration 22.
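
A loop of this kind can be sketched roughly as follows. This is not the script that was actually run: `track_peak_rss` and `workload` are hypothetical names, the workload is a stand-in for consuming the alignments, and the `resource` module is only available on Unix-like systems.

```python
import resource


def track_peak_rss(workload, iterations=200):
    """Run `workload` repeatedly; return (peak RSS, iteration of last increase)."""
    peak = 0
    last_growth = 0
    for i in range(iterations):
        workload()
        # ru_maxrss is the process high-water mark (KiB on Linux, bytes on macOS).
        rss = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
        if rss > peak:
            peak, last_growth = rss, i
    return peak, last_growth


def workload():
    # Stand-in for consuming every alignment, e.g. `for a in ts.alignments(): pass`.
    for _ in range(100):
        bytearray(100_000)


peak, last_growth = track_peak_rss(workload, iterations=50)
print(f"peak RSS {peak} last grew at iteration {last_growth}")
```

If the peak stops moving early in the run, the Python-C boundary is probably not leaking; a peak that keeps climbing into the late iterations would suggest a missing decref or an unfreed buffer.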

@benjeffery benjeffery marked this pull request as ready for review November 21, 2025 10:40
@jeromekelleher
Member

Great. Can you check before/after memory and time for (say) 100K SARS-CoV-2 samples? If we have a nice perf bump we can add it to the changelog.

@jeromekelleher jeromekelleher added this to the Python 1.0 milestone Nov 24, 2025
@benjeffery
Member Author

This branch takes around 30s. On main, I killed it after 5 min.

@benjeffery benjeffery added this pull request to the merge queue Nov 25, 2025
Merged via the queue into tskit-dev:main with commit f904749 Nov 25, 2025
18 checks passed
@benjeffery benjeffery deleted the alignments-python branch November 25, 2025 12:07
@jeromekelleher
Member

It's not main we're comparing to though, it's the last released version. Can you compare please and update the CHANGELOG accordingly?

@benjeffery
Member Author

Ah, of course. In progress
