Alignments using C API #3326

benjeffery · 2025-11-14T20:37:57Z

~~Stacked on #3319~~

codecov · 2025-11-14T20:44:06Z

Codecov Report

❌ Patch coverage is 93.65079% with 4 lines in your changes missing coverage. Please review.
✅ Project coverage is 89.75%. Comparing base (5ffcf6f) to head (ee0b2c9).
⚠️ Report is 4 commits behind head on main.

Files with missing lines	Patch %	Lines
python/_tskitmodule.c	91.83%	2 Missing and 2 partials ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #3326      +/-   ##
==========================================
+ Coverage   89.72%   89.75%   +0.03%     
==========================================
  Files          29       29              
  Lines       31181    31225      +44     
  Branches     5720     5728       +8     
==========================================
+ Hits        27976    28026      +50     
+ Misses       1796     1792       -4     
+ Partials     1409     1407       -2

Flag	Coverage Δ
c-tests	`86.77% <100.00%> (+0.06%)`	⬆️
lwt-tests	`80.38% <ø> (ø)`
python-c-tests	`87.09% <91.83%> (+0.04%)`	⬆️
python-tests	`98.84% <100.00%> (-0.01%)`	⬇️
python-tests-no-jit	`33.58% <0.00%> (+0.02%)`	⬆️
python-tests-numpy1	`50.15% <0.00%> (+0.04%)`	⬆️

Files with missing lines	Coverage Δ
c/tskit/genotypes.c	`82.50% <100.00%> (+2.02%)`	⬆️
python/tskit/trees.py	`98.88% <100.00%> (-0.01%)`	⬇️
python/_tskitmodule.c	`87.09% <91.83%> (+0.04%)`	⬆️

... and 1 file with indirect coverage changes

jeromekelleher · 2025-11-19T16:57:21Z

Slightly uneasy about the API breakage here, although I agree it's the right approach. I guess returning a numpy array is in practice quite similar to returning an iterator?

jeromekelleher

Generally looks good!

jeromekelleher · 2025-11-19T17:00:01Z

python/tskit/trees.py

+            bool(isolated_as_missing),
+        )
+
+        flat_arr = np.frombuffer(


Can you test that this is definitely the same underlying buffer as the low-level version? A copy would be bad here.
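One way such a check could look, as a minimal sketch: `np.frombuffer` builds a view over the object it is given, so the resulting array (and any reshape of it) should share memory with the original bytes. The names `buf`, `num_nodes`, `num_sites`, and `shares_buffer`, and the `int8` dtype, are assumptions for illustration and are not taken from the PR.

```python
import numpy as np

# Hypothetical stand-in for the bytes object returned by the low-level call:
# 3 nodes, 4 sites each, one ASCII byte per site.
num_nodes, num_sites = 3, 4
buf = b"ACGTAAGTACGA"

# frombuffer wraps the existing memory; reshape of a contiguous array is a view.
flat = np.frombuffer(buf, dtype=np.int8)
arr = flat.reshape(num_nodes, num_sites)


def shares_buffer(a, raw):
    """Return True if array `a` reads directly from the memory of `raw`."""
    witness = np.frombuffer(raw, dtype=np.uint8)
    return np.shares_memory(a, witness)


print(shares_buffer(arr, buf))         # view over the original bytes
print(shares_buffer(arr.copy(), buf))  # an explicit copy owns new memory
```

A test along these lines would fail if a copy were introduced anywhere between the low-level buffer and the array handed to the user.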

jeromekelleher · 2025-11-19T17:05:30Z

python/_tskitmodule.c

+    num_nodes = (tsk_size_t) PyArray_DIM(nodes_array, 0);
+    nodes = PyArray_DATA(nodes_array);
+
+    Py_ssize_t total


This is a bit convoluted cast-wise. What about

total = (Py_ssize_t) (num_nodes * (tsk_size_t) (right - left));

Also, the standard complaint about declarations going at the top applies.

jeromekelleher · 2025-11-19T17:06:10Z

python/_tskitmodule.c

+    }
+    buf = PyBytes_AS_STRING(buf_obj);
+
+    Py_BEGIN_ALLOW_THREADS err = tsk_treeseq_decode_alignments(self->tree_sequence,


You have to turn off clang format locally to prevent it messing things up around Py_BEGIN_ALLOW_THREADS

benjeffery · 2025-11-19T18:02:19Z

I guess returning a numpy array is in practice quite similar to returning an iterator?

I'm 50/50 on it - I don't like that the iterator suggests to the user that this is a memory-efficient method, but you're right that it is a bit of a break, although `for i in alignments` should still work.
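To make the trade-off concrete, here is a small sketch of where the two return types agree and where they differ. It assumes the array form would be a 2D array of byte values with one row per alignment; that layout, the sample data, and the helper `as_iterator` are illustrative assumptions, not the PR's actual return type.

```python
import numpy as np

rows = ["ACGT", "AAGT"]  # made-up alignments


def as_iterator(rows):
    """Iterator form: yields one alignment string at a time."""
    yield from rows


# Array form (assumed layout): one row of ASCII byte values per alignment.
as_array = np.array([list(r.encode("ascii")) for r in rows], dtype=np.uint8)

# A plain for loop works with both forms...
from_iter = [a for a in as_iterator(rows)]
from_array = [a.tobytes().decode("ascii") for a in as_array]

# ...but only the array supports len() and indexing.
try:
    len(as_iterator(rows))
    iterator_has_len = True
except TypeError:
    iterator_has_len = False
```

Code that only loops keeps working under either form, while code that calls `next()` on the result or expects strings rather than array rows would notice the change.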

benjeffery · 2025-11-19T18:24:40Z

But then haplotypes returns an iterator - so probably best to follow that pattern or change haplotypes too.

jeromekelleher · 2025-11-19T19:35:10Z

I think we follow the pattern. We can add another more explicit numpy approach later if needed. Not breaking stuff is what we do.

jeromekelleher · 2025-11-19T20:17:33Z

Also, the iterator pattern would allow us to do things like specify a memory budget, which would be quite handy.

jeromekelleher · 2025-11-20T09:37:52Z

This last point is decisive for me - something like

for a in ts.alignments(max_mem="1G"):
     # do something with a

would be super handy, especially if it double-buffered and decoded alignments in a background thread while the foreground thread was feeding the iterator.

We do not need to do this for the 1.0 release though!
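A rough sketch of how a budgeted, double-buffered iterator might be structured follows. Everything here is hypothetical: `ts.alignments` has no `max_mem` parameter in this PR, and `parse_mem`, `fake_decode`, and `budgeted_alignments` are invented names, with `fake_decode` standing in for a batched low-level decode.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np

_UNITS = {"K": 2**10, "M": 2**20, "G": 2**30}


def parse_mem(max_mem):
    """Turn "1G" or an integer byte count into bytes (hypothetical format)."""
    if isinstance(max_mem, int):
        return max_mem
    return int(float(max_mem[:-1]) * _UNITS[max_mem[-1].upper()])


def fake_decode(nodes, length):
    """Stand-in for a batched decode: one row of byte values per node."""
    out = np.empty((len(nodes), length), dtype=np.uint8)
    for j, u in enumerate(nodes):
        out[j] = ord("ACGT"[u % 4])
    return out


def budgeted_alignments(nodes, length, max_mem, decode=fake_decode):
    """Yield one alignment string per node, decoding in budget-sized batches."""
    # Two batches are alive at once (current and prefetched), so each batch
    # gets half of the budget.
    rows = max(1, parse_mem(max_mem) // (2 * length))
    batches = [nodes[i:i + rows] for i in range(0, len(nodes), rows)]
    if not batches:
        return
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = pool.submit(decode, batches[0], length)
        for nxt in batches[1:] + [None]:
            current = pending.result()
            if nxt is not None:
                # Decode the next batch in the background while the
                # foreground thread feeds the consumer from `current`.
                pending = pool.submit(decode, nxt, length)
            for row in current:
                yield row.tobytes().decode("ascii")


for a in budgeted_alignments(list(range(5)), length=4, max_mem=16):
    print(a)
```

The background thread only helps if the decode step releases the GIL, which is what the `Py_BEGIN_ALLOW_THREADS` block in the C module is for.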

benjeffery · 2025-11-20T11:50:40Z

This last point is decisive for me - something like
for a in ts.alignments(max_mem="1G"):
     # do something with a
would be super handy, especially if it double-buffered and decoded alignments in a background thread while the foreground thread was feeding the iterator.

We do not need to do this for the 1.0 release though!

OMG I love this.

benjeffery · 2025-11-20T17:38:42Z

Ok back to an iterator with other comments addressed and a nasty bug with return values fixed.

jeromekelleher · 2025-11-21T09:37:56Z

LGTM - I think we just need to verify the memory-tightness (standard loop on something medium sized) to check the Python-C bit and we're done.

benjeffery · 2025-11-21T10:37:28Z

Ran with a 5000-sample, 100000-length tree sequence for 200 iterations; peak memory was reached at iteration 22.
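The kind of loop described here can be sketched as below, assuming a Unix platform where the standard-library `resource` module is available. The helper name `last_growth_iteration` and the toy workload are illustrative only and are not the script used for the numbers above.

```python
import resource


def max_rss():
    """Peak resident set size of this process so far (units are OS-specific)."""
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss


def last_growth_iteration(func, iterations):
    """Call func repeatedly; return the last iteration at which peak RSS rose.

    Returns -1 if the peak never rose during the loop.
    """
    last = -1
    peak = max_rss()
    for i in range(iterations):
        func()
        now = max_rss()
        if now > peak:
            peak, last = now, i
    return last


# Toy workload: allocate and immediately drop a 1 MB buffer each iteration.
print(last_growth_iteration(lambda: bytearray(10**6), 200))
```

If memory is tight, the peak should stop rising early in the run; a value close to the final iteration would suggest that something is leaking on each call.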

jeromekelleher · 2025-11-21T10:46:36Z

Great. Can you check before/after memory and time for (say) 100K SARS-CoV-2 samples? If we have a nice perf bump we can add it to the changelog.

benjeffery · 2025-11-25T11:43:11Z

This branch takes around 30s. On main, I killed it after 5 min.

jeromekelleher · 2025-11-25T13:14:26Z

It's not main we're comparing to though, it's the last released version. Can you compare please and update the CHANGELOG accordingly?

benjeffery · 2025-11-25T14:21:58Z

Ah, of course. In progress

benjeffery force-pushed the alignments-python branch from 7d75d5a to 970e0b0 Compare November 17, 2025 13:17

benjeffery added 2 commits November 19, 2025 16:33

Implement alignments using C API

c30e643

More tests for C side

4558bdb

benjeffery force-pushed the alignments-python branch from 9c46c51 to 4558bdb Compare November 19, 2025 16:33

jeromekelleher reviewed Nov 19, 2025

benjeffery force-pushed the alignments-python branch 2 times, most recently from cd4b359 to 4db3588 Compare November 20, 2025 17:38

Change back to iterator

ee0b2c9

benjeffery force-pushed the alignments-python branch from 4db3588 to ee0b2c9 Compare November 21, 2025 10:36

benjeffery marked this pull request as ready for review November 21, 2025 10:40

jeromekelleher approved these changes Nov 21, 2025

jeromekelleher added this to the Python 1.0 milestone Nov 24, 2025

benjeffery added this pull request to the merge queue Nov 25, 2025

Merged via the queue into tskit-dev:main with commit f904749 Nov 25, 2025
18 checks passed

benjeffery deleted the alignments-python branch November 25, 2025 12:07

benjeffery mentioned this pull request Nov 25, 2025

Enable alignments on sc2ts ARG internal nodes #3293

Closed
