-
Notifications
You must be signed in to change notification settings - Fork 79
Decouple the LightweightTableCollection from C module. #862
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Thanks for the quick patch. I had one question which might be obvious, but I couldn't grasp it from your test code. It seems like there is no way to go from a https://github.com/terhorst/xsmc/blob/631b6cf34dc1187d687d00c9f457afad38c3c971/xsmc/xsmc.py#L49-L57 It seems like I might be missing something -- I don't really see what's the gain of going through Asking because it's fairly easy to bundle the |
Codecov Report
@@ Coverage Diff @@
## main #862 +/- ##
=======================================
Coverage 93.52% 93.52%
=======================================
Files 24 25 +1
Lines 19927 19932 +5
Branches 789 789
=======================================
+ Hits 18636 18641 +5
Misses 1259 1259
Partials 32 32
Continue to review full report at Codecov.
|
benjeffery
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
@jeromekelleher shall we merge now and do the docs/example bulids in a follow up?
|
That's a good question @terhorst. If you have your data in a LightweightTableCollection tables = tskit.TableCollection.fromdict(lwt.asdict())
ts = tables.tree_sequence()However, if I understand your application you'll probably want to do this on the C side. So, you have a lwt = xmsc.LightweightTableCollection(ts.sequence_length)
lwt.fromdict(ts.tables.asdict())
# pass lwt to your class/function that's doing the C stuff.Then, in C tsk_treeseq_t ts;
// lwt is a LightweightTableCollection instance
ret = tsk_treeseq_init(&ts, lwt->tables, TSK_BUILD_INDEXES);
// now you can do stuff with &ts.There's a bit of inefficiency here in that we need to recompute the indexes (I think the index data isn't included in the dictionary encoding at the moment), but it's pretty minimal unless the tree sequence is big. Does this make sense/help? |
8203950 to
3076a0c
Compare
I don't mind - are you OK with the names of things? I didn't really much thought into how that would affect things. But yeah, if your happy with it as is, then lets merge and lodge an issue for follow-ups. One thing I'd like to sort out is how we can reuse the tests in downstream repos without editing the code. I think the only change we make at the moment is the |
@jeromekelleher I've just pushed a commit here showing how this might be done, basically you move the tests into |
python/tests/test_dict_encoding.py
Outdated
| tskit.TableCollection.fromdict(d) | ||
| # Check we've done something | ||
| self.assertTrue(seen_msprime) | ||
| from tskit.dict_encoding_tests import * # noqa: F403,F401 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a good idea, but this will mean that we use the version of the tests from the installed version of tskit, not the version which corresponds to the C-lib version. These could get out of date - suppose we have msprime using version 0.99.X and everything is working fine. Then we release a tskit version which is based on 0.99.X + 1, which has an added some extra fields to the data model that are irrelevant for msprime. Then, these tests will break. It would essentially force us to update the C API in all downstream packages everytime we update the C API in Python tskit, which I don't think we want to do.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably simplest to mess around with this downstream and see what can be made work. Maybe we merge the basic change first, and see how we go? It'll make things easier to test if we can point msprime at a hash in the main repo rather than pointing at a fork.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, the tskit-using-application tests need to pull from the tests from the submodule of tskit, not the installed tskit. So it's either a weird looking import or a symlink I think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, lets back out the last commit i did, then merge.
8b2aca0 to
3076a0c
Compare
3076a0c to
6a5eb93
Compare
An initial attempt to pull out the LightweightTableCollection code in a way that's reusable among downstream Python projects.
Motivated by #655, but also it'd be nice to get rid of a few K-lines of boilerplate from msprime and tsinfer.
The idea is that Python projects (like msprime) that use the tskit C API can include this new header file into their own low-level C module. Since they are already doing a git submodule for tskit, then by pulling in this file from the tskit source it should ensure that the code stays up to date and those projects don't need to worry about compatibility issues.
I'm sure this will work for Python C modules like we're using in msprime/tsinfer, but we're kinda hoping that this will also work for Cython code, like @terhorst has been using.