Remove table copies from tree_sequence_t and add options to share tables #23

Closed
jeromekelleher opened this issue Mar 15, 2018 · 7 comments
Labels: C API (Issue is about the C API), enhancement (New feature or request)

Comments

@jeromekelleher
Member

It should now be possible to make a tree sequence based on a "borrowed" or "stolen" reference to a table_collection_t. For a borrowed reference, we store a pointer to the supplied tables, and do not free these tables when the tree sequence object is destroyed. For a stolen reference, we store a pointer to the supplied table collection which we free when the tree sequence object is destroyed. These can be specified with (mutually exclusive) flags. The default behaviour should be the present case, where we take a copy of the argument table_collection_t.

This behaviour will be useful for simulations, where we really don't need to have two copies of the same tables. However, we'll need to be careful to ensure that the underlying tables don't get modified while a tree sequence refers to them. We may need to add some locks to the tables to ensure this.

See also tree_sequence_load for current wasteful behaviour.
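
To make the three modes concrete, here is a minimal, self-contained sketch of copy, borrow and steal semantics. Every name in it (`toy_tables_t`, `toy_treeseq_t`, `TOY_BORROW_TABLES`, `TOY_STEAL_TABLES`) is hypothetical and chosen for illustration only; it is not the library's API, and the single `time` column stands in for a full table collection.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, mutually exclusive option flags (not the library's API). */
#define TOY_BORROW_TABLES (1u << 0)
#define TOY_STEAL_TABLES (1u << 1)

/* Toy stand-in for a table collection: a single column of times. */
typedef struct {
    size_t num_rows;
    double *time;
} toy_tables_t;

/* Toy stand-in for a tree sequence. */
typedef struct {
    toy_tables_t *tables;
    int owns_tables; /* 1 if the tables must be freed with the tree sequence */
} toy_treeseq_t;

toy_tables_t *
toy_tables_alloc(size_t num_rows)
{
    toy_tables_t *t = malloc(sizeof(*t));
    if (t == NULL) {
        return NULL;
    }
    t->num_rows = num_rows;
    /* Allocate at least one element so a zero-row table is still valid. */
    t->time = calloc(num_rows > 0 ? num_rows : 1, sizeof(*t->time));
    if (t->time == NULL) {
        free(t);
        return NULL;
    }
    return t;
}

void
toy_tables_free(toy_tables_t *t)
{
    if (t != NULL) {
        free(t->time);
        free(t);
    }
}

toy_tables_t *
toy_tables_copy(const toy_tables_t *src)
{
    toy_tables_t *dst = toy_tables_alloc(src->num_rows);
    if (dst != NULL && src->num_rows > 0) {
        memcpy(dst->time, src->time, src->num_rows * sizeof(*src->time));
    }
    return dst;
}

/* Returns 0 on success, -1 on conflicting flags or allocation failure. */
int
toy_treeseq_init(toy_treeseq_t *self, toy_tables_t *tables, unsigned int flags)
{
    int borrow = (flags & TOY_BORROW_TABLES) != 0;
    int steal = (flags & TOY_STEAL_TABLES) != 0;

    assert(self != NULL && tables != NULL);
    self->tables = NULL;
    self->owns_tables = 0;
    if (borrow && steal) {
        return -1; /* the two flags are mutually exclusive */
    }
    if (borrow) {
        self->tables = tables; /* caller keeps ownership */
    } else if (steal) {
        self->tables = tables; /* ownership moves to the tree sequence */
        self->owns_tables = 1;
    } else {
        self->tables = toy_tables_copy(tables); /* default: private copy */
        if (self->tables == NULL) {
            return -1;
        }
        self->owns_tables = 1;
    }
    return 0;
}

void
toy_treeseq_free(toy_treeseq_t *self)
{
    if (self->owns_tables) {
        toy_tables_free(self->tables);
    }
    self->tables = NULL;
    self->owns_tables = 0;
}
```

In this sketch a single `owns_tables` field is enough for the destructor to tell the cases apart: the copied and stolen cases both free the tables, and only the borrowed case leaves them alone. Nothing here stops the caller from modifying borrowed tables, which is the hazard described above.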

@jeromekelleher jeromekelleher self-assigned this Mar 15, 2018
@petrelharp petrelharp transferred this issue from tskit-dev/msprime Jan 10, 2019
@benjeffery benjeffery added the C API (Issue is about the C API) and enhancement (New feature or request) labels Sep 29, 2020
@jeromekelleher jeromekelleher added this to the C API 1.0.0 milestone Jul 23, 2021
@jeromekelleher
Member Author

We should consider whether we want to do this for 1.0, and if it's something that we'd need to introduce backward incompatible changes later to support. I've added it to the 1.0 milestone for now.

@molpopgen
Member

I think this would be very handy, especially for people working with large tables.

@jeromekelleher
Member Author

From an API perspective, what we need to enable this is a flag to say "transfer ownership of this table collection", where the tree sequence would free the table collection at the end of its lifetime. This would allow us to implement load more efficiently.

Because we need to support this behaviour, the tables parameter and struct member must remain non-const. So the change that we need to make is to drop the const qualifier from the tables parameter of tsk_treeseq_init.
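
As a rough illustration of why the parameter cannot stay const, the sketch below shows an init function that may end up freeing the tables it was given, and a load function that uses this to avoid a copy. All names (`demo_tables_t`, `demo_treeseq_t`, `DEMO_TAKE_OWNERSHIP`, `demo_treeseq_load`) are hypothetical stand-ins rather than the real signatures, and the "file reading" step is reduced to filling in one field.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical flag: hand the table collection over to the tree sequence. */
#define DEMO_TAKE_OWNERSHIP (1u << 0)

typedef struct {
    size_t num_nodes;
} demo_tables_t;

typedef struct {
    /* Non-const: the tree sequence may free the tables through this pointer. */
    demo_tables_t *tables;
    int owns_tables;
} demo_treeseq_t;

/* The tables parameter is non-const because, with DEMO_TAKE_OWNERSHIP,
 * the pointer is stored as-is and later passed to free(). */
int
demo_treeseq_init(demo_treeseq_t *self, demo_tables_t *tables, unsigned int flags)
{
    assert(self != NULL && tables != NULL);
    if (flags & DEMO_TAKE_OWNERSHIP) {
        self->tables = tables;
    } else {
        demo_tables_t *copy = malloc(sizeof(*copy));
        if (copy == NULL) {
            return -1;
        }
        *copy = *tables; /* a real collection would need a deep copy */
        self->tables = copy;
    }
    self->owns_tables = 1;
    return 0;
}

void
demo_treeseq_free(demo_treeseq_t *self)
{
    if (self->owns_tables) {
        free(self->tables);
    }
    self->tables = NULL;
    self->owns_tables = 0;
}

/* Stand-in for loading from a file: build the tables once, then transfer
 * ownership instead of copying them into the tree sequence. */
int
demo_treeseq_load(demo_treeseq_t *self, size_t num_nodes)
{
    int ret;
    demo_tables_t *tables = malloc(sizeof(*tables));

    if (tables == NULL) {
        return -1;
    }
    tables->num_nodes = num_nodes;
    ret = demo_treeseq_init(self, tables, DEMO_TAKE_OWNERSHIP);
    if (ret != 0) {
        free(tables); /* ownership was not transferred on failure */
    }
    return ret;
}
```

With a `const demo_tables_t *` parameter, storing the pointer in the non-const member and freeing it later would require casting the qualifier away, which is what dropping const avoids.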

@jeromekelleher
Member Author

I think this is done now @benjeffery?

@benjeffery
Member

Not completely, as the issue talks about borrowed references. We have only implemented stealing the reference. If we think that's enough then this could still be closed.

@molpopgen
Member

I think a stolen reference is good enough. On the C side, managing memory safety of a tree sequence if a mutable table collection is "live" will be annoying. It may be a win to discourage such things.

@jeromekelleher
Member Author

Let's close this - I don't think we want to make any more changes to the semantics post 1.0.
