Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

tables.tree_sequence() does not check if nodes from an individual are contiguous #750

Closed
hyanwong opened this issue Aug 6, 2020 · 7 comments · Fixed by #752
Closed

tables.tree_sequence() does not check if nodes from an individual are contiguous #750

hyanwong opened this issue Aug 6, 2020 · 7 comments · Fixed by #752

Comments

@hyanwong
Copy link
Member

hyanwong commented Aug 6, 2020

In the docs it says "For simplicity and algorithmic efficiency, all nodes referring to the same (non-null) individual must be contiguous.". But this isn't checked when converting tables to a tree sequence (see below). I assume this could lead to bugs.

ts = msprime.simulate(4, random_seed=1)
tables = ts.tables
tables.individuals.add_row()
tables.individuals.add_row()
ind = tables.nodes.individual
ind[ts.samples()] = [0, 1, 0, 1]  # make nodes from same individual non-contiguous
tables.nodes.individual = ind
new_ts = tables.tree_sequence()
print(new_ts.individual(0)).nodes

array([0, 2], dtype=int32)
@petrelharp
Copy link
Contributor

I vaguely remember deciding that we should enforce this, and the only individual-producing code I can think of - which is SLiM - will ensure this. So, that seems good. And, I still think it's a good idea. It could cause someone's script that makes up individuals produce a bad tree sequence, so this should definately be added to the CHANGELOG under "possibly breaking changes".

And, btw, this requirement means that adding individuals to - say - an msprime simulation could require re-ordering the nodes, so it's a good thing we have subset to do that now.

@petrelharp
Copy link
Contributor

p.s. I can't think of anywhere this would currently lead to bugs, so that's good anyhow.

@jeromekelleher
Copy link
Member

jeromekelleher commented Aug 6, 2020

This does seem like a bug if we're not enforcing the property and we're using it internally. I'm not sure that we are though, it seems like a pretty big oversight if we forgot to test it.

@petrelharp
Copy link
Contributor

I'll have a look through the code - we don't use node.individual in very many places, so it won't be too bad to check.

@petrelharp
Copy link
Contributor

petrelharp commented Aug 6, 2020

Nope - we really don't seem to be relying on this property anywhere in the C code. The code that finds which nodes go to which individual (tsk_treeseq_init_individuals) would be a lot simpler, in fact, if we did rely on nodes from the same individual being adjacent.

@hyanwong
Copy link
Member Author

hyanwong commented Aug 6, 2020

So should I delete the comment in the docs then?

@jeromekelleher
Copy link
Member

So should I delete the comment in the docs then?

Yes please - we've clearly gone to the trouble of implementing and testing this without assuming adjacency, and the docs must have been forgotten about.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants