Finish/remove simulate_from #572
Coming back to the example in #541, this is what we should have:
And this is what we get:
The only way I can see around this is to use the tables to record the state of these extant ancestors. For example, we might explicitly store the segment chain for each one as individual metadata, probably as a list of (double, double, int32) tuples. I'm sure this would work well for backwards-in-time simulations, and it would be a lot simpler in the end, since we wouldn't need to piece together the root segments for trees. However, it's not clear to me whether the same approach could be applied to forward simulations. Do we have this information for forward-time simulations? If not, then it may be best to just forget about this idea, as we already have a good way of composing different backward-time simulation models within msprime.
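To make the metadata idea concrete, here is a minimal sketch of one possible byte encoding, assuming each segment is stored as a (left, right, node) tuple and packed little-endian with Python's standard struct module. The layout and the helper names encode_segments and decode_segments are hypothetical; this is not an existing msprime or tskit metadata schema.

```python
import struct

# Hypothetical layout for one segment: left (double), right (double), node (int32).
# "<" gives little-endian standard sizes with no padding: 8 + 8 + 4 = 20 bytes.
SEGMENT_FORMAT = "<ddi"
SEGMENT_SIZE = struct.calcsize(SEGMENT_FORMAT)


def encode_segments(segments):
    """Pack a list of (left, right, node) tuples into a bytes blob.

    Hypothetical helper, not an msprime API: the blob could be stored
    as per-individual metadata describing an extant ancestor's segment chain.
    """
    return b"".join(
        struct.pack(SEGMENT_FORMAT, left, right, node)
        for left, right, node in segments
    )


def decode_segments(blob):
    """Unpack a blob produced by encode_segments back into (left, right, node) tuples."""
    if len(blob) % SEGMENT_SIZE != 0:
        raise ValueError("metadata length is not a multiple of the segment size")
    return [tuple(seg) for seg in struct.iter_unpack(SEGMENT_FORMAT, blob)]
```

For example, an ancestor carrying [0, 250) and [400, 1000) through node 7 would be encoded as 40 bytes and decoded back to the same two tuples.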
Ah, I see! Yes, that is annoying. Let's see: I think this would not be a problem if any nodes that contain not-totally-coalesced segments at the longest-ago time are recorded in the tables, right? In forward simulations, we can guarantee this just by marking the initial generation as samples. Is this right? If the tables say there are two nonoverlapping segments inheriting from the same node at the time we start recapitating from, would these two segments end up in the same individual? For reverse-time simulations, we would also have this information if we recorded in the tables the nodes and edges corresponding to any not-yet-coalesced segments at the time we stop simulating. Am I making sense?
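The grouping question can be illustrated with a small plain-Python sketch, assuming root segments are given as (left, right, node) tuples: collecting them by node ID keeps nonoverlapping segments that inherit from the same node on a single lineage, instead of treating each tree's root as a separate ancestor. The function group_root_segments is hypothetical and is not part of msprime.

```python
from collections import defaultdict


def group_root_segments(root_segments):
    """Group (left, right, node) root segments by node ID.

    Hypothetical helper, not an msprime API. Returns a dict mapping
    node -> sorted list of (left, right) intervals, so that all the genome
    a node still carries would start recapitation as one lineage.
    Raises ValueError if two intervals for the same node overlap.
    """
    lineages = defaultdict(list)
    for left, right, node in root_segments:
        lineages[node].append((left, right))
    for node, intervals in lineages.items():
        intervals.sort()
        # Consecutive sorted intervals must not overlap for a valid segment chain.
        for (_, prev_right), (next_left, _) in zip(intervals, intervals[1:]):
            if next_left < prev_right:
                raise ValueError(f"overlapping segments for node {node}")
    return dict(lineages)
```

Here the two nonoverlapping segments inheriting from node 7 end up in one entry, which is the grouping information that is lost when each root is treated independently.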
I've come across another side effect of the same issue: recapitating a tree sequence with a lot of roots can take a LOT of memory. I'm encountering it in trees produced by SLiM; trying to replicate it with just msprime produces a different, but I think related, issue:
The ballooning memory happens in set-up, specifically in this loop. One issue here looks to be that the loop produces a new segment for each root of each tree, when many of these could be merged into a single segment.
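A minimal sketch of the merging step, assuming root segments are available as (left, right, node) tuples with one entry per root per tree. The actual msprime set-up loop is not reproduced here, and merge_root_segments is a hypothetical name: sorting by node and then by left coordinate lets abutting intervals for the same node collapse into one segment.

```python
def merge_root_segments(root_segments):
    """Merge (left, right, node) segments that abut and share a node.

    Hypothetical helper, not an msprime API. Input has one segment per
    root per tree; output collapses each run of contiguous intervals for
    the same node into a single segment.
    """
    merged = []
    # Sort by node, then by left coordinate, so runs for one node are adjacent.
    for left, right, node in sorted(root_segments, key=lambda s: (s[2], s[0])):
        if merged and merged[-1][2] == node and merged[-1][1] == left:
            # Extend the previous segment instead of allocating a new one.
            prev_left = merged[-1][0]
            merged[-1] = (prev_left, right, node)
        else:
            merged.append((left, right, node))
    return merged
```

For example, if node 5 is the root over four consecutive trees spanning [0, 1000), its four per-tree segments become a single (0, 1000, 5) segment, so the number of segments scales with the number of maximal runs of the same root rather than with the number of trees.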
Hm, I realized my suggestion earlier was foolish:
Yes, this would work, but it defeats the purpose of recapitation (we might as well start with a tree sequence). We could keep the required information around by (a) marking the initial generation as samples, but then (b) just before recapitation, removing them as samples and simplifying. Otherwise, we'll need simplify to record nodes and edges for not-totally-coalesced roots to retain this information.
Thanks @petrelharp, very helpful. The memory issue can surely be solved; I was hoping I'd get away with this laziness, but I know how to tackle it. I'll have a think about the deeper problem and try playing with some smaller forward sims to see what I can come up with.
Note: we should make sure to document that simulate_from remaps samples, just as
I'm having second thoughts about this. Calling What do you think @petrelharp?
I was thinking the same thing. |
Or if you're going along getting the "tree heights" by getting the root times. |
Closed in #581. |
#541 added a method to complete a simulation from an existing tree sequence that has partially coalesced. However, this method is a poor approximation of the standard coalescent, as it discards all information about how different root nodes are grouped together into individuals.
We either need to (a) figure out a better way to do it, or (b) remove this method.
cc @petrelharp, @bhaller.