Add special case for merging non-overlapping segments #1931
Comments
I have run simulations with Drosophila-like parameters: Every
Heatmap showing the hull width (yellow) for a single run for all lineages present at time = |
Very interesting, thanks @GertjanBisschop! I think we need one more bit of information: what is the average number of segments per lineage at these time points? It looks like the probability of two randomly chosen lineages overlapping is small (as expected), but our special case is worthwhile only if the number of segments the lineages carry is more than (say) 5.
Hmm, ok, so this means that the special case we're talking about above won't have much effect. Great work @GertjanBisschop - you just saved us a whole bunch of refactoring that would have ended in disappointment! I'm going to close this issue. We may want to implement the special case later if we change the data structures around a bit so that it's easy (i.e., we store the head and tail of each lineage's segments, not just the head), but it's not worth doing that refactoring just for this minor optimisation.
Our analysis of the running time of Hudson's algorithm in the msprime 1.0 paper predicts that, for long genomes, a lot of the time will be spent in events that merge two widely separated chunks of ancestry. It may therefore be worthwhile adding a special case in the `merge_ancestors` code path to deal with this, where, given two segment chains `a` and `b`, the right-most segment of `a` lies to the left of the left-most segment of `b`. However, we currently don't record the extremities of the segments per individual, so we would need to refactor things somewhat.

This would also only be worth doing if the number of segments in `a` and `b` were reasonably large, so a first step would be to do some exploratory work to see what the average number of segments in this "gather" phase of the simulation really is. If anyone is interested in making their Drosophila simulations go faster, then this would be a good place to start.