Issues with mutate on same genealogy sampled at two time points #38
Sorry, didn't see this earlier. So, if I understand correctly, there should be many edges in common between T1 and T2 (like, a bunch of T1 should be the upper bit of T2), so you'd like the mutations on those shared edges to be the same. They're not, because you haven't got the same edges in the same order. Is that right? The way that I would do this is, instead of saving out the simulation at T1, I would just Remember all the individuals alive at that time, then keep going to T2 and save then. That way you get a single tree sequence with both time points in it. I'm about to update pyslim (in a few hours) to a version that has a function
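For readers following along, here is a minimal toy sketch of why the same seed can still give different mutations when the edges are stored in a different order. It is not msprime's actual algorithm: `toy_mutate` and the edge tuples are hypothetical, and the only assumption is that random draws are consumed edge by edge in table order.

```python
import random


def toy_mutate(edges, seed):
    """Place one mutation per edge, drawing positions from one seeded RNG.

    `edges` is a list of (edge_name, left, right) tuples. This is a toy
    stand-in for illustration, not msprime's mutation algorithm.
    """
    rng = random.Random(seed)
    placed = {}
    for name, left, right in edges:
        # The i-th random draw goes to whichever edge is i-th in the list.
        placed[name] = left + rng.random() * (right - left)
    return placed


# The same three edges, listed in two different orders.
edges_t1 = [("A", 0.0, 100.0), ("B", 0.0, 100.0), ("C", 0.0, 100.0)]
edges_t2 = [("C", 0.0, 100.0), ("A", 0.0, 100.0), ("B", 0.0, 100.0)]

same_order = toy_mutate(edges_t1, seed=42) == toy_mutate(edges_t1, seed=42)
different_order = toy_mutate(edges_t1, seed=42) == toy_mutate(edges_t2, seed=42)
print(same_order, different_order)  # True False
```

In this toy, edge "A" receives the first draw in one ordering and the second draw in the other, so its mutation lands at a different position even though the seed and the set of edges are identical.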
Thanks Peter, we didn't check if the orders of the edges were different between the two timepoints, so that was probably the issue. Your solution sounds like it should work. So we would: (i) output at T2, (ii) recapitate, (iii) mutate, then (iv) use ts.individuals_at(T1) to get the individuals at that timepoint. Then we can output the individual genomes at both timepoints for our analysis. |
Yep! If this sounds good, then will you close this issue? Ping me if you would like further input on making this happen. |
One more thing to clarify - if there has been a pruning event between T1 and T2, the individuals we sample at T1 would be the ancestors of T2 (and therefore a biased representation of all individuals that were alive at T1)? |
I'm not sure what you mean by "a pruning event" - do you mean a "simplification" of the tree sequence? I am suggesting putting something like
in your recipe, which would ensure that all individuals in population |
Hi Peter, I'm attempting to implement what you've described with pyslim v. 0.31 ('pip3 install pyslim --upgrade' gives this version), SLiM v. 3.3, and python3 v. 3.7.3. After running the SLiM simulation to completion with minimal SLiM code:
to generate the file recipe_16.1_T2.trees, I'm attempting to access the individuals at Generation 1000 with the following:
but the last command brings the error "object has no attribute 'individuals_at'" Attempting Is the problem the version of pyslim I'm getting from pip? |
Apologies, that's because the function is called |
Hi Peter, That is a better name, and thank you once again very much for all your help, but I'm afraid I'm still having some troubles with this. Here's what I'm running:
Could you possibly show me an example of how this should be going? |
To make a tree sequence back into a pyslim.SlimTreeSequence, you just need to use the pyslim.SlimTreeSequence( ) function:
So that you didn't have to do this, we could define a After I do that, then
I don't know why the |
Hi Peter, Trying to output both time point 1 and time point 2 as VCF files to compare their mutation shifts is still giving difficulties. I run this code in SLiM 3.3
This, as I understand it is supposed to save the individuals at Gen. 1000, as well as the individuals at Gen. 1200 so that we can compare them down the line. Next I run the python 3 script below:
Attempting to export the recapped and mutated files as VCFs returns the error "'numpy.ndarray' object has no attribute 'write_vcf'", and attempting this with an msprime.mutate file obviously gets the "'TreeSequence' object has no attribute 'individuals_alive_at'" error. The "individuals_alive_at" seems to not return any individuals and we're unable to export them anyway to a VCF file. Is the issue in SLiM not saving the individuals or msprime not registering them?
Sigh: this is an annoying gotcha. This is because of the unfortunate
difference between time in slim and "time" in msprime, which is really
"time ago". The time argument to individuals_alive_at is "time ago", i.e.,
time since the end of the simulation:
individuals_alive_at(self, time)
Returns an array giving the IDs of all individuals that are known to be
alive at the given time ago. This is determined by seeing if their age
So, I think that what you want is
mutatedT2.individuals_alive_at(200) # at slim time step 1000 = 1200 - 200
but you should probably double-check what times are actually present, by
looking at
set(T2.individual_times)
and perhaps also set(T2.individual_ages), if this is a nonWF model.
Hope that clears things up?
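As a small illustration of the conversion described above, the following sketch turns a SLiM generation into a "time ago" value and back. The helper names are hypothetical (they are not part of pyslim or msprime), and the sketch assumes the simple relationship given above, time ago = final generation - generation; it is still worth checking against the times actually present in the file.

```python
def slim_generation_to_time_ago(generation, final_generation):
    """Convert a SLiM generation to 'time ago' relative to the end of the run."""
    if generation > final_generation:
        raise ValueError("generation is after the end of the simulation")
    return final_generation - generation


def time_ago_to_slim_generation(time_ago, final_generation):
    """Convert 'time ago' back to a SLiM generation."""
    if time_ago < 0:
        raise ValueError("time ago cannot be negative")
    return final_generation - time_ago


# Simulation ended at generation 1200; individuals were Remembered at 1000.
print(slim_generation_to_time_ago(1000, final_generation=1200))  # 200
print(time_ago_to_slim_generation(0, final_generation=1200))     # 1200
```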
Almost there but still circling the drain a bit.
shows that the times present are
so there should be individuals to export for a VCF file at time 0 (Gen. 1200), time 200 (Gen. 1000), and time 1199 (sim. starting population). However individuals_alive_at seems to be returning an empty array no matter the time
But most interestingly, when exporting T2 as a VCF file
it exports a VCF file with information for 1000 individuals, as opposed to the 500 specified in the SLiM simulation. So it seems to be combining the individuals from T1 and T2 in the output file.
So at least the individuals do in fact exist in the mutatedT2 file, they're just not separated by time...
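To make the expected behaviour concrete, here is a plain-Python sketch of the selection that was wanted: pick out the individuals alive at a given time ago from their birth times and ages. It is not pyslim's implementation. The function name and the toy data are hypothetical, and it assumes every individual has age 0 (as one might expect in a WF model), so an individual counts as alive only at its own birth time.

```python
def alive_at(birth_times_ago, ages, time_ago):
    """Return the IDs of individuals alive at `time_ago`.

    An individual born at `born` (time ago) with age `age` is treated as
    alive over the closed interval [born - age, born] in time-ago units.
    """
    alive = []
    for ind_id, (born, age) in enumerate(zip(birth_times_ago, ages)):
        if born - age <= time_ago <= born:
            alive.append(ind_id)
    return alive


# Toy data using the three times reported above: 1199, 200 and 0.
birth_times_ago = [1199, 200, 200, 0, 0]
ages = [0, 0, 0, 0, 0]  # assumption: age 0 for everyone

print(alive_at(birth_times_ago, ages, 200))  # [1, 2]
print(alive_at(birth_times_ago, ages, 0))    # [3, 4]
print(alive_at(birth_times_ago, ages, 100))  # []
```

With real data, the two groups selected this way would correspond to the two sets of 500 individuals that currently appear together in the single 1000-individual VCF.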
Oh, dear, this is actually a bug, many apologies. I sure thought that
I've slightly lost track of what you want to do with that list of individuals; recall that this is individuals, not genomes, so to e.g. simplify down to just those you need to first extract their node ids.
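A minimal sketch of the "extract their node ids" step might look like the following. It relies only on `ts.individual(i).nodes`, which tskit tree sequences provide; the helper name and the `FakeTreeSequence` stand-in are hypothetical and exist only so the sketch runs without tskit installed.

```python
from types import SimpleNamespace


def node_ids_for_individuals(ts, individual_ids):
    """Collect the node (genome) IDs belonging to the given individuals."""
    nodes = []
    for ind_id in individual_ids:
        # Each diploid individual normally has two nodes, one per genome.
        nodes.extend(int(n) for n in ts.individual(ind_id).nodes)
    return nodes


class FakeTreeSequence:
    """Hypothetical stand-in that mimics `ts.individual(i).nodes` only."""

    def __init__(self, nodes_by_individual):
        self._nodes = nodes_by_individual

    def individual(self, ind_id):
        return SimpleNamespace(id=ind_id, nodes=self._nodes[ind_id])


fake_ts = FakeTreeSequence({0: (0, 1), 1: (2, 3), 2: (4, 5)})
sample_nodes = node_ids_for_individuals(fake_ts, [0, 2])
print(sample_nodes)  # [0, 1, 4, 5]
```

With a real tree sequence, a list like `sample_nodes` could then be passed as the `samples` argument to `simplify`, keeping in mind that simplification renumbers the nodes in the result.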
I've fixed that bug, now, so if you reinstall from git you could continue to use |
Also, this issue may have strayed beyond its original purpose. I am going to close it, but feel free to reopen if appropriate, or to open a new one for different issues. |
We are conducting some climate change simulations, and are having some issues with overlaying mutations on the same genealogy sampled at two different timepoints.
We run the simulation for a while in SLiM, output the .trees file at the first timepoint (T1), and then read this back into SLiM and run the simulation again under the climate change scenario to output the second timepoint (T2).
When we recapitate both trees, we get the same tree topology, but when we try overlaying neutral mutations on both trees (with the same seed) we get completely different genomes with almost no overlap in the locations of variants. (In traditional forward time simulations we see many shared variants in the two timepoints, so this result was not an artifact of the simulation.) For full details including links to the files needed to reproduce the problem see here:
https://github.com/TestTheTests/TTT_Offset_Vulnerability_GF_Sims/blob/master/Notebook/2019_05_03_Mutate_Recap_notes.md
For now, we just want to understand better how "mutate" works and why this problem is happening.