Error Converting VCF to Genepop #177
Dear Alex, Big reptile fan here... Several tortoises, turtles and lizards at home. Try running
You can always send me the files in private to speed up the debugging. Best |
Hi Thierry! Well I'm a big fan of Montreal, and radiator, so the admiration is mutual! I tried running
Interestingly, As an aside, I'm working to migrate my scripts away from PGDSpider, so I definitely appreciate the |
I usually deal with stacks and ipyrad VCF specificities automatically |
With the latest push (v.1.2.8), the problem should be fixed |
Depending on what you intend to do, there are several routes to get the genepop file, and all should work:
or this
or this:
|
But since you want to work with NeEstimator and I assume your data is filtered, I did a couple checks.
You can see that 9 SNPs were removed because they were considered monomorphic. radiator also automatically removed the markers not in common between your populations/strata. That's quite a few (45%), so I would check the filters you applied before generating that VCF. You can check the folder: |
I did 2 more checks because of the red flags I got from looking at the read_vcf folder.
The first one looks for duplicates: technical ones that were forgotten, or real ones generated by lab problems. Look at the Manhattan plot. Usually, pairs of samples with a distance < 0.25 are considered duplicates, or at least I look really hard into the pair of samples. There are usually not that many. Depending on the project, pairs of samples around the 0.5 mark are usually considered close kin, and running COLONY or Kinference will dig deeper into the relationship. Grey dots mean that the samples in the pair are from different strata/pops, which is usually not good if the distance is < or around 0.25. I would use the file Conclusion:
|
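The rule of thumb above can be sketched in base R. This is only an illustration, not radiator's implementation: the function name `classify_pairs`, the input columns (`pop1`, `pop2`, `distance`) and the 0.4 to 0.6 band used for "around the 0.5 mark" are all assumptions made for the example.

```r
# Hypothetical helper: flag sample pairs from a table of pairwise distances.
# Assumed input: a data frame with one row per pair and the columns
# pop1, pop2 (strata of each sample) and distance (pairwise distance).
# The kin_band limits are an assumption standing in for "around 0.5".
classify_pairs <- function(pairs, dup_max = 0.25, kin_band = c(0.4, 0.6)) {
  d <- pairs$distance
  status <- rep("no flag", nrow(pairs))
  status[d >= kin_band[1] & d <= kin_band[2]] <- "possible close kin"
  status[d < dup_max] <- "probable duplicate"
  pairs$status <- status
  # Pairs spanning two strata correspond to the grey dots in the plot.
  pairs$across_strata <- as.character(pairs$pop1) != as.character(pairs$pop2)
  pairs
}
```

Rows flagged as probable duplicates with `across_strata` set to `TRUE` are the ones worth inspecting first.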
Answer no to this. Have a look at the boxplot and Manhattan figures. You will understand why the previous analysis generated false close kin ... From the Manhattan plot I would say that you have a missing data problem:
|
Thanks for your detailed explanation, Thierry. I really appreciate it. This dataset is from Alligator Snapping Turtles, and the three strata represent different species. Even the among-population divergence is quite strong in these species (FST ~ 0.4), but the among-species divergence is higher than I've ever seen (FST ~ 0.8, despite only being separated by ~50 km). It's not surprising that there are lots of missing sites among the species. I came to this conclusion separately, but it is great to know how easily it can be done in radiator. I was running this dataset through NeEstimator just out of curiosity to see if any of these species might meet the proposed IUCN cutoff for low genetic diversity (Ne << 50). I knew it wasn't the best way to calculate Ne, but I appreciate your feedback nonetheless. Thanks for helping me get radiator working with this vcf. |
Different species makes sense. MATE094 and MATE263 are probably not in the right strata. But even so, the duplicate analysis shows problematic close-kin relationships. If they really are different species, I strongly suggest filtering them separately before running NeEstimator or LDNe. Good luck with your analysis |
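One way to prepare for filtering each species separately is to split the strata file into one file per group. The base R sketch below assumes the two-column layout described in the original post (`ind`, `pop`); the function name `split_strata` and the output file names are made up for illustration, and the actual filtering step is not shown.

```r
# Hypothetical helper: write one strata file per group so that each
# species can be filtered on its own afterwards.
# Assumes a data frame with the columns ind and pop.
split_strata <- function(strata, out_dir) {
  dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)
  groups <- split(strata, strata$pop)
  # Returns a character vector of file paths, named by group.
  vapply(names(groups), function(g) {
    path <- file.path(out_dir, paste0("strata_", g, ".csv"))
    write.csv(groups[[g]], path, row.names = FALSE)
    path
  }, character(1))
}
```

Each resulting file lists only the individuals of one group, which can then be used to subset the VCF before filtering.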
Thanks for your help!
|
Hey @thierrygosselin. Still having the same problem with
|
I'm encountering a problem that I don't fully understand when trying to convert a VCF to Genepop format (to use in NeEstimator).
Here is the code that I'm running:
The `pop-file.csv` contains two columns: `ind`, with a list of individual names as they appear in the VCF, and `pop`, with their population assignments. The strata do seem to be properly imported to the VCF. `read_vcf()` outputs:
However, `write_genepop()` gives an error saying that the column STRATA doesn't exist. Any idea why this might be? Here's the full `write_genepop()` error:
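For a strata file like the one described above, a quick sanity check independent of radiator is to compare its `ind` column with the sample names in the VCF header. The base R sketch below is not part of radiator; the function names are made up, and it assumes a comma-separated strata file with the `ind` and `pop` columns and a standard VCF whose `#CHROM` header line has nine fixed columns followed by the sample names.

```r
# Hypothetical helper: read sample IDs from the #CHROM header line of a VCF.
vcf_sample_ids <- function(vcf_path) {
  con <- file(vcf_path, "r")
  on.exit(close(con))
  repeat {
    line <- readLines(con, n = 1)
    if (length(line) == 0) stop("No #CHROM header line found in ", vcf_path)
    if (startsWith(line, "#CHROM")) {
      fields <- strsplit(line, "\t", fixed = TRUE)[[1]]
      # Columns 1 to 9 are fixed VCF fields (CHROM ... FORMAT); the rest are samples.
      return(fields[-(1:9)])
    }
  }
}

# Hypothetical helper: report IDs present in one file but not in the other.
check_strata <- function(vcf_path, strata_path) {
  strata <- read.csv(strata_path, stringsAsFactors = FALSE)
  stopifnot(all(c("ind", "pop") %in% names(strata)))
  ids <- vcf_sample_ids(vcf_path)
  list(
    missing_from_strata = setdiff(ids, strata$ind),
    missing_from_vcf = setdiff(strata$ind, ids)
  )
}
```

Two empty vectors in the result mean that the sample names agree between the two files, so a mismatch in IDs can be ruled out as the cause.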