Add statistical tests against existing simulator #102
See the msprime verification.py script, which does a lot of this type of thing, and also the section in the msprime developer docs.
I am thinking about comparing tstrait with SLiM and AlphaSimR. I am planning to simulate traits and genetic information from those packages, obtain the tree sequence from the simulated genetic information, and simulate traits by using tstrait. I think that would be a good comparison, as SLiM and AlphaSimR are widely used packages.
If you do that, though, you'll have to take into account the different processes that generate the ARG in the first place, won't you? Much simpler if you take a given ARG and then generate traits either using tstrait or one of the methods that are based on input sequences. These totally don't have to be big examples; the R methods that read in the full VCF are fine.
@jeromekelleher I'm thinking about simulating a small tree sequence in msprime, putting it into AlphaSimR as a founder population, and simulating the phenotype information from that founder population alone (I'm not planning to do any genetic simulation with AlphaSimR). What do you think about this?
To me this sounds more complicated than exporting to VCF, but if you think we can do what we need with AlphaSim, then great.
I managed to compare tstrait with AlphaSimR, and the results are consistent after standardizing the simulated genetic values (AlphaSimR standardizes the simulated genetic values so that they exactly match the input mean and variance).
I will upload the comparison code soon.
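The standardization step described above can be mimicked on the tstrait side before comparing distributions. A minimal numpy sketch, assuming AlphaSimR-style rescaling to a target mean and variance (function and variable names are hypothetical):

```python
import numpy as np

def standardize(values, mean, var):
    """Rescale simulated genetic values to a target mean and variance,
    mirroring the rescaling AlphaSimR applies to its output."""
    values = np.asarray(values, dtype=float)
    z = (values - values.mean()) / values.std()  # zero mean, unit sd
    return z * np.sqrt(var) + mean

raw = np.random.default_rng(0).normal(size=1000)
scaled = standardize(raw, mean=2.0, var=3.0)
print(round(scaled.mean(), 6), round(scaled.var(), 6))  # → 2.0 3.0
```

After this rescaling, genetic values from both simulators share the same first two moments, so any remaining QQ-plot deviation reflects a genuine distributional difference.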
Yeah, don’t go into the kinship abyss.
I added a comparison with AlphaSimR in #108. I used various parameter combinations, and the QQ-plots look amazing. Many thanks to @jeromekelleher and @gregorgorjanc for valuable suggestions.
I think it is now safe to say that tstrait's simulated genetic values have the correct properties (or otherwise the simulation through AlphaSimR is incorrect, which is very unlikely).
I should also note that the QQ-plot does not produce a straight line simply because the effect sizes are simulated from a normal distribution. For example, if tstrait's genetic values are not scaled (scaling is performed in AlphaSimR), we observe a strange QQ-plot. Thus, we can say that the simulated genetic values from tstrait and AlphaSimR have similar distributions.
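Numerically, the QQ-plot comparison amounts to sorting the standardized genetic values from each simulator and checking that the paired quantiles fall on the y = x line. A sketch with hypothetical stand-in data (in the real comparison these samples come from tstrait and AlphaSimR):

```python
import numpy as np

rng = np.random.default_rng(42)
# Hypothetical stand-ins for genetic values from the two simulators.
tstrait_vals = rng.normal(size=2000)
alphasimr_vals = rng.normal(size=2000)

def standardized_quantiles(x):
    """Sort the sample and rescale to zero mean, unit sd."""
    x = np.sort(np.asarray(x, dtype=float))
    return (x - x.mean()) / x.std()

q1 = standardized_quantiles(tstrait_vals)
q2 = standardized_quantiles(alphasimr_vals)

# If the distributions match, the quantile-quantile correlation
# should be very close to 1 (the straight line on the QQ-plot).
r = np.corrcoef(q1, q2)[0, 1]
print(r)
```

Without the standardization step, a mismatch in scale shows up immediately as a slope away from 1, which is exactly the "strange QQ-plot" described above.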
The statistical tests can now be added to tstrait by using verification.py. All tests must be a subclass of the Test class defined in https://github.com/tskit-dev/tstrait/blob/main/verification.py#L38, and the test methods must start with `test_`. The code was largely adapted from msprime/verification.py; please see its documentation for details.
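The pattern described above (a Test base class whose `test_`-prefixed methods are discovered and run automatically) can be sketched in a few lines of pure Python. This is an illustrative reconstruction of the convention, not the actual verification.py code:

```python
import inspect

class Test:
    """Base class: every statistical test subclasses this, and any
    method whose name starts with "test_" is run automatically."""

class TestExample(Test):
    def test_trivial(self):
        return "ran test_trivial"

def discover_and_run():
    """Find all Test subclasses and invoke their test_* methods."""
    results = []
    for cls in Test.__subclasses__():
        instance = cls()
        for name, method in inspect.getmembers(instance, inspect.ismethod):
            if name.startswith("test_"):
                results.append(method())
    return results

print(discover_and_run())  # → ['ran test_trivial']
```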
I did this in pull request #132, but I'm not sure this validation step is a good test, considering that all individuals will have similar normal-looking distributions. So even if I use two different individuals to produce a QQ-plot, the results match. Instead, I would like to propose a validation test similar to the one simplePHENOTYPES did (https://github.com/samuelbfernandes/simplePHENOTYPES/blob/master/vignettes/Supplementary.pdf): examine whether the simulation output of tstrait exactly matches the simulation output of other packages. I would like to propose that we completely drop the QQ-plot comparison and instead do the following:
For these exact tests, we will simulate traits + effect sizes using an external program, and we will feed those effect sizes into the tstrait package to see if the simulated output matches exactly. What do you think about this, @jeromekelleher?
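The exact-test idea reduces to computing each individual's genetic value as the dosage-weighted sum of externally drawn effect sizes and asserting bitwise equality with tstrait's output. A sketch with a hypothetical genotype matrix (the second product below is a stand-in for the tstrait call, whose exact API is not shown here):

```python
import numpy as np

# Hypothetical data standing in for an external simulator's output:
# a genotype matrix (individuals x causal sites, allele dosages 0/1/2)
# and the effect sizes the external program drew.
genotypes = np.array([[0, 1, 2],
                      [2, 0, 1],
                      [1, 1, 0]], dtype=float)
external_effect_sizes = np.array([0.5, -1.0, 0.25])

# Genetic value of each individual: sum over causal sites of
# allele dosage times effect size.
external_genetic_values = genotypes @ external_effect_sizes

# Given the same effect sizes, tstrait's output should match exactly;
# here we recompute the same product as a stand-in for the tstrait call.
tstrait_genetic_values = genotypes @ external_effect_sizes

assert np.array_equal(external_genetic_values, tstrait_genetic_values)
print(external_genetic_values)  # → [-0.5   1.25 -0.5 ]
```

Unlike the QQ-plot comparison, this test fails on any discrepancy, not just a distributional one.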
The exact tests are performed here (#140), but we plan to add further tests covering the complete pipeline of the simulation framework.
We should be testing our results statistically by comparing them against other simulators. There's no reason we can't do this at small scale.
It shouldn't be too hard to use e.g. PhenotypeSimulator on some simple simulations and QQ-plot the results against ours.
So, we would do something like:
This will take some work to do, but is an important validation step.