Add statistical tests against existing simulator #102

jeromekelleher · 2023-11-15T14:54:07Z

We should be testing our results statistically by comparing them against other simulators. There's no reason we can do this at the small scale.

It shouldn't be too hard to use e.g. PhenotypeSimulator on some simple simulations, and qqplot the results as a comparison with our results.

So, we would do something like:

Simulate a small tree sequence using msprime for say, 10 samples, and then for n replicates each do:
Compute phenotypes for each of the 10 individuals under a given model to get a distribution per-individual using tstrait
Export the data to VCF and do the same thing with an external too/ We can then compare the per-individual distributions as qqplots, which should be meaningful.

This will take some work to do, but is an important validation step.

jeromekelleher · 2023-11-15T15:04:04Z

See the msprime verification.py script which does lots of this type of thing, and also the section in the msprime developer docs

daikitag · 2023-11-15T15:06:53Z

I am thinking about comparing tstrait with SLiM and AlphaSimR. I am planning to simulate traits and genetic information from those packages, obtain the tree sequence from the simulated genetic information, and simulate traits by using tstrait. I think that would be a good comparison, as SLiM and AlphaSimR are widely used packages.

jeromekelleher · 2023-11-15T17:30:50Z

If you do that though, you'll have to take into account the different processes that generate the ARG in the first place wont you?

Much simpler if you take a given ARG, and then generate traits either using tstrait, or one of the methods that are based on input sequences.

These totally don't have to be big examples, the R methods that read in the full VCF are fine.

daikitag · 2023-11-15T17:40:15Z

@jeromekelleher
I'm thinking about comparing tstrait with AlphaSimR, as they are using the exact same algorithm that we are using (see Third Step: Assign Quantitative Trait Nucleotide Effects, ... in https://acsess.onlinelibrary.wiley.com/doi/10.3835/plantgenome2016.02.0013), and they have written the codes to incorporate tree sequence file into their simulation algorithm (https://github.com/gaynorr/AlphaSimR_Examples/blob/master/misc/msprime.R).
I imagine that AlphaSimR will obtain similar results as us, as I had taken a lot of ideas from them.

I'm thinking about simulating a small tree sequence in msprime, put it into AlphaSimR as a founder population, and simulate the phenotype information just from that founder population (I'm not planning to do any genetic simulation with AlphaSimR), but what do you think about this?
Given the popularity of AlphaSimR, I think we only need to compare tstrait with AlphaSimR.

jeromekelleher · 2023-11-15T19:09:06Z

To me this sounds more complicated than exporting to vcf, but if you think we can do what we need with AlphaSim, then great.

daikitag · 2023-11-16T01:31:13Z

I managed to compare tstrait with AlphaSimR, and we managed to observe a consistency with the results after standardizing the simulated genetic values (AlphaSimR standardizes the simulated genetic values to make sure that it is perfectly the same with the input mean and variance).
Comparison with PhenotypeSimulator sounds pretty complicated, as we need to input kinship to simulate effect sizes (https://hannahvmeyer.github.io/PhenotypeSimulator/reference/geneticBgEffects.html) and their environmental noise simulation is not using heritability values (https://hannahvmeyer.github.io/PhenotypeSimulator/reference/noiseBgEffects.html).

daikitag · 2023-11-16T01:31:55Z

I will upload the comparison codes soon.

gregorgorjanc · 2023-11-16T06:24:23Z

Comparison with PhenotypeSimulator sounds pretty complicated, as we need to input kinship to simulate effect sizes

Yeah, don’t go into the kinship abyss

daikitag · 2023-11-17T01:40:36Z

I added a comparison with AlphaSimR in #108. I used various parameter combinations, and the QQ-plots look amazing

Many thanks to @jeromekelleher and @gregorgorjanc for valuable suggestions.

daikitag · 2023-11-17T01:45:03Z

I think now it is safe to say that tstrait's simulated genetic values are having the correct properties (or otherwise the simulation through AlphaSimR is incorrect, which is very unlikely).

daikitag · 2023-11-17T01:49:07Z

I should also note that the QQ-plot is not producing a straight line simply because the effect sizes are simulated from a normal distribution. For example, if scaling is not done in tstrait's genetic values (scaling is conducted in AlphaSimR), we will observe a strange QQ-plot. Thus, we can say that the simulated genetic values from both tstrait and AlphaSimR are having similar distributions.

daikitag · 2024-02-01T09:35:39Z

The statistical tests can be now added to tstrait by using verification.py. All tests must be a subclass of the Test class defined in https://github.com/tskit-dev/tstrait/blob/main/verification.py#L38, and the test methods must start with ''test_''. The codes were largely adapted from msprime/verification.py, and please see its documentation for details.

daikitag · 2024-02-09T14:09:58Z

We should be testing our results statistically by comparing them against other simulators. There's no reason we can do this at the small scale.

It shouldn't be too hard to use e.g. PhenotypeSimulator on some simple simulations, and qqplot the results as a comparison with our results.

So, we would do something like:

Simulate a small tree sequence using msprime for say, 10 samples, and then for n replicates each do:

Compute phenotypes for each of the 10 individuals under a given model to get a distribution per-individual using tstrait

Export the data to VCF and do the same thing with an external too/ We can then compare the per-individual distributions as qqplots, which should be meaningful.

This will take some work to do, but is an important validation step.

I did this in Pull request #132, but I'm not sure if this validation step would be a good test, considering that all individuals will have a similar normal-looking distributions. So even if I use two different individuals to produce a QQ-plot, the results match.

Instead, I would like to propose a similar validation test that simplePHENOTYPES did https://github.com/samuelbfernandes/simplePHENOTYPES/blob/master/vignettes/Supplementary.pdf, and examine if the simulation output of tstrait is exactly the same as the simulation output of other packages.
I think putting a similar Notebook as this inside the paper as a supplementary material will definitely strengthen our claim that the simulation output of tstrait is correct, as a single exact test ensures that the tstrait algorithm is producing correct results.

I would like to propose that we completely ignore the QQ-plot comparison, and we instead do the following:

Exact test with AlphaSimR for a single trait
Exact test with AlphaSimR for pleiotropic trait
Exact test with simplePHENOTYPES for a single trait
Exact test with simplePHENOTYPES for pleiotropic trait
Exact test with ARG-Needle simulation for a single trait
Exact test with ARG-Needle simulation for a single trait with frequency dependence

For these exact test, we will be simulating traits + effect sizes by using an extrenal program, and we will put those effect sizes into the tstrait package to see if the simulated output will exactly match.

What do you think about this @jeromekelleher ?

daikitag · 2024-02-13T01:52:13Z

The exact tests are performed here (#140), but we plan to add further tests to examine the complete pipeline of the simulation framework.

daikitag · 2024-02-13T15:19:47Z

I did the simulation by using the out of Africa model in stdpopsim, and it seems that the simulated phenotypes are slightly different when they are from different populations:

I guess we can repeat the above with the out of Africa model by using a high narrow-sense heritability, such that the simulation results won't be dominated by normal nosie.

jeromekelleher added this to the 0.0.2 release milestone Nov 15, 2023

daikitag mentioned this issue Nov 17, 2023

TEST: AlphaSimR Comparison #108

Closed

daikitag linked a pull request Nov 22, 2023 that will close this issue

TEST: AlphaSimR Comparison #108

Closed

daikitag removed a link to a pull request Jan 23, 2024

TEST: AlphaSimR Comparison #108

Closed

daikitag linked a pull request Feb 8, 2024 that will close this issue

INFRA: Statistical test against external simulators #132

Merged

daikitag removed a link to a pull request Feb 13, 2024

INFRA: Statistical test against external simulators #132

Merged

daikitag linked a pull request Feb 20, 2024 that will close this issue

INFRA: Statistical test against external simulators #132

Merged

mergify bot closed this as completed in #132 Mar 6, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add statistical tests against existing simulator #102

Add statistical tests against existing simulator #102

jeromekelleher commented Nov 15, 2023 •

edited

Loading

jeromekelleher commented Nov 15, 2023

daikitag commented Nov 15, 2023

jeromekelleher commented Nov 15, 2023

daikitag commented Nov 15, 2023

jeromekelleher commented Nov 15, 2023

daikitag commented Nov 16, 2023

daikitag commented Nov 16, 2023

gregorgorjanc commented Nov 16, 2023

daikitag commented Nov 17, 2023

daikitag commented Nov 17, 2023

daikitag commented Nov 17, 2023

daikitag commented Feb 1, 2024

daikitag commented Feb 9, 2024

daikitag commented Feb 13, 2024

daikitag commented Feb 13, 2024

Add statistical tests against existing simulator #102

Add statistical tests against existing simulator #102

Comments

jeromekelleher commented Nov 15, 2023 • edited Loading

jeromekelleher commented Nov 15, 2023

daikitag commented Nov 15, 2023

jeromekelleher commented Nov 15, 2023

daikitag commented Nov 15, 2023

jeromekelleher commented Nov 15, 2023

daikitag commented Nov 16, 2023

daikitag commented Nov 16, 2023

gregorgorjanc commented Nov 16, 2023

daikitag commented Nov 17, 2023

daikitag commented Nov 17, 2023

daikitag commented Nov 17, 2023

daikitag commented Feb 1, 2024

daikitag commented Feb 9, 2024

daikitag commented Feb 13, 2024

daikitag commented Feb 13, 2024

jeromekelleher commented Nov 15, 2023 •

edited

Loading