The name grur |ɡro͞oˈr| was chosen because the missing genotypes dilemma with RADseq data reminds me of the cheese paradox.
Here, I don't want to fuel a war or the controversy over cheese with holes, so choose whichever you like: the French Gruyère or the Swiss Emmental. The paradox is that the more cheese you have, the more holes you get. But the more holes you have, the less cheese you have… So could someone conclude that more cheese = less cheese? I'll leave that up to you. Back to genomics…
Numerous genomic analyses are vulnerable to missing values: don't get trapped by missing genotypes in your RADseq dataset.
Use grur to visualize patterns of missingness and perform map-independent imputations of missing genotypes (see features below).
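The quantities behind a missingness plot are simple to compute. The base-R sketch below uses a hypothetical toy matrix (individuals in rows, markers in columns, `NA` for a missing genotype) and hypothetical population labels to get the proportion of missing genotypes per individual, per marker and per population. It does not call grur's `missing_visualization`, whose arguments are not shown here.

```r
# Toy genotype matrix (hypothetical): rows = individuals, columns = markers,
# genotypes coded as allele counts (0/1/2) and NA for a missing genotype.
geno <- matrix(
  c(0, 1, NA, 2,
    1, NA, NA, 0,
    2, 1, 0, 1),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("ind1", "ind2", "ind3"), paste0("snp", 1:4))
)
pop <- c("A", "A", "B")  # hypothetical population (strata) labels

missing_ind <- rowMeans(is.na(geno))             # proportion missing per individual
missing_snp <- colMeans(is.na(geno))             # proportion missing per marker
missing_by_pop <- tapply(missing_ind, pop, mean) # mean missingness per population

print(missing_ind)
print(missing_snp)
print(missing_by_pop)
```

A marked difference between populations is the kind of pattern that the Hierarchy feature described below refers to.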
To try out the dev version of grur, copy/paste the code below:
```r
if (!require("devtools")) install.packages("devtools") # to install
devtools::install_github("thierrygosselin/grur")
library(grur)
```
Options and required packages
Please follow the instructions in the Notebook vignette to install the packages required by the imputation options below:
|Imputation option|Required package|
|:---|:---|
|`imputation.method = "lightgbm"`|lightgbm|
|`imputation.method = "xgboost"`|xgboost|
|`imputation.method = "rf"`|randomForestSRC|
|`imputation.method = "rf_pred"`|ranger|
|`imputation.method = "bpca"`|pcaMethods|
|if using `pmm > 0`|missRanger|

rmetasim, used for the simulations, needs to be modified in order to simulate more than 2000 markers (see the notebook vignette).
- Parallel computing: follow the steps in this notebook vignette to install the packages with an OpenMP-enabled compiler and conduct imputations in parallel.
- Installation problems:
  - Windows users: install Rtools.
  - The R GUI is unstable with functions using parallel, so I recommend using RStudio for a better experience.
  - Running code in chunks inside an R Notebook might cause problems; run it in the console instead.
|Feature|Description|
|:---|:---|
|Simulate RADseq data| |
|Patterns of missingness| |
|Imputations|Map-independent imputations of missing genotypes using Random Forests (on-the-fly imputations with randomForestSRC, or predictive modelling with ranger and missRanger), Extreme Gradient Tree Boosting (XGBoost or LightGBM), Bayesian PCA (bpca in pcaMethods) or the **Classic Strawman** (the most frequently observed non-missing genotype is used for imputation). **Hierarchy:** the algorithm's model can account for strata groupings, e.g. if patterns of missingness are found in the data. **Haplotypes:** SNPs on the same LOCUS (read/haplotype) are detected automatically and imputed jointly, reducing imputation artifacts. Vignette coming soon.|
|Input/Output|The imputations offered in grur are seamlessly integrated into radiator and assigner, together with useful filters, blacklists and whitelists inside those 2 packages. Genetic formats recognized: VCF (SNPs and haplotypes), PLINK tped/tfam, genind, genlight, strataG gtypes, Genepop, STACKS haplotype file, hierfstat, COLONY, betadiv, δaδi, structure, Arlequin, SNPRelate, and dataframes of genotypes in wide or long/tidy format.|
|ggplot2-based plotting|Visualization: publication-ready figures of important metrics and statistics.|
|Parallel|Code designed and optimized for fast computations, with progress bars. Works on all OSes: Linux, Mac and yes, PC!|
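The Classic Strawman and the strata-aware (Hierarchy) idea can be sketched in a few lines of base R. `impute_strawman()` is a hypothetical helper written for illustration only; it is not grur's implementation or API. It assumes genotypes are stored in a matrix with individuals in rows and markers in columns, and that strata are given as one label per individual.

```r
# Hypothetical helper (not part of grur): classic strawman imputation.
# Each missing genotype is replaced by the most frequent non-missing genotype
# of the same marker, computed globally or within each stratum.
impute_strawman <- function(geno, strata = NULL) {
  if (is.null(strata)) strata <- rep("all", nrow(geno))  # a single global group
  for (grp in unique(strata)) {
    rows <- which(strata == grp)
    for (j in seq_len(ncol(geno))) {
      observed <- geno[rows, j]
      missing <- is.na(observed)
      # Nothing to impute, or no observed genotype in this group: leave as is
      if (!any(missing) || all(missing)) next
      values <- observed[!missing]
      uniq <- unique(values)
      counts <- tabulate(match(values, uniq))
      # Most frequent genotype; ties go to the first value encountered
      geno[rows[missing], j] <- uniq[which.max(counts)]
    }
  }
  geno
}

# Toy data (hypothetical): 7 individuals x 2 SNPs coded 0/1/2, populations A and B
geno <- matrix(
  c(0, 1,
    0, NA,
    NA, 1,
    2, NA,
    2, 0,
    2, 0,
    NA, 1),
  ncol = 2, byrow = TRUE,
  dimnames = list(NULL, c("snp1", "snp2"))
)
pop <- c("A", "A", "A", "B", "B", "B", "B")

global <- impute_strawman(geno)               # most frequent genotype overall
by_pop <- impute_strawman(geno, strata = pop) # most frequent genotype per population
```

On this toy data the two calls disagree for individuals 3 and 4: computing the most frequent genotype within each population lets population structure shape the imputed value, at the cost of leaving a genotype missing when a whole population lacks data for that marker.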
Vignettes and examples
Vignettes with real data, in the form of R Notebooks, take up too much space to be included in the package without CRAN complaining. Consequently, vignettes will be distributed separately; follow the links below.
To get the citation, run `citation("grur")` inside R.
The change log, versions, new features and bug history live in the NEWS.md file.
grur v.0.0.10 2018-04-26
- I moved these packages to the `Suggests` section: lightgbm, missRanger, randomForestSRC, ranger, rmarkdown, rmetasim, strataG, xgboost.
- Functions that require a specific package will now say so.
- Reason: people only interested in `missing_visualization` don't have to install all the packages required for imputations or simulations.
- `simulate_rad`: with the latest R release (3.5.0), R CMD check now throws a new note: `Note: next used in wrong context: no loop is visible at simulate_rad.R:189`. I replaced
grur v.0.0.9 2017-10-27
- The `lightGBM` option for conducting the imputations is fully functional.
Roadmap of future developments
- Integrate more imputation methods.
- Workflow tutorial to further explore some problems.
- Use Shiny and ggvis (when subplots and/or facets become available for ggvis).
- Until publication, grur will change rapidly: stay updated with releases and contribute bug reports.
- Suggestions?
This package has been developed in the open, and it wouldn’t be nearly as good without your contributions. There are a number of ways you can help me make this package even better: