Design and Maintenance: Dong Yin, Xuanning Zhang, Lilin Yin ,Haohao Zhang, and Xiaolei Liu.
Contributors: Zhenshuang Tang, Jingya Xu, Xiaohui Yuan, Xiang Zhou, Xinyun Li, and Shuhong Zhao.
If you have any bug reports or questions, please feed back 👉here👈.
📫 HIBLUP: Versatile and easy-to-use GS toolbox. | 🏔️ IAnimal: an omics knowledgebase for animals. |
🚴♂️ KAML: Advanced GS method for complex traits. | 📊 CMplot: A drawing tool for genetic analyses. |
📮 rMVP: Efficient and easy-to-use GWAS tool. | 🏊 hibayes: A Bayesian-based GWAS and GS tool. |
- Installation
- Data Preparation
- Data Input
- Quick Start
- Genotype Simulation
- Phenotype Simulation
- Gallery of phenotype simulation parameters
- Generate phenotype using an external or species-specific or random genotype matrix
- Generate continuous phenotype
- Generate case-control phenotype
- Generate categorical phenotype
- Generate phenotype using A model
- Generate phenotype using AD model
- Generate phenotype using GxG model
- Generate phenotype using Repeated Record model
- Generate phenotype controlled by QTNs subject to Normal distribution
- Generate phenotype controlled by QTNs subject to Geometric distribution
- Generate phenotype controlled by QTNs subject to Gamma distribution
- Generate phenotype controlled by QTNs subject to Beta distribution
- Generate phenotype with fixed effect and covariate and environmental random effect
- Generate phenotype using GxE model
- Generate phenotype using ExE model
- Generate phenotype controlled by varied QTN effect distribution
- Population Simulation of Multiple-Generation with Genotype and Phenotype
- Gallery of population simulation parameters
- Individual selection for a single trait
- Family selection for a single trait
- Within-family selection for a single trait
- Combined selection for a single trait
- Tandem selection for multiple traits
- Independent culling selection for multiple traits
- Index selection for multiple traits
- Clone for plants
- Doubled haploid for plants
- Self-pollination for plants and micro-organisms
- Random mating for plants and animals
- Random mating excluding self-pollination for animals
- Assortative mating for plants and animals
- Disassortative mating for plants and animals
- Two way cross for animals
- Three way cross for animals
- Four way cross for animals
- Back cross for animals
- User-designed pedigree mating for plants and animals
- AN EASY WAY TO GENERATE A POPULATION
- Breeding Program Design
- Global Options
- Output
- Citation
- FAQ and Hints
WE STRONGLY RECOMMEND TO INSTALL SIMER ON Microsoft R Open (https://mran.microsoft.com/download/).
- The stable version:
install.packages("simer")
- The latest version:
devtools::install_github("xiaolei-lab/SIMER")
After installed successfully, SIMER
can be loaded by typing
> library(simer)
Typing ?simer
could get the details of all parameters.
Genotype data should be Numeric format (m rows and n columns, m is the number of SNPs, n is the number of individuals). Other genotype data, such as PLINK Binary format (details see http://zzz.bwh.harvard.edu/plink/data.shtml#bed), VCF, or Hapmap can be converted to Numeric format using MVP.Data
function in the rMVP
(https://github.com/xiaolei-lab/rMVP).
genotype.txt
2 | 1 | 0 | 1 | 0 | … | 0 |
1 | 2 | 0 | 1 | 0 | … | 0 |
1 | 1 | 2 | 1 | 0 | … | 0 |
1 | 1 | 0 | 2 | 1 | … | 0 |
0 | 0 | 0 | 0 | 2 | … | 0 |
back to top
A genetic map is necessary in SIMER
. The first column is the SNP name, the second column is the Chromosome ID, the third column is physical position, the fourth column is REF, and the fifth column is ALT. This will be used to generate annotation data, genotype data, and phenotype data.
map.txt
SNP | Chrom | BP | REF | ALT |
---|---|---|---|---|
1_10673082 | 1 | 10673082 | T | C |
1_10723065 | 1 | 10723065 | A | G |
1_11407894 | 1 | 11407894 | A | G |
1_11426075 | 1 | 11426075 | T | C |
1_13996200 | 1 | 13996200 | T | C |
1_14638936 | 1 | 14638936 | T | C |
back to top
SIMER
supports user designed pedigree to control mating process. User designed pedigree is useful only in userped
reproduction. The first column is sample id, the second column is paternal id, and the third column is maternal id. Please make sure that paternal id and maternal id can match to genotype data.
userped.txt
Index | Sire | Dam |
---|---|---|
41 | 1 | 11 |
42 | 1 | 11 |
43 | 1 | 11 |
44 | 1 | 11 |
45 | 2 | 12 |
46 | 2 | 12 |
back to top
At least users should prepare two datasets: genotypic map and genotype data.
genotype data, Numeric format (m rows and n columns, m is the number of SNPs, n is the number of individuals)
genotypic map, SNP map information, the first column is SNP name, the second column is Chromosome ID, the third column is physical position, the fourth column is REF, and the fifth column is ALT.
pop.geno <- read.table("genotype.txt")
pop.map <- read.table("map.txt" , head = TRUE)
back to top
The mating process can be designed by user-designed pedigree.
pedigree, pedigree information, the first column is sample id, the second column is paternal id, and the third column is maternal id. Note that the individuals in the pedigree do not need to be sorted by the date of birth, and the missing value can be replaced by NA or 0.
userped <- read.table("userped.txt", header = TRUE)
All simulation processes can be divided into two steps: 1) generation of simulation parameters; 2) run simulation process.
A quick start for Population Simulation is shown below:
# Generate all simulation parameters
SP <- param.simer(out = "simer")
# Run Simer
SP <- simer(SP)
A quick start for Genotype Simulation is shown below:
# Generate annotation simulation parameters
SP <- param.annot(species = "pig")
# Generate genotype simulation parameters
SP <- param.geno(SP = SP, pop.ind = 1e2)
# Run annotation simulation
SP <- annotation(SP)
# Run genotype simulation
SP <- genotype(SP)
A quick start for Phenotype Simulation is shown below:
# Generate annotation simulation parameters
SP <- param.annot(species = "pig")
# Generate genotype simulation parameters
SP <- param.geno(SP = SP, pop.ind = 1e2)
# Generate phenotype simulation parameters
SP <- param.pheno(SP = SP, phe.h2A = list(tr1 = 0.3))
# Run annotation simulation
SP <- annotation(SP)
# Run genotype simulation
SP <- genotype(SP)
# Run phenotype simulation
SP <- phenotype(SP)
Genotype data in SIMER
is generated randomly or through an external genotype matrix. Chromosome crossovers and base mutations depend on block information and recombination information of Annotation data.
genotype
, main function of Genotype Simulation:
Paramater | Default | Options | Description |
pop.geno | NULL | big.matrix or matrix | the genotype data. |
incols | 1 | 1 or 2 | '1': one-column genotype represents an individual; '2': two-column genotype represents an individual. |
pop.marker | 1e4 | num | the number of markers. |
pop.ind | 1e2 | num | the number of individuals in the base population. |
prob | NULL | num vector | the genotype code probability. |
rate.mut | 1e-8 | num | the mutation rate of the genotype data. |
annotation
, main function of Annotation Simulation:
Paramater | Default | Options | Description |
pop.map | NULL | data.frame | the map data with annotation information. |
species | NULL | character | the species of genetic map, which can be "arabidopsis", "cattle", "chicken", "dog", "horse", "human", "maize", "mice", "pig", and "rice". |
pop.marker | 1e4 | num | the number of markers. |
num.chr | 18 | num | the number of chromosomes. |
len.chr | 1.5e8 | num | the length of chromosomes. |
recom.spot | FALSE | TRUE or FALSE | whether to generate recombination events. |
range.hot | 4:6 | num vector | the recombination times range in the hot spot. |
range.cold | 1:5 | num vector | the recombination times range in the cold spot. |
Users can generate a genetic map by inputting an external genetic map.
# Real genotypic map
mapPath <- system.file("extdata", "06map", "pig_map.txt", package = "simer")
pop.map <- read.table(mapPath, header = TRUE)
# Generate annotation simulation parameters
SP <- param.annot(pop.map = pop.map)
# Run annotation simulation
SP <- annotation(SP)
Users can also use the inner real genetic map with species
, which can be "arabidopsis", "cattle", "chicken", "dog", "horse", "human", "maize", "mice", "pig", and "rice".
# Generate annotation simulation parameters
SP <- param.annot(species = "pig")
# Run annotation simulation
SP <- annotation(SP)
Users can generate a random genetic map with pop.marker
, num.chr
, and len.chr
.
# Generate annotation simulation parameters
SP <- param.annot(pop.marker = 1e4, num.chr = 18, len.chr = 1.5e8)
# Run annotation simulation
SP <- annotation(SP)
Users can use real genotype data with specific genetic structure for subsequent simulation.
# Create a genotype matrix
# pop.geno <- read.table("genotype.txt")
# pop.geno <- bigmemory::attach.big.matrix("genotype.geno.desc")
pop.geno <- matrix(c(0, 1, 2, 0), nrow = 1e4, ncol = 1e2, byrow = TRUE)
# Generate annotation simulation parameters
SP <- param.annot(pop.marker = 1e4)
# Generate genotype simulation parameters
SP <- param.geno(SP = SP, pop.geno = pop.geno)
# Run annotation simulation
SP <- annotation(SP)
# Run genotype simulation
SP <- genotype(SP)
Users can also generate genotype matrix with the inner real genetic map with species
, which can be "arabidopsis", "cattle", "chicken", "dog", "horse", "human", "maize", "mice", "pig", and "rice".
# Generate annotation simulation parameters
SP <- param.annot(species = "pig")
# Generate genotype simulation parameters
SP <- param.geno(SP = SP, pop.ind = 1e2)
# Run annotation simulation
SP <- annotation(SP)
# Run genotype simulation
SP <- genotype(SP)
Users can also specify pop.marker
and pop.ind
to generate random genotype data.
# Generate annotation simulation parameters
SP <- param.annot(pop.marker = 1e4)
# Generate genotype simulation parameters
SP <- param.geno(SP = SP, pop.ind = 1e2)
# Run annotation simulation
SP <- annotation(SP)
# Run genotype simulation
SP <- genotype(SP)
Users can generate a genotype matrix with complete linkage disequilibrium by incols = 2
and cld = TRUE
.
# Generate annotation simulation parameters
SP <- param.annot(pop.marker = 1e4)
# Generate genotype simulation parameters
SP <- param.geno(SP = SP, pop.ind = 1e2, incols = 2, cld = TRUE)
# Run annotation simulation
SP <- annotation(SP)
# Run genotype simulation
SP <- genotype(SP)
With annotation data, chromosome crossovers and mutations can be added to a genotype matrix.
# Generate annotation simulation parameters
# If recom.spot = TRUE, chromsome crossovers will be added to genotype matrix
SP <- param.annot(pop.marker = 1e4, recom.spot = TRUE)
# Generate genotype simulation parameters
# Base mutation rate of QTN and SNP are 1e8
SP <- param.geno(SP = SP, pop.ind = 1e2, rate.mut = list(qtn = 1e-8, snp = 1e-8))
# Run annotation simulation
SP <- annotation(SP)
# Run genotype simulation
SP <- genotype(SP)
Note that recombination only exists in meiosis. Therefore, some reproduction methods such as clone
do not have recombination processes. Users can set recom.spot = FALSE
to add only mutations to the genotype matrix.
# Generate annotation simulation parameters
# If recom.spot = FALSE, chromsome crossovers will not be added to genotype matrix
SP <- param.annot(pop.marker = 1e4, recom.spot = FALSE)
# Generate genotype simulation parameters
# Base mutation rate of QTN and SNP are 1e8
SP <- param.geno(SP = SP, pop.ind = 1e2, rate.mut = list(qtn = 1e-8, snp = 1e-8))
# Run annotation simulation
SP <- annotation(SP)
# Run genotype simulation
SP <- genotype(SP)
Phenotype data in SIMER
is generated according to different models, which include:
(1) Single-Trait Model
(2) Multiple-Trait Model
(3) Repeated Record Model
(4) Genetic Effect Model (Additive effect, Dominant effect, and Genetic-Genetic interaction effect)
(5) Genetic Model with Varied QTN Effect Distributions (QTN effect distribution: Normal distribution, Geometric distribution, Gamma distribution, Beta distribution, and their combination)
(6) Linear Mixed Model (Fixed effect, Covariate, Environmental Random effect, Genetic Random effect, Genetic-Environmental interaction effect, and Environmental-Environmental interaction effect)
phenotype
, main function of Phenotype Simulation:
Paramater | Default | Options | Description |
pop | NULL | data.frame | the population information containing environmental factors and other effects. |
pop.ind | 100 | num | the number of individuals in the base population. |
pop.rep | 1 | num | the repeated times of repeated records. |
pop.rep.bal | TRUE | TRUE or FALSE | whether repeated records are balanced. |
pop.env | NULL | list | a list of environmental factors setting. |
phe.type | list(tr1 = "continuous") | list | a list of phenotype types. |
phe.model | list(tr1 = "T1 = A + E") | list | a list of genetic model of phenotype such as "T1 = A + E". |
phe.h2A | list(tr1 = 0.3) | list | a list of additive heritability. |
phe.h2D | list(tr1 = 0.1) | list | a list of dominant heritability. |
phe.h2GxG | NULL | list | a list of GxG interaction heritability. |
phe.h2GxE | NULL | list | a list of GxE interaction heritability. |
phe.h2PE | NULL | list | a list of permanent environmental heritability. |
phe.var | NULL | list | a list of phenotype variance. |
phe.corA | NULL | matrix | the additive genetic correlation matrix. |
phe.corD | NULL | matrix | the dominant genetic correlation matrix. |
phe.corGxG | NULL | list | a list of the GxG genetic correlation matrix. |
phe.corPE | NULL | matrix | the permanent environmental correlation matrix. |
phe.corE | NULL | matrix | the residual correlation matrix. |
annotation
, main function of Annotation Simulation:
Paramater | Default | Options | Description |
pop.map | NULL | data.frame | the map data with annotation information. |
qtn.model | 'A' | character | the genetic model of QTN such as 'A + D'. |
qtn.index | 10 | list | the QTN index for each trait. |
qtn.num | 10 | list | the QTN number for (each group in) each trait. |
qtn.dist | list(tr1 = 'norm') | list | the QTN distribution containing 'norm', 'geom', 'gamma' or 'beta'. |
qtn.var | list(tr1 = 1) | list | the variances for normal distribution. |
qtn.prob | NULL | list | the probability of success for geometric distribution. |
qtn.shape | NULL | list | the shape parameter for gamma distribution. |
qtn.scale | NULL | list | the scale parameter for gamma distribution. |
qtn.shape1 | NULL | list | the shape1 parameter for beta distribution. |
qtn.shape2 | NULL | list | the shape2 parameter for beta distribution. |
qtn.ncp | NULL | list | the ncp parameter for beta distribution. |
qtn.spot | NULL | list | the QTN distribution probability in each block. |
len.block | 5e7 | num | the block length. |
maf | NULL | num | the maf threshold, markers less than this threshold will be exclude. |
Users can use real genotype data with specific genetic structure to generate phenotype.
# Create a genotype matrix
# pop.geno <- read.table("genotype.txt")
# pop.geno <- bigmemory::attach.big.matrix("genotype.geno.desc")
pop.geno <- matrix(c(0, 1, 2, 0), nrow = 1e4, ncol = 1e2, byrow = TRUE)
# Generate annotation simulation parameters
SP <- param.annot(pop.marker = 1e4)
# Generate genotype simulation parameters
SP <- param.geno(SP = SP, pop.geno = pop.geno)
# Generate phenotype simulation parameters
SP <- param.pheno(SP = SP, phe.h2A = list(tr1 = 0.3))
# Run annotation simulation
SP <- annotation(SP)
# Run genotype simulation
SP <- genotype(SP)
# Run phenotype simulation
SP <- phenotype(SP)
Users can also generate phenotype using species-specific genotype matrix.
# Generate annotation simulation parameters
SP <- param.annot(species = "pig")
# Generate genotype simulation parameters
SP <- param.geno(SP = SP, pop.ind = 1e2)
# Generate phenotype simulation parameters
SP <- param.pheno(SP = SP, phe.h2A = list(tr1 = 0.3))
# Run annotation simulation
SP <- annotation(SP)
# Run genotype simulation
SP <- genotype(SP)
# Run phenotype simulation
SP <- phenotype(SP)
Users can also generate phenotype using random genotype.
# Generate annotation simulation parameters
SP <- param.annot(pop.marker = 1e4)
# Generate genotype simulation parameters
SP <- param.geno(SP = SP, pop.ind = 1e2)
# Generate phenotype simulation parameters
SP <- param.pheno(SP = SP, phe.h2A = list(tr1 = 0.3))
# Run annotation simulation
SP <- annotation(SP)
# Run genotype simulation
SP <- genotype(SP)
# Run phenotype simulation
SP <- phenotype(SP)
SIMER
generates continuous phenotypes by default. Continuous phenotype simulation is displayed as follows:
If users want to output files, please see File output.
# Generate annotation simulation parameters
SP <- param.annot(pop.marker = 1e4, qtn.num = list(tr1 = 10), qtn.model = "A") # Additive effect
# Generate genotype simulation parameters
SP <- param.geno(SP = SP, pop.ind = 1e2) # random genotype
# Generate phenotype simulation parameters
SP <- param.pheno(
SP = SP,
phe.type = list(tr1 = "continuous"),
phe.model = list(tr1 = "T1 = A + E"), # "T1" (Trait 1) consists of Additive effect and Residual effect
# phe.var = list(tr1 = 100),
phe.h2A = list(tr1 = 0.3)
)
# Run annotation simulation
SP <- annotation(SP)
# Run genotype simulation
SP <- genotype(SP)
# Run phenotype simulation
SP <- phenotype(SP)
Multiple-trait simulation of continuous phenotype is displayed as follows:
If users want to output files, please see File output.
# Generate annotation simulation parameters
SP <- param.annot(pop.marker = 1e4, qtn.num = list(tr1 = 10, tr2 = 10), qtn.model = "A") # Additive effect
# Generate genotype simulation parameters
SP <- param.geno(SP = SP, pop.ind = 1e2) # random genotype
# Generate phenotype simulation parameters
SP <- param.pheno(
SP = SP,
phe.type = list(tr1 = "continuous", tr2 = "continuous"),
phe.model = list(
tr1 = "T1 = A + E", # "T1" (Trait 1) consists of Additive effect and Residual effect
tr2 = "T2 = A + E" # "T2" (Trait 2) consists of Additive effect and Residual effect
),
# phe.var = list(tr1 = 100, tr2 = 100),
phe.h2A = list(tr1 = 0.3, tr2 = 0.3),
phe.corA = matrix(c(1, 0.5, 0.5, 1), 2, 2) # Additive genetic correlation
)
# Run annotation simulation
SP <- annotation(SP)
# Run genotype simulation
SP <- genotype(SP)
# Run phenotype simulation
SP <- phenotype(SP)
SIMER
generates case-control phenotypes by phe.type
. phe.type
consists of the variable names and their percentages. Case-control phenotype simulation is displayed as follows:
If users want to output files, please see File output.
# Generate annotation simulation parameters
SP <- param.annot(pop.marker = 1e4, qtn.num = list(tr1 = 10), qtn.model = "A") # Additive effect
# Generate genotype simulation parameters
SP <- param.geno(SP = SP, pop.ind = 1e2) # random genotype
# Generate phenotype simulation parameters
SP <- param.pheno(
SP = SP,
phe.type = list(tr1 = list(case = 0.01, control = 0.99)), # "T1" (Trait 1) consists of 1% case and 99% control
phe.model = list(tr1 = "T1 = A + E"), # "T1" (Trait 1) consists of Additive effect and Residual effect
# phe.var = list(tr1 = 100),
phe.h2A = list(tr1 = 0.3)
)
# Run annotation simulation
SP <- annotation(SP)
# Run genotype simulation
SP <- genotype(SP)
# Run phenotype simulation
SP <- phenotype(SP)
Multiple-trait simulation of case-control phenotype is displayed as follows:
If users want to output files, please see File output.
# Generate annotation simulation parameters
SP <- param.annot(pop.marker = 1e4, qtn.num = list(tr1 = 10, tr2 =