Skip to content

xiaolei-lab/SIMER

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

SIMER

GitHub issues CRAN Version

Data Simulation for Life Science and Breeding

Authors:

Design and Maintenance: Dong Yin, Xuanning Zhang, Lilin Yin ,Haohao Zhang, and Xiaolei Liu.
Contributors: Zhenshuang Tang, Jingya Xu, Xiaohui Yuan, Xiang Zhou, Xinyun Li, and Shuhong Zhao.

If you have any bug reports or questions, please feed back 👉here👈.

🧰 Relevant software tools for genetic analyses and genomic breeding

📫 HIBLUP: Versatile and easy-to-use GS toolbox. 🏔️ IAnimal: an omics knowledgebase for animals.
🚴‍♂️ KAML: Advanced GS method for complex traits. 📊 CMplot: A drawing tool for genetic analyses.
📮 rMVP: Efficient and easy-to-use GWAS tool. 🏊 hibayes: A Bayesian-based GWAS and GS tool.

Contents


Installation

back to top

WE STRONGLY RECOMMEND TO INSTALL SIMER ON Microsoft R Open (https://mran.microsoft.com/download/).

Installation

  • The stable version:
install.packages("simer")
  • The latest version:
devtools::install_github("xiaolei-lab/SIMER")

After installed successfully, SIMER can be loaded by typing

> library(simer)

Typing ?simer could get the details of all parameters.


Data Preparation

Genotype

back to top

Genotype data should be Numeric format (m rows and n columns, m is the number of SNPs, n is the number of individuals). Other genotype data, such as PLINK Binary format (details see http://zzz.bwh.harvard.edu/plink/data.shtml#bed), VCF, or Hapmap can be converted to Numeric format using MVP.Data function in the rMVP (https://github.com/xiaolei-lab/rMVP).

genotype.txt

2 1 0 1 0 0
1 2 0 1 0 0
1 1 2 1 0 0
1 1 0 2 1 0
0 0 0 0 2 0

Genetic map

back to top
A genetic map is necessary in SIMER. The first column is the SNP name, the second column is the Chromosome ID, the third column is physical position, the fourth column is REF, and the fifth column is ALT. This will be used to generate annotation data, genotype data, and phenotype data.

map.txt

SNP Chrom BP REF ALT
1_10673082 1 10673082 T C
1_10723065 1 10723065 A G
1_11407894 1 11407894 A G
1_11426075 1 11426075 T C
1_13996200 1 13996200 T C
1_14638936 1 14638936 T C

Pedigree

back to top
SIMER supports user designed pedigree to control mating process. User designed pedigree is useful only in userped reproduction. The first column is sample id, the second column is paternal id, and the third column is maternal id. Please make sure that paternal id and maternal id can match to genotype data.

userped.txt

Index Sire Dam
41 1 11
42 1 11
43 1 11
44 1 11
45 2 12
46 2 12

Data Input

Basic

back to top
At least users should prepare two datasets: genotypic map and genotype data.

genotype data, Numeric format (m rows and n columns, m is the number of SNPs, n is the number of individuals)
genotypic map, SNP map information, the first column is SNP name, the second column is Chromosome ID, the third column is physical position, the fourth column is REF, and the fifth column is ALT.

pop.geno <- read.table("genotype.txt")
pop.map <- read.table("map.txt" , head = TRUE)

Optional

back to top
The mating process can be designed by user-designed pedigree.

pedigree, pedigree information, the first column is sample id, the second column is paternal id, and the third column is maternal id. Note that the individuals in the pedigree do not need to be sorted by the date of birth, and the missing value can be replaced by NA or 0.

userped <- read.table("userped.txt", header = TRUE)

Quick Start

back to top

All simulation processes can be divided into two steps: 1) generation of simulation parameters; 2) run simulation process.

Quick Start for Population Simulation

back to top

A quick start for Population Simulation is shown below:

# Generate all simulation parameters
SP <- param.simer(out = "simer")

# Run Simer
SP <- simer(SP)

Quick Start for Genotype Simulation

back to top

A quick start for Genotype Simulation is shown below:

# Generate annotation simulation parameters
SP <- param.annot(species = "pig")
# Generate genotype simulation parameters
SP <- param.geno(SP = SP, pop.ind = 1e2)

# Run annotation simulation
SP <- annotation(SP)
# Run genotype simulation
SP <- genotype(SP)

Quick Start for Phenotype Simulation

back to top

A quick start for Phenotype Simulation is shown below:

# Generate annotation simulation parameters
SP <- param.annot(species = "pig")
# Generate genotype simulation parameters
SP <- param.geno(SP = SP, pop.ind = 1e2)
# Generate phenotype simulation parameters
SP <- param.pheno(SP = SP, phe.h2A = list(tr1 = 0.3))

# Run annotation simulation
SP <- annotation(SP)
# Run genotype simulation
SP <- genotype(SP)
# Run phenotype simulation
SP <- phenotype(SP)

Genotype Simulation

back to top

Genotype data in SIMER is generated randomly or through an external genotype matrix. Chromosome crossovers and base mutations depend on block information and recombination information of Annotation data.

Gallery of genotype simulation parameters

back to top

genotype, main function of Genotype Simulation:

Paramater Default Options Description
pop.geno NULL big.matrix or matrix the genotype data.
incols 1 1 or 2 '1': one-column genotype represents an individual; '2': two-column genotype represents an individual.
pop.marker 1e4 num the number of markers.
pop.ind 1e2 num the number of individuals in the base population.
prob NULL num vector the genotype code probability.
rate.mut 1e-8 num the mutation rate of the genotype data.

annotation, main function of Annotation Simulation:

Paramater Default Options Description
pop.map NULL data.frame the map data with annotation information.
species NULL character the species of genetic map, which can be "arabidopsis", "cattle", "chicken", "dog", "horse", "human", "maize", "mice", "pig", and "rice".
pop.marker 1e4 num the number of markers.
num.chr 18 num the number of chromosomes.
len.chr 1.5e8 num the length of chromosomes.
recom.spot FALSE TRUE or FALSE whether to generate recombination events.
range.hot 4:6 num vector the recombination times range in the hot spot.
range.cold 1:5 num vector the recombination times range in the cold spot.

Generate an external or species-specific or random genetic map

back to top

Users can generate a genetic map by inputting an external genetic map.

# Real genotypic map
mapPath <- system.file("extdata", "06map", "pig_map.txt", package = "simer")
pop.map <- read.table(mapPath, header = TRUE)

# Generate annotation simulation parameters
SP <- param.annot(pop.map = pop.map)

# Run annotation simulation
SP <- annotation(SP)

Users can also use the inner real genetic map with species, which can be "arabidopsis", "cattle", "chicken", "dog", "horse", "human", "maize", "mice", "pig", and "rice".

# Generate annotation simulation parameters
SP <- param.annot(species = "pig")

# Run annotation simulation
SP <- annotation(SP)

Users can generate a random genetic map with pop.marker, num.chr, and len.chr.

# Generate annotation simulation parameters
SP <- param.annot(pop.marker = 1e4, num.chr = 18, len.chr = 1.5e8)

# Run annotation simulation
SP <- annotation(SP)

Generate an external or species-specific or random genotype matrix

back to top

Users can use real genotype data with specific genetic structure for subsequent simulation.

# Create a genotype matrix
# pop.geno <- read.table("genotype.txt")
# pop.geno <- bigmemory::attach.big.matrix("genotype.geno.desc")
pop.geno <- matrix(c(0, 1, 2, 0), nrow = 1e4, ncol = 1e2, byrow = TRUE)

# Generate annotation simulation parameters
SP <- param.annot(pop.marker = 1e4)
# Generate genotype simulation parameters
SP <- param.geno(SP = SP, pop.geno = pop.geno)

# Run annotation simulation
SP <- annotation(SP)
# Run genotype simulation
SP <- genotype(SP)

Users can also generate genotype matrix with the inner real genetic map with species, which can be "arabidopsis", "cattle", "chicken", "dog", "horse", "human", "maize", "mice", "pig", and "rice".

# Generate annotation simulation parameters
SP <- param.annot(species = "pig")
# Generate genotype simulation parameters
SP <- param.geno(SP = SP, pop.ind = 1e2)

# Run annotation simulation
SP <- annotation(SP)
# Run genotype simulation
SP <- genotype(SP)

Users can also specify pop.marker and pop.ind to generate random genotype data.

# Generate annotation simulation parameters
SP <- param.annot(pop.marker = 1e4)
# Generate genotype simulation parameters
SP <- param.geno(SP = SP, pop.ind = 1e2)

# Run annotation simulation
SP <- annotation(SP)
# Run genotype simulation
SP <- genotype(SP)

Generate a genotype matrix with complete linkage disequilibrium

back to top

Users can generate a genotype matrix with complete linkage disequilibrium by incols = 2 and cld = TRUE.

# Generate annotation simulation parameters
SP <- param.annot(pop.marker = 1e4)
# Generate genotype simulation parameters
SP <- param.geno(SP = SP, pop.ind = 1e2, incols = 2, cld = TRUE)

# Run annotation simulation
SP <- annotation(SP)
# Run genotype simulation
SP <- genotype(SP)

Add chromosome crossovers and mutations to genotype matrix

back to top

With annotation data, chromosome crossovers and mutations can be added to a genotype matrix.

# Generate annotation simulation parameters
# If recom.spot = TRUE, chromsome crossovers will be added to genotype matrix
SP <- param.annot(pop.marker = 1e4, recom.spot = TRUE)
# Generate genotype simulation parameters
# Base mutation rate of QTN and SNP are 1e8
SP <- param.geno(SP = SP, pop.ind = 1e2, rate.mut = list(qtn = 1e-8, snp = 1e-8))

# Run annotation simulation
SP <- annotation(SP)
# Run genotype simulation
SP <- genotype(SP)

Note that recombination only exists in meiosis. Therefore, some reproduction methods such as clone do not have recombination processes. Users can set recom.spot = FALSE to add only mutations to the genotype matrix.

# Generate annotation simulation parameters
# If recom.spot = FALSE, chromsome crossovers will not be added to genotype matrix
SP <- param.annot(pop.marker = 1e4, recom.spot = FALSE)
# Generate genotype simulation parameters
# Base mutation rate of QTN and SNP are 1e8
SP <- param.geno(SP = SP, pop.ind = 1e2, rate.mut = list(qtn = 1e-8, snp = 1e-8))

# Run annotation simulation
SP <- annotation(SP)
# Run genotype simulation
SP <- genotype(SP)

Phenotype Simulation

back to top

Phenotype data in SIMER is generated according to different models, which include:
(1) Single-Trait Model
(2) Multiple-Trait Model
(3) Repeated Record Model
(4) Genetic Effect Model (Additive effect, Dominant effect, and Genetic-Genetic interaction effect)
(5) Genetic Model with Varied QTN Effect Distributions (QTN effect distribution: Normal distribution, Geometric distribution, Gamma distribution, Beta distribution, and their combination)
(6) Linear Mixed Model (Fixed effect, Covariate, Environmental Random effect, Genetic Random effect, Genetic-Environmental interaction effect, and Environmental-Environmental interaction effect)

Gallery of phenotype simulation parameters

back to top

phenotype, main function of Phenotype Simulation:

Paramater Default Options Description
pop NULL data.frame the population information containing environmental factors and other effects.
pop.ind 100 num the number of individuals in the base population.
pop.rep 1 num the repeated times of repeated records.
pop.rep.bal TRUE TRUE or FALSE whether repeated records are balanced.
pop.env NULL list a list of environmental factors setting.
phe.type list(tr1 = "continuous") list a list of phenotype types.
phe.model list(tr1 = "T1 = A + E") list a list of genetic model of phenotype such as "T1 = A + E".
phe.h2A list(tr1 = 0.3) list a list of additive heritability.
phe.h2D list(tr1 = 0.1) list a list of dominant heritability.
phe.h2GxG NULL list a list of GxG interaction heritability.
phe.h2GxE NULL list a list of GxE interaction heritability.
phe.h2PE NULL list a list of permanent environmental heritability.
phe.var NULL list a list of phenotype variance.
phe.corA NULL matrix the additive genetic correlation matrix.
phe.corD NULL matrix the dominant genetic correlation matrix.
phe.corGxG NULL list a list of the GxG genetic correlation matrix.
phe.corPE NULL matrix the permanent environmental correlation matrix.
phe.corE NULL matrix the residual correlation matrix.

annotation, main function of Annotation Simulation:

Paramater Default Options Description
pop.map NULL data.frame the map data with annotation information.
qtn.model 'A' character the genetic model of QTN such as 'A + D'.
qtn.index 10 list the QTN index for each trait.
qtn.num 10 list the QTN number for (each group in) each trait.
qtn.dist list(tr1 = 'norm') list the QTN distribution containing 'norm', 'geom', 'gamma' or 'beta'.
qtn.var list(tr1 = 1) list the variances for normal distribution.
qtn.prob NULL list the probability of success for geometric distribution.
qtn.shape NULL list the shape parameter for gamma distribution.
qtn.scale NULL list the scale parameter for gamma distribution.
qtn.shape1 NULL list the shape1 parameter for beta distribution.
qtn.shape2 NULL list the shape2 parameter for beta distribution.
qtn.ncp NULL list the ncp parameter for beta distribution.
qtn.spot NULL list the QTN distribution probability in each block.
len.block 5e7 num the block length.
maf NULL num the maf threshold, markers less than this threshold will be exclude.

Generate phenotype using an external or species-specific or random genotype matrix

back to top

Users can use real genotype data with specific genetic structure to generate phenotype.

# Create a genotype matrix
# pop.geno <- read.table("genotype.txt")
# pop.geno <- bigmemory::attach.big.matrix("genotype.geno.desc")
pop.geno <- matrix(c(0, 1, 2, 0), nrow = 1e4, ncol = 1e2, byrow = TRUE)

# Generate annotation simulation parameters
SP <- param.annot(pop.marker = 1e4)
# Generate genotype simulation parameters
SP <- param.geno(SP = SP, pop.geno = pop.geno)
# Generate phenotype simulation parameters
SP <- param.pheno(SP = SP, phe.h2A = list(tr1 = 0.3))

# Run annotation simulation
SP <- annotation(SP)
# Run genotype simulation
SP <- genotype(SP)
# Run phenotype simulation
SP <- phenotype(SP)

Users can also generate phenotype using species-specific genotype matrix.

# Generate annotation simulation parameters
SP <- param.annot(species = "pig")
# Generate genotype simulation parameters
SP <- param.geno(SP = SP, pop.ind = 1e2)
# Generate phenotype simulation parameters
SP <- param.pheno(SP = SP, phe.h2A = list(tr1 = 0.3))

# Run annotation simulation
SP <- annotation(SP)
# Run genotype simulation
SP <- genotype(SP)
# Run phenotype simulation
SP <- phenotype(SP)

Users can also generate phenotype using random genotype.

# Generate annotation simulation parameters
SP <- param.annot(pop.marker = 1e4)
# Generate genotype simulation parameters
SP <- param.geno(SP = SP, pop.ind = 1e2)
# Generate phenotype simulation parameters
SP <- param.pheno(SP = SP, phe.h2A = list(tr1 = 0.3))

# Run annotation simulation
SP <- annotation(SP)
# Run genotype simulation
SP <- genotype(SP)
# Run phenotype simulation
SP <- phenotype(SP)

Generate continuous phenotype

back to top

SIMER generates continuous phenotypes by default. Continuous phenotype simulation is displayed as follows:
If users want to output files, please see File output.

# Generate annotation simulation parameters
SP <- param.annot(pop.marker = 1e4, qtn.num = list(tr1 = 10), qtn.model = "A") # Additive effect
# Generate genotype simulation parameters
SP <- param.geno(SP = SP, pop.ind = 1e2) # random genotype
# Generate phenotype simulation parameters
SP <- param.pheno(
  SP = SP,
  phe.type = list(tr1 = "continuous"),
  phe.model = list(tr1 = "T1 = A + E"), # "T1" (Trait 1) consists of Additive effect and Residual effect
  # phe.var = list(tr1 = 100),
  phe.h2A = list(tr1 = 0.3)
)

# Run annotation simulation
SP <- annotation(SP)
# Run genotype simulation
SP <- genotype(SP)
# Run phenotype simulation
SP <- phenotype(SP)

Multiple-trait simulation of continuous phenotype is displayed as follows:
If users want to output files, please see File output.

# Generate annotation simulation parameters
SP <- param.annot(pop.marker = 1e4, qtn.num = list(tr1 = 10, tr2 = 10), qtn.model = "A") # Additive effect
# Generate genotype simulation parameters
SP <- param.geno(SP = SP, pop.ind = 1e2) # random genotype
# Generate phenotype simulation parameters
SP <- param.pheno(
  SP = SP,
  phe.type = list(tr1 = "continuous", tr2 = "continuous"),
  phe.model = list(
    tr1 = "T1 = A + E", # "T1" (Trait 1) consists of Additive effect and Residual effect
    tr2 = "T2 = A + E"  # "T2" (Trait 2) consists of Additive effect and Residual effect
  ),
  # phe.var = list(tr1 = 100, tr2 = 100),
  phe.h2A = list(tr1 = 0.3, tr2 = 0.3),
  phe.corA = matrix(c(1, 0.5, 0.5, 1), 2, 2) # Additive genetic correlation
)

# Run annotation simulation
SP <- annotation(SP)
# Run genotype simulation
SP <- genotype(SP)
# Run phenotype simulation
SP <- phenotype(SP)

Generate case-control phenotype

back to top

SIMER generates case-control phenotypes by phe.type. phe.type consists of the variable names and their percentages. Case-control phenotype simulation is displayed as follows:
If users want to output files, please see File output.

# Generate annotation simulation parameters
SP <- param.annot(pop.marker = 1e4, qtn.num = list(tr1 = 10), qtn.model = "A") # Additive effect
# Generate genotype simulation parameters
SP <- param.geno(SP = SP, pop.ind = 1e2) # random genotype
# Generate phenotype simulation parameters
SP <- param.pheno(
  SP = SP,
  phe.type = list(tr1 = list(case = 0.01, control = 0.99)), # "T1" (Trait 1) consists of 1% case and 99% control
  phe.model = list(tr1 = "T1 = A + E"), # "T1" (Trait 1) consists of Additive effect and Residual effect
  # phe.var = list(tr1 = 100),
  phe.h2A = list(tr1 = 0.3)
)

# Run annotation simulation
SP <- annotation(SP)
# Run genotype simulation
SP <- genotype(SP)
# Run phenotype simulation
SP <- phenotype(SP)

Multiple-trait simulation of case-control phenotype is displayed as follows:
If users want to output files, please see File output.

# Generate annotation simulation parameters
SP <- param.annot(pop.marker = 1e4, qtn.num = list(tr1 = 10, tr2 =