/
baselineSim.Rmd
218 lines (187 loc) · 9.59 KB
/
baselineSim.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
---
title: "Baseline simulation of cassava GS"
author: "Marnin Wolfe"
date: "2021-Aug-27"
output:
workflowr::wflow_html:
code_folding: hide
toc: true
editor_options:
chunk_output_type: inline
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE, message = FALSE, warning = FALSE,
tidy='styler',
tidy.opts=list(strict=FALSE,width.cutoff=100),
highlight=TRUE)
```
# Previous steps
1. Empirical estimate of TrialType-specific error variances in terms of the IITA selection index (SELIND). See that analysis [here](https://wolfemd.github.io/IITA_2021GS/inputsForSimulationV2.html).
2. [Burn-in simulations](burnInSims.html)
# Set-up a singularity shell with R+OpenBLAS
***This is optional and can be skipped.***
If you want the advantage of multi-threaded BLAS to speed up predictions within the simulations, you need an R instance that is linked to OpenBLAS (another example is Microsoft R Open). For CBSU, the recommended approach is currently to use singularity shells provided by the "rocker" project. They even come pre-installed with tidyverse :). Linked to OpenBLAS, using a simple function `RhpcBLASctl::blas_set_num_threads()` I can add arguments to functions to control this feature. For optimal performance, it is import to balance the number of threads each R session uses for BLAS against any other form of parallel processing being used and considering total available system resources.
```{bash, eval=F}
# 0) Pull a singularity image with OpenBLAS enabled R + tidyverse from rocker/
# singularity pull ~/rocker2.sif docker://rocker/tidyverse:latest;
# 1) start a screen shell
screen;
# 2) reserve interactive slurm
salloc -n 25 --mem 60G;
# 3) start the singularity Linux shell inside that
singularity shell ~/rocker2.sif;
# Project directory, so R will use as working dir.
cd /home/mw489/BreedingSchemeOpt/;
# 3) Start R
export OMP_NUM_THREADS=1;
R
```
```{r, eval=F}
# Install genomicMateSelectR to user-accessible libPath
### In a singularity shell, sintall as follows:
libPath<-"/home/mw489/R/x86_64-pc-linux-gnu-library/4.1" # should be YOUR libPath
withr::with_libpaths(new=libPath, devtools::install_github("wolfemd/genomicMateSelectR", ref = 'master'))
### Else, simply
devtools::install_github("wolfemd/genomicMateSelectR", ref = 'master')
# Install my own forked repo of AlphaSimHlpR
withr::with_libpaths(new=libPath, install.packages("Rcpp"))
withr::with_libpaths(new=libPath, install.packages("AlphaSimR"))
withr::with_libpaths(new=libPath, install.packages("optiSel"))
withr::with_libpaths(new=libPath, install.packages("rgl"))
withr::with_libpaths(new=libPath, devtools::install_github("wolfemd/AlphaSimHlpR", ref = 'master', force=T))
# devtools::install_github("wolfemd/AlphaSimHlpR", ref = 'master', force=T)
```
# A small example
**Objective:** develop a hopefully faster GS simulator which uses a fixed number of clones for predictions... hoping to constrain compute requirements better than using the `trainingPopCycles` parameter, without sacrificing much prediction accuracy.
For specifying a `newBSP`, there are some considerations. The `bsp` from previous run contains an entry `bsp$checks` specifying a pop-object with randomly chosen checks from the burn-in phase. *For now*, I will ensure the burn-in checks are used post burn-in. Keep it simple. For that reason, `newBSP` should not alter the `nChks` value. If it does, might cause code to break...
- Set-up my own population improvement function
- Added some options to `bsp` objects to control which phenotype records are used, which clones are includedin the TP, and which clones are considered selection candidates
- JL's original `popImprov1Cyc` drew `candidates` from the full `records$F1@id` (excluding potentially indivs only scored during the current year, if `useCurrentPhenoTrain=FALSE`).
- My changes:
- `nTrainPopCycles`: draw training pop clones only from this number of recent cycles.
- `nYrsAsCandidates`: candidates for selection only from this number of recent years
- `maxTrainingPopSize`: From the lines in the most recent cycles (indicated by `nTrainPopCycles`), subsample this number of lines for training data.
- This is *in addition to* the "check" (`bsp$checks@id`) and the lines indicates as selection `candidates` according to the setting of `nYrsAsCandidates`.
- Phenotypic records of `candidates` will *also* be included in any predictions.
- All "historical" data will always be used, but the number of maximum training lines will be held constant.
- Replaces the stage-specific `bsp$trainingPopCycles`, which will be unused in this pipeline, but not deleted from the package.
```{r inputs,eval=T}
suppressMessages(library(AlphaSimHlpR))
suppressMessages(library(tidyverse))
suppressMessages(library(genomicMateSelectR))
select <- dplyr::select
schemeDF<-read.csv(here::here("data","baselineScheme - Test.csv"),
header = T, stringsAsFactors = F)
source(here::here("code","runSchemesPostBurnIn.R"))
simulations<-readRDS(here::here("output","burnIn_test.rds"))
```
## Example
```{r}
newBSP<-specifyBSP(schemeDF = schemeDF,
nChr = 3,effPopSize = 100,quickHaplo = F,
segSites = 400, nQTL = 40, nSNP = 100, genVar = 40,
gxeVar = NULL, gxyVar = 15, gxlVar = 10,gxyxlVar = 5,
meanDD = 0.5,varDD = 0.01,relAA = 0.5,
stageToGenotype = "PYT",
nParents = 10, nCrosses = 4, nProgeny = 50,nClonesToNCRP = 3,
phenoF1toStage1 = T,errVarPreStage1 = 500,
useCurrentPhenoTrain = F,
nCyclesToKeepRecords = 30,
selCritPipeAdv = selCritIID,
selCritPopImprov = selCritIID,
nTrainPopCycles=6,
nYrsAsCandidates=2,
maxTrainingPopSize=500)
```
```{r continue after burn-in with GS, eval=F}
start<-proc.time()[3]
postBurnInGS_test<-runSchemesPostBurnIn(simulations = simulations,
newBSP=newBSP,
nPostBurnInCycles=10,
selCritPop="parentSelCritGEBV",
selCritPipe="selCritIID",
productFunc="productPipeline",
popImprovFunc="popImprovByParentSel",
ncores=4,
nBLASthreads=4)
saveRDS(postBurnInGS_test,file = here::here("output","postBurnInGS_test.rds"))
end<-proc.time()[3]; print(paste0((end-start)/60," mins elapsed"))
```
## Benchmark "full-scale" GS cycles
## Debug
```{r debugging params}
# burnInSim<-simulations$burnInSim[[1]]
# selCritPop="parentSelCritGEBV";
# selCritPipe="selCritIID";
# productFunc="productPipeline";
# popImprovFunc="popImprov1Cyc";
# newBSP=burnInSim$bsp
# newBSP[["maxTrainingPopSize"]]<-500
# newBSP[["nYrsAsCandidates"]]<-2
# newBSP[["nTrainPopCycles"]]<-10
# newBSP$checks<-NULL
```
```{r runSchemesPostBurnIn}
# runSchemesPostBurnIn<-function(simulations,
# newBSP=NULL, # so you can change the scheme entirely after burn-in
# nPostBurnInCycles,
# selCritPop="selCritIID",
# selCritPipe="selCritIID",
# productFunc="productPipeline",
# popImprovFunc="popImprovByParentSel",
# ncores=1,
# nBLASthreads=NULL,nThreadsMacs2=NULL){
#
# require(furrr); plan(multisession, workers = ncores)
# options(future.globals.maxSize=+Inf); options(future.rng.onMisuse="ignore")
#
# simulations<-simulations %>%
# mutate(SimOutput=future_map2(SimRep,burnInSim,function(SimRep,burnInSim,...){
# # debug
# # burnInSim<-simulations$burnInSim[[1]]
# if(!is.null(nBLASthreads)) { RhpcBLASctl::blas_set_num_threads(nBLASthreads) }
# cat("******", SimRep, "\n")
#
# # This CONTINUES where previous sims left off
# ## no initialize step
# ## Keep burn-in stage sim params "SP"
# SP<-burnInSim$SP
# ## specify a potentially new bsp object
# ## (keep checks stored in burn-in stage's bsp)
# if(!is.null(newBSP)){
# bsp<-newBSP; bsp$checks<-burnInSim$bsp$checks
# } else { bsp<-burnInSim$bsp }
# ## 'historical' records from burn-in
# records<-burnInSim$records
# ## override burn-in specified product and population improvement funcs
# bsp[["productPipeline"]] <- get(productFunc)
# bsp[["populationImprovement"]] <- get(popImprovFunc)
# bsp[["selCritPipeAdv"]] <- get(selCritPipe)
# bsp[["selCritPopImprov"]] <- get(selCritPop)
#
# # Post burn-in cycles
# cat("\n"); cat("Post burn-in cycles"); cat("\n")
# for (cycle in 1:nPostBurnInCycles){
# cat(cycle, " ")
# records <- bsp$productPipeline(records, bsp, SP)
# records <- bsp$populationImprovement(records, bsp, SP)
# }
#
# # Finalize the stageOutputs
# records <- AlphaSimHlpR:::lastCycStgOut(records, bsp, SP)
#
# return(list(records=records,
# bsp=bsp,
# SP=SP))
# },
# nPostBurnInCycles=nPostBurnInCycles,
# selCritPop=selCritPop,
# selCritPipe=selCritPipe,
# productFunc=productFunc,
# popImprovFunc=popImprovFunc,
# nBLASthreads=nBLASthreads,
# nThreadsMacs2=nThreadsMacs2))
# plan(sequential)
# return(simulations)
# }
```