---
title: "Implementation Tests for Genomic Mate Selection in Cassava"
site: workflowr::wflow_site
output:
  workflowr::wflow_html:
    toc: false
editor_options:
  chunk_output_type: console
---
Available data and software make it hypothetically possible to predict cross variances (Mendelian sampling variance) and use them to select mates, *e.g.* using the usefulness criterion (UC).
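
For orientation, the usefulness criterion is commonly written as the predicted cross mean plus the expected gain from selecting within the cross. The form below is a minimal sketch under that common definition. The symbols are introduced here for illustration and are not defined elsewhere on this page.

$$
\mathrm{UC} = \mu + i \, \sigma
$$

Here $\mu$ is the predicted mean of the cross, $\sigma$ is the predicted standard deviation among its progeny (the square root of the cross variance), and $i$ is the standardized selection intensity. Some formulations also multiply the second term by the accuracy of selection.
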
New information also adds new sources of uncertainty:

- Phasing error
- Use of one or more genetic maps to predict recombination frequencies
- Inherent uncertainty in marker effects and its effect on variance prediction

We used cross-validation to estimate the accuracy of predicting means, variances and the usefulness of crosses on selection indices. That analysis leveraged a high-quality validated pedigree-based phasing pipeline. That pipeline is considerably more involved and may not be implementable on the entire breeding germplasm.
Here I test cross-variance prediction using the data available in our current breeding pipeline. The aim is to assess whether and how to start using cross-variance predictions in practice.
# Cross-validation study
1. [Prepare training dataset](01-cleanTPdata.html): Download data from the database, then clean and format it. Use the standard pipeline to obtain complete breeding trial data for IITA and generate de-regressed BLUPs for downstream analysis.
    * Copy `gsFunctions.R` from `code/` of the most recent NextGen prediction, [NRCRI C3b predicted April 2021](https://wolfemd.github.io/NRCRI_2021GS/).
    * Refer to the [previous analysis for IITA (2020)](https://wolfemd.github.io/IITA_2020GS/) in case there are variations.
2. [Get BLUPs combining all trial data](02-GetBLUPs.html): Combine data from all trait-trials to get BLUPs for downstream genomic prediction. Fit a mixed model to the multi-trial dataset and extract BLUPs, de-regressed BLUPs and weights. Include two rounds of outlier removal.
3. [Validate the pedigree obtained from cassavabase](03-validatePedigree.html): Before setting up a cross-validation scheme for predictions that depend on a correct pedigree, add a basic verification step to the pipeline. The goal is not to fill in unknown relationships or otherwise correct the pedigree, only to assess the evidence that each recorded relationship is correct and to remove those that are not.
4. [Preprocess data files](04-PreprocessDataFiles.html): Prepare haplotype and dosage matrices, pedigree and BLUPs, genetic map _and_ recombination frequency matrix, for use in predictions.
5. [Parent-wise cross-validation](05-CrossValidation.html): Compute parent-wise cross-validation folds using the validated pedigree. Fit models to get marker effects and make subsequent predictions of cross means and (co)variances.
6. [Results](06-Results.html): Home for plots and summary tables.
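
As a rough sketch of the quantities that steps 4 and 5 work with, assume a purely additive model and Haldane's mapping function. Both are assumptions made here for illustration. The linked pages may use a different mapping function or model, and the symbols below are not taken from them. Under Haldane's function, the recombination frequency between loci $j$ and $k$ separated by a map distance of $d_{jk}$ Morgans is

$$
c_{jk} = \tfrac{1}{2}\left(1 - e^{-2 d_{jk}}\right)
$$

With a vector of marker effects $\boldsymbol{\alpha}$, the predicted mean and variance for a cross between parents $P_1$ and $P_2$ can then be written as

$$
\mu_{P_1 \times P_2} = \tfrac{1}{2}\left(\mathrm{GEBV}_{P_1} + \mathrm{GEBV}_{P_2}\right),
\qquad
\sigma^2_{P_1 \times P_2} = \boldsymbol{\alpha}^\top \mathbf{D}_{P_1 \times P_2}\, \boldsymbol{\alpha}
$$

where $\mathbf{D}_{P_1 \times P_2}$ is the expected covariance matrix of marker genotypes among progeny of the cross, built from the phased parental haplotypes and the recombination frequencies $c_{jk}$. This is why both phasing quality and the choice of genetic map feed directly into the variance prediction.
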
**Additional future analyses to do:**
1. PHG imputed and phased marker data
2. AWC's genetic map
3. Multi-trait and/or Bayesian models
4. Other efforts to improve variance prediction accuracy?
5. Simulation to explore factors affecting the estimate of accuracy
    - Impact of phasing switch errors
    - Quality and quantity of data available on parents
    - Relatedness of parents
    - Cross-mean and cross-variance prediction accuracy
        + Relationship between the true accuracy and the estimate of accuracy?