-
Notifications
You must be signed in to change notification settings - Fork 2
/
finemapping_tutorial.Rmd
118 lines (92 loc) · 4.68 KB
/
finemapping_tutorial.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
---
title: "Tutorial for fine-mapping with functional priors"
author: Kaixuan Luo, Alan Selewa
output: html_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE, eval=TRUE)
```
In this tutorial, we perform fine-mapping (using SuSiE) with
functional priors computed from the enrichment analysis using `TORUS`.
Load the packages.
```{r load-package}
library(mapgen)
```
## Input data
Please see the [data preparation tutorial][data-preparation-tutorial] about
preparing GWAS summary statistics.
We need a `bigSNP` object for the reference genotype panel,
which will be used for computing LD matrices. See the
[data preparation tutorial][data-preparation-tutorial] for details.
## SNP-level priors
Let's load the processed GWAS summary statistics obtained from
the [data preparation tutorial][data-preparation-tutorial] and the
SNP-level priors using `run_torus()` obtained from the [enrichment analysis tutorial][enrichment-tutorial].
```{r}
gwas.sumstats <- readRDS(system.file('extdata', 'test.processed.sumstats.rds', package='mapgen'))
torus.res <- readRDS(system.file('extdata', 'test.torus.res.rds', package='mapgen'))
```
We can add the SNP-level priors (column called 'torus_prior') to
our summary statistics.
```{r add-torus-priors}
sumstats.for.susie <- prepare_susie_data_with_torus_result(sumstats = gwas.sumstats,
torus_prior = torus.res$snp_prior)
```
We could set `filter == "pval"` to limit loci with GWAS p-value cutoff (default: 5e-8),
or set `filter == "torus_fdr"` by the FDR cutoff (default: 0.1)
from running `run_torus()` with `option = "fdr"`.
## Fine-mapping
We can perform fine-mapping with SNP-level priors using `SuSiE`,
you will need the `susieR` package for the steps below.
```{r, message=FALSE}
library(susieR)
```
To use SNP-level priors computed from `TORUS`, we use `run_finemapping()`
with `priortype = 'torus'`.
You can also run fine-mapping using uniform prior (without SNP-level priors)
with `priortype = 'uniform'`. _(You don't need to run `TORUS` if you run
fine-mapping using uniform prior.)_
You can use the bigSNP object prepared earlier in
the [data preparation tutorial][data-preparation-tutorial].
If you provide bigSNP instead of LD matrices,
it will compute R matrices using the genotype data in the bigSNP object.
```{r run-susie-torus-prior-bigSNP, eval=FALSE, message=FALSE}
# you will need the bigsnpr package if you use the bigSNP object.
library(bigsnpr)
susie.res <- run_finemapping(sumstats = sumstats.for.susie,
bigSNP = bigSNP,
priortype = 'torus',
L = 1)
```
We did not provide sample size (n) for this toy example,
but it is highly encouraged to provide that (set argument `n` = sample size)
in the real data analysis.
If you have the LD matrices, you can use those instead of the bigSNP object.
We have pre-computed the LD matrices of European samples from UK Biobank.
They can be downloaded [here][UKBB_LD].
See [Fine-mapping with UK Biobank LD tutorial][finemapping-ukbb-ld-tutorial]
and [LD diagnosis tutorial][ld-diagnosis-tutorial]
for details.
We ran `SuSiE` with `L = 1` here, which allows a single causal signal for
each LD block and is robust to mismatching LD patterns.
`susie.res` is a list of `SuSiE` results, one for each locus.
We can merge GWAS summary statistics with SuSiE result:
```{r merge-susie-sumstats, eval=FALSE}
finemap.sumstats <- merge_susie_sumstats(susie.res, sumstats.for.susie)
```
We now have a new column called `susie_pip`, which is the probability of a SNP
being causal estimated using `SuSiE`.
_Note_: Inconsistencies between GWAS z-scores and the LD reference could
potentially inflate PIPs from SuSiE finemapping. It would be helpful to
check potential LD mismatch issue, for example using [DENTIST][DENTIST] or
the [diagnostic][diagnostic-susie-rss] from `susie_rss`.
See [Fine-mapping with UK Biobank LD tutorial][finemapping-ukbb-ld-tutorial]
for details.
So to be conservative, you could use `L = 1` if you have "out-of-sample" LD reference,
and focus on the loci with significant GWAS signals (e.g. `pval < 5e-8`).
[data-preparation-tutorial]: https://xinhe-lab.github.io/mapgen/articles/data_preparation_tutorial.html
[enrichment-tutorial]: https://xinhe-lab.github.io/mapgen/articles/enrichment_finemapping_tutorial.html
[DENTIST]: https://doi.org/10.1038/s41467-021-27438-7
[diagnostic-susie-rss]: https://stephenslab.github.io/susieR/articles/susierss_diagnostic.html
[finemapping-ukbb-ld-tutorial]: https://xinhe-lab.github.io/mapgen/articles/finemapping_ukbb_ld_diagnosis.html
[UKBB_LD]: https://uchicago.box.com/s/jqocacd2fulskmhoqnasrknbt59x3xkn