vignettes/finemapping_tutorial.Rmd

---
title: "Tutorial for fine-mapping with functional priors"
author: Kaixuan Luo, Alan Selewa
output: html_document
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE, eval=TRUE)
```

In this tutorial, we perform fine-mapping (using SuSiE) with 
functional priors computed from the enrichment analysis using `TORUS`.

Load the packages. 
```{r load-package}
library(mapgen)
```

## Input data

Please see the [data preparation tutorial][data-preparation-tutorial] about 
preparing GWAS summary statistics. 

We need a `bigSNP` object for the reference genotype panel, 
which will be used for computing LD matrices. See the 
[data preparation tutorial][data-preparation-tutorial] for details.

## SNP-level priors

Let's load the processed GWAS summary statistics obtained from 
the [data preparation tutorial][data-preparation-tutorial] and the 
SNP-level priors using `run_torus()` obtained from the [enrichment analysis tutorial][enrichment-tutorial].
```{r}
gwas.sumstats <- readRDS(system.file('extdata', 'test.processed.sumstats.rds', package='mapgen'))
torus.res <- readRDS(system.file('extdata', 'test.torus.res.rds', package='mapgen'))
```

We can add the SNP-level priors (column called 'torus_prior') to 
our summary statistics.
```{r add-torus-priors}
sumstats.for.susie <- prepare_susie_data_with_torus_result(sumstats = gwas.sumstats, 
                                                           torus_prior = torus.res$snp_prior)
```

We could set `filter == "pval"` to limit loci with GWAS p-value cutoff (default: 5e-8),
or set `filter == "torus_fdr"` by the FDR cutoff (default: 0.1) 
from running `run_torus()` with `option = "fdr"`. 

## Fine-mapping

We can perform fine-mapping with SNP-level priors using `SuSiE`, 
you will need the `susieR` package for the steps below. 
```{r, message=FALSE}
library(susieR)
```

To use SNP-level priors computed from `TORUS`, we use `run_finemapping()` 
with `priortype = 'torus'`. 

You can also run fine-mapping using uniform prior (without SNP-level priors)
with `priortype = 'uniform'`. _(You don't need to run `TORUS` if you run 
fine-mapping using uniform prior.)_

You can use the bigSNP object prepared earlier in 
the [data preparation tutorial][data-preparation-tutorial]. 
If you provide bigSNP instead of LD matrices, 
it will compute R matrices using the genotype data in the bigSNP object. 
```{r run-susie-torus-prior-bigSNP, eval=FALSE, message=FALSE}
# you will need the bigsnpr package if you use the bigSNP object.
library(bigsnpr)
susie.res <- run_finemapping(sumstats = sumstats.for.susie, 
                             bigSNP = bigSNP, 
                             priortype = 'torus', 
                             L = 1)
```

We did not provide sample size (n) for this toy example, 
but it is highly encouraged to provide that (set argument `n` = sample size) 
in the real data analysis.

If you have the LD matrices, you can use those instead of the bigSNP object.

We have pre-computed the LD matrices of European samples from UK Biobank. 
They can be downloaded [here][UKBB_LD]. 

See [Fine-mapping with UK Biobank LD tutorial][finemapping-ukbb-ld-tutorial] 
and [LD diagnosis tutorial][ld-diagnosis-tutorial] 
for details.

We ran `SuSiE` with `L = 1` here, which allows a single causal signal for 
each LD block and is robust to mismatching LD patterns. 

`susie.res` is a list of `SuSiE` results, one for each locus. 
We can merge GWAS summary statistics with SuSiE result:
```{r merge-susie-sumstats, eval=FALSE}
finemap.sumstats <- merge_susie_sumstats(susie.res, sumstats.for.susie)
```

We now have a new column called `susie_pip`, which is the probability of a SNP 
being causal estimated using `SuSiE`.

_Note_: Inconsistencies between GWAS z-scores and the LD reference could 
potentially inflate PIPs from SuSiE finemapping. It would be helpful to 
check potential LD mismatch issue, for example using [DENTIST][DENTIST] or 
the [diagnostic][diagnostic-susie-rss] from `susie_rss`. 

See [Fine-mapping with UK Biobank LD tutorial][finemapping-ukbb-ld-tutorial] 
for details.

So to be conservative, you could use `L = 1` if you have "out-of-sample" LD reference, 
and focus on the loci with significant GWAS signals (e.g. `pval < 5e-8`).

[data-preparation-tutorial]: https://xinhe-lab.github.io/mapgen/articles/data_preparation_tutorial.html
[enrichment-tutorial]: https://xinhe-lab.github.io/mapgen/articles/enrichment_finemapping_tutorial.html
[DENTIST]: https://doi.org/10.1038/s41467-021-27438-7
[diagnostic-susie-rss]: https://stephenslab.github.io/susieR/articles/susierss_diagnostic.html
[finemapping-ukbb-ld-tutorial]: https://xinhe-lab.github.io/mapgen/articles/finemapping_ukbb_ld_diagnosis.html
[UKBB_LD]: https://uchicago.box.com/s/jqocacd2fulskmhoqnasrknbt59x3xkn