# Causal gene regulatory network reconstruction

Cells continuously monitor their internal and external environment and determine how much of each type of protein is needed. This information-processing function is carried out by [gene regulatory networks (GRNs)](https://en.wikipedia.org/wiki/Gene_regulatory_network), which control the rate of production of each protein. Two important classes of regulatory molecules in GRNs are [transcription factors](https://en.wikipedia.org/wiki/Transcription_factor) and [microRNAs](https://en.wikipedia.org/wiki/MicroRNA).

In this tutorial we will use causal inference to reconstruct GRNs from [mRNA](https://en.wikipedia.org/wiki/Messenger_RNA) and [microRNA](https://en.wikipedia.org/wiki/MicroRNA) expression data from the [GEUVADIS study](https://www.nature.com/articles/nature12531). In particular, we will use genetic variants as causal instruments and the [BioFindr software](https://github.com/tmichoel/BioFindr.jl) for model selection, as introduced in the [blessing of dimensionality notebook](2_blessing_of_dimensionality.ipynb).

## Set up the environment

In [1]:
using DrWatson
quickactivate(@__DIR__)

In [15]:
using DataFrames
using Arrow
using CSV
using Gadfly
using Compose
using BioFindr

## Load data

This tutorial uses [preprocessed data files](https://github.com/lingfeiwang/findr-data-geuvadis) from the [GEUVADIS study](https://doi.org/10.1038/nature12531). We have mRNA (`t` for transcripts) and microRNA (`m`) expression data:

In [11]:
dt = DataFrame(Arrow.Table(datadir("processed","findr-data-geuvadis", "dt.arrow")));
dm = DataFrame(Arrow.Table(datadir("processed","findr-data-geuvadis", "dm.arrow")));

We also have genotype data for the strongest [eQTLs](https://en.wikipedia.org/wiki/Expression_quantitative_trait_loci) for a subset of mRNAs and microRNAs:

In [5]:
dgt = DataFrame(Arrow.Table(datadir("processed","findr-data-geuvadis", "dgt.arrow")));
dgm = DataFrame(Arrow.Table(datadir("processed","findr-data-geuvadis", "dgm.arrow")));

As regulators in our GRN, we will use transcription factors (TFs) and microRNAs for which validation data is available:

In [24]:
TFs = DataFrame(CSV.File(datadir("processed","findr-data-geuvadis", "TFs.csv")));
microRNAs = DataFrame(CSV.File(datadir("processed","findr-data-geuvadis", "miRNAs.csv")));

The [preprocessed GEUVADIS data](https://github.com/lingfeiwang/findr-data-geuvadis) has been organized such that each column of the genotype data is the strongest eQTL for the corresponding column in the matching expression data. Usually, however, eQTL mapping data will be available in the form of a table with variant IDs, gene IDs, and various eQTL association statistics (see the [original GEUVADIS file](https://www.ebi.ac.uk/biostudies/files/E-GEUV-1/E-GEUV-1/analysis_results/EUR373.gene.cis.FDR5.all.rs137.txt.gz) for an example). Let's artificially create such tables for our data, keeping only eQTL genes/microRNAs that are also in our lists of regulators (`p` for "pairs"):

In [37]:
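# Flag the mRNAs with an eQTL (the first ncol(dgt) columns of dt) that are in the TF list
eQTL_TF = map(x -> ∈(x,TFs.TF), names(dt)[1:ncol(dgt)]);
# Pair each selected TF with its eQTL, i.e. the genotype column at the same position
dpt = DataFrame(SNP_ID = names(dgt)[eQTL_TF], GENE_ID=names(dt)[1:ncol(dgt)][eQTL_TF]);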

In [40]:
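# Flag the microRNAs with an eQTL (the first ncol(dgm) columns of dm) that are in the microRNA list
eQTL_mir = map(x -> ∈(x,microRNAs.miRNA), names(dm)[1:ncol(dgm)]);
# Pair each selected microRNA with its eQTL, i.e. the genotype column at the same position
dpm = DataFrame(SNP_ID = names(dgm)[eQTL_mir], GENE_ID=names(dm)[1:ncol(dgm)][eQTL_mir]);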

Note that we are now using only a very small number of TFs and microRNAs. This will be sufficient to illustrate the use of the software, but in real applications one would consider many more genes as potential regulators (in principle all genes with a valid eQTL instrument).
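In a real application, the pairs table would be derived from an eQTL mapping results table rather than from the column order of preprocessed files. The following is a minimal sketch of how that could look, assuming a results table with one row per variant-gene association. The table contents, the `pvalue` column name, and the `regulators` list are made up for illustration and do not correspond to the actual GEUVADIS file format.

```julia
using DataFrames

# Hypothetical eQTL mapping results: one row per variant-gene association.
# IDs, p-values and the `pvalue` column name are illustrative assumptions.
eqtl = DataFrame(
    SNP_ID  = ["rs1", "rs2", "rs3", "rs4", "rs5"],
    GENE_ID = ["geneA", "geneA", "geneB", "geneC", "geneC"],
    pvalue  = [1e-8, 1e-12, 1e-6, 1e-20, 1e-9],
)

# Hypothetical list of regulators of interest
regulators = ["geneA", "geneC"]

# Keep only associations whose gene is in the regulator list
eqtl_reg = filter(:GENE_ID => in(regulators), eqtl)

# For each gene, keep the variant with the smallest p-value (its strongest eQTL)
strongest = combine(groupby(eqtl_reg, :GENE_ID)) do sdf
    (; SNP_ID = sdf.SNP_ID[argmin(sdf.pvalue)])
end

# Put the columns in the same order as the pairs tables above
dp = select(strongest, :SNP_ID, :GENE_ID)
```

The resulting `dp` has one row per regulator, with the same `SNP_ID` and `GENE_ID` columns as `dpt` and `dpm`.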

## GRN reconstruction with BioFindr

### TF-target prediction

We start by computing the probabilities of causal interactions from the selected TFs to all other genes using the methods explained in the [blessing of dimensionality notebook](2_blessing_of_dimensionality.ipynb). Naturally, we don't have to call the same low-level functions used there. Instead, we can call a high-level `findr` function that does everything under the hood. You can read more about this function in the [BioFindr documentation](https://lab.michoel.info/BioFindr.jl/stable/testLLR/) or in the [BioFindr tutorials](https://lab.michoel.info/BioFindrTutorials/).