## Introduction

Internally, all [Findr.jl][1] functions operate on matrices or other array-based data. The [DataFrame](https://dataframes.juliadata.org/stable/)-based `findr` methods used in the [coexpression analysis](coexpression.qmd), [association analysis](association.qmd), and [causal inference](causal-inference.qmd) tutorials are convenience wrappers around them. If you prefer matrix-based data over DataFrames, you can call the matrix-based `findr` methods directly, without creating DataFrames first.

## Set up the environment


In [1]:
using DrWatson
quickactivate(@__DIR__)

using DataFrames
using Arrow

using Findr

[36m[1m[ [22m[39m[36m[1mInfo: [22m[39mPrecompiling Findr [77580646-997d-4218-a3cc-42097ecd1c68]


## Load data

Let's pretend our GEUVADIS data is in a matrix-based format:


In [2]:
Xt = Matrix(DataFrame(Arrow.Table(datadir("exp_pro","findr-data-geuvadis", "dt.arrow"))));
Xm = Matrix(DataFrame(Arrow.Table(datadir("exp_pro","findr-data-geuvadis", "dm.arrow"))));
Gm = Matrix(DataFrame(Arrow.Table(datadir("exp_pro","findr-data-geuvadis", "dgm.arrow"))));

We also need the microRNA eQTL mapping (see the [causal inference tutorial](causal-inference.qmd)), in this case in the form of an array where each row corresponds to a cis-eQTL/eGene pair, represented by a column index of `Gm` (i.e. a SNP) and a column index of `Xm` (i.e. a microRNA). [Recall](causal-inference.qmd) that, due to the preprocessing of the [findr-geuvadis][2] data, the column indices are identical here, but this will not be the case in general:


In [3]:
# One row per cis-eQTL/eGene pair; in this dataset SNP k pairs with microRNA k.
mirpairs = zeros(Int32,size(Gm,2),2);
for k=1:size(mirpairs,1)
    mirpairs[k,:] = [k k]
end
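When the column indices are not identical, the pair array has to be built from an explicit mapping. The following is a minimal sketch using only Julia's standard library. The identifiers `snp_ids`, `mirna_ids`, `eqtl_map`, and `pairidx` are hypothetical and not part of the GEUVADIS data or of Findr.jl. It also assumes the first column holds the SNP index and the second the microRNA index, following the order in which the text lists them.

```julia
# Hypothetical column labels; in practice these come from your own annotation files.
snp_ids   = ["rs1", "rs2", "rs3"]              # columns of the genotype matrix
mirna_ids = ["mirA", "mirB", "mirC", "mirD"]   # columns of the expression matrix

# Hypothetical eQTL mapping: each SNP name paired with the name of its eGene.
eqtl_map = ["rs1" => "mirC", "rs2" => "mirA", "rs3" => "mirD"]

# Translate names into column indices; indexin returns `nothing` for unmatched names.
snp_idx   = indexin(first.(eqtl_map), snp_ids)
mirna_idx = indexin(last.(eqtl_map), mirna_ids)
@assert all(!isnothing, snp_idx) && all(!isnothing, mirna_idx)

# One row per pair: column 1 indexes the genotype matrix, column 2 the expression matrix.
pairidx = hcat(Int.(snp_idx), Int.(mirna_idx))
```

The explicit check for `nothing` fails early when a name in the mapping is missing from either set of column labels.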

Note that data must be stored in matrices where **columns correspond to variables** (genes, SNPs, etc.) and **rows correspond to observations** (samples).
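If your data is stored the other way round (variables in rows, samples in columns), it has to be transposed first. A minimal sketch with made-up numbers; the variable names are illustrative only:

```julia
# Toy data in the "wrong" orientation: 3 genes (rows) × 5 samples (columns).
genes_by_samples = [1.0 2.0 3.0 4.0 5.0;
                    2.0 1.0 0.5 3.0 1.5;
                    0.1 0.2 0.3 0.4 0.5]

# permutedims returns a plain Matrix with samples in rows and genes in columns.
X = permutedims(genes_by_samples)

nsamples, ngenes = size(X)
```

`permutedims` is used here rather than `transpose` because it returns an ordinary `Matrix` instead of a lazy wrapper type.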

## Run Findr.jl

Below, we only show the relevant `findr` commands. Check the corresponding tutorials and [Findr.jl documentation][6] for more details. 

### Coexpression analysis

#### All-vs-all

Coexpression analysis on a single matrix returns a square matrix with dimensions equal to the number of variables (columns) in the input matrix:


In [4]:
P = findr(Xm)

674×674 Matrix{Float64}:
 1.0         0.0197072    0.225709   …  0.0332835  0.109018   0.106084
 0.0649505   1.0          0.0931958     0.0475433  0.733006   0.248782
 0.182571    0.0161952    1.0           0.777922   0.108028   0.1005
 0.00185799  0.078849     0.0794521     0.0700811  0.0996734  0.123286
 0.109671    0.0852573    0.239883      0.0346806  0.166101   0.298598
 0.0323098   0.0302238    0.0587378  …  0.569455   0.105368   0.186618
 0.466356    0.0116614    0.087737      0.043641   0.126047   0.0943835
 0.0233157   0.00802049   0.56947       0.0551034  0.34866    0.0983946
 0.101298    0.00280115   0.135996      0.0440159  0.193652   0.131925
 0.187731    0.535853     1.0           0.0338527  0.214703   0.134344
 0.401814    0.136546     0.433783   …  0.0333494  0.16544    0.188016
 0.195701    0.0226562    0.0287868     0.0539721  0.10707    0.161418
 0.695515    0.00794346   0.453652      0.111576   0.283267   0.0973275
 ⋮                                   ⋱             

In the output, columns correspond to A-genes (causal factors) and rows to B-genes (targets), that is:

$$
P_{i,j} = P(X_j \to X_i)
$$

Note that the diagonal is arbitrarily set to one; Findr cannot make any inferences about the presence or absence of self-regulation.
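To illustrate the indexing convention, here is a small sketch that lists the off-diagonal pairs above a cutoff in a toy matrix. The matrix values and the threshold are invented for illustration and are unrelated to the GEUVADIS results.

```julia
# Toy 4×4 matrix standing in for the output of an all-vs-all analysis.
P = [1.0  0.2 0.9 0.1;
     0.3  1.0 0.4 0.8;
     0.7  0.1 1.0 0.2;
     0.05 0.6 0.3 1.0]

threshold = 0.65  # arbitrary cutoff, for illustration only

# Rows index B-genes (targets) and columns index A-genes, so P[i, j] scores j → i.
# The diagonal is skipped because its value of one carries no information.
hits = [(a = j, b = i, p = P[i, j])
        for j in axes(P, 2) for i in axes(P, 1)
        if i != j && P[i, j] > threshold]

# Highest-scoring pairs first.
sort!(hits; by = h -> h.p, rev = true)
```

Each element of `hits` is a named tuple, so `hits[1].a` and `hits[1].b` give the column (A-gene) and row (B-gene) index of the top-scoring pair.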

#### Bipartite

Analyse coexpression *from* a subset of variables *to* the whole set:


In [5]:
P = findr(Xm; cols=[1,3,7,50])

674×4 Matrix{Float64}:
 1.0         0.225709   0.409288   0.242541
 0.0649505   0.0931958  0.0636956  0.154539
 0.182571    1.0        0.067289   0.0618777
 0.00185799  0.0794521  0.160056   0.481415
 0.109671    0.239883   0.312803   0.0509264
 0.0323098   0.0587378  0.0264131  0.0405768
 0.466356    0.087737   1.0        0.198557
 0.0233157   0.56947    0.263707   0.114162
 0.101298    0.135996   0.0171641  0.233525
 0.187731    1.0        0.126453   0.10633
 0.401814    0.433783   0.0676799  0.288999
 0.195701    0.0287868  0.0967082  0.138005
 0.695515    0.453652   0.13247    0.259272
 ⋮                                 
 0.0484937   0.872223   0.340024   0.522543
 0.205802    0.105303   0.335258   0.103468
 0.42119     0.969754   0.2815     0.120862
 0.0699346   0.112064   0.25883    0.323375
 0.12073     0.79598    0.124066   0.282935
 0.10299     0.046016   0.51225    0.236598
 0.0109064   0.174158   0.14438    0.299121
 0.321733    0.0977148  0.188552   0.397318
 0.169049    0.

Analyse coexpression *from* the variables in `Xm` *to* the variables in `Xt`:


In [6]:
P = findr(Xt,Xm)

23722×674 Matrix{Float64}:
 0.00400739   0.0         0.747615     …  0.00195771   0.0541561
 0.00103875   0.00883145  0.4872          0.0813478    0.107062
 0.0059461    0.0         0.357284        0.0685684    0.0581842
 0.46758      0.0         0.634931        0.136805     0.0436366
 0.00128052   0.0         0.0216848       0.120345     0.711452
 0.0295149    0.00883145  0.28128      …  0.0726776    0.847095
 1.39584e-6   0.0158817   0.300538        0.00144828   0.177948
 0.115171     0.0         0.132252        0.0427649    0.0407486
 0.00041855   0.0         0.273877        0.0129385    0.381245
 0.00223831   0.00883145  0.000626261     0.00137152   0.275221
 4.42713e-5   0.00883145  0.469129     …  0.00241127   0.00558805
 0.0407525    0.0         0.550892        0.000342252  0.0564657
 0.000116842  0.0         0.537246        0.586175     0.990271
 ⋮                                     ⋱               
 0.125664     0.0         0.955045     …  0.151681     0.37233
 0.00304995   0

### Association analysis

Testing associations between eQTL genotypes in `Gm` and microRNA expression levels in `Xm`:


In [7]:
P = findr(Xm,Gm)

674×55 Matrix{Float64}:
 0.853201    0.356346  0.0       0.0  …  0.000797726  0.0  0.00224096
 0.120473    0.99982   0.0       0.0     0.00129051   0.0  0.00585794
 0.0925184   0.124446  1.0       0.0     0.000784934  0.0  0.00954639
 0.0202679   0.47438   0.0       1.0     0.000733876  0.0  0.00263948
 0.0623391   0.449505  0.0       0.0     0.00114391   0.0  0.00194524
 0.0767795   0.377259  0.0       0.0  …  0.0028452    0.0  0.00227388
 0.0271901   0.396912  0.0       0.0     0.00125577   0.0  1.0
 0.0274731   0.328489  0.0       0.0     0.00373303   0.0  0.0124102
 0.0619347   0.148897  0.0       0.0     0.00112504   0.0  0.00539071
 0.151423    0.403371  0.999825  0.0     0.00286934   0.0  0.00362802
 0.042108    0.446897  0.0       0.0  …  0.00162458   0.0  0.00890627
 0.0384388   0.272137  0.0       0.0     0.00175638   0.0  0.00188658
 0.101742    0.391688  0.0       0.0     0.000633432  0.0  0.00520416
 ⋮                                    ⋱                    
 0.0655423   0

In the output, columns correspond to eQTLs and rows to genes, that is:


$$
P_{i,j} = P(E_j \to X_i)
$$

### Causal inference

#### Subset-to-all

When you run causal inference with `findr` using matrix-based inputs, the default is to return posterior probabilities for [each test](https://tmichoel.github.io/Findr.jl/dev/realLLR/) separately:


In [8]:
P = findr(Xm,Gm,mirpairs);

Note the dimensions of `P`:


In [9]:
size(P)

(674, 4, 55)

The third dimension indexes the A-genes (causes), the second dimension the tests (tests 2 to 5, see the link above), and the first dimension the B-genes (targets).
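Slicing such an array can be sketched as follows, using a random toy array with the same layout. The helper `test_index` is hypothetical, and the sketch assumes that the tests are stored along the second dimension in the order 2, 3, 4, 5.

```julia
# Toy array with the same layout: 6 B-genes × 4 tests × 3 A-genes.
P = rand(6, 4, 3)

# Hypothetical helper: second-dimension index for test t,
# assuming the tests are stored in the order 2, 3, 4, 5.
test_index(t) = t - 1

# B-gene × A-gene matrix for a single test (here test 2).
P2 = P[:, test_index(2), :]

# All four test results for one A-gene/B-gene pair.
a, b = 3, 5
pair_tests = P[b, :, a]
```

`P2` has the same rows-are-targets, columns-are-causes layout as the matrices returned by the coexpression and association analyses.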

[1]: https://github.com/tmichoel/Findr.jl
[2]: https://github.com/lingfeiwang/findr-data-geuvadis
[3]: https://doi.org/10.1038/nature12531
[4]: https://dataframes.juliadata.org/stable/
[5]: https://doi.org/10.1371/journal.pcbi.1005703
[6]: https://tmichoel.github.io/Findr.jl/dev/