# Network analysis

## Network visualization in Cytoscape

[Cytoscape](https://cytoscape.org/) is a widely used open-source platform for network visualization.

<div style="background-color:LightYellow; color:black">
<h3>Exercise</h3> 
     Install Cytoscape. Import the directed, acyclic regulator -> regulator graph you saved in the <a href="3_grn_reconstruction.ipynb">GRN reconstruction notebook</a>, and visualize it using a hierarchical layout (the <a href="https://apps.cytoscape.org/apps/yfileslayoutalgorithms">yFiles layout</a> app is recommended). In that notebook, we saw that <a href="https://en.wikipedia.org/wiki/Sp2_transcription_factor">SP2</a> was the TF with the highest number of targets. Where does SP2 lie in our reconstructed hierarchy?
</div>

## Network validation

After reconstructing a GRN, we want to evaluate whether the predicted targets of a TF overlap with its known targets. The most common validation data are TF binding locations from [ChIP sequencing](https://en.wikipedia.org/wiki/ChIP_sequencing) or differential expression following [gene silencing](https://en.wikipedia.org/wiki/Gene_silencing) or [gene knockout](https://en.wikipedia.org/wiki/Gene_knockout) experiments.

In a study by [Cusanovich et al.](https://journals.plos.org/plosgenetics/article?id=10.1371/journal.pgen.1004226#s2) such binding and silencing data were generated for a lymphoblastoid cell line, the same cell type used in the [GEUVADIS study](https://doi.org/10.1038/nature12531). The difficulty of validating GRNs is well illustrated by the fact that the main finding of this study was the limited overlap between these two experimental validation methods!

<div style="background-color:LightYellow; color:black">
<h3>Exercise</h3> 
     Download Supplementary Table S3 from the Cusanovich et al. study, find validation data for TFs in our predicted GRN, and compute the overlap between predicted and validated targets. Use the <a href="https://en.wikipedia.org/wiki/Hypergeometric_distribution#Hypergeometric_test">hypergeometric test</a> to test whether the overlap is significant.
</div>
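
As a minimal sketch of the overlap test, the upper tail of the hypergeometric distribution can be computed with the Python standard library alone. The function name `hypergeom_overlap_pvalue` and the toy gene sets are illustrative assumptions, not part of the Cusanovich et al. data. The background should contain only genes that could appear in both the predicted and the validated sets.

```python
from math import comb


def hypergeom_overlap_pvalue(n_background, n_validated, n_predicted, n_overlap):
    """Probability of observing at least n_overlap validated genes when
    n_predicted genes are drawn without replacement from n_background genes,
    of which n_validated are validated targets."""
    total = comb(n_background, n_predicted)
    upper = min(n_validated, n_predicted)
    tail = sum(
        comb(n_validated, k) * comb(n_background - n_validated, n_predicted - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Toy example with hypothetical gene names (assumption: 20 background genes)
background = {f"gene{i}" for i in range(1, 21)}
validated = {"gene1", "gene2", "gene3", "gene4", "gene5"}
predicted = {"gene1", "gene2", "gene3", "gene10"}

overlap = predicted & validated
p_value = hypergeom_overlap_pvalue(
    len(background), len(validated), len(predicted), len(overlap)
)
print(f"overlap = {len(overlap)} genes, p = {p_value:.4f}")
```

For genome-scale gene sets, a library routine for the hypergeometric survival function is likely a more practical choice than exact binomial coefficients.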

## Subnetwork eigengene correlations with phenotypes

In a disease-related study where we have eQTL and gene expression data to reconstruct causal GRNs, we usually also have clinical characteristics or other phenotypes available for the same individuals. A key question then is whether sub-networks of the inferred GRN are involved in the regulation of certain disease phenotypes. A commonly used method to answer this question is to summarize the expression of a subnetwork of genes as an [*eigengene*](https://bmcsystbiol.biomedcentral.com/articles/10.1186/1752-0509-1-54), defined as the first principal component of the expression data for the genes in the subnetwork. Then associations are tested between the eigengenes and the available phenotypes using correlation (for continuous phenotypes) or differential expression (for discrete phenotypes).
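
The eigengene definition above can be sketched without external dependencies. This is an illustration under stated assumptions: each gene is standardized to zero mean and unit variance (standing in for a normalization step), the first principal component is found by power iteration rather than a full SVD, and the toy matrix, the phenotype values, and the function names are hypothetical.

```python
import math
import random


def standardize(values):
    """Center one gene's expression profile and scale it to unit variance."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values]


def pearson(x, y):
    """Pearson correlation between two equally long sequences."""
    zx, zy = standardize(x), standardize(y)
    return sum(a * b for a, b in zip(zx, zy)) / (len(x) - 1)


def eigengene(expr, n_iter=200, seed=0):
    """Return (eigengene, variance_explained) for a genes x samples matrix."""
    z = [standardize(row) for row in expr]
    n_samples = len(z[0])
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
    for _ in range(n_iter):
        # Power iteration: v <- Z^T (Z v), followed by normalization
        scores = [sum(g * x for g, x in zip(row, v)) for row in z]
        v = [sum(s * row[j] for s, row in zip(scores, z)) for j in range(n_samples)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
    # Variance along the first principal axis divided by the total variance
    scores = [sum(g * x for g, x in zip(row, v)) for row in z]
    explained = sum(s * s for s in scores) / sum(x * x for row in z for x in row)
    return v, explained


# Toy subnetwork: 3 genes x 6 samples (hypothetical values)
expr = [
    [1.0, 2.1, 2.9, 4.2, 5.0, 5.8],
    [2.2, 3.9, 6.1, 8.0, 9.9, 12.1],
    [6.1, 4.8, 4.1, 3.0, 1.9, 1.2],
]
phenotype = [0.5, 1.0, 1.4, 2.1, 2.4, 3.0]  # hypothetical continuous phenotype

eg, explained = eigengene(expr)
r = pearson(eg, phenotype)
print(f"variance explained: {explained:.3f}, |correlation| with phenotype: {abs(r):.3f}")
```

The sign of a principal component is arbitrary, so the absolute correlation is the quantity to compare across subnetworks. The sketch does not handle genes with constant expression, which would need to be filtered out first.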

In the [GEUVADIS study](https://doi.org/10.1038/nature12531), no real phenotypes are available. To illustrate the approach nevertheless, fake phenotypes have been created from the principal components of the expression data (saved in the file `PC_phenotypes.csv`).

<div style="background-color:LightYellow; color:black">
<h3>Exercise</h3> 
     For each regulator in the predicted GRN, define a subnetwork consisting of the regulator and its predicted targets. Use the <a href="https://lab.michoel.info/BioFindr.jl/dev/inference/#BioFindr.supernormalize">supernormalized data</a> to compute each subnetwork's eigengene. Check how much variance of the subnetwork's genes is explained by the eigengene. Find the subnetworks most strongly associated with each of the three phenotypes.
</div>

## Functional enrichment and drug repurposing

When we have found subnetworks that are correlated with disease phenotypes, we want to better understand the functional meaning of these networks, whether they contain druggable genes, and whether they can be targeted by already known drugs or compounds. For instance, if we find a subnetwork whose genes are positively correlated with more severe disease phenotypes, we would like to identify compounds that reduce the expression of these genes.

### Functional enrichment

Functional enrichment tests whether the genes in a subnetwork are known to operate in the same biological processes. Many tools exist to analyze functional enrichment. A popular one is:

- [Enrichr](https://maayanlab.cloud/Enrichr/)

### Druggability

Finding druggable genes in a subnetwork involves checking whether it contains genes that code for known drug targets or that belong to generally druggable classes of proteins. Available resources include:

- List of [protein kinase-coding genes](https://ftp.uniprot.org/pub/databases/uniprot/knowledgebase/complete/docs/pkinfam.txt)
- List of [G-protein-coupled receptors (GPCRs)](https://www.guidetopharmacology.org/GRAC/GPCRListForward?class=A)
- Drug-gene interactions from [DGIdb](https://www.dgidb.org/) or [Guide to pharmacology](https://www.guidetopharmacology.org/)
- Protein-protein interactions from [ConsensusPathDB](http://cpdb.molgen.mpg.de/) or other similar databases

### Drug repurposing

Drug repurposing is based on matching gene sets (e.g. from a subnetwork of interest) against gene signatures of differential expression measured after treatment with drugs or other chemical compounds, or after gene silencing or overexpression experiments. Thousands of such signatures have been determined experimentally in cancer cell lines:

- [Connectivity map](https://clue.io/)
- [SigCOM LINCS](https://maayanlab.cloud/sigcom-lincs)

<div style="background-color:LightYellow; color:black">
<h3>Exercise</h3> 
     For each regulator in the predicted GRN, define a subnetwork consisting of the regulator and its predicted targets. Define the up- and down-regulated gene sets as the targets that are positively and negatively correlated with the regulator, respectively. Upload the gene sets to the Connectivity Map. Do you find any compounds whose signature overlaps significantly with your subnetwork?
</div>
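
Splitting a regulator's targets into up- and down-regulated gene sets can be sketched as follows. The gene names, expression values, and the function `split_targets_by_sign` are hypothetical. The sketch uses only the sign of the Pearson correlation, so in practice one may also want to require a minimum correlation strength.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def split_targets_by_sign(regulator_expr, target_expr):
    """Return (up, down): targets positively and negatively correlated
    with the regulator. Targets with exactly zero correlation are skipped."""
    up, down = [], []
    for gene, values in target_expr.items():
        r = pearson(regulator_expr, values)
        if r > 0:
            up.append(gene)
        elif r < 0:
            down.append(gene)
    return sorted(up), sorted(down)


# Toy data: one regulator and three predicted targets across five samples
regulator = [1.0, 2.0, 3.0, 4.0, 5.0]
targets = {
    "TARGET_A": [0.9, 2.2, 2.8, 4.1, 5.3],
    "TARGET_B": [5.1, 3.8, 3.1, 2.2, 0.7],
    "TARGET_C": [2.0, 2.9, 4.2, 4.8, 6.1],
}

up_genes, down_genes = split_targets_by_sign(regulator, targets)
print("up:", up_genes)
print("down:", down_genes)
```

The two resulting lists are the gene sets that would then be submitted to a signature database.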