
Open Research Questions

1. What are the specific mechanisms that drive drug repurposing? Why would we construct the network in this way? Are there existing connections or entities missing from the network that would be valuable to include?

2. What disease should we conduct a case study on?

In [1]:
## Install dependencies
!sudo apt-get install graphviz graphviz-dev
!pip install pygraphviz

## Import Libraries
import networkx as nx
import pygraphviz as pgv
import torch
import json
import pandas as pd
import numpy as np

import gzip
import shutil


# Drug Repositioning Using Multiplex-Heterogeneous Network Embedding

The goal of this notebook is to construct and explore a Multiplex-Heterogeneous Network (MH-Network) for drug repositioning. This work is inspired by the paper [MultiVERSE: a multiplex and multiplex-heterogeneous network embedding approach](https://www.nature.com/articles/s41598-021-87987-1.pdf). In that work, the authors embedded an MH-Network consisting of a **Drug-Target Multiplex** and a **Human Molecular Multiplex** into a lower-dimensional space, so that clustering and link prediction could be used to find new drugs that could potentially treat a given disease. Here we process the raw data needed to construct the MH-Network and ingest it into a graph database (Dgraph). The end goal is a frontend application where users can query for drugs that could potentially treat a given illness.

## Construct the Multiplex-Heterogeneous Network

List of Data Sources

```
layer                              |  multiplex | source
protein-protein interaction (PPI)  |  human     | 
projected drug-target              |  drug      | http://snap.stanford.edu/biodata/datasets/10002/10002-ChG-Miner.html
```

To employ the MultiVERSE algorithm, the final Multiplex-Heterogeneous network needs to be converted to an extended edgelist format:

```
edge_type source target weight
  r1        n1    n2    1
  r2        n2    n3    1
```
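
As a minimal sketch of that conversion, the helper below flattens a set of layers into `edge_type source target weight` rows. The function name `to_extended_edgelist`, the layer labels, and the unit weight are assumptions made for illustration; they are not taken from the MultiVERSE code.

```python
def to_extended_edgelist(layers, weight=1):
    """Flatten {edge_type: [(source, target), ...]} into extended edgelist rows."""
    rows = []
    for edge_type, edges in layers.items():
        for source, target in edges:
            rows.append(f"{edge_type} {source} {target} {weight}")
    return rows


# Toy layers with hypothetical node names
layers = {
    "r1": [("n1", "n2")],
    "r2": [("n2", "n3")],
}
rows = to_extended_edgelist(layers)
print("\n".join(rows))
```

With a `networkx` graph per layer, `graph.edges()` could be passed in place of each list of pairs.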

In [2]:
dt_net_loc = '/content/drive/MyDrive/multiplex/data/raw/ChG-Miner_miner-chem-gene.tsv.gz'
dt_net_tsv_loc = '/content/drive/MyDrive/multiplex/data/interim/dt_net.tsv'

### Drug Multiplex Network

#### The Projected Drug-Target Network

In [9]:
## The drug-target network from BioSNAP
## (http://snap.stanford.edu/biodata/datasets/10002/10002-ChG-Miner.html)
## Decompress the gzipped TSV into the interim data directory
with gzip.open(dt_net_loc, 'rb') as f_in:
  with open(dt_net_tsv_loc, 'wb') as f_out:
    shutil.copyfileobj(f_in, f_out)

In [10]:
## Convert to a pandas df
dt_net_df = pd.read_csv(dt_net_tsv_loc, sep = '\t', header=0)
dt_net_df.rename(columns={'#Drug':'drug', 'Gene':'gene'}, inplace=True)
dt_net_sample_df = dt_net_df.sample(500)
dt_net_sample_df

Unnamed: 0,drug,gene
10677,DB01942,P20472
9816,DB03015,P00813
12811,DB04847,Q96KS0
6279,DB00157,P48728
4136,DB00030,Q96C24
...,...,...
13937,DB00121,P11498
7925,DB08689,P31040
14945,DB01144,P00915
12042,DB07419,P10275


In [5]:
## Convert to networkx Graph object
G1 = nx.from_pandas_edgelist(dt_net_df, source = 'drug', target = 'gene')

The Jaccard coefficient of nodes $u$ and $v$ is defined as:

$$\frac{|\Gamma(u) \cap \Gamma(v)|}{|\Gamma(u) \cup \Gamma(v)|}$$

where $\Gamma(u)$ denotes the set of neighbors of $u$.
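
A small worked example may help make the definition concrete. The sketch below computes the coefficient directly from neighbor sets; the drug and gene names are hypothetical, and returning 0 when both neighborhoods are empty is an assumed convention.

```python
def jaccard(neighbors, u, v):
    """Jaccard coefficient of u and v from a {node: set_of_neighbors} mapping."""
    union = neighbors[u] | neighbors[v]
    if not union:
        return 0.0  # assumed convention when both neighborhoods are empty
    return len(neighbors[u] & neighbors[v]) / len(union)


# Hypothetical drugs and the genes they target
targets = {
    "drugA": {"g1", "g2", "g3"},
    "drugB": {"g2", "g3", "g4"},
    "drugC": {"g5"},
}

print(jaccard(targets, "drugA", "drugB"))  # 2 shared targets / 4 distinct targets = 0.5
print(jaccard(targets, "drugA", "drugC"))  # no shared targets = 0.0
```

In the drug-target network, $\Gamma(u)$ for a drug $u$ is the set of genes it targets, so two drugs score highly when their target sets largely overlap.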

In [6]:
## Compute the Jaccard index between pairs of drugs based on the neighborhoods of each drug. 
D = list(dt_net_df['drug'].drop_duplicates().values)

drug_pairs = [(u, v) for idx, u in enumerate(D) for v in D[idx + 1:]]

coefs = nx.jaccard_coefficient(G1, drug_pairs)

In [None]:
## If the Jaccard Index between the neighborhoods of two drug nodes in the drug target network is > 0.4 then draw an edge between the two drugs.
DL4 = nx.Graph()
for u, v, p in coefs:
  if p > 0.4:
    print(f"({u}, {v}) -> {p:.8f}")
    DL4.add_edge(u, v)
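
Enumerating every drug pair grows quadratically with the number of drugs, yet a pair that shares no gene has a Jaccard index of 0 and can never pass the threshold. The sketch below is one possible alternative that only scores pairs sharing at least one gene. The function name `project_drugs` and the toy edges are hypothetical, and this is not the approach used in the cells above.

```python
from collections import defaultdict
from itertools import combinations


def project_drugs(drug_gene_edges, threshold=0.4):
    """Return {(u, v): jaccard} for drug pairs whose Jaccard index exceeds threshold."""
    targets = defaultdict(set)   # drug -> genes it targets
    drugs_of = defaultdict(set)  # gene -> drugs that target it
    for drug, gene in drug_gene_edges:
        targets[drug].add(gene)
        drugs_of[gene].add(drug)

    # Only pairs that share at least one gene can have a non-zero coefficient.
    candidates = set()
    for drugs in drugs_of.values():
        candidates.update(combinations(sorted(drugs), 2))

    edges = {}
    for u, v in candidates:
        score = len(targets[u] & targets[v]) / len(targets[u] | targets[v])
        if score > threshold:
            edges[(u, v)] = score
    return edges


# Hypothetical toy drug-gene edges
toy = [
    ("d1", "g1"), ("d1", "g2"),
    ("d2", "g1"), ("d2", "g2"),
    ("d3", "g2"), ("d3", "g3"),
    ("d4", "g9"),
]
print(project_drugs(toy))
```

Here `d1` and `d2` share both targets (index 1.0), while `d3` overlaps each of them on one of three genes (about 0.33), so only the `d1`-`d2` edge survives the 0.4 threshold.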

In [12]:
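## Serialize the projected drug network to node-link JSON for the frontend
dl4 = nx.json_graph.node_link_data(DL4)

## Use a context manager so the file handle is flushed and closed
with open("/content/drive/MyDrive/multiplex/frontend/force/projected_drug_net.json", "w") as f_out:
    json.dump(dl4, f_out)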
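
A quick sanity check on the exported file can catch an empty or truncated export before the frontend loads it. The sketch below assumes the node-link payload holds a `nodes` list and a `links` list (some `networkx` releases may name the latter `edges`); the helper name `summarize_node_link` and the sample payload are hypothetical.

```python
import json


def summarize_node_link(payload):
    """Return (node_count, edge_count) for a node-link style dictionary."""
    nodes = payload.get("nodes", [])
    # Fall back to "edges" in case the exporter used that key instead of "links".
    links = payload.get("links", payload.get("edges", []))
    return len(nodes), len(links)


# Hypothetical payload standing in for the contents of the exported JSON file
sample = json.loads(
    '{"nodes": [{"id": "drugA"}, {"id": "drugB"}],'
    ' "links": [{"source": "drugA", "target": "drugB"}]}'
)
print(summarize_node_link(sample))  # (2, 1)
```

The same helper could be applied to the result of `json.load` on the exported file and compared against `len(DL4.nodes())` and `len(DL4.edges())`.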

In [14]:

# !git add ./drive/MyDrive/multiplex/frontend/force/projected_drug_net.json



In [22]:
print(len(DL4.edges()), len(DL4.nodes()))

48593 4298


In [16]:
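## Draw the network and export to png
## NOTE: 'dot' computes a hierarchical layout, which is slow on a graph with tens of
## thousands of edges (the run recorded below was interrupted). A force-directed
## Graphviz engine such as 'sfdp' may be a better fit for a graph this size.
A1 = nx.nx_agraph.to_agraph(DL4)
A1.layout('dot')
A1.draw('/content/drive/MyDrive/multiplex/figures/drug_target_net.png')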

KeyboardInterrupt: ignored

In [8]:
# ## Set node attributes
# U1 = list(dt_net_sample_df['drug'].values)
# U1_type = list(np.repeat('Drug', len(U1)))
# U1_color = list(np.repeat('blue', len(U1)))

# V1 = list(dt_net_sample_df['gene'].values)
# V1_type = list(np.repeat('Gene', len(V1)))
# V1_color = list(np.repeat('blue', len(V1)))

# U1_attr_dict = {'id':U1, 'dgraph.type': U1_type, 'color':U1_color}
# V1_attr_dict = {'id':V1, 'dgraph.type': V1_type, 'color':V1_color}

### Human Molecular Multiplex Network
