This notebook describes how to preprocess the STRING protein-protein interaction (PPI) network that is embedded in the GCN layer.

Before you start, download the PPI network filtered to Homo sapiens (NCBI taxon 9606) and the matching protein information table from the STRING database (https://stringdb-downloads.org/download/protein.links.v12.0/9606.protein.links.v12.0.txt.gz, https://stringdb-downloads.org/download/protein.info.v12.0/9606.protein.info.v12.0.txt.gz). Decompress both files and place them in the working directory set by `setwd()` below.
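A minimal sketch of the download step in R, assuming internet access and that the files should land in the directory passed to `setwd()` below. `fetch_string()` is a hypothetical helper, not part of any package; it skips files that already exist, so re-running the notebook does not download them again.

```r
# Hypothetical helper: download the STRING v12.0 files for Homo sapiens
# (taxon 9606) into dest_dir, skipping files that are already present.
string_urls <- c(
  "https://stringdb-downloads.org/download/protein.links.v12.0/9606.protein.links.v12.0.txt.gz",
  "https://stringdb-downloads.org/download/protein.info.v12.0/9606.protein.info.v12.0.txt.gz"
)

fetch_string <- function(urls, dest_dir = ".") {
  dest <- file.path(dest_dir, basename(urls))
  for (i in seq_along(urls)) {
    if (!file.exists(dest[i])) {
      # mode = "wb" keeps the gzip archive byte-exact on Windows
      download.file(urls[i], dest[i], mode = "wb")
    }
  }
  dest
}

# Usage (assumed target directory):
# fetch_string(string_urls, "/nfs/public/xixi/scRegulate/string_ppi")
```

The cells below read the decompressed `.txt` files. As an alternative to decompressing, R's `read.table()` and `read.csv()` open gzip-compressed files transparently, so the `.txt.gz` paths could be passed to them directly.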

In [2]:
library(dplyr)
setwd('/nfs/public/xixi/scRegulate/string_ppi')

# STRING PPI

In [15]:
# Space-separated edge list; combined_score is STRING's confidence score scaled to 0-1000
ppi <- read.table('9606.protein.links.v12.0.txt', header = TRUE)
ppi

protein1,protein2,combined_score
<chr>,<chr>,<int>
9606.ENSP00000000233,9606.ENSP00000356607,173
9606.ENSP00000000233,9606.ENSP00000427567,154
9606.ENSP00000000233,9606.ENSP00000253413,151
9606.ENSP00000000233,9606.ENSP00000493357,471
9606.ENSP00000000233,9606.ENSP00000324127,201
9606.ENSP00000000233,9606.ENSP00000325266,180
9606.ENSP00000000233,9606.ENSP00000320935,181
9606.ENSP00000000233,9606.ENSP00000371175,594
9606.ENSP00000000233,9606.ENSP00000480364,154
9606.ENSP00000000233,9606.ENSP00000388107,161
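STRING's links files are generally understood to list each interaction twice, once per direction (A-B and B-A), which would make the final edge list already symmetric. That is cheap to confirm rather than assume. `is_symmetric_edgelist()` below is a hypothetical helper, shown on toy data:

```r
# Hypothetical helper: TRUE if every (a, b) pair also appears as (b, a).
is_symmetric_edgelist <- function(df, from = "protein1", to = "protein2") {
  fwd <- paste(df[[from]], df[[to]], sep = "|")
  bwd <- paste(df[[to]], df[[from]], sep = "|")
  all(bwd %in% fwd)
}

toy <- data.frame(
  protein1 = c("A", "B", "A", "C"),
  protein2 = c("B", "A", "C", "A"),
  combined_score = c(500L, 500L, 200L, 200L)
)
is_symmetric_edgelist(toy)         # TRUE
is_symmetric_edgelist(toy[1:3, ])  # FALSE: C-A is missing
```

On the real table the call is `is_symmetric_edgelist(ppi)`; it builds two string keys per row, so its memory use grows with the size of the table.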


In [16]:
# Tab-separated table mapping STRING protein IDs to gene symbols and annotations
info <- read.csv('9606.protein.info.v12.0.txt', sep = '\t')
info

X.string_protein_id,preferred_name,protein_size,annotation
<chr>,<chr>,<int>,<chr>
9606.ENSP00000000233,ARF5,180,ADP-ribosylation factor 5; GTP-binding protein involved in protein trafficking; may modulate vesicle budding and uncoating within the Golgi apparatus. Belongs to the small GTPase superfamily. Arf family.
9606.ENSP00000000412,M6PR,277,Cation-dependent mannose-6-phosphate receptor; Transport of phosphorylated lysosomal enzymes from the Golgi complex and the cell surface to lysosomes. Lysosomal enzymes bearing phosphomannosyl residues bind specifically to mannose-6-phosphate receptors in the Golgi apparatus and the resulting receptor-ligand complex is transported to an acidic prelyosomal compartment where the low pH mediates the dissociation of the complex.
9606.ENSP00000001008,FKBP4,459,"Peptidyl-prolyl cis-trans isomerase FKBP4, N-terminally processed; Immunophilin protein with PPIase and co-chaperone activities. Component of steroid receptors heterocomplexes through interaction with heat-shock protein 90 (HSP90). May play a role in the intracellular trafficking of heterooligomeric forms of steroid hormone receptors between cytoplasm and nuclear compartments. The isomerase activity controls neuronal growth cones via regulation of TRPC1 channel opening. Acts also as a regulator of microtubule dynamics by inhibiting MAPT/TAU ability to promote microtubule assembly. May [...]"
9606.ENSP00000001146,CYP26B1,512,"Cytochrome P450 26B1; Involved in the metabolism of retinoic acid (RA), rendering this classical morphogen inactive through oxidation. Involved in the specific inactivation of all-trans-retinoic acid (all-trans-RA), with a preference for the following substrates: all-trans-RA > 9-cis-RA > 13- cis-RA. Generates several hydroxylated forms of RA, including 4-OH-RA, 4-oxo-RA, and 18-OH-RA. Catalyzes the hydroxylation of carbon hydrogen bonds of atRA primarily at C-4. Essential for postnatal survival. Plays a central role in germ cell development: acts by degrading RA in the developing test [...]"
9606.ENSP00000002125,NDUFAF7,441,"Protein arginine methyltransferase NDUFAF7, mitochondrial; Arginine methyltransferase involved in the assembly or stability of mitochondrial NADH:ubiquinone oxidoreductase complex (complex I). Acts by mediating symmetric dimethylation of 'Arg-118' of NDUFS2 after it assembles into the complex I, stabilizing the early intermediate complex."
9606.ENSP00000002165,FUCA2,467,"Plasma alpha-L-fucosidase; Alpha-L-fucosidase is responsible for hydrolyzing the alpha- 1,6-linked fucose joined to the reducing-end N-acetylglucosamine of the carbohydrate moieties of glycoproteins; Belongs to the glycosyl hydrolase 29 family."
9606.ENSP00000002596,HS3ST1,307,Heparan sulfate glucosamine 3-O-sulfotransferase 1; Sulfotransferase that utilizes 3'-phospho-5'-adenylyl sulfate (PAPS) to catalyze the transfer of a sulfo group to position 3 of glucosamine residues in heparan. Catalyzes the rate limiting step in the biosynthesis of heparan sulfate (HSact). This modification is a crucial step in the biosynthesis of anticoagulant heparan sulfate as it completes the structure of the antithrombin pentasaccharide binding site.
9606.ENSP00000002829,SEMA3F,785,Semaphorin-3F; May play a role in cell motility and cell adhesion.
9606.ENSP00000003084,CFTR,1480,Cystic fibrosis transmembrane conductance regulator; Epithelial ion channel that plays an important role in the regulation of epithelial ion and water transport and fluid homeostasis. Mediates the transport of chloride ions across the cell membrane. Channel activity is coupled to ATP hydrolysis. The ion channel is also permeable to HCO(3-); selectivity depends on the extracellular chloride concentration. Exerts its function also by modulating the activity of other ion channels and transporters. Plays an important role in airway fluid homeostasis. Contributes to the regulation of the pH [...]
9606.ENSP00000003100,CYP51A1,509,"Lanosterol 14-alpha demethylase; A cytochrome P450 monooxygenase involved in sterol biosynthesis. Catalyzes 14-alpha demethylation of lanosterol and 24,25- dihydrolanosterol likely through sequential oxidative conversion of 14- alpha methyl group to hydroxymethyl, then to carboxylaldehyde, followed by the formation of the delta 14,15 double bond in the sterol core and concomitant release of formic acid. Mechanistically, uses molecular oxygen inserting one oxygen atom into a substrate, and reducing the second into a water molecule, with two electrons provided by NADPH via cytochrome P45 [...]"
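The first column of the info file is headed `#string_protein_id`. `read.csv()` defaults to `check.names = TRUE`, which rewrites that header into the syntactic name `X.string_protein_id` used in the next cell. A minimal demonstration on two rows copied from the output above (annotations shortened); passing `check.names = FALSE` keeps the original header instead:

```r
# Two rows in the layout of 9606.protein.info.v12.0.txt (annotations shortened)
lines <- c(
  "#string_protein_id\tpreferred_name\tprotein_size\tannotation",
  "9606.ENSP00000000233\tARF5\t180\tADP-ribosylation factor 5",
  "9606.ENSP00000000412\tM6PR\t277\tCation-dependent mannose-6-phosphate receptor"
)

# Default check.names = TRUE: '#string_protein_id' becomes 'X.string_protein_id'
info_default <- read.csv(text = lines, sep = "\t")
names(info_default)[1]  # "X.string_protein_id"

# check.names = FALSE keeps the header verbatim
info_raw <- read.csv(text = lines, sep = "\t", check.names = FALSE)
names(info_raw)[1]      # "#string_protein_id"
```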


In [17]:
# Lookup table: STRING protein ID -> preferred gene symbol
genenames <- setNames(info$preferred_name, info$X.string_protein_id)
genenames
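Indexing a named vector with a character vector selects elements by name, which is what the next cell relies on. Any protein ID absent from the info table comes back as `NA`. A minimal sketch with toy IDs, using a named character vector (the third ID is made up to show the `NA` case):

```r
# Named character vector: STRING protein ID -> preferred gene symbol
id_to_symbol <- setNames(
  c("ARF5", "M6PR"),
  c("9606.ENSP00000000233", "9606.ENSP00000000412")
)

# Look up symbols by ID; unname() drops the ID names from the result
query <- c("9606.ENSP00000000412", "9606.ENSP00000000233", "9606.ENSP99999999999")
symbols <- unname(id_to_symbol[query])
symbols            # "M6PR" "ARF5" NA
sum(is.na(symbols)) # 1
```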

In [18]:
# Replace STRING protein IDs with gene symbols (IDs without an entry become NA)
ppi$protein1 <- as.character(genenames[ppi$protein1])
ppi$protein2 <- as.character(genenames[ppi$protein2])
ppi

protein1,protein2,combined_score
<chr>,<chr>,<int>
ARF5,RALGPS2,173
ARF5,FHDC1,154
ARF5,ATP6V1E1,151
ARF5,CYTH2,471
ARF5,PSD3,201
ARF5,TTC9C,180
ARF5,SLC2A4,181
ARF5,GGA1,594
ARF5,RCC1L,154
ARF5,UBA52,161
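Collapsing protein IDs to gene symbols can produce repeated symbol pairs (for example, if two STRING proteins share a `preferred_name`), and those repeats may carry different scores. This is why the edge-building cell applies `unique()` after dropping the score column. A hypothetical diagnostic on toy data that counts repeated pairs, and self-pairs, which `unique()` does not remove:

```r
# Toy edge list after ID -> symbol mapping (hypothetical values):
# two different protein IDs have both been mapped to "GENE1"
mapped <- data.frame(
  protein1 = c("GENE1", "GENE1", "GENE2"),
  protein2 = c("GENE3", "GENE3", "GENE1"),
  combined_score = c(400L, 250L, 700L)
)

# Rows whose (protein1, protein2) pair already appeared earlier
n_dup_pairs <- sum(duplicated(mapped[, c("protein1", "protein2")]))
# Rows where both endpoints collapsed to the same symbol
n_self <- sum(mapped$protein1 == mapped$protein2, na.rm = TRUE)
n_dup_pairs  # 1
n_self       # 0
```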


In [19]:
# genes: the genes selected as model inputs; their order in genes.txt defines the node indices
genes <- read.table('/nfs/public/xixi/scRegulate/T2D/predict_status/genes.txt')$V1
# Map each gene name to a 0-based node index
inds <- setNames(seq_along(genes) - 1L, genes)
inds
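`inds` gives each selected gene a 0-based node index in the order the genes appear in `genes.txt`. A sketch of that mapping with hypothetical gene names, using a named integer vector:

```r
# Hypothetical gene list standing in for genes.txt
genes_toy <- c("ARF5", "CYTH2", "GGA1")

# 0-based node index per gene, in file order
node_index <- setNames(seq_along(genes_toy) - 1L, genes_toy)
node_index[c("GGA1", "ARF5")]  # GGA1 = 2, ARF5 = 0
```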

In [20]:
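# Keep interactions whose two endpoints are both model genes, dropping the score
edges <- unique(ppi[(ppi$protein1 %in% genes) & (ppi$protein2 %in% genes), 1:2])
# Replace gene symbols with their 0-based node indices
edges$protein1 <- as.character(inds[edges$protein1])
edges$protein2 <- as.character(inds[edges$protein2])
edges <- unique(edges)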
edges

,protein1,protein2
,<chr>,<chr>
50841,238,226
50860,238,19
50890,238,50
50924,238,80
51197,238,48
51208,238,11
51226,238,69
51256,238,41
51259,238,28
51261,238,13
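The cell above keeps only interactions whose two endpoints are both model genes, drops the score, and swaps gene symbols for node indices. (The leftmost numbers in its output are row names carried over from `ppi`, not data.) The same steps on hypothetical toy data, with indices kept as integers rather than strings; `write.table()` prints both the same way:

```r
# Hypothetical toy inputs standing in for `ppi` (after symbol mapping) and `genes`
ppi_toy <- data.frame(
  protein1 = c("ARF5", "CYTH2", "ARF5", "ARF5", "GGA1"),
  protein2 = c("CYTH2", "ARF5", "UBA52", "GGA1", "ARF5"),
  combined_score = c(471L, 471L, 161L, 594L, 594L)
)
genes_toy <- c("ARF5", "CYTH2", "GGA1")  # UBA52 is not a model gene
node_index <- setNames(seq_along(genes_toy) - 1L, genes_toy)

# Keep interactions whose two endpoints are both model genes, drop the score
keep <- ppi_toy$protein1 %in% genes_toy & ppi_toy$protein2 %in% genes_toy
edges_toy <- unique(ppi_toy[keep, c("protein1", "protein2")])

# Replace gene symbols with 0-based node indices
edges_toy$protein1 <- unname(node_index[edges_toy$protein1])
edges_toy$protein2 <- unname(node_index[edges_toy$protein2])
edges_toy  # protein1 = 0 1 0 2; protein2 = 1 0 2 0
```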


In [21]:
# Tab-separated edge list of 0-based node indices, with a header and without row names
write.table(edges, '/nfs/public/xixi/scRegulate/T2D/predict_status/string_filtered.txt', sep = '\t', quote = FALSE, row.names = FALSE)
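The written file is a tab-separated, two-column edge list with a `protein1`/`protein2` header and no row names. A sketch that writes a toy edge list with the same settings and reads it back as a 2 x E integer matrix, one column per edge; that layout is an assumption about what the downstream GCN code expects, since this notebook does not show it:

```r
# Toy edge list in the same layout as `edges` (hypothetical indices, stored as strings)
edges_toy <- data.frame(protein1 = c("0", "1", "0", "2"),
                        protein2 = c("1", "0", "2", "0"))

out <- tempfile(fileext = ".txt")
write.table(edges_toy, out, sep = "\t", quote = FALSE, row.names = FALSE)

readLines(out)
# "protein1\tprotein2" "0\t1" "1\t0" "0\t2" "2\t0"

# Read back (columns parse as integers) and stack into a 2 x E matrix
back <- read.delim(out)
edge_mat <- rbind(back$protein1, back$protein2)
dim(edge_mat)  # 2 4
```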