This notebook describes how to process the protein-protein interaction (PPI) network that is embedded in the GCN layer.

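Before you start, download the PPI network for Mus musculus (NCBI taxonomy ID 10090) from the STRING database (https://stringdb-downloads.org/download/protein.links.v12.0/10090.protein.links.v12.0.txt.gz), decompress it, and place it in the working directory set in the first cell. The notebook also reads the matching protein annotation file, 10090.protein.info.v12.0.txt, from the same directory.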
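As a side note, `read.table` opens a file path through `file()`, which detects gzip compression when reading, so the manual decompression step can be skipped if preferred. The sketch below illustrates this on a toy file; the rows and the temporary path are invented for illustration and are not taken from STRING.

```r
# Toy stand-in for the STRING links file; the rows below are invented.
toy <- data.frame(
  protein1 = c("10090.A", "10090.A"),
  protein2 = c("10090.B", "10090.C"),
  combined_score = c(889L, 163L),
  stringsAsFactors = FALSE
)

# Write a whitespace-separated, gzip-compressed file to a temporary location.
gz_path <- tempfile(fileext = ".txt.gz")
con <- gzfile(gz_path, "wt")
write.table(toy, con, quote = FALSE, row.names = FALSE)
close(con)

# read.table() reads the compressed file directly; no manual gunzip is needed.
ppi_toy <- read.table(gz_path, header = TRUE)
print(ppi_toy)
```

With the real download, the same call would take the path of the `.txt.gz` file instead of `gz_path`.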

In [1]:
library(dplyr)
setwd('/nfs/public/xixi/scRegulate/string_ppi')


Attaching package: ‘dplyr’

The following objects are masked from ‘package:stats’:

    filter, lag

The following objects are masked from ‘package:base’:

    intersect, setdiff, setequal, union


# STRING PPI

In [2]:
ppi <- read.table('10090.protein.links.v12.0.txt', header = TRUE)
ppi

protein1,protein2,combined_score
<chr>,<chr>,<int>
10090.ENSMUSP00000000001,10090.ENSMUSP00000027991,889
10090.ENSMUSP00000000001,10090.ENSMUSP00000137332,163
10090.ENSMUSP00000000001,10090.ENSMUSP00000041756,201
10090.ENSMUSP00000000001,10090.ENSMUSP00000075170,969
10090.ENSMUSP00000000001,10090.ENSMUSP00000110978,267
10090.ENSMUSP00000000001,10090.ENSMUSP00000049296,194
10090.ENSMUSP00000000001,10090.ENSMUSP00000105588,392
10090.ENSMUSP00000000001,10090.ENSMUSP00000121127,916
10090.ENSMUSP00000000001,10090.ENSMUSP00000081569,956
10090.ENSMUSP00000000001,10090.ENSMUSP00000001845,206


In [3]:
info <- read.csv('10090.protein.info.v12.0.txt', sep = '\t')
info

X.string_protein_id,preferred_name,protein_size,annotation
<chr>,<chr>,<int>,<chr>
10090.ENSMUSP00000000001,Gnai3,354,"Guanine nucleotide-binding protein G(i) subunit alpha; Heterotrimeric guanine nucleotide-binding proteins (G proteins) function as transducers downstream of G protein-coupled receptors (GPCRs) in numerous signaling cascades. The alpha chain contains the guanine nucleotide binding site and alternates between an active, GTP-bound state and an inactive, GDP-bound state. Signaling by an activated GPCR promotes GDP release and GTP binding. The alpha subunit has a low GTPase activity that converts bound GTP to GDP, thereby terminating the signal. Both GDP release and GTP hydrolysis are modul [...]"
10090.ENSMUSP00000000003,Pbsn,174,Probasin.
10090.ENSMUSP00000000010,Hoxb9,250,Homeobox protein Hox-B9; Sequence-specific transcription factor which is part of a developmental regulatory system that provides cells with specific positional identities on the anterior-posterior axis; Belongs to the Abd-B homeobox family.
10090.ENSMUSP00000000028,Cdc45,566,Cell division control protein 45 homolog; Required for initiation of chromosomal DNA replication; Belongs to the CDC45 family.
10090.ENSMUSP00000000049,Apoh,345,"Beta-2-glycoprotein 1; Binds to various kinds of negatively charged substances such as heparin, phospholipids, and dextran sulfate. May prevent activation of the intrinsic blood coagulation cascade by binding to phospholipids on the surface of damaged cells."
10090.ENSMUSP00000000058,Cav2,162,"Caveolin-2; May act as a scaffolding protein within caveolar membranes. Interacts directly with G-protein alpha subunits and can functionally regulate their activity. Acts as an accessory protein in conjunction with CAV1 in targeting to lipid rafts and driving caveolae formation. The Ser-36 phosphorylated form has a role in modulating mitosis in endothelial cells. Positive regulator of cellular mitogenesis of the MAPK signaling pathway. Required for the insulin-stimulated nuclear translocation and activation of MAPK1 and STAT3, and the subsequent regulation of cell cycle progression (B [...]"
10090.ENSMUSP00000000080,Klf6,318,Krueppel-like factor 6; Transcriptional activator. Binds a GC box motif. Could play a role in B-cell growth and development (By similarity); Belongs to the krueppel C2H2-type zinc-finger protein family.
10090.ENSMUSP00000000090,Cox5a,146,"Cytochrome c oxidase subunit 5A, mitochondrial; Component of the cytochrome c oxidase, the last enzyme in the mitochondrial electron transport chain which drives oxidative phosphorylation. The respiratory chain contains 3 multisubunit complexes succinate dehydrogenase (complex II, CII), ubiquinol- cytochrome c oxidoreductase (cytochrome b-c1 complex, complex III, CIII) and cytochrome c oxidase (complex IV, CIV), that cooperate to transfer electrons derived from NADH and succinate to molecular oxygen, creating an electrochemical gradient over the inner membrane that drives transmembrane [...]"
10090.ENSMUSP00000000095,Tbx2,711,T-box transcription factor TBX2; Involved in the transcriptional regulation of genes required for mesoderm differentiation. Probably plays a role in limb pattern formation. Acts as a negative regulator of PML function in cellular senescence. May be required for cardiac atrioventricular canal formation (By similarity).
10090.ENSMUSP00000000122,Ngfr,427,"Tumor necrosis factor receptor superfamily member 16; Low affinity neurotrophin receptor which can bind to mature NGF, BDNF, NTF3, and NTF4. Forms a heterodimeric receptor with SORCS2 that binds the precursor forms of NGF (proNGF), BDNF (proBDNF) and NTF3 (proNT3) with high affinity, and has much lower affinity for mature NGF and BDNF. Plays an important role in differentiation and survival of specific neuronal populations during development. Can mediate cell survival as well as cell death of neural cells. The heterodimeric receptor formed with SORCS2 plays a role in proBDNF-dependent [...]"


In [4]:
# Lookup vector: names are STRING protein IDs, values are preferred gene names.
genenames <- as.factor(info$preferred_name)
names(genenames) <- info$X.string_protein_id

In [5]:
# Replace STRING protein IDs with gene names; IDs without an annotation become NA.
ppi$protein1 <- as.character(genenames[ppi$protein1])
ppi$protein2 <- as.character(genenames[ppi$protein2])
ppi

protein1,protein2,combined_score
<chr>,<chr>,<int>
Gnai3,Rgs4,889
Gnai3,Cmtm4,163
Gnai3,Arl5a,201
Gnai3,Drd2,969
Gnai3,Grm8,267
Gnai3,Pkd1,194
Gnai3,Sstr4,392
Gnai3,Gnb4,916
Gnai3,Rgs3,956
Gnai3,Capns1,206
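
The ID-to-name conversion above relies on indexing a named vector by name. A minimal sketch of the same idea on toy data follows, using a plain named character vector instead of a factor and dropping rows whose ID has no annotation. All IDs, gene names, and the `id2name` variable are hypothetical.

```r
# Toy annotation table; IDs and names are invented for illustration.
info_toy <- data.frame(
  X.string_protein_id = c("10090.P1", "10090.P2", "10090.P3"),
  preferred_name = c("GeneA", "GeneB", "GeneC"),
  stringsAsFactors = FALSE
)
ppi_toy <- data.frame(
  protein1 = c("10090.P1", "10090.P1", "10090.P9"),  # P9 has no annotation
  protein2 = c("10090.P2", "10090.P3", "10090.P2"),
  combined_score = c(900L, 150L, 400L),
  stringsAsFactors = FALSE
)

# Named character vector: names are STRING IDs, values are gene names.
id2name <- setNames(info_toy$preferred_name, info_toy$X.string_protein_id)

# Indexing by name performs the lookup; unname() drops the ID labels.
ppi_toy$protein1 <- unname(id2name[ppi_toy$protein1])
ppi_toy$protein2 <- unname(id2name[ppi_toy$protein2])

# IDs missing from the annotation table map to NA and can be dropped.
ppi_toy <- ppi_toy[!is.na(ppi_toy$protein1) & !is.na(ppi_toy$protein2), ]
print(ppi_toy)
```

Dropping the NA rows explicitly is optional here, because the later `%in% genes` filter also excludes them.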


In [6]:
# Map each gene in genes.txt to its 0-based position in the file.
genes <- read.table('/nfs/public/xixi/scRegulate/SHAREseq/predict_lineage_pseudotime/genes.txt')$V1
inds <- as.factor(c(0:(length(genes)-1)))
names(inds) <- genes

In [7]:
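# Keep interactions whose two proteins are both in the gene list, then
# convert the gene names to their 0-based indices.
edges <- unique(ppi[(ppi$protein1 %in% genes) & (ppi$protein2 %in% genes), 1:2])
edges$protein1 <- as.character(inds[edges$protein1])
edges$protein2 <- as.character(inds[edges$protein2])
edges <- unique(edges)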
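
The filtering and index conversion can also be written with `match()`, which returns 1-based positions directly. The sketch below is an alternative formulation on toy data, not the code used in this notebook; `genes_toy` and the gene names are invented.

```r
genes_toy <- c("GeneA", "GeneB", "GeneC")  # invented stand-in for genes.txt
ppi_toy <- data.frame(
  protein1 = c("GeneA", "GeneB", "GeneA", "GeneD"),
  protein2 = c("GeneB", "GeneA", "GeneB", "GeneA"),
  stringsAsFactors = FALSE
)

# Keep edges whose endpoints are both in the gene list.
keep <- ppi_toy$protein1 %in% genes_toy & ppi_toy$protein2 %in% genes_toy
edges_toy <- ppi_toy[keep, ]

# match() returns 1-based positions; subtract 1 for 0-based node indices.
edges_toy$protein1 <- match(edges_toy$protein1, genes_toy) - 1L
edges_toy$protein2 <- match(edges_toy$protein2, genes_toy) - 1L

# Drop duplicate rows; A-B and B-A remain as two separate directed edges.
edges_toy <- unique(edges_toy)
print(edges_toy)
```

Here the indices are stored as integers rather than character strings; the file written by `write.table` looks the same either way.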

In [8]:
write.table(edges, '/nfs/public/xixi/scRegulate/SHAREseq/predict_lineage_pseudotime/string_filtered.txt', sep = '\t', quote = FALSE, row.names = FALSE)
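
A quick sanity check after writing is to read the edge list back and confirm that every index is a valid 0-based node index. The sketch below does this on toy data; `n_genes`, the toy edges, and the temporary output path are assumptions for illustration only.

```r
n_genes <- 3L  # hypothetical number of genes in genes.txt
edges_toy <- data.frame(protein1 = c(0L, 1L), protein2 = c(1L, 0L))

out_path <- tempfile(fileext = ".txt")  # temporary path for illustration
write.table(edges_toy, out_path, sep = "\t", quote = FALSE, row.names = FALSE)

# Read the file back and confirm every index lies in [0, n_genes - 1].
check <- read.table(out_path, header = TRUE, sep = "\t")
in_range <- check$protein1 >= 0 & check$protein1 < n_genes &
  check$protein2 >= 0 & check$protein2 < n_genes
stopifnot(all(in_range))
```

For the real output, `n_genes` would be `length(genes)` and `out_path` the path passed to `write.table` above.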