<a href="https://colab.research.google.com/github/zurkin1/Pathweigh/blob/master/Pathweigh.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Pathweigh Analysis.

- This notebook demonstrates pathway analysis of RNA-seq data. The input file is located in the data folder.

- The input file is named input.csv and contains next-generation RNA sequencing (RNA-seq) data.
- A sample file is provided. Please ignore its content; it is there only to demonstrate the flow.
- We need to make sure that:
<ol>
<li> The data are normalized as TPM, RPKM, or FPKM. This matters because the default behaviour in udp.py assumes that gene expression follows a negative binomial distribution. If you are familiar with Python, you can edit udp.py and select one of the other distributions provided there.
<li> The parameter is_rnaseq is set to True.
<li> Gene names follow the HUGO standard, as defined for example at https://www.genenames.org/. In general, the gene names are the ones used in the pathway definitions.
</ol>
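For readers unsure what TPM normalization means in practice, here is a minimal sketch of the standard TPM formula for a single sample. The function name `tpm`, the read counts, and the gene lengths are made up for illustration and are not part of Pathweigh.

```python
def tpm(counts, lengths_kb):
    """Convert raw read counts for one sample to TPM.

    counts:     gene -> raw read count
    lengths_kb: gene -> gene length in kilobases
    """
    # Reads per kilobase: correct each gene for its length.
    rate = {gene: counts[gene] / lengths_kb[gene] for gene in counts}
    # Scale so that the values of the sample sum to one million.
    scale = sum(rate.values()) / 1_000_000
    return {gene: r / scale for gene, r in rate.items()}


# Illustrative numbers only (not real data).
counts = {"A1BG": 120, "ZYX": 900, "ZZEF1": 300}
lengths_kb = {"A1BG": 2.0, "ZYX": 3.0, "ZZEF1": 10.0}

sample_tpm = tpm(counts, lengths_kb)
print(sample_tpm)
print(sum(sample_tpm.values()))  # TPM values of one sample sum to about 1e6
```

A quick sanity check on an input file that is supposed to be TPM-normalized is therefore that every sample column sums to roughly one million.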

## Calculate UDP per Gene.
Here we calculate the Up or Down Probability (UDP) of each gene, for all pathways.

In [2]:
%cd ..

C:\Users\danili\Downloads\Pathweigh-master


In [2]:
from udp import *

calc_udp_multi_process(is_rnaseq = True)

Tue Jul 18 08:41:00 2023 Calculate UDP, is_rnaseq: True
..................................................Tue Jul 18 08:52:12 2023 Done.


GeneSym,17-002,17-006,17-019,17-023,17-026,17-027,17-030,17-034,17-040,17-041,17-042,17-043,17-045,17-047,17-054,17-055,17-057,17-058,17-060,17-061
7a5,0.466695,0.638102,0.638102,0.466695,0.638102,0.291047,0.525962,0.466695,0.583448,0.525962,0.583448,0.466695,0.466695,0.638102,0.583448,0.525962,0.406879,0.406879,0.638102,0.466695
a1bg,0.364542,0.900828,0.415207,0.612862,0.516567,0.516567,0.466157,0.516567,0.364542,0.516567,0.773386,0.315032,0.415207,0.612862,0.466157,0.415207,0.900828,0.900828,0.612862,0.900828
a1cf,0.580718,0.403807,0.463669,0.635594,0.523055,0.635594,0.463669,0.523055,0.523055,0.580718,0.463669,0.580718,0.463669,0.463669,0.580718,0.523055,0.523055,0.635594,0.403807,0.635594
a26c3,0.396579,0.455696,0.338548,0.725774,0.514578,0.571994,0.230997,0.678348,0.455696,0.455696,0.725774,0.626874,0.725774,0.514578,0.282937,0.626874,0.725774,0.725774,0.455696,0.678348
a2bp1,0.738256,0.579151,0.518831,0.456669,0.579151,0.738256,0.518831,0.738256,0.579151,0.456669,0.636414,0.689670,0.738256,0.738256,0.456669,0.518831,0.456669,0.518831,0.456669,0.579151
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
zyg11a,0.759685,0.677734,0.477553,0.720547,0.677734,0.423564,0.795015,0.423564,0.477553,0.477553,0.795015,0.477553,0.530768,0.795015,0.423564,0.423564,0.530768,0.795015,0.530768,0.631521
zyg11b,0.629372,0.019689,0.735585,0.900154,0.657508,0.220309,0.900154,0.271472,0.015741,0.154089,0.271472,0.196753,0.900154,0.900154,0.570649,0.386490,0.570649,0.710713,0.759203,0.570649
zyx,0.771535,0.292669,0.835107,0.902406,0.513309,0.717255,0.902406,0.636005,0.531031,0.395290,0.902406,0.031803,0.422692,0.036859,0.185028,0.902406,0.301318,0.134973,0.098462,0.706885
zzef1,0.129505,0.086441,0.844376,0.458051,0.858779,0.099526,0.204638,0.273043,0.858779,0.812473,0.794949,0.323227,0.691918,0.017682,0.844376,0.858779,0.844376,0.249228,0.204638,0.858779
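To give a feel for how a per-gene probability between 0 and 1 can arise from a negative binomial assumption, here is a rough sketch. It is not the code in udp.py: the helper names, the method-of-moments fit, and the example values are all assumptions made for illustration.

```python
import math
from statistics import mean, pvariance


def nb_cdf(k, r, p):
    """P(X <= k) for a negative binomial with size r and success probability p."""
    total = 0.0
    for i in range(int(k) + 1):
        log_pmf = (
            math.lgamma(i + r) - math.lgamma(i + 1) - math.lgamma(r)
            + r * math.log(p) + i * math.log1p(-p)
        )
        total += math.exp(log_pmf)
    return min(total, 1.0)


def up_down_probability(values):
    """Hypothetical helper: CDF of each sample's value under a moment-fitted NB."""
    counts = [round(v) for v in values]  # the NB is defined on integers
    m = mean(counts)
    v = pvariance(counts)
    if v <= m:
        raise ValueError("the method-of-moments fit needs variance > mean")
    p = m / v            # success probability
    r = m * m / (v - m)  # size (dispersion) parameter
    return [nb_cdf(c, r, p) for c in counts]


# Made-up expression values of one gene across six samples.
expression = [3, 10, 4, 25, 7, 12]
probs = up_down_probability(expression)
print([round(x, 3) for x in probs])
```

In this sketch, samples where the gene is highly expressed relative to the others get values near 1 and weakly expressed samples get values near 0.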


## Calculate Activity For All Samples and For All Pathways.

In [3]:
import pandas as pd  # imported explicitly because pd is used directly below

from udp import *
from activity import *


udp = pd.read_csv('./data/output_udp.csv', index_col=0, encoding="utf-8")
activity_obj = path_activity(udp, is_rnaseq = True)
activity = activity_obj.calc_activity_consistency_multi_process() #Output is saved to output_activity.csv.

activity_obj.activity.head()

Tue Jul 18 09:12:10 2023 Init activity object
Tue Jul 18 09:12:11 2023 Calculate activity and consistency...
.Tue Jul 18 09:12:24 2023 Done.


path_name,17-002,17-006,17-019,17-023,17-026,17-027,17-030,17-034,17-040,17-041,17-042,17-043,17-045,17-047,17-054,17-055,17-057,17-058,17-060,17-061
1- and 2-Methylnaphthalene degradation(Kegg),0.647734,0.656807,0.557137,0.424184,0.473169,0.438492,0.478672,0.6425,0.63545,0.594516,0.56237,0.632876,0.480193,0.594516,0.53638,0.595516,0.558011,0.575818,0.64113,0.576818
1-4-Dichlorobenzene degradation(Kegg),0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3-Chloroacrylic acid degradation(Kegg),0.798822,0.817806,0.796039,0.7341,0.68915,0.723182,0.845878,0.791842,0.741356,0.741112,0.827948,0.796039,0.755424,0.741112,0.675395,0.759734,0.678203,0.835736,0.685988,0.75756
Acute myeloid leukemia(Kegg),0.359899,0.365481,0.398714,0.345536,0.344096,0.316362,0.377824,0.383984,0.460113,0.342276,0.392314,0.2915,0.337403,0.355758,0.366264,0.423215,0.361883,0.257353,0.37749,0.33395
Adherens junction(Kegg),0.233236,0.243647,0.28955,0.228519,0.178912,0.215123,0.280066,0.306218,0.269894,0.281987,0.309846,0.306717,0.294797,0.255783,0.27215,0.302786,0.247442,0.19818,0.21534,0.297161
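Some pathways, such as 1-4-Dichlorobenzene degradation above, have zero activity in every sample. Before plotting, it can be convenient to drop those rows or to keep only pathways above a mean-activity threshold. The sketch below does this with plain pandas on a small table copied from the first three samples above; the threshold of 0.5 is an arbitrary choice.

```python
import pandas as pd

# Three pathways and three samples copied from the activity table above.
activity = pd.DataFrame(
    {
        "17-002": [0.647734, 0.0, 0.233236],
        "17-006": [0.656807, 0.0, 0.243647],
        "17-019": [0.557137, 0.0, 0.289550],
    },
    index=[
        "1- and 2-Methylnaphthalene degradation(Kegg)",
        "1-4-Dichlorobenzene degradation(Kegg)",
        "Adherens junction(Kegg)",
    ],
)
activity.index.name = "path_name"

# Drop pathways whose activity is zero in every sample.
nonzero = activity.loc[(activity != 0).any(axis=1)]

# Keep only pathways whose mean activity across samples reaches the threshold.
threshold = 0.5  # arbitrary cut-off, chosen for illustration
high = nonzero.loc[nonzero.mean(axis=1) >= threshold]

print(nonzero.index.tolist())
print(high.index.tolist())
```

The same two filters can be applied to `activity_obj.activity` directly, since it has the same layout (pathways as rows, samples as columns).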


## Draw a Heatmap For Path Activity.

The UI demonstrations below were run with the following library versions:

- plotly 5.9.0
- ipywidgets 7.6.5 (plotly currently does not work with ipywidgets version 8). Other combinations may also work.
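To see which versions are installed in the current environment, a small check with the standard library can help. The helper names `installed_version` and `major` are ours; the sketch only reports what it finds and does not install anything.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of a package, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def major(version_string):
    """Return the major version number, e.g. 7 for '7.6.5'."""
    return int(version_string.split(".")[0])


for name in ("plotly", "ipywidgets"):
    print(name, installed_version(name))

widgets = installed_version("ipywidgets")
if widgets is not None and major(widgets) >= 8:
    print("ipywidgets 8 or later found; the heatmap widget below may not work.")
```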

In [4]:
import plotly.express as px
import plotly.graph_objects as go


sample_num = 0
path_id = 708708  # 1- and 2-Methylnaphthalene degradation(Kegg)

df = activity_obj.activity.transpose()

#for col in df:
#    if df[col].mean() < 0.5:
#        df.drop(col, inplace=True, axis=1)

#fig = px.imshow(df, text_auto = True, aspect = 'auto', width = 1000, height = 800)
fig = go.FigureWidget(go.Heatmap(x=df.columns, y=df.index, z=df.to_numpy()))
fig.update_layout(width = 1000, height = 1000)

#Create our callback function.
hmap = fig.data[0]
fig.layout.hovermode = 'closest'

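def update_point(trace, points, selector):
    # The heatmap x axis holds pathway names and the y axis holds sample IDs.
    path_id = points.xs[0]
    sample_num = points.ys[0]
    # Draw the activity map for the clicked pathway and sample.
    activity_obj.graphparser(path_id=path_id, sample_num=sample_num)
    #xml_result = activity_obj.xmlparser(path_id=path_id, sample_num=sample_num)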


hmap.on_click(update_point)

fig

FigureWidget({
    'data': [{'type': 'heatmap',
              'uid': 'cab899cc-387e-4f91-95cb-9e0331920c2f',
 …
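In the heatmap above, the x axis holds pathway names and the y axis holds sample IDs, so a click delivers labels rather than the numeric path_id and sample_num used in the other cells. If graphparser expects the numeric form (an assumption; check activity.py), a lookup along these lines could translate a click. The `path_ids` dictionary and `resolve_click` are hypothetical, and the click object is faked so the sketch runs without plotly.

```python
from types import SimpleNamespace

# Sample IDs in the order they appear in the activity table.
sample_ids = ["17-002", "17-006", "17-019"]

# Hypothetical lookup from pathway title to numeric ID.
# 708708 is the ID this notebook uses for the pathway below.
path_ids = {"1- and 2-Methylnaphthalene degradation(Kegg)": 708708}


def resolve_click(points):
    """Translate the clicked labels into (path_id, sample_num)."""
    path_id = path_ids[points.xs[0]]             # x axis: pathway name
    sample_num = sample_ids.index(points.ys[0])  # y axis: sample ID -> position
    return path_id, sample_num


# Stand-in for the points object that plotly passes to an on_click callback.
fake_click = SimpleNamespace(
    xs=["1- and 2-Methylnaphthalene degradation(Kegg)"],
    ys=["17-006"],
)
print(resolve_click(fake_click))
```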

## Download a KGML File to Import to Cytoscape.


- Set sample_num to the sample to export.
- Set path_id to the pathway to analyze and export. Pathway IDs are listed in the table in the README:
https://github.com/zurkin1/Pathweigh/blob/master/README.md

In [5]:
import bs4


sample_num = 0
path_id = 708708

xml_result = activity_obj.xmlparser(path_id=path_id, sample_num=sample_num)
# Name the parser explicitly so that bs4 does not have to guess one.
print(bs4.BeautifulSoup(xml_result, "lxml").prettify())

Create Kegg XML for path: 708708, sample: 0
<?xml version="1.0"?>
<!DOCTYPE pathway SYSTEM "https://www.kegg.jp/kegg/xml/KGML_v0.7.2_.dtd">
<html>
 <body>
  <pathway name="path:hsa708708" number="111" org="hsa" title="1- and 2-Methylnaphthalene degradation(Kegg)">
   <entry id="708648" link="https://www.kegg.jp/dbget-bin/www_bget?hsa:4967+hsa:708648" name="hsa:4967 hsa:fm_708637" type="protein">
    <graphics bgcolor="#42ECEF" fgcolor="#FF4400" height="17" name="fm_708637" type="rectangle" width="60" x="0.8885235533793071">
    </graphics>
   </entry>
   <entry id="708709" link="https://www.kegg.jp/dbget-bin/www_bget?C00158" name="708709" type="reaction">
    <graphics bgcolor="#FFF8F8" fgcolor="#FF4400" height="15" name="reaction" type="circle" width="60" x="0.8885235533793071" y="0.8885235533793071">
    </graphics>
   </entry>
   <relation entry1="708648" entry2="708709" type="PPrel">
    <subtype name="activation" value="--&gt;">
    </subtype>
   </relation>
   <entry id="708651" 
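To import the result into Cytoscape, the KGML text first has to be written to a file. The sketch below saves it and reads it back with the standard library to list entries and relations. It assumes xml_result is a string; here a trimmed, well-formed excerpt of the output above stands in for it, and the file name is our own choice.

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path

# Stand-in for xml_result: a trimmed excerpt of the output shown above.
kgml_text = """<?xml version="1.0"?>
<pathway name="path:hsa708708" number="111" org="hsa" title="1- and 2-Methylnaphthalene degradation(Kegg)">
  <entry id="708648" name="hsa:4967 hsa:fm_708637" type="protein">
    <graphics bgcolor="#42ECEF" fgcolor="#FF4400" name="fm_708637" type="rectangle"/>
  </entry>
  <entry id="708709" name="708709" type="reaction">
    <graphics bgcolor="#FFF8F8" fgcolor="#FF4400" name="reaction" type="circle"/>
  </entry>
  <relation entry1="708648" entry2="708709" type="PPrel">
    <subtype name="activation" value="--&gt;"/>
  </relation>
</pathway>
"""

# Write the KGML to disk (a temporary folder here; use any path you like).
out_dir = Path(tempfile.mkdtemp())
kgml_path = out_dir / "pathway_708708_sample_0.xml"
kgml_path.write_text(kgml_text, encoding="utf-8")

# Read it back and summarize the content as a sanity check.
root = ET.parse(kgml_path).getroot()
entries = {entry.get("id"): entry.get("type") for entry in root.iter("entry")}
relations = [
    (rel.get("entry1"), rel.get("entry2"), rel.find("subtype").get("name"))
    for rel in root.iter("relation")
]

print(root.get("title"))
print(entries)
print(relations)
```

In the notebook, replace `kgml_text` with `xml_result` and point `kgml_path` at the location you want to load from Cytoscape.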





## Draw a Pathway Activity Map.

We can select a specific sample and a specific pathway and draw the activity map.

In [6]:
sample_num = 0
path_id = 708708

activity_obj.graphparser(path_id, sample_num)

## Troubleshooting
- Problems have been reported when running in PyCharm because of multiprocessing issues. Running from the command line or in Colab works without problems.
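
The exact cause of the PyCharm problem is not known to us. One common source of multiprocessing trouble when code runs as a script, particularly on Windows where worker processes start by re-importing the main module, is a missing `if __name__ == "__main__":` guard. A generic sketch of the pattern follows; `square` and `main` are stand-in names, not Pathweigh functions.

```python
import multiprocessing as mp


def square(x):
    """Stand-in worker; in Pathweigh the real work happens inside udp.py."""
    return x * x


def main():
    # Start worker processes only from inside main(), never at import time.
    with mp.Pool(processes=2) as pool:
        return pool.map(square, range(5))


if __name__ == "__main__":
    # Without this guard, each worker that re-imports the script
    # would try to start its own pool.
    print(main())
```

When calling the multi-process functions of this notebook from a script, placing the calls inside such a guard is worth trying first.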