# Functional analysis of gene signatures

In this notebook we analyze the differentially expressed genes that DESeq2 found in the bulk RNA-seq data for our case study.

Two replicates of Exercise Tumor (ET) RNA-seq samples were compared with two replicates of Non-Exercise Tumor (NT) RNA-seq samples.

The DESeq2 analysis produced the file resLFC_df.tsv, which we use here.
The Seurat analysis of scRNA-seq data ...
The Scanpy analysis of scRNA-seq data ...

In [2]:
pwd

/mnt/storage/r0974221/jupyternotebooks/assignment_1


In [3]:
mkdir -p functional_analysis_gene_signatures

In [4]:
cd functional_analysis_gene_signatures/

In [5]:
pwd

/mnt/storage/r0974221/jupyternotebooks/assignment_1/functional_analysis_gene_signatures


In [2]:
# Full path to the DESeq2 results from the bulk RNA-seq analysis
# /home/luna.kuleuven.be/r0974221/data/jupyternotebooks/assignment_1/bulk_RNA_seq/resLFC_df.tsv

In [15]:
cp /home/luna.kuleuven.be/r0974221/data/jupyternotebooks/assignment_1/bulk_RNA_seq/resLFC_df.tsv .

In [21]:
# Renaming file to deseq.results.tsv
mv resLFC_df.tsv deseq.results.tsv

In [6]:
ls

deseq.results.tsv


In [7]:
head deseq.results.tsv

Genes	baseMean	log2FoldChange	lfcSE	stat	pvalue	padj
Sox17	17.9218351322995	0.0567175928080931	0.14291740155147	-0.122261169761646	0.902692183280413	0.999847817557225
Mrpl15	196.706977153318	0.17245659275457	0.159496849929938	1.1349202599006	0.256408730339664	0.999847817557225
Lypla1	663.927686987207	0.0880983772182862	0.15272993090251	0.759890974281584	0.447319757132613	0.999847817557225
Tcea1	89.189070084373	-0.0562815738874985	0.167276006318027	-0.387324493824752	0.69851599451619	0.999847817557225
Atp6v1h	301.397408016249	0.0991751933429822	0.155904291214187	0.711309724268578	0.476892326963495	0.999847817557225
Rb1cc1	236.936023938396	0.000325921275590987	0.159928431890846	0.0690067186636276	0.94498427153017	0.999847817557225
4732440D04Rik	7.87055446656268	-0.0289689672155057	0.118104262515928	-0.26707065561488	0.789414760647489	0.999847817557225
Pcmtd1	447.006175752713	0.143614785641583	0.14910674857427	1.04452662085504	0.296241809355098	0.999847817557225
Gm9826	5.79032460210304	-0

In [9]:
head -1 deseq.results.tsv
grep -n Tff3 deseq.results.tsv
grep -n H19 deseq.results.tsv
grep -n BC023105 deseq.results.tsv

Genes	baseMean	log2FoldChange	lfcSE	stat	pvalue	padj
11188:Tff3	244.759418632065	1.09422282431597	0.167775382888266	3.35222619119848	0.000801644939982028	0.900247267599817
5777:H19	181.233659692708	0.961629939863601	0.167615076333602	3.48929091335083	0.00048430376888486	0.664733828559408
11784:BC023105	44.1189418047297	-0.597018737290996	0.155957722732115	-4.24420707861395	2.19367709086422e-05	0.0677462327586143


## Use arbitrary thresholds to create lists of up- and down-regulated genes
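* Careful: many genes have no detected expression and carry NA in the logFC column. awk compares a non-numeric value such as "NA" as text rather than as a number, so it can slip through a numeric threshold; the filter therefore requires that column 3 ($3) is not "NA".
* We use awk to filter this file, selecting only rows that pass our thresholds: padj (column 7, "\$7") lower than a threshold and, for the up- and down-regulated lists, logFC (column 3, "\$3") above or below a threshold.
* print on its own prints all the columns of the rows that fulfill our requirements; print \$1, \$3 prints only the gene name and the logFC.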
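
The filter described above can be tried on a small toy table first. The gene names and values below are made up for illustration; only the seven-column layout follows deseq.results.tsv. As assumptions beyond the original command, this sketch also sets the tab separator explicitly, skips the header with NR > 1, and excludes rows where padj is NA.

```shell
# Build a toy table (made-up values) with the same columns as deseq.results.tsv
toy=$(mktemp)
{
  printf 'Genes\tbaseMean\tlog2FoldChange\tlfcSE\tstat\tpvalue\tpadj\n'
  printf 'GeneA\t100\t1.5\t0.2\t7.5\t0.000001\t0.001\n'
  printf 'GeneB\t0\tNA\tNA\tNA\tNA\tNA\n'
  printf 'GeneC\t50\t-0.4\t0.1\t-4.0\t0.00006\t0.03\n'
  printf 'GeneD\t80\t0.2\t0.2\t1.0\t0.3\tNA\n'
} > "$toy"

# Keep rows with a non-NA log2FoldChange and padj < 0.05;
# print the gene name (column 1) and the log2FoldChange (column 3)
sig=$(awk -F'\t' 'NR > 1 && $3 != "NA" && $7 != "NA" && $7 + 0 < 0.05 {print $1, $3}' "$toy")
echo "$sig"

rm -f "$toy"
```

In this toy table only GeneA and GeneC pass: GeneB has no log2FoldChange and GeneD has no padj. Adding `+ 0` forces a numeric comparison on the padj column.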

In [20]:
awk '$3 != "NA" && $7 < 0.05 {print $1, $3}' deseq.results.tsv

Dbp 0.600911358130066
Meg3 -0.358620106642355
Rian -0.412696152907712


## p-value 
__Because only three genes were significant at an adjusted p-value (padj, column 7) below 0.05, I decided to filter on the unadjusted p-value (pvalue, column 6) instead so that I could continue the analysis.__
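
To make the comparison behind this choice explicit, the number of genes passing each cutoff can be counted in a single pass. This is a minimal sketch on a made-up toy table (hypothetical gene names and values), assuming the same column order as deseq.results.tsv.

```shell
# Toy table (made-up values) in the deseq.results.tsv column layout
toy2=$(mktemp)
{
  printf 'Genes\tbaseMean\tlog2FoldChange\tlfcSE\tstat\tpvalue\tpadj\n'
  printf 'GeneA\t100\t1.5\t0.2\t7.5\t0.000001\t0.001\n'
  printf 'GeneB\t40\t0.6\t0.2\t3.0\t0.003\t0.20\n'
  printf 'GeneC\t50\t-0.4\t0.1\t-4.0\t0.00006\t0.03\n'
  printf 'GeneD\t80\t0.2\t0.2\t1.0\t0.3\t0.99\n'
  printf 'GeneE\t0\tNA\tNA\tNA\tNA\tNA\n'
} > "$toy2"

# Count genes below 0.05 for the unadjusted and the adjusted p-value
counts=$(awk -F'\t' '
  NR == 1 || $3 == "NA" { next }          # skip the header and genes without a log2FC
  $6 != "NA" && $6 + 0 < 0.05 { p++ }     # unadjusted p-value (column 6)
  $7 != "NA" && $7 + 0 < 0.05 { padj++ }  # adjusted p-value (column 7)
  END { printf "pvalue<0.05: %d\npadj<0.05: %d\n", p, padj }
' "$toy2")
echo "$counts"

rm -f "$toy2"
```

Pointing the same awk program at deseq.results.tsv instead of the toy file would report both counts for the real data.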

In [21]:
awk '$3 != "NA" && $6 < 0.05 {print $1, $3}' deseq.results.tsv

Hs6st1 0.317317153224412
Npas2 -0.3435393766771
Col5a2 -0.261757199863888
Tns1 0.32168911596354
Dock10 -0.301635287553384
Per2 0.473390967679951
Bcl2 -0.225004035036166
AA986860 0.302367834851354
Etnk2 0.494579448549569
Adora1 0.258688157845901
Ppfia4 -0.232059142614159
Ptprc -0.323479723956436
Cdc73 -0.329039442322891
Fam129a -0.451042458934577
Atf3 0.47595585576124
B3galt1 -0.344469080558399
Slc20a1 0.437034879493215
Sirpa -0.407885710354397
Wfdc2 0.16278699137289
Zbp1 -0.334341486921642
Was -0.253976146690579
Maoa 0.233380659769643
Eda 0.0636793647564353
5530601H04Rik -0.29255392603387
Armcx4 -0.240666035902933
G530011O06Rik -0.147299083861955
Slc2a2 0.404169048907598
Arhgef26 0.302497501036556
Rarres1 0.52824607528542
Tdo2 0.317013860332833
Gatb 0.251403327837179
Pklr 0.352180760559563
S100a10 0.282799841048424
Fmo5 0.312745086011498
Tspan2 -0.232898049660319
Gstm3 0.32371010812901
Elovl6 0.326912409230262
Gbp5 -0.320221722388502
Gbp7 -0.316539061965027
Gbp2b -0.220242102948031
Gbp

## Upregulated Genes 

In [23]:
awk '$3 != "NA" && $3 > 1 && $6 < 0.05 {print $1}' deseq.results.tsv > up-logFC1-p05.txt

## Downregulated Genes

In [24]:
awk '$3 != "NA" && $3 < -1 && $6 < 0.05 {print $1}' deseq.results.tsv > down-logFC1-p05.txt

Count the number of up- and down-regulated genes

In [25]:
wc -l up-logFC1-p05.txt
wc -l down-logFC1-p05.txt

1 up-logFC1-p05.txt
0 down-logFC1-p05.txt
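
Both lists can also be written in a single pass, with the thresholds passed in as awk variables so they are easy to change. This is a sketch on a made-up toy table. The file names up.txt and down.txt and the variable names lfc and p are illustrative choices, not part of the original notebook.

```shell
# Toy table (made-up values) in the deseq.results.tsv column layout
work=$(mktemp -d)
{
  printf 'Genes\tbaseMean\tlog2FoldChange\tlfcSE\tstat\tpvalue\tpadj\n'
  printf 'GeneA\t100\t1.5\t0.2\t7.5\t0.000001\t0.001\n'
  printf 'GeneB\t60\t-1.2\t0.3\t-4.0\t0.01\t0.30\n'
  printf 'GeneC\t20\t-2.0\t0.9\t-1.3\t0.2\t0.90\n'
  printf 'GeneD\t80\t0.4\t0.1\t4.0\t0.001\t0.10\n'
  printf 'GeneE\t0\tNA\tNA\tNA\tNA\tNA\n'
  printf 'GeneF\t35\t2.3\t0.8\t2.9\t0.02\t0.40\n'
} > "$work/toy.tsv"

# Create both output files up front so wc works even when a list stays empty
: > "$work/up.txt"
: > "$work/down.txt"

# One pass: drop the header, NA rows and rows with pvalue >= p,
# then route gene names by the sign and size of the log2FoldChange
awk -F'\t' -v lfc=1 -v p=0.05 -v up="$work/up.txt" -v down="$work/down.txt" '
  NR == 1 || $3 == "NA" || $6 == "NA" { next }
  $6 + 0 >= p { next }
  $3 + 0 >  lfc { print $1 > up }
  $3 + 0 < -lfc { print $1 > down }
' "$work/toy.tsv"

# tr strips the padding that some wc implementations add
n_up=$(wc -l < "$work/up.txt" | tr -d ' ')
n_down=$(wc -l < "$work/down.txt" | tr -d ' ')
echo "up: $n_up, down: $n_down"

rm -rf "$work"
```

With the toy values, GeneA and GeneF land in the up list and GeneB in the down list. GeneC fails the p-value cutoff and GeneD fails the fold-change cutoff.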


## Mousemine.org