# Analysis of A. thaliana RNA-Seq data with pyrpipe 
Use public A. thaliana RNA-Seq data to assemble transcripts.

In [1]:
from pyrpipe import sra,mapping,assembly,qc,tools
#First get the SRR accessions of the runs. For this, one can use the Python package pysradb or the R package SRAdb
#I will consider the following randomly selected accessions
#athalRuns=['SRR976159','SRR978411','SRR978410','SRR971778','SRR1058116','SRR1058118','SRR1058121','SRR1058110','SRR1058120','SRR1058117','SRR1104134','SRR1104133','SRR1104135','SRR1104136','SRR1105825']
athalRunsSmol=['SRR976159','SRR978411','SRR971778']

Logs will be written to /home/usingh/work/urmi/hoap/pyrpipe/examples/pyrpipe_logs/2019-12-09-17_52_17_pyrpipeCMD.log, /home/usingh/work/urmi/hoap/pyrpipe/examples/pyrpipe_logs/2019-12-09-17_52_17_pyrpipeOUT.log, /home/usingh/work/urmi/hoap/pyrpipe/examples/pyrpipe_logs/2019-12-09-17_52_17_pyrpipeERR.log, /home/usingh/work/urmi/hoap/pyrpipe/examples/pyrpipe_logs/2019-12-09-17_52_17_pyrpipeEnv.log


## Download data and create SRA objects
First, we download all the data to disk and keep the SRA objects in memory.

In [2]:
#set your working directory if you don't want to use the current working directory
workingDir="/home/usingh/work/urmi/hoap/test/athalData/sraData"
#download all data in athalRunsSmol
sraObjects=[]

for x in athalRunsSmol:
    thisSraOb=sra.SRA(x,workingDir)
    if thisSraOb.downloadSRAFile():
        sraObjects.append(thisSraOb)
    else:
        print("Download failed:"+x)

print("Following runs downloaded:")
for ob in sraObjects:
    print(ob.srrAccession)

Creating SRA: SRR976159
Downloading SRR976159 ...
$ prefetch -O /home/usingh/work/urmi/hoap/test/athalData/sraData/SRR976159 SRR976159
Time taken:0:00:01.433999
Downloaded file: /home/usingh/work/urmi/hoap/test/athalData/sraData/SRR976159/SRR976159.sra 381.9 MB
Creating SRA: SRR978411
Downloading SRR978411 ...
$ prefetch -O /home/usingh/work/urmi/hoap/test/athalData/sraData/SRR978411 SRR978411
Time taken:0:00:00.494305
Downloaded file: /home/usingh/work/urmi/hoap/test/athalData/sraData/SRR978411/SRR978411.sra 347.2 MB
Creating SRA: SRR971778
Downloading SRR971778 ...
$ prefetch -O /home/usingh/work/urmi/hoap/test/athalData/sraData/SRR971778 SRR971778
Time taken:0:00:00.579811
Downloaded file: /home/usingh/work/urmi/hoap/test/athalData/sraData/SRR971778/SRR971778.sra 434.4 MB
Following runs downloaded:
SRR976159
SRR978411
SRR971778
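
The download loop above gives up on a run after a single failed attempt. On a flaky connection it may help to retry before reporting a failure. The sketch below shows one way to do that; `download_with_retries` and `fake_download` are hypothetical names introduced for illustration, and the fake downloader only simulates a transient failure so the example runs without network access.

```python
import time

def download_with_retries(download, accession, attempts=3, wait_seconds=0.0):
    """Call download(accession) until it returns a truthy value or attempts run out."""
    for attempt in range(1, attempts + 1):
        if download(accession):
            return True
        print(f"Attempt {attempt} failed for {accession}")
        if attempt < attempts:
            time.sleep(wait_seconds)  # pause before the next try
    return False

# Hypothetical stand-in for a real download call:
# fails once for one accession, then succeeds.
_failed_once = set()

def fake_download(accession):
    if accession == "SRR978411" and accession not in _failed_once:
        _failed_once.add(accession)
        return False
    return True

runs = ["SRR976159", "SRR978411", "SRR971778"]
downloaded = [r for r in runs if download_with_retries(fake_download, r)]
print(downloaded)
```

In the notebook, the `download` argument could be a small function that wraps the `downloadSRAFile()` call used in the cell above, since that call is already used there as a success/failure test.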


## Saving current session
I downloaded the SRA files first because **in a typical HPC setting, one might have access to special data-transfer nodes**. These nodes can be used to download data efficiently but do not allow expensive computations. Data can also be downloaded from compute nodes, **but then you will burn much of your computing time/allocation just downloading the data**. It is therefore a good idea to download the data separately and then start the processing.

We can save the objects created with pyrpipe and restore our session later on a compute node.

In [3]:
# save current session
from pyrpipe import pyrpipe_utils
pyrpipe_utils.savePyrpipeWorkspace(filename="mySession",outDir=workingDir)

Session saved.
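
The general idea behind saving a session is to serialize the objects to disk on one node and load them back on another. The sketch below illustrates that idea with Python's standard `pickle` module. It is not pyrpipe's implementation: `save_session`, `load_session`, and the dictionary records are hypothetical stand-ins for the SRA objects, and the temporary directory stands in for `workingDir`.

```python
import os
import pickle
import tempfile

def save_session(objects, path):
    """Serialize a list of objects to a file."""
    with open(path, "wb") as handle:
        pickle.dump(objects, handle)

def load_session(path):
    """Load the objects previously written by save_session."""
    with open(path, "rb") as handle:
        return pickle.load(handle)

# Temporary directory used here in place of a real working directory.
session_dir = tempfile.mkdtemp()
session_file = os.path.join(session_dir, "mySession.pkl")

# Plain dictionaries standing in for per-run objects.
records = [
    {"accession": acc, "location": os.path.join(session_dir, acc)}
    for acc in ["SRR976159", "SRR978411", "SRR971778"]
]

save_session(records, session_file)      # e.g. on the data-transfer node
restored = load_session(session_file)    # e.g. later, on a compute node
print([r["accession"] for r in restored])
```

Because the restored records carry the file locations with them, processing can resume without repeating the download step, provided both nodes see the same file system.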


## Processing SRA files

After downloading has finished, we can start processing the data.


## Convert SRA files to FASTQ
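
The conversion call in this section passes fasterq-dump options as a dictionary of flag/value pairs, using an empty string for flags that take no value. The logged command shows them arriving on the command line as `-e 8 -f -t <dir>`. As a rough illustration of how such a dictionary can be flattened into command-line tokens (this is not pyrpipe's actual code; `build_cli_args` and the `/tmp/scratch` path are hypothetical):

```python
def build_cli_args(options):
    """Flatten {flag: value} into command-line tokens; an empty value means a bare flag."""
    args = []
    for flag, value in options.items():
        args.append(flag)
        if value != "":
            args.append(str(value))
    return args

# -e: number of threads, -f: overwrite existing output, -t: temporary directory
opts = {"-e": "8", "-f": "", "-t": "/tmp/scratch"}
print(" ".join(["fasterq-dump"] + build_cli_args(opts)))
# prints: fasterq-dump -e 8 -f -t /tmp/scratch
```

Dictionaries keep insertion order in Python 3.7 and later, so the flags appear in the order they were written.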


In [8]:
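#-e 8: use 8 threads; -f: overwrite existing output; -t: directory for temporary files
#deleteSRA=True removes each .sra file after it has been converted
for ob in sraObjects:
    ob.runFasterQDump(deleteSRA=True,**{"-e":"8","-f":"","-t":workingDir})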

print("Fastq dump finished for:")
for ob in sraObjects:
    if ob.fastqFilesExistsLocally():
        print(ob.srrAccession)

Executing:fasterq-dump -e 8 -f -t /home/usingh/work/urmi/hoap/test/athalData/sraData -O /home/usingh/work/urmi/hoap/test/athalData/sraData/SRR976159 -o SRR976159.fastq /home/usingh/work/urmi/hoap/test/athalData/sraData/SRR976159/SRR976159.sra
spots read      : 4,728,806

reads read      : 9,457,612

reads written   : 9,457,612

Deleting file: /home/usingh/work/urmi/hoap/test/athalData/sraData/SRR976159/SRR976159.sra 
rm /home/usingh/work/urmi/hoap/test/athalData/sraData/SRR976159/SRR976159.sra
Executing:fasterq-dump -e 8 -f -t /home/usingh/work/urmi/hoap/test/athalData/sraData -O /home/usingh/work/urmi/hoap/test/athalData/sraData/SRR978411 -o SRR978411.fastq /home/usingh/work/urmi/hoap/test/athalData/sraData/SRR978411/SRR978411.sra
spots read      : 4,200,625

reads read      : 8,401,250

reads written   : 8,401,250

Deleting file: /home/usingh/work/urmi/hoap/test/athalData/sraData/SRR978411/SRR978411.sra 
rm /home/usingh/work/urmi/hoap/test/athalData/sraData/SRR978411/SRR978411.sra
Ex