A Nextflow pipeline to download FASTQ, SRA, and processed files from the Gene Expression Omnibus (GEO) database, a public functional genomics data repository supporting MIAME-compliant data submissions.
The pipeline was created to run on the ETH Euler cluster and relies on the cluster's Lmod environment modules and genome files, so it must be adapted before it can run on a different HPC cluster.
One or more GEO accession numbers, separated by commas.
--geo_acc
--geo_acc 'GSE129393,GSE208727,GSE54651'
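The sketch below shows one way a comma-separated --geo_acc value could be split and sanity-checked before any download starts. It is a hypothetical Python helper, not part of the pipeline: parse_geo_acc is an illustrative name, and accepting only series accessions (GSE followed by digits) is an assumption.

```python
import re

# Assumption: only GEO series accessions (e.g. GSE129393) are expected here.
GSE_PATTERN = re.compile(r"^GSE\d+$")


def parse_geo_acc(value: str) -> list[str]:
    """Split a comma-separated --geo_acc string into cleaned accession IDs."""
    # Drop surrounding whitespace and empty items (e.g. from a trailing comma).
    accessions = [item.strip() for item in value.split(",") if item.strip()]
    invalid = [acc for acc in accessions if not GSE_PATTERN.match(acc)]
    if invalid:
        raise ValueError(f"Not GEO series accessions: {', '.join(invalid)}")
    return accessions


print(parse_geo_acc("GSE129393,GSE208727,GSE54651"))
```

This helper tolerates spaces around the commas; whether the pipeline itself does is not stated, so writing the list without spaces, as in the example above, is the safe choice.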
Output directory where the files will be saved.
--outdir
--outdir /cluster/work/nme/data/josousa/project
Option to choose the file types to download from the GEO database.
--output_type
--output_type 'FastQ data'  # Default
--output_type 'SRA data'
--output_type 'FastQ + SRA data'
--output_type 'Processed data'
--output_type 'SRA metadata'
--output_type 'Processed metadata'
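Because the accepted values are fixed strings, a check like the following can catch typos early. It is a hypothetical Python helper that simply encodes the list of values above, with 'FastQ data' as the default; the name resolve_output_type is illustrative and not taken from the pipeline.

```python
from typing import Optional

# Allowed values, copied from the --output_type list above.
OUTPUT_TYPES = (
    "FastQ data",
    "SRA data",
    "FastQ + SRA data",
    "Processed data",
    "SRA metadata",
    "Processed metadata",
)
DEFAULT_OUTPUT_TYPE = "FastQ data"


def resolve_output_type(value: Optional[str] = None) -> str:
    """Return the requested output type, falling back to the default when unset."""
    if value is None:
        return DEFAULT_OUTPUT_TYPE
    if value not in OUTPUT_TYPES:
        allowed = ", ".join(repr(t) for t in OUTPUT_TYPES)
        raise ValueError(f"Unknown --output_type {value!r}; expected one of: {allowed}")
    return value
```

The values contain spaces, so they must be quoted on the command line, as in the examples above.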
Option to specify which part of the GEO record (series or samples) processed data is retrieved from.
--data_source
--data_source 'samples'  # Default
--data_source 'series'
--data_source 'both'
This option applies only to processed-data downloads. Processed data may be attached to the series as a whole or to individual samples. Allowed values are samples, series, or both (all). The option is ignored unless geofetch's 'processed' flag is set.
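The relationship between this option and geofetch can be sketched as follows. This is an illustration under assumptions: it supposes the pipeline forwards the value to geofetch's --data-source option together with --processed, and that 'both' corresponds to geofetch's 'all'. processed_data_args is a hypothetical helper; check geofetch --help for the exact option names in your installed version.

```python
# Assumed mapping from this pipeline's --data_source values to geofetch's
# --data-source choices ('both' is taken to mean geofetch's 'all').
DATA_SOURCE_TO_GEOFETCH = {"samples": "samples", "series": "series", "both": "all"}


def processed_data_args(data_source: str = "samples", processed: bool = True) -> list[str]:
    """Return the geofetch options for processed-data retrieval (hypothetical helper)."""
    if data_source not in DATA_SOURCE_TO_GEOFETCH:
        allowed = ", ".join(DATA_SOURCE_TO_GEOFETCH)
        raise ValueError(f"Unknown --data_source {data_source!r}; expected one of: {allowed}")
    if not processed:
        # geofetch ignores the data source unless processed files are requested,
        # so this sketch adds nothing in that case.
        return []
    return ["--processed", "--data-source", DATA_SOURCE_TO_GEOFETCH[data_source]]
```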
Option to pass extra arguments to geofetch.
--geofetch_args
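Extra arguments are given as a single string. A plausible way such a string becomes separate command-line arguments is POSIX-style word splitting, shown here with Python's shlex module. build_geofetch_command is a hypothetical helper, the base command geofetch -i ACCESSION is assumed, and the pipeline's real command line may contain other options.

```python
import shlex


def build_geofetch_command(accession: str, extra_args: str = "") -> list[str]:
    """Assemble a geofetch argument list, appending user-supplied extra arguments."""
    cmd = ["geofetch", "-i", accession]
    # shlex.split honours shell-style quoting, so a quoted value stays one argument.
    cmd.extend(shlex.split(extra_args))
    return cmd


print(build_geofetch_command("GSE129393", "--just-metadata"))
```

For example, --geofetch_args '--just-metadata' would add that single flag. Quote the whole string so the shell passes it to Nextflow as one value.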
Option to pass extra arguments to sradownloader.
--sradownloader_args
The sradownloader package was modified to download files with Axel instead of connecting to the ENA FTP server, because our HPC cluster does not allow connections to that FTP server. If you wish to use the pipeline outside our group, replace sradownloader_axel with sradownloader in the module sradownloader.mod.nf.
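The substitution described above is a plain text replacement, so it can be scripted. This is a Python sketch: swap_downloader and patch_module are hypothetical names, the location of sradownloader.mod.nf in your checkout is not specified here, and the stock sradownloader must be available on your system. Every occurrence of sradownloader_axel in the file is replaced, so review the resulting diff.

```python
from pathlib import Path

OLD_NAME = "sradownloader_axel"  # Axel-based variant used on Euler
NEW_NAME = "sradownloader"       # stock sradownloader


def swap_downloader(text: str) -> tuple[str, int]:
    """Replace every reference to the Axel variant and report how many were found."""
    count = text.count(OLD_NAME)
    return text.replace(OLD_NAME, NEW_NAME), count


def patch_module(module_path: str) -> int:
    """Rewrite the module file in place; returns the number of replacements made."""
    path = Path(module_path)
    new_text, count = swap_downloader(path.read_text())
    if count:
        path.write_text(new_text)
    return count
```

Calling patch_module with the path to your copy of sradownloader.mod.nf returns 0 if the file contained no reference to sradownloader_axel, in which case nothing is written.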
This pipeline was adapted from the Nextflow pipelines created by the Babraham Institute Bioinformatics Group and from the nf-core pipelines. We thank all the contributors to both projects. We also thank the Nextflow community and the nf-core community for all their help and support.