vonMeyennLab/nf_fetchgeo

Gene Expression Omnibus (GEO) Database Download Pipeline

A Nextflow pipeline to download FASTQ, SRA, and processed files from the Gene Expression Omnibus (GEO) database, a public functional genomics data repository supporting MIAME-compliant data submissions.

Pipeline steps

  1. geofetch
  2. sradownloader

Required parameters

One or more GEO accession numbers, separated by commas.

--geo_acc 'GSE129393,GSE208727,GSE54651'

Output directory where the files will be saved.

--outdir /cluster/work/nme/data/josousa/project
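A minimal Python sketch of how a wrapper script might check the two required parameters before launching the pipeline. The function names `parse_geo_acc` and `required_args` are hypothetical, and the check assumes series accessions of the form `GSE` followed by digits, as in the example above; the pipeline itself may accept other accession types.

```python
import re

# Assumption: accessions follow the "GSE<digits>" form used in the example above.
GSE_PATTERN = re.compile(r"^GSE\d+$")


def parse_geo_acc(value):
    """Split a comma-separated --geo_acc value and check each accession."""
    accessions = [item.strip() for item in value.split(",") if item.strip()]
    if not accessions:
        raise ValueError("at least one GEO accession is required")
    bad = [acc for acc in accessions if not GSE_PATTERN.match(acc)]
    if bad:
        raise ValueError("not a GSE accession: " + ", ".join(bad))
    return accessions


def required_args(geo_acc, outdir):
    """Return the two required parameters as a flat argument list."""
    accessions = parse_geo_acc(geo_acc)
    # The accessions are re-joined without spaces, matching the documented format.
    return ["--geo_acc", ",".join(accessions), "--outdir", outdir]
```

Calling `required_args('GSE129393,GSE208727,GSE54651', '/cluster/work/nme/data/josousa/project')` yields an argument list that could be appended to whatever command launches the pipeline.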

Optional parameters

  • Option to choose the file types to download from the GEO database.

    --output_type 'FastQ data' # Default
    --output_type 'SRA data'
    --output_type 'FastQ + SRA data'
    --output_type 'Processed data'
    --output_type 'SRA metadata'
    --output_type 'Processed metadata'
  • Option to specify the source of data on the GEO record to retrieve processed data.

    --data_source 'samples' # Default
    --data_source 'series'
    --data_source 'both'

    This option applies only to processed data downloads. Processed data may be attached to the collective series entity or to individual samples. Allowed values are samples, series, or both (all). The option is ignored unless processed data is requested.
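    The allowed values listed above can be checked before a run. The following is a sketch only: `optional_args` is a hypothetical helper, not part of the pipeline, and it always emits both options on the assumption that the pipeline ignores `--data_source` where it does not apply.

```python
# Allowed values copied from the option lists above.
OUTPUT_TYPES = (
    "FastQ data",
    "SRA data",
    "FastQ + SRA data",
    "Processed data",
    "SRA metadata",
    "Processed metadata",
)
DATA_SOURCES = ("samples", "series", "both")


def optional_args(output_type="FastQ data", data_source="samples"):
    """Validate the optional parameters and return them as an argument list."""
    if output_type not in OUTPUT_TYPES:
        raise ValueError("unknown --output_type: %r" % (output_type,))
    if data_source not in DATA_SOURCES:
        raise ValueError("unknown --data_source: %r" % (data_source,))
    # Both options are always emitted; the defaults match the documented ones.
    return ["--output_type", output_type, "--data_source", data_source]
```

    The values are case-sensitive in this sketch, so `'fastq data'` would be rejected even if the pipeline happens to accept it.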

Extra arguments

  • Option to pass extra arguments to geofetch.

    --geofetch_args

  • Option to pass extra arguments to sradownloader.

    --sradownloader_args

Downloading Options

The sradownloader package has been enhanced to support downloading files using the Axel download accelerator, wget, or FTP. This modification allows for faster and more flexible file downloads from the SRA database.

For detailed information and updates, visit the project's GitHub page: sradownloader-axel.

To specify the download method, add one of the following arguments to --sradownloader_args:

  • --axel for downloading with Axel.
  • --wget for downloading with wget.
  • --ftp for downloading via FTP.
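As an illustration, the value given to --sradownloader_args could be assembled as below. This is a hedged sketch: `sradownloader_args_value` is a hypothetical helper, it assumes exactly one download method is chosen per run, and it only builds the string. How that string is quoted on the command line is left to the caller.

```python
# The three download-method flags listed above.
DOWNLOAD_FLAGS = {"axel": "--axel", "wget": "--wget", "ftp": "--ftp"}


def sradownloader_args_value(method, extra=()):
    """Build the string to pass as the value of --sradownloader_args."""
    if method not in DOWNLOAD_FLAGS:
        raise ValueError(
            "method must be one of: " + ", ".join(sorted(DOWNLOAD_FLAGS))
        )
    # The method flag comes first, followed by any other sradownloader arguments.
    return " ".join([DOWNLOAD_FLAGS[method], *extra])
```

With no extra arguments, `sradownloader_args_value('axel')` returns just the flag itself.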

Acknowledgements

This pipeline was adapted from the Nextflow pipelines created by the Babraham Institute Bioinformatics Group and from the nf-core pipelines. We thank all the contributors to both projects. We also thank the Nextflow and nf-core communities for their help and support.
