Workflows

DrYak edited this page Apr 9, 2020 · 45 revisions

Welcome!

Welcome to the Workflows group! We are seeking folks with experience in Nextflow, CWL, Galaxy, Bioconda, workflows in general, and viral genome analysis, but anyone with an interest, at any experience level, is welcome!

Please first edit this Wiki page and add your contact information below.

Note: in addition to the Code of Conduct (CoC) which applies to the Virtual BioHackathon, the nf-core community also uses a separate Contributor Covenant Code of Conduct.

Communication:

Project communication for nf-core-based workflows is currently focused on Slack (you can join with this invite).

Other Workflow group communication will happen in the #workflows channel of the Virtual BioHackathon Slack.

Projects:

nf-core/viralrecon

https://github.com/nf-core/viralrecon

A workflow for analyzing Illumina sequencing data derived from amplicon and metagenomics approaches. Primary functionality involves viral genome reconstruction, low-frequency variant calling, and annotation of both SNPs and INDELs.

The following pipelines from BU-ISCIII that perform de novo assembly and mapping will be implemented in the same workflow, and will be ported to nf-core over the coming days:
https://github.com/BU-ISCIII/SARS_Cov2_consensus-nf
https://github.com/BU-ISCIII/SARS_Cov2_assembly-nf

nf-core is a community effort to collect a curated set of analysis pipelines built using Nextflow.

We are also hoping to bridge these workflows into graph assembly/pangenome workflows, and thus are seeking integration points with the Pangenome and Pangenome browser groups.

Workflow Hub: Registry of COVID-19 workflows (@stain)

Working with the ELIXIR effort, this project proposes to set up an early pre-production instance of the EOSC-Life Workflow Hub, covid19.workflowhub.eu, as a registry that gathers the COVID-19 workflows and their metadata. Part of the task is also to curate the existing workflows and help make them interoperable, reusable and reproducible.

We want to register in particular the workflows being developed elsewhere in this topic, but also ad-hoc scripts that potentially could become workflows.

For details, tasks and participants, see sub-topic Workflow Hub.

Proposal: Cloud-based bioinformatics analysis (WDL + GCP) + accelerated pangenomic workflows (@edawson)

The pangenomics channel is working on generating assembly-based pangenomes of SARS-CoV-2 genomes. Since we already have a reference genome (including a GFF file of ORF annotations), I thought it might be useful to build analysis pipeline(s) that can operate in parallel with or downstream of the assembly pangenome.

Nextstrain already does things like convert the RNA/cDNA sequences to amino acids. I was thinking we could use either their tooling or our own to produce automatically generated reports of variable sites on the genome / proteome. We could also provide these annotations as GFA paths to incorporate into the pangenome, facilitate read alignment to the reference genome / pangenome, or filter reads against viral or host references using Kraken / rkmh.

I'm most comfortable in WDL (which runs in Broad's Terra, on DNAnexus via dxWDL, and with Google's Pipelines API), but in reality we could use any of the workflow languages. I think this would be a good project for folks wanting to work in shell, WDL, Python, Docker, and certainly R as well.

Scope-wise, it's probably best to start with a single workflow that annotates variable sites, then try to build one that aligns reads and reports whether a new strain has novel variation at these (or other) sites. Filtering workflows could be a component of this workflow.
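The first two steps described above (annotating variable sites, then checking a new strain for novel variation) could look roughly like the following. This is only a minimal Python sketch, not part of any existing workflow: it assumes the genomes have already been aligned to the same length, and the function names `variable_sites` and `novel_variation` and the toy sequences are hypothetical.

```python
from collections import Counter


def variable_sites(aligned_seqs):
    """Map each variable column (0-based) to a Counter of the bases seen there.

    Assumes all sequences are pre-aligned to the same length.
    Gaps ('-') and ambiguous bases ('N') are ignored.
    """
    if not aligned_seqs:
        return {}
    length = len(aligned_seqs[0])
    if any(len(s) != length for s in aligned_seqs):
        raise ValueError("sequences must be aligned to the same length")
    sites = {}
    for pos in range(length):
        counts = Counter(s[pos].upper() for s in aligned_seqs)
        for ignored in ("-", "N"):
            counts.pop(ignored, None)
        if len(counts) > 1:  # more than one distinct base: a variable site
            sites[pos] = counts
    return sites


def novel_variation(new_seq, aligned_seqs):
    """Return {position: base} where new_seq has a base unseen in that column."""
    if any(len(s) != len(new_seq) for s in aligned_seqs):
        raise ValueError("new sequence must be aligned to the known sequences")
    novel = {}
    for pos, base in enumerate(new_seq.upper()):
        if base in ("-", "N"):
            continue
        seen = {s[pos].upper() for s in aligned_seqs}
        if base not in seen:
            novel[pos] = base
    return novel


# Toy, made-up sequences purely for illustration.
known = ["ATGCA", "ATGTA", "ACGCA"]
print(variable_sites(known))
print(novel_variation("ATGGA", known))
```

In a real workflow the alignment would come from an upstream step, and positions would be mapped onto the ORF annotations in the reference GFF file before reporting.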

Workflows:

cbg-ethz/V-pipe

https://github.com/cbg-ethz/V-pipe

A data analysis workflow for clinical applications of next-generation sequencing (Illumina) data obtained from viral genomes. V-pipe assesses data quality, performs read alignment and infers intra-sample viral genomic diversity: it reconstructs local haplotypes and calls SNVs and INDELs.

A newly adapted version of V-pipe has been released to analyze high-throughput sequencing data of SARS-CoV-2. It includes a preset default configuration and references, ShoRAH 2 with improved speed, and SNV calls reported as standard VCF files.

Visualization and reporting are currently being updated and improved.

V-pipe uses Bioconda for its components and is written using Snakemake.

V-pipe is part of the SIB resources supporting SARS-CoV-2 research.

connor-lab/ncov2019-artic-nf

https://github.com/connor-lab/ncov2019-artic-nf

A Nextflow pipeline that automates the ARTIC network nCoV-2019 novel coronavirus bioinformatics protocol. Supports barcoded and non-barcoded Nanopore data. Uses Nextflow DSLv2.

galaxyproject/SARS-CoV-2

https://github.com/galaxyproject/SARS-CoV-2

Initial analysis of COVID-19 data using Galaxy, Bioconda and public research infrastructure (XSEDE, de.NBI-cloud, ARDC cloud). Supports Illumina and Nanopore data.

No more business as usual: agile and effective responses to emerging pathogen threats require open data and open analytics

usegalaxy.org, usegalaxy.eu, usegalaxy.org.au, usegalaxy.be and hyphy.org development teams, Anton Nekrutenko, Sergei L Kosakovsky Pond.

bioRxiv 2020.02.21.959973; doi: 10.1101/2020.02.21.959973

INSaFLU/INSaFLU

https://github.com/INSaFLU/INSaFLU

INSaFLU (“INSide the FLU”) is a free, web-based, influenza-oriented bioinformatics platform for effective and timely whole-genome-sequencing-based influenza laboratory surveillance. According to the authors, this online platform can also be used for COVID-19.

INSaFLU: an automated open web-based bioinformatics suite “from-reads” for influenza whole-genome-sequencing-based surveillance

Borges V, Pinheiro M et al.

Genome Medicine (2018) 10:46; doi: 10.1186/s13073-018-0555-0

nf-core/viralrecon

https://github.com/nf-core/viralrecon

A workflow for analyzing Illumina sequencing data derived from amplicon and metagenomics approaches. Primary functionality involves viral genome reconstruction, low-frequency variant calling, and annotation of both SNPs and INDELs.

The following pipelines from BU-ISCIII that perform de novo assembly and mapping will be implemented in the same workflow, and will be ported to nf-core over the coming days:
https://github.com/BU-ISCIII/SARS_Cov2_consensus-nf
https://github.com/BU-ISCIII/SARS_Cov2_assembly-nf

nf-core is a community effort to collect a curated set of analysis pipelines built using Nextflow.

We are also hoping to bridge these workflows into graph assembly/pangenome workflows, to support the work of other biohackathon working groups.

Project communication for nf-core-based workflows is currently focused on Slack (you can join with this invite).

fjrmoreews/cwl-workflow-SARS-CoV-2

https://github.com/fjrmoreews/cwl-workflow-SARS-CoV-2

CWL workflows related to virus genomics with focus on SARS-CoV-2.

pitagora-network/COVID-19-CWL

https://github.com/pitagora-network/COVID-19-CWL

Fork from https://github.com/galaxyproject/SARS-CoV-2.

common-workflow-lab/2020-covid-19-bh

Contains Standard Operating Protocols (SOPs) for genomic variant discovery using GATK4, VarScan and SAMtools. All of these protocols are to be converted into CWL and WDL scripts. The WDL script for GATK4 is ready and can be run on SARS-CoV-2 RNA-seq reads. Conversion of the remaining SOPs to CWL and WDL is in progress. Deploying the workflows on cloud infrastructure remains as future work.

https://github.com/common-workflow-lab/2020-covid-19-bh

Resources:

Participants: