Evaluation of a workflow to build genomes from metagenomic data #33

Open
bebatut opened this issue Aug 10, 2022 · 0 comments

Supervisor: Bérénice Batut
For degree: Master
Status: Open
Keywords: Microbiome, Metagenomics, Galaxy, Assembly, Workflow, Benchmarking

Global Biological/Research context

The microbiome is the collection of all microbes, such as bacteria, fungi, and viruses, along with their genes, that live inside and on our bodies and in all the environments surrounding us [1]. To investigate microbiomes, researchers use sequencing data and microbiome analyses [2]. Such analyses rely on sophisticated computational approaches: assembly, binning, taxonomic classification, functional profiling, etc. Analysing microbiome data makes it possible to answer the two main questions of most microbiome analyses:

  • who is there (which microorganisms): by extracting the community profile from the microbiome reads
  • what they are doing (and how): by extracting gene/pathway abundance profiles from the metagenomic reads and transcript abundance profiles from the metatranscriptomic reads, then combining them

Microbiome sequencing data also makes it possible to assemble the genomes of organisms that cannot be cultivated individually (e.g. [3,4]). However, building genomes from metagenomic data (called Metagenome-Assembled Genomes, or MAGs) is complex: the data mixes sequences from many organisms, and the process requires many steps [5,6] and substantial computational resources.

Few workflows to build MAGs from this data are available (e.g. [7,8]), and most are not openly available, not transparent, or not easy for researchers to use.

Project context

The Freiburg Galaxy team, together with the microGalaxy community, uses Galaxy [9] to build a MAG-building workflow that will be open, transparent, reusable, and accessible.

This workflow has been developed with data from the cloud environment. We would now like to adapt this workflow to data from other microbiome environments, evaluate it using benchmarking data, compare it against other workflows, and document and share it.

Objectives of the project

  • Evaluate the results of the workflow on the cloud data
  • Benchmark the workflow on the CAMI challenge benchmarking data [10]
  • Document and share the workflow
    • Annotate the workflow
    • Create the skeleton for a tutorial
    • Submit the workflow to IWC
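Benchmarking on gold-standard data generally comes down to comparing the bins produced by the workflow with a known assignment of contigs to genomes. The following is only a minimal sketch of that idea, not the evaluation procedure of the project or of the CAMI challenge: the function name, the input dictionaries and the example data are all hypothetical, and contigs are counted equally, whereas a real evaluation would likely weight them by length.

```python
from collections import Counter, defaultdict


def bin_purity_and_completeness(predicted, gold):
    """Compare predicted bins with a gold standard, counting contigs.

    predicted: dict contig_id -> bin_id (hypothetical binning output)
    gold: dict contig_id -> genome_id (hypothetical gold standard)
    Returns dict bin_id -> (majority_genome, purity, completeness).
    """
    # Number of gold-standard contigs per genome
    genome_sizes = Counter(gold.values())

    # For each bin, count how many of its contigs come from each genome
    bins = defaultdict(Counter)
    for contig, bin_id in predicted.items():
        if contig in gold:
            bins[bin_id][gold[contig]] += 1

    results = {}
    for bin_id, genome_counts in bins.items():
        # The genome contributing the most contigs is taken as the bin's genome
        genome, hits = genome_counts.most_common(1)[0]
        purity = hits / sum(genome_counts.values())
        completeness = hits / genome_sizes[genome]
        results[bin_id] = (genome, purity, completeness)
    return results


# Invented toy data: two genomes (A, B) and two predicted bins
gold = {"c1": "A", "c2": "A", "c3": "A", "c4": "B", "c5": "B"}
predicted = {"c1": "bin1", "c2": "bin1", "c4": "bin1", "c5": "bin2"}
results = bin_purity_and_completeness(predicted, gold)
```

Here purity is the fraction of a bin's contigs that belong to its majority genome, and completeness is the fraction of that genome's contigs recovered in the bin. Contigs left unbinned (such as `c3`) lower completeness only.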

Proposed agenda for the project

  1. Bibliography on metagenomic assembly, MAG building, and existing workflows
  2. Get familiar with the implemented MAG-building workflow
    1. Create the skeleton of a tutorial explaining each step and selected parameters
  3. Evaluate the results of the workflow on the cloud data
    1. Aggregate and analyze the different generated quality metrics into a Jupyter notebook
    2. Run extra steps to evaluate the quality of created MAGs
  4. Benchmark the workflow on the CAMI challenge benchmarking data
    1. Run the workflow on the different datasets from the CAMI challenge
    2. Evaluate the results
  5. Share the workflow
    1. Annotate the workflow
    2. Update the dedicated tutorial
    3. Submit the workflow to IWC
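The aggregation of quality metrics mentioned in step 3 could start from something like the sketch below, which would fit in a notebook cell. It assumes a hypothetical tab-separated table with one row per MAG and the columns `bin_id`, `completeness` and `contamination` (in percent); the column names, the values and the tier thresholds are illustrative assumptions, not outputs or settings of the actual workflow.

```python
import csv
import io
from statistics import mean

# Hypothetical per-MAG quality table; column names and values are invented.
EXAMPLE_TSV = """bin_id\tcompleteness\tcontamination
bin_1\t95.2\t1.3
bin_2\t71.0\t4.8
bin_3\t42.5\t12.0
"""


def quality_tier(completeness, contamination):
    """Assign a quality tier; the thresholds are assumptions to adapt."""
    if completeness > 90 and contamination < 5:
        return "high"
    if completeness >= 50 and contamination < 10:
        return "medium"
    return "low"


def summarize(tsv_text):
    """Parse the table and return counts, mean completeness and tiers."""
    rows = list(csv.DictReader(io.StringIO(tsv_text), delimiter="\t"))
    tiers = {
        row["bin_id"]: quality_tier(
            float(row["completeness"]), float(row["contamination"])
        )
        for row in rows
    }
    return {
        "n_mags": len(rows),
        "mean_completeness": mean(float(r["completeness"]) for r in rows),
        "tiers": tiers,
    }


summary = summarize(EXAMPLE_TSV)
```

The same `summarize` function could be applied to each dataset in turn, so that results from the cloud data and from the benchmarking datasets end up in comparable summaries.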

Prerequisites

  • [Required] Concepts in bioinformatics
  • [Required] Comfortable with Python
  • [Preferred] Experience of using Galaxy
  • [Preferred] Concepts of microbiome data analysis
  • [Preferred] Experience with version control and GitHub workflow

Further reading

Galaxy

References

[1] Martin J. Blaser. “The microbiome revolution” The Journal of Clinical Investigation (2014): 124.
[2] Sharpton, Thomas J. "An introduction to the analysis of shotgun metagenomic data." Frontiers in Plant Science 5 (2014): 209.
[3] Xie, Fei, et al. "An integrated gene catalog and over 10,000 metagenome-assembled genomes from the gastrointestinal microbiome of ruminants." Microbiome 9.1 (2021): 1-20.
[4] Nishimura, Yosuke, and Susumu Yoshizawa. "The OceanDNA MAG catalog contains over 50,000 prokaryotic genomes originated from various marine environments." Scientific Data 9.1 (2022): 1-11.
[5] Chen LX, Anantharaman K, Shaiber A, Eren AM, Banfield JF (2020) Accurate and complete genomes from metagenomes. Genome Res 30(3):315–333
[6] Sieber CMK, Probst AJ, Sharrar A, Thomas BC, Hess M, Tringe SG, et al. Recovery of genomes from metagenomes via a dereplication, aggregation and scoring strategy. Nat Microbiol. 2018;3(7):836–43
[7] Kieser, Silas, et al. "ATLAS: a Snakemake workflow for assembly, annotation, and genomic binning of metagenome sequence data." BMC bioinformatics 21.1 (2020): 1-8.
[8] Raguideau, Sebastien, et al. "Novel microbial syntrophies identified by longitudinal metagenomics." bioRxiv (2021).
[9] Enis Afgan, et al. The Galaxy platform for accessible, reproducible and collaborative biomedical analyses: 2018 update, Nucleic Acids Research, Volume 46, Issue W1, 2 July 2018, Pages W537–W544, doi:10.1093/nar/gky379
[10] Meyer, Fernando, et al. "Critical Assessment of Metagenome Interpretation: the second round of challenges." Nature methods 19.4 (2022): 429-440.
