Incorporate GTDB Taxonomy v214 into curatedMetagenomicData #301

Open
camilagazolla opened this issue Dec 7, 2023 · 4 comments

@camilagazolla

Dear curatedMetagenomicData Maintainers,

Considering that GTDB taxonomy is a cornerstone for microbial genomics research, I was wondering if there is a possibility to also provide GTDB taxonomy labels, particularly the latest version (v214), within the data from the curatedMetagenomicData package.

If the incorporation is currently out of scope, could you advise on the best approach for users to translate the current MetaPhlAn3 labels to the GTDB labels?

Thank you for considering this feature request.

@lwaldron
Member

lwaldron commented Dec 8, 2023

I'm looking for guidance on the Biobakery forum: https://forum.biobakery.org/t/converting-metaphlan3-profiles-to-gtdb/6292

@wshuai294

wshuai294 commented Mar 6, 2024

I am also wondering if there is a way to convert the MetaPhlAn3 taxonomy to the GTDB taxonomy.

@lwaldron
Member

lwaldron commented Mar 6, 2024

I re-posted my question on bioBakery. Pinging @seandavi to request adding GTDB profiles to the next version (apparently available as of MetaPhlAn 4.0.6 - https://github.com/biobakery/MetaPhlAn/releases), pending investigation of how much computation it will add.

@lwaldron
Member

lwaldron commented Mar 6, 2024

I just spoke with a member of the MetaPhlAn development team. The translation tool (https://github.com/biobakery/MetaPhlAn/blob/master/metaphlan/utils/sgb_to_gtdb_profile.py) isn't implemented for MetaPhlAn3 because the mapping to GTDB is not directly n:1. For cMD4, which uses MetaPhlAn4, the mapping will be straightforward:

  1. direct substitution when the mapping is 1:1
  2. binning if the mapping is n:1 (n>1)
  3. re-normalizing to make sample sums add to 1
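
The three steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code of `sgb_to_gtdb_profile.py`: the function name `sgb_profile_to_gtdb`, the SGB identifiers, and the GTDB labels are all hypothetical, the profile is assumed to be a plain dict of relative abundances, and taxa with no GTDB mapping are assumed to be dropped before re-normalizing.

```python
from collections import defaultdict


def sgb_profile_to_gtdb(profile, sgb_to_gtdb):
    """Translate one sample's profile to GTDB labels (hypothetical helper).

    profile:     {sgb_id: relative abundance} for a single sample
    sgb_to_gtdb: {sgb_id: gtdb_label}; several SGBs may share one label (n:1)
    """
    binned = defaultdict(float)
    for sgb_id, abundance in profile.items():
        label = sgb_to_gtdb.get(sgb_id)
        if label is None:
            # Assumption: taxa without a GTDB mapping are dropped.
            continue
        # Steps 1 and 2: a 1:1 mapping is a plain substitution; an n:1
        # mapping sums (bins) all SGBs that share the same GTDB label.
        binned[label] += abundance

    # Step 3: re-normalize so the sample's abundances sum to 1.
    total = sum(binned.values())
    if total == 0:
        return {}
    return {label: abundance / total for label, abundance in binned.items()}


# Made-up identifiers and labels, for illustration only.
example_profile = {"SGB_A": 0.5, "SGB_B": 0.25, "SGB_C": 0.125, "SGB_X": 0.125}
example_mapping = {
    "SGB_A": "s__Genus1 species1",
    "SGB_B": "s__Genus2 species2",
    "SGB_C": "s__Genus2 species2",  # n:1 case: shares a label with SGB_B
}
gtdb_profile = sgb_profile_to_gtdb(example_profile, example_mapping)
```

Here `SGB_B` and `SGB_C` are binned into one GTDB label, `SGB_X` is dropped because it has no mapping, and the remaining abundances are rescaled to sum to 1.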

Let's keep this as a wishlist item for cMD4, where it will be relatively straightforward to add.


4 participants