Including locusTags along with gene names #34

Open
microsud opened this issue May 4, 2019 · 0 comments


microsud commented May 4, 2019

Hi,
In DIAMOND_analysis_counter.py, the gene products are extracted and written to a *function.tsv file. However, because gene product naming is highly inconsistent, gene names are sometimes truncated or missed, and all hypothetical proteins are lumped together as a single entry. Would it be possible to add an option that reports counts for each locus tag? This would likely also show which locus tags and co-localized genes are actively used in a given genome, and it would make linking the outputs to custom databases via locus tags more flexible downstream.
The output TSV could, for instance, contain the following fields:

| RelativeAbundance | RawCount | LocusTag | GeneName/Product |
|-------------------|----------|----------|------------------|
| 42.1377616129     | 2877037  | XX_0201  | Dehydrogenase    |

The locus tag, for instance, can help in pathway enrichment analysis by linking to KEGG orthologs.
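
To make the request concrete, here is a rough sketch of the kind of per-locus-tag counting I have in mind. It is not taken from DIAMOND_analysis_counter.py: the function name `count_by_locus_tag`, the input shape (one `(locus_tag, product)` pair per aligned read), and the reading of RelativeAbundance as a percentage of all counted reads are all assumptions made for illustration.

```python
from collections import Counter


def count_by_locus_tag(hits):
    """Aggregate read hits per locus tag.

    `hits` is a hypothetical input: an iterable of (locus_tag, product)
    pairs, one pair per aligned read. How the locus tag would be recovered
    from the DIAMOND results is not covered here.
    """
    counts = Counter()
    products = {}
    for locus_tag, product in hits:
        counts[locus_tag] += 1
        # Keep the first product name seen for each locus tag.
        products.setdefault(locus_tag, product)

    total = sum(counts.values())
    rows = []
    # most_common() yields locus tags from highest to lowest raw count.
    for locus_tag, raw_count in counts.most_common():
        # Assumption: relative abundance is a percentage of all counted reads.
        relative = 100.0 * raw_count / total
        rows.append((relative, raw_count, locus_tag, products[locus_tag]))
    return rows


def to_tsv(rows):
    """Render rows in the proposed column order, tab-separated."""
    header = "RelativeAbundance\tRawCount\tLocusTag\tGeneName/Product"
    lines = [f"{rel}\t{raw}\t{tag}\t{prod}" for rel, raw, tag, prod in rows]
    return "\n".join([header] + lines)
```

Because the counts are keyed on the locus tag and not on the product name, two different hypothetical proteins would stay on separate rows even though their product strings are identical.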

Best wishes,
Sudarshan
Disclaimer: Not a bioinformatician and pardon me if this is a trivial request.
