Including locusTags along with gene names #34

microsud · 2019-05-04T13:42:28Z

Hi,
DIAMOND_analysis_counter.py extracts the gene products and writes a *function.tsv file. However, because naming is highly inconsistent, gene names are sometimes truncated or missed, and all hypothetical proteins are lumped together as one entry. Would it be possible to add an option that reports counts for each locus tag? This would likely also show which locus tags and co-localized genes are actively used in a given genome, and it would make linking the outputs to custom databases by locus tag more flexible.
The output TSV could, for instance, contain the following fields:

| RelativeAbundance | RawCount | LocusTag | GeneName/Product |
|-------------------|----------|----------|------------------|
| 42.1377616129     | 2877037  | XX_0201  | Dehydrogenase    |

The locus tag, for instance, can help in pathway enrichment analysis by linking to KEGG orthologs.
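
To illustrate the idea, here is a minimal sketch of per-locus-tag counting in Python. It is not taken from DIAMOND_analysis_counter.py: the function names, the `hit_ids` input (one subject ID per aligned read), and the `annotations` mapping from subject ID to locus tag and product are all hypothetical, and I am assuming RelativeAbundance is the percentage of all counted hits.

```python
from collections import Counter


def count_by_locus_tag(hit_ids, annotations):
    """Return (relative_abundance, raw_count, locus_tag, product) rows.

    hit_ids: iterable of subject IDs, one per aligned read (hypothetical input).
    annotations: dict mapping subject ID -> (locus_tag, product) (hypothetical).
    """
    counts = Counter()
    products = {}
    for hit in hit_ids:
        # Hits without an annotation fall into a single placeholder bucket.
        locus_tag, product = annotations.get(hit, ("unknown", "unannotated"))
        counts[locus_tag] += 1
        products[locus_tag] = product

    total = sum(counts.values())
    rows = []
    # most_common() yields locus tags from highest to lowest raw count.
    for locus_tag, raw in counts.most_common():
        relative = 100.0 * raw / total  # assumed to be a percentage
        rows.append((relative, raw, locus_tag, products[locus_tag]))
    return rows


def format_tsv(rows):
    """Render the rows as tab-separated text with the proposed header."""
    lines = ["RelativeAbundance\tRawCount\tLocusTag\tGeneName/Product"]
    for relative, raw, locus_tag, product in rows:
        lines.append(f"{relative}\t{raw}\t{locus_tag}\t{product}")
    return "\n".join(lines)
```

Because the counts are keyed on the locus tag rather than the product name, two genes that are both annotated as "hypothetical protein" would stay on separate rows instead of being merged.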

Best wishes,
Sudarshan
Disclaimer: I am not a bioinformatician, so pardon me if this is a trivial request.

transcript added the enhancement label May 6, 2019
