How to deal with the same proteins with slightly different names from the RefSeq_bac database? #52

Open
memoll opened this issue Jul 13, 2020 · 2 comments

memoll commented Jul 13, 2020

Hi Sam,

In my statistical analysis, some of the functions from the RefSeq_bac database are being counted as different proteins only because of a small difference in their names: a dash (e.g. "(3R)-hydroxymyristoyl ACP dehydratase" and "(3R)-hydroxymyristoyl-ACP dehydratase"), a comma, or letter case (e.g. "(2fe-2S)-binding domain-containing protein" and "(2Fe-2S)-binding domain-containing protein").
Also, some others are partial or complete sequences of the same protein (e.g. "(2Fe-2S) ferredoxin" and "(2Fe-2S) ferredoxin, partial").

I wanted to know whether you correct these names in the database itself or after annotation aggregation. If so, could you please guide me on how to do it?

-Mona

transcript (Owner)

Hi Mona,

Currently, I don't have a correction for this. I've considered it, but there's only a limited amount that I can do to counteract the variety of naming conventions used in different RefSeq entries. I'm hesitant to force uppercase or lowercase, as this may obscure some names.

If you can provide me with as many examples as possible, I could probably develop a script that would run on the results of a search to "sanitize" them (with a warning that there may be some loss of information as it attempts to correct them), but I don't see a way to correct this in the RefSeq database itself.
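A minimal Python sketch of the kind of post-search "sanitize" step described above might look like the following. It assumes the aggregated annotation counts have already been loaded into a dict that maps each RefSeq protein name to its read count. The function names `normalize_name` and `merge_counts` are hypothetical and are not part of any existing script. The sketch builds a lossy comparison key by dropping a trailing ", partial", case-folding, and collapsing hyphens, commas, and whitespace into single spaces. It then sums the counts that share a key and labels each merged group with its most frequent original spelling.

```python
import re
from collections import Counter
from typing import Dict

# Hypothetical rule: treat "<name>, partial" as the same protein as "<name>".
_PARTIAL_SUFFIX = re.compile(r",\s*partial\s*$", re.IGNORECASE)


def normalize_name(name: str) -> str:
    """Return a lossy comparison key for a RefSeq protein name (hypothetical helper)."""
    key = _PARTIAL_SUFFIX.sub("", name.strip())
    key = key.casefold()  # ignore letter case, e.g. "2fe" vs "2Fe"
    # Treat hyphens, commas and runs of whitespace as one separator.
    key = re.sub(r"[-,\s]+", " ", key)
    return key.strip()


def merge_counts(counts: Dict[str, int]) -> Dict[str, int]:
    """Sum counts whose names share a normalized key (hypothetical helper).

    Each merged group is labeled with its most frequent original spelling;
    ties go to the spelling encountered first.
    """
    totals: Counter = Counter()
    spellings: Dict[str, Counter] = {}
    for name, n in counts.items():
        key = normalize_name(name)
        totals[key] += n
        spellings.setdefault(key, Counter())[name] += n
    return {spellings[key].most_common(1)[0][0]: total for key, total in totals.items()}


if __name__ == "__main__":
    example = {
        "(2Fe-2S) ferredoxin": 5,
        "(2Fe-2S) ferredoxin, partial": 2,
        "(2fe-2S)-binding domain-containing protein": 1,
        "(2Fe-2S)-binding domain-containing protein": 4,
    }
    print(merge_counts(example))
    # → {'(2Fe-2S) ferredoxin': 7, '(2Fe-2S)-binding domain-containing protein': 5}
```

Case-folding is exactly the kind of lossy step mentioned above. Two names that differ only in case would be merged even if that difference were meaningful, so it is worth inspecting the merged groups before using them in a statistical analysis.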

If you have suggestions, please let me know.

memoll (Author) commented Jul 16, 2020

Hi Sam,

First of all, I'd like to thank you for your quick responses.

I am currently looking into other databases to see if I can get better annotation results for soil organisms and functions. But I'll try to go through my previous annotations and find more examples ASAP.
