Could vg annotate be a bit more flexible with GFF files? #2550

Open
LilithElina opened this issue Nov 27, 2019 · 2 comments

Comments

@LilithElina

This is a suggestion, not a bug or support request.

I tried using GFF files I downloaded from a species-specific database to annotate a graph, but after running vg annotate and vg augment, vg paths -L still listed only the paths of the sequences I used to construct the graph. After some meddling with the GFF file, I realised that the tags in the attributes column were mostly lowercase, while the official GFF specification starts the tags with an uppercase letter. So I changed "name" to "Name" and suddenly I got a well-annotated graph.

I know the problem lies with incorrectly formatted GFF files, but maybe you could consider making vg annotate a little more flexible so that it can cope with formatting errors like this, or have it give some kind of feedback to let users know that there was nothing to annotate.
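For anyone hitting the same problem, the manual fix described above can be scripted. The following is only a minimal sketch, not part of vg: the helper names `normalize_attributes` and `normalize_line` are made up for illustration, and it assumes a tab-separated, nine-column GFF3 file with `tag=value` pairs separated by semicolons in the last column. It rewrites the reserved GFF3 tags (such as `Name`, `ID`, `Parent`) to their official capitalisation and leaves every other tag untouched.

```python
# Reserved GFF3 attribute tags, keyed by their lowercase form.
RESERVED_TAGS = {t.lower(): t for t in (
    "ID", "Name", "Alias", "Parent", "Target", "Gap", "Derives_from",
    "Note", "Dbxref", "Ontology_term", "Is_circular")}


def normalize_attributes(attributes):
    """Restore the official capitalisation of reserved tags in column 9."""
    fixed = []
    for field in attributes.split(";"):
        tag, sep, value = field.partition("=")
        # Non-reserved tags are kept exactly as they were written.
        canonical = RESERVED_TAGS.get(tag.strip().lower(), tag)
        fixed.append(canonical + sep + value)
    return ";".join(fixed)


def normalize_line(line):
    """Fix one GFF line; comments, blank lines and odd lines pass through."""
    if line.startswith("#") or not line.strip():
        return line
    newline = "\n" if line.endswith("\n") else ""
    columns = line.rstrip("\n").split("\t")
    if len(columns) != 9:
        return line
    columns[8] = normalize_attributes(columns[8])
    return "\t".join(columns) + newline


# Made-up example record with a lowercase "name" tag.
record = "chr1\tdb\tgene\t1\t100\t.\t+\t.\tID=g1;name=dnaA\n"
print(normalize_line(record), end="")
```

Applying `normalize_line` to every line of the file and writing the result to a new GFF should give the same effect as the manual edit, at least for files that follow the layout assumed here.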

@jonassibbesen
Contributor

Hi, thank you for the suggestion. The problem is that the names of the alignments are set to an empty string when the attribute tag "Name" cannot be found. The graph is still augmented correctly using these alignments, but all annotations are added as a single path with no name. For some reason, it seems that this unnamed path goes missing and is added to another path in the xg index. I have opened a separate issue about this in the xg repo (vgteam/xg#29). I will add a warning to annotate for when the attribute tag "Name" cannot be found.
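Until such a warning exists, a quick pre-check outside vg could look something like the sketch below. This is not vg code: `count_missing_name` is a hypothetical helper, and it assumes a tab-separated, nine-column GFF file whose last column holds semicolon-separated `tag=value` pairs. It simply counts the feature lines that lack an exact, case-sensitive "Name" tag.

```python
def count_missing_name(lines, tag="Name"):
    """Return (feature lines seen, feature lines without the given tag)."""
    total = missing = 0
    for line in lines:
        # Skip comments, directives and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        columns = line.rstrip("\n").split("\t")
        if len(columns) != 9:
            continue
        total += 1
        # Collect the tag names in the attributes column, case-sensitively.
        tags = {field.partition("=")[0].strip()
                for field in columns[8].split(";")}
        if tag not in tags:
            missing += 1
    return total, missing


# Made-up example: one record with "Name", one with lowercase "name".
example = [
    "##gff-version 3\n",
    "chr1\tdb\tgene\t1\t100\t.\t+\t.\tID=g1;Name=dnaA\n",
    "chr1\tdb\tgene\t200\t300\t.\t-\t.\tID=g2;name=dnaN\n",
]
total, missing = count_missing_name(example)
print(f"{missing} of {total} features have no 'Name' tag")
```

If most or all features are reported as missing the tag, the annotations would presumably end up in the single unnamed path described above.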

As an alternative, you can also use vg rna for the annotate + augment pipeline. This subcommand is a bit more flexible when it comes to parsing GTF/GFF files. You can set the attribute tag you want to use as the ID and the feature type you want parsed. It was written with transcript annotations in mind, but it can be used for any type of annotation. Also, if you have a set of haplotypes, you can project your annotation onto each of them, creating a haplotype-specific annotation set.

@LilithElina
Author

Oh, I see, thank you for the clarification!

I didn't know about vg rna, I will give that a try.
