Error in the gff output file when using --add_genes --add_mrna (prokka version 1.12) #338
Comments
Have you considered using this other tool? It was built to convert GFF files from Prokka into EMBL format and was used to submit over 20,000 assemblies to EMBL: http://joss.theoj.org/papers/10.21105/joss.00080 It's unclear what you mean by 'the format is invalid'. Are you saying that your software cannot parse the file because it has hard-coded a eukaryotic gene model?
By invalid I mean the GFF3 produced by Prokka 1.12 doesn't follow the [format specification](https://github.com/The-Sequence-Ontology/Specifications/blob/master/gff3.md). I was just saying that it could be problematic for other tools to use this GFF, and that it is important for such a widely used tool to produce a proper GFF output. That being said, I did some tests this morning:
@Juke34 Prokka uses the GFF format that NCBI uses for bacterial genomes. This is different from the "standard" GFF gene model of gene->mRNA->exons->CDS because we don't have exons/introns. It's designed to be compatible with the standard bacterial GenBank format (.gbk/.gbff) too. I do understand how the true GFF model works in eukaryotic datasets (although it's usually GTF/GFF2.5, not 3.0). Because Prokka doesn't have mRNA features, those bugs must be coming from elsewhere. I only produce CDS by default, but gene can be generated by I could write a post-processing step to make a fully Ensembl-compatible GFF3 file. I already have a prototype as
Thank you for your input @tseemann. To summarize: the GFF3 file produced by default is correct, but it becomes slightly wrong when using the --add_genes and/or --add_mrna options (only the format is affected; the structural and functional annotation itself is not). P.S.: Our GFF3 parser handles the RefSeq format, which doesn't contain any mRNA features, so using the --addgenes option would not have raised any problem. I therefore conclude that it is the --addmrna option, introduced in version 1.12, that creates an unexpected GFF-like file when activated.
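For anyone who wants to narrow down this kind of problem, here is a minimal sketch of one check a GFF3 parser typically relies on: every `Parent` value should refer to an `ID` defined somewhere in the same file. The thread does not say which rule of the specification was actually broken, so this is only an illustration of a common check, not a diagnosis of the Prokka 1.12 output. The function names `parse_attributes` and `check_parent_links` are made up for this example.

```python
from urllib.parse import unquote


def parse_attributes(column9):
    """Split a GFF3 attribute column into a dict of tag -> list of values."""
    attrs = {}
    for pair in column9.strip().split(";"):
        if "=" not in pair:
            continue  # skip empty or malformed entries
        tag, value = pair.split("=", 1)
        # Multiple values are comma-separated; values may be URL-escaped.
        attrs[tag] = [unquote(v) for v in value.split(",")]
    return attrs


def check_parent_links(gff_text):
    """Return (line_number, message) tuples for structural problems.

    Hypothetical helper: it only checks the column count and that each
    Parent value matches an ID defined in the file.
    """
    problems = []
    ids = set()
    features = []
    for n, line in enumerate(gff_text.splitlines(), start=1):
        if line.startswith("##FASTA"):
            break  # sequence section follows, no more features
        if not line.strip() or line.startswith("#"):
            continue  # blank lines, comments and directives
        cols = line.split("\t")
        if len(cols) != 9:
            problems.append((n, "expected 9 tab-separated columns"))
            continue
        attrs = parse_attributes(cols[8])
        ids.update(attrs.get("ID", []))
        features.append((n, attrs))
    # Second pass, so that a Parent may be defined after its child.
    for n, attrs in features:
        for parent in attrs.get("Parent", []):
            if parent not in ids:
                problems.append(
                    (n, f"Parent '{parent}' is not defined as an ID in this file")
                )
    return problems
```

Running `check_parent_links(open("annotation.gff").read())` on a file (the file name here is just a placeholder) returns an empty list when every parent link resolves, which makes it easy to compare the output with and without the extra options.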
We found many problems in the GFF file output by Prokka version 1.12. Consequently the format is invalid and cannot really be used in other tools.
Given those problems, I'm not sure the conversion to EMBL with the gff3toembl script still does the job. I guess the same applies to EMBLmyGFF3.