request: Submitting SQN files to genbank #89

Closed
john-nash opened this issue Mar 17, 2015 · 5 comments

@john-nash

I assembled and annotated a genome with prokka (about 2 years ago), and sent the draft to my colleagues to do their lab gene id work. We published and now I want to submit the original prokka annotation to genbank because my colleagues published the locus tags which were assigned by prokka back then. I should have talked to them and re-annotated using the --compliant switch but it did not happen because they are chemists and don't know of such arcane procedures (i.e. mea culpa).

So I submitted the two-year-old sqn file. Genbank apparently is not pleased with me and sent me a polite email with a list of things to fix. One of their instructions was:

[3] Your .sqn file is not formatted correctly. You need to include a template file when you run tbl2asn. You can create a template here: http://www.ncbi.nlm.nih.gov/WebSub/template.cgi.

Followed by:

[8] We would recommend you use tbl2asn to create your WGS submission. Our web pages, http://www.ncbi.nlm.nih.gov/Genbank/wgs.html and http://www.ncbi.nlm.nih.gov/Genbank/tbl2asn2.html describe WGS submissions and how to use tbl2asn to create them.

To create your submissions, you will need a template file (.sbt) and the *.fsa files of sequences in fasta format. You can create a template here: http://www.ncbi.nlm.nih.gov/WebSub/template.cgi.

Note that, if you have annotation, you should use the same base name for the .fsa and the corresponding .tbl files so the annotation gets included in the resulting .sqn file.

For example:
contigs.fsa
contigs.tbl

Run the following tbl2asn command line including the -j argument to add the organism name and the strain name in your .sqn file like this:

tbl2asn -p path_to_files -t template.sbt -M n -Z discrep -j "[organism=Escherichia coli] [strain=ABCD]"

where path_to_files is the path to the directory where the .fsa and .tbl files are located.

To use the "-M n" argument you need to use tbl2asn version 19.6 or higher. You can download the latest version at ftp://ftp.ncbi.nih.gov/toolbox/ncbi_tools/converters/by_program/tbl2asn/
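
The pairing rule and command line quoted above can be sketched in Python as follows. This is only an illustration: the helper names `unpaired_fsa` and `build_tbl2asn_command` are hypothetical, the flags are copied from the NCBI email rather than checked against any particular tbl2asn release, and nothing here runs tbl2asn itself.

```python
from pathlib import Path


def unpaired_fsa(directory):
    """Return names of .fsa files that have no same-named .tbl file.

    Per the email, a missing .tbl is not an error in itself; it only means
    no annotation will be included for that .fsa file.
    """
    d = Path(directory)
    return sorted(
        p.name for p in d.glob("*.fsa") if not p.with_suffix(".tbl").exists()
    )


def build_tbl2asn_command(directory, template, organism, strain):
    """Assemble the argument list shown in the NCBI email (hypothetical helper)."""
    return [
        "tbl2asn",
        "-p", str(directory),   # directory holding the .fsa and .tbl files
        "-t", str(template),    # submission template (.sbt)
        "-M", "n",              # per the email, needs tbl2asn 19.6 or higher
        "-Z", "discrep",
        "-j", f"[organism={organism}] [strain={strain}]",
    ]
```

The returned list could be passed to `subprocess.run(cmd, check=True)` once `unpaired_fsa` reports nothing unexpected.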

For future submissions, would it be unreasonable to request this feature: if an appropriately named "template.sbt" is in the same directory as the fasta file being annotated, Prokka would feed it to tbl2asn?

Thanks
John
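
A minimal sketch of the lookup requested above, assuming the template is literally named "template.sbt" and sits next to the input fasta file. The function names are hypothetical and this is not how Prokka handles templates; the `-t` flag is the one given in the NCBI email.

```python
from pathlib import Path


def find_template(fasta_path, name="template.sbt"):
    """Return the template next to the fasta file, or None if there is none."""
    candidate = Path(fasta_path).resolve().parent / name
    return candidate if candidate.is_file() else None


def template_args(fasta_path):
    """Extra tbl2asn arguments: ["-t", path] when a template exists, else []."""
    template = find_template(fasta_path)
    return ["-t", str(template)] if template else []
```

Returning an empty list when no template is found keeps the existing behaviour unchanged for anyone who does not supply one.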

@tseemann
Owner

@john-nash I think I was expecting people to load the .SQN into Sequin, and then load their personal submitter info into the record via Sequin.

I am considering getting rid of the tbl2asn step, and generating the .gbk myself.

I believe you can submit .EMBL files (essentially the same as .GBK) to ENA rather than NCBI. @andrewjpage has written https://github.com/sanger-pathogens/gff3toembl to convert a GFF3 file to a format acceptable for submission to EMBL, and he uses Prokka output as the source.

Personally I use ENA now as it is more straightforward!

@aleimba

aleimba commented Mar 18, 2015

If I may chime in here: I still think including tbl2asn in Prokka is a plus. After all, many people submit to NCBI, and it's one more step included in the pipeline. Maybe have an option for that?

Aside from that, submitting to NCBI is very cumbersome and you'll have to go through the fatal errors anyway. I've heard many good things about the ENA/EBI workflow, I just never got around to trying it myself.

There's also an option in Artemis to save an entry as EMBL submission format (under 'File -> Save an Entry as -> EMBL Submission Format'), which will also show errors that have to be resolved.

@john-nash
Author

Moving to EMBL for future submissions is a good idea. However, I am stuck with 88 bioprojects in NCBI which my colleagues have started over the years. The colleague (since retired) who started many of these submissions is on the NCBI committee for phage taxonomy, so he likes NCBI. Once I have cleared my plate, I think I may switch over to EMBL submission.

For the record, I hate Sequin. I don’t use Windows, so is it just the OS X implementation, or are they all clunky?

@aleimba

aleimba commented Mar 18, 2015

I can speak for Windows and Linux and they all are ;-). But, I could rant on about NCBI for days anyway ...

@tseemann
Owner

I've made a new issue for a "template" option to be given to Prokka: #120

PS. Yes, Sequin is clunky and horrible on all platforms, but amazingly, considering how old it is (it was ahead of its time), it still works!
