For those unable to use FTP for firewall or system-administration reasons, this update should work as a rough draft that enables HTTP downloads using Ruby's net/http library. The main change is in how the "databases" function parses the remote NCBI website. Once it was adapted to net/http, the response needed some additional manipulation to return the same information as the original FTP method.
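As a rough illustration of the HTTP approach described above, here is a minimal sketch that fetches the BLAST database directory listing with net/http and groups the tarball names by database. It assumes the listing is an HTML index at https://ftp.ncbi.nlm.nih.gov/blast/db/ containing links such as `href="nr.00.tar.gz"`, and that the goal is a hash from database name to its tarballs. `fetch_listing` and `parse_databases` are hypothetical names, not the functions in this PR.

```ruby
require 'net/http'
require 'uri'

# Assumed HTTPS location of the BLAST database directory (hypothetical constant).
BLAST_DB_URL = URI('https://ftp.ncbi.nlm.nih.gov/blast/db/')

# Fetch the raw HTML directory listing. Requires network access.
def fetch_listing(uri = BLAST_DB_URL)
  response = Net::HTTP.get_response(uri)
  raise "HTTP #{response.code} fetching #{uri}" unless response.is_a?(Net::HTTPSuccess)
  response.body
end

# Turn an HTML listing into { "nr" => ["nr.00.tar.gz", ...], ... }.
# Only links ending exactly in .tar.gz are kept, so .tar.gz.md5 files are skipped.
def parse_databases(html)
  tarballs = html.scan(/href="([^"\/]+\.tar\.gz)"/).flatten.uniq
  # Strip an optional volume number (".00") plus ".tar.gz" to get the database name.
  tarballs.group_by { |file| file.sub(/(\.\d+)?\.tar\.gz\z/, '') }
          .transform_values(&:sort)
end
```

Keeping the network call separate from the parsing makes the parsing testable offline, e.g. `parse_databases(fetch_listing)` in a real run versus a saved HTML snippet in a test.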
Second, the changes to the download function should prevent tar and gzip from failing, a problem that may be a consequence of running on nodes with a high number of threads. At least two files are identical across all 70+ tarballs of (at least) nr and nt, and uncompressing the same file to the same location from many tarballs at once overwhelms tar and gzip, at least on the 96-thread system I had available for testing (the smallest I have at my disposal). Therefore, I excluded these two files (one ending in .?al and the other starting with taxdb) from all but the "last" tarball, chosen somewhat inelegantly with the ".last.uniq" method (https://stackoverflow.com/questions/7749131).
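The exclusion idea can be sketched as building one GNU tar command per tarball, where every tarball except the last skips the shared files via tar's `--exclude` option. This is an illustration, not the PR's code: it picks the "last" tarball with `Array#max` (lexicographic, which works for zero-padded volume numbers) instead of the ".last.uniq" approach, and `SHARED_PATTERNS`, `extract_commands`, and `extract_all` are hypothetical names.

```ruby
# Patterns for files assumed to be identical in every volume (hypothetical constant).
SHARED_PATTERNS = ['*.?al', 'taxdb*'].freeze

# Build a tar command (as an argv array) for one tarball.
# Non-last tarballs exclude the shared files so only one process writes them.
def extract_command(tarball, last:)
  cmd = ['tar']
  SHARED_PATTERNS.each { |pat| cmd << "--exclude=#{pat}" } unless last
  cmd + ['-xzf', tarball]
end

# One command per tarball; the lexicographically greatest name counts as "last".
def extract_commands(tarballs)
  last = tarballs.max
  tarballs.map { |t| extract_command(t, last: t == last) }
end

# Run the commands sequentially; a parallel runner could consume the same arrays.
def extract_all(tarballs)
  extract_commands(tarballs).each do |cmd|
    system(*cmd) or raise "extraction failed: #{cmd.join(' ')}"
  end
end
```

Passing an argv array to `system` avoids the shell, so the `*` and `?` in the patterns reach tar unexpanded.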
I've tested it on nr, nt, and taxdump, but have not been able to test the other databases.
Including "http" on the command line when running the script makes it use the new http-ncbi-dbs-dgs.rake. For example, `ncbi-blast-dbs nr http` downloads nr using the new Rakefile.
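A minimal sketch of how a wrapper could switch Rakefiles when "http" is among the arguments. It assumes the wrapper strips "http" before passing the remaining arguments to rake as task names. `choose_rakefile` and the default Rakefile name `ncbi-blast-dbs.rake` are hypothetical, not taken from the script.

```ruby
# Decide which Rakefile to use and which arguments remain as task names.
# The default Rakefile name here is a placeholder, not the script's real file.
def choose_rakefile(args, default: 'ncbi-blast-dbs.rake', http: 'http-ncbi-dbs-dgs.rake')
  rakefile = args.include?('http') ? http : default
  [rakefile, args - ['http']]
end
```

The result could then be handed to rake's `-f` option, e.g. `system('rake', '-f', rakefile, *tasks)`.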
As Ruby is not my strongest programming language, there are no doubt many ways to improve these changes, and I welcome suggestions!
Thanks for this awesome script, and I hope this proves useful to you and others. Downloading these databases works so much better with your script than other options!