Add http method | Fixes #2 #9

Merged
merged 6 commits into from Jun 16, 2021

Conversation

sjfranklin

For those unable to use FTP for firewall or system administration reasons, this update is a rough draft that enables HTTP downloads using Ruby's net/http library. The main section that required changes was how the "databases" function parses the remote NCBI website; once adapted to net/http, the response needed some additional manipulation to return the same information as the original FTP method.
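As a rough illustration of the listing step described above (a sketch, not the code in this pull request): the index page is fetched with net/http and the archive names are pulled out of its links. The URL, the `group_volumes` helper, and the assumption that the index is plain HTML with `href="….tar.gz"` links are all hypothetical choices made for this example.

```ruby
require 'net/http'
require 'uri'

# Assumed location of the BLAST database index; not taken from the PR.
LISTING_URL = 'https://ftp.ncbi.nlm.nih.gov/blast/db/'

# Extract archive names (e.g. "nr.00.tar.gz") from an HTML index page and
# group them by database name (the part before the first dot).
# Links such as "nr.00.tar.gz.md5" are ignored by the pattern.
def group_volumes(html)
  files = html.scan(/href="([^"\/]+\.tar\.gz)"/).flatten.uniq
  files.group_by { |file| file.split('.').first }
end

# Fetch the index over HTTP(S) and return a hash like
# { "nr" => ["nr.00.tar.gz", ...], "nt" => [...] }.
def databases(url = LISTING_URL)
  response = Net::HTTP.get_response(URI(url))
  raise "HTTP #{response.code}" unless response.is_a?(Net::HTTPSuccess)

  group_volumes(response.body)
end
```

Keeping the parsing separate from the request means the grouping logic can be checked against a saved copy of the page without touching the network.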
Secondly, perhaps as a consequence of running on nodes with a high number of threads, the changes to the download function should prevent tar and gzip from failing. At least two files are identical across all 70+ tarballs of at least nr and nt, and uncompressing the same file to the same location many times at once overwhelms tar and gzip, at least on the 96-thread system I had available for testing (the smallest at my disposal). Therefore, I excluded these two files (one ending in .?al and the other starting taxdb*) from all but the "last" tarball, inelegantly chosen using the ".last.uniq" approach (https://stackoverflow.com/questions/7749131).
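The idea can be sketched roughly as follows, assuming the shared files can be matched with the two glob patterns mentioned above and that extraction goes through tar's `--exclude` option. The `SHARED_PATTERNS` constant and `extract_commands` helper are hypothetical names for this example, and the actual Rakefile may do this differently.

```ruby
# Globs for the files reported as identical across volumes.
# These patterns are an assumption based on the description above.
SHARED_PATTERNS = ['*.?al', 'taxdb*'].freeze

# Build one tar argument list per archive. Every archive except the last
# skips the shared files, so each shared file is extracted only once.
def extract_commands(archives)
  unique = archives.uniq
  last = unique.last
  unique.map do |archive|
    excludes = archive == last ? [] : SHARED_PATTERNS.map { |p| "--exclude=#{p}" }
    ['tar', *excludes, '-xzf', archive]
  end
end
```

Each argument list can be run with `system(*command)`; passing the arguments separately keeps the shell from expanding the glob patterns before tar sees them.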
I've tested it on nr, nt, and taxdump, but have not been able to try all other databases.
Passing "http" on the command line when running the script makes it use the new http-ncbi-dbs-dgs.rake. For example:
ncbi-blast-dbs nr http
downloads nr using the new Rakefile.
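One way such a switch could be handled, shown only as a sketch under the assumption that "http" may appear anywhere among the arguments and that FTP remains the default; `parse_args` is a hypothetical helper and not part of the script.

```ruby
# Split command-line arguments into a transport and a list of database names.
# "http" anywhere in the arguments selects HTTP; otherwise FTP is used.
def parse_args(argv)
  args = argv.dup
  # Array#delete returns the removed item, or nil when it was not present.
  transport = args.delete('http') ? :http : :ftp
  { transport: transport, databases: args }
end
```

With this approach the position of "http" does not matter: `parse_args(%w[nr http])` and `parse_args(%w[http nr])` select the same transport and the same database list.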
As Ruby is not my strongest programming language, there are no doubt many ways of improving these changes, which I welcome!
Thanks for this awesome script, and I hope this proves useful to you and others. Downloading these databases works so much better with your script than other options!

Franklin, Samuel added 6 commits March 29, 2019 10:47
…te over my head so this is a very quick-and-dirty fix
…m, and the actual download is essentially the same. However, I noticed that each tar.gz file has some duplicate files, and on our system, tar would fail when it tried to overwrite up to 48 of these files at a time. Hence, I included a quick method to ignore those duplicate files (at least those included in nt and nr download; untested for others) unless it matches the last-added url. Thus, it should only uncompress these duplicate files once. There is certainly better ways of doing this.
@sjfranklin sjfranklin changed the title Add http method Add http method | Fixes #2 Mar 29, 2019
@sjfranklin
Author

This should fix #2

@yeban yeban merged commit e4065f2 into yeban:master Jun 16, 2021
@yeban
Owner

yeban commented Jun 16, 2021

Hello. Thank you very much for the pull-request. I still need to better understand the race condition you mention - and maybe apply a similar fix to the original ftp module - but seeing how your http module is completely independent (nicely done!), I don't see why I shouldn't include this module for those who might need it.

Sorry for sitting on the PR so long. Things were a bit tough.

@yeban
Owner

yeban commented Jun 16, 2021

Also, I think the way the http module is invoked is pretty neat - at least from a user's standpoint.

If you are around, would you mind adding an example of how to download via http to README?

@sjfranklin
Author

sjfranklin commented Jun 22, 2021

Hello! I'm very glad it could be useful, and no worries about the timeframe. I hope you're doing ok.

I have not used Ruby since this edit, and my personal notes on how I used it are perhaps lacking in comprehensiveness. My intent was for it to be used as such:

ncbi-blast-dbs http nr nt

Since my Ruby skills are now lacking, does it look like the code does that? And defaults to the original FTP method otherwise?
