@pared pared commented Aug 14, 2020

Thank you for the contribution - we'll try to review it as soon as possible. 🙏
Fixes #4131

@pmrowla pmrowla left a comment

Should we maybe cache this (per-run) so that we remember which hosts reject HEAD but accept GET? It would keep us from making a lot of unnecessary failed requests when we have to download a large dataset.

pared commented Aug 18, 2020

@pmrowla, I am not sure we want to do that. From our point of view, the host in the URL might be completely different from the one that actually forbids the HEAD request.
Example from the original issue:
dvc get-url https://github.com/explosion/spacy-models/releases/download/ro_core_news_sm-2.3.1/ro_core_news_sm-2.3.1.tar.gz
From our point of view, it is github.com that forbids HEAD. Inspecting the response shows that it is actually github-production-release-asset-2e65be.s3.amazonaws.com that forbids it. That is fine: we can inspect the response and cache github-production-release-asset-2e65be.s3.amazonaws.com as the host that rejects HEAD. The problem is that the next time we request even the same link, we still need to make a HEAD request (because github.com is not cached) just to learn that it redirects to github-production-release-asset-2e65be.s3.amazonaws.com and that we should use GET instead. We still end up making 2 requests.
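
To make the argument concrete, here is a rough simulation of a per-run cache keyed by the host that rejected HEAD. It is only a sketch and not the code in this PR: the URL, the hosts `origin.example` and `storage.example`, the 403 status, and the helpers `fake_request` and `fetch_info` are all made up for illustration, and no real network calls are made.

```python
from urllib.parse import urlparse

# Hypothetical URL and hosts, for illustration only.
URL = "https://origin.example/releases/model.tar.gz"
REDIRECTS = {URL: "https://storage.example/model.tar.gz"}
HEAD_FORBIDDEN = {"storage.example"}


def fake_request(method, url, log):
    """Simulate a redirect-following request; return (status, final_url)."""
    log.append((method, url))
    final_url = REDIRECTS.get(url, url)
    if method == "HEAD" and urlparse(final_url).netloc in HEAD_FORBIDDEN:
        return 403, final_url  # assumed status code for a rejected HEAD
    return 200, final_url


def fetch_info(url, cache, log):
    """Try HEAD unless the URL's own host is cached, then fall back to GET."""
    if urlparse(url).netloc not in cache:
        status, final_url = fake_request("HEAD", url, log)
        if status == 200:
            return status
        # The rejecting host is only known from the response, after the redirect.
        cache.add(urlparse(final_url).netloc)
    status, _ = fake_request("GET", url, log)
    return status
```

Calling `fetch_info(URL, cache, log)` twice with the same `cache` and `log` records HEAD, GET, HEAD, GET: the cache ends up holding `storage.example`, but the lookup is done on `origin.example`, so the second call cannot skip the failed HEAD.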

@efiop efiop merged commit ef944e6 into treeverse:master Aug 18, 2020

Development

Successfully merging this pull request may close these issues.

Issues with dvc get and dvc get-url after 1.0

4 participants