-
Notifications
You must be signed in to change notification settings - Fork 1.3k
get: copy/download files tracked by Git #2837
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. Weβll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
af2f216
4d14256
97862a3
8c3ffe8
26b6233
7e1423d
1f5da3f
74f3851
cadbf10
ed10497
9ece4ed
f829338
97793b4
9930e96
38bed22
3c2633e
20e7696
b17d1c5
9ae4ab8
7debaca
811855a
d59bceb
29b34ba
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,22 +1,39 @@ | ||
| import logging | ||
| import os | ||
| import shutil | ||
|
|
||
| import shortuuid | ||
|
|
||
| from dvc.exceptions import GetDVCFileError | ||
| from dvc.exceptions import NotDvcRepoError | ||
| from dvc.exceptions import OutputNotFoundError | ||
| from dvc.exceptions import UrlNotDvcRepoError | ||
| from dvc.exceptions import PathOutsideRepoError | ||
| from dvc.external_repo import external_repo | ||
| from dvc.path_info import PathInfo | ||
| from dvc.stage import Stage | ||
| from dvc.state import StateNoop | ||
| from dvc.utils import resolve_output | ||
| from dvc.utils.fs import remove | ||
| from dvc.utils.compat import FileNotFoundError | ||
|
|
||
| logger = logging.getLogger(__name__) | ||
|
|
||
|
|
||
| def _copy_git_file(repo, src, dst, repo_url): | ||
| src_full_path = os.path.join(repo.root_dir, src) | ||
| dst_full_path = os.path.abspath(dst) | ||
|
|
||
| if os.path.isdir(src_full_path): | ||
| shutil.copytree(src_full_path, dst_full_path) | ||
| return | ||
|
|
||
| try: | ||
| shutil.copy2(src_full_path, dst_full_path) | ||
| except FileNotFoundError: | ||
efiop marked this conversation as resolved.
Outdated
Show resolved
Hide resolved
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. We are not really raΡing against anything, so how about we before "if os.path.isdir()" instead of wrapping it in try&except, to make it more linear? Looks like
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Like so? def _copy_git_file(repo, src, dst, repo_url):
src_full_path = os.path.join(repo.root_dir, src)
dst_full_path = os.path.abspath(dst)
if os.path.isdir(src_full_path):
shutil.copytree(src_full_path, dst_full_path)
return
try:
shutil.copy2(src_full_path, dst_full_path)
except FileNotFoundError:
raise PathOutsideRepoError(src, repo_url)
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @danihodovic Well, that would do it for me too π Thanks! π |
||
| raise PathOutsideRepoError(src, repo_url) | ||
|
|
||
|
|
||
| @staticmethod | ||
| def get(url, path, out=None, rev=None): | ||
| out = resolve_output(path, out) | ||
|
|
@@ -49,16 +66,31 @@ def get(url, path, out=None, rev=None): | |
| # the same cache file might be used a few times in a directory. | ||
| repo.cache.local.cache_types = ["reflink", "hardlink", "copy"] | ||
|
|
||
| o = repo.find_out_by_relpath(path) | ||
| output = None | ||
| output_error = None | ||
|
|
||
| try: | ||
| output = repo.find_out_by_relpath(path) | ||
| except OutputNotFoundError as ex: | ||
efiop marked this conversation as resolved.
Show resolved
Hide resolved
|
||
| output_error = ex | ||
|
|
||
| is_git_file = output_error and not os.path.isabs(path) | ||
| is_not_cached = output and not output.use_cache | ||
|
Comment on lines
+77
to
+78
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. If it is not cached then this is also a git file, so this var names are confusing. Overall this logic is more complicated than necessary. It is simply either cached or a git managed file, so: try:
if out and out.use_cache:
# do the pull and checkout
...
else:
# Non cached out outside repo, can't be handled
if os.path.abspath(src):
raise PathOutsideRepoError(...)
# it's git-handled and already checked out to tmp dir
# we should just copy, not a git specific operation
...
copy_dir_or_file(src_full, dst_full)
except FileNotFoundError:
raise FileMissingError(...)And again forgot about generic exception, basically: $ dvc get http://some.repo non-existing-path
Can't find non-existing-path in some.repo neither as output
nor as git handled file/dirMessage may be different, but the idea is that we don't know whether user tried to get an out or a git handled file.
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @Suor We've agreed to not use FileMissingError as its message is not applicable here. Hence
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Agreed on the naming.
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. And indeed output_error is not wrapped as it is in repo.open.
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Looking closer at Repo.open, it indeed has a more clear implementation than this, as it doesn't introduce is_git_file confusion. |
||
|
|
||
| if is_git_file or is_not_cached: | ||
| _copy_git_file(repo, path, out, url) | ||
| return | ||
|
|
||
| if output_error: | ||
| raise OutputNotFoundError(path) | ||
|
|
||
| with repo.state: | ||
| repo.cloud.pull(o.get_used_cache()) | ||
| o.path_info = PathInfo(os.path.abspath(out)) | ||
| with o.repo.state: | ||
| o.checkout() | ||
| repo.cloud.pull(output.get_used_cache()) | ||
| output.path_info = PathInfo(os.path.abspath(out)) | ||
| with output.repo.state: | ||
| output.checkout() | ||
|
|
||
| except NotDvcRepoError: | ||
| raise UrlNotDvcRepoError(url) | ||
| except OutputNotFoundError: | ||
| raise OutputNotFoundError(path) | ||
| finally: | ||
| remove(tmp_dir) | ||
Uh oh!
There was an error while loading. Please reload this page.