Skip to content

Conversation

@skshetry
Copy link
Collaborator

@skshetry skshetry commented Aug 25, 2020

Thank you for the contribution - we'll try to review it as soon as possible. πŸ™

This adds support for get/list/api/import/ls on a subrepo.

Also fixes #3180, by adding a granular url support for dvc.api.get_url
And, of course it fixes #3369.

@skshetry skshetry requested review from efiop, pared and pmrowla August 25, 2020 12:34
@skshetry skshetry self-assigned this Aug 25, 2020
Copy link
Contributor

@pared pared left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is a good change, "configuring" external repo from the start feels much better than using nested with statements which change existing instance.

@skshetry skshetry added enhancement Enhances DVC feature is a feature labels Aug 26, 2020
@efiop
Copy link
Contributor

efiop commented Aug 27, 2020

Looks good! But I see that a lot of the changes are not actually about subrepos and could be introduced without mentioning subrepos at all (e.g. that get_hash() piece).

@skshetry
Copy link
Collaborator Author

@efiop, subrepo support is already there. This PR is more about enabling it, which is not possible without changing APIs of External(Git)Repo, which in turn affects dependency/repo.py and friends.

@skshetry skshetry marked this pull request as ready for review August 27, 2020 16:41
@skshetry skshetry changed the title [WIP] Refactor external repo, support subrepos Refactor external repo, support subrepos Aug 27, 2020
@skshetry

This comment has been minimized.

This adds support for get/list/api/import/ls on a subrepo.

Also fixes treeverse#3180, by adding a granular url support for `dvc.api.get_url`
And, of course it fixes treeverse#3369.
@skshetry skshetry changed the title Refactor external repo, support subrepos support subrepos for get/list/api/import/ls Sep 2, 2020
metadata = _repo.repo_tree.metadata(path_info)

if not metadata.is_dvc:
raise OutputNotFoundError(path, repo)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looked into this, we used to throw OutputNotFoundError before as well, so this does not change anything.

Regarding UrlNotDvcRepoError, it's not precise to throw now as a git-repo can have subrepo inside of it.

logger = logging.getLogger(__name__)


class IsADVCRepoError(DvcException):
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thrown when the path is a root of the dvc repo during get/import.

def test_run_without_cmd(tmp_dir, dvc, kwargs):
with pytest.raises(InvalidArgumentError) as exc:
Repo().run(**kwargs)
dvc.run(**kwargs)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Previously, we used to wait for 5s for lock timeout, so this used to work. But, since we reduced the timeout, this has become flakey.

@skshetry skshetry requested a review from pared September 2, 2020 07:58
@skshetry
Copy link
Collaborator Author

skshetry commented Sep 2, 2020

@efiop @pared @pmrowla, this is ready for a review. The only issue right now is with dvc ls as it currently throws error when the path is not found in remote or if it cannot connect to a remote. This is a pre-existing issue (exaggerated by subrepo support), which I'll try fixing in a separate PR.

Also, I have separated tests and implementation commits to make it easier for review.

@skshetry skshetry added the minor label Sep 2, 2020
Copy link
Contributor

@pared pared left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, some minor things.

Comment on lines 76 to +80
def open(
self, path, mode="r", encoding="utf-8"
self, path, mode="r", encoding=None
): # pylint: disable=arguments-differ
assert mode in {"r", "rb"}
encoding = encoding or "utf-8"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, maybe api.open should use utf-8 by default?

Copy link
Contributor

@efiop efiop Sep 2, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pared Ideally - no, it should use your default system encoding. This is how things like pathlib's Path do it. I believe, Saugat just changed this for a better practice for optional args and to match BaseTree.open (there are still some discrepancies in tree methods, that I left untouched when adding these trees).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it should use default system encoding, I just tried to make api.open() work without changing any behavior here.
Something to fix it later.

* use `spy` instead of `wraps`
* remove dvc fixture in a test
* set `cache_types` on Git repo
@efiop efiop merged commit 4564e8b into treeverse:master Sep 3, 2020
@skshetry skshetry deleted the external_repo branch September 3, 2020 10:47
skshetry added a commit to treeverse/dvc.org that referenced this pull request Sep 8, 2020
@skshetry skshetry changed the title support subrepos for get/list/api/import/ls support subrepos for get/list/api/import Sep 8, 2020
shcheklein pushed a commit to treeverse/dvc.org that referenced this pull request Sep 16, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement Enhances DVC feature is a feature

Projects

None yet

Development

Successfully merging this pull request may close these issues.

dvc get/import support for subrepos api: get_url does not support links for granular file

4 participants