
@karajan1001
Contributor

@karajan1001 karajan1001 commented Dec 28, 2021

fix: #7152

  1. Add --rev flag to dvc exp show.
  2. Add -1 support for --num flag.
  3. Extract revision resolving to utils.
  4. Update command unit test.
  5. Add a util unit test.

Thank you for the contribution - we'll try to review it as soon as possible. 🙏

@karajan1001 karajan1001 requested a review from a team as a code owner December 28, 2021 07:11
@karajan1001 karajan1001 requested a review from pared December 28, 2021 07:11
@karajan1001 karajan1001 requested review from pmrowla and removed request for pared December 28, 2021 07:35
Contributor

Using brancher here is a pretty heavy dependency since it creates the git-fs and changes the current repo instance for each rev. It's also a bit redundant since we also use brancher on single revisions in _collect_experiment_commit (where we actually do want the git-fs and modified repo instance).

Since at this point in the call we only want the list of revisions that the combination of all_branches/all_tags/all_commits would give, what we probably want is to split the current brancher functionality for just getting the revisions into a helper method https://github.com/iterative/dvc/blob/ceeda6d3072fb7d862518a60e9e2f6a0ae2cf3d2/dvc/repo/brancher.py#L30-L59

(so that we can get the revs but skip the git-fs/repo modification when we don't need it)

Collaborator

@skshetry skshetry Jan 3, 2022

Something like iter_revs perhaps?

for rev in iter_revs(revs, all_branches=all_branches, all_tags=all_tags, all_experiments=all_experiments, all_commits=all_commits):
    pass

Note that workspace probably is not a part of iter_revs, but resolve_rev should be a part of it.
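A minimal sketch of such a helper, assuming a toy in-memory repository: `FakeScm` and its `branches`/`tags`/`resolve` members are hypothetical, not the scmrepo API. It only collects revision names from the flags and resolves them (the resolve_rev step); it never builds a git-fs or switches the repo instance, and, as suggested here, the workspace is left out.

```python
from typing import Dict, Iterator, List, Optional


class FakeScm:
    """Hypothetical stand-in for a Git wrapper: maps ref names to SHAs."""

    def __init__(self, branches: Dict[str, str], tags: Dict[str, str]):
        self.branches = branches
        self.tags = tags

    def resolve(self, rev: str) -> str:
        # Look the name up in branches, then tags; otherwise assume the
        # input is already a SHA.
        return self.branches.get(rev) or self.tags.get(rev) or rev


def iter_revs(
    scm: FakeScm,
    revs: Optional[List[str]] = None,
    all_branches: bool = False,
    all_tags: bool = False,
) -> Iterator[str]:
    """Yield resolved SHAs for explicit revs plus flag-selected refs."""
    names = list(revs or [])
    if all_branches:
        names.extend(scm.branches)
    if all_tags:
        names.extend(scm.tags)
    for name in names:
        # resolve_rev-style step: turn each name into a SHA.
        yield scm.resolve(name)
```

Note that the same SHA can come out more than once (for example a tag and a branch on one commit), which is the multiple-names problem discussed in this thread.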

Contributor Author

@karajan1001 karajan1001 Jan 4, 2022

I'm OK with renaming get_revs_from_flags to iter_revs, but what prevents me from merging the logic of get_revs_from_flags and brancher here is the output format of iter_revs.
Without the sha_only flag, brancher can output SHAs, branch names, tag names, or even HEAD or HEAD~~~. Besides this, when one commit has multiple names, we need to show the one the user provided so that the output is understandable to them.
On the get_revs_from_flags side, the output is much simpler: we just crunch all the different inputs into SHAs.
I think it is better for both functions to follow brancher's output rule.

Contributor

@pmrowla pmrowla Jan 5, 2022

The new iterator can return the full (sha, names) pairs. It's really a brancher() design flaw that it's only designed to return either SHAs or names, but not both. Determining whether we want to display SHAs or human-readable names should happen at the command/UI level and not at the internal repo/brancher API level.

see updated comment: #7204 (comment)
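To make the split concrete, a hypothetical command-level formatter could take each (sha, names) pair and decide what to show. `format_rev` is an illustrative name, not a DVC function, and the 7-character abbreviation is an arbitrary choice for this sketch.

```python
from typing import List


def format_rev(sha: str, names: List[str], sha_only: bool = False) -> str:
    """Render one (sha, names) pair for display.

    The internal iterator always yields both pieces; only this
    presentation step decides which one the user sees.
    """
    if sha_only or not names:
        # Abbreviate the SHA (7 chars is an arbitrary choice here).
        return sha[:7]
    return ", ".join(names)
```

With this shape, brancher's `sha_only` behavior becomes a display option rather than something baked into the iterator.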

@karajan1001 karajan1001 marked this pull request as draft January 4, 2022 11:52
@karajan1001 karajan1001 changed the title exp show: add --rev flag (#7152) [WIP] exp show: add --rev flag (#7152) Jan 4, 2022
dvc/scm.py Outdated
Comment on lines 170 to 186
Contributor

@pmrowla pmrowla Jan 5, 2022

This can just be iter_revs as described by @skshetry (or maybe something like iter_rev_names), and it can yield the same sha, names pairs that brancher currently uses. We can also just add the num= support here as well, and should also support all_experiments the same way that brancher does.

So basically this iterator should do everything that brancher does right now except for modifying repo and generating a git-fs.

So this method would look something like

def iter_rev_names(
    scm: "Git",
    revs: List[str],
    all_branches: bool = False,
    ...
) -> Generator[...]:
    ...  # get revs list the same way as brancher()
    if revs:
        rev_resolver = partial(resolve_rev, scm)
        # yield (rev, names) pairs
        yield from group_by(rev_resolver, revs).items()

This way code outside of brancher can use the rev, names items as needed without repo modification and git-fs overhead.
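The iter_rev_names sketch in this comment relies on a `group_by(f, seq)` helper (funcy provides one with that argument order, returning a dict of lists). Below is a stdlib-only equivalent, with a hypothetical dict-based `resolve_rev` standing in for `partial(resolve_rev, scm)`, showing how several user-supplied names collapse onto one SHA while keeping the names the user typed.

```python
from collections import defaultdict
from functools import partial
from typing import Callable, Dict, Iterable, List


def group_by(f: Callable[[str], str], seq: Iterable[str]) -> Dict[str, List[str]]:
    """Group items of seq by f(item), preserving first-seen order."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for item in seq:
        groups[f(item)].append(item)
    return dict(groups)


def resolve_rev(refs: Dict[str, str], rev: str) -> str:
    # Hypothetical resolver: a plain dict lookup instead of a Git query.
    return refs[rev]


refs = {"main": "aaa", "v1.0": "aaa", "HEAD~1": "bbb"}
rev_resolver = partial(resolve_rev, refs)
# (sha, names) pairs, in the order each SHA was first seen.
pairs = list(group_by(rev_resolver, ["main", "HEAD~1", "v1.0"]).items())
```

Here `pairs` holds `("aaa", ["main", "v1.0"])` followed by `("bbb", ["HEAD~1"])`.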

And now brancher() itself could look something like

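def brancher(
    self,
    *args,
    sha_only=False,
    **kwargs,
):
    saved_fs = self.fs
    try:
        for sha, names in iter_rev_names(self.scm, *args, **kwargs):
            # Point the repo at this revision's tree via a git-fs.
            self.fs = GitFileSystem(scm=self.scm, rev=sha)
            if self.fs.exists(self.root_dir):
                if sha_only:
                    yield sha
                else:
                    yield ", ".join(names)
    finally:
        # Restore the original (workspace) filesystem.
        self.fs = saved_fs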

dvc/scm.py Outdated
Comment on lines 149 to 163
Contributor

@pmrowla pmrowla Jan 5, 2022

This fix_exp_head logic and num logic could all be moved into the generalized iter_rev_names method (and brancher() by extension).

Everywhere we use brancher in DVC we also need the exp HEAD fixup. So we can do the fix_exp_head inside the refactored iterator and then remove some of the existing fix_exp_head calls in DVC (exp show + params/metrics/plots diff), so that it's always unified in one place.
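To illustrate doing the fixup once inside the iterator, here is a hypothetical simplified version. It assumes the fixup's job is to make HEAD-relative revisions (`HEAD`, `HEAD~2`, `HEAD^`) point at a saved experiment baseline ref when one exists; `BASELINE_REF`, `fix_head`, and `iter_fixed` are invented names and do not reproduce DVC's actual `fix_exp_head`.

```python
import re
from typing import Iterable, Iterator

# Hypothetical ref under which an experiment run records its baseline.
BASELINE_REF = "refs/exps/baseline"

# "HEAD" optionally followed by ~N / ^N navigation suffixes.
_HEAD_RE = re.compile(r"^HEAD((?:[~^]\d*)*)$")


def fix_head(rev: str, baseline_exists: bool) -> str:
    """Rewrite HEAD-relative revs onto the baseline ref, if one exists."""
    match = _HEAD_RE.match(rev)
    if baseline_exists and match:
        return BASELINE_REF + match.group(1)
    return rev


def iter_fixed(revs: Iterable[str], baseline_exists: bool) -> Iterator[str]:
    # Applying the fixup in one place means callers never repeat it.
    for rev in revs:
        yield fix_head(rev, baseline_exists)
```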

dvc/scm.py Outdated
Collaborator

Please don't use OrderedDict; dict is guaranteed to preserve insertion order from Python 3.7 onwards.
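For example, an order-preserving de-duplication of revisions (say, a branch and a tag that resolve to the same commit) needs nothing beyond a plain dict, because insertion order is part of the language guarantee since Python 3.7. `unique_in_order` is an illustrative name, not a DVC helper.

```python
from typing import Iterable, List


def unique_in_order(items: Iterable[str]) -> List[str]:
    """Drop duplicates, keeping the first occurrence of each item.

    dict.fromkeys keeps insertion order on Python 3.7+, so no
    OrderedDict is needed.
    """
    return list(dict.fromkeys(items))
```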

@karajan1001 karajan1001 changed the title [WIP] exp show: add --rev flag (#7152) exp show: add --rev flag (#7152) Jan 6, 2022
@karajan1001 karajan1001 marked this pull request as ready for review January 6, 2022 14:14
Contributor Author

@skshetry related to #7005, we don't have scm.default_branch yet?

Collaborator

@skshetry skshetry Jan 10, 2022

Git does not have a concept of default branch locally. init uses init.defaultBranch/--initial-branch and defaults to master. clone uses currently active branch as initial branch by default.

At least in scmrepo, it is not possible to know what branch is/was initial. So this feels like a tests-only requirement. Due to these issues, I have moved away from default_branch suggestion. I think the best way would be to keep a reference to scm.active_branch(), and use that instead.

Collaborator

Here you may be able to avoid using a hardcoded master by using the tmp_dir.branch context manager.

https://github.com/iterative/dvc/blob/92f714e5ca7f4a9e7701a06f7d21b478049f4af7/dvc/testing/tmp_dir.py#L222
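A sketch of the pattern being suggested: remember `active_branch()` on entry, switch, and always switch back on exit, so a test never has to assume the initial branch is `master`. `FakeScm`, its `checkout` method, and `on_branch` are hypothetical; the real helper is the `tmp_dir.branch` context manager linked here, and this does not reproduce its signature.

```python
from contextlib import contextmanager
from typing import Iterator


class FakeScm:
    """Hypothetical SCM exposing only what the context manager needs."""

    def __init__(self, initial: str):
        self._branch = initial

    def active_branch(self) -> str:
        return self._branch

    def checkout(self, branch: str) -> None:
        self._branch = branch


@contextmanager
def on_branch(scm: FakeScm, branch: str) -> Iterator[None]:
    # Record whatever branch is current instead of assuming "master".
    original = scm.active_branch()
    scm.checkout(branch)
    try:
        yield
    finally:
        # Restore the original branch even if the body raised.
        scm.checkout(original)
```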

Contributor

@pmrowla pmrowla left a comment

This will need a docs command reference update.

fix: treeverse#7152
1. Add `--rev` flag to `dvc exp show`.
2. Add -1 support for `--num` flag.
3. Extract revision-finding logic to `utils`.
4. Make brancher use this revision-finding logic.
5. Update command unit test.
6. Add a util unit test.

Co-authored-by: Peter Rowlands (변기호) <peter@pmrowla.com>
@karajan1001 karajan1001 enabled auto-merge (rebase) January 11, 2022 10:35
@karajan1001 karajan1001 merged commit b673ac2 into treeverse:main Jan 11, 2022

head_revs = head_revs or []
revs = []
for rev in head_revs:
Collaborator

Wasn't this mutually exclusive with num/n before?

Collaborator

@skshetry skshetry Jan 11, 2022

Can't this be done separately with n_commits(num) outside of iter_revs if it's exclusive, considering it is not used in brancher?

Contributor Author

In the previous version, rev+num was not mutually exclusive with all_tags/all_branches. And the result will be the sum of all revs. Maybe it is better to make them exclusive? @dberenbaum
This function will be used in exp push/pull/ls/show/remove. If I split n_commits(num) out of here, I would need to add a new function and call it in all five places.

Contributor

And the result will be the sum of all revs.

How is this supposed to work? If I do dvc exp show -a -n 10, I get:

 ───────────────────────────────────────────────────────────────────────────────────────
  Experiment                     Created        avg_prec   roc_auc   prepare.split   pre
 ───────────────────────────────────────────────────────────────────────────────────────
  workspace                      -               0.60405    0.9608   0.2             201
  11-random-forest-experiments   May 29, 2021    0.60405    0.9608   0.2             201
 ───────────────────────────────────────────────────────────────────────────────────────

So it seems like -n is being ignored?

I don't feel strongly about whether we allow these to be combined. It seems strange to combine them, but if we can explain the behavior clearly, it might be fine to allow users to combine them if they want.
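If the flags are made exclusive, argparse can enforce it at parse time with a mutually exclusive group. This is a hypothetical sketch: the flag names and the "-1 means all commits" convention follow this thread, but it is not DVC's actual parser definition. `num` defaults to None (meaning "use the default of one commit") so that only an explicitly passed `-n` counts as conflicting.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="exp-show-sketch")
    parser.add_argument("--rev", default=None, help="Revision to start from.")
    # Only one way of choosing which commits to show may be given.
    group = parser.add_mutually_exclusive_group()
    group.add_argument(
        "-n", "--num", type=int, default=None,
        help="Number of commits to show; -1 shows all of them.",
    )
    group.add_argument("-a", "--all-branches", action="store_true")
    group.add_argument("-T", "--all-tags", action="store_true")
    group.add_argument("-A", "--all-commits", action="store_true")
    return parser
```

`build_parser().parse_args(["-n", "5", "-a"])` exits with a usage error, while `["-n", "-1"]` parses to `num == -1` because the parser defines no options that look like negative numbers.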


def iter_revs(
scm: "Git",
head_revs: Optional[List[str]] = None,
Collaborator

@skshetry skshetry Jan 11, 2022

Why not name it revs?

Contributor Author

We could call them revs + results, compared to the current head_revs + revs.

raise RevError(str(exc))


def iter_revs(
Collaborator

iter_revs is no longer an iterator now.

Contributor Author

Any name to suggest?

Contributor Author

We can modify these in #7245

@karajan1001 karajan1001 deleted the fix7152 branch January 11, 2022 11:30
@karajan1001 karajan1001 added A: experiments Related to dvc exp refactoring Factoring and re-factoring ui user interface / interaction and removed ui user interface / interaction labels Jan 11, 2022
@karajan1001
Contributor Author

Add a new demo for this PR.

asciicast

@dberenbaum
Contributor

dvc exp show -v -n -1 throws an error for me:

2022-01-25 12:25:40,695 DEBUG: Adding '/private/tmp/example-get-started/.dvc/config.local' to gitignore file.
2022-01-25 12:25:40,697 DEBUG: Adding '/private/tmp/example-get-started/.dvc/tmp' to gitignore file.
2022-01-25 12:25:40,697 DEBUG: Adding '/private/tmp/example-get-started/.dvc/cache' to gitignore file.
2022-01-25 12:25:40,711 ERROR: unexpected error - refs/remotes/origin/HEAD~12: the given reference name 'refs/remotes/origin/HEAD~12' is not valid
------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/dave/Code/dvc/dvc/cli/__init__.py", line 78, in main
    ret = cmd.do_run()
  File "/Users/dave/Code/dvc/dvc/cli/command.py", line 22, in do_run
    return self.run()
  File "/Users/dave/Code/dvc/dvc/commands/experiments/show.py", line 445, in run
    all_experiments = self.repo.experiments.show(
  File "/Users/dave/Code/dvc/dvc/repo/experiments/__init__.py", line 820, in show
    return show(self.repo, *args, **kwargs)
  File "/Users/dave/Code/dvc/dvc/repo/__init__.py", line 48, in wrapper
    return f(repo, *args, **kwargs)
  File "/Users/dave/Code/dvc/dvc/repo/experiments/show.py", line 126, in show
    iter_revs(repo.scm, revs, num, all_branches, all_tags, all_commits)
  File "/Users/dave/Code/dvc/dvc/scm.py", line 165, in iter_revs
    revs.append(resolve_rev(scm, head))
  File "/Users/dave/Code/dvc/dvc/scm.py", line 115, in resolve_rev
    return scm.resolve_rev(rev)
  File "/Users/dave/miniforge3/envs/example-get-started/lib/python3.8/site-packages/scmrepo/git/__init__.py", line 253, in _backend_func
    return func(*args, **kwargs)
  File "/Users/dave/miniforge3/envs/example-get-started/lib/python3.8/site-packages/scmrepo/git/backend/pygit2.py", line 263, in resolve_rev
    shas = {
  File "/Users/dave/miniforge3/envs/example-get-started/lib/python3.8/site-packages/scmrepo/git/backend/pygit2.py", line 264, in <setcomp>
    self.get_ref(f"refs/remotes/{remote.name}/{rev}")
  File "/Users/dave/miniforge3/envs/example-get-started/lib/python3.8/site-packages/scmrepo/git/backend/pygit2.py", line 322, in get_ref
    ref = self.repo.references.get(name)
  File "/Users/dave/miniforge3/envs/example-get-started/lib/python3.8/site-packages/pygit2/repository.py", line 1440, in get
    return self[key]
  File "/Users/dave/miniforge3/envs/example-get-started/lib/python3.8/site-packages/pygit2/repository.py", line 1436, in __getitem__
    return self._repository.lookup_reference(name)
_pygit2.InvalidSpecError: refs/remotes/origin/HEAD~12: the given reference name 'refs/remotes/origin/HEAD~12' is not valid
------------------------------------------------------------
2022-01-25 12:25:41,566 DEBUG: Adding '/private/tmp/example-get-started/.dvc/config.local' to gitignore file.
2022-01-25 12:25:41,567 DEBUG: Adding '/private/tmp/example-get-started/.dvc/tmp' to gitignore file.
2022-01-25 12:25:41,567 DEBUG: Adding '/private/tmp/example-get-started/.dvc/cache' to gitignore file.
2022-01-25 12:25:41,568 DEBUG: Version info for developers:
DVC version: 2.9.4.dev58+g47481219
---------------------------------
Platform: Python 3.8.5 on macOS-10.16-x86_64-i386-64bit
Supports:
        gdrive (pydrive2 = 1.7.3),
        hdfs (fsspec = 2022.1.0, pyarrow = 3.0.0),
        webhdfs (fsspec = 2022.1.0),
        http (aiohttp = 3.7.3, aiohttp-retry = 2.4.6),
        https (aiohttp = 3.7.3, aiohttp-retry = 2.4.6),
        ssh (sshfs = 2021.11.2)
Cache types: <https://error.dvc.org/no-dvc-cache>
Caches: local
Remotes: https
Workspace directory: apfs on /dev/disk3s1s1
Repo: dvc, git

Having any troubles? Hit us up at https://dvc.org/support, we are always happy to help!
2022-01-25 12:25:41,569 DEBUG: Analytics is disabled.

@karajan1001
Contributor Author

karajan1001 commented Jan 27, 2022

Looks like this is a bug in scmrepo. The gitpython backend raises a proper RevError:

In [1]: from scmrepo.git import Git

In [2]: repo = Git('.')

In [3]: repo.gitpython.resolve_rev("HEAD~5")
Out[3]: '933cfcc55c98d445fddd952815881c8a5df3e797'

In [4]: repo.gitpython.resolve_rev("HEAD~6")
---------------------------------------------------------------------------
RevError                                  Traceback (most recent call last)
<ipython-input-4-2617366dae9f> in <module>
----> 1 repo.gitpython.resolve_rev("HEAD~6")

~/anaconda3/envs/dvc/lib/python3.8/site-packages/scmrepo/git/backend/gitpython.py in resolve_rev(self, rev)
    350                 return shas.pop()
    351
--> 352         raise RevError(f"unknown Git revision '{rev}'")
    353
    354     def resolve_commit(self, rev: str) -> "GitCommit":

RevError: unknown Git revision 'HEAD~6'

while the pygit2 backend does not raise a proper exception:

In [5]: repo.pygit2.resolve_rev("HEAD~6")
---------------------------------------------------------------------------
InvalidSpecError                          Traceback (most recent call last)
<ipython-input-5-d503510da33d> in <module>
----> 1 repo.pygit2.resolve_rev("HEAD~6")

~/anaconda3/envs/dvc/lib/python3.8/site-packages/scmrepo/git/backend/pygit2.py in resolve_rev(self, rev)
    261
    262         # Look for single exact match in remote refs
--> 263         shas = {
    264             self.get_ref(f"refs/remotes/{remote.name}/{rev}")
    265             for remote in self.repo.remotes

~/anaconda3/envs/dvc/lib/python3.8/site-packages/scmrepo/git/backend/pygit2.py in <setcomp>(.0)
    262         # Look for single exact match in remote refs
    263         shas = {
--> 264             self.get_ref(f"refs/remotes/{remote.name}/{rev}")
    265             for remote in self.repo.remotes
    266         } - {None}

~/anaconda3/envs/dvc/lib/python3.8/site-packages/scmrepo/git/backend/pygit2.py in get_ref(self, name, follow)
    320         from pygit2 import GIT_REF_SYMBOLIC
    321
--> 322         ref = self.repo.references.get(name)
    323         if not ref:
    324             return None

~/anaconda3/envs/dvc/lib/python3.8/site-packages/pygit2/repository.py in get(self, key)
   1438     def get(self, key):
   1439         try:
-> 1440             return self[key]
   1441         except KeyError:
   1442             return None

~/anaconda3/envs/dvc/lib/python3.8/site-packages/pygit2/repository.py in __getitem__(self, name)
   1434
   1435     def __getitem__(self, name):
-> 1436         return self._repository.lookup_reference(name)
   1437
   1438     def get(self, key):

InvalidSpecError: refs/remotes/origin/HEAD~6: the given reference name 'refs/remotes/origin/HEAD~6' is not valid

In [6]: repo.pygit2.resolve_rev("HEAD~5")
Out[6]: '933cfcc55c98d445fddd952815881c8a5df3e797'

If I remove the origin remote from the test repo, the pygit2 backend works correctly:


In [3]: repo.pygit2.resolve_rev("HEAD~5")
Out[3]: '933cfcc55c98d445fddd952815881c8a5df3e797'

In [4]: repo.pygit2.resolve_rev("HEAD~6")
---------------------------------------------------------------------------
RevError                                  Traceback (most recent call last)
<ipython-input-4-d503510da33d> in <module>
----> 1 repo.pygit2.resolve_rev("HEAD~6")

~/anaconda3/envs/dvc/lib/python3.8/site-packages/scmrepo/git/backend/pygit2.py in resolve_rev(self, rev)
    269         if len(shas) == 1:
    270             return shas.pop()  # type: ignore
--> 271         raise RevError(f"unknown Git revision '{rev}'")
    272
    273     def resolve_commit(self, rev: str) -> "GitCommit":

RevError: unknown Git revision 'HEAD~6'
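One way to get consistent behavior across backends, sketched with hypothetical classes (these are not scmrepo's or pygit2's real types): catch the invalid-reference-name error inside the remote-ref lookup and treat it as "no match", so the final `unknown Git revision` RevError is raised just as it is when no remote is configured. The sketch relies on `~` being illegal in Git ref names, which is why `refs/remotes/origin/HEAD~6` is rejected.

```python
from typing import Dict, List, Optional


class RevError(Exception):
    """Hypothetical stand-in for scmrepo's RevError."""


class InvalidSpecError(ValueError):
    """Hypothetical stand-in for a backend's invalid-ref-name error."""


class FakeBackend:
    def __init__(self, refs: Dict[str, str], remotes: List[str]):
        self.refs = refs
        self.remotes = remotes

    def get_ref(self, name: str) -> Optional[str]:
        # Like a strict backend: "~" is not allowed in a ref name.
        if "~" in name:
            raise InvalidSpecError(name)
        return self.refs.get(name)

    def _safe_get_ref(self, name: str) -> Optional[str]:
        try:
            return self.get_ref(name)
        except InvalidSpecError:
            # An invalid ref name simply cannot match; don't leak the error.
            return None

    def resolve_rev(self, rev: str) -> str:
        # Simplified local lookup standing in for real revision parsing.
        if rev in self.refs:
            return self.refs[rev]
        # Look for a single exact match among remote-tracking refs.
        shas = {
            self._safe_get_ref(f"refs/remotes/{remote}/{rev}")
            for remote in self.remotes
        } - {None}
        if len(shas) == 1:
            return shas.pop()
        raise RevError(f"unknown Git revision '{rev}'")
```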

@dberenbaum
Contributor

Thanks for looking into it, @karajan1001! Do you know what needs to be fixed so that it actually returns the table with the full history? Should we open a new issue or PR to fix it?

@karajan1001
Contributor Author

karajan1001 commented Jan 27, 2022

I think we should first fix it on the scmrepo side and then bump the scmrepo version here. Related to treeverse/scmrepo#30

@dberenbaum
Contributor

@karajan1001 In the meantime, should we roll back the -1 support and make a separate issue to add this? This would unblock other PRs and docs as well.

@karajan1001
Contributor Author

@karajan1001 In the meantime, should we roll back the -1 support and make a separate issue to add this? This would unblock other PRs and docs as well.

This issue exists not only for -1 but for any n larger than the number of commits in the history. Users don't normally pass such a large n, and if they do, they can pass a smaller n to get the correct output. For -1, however, it will not work unless the origin remote is removed. So I think we can roll it back for now.
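A sketch of why walking the history sidesteps this: rather than resolving a computed `HEAD~<n>` name (which fails once n exceeds the history), follow first parents from the start commit and stop after n commits or at the root, whichever comes first. The `parents` map is a hypothetical stand-in for commit objects, and this `n_commits` only illustrates the helper name floated earlier in the review; -1 is taken to mean all commits, as in this PR.

```python
from typing import Dict, List, Optional


def n_commits(parents: Dict[str, Optional[str]], start: str, num: int) -> List[str]:
    """Return up to `num` first-parent commits starting at `start`.

    num == -1 means "the whole history"; a num larger than the history
    returns every commit instead of raising.
    """
    commits: List[str] = []
    sha: Optional[str] = start
    while sha is not None and (num == -1 or len(commits) < num):
        commits.append(sha)
        sha = parents[sha]  # None marks the root commit
    return commits
```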

karajan1001 added a commit to karajan1001/dvc.org that referenced this pull request Feb 16, 2022
1. add a new flag `--rev` to `dvc exp show` command.
related to treeverse/dvc#7204

Co-authored-by: Jorge Orpinel <jorgeorpinel@users.noreply.github.com>
karajan1001 added a commit to karajan1001/dvc.org that referenced this pull request Mar 9, 2022
1. add a new flag `--rev` to `dvc exp show` command.
related to treeverse/dvc#7204

Co-authored-by: Jorge Orpinel <jorgeorpinel@users.noreply.github.com>
Co-authored-by: Dave Berenbaum <dave.berenbaum@gmail.com>
jorgeorpinel added a commit to treeverse/dvc.org that referenced this pull request Mar 10, 2022
1. add a new flag `--rev` to `dvc exp show` command.
related to treeverse/dvc#7204

Co-authored-by: Jorge Orpinel <jorgeorpinel@users.noreply.github.com>
Co-authored-by: Dave Berenbaum <dave.berenbaum@gmail.com>
iesahin pushed a commit to treeverse/dvc.org that referenced this pull request Apr 11, 2022
1. add a new flag `--rev` to `dvc exp show` command.
related to treeverse/dvc#7204

Co-authored-by: Jorge Orpinel <jorgeorpinel@users.noreply.github.com>
Co-authored-by: Dave Berenbaum <dave.berenbaum@gmail.com>

Labels

A: experiments Related to dvc exp enhancement Enhances DVC refactoring Factoring and re-factoring

Development

Successfully merging this pull request may close these issues.

add --rev and updated -n=-1 behavior to exp show

5 participants