-
Notifications
You must be signed in to change notification settings - Fork 1.3k
exp show: add --rev flag (#7152)
#7204
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
dvc/repo/experiments/__init__.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using brancher here is a pretty heavy dependency since it creates the git-fs and changes the current repo instance for each rev. It's also bit redundant since we also use brancher on single revisions in _collect_experiment_commit (where we actually do want the git-fs and modified repo instance).
Since at this point in the call we only want the list of revisions that the combination of all_branches/all_tags/all_commits would give, what we probably want is a to split the current brancher functionality for just getting the revisions into a helper method https://github.com/iterative/dvc/blob/ceeda6d3072fb7d862518a60e9e2f6a0ae2cf3d2/dvc/repo/brancher.py#L30-L59
(so that we can get the revs but skip the git-fs/repo modification when we don't need it)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Something like iter_revs perhaps?
for rev in iter_revs(revs, all_branches=all_branches, all_tags=all_tags, all_experiments=all_experiments, all_commits=all_commits):
passNote that workspace probably is not a part of iter_revs, but resolve_rev should be a part of it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is OK to rename get_revs_from_flags as iter_revs for me, but the problem preventing me merge the logic between get_revs_from_flags and brancher here is that there are some problems with the output format of iter_revs.
The output format of brancher without a sha_only flag can be shas, branch names, tag names or even HEAD or HEAD~~~. Besides this when one commit have multiple names among this, we need to the one that users provide to make them more understandable to the users.
While on the get_revs_from_flags side the output is quite simple, we just crunch all different inputs into shas.
I think it is better to follow the output rule in brancher here for both of the two functions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The new iterator can return the full (sha, names) pairs. It's really a brancher() design flaw that it's only designed to return either SHAs or names, but not both. Determining whether we want to display SHA's or human-readable names should be happening at the command/UI level and not at the internal repo/brancher API level.
see updated comment: #7204 (comment)
--rev flag (#7152)--rev flag (#7152)
dvc/scm.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can just be iter_revs as described by @skshetry (or maybe something like iter_rev_names), and it can yield the same sha, names pairs that brancher currently uses. We can also just add the num= support here as well, and should also support all_experiments the same way that brancher does.
So basically this iterator should do everything that brancher does right now except for modifying repo and generating a git-fs.
So this method would look something like
def iter_rev_names(
scm: "Git",
revs: List[str],
all_branches: bool = False,
...
) Generator[...]:
... # get revs list the same way as brancher()
if revs:
rev_resolver = partial(resolve_rev, scm)
# yield (rev, names) pairs
yield from group_by(rev_resolver, revs).items()This way code outside of brancher can use the rev, names items as needed without repo modification and git-fs overhead.
And now brancher() itself could look something like
def brancher(
self,
*args,
sha_only=False,
**kwargs,
):
saved_fs = self.fs
try:
for sha, names in iter_rev_names(self.scm, *args, **kwargs):
self.fs = GitFileSystem(scm=scm, rev=sha)
if self.fs.exists(self.root_dir):
if sha_only:
yield sha
else:
yield ", ".join(names)
dvc/scm.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This fix_exp_head logic and num logic could all be moved into the generalized iter_rev_names method (and brancher() by extension).
Everywhere we use brancher in DVC we also need the exp HEAD fixup. so we can do the fix_exp_head inside the refactored iterator and then remove some of the existing fix_exp_head calls in DVC (exp show + params/metrics/plots diff) so that it's always unified in one place
dvc/scm.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please don't use OrderedDict as dict is guaranteed to be ordered based on insertions from 3.7+ onwards.
--rev flag (#7152)--rev flag (#7152)
tests/unit/scm/test_scm.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Git does not have a concept of default branch locally. init uses init.defaultBranch/--initial-branch and defaults to master. clone uses currently active branch as initial branch by default.
At least in scmrepo, it is not possible to know what branch is/was initial. So this feels like a tests-only requirement. Due to these issues, I have moved away from default_branch suggestion. I think the best way would be to keep a reference to scm.active_branch(), and use that instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here you may be able to avoid using hardcoded master by using tmp_dir.branch context manager.
pmrowla
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will need a docs command reference update
fix: treeverse#7152 1. Add `--rev` flag to `dvc exp show`. 2. Add -1 support for `--num` flag`. 3. Extract revision found logic to `utils`. 4. Brancher use this revision found logic. 5. Update command unit test 6. Add a util unit test. Co-authored-by: Peter Rowlands (변기호) <peter@pmrowla.com>
|
|
||
| head_revs = head_revs or [] | ||
| revs = [] | ||
| for rev in head_revs: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wasn't this mutually exclusive with num/n before?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't this be done separately with n_commits(num) if it's exclusive, outside of iter_revs, considering it is not used in brancher.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the previous version rev+num is not mutually exclusive with all_tags/all_branches. And the result will be the sum of all revs. Maybe it is better to make them exclusive? @dberenbaum
This function will be used in exp push/pull/ls/show/remove If I separate n_commits(num) here then I need to add a new function and call them five times there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And the result will be the sum of all revs.
How is this supposed to work? If I do dvc exp show -a -n 10, I get:
───────────────────────────────────────────────────────────────────────────────────────
Experiment Created avg_prec roc_auc prepare.split pre
───────────────────────────────────────────────────────────────────────────────────────
workspace - 0.60405 0.9608 0.2 201
11-random-forest-experiments May 29, 2021 0.60405 0.9608 0.2 201
───────────────────────────────────────────────────────────────────────────────────────
So it seems like -n is being ignored?
I don't feel strongly about whether we allow these to be combined. It seems strange to combine them, but if we can explain the behavior clearly, it might be fine to allow users to combine them if they want.
|
|
||
| def iter_revs( | ||
| scm: "Git", | ||
| head_revs: Optional[List[str]] = None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not name it revs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can call it revs + results compare to current head_revs + revs
| raise RevError(str(exc)) | ||
|
|
||
|
|
||
| def iter_revs( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
iter_revs is no longer an iterator now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any name to suggest?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can modify these in #7245
|
|
|
Looks like it is a bug from while in And if i removed the |
|
Thanks for looking into it, @karajan1001! Do you know what needs to be fixes so that it actually returns the table with the full history? Should we open a new issue or PR to fix? |
|
I think we should first fix it on the |
|
@karajan1001 In the meantime, should we roll back the |
This issue not only exists in |
1. add a new flag `--rev` to `dvc exp show` command. related to treeverse/dvc#7204 Co-authored-by: Jorge Orpinel <jorgeorpinel@users.noreply.github.com>
1. add a new flag `--rev` to `dvc exp show` command. related to treeverse/dvc#7204 Co-authored-by: Jorge Orpinel <jorgeorpinel@users.noreply.github.com> Co-authored-by: Dave Berenbaum <dave.berenbaum@gmail.com>
1. add a new flag `--rev` to `dvc exp show` command. related to treeverse/dvc#7204 Co-authored-by: Jorge Orpinel <jorgeorpinel@users.noreply.github.com> Co-authored-by: Dave Berenbaum <dave.berenbaum@gmail.com> Co-authored-by: Jorge Orpinel <jorgeorpinel@users.noreply.github.com> Co-authored-by: Dave Berenbaum <dave.berenbaum@gmail.com>
1. add a new flag `--rev` to `dvc exp show` command. related to treeverse/dvc#7204 Co-authored-by: Jorge Orpinel <jorgeorpinel@users.noreply.github.com> Co-authored-by: Dave Berenbaum <dave.berenbaum@gmail.com> Co-authored-by: Jorge Orpinel <jorgeorpinel@users.noreply.github.com> Co-authored-by: Dave Berenbaum <dave.berenbaum@gmail.com>
fix: #7152
--revflag todvc exp show.--numflag`.utils.❗ I have followed the Contributing to DVC checklist.
📖 If this PR requires documentation updates, I have created a separate PR (or issue, at least) in dvc.org and linked it here.
Thank you for the contribution - we'll try to review it as soon as possible. 🙏