Skip to content

Conversation

@pmrowla
Copy link
Contributor

@pmrowla pmrowla commented Mar 10, 2021

Thank you for the contribution - we'll try to review it as soon as possible. πŸ™

Will close #5553.

Docs PR: treeverse/dvc.org#2286

Adjusts dvc exp run --reset behavior for checkpoint outs:

  • When --reset is used, any checkpoint outs will be removed entirely before running the experiment (regardless of whether or not a hash for the given out exists in dvc.lock)
  • When --reset is used, all other (non-checkpoint) outs in dvc.lock will be unchanged
    • Previously, --reset would reset the entire dvc.lock file to HEAD (i.e git checkout HEAD -- dvc.lock)
  • --queue now explicitly implies --reset, and --reset is now mutually exclusive with --rev
    • This was essentially already the case but we now explicitly error out if --reset is used in conjunction with --rev
    • Previously, using --rev would do the equivalent of git reset --hard <rev> (so the old behavior would do the same thing with or with --reset).
    • With the new behavior, it does not make sense to delete a checkpoint out at the same time as resuming from that checkpoint, so it's now an error condition
    • The docs for --queue/--run-all with regard to checkpoints have also been clarified

@pmrowla pmrowla requested a review from dberenbaum March 10, 2021 06:47
@pmrowla pmrowla self-assigned this Mar 10, 2021
@pmrowla
Copy link
Contributor Author

pmrowla commented Mar 10, 2021

@dberenbaum I tested the new behavior against some of the toy checkpoints repos I have on hand (in addition to the CI tests), but would appreciate it if you can double check with a couple of your own repos as well.

Copy link
Contributor

@dberenbaum dberenbaum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If there are checkpoints prior to HEAD, --reset will not drop the existing checkpoints, but it will create a new experiment from scratch and duplicate those checkpoints. Is it important that --reset drops those checkpoints to avoid duplication? Will it seem inconsistent that checkpoints are dropped when based off HEAD but otherwise are not?

@pmrowla
Copy link
Contributor Author

pmrowla commented Mar 11, 2021

If there are checkpoints prior to HEAD, --reset will not drop the existing checkpoints, but it will create a new experiment from scratch and duplicate those checkpoints. Is it important that --reset drops those checkpoints to avoid duplication? Will it seem inconsistent that checkpoints are dropped when based off HEAD but otherwise are not?

Experiments (checkpoints or not) are always considered unique per parent HEAD commit. Experiments derived from any other commit are considered separate, so we don't check to see if a matching experiment tied to any other commit in the entire git history. (But if an identical run does exist, we will still re-use run-cache for that experiment so execution would not be duplicated, and we would also re-use identical cache objects like with anything else in DVC)

As it stands right now, checkpoints behave the same way as standalone experiments, making --reset search for identical checkpoint(s) through the rest of the git history seems like it would be unnecessary (and probably unexpected behavior as well?)

Copy link
Contributor

@dberenbaum dberenbaum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fine with the behavior as is, but we might have to revisit this workflow if users get confused. Both this and #5567 are related to running experiments with different Git histories but matching dvc-tracked info.

@pmrowla
Copy link
Contributor Author

pmrowla commented Mar 11, 2021

#5567 is bug that is not related to checkpoints

@pmrowla pmrowla force-pushed the checkpoint-reset-lockfile branch from e4c37ff to 2c57a77 Compare March 12, 2021 05:39
@pmrowla pmrowla merged commit 8c1d46e into treeverse:master Mar 12, 2021
@pmrowla pmrowla deleted the checkpoint-reset-lockfile branch March 12, 2021 16:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

exp run: reset when the lock file is in Git

2 participants