exp run: fix issue where duplicate workspace runs would incorrectly conflict #5611
if checkpoint:
    raise CheckpointExistsError(ref_info.name)
raise ExperimentExistsError(ref_info.name)
new_rev = orig_rev
If the new run generates a commit identical to an existing one, we should reuse the existing commit. This logic was already in place for tempdir runs (during git fetch), but not for workspace runs, where we commit directly into the main git/dvc workspace.
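A minimal sketch of the decision described above, assuming the existing and new revisions are available as plain strings. `resolve_rev` and its parameters are hypothetical names for illustration, and the two exception classes are stand-ins for the ones raised in the diff, not DVC's actual definitions:

```python
class ExperimentExistsError(Exception):
    """Stand-in for the error raised in the diff (not DVC's real class)."""


class CheckpointExistsError(Exception):
    """Stand-in for the checkpoint variant of the error."""


def resolve_rev(orig_rev, new_rev, name, checkpoint=False, force=False):
    """Return the revision the experiment ref should end up pointing to.

    orig_rev is the commit an existing experiment ref points to (or None),
    new_rev is the commit produced by the run that just finished.
    """
    if orig_rev is None or force:
        # No existing experiment, or the user asked to overwrite it.
        return new_rev
    if orig_rev == new_rev:
        # The run reproduced an identical commit: reuse it, no conflict.
        return orig_rev
    # Same experiment name, different commit: a genuine conflict.
    if checkpoint:
        raise CheckpointExistsError(name)
    raise ExperimentExistsError(name)
```

With this shape, a duplicate workspace run that produces the same commit returns the existing revision instead of raising, while a run that produces a different commit still errors unless `force` is set.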
@pmrowla, is it possible to find out, before even running this, that the state is the same as before? (It might be problematic with non-deterministic stages, though.)
I think it's difficult, because experiments also include git repo/workspace modifications (and not just DVC dependencies). So while two pipeline runs might be identical (so they get hashed into matching stages with matching DVC-tracked deps/outs), there could be other changes in the repo that show up in git but are not visible to DVC. So we also have to generate the final git commit and then see whether it actually conflicts/diffs with the previous run.
If there are git differences, we can't really tell which experiment should be preferred, so we would error out and then require running with -f/--force to overwrite the existing one.
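To make the point concrete, here is a toy model (not how git or DVC actually compute hashes) in which a workspace is a dict of file path to content. `digest`, `stage_hash`, `commit_hash`, and the file names are all hypothetical and exist only for this illustration:

```python
import hashlib


def digest(files):
    """Hash a mapping of path -> content in a stable (sorted) order."""
    h = hashlib.sha256()
    for path in sorted(files):
        h.update(path.encode())
        h.update(b"\0")
        h.update(files[path].encode())
        h.update(b"\0")
    return h.hexdigest()


def stage_hash(workspace, dvc_tracked):
    """Toy 'DVC view': only the DVC-tracked deps/outs are hashed."""
    return digest({p: c for p, c in workspace.items() if p in dvc_tracked})


def commit_hash(workspace):
    """Toy 'git view': every file in the workspace is hashed."""
    return digest(workspace)


dvc_tracked = {"params.yaml", "model.pkl"}
run_a = {"params.yaml": "lr: 0.1", "model.pkl": "weights-v1", "README.md": "notes"}
run_b = dict(run_a, **{"README.md": "notes, edited"})

# The pipeline looks identical to DVC...
same_stages = stage_hash(run_a, dvc_tracked) == stage_hash(run_b, dvc_tracked)
# ...but the resulting commits differ because of a git-only change.
same_commit = commit_hash(run_a) == commit_hash(run_b)
```

Here `same_stages` is true while `same_commit` is false, which is why the comparison has to happen on the final git commit rather than on the stage hashes alone.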
And yeah, as you also noted, I'm not sure we can rely on checkpoint stages being deterministic, since checkpoints are persist outputs and there is no guarantee that the user's code will always generate an identical sequence of checkpoints.
skshetry left a comment
Looks good to me. I have made some minor comments inline.
❗ I have followed the Contributing to DVC checklist.
📖 If this PR requires documentation updates, I have created a separate PR (or issue, at least) in dvc.org and linked it here.
Thank you for the contribution - we'll try to review it as soon as possible. 🙏
Will fix #5567