exp run: dvc commit DVC-tracked data deps when stashing an experiment
#5859
dberenbaum
left a comment
Have a couple of test suggestions, but otherwise looks good!
The downside to this behavior will be that it will potentially take a long time to queue an experiment, right? Maybe we need to document that users should do dvc checkout to get back to their original data if they don't want to use the data changes in their workspace. Thoughts @jorgeorpinel?
Thanks @pmrowla
Q#1. What happens to .dvc and dvc.lock files in the working tree if
Q#2. By "stashing" do we literally mean
@dberenbaum Idk if As for
@dberenbaum This will depend on the complexity of the user's pipeline and how many data deps they have, but yes it will slow down queueing.
If
Queued experiments are git stash (merge) commits. We don't use the standard
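For readers unfamiliar with how git represents a stash, here is a minimal sketch using plain git only. It does not touch DVC or the ref DVC uses for queued experiments, which this thread does not spell out. The temporary directory, the file contents, and the identity values are assumptions made for illustration.

```shell
# Plain-git illustration only: shows that a stash entry is a commit with two parents.
demo_dir="$(mktemp -d)"            # hypothetical scratch repo
cd "$demo_dir"
git init -q
# Local identity so the commands work in a clean environment (assumed values).
git config user.name "demo"
git config user.email "demo@example.com"

echo "foo: 1" > params.yaml
git add params.yaml
git commit -q -m "init"

echo "foo: 2" > params.yaml        # an uncommitted workspace change
git stash push -q -m "queued change"

# rev-list prints the stash commit followed by its parents
# (the HEAD commit and a commit recording the index state).
words="$(git rev-list --parents -n 1 refs/stash | wc -w)"
parent_count=$((words - 1))
echo "stash commit has $parent_count parents"
```

Because the stash entry has more than one parent it is technically a merge commit, which is what "(merge)" refers to above. Stashing with untracked files included would add a third parent.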
I tried out the first test manually and it seems like there's maybe another test needed upstream somewhere.
Here's what I did:
mkdir repo
cd repo
git init
dvc init -q
git add .
git commit --quiet -m "init"
echo data > data
dvc add data
echo "import sys; import shutil; shutil.copyfile(sys.argv[1], sys.argv[2])" > copy.py
echo "foo: 1" > params.yaml
dvc run -n copy-file -M metrics.yaml -p foo -d copy.py -d data python copy.py params.yaml metrics.yaml
git add .
git commit -m "run stage"
echo modified > data
dvc exp run
The output from dvc exp run is:
'data.dvc' didn't change, skipping
Stage 'copy-file' didn't change, skipping
No experiment is created, and the contents of data are reverted from "modified" back to "data".
@jorgeorpinel I think we may just need to document that large changes to data dependencies in the workspace may slow down experiment queueing.
@dberenbaum bug was w/calling
Hm... Is this something we should be worried about? Let's wait and see I guess.
Could've mentioned it in your blog post @pmrowla !
Agree. Created treeverse/dvc.org#2418
We can do this if we want to, but I don't think this is what users would expect. If If the user wants to roll back the repo state they can use
The initial
I have followed the Contributing to DVC checklist.
If this PR requires documentation updates, I have created a separate PR (or issue, at least) in dvc.org and linked it here.
Thank you for the contribution - we'll try to review it as soon as possible.
Will close #5593
DVC-tracked data deps are now dvc committed internally when stashing an experiment, so that any modifications to those data deps are preserved in both workspace and tempdir runs (previously the changes were dropped entirely by the exp run dvc checkout step).