Description
Using DVC 0.71.0 on RHEL6, installed via conda, configured for system-wide hard links (not sure if it's relevant):
$ dvc --version
0.71.0
$ cat ~/.config/dvc/config
[cache]
protected = true
type = hardlink
I have come across what appears to be a bug: running dvc add on a directory that is already under DVC version control and whose contents have changed results in an error, but only when the directory is added together with a list of other files. The error does not occur if the command is repeated (i.e. after all the other files have been added). I have reproduced this in several project directories and with the following minimal example (courtesy of @MrOutis):
dvc init --no-scm
mkdir data
echo "foo" > data/foo
dvc add data
echo "bar" > bar.txt
dvc unprotect data
echo "change" > data/foo
dvc add bar.txt data
This results in the following output:
WARNING: Output 'data' of 'data.dvc' changed because it is 'modified'
100%|██████████|Add 2.00/2.00 [00:02<00:00, 1.08s/file]
ERROR: file/directory 'data' is specified as an output in more than one stage: data.dvc
data.dvc
But if the command is re-run:
$ dvc add bar.txt data
Stage is cached, skipping.
100% Add 1.00/1.00 [00:01<00:00, 1.96s/file]
So it appears DVC is somehow mishandling a changed directory when it is passed together with a list of other files. The expected behavior is that the directory is added successfully on the first try.
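For anyone trying to confirm this on another version, the steps above can be wrapped in a throwaway script that runs the final command twice and reports both exit statuses. This is only a sketch: it uses the same dvc commands as the example, DVC_BIN is a hypothetical variable of this script (not a DVC setting), and it assumes the failure shows up as a non-zero exit status, which the output above does not confirm.

```shell
# Sketch only: wraps the steps from the report in a throwaway directory.
# DVC_BIN is a hypothetical override for this script, not a DVC setting.
DVC_BIN="${DVC_BIN:-dvc}"

repro() (
    # The function body runs in a subshell, so cd and trap stay local to it.
    workdir="$(mktemp -d)" || exit 2
    trap 'rm -rf "$workdir"' EXIT
    cd "$workdir" || exit 2

    # Setup: track a directory, then modify it (same commands as the report).
    "$DVC_BIN" init --no-scm >/dev/null 2>&1 || exit 2
    mkdir data
    echo "foo" > data/foo
    "$DVC_BIN" add data >/dev/null 2>&1 || exit 2
    echo "bar" > bar.txt
    "$DVC_BIN" unprotect data >/dev/null 2>&1 || exit 2
    echo "change" > data/foo

    # Run the same add twice and record each exit status.
    "$DVC_BIN" add bar.txt data >/dev/null 2>&1
    first=$?
    "$DVC_BIN" add bar.txt data >/dev/null 2>&1
    second=$?
    echo "first=$first second=$second"
)
```

Calling repro after sourcing the script prints a single line of the form first=N second=M. Under the assumption above, the behavior described here would appear as a non-zero first value followed by a zero second value.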
Thanks for any effort to track down the source of the bug.