Bug: dvc add fails with a modified file (or directory) at the end of a list of files #2886

@jaredsampson

Using DVC 0.71.0 on RHEL6, installed via conda, configured for system-wide hard links (not sure if it's relevant):

$ dvc --version
0.71.0

$ cat ~/.config/dvc/config
[cache]
protected = true
type = hardlink

I have come across what appears to be a bug: running dvc add on a directory that is already under DVC version control, and whose contents have changed, results in an error, but only when the directory is added along with a list of other files. The error does not occur if the command is repeated (i.e. after all the other files have been added). I have reproduced this in several project directories and with the following minimal example (courtesy of @MrOutis):

dvc init --no-scm
mkdir data
echo "foo" > data/foo
dvc add data
echo "bar" > bar.txt
dvc unprotect data
echo "change" > data/foo
dvc add bar.txt data

This results in the following output:

WARNING: Output 'data' of 'data.dvc' changed because it is 'modified'
100%|██████████|Add                                       2.00/2.00 [00:02<00:00,  1.08s/file]
ERROR: file/directory 'data' is specified as an output in more than one stage: data.dvc
    data.dvc

But if the command is re-run:

$ dvc add bar.txt data
Stage is cached, skipping.
100% Add                                                  1.00/1.00 [00:01<00:00,  1.96s/file]

So it appears dvc is somehow mishandling the list of files. Of course, the expected behavior is that it will add the directory successfully on the first try.
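
Since the error reportedly shows up only when the modified directory is passed together with other targets, one possible stopgap is to call dvc add once per target. The following is a minimal, untested sketch under that assumption, not a fix for the underlying bug. The function name add_one_by_one and the DVC_BIN override are hypothetical, introduced only for illustration; the only DVC command used is dvc add.

```shell
#!/bin/sh
# Hypothetical helper: run "dvc add" once per target instead of passing a list.
# DVC_BIN is an assumed override (not a DVC setting) so the loop can be
# dry-run, e.g. with DVC_BIN=echo; it defaults to the real "dvc" binary.
add_one_by_one() {
    for target in "$@"; do
        # Stop at the first failing target and propagate a non-zero status.
        "${DVC_BIN:-dvc}" add "$target" || return 1
    done
}
```

With the minimal example above, this would be invoked as add_one_by_one bar.txt data, which issues dvc add bar.txt followed by dvc add data. Whether the second call then succeeds on the first try depends on the bug being limited to multi-target invocations, as described in this report.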

Thanks for any effort to track down the source of the bug.

Labels: bug, research