Description
-
Repeated `dvc add` is not skipped:
$ dvc add data
$ dvc add data
In 1.X, the second call would have been skipped. DVC also still deletes the file and tries to restore it from the cache, which makes it slower.
-
DVC uses move-then-checkout logic. It moves the file from the workspace to the cache and then checks it out again, rather than just copying it.
This is slow and can result in data loss if it fails between the two operations.
-
DVC deletes the stage file before it has even added the files. This means that if the `dvc add` operation fails, the existing pointer file is lost, and that file is the only way to get access to the data.
-
DVC resets the stages multiple times (only when multiple targets are provided) and forces stage recollection, which is slow.
-
To the same effect, it resets the internal state of the repo after creating each stage, which also happens to reset dulwich's ignore manager, making it horribly slow when there are many targets (or with `-R`).
https://github.com/iterative/dvc/blob/4e792ae61c5927ab2e5f6a6914d985d43aa705b4/dvc/repo/add.py#L266
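
To make the first three points concrete, here is a minimal sketch of the ordering they suggest: skip when nothing changed, copy into the cache instead of moving, and replace the pointer file only as the last step. This is not DVC's code. `safe_add`, `file_md5`, `atomic_write_text`, the `.ptr` suffix and the two-character cache subdirectory are all hypothetical names and layout chosen for illustration. It also rehashes the file on every call, so it only skips the copy and pointer write, not the hashing.

```python
import hashlib
import os
import shutil
import tempfile
from pathlib import Path


def file_md5(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large files are not loaded into memory."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def atomic_write_text(path: Path, text: str) -> None:
    """Write to a temp file in the same directory, then rename over `path`."""
    fd, tmp = tempfile.mkstemp(dir=path.parent, prefix=path.name + ".")
    try:
        with os.fdopen(fd, "w") as f:
            f.write(text)
        os.replace(tmp, path)  # the old pointer survives until this succeeds
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise


def safe_add(target: Path, cache_dir: Path) -> bool:
    """Hypothetical 'add': returns False if skipped, True if work was done."""
    pointer = target.with_name(target.name + ".ptr")  # illustrative suffix
    digest = file_md5(target)
    cached = cache_dir / digest[:2] / digest[2:]  # assumed cache layout

    # 1. Skip a repeated add: pointer and cache already match the workspace.
    if (
        pointer.exists()
        and pointer.read_text().strip() == digest
        and cached.exists()
    ):
        return False

    # 2. Copy (never move) into the cache; the workspace file is untouched.
    if not cached.exists():
        cached.parent.mkdir(parents=True, exist_ok=True)
        fd, tmp = tempfile.mkstemp(dir=cached.parent)
        os.close(fd)
        try:
            shutil.copyfile(target, tmp)
            os.replace(tmp, cached)
        except BaseException:
            if os.path.exists(tmp):
                os.unlink(tmp)
            raise

    # 3. Only now replace the pointer; a failure above leaves the old one intact.
    atomic_write_text(pointer, digest + "\n")
    return True
```

With this ordering, a failure at any step leaves either the old pointer or the new one on disk, and the workspace copy of the data is never removed. Calling `safe_add` twice on an unchanged file returns `True` and then `False`.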