Description
I tried to update our DVC CI pipeline.
Currently we run the following commands (among others):
dvc pull to check that everything has been pushed
dvc status to check that the DVC status is clean, in other words that dvc repro would not run any stage.
But pulling takes a long time, and with the new --allow-missing feature I thought I could skip it with:
dvc data status --not-in-remote --json | grep -v not_in_remote
dvc repro --allow-missing --dry
The first works as expected: it fails if data was not pushed and succeeds if it was.
But the latter just fails on missing data.
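For reference, the first check could also parse the JSON instead of grepping it. This is only a minimal sketch: the helper name unpushed_paths is made up for illustration, and it assumes the JSON output is an object whose "not_in_remote" key holds a list of paths and is absent or empty when everything has been pushed.

```python
import json


def unpushed_paths(status_json: str) -> list[str]:
    """Return the paths listed under "not_in_remote" in the given JSON text.

    Hypothetical helper. Assumes the output of
    `dvc data status --not-in-remote --json` is a JSON object and that
    "not_in_remote" is missing or empty when nothing is unpushed.
    """
    status = json.loads(status_json)
    return list(status.get("not_in_remote", []))
```

In CI one would feed it the stdout of dvc data status --not-in-remote --json and fail the job if the returned list is not empty.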
Reproduce
Example: Machine Two and Machine Three should agree, either both failing or both succeeding.
Machine One:
1. dvc repro -f
2. git add . && git commit -m "repro" && dvc push && git push
3. dvc repro --allow-missing --dry
--> doesn't fail, nothing changed (as expected)
Machine Two:
4. dvc data status --not-in-remote --json | grep -v not_in_remote
--> does not fail, everything is pushed and would be pulled
5. dvc repro --allow-missing --dry
--> fails on missing data (unexpected)
Machine Three:
4. dvc pull
5. dvc status
--> succeeds
Expected
On a machine where I have not run dvc pull, given a clean git state and a clean dvc data status --not-in-remote --json | grep -v not_in_remote result, I would expect dvc repro --allow-missing --dry to succeed and show that no stage has to run.
Environment information
Linux
Output of dvc doctor:
$ dvc doctor
09:16:47 DVC version: 3.13.2 (pip)
09:16:47 -------------------------
09:16:47 Platform: Python 3.10.11 on Linux-5.9.0-0.bpo.5-amd64-x86_64-with-glibc2.35
09:16:47 Subprojects:
09:16:47 dvc_data = 2.12.1
09:16:47 dvc_objects = 0.24.1
09:16:47 dvc_render = 0.5.3
09:16:47 dvc_task = 0.3.0
09:16:47 scmrepo = 1.1.0
09:16:47 Supports:
09:16:47 azure (adlfs = 2023.4.0, knack = 0.11.0, azure-identity = 1.13.0),
09:16:47 gdrive (pydrive2 = 1.16.1),
09:16:47 gs (gcsfs = 2023.6.0),
09:16:47 hdfs (fsspec = 2023.6.0, pyarrow = 12.0.1),
09:16:47 http (aiohttp = 3.8.5, aiohttp-retry = 2.8.3),
09:16:47 https (aiohttp = 3.8.5, aiohttp-retry = 2.8.3),
09:16:47 oss (ossfs = 2021.8.0),
09:16:47 s3 (s3fs = 2023.6.0, boto3 = 1.28.17),
09:16:47 ssh (sshfs = 2023.7.0),
09:16:47 webdav (webdav4 = 0.9.8),
09:16:47 webdavs (webdav4 = 0.9.8),
09:16:47 webhdfs (fsspec = 2023.6.0)
09:16:47 Config:
09:16:47 Global: /home/runner/.config/dvc
09:16:47 System: /etc/xdg/dvc
09:16:47 Cache types: <https://error.dvc.org/no-dvc-cache>
09:16:47 Caches: local
09:16:47 Remotes: ssh
09:16:47 Workspace directory: ext4 on /dev/nvme0n1p2
09:16:47 Repo: dvc, git