dvc repro --dry --allow-missing: fails on missing data #9818

@Otterpatsch

Description

I tried to update our DVC CI pipeline.

Currently we run the following commands (among others):

dvc pull, to check that everything has been pushed
dvc status, to check that the DVC status is clean, in other words that dvc repro would not run any stage

But pulling takes a long time, and with the new --allow-missing feature I thought I could skip it with:

dvc data status --not-in-remote --json | grep -v not_in_remote
dvc repro --allow-missing --dry

The first command works as expected: it fails if some data was not pushed and succeeds if everything was.
But the latter just fails on missing data.
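The pull-free check described above could be wrapped roughly as follows. This is only a sketch: the function names all_data_pushed and ci_check are hypothetical, and it assumes (as the grep in the original command does) that the JSON output mentions "not_in_remote" only when some tracked data is missing from the remote.

```shell
#!/bin/sh
# Sketch of the CI check without `dvc pull`. Nothing runs until ci_check is called.

# Hypothetical helper: reads `dvc data status --json` output on stdin and
# succeeds only if it contains no "not_in_remote" entry (assumed key name).
all_data_pushed() {
    ! grep -q '"not_in_remote"'
}

# Hypothetical wrapper around the two commands from this report.
ci_check() {
    # Step 1: fail early if some tracked data was never pushed.
    if ! dvc data status --not-in-remote --json | all_data_pushed; then
        echo "some tracked data is missing from the remote" >&2
        return 1
    fi
    # Step 2: dry-run the pipeline without requiring the data locally.
    # This is the step that currently fails on a machine without `dvc pull`.
    dvc repro --allow-missing --dry
}
```

Unlike grep -v, which succeeds whenever at least one input line lacks the pattern, grep -q with negation fails as soon as the key appears anywhere, so it also behaves correctly if the JSON is ever printed across several lines. A CI step would then call ci_check and rely on its exit status.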

Reproduce

Example: the outcome (failure or success) on Machine Two and Machine Three should be the same.

Machine One:

  1. dvc repro -f
  2. git add . && git commit -m "repro" && dvc push && git push
  3. dvc repro --allow-missing --dry
     --> does not fail, nothing changed (as expected)

Machine Two:

  4. dvc data status --not-in-remote --json | grep -v not_in_remote
     --> does not fail, everything is pushed and would be pulled
  5. dvc repro --allow-missing --dry
     --> fails on missing data (unexpected)

Machine Three:

  4. dvc pull
  5. dvc status
     --> succeeds

Expected

On a machine where I did not run dvc pull, given a clean git state and a passing dvc data status --not-in-remote --json | grep -v not_in_remote check, I would expect dvc repro --allow-missing --dry to succeed and show that no stage has to run.

Environment information

Linux

Output of dvc doctor:

$ dvc doctor
09:16:47  DVC version: 3.13.2 (pip)
09:16:47  -------------------------
09:16:47  Platform: Python 3.10.11 on Linux-5.9.0-0.bpo.5-amd64-x86_64-with-glibc2.35
09:16:47  Subprojects:
09:16:47  	dvc_data = 2.12.1
09:16:47  	dvc_objects = 0.24.1
09:16:47  	dvc_render = 0.5.3
09:16:47  	dvc_task = 0.3.0
09:16:47  	scmrepo = 1.1.0
09:16:47  Supports:
09:16:47  	azure (adlfs = 2023.4.0, knack = 0.11.0, azure-identity = 1.13.0),
09:16:47  	gdrive (pydrive2 = 1.16.1),
09:16:47  	gs (gcsfs = 2023.6.0),
09:16:47  	hdfs (fsspec = 2023.6.0, pyarrow = 12.0.1),
09:16:47  	http (aiohttp = 3.8.5, aiohttp-retry = 2.8.3),
09:16:47  	https (aiohttp = 3.8.5, aiohttp-retry = 2.8.3),
09:16:47  	oss (ossfs = 2021.8.0),
09:16:47  	s3 (s3fs = 2023.6.0, boto3 = 1.28.17),
09:16:47  	ssh (sshfs = 2023.7.0),
09:16:47  	webdav (webdav4 = 0.9.8),
09:16:47  	webdavs (webdav4 = 0.9.8),
09:16:47  	webhdfs (fsspec = 2023.6.0)
09:16:47  Config:
09:16:47  	Global: /home/runner/.config/dvc
09:16:47  	System: /etc/xdg/dvc
09:16:47  Cache types: <https://error.dvc.org/no-dvc-cache>
09:16:47  Caches: local
09:16:47  Remotes: ssh
09:16:47  Workspace directory: ext4 on /dev/nvme0n1p2
09:16:47  Repo: dvc, git

Labels

A: pipelines - Related to the pipelines feature
awaiting response - we are waiting for your reply, please respond! :)
bug - Did we break something?
p1-important - Important, aka current backlog of things to do