Closed
Labels: awaiting response
Bug Report
Validation error on an attempt to download dataset cats-dogs-v[1/2]
Description
When I try to download the cats-dogs-v1 or cats-dogs-v2 dataset using the dvc get command with the --rev option, DVC fails with a validation error.
Environment
- OS: Ubuntu 22.04 LTS
- Python: 3.8.10/3.11.2
- Package manager: pip==23.2.1
- Virtual environment: standard venv
- DVC: 3.x
Reproduce
# Create and activate virtual environment
python3 -m venv .venv
source .venv/bin/activate
# Upgrade basic packages
pip install --upgrade pip setuptools wheel
# Install DVC
pip install dvc==3.7.0
# Try to download dataset
dvc get --rev cats-dogs-v1 \
https://github.com/iterative/dataset-registry \
use-cases/cats-dogs -o datadir
error traceback
'../../../../../get-started/data.xml.dvc' validation failed in revision '0547f58'.
extra keys not allowed, in outs -> 0 -> metric, line 4, column 3
3 outs:
4 - md5: a304afb96060aad90176268345e10355
5 path: get-started/data.xml
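To illustrate what the message means: the validator rejects any key in an outs entry that the schema does not list. Below is a minimal sketch of that kind of strict check in plain Python. It is not DVC's or voluptuous' actual code; the allow-list `ALLOWED_OUT_KEYS`, the helper `find_extra_keys`, and the `metric: False` value are assumptions made for illustration (the error output only shows the md5 and path lines of the file).

```python
# Illustrative only: a hand-rolled strict check, not DVC's or voluptuous' code.
ALLOWED_OUT_KEYS = {"md5", "path", "cache"}  # hypothetical allow-list


def find_extra_keys(data, allowed=ALLOWED_OUT_KEYS):
    """Return one error string per key in data['outs'] that is not allowed."""
    errors = []
    for i, out in enumerate(data.get("outs", [])):
        for key in out:
            if key not in allowed:
                errors.append(
                    f"extra keys not allowed @ data['outs'][{i}][{key!r}]"
                )
    return errors


# Entry modelled on the snippet in the error output; the 'metric' value is assumed.
legacy_dvcfile = {
    "outs": [
        {
            "md5": "a304afb96060aad90176268345e10355",
            "path": "get-started/data.xml",
            "metric": False,
        }
    ]
}

print(find_extra_keys(legacy_dvcfile))
```

If this reading is right, any .dvc file in the checked-out revision that still carries a metric key under outs would fail in the same way, even when it is unrelated to the path being downloaded.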
The same download command with option -v:
dvc get --rev cats-dogs-v1 \
https://github.com/iterative/dataset-registry \
use-cases/cats-dogs -o datadir -v
verbose error traceback
2023-07-26 13:24:46,495 DEBUG: v3.7.0 (pip), CPython 3.11.2 on Linux-5.15.0-71-generic-x86_64-with-glibc2.31
2023-07-26 13:24:46,495 DEBUG: command: /media/alex/hdd/tmp/tst_dvc_rgstr_dpnd_vrsn/.venv/bin/dvc get --rev cats-dogs-v1 https://github.com/iterative/dataset-registry use-cases/cats-dogs -o datadir -v
2023-07-26 13:24:46,652 DEBUG: Creating external repo https://github.com/iterative/dataset-registry@cats-dogs-v1
2023-07-26 13:24:46,652 DEBUG: erepo: git clone 'https://github.com/iterative/dataset-registry' to a temporary dir
2023-07-26 13:24:48,637 DEBUG: erepo: using shallow clone for branch 'cats-dogs-v1'
'../../../../../get-started/data.xml.dvc' validation failed in revision '0547f58'.
extra keys not allowed, in outs -> 0 -> metric, line 4, column 3
3 outs:
4 - md5: a304afb96060aad90176268345e10355
5 path: get-started/data.xml
2023-07-26 13:24:48,678 ERROR: failed to get 'use-cases/cats-dogs' from 'https://github.com/iterative/dataset-registry' - '../../../../../get-started/data.xml.dvc' validation failed in revision '0547f58': extra keys not allowed @ data['outs'][0]['metric']
Traceback (most recent call last):
File "/media/alex/hdd/tmp/tst_dvc_rgstr_dpnd_vrsn/.venv/lib/python3.11/site-packages/dvc/utils/strictyaml.py", line 267, in validate
return schema(data)
^^^^^^^^^^^^
File "/media/alex/hdd/tmp/tst_dvc_rgstr_dpnd_vrsn/.venv/lib/python3.11/site-packages/voluptuous/schema_builder.py", line 272, in __call__
return self._compiled([], data)
^^^^^^^^^^^^^^^^^^^^^^^^
File "/media/alex/hdd/tmp/tst_dvc_rgstr_dpnd_vrsn/.venv/lib/python3.11/site-packages/voluptuous/schema_builder.py", line 595, in validate_dict
return base_validate(path, iteritems(data), out)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/media/alex/hdd/tmp/tst_dvc_rgstr_dpnd_vrsn/.venv/lib/python3.11/site-packages/voluptuous/schema_builder.py", line 433, in validate_mapping
raise er.MultipleInvalid(errors)
voluptuous.error.MultipleInvalid: extra keys not allowed @ data['outs'][0]['metric']
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/media/alex/hdd/tmp/tst_dvc_rgstr_dpnd_vrsn/.venv/lib/python3.11/site-packages/dvc/commands/get.py", line 33, in _get_file_from_repo
Repo.get(
File "/media/alex/hdd/tmp/tst_dvc_rgstr_dpnd_vrsn/.venv/lib/python3.11/site-packages/dvc/repo/get.py", line 54, in get
desc=f"Downloading {fs.path.name(path)}",
^^^^^^^
File "/media/alex/hdd/tmp/tst_dvc_rgstr_dpnd_vrsn/.venv/lib/python3.11/site-packages/dvc/fs/dvc.py", line 423, in path
return self.fs.path
^^^^^^^
File "/home/alex/.pyenv/versions/3.11.2/lib/python3.11/functools.py", line 1001, in __get__
val = self.func(instance)
^^^^^^^^^^^^^^^^^^^
File "/media/alex/hdd/tmp/tst_dvc_rgstr_dpnd_vrsn/.venv/lib/python3.11/site-packages/dvc/fs/dvc.py", line 416, in fs
return _DVCFileSystem(**self.fs_args)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/media/alex/hdd/tmp/tst_dvc_rgstr_dpnd_vrsn/.venv/lib/python3.11/site-packages/fsspec/spec.py", line 79, in __call__
obj = super().__call__(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/media/alex/hdd/tmp/tst_dvc_rgstr_dpnd_vrsn/.venv/lib/python3.11/site-packages/dvc/fs/dvc.py", line 163, in __init__
self._datafss[key] = DataFileSystem(index=repo.index.data["repo"])
^^^^^^^^^^
File "/media/alex/hdd/tmp/tst_dvc_rgstr_dpnd_vrsn/.venv/lib/python3.11/site-packages/funcy/objects.py", line 25, in __get__
res = instance.__dict__[self.fget.__name__] = self.fget(instance)
^^^^^^^^^^^^^^^^^^^
File "/media/alex/hdd/tmp/tst_dvc_rgstr_dpnd_vrsn/.venv/lib/python3.11/site-packages/dvc/repo/__init__.py", line 276, in index
return Index.from_repo(self)
^^^^^^^^^^^^^^^^^^^^^
File "/media/alex/hdd/tmp/tst_dvc_rgstr_dpnd_vrsn/.venv/lib/python3.11/site-packages/dvc/repo/index.py", line 242, in from_repo
for _, idx in collect_files(repo, onerror=onerror):
File "/media/alex/hdd/tmp/tst_dvc_rgstr_dpnd_vrsn/.venv/lib/python3.11/site-packages/dvc/repo/index.py", line 100, in collect_files
index = Index.from_file(repo, file_path)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/media/alex/hdd/tmp/tst_dvc_rgstr_dpnd_vrsn/.venv/lib/python3.11/site-packages/dvc/repo/index.py", line 265, in from_file
stages=list(dvcfile.stages.values()),
^^^^^^^^^^^^^^
File "/media/alex/hdd/tmp/tst_dvc_rgstr_dpnd_vrsn/.venv/lib/python3.11/site-packages/dvc/dvcfile.py", line 197, in stages
data, raw = self._load()
^^^^^^^^^^^^
File "/media/alex/hdd/tmp/tst_dvc_rgstr_dpnd_vrsn/.venv/lib/python3.11/site-packages/dvc/dvcfile.py", line 151, in _load
return self._load_yaml(**kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/media/alex/hdd/tmp/tst_dvc_rgstr_dpnd_vrsn/.venv/lib/python3.11/site-packages/dvc/dvcfile.py", line 162, in _load_yaml
return strictyaml.load(
^^^^^^^^^^^^^^^^
File "/media/alex/hdd/tmp/tst_dvc_rgstr_dpnd_vrsn/.venv/lib/python3.11/site-packages/dvc/utils/strictyaml.py", line 295, in load
validate(data, schema, text=text, path=path, rev=rev)
File "/media/alex/hdd/tmp/tst_dvc_rgstr_dpnd_vrsn/.venv/lib/python3.11/site-packages/dvc/utils/strictyaml.py", line 269, in validate
raise YAMLValidationError(exc, path, text, rev=rev) from exc
dvc.utils.strictyaml.YAMLValidationError: '../../../../../get-started/data.xml.dvc' validation failed in revision '0547f58'
2023-07-26 13:24:48,715 DEBUG: Analytics is enabled.
2023-07-26 13:24:48,742 DEBUG: Trying to spawn '['daemon', '-q', 'analytics', '/tmp/tmpobvfq_w1']'
2023-07-26 13:24:48,743 DEBUG: Spawned '['daemon', '-q', 'analytics', '/tmp/tmpobvfq_w1']'
Notes:
- the error occurs with different Python versions if dvc>=3.0.0; the latest working version is dvc==2.58.2
- the same error occurs with dataset get-started:
dvc get --rev get-started https://github.com/iterative/dataset-registry get-started -o datadir
- the error does not occur with dataset get-started-40K:
dvc get --rev get-started-40K https://github.com/iterative/dataset-registry use-cases/cats-dogs -o datadir
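For anyone who wants to re-check the revisions above under different DVC versions, here is a small sketch that runs the same dvc get commands and records each exit code. The helper names `build_get_command` and `run_cases` and the per-revision output directories are my own additions, not part of DVC. It assumes dvc is on PATH inside the active virtual environment.

```python
import subprocess

REGISTRY_URL = "https://github.com/iterative/dataset-registry"

# (revision, path) pairs taken from the commands in this report.
CASES = [
    ("cats-dogs-v1", "use-cases/cats-dogs"),
    ("get-started", "get-started"),
    ("get-started-40K", "use-cases/cats-dogs"),
]


def build_get_command(rev, path, out="datadir"):
    """Build the argument list for one `dvc get --rev` call."""
    return ["dvc", "get", "--rev", rev, REGISTRY_URL, path, "-o", out]


def run_cases(cases=CASES, runner=subprocess.run):
    """Run each case and map the revision to the command's exit code."""
    results = {}
    for rev, path in cases:
        # A separate output directory per revision avoids clashes between runs.
        completed = runner(build_get_command(rev, path, out=f"datadir-{rev}"))
        results[rev] = completed.returncode
    return results
```

Calling run_cases() needs network access; a non-zero exit code for a revision means the download failed for that revision.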