Closed
Labels
A: data-sync (Related to dvc get/fetch/import/pull/push)
fs: gs (Related to the Google Cloud Storage filesystem)
Bug Report
Description
This could be a user error, but the behavior of dvc import-url gs://my-bucket/my-external-data-dir/ is confusing (and seems wrong), specifically where the files get downloaded to. Checksums are computed correctly, but the downloaded directory is empty (WARNING: 'my-external-data-dir' is empty.).
Reproduce
Create a new git repo
mkdir my_repo
cd my_repo
git init
touch README.md
git add README.md
git commit -m "Add README.md"
Initialize dvc inside git repo
python -m venv .venv && source .venv/bin/activate && pip install "dvc[gs]"
dvc init
Create a Google Storage bucket and add some files to a directory.
gsutil mb gs://my-tmp-bucket-for-dvc-bug
touch file.txt
gsutil cp file.txt gs://my-tmp-bucket-for-dvc-bug/my-external-data-dir/file.txt
rm file.txt
Import external data into DVC workspace under data/
mkdir data && cd data
dvc import-url gs://my-tmp-bucket-for-dvc-bug/my-external-data-dir
The directory gets downloaded, but to the wrong path. It ends up outside the git repo.
ls ../../my-tmp-bucket-for-dvc-bug/my-external-data-dir/
Expected
Under data/: my-external-data-dir.dvc, my-external-data-dir/ containing the downloaded files, and the directory added to .gitignore.
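The expectation above can be written down as a small check. This is a sketch under assumptions: check_import_result is a hypothetical helper name, and the .gitignore test is a plain substring match rather than a faithful model of how DVC writes ignore entries.

```shell
#!/bin/sh
# Hypothetical helper (not part of DVC): checks for the three expected
# artifacts of an import inside a given directory.
check_import_result() {
    dir="$1"    # directory the import was run in, e.g. data
    name="$2"   # imported directory name, e.g. my-external-data-dir
    # 1. The .dvc file must exist.
    [ -f "$dir/$name.dvc" ] || { echo "missing $name.dvc"; return 1; }
    # 2. The imported directory must exist and contain at least one entry.
    [ -n "$(ls -A "$dir/$name" 2>/dev/null)" ] || { echo "$name/ is missing or empty"; return 1; }
    # 3. The directory name must appear in .gitignore (substring match).
    grep -qF "$name" "$dir/.gitignore" 2>/dev/null || { echo "$name is not listed in .gitignore"; return 1; }
    echo "ok"
}
```

From the repo root, this would be called as check_import_result data my-external-data-dir.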
Actual
dvc import-url gs://my-tmp-bucket-for-dvc-bug/my-external-data-dir
Importing 'gs://my-tmp-bucket-for-dvc-bug/my-external-data-dir' -> 'my-external-data-dir'
WARNING: 'my-external-data-dir' is empty.
cat my-external-data-dir.dvc
md5: 7fa09692529be14144a721c582399a3e
frozen: true
deps:
- md5: 081ca9f4767585c8dfe3c9baa0d41f7a.dir
  size: 0
  nfiles: 1
  path: gs://my-tmp-bucket-for-dvc-bug/my-external-data-dir
outs:
- md5: d751713988987e9331980363e24189ce.dir
  size: 0
  nfiles: 0
  path: my-external-data-dir
Environment information
Output of dvc doctor:
DVC version: 2.10.1 (pip)
---------------------------------
Platform: Python 3.9.7 on Linux-5.13.0-40-generic-x86_64-with-glibc2.34
Supports:
    gs (gcsfs = 2022.3.0),
    webhdfs (fsspec = 2022.3.0),
    http (aiohttp = 3.8.1, aiohttp-retry = 2.4.6),
    https (aiohttp = 3.8.1, aiohttp-retry = 2.4.6)
Cache types: hardlink, symlink
Cache directory: ext4 on /dev/mapper/vgubuntu-root
Caches: local
Remotes: None
Workspace directory: ext4 on /dev/mapper/vgubuntu-root
Repo: dvc, git