Description
Note: DVC version 0.66.1, installed via pip, on Ubuntu 16.04.
I am having difficulty configuring DVC to track files in an S3 bucket with the following structure:
ellesdatabucket
├── data
│ ├── img1.png
│ ├── img2.png
│ ├── ...
└── cache
Specifically, I want to use DVC to version control *.png files stored in the data folder, and use the cache folder as DVC's cache.
Based on the docs provided here, I believe I've followed the steps exactly. However, I hit an error when I run dvc add:
$ git init
$ dvc init
$ dvc remote add myremote s3://ellesdatabucket/data
$ dvc remote add s3cache s3://ellesdatabucket/cache
$ dvc config cache.s3 s3cache
$ dvc add s3://ellesdatabucket/data
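For reference, the remote and cache commands above should leave .dvc/config looking roughly like the fragment below. This assumes DVC's INI-style config format as used around version 0.66; exact quoting and section order may differ between versions.

```
['remote "myremote"']
url = s3://ellesdatabucket/data
['remote "s3cache"']
url = s3://ellesdatabucket/cache
[cache]
s3 = s3cache
```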
The output of dvc add initially looks encouraging:
29%|██▊ |Computing hashes (only done once691/2424 [00:03<00:07, 223md5/s]
But then I get this error message:
ERROR: s3://ellesdatabucket/data/. does not exist: An error occurred (404) when calling the HeadObject operation: Not Found
I'm certain there are files in the data folder: I can list them with aws s3 ls s3://ellesdatabucket/data. Also, if I run dvc add on a single file instead of the whole directory, the command completes successfully. A bash script could dvc add each file in a loop, but I want to make sure there isn't a better way. Issue 2647 seems to discuss a similar problem, but I can't figure out how to apply that code to my example.
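The per-file loop mentioned above could look roughly like the following sketch. It only automates the workaround and does not fix the directory-level error. It assumes the AWS CLI and DVC are on PATH, that the remote/cache setup above is already in place, and that the .png objects sit directly under the prefix. The script name add_pngs.sh and the helpers png_names and add_each_png are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical stopgap: run `dvc add` on each .png directly under an S3
# prefix, one object at a time. Assumes the AWS CLI and DVC are installed
# and the S3 remote/cache config is already set up.
set -euo pipefail

# Filter `aws s3 ls <prefix>/` output (read from stdin) down to .png names.
# Object lines look like "2019-10-28 12:00:00      12345 img1.png"; reading
# date, time and size into throwaway fields leaves the key name in `name`.
# Sub-prefix lines ("PRE subdir/") leave `name` empty, so they are skipped.
png_names() {
  local _date _time _size name
  while read -r _date _time _size name; do
    case "$name" in
      *.png) printf '%s\n' "$name" ;;
    esac
  done
}

# Run `dvc add` on each .png under the given prefix (no trailing slash).
add_each_png() {
  local prefix="$1" name
  aws s3 ls "$prefix/" | png_names | while IFS= read -r name; do
    echo "dvc add $prefix/$name"
    # Redirect stdin so dvc cannot consume the rest of the listing.
    dvc add "$prefix/$name" < /dev/null
  done
}

# Only act when a prefix is passed, e.g.:
#   ./add_pngs.sh s3://ellesdatabucket/data
if [ "$#" -gt 0 ]; then
  add_each_png "$1"
fi
```

Running one dvc add per object creates one .dvc file per image, so with thousands of files this is slower and noisier than a single directory-level dvc add would be.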
Thank you for any help!
More context:
https://discordapp.com/channels/485586884165107732/485596304961962003/637836708184064010