Unable to dvc add more than 1 file at a time in s3 bucket #2678

@elleobrien

Note: DVC version 0.66.1 (installed via pip), Ubuntu 16.04

I am having difficulty configuring DVC to track files on an s3 bucket with the structure

ellesdatabucket
       ├── data
       │     ├── img1.png
       │     ├── img2.png
       │     ├── ...
       └── cache

Specifically, I want to use DVC to version control *.png files stored in the data folder, and use the cache folder as DVC's cache.

Based on the docs provided here, I believe I've replicated the provided steps exactly, but I hit an error when I run dvc add:

$ git init
$ dvc init
$ dvc remote add myremote s3://ellesdatabucket/data
$ dvc remote add s3cache s3://ellesdatabucket/cache
$ dvc config cache.s3 s3cache
$ dvc add s3://ellesdatabucket/data

The output initially looks encouraging:

 29%|██▊       |Computing hashes (only done once691/2424 [00:03<00:07,    223md5/s]

But then I get this error message:

ERROR: s3://ellesdatabucket/data/. does not exist: An error occurred (404) when calling the HeadObject operation: Not Found

I'm positive that there are files in the data folder: I can list them with aws s3 ls s3://ellesdatabucket/data. Also, if I run dvc add on a single file at a time instead of the whole directory, the command completes successfully. A bash script could dvc add each file in a loop, but I want to make sure there isn't a better way. Issue 2647 seems to discuss a similar problem, but I can't figure out how to apply that code to my own example here.
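
For reference, the per-file loop I have in mind would look roughly like the sketch below. It is only a sketch under assumptions: the helper names list_png_urls and add_each are made up for illustration, it assumes the usual four-column aws s3 ls output (date, time, size, key), and it would miss keys that contain spaces. By default it only prints the dvc add commands rather than running them.

```shell
#!/usr/bin/env bash
# Hypothetical workaround sketch: run `dvc add` once per object instead of
# once for the whole prefix. Helper names are illustrative, not part of DVC.

BUCKET_PREFIX="s3://ellesdatabucket/data"

# Read `aws s3 ls` output on stdin (assumed columns: date time size key)
# and print a full s3:// URL for every key ending in .png.
# Assumption: keys contain no spaces.
list_png_urls() {
  local prefix="$1"
  awk -v p="$prefix" '$4 ~ /\.png$/ { print p "/" $4 }'
}

# Read URLs on stdin. With DRY_RUN=1 (the default) only print the commands;
# with DRY_RUN=0 actually invoke `dvc add` on each URL.
add_each() {
  local url
  while IFS= read -r url; do
    if [ "${DRY_RUN:-1}" = "1" ]; then
      echo "dvc add $url"
    else
      dvc add "$url"
    fi
  done
}

# Intended usage (commented out so that sourcing this file has no side effects):
# aws s3 ls "$BUCKET_PREFIX/" | list_png_urls "$BUCKET_PREFIX" | add_each
```

Setting DRY_RUN=0 on add_each would run the commands for real. I would expect each call to behave like the single-file dvc add that already works for me, but I have not verified that at scale.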

Thank you for any help!

More context:

https://discordapp.com/channels/485586884165107732/485596304961962003/637836708184064010

Labels: bug, research
