
gs: support directories as external dependencies/outputs #2814

@willypicard


I have an issue similar to #2678, but for Google Cloud Storage (gs://).

I have a bucket with the following structure:

my_bucket
       ├── data
       │     ├── img1.png
       │     ├── img2.png
       │     ├── ...
       └── cache
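For context, Google Cloud Storage has a flat namespace. "data" and "cache" above are not real directories. They are shared prefixes of object names such as "data/img1.png". An empty folder like "cache" typically exists only as a zero-byte placeholder object named "cache/", and that is an assumption about how it was created here. The minimal Python sketch below uses a hypothetical helper, top_level_prefixes. It shows how splitting flat names on "/" produces the folder view. It illustrates the idea of a delimiter listing and is not any library's exact behaviour.

```python
def top_level_prefixes(object_names, delimiter="/"):
    """Split flat object names into top-level 'folders' and top-level objects."""
    prefixes, files = set(), []
    for name in object_names:
        head, sep, _ = name.partition(delimiter)
        if sep:
            prefixes.add(head + delimiter)  # e.g. "data/img1.png" -> "data/"
        else:
            files.append(name)              # object stored at the bucket root
    return sorted(prefixes), sorted(files)


# Hypothetical object names for my_bucket; "cache/" is an assumed placeholder object.
names = ["data/img1.png", "data/img2.png", "cache/"]
print(top_level_prefixes(names))  # (['cache/', 'data/'], [])
```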

I then created a clean project:

$ git init
$ dvc init
$ dvc remote add gscache gs://my_bucket/cache
$ dvc config cache.gs gscache
$ dvc add gs://my_bucket/data

The output is as follows:

100%|██████████|Add    1/1 [00:00<00:00,  1.21file/s]
ERROR: output 'gs://my_bucket/data' does not exist

Adding a single file works (dvc add gs://my_bucket/data/img1.png).

The same command with verbose output:

$ dvc add gs://my_bucket/data -v 
DEBUG: PRAGMA user_version;
DEBUG: fetched: [(3,)]
DEBUG: CREATE TABLE IF NOT EXISTS state (inode INTEGER PRIMARY KEY, mtime TEXT NOT NULL, size TEXT NOT NULL, md5 TEXT NOT NULL, timestamp TEXT NOT NULL)
DEBUG: CREATE TABLE IF NOT EXISTS state_info (count INTEGER)
DEBUG: CREATE TABLE IF NOT EXISTS link_state (path TEXT PRIMARY KEY, inode INTEGER NOT NULL, mtime TEXT NOT NULL)
DEBUG: INSERT OR IGNORE INTO state_info (count) SELECT 0 WHERE NOT EXISTS (SELECT * FROM state_info)
DEBUG: PRAGMA user_version = 3;
100%|██████████|Add    1/1 [00:01<00:00,  1.63s/file]
DEBUG: SELECT count from state_info WHERE rowid=?
DEBUG: fetched: [(0,)]
DEBUG: UPDATE state_info SET count = ? WHERE rowid = ?
ERROR: output 'gs://my_bucket/data' does not exist
------------------------------------------------------------
Traceback (most recent call last):
  File "/home/egnyte/anaconda3/envs/dvc/lib/python3.7/site-packages/dvc/command/add.py", line 25, in run
    fname=self.args.file,
  File "/home/egnyte/anaconda3/envs/dvc/lib/python3.7/site-packages/dvc/repo/__init__.py", line 35, in wrapper
    ret = f(repo, *args, **kwargs)
  File "/home/egnyte/anaconda3/envs/dvc/lib/python3.7/site-packages/dvc/repo/scm_context.py", line 4, in run
    result = method(repo, *args, **kw)
  File "/home/egnyte/anaconda3/envs/dvc/lib/python3.7/site-packages/dvc/repo/add.py", line 53, in add
    stage.save()
  File "/home/egnyte/anaconda3/envs/dvc/lib/python3.7/site-packages/dvc/stage.py", line 716, in save
    out.save()
  File "/home/egnyte/anaconda3/envs/dvc/lib/python3.7/site-packages/dvc/output/base.py", line 219, in save
    raise self.DoesNotExistError(self)
dvc.output.base.OutputDoesNotExistError: output 'gs://my_bucket/data' does not exist
------------------------------------------------------------
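One plausible explanation for the error has not been confirmed against the DVC 0.68.1 source. Because "gs://my_bucket/data" is only a prefix, no object is named exactly "data". An existence check that looks up a single blob would therefore report it as missing, while "data/img1.png" is a real object and passes. A directory-aware check must also accept a path that is a prefix of at least one object name. The sketch below uses an in-memory set of names and the hypothetical helpers is_file and path_exists. It is not DVC's implementation.

```python
def is_file(object_names, path):
    """True only if an object with exactly this name exists (a file-only check)."""
    return path in object_names


def path_exists(object_names, path):
    """True if `path` is an object, or a 'directory' prefix of at least one object."""
    path = path.rstrip("/")
    if is_file(object_names, path):
        return True
    # Require the trailing "/" so that "dat" does not match "data/img1.png".
    prefix = path + "/"
    return any(name.startswith(prefix) for name in object_names)


# Hypothetical object names mirroring gs://my_bucket/data from this report.
names = {"data/img1.png", "data/img2.png"}
print(is_file(names, "data"))               # False: no object is named "data"
print(path_exists(names, "data"))           # True: "data/" prefixes two objects
print(path_exists(names, "data/img1.png"))  # True: exact object
print(path_exists(names, "dat"))            # False
```

Against a live bucket, the google-cloud-storage client could perform the prefix test. For example, call client.list_blobs(bucket_name, prefix="data/", max_results=1) and check whether it yields anything.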

dvc --version: 0.68.1. OS: Ubuntu; DVC installed via conda; Python 3.7.5.
