dvc get/import support for subrepos #3369

@ehutt

UPDATE: See #3369 (comment) for main purpose of this issue


If I have a data registry monorepo with several subrepos inside it, each pointing to its own remote storage path, I want to be able to import and/or get data from a specific subrepo/remote location.

My situation: I have an S3 bucket, s3://datasets/, where I store multiple datasets, say audio/ and text/. I want to track this data using dvc and a single, git-based data registry. Then I want to be able to selectively push and pull/import data from S3 through this registry for use in my other dvc projects.

So, I can make subrepos for audio/ and text/, each initialized with its own dvc file and remote, and push data to S3 this way. Then, if I want to download only the audio data into a new project, I can run something like dvc import git@github.com/datasets/audio and it will automatically pull from the correct path in S3, corresponding to the default remote for the audio subrepo.
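
To make the requested behavior concrete, here is a rough sketch of the lookup it implies: given a path inside the registry, find the nearest enclosing subrepo (a directory containing `.dvc/`) and read that subrepo's default remote. This is not how DVC is implemented. The function names `find_subrepo_root`, `default_remote_url`, and `build_demo_registry` are hypothetical, and the config files are written in a simplified, unindented INI layout so that Python's `configparser` can read them.

```python
import configparser
import tempfile
from pathlib import Path
from typing import Optional


def find_subrepo_root(path: Path, registry_root: Path) -> Optional[Path]:
    """Walk up from `path` and return the first directory that contains `.dvc/`.

    Stops at `registry_root` (or the filesystem root) and returns None if
    no subrepo is found on the way up.
    """
    registry_root = registry_root.resolve()
    current = (path if path.is_dir() else path.parent).resolve()
    while True:
        if (current / ".dvc").is_dir():
            return current
        if current == registry_root or current == current.parent:
            return None
        current = current.parent


def default_remote_url(subrepo_root: Path) -> Optional[str]:
    """Return the URL of the subrepo's default remote, or None if unset."""
    parser = configparser.ConfigParser()
    parser.read(subrepo_root / ".dvc" / "config")
    name = parser.get("core", "remote", fallback=None)
    if name is None:
        return None
    # Remote sections are assumed to be named like: ['remote "audio"']
    return parser.get(f"'remote \"{name}\"'", "url", fallback=None)


def build_demo_registry(root: Path) -> None:
    """Create a throwaway registry with audio/ and text/ subrepos (illustrative only)."""
    for name in ("audio", "text"):
        dvc_dir = root / name / ".dvc"
        dvc_dir.mkdir(parents=True)
        (dvc_dir / "config").write_text(
            "[core]\n"
            f"remote = {name}\n"
            f"['remote \"{name}\"']\n"
            f"url = s3://datasets/{name}\n"
        )
    (root / "audio" / "clips").mkdir()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        registry = Path(tmp)
        build_demo_registry(registry)
        subrepo = find_subrepo_root(registry / "audio" / "clips", registry)
        print(subrepo.name, default_remote_url(subrepo))
```

With a lookup like this, a get or import of a path under audio/ would read its data from the audio subrepo's default remote rather than from a remote configured at the registry root.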

Thank you!

Labels: enhancement (Enhances DVC), p1-important (Important, aka current backlog of things to do)
