UPDATE: See #3369 (comment) for the main purpose of this issue.
If I have a data registry monorepo with several subrepos inside it, each pointing to its own remote storage path, I want to be able to import and/or get data from a specific subrepo/remote location.
My situation: I have an s3 bucket, s3://datasets/, where I store multiple datasets, say audio/ and text/. I want to track this data using dvc and a single, git-based data registry. Then I want to be able to selectively push data to s3, and pull/import it back, through this registry for use in my other dvc projects.
So, I can make subrepos for audio/ and text/, each initialized with its own dvc file and remote, and push data to s3 that way. Then, if I only want to download the audio data into a new project, I would like to be able to run something like dvc import git@github.com/datasets/audio and have it automatically pull from the correct path in s3, corresponding to the default remote of the audio subrepo.
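To make the layout concrete, here is a rough sketch of the per-subrepo config I have in mind. It assumes each subrepo was created with `dvc init --subdir` and got its default remote from `dvc remote add -d`. The remote names (`audio`, `text`) and the exact s3 prefixes are just illustrative, not something the registry has to use.

```ini
# audio/.dvc/config (hypothetical remote name "audio")
[core]
    remote = audio
['remote "audio"']
    url = s3://datasets/audio

# text/.dvc/config (hypothetical remote name "text")
[core]
    remote = text
['remote "text"']
    url = s3://datasets/text
```

The idea is that an import targeting the audio subrepo would resolve its data through the `audio` default remote only, without touching the text/ remote.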
Thank you!