Closed
Labels: feature request (requesting a new feature), p3-nice-to-have (it should be done this or next sprint)
Description
I've been working on a large project with multiple datasets. One of these datasets is large (>100 GB). If I simply run dvc pull, it pulls the huge dataset as well, which takes up most of the available disk space on my machine.
The only way around this appears to be passing the name of every data file I want to download. This is inconvenient, however, because there are many files I do want and only one that I don't want.
I see two solutions to this:
- Allow named file groups. The user could define groups of files in some sort of config and pull them individually by name, e.g., dvc pull mnist. The user would also be able to exclude a group: dvc pull all --exclude mnist.
- Allow exclusion of certain files from the command line, e.g., dvc pull --exclude data/mnist.dvc.
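In the meantime, the manual workaround described above (naming every file to pull) can be scripted. The following is only a rough sketch of that idea, not a DVC feature: it assumes the tracked data is described by .dvc files inside the repository, and the helper names select_targets and pull_except are hypothetical. The only DVC behavior it relies on is that dvc pull accepts explicit targets.

```python
import fnmatch
import subprocess
from pathlib import Path


def select_targets(repo_root, exclude_patterns):
    """Return repo-relative .dvc files that match none of the exclude patterns.

    Hypothetical helper: patterns are shell-style globs matched against
    POSIX-style paths relative to repo_root.
    """
    root = Path(repo_root)
    targets = []
    for path in sorted(root.rglob("*.dvc")):
        rel_path = path.relative_to(root)
        # Skip the internal .dvc directory itself and anything inside it.
        if not path.is_file() or ".dvc" in rel_path.parts[:-1]:
            continue
        rel = rel_path.as_posix()
        if any(fnmatch.fnmatch(rel, pattern) for pattern in exclude_patterns):
            continue
        targets.append(rel)
    return targets


def pull_except(repo_root, exclude_patterns):
    """Run `dvc pull` on every selected target; returns the targets used."""
    targets = select_targets(repo_root, exclude_patterns)
    if targets:
        # Equivalent to typing: dvc pull <target1> <target2> ...
        subprocess.run(["dvc", "pull", *targets], cwd=repo_root, check=True)
    return targets
```

With this sketch, calling pull_except(".", ["data/mnist.dvc"]) from the repository root would pull everything except the file named in the second proposal. It does not replace a built-in option, since the exclusion list lives outside DVC and has to be maintained by hand.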