Closed
Labels
feature request (Requesting a new feature), question (I have a question?)
Description
See treeverse/dvc.org/issues/682 for context.
It seems like large datasets (in the TBs) tend to get bundled and/or partitioned in various ways and formats, such as HDFS/HDF5/TFRecord files. This poses a challenge for DVC data versioning, which calculates checksums at the file (or directory) level.
What would be the easiest way to extend DVC to support this kind of dataset storage practice? Perhaps even a tool separate from DVC itself, some sort of middleware that provides transparency between the actual dataset, however it's organized into bundles and partitions, and DVC commands.
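To make the granularity problem concrete, here is a minimal sketch of file-level checksumming in the spirit of what DVC does (DVC does use per-file MD5 hashes, though its actual implementation differs; the helper names here are hypothetical). With bundled/partitioned datasets, any change inside one partition file invalidates that whole file's checksum, while DVC has no visibility into the records within it:

```python
import hashlib
import tempfile
from pathlib import Path

def file_md5(path):
    """MD5 of a single file, read in chunks to handle large files."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def dir_checksums(root):
    """Map each file's relative path to its checksum: file-level granularity,
    the finest unit this scheme can track."""
    root = Path(root)
    return {
        str(p.relative_to(root)): file_md5(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }

# Example: two "partition" files; modifying one record inside part-001
# changes only that file's hash, and the change within it is opaque.
d = Path(tempfile.mkdtemp())
(d / "part-000.tfrecord").write_bytes(b"record-a")
(d / "part-001.tfrecord").write_bytes(b"record-b")
before = dir_checksums(d)

(d / "part-001.tfrecord").write_bytes(b"record-b-modified")
after = dir_checksums(d)

print(before["part-000.tfrecord"] == after["part-000.tfrecord"])  # True
print(before["part-001.tfrecord"] == after["part-001.tfrecord"])  # False
```

A middleware layer as proposed above would presumably need to checksum at the record or chunk level inside such files, rather than treating each partition as an opaque blob.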