Status: Open
Labels: feature request (Requesting a new feature), p3-nice-to-have (It should be done this or next sprint)
Description
Since MD5 is computed over a file's exact contents, it is sensitive to the order and format of the data. Simple changes to the schema (e.g., swapping two columns) or to the type of a column (e.g., integer to float) produce new hash values, which leads to duplicated datasets even when the data is logically the same. There are some alternatives that attempt to address this, such as UNF (http://guides.dataverse.org/en/latest/developers/unf/index.html).
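To make the problem concrete, here is a minimal sketch in Python. It compares plain MD5 over CSV text with a hypothetical normalized hash that sorts columns by name and renders numeric cells canonically. The normalized hash is only a simplified illustration of the idea and is NOT the UNF algorithm. The function names `md5_bytes` and `normalized_hash` are made up for this example. Note that it still depends on row order.

```python
import csv
import hashlib
import io


def md5_bytes(data: str) -> str:
    """Plain MD5 over the serialized text, like hashing the raw file."""
    return hashlib.md5(data.encode("utf-8")).hexdigest()


def _normalize_value(value: str) -> str:
    """Render numeric cells canonically so '1' and '1.0' hash identically."""
    try:
        return repr(float(value))
    except ValueError:
        return value  # non-numeric cells are hashed as-is


def normalized_hash(data: str) -> str:
    """Hypothetical hash that ignores column order and int-vs-float formatting.

    Simplified illustration only; this is not an implementation of UNF.
    Row order still affects the result.
    """
    reader = csv.DictReader(io.StringIO(data))
    columns = sorted(reader.fieldnames or [])  # column order no longer matters
    h = hashlib.md5()
    h.update("\x1f".join(columns).encode("utf-8"))
    for row in reader:
        cells = [_normalize_value(row[col]) for col in columns]
        h.update(b"\x1e" + "\x1f".join(cells).encode("utf-8"))
    return h.hexdigest()


original = "id,score\n1,3\n2,5\n"
swapped = "score,id\n3,1\n5,2\n"   # same data, columns swapped
retyped = "id,score\n1,3.0\n2,5.0\n"  # same data, score stored as float

print(md5_bytes(original) == md5_bytes(swapped))  # False
print(normalized_hash(original) == normalized_hash(swapped) == normalized_hash(retyped))  # True
```

Both the column swap and the type change alter the MD5, while the normalized hash stays stable. A real scheme such as UNF also has to pin down details this sketch ignores, such as numeric precision, missing values, and character encoding.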
It would be great to be able to specify an alternative hash function in DVC, and in particular to provide a user-defined one.
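One possible shape for user-defined hash functions is a name-keyed registry that the tool consults when hashing a file. The following Python sketch assumes such a registry exists. `HASH_FUNCTIONS`, `register_hash`, and `compute_hash` are hypothetical names and are not part of DVC's actual API. The example user function ignores line order, one of the sensitivities described above.

```python
import hashlib
from pathlib import Path
from typing import Callable, Dict

# Hypothetical registry; DVC does not expose this API. Names are illustrative.
HashFunc = Callable[[bytes], str]
HASH_FUNCTIONS: Dict[str, HashFunc] = {
    "md5": lambda data: hashlib.md5(data).hexdigest(),  # current default behavior
}


def register_hash(name: str) -> Callable[[HashFunc], HashFunc]:
    """Decorator that registers a user-defined hash function under `name`."""
    def decorator(func: HashFunc) -> HashFunc:
        if name in HASH_FUNCTIONS:
            raise ValueError(f"hash function {name!r} is already registered")
        HASH_FUNCTIONS[name] = func
        return func
    return decorator


def compute_hash(path: str, algorithm: str = "md5") -> str:
    """Hash a file's contents with the selected (possibly user-defined) function."""
    try:
        func = HASH_FUNCTIONS[algorithm]
    except KeyError:
        raise ValueError(f"unknown hash function: {algorithm!r}") from None
    return func(Path(path).read_bytes())


@register_hash("sha256-lines-sorted")
def sorted_lines_sha256(data: bytes) -> str:
    """Example user-defined function: insensitive to the order of lines."""
    lines = sorted(data.splitlines())
    return hashlib.sha256(b"\n".join(lines)).hexdigest()
```

With a design like this, the chosen algorithm name would also need to be recorded alongside each stored hash. Otherwise hashes produced by different functions could not be compared safely.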