Closed
Labels
bug (Did we break something?), p3-nice-to-have (It should be done this or next sprint)
Description
When a binary file begins with a large text header, the current
checksum routine treats it as a text file and normalizes
line endings before computing the checksum. Not only is it dangerous to
manipulate binary files like this, it also doubles the runtime of the
checksum routine, as every block of data must be read twice.
As noted in #3264, at a minimum, DVC should probably match
Git's text file detection routine, which inspects the first 8 kilobytes
(and doesn't apply heuristics on the ratio of printable characters,
as DVC currently does).
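
To make the proposal concrete, here is a minimal sketch of a Git-style check combined with a single-pass checksum. It is not DVC's actual code: `looks_binary`, `checksum`, and `HEADER_SIZE` are hypothetical names, and it assumes the Git check in question is the NUL-byte test on the leading bytes of the file, with an 8000-byte window standing in for the "8 kilobytes" mentioned above.

```python
import hashlib
import io

# Assumption: size of the leading window to inspect ("8 kilobytes" in the issue).
HEADER_SIZE = 8000


def looks_binary(header: bytes) -> bool:
    """Hypothetical helper: binary if a NUL byte appears in the leading window."""
    return b"\0" in header[:HEADER_SIZE]


def checksum(stream, chunk_size=1024 * 1024) -> str:
    """Hypothetical single-pass MD5: decide text vs. binary once, from the header."""
    md5 = hashlib.md5()
    header = stream.read(HEADER_SIZE)
    binary = looks_binary(header)

    pending = header
    while pending:
        nxt = stream.read(chunk_size)
        if not binary:
            # Hold back a trailing "\r" so a CRLF split across two chunks
            # is still normalized to "\n".
            if pending.endswith(b"\r") and nxt:
                pending, nxt = pending[:-1], b"\r" + nxt
            pending = pending.replace(b"\r\n", b"\n")
        md5.update(pending)
        pending = nxt
    return md5.hexdigest()


if __name__ == "__main__":
    # Text input: CRLF and LF variants hash identically.
    print(checksum(io.BytesIO(b"a\r\nb\r\n")) == checksum(io.BytesIO(b"a\nb\n")))
    # Binary input (NUL in the header): bytes are hashed untouched.
    blob = b"header\r\n\0payload\r\n"
    print(checksum(io.BytesIO(blob)) == hashlib.md5(blob).hexdigest())
```

Because the decision is made once from the header, each block is read and hashed only once. A file whose first NUL byte appears after the inspected window would still be treated as text under this scheme, so the window size is a trade-off rather than a guarantee.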
efiop