
stop using md5 hash #1422

Closed
nico202 opened this issue Feb 22, 2018 · 4 comments

Comments

@nico202

nico202 commented Feb 22, 2018

Hi, on Zenodo the MD5 hash is provided to "verify file integrity".

It's usually recommended to use a different algorithm (example).

Because of NSA involvement I don't trust SHA (it might have backdoors, explanation), but at least there are no known collisions for it (as there are for MD5).

@krzysztof
Contributor

We're not using MD5 for anything related to authentication or security. We use MD5 as a "fingerprint", to verify that the file we received wasn't corrupted during upload. For that purpose MD5 is fine, since we're interested in a quick checksum, not in cryptographically secure hashing.
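A minimal sketch of this kind of fingerprint check, using Python's standard `hashlib`: the receiver recomputes the MD5 of the stored bytes and compares it with the checksum reported for the file. The function names `md5_of_file` and `verify_upload` are hypothetical and not part of Zenodo's code.

```python
import hashlib
import os
import tempfile


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks to bound memory use."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_upload(path: str, expected_md5: str) -> bool:
    """Hypothetical integrity check: True if the stored file matches the expected checksum."""
    return md5_of_file(path) == expected_md5.lower()


if __name__ == "__main__":
    data = b"example upload payload\n"
    expected = hashlib.md5(data).hexdigest()  # checksum computed on the sender's side
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(data)
        path = tmp.name
    try:
        print(verify_upload(path, expected))  # True
    finally:
        os.remove(path)
```

This detects accidental damage in transit; it is not meant to resist someone who deliberately crafts a file with a chosen MD5.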

It's true that MD5 has known collision issues, but I'm not sure what the attack vector would be in this case: a user manufacturing two files with the same MD5 checksum? I don't think this can be used maliciously in any way, while the probability that random corruption during upload produces a file with the same MD5 hash as the original is still very low.
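To make the point about accidental corruption concrete, here is a sketch that flips single bits of a buffer and checks whether the MD5 digest changes. Under the idealized assumption that MD5 behaves like a random function for non-adversarial errors, an arbitrary corruption keeps the same 128-bit digest with probability about 2^-128. This only illustrates accidental damage; deliberately constructed collisions are a separate matter. The helpers `flip_bit` and `corruption_detected` are hypothetical.

```python
import hashlib


def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Return a copy of `data` with one bit inverted, simulating transfer corruption."""
    buf = bytearray(data)
    buf[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(buf)


def corruption_detected(data: bytes, bit_index: int) -> bool:
    """True if the MD5 fingerprint changes after flipping the given bit."""
    return hashlib.md5(data).digest() != hashlib.md5(flip_bit(data, bit_index)).digest()


if __name__ == "__main__":
    payload = bytes(i % 251 for i in range(4096))  # deterministic 4 KiB test buffer
    total_bits = len(payload) * 8
    detected = sum(corruption_detected(payload, i) for i in range(total_bits))
    print(f"detected {detected} of {total_bits} single-bit corruptions")
```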

@nico202
Author

nico202 commented Feb 22, 2018

MD5 is not necessarily faster than SHA1
https://omnifarious.livejournal.com/363945.html
In my tests with the Linux command-line tools md5sum/sha1sum, MD5 is less than 5% faster, and with Julia's MD5.jl/SHA1.jl the two are practically on par. But at least we would stop using and "spreading" a broken algorithm (a naive user might think that MD5 is fine because you use it, too).
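A rough Python analogue of that comparison is sketched below, using `hashlib` and `time.perf_counter` rather than the md5sum/sha1sum tools or the Julia packages. The buffer size and repeat count are arbitrary choices, and results depend on the machine and on the OpenSSL build behind `hashlib`, so no particular numbers are implied.

```python
import hashlib
import time


def throughput_mb_s(algorithm: str, data: bytes, repeats: int = 5) -> float:
    """Best-of-`repeats` hashing throughput in MB/s for a hashlib algorithm name."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        hashlib.new(algorithm, data).hexdigest()
        best = min(best, time.perf_counter() - start)
    best = max(best, 1e-9)  # guard against a zero timing on tiny inputs
    return len(data) / best / 1e6


if __name__ == "__main__":
    data = bytes(64 * 1024 * 1024)  # 64 MiB of zeros; content does not affect hashing speed
    for name in ("md5", "sha1"):
        print(f"{name:>5}: {throughput_mb_s(name, data):8.1f} MB/s")
```

Taking the best of several runs reduces noise from caching and scheduling, which matters when the difference being measured is only a few percent.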

@lnielsen
Copy link
Member

lnielsen commented Mar 6, 2018

MD5 is still widely used for fingerprinting files and is completely fine for this use case. Changing from MD5 to SHA1 (or any other algorithm) requires re-checksumming 1.2 million files and could break existing API integrations that rely on getting MD5 checksums back. Thus, it's not a simple change of a text string from md5 to sha1 but a rather costly update (manpower + CPU) with no real benefit.

@lnielsen lnielsen closed this as completed Mar 6, 2018
@nico202
Author

nico202 commented Mar 6, 2018

Ok thanks


3 participants