-
-
Notifications
You must be signed in to change notification settings - Fork 32.2k
gh-134635: add zlib.{adler32,crc32}_combine
to combine checksums
#134650
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have a few suggestions/thoughts but overall looks great!
Thanks for the feedback! I'm not on my dev session so I won't be able to change the clinic-related code as I'll need to regen the files, but I'll address the rest in the meantime. |
Co-authored-by: Emma Smith <emma@emmatyping.dev>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall looks good, thanks!
aa8b941
to
c5a2c55
Compare
Thanks for the reviews both of you! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Two suggestions for wording but otherwise looks great! Thank you for adding these.
Co-authored-by: Emma Smith <emma@emmatyping.dev>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've edited your suggestion a bit but thank you!
Just a note that the function is probably obsolete. The ticket goes back to 2019. The python interpreter has switched to zlib-ng on windows (other OS to follow). the zlib.crc is computing crc with hardware instructions on x64, which are insanely fast. binascii.crc is just an alias to zlib.crc. there is little point in computing crc32 in parallel. In fact we should probably remove or adjust the crc code to not release the GIL, as it's counterproductive on small data. |
I don't think adopting zlib-ng everywhere is settled, and there is a long tail of old Linux distributions that want to link against a system zlib that do not ship zlib-ng yet. I don't think we can assume zlib-ng performance. That being said even zlib crc32 is pretty fast. However, even with a fast sequential crc32 I think this could be useful for concurrent checksuming for say, parallel chunked file downloads.
I would want to see convincing benchmarks before making this change and proper motivation, otherwise I'm not sure if this is worth the complexity. (if you have those though please do open an issue!) |
hello @emmatyping and sorry for pinging this old ticket
Nope, zlib crc32 is horribly slow. I've been working on optimizing pip and whole percents of the duration of "pip install" is computing crc! You can find benchmarks (both compression and crc) in the zlib-ng thread:
See the thread, it's settled and there is agreement to do it. Anyway, we can get better crc function aside of zlib/zlib-ng. The python interpreter can use the crc32 function from liblzma instead. |
zlib.{adler,crc}32_combine
#134635📚 Documentation preview 📚: https://cpython-previews--134650.org.readthedocs.build/