-
-
Notifications
You must be signed in to change notification settings - Fork 55
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
No block digest written for warcinfo records #87
Comments
…ebrecorder/warcio#87) The length has to be set manually because otherwise warcio will automatically remove the header again.
Side note: |
The ('warcinfo') aspect of the bug is old. It was (accidentally) introduced in 25194bc You're right that any recordtype that's a subset of the string 'warcinfo' will match, fortunately there is no such standard recordtype. As for that explicit code that stops writing block digests for warcinfo records, that code was in the original checkin, when the code was split out of webrecorder. I'm kind of surprised that I didn't notice it while working on "warcio test". |
I dug around a bit more. a791617 is not the commit that introduced this behaviour. It appears that it was already like that back when warcinfo records were first added in webrecorder/pywb@d40edfc2; at the time, digests were only written for responses as far as I can tell. I can't really think of any good reason why block digests shouldn't simply always be present for verification purposes. The only exception would be zero-length records, where the digest is useless, but those are so rare and odd that worrying about them isn't worth it. I'll create a PR shortly. |
* Enable writing block digests for warcinfo records Fixes #87 * Fix test_recompress_arc2warc failure
I noticed today that warcio doesn't generate a block digest for warcinfo records:
warcio/warcio/recordbuilder.py
Line 30 in 8e3ceb7
This seems to have been introduced in a791617, but I was unable to figure out why. The spec permits block digests on any record (whereas a payload digest would make no sense on a warcinfo record due to the content type used normally), and it seems good practice to me to always store a digest to allow for integrity checks.
The text was updated successfully, but these errors were encountered: