warcio doesn't verify digests on read #18

Closed
wumpus opened this issue May 11, 2017 · 3 comments

Comments

@wumpus
Collaborator

wumpus commented May 11, 2017

I was experimenting with injecting digests from my crawler, so that digests aren't computed twice, and noticed that records with a bad WARC-Payload-Digest don't raise an exception on read. There's no code for it, so I suppose this is a feature request.

It should be possible to disable the check, and "warcio index" and "warcio recompress" ought to have a command-line flag to ignore digest errors.

Lacking this feature, I don't think that warcio currently has any test to ensure that it's correctly computing digests.
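
To make the request concrete, here is a minimal sketch of the kind of check being asked for. It assumes the digest header uses the common `sha1:` prefix followed by a Base32-encoded SHA-1; the function names are made up for illustration and are not part of warcio.

```python
import base64
import hashlib


def payload_digest(payload: bytes) -> str:
    """Return a digest string in the assumed form 'sha1:' + Base32(SHA-1)."""
    raw = hashlib.sha1(payload).digest()
    return "sha1:" + base64.b32encode(raw).decode("ascii")


def digest_matches(payload: bytes, header_value: str) -> bool:
    """Compare a recorded digest header against a freshly computed one."""
    algo, _, _ = header_value.partition(":")
    if algo.lower() != "sha1":
        # This sketch only knows how to verify SHA-1 digests.
        raise ValueError("unsupported digest algorithm: " + algo)
    return payload_digest(payload) == header_value


recorded = payload_digest(b"hello world")
print(digest_matches(b"hello world", recorded))  # payload intact
print(digest_matches(b"hello w0rld", recorded))  # payload altered
```

A test suite could use the same comparison in the other direction: feed known payloads through the writer and confirm the digests it emits match an independent computation.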

@wumpus
Collaborator Author

wumpus commented Nov 27, 2017

I coded this up by hacking LimitReader so that recordloader.py can configure it to test digests. If I turn it on for all of the existing tests, the only ones that fail are the truncated WARC (expected) and the block_digest revisit record in example.warc (which might be your bug, not mine... I'll figure it out).

Because of your enthusiasm for streaming, the digest check will only fire if the user reads to the end of the record. So be it.
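
Roughly, the streaming behaviour looks like the sketch below. This is not the actual LimitReader change: `DigestCheckingReader` is a hypothetical stand-in, it assumes `sha1:` plus Base32 digests, and it raises `ValueError` only as a placeholder for whichever exception type is chosen.

```python
import base64
import hashlib
import io


class DigestCheckingReader:
    """Hypothetical wrapper: reads at most `limit` bytes, hashing as it goes."""

    def __init__(self, stream, limit, expected_digest):
        self.stream = stream
        self.remaining = limit
        self.expected = expected_digest
        self.hasher = hashlib.sha1()
        self.checked = False

    def read(self, size=-1):
        # Never read past the end of the record.
        if size < 0 or size > self.remaining:
            size = self.remaining
        data = self.stream.read(size) if size else b""
        if size and not data:
            raise ValueError("record truncated before its declared length")
        self.remaining -= len(data)
        self.hasher.update(data)
        # The check fires only once the whole record has been consumed.
        if self.remaining == 0 and not self.checked:
            self.checked = True
            actual = "sha1:" + base64.b32encode(self.hasher.digest()).decode("ascii")
            if actual != self.expected:
                raise ValueError("digest mismatch: got " + actual)
        return data


body = b"example payload"
bad_digest = "sha1:" + "A" * 32
reader = DigestCheckingReader(io.BytesIO(body), len(body), bad_digest)
reader.read(7)  # partial read: nothing is verified yet
try:
    reader.read()  # reaching the end of the record triggers the check
except ValueError as exc:
    print("caught:", exc)
```

A caller that stops reading early never sees the error, which is the trade-off described above.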

I have a few style questions:

  1. Should I raise ArchiveLoadFailed if the digest check fails? ArchiveDigestFailed? ValueError?

  2. Should the digest check default to off? I didn't measure performance, but it ought to be a lot cheaper than decompressing.
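
To illustrate one of the options in question 1, a dedicated subclass would let existing handlers that catch ArchiveLoadFailed keep working while new code can single out digest errors. The sketch below uses a local stand-in for ArchiveLoadFailed so it runs on its own; `report_mismatch` and its `check_digests` flag are hypothetical names, with the flag defaulting to off as floated in question 2.

```python
class ArchiveLoadFailed(Exception):
    """Local stand-in for the loader's existing exception type."""


class ArchiveDigestFailed(ArchiveLoadFailed):
    """Hypothetical subclass raised only for digest mismatches."""


def report_mismatch(expected, actual, check_digests=False):
    """Raise on a mismatch only when checking is switched on."""
    if not check_digests or expected == actual:
        return
    raise ArchiveDigestFailed("expected %s, got %s" % (expected, actual))


# Default off: a mismatch passes silently.
report_mismatch("sha1:AAAA", "sha1:BBBB")

# Switched on: the mismatch surfaces, and older handlers still catch it.
try:
    report_mismatch("sha1:AAAA", "sha1:BBBB", check_digests=True)
except ArchiveLoadFailed as exc:
    print(type(exc).__name__, exc)
```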

@wumpus
Collaborator Author

wumpus commented Nov 17, 2018

New try: #54

@wumpus
Collaborator Author

wumpus commented Jan 22, 2019

Done by "warcio check"

@wumpus wumpus closed this as completed Jan 22, 2019