Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

include offset in output of "warcio index ..." #4

Merged
merged 5 commits into from
Mar 17, 2017

Conversation

nlevitt
Copy link
Contributor

@nlevitt nlevitt commented Mar 17, 2017

I realize that the other fields are configurable and that offset is different because it's not a warc record header. I think it's reasonable to always include the offset in the output though.

By the way strictly speaking ArchiveIterator isn't actually an iterator, you have to do iter(ArchiveIterator(...)) to get the iterator. It looks like it would be a pain to refactor though.

@ikreymer
Copy link
Member

I was thinking of adding it as a special case field named offset that could be passed to the field list, eg -f offset,warc-type Are you opposed to just including it in the list explicitly? Maybe it should be in the default list.

Yes, you're right ArchiveIterator is not an iterator, but an iterable. https://docs.python.org/3.6/glossary.html

I did actually refactor it at one point, and it was ugly and realized it was unnecessary for the main semantics which is for record in ArchiveIterator(stream)
I guess calling iter() is fine if you really need to be able to call next() as well.

Not sure what's happening with the tests here?

@ikreymer
Copy link
Member

ikreymer commented Mar 17, 2017

For reference, here's what it looked like as an Iterator, implementing next directly:
https://github.com/ikreymer/pywb/blob/50354332614e0e03d27c714438366017d81ed5e0/pywb/warclib/archiveiterator.py

Well, i guess it wasn't so bad, but harder to follow :)

@coveralls
Copy link

Coverage Status

Coverage increased (+0.0002%) to 99.896% when pulling ab31e07 on nlevitt:offset into 266cd56 on webrecorder:master.

3 similar comments
@coveralls
Copy link

Coverage Status

Coverage increased (+0.0002%) to 99.896% when pulling ab31e07 on nlevitt:offset into 266cd56 on webrecorder:master.

@coveralls
Copy link

Coverage Status

Coverage increased (+0.0002%) to 99.896% when pulling ab31e07 on nlevitt:offset into 266cd56 on webrecorder:master.

@coveralls
Copy link

coveralls commented Mar 17, 2017

Coverage Status

Coverage increased (+0.0002%) to 99.896% when pulling ab31e07 on nlevitt:offset into 266cd56 on webrecorder:master.

@coveralls
Copy link

coveralls commented Mar 17, 2017

Coverage Status

Coverage decreased (-0.1%) to 99.792% when pulling f2331d7 on nlevitt:offset into 266cd56 on webrecorder:master.

@coveralls
Copy link

coveralls commented Mar 17, 2017

Coverage Status

Coverage increased (+0.0003%) to 99.896% when pulling 1f03d36 on nlevitt:offset into 266cd56 on webrecorder:master.

@nlevitt
Copy link
Contributor Author

nlevitt commented Mar 17, 2017

Better now? :)

I noticed the iterator thing while writing code to extract a single record by offset, which involved using next() directly. That's cool you also tried implementing it the other way. What about just renaming the class to ArchiveIterable? Anyhow it doesn't matter too much, I leave it to your discretion.

@ikreymer ikreymer merged commit 0fa527d into webrecorder:master Mar 17, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants