Add support for WACZ creation. #2

Closed
3 tasks
ikreymer opened this issue Dec 16, 2020 · 0 comments
@ikreymer
Member

This could be a command-line flag, say --generateWACZ, similar to the existing --generateCDX option, that generates a WACZ file once the crawl is done. It also requires keeping a list of the pages crawled, which can then be passed to py-wacz.

This would involve:

  • Adding a --generateWACZ command-line option.
  • Generating a pages/pages.jsonl file in the collection directory. The pages directory will also need to be created.
  • Running py-wacz to create the WACZ at the end of the crawl. For now, the CDX can simply be regenerated during WACZ creation. In the future, the existing index in Redis can be used to speed up the process.
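
The second step above could be sketched roughly as follows, in Python since py-wacz is a Python tool. This is only an illustration under assumptions: the function name `write_pages_jsonl`, the header line, and the per-page fields (`id`, `url`, `title`) are not specified in this issue and are hypothetical here. The actual py-wacz invocation is left out because its command-line flags are not given in this issue.

```python
import json
import tempfile
import uuid
from pathlib import Path


def write_pages_jsonl(collection_dir, pages):
    """Write <collection_dir>/pages/pages.jsonl and return its path.

    `pages` is a list of dicts with a "url" key and an optional "title".
    The record layout below is an assumption, not taken from this issue.
    """
    pages_dir = Path(collection_dir) / "pages"
    # The pages directory may not exist yet, so create it first.
    pages_dir.mkdir(parents=True, exist_ok=True)

    pages_file = pages_dir / "pages.jsonl"
    with pages_file.open("w", encoding="utf-8") as fh:
        # Assumed header line describing the page list.
        header = {"format": "json-pages-1.0", "id": "pages", "title": "All Pages"}
        fh.write(json.dumps(header) + "\n")

        # One JSON object per crawled page, one per line.
        for page in pages:
            entry = {
                "id": str(uuid.uuid4()),
                "url": page["url"],
                "title": page.get("title", ""),
            }
            fh.write(json.dumps(entry) + "\n")

    return pages_file


if __name__ == "__main__":
    # Example run against a throwaway directory.
    with tempfile.TemporaryDirectory() as tmp:
        path = write_pages_jsonl(tmp, [{"url": "https://example.com/", "title": "Example"}])
        print(path.read_text(encoding="utf-8"))
```

In the crawler itself, the page list would be accumulated during the crawl and written out before py-wacz runs, so that the file can be passed in when the WACZ is created.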