Remove pages from QA Configmap #1671

Merged
merged 3 commits into from Apr 12, 2024


ikreymer
Member

Fixes #1670

Pages no longer need to be passed to the ConfigMap. The ConfigMap has a size limit, and creating it will fail if there are too many pages.

Note: a previous workaround would keep passing pages, but would check the size of the ConfigMap and add the pages only if the size was <200K. We could still do that, but removing pages entirely seems like a cleaner fix going forward.
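For reference, the size-check workaround mentioned in the note could be sketched roughly as follows. This is only an illustration, not the code that was in the repo: the function name `pages_if_small_enough` and the constant are hypothetical, and the threshold is simply the "<200K" figure from the note.

```python
import json

# Hypothetical threshold, taken from the "<200K" figure in the note above.
SIZE_THRESHOLD_BYTES = 200_000


def pages_if_small_enough(pages, threshold=SIZE_THRESHOLD_BYTES):
    """Return the page list only if its JSON encoding is under the threshold.

    Otherwise return an empty list, so the consumer has to fall back to
    reading pages from the WACZ files instead of the ConfigMap.
    """
    encoded = json.dumps(pages).encode("utf-8")
    return pages if len(encoded) < threshold else []
```

The drawback is that behavior then depends on crawl size, which is part of why dropping pages from the ConfigMap altogether is simpler.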

With this change, the page list for QA will be read directly from the pages.jsonl / extraPages.jsonl entries in the WACZ files.
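A minimal sketch of what reading the page list from a WACZ can look like, assuming the two files live under `pages/` inside the WACZ (a ZIP archive) and that each file starts with a header line that has no `url` field. The function name `iter_wacz_pages` is hypothetical, and this is not the crawler's actual implementation:

```python
import io
import json
import zipfile

# Assumed locations of the page lists inside the WACZ archive.
PAGE_FILES = ("pages/pages.jsonl", "pages/extraPages.jsonl")


def iter_wacz_pages(wacz):
    """Yield page records from a WACZ given as a path or binary file object."""
    with zipfile.ZipFile(wacz) as zf:
        names = set(zf.namelist())
        for page_file in PAGE_FILES:
            # Either file may be absent, e.g. in WACZs from older crawlers.
            if page_file not in names:
                continue
            with zf.open(page_file) as fh:
                for raw in io.TextIOWrapper(fh, encoding="utf-8"):
                    line = raw.strip()
                    if not line:
                        continue
                    record = json.loads(line)
                    # Skip the header line, which describes the file
                    # rather than a page and so has no "url".
                    if "url" not in record:
                        continue
                    yield record
```

Because this is a generator, pages are streamed one at a time, so the full list never has to fit in a size-limited object such as a ConfigMap.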

…e (256K)

was originally an optimization/workaround to avoid reading pages from WACZ pages.jsonl
however, the size limit makes this a bit less useful. ideally, the pages
can be read from the WACZ(s) without having to store all the pages in the configmap
@ikreymer ikreymer requested a review from tw4l April 11, 2024 20:43
@tw4l
Contributor

tw4l commented Apr 11, 2024

Wonder if there's a way to make people aware that the latest crawler will get the best results (i.e. that some pages may be missing from the pages files in older versions of the crawler)? Maybe since this is still very much in dev it's a non-issue?

@ikreymer
Member Author

> Wonder if there's a way to make people aware that latest crawler will get best results (i.e. that some pages may be missing from pages files in older versions of crawler)? Maybe since this is still very much in dev it's a non-issue?

Yep, we will switch to the 1.1.0 crawler when we launch QA, and QA will be intended for crawls made with it going forward. Other 1.0.x crawls will be best effort, and 0.x crawls will be manual review only.

@ikreymer ikreymer merged commit f243d34 into main Apr 12, 2024
3 of 4 checks passed
@ikreymer ikreymer deleted the remove-pages-from-qa-config branch April 12, 2024 23:04
Successfully merging this pull request may close these issues.

[Change]: Don't set pages on QA configmap (remove workaround)