QA Backend: Add support for Crawl QA jobs #1498

Closed
Tracked by #1435
tw4l opened this issue Jan 29, 2024 · 0 comments · Fixed by #1586

tw4l commented Jan 29, 2024

Related to #1435

  • Add CrawlQAJob background job type
  • Add CrawlManager method to kick off job
  • Add API endpoints to start/stop job (see the sketch below)
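
A rough illustration of the last task item, for orientation only: the route paths, `CrawlManager` method names, and response shapes below are assumptions, not the API that actually shipped in #1586.

```python
# Hypothetical sketch of start/stop QA endpoints; all names are placeholders.
from fastapi import APIRouter, HTTPException


class CrawlManager:
    """Stand-in for the real crawl manager; both methods are placeholders."""

    async def run_qa_job(self, oid: str, crawl_id: str) -> str:
        # Would create a CrawlQAJob background job and return its id.
        return f"qa-{crawl_id}"

    async def stop_qa_job(self, oid: str, crawl_id: str) -> bool:
        # Would signal the running QA job to stop.
        return True


crawl_manager = CrawlManager()
router = APIRouter(prefix="/orgs/{oid}/crawls/{crawl_id}/qa")


@router.post("/start")
async def start_qa_run(oid: str, crawl_id: str):
    """Kick off a QA run (a re-crawl of existing crawl data) as a background job."""
    qa_run_id = await crawl_manager.run_qa_job(oid, crawl_id)
    if not qa_run_id:
        raise HTTPException(status_code=400, detail="qa_run_not_started")
    return {"started": qa_run_id}


@router.post("/stop")
async def stop_qa_run(oid: str, crawl_id: str):
    """Request that the currently running QA job stop."""
    return {"success": await crawl_manager.stop_qa_job(oid, crawl_id)}
```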
@tw4l tw4l added the back end Requires back end dev work label Jan 29, 2024
@tw4l tw4l self-assigned this Jan 29, 2024
@tw4l tw4l mentioned this issue Jan 29, 2024
@tw4l tw4l changed the title Add backend support for Crawl QA jobs QA Backend: Add support for Crawl QA jobs Jan 30, 2024
ikreymer added a commit that referenced this issue Mar 21, 2024
Supports running QA Runs via the QA API!

Builds on top of the `issue-1498-crawl-qa-backend-support` branch, fixes
#1498

Also requires the latest Browsertrix Crawler 1.1.0+ (from the webrecorder/browsertrix-crawler#469 branch).

Notable changes:
- QARun objects contain info about QA runs, which are crawls
performed on data loaded from existing crawls.

- Various crawl db operations can be performed on either the crawl or the
`qa` object, and core crawl fields have been moved to CoreCrawlable.

- While running, `QARun` data is stored in a single `qa` object, while
finished QA runs are added to the `qaFinished` dictionary on the Crawl. The
QA list API returns data from the finished list, sorted most recent
first (see the data-model sketch below).

- Includes additional type fixes / type safety, especially around
BaseCrawl / Crawl / UploadedCrawl functionality, also creating specific
get_upload(), get_basecrawl(), get_crawl() getters for internal use and
get_crawl_out() for the API.

- Supports filtering and sorting pages via `qaFilterBy` (screenshotMatch, textMatch)
along with `gt`, `lt`, `gte`, `lte` params to return pages based on QA results
(see the query sketch below).

---------
Co-authored-by: Tessa Walsh <tessa@bitarchivist.net>
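
To make the `qa` / `qaFinished` split and the CoreCrawlable refactor described above more concrete, here is a minimal sketch of how the models could be shaped. Field names and types are illustrative assumptions; the actual Browsertrix models carry many more fields.

```python
# Illustrative sketch of the QARun / Crawl relationship described above.
# Field names are assumptions; the real Browsertrix models differ in detail.
from datetime import datetime
from typing import Dict, List, Optional

from pydantic import BaseModel, Field


class CoreCrawlable(BaseModel):
    """Fields shared by a regular crawl and a QA run of that crawl."""

    id: str
    state: str = "starting"
    started: Optional[datetime] = None
    finished: Optional[datetime] = None


class QARun(CoreCrawlable):
    """A QA run: a crawl performed on data loaded from an existing crawl."""


class Crawl(CoreCrawlable):
    """An ordinary crawl plus its in-progress and finished QA runs."""

    # The currently running QA run, if any, lives in a single `qa` object...
    qa: Optional[QARun] = None
    # ...while finished runs are keyed by QA run id in `qaFinished`.
    qaFinished: Dict[str, QARun] = Field(default_factory=dict)


def list_qa_runs(crawl: Crawl) -> List[QARun]:
    """Finished QA runs, most recent first, as the QA list API returns them."""
    return sorted(
        crawl.qaFinished.values(),
        key=lambda run: run.finished or datetime.min,
        reverse=True,
    )
```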
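And a simplified sketch of how the `qaFilterBy` plus `gt`/`lt`/`gte`/`lte` page-filtering params might translate into a MongoDB-style query. The parameter names mirror the description above, but the query-building code is an assumption rather than the actual implementation.

```python
# Hypothetical translation of qaFilterBy + comparison params into a
# MongoDB-style page query (not the actual Browsertrix implementation).
from typing import Optional


def build_qa_page_query(
    qa_run_id: str,
    qa_filter_by: str,  # e.g. "screenshotMatch" or "textMatch"
    gt: Optional[float] = None,
    gte: Optional[float] = None,
    lt: Optional[float] = None,
    lte: Optional[float] = None,
) -> dict:
    """Build a filter on a per-page QA score for the given QA run."""
    comparisons = {
        op: value
        for op, value in (("$gt", gt), ("$gte", gte), ("$lt", lt), ("$lte", lte))
        if value is not None
    }
    field = f"qa.{qa_run_id}.{qa_filter_by}"
    return {field: comparisons} if comparisons else {}


# Example: pages whose screenshot match score for a QA run is at most 0.5.
print(build_qa_page_query("qa-run-1", "screenshotMatch", lte=0.5))
# -> {"qa.qa-run-1.screenshotMatch": {"$lte": 0.5}}
```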