Releases: webrecorder/browsertrix-crawler
Browsertrix Crawler v1.11.1
What's Changed
Full Changelog: v1.11.0...v1.11.1
Browsertrix Crawler v1.11.0
What's Changed
- Update Puppeteer mobile device descriptor URL by @emma-sg in #947
- Replace fetch() with optimized undici request() by @ikreymer in #946
- deps: update brave + bump to 1.11.0 by @ikreymer in #948
- Replace minio client with aws client-s3 + lib-storage for multi-part upload by @ikreymer in #943
- add getFileOrUrlAsJson for loading local/remote JSON, don't use blob for local files by @ikreymer in #949
Full Changelog: v1.10.3...v1.11.0
Browsertrix Crawler v1.10.3
Browsertrix Crawler v1.10.2
What's Changed
Full Changelog: v1.10.1...v1.10.2
Browsertrix Crawler v1.10.1
What's Changed
- better handling of net::ERR_HTTP_RESPONSE_CODE_FAILURE by @ikreymer in #934
- sort query args before queuing URLs by @ikreymer in #935
- Don't remove excluded-on-redirect URLs from seen list by @ikreymer in #936
- Sitemaps: parse /sitemap.xml if no sitemap listed in robots.txt by @ikreymer in #933
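
The query-argument sorting in #935 can be illustrated with a minimal sketch. It assumes the goal is for URLs that differ only in query parameter order to map to the same queue key. The names `normalizeQueryOrder`, `shouldQueue`, and `seen` are hypothetical and are not taken from the crawler's source.

```typescript
// Hypothetical helper: a sketch of the idea, not the crawler's actual code.
// Returns the URL with its query parameters sorted by name.
function normalizeQueryOrder(rawUrl: string): string {
  const url = new URL(rawUrl);
  // URLSearchParams.sort() is a stable sort by parameter name, so repeated
  // keys keep their relative order. It also updates the URL's query string.
  url.searchParams.sort();
  return url.toString();
}

// Hypothetical in-memory "seen" set keyed by the normalized URL.
const seen = new Set<string>();

// Returns true only the first time a given normalized URL is offered.
function shouldQueue(rawUrl: string): boolean {
  const key = normalizeQueryOrder(rawUrl);
  if (seen.has(key)) {
    return false;
  }
  seen.add(key);
  return true;
}
```

Under these assumptions, `https://example.com/page?b=2&a=1` and `https://example.com/page?a=1&b=2` produce the same key, so only the first would be queued.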
Full Changelog: v1.10.0...v1.10.1
Browsertrix Crawler v1.10.0
What's Changed
- improvements to support pausing by @ikreymer in #919
- Fix typo 'runInIframes' by @HexagonWin in #918
- Add option to respect robots.txt disallows by @tw4l in #888
- Add downloads dir to cache external dependency within the crawl by @ikreymer in #921
- deps: update to browsertrix-behaviors 0.9.7, puppeteer-core 24.31.0 by @ikreymer in #922
- fix connection leaks in aborted fetch() requests by @ikreymer in #924
- crash page on prompt dialog loop to continue by @ikreymer in #929
- sitemapper refactor to fix concurrency by @ikreymer in #930
- Rename robots flag to --useRobots, keep --robots as alias by @tw4l in #932
New Contributors
- @HexagonWin made their first contribution in #918
Full Changelog: v1.9.3...v1.10.0
Browsertrix Crawler v1.10.0-beta.2
Browsertrix Crawler v1.10.0-beta.1
What's Changed
Full Changelog: v1.10.0-beta.0...v1.10.0-beta.1
Browsertrix Crawler v1.9.3
What's Changed
Full Changelog: v1.9.2...v1.9.3
Browsertrix Crawler v1.10.0-beta.0
New Features
- Robots.txt support for blocking pages
- Store downloads (profile, behaviors, seed list) in crawl directory, resulting in potentially smoother restarts/retries.
What's Changed
- improvements to support pausing by @ikreymer in #919
- Fix typo 'runInIframes' by @HexagonWin in #918
- Add option to respect robots.txt disallows by @tw4l in #888
- Add downloads dir to cache external dependency within the crawl by @ikreymer in #921
- deps: update to browsertrix-behaviors 0.9.7, puppeteer-core 24.31.0 by @ikreymer in #922
New Contributors
- @HexagonWin made their first contribution in #918
Full Changelog: v1.9.2...v1.10.0-beta.0