Releases: webrecorder/browsertrix-crawler

Browsertrix Crawler v1.11.1

19 Jan 21:31
581a703

Full Changelog: v1.11.0...v1.11.1

Browsertrix Crawler v1.11.0

12 Jan 18:48
3ce09e6

What's Changed

  • Update Puppeteer mobile device descriptor URL by @emma-sg in #947
  • Replace fetch() with optimized undici request() by @ikreymer in #946
  • deps: update brave + bump to 1.11.0 by @ikreymer in #948
  • Replace minio client with aws client-s3 + lib-storage for multi-part upload by @ikreymer in #943
  • add getFileOrUrlAsJson for loading local/remote JSON, don't use blob for local files by @ikreymer in #949

Full Changelog: v1.10.3...v1.11.0

Browsertrix Crawler v1.10.3

31 Dec 20:24
d3932f9

What's Changed

  • Fix custom behavior class example in docs by @tw4l in #940
  • follow-up to #915, add --allow-brave-component-update flag by @ikreymer in #942
  • set ulimit before launching x11vnc to work around x11vnc bug by @ikreymer in #945

Full Changelog: v1.10.2...v1.10.3

Browsertrix Crawler v1.10.2

16 Dec 01:05
e320908

What's Changed

  • don't fail crawl if profile cannot be saved by @ikreymer in #939

Full Changelog: v1.10.1...v1.10.2

Browsertrix Crawler v1.10.1

11 Dec 18:39
df26169

What's Changed

  • better handling of net::ERR_HTTP_RESPONSE_CODE_FAILURE by @ikreymer in #934
  • sort query args before queuing URLs by @ikreymer in #935
  • Don't remove excluded-on-redirect URLs from seen list by @ikreymer in #936
  • Sitemaps: parse /sitemap.xml if no sitemap listed in robots.txt by @ikreymer in #933
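
The query-argument sorting in #935 can be illustrated with a minimal sketch. This is not the crawler's actual code: `normalizeUrl`, `queueIfNew`, and the `seen` set are hypothetical names, and the sketch only assumes that URLs differing solely in query-argument order should count as the same page.

```typescript
// Hypothetical helper (not from the crawler source): normalize a URL by
// sorting its query arguments by key.
function normalizeUrl(url: string): string {
  const parsed = new URL(url);
  // URLSearchParams.sort() is a stable sort by key, so repeated keys
  // keep their relative order; it also updates parsed.href.
  parsed.searchParams.sort();
  return parsed.href;
}

// Hypothetical in-memory "seen" list keyed by normalized URL.
const seen = new Set<string>();

// Returns true if the URL is new and was recorded, false if an
// equivalent URL (same arguments, any order) was already seen.
function queueIfNew(url: string): boolean {
  const key = normalizeUrl(url);
  if (seen.has(key)) {
    return false;
  }
  seen.add(key);
  return true;
}
```

With this sketch, `https://example.com/page?b=2&a=1` and `https://example.com/page?a=1&b=2` normalize to the same key, so only the first one would be queued.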

Full Changelog: v1.10.0...v1.10.1

Browsertrix Crawler v1.10.0

03 Dec 22:58

What's Changed

  • improvements to support pausing by @ikreymer in #919
  • Fix typo 'runInIframes' by @HexagonWin in #918
  • Add option to respect robots.txt disallows by @tw4l in #888
  • Add downloads dir to cache external dependency within the crawl by @ikreymer in #921
  • deps: update to browsertrix-behaviors 0.9.7, puppeteer-core 24.31.0 by @ikreymer in #922
  • fix connection leaks in aborted fetch() requests by @ikreymer in #924
  • crash page on prompt dialog loop to continue by @ikreymer in #929
  • sitemapper refactor to fix concurrency by @ikreymer in #930
  • Rename robots flag to --useRobots, keep --robots as alias by @tw4l in #932
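
As a rough illustration of what respecting robots.txt disallows (#888) involves, the sketch below checks a URL against the `Disallow` rules of the `User-agent: *` group. It is a simplified assumption, not the crawler's implementation: `parseDisallows` and `isAllowed` are hypothetical names, and `Allow` rules, wildcard patterns, and groups that list several user agents are ignored.

```typescript
// Hypothetical helper: collect Disallow path prefixes that apply to
// "User-agent: *". Simplified; does not implement the full protocol.
function parseDisallows(robotsTxt: string): string[] {
  const disallows: string[] = [];
  let inWildcardGroup = false;
  for (const rawLine of robotsTxt.split(/\r?\n/)) {
    // Strip trailing comments and surrounding whitespace.
    const line = rawLine.replace(/#.*$/, "").trim();
    const sep = line.indexOf(":");
    if (sep < 0) {
      continue;
    }
    const field = line.slice(0, sep).trim().toLowerCase();
    const value = line.slice(sep + 1).trim();
    if (field === "user-agent") {
      // Each User-agent line starts a new group in this sketch.
      inWildcardGroup = value === "*";
    } else if (field === "disallow" && inWildcardGroup && value) {
      disallows.push(value);
    }
  }
  return disallows;
}

// Hypothetical helper: a URL is allowed unless its path (plus query)
// starts with one of the disallowed prefixes.
function isAllowed(url: string, disallows: string[]): boolean {
  const { pathname, search } = new URL(url);
  const path = pathname + search;
  return !disallows.some((prefix) => path.startsWith(prefix));
}
```

A crawl that opts in via `--useRobots` (or its alias `--robots`) would skip pages for which a check of this kind fails.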

Full Changelog: v1.9.3...v1.10.0

Browsertrix Crawler v1.10.0-beta.2

03 Dec 01:18

Pre-release

What's Changed

  • crash page on prompt dialog loop to continue by @ikreymer in #929
  • sitemapper refactor to fix concurrency by @ikreymer in #930
  • Rename robots flag to --useRobots, keep --robots as alias by @tw4l in #932

Full Changelog: v1.10.0-beta.1...v1.10.0-beta.2

Browsertrix Crawler v1.10.0-beta.1

28 Nov 08:42

Pre-release

What's Changed

  • fix connection leaks in aborted fetch() requests by @ikreymer in #924

Full Changelog: v1.10.0-beta.0...v1.10.0-beta.1

Browsertrix Crawler v1.9.3

28 Nov 05:01
5bb4527

Full Changelog: v1.9.2...v1.9.3

Browsertrix Crawler v1.10.0-beta.0

27 Nov 04:14
8658df3

Pre-release

New Features

  • Robots.txt support for blocking pages
  • Store downloads (profile, behaviors, seed list) in crawl directory, resulting in potentially smoother restarts/retries.

What's Changed

  • improvements to support pausing by @ikreymer in #919
  • Fix typo 'runInIframes' by @HexagonWin in #918
  • Add option to respect robots.txt disallows by @tw4l in #888
  • Add downloads dir to cache external dependency within the crawl by @ikreymer in #921
  • deps: update to browsertrix-behaviors 0.9.7, puppeteer-core 24.31.0 by @ikreymer in #922

Full Changelog: v1.9.2...v1.10.0-beta.0