From 5712945991d4476c058634e77b58315a00899d7b Mon Sep 17 00:00:00 2001
From: Tessa Walsh
Date: Mon, 15 Apr 2024 10:22:39 -0400
Subject: [PATCH] Update usage docs section on creating web archives (#899)

Co-authored-by: Henry Wilkinson
---
 docs/manual/usage.rst | 20 ++++++++++++++------
 1 file changed, 14 insertions(+), 6 deletions(-)

diff --git a/docs/manual/usage.rst b/docs/manual/usage.rst
index b66e9067..01aa8ed0 100644
--- a/docs/manual/usage.rst
+++ b/docs/manual/usage.rst
@@ -154,20 +154,20 @@ To enable auto-indexing, run with ``wayback -a`` or ``wayback -a --auto-interval
 Creating a Web Archive
 ----------------------
 
-Using Webrecorder
-^^^^^^^^^^^^^^^^^
+Using ArchiveWeb.page
+^^^^^^^^^^^^^^^^^^^^^
 
-If you do not have a web archive to test, one easy way to create one is to use `Webrecorder `_
+If you do not have a web archive to test, one easy way to create one is to use the `ArchiveWeb.page `_ browser extension for Chrome and other Chromium-based browsers such as Brave Browser. ArchiveWeb.page records pages visited during an archiving session in the browser, and provides means of both replaying and downloading the archived items created.
 
-After recording, you can click **Stop** and then click `Download Collection` to receive a WARC (`.warc.gz`) file.
+Follow the instructions in `How To Create Web Archives with ArchiveWeb.page `_. After recording, press **Stop** and then `download your collection `_ to receive a WARC (`.warc.gz`) file. If you choose to download your collection in the WACZ format, the WARC files can be found inside the zipped WACZ in the ``archive/`` directory.
 
-You can then use this with work with pywb.
+You can then use your WARCs to work with pywb.
 
 
 Using pywb Recorder
 ^^^^^^^^^^^^^^^^^^^
 
-The core recording functionality in Webrecorder is also part of :mod:`pywb`. If you want to create a WARC locally, this can be
+Recording functionality is also part of :mod:`pywb`. If you want to create a WARC locally, this can be
 done by directly recording into your pywb collection:
 
 1. Create a collection: ``wb-manager init my-web-archive`` (if you haven't already created a web archive collection)
@@ -180,6 +180,14 @@ In this configuration, the indexing happens every 10 seconds.. After 10 seconds,
 
 ``http://localhost:8080/my-web-archive/http://example.com/``
 
+Using Browsertrix
+^^^^^^^^^^^^^^^^^
+
+For a more automated browser-based web archiving experience, `Browsertrix `_ provides a web interface for configuring, scheduling, running, reviewing, and curating crawls of web content. Crawl activity is shown in a live screencast of the browsers used for crawling and all web archives created in Browsertrix can be easily downloaded from the application in the WACZ format.
+
+`Browsertrix Crawler `_, which provides the underlying crawling functionality of Browsertrix, can also be run standalone in a Docker container on your local computer.
+
+
 HTTP/S Proxy Mode Access
 ------------------------
 