
Commit

Update usage docs section on creating web archives (#899)
Co-authored-by: Henry Wilkinson <henry@wilkinson.graphics>
tw4l and Shrinks99 committed Apr 15, 2024
1 parent 2fd6190 commit 5712945
Showing 1 changed file with 14 additions and 6 deletions.
20 changes: 14 additions & 6 deletions docs/manual/usage.rst
@@ -154,20 +154,20 @@ To enable auto-indexing, run with ``wayback -a`` or ``wayback -a --auto-interval
Creating a Web Archive
----------------------

Using Webrecorder
^^^^^^^^^^^^^^^^^
Using ArchiveWeb.page
^^^^^^^^^^^^^^^^^^^^^

If you do not have a web archive to test, one easy way to create one is to use `Webrecorder <https://webrecorder.io>`_
If you do not have a web archive to test, one easy way to create one is to use the `ArchiveWeb.page <https://archiveweb.page>`_ browser extension for Chrome and other Chromium-based browsers such as Brave Browser. ArchiveWeb.page records pages visited during an archiving session in the browser, and provides means of both replaying and downloading the archived items created.

After recording, you can click **Stop** and then click `Download Collection` to receive a WARC (`.warc.gz`) file.
Follow the instructions in `How To Create Web Archives with ArchiveWeb.page <https://archiveweb.page/en/usage/>`_. After recording, press **Stop** and then `download your collection <https://archiveweb.page/en/download/>`_ to receive a WARC (`.warc.gz`) file. If you choose to download your collection in the WACZ format, the WARC files can be found inside the zipped WACZ in the ``archive/`` directory.

You can then use this with work with pywb.
You can then use your WARCs to work with pywb.
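
As a minimal sketch of the WACZ case described above, the following Python snippet copies the WARC files out of the ``archive/`` directory of a downloaded WACZ, which is an ordinary ZIP file. The function name ``extract_warcs`` and the example paths are hypothetical and are not part of pywb or ArchiveWeb.page; only the Python standard library is used.

```python
import zipfile
from pathlib import Path


def extract_warcs(wacz_path, dest_dir):
    """Copy every WARC stored under archive/ in a WACZ file into dest_dir.

    Hypothetical helper for illustration; returns the list of written paths.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    extracted = []
    with zipfile.ZipFile(wacz_path) as wacz:
        for name in wacz.namelist():
            # WARC data lives in the archive/ directory of the WACZ
            if name.startswith("archive/") and name.endswith((".warc", ".warc.gz")):
                target = dest / Path(name).name
                target.write_bytes(wacz.read(name))
                extracted.append(target)
    return extracted


# Example (hypothetical paths):
# extract_warcs("my-collection.wacz", "extracted-warcs")
```

Each extracted file could then be added to a collection with ``wb-manager add my-web-archive <path/to/file.warc.gz>``.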


Using pywb Recorder
^^^^^^^^^^^^^^^^^^^

The core recording functionality in Webrecorder is also part of :mod:`pywb`. If you want to create a WARC locally, this can be
Recording functionality is also part of :mod:`pywb`. If you want to create a WARC locally, this can be
done by directly recording into your pywb collection:

1. Create a collection: ``wb-manager init my-web-archive`` (if you haven't already created a web archive collection)
@@ -180,6 +180,14 @@ In this configuration, the indexing happens every 10 seconds.. After 10 seconds,
``http://localhost:8080/my-web-archive/http://example.com/``


Using Browsertrix
^^^^^^^^^^^^^^^^^

For a more automated browser-based web archiving experience, `Browsertrix <https://browsertrix.com/>`_ provides a web interface for configuring, scheduling, running, reviewing, and curating crawls of web content. Crawl activity is shown in a live screencast of the browsers used for crawling and all web archives created in Browsertrix can be easily downloaded from the application in the WACZ format.

`Browsertrix Crawler <https://crawler.docs.browsertrix.com/>`_, which provides the underlying crawling functionality of Browsertrix, can also be run standalone in a Docker container on your local computer.


HTTP/S Proxy Mode Access
------------------------
