Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Add MKDocs documentation site for Browsertrix Crawler 1.0.0 (#494)
Fixes #493

This PR updates the documentation for Browsertrix Crawler 1.0.0 and moves it from the project README to an MKDocs site.

Initial docs site set to https://crawler.docs.browsertrix.com/

Many thanks to @Shrinks99 for help setting this up!

---------

Co-authored-by: Henry Wilkinson <henry@wilkinson.graphics>
Co-authored-by: Ilya Kreymer <ikreymer@gmail.com>
1 parent 6d04c95, commit e1fe028
Showing 47 changed files with 1,238 additions and 795 deletions.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,28 @@ | ||
name: docs-publish | ||
on: | ||
  push: | ||
    branches: | ||
      - main | ||
    paths: | ||
      - 'docs/**' | ||
|
||
permissions: | ||
  contents: write | ||
|
||
jobs: | ||
  deploy_docs: | ||
    runs-on: ubuntu-latest | ||
    steps: | ||
      - uses: actions/checkout@v3 | ||
      - uses: actions/setup-python@v4 | ||
        with: | ||
          python-version: 3.x | ||
|
||
      - name: build docker image (for getting cli) | ||
        run: docker-compose build | ||
|
||
      - name: generate cli | ||
        run: docs/gen-cli.sh | ||
|
||
      - run: pip install mkdocs-material | ||
      - run: cd docs/ && mkdocs gh-deploy --force |
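
The job steps in this workflow can be approximated locally before pushing. The following is a minimal sketch and not part of the commit: `step` and `preview_docs` are hypothetical helper names, `DRY_RUN` is an assumed environment variable, and `mkdocs build` is substituted for `mkdocs gh-deploy --force` so that nothing is published. It assumes the function is called from the repository root.

```sh
# Hypothetical helpers, not part of the repository.

# Print a command, then run it unless DRY_RUN=1.
step() {
  echo "+ $*"
  if [ "${DRY_RUN:-0}" != "1" ]; then
    "$@"
  fi
}

# Mirror the docs-publish job steps in order, stopping at the first failure.
# The body runs in a subshell so that `cd docs/` does not change the caller's
# working directory. The final step builds the site locally instead of deploying.
preview_docs() (
  step docker-compose build &&
  step docs/gen-cli.sh &&
  step pip install mkdocs-material &&
  step cd docs/ &&
  step mkdocs build
)
```

Calling `DRY_RUN=1 preview_docs` only prints the five commands; calling `preview_docs` without it runs them.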
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,4 +1,3 @@ | ||
#!/usr/bin/env sh | ||
. "$(dirname -- "$0")/_/husky.sh" | ||
|
||
yarn lint:fix |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
crawler.docs.browsertrix.com |
11 changes: 11 additions & 0 deletions
docs/docs/assets/brand/browsertrix-crawler-icon-color-dynamic.svg
Binary file not shown.
Binary file not shown.
Binary file not shown.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,23 @@ | ||
# Documentation | ||
|
||
This documentation is built with the [MkDocs](https://www.mkdocs.org/) static site generator. | ||
|
||
## Docs Setup | ||
|
||
Python is required to build the docs. With Python installed, run: | ||
|
||
    pip install mkdocs-material | ||
|
||
|
||
## Docs Server | ||
|
||
To start the docs server, simply run: | ||
|
||
    mkdocs serve | ||
|
||
The documentation will then be available at `http://localhost:8000/`. | ||
|
||
The command-line options are rebuilt using the `docs/gen-cli.sh` script. | ||
|
||
Refer to the [MkDocs](https://www.mkdocs.org/) and [Material for MkDocs](https://squidfunk.github.io/mkdocs-material/) pages | ||
for more info about the documentation. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,39 @@ | ||
# Development | ||
|
||
## Usage with Docker Compose | ||
|
||
Many examples in the User Guide demonstrate running Browsertrix Crawler with `docker run`. | ||
|
||
Docker Compose is recommended for building the image and for simple configurations. A simple Docker Compose configuration file is included in the Git repository. | ||
|
||
To build the latest image, run: | ||
|
||
```sh | ||
docker-compose build | ||
``` | ||
|
||
Docker Compose also simplifies some config options, such as mounting the volume for the crawls. | ||
|
||
The following command starts a crawl with 2 workers and generates the CDX: | ||
|
||
```sh | ||
docker-compose run crawler crawl --url https://webrecorder.net/ --generateCDX --collection wr-net --workers 2 | ||
``` | ||
|
||
In this example, the crawl data is written to `./crawls/collections/wr-net` by default. | ||
|
||
While the crawl is running, its progress is printed to the JSON-L log output. To disable this, pass the `--logging` option without including `stats`. | ||
|
||
## Multi-Platform Build / Support for Apple Silicon | ||
|
||
Browsertrix Crawler uses a browser image which supports amd64 and arm64. | ||
|
||
This means Browsertrix Crawler can be built natively on Apple Silicon systems using the default settings. Running `docker-compose build` on an Apple Silicon system should produce a native build suitable for development. | ||
|
||
## Modifying Browser Image | ||
|
||
It is also possible to build Browsertrix Crawler with a different browser image. Currently, browser images using Brave Browser and Chrome/Chromium (depending on host system chip architecture) are supported via [browsertrix-browser-base](https://github.com/webrecorder/browsertrix-browser-base). However, only Brave Browser receives regular version updates from us. | ||
|
||
The browser base image is specified at the top of the Dockerfile in the Browsertrix Crawler repo and can be changed there. | ||
|
||
Custom browser images can be used by forking [browsertrix-browser-base](https://github.com/webrecorder/browsertrix-browser-base), locally building or publishing an image, and then modifying the Dockerfile in this repo to build from that image. |
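
The Docker Compose example and the `--logging` note in the file above can be combined into one command. The following is a minimal sketch and not part of the commit: `crawl_command` is a hypothetical helper that only prints the command, and `jserrors` is assumed to be an accepted `--logging` value (check `crawl --help` for the values supported by your version).

```sh
# Hypothetical helper, not part of the repository: print the Docker Compose
# crawl command from the develop docs with an explicit --logging value.
# Note: plain sh has no local variables, so these names are global.
crawl_command() {
  url="$1"
  collection="$2"
  workers="${3:-2}"          # default mirrors the documented example: 2 workers
  logging="${4:-jserrors}"   # assumption: any value list without `stats` omits progress output
  printf 'docker-compose run crawler crawl --url %s --generateCDX --collection %s --workers %s --logging %s\n' \
    "$url" "$collection" "$workers" "$logging"
}
```

For example, `crawl_command https://webrecorder.net/ wr-net` prints the documented crawl command with `--logging jserrors` appended; per the docs above, that crawl would write to `./crawls/collections/wr-net`.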
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,40 @@ | ||
--- | ||
hide: | ||
- navigation | ||
- toc | ||
--- | ||
|
||
# Home | ||
|
||
Welcome to the Browsertrix Crawler official documentation. | ||
|
||
Browsertrix Crawler is a simplified browser-based high-fidelity crawling system, designed to run a complex, customizable browser-based crawl in a single Docker container. Browsertrix Crawler uses [Puppeteer](https://github.com/puppeteer/puppeteer) to control one or more [Brave Browser](https://brave.com/) browser windows in parallel. Data is captured through the [Chrome DevTools Protocol (CDP)](https://chromedevtools.github.io/devtools-protocol/) in the browser. | ||
|
||
|
||
!!! note | ||
|
||
    This documentation applies to Browsertrix Crawler versions 1.0.0 and above. Documentation for earlier versions of the crawler is available in the README file of older commits in the [Browsertrix Crawler GitHub repository](https://github.com/webrecorder/browsertrix-crawler). | ||
|
||
|
||
## Features | ||
|
||
|
||
- Single-container, browser-based crawling with a headless or headful browser running pages in multiple windows. | ||
- Support for custom browser behaviors using [Browsertrix Behaviors](https://github.com/webrecorder/browsertrix-behaviors), including autoscroll, video autoplay, and site-specific behaviors. | ||
- YAML-based configuration, passed via file or via stdin. | ||
- Seed lists and per-seed scoping rules. | ||
- URL blocking rules to block capture of specific URLs (including by iframe URL and/or by iframe contents). | ||
- Screencasting: Ability to watch crawling in real-time. | ||
- Screenshotting: Ability to take thumbnails, full page screenshots, and/or screenshots of the initial page view. | ||
- Optimized (non-browser) capture of non-HTML resources. | ||
- Extensible Puppeteer driver script for customizing behavior per crawl or page. | ||
- Ability to create and reuse browser profiles interactively or via automated user/password login using an embedded browser. | ||
- Multi-platform support — prebuilt Docker images available for Intel/AMD and Apple Silicon (M1/M2) CPUs. | ||
|
||
## Documentation | ||
|
||
If something is missing, unclear, or seems incorrect, please open an [issue](https://github.com/webrecorder/browsertrix-crawler/issues?q=is%3Aissue+is%3Aopen+sort%3Aupdated-desc) and we'll try to make sure that your questions get answered here in the future! | ||
|
||
## Code | ||
|
||
Browsertrix Crawler is free and open source software, with all code available in the [main repository on GitHub](https://github.com/webrecorder/browsertrix-crawler). |
3 changes: 3 additions & 0 deletions
docs/docs/overrides/.icons/bootstrap/exclamation-circle-fill.svg
3 changes: 3 additions & 0 deletions
docs/docs/overrides/.icons/bootstrap/exclamation-diamond-fill.svg
3 changes: 3 additions & 0 deletions
docs/docs/overrides/.icons/bootstrap/exclamation-triangle-fill.svg
4 changes: 4 additions & 0 deletions
docs/docs/overrides/.icons/bootstrap/exclamation-triangle.svg
3 changes: 3 additions & 0 deletions
docs/docs/overrides/.icons/bootstrap/file-earmark-text-fill.svg
3 changes: 3 additions & 0 deletions
docs/docs/overrides/.icons/bootstrap/question-circle-fill.svg