Add MKDocs documentation site for Browsertrix Crawler 1.0.0 (#494)
Fixes #493 

This PR updates the documentation for Browsertrix Crawler 1.0.0 and
moves it from the project README to an MKDocs site.

Initial docs site set to https://crawler.docs.browsertrix.com/

Many thanks to @Shrinks99 for help setting this up!

---------

Co-authored-by: Henry Wilkinson <henry@wilkinson.graphics>
Co-authored-by: Ilya Kreymer <ikreymer@gmail.com>
3 people committed Mar 16, 2024
1 parent 6d04c95 commit e1fe028
Showing 47 changed files with 1,238 additions and 795 deletions.
28 changes: 28 additions & 0 deletions .github/workflows/docs-publish.yaml
@@ -0,0 +1,28 @@
name: docs-publish
on:
  push:
    branches:
      - main
    paths:
      - 'docs/**'

permissions:
  contents: write

jobs:
  deploy_docs:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - uses: actions/setup-python@v4
        with:
          python-version: 3.x

      - name: build docker image (for getting cli)
        run: docker-compose build

      - name: generate cli
        run: docs/gen-cli.sh

      - run: pip install mkdocs-material
      - run: cd docs/ && mkdocs gh-deploy --force
1 change: 0 additions & 1 deletion .husky/pre-commit
@@ -1,4 +1,3 @@
#!/usr/bin/env sh
. "$(dirname -- "$0")/_/husky.sh"

yarn lint:fix
1 change: 1 addition & 0 deletions CNAME
@@ -0,0 +1 @@
crawler.docs.browsertrix.com
796 changes: 6 additions & 790 deletions README.md

Large diffs are not rendered by default.

11 changes: 11 additions & 0 deletions docs/docs/assets/brand/browsertrix-crawler-icon-color-dynamic.svg
8 changes: 8 additions & 0 deletions docs/docs/assets/brand/browsertrix-crawler-white.svg
Binary file added docs/docs/assets/fonts/Inter-Italic.var.woff2
Binary file not shown.
Binary file added docs/docs/assets/fonts/Inter.var.woff2
Binary file not shown.
Binary file added docs/docs/assets/fonts/Recursive_VF_1.084.woff2
Binary file not shown.
23 changes: 23 additions & 0 deletions docs/docs/develop/docs.md
@@ -0,0 +1,23 @@
# Documentation

This documentation is built with the [MkDocs](https://www.mkdocs.org/) static site generator.

## Docs Setup

Python is required to build the docs. With Python installed, run:

    pip install mkdocs-material


## Docs Server

To start the docs server, run:

    mkdocs serve

The documentation will then be available at `http://localhost:8000/`.

The command-line options are rebuilt using the `docs/gen-cli.sh` script.

Refer to the [MkDocs](https://www.mkdocs.org/) and [Material for MkDocs](https://squidfunk.github.io/mkdocs-material/) pages
for more info about the documentation.
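
As a rough sketch, the steps on this page could be combined into one script run from the repository root. The names `step`, `preview_docs`, and `DRY_RUN` are hypothetical helpers introduced for illustration. The sketch assumes that `docs/gen-cli.sh` needs the crawler image built first (as the docs publishing workflow does) and that `mkdocs` is run from the `docs/` directory. By default it only prints the commands.

```sh
#!/usr/bin/env sh
# Print (DRY_RUN=1, the default) or execute one command.
step() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Steps for previewing the docs locally, in order.
preview_docs() {
  step pip install mkdocs-material        # theme package; installs mkdocs as a dependency
  step docker-compose build               # assumed prerequisite for gen-cli.sh
  step docs/gen-cli.sh                    # regenerate the command-line options
  step sh -c 'cd docs && mkdocs serve'    # serve on http://localhost:8000/
}

preview_docs
```

Setting `DRY_RUN=0` before calling `preview_docs` runs the commands instead of printing them.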
39 changes: 39 additions & 0 deletions docs/docs/develop/index.md
@@ -0,0 +1,39 @@
# Development

## Usage with Docker Compose

Many examples in the User Guide demonstrate running Browsertrix Crawler with `docker run`.

Docker Compose is recommended for building the image and for simple configurations. A simple Docker Compose configuration file is included in the Git repository.

To build the latest image, run:

```sh
docker-compose build
```

Docker Compose also simplifies some config options, such as mounting the volume for the crawls.

The following command starts a crawl with 2 workers and generates the CDX:

```sh
docker-compose run crawler crawl --url https://webrecorder.net/ --generateCDX --collection wr-net --workers 2
```

In this example, the crawl data is written to `./crawls/collections/wr-net` by default.

While the crawl is running, progress statistics are written to the JSON-L log output. This can be disabled by passing the `--logging` option without including `stats`.
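
A minimal sketch of the same crawl with `stats` left out of `--logging` is shown below. The `run_crawl` function and `DRY_RUN` variable are hypothetical helpers, and `jserrors` is assumed to be an accepted `--logging` value; check the generated command-line reference before relying on it. By default the script only prints the command it would run.

```sh
#!/usr/bin/env sh
# Build the crawl command with a caller-chosen --logging value.
# DRY_RUN=1 (the default) prints the command; DRY_RUN=0 executes it.
run_crawl() {
  # $1: seed URL, $2: collection name, $3: value passed to --logging
  set -- docker-compose run crawler crawl --url "$1" --generateCDX \
    --collection "$2" --workers 2 --logging "$3"
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# "stats" is deliberately omitted, so progress statistics are not logged.
run_crawl https://webrecorder.net/ wr-net jserrors
```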

## Multi-Platform Build / Support for Apple Silicon

Browsertrix Crawler uses a browser image which supports amd64 and arm64.

This means Browsertrix Crawler can be built natively on Apple Silicon systems using the default settings. Running `docker-compose build` on an Apple Silicon system should produce a native build suitable for development.

## Modifying Browser Image

It is also possible to build Browsertrix Crawler with a different browser image. Currently, browser images using Brave Browser and Chrome/Chromium (depending on host system chip architecture) are supported via [browsertrix-browser-base](https://github.com/webrecorder/browsertrix-browser-base). However, only Brave Browser receives regular version updates from us.

The browser base image used is specified and can be changed at the top of the Dockerfile in the Browsertrix Crawler repo.

Custom browser images can be used by forking [browsertrix-browser-base](https://github.com/webrecorder/browsertrix-browser-base), locally building or publishing an image, and then modifying the Dockerfile in this repo to build from that image.
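
One way to script the Dockerfile change is sketched below. The `set_base_image` helper is hypothetical, the image names are placeholders, and the sketch assumes the base image is given on a plain `FROM <image>` line with no `AS <stage>` suffix. If the Dockerfile sets the image through an `ARG` instead, edit that value by hand.

```sh
#!/usr/bin/env sh
# Point the first FROM line of a Dockerfile at a different base image.
set_base_image() {
  # $1: path to the Dockerfile, $2: new base image reference
  awk -v img="$2" '
    !seen && /^FROM / { print "FROM " img; seen = 1; next }
    { print }
  ' "$1" > "$1.tmp" && mv "$1.tmp" "$1"
}

# Demonstration on a throwaway file rather than the real Dockerfile.
demo=$(mktemp)
printf 'FROM example/original-base:latest\nWORKDIR /app\n' > "$demo"
set_base_image "$demo" my-registry/my-browser-base:dev
cat "$demo"
rm -f "$demo"
```

After changing the real Dockerfile, rebuilding with `docker-compose build` picks up the new base image.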
40 changes: 40 additions & 0 deletions docs/docs/index.md
@@ -0,0 +1,40 @@
---
hide:
- navigation
- toc
---

# Home

Welcome to the Browsertrix Crawler official documentation.

Browsertrix Crawler is a simplified browser-based high-fidelity crawling system, designed to run a complex, customizable browser-based crawl in a single Docker container. Browsertrix Crawler uses [Puppeteer](https://github.com/puppeteer/puppeteer) to control one or more [Brave Browser](https://brave.com/) browser windows in parallel. Data is captured through the [Chrome DevTools Protocol (CDP)](https://chromedevtools.github.io/devtools-protocol/) in the browser.


!!! note

    This documentation applies to Browsertrix Crawler versions 1.0.0 and above. Documentation for earlier versions of the crawler is available in the [Browsertrix Crawler GitHub repository](https://github.com/webrecorder/browsertrix-crawler)'s README file in older commits.


## Features


- Single-container, browser-based crawling with a headless/headful browser running pages in multiple windows.
- Support for custom browser behaviors, using [Browsertrix Behaviors](https://github.com/webrecorder/browsertrix-behaviors), including autoscroll, video autoplay, and site-specific behaviors.
- YAML-based configuration, passed via file or via stdin.
- Seed lists and per-seed scoping rules.
- URL blocking rules to block capture of specific URLs (including by iframe URL and/or by iframe contents).
- Screencasting: Ability to watch crawling in real-time.
- Screenshotting: Ability to take thumbnails, full page screenshots, and/or screenshots of the initial page view.
- Optimized (non-browser) capture of non-HTML resources.
- Extensible Puppeteer driver script for customizing behavior per crawl or page.
- Ability to create and reuse browser profiles interactively or via automated user/password login using an embedded browser.
- Multi-platform support — prebuilt Docker images available for Intel/AMD and Apple Silicon (M1/M2) CPUs.

## Documentation

If something is missing, unclear, or seems incorrect, please open an [issue](https://github.com/webrecorder/browsertrix-crawler/issues?q=is%3Aissue+is%3Aopen+sort%3Aupdated-desc) and we'll try to make sure that your questions get answered here in the future!

## Code

Browsertrix Crawler is free and open source software, with all code available in the [main repository on GitHub](https://github.com/webrecorder/browsertrix-crawler).
4 changes: 4 additions & 0 deletions docs/docs/overrides/.icons/bootstrap/bug-fill.svg
3 changes: 3 additions & 0 deletions docs/docs/overrides/.icons/bootstrap/chat-left-text-fill.svg
3 changes: 3 additions & 0 deletions docs/docs/overrides/.icons/bootstrap/check-circle-fill.svg
4 changes: 4 additions & 0 deletions docs/docs/overrides/.icons/bootstrap/check-circle.svg
4 changes: 4 additions & 0 deletions docs/docs/overrides/.icons/bootstrap/dash-circle.svg
4 changes: 4 additions & 0 deletions docs/docs/overrides/.icons/bootstrap/exclamation-triangle.svg
4 changes: 4 additions & 0 deletions docs/docs/overrides/.icons/bootstrap/eye.svg
3 changes: 3 additions & 0 deletions docs/docs/overrides/.icons/bootstrap/github.svg
3 changes: 3 additions & 0 deletions docs/docs/overrides/.icons/bootstrap/globe.svg
3 changes: 3 additions & 0 deletions docs/docs/overrides/.icons/bootstrap/info-circle-fill.svg
3 changes: 3 additions & 0 deletions docs/docs/overrides/.icons/bootstrap/mastodon.svg
4 changes: 4 additions & 0 deletions docs/docs/overrides/.icons/bootstrap/mortarboard-fill.svg
3 changes: 3 additions & 0 deletions docs/docs/overrides/.icons/bootstrap/pencil-fill.svg
3 changes: 3 additions & 0 deletions docs/docs/overrides/.icons/bootstrap/pencil.svg
3 changes: 3 additions & 0 deletions docs/docs/overrides/.icons/bootstrap/question-circle-fill.svg
3 changes: 3 additions & 0 deletions docs/docs/overrides/.icons/bootstrap/quote.svg
3 changes: 3 additions & 0 deletions docs/docs/overrides/.icons/bootstrap/x-octagon-fill.svg
