YUL Web archiving scripts
LICENSE
README.md
yulWA
yulWA-calendars
yulWA-yfile

YUDL Web archiving

Description

This is a collection of shell scripts that capture and preserve York University and Government of Canada websites. Each capture uses Heritrix to produce Web ARChive (WARC) files, wkhtmltopdf/wkhtmltoimage to render the pages, and a descriptive metadata (MODS) record.

Requirements

Installation

Set up the above requirements, clone the repository, and link the shell scripts into a path that cron can execute:

git clone https://github.com/yorkulibraries/yul-web-archiving.git
ln -s /path/to/web/archiving/script /path/that/cron/can/execute
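If you want to link every script at once instead of one at a time, the step above could be wrapped in a small helper. This is only a sketch: `link_scripts` is a hypothetical function that is not part of this repository, and it assumes the scripts are the files whose names start with `yulWA` in the top level of the clone.

```shell
#!/usr/bin/env bash
# Sketch only: link_scripts is a hypothetical helper, not part of this repository.
# Assumes the scripts are the top-level files named yulWA* in the cloned repo.

link_scripts() {
  local repo_dir bin_dir script
  # Resolve the repo path to an absolute path so the symlinks do not dangle.
  repo_dir="$(cd "$1" && pwd)" || return 1
  bin_dir="$2"
  for script in "$repo_dir"/yulWA*; do
    # Skip anything that is not a regular file (including an unmatched glob).
    [ -f "$script" ] || continue
    chmod +x "$script"
    # -f replaces an existing link, so re-running the helper is safe.
    ln -sf "$script" "$bin_dir/$(basename "$script")"
  done
}

# Example call (paths are placeholders; adjust to your system):
# link_scripts /path/to/yul-web-archiving /usr/local/bin
```

Because the links point back into the clone, a later `git pull` updates the scripts that cron runs without any further copying.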

Usage

Add the script to cron. Please schedule it at an appropriate, off-peak time so that the crawl does not overload anybody's server.

Example:

0 3 * * * bash -c '/usr/local/bin/yulWA-yfile'
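One way to install an entry like this without opening an editor, and without duplicating it on repeated runs, is to filter the existing crontab through a small function. This is a sketch under assumptions: `add_cron_line` is a hypothetical helper, not part of this repository, and it only compares lines for exact equality.

```shell
#!/usr/bin/env bash
# Sketch only: add_cron_line is a hypothetical helper, not part of this repository.
# It reads an existing crontab on stdin and writes it back to stdout,
# appending the given line only if an identical line is not already present.

add_cron_line() {
  local line="$1"
  local current
  current="$(cat)"                      # existing crontab text, possibly empty
  if [ -n "$current" ]; then
    printf '%s\n' "$current"
  fi
  # -F: fixed string (so '*' is literal), -x: whole-line match, -q: quiet.
  if ! grep -Fxq -- "$line" <<<"$current"; then
    printf '%s\n' "$line"
  fi
}

# Example call (this would modify the current user's crontab):
# crontab -l 2>/dev/null | add_cron_line "0 3 * * * bash -c '/usr/local/bin/yulWA-yfile'" | crontab -
```

The function itself never touches the crontab, so its output can be inspected before it is piped into `crontab -`.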

License

Public Domain

CC0

Thanks

Peter Binkley