YUDL Web archiving
This is a collection of shell scripts that capture and preserve York University and Government of Canada websites. Each capture uses Heritrix to write Web ARChive (WARC) files, wkhtmltopdf and wkhtmltoimage to render PDF and image snapshots, and a descriptive metadata (MODS) record.
Set up the requirements above, clone the repository, and symlink the shell scripts into a directory that cron can execute from:
git clone https://github.com/yorkulibraries/yul-web-archiving.git
ln -s /path/to/web/archiving/script /path/that/cron/can/execute
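Before scheduling anything, it can help to confirm that the required command-line tools are actually on the PATH. The following is a minimal sketch, not part of the repository: `missing_tools` is a hypothetical helper name, and the list of tools to check is an assumption based on the requirements named above. Heritrix is often started from its own install directory rather than from the PATH, so it may need a separate check.

```shell
#!/bin/sh
# missing_tools NAME...
# Hypothetical helper (not part of this repository).
# Prints each listed command that cannot be found on PATH, one per line.
# Returns 0 if every command was found, 1 otherwise.
missing_tools() {
    missing_rc=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
            missing_rc=1
        fi
    done
    return "$missing_rc"
}
```

For example, `missing_tools git wkhtmltopdf wkhtmltoimage` prints nothing and returns 0 when all three are installed.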
Add the script to cron. Please choose an off-peak time so that the crawl does not overload the servers being archived.
0 3 * * * bash -c '/usr/local/bin/yulWA-yfile'
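If a capture can run longer than the interval between cron runs, two crawls could end up hitting the same server at once. One way to guard against that, sketched here under assumptions, is a small lock wrapper. `run_locked` and the lock path are hypothetical and not part of this repository; the sketch relies only on `mkdir` being atomic.

```shell
#!/bin/sh
# run_locked LOCKDIR COMMAND [ARGS...]
# Hypothetical helper (not part of this repository).
# Runs COMMAND only if LOCKDIR can be created. mkdir either creates the
# directory or fails, so two overlapping runs cannot both hold the lock.
# Returns 75 if the lock is already held, otherwise COMMAND's exit status.
run_locked() {
    lockdir=$1
    shift
    if ! mkdir "$lockdir" 2>/dev/null; then
        echo "previous run still holds $lockdir; skipping" >&2
        return 75
    fi
    "$@"
    cmd_rc=$?
    # Release the lock whether or not the command succeeded.
    rmdir "$lockdir"
    return "$cmd_rc"
}
```

A wrapper script could then call `run_locked /tmp/yulWA-yfile.lock /usr/local/bin/yulWA-yfile`, with cron pointing at the wrapper. Note that if the machine crashes mid-run, the lock directory is left behind and has to be removed by hand.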