  • Shepherding our web archives from crawl to access.

    Python 8 2 Updated Jan 17, 2017
  • UKWA

    Java 2 Updated Jan 11, 2017
  • Front-end for the UK Web Archive

    Updated Jan 9, 2017
  • Shell Updated Jan 9, 2017
  • Shell Updated Jan 9, 2017
  • w3act is an annotation and curation tool for building web archive collections

    Java 10 Updated Jan 4, 2017
  • A pulsating crawl engine built around Heritrix3.

    1 Updated Dec 30, 2016
  • Django app for calling PhantomJS.

    Python 5 1 Updated Dec 19, 2016
  • WARC and ARC indexing and discovery tools.

    Java 35 11 Updated Dec 12, 2016
  • ClamD in a container

    Updated Dec 1, 2016
  • Dockerised Heritrix based on LBS stable builds.

    Shell Updated Nov 28, 2016
  • An OpenWayback in Docker.

    Updated Nov 28, 2016
  • Heritrix is the Internet Archive's open-source, extensible, web-scale, archival-quality web crawler project.

    Java 395 Updated Nov 28, 2016
  • Run the tinycdxserver in a Docker container

    Shell Updated Nov 27, 2016
  • A Wayback RemoteResourceIndex server using RocksDB

    Java 3 Updated Nov 26, 2016
  • Modules for Heritrix 3.

    Java 1 Updated Nov 22, 2016
  • Prototype Solr-powered web archive exploration UI.

    JavaScript 28 4 Updated Nov 17, 2016
  • Repository of documentation about the open datasets published by the UK Web Archive.

    Python 5 4 Updated Nov 10, 2016
  • bamboo

    Forked from nla/bamboo

    Web archive collection manager

    Java 3 Updated Oct 12, 2016
  • Experiments in testable, scalable crawler architectures

    PHP 3 2 Updated Oct 11, 2016
  • A web archive browser built on HBase

    Java 42 Updated Sep 30, 2016
  • Web Archiving Domain Crawl Analysis Scripts

    Jupyter Notebook 8 3 Updated Aug 12, 2016
  • A simple site that uses GitHub pages to host resources for testing crawlers.

    CSS 1 Updated Aug 11, 2016
  • An acid test suite for crawlers.

    PHP 3 1 Updated Aug 9, 2016
  • Run warcprox inside Docker

    Shell Updated Aug 8, 2016
  • EThIndex

    Ruby Updated Jul 27, 2016
  • GROBID (GeneRation Of BIbliographic Data) in a Docker container.

    Updated Jul 14, 2016
  • Hopefully offsetting some of the difficulties of writing to WARCs (multiple open files, size limits, etc.).

    Python 1 Updated Jul 9, 2016
  • Brozzler in a Docker container

    Shell 2 Updated Jul 6, 2016
  • Tracking the fortunes of our archived URLs.

    Jupyter Notebook 2 Updated Jun 8, 2016