A Python wrapper around the Heritrix API.
The UKWA Heritrix3 custom modules and Docker builder.
Dashboard and monitoring system for the UK Web Archive
Shepherding our web archives from crawl to access.
Serves our WARC files for playback, wherever they may lie.
The dockerized ensemble of services that run most of the UKWA crawl and ingest processes.
Trifecta Docker image
w3act is an annotation and curation tool for building web archive collections
A new user interface for the UK Web Archive
Repository of documentation about the open datasets published by the UK Web Archive.
WARC and ARC indexing and discovery tools.
A RESTful API for rendering web pages in PhantomJS
A simple web service for viewing crawl logs.
This module builds our Waybacks in the various configurations we require.
Apache Hadoop HttpFS for CDH3
Public documentation about the technical architecture of the UK Web Archive
The dockerized ensemble of services that provide main user access to UKWA material.
Yet Another Docker Container for Apache Zeppelin
A containerised Dat server for experimental dataset hosting.
Internal UKWA website nginx service
Hadoop running in a container, with HttpFS enabled.
A web archive browser built on HBase
Utilities for working with WARC files stored on HDFS.
Core Python Web Archiving Toolkit for replay and recording of web archives
Web Archiving Domain Crawl Analysis Scripts
Luigi tasks for running Hadoop jobs and managing material held on HDFS
Run the tinycdxserver in a Docker container
A Wayback RemoteResourceIndex server using RocksDB