Pinned repositories

  1. webarchive-discovery

    WARC and ARC indexing and discovery tools.

    Java 52 14

  2. ukwa-ingest-services

    The dockerized ensemble of services that run most of the UKWA crawl and ingest processes.

    Shell 1

  3. ukwa-access-services

    The dockerized ensemble of services that provide main user access to UKWA material.

    Shell

  4. ukwa-manage

    Shepherding our web archives from crawl to access.

    Python 8 2

  5. ukwa-monitor

    Dashboard and monitoring system for the UK Web Archive

    Python

  6. awesome-web-archiving

    Forked from iipc/awesome-web-archiving

    An Awesome List for getting started with web archiving

    1

  • A Python wrapper around the Heritrix API.

    Python 2 Apache-2.0 Updated Jun 22, 2018
  • The UKWA Heritrix3 custom modules and Docker builder.

    Java 1 1 Updated Jun 22, 2018
  • Dashboard and monitoring system for the UK Web Archive

    Python Updated Jun 22, 2018
  • Shepherding our web archives from crawl to access.

    Python 8 2 Apache-2.0 Updated Jun 21, 2018
  • Serves our WARC files for playback, wherever they may lie.

    Python Updated Jun 21, 2018
  • The dockerized ensemble of services that run most of the UKWA crawl and ingest processes.

    Shell 1 Apache-2.0 Updated Jun 20, 2018
  • Trifecta Docker image

    Shell 5 Apache-2.0 Updated Jun 18, 2018
  • w3act is an annotation and curation tool for building web archive collections

    Java 14 2 Apache-2.0 Updated Jun 16, 2018
  • A new user interface for the UK Web Archive

    JavaScript 2 Updated Jun 12, 2018
  • Repository of documentation about the open datasets published by the UK Web Archive.

    HTML 9 4 Updated Jun 7, 2018
  • WARC and ARC indexing and discovery tools.

    Java 52 14 Updated May 29, 2018
  • A RESTful API for rendering web pages in PhantomJS

    Python 6 1 Updated May 23, 2018
  • A simple web service for viewing crawl logs.

    Python Apache-2.0 Updated May 22, 2018
  • Generating Reports

    HTML Updated May 17, 2018
  • This module builds our Waybacks in the various configurations we require.

    Java 1 1 Updated May 17, 2018
  • Apache Hadoop HttpFS for cdh3

    Java 14 Updated May 17, 2018
  • Public documentation about the technical architecture of the UK Web Archive

    2 Apache-2.0 Updated May 10, 2018
  • The dockerized ensemble of services that provide main user access to UKWA material.

    Shell AGPL-3.0 Updated May 4, 2018
  • Yet Another Docker Container for Apache Zeppelin

    Shell 5 Updated May 4, 2018
  • A containerised Dat server for experimental dataset hosting.

    Updated May 4, 2018
  • Internal UKWA website nginx service

    Updated Apr 27, 2018
  • Hadoop running in a container, with HttpFS enabled.

    Shell Updated Mar 26, 2018
  • A web archive browser built on HBase

    Java 49 Updated Mar 16, 2018
  • Utilities for working with WARC files stored on HDFS.

    Java Updated Mar 5, 2018
  • Python 6 GPL-3.0 Updated Mar 4, 2018
  • pywb

    Forked from webrecorder/pywb

    Core Python Web Archiving Toolkit for replay and recording of web archives

    Python 57 GPL-3.0 Updated Mar 3, 2018
  • Web Archiving Domain Crawl Analysis Scripts

    Jupyter Notebook 7 3 Updated Feb 23, 2018
  • Luigi tasks for running Hadoop jobs and managing material held on HDFS

    Python Apache-2.0 Updated Feb 22, 2018
  • Run the tinycdxserver in a Docker container

    Shell Apache-2.0 Updated Feb 1, 2018
  • A Wayback RemoteResourceIndex server using RocksDB

    Java 3 Apache-2.0 Updated Feb 1, 2018