Pinned repositories

  1. webarchive-discovery

    WARC and ARC indexing and discovery tools.

    Java 52 15

  2. ukwa-ingest-services

    The dockerized ensemble of services that run most of the UKWA crawl and ingest processes.

    Shell 1

  3. ukwa-access-services

    The dockerized ensemble of services that provide main user access to UKWA material.

    Shell

  4. ukwa-manage

    Shepherding our web archives from crawl to access.

    Python 8 2

  5. ukwa-monitor

    Dashboard and monitoring system for the UK Web Archive

    Python

  6. awesome-web-archiving

    Forked from iipc/awesome-web-archiving

    An Awesome List for getting started with web archiving

    1

  • A new user interface for the UK Web Archive

    JavaScript 2 Updated Aug 20, 2018
  • The UKWA Heritrix3 custom modules and Docker builder.

    Java 1 1 Updated Jul 27, 2018
  • Python 7 GPL-3.0 Updated Jul 25, 2018
  • Shepherding our web archives from crawl to access.

    Python 8 2 Apache-2.0 Updated Jul 25, 2018
  • The dockerized ensemble of services that run most of the UKWA crawl and ingest processes.

    Shell 1 Apache-2.0 Updated Jul 25, 2018
  • pywb

    Forked from webrecorder/pywb

    Core Python Web Archiving Toolkit for replay and recording of web archives

    Python 62 GPL-3.0 Updated Jul 25, 2018
  • Groovy 2 Updated Jul 19, 2018
  • Dashboard and monitoring system for the UK Web Archive

    Python Updated Jul 10, 2018
  • Heritrix is the Internet Archive's open-source, extensible, web-scale, archival-quality web crawler project.

    Java 570 Updated Jul 6, 2018
  • w3act is an annotation and curation tool for building web archive collections

    Java 15 2 Apache-2.0 Updated Jul 4, 2018
  • Serves our WARC files for playback, wherever they may lie.

    Python Updated Jul 2, 2018
  • This module builds our Waybacks in the various configurations we require.

    Java 1 1 Updated Jun 30, 2018
  • A RESTful API for rendering web pages in PhantomJS

    Python 6 1 Updated Jun 25, 2018
  • A Python wrapper around the Heritrix API.

    Python 2 Apache-2.0 Updated Jun 24, 2018
  • Trifecta docker image

    Shell 5 Apache-2.0 Updated Jun 18, 2018
  • Repository of documentation about the open datasets published by the UK Web Archive.

    HTML 9 4 Updated Jun 7, 2018
  • WARC and ARC indexing and discovery tools.

    Java 52 15 Updated May 29, 2018
  • A simple web service for viewing crawl logs.

    Python Apache-2.0 Updated May 22, 2018
  • Generating Reports

    HTML Updated May 17, 2018
  • Apache Hadoop HttpFS for cdh3

    Java 14 Updated May 17, 2018
  • Public documentation about the technical architecture of the UK Web Archive

    2 Apache-2.0 Updated May 10, 2018
  • The dockerized ensemble of services that provide main user access to UKWA material.

    Shell AGPL-3.0 Updated May 4, 2018
  • Yet Another Docker Container for Apache Zeppelin

    Shell 5 Updated May 4, 2018
  • A containerised Dat server for experimental dataset hosting.

    1 Updated May 4, 2018
  • Internal UKWA website nginx service

    Updated Apr 27, 2018
  • Hadoop running in a container, with HttpFS enabled.

    Shell Updated Mar 26, 2018
  • A web archive browser built on HBase

    Java 48 Updated Mar 16, 2018
  • Utilities for working with WARC files stored on HDFS.

    Java Updated Mar 5, 2018
  • Web Archiving Domain Crawl Analysis Scripts

    Jupyter Notebook 7 3 Updated Feb 23, 2018
  • Luigi tasks for running Hadoop jobs and managing material held on HDFS

    Python Apache-2.0 Updated Feb 22, 2018