@bottomless-archive-project

Bottomless Archive Project

A project about archiving anything that's available digitally.

Pinned

  1. library-of-alexandria Public

    Library of Alexandria (LoA in short) is a project that aims to collect and archive documents from the internet.

    Java 114 1

  2. url-collector Public

    An application that crawls the Common Crawl corpus for URLs with the specified file extensions.

    Java

  3. file-collector Public

    Java

  4. document-location-database Public

  5. java-warc Public

    Forked from laxika/java-warc

    Read Web ARChive (WARC) files in Java.

    Java 5

  6. common-crawl-client Public

    This library is a very lightweight client to Common Crawl's WARC files.

    Java

Repositories

Showing 7 of 7 repositories
  • library-of-alexandria Public

    Library of Alexandria (LoA in short) is a project that aims to collect and archive documents from the internet.

    Java 114 MIT 1 14 2 Updated Jul 5, 2024
  • library-of-alexandria.github.io Public

    The official website of the Library of Alexandria project.

    HTML 1 0 0 1 Updated May 16, 2024
  • file-collector Public

    Java 0 MIT 0 4 0 Updated Nov 14, 2021
  • document-location-database Public

    0 MIT 0 0 0 Updated Oct 18, 2021
  • url-collector Public

    An application that crawls the Common Crawl corpus for URLs with the specified file extensions.

    Java 0 MIT 0 2 0 Updated Oct 15, 2021
  • java-warc Public Forked from laxika/java-warc

    Read Web ARChive (WARC) files in Java.

    Java 5 Apache-2.0 6 0 0 Updated Sep 12, 2021
  • common-crawl-client Public

    This library is a very lightweight client to Common Crawl's WARC files.

    Java 0 0 0 0 Updated Jan 16, 2020
