🗃 Open source self-hosted web archiving. Takes URLs/browser history/bookmarks/Pocket/Pinboard/etc., saves HTML, JS, PDFs, media, and more...
Updated Sep 17, 2024 (Python)
Collect and revisit web pages.
InterPlanetary Wayback: A distributed and persistent archive replay system using IPFS
Automatically archive links to videos, images, and social media content from Google Sheets (and more).
Wayback Machine API interface and a command-line tool
A tool to push web resources into web archives
Streaming WARC/ARC library for fast web archive IO
A toolkit for CDX indices such as Common Crawl and the Internet Archive's Wayback Machine
Social Feed Manager user interface application.
Perpetual Access To The Scholarly Record
🗄 Save an archived copy of websites from Pocket/Pinboard/Bookmarks/RSS. Outputs HTML, PDFs, and more...
A PDF classifier ensemble with REST API service
Support for writing WARC files with Scrapy
Home of the official apt/deb package for Ubuntu/Debian-based systems.
Official ArchiveBox MITM proxy: saves URLs of all requests passing through to an ArchiveBox server for archival.
ArchiveBoxMatic: configure ArchiveBox with the simplicity of a YAML file.
Archive a list of URLs using the Wayback Machine
Send records from an EPrints server to the Internet Archive and other web archives