# web-archiving

Here are 112 public repositories matching this topic...

ArchiveBox / ArchiveBox

🗃 Open source self-hosted web archiving. Takes URLs/browser history/bookmarks/Pocket/Pinboard/etc., saves HTML, JS, PDFs, media, and more...

Updated Oct 31, 2024
Python

Rhizome-Conifer / conifer

Collect and revisit web pages.

python docker archives warc web-archiving wayback webrecorder pywb

Updated Nov 8, 2023
Python

webrecorder / pywb

Core Python Web Archiving Toolkit for replay and recording of web archives

python web-archiving wayback web-archives pywb

Updated Oct 31, 2024
JavaScript

webrecorder / archiveweb.page

A High-Fidelity Web Archiving Extension for Chrome and Chromium-based browsers!

extension archiving chromium browser-extension warc web-archiving webrecorder wacz

Updated Oct 30, 2024
TypeScript

webrecorder / replayweb.page

Serverless replay of web archives directly in the browser

service-worker warc web-archiving wayback-machine web-archive replay-web-page web-replay wacz

Updated Oct 29, 2024
TypeScript

webrecorder / browsertrix-crawler

Run a high-fidelity browser-based web archiving crawler in a single Docker container

crawler web-crawler crawling warc web-archiving webrecorder wacz

Updated Oct 31, 2024
TypeScript

gildas-lormeau / single-file-cli

CLI tool for saving a faithful copy of a complete web page in a single HTML file (based on SingleFile)

nodejs cli dockerfile crawler web-crawler archiving web-scraper web-scraping web-archiving scraping-websites single-file deno

Updated Oct 31, 2024
JavaScript

oduwsdl / ipwb

InterPlanetary Wayback: A distributed and persistent archive replay system using IPFS

python docker service-worker ipfs memento warc web-archiving wayback memento-rfc

Updated Oct 30, 2024
Python

bellingcat / auto-archiver

Automatically archive links to videos, images, and social media content from Google Sheets (and more).

python docker service scraping archive web-archiving open-source-research

Updated Oct 6, 2024
Python

akamhy / waybackpy

Wayback Machine API interface & a command-line tool

osint internet-archive web-archiving wayback-machine webarchiving cdx-api internet-archiving savepagenow archive-webpage archive-webpages wayback-machine-api wayback-machine-python

Updated Feb 26, 2024
Python

webrecorder / webrecorder-player

Webrecorder Player for Desktop (OSX/Windows/Linux). (Built with Electron + Webrecorder)

electron warc web-archiving webrecorder pywb

Updated Sep 17, 2020
JavaScript

rahiel / archiveror

Archiveror will help you preserve the webpages you love. 💾

javascript chrome-extension bookmark archiving webextension firefox-extension browser-extension mhtml linkrot web-archiving

Updated Oct 18, 2019
JavaScript

harvard-lil / perma

Indelible links

libraries web-archiving

Updated Oct 31, 2024
JavaScript

oduwsdl / archivenow

A Tool To Push Web Resources Into Web Archives

internet-archive web-archiving

Updated Jan 23, 2024
Python

Florents-Tselai / WarcDB

WarcDB: Web crawl data as SQLite databases.

cli database sqlite crawling warc web-archiving web-data

Updated Jul 13, 2024
Python

webrecorder / warcio

Streaming WARC/ARC library for fast web archive IO

python warc web-archiving web-archives pywb

Updated Oct 28, 2024
Python

machawk1 / wail

🐋 Web Archiving Integration Layer: One-Click User Instigated Preservation

python gui warc web-archiving pyinstaller wayback heritrix openwayback

Updated Oct 4, 2024
Roff

ArchiveBox / archivebox-browser-extension

Official ArchiveBox browser extension: automatically/manually preserve your browsing history using ArchiveBox.

chrome-extension archiving svelte firefox-extension browser-extension web-archiving digital-preservation digipres internet-archiving archivebox

Updated Jul 12, 2024
TypeScript

machawk1 / warcreate

Chrome extension to "Create WARC files from any webpage"

chrome-extension warc web-archiving

Updated Dec 6, 2023
JavaScript

webrecorder / browsertrix

Browsertrix is the hosted, high-fidelity, browser-based crawling service from Webrecorder designed to make web archiving easier and more accessible for all!

kubernetes cloud archiving warc web-archiving webrecorder web-archive wacz

Updated Oct 31, 2024
TypeScript
