web-archiving

Here are 37 public repositories matching this topic...

ArchiveBox / ArchiveBox

🗃 Open source self-hosted web archiving. Takes URLs/browser history/bookmarks/Pocket/Pinboard/etc., saves HTML, JS, PDFs, media, and more...

Updated Sep 17, 2024
Python

Rhizome-Conifer / conifer

Collect and revisit web pages.

python docker archives warc web-archiving wayback webrecorder pywb

Updated Nov 8, 2023
Python

oduwsdl / ipwb

InterPlanetary Wayback: A distributed and persistent archive replay system using IPFS

python docker service-worker ipfs memento warc web-archiving wayback memento-rfc

Updated Jul 12, 2024
Python

bellingcat / auto-archiver

Automatically archive links to videos, images, and social media content from Google Sheets (and more).

python docker service scraping archive web-archiving open-source-research

Updated Aug 21, 2024
Python

akamhy / waybackpy

Wayback Machine API interface & a command-line tool

osint internet-archive web-archiving wayback-machine webarchiving cdx-api internet-archiving savepagenow archive-webpage archive-webpages wayback-machine-api wayback-machine-python

Updated Feb 26, 2024
Python

oduwsdl / archivenow

A Tool To Push Web Resources Into Web Archives

internet-archive web-archiving

Updated Jan 23, 2024
Python

Florents-Tselai / WarcDB

WarcDB: Web crawl data as SQLite databases.

cli database sqlite crawling warc web-archiving web-data

Updated Jul 13, 2024
Python

webrecorder / warcio

Streaming WARC/ARC library for fast web archive IO

python warc web-archiving web-archives pywb

Updated Aug 31, 2024
Python

cocrawler / cdx_toolkit

A toolkit for CDX indices such as Common Crawl and the Internet Archive's Wayback Machine

python warc web-archiving cdx web-archives commoncrawl cdx-api

Updated Sep 9, 2024
Python

gwu-libraries / sfm-ui

Social Feed Manager user interface application.

social-media web-archiving code4lib social-feed-manager

Updated Jun 25, 2024
Python

internetarchive / fatcat

Perpetual Access To The Scholarly Record

python rust postgresql scholarly-communication open-access web-archiving digital-library

Updated Jul 31, 2024
Python

TarekJor / bookmark-archiver

🗄 Save an archived copy of websites from Pocket/Pinboard/Bookmarks/RSS. Outputs HTML, PDFs, and more...

Updated Aug 12, 2018
Python

internetarchive / pdf_trio

A PDF classifier ensemble with REST API service

pdf tensorflow scholarly-communication web-archiving digital-library

Updated Mar 5, 2021
Python

webrecorder / cdxj-indexer

CDXJ Indexing of WARC/ARCs

warc web-archiving

Updated May 22, 2024
Python

internetarchive / scrapy-warcio

Support for writing WARC files with Scrapy

python scrapy warc web-archiving

Updated Dec 21, 2019
Python

ArchiveBox / debian-archivebox

Home of the official apt/deb package for Ubuntu/Debian-based systems.

package debian apt ubuntu web-archiving aptitude digipres internet-archiving archivebox stdeb

Updated May 20, 2024
Python

ArchiveBox / archivebox-proxy

Official ArchiveBox MITM proxy: saves URLs of all requests passing through to an ArchiveBox server for archival.

proxy https-proxy web-archiving web-proxy digital-preservation mitmproxy digipres internet-archiving archivebox

Updated Jul 12, 2024
Python

dbeley / archiveboxmatic

ArchiveBoxMatic: configure ArchiveBox with the simplicity of a yaml file.

archiving web-archiving archivebox

Updated Mar 10, 2021
Python

rybesh / capture-urls

Archive a list of URLs using the Wayback Machine

web-archiving wayback-machine save-page-now

Updated Aug 27, 2024
Python

caltechlibrary / eprints2archives

Send records from an EPrints server to the Internet Archive and other web archives

python terminal archiving internet-archive memento web-archiving preservation web-archives eprints

Updated May 15, 2023
Python
