This system evaluates a series of mementos (archived web pages) to determine which are off topic. The series can be part of an Archive-It collection, a single TimeMap, or stored in a WARC file.
Updated Nov 7, 2017 - Python
Data for testing the off-topic detection software
Command-line program to download videos from YouTube.com and other video sites
Download images from Pixiv and more!
🗄 Save an archived copy of websites from Pocket/Pinboard/Bookmarks/RSS. Outputs HTML, PDFs, and more...
Download pictures (or videos) along with their captions and other metadata from Instagram.
Support for writing WARC files with Scrapy
Quick script using Bing Web Search API to retrieve list of URLs for web archiving
A PDF classifier ensemble with REST API service
ArchiveBoxMatic: configure ArchiveBox with the simplicity of a YAML file.
Repository for collecting scripts to help capture MyConvento newsroom press releases from the MyConvento PR management suite. The README provides an analysis of the MyConvento URL architecture for users hoping to develop their own solution.
Send records from an EPrints server to the Internet Archive and other web archives
Social Feed Manager user interface application.
Python Implementation for iipc/webarchive-commons
Collect and revisit web pages.
Perpetual Access To The Scholarly Record