This repository contains code and sample data related to running the impresso corpus through the text reuse detection software passim.
Updated May 23, 2024 - Jupyter Notebook
The Hongkong News headline analysis project was conducted by the Chinese University of Hong Kong Library.
Awesome historical newspaper analysis tools and literature
Everything to reproduce the CLEF HIPE 2020 campaign results.
Dataset from the paper "Information Extraction from Public Meeting Articles"
Convert ALTO XML to plain text + minimal metadata
The GeoNewsMiner (GNM): An interactive spatial humanities tool to visualize geographical references in historical newspapers
Repository of JSON schemas used in the Impresso project.
Python library to import OCR data in various formats into the canonical JSON format defined by the Impresso project.
Tools for using Tesseract OCR in R