Everything to reproduce the CLEF HIPE 2020 campaign results.
Updated Jul 10, 2020 - Python
This repository contains code and sample data related to running the impresso corpus through the text reuse detection software passim.
Awesome historical newspaper analysis tools and literature
The GeoNewsMiner (GNM): An interactive spatial humanities tool to visualize geographical references in historical newspapers
Dataset from the paper "Information Extraction from Public Meeting Articles"
Convert ALTO XML to plain text + minimal metadata
The Hongkong News headline analysis project was conducted by the Chinese University of Hong Kong Library.
Tools for the use of Tesseract OCR in R
Python library to import OCR data in various formats into the canonical JSON format defined by the Impresso project.
Repository of JSON schemas used in the Impresso project.