pagexml
Here are 14 public repositories matching this topic...
Binarize, normalize, and segment images, and train models.
Updated Jun 12, 2024 - Python
This repo provides a collection of ground truth data. The collection was compiled with regard to different aspects (layout complexity and font usage). Each item is also described by metadata, which follows the labeling scheme of OCR-D/PrimaLab.
Updated Apr 5, 2023
Toolset for Tesseract training with PageXML Ground-Truth
Updated Apr 20, 2024 - Python
This module provides access to Transkribus PageXML files via XQuery functions. It is designed to be used in the context of a BaseX XML database, but should work with other XML databases as well.
Updated May 9, 2022 - XQuery
LECTAUREP Pipeline demonstration to TEI Publisher
Updated Mar 29, 2022 - Jupyter Notebook
A C++ library and a Python wrapper for working with Page XML files
Updated Jan 19, 2024 - C++
Some bits of JavaScript to transcribe scanned pages using PageXML
Updated Mar 18, 2024 - HTML
A template for creating a ground truth repo with various functions and features, such as metadata creation, data analysis, and presentation.
Updated Jun 11, 2024
Tool that performs layout analysis and/or text recognition using Tesseract and outputs the result in Page XML format
Updated Apr 16, 2024 - C++
Simple app for visual editing of Page XML files
Updated Jan 12, 2024 - JavaScript