A web application for PDF content and table extraction, featuring image-based visual layout analysis, indexed document search, batch processing and extraction result annotation.
Updated Jun 26, 2024 - C++
A Python + C implementation for image-based PDF page layout analysis and content extraction.
A carefully designed OCR pipeline for universal bordered table recognition and reconstruction.