A one-stop, open-source, high-quality data extraction tool that supports PDF, webpage, and multi-format e-book extraction.
Updated Aug 28, 2024 - Python
A Unified Toolkit for Deep Learning Based Document Image Analysis
OCR engine for all the languages
A toolbox of OCR models and algorithms based on MindSpore
Analysis of Chinese and English layouts
An official implementation of the paper "Paragraph2Graph: A Language-independent GNN-based framework for layout analysis"
Trained Detectron2 object detection models for document layout analysis based on PubLayNet dataset
[ICDAR 2023] SelfDocSeg: A self-supervised vision-based approach towards Document Segmentation (Oral)
YOLO models trained on DocLayNet - power your document intelligence with layout analysis
Trained Detectron2 object detection models for document layout analysis based on PubLayNet dataset
This library builds a graph representation of the content of PDFs. The graph is then clustered, and the resulting page segments are classified and returned. Tables are extracted and formatted as CSV.
OCR-D compliant toolset for optical layout recognition on historical German-language documents published in Brazil
A more complete example of programming with PDFMiner, which continues where the default documentation stops
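PdfDet aims to simplify PDF layout detection tasks for users.
For document images, combines the results of layout analysis, text recognition, table recognition, and formula recognition to restore the layout information.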
A powerful CLI tool for visualization and encoding of PAGE-XML files
A Python package to structure files using visual and style information
OCR-D wrapper for page-xml-draw
BA thesis in history.