Code for the paper "PICK: Processing Key Information Extraction from Documents using Improved Graph Learning-Convolutional Networks" (ICPR 2020)
Official PyTorch implementation of LiLT: A Simple yet Effective Language-Independent Layout Transformer for Structured Document Understanding (ACL 2022)
Pandora is an analysis framework to discover if a file is suspicious and conveniently show the results
AssemblyLine 4: File triage and malware analysis
RObust document image BINarization
Dedoc is a library (service) for automated document parsing and conversion to a uniform format. It automatically extracts content, logical structure, tables, and meta information from textual electronic documents. (Parse document; Document content extraction; Logical structure extraction; PDF parser; Scanned document parser; DOCX parser; HTML parser)
Document Visual Question Answering
Post-process Amazon Textract results with Hugging Face transformer models for document understanding
Powerful web application that combines Streamlit, LangChain, and Pinecone to simplify document analysis. Powered by OpenAI's GPT-3, RAG enables dynamic, interactive document conversations, making it ideal for efficient document retrieval and summarization.
(ICFHR 2020 oral) Code for "docExtractor: An off-the-shelf historical document element extraction" paper
Effortlessly extract information from unstructured data with this library, which uses advanced AI techniques. Compose AI components into customizable pipelines that draw on diverse sources for your projects.
An unofficial PyTorch implementation of "Lin et al. ViBERTgrid: A Jointly Trained Multi-Modal 2D Document Representation for Key Information Extraction from Documents. ICDAR, 2021"
Trained Detectron2 object detection models for document layout analysis based on PubLayNet dataset
Improving Document Binarization via Adversarial Noise-Texture Augmentation (ICIP 2019)
UTRNet: High-Resolution Urdu Text Recognition In Printed Documents (ICDAR'23)
Code for ICPR2022 paper: "Graph Neural Networks and Representation Embedding for table extraction in PDF Documents"
An open-source tool for visualisation of outputs of deep-learning models for document analysis tasks such as fully automatic, bounding box and OCR.
Official PyTorch implementation of PyramidTabNet: Transformer-based Table Recognition in Image-based Documents
[Late Submission] Solution for Kuzushiji recognition (Kaggle competition)