A pure-python PDF library capable of splitting, merging, cropping, and transforming the pages of PDF files
Updated Jul 21, 2024 - Python
A one-stop, open-source, high-quality data extraction tool that supports PDF, webpage, and multi-format e-book extraction.
Easily deployable 🚀 API to convert PDF to markdown quickly with high accuracy.
Python PDF parser for scientific publications: content and figures
Analyze PDFs. With colors. And Yara.
A Python client for the Sypht API
Parser for Consolidated Account Statements (CAS) generated from CAMS/Karvy/Kfintech
Dedoc is a library (service) for automating document parsing and converting documents to a uniform format. It automatically extracts content, logical structure, tables, and meta information from textual electronic documents. (Parse document; Document content extraction; Logical structure extraction; PDF parser; Scanned document parser; DOCX parser; HTML parser
PDF parsing toolkit for preparing an academic text corpus
Batch-convert PDFs to text and extract data from PDFs in Python
Scanipy stands for "scan it with Python"—it's your smart Python library for scanning and parsing complex PDF files like books, reports, articles, and academic papers. Utilizing cutting-edge Deep Learning algorithms, Scanipy transforms your PDFs into a treasure trove of extractable information: tables, images, equations, and text.
Investigation into PDF encryption
PDF parser, plus Apriori and Simplicial Complex algorithm implementations
PDF Parser based on VirusTotal API
Parallel processing and parsing of PDF and TXT files, as well as Python objects containing text (str, list), using rules (regular expressions).
📜 parse your Caisse d'Épargne PDF statements to CSV!
Automation for admin tasks, client intake, allocation of available markets, and dissemination of submissions to insurance carriers.