A one-stop, open-source, high-quality data extraction tool that supports PDF, webpage, and multi-format e-book extraction.
Updated Jul 17, 2024 - Python
A Python program to convert PDF files to PNG images
Private chat with local GPT with document, images, video, etc. 100% private, Apache 2.0. Supports oLLaMa, Mixtral, llama.cpp, and more. Demo: https://gpt.h2o.ai/ https://codellama.h2o.ai/
A community-supported supercharged version of paperless: scan, index and archive all your physical documents
A Python library for exploring PDFs with ease.
A tool to sign PDF files, with Linux support.
OCRmyPDF adds an OCR text layer to scanned PDF files, allowing them to be searched
Parse files for optimal RAG
A Python library for reading and writing PDF, powered by QPDF
Prepare documents for distribution
ALexi, EXtracteur d'Information (information extractor)
Assorted material for "HelpNDoc" (www.helpndoc.com), the help authoring software for CHM, HTML, PDF, ... help projects
Python bindings to PDFium
Summarize text using ChatGPT or a local LLM, with support for multiple large text files, PDF files and translation.
Croatian Chess is a collection of various chess variants, starting as a simple and natural enhancement to classical chess and growing ever more complex with each new variant.
A pure-Python PDF library capable of splitting, merging, cropping, and transforming the pages of PDF files
PDF generation made easy.