A boilerplate solution for processing image and PDF documents for regulated industries, with lineage and pipeline operations metadata services.
Updated Oct 25, 2021 · Python
A side project for easily collecting and annotating questions and answers for the PsychometryBot project DB, using computer vision and PDF parsing.
Playing with PDF document processing 🧾
A script that restores the correct page order of PDF scans made on home printers limited to one-sided scanning.
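A minimal sketch of the reordering idea, assuming the common one-sided workflow: all front sides are scanned first, then the stack is flipped and re-fed, so the back sides arrive in reverse order. The function name `duplex_order` is hypothetical and not taken from the project above; it only computes page indices, which could then be handed to any PDF library to write the pages in that order.

```python
def duplex_order(total_pages: int) -> list[int]:
    """Return 0-based page indices in reading order for a one-sided scan.

    Assumption (hypothetical workflow): the scanned file holds every front
    side first (pages 1, 3, 5, ...), followed by every back side in reverse
    (..., 6, 4, 2), as produced by flipping the stack and scanning again.
    """
    if total_pages % 2 != 0:
        raise ValueError("expected an even number of scanned pages")
    half = total_pages // 2
    order = []
    for i in range(half):
        order.append(i)                    # front side of sheet i
        order.append(total_pages - 1 - i)  # its back side, counted from the end
    return order


if __name__ == "__main__":
    # Six scanned images: fronts 1, 3, 5 then backs 6, 4, 2.
    print(duplex_order(6))  # [0, 5, 1, 4, 2, 3]
```

Working with indices keeps the logic independent of the PDF library, so the same ordering can be tested without any scanned files.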
Berrylit is a simple chatbot interface that allows users to upload a PDF file and ask a question related to its contents. The chatbot uses the Berri API for processing.
PDF merger and stamper (watermarking) built with Python and PyPDF2, an open-source pure-Python PDF library.
Python scripts that convert PDF files to text, split the text into chunks, and store the chunks' vector representations, computed with GPT4All embeddings, in a Chroma DB. A further script queries the Chroma DB for similarity search based on user input.
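The chunk-then-search pipeline described above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the project's code: `chunk_text` uses fixed-size character windows with overlap (the project's actual splitter and sizes are unknown), and `top_k` is a brute-force cosine-similarity search standing in for what a vector store such as Chroma does at scale. The embedding vectors are assumed to come from elsewhere (e.g. an embedding model); here they are plain lists of floats.

```python
import math


def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into fixed-size character chunks that overlap by `overlap` chars.

    The default sizes are hypothetical; real pipelines tune them.
    """
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this chunk already reaches the end of the text
    return chunks


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query: list[float], vectors: list[list[float]], k: int = 3) -> list[int]:
    """Return indices of the k stored vectors most similar to `query` (brute force)."""
    ranked = sorted(range(len(vectors)), key=lambda i: cosine(query, vectors[i]), reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    print(chunk_text("abcdefghij", chunk_size=4, overlap=1))  # ['abcd', 'defg', 'ghij']
    print(top_k([1.0, 0.0], [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]], k=2))  # [1, 2]
```

The overlap keeps a sentence that straddles a chunk boundary partly visible in both neighbouring chunks, which tends to help retrieval.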
CLI tool to merge, compress, extract, or delete pages from PDF files.
PDFGuard is a user-friendly Python application that helps you enhance the security of PDF files by removing potential security threats and hidden content. It does this by converting PDF pages into images and then creating new, sanitized PDFs from these images.
Merge multiple PDF files into a single PDF with ease using this simple Python PDF Merger. 🚀
Extensive analysis of user guides in Swiss government-to-citizen software, correlating guide features with canton socio-economic factors.
Text extraction from multiple and large PDF documents.
Library supporting NLP and CV research on scientific papers.
An all-in-one GUI management toolkit built with PyQt6, offering a suite of tools for file synchronization, media organization, PDF merging, code formatting, and more.
Some useful mini projects I worked on while teaching myself Python programming.