Document chatbot — multiple files, topics, chat windows and chat history. Powered by GPT.
Updated Jul 21, 2023 (TypeScript)
Text extraction from multiple and large PDF documents.
A library supporting NLP and computer vision (CV) research on scientific papers.
A boilerplate solution for processing image and PDF documents for regulated industries, with lineage and pipeline operations metadata services.
Python scripts that convert PDF files to text, split the text into chunks, and store their vector representations (computed with GPT4All embeddings) in a Chroma DB. They also include a script that queries the Chroma DB for similarity search based on user input.
An npm package built on top of pdf-lib that provides functionality such as merging, rotating, splitting, and downloading PDFs to disk, among others.
Built with the pdf-actions npm package.
An app that displays statistics and sends notifications about the COVID-19 pandemic.
Useful mini projects I worked on while teaching myself Python programming.
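The chunk-then-search pipeline described above can be illustrated with a minimal sketch. It assumes fixed-size character chunks with overlap and plain cosine similarity over precomputed vectors. It does not use GPT4All or Chroma; `chunk_text` and `top_k` are hypothetical helpers, not that project's code.

```python
from __future__ import annotations

import math


def chunk_text(text: str, size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into fixed-size character chunks; consecutive chunks share `overlap` chars.

    Assumption: character-based chunking, chosen for illustration only.
    """
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):  # last chunk reached the end of the text
            break
    return chunks


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0 or nb == 0:
        return 0.0
    return dot / (na * nb)


def top_k(query_vec: list[float], store: list[tuple[str, list[float]]], k: int = 3) -> list[str]:
    """Return the k chunks whose stored vectors are most similar to the query vector.

    Assumption: `store` holds (chunk, embedding) pairs; a vector DB would do this lookup for you.
    """
    ranked = sorted(store, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]
```

In a real setup the vectors would come from an embedding model, and the vector store would handle indexing and nearest-neighbour search.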
OCR PDF Survey Response Extractor: Automate Paper Survey Analysis
PDFGuard is a user-friendly Python application that helps you enhance the security of PDF files by removing potential security threats and hidden content. It does this by converting PDF pages into images and then creating new, sanitized PDFs from these images.
A script that fixes the page order of PDF scans made on home printers that can only scan one side of a page at a time.
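One common way to reorder such scans is sketched below. It assumes a specific scanning workflow: all fronts are scanned first in order, then the stack is flipped and the backs are scanned. Both passes are saved in one file, so the backs come out in reverse order. The listed script may handle this differently, and `duplex_page_order` is a hypothetical name.

```python
from __future__ import annotations


def duplex_page_order(total_pages: int) -> list[int]:
    """Return 0-based page indices in reading order for a one-sided duplex scan.

    Assumption: pages [0, n) are the fronts in order, and pages [n, 2n) are the
    backs in reverse order (the flipped stack puts the last sheet's back first).
    """
    if total_pages % 2:
        raise ValueError("expected an even page count (one back per front)")
    order = []
    for i in range(total_pages // 2):
        order.append(i)                    # front of sheet i
        order.append(total_pages - 1 - i)  # back of sheet i, counted from the end
    return order
```

The resulting index list could then be used to copy pages into a new file with a PDF library, for example by adding `reader.pages[i]` to a pypdf `PdfWriter` in that order.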
Berrylit is a simple chatbot interface that allows users to upload a PDF file and ask a question related to its contents. The chatbot uses the Berri API for processing.
A Python script to combine multiple PDFs, allowing one PDF to be inserted before the last page of another. It can also be used to add further documents. Useful for document management tasks.
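The "insert before the last page" operation reduces to a simple splice over page sequences. Below is a minimal sketch that works on any sequence of page objects. `insert_before_last` is a hypothetical helper, not the listed script's code.

```python
from __future__ import annotations

from typing import Sequence, TypeVar

T = TypeVar("T")


def insert_before_last(base: Sequence[T], extra: Sequence[T]) -> list[T]:
    """Return base's pages with all of extra's pages spliced in before base's last page.

    Assumption: pages are modelled as generic sequence items (e.g. page objects
    read from a PDF library); an empty base document is rejected.
    """
    if not base:
        raise ValueError("base document has no pages")
    return [*base[:-1], *extra, base[-1]]
```

With pypdf, for instance, `base` and `extra` could be `list(PdfReader(path).pages)`. Each page in the result would then be passed to `PdfWriter.add_page` before writing the output file.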
PDF merger and stamper (watermarking) built with Python and PyPDF2, an open-source pure-Python PDF library.
A Spring Boot-based OCR Exporter tool that extracts text from image or PDF files using the OCR Space API and exports the results to various formats such as PDF, text, Word, or a database.
A CLI tool to merge, compress, extract, or delete pages from PDF files.
PDF Extractor API is a FastAPI project for extracting information from PDFs. It includes user authentication, PDF uploading, and text extraction. The API supports secure PDF uploads, keyword-based extraction, and rate limiting.
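The description does not say how the API's rate limiting works. A token bucket is one standard technique, sketched here under that assumption. `TokenBucket` is a hypothetical class, not part of FastAPI or that project. The clock is injectable so behaviour can be checked without real waiting.

```python
from __future__ import annotations

import time
from typing import Callable


class TokenBucket:
    """Allow up to `capacity` requests in a burst, refilling `refill_per_sec` tokens per second.

    Assumption: one bucket per client; a web app would keep a dict of buckets keyed by
    client ID or API key and reject the request (e.g. HTTP 429) when allow() is False.
    """

    def __init__(self, capacity: int, refill_per_sec: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.clock = clock
        self.tokens = float(capacity)  # start full
        self.last = clock()

    def allow(self) -> bool:
        """Refill based on elapsed time, then spend one token if available."""
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.refill_per_sec)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

A token bucket permits short bursts while enforcing an average rate. A fixed-window counter is simpler but can let through double the limit around window boundaries.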
LangGraphRAG: A terminal-based Retrieval-Augmented Generation system using LangGraph. Features include message history caching, query transformation, and vector database retrieval. Ideal for NLP researchers and developers working on advanced conversational AI and information retrieval systems.