pdf-processing

Python scripts that convert PDF files to text, split the text into chunks, and store the chunks' vector representations in a Chroma DB using GPT4All embeddings. A separate script queries the Chroma DB, running a similarity search on user input.

text-extraction similarity-search pdf-processing vector-embeddings chromadb

Updated Oct 23, 2023
Python
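
The chunk-then-search flow described above can be sketched in plain Python. This is only an illustration, not the repository's code: the function names are hypothetical, and a toy bag-of-words vector with cosine similarity stands in for GPT4All embeddings and the Chroma DB query.

```python
import math
import re
from collections import Counter


def split_into_chunks(text, chunk_size=200, overlap=50):
    """Split text into character windows that overlap by `overlap` characters."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the current window already reaches the end of the text
        start += step
    return chunks


def embed(text):
    """Toy stand-in for an embedding model: lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(chunks, query, k=1):
    """Return the k chunks most similar to the query."""
    query_vec = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(embed(c), query_vec), reverse=True)
    return ranked[:k]
```

In a real pipeline the `embed` step would call an embedding model and the ranking would be delegated to the vector store; only the overall shape (split, embed, rank by similarity) carries over.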

ManasMadan / pdf-actions

An npm package built on top of pdf-lib that provides functionality such as merging, rotating, and splitting PDFs, downloading PDFs to disk, and more.

react javascript pdf npm reactjs react-component pdf-merge pdf-split pdf-rotate pdf-merger pdf-downloader pdf-lib pdf-splitter pdf-processing pdf-download pdf-free pdf-online

Updated Oct 31, 2023
JavaScript

ManasMadan / PDFActions

Built with pdf-actions NPM package.

react pdf reactjs react-component react-components pdf-merge pdf-split pdf-rotate pdf-merger pdf-downloader pdf-lib pdf-splitter pdf-processing pdf-download

Updated May 27, 2024
JavaScript

dsckiet / covid-tracker-android-app

An app that displays statistical data and sends notifications for the Covid-19 pandemic.

statistics mvvm dagger2 pdf-processing

Updated May 15, 2022
Kotlin

thinhuos0913 / python_useful_mini_projects

Some useful mini projects that I worked on while teaching myself Python programming.

python opencv ocr image-processing pdf-processing

Updated May 20, 2024
Python

Hadrien-Cornier / ocr-pdf

OCR PDF Survey Response Extractor: Automate Paper Survey Analysis

automation ocr data-extraction questionnaire survey-analysis pdf-processing

Updated Sep 9, 2024
Python

clydeknox / PDFGuard-Secure-and-Sanitize-PDFs-with-Python

PDFGuard is a user-friendly Python application that helps you enhance the security of PDF files by removing potential security threats and hidden content. It does this by converting PDF pages into images and then creating new, sanitized PDFs from these images.

python pdf gui pdf-conversion pdf-tools pdf-processing file-security pdf-security pdf-sanitization document-security

Updated Nov 7, 2023
Python

jdwh08 / Autodoc-Lifter

Local RAG for PDFs

pdf rag pdf-processing llm llamaindex

Updated Sep 5, 2024
Python

tyrus-yuen / Mini-Project

A script that fixes the page order of PDF scans produced by home printers that are limited to one-sided scanning.

os pypdf pdf-processing

Updated Jan 16, 2023
Python
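
The page-order fix described above comes down to interleaving two runs of pages. The sketch below is not the repository's code: it assumes the scan holds all front sides first and then all back sides, with the backs in reverse order because the stack was flipped before the second pass. The function name and the `backs_reversed` flag are hypothetical.

```python
def reading_order(num_pages, backs_reversed=True):
    """Return indices into the scanned file, listed in true reading order.

    Assumes the first half of the scan holds the front sides (pages 1, 3, 5, ...)
    and the second half holds the back sides (pages 2, 4, 6, ...).
    """
    if num_pages % 2:
        raise ValueError("expected an even page count (fronts plus backs)")
    half = num_pages // 2
    order = []
    for i in range(half):
        order.append(i)  # i-th front side
        if backs_reversed:
            order.append(num_pages - 1 - i)  # matching back, counted from the end
        else:
            order.append(half + i)  # matching back, scanned in forward order
    return order


# Example: a six-page document scanned as fronts 1, 3, 5 and then backs 6, 4, 2.
scanned = ["p1", "p3", "p5", "p6", "p4", "p2"]
restored = [scanned[i] for i in reading_order(len(scanned))]
```

With a PDF library, the same index list could be used to copy page objects from the scanned file into a new file in the corrected order.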

akshatpunia26 / berrylit_pdf_chat

Berrylit is a simple chatbot interface that allows users to upload a PDF file and ask a question related to its contents. The chatbot uses the Berri API for processing.

python api natural-language-processing chatbot pdf-processing streamlit

Updated Jun 26, 2023
Python

ashainp / Combine-PDF

A Python script to combine multiple PDFs, allowing the insertion of one PDF before the last page of another. Flexible for adding additional documents. Perfect for document management tasks.

python beginner-project simple-project pdf-processing pdf-combiner

Updated Aug 29, 2024
Python
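
Inserting one document before the last page of another is a list splice over pages. The sketch below shows only that index logic on plain lists; the function name is hypothetical, and the listing does not say how the repository itself implements the step.

```python
def insert_before_last(base_pages, extra_pages):
    """Return base_pages with extra_pages spliced in before the final page."""
    if not base_pages:
        # Nothing to insert before, so the result is just the extra pages.
        return list(extra_pages)
    return list(base_pages[:-1]) + list(extra_pages) + [base_pages[-1]]


# Example with page labels standing in for real page objects.
combined = insert_before_last(["a1", "a2", "a3"], ["b1", "b2"])
```

Calling the function again on its own result would add further documents before the same final page, which matches the "flexible for adding additional documents" use described above.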

UntaintedTech / pdf-processing

PDF merger and stamper (watermark) using Python and PyPDF2, an open-source pure-Python PDF library.

python pdf pypdf2 pdf-manipulation pdf-merger pdf-document-processor pdf-watermark pdf-processing

Updated Jul 29, 2023
Python

hyuseinleshov / ocr-exporter-api

A Spring Boot-based OCR Exporter tool that extracts text from image or PDF files using the OCR Space API and exports the results to various formats such as PDF, text, Word, or a database.

ocr spring-boot text-extraction file-processing word-export pdf-processing text-export ocr-space-api

Updated Oct 17, 2024
Java

Mateusz2734 / pdf-cli

CLI tool to merge, compress, extract or delete pages from PDF

python cli pdf pdf-processing pdf-tool

Updated Oct 28, 2023
Python

king04aman / PDF-Extractor-API

PDF Extractor API is a FastAPI project for extracting information from PDFs. It includes user authentication, PDF uploading, and text extraction. The API supports secure PDF uploads, keyword-based extraction, and rate limiting.

python docker-compose sap rate-limiting python3 jwt-token jwt-authentication doker jwt-auth invoice-pdf invoice-management api-security pdf-processor fastapi pdf-processing

Updated Sep 19, 2024
Python

ranguy9304 / LangGraphRAG

LangGraphRAG: A terminal-based Retrieval-Augmented Generation system using LangGraph. Features include message history caching, query transformation, and vector database retrieval. Ideal for NLP researchers and developers working on advanced conversational AI and information retrieval systems.

python natural-language-processing information-retrieval chatbot web-scraping nlp-machine-learning rag terminal-application pdf-processing vector-database openai-api langgraph

Updated Jul 13, 2024
Python

Here are 29 public repositories matching this topic...

dissorial / doc-chatbot

ahmedkhemiri95 / PDFs-TextExtract

allenai / papermage

aws-samples / document-processing-pipeline-for-regulated-industries

Govind-S-B / pdf-to-text-chroma-search

ManasMadan / pdf-actions

ManasMadan / PDFActions

dsckiet / covid-tracker-android-app

thinhuos0913 / python_useful_mini_projects

Hadrien-Cornier / ocr-pdf

clydeknox / PDFGuard-Secure-and-Sanitize-PDFs-with-Python

jdwh08 / Autodoc-Lifter

tyrus-yuen / Mini-Project

akshatpunia26 / berrylit_pdf_chat

ashainp / Combine-PDF

UntaintedTech / pdf-processing

hyuseinleshov / ocr-exporter-api

Mateusz2734 / pdf-cli

king04aman / PDF-Extractor-API

ranguy9304 / LangGraphRAG
