pdf-extractor

This is a Python application that converts non-readable PDF files, such as scanned documents, into readable Word documents. It does this by first converting each PDF into images and then extracting the text from those images to build the Word documents. The application provides a user-friendly interface for this workflow.

pdf-to-text pdf-extractor scanned-pdf-documents text-extraction-tool

Updated Jun 8, 2024
Python

bytescout / pdfco-rails

PDF.co Gem plugin for Ruby on Rails

ruby rails api pdf parser api-wrapper pdf-files pdf-document pdf-generator pdf-generation pdf-to-text pdf-reader pdf-manipulation pdf-merge pdf-extractor pdf-document-processor

Updated Oct 21, 2020
Ruby

pdftables / go-pdftables-api

Go example of using the PDFTables.com API

pdf pdf-converter pdf-conversion pdf-to-excel pdftables pdf-extractor pdftables-api

Updated Dec 6, 2023
Go

jonix6 / minepdf

Pure-Python PDF extraction tool based on PDFMiner

python pdf pdf-extractor pdfminer

Updated Jan 28, 2021
Python

serkodev / camelot-docker

Docker setup of Camelot: PDF Table Extraction

docker pdf csv pdf-converter pdf-extractor camelot

Updated May 31, 2022
Dockerfile

renan-siqueira / python-pdf-tool

This project facilitates the extraction of text from PDF files using various Python libraries. It is designed to be flexible: users can choose among different text extraction libraries, and it accepts either a single PDF file or a directory containing multiple PDF files.

python pdf mit-license pdf-to-text pypdf2 pdf-extractor pdfminer pymupdf pdfplumber

Updated Nov 18, 2023
Python

ktxo / pdf-extractor-demo

POC: data extraction from PDF invoices

data-science extractor pdf-extractor

Updated Dec 16, 2021

PeterMosmans / apdfhelper

Fix links in PDF files, rewrite links, extract text annotations, remove pages

pdf planner calendar annotations pdf-converter pdf-extractor pdf-parser

Updated Jan 4, 2024
Python

Th3Brock / PDF-tabla-extractor

🚜PDF_Table_Extractor🚜 A simple 🐍Python 3🐍 script 😋 that extracts the tables from a PDF 🖥. It works very well 😎 and I recommend it 😈. It can be used on 🥴Windows🥴, 🐧Linux🐧, and 🍎Mac🍎.

pdf script python3 pdf-extractor table-extraction

Updated Sep 5, 2020
Python

pdf-extractor

Here are 55 public repositories matching this topic...

torakiki / pdfsam

UglyToad / PdfPig

GowenGit / docnet

pdftables / python-pdftables-api

asepmaulanaismail / pdf-to-txt-python

Siltaar / doc_crawler.py

bytescout / pdf-extractor-sdk-samples

Madgrades / madgrades-extractor

Hymian7 / PDFtkSharp

gimpscape / gimpscape-ppa

talrand / DocnetExtended

arjun-mavonic / scanned-pdf-text-extractor

bytescout / pdfco-rails

pdftables / go-pdftables-api

jonix6 / minepdf

serkodev / camelot-docker

renan-siqueira / python-pdf-tool

ktxo / pdf-extractor-demo

PeterMosmans / apdfhelper

Th3Brock / PDF-tabla-extractor
