# pdftotext

Here are 62 public repositories matching this topic...

zetahernandez / pdf-to-text

Read PDF files in JavaScript

javascript pdf extract-text pdftotext text-pdf

Updated Mar 11, 2020
JavaScript

lu4p / cat

Extract text from plaintext, .docx, .odt and .rtf files. Pure Go.

cat go golang cross-platform text-extraction extract-text pdftotext docx2txt textextracting rtf-to-text pdf2txt odt2txt

Updated Nov 25, 2023
Go

iron-software / Iron-OCR-Image-to-Text-in-CSharp

Image to Text Tutorial in C# - See https://ironsoftware.com/csharp/ocr/tutorials/how-to-read-text-from-an-image-in-csharp-net/

ocr csharp csharp-code pdftotext imagetotext

Updated Nov 16, 2018
C#

ashutoshvarma / pyxpdf

Fast and memory-efficient Python PDF Parser based on xpdf sources

python pdf cython pdf-converter pdftotext pdf-parser xpdf pdfparser pdftohtml xpdf-reader pdftopng

Updated Dec 15, 2023
Cython

tadas-s / heroku-buildpack-pdftotext

Heroku buildpack for poppler pdftotext utility

heroku heroku-buildpack poppler pdftotext

Updated Aug 31, 2017
Shell

shine-jayakumar / Extract-Data-From-PDF-In-Python

Batch-convert PDFs to text and extract data from PDFs in Python

Updated Sep 29, 2021
Python

deardurham / ciprs-reader

Python library for reading CIPRS PDFs

python docker pdf coverage pytest codeforamerica pdftotext

Updated Oct 25, 2023
Jupyter Notebook

amenezes / aiopytesseract

A Python asyncio wrapper for Tesseract-OCR.

ocr tesseract text-extraction asyncio tesseract-ocr optical-character-recognition pdftotext pytesseract pytesseract-ocr

Updated Oct 25, 2024
Python

DataKind-BLR / covid19bharat_scrapers

All scrapers for covid19

scraper dashboard image-processing open-data india pdftotext open-datasets covid-19 covid19

Updated Feb 8, 2023
Python

yedhink / covid19-kerala-api-deprecated

Deprecated - A fast API service for retrieving day-to-day stats about the Coronavirus (COVID-19, SARS-CoV-2) outbreak in Kerala (India).

api gin pdftotext coronavirus coronavirus-tracking coronavirus-real-time covid19 covid-data covid19-data covid-19-india covid19india covid19kerala covid19dataindia covid19datakerala

Updated Jun 2, 2023
Go

icaropires / pdf2dataset

Converts a whole subdirectory with a big (or small) volume of PDF documents to a dataset (pandas DataFrame) with error tracking and choice of features

python pdf distributed-systems data-science ocr pandas-dataframe parallel distributed-computing tesseract python3 tesseract-ocr parquet ray pdftotext pytesseract pdf2image pyarrow pytesseract-ocr

Updated Sep 20, 2020
Python

Anish-M-code / pdftotext

A simple pdftotext conversion tool for Windows 8.1/10/11 and Fedora/Ubuntu/Debian/Arch-based Linux distros using poppler-utils and Google's tesseract-ocr.

pdf ocr tesseract-ocr pdf-documents hacktoberfest pdftotext ocr-recognition ocr-text-reader ocr-python pdftools hacktoberfest-accepted poppler-utils hacktoberfest2022

Updated Oct 27, 2024
Python

andrealenzi11 / py-poppleract

Python library and web service based on the Poppler pdftotext utility and Tesseract OCR for extracting text from PDF documents

ocr tesseract text-extraction tesseract-ocr pdf-to-text poppler optical-character-recognition pdf-reader pdftotext pdf2text pdf-splitting poppleract py-poppleract

Updated Oct 18, 2024
Python

raul23 / convert-to-txt

Convert documents (pdf, djvu, epub, word) to txt

python macos pdf converter convert epub djvu calibre textutil pdftotext txt msword djvulibre catdoc ebook-convert

Updated May 5, 2024
Python

ExceptedPrism3 / PDFToAudio

"PDF To Audio" is a Python tool that transforms PDF documents into audio files using OCR and Text-to-Speech technology. Ideal for accessibility and auditory learning, it supports multiple languages, parallel processing, and smart rate limit handling.

python pdf pdf-converter pdf-to-text pdftotext pdf-to-audio pdf-to-audiobook pdftoaudiobooks

Updated Jan 4, 2024
Python

ChanMo / docker-poppler

A simple RESTful API service for poppler

poppler pdftotext pdftohtml pdftocairo pdftoppm

Updated Sep 19, 2023
Python

avinxxsh / realDataOCR

Simple code to convert PDFs to image files and run Tesseract OCR on those images to extract text. The code focuses on extracting batch numbers from pharmacy bills using regular expressions. None of the actual PDFs or other files are included, because all the data used was real and sensitive.

python ocr regex tesseract-ocr pattern-recognition pdftotext pytesseract pdftoimage pytesseract-ocr

Updated Jul 5, 2022
Python

jefferis / paperutils

R package with utility functions to support preparation of journal articles

r bibtex lyx pdftotext pdftk

Updated Sep 18, 2019
R

boettner / pdf2sandwich-pdf

Convert a scanned PDF into a text-embedded PDF.

pdf tesseract scan-tool pdftotext paperless

Updated Jan 27, 2020
Shell

bakame-php / pdftotext

Extracting text from a PDF made easy

php pdf extraction text-processing pdftotext

Updated Oct 21, 2019
PHP
