pdf-to-text

The notebook in this repository uses pytesseract to extract text from a PDF document. The script can be used to automate text acquisition from a large body of printed resources such as books. The acquired text can then be used for downstream tasks such as training language models, topic modeling, and document summarization.

ocr pdf-to-text pytesseract

Updated Apr 30, 2022

dongju93 / extract-ti-from-reports

Convert PDFs to text, then transform that text into structured JSON objects for Threat Intelligence.

python pdf json regex jupyter-notebook pdf-to-text threat-intelligence text-to-json

Updated Mar 24, 2024
Jupyter Notebook

Kamaruddheen / document-scanner

Extract structured text and data from documents such as invoices, book pages, and tables using OpenCV and Tesseract OCR

python opencv tesseract-ocr pdf-to-text image-to-text

Updated Mar 14, 2024
HTML

pashaq / PdfToText-Converter

Converts PDF and FB2 documents to text or to a list of articles.

pdf csharp lib pdf-to-text itext pdf2txt fb2-to-text

Updated Aug 23, 2020
C#

mfakca / pdf2text

Converts PDFs to text

pdf-converter pdf-to-text

Updated Oct 9, 2021
Roff

53buahapel / pdf-to-text-converter

Python script that I made to convert PDF to text

pdf pdf-converter pdf-to-text pdf-to-image

Updated Dec 6, 2023
Python

ajaycode / unstructured

Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines.

nlp pdf machine-learning natural-language-processing information-retrieval ocr deep-learning ml docx preprocessing pdf-to-text data-pipelines donut document-image-processing pdf-to-json document-ai document-image-analysis document-parsing langchain

Updated Mar 3, 2023
HTML

datalogics / apdfl-csharp-dotnet-framework-samples

Sample code for the Datalogics .NET Framework interface of the Adobe PDF Library

pdf ocr pdf-converter pdf-document pdf-conversion pdf-generation pdf-to-text pdf-manipulation pdfa pdf-split pdf-merger pdf-parser pdf-to-image pdf-tools pdf-compression pdf-lib pdf-render ocr-pdf pdf-to-office

Updated Jun 5, 2024
C#

selectpdf / selectpdf-api-perl-client

Perl client for SelectPdf Online REST API

html-to-pdf pdf-generator pdf-generation pdf-to-text pdf-merge pdf-generator-api html-to-pdf-converter search-pdf html-to-pdf-api

Updated Nov 17, 2021
Perl

aishwarya-art / Pdf-to-text-extract

Sample code for PDF-to-text extraction using the PDF Parser library in CodeIgniter 3

extraction pdf-to-text codeigniter3 composer-library pdfparser samlot

Updated Oct 5, 2023
PHP

selectpdf / selectpdf-api-nodejs-client

Node.js client for SelectPdf Online REST API

pdf pdf-converter html-to-pdf pdf-to-text pdf-merge html-to-pdf-converter html-to-pdf-api pdf-merge-api pdf-to-text-api

Updated Nov 23, 2021
JavaScript

amitbd1508 / Blind-EYE

A book reader with voice control functionality for blind people

windows pdf csharp winforms voice-recognition pdf-to-text voice-assistant

Updated Jun 29, 2020
C#

bytescout / pdfco-rails

PDF.co Gem plugin for Ruby on Rails

ruby rails api pdf parser api-wrapper pdf-files pdf-document pdf-generator pdf-generation pdf-to-text pdf-reader pdf-manipulation pdf-merge pdf-extractor pdf-document-processor

Updated Oct 21, 2020
Ruby

princebhatt9588 / Versatile_Code_Hub

VersatileCodeHub: Your one-stop repository for an array of coding projects. Explore diverse applications, from games like Flappy Bird to tools like QRCode Scanners. Expand your skills across various domains, all in one place.

browser password-generator flappy-bird snake-game pdf-to-text movie-recommendation bitcoin-mining bmi-calculator otp-generator location-search voice-texting morsecode-translator retweet-bot subtitle-synchronization doc-to-pdf net-speed-checker url-to-qrcode wifi-password-generator

Updated Aug 19, 2023
Python


pdf-to-text

Here are 60 public repositories matching this topic...

orijtech / tikago

fabriziomiano / pdf2txt-azure-ocr

zevio / pcu_io

kanishk-mehta / PDFBox-get-Coordinates-of-text

gabriel-batistuta / pdf-to-any

Dheovani / PDFConverter

Directorman9 / Optical-character-recognition
