text-extraction

OCR with Tesseract and OpenCV: Extract text from images effortlessly. Preprocess with OpenCV for accuracy. Display results and save output. Easy integration for document digitization and data entry automation.

python opencv machine-learning automation ocr image-processing tesseract text-extraction document-digitization data-entry-automation

Updated May 13, 2024
Python

miso-belica / jusText

Heuristic-based boilerplate removal tool

python text-extraction html-parser html-parsing

Updated May 9, 2024
Python
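
To make "heuristic-based boilerplate removal" concrete, here is a minimal standard-library sketch. It is not jusText's code or API: the names `BlockCollector`, `is_boilerplate`, and `extract_main_text`, the tiny stopword set, and all thresholds are assumptions chosen for illustration. It scores each text block by length, link density, and stopword density, and keeps only blocks that look like running prose.

```python
from html.parser import HTMLParser

# Hypothetical, deliberately tiny stopword list; a real tool would use a full per-language list.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "it",
             "that", "for", "on", "with", "as", "was"}
BLOCK_TAGS = {"p", "div", "li", "td", "h1", "h2", "h3", "ul", "ol", "body"}
SKIP_TAGS = {"script", "style"}


class BlockCollector(HTMLParser):
    """Splits a page into text blocks and counts characters that sit inside links."""

    def __init__(self):
        super().__init__()
        self.blocks = []        # list of (text, link_chars) tuples
        self._text = []
        self._link_chars = 0
        self._in_link = 0
        self._skip = 0

    def flush(self):
        # Collapse whitespace and store the block if it has any text.
        text = " ".join("".join(self._text).split())
        if text:
            self.blocks.append((text, self._link_chars))
        self._text, self._link_chars = [], 0

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip += 1
        elif tag in BLOCK_TAGS:
            self.flush()
        elif tag == "a":
            self._in_link += 1

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self._skip:
            self._skip -= 1
        elif tag in BLOCK_TAGS:
            self.flush()
        elif tag == "a" and self._in_link:
            self._in_link -= 1

    def handle_data(self, data):
        if self._skip:
            return
        self._text.append(data)
        if self._in_link:
            self._link_chars += len(" ".join(data.split()))


def is_boilerplate(text, link_chars, max_link_density=0.3,
                   min_stopword_density=0.2, min_length=40):
    """Flags a block as boilerplate if it is short, link-heavy, or stopword-poor."""
    words = text.lower().split()
    if not words:
        return True
    link_density = link_chars / len(text)
    stop_density = sum(w.strip(".,;:!?") in STOPWORDS for w in words) / len(words)
    return (len(text) < min_length
            or link_density > max_link_density
            or stop_density < min_stopword_density)


def extract_main_text(html):
    """Returns the text blocks that pass the heuristic, in document order."""
    parser = BlockCollector()
    parser.feed(html)
    parser.close()
    parser.flush()
    return [text for text, link_chars in parser.blocks
            if not is_boilerplate(text, link_chars)]
```

Navigation bars and footers tend to be short and made mostly of links, while article paragraphs are longer and rich in function words. That difference is why these three signals are a common starting point for this kind of tool.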

dataiku / dss-plugin-tesseract-ocr

Dataiku DSS plugin to perform optical character recognition (OCR) using the Tesseract engine.

ocr tesseract text-extraction tesseract-ocr optical-character-recognition dataiku dss-plugin

Updated Apr 18, 2024
Python

chrismattmann / tika-python

Tika-Python is a Python binding to the Apache Tika™ REST services allowing Tika to be called natively in the Python community.

Updated Apr 14, 2024
Python

zanachka / dateparser

Python parser for human-readable dates

text-extraction html-extraction

Updated Apr 12, 2024
Python

rmottanet / unchainedtext

UnchainedText: Break free from PDFs! Easily extract raw text to .txt for preprocessing.

extractor text-extraction data-extraction text-processing pdf-text-extraction text-extraction-tool

Updated Apr 2, 2024
Python

weareprestatech / hotpdf

hotpdf is a fast PDF parsing library, built on top of pdfminer.six, for extracting and searching text within PDF documents.

python pdf text-extraction text-search

Updated Mar 26, 2024
Python

cdown / srt

A simple library and set of tools for parsing, modifying, and composing SRT files.

python library tools command-line text-extraction subtitles subtitle srt subtitles-parsing mit-license command-line-tool subtitle-parser subtitle-fixer

Updated Mar 19, 2024
Python
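
As a rough illustration of what parsing, modifying, and composing SRT files involves, the sketch below uses only the standard library. It does not use or reproduce the `srt` package's API: `Cue`, `parse_srt`, `shift`, and `compose_srt` are hypothetical names, and the parser assumes well-formed blocks (index line, timing line, then text) and silently skips anything else.

```python
import re
from dataclasses import dataclass
from datetime import timedelta

# Matches timing lines such as "00:00:01,000 --> 00:00:02,500".
TIMING = re.compile(
    r"(\d+):(\d{2}):(\d{2})[,.](\d{3})\s*-->\s*(\d+):(\d{2}):(\d{2})[,.](\d{3})"
)


@dataclass
class Cue:
    index: int
    start: timedelta
    end: timedelta
    text: str


def _to_delta(h, m, s, ms):
    return timedelta(hours=int(h), minutes=int(m), seconds=int(s), milliseconds=int(ms))


def _format(delta):
    # Integer milliseconds avoid floating-point rounding in the timestamp.
    total_ms = delta // timedelta(milliseconds=1)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    seconds, ms = divmod(rest, 1000)
    return f"{hours:02}:{minutes:02}:{seconds:02},{ms:03}"


def parse_srt(data):
    """Parses SRT text into a list of Cue objects, skipping malformed blocks."""
    cues = []
    normalized = data.replace("\r\n", "\n").strip()
    # Subtitle blocks are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", normalized):
        lines = block.splitlines()
        if len(lines) < 2 or not lines[0].strip().isdigit():
            continue
        match = TIMING.match(lines[1].strip())
        if not match:
            continue
        parts = match.groups()
        cues.append(Cue(int(lines[0]), _to_delta(*parts[:4]),
                        _to_delta(*parts[4:]), "\n".join(lines[2:])))
    return cues


def shift(cues, offset):
    """Returns new cues with start and end moved by the given timedelta."""
    return [Cue(c.index, c.start + offset, c.end + offset, c.text) for c in cues]


def compose_srt(cues):
    """Serializes cues back to SRT text, renumbering them from 1."""
    blocks = [
        f"{i}\n{_format(c.start)} --> {_format(c.end)}\n{c.text}\n"
        for i, c in enumerate(cues, start=1)
    ]
    return "\n".join(blocks)
```

A typical use would be `compose_srt(shift(parse_srt(text), timedelta(seconds=2)))` to delay every subtitle by two seconds.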

Lanjkn / Text-Extractor

API to get text from multiple types of files

api text-extraction file-processing

Updated Mar 14, 2024
Python

mciccale / ScholarVista

ScholarVista analyses research papers and extracts/plots information about them. It uses Grobid to extract all the content of the research papers. Then all this data is plotted and displayed using Python.

machine-learning python3 text-extraction keyword-extraction keyword-cloud

Updated Mar 6, 2024
Python

OwenOrcan / YiraBot-Crawler

YiraBot: Simplifying Web Scraping for All. A user-friendly tool for developers and enthusiasts, offering command-line ease and Python integration. Ideal for research, SEO, and data collection.

open-source machine-learning data-mining scraping python3 text-extraction web-scraping html-parser robots-txt data-extraction seotools command-line-tool beginner-friendly contributions-welcome big-data-analytics seo-analysis good-first-issue sitemap-parser web-crawlers

Updated Mar 3, 2024
Python

nsourlos / OCR_with_LLMs

ocr text-extraction object-detection pytesseract llava

Updated Feb 8, 2024
Python

Here are 90 public repositories matching this topic...

adbar / trafilatura

flairNLP / fundus

mhadeli / Python-Text-Extraction

ssciwr / AMMICO

zanachka / extruct

nguyen-tho / ID-card-extract-module

miso-belica / sumy

edhou20 / Medical-Texts-NLP-Clustering-

real0x0a1 / ocr-opencv
