A self-hosted search engine for documents.
Updated Nov 14, 2024 - Java
Bachelor Thesis | Text extraction from complex video scenes
Tess4J CLI OCR Tool is a command-line application that extracts text from images and PDFs using the Tess4J library, with support for multiple languages. The extracted text is automatically copied to the clipboard for easy access.
Tika per page PDF extractor server returning content as JSON.
Simple server to extract text from a PDF
A Spring Boot-based OCR Exporter tool that extracts text from image or PDF files using the OCR Space API and exports the results to various formats such as PDF, text, Word, or a database.
Arachnio client library for Java 11+
Text extraction: a highway to systematically process car reviews
Yet Another Document 2 Text for pdf/doc/html/rtf/etc. Extracts text, or converts to simplified HTML to retain layout information.
Run Apache Tika as a service in AWS Lambda by scanning documents in S3 and storing the extracted text back to S3
Detect and extract text from a captured image or from images selected from the gallery.
A Cloud-Native Infrastructure for License Plate Recognition and Text Extraction with Python Integration