Apache Tika - Toolkit that detects and extracts metadata
Updated Nov 7, 2024 - JavaScript
Extract text from a document with Apache Tika
This application is designed for managing OCR (Optical Character Recognition) tasks. It allows users to define, schedule, and execute OCR tasks through a REST API. The core technologies used are Spring Framework, MongoDB, and Tesseract OCR.
A suite of Machine Learning / Deep Learning Dockerfiles to allow Apache Tika to extract objects and to produce textual captions for images and video
Microservice web application for uploading and downloading audio files
AWS Lambda layer containing the latest version of Apache Tika
This API uses Annif as a local server and includes an NER component. It also includes Tesseract, uses Apache Tika for language detection, and has limited multilingual support.
Document management system implemented with microservices
Extraction analysis of the PixStory social media dataset using language detection, language translation, the Tika GeoTopic parser, Tika image object recognition and image caption generation, and PyTorch Detoxify.
Analysis of PixStory social media data combined with Snapchat, COVID-19, and YouTube data. This project uses the Apache Tika Clustering software to cluster certain social media posts together.
Extracts the text content of Word (doc, docx), Excel, PDF, PPT, CSV, and TXT files, and can also extract the table of contents from Word and PDF files
Apache NiFi + Apache Tika + OptimaizeLangDetector
🚴♂️⛷Data Lake, Performance tuning for text extraction from a huge amount of files.
Tika detector for MKV and WebM
This repository holds everything required to run the Apache Solr Engine and its document-crawling functionality
Text extraction from scanned PDF documents in Java
Visualize unstructured data using Watson NLU
Apache Tika integration built in Scala for indexing OneDrive files into Elasticsearch.
A simple information retrieval system, a PDF Search Engine for UN agencies and NGOs.