Skip to content
#

document-parser

Here are 45 public repositories matching this topic...

Convert documents to structured data effortlessly. Unstructured is open-source ETL solution for transforming complex documents into clean, structured formats for language models. Visit our website to learn more about our enterprise grade Platform product for production grade workflows, partitioning, enrichments, chunking and embedding.

  • Updated May 20, 2025
  • HTML

Complex data extraction and orchestration framework designed for processing unstructured documents. It integrates AI-powered document pipelines (GenAI, LLM, VLLM) into your applications, supporting various tasks such as document cleanup, optical character recognition (OCR), classification, splitting, named entity recognition, and form processing

  • Updated Mar 31, 2025
  • Python

An open source framework for Retrieval-Augmented System (RAG) uses semantic search helps to retrieve the expected results and generate human readable conversational response with the help of LLM (Large Language Model).

  • Updated Jul 19, 2024
  • Python

Improve this page

Add a description, image, and links to the document-parser topic page so that developers can more easily learn about it.

Curate this topic

Add this topic to your repo

To associate your repository with the document-parser topic, visit your repo's landing page and select "manage topics."

Learn more