Introduction
Welcome to Aryn DocParse!


Aryn DocParse is a composite AI system for parsing, chunking, enriching, and storing unstructured documents at scale. It uses a set of purpose-built AI models for document segmentation, optical character recognition (OCR), and extraction of tables, images, metadata, and more.

Key Features

  • Structured output for each document in JSON or Markdown, with labeled bounding boxes for titles, tables, table rows and columns, images, and regular text.

  • High-quality AI models for complex table extraction, optical character recognition (OCR), image summarization, and more.

  • Support for more than 30 document formats, including PDF, Microsoft Word, Microsoft PowerPoint, plain text, and more.

  • Storage and indexing of processed documents, GenAI-based metadata extraction, and vector (semantic) or keyword search across your documents at scale.

  • Optional integration with Python document ETL pipelines using the open-source Sycamore document ETL library. Customize your pipeline with additional data transforms, LLM-based entity extraction, data enrichment, data cleaning, and loading into vector databases and search engines.
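As a rough illustration of working with labeled output, the sketch below groups parsed elements by their label and prints each one's bounding box. The response shape used here (an `elements` list whose entries carry `type`, `bbox`, and `text_representation` fields) is an assumption modeled loosely on DocParse's JSON output, and the sample values are invented; check the Aryn DocParse API Reference for the actual schema before relying on it.

```python
from collections import defaultdict

# ASSUMPTION: a simplified, hypothetical stand-in for one DocParse JSON response.
# The field names and values are illustrative only; consult the API reference
# for the real schema.
sample_response = {
    "elements": [
        {"type": "Title", "bbox": [0.1, 0.05, 0.9, 0.10], "text_representation": "Annual Report"},
        {"type": "Text", "bbox": [0.1, 0.12, 0.9, 0.30], "text_representation": "Revenue grew this year."},
        {"type": "table", "bbox": [0.1, 0.35, 0.9, 0.60], "text_representation": "Q1 | Q2"},
    ]
}


def group_by_label(response):
    """Map each element label to the list of elements that carry it."""
    groups = defaultdict(list)
    for element in response.get("elements", []):
        # Elements without a label are collected under "unknown".
        groups[element.get("type", "unknown")].append(element)
    return dict(groups)


groups = group_by_label(sample_response)
for label, elements in groups.items():
    for element in elements:
        # Print the label, its bounding box, and any extracted text.
        print(label, element["bbox"], element.get("text_representation", ""))
```

Grouping by label is one simple way to pull out, for example, only the tables for downstream extraction while leaving titles and body text for chunking.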

You can use DocParse to prepare complex, unstructured data for retrieval-augmented generation (RAG) applications, document processing workflows, content extraction (such as tables), and semantic search systems.

Sign up here for free to get an API key, and use the DocParse Playground UI to visualize how your documents are processed.

You can learn more from our introduction video or get started with the Quickstart.

Getting started

Sign up here for free to get an API key and start using DocParse.

  • Get Started with Aryn DocParse

  • Using the Aryn-SDK to call DocParse

  • Access the DocParse UI to visualize how your documents will be partitioned

  • Join the Slack community for any questions

  • Aryn DocParse API Reference

  • Aryn DocParse Python SDK Reference