MegaParse - Your Mega Parser for every type of document

MegaParse is a powerful and versatile parser that can handle various types of documents with ease. Whether you're dealing with text, PDFs, PowerPoint presentations, or Word documents, MegaParse has got you covered. Its focus is on parsing with no information loss.

Key Features 🎯

Versatile Parser: Handles various types of documents with ease.
No Information Loss: The focus is on losing no information during parsing.
Fast and Efficient: Designed with speed and efficiency at its core.
Wide File Compatibility: Supports text, PDF, PowerPoint presentations, Excel, CSV, and Word documents.
Open Source: Freedom is beautiful, and so is MegaParse. Open source and free to use.

Support

Files: ✅ PDF ✅ PowerPoint ✅ Word
Content: ✅ Tables ✅ TOC ✅ Headers ✅ Footers ✅ Images

Example

Demo video: megaparse.mp4

Installation

pip install megaparse

Usage

Add your OpenAI API key to the .env file
Install poppler on your computer (required for images and PDFs)
Install tesseract on your computer (required for images and PDFs)

from megaparse import MegaParse

# Parse the document at the given path
megaparse = MegaParse(file_path="./test.pdf")
document = megaparse.load()
print(document.content)

# Save the parsed content as Markdown
megaparse.save_md(document.content, "./test.md")
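Before running the snippet above, it can help to confirm that the three prerequisites are in place. The following is a minimal sketch, not part of MegaParse: the helper name missing_prerequisites is hypothetical, OPENAI_API_KEY is assumed to be the variable name stored in the .env file, and pdftoppm (one of the poppler command-line tools) is used as a stand-in for "poppler is installed".

```python
import os
import shutil


def missing_prerequisites(env=None, which=shutil.which):
    """Return the names of prerequisites that could not be found.

    Hypothetical helper, not part of MegaParse.
    """
    env = os.environ if env is None else env
    missing = []

    # Assumed variable name for the OpenAI API key.
    if not env.get("OPENAI_API_KEY"):
        missing.append("OPENAI_API_KEY")

    # pdftoppm ships with poppler; tesseract is the OCR executable.
    for tool in ("pdftoppm", "tesseract"):
        if which(tool) is None:
            missing.append(tool)

    return missing


if __name__ == "__main__":
    problems = missing_prerequisites()
    if problems:
        print("Missing prerequisites:", ", ".join(problems))
    else:
        print("All prerequisites found.")
```

The check only looks at the current process environment, so a key that lives only in the .env file and has not been loaded yet will be reported as missing.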

(Optional) Use LlamaParse for Improved Results

Create an account on Llama Cloud and get your API key.
Call MegaParse with the llama_parse_api_key parameter

from megaparse import MegaParse

megaparse = MegaParse(file_path="./test.pdf", llama_parse_api_key="llx-your_api_key")
document = megaparse.load()
print(document.content)
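To avoid hardcoding the key in source files, the constructor arguments can be built from the environment. This is a sketch under stated assumptions: the helper megaparse_kwargs is hypothetical, and LLAMA_CLOUD_API_KEY is an assumed variable name chosen for this example. Only file_path and llama_parse_api_key come from the usage shown above.

```python
import os


def megaparse_kwargs(file_path, env=None):
    """Build keyword arguments for MegaParse(...).

    Hypothetical helper: adds llama_parse_api_key only when a key is set,
    so the same call works with or without LlamaParse.
    """
    env = os.environ if env is None else env
    kwargs = {"file_path": file_path}

    # Assumed environment variable name for the Llama Cloud API key.
    key = env.get("LLAMA_CLOUD_API_KEY")
    if key:
        kwargs["llama_parse_api_key"] = key

    return kwargs
```

It would then be used as MegaParse(**megaparse_kwargs("./test.pdf")), falling back to the default parser when no key is set.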

Benchmark

Parser	Diff
LMM MegaParse	36
MegaParse with LlamaParse and GPTCleaner	74
MegaParse with LlamaParse	97
Unstructured Augmented Parse	99
LlamaParse	102
MegaParse	105

Lower is better
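The table does not say how Diff is computed. As an illustration only, and not necessarily the method behind the numbers above, one simple way to score a parser is to count the lines that differ between its Markdown output and a hand-written reference. The function name diff_count is hypothetical; it uses Python's standard difflib module.

```python
import difflib


def diff_count(reference: str, parsed: str) -> int:
    """Count lines added or removed between a reference and parsed output.

    Illustrative metric only; not confirmed to match the benchmark's Diff.
    """
    delta = difflib.ndiff(reference.splitlines(), parsed.splitlines())
    # ndiff prefixes removed lines with "- " and added lines with "+ ";
    # unchanged lines ("  ") and hint lines ("? ") are ignored.
    return sum(1 for line in delta if line.startswith(("+ ", "- ")))
```

With a metric like this, identical output scores 0, and each changed line counts twice (one removal plus one addition).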

