Skip to content

pathintegral-institute/markitup

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 
 
 
 
 
 
 
 
 

Repository files navigation

MarkItUp

This is a fork of MarkItDown.

While markitdown is a useful tool, its returned content is too text-focused, which is not updated to the current rise of multi-modal LLMs.

Features

  • Converts various file formats to markdown-oriented OpenAI compatible responses
  • Supports multiple file types including:
    • Documents: DOCX (not DOC)
    • Presentations: PPTX (not PPT)
    • Spreadsheets: XLSX, XLS, CSV
    • Media: Audio files (MP3, M4A)
    • Web content: HTML
    • PDF files
    • Plain text files
  • Returns OpenAI compatible response, which can be used by most LLM clients
  • Supports command line usage

Installation

Install directly from GitHub:

pip install git+https://github.com/pathintegral-institute/markitup.git@main#subdirectory=packages/markitup
uv add git+https://github.com/pathintegral-institute/markitup.git@main#subdirectory=packages/markitup

Optional Dependencies

To use audio transcription using pydub, install markitup[audio]:

uv add "git+https://github.com/pathintegral-institute/markitup.git@main#subdirectory=packages/markitup[audio]"

To use enhanced file type detection with python-magic, install markitup[magic]:

uv add "git+https://github.com/pathintegral-institute/markitup.git@main#subdirectory=packages/markitup[magic]"

To install all optional dependencies, use markitup[all]:

uv add "git+https://github.com/pathintegral-institute/markitup.git@main#subdirectory=packages/markitup[all]"

Usage

from markitup.converter_utils.utils import read_files_to_bytestreams
from markitup import MarkItUp, Config

fs = read_files_to_bytestreams('packages/markitup/tests/test_files')

miu = MarkItUp(
    config=Config(
        modalities=['image', 'audio'],
        image_use_webp=True
        )
    )

result, stream_info = miu.convert(stream=fs[file_name], file_name=file_name)

Development

Running Tests

To run the test suite, first install Hatch (which provides better test isolation):

uv tool install hatch

Then navigate to the package directory and run the tests:

cd packages/markitup
hatch test

Or for verbose output:

cd packages/markitup
hatch test -- -v

The test suite includes tests for all supported file formats and converter functionality. Hatch provides better isolation from conflicting globally installed packages than other tools.

About

Python tool for converting files and office documents to LLM messages

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • HTML 89.4%
  • Python 10.5%
  • Jupyter Notebook 0.1%