Chunkin v0.1.0

@thevyanshu thevyanshu released this 10 May 18:06
· 16 commits to master since this release

Initial release of chunkin, a Python library for chunking documents and indexing them into vector stores, built on LangChain.

Features

  • 8 document formats, including PDF, DOCX, TXT, MD, CSV, XLSX, and PPT
  • 6 chunking strategies: recursive, character, markdown, markdown_headers, html_headers, semantic
  • 50+ vector stores: FAISS, Chroma, Pinecone, Weaviate, Qdrant, Azure AI Search, and more
  • Modular extras: Install only the vector stores you need
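
To give a feel for what the "recursive" strategy means, here is a minimal pure-Python sketch of recursive character splitting. It is an illustration under stated assumptions, not chunkin's or LangChain's actual implementation: the function name `recursive_split` and the separator order are hypothetical, and chunk overlap is left out for brevity.

```python
def recursive_split(text, chunk_size=500, separators=("\n\n", "\n", " ", "")):
    """Split text into pieces of at most chunk_size characters.

    Hypothetical helper for illustration: tries the coarsest separator
    first and falls back to finer ones only for pieces that are too long.
    """
    if len(text) <= chunk_size:
        return [text] if text else []
    if not separators or separators[0] == "":
        # Last resort: cut at fixed character offsets.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

    sep, finer = separators[0], separators[1:]
    chunks, current = [], ""
    for part in text.split(sep):
        candidate = part if not current else current + sep + part
        if len(candidate) <= chunk_size:
            # Keep merging neighbouring parts while they still fit.
            current = candidate
            continue
        if current:
            chunks.append(current)
            current = ""
        if len(part) <= chunk_size:
            current = part
        else:
            # This part alone is too long: retry with finer separators.
            chunks.extend(recursive_split(part, chunk_size, finer))
    if current:
        chunks.append(current)
    return chunks


sample = "alpha beta\n\ngamma delta epsilon\n\nzeta"
chunks = recursive_split(sample, chunk_size=12)
print(chunks)  # ['alpha beta', 'gamma delta', 'epsilon', 'zeta']
```

Paragraph breaks are preferred as split points, so short paragraphs stay whole, and only the oversized middle paragraph is broken further at word boundaries.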

Installation

```bash
pip install chunkin
pip install "chunkin[core]"   # OpenAI + FAISS
pip install "chunkin[all]"    # All vector stores
```

Quick Start

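```python
from chunkin_processor import DocProcessor
from langchain_openai import OpenAIEmbeddings

# Configure the processor: embedding model, vector store backend, chunk size
processor = DocProcessor(
    embeddings=OpenAIEmbeddings(),  # reads OPENAI_API_KEY from the environment
    vector_store_type="faiss",
    chunk_size=500,
)

# Load, chunk, embed, and index the document
processor.process_file("document.pdf")

# Retrieve the 3 chunks most similar to the query
results = processor.search("your query", k=3)
```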
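
Conceptually, a search with `k=3` ranks the indexed chunks by how similar their embeddings are to the query embedding and returns the top three. The sketch below shows that idea with plain Python lists and cosine similarity. The helper names `cosine` and `top_k` and the toy two-dimensional vectors are assumptions for illustration only; the real vector store may use a different metric or an approximate index.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, indexed, k=3):
    """Return the k chunk texts whose vectors are most similar to query_vec.

    indexed is a list of (chunk_text, vector) pairs (hypothetical layout).
    """
    scored = [(cosine(query_vec, vec), text) for text, vec in indexed]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]


# Toy index with hand-written 2-D "embeddings"
indexed = [
    ("chunk a", [1.0, 0.0]),
    ("chunk b", [0.0, 1.0]),
    ("chunk c", [1.0, 1.0]),
    ("chunk d", [-1.0, 0.0]),
]
best = top_k([1.0, 0.0], indexed, k=2)
print(best)  # ['chunk a', 'chunk c']
```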

Documentation

Built on LangChain

Chunkin leverages LangChain for document loading, text splitting, and vector store integrations.