[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/weaviate/recipes/blob/main/integrations/data-platforms/aryn/weaviate_blog_post.ipynb)

## Chonkie and Weaviate Example

[Learn how to ....]

## Install the Dependencies 

In [3]:
!pip install chonkie weaviate-client --quiet


[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m25.2[0m[39;49m -> [0m[32;49m25.3[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpip install --upgrade pip[0m


## Import the Libraries

In [None]:
import os
from pathlib import Path

import weaviate
import chonkie
from chonkie import MarkdownChef
from chonkie import WeaviateHandshake

## Ingest Chunks into Weaviate

### Initialize the Weaviate Handshake

In [None]:
weaviate_url = os.getenv("WEAVIATE_URL")
weaviate_api_key = os.getenv("WEAVIATE_API_KEY")

handshake = WeaviateHandshake(
    url=weaviate_url,
    api_key=weaviate_api_key,
    collection_name="WeaviateBlogs",
    embedding_model="text-embedding-3-small"
)
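`os.getenv` returns `None` when a variable is unset, so a missing `WEAVIATE_URL` or `WEAVIATE_API_KEY` would only surface later as a connection error. A minimal sketch of an early check follows; `require_env` is a hypothetical helper written for this example and is not part of Chonkie or the Weaviate client.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or fail with a clear message.

    Hypothetical helper for illustration; not part of chonkie or weaviate-client.
    """
    value = os.getenv(name)
    if not value:
        # Covers both an unset variable and an empty string
        raise RuntimeError(f"Environment variable {name} is not set")
    return value
```

With this in place, `weaviate_url = require_env("WEAVIATE_URL")` and `weaviate_api_key = require_env("WEAVIATE_API_KEY")` could replace the two `os.getenv` calls above.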

### Create the Chunks

In [None]:
# Process MDX files and collect all chunks
path = Path("../../llm-agent-frameworks/data")

docs = []
all_chunks = []

# Create the chef once and reuse it for every file
chef = MarkdownChef()

# Search subfolders recursively for .mdx files
for md_file in path.rglob("*.mdx"):
    print(f"Processing {md_file}...")
    doc = chef.process(str(md_file))  # pass a string path in case the chef expects str
    docs.append(doc)
    # Collect the chunks from each document
    all_chunks.extend(doc.chunks)

In [21]:
print(f"\nProcessed {len(docs)} documents")
print(f"Total chunks: {len(all_chunks)}")

# Note: `doc` is the last file processed in the loop above,
# so the four counts below describe that one document only.
print(f"Found {len(doc.tables)} tables")
print(f"Found {len(doc.code)} code blocks")
print(f"Found {len(doc.images)} images")
print(f"Found {len(doc.chunks)} text chunks")


Processed 78 documents
Total chunks: 679
Found 0 tables
Found 0 code blocks
Found 6 images
Found 7 text chunks
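The table, code block, image, and text chunk counts printed above come from `doc`, which is the last file the loop processed, not from the whole corpus. A small sketch of corpus-wide totals follows, assuming every processed document exposes the same `tables`, `code`, `images`, and `chunks` attributes used in the cell above. `summarize_docs` is a hypothetical helper, not a Chonkie function.

```python
FIELDS = ("tables", "code", "images", "chunks")


def summarize_docs(docs):
    """Sum the number of tables, code blocks, images, and chunks over all documents.

    Hypothetical helper: assumes each document has the list-like attributes
    named in FIELDS; a missing attribute is counted as empty.
    """
    totals = {field: 0 for field in FIELDS}
    for d in docs:
        for field in FIELDS:
            totals[field] += len(getattr(d, field, None) or [])
    return totals
```

Calling `summarize_docs(docs)` after the processing loop returns one dictionary with the four totals.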


### Write the Chunks to Weaviate Collection

In [11]:
handshake.write(all_chunks)

2025-11-10 12:04:39.881 | DEBUG    | chonkie.handshakes.weaviate:write:320 - Writing 679 chunks to Weaviate collection: WeaviateBlogs
2025-11-10 12:09:26.240 | INFO     | chonkie.handshakes.weaviate:write:375 - Successfully wrote 679 chunks to Weaviate collection: WeaviateBlogs


🦛 Chonkie wrote 679 chunks to Weaviate collection: WeaviateBlogs


['2397c169-832b-5d30-9039-3ab3e6d8c9fa',
 'cce18011-0e3f-57ba-ac61-1b3ad8c7d3b4',
 'f4ffad39-b32f-5db3-8060-f59581e9f575',
 '3238b25d-0d39-5e22-94f9-764aa060cb4d',
 '1fd0612b-1dd5-5407-9c3e-fefb031b9547',
 '6cb29480-8a13-577f-934a-59bcc6b20875',
 '81dc395e-5784-5204-a2ca-830ebccfae2f',
 'e512cf00-8ed1-58d8-9e79-21bf0cec6111',
 'daff102b-1645-5109-937f-b8b30de23f4d',
 'f70b1e26-a5ce-53a5-808c-b34066669d5e',
 'efc50537-14c5-5c54-a642-b080aae9fe7f',
 '9180d270-8c76-55df-b335-25c6a8afdba4',
 '29aa2a08-ccf0-59df-8057-c7243be39c0b',
 'dfe10287-0f8d-5fd2-9015-182ae2900a66',
 'd405a279-e900-5196-bf3e-3e179809d0fa',
 'a87e76b9-f7b6-5622-bbbb-7ca411dd4bed',
 '95a9c973-5ca1-516d-bd98-af4983b47d88',
 '24683968-88eb-5467-8880-3bcbc8c1f3d7',
 '90a3e424-dd6a-5131-8497-0f5befc85b2e',
 'f202b82f-0173-5f71-b3f2-bbe48132f7e2',
 'cbc79a40-3940-5202-bb39-81db353f68b1',
 '0b91604b-9c86-5e3f-8815-2c4c0b111fdf',
 'cd334534-674e-5fa7-9bde-ed2227948d24',
 '7927c5b5-2b5d-5b67-9287-11ee4c450453',
 'ebf3da8e-8f52-
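
The log timestamps above show the single `write` call taking close to five minutes for 679 chunks. One way to get progress feedback during a long ingest is to split the list and call `handshake.write` once per slice. The sketch below is illustrative only: `write_in_batches` is a hypothetical helper, and it assumes that repeated `write` calls add to the same collection, which is worth confirming against the Chonkie documentation before relying on it.

```python
def write_in_batches(handshake, chunks, batch_size=100):
    """Write chunks in fixed-size slices and report progress after each slice.

    Hypothetical helper: `handshake` is any object with a `write(list)` method,
    such as the WeaviateHandshake created earlier. Returns the number of
    chunks passed to `write`.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be a positive integer")

    written = 0
    for start in range(0, len(chunks), batch_size):
        batch = chunks[start:start + batch_size]
        handshake.write(batch)
        written += len(batch)
        print(f"Wrote {written}/{len(chunks)} chunks")
    return written
```

For example, `write_in_batches(handshake, all_chunks, batch_size=100)` would issue seven `write` calls for the 679 chunks above.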

## Query Time

In [19]:
results = handshake.search(
    query="Queries per second with different BQ compression rates.", 
    limit=5
)

for result in results:
    print(result["score"], result["text"], result["chunk_type"])

2025-11-10 14:10:32.197 | DEBUG    | chonkie.handshakes.weaviate:search:467 - Searching Weaviate collection: WeaviateBlogs with limit=5
2025-11-10 14:10:33.240 | INFO     | chonkie.handshakes.weaviate:search:495 - Search complete: found 5 matching chunks


0.5414701104164124 

### Query Latency and Rate

The latency and number of queries per second are also important, particularly for monitoring usage patterns.

 Chunk
0.5390410423278809 

Weaviate improves [binary quantization (BQ)](/developers/weaviate/configuration/compression/bq-compression) in 1.24 to be faster, more memory efficient, and more cost-effective. Use BQ vector compression with [HNSW indexes](/developers/weaviate/concepts/vector-index#hierarchical-navigable-small-world-hnsw-index) to dramatically improve your query speed.

BQ compresses vector representations while preserving essential information. Uncompressed, Weaviate uses a `float32` to store each dimension. BQ uses one bit per dimension to encode Vector directionality. This means BQ compresses vectors from 32 bits per dimension to 1 bit per dimension - a savings of 32 times the space. This compression significantly reduces storage requirements.

Comparing BQ compressed vectors is fast. To calculate the distance betw