# Multi-vector embeddings with Reason-ModernColBERT
This recipe shows how to use [LightOn's Reason-ModernColBERT](https://huggingface.co/lightonai/Reason-ModernColBERT) model to generate multi-vector embeddings for text data and use them in Weaviate for **reasoning-intensive retrieval**, a task that is common in [agentic RAG applications](https://weaviate.io/blog/what-is-agentic-rag).



# Prerequisites
Before starting this tutorial, install the following packages:



In [1]:
!pip install -U pylate
!pip install -U weaviate-client
!pip install -U sentence-transformers

Collecting sentence-transformers==4.0.2 (from pylate)
  Using cached sentence_transformers-4.0.2-py3-none-any.whl.metadata (13 kB)
Using cached sentence_transformers-4.0.2-py3-none-any.whl (340 kB)
Installing collected packages: sentence-transformers
  Attempting uninstall: sentence-transformers
    Found existing installation: sentence-transformers 4.1.0
    Uninstalling sentence-transformers-4.1.0:
      Successfully uninstalled sentence-transformers-4.1.0
Successfully installed sentence-transformers-4.0.2
Collecting sentence-transformers
  Using cached sentence_transformers-4.1.0-py3-none-any.whl.metadata (13 kB)
Using cached sentence_transformers-4.1.0-py3-none-any.whl (345 kB)
Installing collected packages: sentence-transformers
  Attempting uninstall: sentence-transformers
    Found existing installation: sentence-transformers 4.0.2
    Uninstalling sentence-transformers-4.0.2:
      Successfully uninstalled sentence-transformers-4.0.2
ERROR: pip's dependency resolver does n


# 1.1. Connect to Weaviate
First, connect to your Weaviate instance using your preferred client library. In this example, we assume you are connecting to a local Weaviate instance. For other types of instances, replace the connection details as needed (see the commented-out options in the connection code below).

You can start a local Weaviate instance with this command:

In [2]:
!docker run --detach -p 8080:8080 -p 50051:50051 cr.weaviate.io/semitechnologies/weaviate:1.30.1

Then connect to your local Weaviate instance.

In [3]:
import weaviate
from weaviate.classes.init import Auth

# Option 1: Connect to your local Weaviate instance deployed with Docker
client = weaviate.connect_to_local()

# Option 2: Connect to an embedded Weaviate instance
# client = weaviate.connect_to_embedded()

# Option 3: Connect to your Weaviate Cloud (WCS) cluster
# client = weaviate.connect_to_weaviate_cloud(
#     cluster_url="WCS-CLUSTER-URL",  # Replace with your WCS cluster URL
#     auth_credentials=Auth.api_key("WCS-API-KEY"),  # Replace with your WCS API key
# )

client.is_ready()

True

# 1.2. Define collection
Next, we define a collection called "DemoCollection". Note that we do not use a model integration, as we will provide the embeddings manually.

In [4]:
from weaviate.classes.config import Configure, Property, DataType
from weaviate.util import generate_uuid5

collection_name = "DemoCollection"

# Check if collection exists before deleting
if client.collections.exists(collection_name):
    client.collections.delete(collection_name)  # THIS WILL DELETE THE SPECIFIED COLLECTION AND ALL ITS OBJECTS

client.collections.create(
    collection_name,
    vectorizer_config=[
        # User-provided embeddings
        Configure.NamedVectors.none(
            name="multi_vector",
            vector_index_config=Configure.VectorIndex.hnsw(
                # Enable multi-vector index with default settings
                multi_vector=Configure.VectorIndex.MultiVector.multi_vector()
            )
        ),
    ],
    properties=[
        Property(name="text",
                 data_type=DataType.TEXT,
                 vectorize_property_name=False  # Explicitly disable property name vectorization
                 ),
        Property(name="docid",
                 data_type=DataType.TEXT,
                 vectorize_property_name=False  # Explicitly disable property name vectorization
                 ),
    ],
)

<weaviate.collections.collection.sync.Collection at 0x7d67d145bdd0>

You can double-check that you're using the MaxSim operator for the multi-vector embeddings.

In [5]:
import json

# Get collection
collection = client.collections.get(collection_name)

config = collection.config.get().vector_config['multi_vector'].vector_index_config

print(json.dumps(config.__dict__, indent=2, default=lambda o: o.__dict__ if hasattr(o, '__dict__') else str(o)))


{
  "multi_vector": {
    "aggregation": "maxSim"
  },
  "quantizer": null,
  "cleanup_interval_seconds": 300,
  "distance_metric": "cosine",
  "dynamic_ef_min": 100,
  "dynamic_ef_max": 500,
  "dynamic_ef_factor": 8,
  "ef": -1,
  "ef_construction": 128,
  "filter_strategy": "sweeping",
  "flat_search_cutoff": 40000,
  "max_connections": 32,
  "skip": false,
  "vector_cache_max_objects": 1000000000000
}
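To make the `maxSim` aggregation reported above more concrete: in late-interaction models such as ColBERT, each query token vector is matched with its most similar document token vector, and these maxima are summed into one relevance score. The following NumPy sketch illustrates that idea with cosine similarity. `maxsim_score` is a hypothetical helper written for illustration only; it is not Weaviate's internal implementation, which may differ in details.

```python
import numpy as np


def maxsim_score(query_vectors, doc_vectors):
    """Sum, over query tokens, of the best cosine similarity to any document token."""
    q = np.asarray(query_vectors, dtype=np.float64)
    d = np.asarray(doc_vectors, dtype=np.float64)

    # Normalize each token vector so that the dot product equals cosine similarity
    q = q / np.linalg.norm(q, axis=1, keepdims=True)
    d = d / np.linalg.norm(d, axis=1, keepdims=True)

    # Similarity matrix of shape (num_query_tokens, num_doc_tokens)
    similarities = q @ d.T

    # For each query token keep its best-matching document token, then sum
    return float(similarities.max(axis=1).sum())


# Toy example: 2 query tokens and 3 document tokens in 2 dimensions
toy_query = [[1.0, 0.0], [0.0, 1.0]]
toy_doc = [[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]]
print(maxsim_score(toy_query, toy_doc))
```

In the toy example, each query token has an exact directional match in the document, so each contributes a similarity of 1 to the sum.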


# 1.3. Import data

Now we can import the data. In this example, each object is sent to Weaviate together with its multi-vector embedding, which we compute with LightOn's Reason-ModernColBERT model.

Load Reason-ModernColBERT embedding model:

In [6]:
from pylate import models

# Load the ModernColBERT model
model = models.ColBERT(
    model_name_or_path="lightonai/Reason-ModernColBERT",
)



Next, we will load some example data. This is where the magic of reasoning IR happens. Let's take the following query as an example:

In [None]:
query = "At home, after I water my plants, the water goes to plates below the pots. Can I reuse it for my plants next time?"

In an agentic RAG system, you would generate a reasoning trace to work out that this question needs, for example, information about soluble salts.
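A reasoning trace like this is usually combined with the original query before it is encoded for retrieval. A minimal sketch, assuming plain text concatenation is sufficient: `build_reasoning_query` is a hypothetical helper, not part of PyLate or Weaviate, and the example strings are placeholders.

```python
def build_reasoning_query(query: str, reasoning: str = "") -> str:
    """Join a query and an optional reasoning trace into one text for encoding."""
    # Strip surrounding whitespace so triple-quoted traces do not add blank lines
    parts = [query.strip(), reasoning.strip()]
    # Drop empty parts so a missing trace leaves the query unchanged
    return "\n".join(part for part in parts if part)


example = build_reasoning_query(
    "Can I reuse drainage water?",
    "\nThe key issue is salt buildup.\n",
)
print(example)
```

The resulting string would then be passed to the embedding model as the query text.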

In [None]:
reasoning = """
The user wants to know if reusing plant drainage water is safe.
The key issue is understanding what happens to water after it passes through soil.
It likely contains dissolved minerals and salts from fertilizers.
We need to find information about mineral buildup, salt concentration effects on plants, and whether reused water can harm plant roots through excessive salt accumulation.
"""

Let's take the following three documents.
- One is relevant because it talks about soluble salts
- One is irrelevant but would likely rank highly in other search methods, such as keyword search, because the term "water" appears often
- One is somewhat related to the query.

In [7]:

relevant_document = """
**Soluble Salts in Container Plants**

Soluble salts are minerals dissolved in water that accumulate when water evaporates, leaving salts behind.
When drainage water is reused, these salts become concentrated and make it difficult for plants to absorb water.
High salt levels can damage roots directly and cause symptoms like brown leaf tips, wilting, and stunted growth.
The best practice is to empty drainage saucers rather than reusing the water.
"""

irrelevant_document = """
**Water Conservation in Gardening**

Water conservation is important for sustainable gardening.
Techniques include mulching to reduce evaporation, choosing drought-tolerant plants, and collecting rainwater in barrels.
Drip irrigation systems deliver water directly to plant roots with minimal waste.
These methods can reduce garden water usage by up to 50% while maintaining healthy plants.
"""

somewhat_related_document = """
**Basic Plant Watering Guidelines**

Most houseplants should be watered when the top inch of soil feels dry.
Water thoroughly until it drains from the bottom holes, then empty saucers after 30 minutes to prevent root rot.
Different plants have different needs - succulents need less water while tropical plants prefer consistently moist soil.
Overwatering causes more plant deaths than underwatering.
"""

In [8]:
# An example dataset
documents = [
    {"id": "doc1", "text": relevant_document},
    {"id": "doc2", "text": irrelevant_document},
    {"id": "doc3", "text": somewhat_related_document},
]


# Import data
with collection.batch.fixed_size(batch_size=10) as batch:
    for doc in documents:
        # Iterate through the dataset & add to batch
        batch.add_object(
            properties={"text": doc["text"], "docid": doc["id"]},
            uuid=generate_uuid5(doc["id"]),
            vector={"multi_vector": model.encode(doc["text"], is_query=False)},  # Provide the embedding manually
        )


In [9]:
# Check for errors in batch imports
if collection.batch.failed_objects:
    print(f"Number of failed imports: {len(collection.batch.failed_objects)}")
    print(f"First failed object: {collection.batch.failed_objects[0]}")

print(len(collection))  # This should print `3`

3


Let's retrieve an object and inspect the shape of its embeddings.

In [10]:
response = collection.query.fetch_objects(limit=3, include_vector=True)

for obj in response.objects:
    print(f"This embedding's shape is ({len(obj.vector['multi_vector'])}, {len(obj.vector['multi_vector'][0])})")


This embedding's shape is (92, 128)
This embedding's shape is (85, 128)
This embedding's shape is (73, 128)
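Each stored embedding is a matrix with one 128-dimensional row per token, which is why the first dimension differs between documents. The difference from a single-vector embedding can be checked with a small helper. This is a sketch for illustration: `embedding_shape` is a hypothetical function, and the toy values are made up.

```python
def embedding_shape(embedding):
    """Return (n,) for a single vector or (num_tokens, dim) for a multi-vector."""
    first = embedding[0]
    # A multi-vector embedding is a sequence of token vectors,
    # so its first element is itself a sequence
    if hasattr(first, "__len__"):
        return (len(embedding), len(first))
    # A single-vector embedding is a flat sequence of floats
    return (len(embedding),)


single_vector = [0.1, 0.2, 0.3]
multi_vector = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]

print(embedding_shape(single_vector))  # (3,)
print(embedding_shape(multi_vector))  # (3, 2)
```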


# 1.4. Reasoning-based retrieval with Reason-ModernColBERT

Let's see if Reason-ModernColBERT can rank the relevant, somewhat related, and irrelevant documents in the right order using only the query, **without the reasoning trace**.


Note that the query embedding is also a multi-vector embedding (a list of lists of floats), in contrast to a single vector, which would be a list of floats.



In [13]:
response = collection.query.near_vector(
    near_vector=model.encode(query, is_query=True),  # Raw embedding, in [[e11, e12, e13, ...], [e21, e22, e23, ...], ...] shape
    target_vector="multi_vector",
    return_metadata=weaviate.classes.query.MetadataQuery(
            distance=True,
        ),
)

for result in response.objects:
    print(result.properties)
    print(result.metadata.distance)


{'text': '\n**Soluble Salts in Container Plants**\n\nSoluble salts are minerals dissolved in water that accumulate when water evaporates, leaving salts behind. \nWhen drainage water is reused, these salts become concentrated and make it difficult for plants to absorb water. \nHigh salt levels can damage roots directly and cause symptoms like brown leaf tips, wilting, and stunted growth. \nThe best practice is to empty drainage saucers rather than reusing the water.\n', 'docid': 'doc1'}
-82.10780334472656
{'text': '\n**Basic Plant Watering Guidelines**\n\nMost houseplants should be watered when the top inch of soil feels dry. \nWater thoroughly until it drains from the bottom holes, then empty saucers after 30 minutes to prevent root rot. \nDifferent plants have different needs - succulents need less water while tropical plants prefer consistently moist soil. \nOverwatering causes more plant deaths than underwatering.\n', 'docid': 'doc3'}
-78.0878677368164
{'text': '\n**Water Conservati

As you can see, the Reason-ModernColBERT model is able to successfully retrieve the documents in the order of relevance.

Although it is not necessary, appending the reasoning trace can help boost retrieval performance, as shown below.

In [16]:
response = collection.query.near_vector(
    near_vector=model.encode((query + reasoning), is_query=True),  # Raw embedding, in [[e11, e12, e13, ...], [e21, e22, e23, ...], ...] shape
    target_vector="multi_vector",
    return_metadata=weaviate.classes.query.MetadataQuery(
            distance=True,
        ),
)

for result in response.objects:
    print(result.properties)
    print(result.metadata.distance)


{'text': '\n**Soluble Salts in Container Plants**\n\nSoluble salts are minerals dissolved in water that accumulate when water evaporates, leaving salts behind. \nWhen drainage water is reused, these salts become concentrated and make it difficult for plants to absorb water. \nHigh salt levels can damage roots directly and cause symptoms like brown leaf tips, wilting, and stunted growth. \nThe best practice is to empty drainage saucers rather than reusing the water.\n', 'docid': 'doc1'}
-97.81586456298828
{'text': '\n**Basic Plant Watering Guidelines**\n\nMost houseplants should be watered when the top inch of soil feels dry. \nWater thoroughly until it drains from the bottom holes, then empty saucers after 30 minutes to prevent root rot. \nDifferent plants have different needs - succulents need less water while tropical plants prefer consistently moist soil. \nOverwatering causes more plant deaths than underwatering.\n', 'docid': 'doc3'}
-83.30533599853516
{'text': '\n**Water Conservat
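One way to quantify the effect is to compare the gap between the top two distances in each run. The sketch below uses the two distances printed above for `doc1` and `doc3`; `top_margin` is a hypothetical helper. The distances are negative here, and since results are ordered by ascending distance, the more negative value ranks first.

```python
def top_margin(distances):
    """Gap between the best (lowest) and second-best distance."""
    ranked = sorted(distances)
    return ranked[1] - ranked[0]


# Distances for doc1 and doc3 as printed in the outputs above
without_reasoning = [-82.10780334472656, -78.0878677368164]
with_reasoning = [-97.81586456298828, -83.30533599853516]

print(round(top_margin(without_reasoning), 2))  # 4.02
print(round(top_margin(with_reasoning), 2))  # 14.51
```

In this small example, appending the reasoning trace widens the gap between the relevant and the somewhat related document. Three documents are too few to draw general conclusions.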

# Additional resources
You might also enjoy the following resources:

- [LightOn Unlocks Agentic RAG with new SOTA Model Reason-ModernColBERT](https://www.lighton.ai/lighton-blogs/lighton-releases-reason-colbert)
- [BRIGHT: A Realistic and Challenging Benchmark for Reasoning-Intensive Retrieval](https://arxiv.org/pdf/2407.12883)
- Tutorial: [Weaviate multi-vector embeddings](https://weaviate.io/developers/weaviate/tutorials/multi-vector-embeddings)
- Blog: [An Overview of Late Interaction Retrieval Models: ColBERT, ColPali, and ColQwen](https://weaviate.io/blog/late-interaction-overview)
- Recipe notebooks [on multi-vector embeddings](https://github.com/weaviate/recipes/tree/main/weaviate-features/multi-vector)
